Overcoming key weaknesses of distance-based neighbourhood methods using a data dependent dissimilarity measure

Kai Ming Ting, Ye Zhu, Mark Carman, Yue Zhu, Zhi-Hua Zhou

    Research output: Chapter in Book/Report/Conference proceeding, Conference Paper, Research, peer-review

    56 Citations (Scopus)

    Abstract

    This paper introduces the first generic version of data dependent dissimilarity and shows that it provides a better closest match than distance measures for three existing algorithms in clustering, anomaly detection and multi-label classification. For each algorithm, we show that simply replacing the distance measure with the data dependent dissimilarity measure overcomes a key weakness of the otherwise unchanged algorithm.
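    The claim that an algorithm can stay unchanged while only its measure is swapped can be illustrated with a small k-nearest-neighbour search that takes the dissimilarity as a parameter. This record does not define the paper's measure, so the "mass" dissimilarity below is an assumption for illustration only: for each dimension it counts the fraction of data points whose value lies between the two points, then takes a power mean over dimensions. It is not presented as the authors' method, and the names euclidean, make_mass_dissimilarity and k_nearest are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Point = Sequence[float]
Dissimilarity = Callable[[Point, Point], float]


def euclidean(x: Point, y: Point) -> float:
    """Ordinary Euclidean distance: ignores how the data are distributed."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def make_mass_dissimilarity(data: List[Point], p: float = 2.0) -> Dissimilarity:
    """Hypothetical data dependent dissimilarity (illustrative assumption).

    For each dimension i, count the fraction of points in `data` whose i-th
    value lies in [min(x_i, y_i), max(x_i, y_i)]; combine the per-dimension
    fractions with a power mean of order p. Two points separated by many
    data points (a dense region) come out more dissimilar than two points
    the same Euclidean distance apart in a sparse region.
    """
    n, d = len(data), len(data[0])

    def dissim(x: Point, y: Point) -> float:
        total = 0.0
        for i in range(d):
            lo, hi = min(x[i], y[i]), max(x[i], y[i])
            mass = sum(1 for z in data if lo <= z[i] <= hi) / n
            total += mass ** p
        return (total / d) ** (1.0 / p)

    return dissim


def k_nearest(query: Point, data: List[Point], k: int,
              dissim: Dissimilarity) -> List[int]:
    """Indices of the k most similar points; only `dissim` is swappable.

    Python's sort is stable, so ties keep their original index order.
    """
    return sorted(range(len(data)), key=lambda j: dissim(query, data[j]))[:k]


if __name__ == "__main__":
    data = [(0.1,), (0.2,), (0.3,), (0.4,), (-0.45,), (-0.9,)]
    query = (0.0,)
    print(k_nearest(query, data, 3, euclidean))                      # [0, 1, 2]
    print(k_nearest(query, data, 3, make_mass_dissimilarity(data)))  # [0, 4, 1]
```

    Only the measure passed to k_nearest changes between the two calls. With the mass-based measure, the point at -0.45 (index 4), which sits in the sparse negative side of the data, displaces the point at 0.2 (index 1) from the neighbourhood, because fewer data points lie between it and the query.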

    Original language: English
    Title of host publication: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2016)
    Subtitle of host publication: August 13-17, 2016, San Francisco, CA, USA
    Editors: Alex Smola, Charu Aggarwal, Dou Shen, Rajeev Rastogi
    Place of Publication: New York, New York
    Publisher: Association for Computing Machinery (ACM)
    Pages: 1205-1214
    Number of pages: 10
    ISBN (Electronic): 9781450342322
    Publication status: Published - 13 Aug 2016
    Event: ACM International Conference on Knowledge Discovery and Data Mining 2016 - Hilton San Francisco Union Square, San Francisco, United States of America
    Duration: 13 Aug 2016 to 17 Aug 2016
    Conference number: 22nd
    http://www.kdd.org/kdd2016/
    https://dl.acm.org/doi/proceedings/10.1145/2939672

    Conference

    Conference: ACM International Conference on Knowledge Discovery and Data Mining 2016
    Abbreviated title: KDD 2016
    Country/Territory: United States of America
    City: San Francisco
    Period: 13/08/16 to 17/08/16
    Other: KDD 2016, a premier interdisciplinary conference, brings together researchers and practitioners from data science, data mining, knowledge discovery, large-scale data analytics, and big data.

    Keywords

    • Data dependent dissimilarity
    • Distance-based neighbourhood
    • Distance measure
    • K nearest neighbours
    • Probability-mass-based neighbourhood
