On the intuitiveness of common discretization methods

Mario Boley, Ankit Kariryaa

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research

Abstract

Data discretization methods are usually evaluated in terms of technical criteria related to a specific data analysis goal, such as the preservation of variable interactions. In this paper, we propose a different evaluation principle that assesses the quality of a discretization by the degree to which it coincides with human intuition. This is motivated by interactive exploratory data analysis, where discretizations should be simple, self-explanatory, and fixed across results in order to reduce the cognitive load on the user. We present a study design for measuring the intuitive discretization choices of a general human population for a set of discretization problems, and we report the results of a study trial that we performed with 153 respondents and four problem classes, each using the categories “low”, “normal”, and “high”. Using this trial, we evaluated eight discretization methods from three families: range-based, count-based, and clustering-based discretization. Our results partially confirm findings from Cognitive Linguistics that regard prototype-based categorization, which clustering-based methods resemble most closely, as a predominant human discretization mechanism. However, they also show a tendency of participants to sometimes sacrifice cluster quality in favor of approximating certain category proportions.
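To make the three method families concrete, the following is a minimal sketch that assigns a single numeric variable to the three categories “low”, “normal”, and “high”. It assumes one representative per family, chosen here for illustration only: equal-width binning (range-based), equal-frequency binning (count-based), and one-dimensional k-means (clustering-based). The function names `equal_width`, `equal_frequency`, and `kmeans_1d` and the toy data are hypothetical; they are not the eight methods evaluated in the paper, and the data is not from the study.

```python
import numpy as np

# Hypothetical label set matching the study's three categories.
LABELS = np.array(["low", "normal", "high"])


def equal_width(x, k=3):
    """Range-based (assumed representative): split [min, max] into k equal-width intervals."""
    x = np.asarray(x, dtype=float)
    cuts = np.linspace(x.min(), x.max(), k + 1)[1:-1]  # interior cut points only
    return np.digitize(x, cuts)  # category index 0..k-1 per value


def equal_frequency(x, k=3):
    """Count-based (assumed representative): cut points at the 1/k, 2/k, ... quantiles."""
    x = np.asarray(x, dtype=float)
    cuts = np.quantile(x, np.linspace(0, 1, k + 1)[1:-1])
    return np.digitize(x, cuts)


def kmeans_1d(x, k=3, iters=100):
    """Clustering-based (assumed representative): 1-D k-means via Lloyd's algorithm.

    Cut points are the midpoints between neighbouring cluster centres, which is
    equivalent to assigning each value to its nearest centre.
    """
    x = np.asarray(x, dtype=float)
    # Spread-out initial centres at the inner quantiles of the data.
    centres = np.quantile(x, np.linspace(0, 1, k + 2)[1:-1])
    for _ in range(iters):
        # Assign every value to its nearest centre (ties go to the lower index).
        assign = np.argmin(np.abs(x[:, None] - centres[None, :]), axis=1)
        # Move each centre to the mean of its members; keep it if the cluster is empty.
        new = np.array([x[assign == j].mean() if np.any(assign == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break  # converged: assignments no longer change the centres
        centres = new
    centres = np.sort(centres)
    cuts = (centres[:-1] + centres[1:]) / 2
    return np.digitize(x, cuts)


# Made-up toy data with one large outlier (not data from the study).
x = [1, 2, 3, 4, 5, 20, 21, 22, 100]
for name, method in [("range", equal_width), ("count", equal_frequency), ("cluster", kmeans_1d)]:
    print(name, LABELS[method(x)].tolist())
```

On this toy input the families disagree. Equal-width binning lets the outlier stretch the range, so eight of the nine values land in “low”. Equal-frequency binning enforces a 3/3/3 split that separates 20 from 21 and 22. The k-means sketch follows the gaps in the data, grouping 1 to 5, 20 to 22, and 100. This illustrates the trade-off the abstract describes between cluster quality and fixed category proportions.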
Original language: English
Title of host publication: Proceedings of the ACM SIGKDD 2016 Full-day Workshop on Interactive Data Exploration and Analytics, IDEA 2016
Editors: Polo Chau, Jilles Vreeken, Matthijs van Leeuwen, Dafna Shahaf, Christos Faloutsos
Place of Publication: New York, NY, USA
Publisher: Association for Computing Machinery (ACM)
Pages: 22-29
Number of pages: 8
Publication status: Published - 2016
Externally published: Yes
Event: KDD 2016 Workshop on Interactive Data Exploration and Analytics - San Francisco, United States of America
Duration: 14 Aug 2016 to 14 Aug 2016
http://poloclub.gatech.edu/idea2016/

Conference

Conference: KDD 2016 Workshop on Interactive Data Exploration and Analytics
Abbreviated title: KDD 2016
Country/Territory: United States of America
City: San Francisco
Period: 14/08/16 to 14/08/16