Revisiting Probability Distribution Assumptions for Information Theoretic Feature Selection

Yuan Sun, Wei Wang, Michael Kirley, Xiaodong Li, Jeffrey Chan

Research output: Chapter in Book/Report/Conference proceeding > Conference Paper > Research


Feature selection has been shown to be beneficial for many data mining and machine learning tasks, especially for big data analytics. Mutual Information (MI) is a well-known information-theoretic approach used to evaluate the relevance of feature subsets and class labels. However, estimating high-dimensional MI poses significant challenges. Consequently, a great deal of research has focused on using low-order MI approximations or computing a lower bound on MI called Variational Information (VI). These methods often require certain assumptions made on the probability distributions of features such that these distributions are realistic yet tractable to compute. In this paper, we reveal two sets of distribution assumptions underlying many MI and VI based methods: Feature Independence Distribution and Geometric Mean Distribution. We systematically analyze their strengths and weaknesses and propose a logical extension called Arithmetic Mean Distribution, which leads to an unbiased and normalised estimation of probability densities. We conduct detailed empirical studies across a suite of 29 real-world classification problems and illustrate improved prediction accuracy of our methods based on the identification of more informative features, thus providing support for our theoretical findings.
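To make the idea of a low-order MI approximation concrete, the following is a minimal sketch of greedy forward selection using only pairwise mutual information on discrete features. It is not the Arithmetic Mean Distribution method proposed in the paper; it is a generic mRMR-style criterion chosen for illustration. The function names `mutual_information` and `greedy_select` and the toy data are assumptions made for this example.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats for two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) * p(y))), written in terms of raw counts
        mi += (c / n) * log(c * n / (px[x] * py[y]))
    return mi


def greedy_select(features, labels, k):
    """Pick up to k feature names greedily (hypothetical helper).

    Score of a candidate f: I(f; labels) minus the mean of I(f; s) over the
    already selected features s. Only pairwise MI terms are used, which is
    what makes this a low-order approximation.
    """
    selected = []
    remaining = list(features)

    def score(name):
        relevance = mutual_information(features[name], labels)
        if not selected:
            return relevance
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (assumed for illustration): the label encodes two independent bits,
# and "a_copy" duplicates "a", so it adds no new information.
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]
labels = [2 * ai + bi for ai, bi in zip(a, b)]
features = {"a": a, "a_copy": list(a), "b": b}

chosen = greedy_select(features, labels, k=2)
print(chosen)
```

On this toy data the redundancy term keeps the duplicated feature from being selected alongside its original, so the two chosen features include `b`. Pairwise scores of this kind stay tractable because they never require estimating a joint distribution over more than two variables at a time.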
Original language: English
Title of host publication: Proceedings of the AAAI Conference on Artificial Intelligence
Subtitle of host publication: AAAI Technical Track on Machine Learning
Publisher: Association for the Advancement of Artificial Intelligence (AAAI)
Number of pages: 8
ISBN (Print): 9781577358350
Publication status: Published - 3 Apr 2020
Externally published: Yes
Event: AAAI Conference on Artificial Intelligence 2020 - New York, United States of America
Duration: 7 Feb 2020 to 12 Feb 2020
Conference number: 34th

Publication series

Name: Proceedings of the AAAI Conference on Artificial Intelligence
Publisher: AAAI Press
ISSN (Print): 2159-5399
ISSN (Electronic): 2374-3468


Conference: AAAI Conference on Artificial Intelligence 2020
Abbreviated title: AAAI-20
Country: United States of America
City: New York
Other: The Thirty-Fourth AAAI Conference on Artificial Intelligence was held on February 7–12, 2020 in New York, New York, USA. The surge in public interest in AI technologies, which we have witnessed over the past few years, continued to accelerate in 2019–2020, with the societal and economic impact of AI becoming a central point of public and government discussion worldwide. AAAI-20 saw submissions and attendance numbers that were records in the history of the AAAI series of conferences and continued its tradition of attracting top-quality papers from all areas of AI. We were excited to see increases in submissions across almost all areas.

The AAAI-20 program consisted of a core technical program of original research presentations, including a special track on AI for social impact and a sister conference track. It additionally featured a broad range of tutorials, workshops, invited talks, panels, student abstracts, a debate, and presentations by senior members. The program was rounded out by technical demonstrations, exhibits, an AI job fair, the AI in Practice program, a student outreach program, and a game night. The conference also continued its tradition of colocating with the long-running IAAI conference and the EAAI symposium, as well as the newer conference on AI, Ethics, and Society.