Point estimation using the Kullback-Leibler loss function and MML

David L. Dowe, Rohan A. Baxter, Jonathan J. Oliver, Chris S. Wallace

    Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

    24 Citations (Scopus)


    We consider the problem of point estimation of an unknown continuous parameter within a Bayesian setting. Given data-space X, data x ∈ X, likelihood function p(x|θ) and prior h(θ), we estimate θ using a loss function, C(θ̂, θ). We are interested in the case where the cost function is unknown at the time of point estimation. Invariance and other properties of the Kullback-Leibler distance lead us to consider it as a 'reference' loss function. This involves finding the θ which minimises the expected Kullback-Leibler distance: (Formula Presented) The second term is not dependent on θ and can be omitted from the expression to be minimised. Swapping the order of integration: (Formula Presented) The term π(θ|x) is the posterior probability of the parameter θ given the data x, and the term in square brackets is prob(y|x), the probability of future data y given our current data, x. Given our current data, x, and parameter estimate, θ, equation (1) gives the expected log-likelihood of future data, y. We consider a minimum Expected Kullback-Leibler (EKL) estimate, the value of θ which minimises the expression in Equation (1). Analytical minimisation of this expression (1) is relatively difficult for all but the simplest of cases. However, we can numerically generate simulated future data, y, by generating θ from the posterior (in θ given x) and then generating y from p(y|θ). The maximum likelihood estimate of this would-be future sample will converge in the limit to our desired minimum EKL estimate. A related invariant technique is the information-theoretic Minimum Message Length (MML) method [WF87], which entails the minimisation of a two-part message transmitting a hypothesis, θ, and the data, x, in light of θ. The MML estimator is a quadratic approximation to the Strict MML (SMML) estimator [WB75, WF87], which maps partitions of the discrete data-space to estimator values.
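The two displayed formulas are not reproduced in this record. The following is only a plausible reconstruction from the surrounding sentences, not the authors' typeset equations. It assumes that θ̂ denotes the estimate, that θ is the variable integrated against the posterior π(θ|x), that y ranges over future data, and that Equation (1) is written with a leading minus sign so that it is minimised; the abstract itself does not fix any of these conventions.

```latex
% Hedged reconstruction; \hat\theta (estimate) and the sign convention are assumptions.
\begin{align*}
\mathrm{EKL}(\hat\theta)
  &= \int \pi(\theta \mid x) \int p(y \mid \theta)
     \log \frac{p(y \mid \theta)}{p(y \mid \hat\theta)} \, dy \, d\theta \\
  &= -\int \pi(\theta \mid x) \int p(y \mid \theta)
        \log p(y \mid \hat\theta) \, dy \, d\theta
     + \int \pi(\theta \mid x) \int p(y \mid \theta)
        \log p(y \mid \theta) \, dy \, d\theta .
\end{align*}
% The second term does not involve \hat\theta. Dropping it and swapping
% the order of integration leaves the quantity to be minimised:
\begin{equation*}
  -\int \left[ \int \pi(\theta \mid x) \, p(y \mid \theta) \, d\theta \right]
     \log p(y \mid \hat\theta) \, dy
  \tag{1}
\end{equation*}
```

Under this reading, the bracketed integral is the predictive distribution prob(y|x), so minimising (1) is the same as maximising the expected log-likelihood of future data under θ̂.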
Like the Kullback-Leibler distance, the SMML and MML estimators are invariant under parameter transformations. The MML estimator has been shown to return a relatively small Kullback-Leibler distance from the true distribution for various models, including circular distributions [WD93] and factor analysis. The MML estimator minimises a two-part message length of transmitting θ and the current data, x, in light of θ. The minimum EKL estimator minimises the expected length of the transmission of future data. We now present an argument as to why these two methods are similar. First, an approximation to SMML called Fairly Strict MML (FSMML) maps regions from the parameter space to point estimates. These regions, or coding blocks, can then be used to generate synthetic data by convolving h(θ) with p(x|θ) over the coding block. Taking the Maximum Likelihood estimate for this synthetic data gives the FSMML estimate, θ, for the coding block. Minimum EKL takes a prior, h(θ), and data, x, to produce a posterior, π(θ|x). Whether we analytically convolve the posterior with p(x|θ) or whether we numerically sample a θ from the posterior and then a datum, y, from p(x|θ), we obtain a population of expected future data. The minimum EKL estimator is the maximum likelihood estimator on the expected future data. These two methods are very similar, and only differ insofar as the minimum EKL uses the posterior whereas FSMML uses the prior over the coding block. Given that we do not know exactly where our coding block regions might be in FSMML, we can average the location of the coding. This leads to something very akin to the posterior. It has been shown [Wal96] that the SMML estimator behaves locally very similarly to the posterior, the similarity increasing with the dimension of the parameter space. We note similarities between MML (or Strict MML) estimators and min EKL estimators for a variety of problems where both estimators are defined.
We also note the case of the uniform distribution, for which the Strict MML estimator is defined but the min EKL estimator is not defined. Finally, in contrast with the successes of the Bayesian SMML and min EKL estimation techniques described above, we conjecture (Dowe, 1997) with slight evidence that no classical estimation technique can always be invariant and statistically consistent while providing internally consistent parameter estimates.
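The numerical procedure described in the abstract (draw θ from the posterior, draw future data y from p(y|θ), then take the maximum likelihood estimate of the simulated sample) can be illustrated with a minimal sketch. The Bernoulli likelihood, the Beta prior, and the function names `min_ekl_bernoulli` and `posterior_mean` are assumptions chosen for illustration and do not come from the paper. For this model the limit is known in closed form: the min EKL estimate equals the predictive probability of a success, which is the posterior mean.

```python
import random


def min_ekl_bernoulli(successes, failures, prior_a=1.0, prior_b=1.0,
                      n_future=200_000, seed=0):
    """Monte Carlo sketch of a minimum EKL estimate (illustrative model only).

    Assumes a Bernoulli likelihood with a Beta(prior_a, prior_b) prior,
    so the posterior is Beta(prior_a + successes, prior_b + failures).
    """
    rng = random.Random(seed)
    post_a = prior_a + successes
    post_b = prior_b + failures

    ones = 0
    for _ in range(n_future):
        # Step 1: draw a parameter value from the posterior.
        theta = rng.betavariate(post_a, post_b)
        # Step 2: draw one future datum y from p(y | theta).
        if rng.random() < theta:
            ones += 1

    # Step 3: the Bernoulli maximum likelihood estimate on the simulated
    # future sample is the fraction of successes.
    return ones / n_future


def posterior_mean(successes, failures, prior_a=1.0, prior_b=1.0):
    """Closed-form predictive probability of success for the Beta-Bernoulli model."""
    return (prior_a + successes) / (prior_a + prior_b + successes + failures)


if __name__ == "__main__":
    estimate = min_ekl_bernoulli(successes=7, failures=3)
    exact = posterior_mean(successes=7, failures=3)
    print(f"simulated min EKL estimate: {estimate:.4f}")
    print(f"closed-form posterior mean: {exact:.4f}")
```

The simulated value should approach the closed-form value as `n_future` grows. For models without a conjugate posterior, the posterior draw in step 1 would have to come from some other sampler, which this sketch does not cover.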

    Original language: English
    Title of host publication: Research and Development in Knowledge Discovery and Data Mining - 2nd Pacific-Asia Conference, PAKDD 1998, Proceedings
    Editors: Xindong Wu, Ramamohanarao Kotagiri, Kevin B. Korb
    Number of pages: 9
    ISBN (Print): 3540643834, 9783540643838
    Publication status: Published - 1998
    Event: Pacific-Asia Conference on Knowledge Discovery and Data Mining 1998 - Melbourne, Australia
    Duration: 15 Apr 1998 to 17 Apr 1998
    Conference number: 2nd
    https://link.springer.com/book/10.1007/3-540-64383-4 (Proceedings)

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349


    Conference: Pacific-Asia Conference on Knowledge Discovery and Data Mining 1998
    Abbreviated title: PAKDD 1998


    • Algorithmic complexity
    • Bayesian and statistical learning methods
    • Induction in KDD
    • Machine learning
    • Minimum message length
    • Noise handling
