Abstract
We consider the problem of point estimation of an unknown continuous parameter within a Bayesian setting. Given dataspace X, data x ∈ X, likelihood function p(x|θ) and prior h(θ), we estimate θ using a loss function, C(θ̂, θ), where θ̂ denotes the estimate. We are interested in the case where the cost function is unknown at the time of point estimation. Invariance and other properties of the Kullback-Leibler distance lead us to consider it as a "reference" loss function. This involves finding the θ̂ which minimises the expected Kullback-Leibler distance: (Formula Presented) The second term is not dependent on θ̂ and can be omitted from the expression to be minimised. Swapping the order of integration: (Formula Presented) The term π(θ|x) is the posterior probability of the parameter θ given the data x, and the term in square brackets is prob(y|x), the probability of future data y given our current data, x. Given our current data, x, and parameter estimate, θ̂, Equation (1) gives the expected log-likelihood of future data, y. We consider a minimum Expected Kullback-Leibler (EKL) estimate, the value of θ̂ which minimises the expression in Equation (1). Analytical minimisation of this expression (1) is relatively difficult for all but the simplest of cases. However, we can numerically generate simulated future data, y, by generating θ from the posterior (in θ given x) and then generating y from p(y|θ). The maximum likelihood estimate of this would-be future sample will converge in the limit to our desired minimum EKL estimate. A related invariant technique is the information-theoretic Minimum Message Length (MML) method [WF87], which entails the minimisation of a two-part message transmitting a hypothesis, θ, and the data, x, in light of θ. The MML estimator is a quadratic approximation to the Strict MML (SMML) estimator [WB75, WF87], which maps partitions of the discrete dataspace to estimator values. Like the Kullback-Leibler distance, the SMML and MML estimators are invariant under parameter transformations.
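The two displayed expressions marked "(Formula Presented)" did not survive in this record. The block below is only a hedged reconstruction inferred from the surrounding sentences, not the authors' typeset equations. It assumes that θ̂ denotes the estimate, that the Kullback-Leibler distance is taken from p(y|θ) to p(y|θ̂), and that the terms appear in the order shown; the symbol D(θ̂) is introduced here purely for illustration.

```latex
% Hedged reconstruction; notation D(\hat\theta) and term ordering are assumptions.
\begin{align*}
D(\hat\theta)
  &= \int \pi(\theta \mid x) \int p(y \mid \theta)\,
     \log \frac{p(y \mid \theta)}{p(y \mid \hat\theta)} \, dy \, d\theta \\
  &= -\int \pi(\theta \mid x) \int p(y \mid \theta)\,
     \log p(y \mid \hat\theta) \, dy \, d\theta
   \;+\; \int \pi(\theta \mid x) \int p(y \mid \theta)\,
     \log p(y \mid \theta) \, dy \, d\theta .
\end{align*}
% The second term does not involve \hat\theta. Dropping it and swapping
% the order of integration in the first term gives
\begin{equation*}
-\int \left[ \int \pi(\theta \mid x)\, p(y \mid \theta) \, d\theta \right]
  \log p(y \mid \hat\theta) \, dy ,
\end{equation*}
% where the bracketed factor is the predictive probability prob(y | x).
```

Under this reading, minimising the final expression is the same as maximising the expected log-likelihood of future data y under the predictive distribution.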
The MML estimator has been shown to return a relatively small Kullback-Leibler distance from the true distribution for various models, including circular distributions [WD93] and factor analysis. The MML estimator minimises a two-part message length of transmitting θ and the current data, x, in light of θ. The minimum EKL estimator minimises the expected length of the transmission of future data. We now present an argument as to why these two methods are similar. First, an approximation to SMML called Fairly Strict MML (FSMML) maps regions from the parameter space to point estimates. These regions, or coding blocks, can then be used to generate synthetic data by convolving h(θ) with p(x|θ) over the coding block. Taking the Maximum Likelihood estimate for this synthetic data gives the FSMML estimate, θ, for the coding block. Minimum EKL takes a prior, h(θ), and data, x, to produce a posterior, π(θ|x). Whether we analytically convolve the posterior with p(x|θ) or whether we numerically sample a θ from the posterior and then a datum, y, from p(x|θ), we obtain a population of expected future data. The minimum EKL estimator is the maximum likelihood estimator on the expected future data. These two methods are very similar, and only differ insofar as the minimum EKL uses the posterior whereas FSMML uses the prior over the coding block. Given that we do not know exactly where our coding block regions might be in FSMML, we can average the location of the coding block. This leads to something very akin to the posterior. It has been shown [Wal96] that the SMML estimator behaves locally very similarly to the posterior, the similarity increasing with the dimension of the parameter space. We note similarities between MML (or Strict MML) estimators and min EKL estimators for a variety of problems where both estimators are defined. We also note the case of the uniform distribution, for which the Strict MML estimator is defined but the min EKL estimator is not defined.
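The numerical route to the minimum EKL estimate described above (sample θ from the posterior, sample a future datum y given θ, then take the maximum likelihood estimate of the simulated future sample) can be sketched for one simple case. Everything specific here is an assumption made for illustration and does not come from the source: the Bernoulli likelihood, the Beta(a, b) prior, the counts used, and the function name `min_ekl_bernoulli`. For this particular model the expected log-likelihood under the predictive distribution is maximised at the posterior mean, which gives an independent check on the simulation.

```python
import random


def min_ekl_bernoulli(successes, failures, a=1.0, b=1.0, draws=100_000, seed=0):
    """Monte Carlo sketch of a minimum EKL estimate for a Bernoulli parameter.

    Hypothetical illustration: Beta(a, b) prior, so the posterior is
    Beta(a + successes, b + failures).
    """
    rng = random.Random(seed)
    ones = 0
    for _ in range(draws):
        # Step 1: draw theta from the posterior pi(theta | x).
        theta = rng.betavariate(a + successes, b + failures)
        # Step 2: draw one future datum y from p(y | theta).
        if rng.random() < theta:
            ones += 1
    # Step 3: the ML estimate of a Bernoulli parameter on the simulated
    # future sample is simply the observed fraction of ones.
    return ones / draws


# Assumed example data: 7 successes and 3 failures under a uniform Beta(1, 1) prior.
estimate = min_ekl_bernoulli(7, 3)
posterior_mean = (1 + 7) / (1 + 1 + 7 + 3)
print(estimate, posterior_mean)
```

The simulated estimate should approach the posterior mean as `draws` grows; for models without a closed form, the last step would be replaced by a numerical maximum likelihood fit to the simulated sample.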
Finally, in contrast with the successes of the Bayesian SMML and min EKL estimation techniques described above, we conjecture (Dowe, 1997) with slight evidence that no classical estimation technique can always be invariant and statistically consistent while providing internally consistent parameter estimates.
Original language  English 

Title of host publication  Research and Development in Knowledge Discovery and Data Mining - 2nd Pacific-Asia Conference, PAKDD 1998, Proceedings 
Editors  Xindong Wu, Ramamohanarao Kotagiri, Kevin B. Korb 
Publisher  Springer 
Pages  87-95 
Number of pages  9 
ISBN (Print)  3540643834, 9783540643838 
Publication status  Published - 1998 
Event  Pacific-Asia Conference on Knowledge Discovery and Data Mining 1998 - Melbourne, Australia. Duration: 15 Apr 1998 → 17 Apr 1998. Conference number: 2nd. https://link.springer.com/book/10.1007/3540643834 (Proceedings) 
Publication series
Name  Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 

Volume  1394 
ISSN (Print)  0302-9743 
ISSN (Electronic)  1611-3349 
Conference
Conference  Pacific-Asia Conference on Knowledge Discovery and Data Mining 1998 

Abbreviated title  PAKDD 1998 
Country/Territory  Australia 
City  Melbourne 
Period  15/04/98 → 17/04/98 

Keywords
 Algorithmic complexity
 Bayesian and statistical learning methods
 Induction in KDD
 Machine learning
 Minimum message length
 Noise handling