BAYESIAN INFERENCE AND MAXIMUM ENTROPY METHODS IN SCIENCE AND ENGINEERING: 25th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering
803 (2005); http://dx.doi.org/10.1063/1.2149776
This tutorial gives a basic overview of Bayesian methodology, from its axiomatic foundation through the conventional development of data analysis and model selection to its rôle in quantum mechanics, and ending with some comments on inference in general human affairs. The central theme is that probability calculus is the unique language within which we can develop models of our surroundings that have predictive capability. These models are patterns of belief; there is no need to claim external reality.
1. Logic and probability
2. Probability and inference
3. Probability and model selection
4. Prior probabilities
5. Probability and frequency
6. Probability and quantum mechanics
7. Probability and fundamentalism
8. Probability and deception
9. Prediction and truth
803 (2005); http://dx.doi.org/10.1063/1.2149777
We discuss the formulation of discrete maximum entropy problems given upper and lower bounds on moments and probabilities. We show that with bounds on discrete probabilities, and bounds on cumulative probabilities, the solution is invariant to any additive concave objective function. This observation simplifies the analysis of the problem and unifies the solution of several generalized entropy expressions. We use this invariance result to provide an exact graphical solution to the maximum entropy distribution between upper and lower cumulative probability bounds. We also discuss the maximum entropy joint distribution with bounds on marginal probabilities and provide a graphical solution to the problem using properties of the entropy expression.
803 (2005); http://dx.doi.org/10.1063/1.2149778
To reduce both the storage space required for the coded images and the on-line computational time of the probabilistic search algorithm of the Knowledge-driven Information Mining system, this article presents a lossless data compression scheme for the coded images and a new Kullback-Leibler similarity search.
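The Kullback-Leibler similarity search can be illustrated with a small sketch; the histogram inputs and the ranking convention here are illustrative assumptions of mine, not details from the paper.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """D(p||q) = sum_i p_i log(p_i / q_i) for discrete distributions;
    eps guards against zero bins."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def kl_search(query_hist, database):
    """Rank database histograms by KL 'similarity' to the query
    (smallest divergence first)."""
    return sorted(range(len(database)),
                  key=lambda i: kl_divergence(query_hist, database[i]))

# an exact match ranks ahead of a mismatched histogram
ranked = kl_search([1, 2, 3], [[3, 2, 1], [1, 2, 3]])
```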
803 (2005); http://dx.doi.org/10.1063/1.2149779
Ed Jaynes’s view of probability, brilliantly clear, iconoclastic to many, and still not sufficiently appreciated, can transform thinking and lives. My professional life shows his seminal influence. I discuss his admiration of Laplace, and both the difficulty and joy of finding original references. Jaynes’s defense of, and original work on, Laplace’s law of succession demonstrates how he followed in the footsteps of his idol. Finally, Jaynes’s theoretical and experimental investigations into Bertrand’s Paradox illustrate the fundamental nature of his thought. Our loss of his presence is only compensated by the appreciation of his contribution.
803 (2005); http://dx.doi.org/10.1063/1.2149780
We introduce and discuss the use of the exponential spline family for Bayesian nonparametric function estimation. Exponential splines span the range of shapes between the limiting cases of traditional cubic spline and piecewise linear interpolation. They are therefore particularly suited to problems where both smooth and rapid function changes occur.
803 (2005); http://dx.doi.org/10.1063/1.2149781
We present a Bayesian approach to testing for an underlying uniform distribution, given a sample of observations, based on a comparison with the alternatives defined by the maximum entropy principle when faced with limited moments information. The procedure generalises readily to the case of a periodic function.
803 (2005); http://dx.doi.org/10.1063/1.2149782
The geometric theory of ignorance suggests new criteria for model selection. One example, here called CIC, chooses the model M minimizing a criterion that combines the likelihood of a sample (x1, …, xN) of N i.i.d. observations at the maximum likelihood estimate p̂ ∈ M with terms involving the dimension d = dim(M) of the model, its information volume V = Vol(M), and the Ricci scalar R = Ricci(M) evaluated at the MLE. I study the performance of CIC for the problem of segmentation of bit streams, defined as follows: find n from N i.i.d. samples of a complete DAG of n bits. The CIC criterion outperforms AIC and BIC by orders of magnitude when n > 3, and is marginally better for the cases n = 2, 3.
803 (2005); http://dx.doi.org/10.1063/1.2149783
This work is primarily concerned with the problem of selecting a single decision (or classification) tree given the observed data. Although a single decision tree runs a high risk of overfitting, the induced tree is easily interpreted. Researchers have invented various methods, such as tree pruning and tree averaging, to prevent the induced tree from overfitting (and underfitting) the data. In this paper, instead of using those conventional approaches, we apply the Bayesian evidence framework of Gull, Skilling and MacKay to the process of selecting a decision tree. We derive a formal function to measure ‘the fitness’ of each decision tree given a set of observed data. Our method is, in fact, analogous to a well-known Bayesian model selection method for interpolating noisy continuous-valued data. As in regression problems, given reasonable assumptions, this derived score function automatically quantifies the principle of Ockham’s razor, and hence deals reasonably with the underfitting-overfitting tradeoff.
803 (2005); http://dx.doi.org/10.1063/1.2149784
A simple algorithm finds the partition of a data interval that optimizes the fitness of a model representing the underlying signal as constant over the elements of the partition. Using dynamic programming, the exponentially large space of partitions of N data points is implicitly but exhaustively searched in time O(N²). This paper also describes an extension to optimal partitions of higher-dimensional data spaces, with applications to multivariate signal processing, image processing, cluster analysis, density estimation in 3-dimensional redshift surveys, etc. The algorithm finds the exact global optimum, automatically determines the model order (the number of segments), and has a convenient real-time mode.
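The O(N²) dynamic program described above can be sketched as follows; the sum-of-squares block cost and fixed per-block penalty are illustrative assumptions of mine, not the paper's fitness function.

```python
import numpy as np

def best_partition(data, block_cost):
    """O(N^2) dynamic program: best[j] is the optimal total cost of
    partitioning data[:j] into contiguous blocks; last[j] records the
    start of the final block, so the 2^(N-1) possible partitions are
    searched implicitly but exhaustively."""
    n = len(data)
    best = np.zeros(n + 1)
    last = np.zeros(n + 1, dtype=int)
    for j in range(1, n + 1):
        costs = [best[i] + block_cost(i, j) for i in range(j)]
        i_star = int(np.argmin(costs))
        best[j], last[j] = costs[i_star], i_star
    change_points, j = [], n          # back-track the optimal partition
    while j > 0:
        if last[j] > 0:
            change_points.append(int(last[j]))
        j = last[j]
    return sorted(change_points)

def sse_cost(data, penalty=1.0):
    """Cost of block data[i:j] = within-block sum of squared deviations
    from the block mean, plus a fixed penalty that controls the number
    of segments; prefix sums make each evaluation O(1)."""
    d = np.asarray(data, dtype=float)
    s1 = np.concatenate([[0.0], np.cumsum(d)])
    s2 = np.concatenate([[0.0], np.cumsum(d * d)])
    def cost(i, j):
        n, s = j - i, s1[j] - s1[i]
        return (s2[j] - s2[i]) - s * s / n + penalty
    return cost

data = [0.0] * 10 + [5.0] * 10
cps = best_partition(data, sse_cost(data))   # single change point at index 10
```

The model order (number of segments) falls out of the optimization rather than being fixed in advance, as in the abstract's description.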
803 (2005); http://dx.doi.org/10.1063/1.2149785
The Bayesian setting for inverse problems provides a rigorous foundation for inference from noisy data and uncertain forward models, a natural mechanism for incorporating prior information, and a quantitative assessment of uncertainty in the inferred results. Obtaining useful information from the posterior density—e.g., computing expectations via Markov Chain Monte Carlo (MCMC)—may be a computationally expensive undertaking, however. For complex and high‐dimensional forward models, such as those that arise in inverting systems of PDEs, the cost of likelihood evaluations may render MCMC simulation prohibitive.
We explore the use of polynomial chaos (PC) expansions for spectral representation of stochastic model parameters in the Bayesian context. The PC construction employs orthogonal polynomials in i.i.d. random variables as a basis for the space of square‐integrable random variables. We use a Galerkin projection of the forward operator onto this basis to obtain a PC expansion for the outputs of the forward problem. Evaluation of integrals over the parameter space is recast as Monte Carlo sampling of the random variables underlying the PC expansion.
We evaluate the utility of this technique on a transient diffusion problem arising in contaminant source inversion. The accuracy of posterior estimates is examined with respect to the order of the PC representation and the decomposition of the support of the prior. We contrast the computational cost of the new scheme with that of direct sampling.
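A minimal sketch of the PC idea, with two caveats: this is a non-intrusive projection of a toy scalar model (the paper uses an intrusive Galerkin projection of a PDE forward operator), and the exponential "forward model" is my stand-in, not the contaminant-transport model.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermeval, hermegauss

def pc_surrogate(f, order, n_quad=40):
    """Project f(theta), theta ~ N(0,1), onto probabilists' Hermite
    polynomials He_k: c_k = E[f(theta) He_k(theta)] / k!, evaluated by
    Gauss-Hermite quadrature. The coefficients feed hermeval."""
    x, w = hermegauss(n_quad)          # nodes/weights for weight e^{-x^2/2}
    w = w / np.sqrt(2.0 * np.pi)       # renormalize to the N(0,1) density
    fx = f(x)
    coeffs = np.empty(order + 1)
    for k in range(order + 1):
        e_k = np.zeros(k + 1)
        e_k[k] = 1.0
        coeffs[k] = np.sum(w * fx * hermeval(x, e_k)) / math.factorial(k)
    return coeffs

# toy forward model f(theta) = exp(theta): analytically c_k = sqrt(e)/k!
c = pc_surrogate(np.exp, order=12)
approx = hermeval(0.5, c)              # cheap surrogate evaluation

# parameter-space integrals become Monte Carlo sums over the PC variable
rng = np.random.default_rng(0)
mc_mean = hermeval(rng.normal(size=100_000), c).mean()   # ~ E[f] = sqrt(e)
```

The point mirrored from the abstract: once the (expensive) forward model is replaced by the polynomial surrogate, likelihood evaluations inside MCMC reduce to polynomial evaluations in the underlying Gaussian variables.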
803 (2005); http://dx.doi.org/10.1063/1.2149786
A great many systems can be modeled in the nonlinear dynamical systems framework, as ẋ = f(x) + ξ(t), where f(·) is the potential function for the system and ξ is the excitation noise. Modeling the potential using a set of basis functions, we derive the posterior for the basis coefficients. A more challenging problem is to determine the set of basis functions required to model a particular system. We use the Bayesian Information Criterion (BIC) to rank models, together with beam search to explore the space of models. We show that we can accurately determine the structure of simple nonlinear dynamical system models, and the structure of the coupling between nonlinear dynamical systems where the individual systems are known. This last case has important ecological applications.
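A toy sketch of the BIC-ranking step, assuming a polynomial basis and a cubic "true" system (both my choices, not the paper's); beam search is omitted because a one-dimensional nested basis can simply be enumerated.

```python
import numpy as np

rng = np.random.default_rng(0)
a_true, b_true = 1.0, -1.0                      # "true" f(x) = x - x^3
x = rng.uniform(-2.0, 2.0, 400)
xdot = a_true * x + b_true * x**3 + rng.normal(0.0, 0.1, x.size)

def bic(max_degree):
    """Least-squares fit of f(x) = sum_k c_k x^k for k = 0..max_degree,
    scored by BIC = N log(residual variance) + d log N under Gaussian
    noise; lower is better."""
    X = np.vander(x, max_degree + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(X, xdot, rcond=None)
    resid = xdot - X @ coef
    return x.size * np.log(np.mean(resid**2)) + (max_degree + 1) * np.log(x.size)

scores = {deg: bic(deg) for deg in range(1, 7)}
best = min(scores, key=scores.get)   # the cubic family should dominate
```

Degrees below three cannot represent the x³ term and pay a large likelihood cost, while higher degrees pay the d log N complexity penalty for negligible gain, which is the trade-off BIC encodes.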
803 (2005); http://dx.doi.org/10.1063/1.2149787
The Fully Bayesian Significance Test (FBST) is a coherent Bayesian significance test for sharp hypotheses. This paper proposes the FBST as a model selection tool for general mixture models, and compares its performance with Mclust, a model-based clustering software package. The FBST’s robust performance strongly encourages further developments and investigations.
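The FBST e-value can be sketched in a toy conjugate setting; the Bernoulli model with a uniform prior is my illustrative choice, far simpler than the mixture models the paper treats.

```python
import numpy as np

def fbst_ev(successes, trials, theta0, n_samples=200_000, seed=1):
    """e-value of the sharp hypothesis theta = theta0 for Bernoulli data
    with a uniform prior, so the posterior is Beta(s+1, n-s+1).
    ev = 1 - Pr[ p(theta|x) > p(theta0|x) ]: one minus the posterior
    mass of the 'tangential set' of points more probable than theta0,
    estimated here by Monte Carlo."""
    a, b = successes + 1, trials - successes + 1
    rng = np.random.default_rng(seed)
    draws = rng.beta(a, b, n_samples)
    # unnormalized log posterior density (the constant cancels)
    log_pdf = lambda t: (a - 1) * np.log(t) + (b - 1) * np.log(1 - t)
    return 1.0 - np.mean(log_pdf(draws) > log_pdf(theta0))

ev_at_mode = fbst_ev(5, 10, 0.5)     # theta0 at the posterior mode: ev ~ 1
ev_far = fbst_ev(95, 100, 0.5)       # theta0 deep in the tail: ev ~ 0
```

A small e-value signals evidence against the sharp hypothesis, which is how the FBST is used for selecting among candidate models.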
803 (2005); http://dx.doi.org/10.1063/1.2149788
Given sequential data from a target system whose description is not available, this study attempts to perform online change detection by (i) using parameter/hyperparameter dynamics driven by the available data; (ii) examining the time dependence of the marginal likelihood; and (iii) implementing the scheme via a particle filter.
803 (2005); http://dx.doi.org/10.1063/1.2149789
Precision radial velocity data for HD 208487 have been re-analyzed using a Bayesian nonlinear model-fitting program that employs a parallel tempering Markov chain Monte Carlo algorithm with a novel statistical control system. We confirm the previously reported 130 day orbit (Tinney et al. 2005). In addition, we conclude there is strong evidence for a second planet with a period of … days, an eccentricity of …, and an … .
803 (2005); http://dx.doi.org/10.1063/1.2149790
I describe work in progress that uses Bayes’ theorem, model selection, and marginalization in the analysis of photon count data frames from a stellar optical interferometer (the Navy Prototype Optical Interferometer). These data frames in general have between one and six stellar fringes (baselines) present. I show how Bayes factors provide a direct way of determining the number of fringes present in each data frame. I briefly describe the traditional Fourier-based technique for computing optical interferometry data products. A Bayesian approach, in addition to providing model selection directly from the data frames, also provides a way of combining the computed data products from each data frame that intrinsically handles the varying SNRs among the data frames. I use simulated data to compare my Bayesian approach with the traditional Fourier-based technique for the analysis of such data.
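The Bayes-factor machinery can be sketched for the simplest case, signal-present versus noise-only; the sinusoidal "fringe" template, Gaussian amplitude prior, and all parameter values here are my illustrative assumptions, not the interferometer model of the paper.

```python
import numpy as np

def log_evidence(d, s, sigma, tau):
    """log marginal likelihood of the model d = A*s + noise, with
    amplitude prior A ~ N(0, tau^2) and i.i.d. noise ~ N(0, sigma^2).
    Integrating A analytically leaves a zero-mean Gaussian with
    covariance C = sigma^2 I + tau^2 s s^T."""
    C = sigma**2 * np.eye(len(d)) + tau**2 * np.outer(s, s)
    _, logdet = np.linalg.slogdet(2.0 * np.pi * C)
    return -0.5 * (d @ np.linalg.solve(C, d) + logdet)

rng = np.random.default_rng(3)
t = np.arange(100)
s = np.sin(2 * np.pi * t / 25)          # hypothetical fringe template
d = 3.0 * s + rng.normal(0.0, 1.0, t.size)

# Bayes factor: fringe-present vs noise-only (template set to zero)
log_bf = (log_evidence(d, s, sigma=1.0, tau=5.0)
          - log_evidence(d, np.zeros_like(s), sigma=1.0, tau=5.0))
```

With multiple candidate fringe counts, the same computation done per model gives the posterior odds used to decide how many fringes a data frame contains.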
803 (2005); http://dx.doi.org/10.1063/1.2149791
White Dwarfs are stars near the end of their lives which may pulsate with periods of a few minutes, and which can be observed to brighten and dim as they pulsate.
Observations of white dwarf light curves are characterised by broken time series (where days intervene between nighttime observations). Moreover, the pulsation frequencies show appreciable amplitude harmonics (i.e., where frequency f appears, so too do frequencies 2f, 3f, etc.) as well as frequency coupling (if frequencies f1 and f2 are observed, so too is f1 + f2).
In this paper the Bayesian spectral analysis approach due to Bretthorst is applied to the analysis of the light curves of white dwarfs. The method yields estimates of the dominant pulsation frequencies which are highly accurate and superior to estimates obtained from classical Fourier techniques. We discuss some of the particular problems which arise in the analysis of white dwarf light curves.
803 (2005); http://dx.doi.org/10.1063/1.2149792
A Bayesian mixture modeling method was applied to Chandra Deep Field South (CDF‐S) to find faint extended sources at high redshift.
The probabilistic two-component mixture model allows the separation of the diffuse background from celestial sources within a one-step algorithm, without data censoring. The background is modeled with a thin-plate spline. The source and background estimation method was extended to allow the flux of celestial objects to be inverse-Gamma distributed. In addition, all detected sources are automatically parameterized to produce a list of source positions, count rates and morphological parameters.
The present analysis is applied to the CDF‐S. With its 940 ksec of exposure time, CDF‐S is one of the deepest X‐ray observations performed. We analyze the 0.5–2 keV energy band to search for clusters or groups of galaxies. Point‐like and extended sources are separated incorporating the knowledge of the observatory’s point spread function (PSF).
Combining the Bayesian mixture modeling technique with the angular resolution (≃ 1 arcsec) of the CDF‐S data, we can provide information about rare objects, such as clusters of galaxies, in the distant Universe.
803 (2005); http://dx.doi.org/10.1063/1.2149793
What’s there, and how much? Given some data, which of an object’s possible components are present, and in what quantity?
By their nature, digital ON/OFF problems can lack continuity, and the likelihood function may well have awkward properties that inhibit many of today’s algorithms. A solution to some of these problems is to use nested sampling, which explores the state-space more directly, and can be rather easier to implement as well as more powerful.
Exploring the plausible patterns of switching involves digital choices, which can be interpreted in terms of a competition between transition engines that supply plausible exploratory steps. When quantitation is included, it is usually helpful to marginalise over the quantities being changed.
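A minimal nested-sampling sketch on a continuous toy problem (a one-dimensional Gaussian likelihood under a uniform prior, my choice rather than an ON/OFF model); the rejection sampler used to replace the worst point is the simplest possible constrained-prior explorer, standing in for the transition engines discussed above.

```python
import numpy as np

def nested_sampling(log_likelihood, prior_draw, n_live=100, n_iter=800, seed=0):
    """Minimal nested sampling: keep n_live points drawn from the prior,
    repeatedly replace the lowest-likelihood point with a fresh prior
    draw constrained to exceed it, and accumulate the evidence
    Z ~ sum_i L_i w_i over geometrically shrinking prior-mass shells
    w_i ~ e^{-i/n_live} (1 - e^{-1/n_live})."""
    rng = np.random.default_rng(seed)
    live = [prior_draw(rng) for _ in range(n_live)]
    logL = np.array([log_likelihood(p) for p in live])
    log_shell = np.log1p(-np.exp(-1.0 / n_live))
    logZ = -np.inf
    for i in range(n_iter):
        worst = int(np.argmin(logL))
        logZ = np.logaddexp(logZ, log_shell - i / n_live + logL[worst])
        # replace the worst point by rejection-sampling the constrained
        # prior -- fine for toys; real codes step more cleverly
        while True:
            cand = prior_draw(rng)
            if log_likelihood(cand) > logL[worst]:
                live[worst], logL[worst] = cand, log_likelihood(cand)
                break
    # remaining prior mass, spread over the surviving live points
    return np.logaddexp(logZ, -n_iter / n_live - np.log(n_live)
                        + np.logaddexp.reduce(logL))

# uniform prior on [0,1], Gaussian likelihood of width 0.05 at 0.5;
# the analytic evidence is Z = 0.05 * sqrt(2*pi) ~ 0.1253
log_Z = nested_sampling(lambda t: -(t - 0.5) ** 2 / (2 * 0.05 ** 2),
                        lambda rng: rng.uniform(0.0, 1.0))
```

The evidence estimate carries a statistical uncertainty of order sqrt(H/n_live) in the log, so more live points buy accuracy at linear cost.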
803 (2005); http://dx.doi.org/10.1063/1.2149794
We introduce a method for making approximate Bayesian inference based on quantizing the hypothesis space and repartitioning it as observations become available. The method relies on approximating an optimal inference by using a probability distribution for quantized intervals of the unknown quantity, and by adjusting the intervals so as to obtain higher resolution in regions of higher probability, and vice versa.
We repartition the hypothesis space adaptively with the aim of maximizing the mutual information between the approximate distribution and the exact distribution. It is shown that this approach is equivalent to maximizing the entropy of the approximate distribution, and we provide low‐complexity algorithms for approximating multi‐dimensional posterior distributions with tunable complexity/performance.
The resulting quantized distribution for a one-dimensional case can be visualized as a histogram in which each bar has equal area but, in general, unequal width. The method can be used to provide adaptive quantization of arbitrary data sequences, or to approximate the posterior expectation of, for instance, some loss function by summing over a pre-specified number of terms.
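The equal-area picture for the one-dimensional case can be sketched by placing bin edges at empirical quantiles of posterior samples; this static construction is my illustration of the end state, whereas the paper adapts the partition sequentially as observations arrive.

```python
import numpy as np

def equal_area_bins(samples, n_bins):
    """Bin edges at the empirical quantiles, so every bin carries
    (nearly) equal probability mass: narrow bins where the distribution
    is dense -- high resolution -- and wide bins in the tails."""
    edges = np.quantile(samples, np.linspace(0.0, 1.0, n_bins + 1))
    counts, _ = np.histogram(samples, bins=edges)
    return edges, counts

rng = np.random.default_rng(2)
samples = rng.normal(size=100_000)
edges, counts = equal_area_bins(samples, 10)
# all ten bars have ~10% of the mass; the central bars are the narrowest
```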