Bayesian Inference and Maximum Entropy Methods in Science and Engineering
872 (2006); http://dx.doi.org/10.1063/1.2423255
In this tutorial talk, we will first review the main established tools of probability and information theories. Then, we will consider the following main questions which arise in any inference method: i) assigning a (prior) probability law to a quantity to represent our knowledge about it, ii) updating the probability laws when a new piece of information arrives, and iii) extracting quantitative estimates from a (posterior) probability law.
For the first, the main tool is the Maximum Entropy Principle (MEP). For the second, we have two tools: i) minimising the relative entropy (the Kullback-Leibler discrepancy measure), and ii) the Bayes rule. We will make precise the appropriate situations in which to use them, as well as their possible links. For the third problem, we will see that, even if it can be handled through decision theory, the choice of a utility function may depend on the two previous tools used to arrive at that posterior probability. Finally, these points will be illustrated through examples of inference methods for some inverse problems such as image restoration or blind source separation.
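The two updating tools named above can be sketched numerically. The following is a minimal illustration (the distributions and likelihood values are invented for the example, not taken from the talk) of a Bayes update over discrete hypotheses, together with the Kullback-Leibler discrepancy between posterior and prior:

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def bayes_update(prior, likelihood):
    """Posterior proportional to prior times likelihood, normalised."""
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

prior = [0.5, 0.3, 0.2]
likelihood = [0.1, 0.7, 0.2]            # p(data | hypothesis), invented values
posterior = bayes_update(prior, likelihood)
print(posterior)                         # [0.1666..., 0.7, 0.1333...]
print(kl_divergence(posterior, prior))   # information gained by the update
```

The KL term is non-negative and vanishes exactly when the data leaves the prior unchanged, which is one way the two tools connect.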
872 (2006); http://dx.doi.org/10.1063/1.2423256
The literature is full of Bayesian interpretations of frequentist p-values and confidence levels. All the attempts to rectify these interpretations have been a losing battle. In fact, such interpretations suggest that most users are likely to be Bayesian "without knowing it" and really want to make a different kind of inference.
872 (2006); http://dx.doi.org/10.1063/1.2423257
A common objective of molecular simulations in chemistry and biology is to calculate the free energy difference between systems of interest. We propose to improve estimates of these free energies by modeling the underlying probability distribution as the square of a "wave function", which is a linear combination of Gram-Charlier polynomials. The number of terms, N, in this expansion is determined by calculating the posterior probability, P(N | X), where X stands for all energy differences sampled in a simulation. The method offers significantly improved free energy estimates when the probability distribution is broad and non-Gaussian, which makes it applicable to challenging problems, such as protein-drug interactions.
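As a rough sketch of the "square of a wave function" idea, one can expand in probabilists' Hermite polynomials, a common choice for Gram-Charlier-type series; the paper's exact basis, coefficients and the Bayesian selection of N via P(N | X) are not reproduced here:

```python
import math

def hermite_prob(n, x):
    """Probabilists' Hermite polynomial He_n(x) via the three-term recurrence."""
    h0, h1 = 1.0, x
    if n == 0:
        return h0
    for k in range(1, n):
        h0, h1 = h1, x * h1 - k * h0
    return h1

def squared_wavefunction_density(coeffs, xs):
    """p(x) proportional to (sum_n c_n He_n(x))^2 * phi(x), with phi the standard
    normal weight; normalised numerically on the grid xs."""
    def phi(x):
        return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)
    vals = [sum(c * hermite_prob(n, x) for n, c in enumerate(coeffs)) ** 2 * phi(x)
            for x in xs]
    dx = xs[1] - xs[0]
    z = sum(vals) * dx
    return [v / z for v in vals]

xs = [i * 0.01 for i in range(-800, 801)]
p = squared_wavefunction_density([1.0, 0.0, 0.3], xs)   # a mildly non-Gaussian shape
```

Squaring the expansion guarantees a non-negative density, which a plain (unsquared) Gram-Charlier series does not.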
872 (2006); http://dx.doi.org/10.1063/1.2423258
We show that Skilling's method of induction leads to a unique general theory of inductive inference, the method of Maximum relative Entropy (ME). The main tool for updating probabilities is the logarithmic relative entropy; other entropies, such as those of Renyi or Tsallis, are ruled out. We also show that Bayes updating is a special case of ME updating, and thus that the two are completely compatible.
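A small numerical illustration of ME updating under an expectation constraint (the classic Brandeis-die setup; the bisection solver, bounds and tolerances are implementation choices for this sketch, not from the paper):

```python
import math

def me_update(q, f, target, lo=-50.0, hi=50.0, iters=200):
    """Minimise the relative entropy D(p||q) subject to E_p[f] = target.
    The solution has exponential-family form p_i proportional to
    q_i * exp(lam * f_i); the multiplier lam is found by bisection,
    using that the constrained expectation is increasing in lam."""
    def expectation(lam):
        w = [qi * math.exp(lam * fi) for qi, fi in zip(q, f)]
        z = sum(w)
        return sum(wi * fi for wi, fi in zip(w, f)) / z
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expectation(mid) < target:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    w = [qi * math.exp(lam * fi) for qi, fi in zip(q, f)]
    z = sum(w)
    return [wi / z for wi in w]

# uniform prior over die faces, mean constrained to 4.5
q = [1 / 6] * 6
f = [1, 2, 3, 4, 5, 6]
p = me_update(q, f, 4.5)
print(p)   # tilted toward the high faces
```

Bayes updating corresponds to the special case where the constraint pins the data variable to its observed value; the exponential tilting then reduces to multiplication by the likelihood.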
872 (2006); http://dx.doi.org/10.1063/1.2423259
The analysis of discrimination, feature selection and model selection leads to a discussion of the relationships between the Support Vector Machine (SVM), Bayesian and Maximum Entropy (MaxEnt) formalisms. MaxEnt discrimination can be seen as a particular case of Bayesian inference, which in turn can be seen as a regularization approach applicable to SVM. Probability measures can be attached to each feature vector, so that feature selection can be described by a discriminative model over the feature space. Further, the probabilistic SVM allows one to define a posterior probability model for a classifier. In addition, similarities with kernels based on the Kullback-Leibler divergence can be deduced, thus returning to MaxEnt similarity.
872 (2006); http://dx.doi.org/10.1063/1.2423260
The evolution of a quantum system appropriate to describe nano-magnets can be mapped onto a Markov process, continuous in β. The mapping implies a probability assignment that can be used to study the probability density function (PDF) of the magnetization. This procedure is not the common way to assign probabilities; usually an assignment compatible with the von Neumann entropy is made. Making both assignments for the same system and comparing the resulting PDFs, we see that they differ numerically. In other words, the assignments lead to different PDFs for the same observable within the same model for the dynamics of the system. Using the maximum entropy principle, we show that the assignment resulting from the mapping onto the Markov process makes fewer assumptions than the other one.
Using a stochastic queue model that can be mapped onto a quantum statistical model, we check both assignments for compatibility with the Gibbs procedure for systems in thermal equilibrium and argue that the assignment resulting from the mapping onto the Markov process satisfies the compatibility requirements.
872 (2006); http://dx.doi.org/10.1063/1.2423261
A continuum of coupled oscillators is considered, described by a continuous tensor product of Hilbert spaces. The mode position U_x and the mode momentum U_p are operators which act collectively on all oscillators. They obey equations of motion which are very similar to those of a harmonic oscillator. Q-functionals are used to introduce entropic quantities that describe correlations among the oscillators. If the system is in an entangled state, the formalism can be used to quantify concepts like the location of entanglement and the speed with which the entanglement propagates.
872 (2006); http://dx.doi.org/10.1063/1.2423262
We present a general approach to describing the influence of defects on the critical behavior of a system. The structure of a crystal with defects is represented as a graph. Possible ways to determine the distribution function of defects are discussed. A phenomenological theory of phase transitions in crystals with an arbitrary distribution of defects is presented. It is shown that the temperature dependences of the thermodynamic functions depend essentially on the specific form of the defect distribution.
872 (2006); http://dx.doi.org/10.1063/1.2423263
We develop a new technique for blind separation of potentially non-independent components in astrophysical images. Given a set of linearly mixed images, corresponding to different measurement channels, we estimate the original electromagnetic radiation sources in a blind fashion. Specifically, we investigate the separation of the cosmic microwave background (CMB), thermal dust and galactic synchrotron emissions without imposing any assumption on the mixing matrix. In our approach, we use the Gaussian and non-Gaussian features of astrophysical sources and assume that CMB-dust and CMB-synchrotron are uncorrelated pairs while dust and synchrotron are correlated, in agreement with theory. These assumptions allow us to develop an algorithm which associates the Minimum Entropy solutions with the non-Gaussian sources (thermal dust and galactic synchrotron emissions) and the Maximum Entropy solution with the only Gaussian source, the CMB. This new method is more appropriate than ICA algorithms because independence between sources is not imposed, which is a more realistic situation. We investigate and compare two specific measures associated with entropy: a Gaussianity Measure (GM) and the Shannon Entropy (SE). Finally, we present a complete set of separation examples using these two measures, validating our approach and showing that it performs better than the FastICA algorithm. The experimental results presented here were obtained on an image database that simulates the measurements expected from the instruments that will operate onboard ESA's Planck Surveyor Satellite to measure the CMB anisotropies over the whole celestial sphere.
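The paper's GM and SE are specific constructions; as generic stand-ins (not the authors' definitions), non-Gaussianity can be probed with excess kurtosis and a histogram-based Shannon entropy:

```python
import math
import random

def excess_kurtosis(xs):
    """Fourth standardised moment minus 3; zero for a Gaussian sample."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / n
    return sum((x - m) ** 4 for x in xs) / (n * var ** 2) - 3.0

def histogram_entropy(xs, bins=50):
    """Shannon entropy (in nats) of a histogram of the sample."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for x in xs:
        counts[min(int((x - lo) / width), bins - 1)] += 1
    n = len(xs)
    return -sum(c / n * math.log(c / n) for c in counts if c)

# a heavy-tailed (Laplacian) sample: clearly non-Gaussian
random.seed(0)
laplace = [random.expovariate(1.0) * random.choice([-1, 1]) for _ in range(20000)]
print(excess_kurtosis(laplace))   # near 3 for a Laplace distribution
```

In the separation context, the most Gaussian-looking recovered component would be attributed to the CMB and the least Gaussian ones to the foregrounds.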
872 (2006); http://dx.doi.org/10.1063/1.2423264
In this paper we introduce a new class of manifolds, generalized flag manifolds, for the complex and subspace ICA problems. A generalized flag manifold is a manifold consisting of subspaces which are orthogonal to each other. The class of generalized flag manifolds includes the class of Grassmann manifolds. We extend the Riemannian optimization method to this new class of manifolds by deriving the formulas for the natural gradient and geodesics on these manifolds. We show how the complex and subspace ICA problems can be solved by optimization of cost functions on a generalized flag manifold. Computer simulations demonstrate that our algorithm gives good performance compared with the ordinary gradient descent method.
Electrode Selection for Noninvasive Fetal Electrocardiogram Extraction using Mutual Information Criteria
872 (2006); http://dx.doi.org/10.1063/1.2423265
Blind source separation (BSS) techniques have proved to be promising approaches for the noninvasive extraction of fetal cardiac signals from maternal abdominal recordings. From previous studies, it is now believed that a carefully selected array of electrodes well placed over the abdomen of a pregnant woman contains the 'information' required for BSS to extract the complete fetal components. Based on this idea, previous works have involved array recording systems and sensor selection strategies based on the Mutual Information (MI) criterion. In this paper, these works are extended by considering the three-dimensional aspects of the cardiac electrical activity. The proposed method has been tested on simulated and real maternal abdominal recordings. The results show that the new sensor selection strategy together with the MI criterion can be effectively used to select the channels containing the most 'information' concerning the fetal ECG components from an array of 72 recordings. The method is hence believed to be useful for the selection of the most informative channels in online applications, considering the different fetal positions and movements.
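A minimal version of the MI criterion for discrete (quantised) channels — real ECG signals would first need quantisation or a continuous-MI estimator, and the paper's 3-D sensor strategy is not reproduced here:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in pxy.items():
        mi += c / n * math.log2((c / n) / (px[x] / n * py[y] / n))
    return mi

x = [0, 0, 1, 1, 0, 1, 0, 1]
print(mutual_information(x, x))        # 1.0: a copy carries H(X) = 1 bit here
print(mutual_information(x, [0] * 8))  # 0.0: a constant channel carries nothing
```

Channel selection then amounts to ranking candidate electrodes by their MI with a reference (e.g. an already-extracted fetal component) and keeping the top scorers.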
872 (2006); http://dx.doi.org/10.1063/1.2423266
We recently proposed a quasi-efficient maximum likelihood approach for blindly separating Markovian time series. In the present paper, we extend this idea to two-dimensional sources (in particular images), where the spatial autocorrelation of each source is described using a second-order Markov random field. The experimental results using artificial and real images demonstrate the advantage of the method over maximum likelihood approaches which do not take the source autocorrelation into account, and over autocorrelation-based methods which ignore the source non-Gaussianity.
872 (2006); http://dx.doi.org/10.1063/1.2423267
The surface of Mars is currently being mapped with an unprecedented spatial resolution. This high resolution, together with the spectral range of the instruments, makes it possible to pinpoint chemical species on Mars more accurately than before. The subject of this paper is a method to extract information on chemicals from hyperspectral images. We propose to combine spatial Independent Component Analysis (ICA) and spectral Bayesian Positive Source Separation (BPSS). The basic idea is to use spatial ICA to obtain a rough classification of pixels, which allows the selection of a small but relevant number of pixels. BPSS is then applied to estimate the source spectra using this reduced set of pixels. Finally, the abundances of the components are assessed over all pixels of the images. Results of this approach are shown and evaluated by comparison with reference spectra.
872 (2006); http://dx.doi.org/10.1063/1.2423268
We introduce a new iterative algorithm for Sparse Component Analysis (SCA). The algorithm, which we call Iterative Detection‐Estimation (IDE), is essentially a method to find sufficiently sparse solutions of underdetermined linear systems of equations. In the SCA context, this solves the source separation part of the problem. Each iteration of IDE consists of two steps. In the detection step, starting with a previously known estimate of the sparse solution vector, we detect which components of the solution are (possibly) active, i.e., having a considerable value. Then, in the estimation step, we compute the new estimate by finding a solution of the system which is the closest to the subspace specified by the detection step. This is called projection into the activity subspace. We will compare the solution obtained by the proposed algorithm against the minimum 1‐norm solution obtained by Linear Programming (LP). It is shown by experiment that, with proper choice of parameters, the proposed algorithm is about two orders of magnitude faster than state‐of‐the‐art interior‐point LP solvers, while providing the same (or better) accuracy. The algorithm may then be considered as an alternative to the LP approach for large‐scale problems.
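The detection/estimation alternation can be sketched on a tiny underdetermined system. This is a toy: the detection rule here is a simple keep-the-k-largest threshold rather than the statistic used in the paper, and the 2x2 solver is hand-rolled to keep the sketch self-contained:

```python
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def solve2(M, b):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(b[0] * M[1][1] - b[1] * M[0][1]) / det,
            (M[0][0] * b[1] - M[1][0] * b[0]) / det]

def ide_sketch(A, y, k=2, iters=5):
    """Toy Iterative Detection-Estimation for a 2-row mixing matrix A.
    Detection: keep the k largest-magnitude entries of the current estimate.
    Estimation: re-solve the system restricted to the detected (active) columns."""
    m, n = len(A), len(A[0])
    # initialise with the minimum 2-norm solution s = A^T (A A^T)^{-1} y
    AAt = [[sum(A[i][j] * A[l][j] for j in range(n)) for l in range(m)]
           for i in range(m)]
    u = solve2(AAt, y)
    s = [sum(A[i][j] * u[i] for i in range(m)) for j in range(n)]
    for _ in range(iters):
        active = sorted(range(n), key=lambda j: -abs(s[j]))[:k]   # detection
        sub = [[A[i][j] for j in active] for i in range(m)]
        vals = solve2(sub, y)                                     # estimation
        s = [0.0] * n
        for j, v in zip(active, vals):
            s[j] = v
    return s

A = [[1.0, 0.0, 0.7, 0.3],
     [0.0, 1.0, 0.3, 0.7]]
s_true = [2.0, -1.0, 0.0, 0.0]   # sparse: two active components out of four
s_hat = ide_sketch(A, matvec(A, s_true))
print(s_hat)   # recovers [2.0, -1.0, 0.0, 0.0]
```

On this well-conditioned example the alternation locks onto the true support after the first detection step; the point of the paper is that a well-designed detection statistic makes this work at scale, far faster than interior-point LP.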
872 (2006); http://dx.doi.org/10.1063/1.2423269
Recovering a set of independent sources which are linearly mixed is the main task of blind source separation. Utilizing different methods, such as the infomax principle, mutual information and maximum likelihood, leads to simple iterative procedures such as natural gradient algorithms. These algorithms depend on a nonlinear function (known as the score or activation function) of the source distributions. Since there is no prior knowledge of the source distributions, the optimality of the algorithms rests on the choice of a suitable parametric density model. In this paper, we propose an adaptive optimal score function based on the fractional moments of the sources. In order to obtain a parametric model for the source distributions, we use a few sampled fractional moments to construct a maximum entropy estimate of the probability density function (PDF). By applying an optimization method, we obtain the optimal fractional moments that best fit the source distributions. Using fractional moments (FM) instead of integer moments makes the maximum entropy PDF estimate converge to the true PDF much faster. The simulation results show that, unlike most previously proposed models for the nonlinear score function, which are limited to certain source families such as sub-Gaussian and super-Gaussian sources or to particular source distribution models such as the generalized Gaussian distribution, our new model achieves better results for every source signal without any prior assumption about its randomness behavior.
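The exponential-family form of a maximum entropy density under a moment constraint can be sketched as follows; the paper optimises over the (fractional) exponents, whereas this sketch fixes a single exponent alpha with multiplier lam and normalises by simple numerical integration:

```python
import math

def maxent_pdf(alpha, lam, xs):
    """p(x) proportional to exp(-lam * |x|^alpha): the maximum-entropy density
    under a constraint on the moment E[|x|^alpha]; normalised numerically."""
    vals = [math.exp(-lam * abs(x) ** alpha) for x in xs]
    dx = xs[1] - xs[0]
    z = sum(vals) * dx
    return [v / z for v in vals]

def fractional_moment(p, xs, beta):
    """E[|x|^beta] under the gridded density p."""
    dx = xs[1] - xs[0]
    return sum(pi * abs(x) ** beta for pi, x in zip(p, xs)) * dx

xs = [i * 0.001 for i in range(-10000, 10001)]
p = maxent_pdf(2.0, 0.5, xs)          # alpha = 2 recovers a standard normal
print(fractional_moment(p, xs, 2.0))  # close to 1.0, the variance of N(0, 1)
```

Non-integer alpha values yield heavier- or lighter-tailed members of the same family, which is what allows a small number of fractional moments to track a wide range of source shapes.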
872 (2006); http://dx.doi.org/10.1063/1.2423270
We consider detection and estimation of a periodic signal with an additive disturbance. We study estimation of both the frequency and the shape of the waveform and develop a method based on Fourier series modelling. The method has an advantage over time domain methods such as epoch folding, in that the hypothesis space becomes continuous. Using uninformative priors, the noise variance and the signal shape can be marginalised analytically, and we show that the resulting expression can be evaluated in real time when the data is evenly sampled and does not contain any low frequencies.
We compare our method with other frequency domain methods. Although derived in various ways, most of these, including ours, have in common that the so-called harmogram plays a central role in the estimation. But there are important differences, most notably the different penalty terms on the number of harmonic frequencies. In our case, these enter the equations automatically through the use of probability theory, while in previous methods they need to be introduced in an ad hoc manner. The Bayesian approach, in combination with the chosen model structure, also allows us to build in prior information about the waveform shape, improving the accuracy of the estimate when such knowledge is available.
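The harmogram statistic at the centre of these methods can be sketched directly; the Bayesian penalty terms and the analytic marginalisation are not reproduced, and the signal and frequency grid below are invented for illustration:

```python
import math
import cmath

def harmogram(x, f, n_harmonics=3):
    """Sum of periodogram powers at the first few harmonics of trial
    frequency f (in cycles per sample)."""
    n = len(x)
    total = 0.0
    for k in range(1, n_harmonics + 1):
        c = sum(x[t] * cmath.exp(-2j * math.pi * k * f * t) for t in range(n))
        total += abs(c) ** 2 / n
    return total

# periodic signal with two harmonics at fundamental f0 = 0.1 cycles/sample
f0 = 0.1
x = [math.cos(2 * math.pi * f0 * t) + 0.4 * math.cos(2 * math.pi * 2 * f0 * t)
     for t in range(200)]
grid = [0.05 + 0.001 * i for i in range(101)]
f_hat = max(grid, key=lambda f: harmogram(x, f))
print(f_hat)   # peaks at the true fundamental, 0.1
```

Summing power across harmonics is what distinguishes the harmogram from the plain periodogram: a non-sinusoidal waveform concentrates its energy at multiples of the fundamental, so the correct fundamental outscores its subharmonics.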
872 (2006); http://dx.doi.org/10.1063/1.2423271
The present contribution discusses a Riemannian-gradient-based algorithm and a projection-based learning algorithm over a curved parameter space for single-neuron learning. We consider the 'blind deconvolution' signal processing problem. The learning rule arises naturally from a criterion-function minimization over the unit hyper-sphere. We compare the blind deconvolution performance of the two algorithms as well as their computational burden and numerical features.
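A projection-based learning step on the unit hyper-sphere can be sketched with a toy criterion. The actual blind deconvolution cost function of the paper is not reproduced; J(w) = -(a . w)^2 is an invented stand-in whose sphere-constrained minimiser is known in closed form, so the behaviour of the update is easy to check:

```python
import math

def project_to_sphere(w):
    """Renormalise w onto the unit hyper-sphere."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]

def sphere_gradient_step(w, grad, eta=0.1):
    """One projection-based step: move along the negative gradient,
    then project back onto the unit hyper-sphere."""
    return project_to_sphere([wi - eta * gi for wi, gi in zip(w, grad)])

# toy criterion J(w) = -(a . w)^2, minimised on the sphere at w = a/|a|
a = [3.0, 4.0]
w = project_to_sphere([1.0, 0.0])
for _ in range(100):
    dot = sum(ai * wi for ai, wi in zip(a, w))
    grad = [-2 * dot * ai for ai in a]      # gradient of J at w
    w = sphere_gradient_step(w, grad)
print(w)   # converges to [0.6, 0.8], i.e. a/|a|
```

A Riemannian-gradient variant would instead move along the geodesic in the direction of the gradient component tangent to the sphere; the projection scheme above trades that geometric exactness for a cheaper per-step computation, which is the comparison the paper examines.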
872 (2006); http://dx.doi.org/10.1063/1.2423272
The geometric theory of ignorance produced the so-called δ-priors, where 0 ⩽ δ ⩽ 1. It is shown here that the 1-priors, even though they are not conjugate, are more ignorant than the 0-priors. The 1-priors, just like the 0-priors, can be thought of as based on α > 0 virtual observations. However, where the 0-priors add the α virtual observations to the actual n sample points, the 1-priors subtract the α from the n. I call this anti-data, since α of these points annihilate α of the observations, leaving us with a total of n − α. True ignorance, which claims only the model and the observed data, has a price: to build the prior we must spend some of the information cash in hand. No free lunches.
872 (2006); http://dx.doi.org/10.1063/1.2423273
Assume an undirected graph G on a finite domain X and a probability distribution P on X. Graph entropy, defined in terms of the vertex packing polytope, is recast in terms of a suitably defined plausibility Pl(R) with respect to a probability distribution R on the independent subsets of G.
The plausibility which results from the minimising R, called the plausibility with respect to P on X, serves to define the graph information distance between two distributions Q and P on X, given the graph structure G.
One verifies the usual properties of additivity and subadditivity with respect to weak products of the supporting graphs. The method of GraphMaxEnt can be formulated accordingly. It is postulated that it admits an axiomatisation akin to that for MaxEnt.
Applied to probability kinematics, it permits interpreting a wide range of probability reassignments as the result of minimisation of graph information distance. This leads to defining Jeffrey rules for all these reassignments, along with inverse Jeffrey conditioning.
MaxGraphEnt is often successful in incorporating new conditional information in situations where standard MaxEnt may produce unsatisfactory and counterintuitive results. Such a resolution, using entropies, of the 'Private Benjamin problem' is presented.