BAYESIAN INFERENCE AND MAXIMUM ENTROPY METHODS IN SCIENCE AND ENGINEERING

An appreciation of Richard Threlkeld Cox
Richard T. Cox’s contributions to the foundations of probability theory and inductive logic are not generally appreciated or understood. This paper reviews his life and accomplishments, especially those in his book The Algebra of Probable Inference and his final publication Inference and Inquiry, which, in this author’s opinion, has the potential to significantly influence the design and analysis of self-organizing systems that learn from experience. A simple application to the simulation of a neuron is presented as an example of the power of Cox’s contribution.

Bayesian estimation of time series lags and structure
This paper derives practical algorithms, based on Bayesian inference methods, for several data analysis problems common in time series analysis of astronomical and other data. One problem is the determination of the lag between two time series, for which the cross-correlation function is a sufficient statistic. The second problem is the estimation of structure in a time series of measurements that are a weighted integral over a finite range of the independent variable.
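The lag-estimation problem above can be sketched numerically: under white Gaussian noise, the peak of the cross-correlation locates the lag. The signal model and parameters below are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two noisy observations of the same smoothed random signal, with the
# second one leading the first by true_lag samples.
n, true_lag = 500, 37
s = np.convolve(rng.standard_normal(n + true_lag), np.ones(10) / 10, mode="same")
x = s[:n] + 0.1 * rng.standard_normal(n)
y = s[true_lag:true_lag + n] + 0.1 * rng.standard_normal(n)  # y[t] ~ x[t + true_lag]

# Full cross-correlation; the index of the peak gives the lag estimate.
xc = np.correlate(x - x.mean(), y - y.mean(), mode="full")
est_lag = int(np.argmax(xc)) - (n - 1)
print(est_lag)  # expected to be close to 37
```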

Penalized maximum likelihood for multivariate Gaussian mixture
In this paper, we first consider parameter estimation for a multivariate random process distributed according to a multivariate Gaussian mixture law. The labels of the mixture are allowed to have a general probability law, which makes it possible to model a temporal structure of the process under study. We generalize the univariate Gaussian mixture case of [1] to show that the likelihood is unbounded and goes to infinity when one of the covariance matrices approaches the boundary of singularity of the set of nonnegative definite matrices. We characterize the parameter set of these singularities. As a solution to this degeneracy problem, we show that penalizing the likelihood with an inverse Wishart prior on the covariance matrices results in a penalized, or maximum a posteriori, criterion which is bounded. Then the existence of positive definite matrices optimizing this criterion can be guaranteed. We also show that with a modified EM procedure, or with a Bayesian sampling scheme, we can constrain the covariance matrices to belong to a particular subclass of covariance matrices. Finally, we study degeneracies in the source separation problem, where the characterization of the parameter singularity set is more complex. We show, however, that an inverse Wishart prior on the covariance matrices eliminates the degeneracies in this case too.
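A minimal numerical sketch of the penalized M-step this abstract refers to: with an inverse-Wishart prior IW(Psi, nu) on each covariance, the MAP update adds Psi to the weighted scatter, which keeps the estimate positive definite even when a component collapses onto a single data point. The values of Psi, nu and the data here are illustrative choices, not the paper's.

```python
import numpy as np

d = 2
Psi = 0.1 * np.eye(d)   # prior scale matrix (illustrative)
nu = d + 2              # prior degrees of freedom (illustrative)

# Responsibilities that make one component "own" exactly one data point,
# the situation in which the unpenalized likelihood degenerates:
X = np.array([[0.0, 0.0], [3.0, 3.0], [3.1, 2.9]])
r = np.array([1.0, 0.0, 0.0])          # this component's weight per point
mu = (r @ X) / r.sum()                 # weighted mean
diff = X - mu
S = (r[:, None] * diff).T @ diff       # weighted scatter (rank 0 here)

sigma_ml = S / r.sum()                              # ML update: singular
sigma_map = (Psi + S) / (r.sum() + nu + d + 1)      # MAP update: positive definite

print(np.linalg.eigvalsh(sigma_ml).min())   # 0.0 -> degenerate covariance
print(np.linalg.eigvalsh(sigma_map).min())  # > 0 -> bounded criterion
```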

Statistical basis for multispectral infrared seeker trade studies
Characterizing the utility of information derived from a sensor tasked with performing the function of target object identification requires the consideration of a large number of system parameters, including:
• Possible object classes under observation
• Measurement-related properties of these objects
• Observation geometry
• Sensor modality/waveband(s)
• Sensor noise characteristics
• Uncertainties in sensor noise characteristics
It is important to consider these parameters in a systematic way in order to objectively assess their relative effects on target selection. Since each of the listed parameters, with the exception of the spectral band(s), is probabilistic in nature, it makes sense to cast the identification problem in a logically comprehensive way which statistically captures their interdependency. Such a formulation lends itself to the evaluation and optimization of various aspects of measurement fidelity. This paper discusses a novel approach to systematically incorporating each of the parameters listed above in order to provide a basis for performing a number of system trade studies. The process is illustrated in the context of a search for the optimal band pair to be used in a two-color infrared (IR) sensor.
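One way such a band-pair trade study can be sketched (the paper's actual statistical criterion is not reproduced here, so this is an illustrative stand-in): rank candidate band pairs for a two-class identification task by the Bhattacharyya distance between the class-conditional Gaussians restricted to each pair, since a larger distance tightens the bound on the Bayes error. All class statistics below are made up.

```python
import numpy as np
from itertools import combinations

def bhattacharyya(m0, C0, m1, C1):
    # Bhattacharyya distance between two Gaussians N(m0, C0), N(m1, C1).
    C = 0.5 * (C0 + C1)
    dm = m1 - m0
    term1 = 0.125 * dm @ np.linalg.solve(C, dm)
    term2 = 0.5 * np.log(np.linalg.det(C) /
                         np.sqrt(np.linalg.det(C0) * np.linalg.det(C1)))
    return term1 + term2

# Hypothetical 4-band class statistics; the classes differ mostly in bands 2 and 3.
m0 = np.array([1.0, 2.0, 0.5, 1.5])
m1 = np.array([1.1, 2.1, 2.0, 3.0])
C0 = C1 = np.diag([0.2, 0.2, 0.2, 0.2])

pairs = list(combinations(range(4), 2))
scores = []
for i, j in pairs:
    idx = [i, j]
    scores.append(bhattacharyya(m0[idx], C0[np.ix_(idx, idx)],
                                m1[idx], C1[np.ix_(idx, idx)]))
best = pairs[int(np.argmax(scores))]
print(best)  # (2, 3): the pair that best separates the two classes
```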

Bayesian analysis of single trial cortical eventrelated components
A common technique in neurophysiology is the recording of electric potentials generated by cortical neuronal ensembles in relation to a specific event. Understanding event-related potentials requires the identification of signals that are relatively phase-locked to a stimulus or event onset (event-related potentials) as well as non-phase-locked activities. It is now widely accepted that the recorded phase-locked signal is not itself homogeneous, but instead a combination of different components, which can vary in amplitude and latency from trial to trial. We approach the problem of identifying event-related component waveforms and their trial-to-trial variability from a Bayesian perspective. We employ a signal model consisting of a set of unknown source waveforms, each with its own set of trial-to-trial amplitudes and latencies. Differential variability of the sources from trial to trial aids significantly in the identification of the component waveforms. The posterior probability density is derived for a specified number of event-related components using data from single or multiple sensors. The maximum a posteriori (MAP) solution is used to obtain the event-related component waveforms and their single-trial parameters. The approach is demonstrated using a data set consisting of intracortically recorded local field potentials (LFP) in monkeys performing a visuomotor pattern discrimination task.
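A stripped-down sketch of the single-trial model above: each trial is an amplitude-scaled, latency-shifted copy of a component waveform plus noise. With the waveform known (in the paper it is itself inferred), the per-trial MAP fit under flat priors reduces to a grid search over latency with a closed-form least-squares amplitude. All shapes and numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

t = np.arange(200)
w = np.exp(-0.5 * ((t - 100) / 10.0) ** 2)   # toy component waveform

true_amp, true_lat = 2.0, 7
trial = true_amp * np.roll(w, true_lat) + 0.2 * rng.standard_normal(t.size)

best = None
for lat in range(-20, 21):                   # latency grid
    wl = np.roll(w, lat)
    amp = (wl @ trial) / (wl @ wl)           # least-squares amplitude at this latency
    rss = np.sum((trial - amp * wl) ** 2)    # residual sum of squares
    if best is None or rss < best[0]:
        best = (rss, lat, amp)

_, est_lat, est_amp = best
print(est_lat, round(est_amp, 2))  # close to (7, 2.0)
```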

Separation of mixed hidden Markov model sources
In this contribution, we consider the problem of source separation in the case of noisy instantaneous mixtures. In a previous work [1], sources were modeled by a mixture of Gaussians, leading to a hierarchical Bayesian model in which the labels of the mixture are treated as hidden variables. In that work, however, the labels were assumed to be i.i.d. We extend this model to incorporate a Markovian structure for the labels. This extension is important for practical applications, which are abundant: unsupervised classification and segmentation, pattern recognition, speech signal processing, and so on. In order to estimate the mixing matrix and the a priori model parameters, we treat the observations as incomplete data. The missing data are the sources and the labels: the sources are missing data for the observations, and the labels are missing data for the sources. This hierarchical model leads to restoration-maximization type algorithms, in which the restoration step can be carried out in three different ways: (i) the complete likelihood is estimated by its conditional expectation, which leads to the EM (expectation-maximization) algorithm [2]; (ii) the missing data are estimated by their maximum a posteriori, which leads to the JMAP (joint maximum a posteriori) algorithm [3]; (iii) the missing data are sampled from their a posteriori distributions, which leads to the SEM (stochastic EM) algorithm [4]. A Gibbs sampling scheme is implemented to generate the missing data.
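The three restoration steps (i)-(iii) can be illustrated in their simplest possible form, for a single hidden label z in {0, 1} with known posterior; the binary label and the value of p1 are chosen only for brevity.

```python
import numpy as np

rng = np.random.default_rng(2)

p1 = 0.7                       # posterior probability p(z = 1 | data)
post = np.array([1 - p1, p1])

# (i) EM: replace functions of z by their posterior expectation.
z_em = post[1]                 # E[z] = 0.7

# (ii) JMAP: restore z by its maximum a posteriori value.
z_jmap = int(np.argmax(post))  # 1

# (iii) SEM: draw z from its posterior (one Gibbs-style sample).
z_sem = int(rng.random() < p1)

print(z_em, z_jmap, z_sem)
```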

Sequential MCMC for spatial signal separation and restoration from an array of sensors
This paper addresses the implementation of sequential Markov chain Monte Carlo (MCMC) estimation, also known as particle filtering, for signal separation and restoration problems using a passive array of sensors. The proposed method offers significant advantages: 1) the signals mixed at the array can be well separated in space and restored in an online fashion; 2) the assumption of a stationary environment over the observation interval can be relaxed; 3) the estimated joint posterior distribution of all the unknown parameters can be used for statistical inference; and 4) the method can also be used to dynamically detect the number of signals throughout the observation period. The signals used in the simulation were mixed by a highly nonlinear but structured steering-vector matrix. Simulation results demonstrate the effectiveness of the method: the true and restored signals were clearly separated and restored by the sequential MCMC method.
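A generic bootstrap particle filter shows the propagate / weight / resample cycle that sequential MCMC methods of this kind build on; the array-processing model in the paper is far richer, so this sketch uses only a scalar random-walk state observed in Gaussian noise.

```python
import numpy as np

rng = np.random.default_rng(3)

T, P = 100, 500
q, r = 0.1, 0.5                       # process / measurement noise std

x_true = np.cumsum(q * rng.standard_normal(T))
y = x_true + r * rng.standard_normal(T)

particles = np.zeros(P)
est = np.zeros(T)
for t in range(T):
    particles += q * rng.standard_normal(P)            # propagate
    w = np.exp(-0.5 * ((y[t] - particles) / r) ** 2)   # weight by likelihood
    w /= w.sum()
    est[t] = w @ particles                             # posterior mean estimate
    idx = rng.choice(P, size=P, p=w)                   # multinomial resampling
    particles = particles[idx]

rmse = np.sqrt(np.mean((est - x_true) ** 2))
print(round(rmse, 3))  # well below the raw measurement noise r = 0.5
```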

Bayesian source separation and system data fusion methodology
The probability of correctly selecting a target object from among many objects is a measure of how well one can discriminate. If more than one system modality for object discrimination is available, then one can fuse the respective information derived from the multiple systems. In this case, performance depends on the accurate association of object tracks seen by one system with common object tracks seen by another system, and can be viewed in terms of answering the question: “Which objects seen by one system are associated with which objects seen by another system?” Because discrimination performance depends on how accurately track data from the various systems are associated, the association question has bearing on the discrimination question; i.e., the association question must be answered to facilitate answering the discrimination question. The purpose of this paper is to address the association question using the logical question formalism advocated by Richard Cox instead of the standard approach of random variables. Biases result from random and common object track errors from each system. An association matrix correlates each object track seen by one system with the object tracks seen by another system. While estimation of the common bias is essential to robust track association, most current association algorithms do not jointly estimate the association matrix and the common bias. The essential problem is analogous to that of blind source separation [2]. A combined M-on-N track association matrix and common bias inferencing algorithm using a Bayesian source separation methodology is described with a sample 2-on-2 track association problem.
While the described Bayesian algorithm deals with common translation biases and currently uses only metric information in the likelihood function, the same algorithmic approach can also effectively deal with errors having the form of any common affine transformation and can be extended to exploit features and any other available track information. Although its effectiveness has some dependence on the track positions and on the relative sizes of the random and common errors, which should be further investigated, the algorithm is both statistically efficient in its optimal exploitation of the likelihood information and exhaustive in its delineation of, and search over [4], all possible association configurations.

Bayesian blind component separation for cosmic microwave background observations
We present a technique based on the Expectation-Maximization (EM) algorithm for the separation of the components of noisy mixtures in the Fourier plane. We perform a semi-blind joint estimation of the components, the mixing coefficients, and the noise rms levels. A priori information on the spatial spectra of the components and on the mixing coefficients can be naturally included in the algorithm. The method is applied to the separation of distinct astrophysical emissions in simulations of future observations with the High Frequency Instrument of the Planck space mission, due to be launched in 2007. The simulations include a mixture of astrophysical emissions and instrumental white noise at the levels expected for this instrument. We have obtained good preliminary results with this technique, being able to blindly separate noisy mixtures with 3 components.

Physical mixture modeling with unknown number of components
Measured physical spectra often comprise an unknown number of components from a known parametric family. A reversible jump Markov chain Monte Carlo (RJMCMC) technique is applied to the problem of estimating the number of components evident in the data jointly with the parameters of the components. The physical model consists of a mixture of components, an additive background, and a convolution with the transfer function of a blurring apparatus. The results were compared with the deconvolution of a form-free distribution. By calculating marginal posterior probability density distributions from the RJMCMC sample for the most probable number of components, we estimated the parameters and their uncertainties. The method was applied to a benchmark test of Rutherford backscattering spectroscopy on a system consisting of a thin Cu film, where we know that Cu consists of two isotopes.

Quantitative analysis of multicomponent mass spectra
We present a method for the decomposition of mass spectra of gas mixtures together with the relevant calibration measurements. Only the consistent use of the calibration measurements, noisy though they are, provides sufficient information to overcome this otherwise highly underdetermined problem. The feasibility of the procedure is demonstrated for the example of a mixture of three hydrocarbons, exploiting the singular value decomposition of the cracking matrices. Knowing neither the cracking patterns nor the concentrations of the contributing molecules, the algorithm provides both in the form of expectation values together with error margins.
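The forward model underlying this decomposition can be sketched with made-up numbers: with calibrated cracking patterns C (mass channels by species), the measured spectrum of a mixture is y = C c + noise, and the concentrations c follow by least squares (an SVD-based solve). The paper goes much further, treating the noisy calibrations themselves as unknowns and reporting expectation values with error margins.

```python
import numpy as np

# Hypothetical cracking matrix: 5 mass channels, 3 species.
C = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.7, 0.1],
              [0.0, 0.2, 0.6],
              [0.0, 0.0, 0.3],
              [0.3, 0.4, 0.2]])
c_true = np.array([2.0, 1.0, 0.5])   # true concentrations

rng = np.random.default_rng(4)
y = C @ c_true + 0.01 * rng.standard_normal(5)   # noisy measured spectrum

# SVD-based least-squares solution for the concentrations.
c_hat, *_ = np.linalg.lstsq(C, y, rcond=None)
print(np.round(c_hat, 2))  # close to [2.0, 1.0, 0.5]
```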

Bayesian blocks in two or more dimensions: Image segmentation and cluster analysis
I describe an extension, to higher dimensions, of the Bayesian Blocks algorithm for estimating signals in noisy time series data [17,18]. We seek the partition of the data space with the maximum posterior for a model consisting of a homogeneous Poisson process in each partition element. A model attributing the data within a region of the data space to a Poisson process with a fixed event rate λ has a global posterior depending only on N, the number of data points in the region, and its volume V:

P(N, V) ∝ N! / V^(N+1).

Note that λ does not appear, since it has been marginalized using a flat, improper prior. Other priors yield similar formulas. This expression is valid for a data space of any dimension. Suppose two regions, described by (N1, V1) and (N2, V2), are candidates for being merged into one. The Bayes merge factor, giving the posterior ratio for merged and not merged, respectively, is:

(N1 + N2)! V1^(N1+1) V2^(N2+1) / [N1! N2! (V1 + V2)^(N1+N2+1)].

Then collect data points into blocks with this cell coalescence algorithm:
(1) Identify each cell of the Voronoi tessellation of the data as a block.
(2) Iteratively merge the pair of blocks with the largest merge factor.
(3) Halt when the maximum merge factor falls below 1.
In many applications it is convenient to restrict mergers to neighboring blocks. This algorithm partitions the space into a set of relatively few blocks, each having a density equal to the number of its data points divided by its volume. Adjacent high-density blocks can be collected into clusters. This method allows the detection of clusters in high-dimensional data spaces, with the following properties:
• The number of clusters is determined, not assumed.
• Clusters can have any shape: the conventional Gaussian assumption is avoided, shapes can include both concavities and convexities, and blocks and clusters do not even have to be simply connected.
• Cluster density profiles are estimated, not just the boundaries.
• Any slowly varying background is automatically identified.
• No binning of the raw data is necessary.
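The merge criterion can be sketched directly from the block posterior N!/V^(N+1). Since the improper prior on the rate leaves an arbitrary overall constant in the factor, only comparisons between candidate mergers are meaningful in this sketch; computations are done in logs for numerical stability.

```python
from math import lgamma, log

def log_block_post(n, v):
    # log of N! / V**(N+1), up to the prior's arbitrary additive constant
    return lgamma(n + 1) - (n + 1) * log(v)

def log_merge_factor(n1, v1, n2, v2):
    # log of posterior(merged block) / [posterior(block 1) * posterior(block 2)]
    return (log_block_post(n1 + n2, v1 + v2)
            - log_block_post(n1, v1) - log_block_post(n2, v2))

# Blocks of nearly equal density are far better merge candidates than
# blocks whose densities differ by an order of magnitude:
similar = log_merge_factor(50, 1.0, 52, 1.0)
different = log_merge_factor(50, 1.0, 500, 1.0)
print(similar > different)  # True
```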

On some properties of Kozachenko-Leonenko estimates and maximum entropy principle in goodness-of-fit tests construction
Kozachenko-Leonenko type entropy estimates are considered. Consistency and some other properties of these estimates are proved. Goodness-of-fit tests for the exponential distribution are constructed.
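The Kozachenko-Leonenko estimator referred to above can be written, for n points in d dimensions with nearest-neighbour distances ε_i, as H ≈ (d/n) Σ log ε_i + log V_d + log(n−1) + γ, where V_d is the volume of the d-dimensional unit ball (V_1 = 2) and γ is Euler's constant. A one-dimensional sketch on simulated data (parameters illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

def kl_entropy_1d(x):
    # Kozachenko-Leonenko nearest-neighbour entropy estimate, d = 1.
    n = x.size
    xs = np.sort(x)
    gaps = np.diff(xs)
    # Nearest-neighbour distance: min of gap to left and right neighbour.
    eps = np.minimum(np.r_[np.inf, gaps], np.r_[gaps, np.inf])
    euler_gamma = 0.5772156649015329
    return np.mean(np.log(eps)) + np.log(2.0) + np.log(n - 1) + euler_gamma

x = rng.standard_normal(2000)
h_hat = kl_entropy_1d(x)
print(round(h_hat, 3))  # near the true value 0.5 * log(2*pi*e) ~ 1.419
```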

Reconstruction of transition probabilities by maximum entropy in the mean
When observing a random dynamical system, in many cases we can only see its stationary regime, that is, its stationary or equilibrium distribution, and do not have access to the dynamics of the system. It is therefore important to have procedures for reconstructing a transition matrix from the knowledge of its equilibrium distribution. We recast the problem of reconstructing a transition probability as a linear, ill-posed inverse problem with convex constraints. We show how to solve it using the method of maximum entropy in the mean and compare with other methods of reconstruction. We present some simple applications, but mention here that we have reconstructed matrices up to size without any problem; describing these would take up more space than is available here.
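A limiting case of the reconstruction problem can be checked directly (this is my illustration, not the paper's method): when the only datum is the equilibrium distribution π itself, making every row of P equal to π gives a valid transition matrix that trivially satisfies πP = π, and among π-weighted row-entropy maximizers it is the maximum-entropy choice. The paper's method of maximum entropy in the mean handles the far harder case where extra convex constraints rule out this trivial solution.

```python
import numpy as np

pi = np.array([0.5, 0.3, 0.2])     # given equilibrium distribution
P = np.tile(pi, (3, 1))            # candidate reconstruction: each row is pi

print(np.allclose(pi @ P, pi))          # True: pi is stationary for P
print(np.allclose(P.sum(axis=1), 1.0))  # True: rows sum to one
```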

Direct imaging of fractional oxygen in Hg-based high-Tc superconductors
Maximum entropy is applied to the crystallographic imaging of x-ray diffraction data in order to reveal reliable, model-free weak electron density features (if any) in newly discovered high-Tc superconductors. The use of suitably computed nonuniform priors turns out to be essential. The suggested maxentropic procedure shows that about 0.2 oxygen atoms [1.6 electrons] can be unambiguously evidenced near the much heavier mercury atoms [harboring 80 electrons each], and this from standard laboratory [non-synchrotron] x-ray data.

Bayesian estimation of the hemodynamic response function in functional MRI
Functional MRI (fMRI) is a recent, noninvasive technique that allows the evolution of brain processes to be followed dynamically in various cognitive or behavioral tasks. In BOLD fMRI, what is actually measured is only indirectly related to neuronal activity, through a process that is still under investigation. A convenient way to analyze BOLD fMRI data is to consider the whole brain as a system characterized by a transfer response function, called the hemodynamic response function (HRF). Precise and robust estimation of the HRF has not yet been achieved: parametric methods tend to be robust but impose overly strong constraints on the shape of the HRF, whereas nonparametric models are not reliable since the problem is badly conditioned. We therefore propose a fully Bayesian, nonparametric method that makes use of basic but relevant a priori knowledge about the underlying physiological process to make robust inferences about the HRF. We show that this model is very robust to a decreasing signal-to-noise ratio and to the actual noise sampling distribution. We finally apply the method to real data, revealing a wide variety of HRF shapes.
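A sketch of nonparametric HRF estimation in the spirit of the abstract (the paper's actual prior is not reproduced here): model the BOLD signal as the stimulus train convolved with an unknown finite impulse response h, and regularize h with a Gaussian smoothness prior on its second differences, giving the ridge-like MAP estimate h = (X'X + λ D'D)⁻¹ X'y. All shapes and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)

T, L = 300, 25                       # scan length, HRF length (samples)
onsets = np.zeros(T)
onsets[::30] = 1.0                   # toy stimulus train

h_true = np.exp(-0.5 * ((np.arange(L) - 6) / 2.5) ** 2)  # toy HRF shape

# Design matrix: column k is the stimulus train shifted by k samples.
X = np.column_stack([np.r_[np.zeros(k), onsets[:T - k]] for k in range(L)])
y = X @ h_true + 0.2 * rng.standard_normal(T)

D = np.diff(np.eye(L), n=2, axis=0)  # second-difference operator
lam = 1.0                            # smoothness weight (illustrative)
h_map = np.linalg.solve(X.T @ X + lam * D.T @ D, X.T @ y)

err = np.linalg.norm(h_map - h_true) / np.linalg.norm(h_true)
print(round(err, 3))  # relative error; typically well below 0.3 here
```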

Tomographic reconstruction from noisy data
A generalized maximum entropy approach to noisy inverse problems, such as the Abel problem, tomography, or deconvolution, is discussed and reviewed. Unlike the more traditional regularization approach, in the method discussed here each unknown parameter (signal and noise) is redefined as a proper probability distribution within a certain prespecified support. The joint entropies of both the noise and the signal probabilities are then maximized subject to the observed data. We use this method for tomographic reconstruction of the soft x-ray emissivity of a hot fusion plasma.

Regularized wavelets for solving inverse ill-posed problems
This paper describes regularized wavelets and numerical algorithms for a regularized wavelet analysis based on the Bayes strategy. This program includes investigating the possibility of finding a basis, in terms of a multiresolution analysis, under the condition that the scaling function simultaneously satisfies the properties of a regularization operator and of an orthonormal basis. Examples of the application of the regularized wavelets to the differentiation of composite simulated spectra with fractal noise are considered.

The structure of divergence(s) in stationary state of irreversible heat conduction processes and their partial differential equations of elliptic type
Irreversible processes imply entropy production, or simply energy dissipation. This is true for stationary states too. Laplace’s equation for heat conduction, as a linear elliptic second-order partial differential equation, does not express any energy dissipation in the conservative potential field, in accordance with the minimum principles. A new quasilinear elliptic second-order partial differential equation for the stationary-state heat conduction process is analyzed with the aid of minimum principles (and also on the basis of the divergence term). Investigations of the Onsager [1,2] and Prigogine [3,4] principles show the deciding role of the local dissipation potentials; the existence of these potentials is a crucial point for real processes. The new quasilinear elliptic partial differential equation of second order is in full agreement with Gyarmati’s [5] integral principle for the stationary state as well. In treating these questions, the proper Lagrange densities and the Euler-Lagrange differential equations must be applied [16] in the different representational pictures (to treat the variational problems). On the basis of the new equation(s), the different nonequilibrium temperatures can be determined for steady-state irreversible processes, which cannot be done with Laplace’s equation. The structure of the divergence shows all these features. More importantly, one can find an equation connecting the internal energy and the entropy (entropy production!) for a steady-state irreversible process. The new equation(s) also interpret, in a special way, the results of the so-called dimensional analysis for nonlinear heat conduction in the stationary state. Boundary conditions are also taken into consideration. A discussion involving heat reservoirs helps to expose the questions at the classical thermodynamic level as well.

Entropic dynamics
I explore the possibility that the laws of physics might be laws of inference rather than laws of nature. What sort of dynamics can one derive from well-established rules of inference? Specifically, I ask: given relevant information codified in the initial and final states, what trajectory is the system expected to follow? The answer follows from a principle of inference, the principle of maximum entropy, and not from a principle of physics. The entropic dynamics derived in this way exhibits some remarkable formal similarities with other generally covariant theories such as general relativity.