BAYESIAN INFERENCE AND MAXIMUM ENTROPY METHODS IN SCIENCE AND ENGINEERING: 19th International Workshop

Nonuniform sampling: Bandwidth and aliasing
For spectroscopic measurements there are good reasons why one might consider using nonuniformly nonsimultaneously sampled complex data. The primary one is that the effective bandwidth, the largest spectral window free of aliases, can be much wider than with uniformly sampled data. In this paper we discuss nonuniformly nonsimultaneously sampled data, describe how these data are traditionally analyzed, analyze them using probability theory and show how probability theory generalizes the discrete Fourier transform: first for uniformly sampled data, then for nonuniformly sampled data and finally for nonuniformly nonsimultaneously sampled data. These generalizations demonstrate that aliases are not so much removed by nonuniform nonsimultaneous sampling as they are moved to much higher frequencies.
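A minimal numerical illustration of the aliasing point above (the frequency, sampling interval, and sample count are hypothetical): with uniform sampling at interval dt, a tone at f0 and its alias at f0 + 1/dt produce identical samples, while random (nonuniform) sample times break that degeneracy, so the alias no longer fits the data.

```python
import numpy as np

rng = np.random.default_rng(0)
f0 = 3.0                    # true tone frequency (Hz), hypothetical
dt = 0.1                    # uniform interval -> alias-free window 1/dt = 10 Hz
t_uniform = np.arange(32) * dt
t_nonuni = np.sort(rng.uniform(0.0, 3.2, 32))  # same span, random times

def tone(t, f):
    return np.cos(2 * np.pi * f * t)

alias = f0 + 1.0 / dt       # first alias of f0 under uniform sampling

# Residual when the alias frequency is fitted to data generated at f0:
# zero for uniform sampling (indistinguishable), large for nonuniform.
res_uniform = np.linalg.norm(tone(t_uniform, f0) - tone(t_uniform, alias))
res_nonuni = np.linalg.norm(tone(t_nonuni, f0) - tone(t_nonuni, alias))
```

The nonuniform residual is far from zero, which is the sense in which the alias has been pushed out of the effective spectral window.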

Optimal pulses
This paper is concerned with a problem of designing optimal waveforms. Specifically, the problem is to control the data-generation process so that the data will be optimal for answering a question, in the sense that the probabilities for answers to the question are maximally dispersed. The method presented here results in a variational principle in which the functional to be extremized is based upon the evidence for one hypothesis relative to another. As an example, the radar waveform that is optimal for discriminating between two targets is determined; the optimal waveform's dependence on the target impulse-response functions is quite revealing, but in accord with intuition.

Bayesian classification using an entropy prior on mixture models
In many classification problems, it is reasonable to base the analysis on a mixture model. A mixture model assumes that each sample is produced by first randomly selecting from a finite collection of data clusters and by then using the chosen cluster distribution to produce the class label and feature vector of the sample. If we know the set of model parameters, then when we observe a feature vector, we can predict the classification. When we do not know the parameters exactly, we must infer the model parameters from a training set of data samples. Taking the Bayesian approach, we want to determine the probability distribution for the parameters given the training data. Then when it comes time to predict the class label, given a feature vector, we integrate over the model parameter distribution. We argue that a good, objective choice for the prior distribution on the model parameters is based on the entropy of each mixture model. We show that this prior regularizes the model fit so that overfitting the training data has no adverse effects.
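A toy sketch of the predictive step described above, assuming a two-class, one-dimensional Gaussian mixture and pretending posterior samples of the class means are already in hand (all numbers are hypothetical): instead of plugging in one parameter estimate, the class probability is averaged over the parameter distribution.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for posterior samples of the two class means given training data.
mu0_samples = rng.normal(-1.0, 0.2, 500)
mu1_samples = rng.normal(+1.0, 0.2, 500)

def class1_prob(x, sigma=1.0, prior1=0.5):
    # p(class 1 | x), averaged over the posterior samples of (mu0, mu1),
    # i.e. a Monte Carlo version of integrating over the parameters.
    l0 = np.exp(-0.5 * ((x - mu0_samples) / sigma) ** 2)
    l1 = np.exp(-0.5 * ((x - mu1_samples) / sigma) ** 2)
    return np.mean(prior1 * l1 / (prior1 * l1 + (1 - prior1) * l0))

p = class1_prob(0.8)   # a point nearer the class-1 mean
```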

Model selection for inverse problems: Best choice of basis functions and model order selection
A complete solution of an inverse problem involves five main steps: choice of the basis functions for discretization, determination of the order of the model, estimation of the hyperparameters, estimation of the solution, and finally, characterization of the proposed solution. Much work has been done on the last three steps. The first two have long been neglected, in part due to the complexity of the problem. However, in many inverse problems, particularly when the number of data is very small, a good choice of the basis functions and a good selection of the model order become crucial. In this paper, we first propose a complete solution within a Bayesian framework. Then, we apply the proposed method to an inverse elastic electron scattering problem.

Optimal recovery of local truth
Probability mass curves the data space with horizons! Let f be a multivariate probability density function with continuous second-order partial derivatives. Consider the problem of estimating the true value of f at a single point z from n independent observations. It is shown that the fastest possible estimators (such as the k-nearest-neighbor and kernel estimators) have minimum asymptotic mean-square errors when the space of observations is thought of as conformally curved. The optimal metric is shown to be generated by the Hessian of f in the regions where the Hessian is definite. Thus, the peaks and valleys of f are surrounded by singular horizons when the Hessian changes signature from Riemannian to pseudo-Riemannian. Adaptive estimators based on the optimal variable metric show considerable theoretical and practical improvements over traditional methods. The formulas simplify dramatically when the dimension of the data space is 4. The similarities with General Relativity are striking but possibly illusory at this point. However, these results suggest that nonparametric density estimation may have something new to say about current physical theory.
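For orientation, a sketch of the baseline fixed-metric k-nearest-neighbor density estimator that the optimal-metric estimators improve upon; the data, n, and k below are hypothetical, and the Hessian-generated metric itself is not shown.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 2000, 50
x = rng.normal(0.0, 1.0, n)       # hypothetical sample from a standard normal

def knn_density(z):
    # Classical 1-D k-NN estimate: k events in a ball of radius r_k around z,
    # so f_hat(z) = k / (n * volume), with volume 2*r_k in one dimension.
    r_k = np.sort(np.abs(x - z))[k - 1]   # distance to the k-th neighbor
    return k / (n * 2.0 * r_k)

f_hat = knn_density(0.0)          # true standard-normal density at 0 is ~0.399
```

The variable-metric version of the abstract would replace the Euclidean distance |x - z| with a distance rescaled by the local Hessian of f.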

Axioms for probability from aggregatibility
Probability is an aggregatible property of logical propositions; the theory of probability, including even its numerical representability, can be developed from the theory of aggregatibility. The new axiomatization given here improves on the well-known axiomatizations of Cox and of Kolmogorov. It explains why probabilities can be represented quantitatively; it shows how non-quantitative systems of probability differ from the standard system, and what has to be given up when one attempts to generalize; and it shows that the path from propositional logic to probability theory can be surprisingly direct.

Entropy is only approximately aggregatible
In classical thermodynamics, entropy is treated much like other physical quantities. In particular, it is treated as if it were additive (and therefore aggregatible), at least during reversible processes. The more precise information-theoretic definition of entropy due to Shannon is not aggregatible in general, as we show here. Just as aggregatibles arise naturally over Boolean algebras, entropies arise naturally over partition lattices. Partition lattices are more complicated than Boolean algebras, in general, and so entropies behave more complicatedly than aggregatibles. However, under certain circumstances common in physics, entropies become approximately aggregatible.

QDOE (Quantitative Design of Experiments): Some lessons from field experience and a connection to Bayesian methods
The purpose of this talk is to comment on the purpose of experiments and the nature of the risks associated with them. The connection of this topic to Bayesian methods is that the interpretation of the experiment is conditional on one's prior belief about how the data arose. Factors subject to prior belief include the details of the items under test, the details of test conditions, and the details of sensor arrangement and performance. An incomplete range of alternatives in these areas of the prior will lead to misinterpretation. This talk will sketch out present methods of QDOE developed by the Navy and illustrate common pitfalls as experienced in major tests. The final box score on seven such tests, designed and conducted during the period 1981–1996, could be given as 3-1-3. That is, using QDOE we experienced three wins (i.e., foresaw a problem and avoided it), one loss (could not recover from a problem), and three ties (ran into a problem but recovered from it). Without QDOE, even though our goals would have been less ambitious, the probable outcome even for those lower goals would have been something like 2-4-1: many more losses.

Temperature-based ascendancy derived from a cost or reward function
Ulanowicz in [1] defines ascendancy in terms of departure from maximum-entropy (proportional) flow; however, he does not explain what may cause this departure. Here the ascendancy is derived by minimizing a cost function of the form C = Σ_ij t_ij ln t_ij + α Σ_ij c_ij t_ij, where t_ij is the fraction of the total flow from input i to output j, c_ij is the corresponding cost of such flow, and α is a parameter (inversely proportional to temperature); the flow is subject to the marginal flow constraints Σ_j t_ij = a_i and Σ_i t_ij = b_j. Minimization of such a system is obtained by Evans in [2]. At high temperatures (small α) the first (minimum of negative entropy) term dominates, but as α increases (temperature decreases), the cost term dominates, causing a departure from maximum entropy, or ascendancy. Riverbed analogy: at high temperature (fast flows) the flow is mostly uniform (maximum entropy) across the riverbed, but at low temperatures (limited flow), the structure of the riverbed (the cost function) becomes more important, with some channels being cut off or evaporated by too much sun, some flows being diverted by rocks, and so on. Also, if the total cost (or reward) term is held constant, the parameter α can be considered a Lagrange multiplier, and the problem can be reduced (similarly to a Legendre transformation) to a maximum entropy problem subject to constraints.
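The minimization described above can be illustrated numerically. Minimizing Σ t_ij ln t_ij + α Σ c_ij t_ij under fixed row and column marginals gives a solution of the Gibbs form t_ij = u_i exp(-α c_ij) v_j, with u and v fixed by iterative proportional fitting; the cost matrix and marginals below are hypothetical.

```python
import numpy as np

def min_cost_flow(c, row, col, alpha, iters=500):
    # Iterative proportional fitting (Sinkhorn scaling) of the Gibbs
    # kernel exp(-alpha * c) to the prescribed marginals.
    K = np.exp(-alpha * c)
    u = np.ones_like(row)
    for _ in range(iters):
        v = col / (K.T @ u)
        u = row / (K @ v)
    return u[:, None] * K * v[None, :]

c = np.array([[0.0, 1.0], [1.0, 0.0]])   # cheap flow on the diagonal
row = col = np.array([0.5, 0.5])

t_hot = min_cost_flow(c, row, col, alpha=0.01)   # high temperature
t_cold = min_cost_flow(c, row, col, alpha=20.0)  # low temperature
```

At high temperature the flow is nearly uniform (maximum entropy, t_ij ≈ 0.25), while at low temperature it concentrates on the cheap diagonal channels, mirroring the riverbed analogy.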

Maxentropic reconstruction by first-order splines
We present a way of combining maximum entropy in the mean with first-order spline interpolation to obtain solutions to generalized moment problems or Fredholm integral equations of the first kind. An application to the numerical inversion of the Laplace transform is presented.

Bayesian background estimation
The ubiquitous problem of estimating the background of a measured spectrum is solved with Bayesian probability theory. A mixture model is used to capture the defining characteristic of the problem, namely that the background is smoother than the signal. The smoothness property is quantified in terms of a cubic spline basis, where a variable degree of smoothness is attained by allowing the number of knots and the knot positions to be chosen adaptively on the basis of the data. The fully Bayesian approach taken provides a natural way to handle knot adaptivity, allows uncertainties in the background to be estimated, and allows data points to be classified into groups containing only background and groups with an additional signal contribution. Our technique is demonstrated on a PIXE spectrum from a geological sample and an Auger spectrum from a 10-monolayer iron film on tungsten.

Handling discordant data sets
Experimental data from different sources may suffer from discordant calibrations and may cover different regions of the independent variables. A model function spanning the complete range has to account for all available data. In our approach, one of the data sets is taken as correct on the absolute scale, while in the other data sets we allow for an unknown scale factor. Bayesian probability theory is employed to evaluate the unknown scale factors and the model parameters.
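A minimal sketch of the idea, assuming a one-parameter linear model, Gaussian noise, and flat priors, in which case the joint MAP estimate of the model parameter a and the scale factor s reduces to alternating closed-form least-squares updates (all data below are simulated and hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
a_true, s_true = 2.0, 0.7
x1, x2 = np.linspace(0, 1, 20), np.linspace(1, 2, 20)
y1 = a_true * x1 + rng.normal(0, 0.01, 20)            # absolutely calibrated set
y2 = s_true * a_true * x2 + rng.normal(0, 0.01, 20)   # set with unknown scale s

a, s = 1.0, 1.0
for _ in range(50):
    # Update a given s: least squares over both data sets jointly.
    a = (x1 @ y1 + s * (x2 @ y2)) / (x1 @ x1 + s**2 * (x2 @ x2))
    # Update s given a: least squares on the miscalibrated set alone.
    s = (x2 @ y2) / (a * (x2 @ x2))
```

The two data sets cover different x ranges, yet the pooled fit recovers both the model parameter and the calibration factor.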

A Bayesian approach to source separation
Source separation is one of signal processing's main emerging domains. Many techniques, such as maximum likelihood (ML), Infomax, cumulant matching, and estimating functions, have been used to address this difficult problem. Unfortunately, up to now, many of these methods could not fully account for noise on the data, for different numbers of sources and sensors, for lack of spatial independence, or for time correlation of the sources. Recently, the Bayesian approach has been used to push back these limitations of the conventional methods. This paper proposes a unifying approach to source separation based on Bayesian estimation. We first show that this approach makes it possible to explain the major known techniques in source separation as special cases. Then we propose new methods based on maximum a posteriori (MAP) estimation, either to estimate the sources directly, or the mixing matrices, or both.
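A sketch of the simplest MAP variant mentioned above: estimating the sources directly when the mixing matrix is known, with Gaussian noise and a Gaussian source prior, in which case the MAP estimate is a Tikhonov-regularized least-squares solution (the mixing matrix, variances, and sources are all hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
A = np.array([[1.0, 0.5], [0.3, 1.0], [0.8, 0.2]])   # 3 sensors, 2 sources
s_true = np.array([2.0, -1.0])
sigma2, tau2 = 0.001, 10.0                           # noise and prior variances
x = A @ s_true + rng.normal(0, np.sqrt(sigma2), 3)   # noisy mixtures

# MAP: argmin ||x - A s||^2 / sigma2 + ||s||^2 / tau2
# => (A^T A + (sigma2/tau2) I) s = A^T x
s_map = np.linalg.solve(A.T @ A + (sigma2 / tau2) * np.eye(2), A.T @ x)
```

With more sensors than sources and a well-conditioned mixing matrix, the prior term contributes only mild regularization; it becomes essential when the problem is underdetermined or noisy.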

Bayesian blocks
Identification of local structure in intensive data, such as time series, images, and higher-dimensional processes, is an important problem in astronomy. Since the data are typically generated by an inhomogeneous Poisson process, an appropriate model is one that partitions the data space into cells, each of which is described by a homogeneous (constant event rate) Poisson process. It is key that the sizes and locations of the cells are determined by the data, and are not predefined or even constrained to be evenly spaced. For one-dimensional time series, the method amounts to Bayesian change-point detection. Three approaches to solving the multiple change-point problem are sketched, based on: (1) divide and conquer with single change points, (2) maximum posterior for the number of change points, and (3) cell coalescence. The last method starts from the Voronoi tessellation of the data, and thus should generalize easily to spaces of higher dimension.
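A sketch of the single-change-point building block behind approach (1), for binned Poisson counts: each candidate split is scored by the Poisson log-likelihood of a two-block model with the block rates profiled out at their maxima, and the best split is kept (the counts below are hypothetical).

```python
import numpy as np

counts = np.array([2, 3, 1, 2, 3, 2, 9, 8, 10, 9, 11, 10])

def log_fit(n):
    # Poisson log-likelihood of a constant-rate block, maximized over
    # the rate: N*log(N/M) - N for N events in M bins.
    N, M = n.sum(), len(n)
    return N * np.log(N / M) - N if N > 0 else 0.0

scores = [log_fit(counts[:k]) + log_fit(counts[k:])
          for k in range(1, len(counts))]
best_split = 1 + int(np.argmax(scores))   # bin index where the rate changes
```

Recursing on each side of the best split (divide and conquer) extends this to multiple change points.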

Estimating fish concentrations using trawl data

Application of Bayesian inference to the study of hierarchical organization in self-organized complex adaptive systems
We consider the application of Bayesian inference to the study of self-organized structures in complex adaptive systems. In particular, we examine the distribution of elements, agents, or processes in systems dominated by hierarchical structure. We demonstrate that results obtained by Caianiello [1] on Hierarchical Modular Systems (HMS) can be found by applying Jaynes' Principle of Group Invariance [2] to a few key assumptions about our knowledge of hierarchical organization. Subsequent application of the Principle of Maximum Entropy allows inferences to be made about specific systems. The utility of the Bayesian method is considered by examining both successes and failures of the hierarchical model. We discuss how Caianiello's original statements suffer from the Mind Projection Fallacy [3], and we restate his assumptions, thus widening the applicability of the HMS model. The relationship between inference and statistical physics, described by Jaynes [4], is reiterated with the expectation that this realization will aid the field of complex systems research by moving away from the often inappropriate direct application of statistical mechanics toward a more encompassing inferential methodology.

A Bayesian Markov chain Monte Carlo solution of the bilinear problem
Many problems in imaging reduce to a desire to identify physically significant components within a set of images gathered during the variation of a parameter. We present a new method to identify physically meaningful regions in a series of images through the application of Bayesian statistics within a Markov chain Monte Carlo sampler. The method finds the physically meaningful bilinear solution appropriate to the problem.

Utilizing MRI to measure the transcytolemmal water exchange rate for the rat brain
Understanding the exchange of water between the intra- and extracellular compartments of the brain is important both for understanding basic physiology and for the interpretation of numerous MRI results. However, due to experimental difficulties, this basic property has proven difficult to measure in vivo. In our experiments, we will track overall changes in the relaxation rate constant of water in the rat brain following the administration of gadoteridol, a relaxation agent, to the extracellular compartment. From these changes, we will utilize probability theory and Markov chain Monte Carlo simulations to infer the compartment-specific water exchange and relaxation rate constants. Due to the correlated nature of these parameters and our inability to observe them independently, intelligent model selection is critical. Through analysis of simulated data sets, we refine our choice of model and method of data collection to optimize applicability to the in vivo situation.

Independence relationships implied by d-separation in the Bayesian model of a causal tree are preserved by the maximum entropy model
The notion of conditional independence is fundamental to causal networks. In this paper, it is proved that the independence relationships inherent in the Bayesian model are preserved by the maximum entropy model.