BAYESIAN INFERENCE AND MAXIMUM ENTROPY METHODS IN SCIENCE AND ENGINEERING: 23rd International Workshop

Some Larger Significances of the Work of Edwin T. Jaynes
Edwin T. Jaynes’ work touched the lives of everyone who thought seriously about the meaning to be attached to the word “probability”. He explored the uses and abuses of the concept. He described, through numerous examples in many different fields, how we should define our problems, our terms, and the evidence to be used in assigning values to probabilities. He showed us that “probabilities” do not exist “out there” but rather are assigned. In that way he gave new interpretations to Shannon’s measure and created the Maximum Entropy Estimate.

Maximum Entropy Approach to the Theory of Simple Fluids
We explore the use of the method of Maximum Entropy (ME) as a technique to generate approximations. In a first use of the ME method the “exact” canonical probability distribution of a fluid is approximated by that of a fluid of hard spheres; ME is used to select an optimal value of the hard‐sphere diameter. These results coincide with the results obtained using the Bogoliubov variational method. A second, more complete use of the ME method leads to a better description of the soft‐core nature of the interatomic potential in terms of a statistical mixture of distributions corresponding to hard spheres of different diameters. As an example, the radial distribution function for a Lennard‐Jones fluid (Argon) is compared with results from molecular dynamics simulations. There is a considerable improvement over the results obtained from the Bogoliubov principle.

An Easy Derivation of Logistic Regression from the Bayesian and Maximum Entropy Perspective
Logistic regression is a popular data analytic technique, but a compelling rationale for the equations that appear is missing in the conventional explanations. However, if one approaches logistic regression from a combined Bayesian and Maximum Entropy viewpoint, the explanation of its origin is relatively simple and direct. The perspective given here proceeds in two major steps. First, formally manipulate the probability symbols to rearrange them into the desired format. Here we want the probability of a binary criterion variable conditioned on knowledge of some number of predictor variables. The formal manipulation is done by using the sum and product rules of probability theory. Second, assign the numerical values to the joint probabilities that appear in Bayes’s theorem by inserting information via Jaynes’s Maximum Entropy Principle. The logistic regression equation then appears after a few more simple steps. A detailed numerical example is given to show the correspondence between the derivation given here and the conventional results. This application is just one example of scientific inference following the directions given by Jaynes in his Probability Theory: The Logic of Science.
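As a hedged illustration of the two-step route the abstract describes (not the paper's own derivation), the sketch below shows one standard way the logistic form falls out of Bayes' theorem: with equal-variance Gaussian class-conditionals, the posterior P(C=1|x) is exactly a sigmoid of a linear score. All parameter values are illustrative.

```python
import math

def gaussian(x, mu, sigma):
    """Gaussian density N(x; mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posterior_bayes(x, mu0=-1.0, mu1=1.0, sigma=1.0, prior1=0.5):
    """P(C=1 | x) obtained directly from the sum and product rules."""
    num = gaussian(x, mu1, sigma) * prior1
    den = num + gaussian(x, mu0, sigma) * (1 - prior1)
    return num / den

def posterior_logistic(x, mu0=-1.0, mu1=1.0, sigma=1.0, prior1=0.5):
    """The same quantity rewritten as a logistic function of a linear score."""
    w = (mu1 - mu0) / sigma ** 2
    b = (mu0 ** 2 - mu1 ** 2) / (2 * sigma ** 2) + math.log(prior1 / (1 - prior1))
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

# The two expressions agree to machine precision at any point:
for x in (-2.0, 0.0, 0.7, 3.0):
    assert abs(posterior_bayes(x) - posterior_logistic(x)) < 1e-12
```

The algebra behind the rewrite is just the log-odds: the quadratic terms in x cancel because the two class variances are equal, leaving a linear function of x inside the sigmoid.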

A Bayesian‐Maximum Entropy Approach to Subjective Voice Quality Testing
In order to assess the performance of Internet telephony, it is often necessary to translate network impairments (such as packet loss, delay and jitter) into human perceived quality (which is quantified in terms of subjective voice quality ratings). Subjective quality testing is expensive and typically involves a large number of questions and humans. It is therefore important to design simple and reliable subjective testing experiments. This paper presents a method to assess the subjective quality of a number of speech samples that have incurred various degrees of the same network impairment. Questions are asked according to an adaptive algorithm until all voice ratings are elicited within a desired accuracy. Our algorithm (i) uses information theory to minimize the expected number of questions needed and (ii) uses binary questions, which are simpler than the types of questions used by standard subjective testing procedures.
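A minimal sketch of the adaptive idea, not the paper's actual algorithm: choose each binary question ("is the rating above r?") to split the current posterior over ratings as evenly as possible, which maximizes the expected information gained per question. The 5-point scale, uniform prior, and truthful-subject assumption are all illustrative.

```python
def pick_question(posterior):
    """Return the threshold whose yes/no answer split is closest to 50/50."""
    best_t, best_gap = None, 2.0
    for t in range(1, len(posterior)):
        p_yes = sum(posterior[t:])          # P(rating >= t)
        gap = abs(p_yes - 0.5)
        if gap < best_gap:
            best_t, best_gap = t, gap
    return best_t

def update(posterior, t, answered_yes):
    """Condition the posterior on the answer (assumes a truthful subject)."""
    post = [(p if (i >= t) == answered_yes else 0.0) for i, p in enumerate(posterior)]
    z = sum(post)
    return [p / z for p in post]

# Elicit a rating on a 5-point scale (indices 0..4); true rating index = 3.
posterior = [0.2] * 5
true_rating = 3
while posterior.count(0.0) < 4:             # until one rating remains
    t = pick_question(posterior)
    posterior = update(posterior, t, true_rating >= t)
print(posterior.index(1.0))                 # → 3
```

With a uniform prior this reduces to binary search, finding the rating in about log2(5) ≈ 2.3 questions on average; a non-uniform prior would shift the thresholds toward the more probable ratings.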

Using Thermodynamic Integration to Calculate the Posterior Probability in Bayesian Model Selection Problems
This paper gives an algorithm for calculating posterior probabilities using thermodynamic integration. The thermodynamic integration calculations are accomplished by annealing an ensemble of Markov chains with an adaptive schedule. The algorithm includes a method for determining “good” starting positions for the chains at each new value of the annealing parameter.
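The identity behind thermodynamic integration can be sketched in a few lines. In the toy conjugate-Gaussian model below (chosen so the tempered posterior can be sampled exactly, sidestepping the annealed Markov chains of the paper), the log evidence is recovered as the integral of E_β[log L] over the annealing parameter β from 0 to 1:

```python
import math, random

random.seed(0)

x0 = 1.5                                    # one observed datum
def log_like(theta):                        # likelihood L(theta) = N(x0; theta, 1)
    return -0.5 * math.log(2 * math.pi) - 0.5 * (x0 - theta) ** 2

def sample_tempered(beta, n):
    """Tempered posterior p_beta ∝ N(theta; 0, 1) L(theta)^beta,
    which is N(beta*x0/(1+beta), 1/(1+beta)) for this conjugate model."""
    m, sd = beta * x0 / (1 + beta), math.sqrt(1 / (1 + beta))
    return [random.gauss(m, sd) for _ in range(n)]

betas = [i / 20 for i in range(21)]         # annealing schedule 0 → 1
means = [sum(log_like(t) for t in sample_tempered(b, 4000)) / 4000 for b in betas]
# log Z = ∫_0^1 E_beta[log L] d(beta), estimated by the trapezoidal rule
log_z = sum((means[i] + means[i + 1]) / 2 * (betas[i + 1] - betas[i])
            for i in range(len(betas) - 1))

exact = -0.5 * math.log(4 * math.pi) - x0 ** 2 / 4   # log N(x0; 0, 2)
print(log_z, exact)
```

In realistic models the expectations E_β[log L] must come from MCMC at each β, which is where the paper's adaptive schedule and chain-restart heuristics enter.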

Divvy Economies Based On (An Abstract) Temperature
The Leontief Input‐Output economic system can provide a model for a one‐parameter family of economic systems based on an abstract temperature T. In particular, given a normalized input‐output matrix R and taking R = R(1), a family of economic systems R(1/T) = R(α) is developed that represents heating (T>1) and cooling (T<1) of the economy relative to T=1. The economy for a given value of T represents the solution of a constrained maximum entropy problem.
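One possible reading of the construction (the elementwise power and the column-normalization convention here are assumptions, not taken from the paper) is to raise the entries of R to the power α = 1/T and renormalize, so heating drives the economy toward uniformity and cooling concentrates it on its dominant flows:

```python
def heat(R, T):
    """Form R(1/T): elementwise power alpha = 1/T, then renormalize columns."""
    alpha = 1.0 / T
    Ra = [[r ** alpha for r in row] for row in R]
    col_sums = [sum(Ra[i][j] for i in range(len(Ra))) for j in range(len(Ra[0]))]
    return [[Ra[i][j] / col_sums[j] for j in range(len(row))]
            for i, row in enumerate(Ra)]

# Toy column-normalized input-output matrix:
R = [[0.7, 0.2],
     [0.3, 0.8]]

hot = heat(R, 10.0)    # T >> 1: entries approach 1/2 (maximum-entropy limit)
cold = heat(R, 0.1)    # T << 1: each column concentrates on its largest entry
assert abs(hot[0][0] - 0.5) < 0.05
assert cold[1][1] > 0.99
```

In this reading T plays the same role as temperature in a Boltzmann distribution over input-output coefficients, consistent with the abstract's constrained maximum entropy characterization.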

Relative Entropy and Inductive Inference
We discuss how the method of maximum entropy, MaxEnt, can be extended beyond its original scope, as a rule to assign a probability distribution, to a full‐fledged method for inductive inference. The main concept is the (relative) entropy S[p|q], which is designed as a tool to update from a prior probability distribution q to a posterior probability distribution p when new information in the form of a constraint becomes available. The extended method goes beyond the mere selection of a single posterior p and also addresses the question of how much less probable other distributions might be. Our approach clarifies how the entropy S[p|q] is used while avoiding the question of its meaning. Ultimately, entropy is a tool for induction which needs no interpretation. Finally, since it is a tool for generalization from special examples, we ask whether the functional form of the entropy depends on the choice of the examples, and we find that it does. The conclusion is that there is no single general theory of inductive inference and that alternative expressions for the entropy are possible.
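A single entropic update of the kind described can be sketched concretely: maximizing the relative entropy of p with respect to q subject to one moment constraint gives p_i ∝ q_i exp(λ f_i), with the multiplier λ fixed numerically by the constraint. The prior and constraint values below are toy choices.

```python
import math

def update(q, f, F):
    """MaxEnt update of prior q under the constraint sum_i p_i f_i = F.
    Solution p_i ∝ q_i exp(lam * f_i); lam found by bisection (the moment
    is monotonically increasing in lam, its derivative being a variance)."""
    def moment(lam):
        w = [qi * math.exp(lam * fi) for qi, fi in zip(q, f)]
        z = sum(w)
        return sum(wi * fi for wi, fi in zip(w, f)) / z, [wi / z for wi in w]
    lo, hi = -50.0, 50.0
    for _ in range(100):
        mid = (lo + hi) / 2
        m, p = moment(mid)
        if m < F:
            lo = mid
        else:
            hi = mid
    return p

q = [0.25, 0.25, 0.25, 0.25]       # uniform prior over outcomes with f = 1..4
f = [1, 2, 3, 4]
p = update(q, f, 3.0)              # demand a mean of 3 instead of the prior's 2.5
assert abs(sum(pi * fi for pi, fi in zip(p, f)) - 3.0) < 1e-6
assert abs(sum(p) - 1.0) < 1e-12
```

With a uniform prior this reproduces Jaynes' original MaxEnt assignment; with a non-uniform q it is the updating rule the abstract generalizes into a method of inference.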

Maximum Entropy method with non‐linear moment constraints: challenges
Traditionally, the Method of (Shannon‐Kullback’s) Relative Entropy Maximization (REM) is considered with linear moment constraints. Here, the method is studied with frequency moment constraints, which are non‐linear in probabilities. The constraints challenge some justifications of REM: a) the probabilistic justification of REM via the Conditioned Weak Law of Large Numbers cannot be invoked, since the feasible set of distributions defined by frequency moment constraints admits several entropy‐maximizing distributions (I‐projections); b) axiomatic justifications of REM are developed for linear moment constraints/convex sets. However, REM is not left completely unjustified in this setting, since the Entropy Concentration Theorem and the Maximum Probability Theorem can be applied.
The maximum Rényi/Tsallis entropy method (maxTent) is also considered here, due to the non‐linearity of the X‐frequency moment constraints used in Non‐extensive Thermodynamics. It is shown that under X‐frequency moment constraints the maxTent distribution can be unique and different from the I‐projection. This implies that maxTent does not choose the most probable distribution and that the maxTent distribution is asymptotically conditionally improbable. What, then, are adherents of maxTent accomplishing when they maximize Rényi’s or Tsallis’ entropy?

Approximate Inference based on Convex Set Sampling
We address the approximate inference problem by considering the set of all probability measures which satisfy the marginal and conditional constraints specified in a graphical model. The structure of this convex set is elucidated through the explicit enumeration of its extreme points for a number of important cases. This result generalizes the Birkhoff‐von Neumann theorem. For approximate inference calculations of marginals of interest, we present two non‐iterative algorithms based on sampling from this convex set. Results from numerical experiments support the accuracy of these approximations.
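The role of the extreme points can be illustrated in the classical case covered by the Birkhoff‐von Neumann theorem itself: every doubly stochastic matrix is a convex combination of permutation matrices, so sampling that convex set reduces to sampling convex weights over permutations. The 3×3 size and the weight distribution below are illustrative choices, not the paper's algorithms.

```python
import itertools, random

random.seed(1)

n = 3
perms = list(itertools.permutations(range(n)))   # the extreme points

def random_doubly_stochastic():
    """Draw uniform Dirichlet weights over permutation matrices and mix them."""
    raw = [random.expovariate(1.0) for _ in perms]
    z = sum(raw)
    weights = [r / z for r in raw]
    M = [[0.0] * n for _ in range(n)]
    for w, perm in zip(weights, perms):
        for i, j in enumerate(perm):             # permutation matrix entry (i, j)
            M[i][j] += w
    return M

M = random_doubly_stochastic()
for i in range(n):                               # every row and column sums to 1
    assert abs(sum(M[i]) - 1.0) < 1e-9
    assert abs(sum(M[r][i] for r in range(n)) - 1.0) < 1e-9
```

For the graphical-model constraint sets of the paper the extreme points are different, but the same recipe applies: enumerate (or sample) extreme points, mix with random convex weights, and average the marginals of interest over draws.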

Entropies Old and New (and Both New and Old) and Their Characterizations
A few entropies are enumerated that seem to be useful and have interesting properties. It is described how such properties characterize these entropies and some applications are mentioned. Some historical perspective is also given on how certain entropies were discovered and rediscovered.

An Information Theory for Preferences
Recent literature in the Maximum Entropy workshop introduced an analogy between cumulative probability distributions and normalized utility functions. Based on this analogy, a utility density function is defined as the derivative of a normalized utility function. A utility density function has the same mathematical properties as a probability density function, and forms the basis of a mathematical correspondence between utility and probability. This paper presents several results that stem from this correspondence, and provides new interpretations to measures of information theory when applied to utility theory.

On Self‐Consistency of Cost Functions for Blind Signal Processing Based on Neural Bayesian Estimators
In some blind signal processing tasks, such as blind source deconvolution and blind source separation, the optimal signal processing structure is designed adaptively through cost function optimization. A class of cost functions known in the literature is based on pseudo‐error defined on the basis of Bayesian estimation of the source signals. The exact Bayesian estimators can rarely be computed, so their neural approximations are often invoked. The present paper investigates the self‐consistency of cost functions based on such neural Bayesian estimators.

A new principle for macromolecular structure determination
Protein NMR spectroscopy is a modern experimental technique for elucidating the three‐dimensional structure of biological macromolecules in solution. From the data‐analytical point of view, structure determination has always been considered an optimisation problem: much effort has been spent on the development of minimisation strategies; the underlying rationale, however, has not been revised. Conceptual difficulties with this approach arise since experiments only provide incomplete structural information: structure determination is an inference problem and demands a probabilistic treatment. In order to generate realistic conformations, strong prior assumptions about physical interactions are indispensable. These interactions impose a complex structure on the posterior distribution, making simulation of such models particularly difficult. We demonstrate that posterior sampling is feasible using a combination of multiple Markov Chain Monte Carlo techniques. We apply the methodology to a sparse data set obtained from a perdeuterated sample of the Fyn SH3 domain.

Applying Differentially Variable Component Analysis (dVCA) to Event‐related Potentials
Event‐related potentials (ERPs) generated in response to multiple presentations of the same sensory stimulus vary from trial to trial. Accumulating evidence suggests that this variability relates to a similar trial‐to‐trial variation in the perception of the stimulus. In order to understand this variability, we previously developed Differentially Variable Component Analysis (dVCA) as a method for defining dynamical components that contribute to the ERP. The underlying model asserted that: (i) multiple components comprise the ERP; (ii) these components vary in amplitude and latency from trial to trial; and (iii) these components may co‐vary. A Bayesian framework was used to derive maximum a posteriori solutions to estimate these components and their latency and amplitude variability. Our original goal in developing dVCA was to produce a method for automated estimation of components in ERPs. However, we discovered that, because of the complexity of the ERP, it is better to apply the algorithm in stages and to use the results to define interesting subsets of the data, which are further analyzed independently. This paper describes this method and illustrates its application to actual neural signals recorded in response to a visual stimulus. Interestingly, dVCA of these data suggests two distinct response modes (or states) with differing components and variability. Furthermore, analyses of residual signals obtained by subtracting the estimated components from the actual data illustrate gamma‐frequency (circa 40 Hz) oscillations, which may underlie communication between various brain regions. These findings demonstrate the power of dVCA and underscore the necessity of applying this algorithm in a guided rather than a ballistic fashion. Furthermore, they highlight the need to examine the residual signals for those features of the signals that were not anticipated and not modeled in the derivation of the algorithm.

Blind Source Separation, Independent Component Analysis, and Pattern Classification — Connections and Synergies
By employing Bayesian methods and Mixture‐Of‐Gaussian (MOG) models, we derive a set of algorithms, based on the Expectation Maximization (EM) approach, that can be used for Blind Source Separation (BSS), Independent Component Analysis (ICA) and Pattern Classification (PC). All of these applications share a common generative model, which describes how each observation is produced. By constraining the model parameters in different ways, we can support each of these applications. ICA requires the most restricted model, BSS employs a less restricted model, and PC uses the least restricted model. Thus, in a sense, the PC model contains the BSS model, which in turn contains the ICA model. The relationships between these methods provide important synergies which can be exploited both to speed training and to extend the applicability of the methods.
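The shared machinery can be sketched with its simplest instance: EM for a two-component, one-dimensional MOG on synthetic data. This is a generic illustration of the common generative model, not the constrained variants the paper derives for BSS, ICA, and PC.

```python
import math, random

random.seed(2)

# Synthetic data: two well-separated Gaussian clusters.
data = [random.gauss(-2, 0.5) for _ in range(200)] + \
       [random.gauss(3, 0.5) for _ in range(200)]

def normal(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

mu, var, pi = [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5]   # crude initialization
for _ in range(50):
    # E-step: responsibility of each component for each point
    resp = []
    for x in data:
        w = [pi[k] * normal(x, mu[k], var[k]) for k in (0, 1)]
        s = sum(w)
        resp.append([wk / s for wk in w])
    # M-step: responsibility-weighted parameter re-estimates
    for k in (0, 1):
        nk = sum(r[k] for r in resp)
        mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
        var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
        pi[k] = nk / len(data)

assert abs(min(mu) - (-2)) < 0.2 and abs(max(mu) - 3) < 0.2
```

In the paper's framing, tying or freeing parameters of this same generative model (mixing matrix, component densities, class labels) is what moves it between the ICA, BSS, and PC regimes.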

The Associativity Equation Re‐Revisited
Associativity is regarded as a functional equation. Two ways of obtaining its general continuous strictly increasing solutions on real intervals are described. No differentiability, existence of a neutral element, or commutativity is pre‐assumed. Comments are made, among others, on applications, and a generalization is also presented.
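For orientation, the equation in question and the classical representation of its continuous strictly increasing solutions (which the paper obtains without the usual differentiability or neutral-element assumptions) can be written as:

```latex
% Associativity as a functional equation on a real interval I:
F\bigl(F(x,y),\,z\bigr) = F\bigl(x,\,F(y,z)\bigr), \qquad x, y, z \in I.
% Its continuous, strictly increasing solutions admit the representation
F(x,y) = f^{-1}\bigl(f(x) + f(y)\bigr)
% for some continuous, strictly monotonic generator f on I.
```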

Deriving Laws from Ordering Relations
The effect of Richard T. Cox’s contribution to probability theory was to generalize Boolean implication among logical statements to degrees of implication, which are manipulated using rules derived from consistency with Boolean algebra. These rules are known as the sum rule, the product rule and Bayes’ Theorem, and the measure resulting from this generalization is probability. In this paper, I will describe how Cox’s technique can be further generalized to include other algebras and hence other problems in science and mathematics. The result is a methodology that can be used to generalize an algebra to a calculus by relying on consistency with order theory to derive the laws of the calculus. My goals are to clear up the mysteries as to why the same basic structure found in probability theory appears in other contexts, to better understand the foundations of probability theory, and to extend these ideas to other areas by developing new mathematics and new physics. The relevance of this methodology will be demonstrated using examples from probability theory, number theory, geometry, information theory, and quantum mechanics.

Bayesian Cherry Picking Revisited
Tins are marketed as containing nine cherries. To fill the tins, cherries are fed into a drum containing twelve holes through which air is sucked; either zero, one or two cherries stick in each hole. Dielectric measurements are then made on each hole. Three outcomes are distinguished: empty hole (which is reliable); one cherry (which indicates one cherry with high probability, or two cherries with a complementary low probability known from calibration); or an uncertain number (which also indicates one cherry or two, with known probabilities that are quite similar). A choice can be made from which holes simultaneously to discharge contents into the tin. The sum and product rules of probability are applied in a Bayesian manner to find the distribution for the number of cherries in the tin. Based on this distribution, ways are discussed to optimise the number to nine cherries.
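The convolution at the heart of the calculation can be sketched as follows. The category probabilities here are invented for illustration (the paper's come from calibration), empty holes are excluded, and the hole-selection optimisation itself is not attempted.

```python
# P(hole holds 2 cherries | measurement category) -- assumed toy values:
P_TWO = {"one": 0.05, "uncertain": 0.45}

def tin_distribution(categories):
    """Distribution over the total cherry count for the selected non-empty
    holes, built up by convolving the per-hole {1, 2} distributions."""
    dist = {0: 1.0}
    for cat in categories:
        p2 = P_TWO[cat]
        new = {}
        for total, p in dist.items():
            new[total + 1] = new.get(total + 1, 0.0) + p * (1 - p2)
            new[total + 2] = new.get(total + 2, 0.0) + p * p2
        dist = new
    return dist

# Discharge seven "one-cherry" holes plus one "uncertain" hole into the tin:
dist = tin_distribution(["one"] * 7 + ["uncertain"])
assert abs(sum(dist.values()) - 1.0) < 1e-12
best = max(dist, key=dist.get)
print(best, round(dist[best], 3))    # → 9 0.456
```

Scanning such distributions over candidate subsets of holes is one way to choose the discharge that makes a nine-cherry tin most probable, in the spirit of the optimisation the abstract discusses.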

From Minimum Entropy Production Principle To Minimum Information Loss With Elliptic Type Quasilinear PDEs
The Laplace equation does not contain any entropy production [27]. The entropy production can be illustrated with the Dirichlet Integral Principle and quasilinear PDEs of second order [28,27], which also show the physical meaning. The content of the quasilinear PDE leads to the probability density function of the process and the minimum principle of the entropy production [15,16,19,25]. Maxwell’s demon shows the connection between thermodynamics and the theory of information [18,26,21,20,22,23,24]. The negentropy principle of Brillouin [22] provides an important bridge between the thermodynamical problem of dissipation and the gain in information. The entropy compensation at an open stationary state shows the relation between the negentropy principle [27] and the minimum entropy principle, and the connection to minimum information loss.

An Analysis Methodology for the Gamma‐ray Large Area Space Telescope
The Large Area Telescope (LAT) instrument on the Gamma Ray Large Area Space Telescope (GLAST) has been designed to detect high‐energy gamma rays and determine their direction of incidence and energy. We propose a reconstruction algorithm based on recent advances in statistical methodology. This method, an alternative to the standard event analysis inherited from high‐energy collider physics experiments, incorporates more accurately the physical processes occurring in the detector and makes full use of the statistical information available. It could thus provide a better estimate of the direction and energy of the primary photon.