BAYESIAN INFERENCE AND MAXIMUM ENTROPY METHODS IN SCIENCE AND ENGINEERING: 20th International Workshop

Maximum entropy from the laws of probability
A new derivation is presented of maximum entropy, an extremizing principle for assigning probability distributions from expectation values. The additive form of the maximand is first proved by requiring that, when some probabilities are given, the procedure for finding the remaining probabilities should not depend on the values of the given probabilities. This condition induces functional equations whose solution generates the additive form. To find the function Φ we assign two distributions in separate spaces from separate expectation values; then assign a joint distribution by taking these same values to be expectations of its marginals; then require these marginals to be the same as the separately assigned distributions. The resulting functional equations have only one viable solution: the entropic form −Σᵢ pᵢ ln pᵢ. The exploitation of marginal distributions is due to Shore and Johnson [1], but the present derivation uses weaker axioms that require only consistency with the sum and product rules. In contrast to the information-theoretic derivation of Shannon [2], no interpretation of the maximand functional is involved.
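For orientation, the extremization this abstract refers to can be sketched in its textbook form (a generic MaxEnt setup with Lagrange multipliers, not the paper's own derivation):

```latex
% maximize the entropic form subject to normalization and expectation constraints
\max_{p}\; S[p] = -\sum_i p_i \ln p_i
\quad\text{s.t.}\quad \sum_i p_i = 1,\qquad \sum_i p_i f_k(x_i) = F_k ,
% stationarity of the Lagrangian yields the exponential family
p_i = \frac{1}{Z(\lambda)} \exp\!\Big(-\sum_k \lambda_k f_k(x_i)\Big),
\qquad Z(\lambda) = \sum_i \exp\!\Big(-\sum_k \lambda_k f_k(x_i)\Big).
```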

Role and meaning of subjective probability: Some comments on common misconceptions
Criticisms of so-called 'subjective probability' come on the one hand from those who maintain that probability in physics has only a frequentistic interpretation and, on the other, from those who tend to 'objectivise' Bayesian theory, arguing, e.g., that subjective probabilities are indeed based 'only on private introspection'. Some of the common misconceptions about subjective probability are commented upon in support of the thesis that coherence is the most crucial, universal and 'objective' way to assess our confidence in events of any kind.

Algorithmic complexity and randomness
Algorithmic information content is a fundamental idea for dealing with systems that contain, or in the course of time produce, large amounts of information. This notion provides a rigorous meaning to the intuitive concept of randomness, and thereby plays an important role in formal mathematics as well as in theoretical computer science. More importantly for applications, however, this idea has succeeded in providing a quantitatively precise way of characterizing the output of a deterministic dynamical system as random or, equivalently, algorithmically complex. This presentation provides an introduction to the basic ideas of algorithmic complexity and randomness, as well as to the applications of these ideas to the characterization of dynamical systems as examples of complex behavior.

Fisher’s information and the arrow of time
The fundamental equations of physics are invariant under time reversal, yet everyday phenomena are usually irreversible. This is, in a nutshell, the so-called "arrow of time" dilemma. We address it here in terms of a rather old information measure: Fisher's (1925).

MiniMax entropy and maximum likelihood: Complementarity of tasks, identity of solutions
The concept of an exponential family is generalized by simple and general exponential forms. Simple and general potentials are introduced. The Maximum Entropy (ME) and Maximum Likelihood (ML) tasks are defined. The ML task on the simple exponential form and the ME task on the simple potentials are proved to be complementary in setup and identical in solution. The ML task on the general exponential form and the ME task on the general potentials are weakly complementary, leading to the same necessary conditions. A hypothesis about the complementarity of the ML and MiniMax Entropy tasks and the identity of their solutions, supported by an analytical special case as well as several numerical investigations, is put forward. MiniMax Ent can be viewed as a generalization of MaxEnt for parametric linear inverse problems, and its complementarity with ML as yet another argument in favor of Shannon's entropy criterion.

On the foundations of Bayesianism
We discuss precise assumptions entailing Bayesianism in the line of investigation started by Cox, and relate them to a recent critique by Halpern. We show that every finite model which cannot be rescaled to probability violates a natural and simple refinability principle. A new condition, separability, is found to be necessary and sufficient for the rescalability of infinite models. We finally characterize the acceptable ways to handle uncertainty in infinite models based on Cox's assumptions. Certain closure properties must be assumed before all the axioms of ordered fields are satisfied. Once this is done, a proper plausibility model can be embedded in an ordered field containing the reals: either standard probability (the field of reals) for a real-valued plausibility model, or extended probability (the field of reals and infinitesimals) for an ordered plausibility model. The end result is that if our assumptions are accepted, all reasonable uncertainty management schemes must be based on sets of extended probability distributions and Bayes conditioning.

Change, time and information geometry
Dynamics, the study of change, is normally the subject of mechanics. Whether the chosen mechanics is "fundamental" and deterministic or "phenomenological" and stochastic, all changes are described relative to an external time. Here we show that once we define what we are talking about (namely, the system, its states, and a criterion to distinguish among them), there is a single, unique, and natural dynamical law for irreversible processes that is compatible with the principle of maximum entropy. In this alternative dynamics, changes are described relative to an internal, "intrinsic" time, which is a derived, statistical concept defined and measured by change itself. Time is quantified change.

What is the question that MaxEnt answers? A probabilistic interpretation
The Boltzmann-Wallis-Jaynes multiplicity argument is taken up and elaborated. MaxEnt is proved, and demonstrated, to be just an asymptotic case of seeking the vector of absolute frequencies in a feasible set that has maximal probability of being generated by a uniform prior generator/pmf.
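The multiplicity argument mentioned above is, in its textbook form (a sketch, not the paper's elaboration): the number of ways N trials can produce absolute frequencies n₁,…,n_m is a multinomial coefficient, and Stirling's approximation turns its logarithm into the entropy of the relative frequencies:

```latex
W(n_1,\dots,n_m) = \frac{N!}{n_1!\cdots n_m!},
\qquad
\frac{1}{N}\ln W \;\xrightarrow[N\to\infty]{}\; -\sum_{i=1}^{m} \frac{n_i}{N}\ln\frac{n_i}{N},
```

so maximizing the probability of a frequency vector under a uniform generator is asymptotically equivalent to maximizing entropy.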

Maximum entropy, fluctuations and priors
The method of maximum entropy (ME) is extended to address the following problem: once one accepts that the ME distribution is to be preferred over all others, to what extent are distributions with lower entropy ruled out? Two applications are given. The first is to the theory of thermodynamic fluctuations. The formulation is exact, covariant under changes of coordinates, and allows fluctuations of both the extensive and the conjugate intensive variables. The second application is to the construction of an objective prior for Bayesian inference. The prior obtained by following the ME method to its inevitable conclusion turns out to be a special case of what are currently known as entropic priors.
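One standard way to quantify how strongly lower-entropy distributions are suppressed (an Einstein-style fluctuation formula, stated here as a sketch rather than the paper's exact result) is

```latex
P[p] \;\propto\; \exp\!\big(S[p] - S[p^{\ast}]\big),
```

where p* is the maximum-entropy distribution: lower-entropy distributions are not strictly ruled out but are exponentially unlikely in their entropy deficit.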

Cybernetic systems based on inductive logic
Recent work in the area of inductive logic suggests that cybernetics might be quantified and reduced to engineering practice. If so, then there are considerable implications for engineering, science, and other fields. This paper attempts to capture the essential ideas of cybernetics cast in the light of inductive logic. The described inductive logic extends conventional logic by adding a conjugate logical domain of questions to the logical domain of assertions intrinsic to the Boolean algebra with which most are familiar. This was first posited and developed by Richard Cox. Interestingly, these two logical domains, one of questions and the other of assertions, exist only relative to one another, each possessing a natural measure of entropy and probability, respectively. Examples are given that highlight the utility of cybernetic approaches to neuroscience, algorithm design, system engineering, and the design and understanding of defensive and offensive systems. For example, the application of cybernetic approaches to defense systems suggests that these systems possess a wavefunction which, as in quantum mechanics, collapses when we "look" through the eyes of the system sensors, such as radars and optical sensors.

Relating Bayesian mixture-model classifiers to other popular pattern classifiers
This paper relates Bayesian mixture-model classifiers to other popular pattern classification algorithms, including Parzen kernel, radial-basis-function neural network, and support vector machine algorithms. It compares both the training and operation modes of the different algorithms. It shows that the models underlying the other methods can be viewed as subsets of mixture models. In particular, it shows that support vector machine methods can be used to establish starting points for Bayesian mixture-model training methods.
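The subset relation claimed here can be illustrated concretely: a Parzen kernel classifier is the limiting mixture model with one equally weighted component per training point. A minimal sketch, using hypothetical toy data and the standard library only:

```python
import math

def gaussian_kernel(x, center, h):
    """Isotropic 1-D Gaussian kernel of bandwidth h."""
    return math.exp(-0.5 * ((x - center) / h) ** 2) / (h * math.sqrt(2 * math.pi))

def parzen_density(x, samples, h):
    """Parzen estimate: a mixture with one component per training point
    and equal weights 1/N -- the limiting case of a fitted mixture model."""
    return sum(gaussian_kernel(x, s, h) for s in samples) / len(samples)

def classify(x, class_samples, h=0.5):
    """Assign x to the class with the highest prior-weighted density."""
    total = sum(len(s) for s in class_samples.values())
    scores = {c: (len(s) / total) * parzen_density(x, s, h)
              for c, s in class_samples.items()}
    return max(scores, key=scores.get)

# hypothetical two-class toy data
data = {"a": [0.0, 0.2, -0.1], "b": [3.0, 3.1, 2.9]}
print(classify(0.1, data))  # near the class "a" samples
print(classify(3.0, data))  # near the class "b" samples
```

A fitted mixture model would instead learn a small number of components per class; the Parzen limit above trades compactness for zero training cost.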

An entropy decomposition related to law’s mixture
This note deals with the entropy of a joint distribution (here a continuous law) and the well-known decomposition (8), involving the relative or mutual information (a Kullback K-information), f and g the respective marginal densities, and the marginal entropies of the X and Y laws. The density factorizations also yield new formulas, (7), obtained using expectations of conditional entropies. Unlike in (8), the last term in (7) is positive and therefore suggests some analogies with the variance decomposition. Moreover, one also obtains (6), which shows that entropy is an over-additive function.
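The well-known decomposition the note starts from is, in standard notation (reconstructed here from textbook identities; the paper's own equation numbers (6)-(8) are not reproduced):

```latex
H(X,Y) = H(X) + H(Y) - I(X;Y),
\qquad
I(X;Y) = \int p(x,y)\,\ln\frac{p(x,y)}{f(x)\,g(y)}\,dx\,dy \;\ge\; 0,
```

with f and g the marginal densities; equivalently, via conditional entropies, H(X,Y) = H(Y) + E_Y[H(X | Y)].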

Relationship between entropies, variance and Fisher information
In this paper, relations between entropy, maximum entropy, and other criteria such as variance and Fisher information are discussed.

Characterization of Pearsonian and bilateral power series distribution via maximum entropies
The maximization of entropy in a class of distributions subject to certain constraints has an important role in statistical theory. Kagan, Linnik & Rao (1973), Kapur (1989) and A. W. Kemp (1997) used maximum entropy distributions and characterized a large number of discrete and continuous distributions under certain constraints. Athreya (1994) showed that every probability distribution is the unique maximizer of the relative entropy in an appropriate class of pdf's under certain constraints, via a variant of the Lagrange multiplier method. Using Athreya's result, we obtain the conditions under which certain distributions, such as the Pearsonian and inverse Gaussian distributions, are maximum entropy probability distributions (MEPD). Characterizations via MEPD subject to certain constraints for bilateral polynomial power series, generalized Polya-Eggenberger, generalized Markov-Polya, and related distributions are also obtained.
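Two classical instances of such characterizations, included for orientation (standard results, not the paper's new ones):

```latex
% fixed mean \mu and variance \sigma^2 on \mathbb{R}: the Gaussian is the MEPD
\max_{f}\, H(f) \ \text{s.t.}\ E[X]=\mu,\ \mathrm{Var}(X)=\sigma^2
\;\Rightarrow\; f = \mathcal{N}(\mu,\sigma^2);
% fixed mean on (0,\infty): the exponential distribution is the MEPD
\max_{f}\, H(f) \ \text{s.t.}\ E[X]=\mu,\ X>0
\;\Rightarrow\; f(x) = \tfrac{1}{\mu}\, e^{-x/\mu}.
```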

Nonequilibrium ensembles: A Lagrangian formalism for classical systems
A nonequilibrium ensemble is formulated and the partition function of a system in a nonequilibrium transient state is derived by a method based upon the maximization of Boltzmann's entropy constrained by the nonlinear Boltzmann kinetic equation. We consider the assumption that the characteristic function in a nonequilibrium ensemble has a form similar to the equilibrium one. The connection between the N-particle and one-particle partition functions is obtained, and the time-dependent thermodynamic functions as well as transport kinetic coefficients can then be calculated. On the other hand, by combining the phenomenological theory of irreversible processes with the results of the kinetic theory of irreversible processes obtained from the Boltzmann kinetic equation for a dilute gas, a nonequilibrium ensemble method has been formulated for dilute gases as a parallel extension of the Gibbs ensemble method in equilibrium statistical mechanics. This method is distinct from those of McLennan, Zubarev and Sobouti. The main distinguishing feature is the use of an irreversible kinetic equation (i.e., the Boltzmann kinetic equation) instead of the time-reversal-invariant Liouville equation.

The quantization of the attention function under a Bayes information theoretic model
Bayes experimental design using entropy, or equivalently negative information, as a criterion is fairly well developed. The present work applies this model, but at a primitive level in statistical sampling. It is assumed that the observer/experimenter is allowed to place a window over the support of a sampling distribution and only "pay for" observations that fall in this window. The window can be modeled with an "attention function," simply the indicator function of the window. The understanding is that the cost of the experiment is only the number of paid-for observations, n. For fixed n and under the information model, it turns out that for standard problems the optimal structure for the window, in the limit amongst all types of window including disjoint regions, is discrete. That is to say, it is optimal to observe the world (in this sense) through discrete slits. It also turns out that in this case Bayesians with different priors will receive different samples, because typically the optimal attention windows will be disjoint. We refer to this property as the quantization of the attention function.

A minimumentropy estimator for regression problems with unknown distribution of observation errors
We consider a nonlinear regression model, with independent observation errors identically distributed with an unknown probability density function f. Instead of minimizing the empirical version of the entropy of f based on the residuals, which corresponds to maximum likelihood estimation and requires the knowledge of f, we minimize the entropy of a (symmetrized) kernel estimate of f constructed from the residuals. Two examples are presented to illustrate the finite-sample behavior of this estimator (accuracy, robustness). Some (preliminary) consistency results are given.
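A minimal numerical sketch of the idea (a toy linear model and a grid search instead of a proper optimizer; all names and data here are illustrative, not from the paper):

```python
import math

def kernel_density(u, residuals, h):
    """Symmetrized Gaussian kernel estimate built from the points +/- e_i."""
    pts = [e for r in residuals for e in (r, -r)]  # symmetrization step
    k = lambda t: math.exp(-0.5 * t * t) / math.sqrt(2 * math.pi)
    return sum(k((u - p) / h) for p in pts) / (len(pts) * h)

def entropy_of_residuals(theta, x, y, model, h=0.5):
    """Empirical entropy -(1/n) sum_i log f_hat(e_i) of the kernel estimate,
    evaluated at the residuals e_i = y_i - model(x_i, theta)."""
    res = [yi - model(xi, theta) for xi, yi in zip(x, y)]
    return -sum(math.log(kernel_density(r, res, h)) for r in res) / len(res)

# toy linear model y = theta * x, estimated by a coarse grid search
model = lambda xi, th: th * xi
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]  # generated roughly with theta = 2
best = min((entropy_of_residuals(th, x, y, model), th)
           for th in [1.0, 1.5, 2.0, 2.5, 3.0])
print(best[1])  # the entropy-minimizing theta on the grid
```

The residual entropy is smallest when the residuals concentrate near zero, so the minimizer tracks the true parameter without requiring the error density f.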

System parameter estimation in tomographic inverse problems
Inverse problems are typically solved under the assumption of known geometric system parameters describing the forward problem. Should such information be unavailable or inexact, the estimation of these parameters from only the observed sensor data may be necessary prior to reconstruction of the desired signal. We demonstrate the feasibility of such estimation via maximum-likelihood methods for the system parameters, with expectation-maximization as an optimization mechanism, within a Bayesian estimation framework for the final reconstruction problem.
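The expectation-maximization mechanism invoked here can be illustrated in its simplest generic form (a two-component Gaussian mixture with known means, not the tomographic setup of the paper; all names and data are illustrative):

```python
import math, random

def em_mixture_weight(data, mu0=0.0, mu1=5.0, sigma=1.0, iters=50):
    """EM for the mixing weight w of a two-component Gaussian mixture
    with known component means: alternate E and M steps until convergence."""
    # normalizing constant omitted: both components share sigma, so it
    # cancels in the responsibility ratio below
    pdf = lambda x, mu: math.exp(-0.5 * ((x - mu) / sigma) ** 2)
    w = 0.5  # initial guess
    for _ in range(iters):
        # E-step: posterior responsibility of component 1 for each point
        r = [w * pdf(x, mu1) / (w * pdf(x, mu1) + (1 - w) * pdf(x, mu0))
             for x in data]
        # M-step: the ML update of the weight is the mean responsibility
        w = sum(r) / len(r)
    return w

random.seed(0)
data = [random.gauss(5.0, 1.0) if random.random() < 0.3 else random.gauss(0.0, 1.0)
        for _ in range(500)]
w_hat = em_mixture_weight(data)
print(round(w_hat, 2))  # close to the true weight 0.3
```

Each EM iteration is guaranteed not to decrease the likelihood, which is what makes it a convenient optimization mechanism for latent-variable ML problems such as the system-parameter estimation described above.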

Experimental design to maximize information
This paper considers different methods to measure the gain of information that an experiment provides about the parameters of a statistical model. The approach we follow is Bayesian and relies on the assumption that information about model parameters is represented by their probability distribution, so that a measure of information is any summary of the probability distribution satisfying some sensible assumptions. Robustness issues are considered and investigated in some examples using a new family of information measures which have the log-score and the quadratic score as special cases.

Image modeling and restoration: Information fusion, set-theoretic methods, and the maximum entropy principle
Several powerful but heuristic techniques in the recent image denoising literature have used overcomplete image representations. We present a general framework for fusing information from multiple representations based on fundamental statistical estimation principles, in which information about image attributes from multiple wavelet transforms is incorporated as moment constraints on the underlying image prior. Our method constructs the maximum entropy distribution consistent with these moment constraints. A maximum a posteriori (MAP) image restoration algorithm based on this maximum entropy prior is developed. We also explore a fundamental equivalence between the stochastic setting of multiple-domain restoration and its deterministic set-theoretic counterpart. The insights gained by this analysis allow us to derive a state-of-the-art denoising algorithm.