Teaching statistics in the physics curriculum: Unifying and clarifying role of subjective probability
1.G. D’Agostini, “Bayesian reasoning vs conventional statistics in high energy physics,” in Proceedings of the XVIII International Workshop on Maximum Entropy and Bayesian Methods, Garching, Germany, July 1998, edited by V. Dose, W. von der Linden, R. Fischer, and R. Preuss (Kluwer Academic, Dordrecht, 1999); LANL preprint physics/9811046. A copy can be found at the author’s URL: http://www-zeus.roma1.infn.it/~agostini/.
2.“Probable” comes from Latin and was used with exactly its contemporary meaning well before a formal theory of probability was developed.
3.B. de Finetti, Theory of Probability (Wiley, New York, 1974).
4.Note how “will” does not necessarily imply time ordering, but rather a condition of uncertainty concerning something that might have already happened.
5.D. Hume, Enquiry Concerning Human Understanding, 1748; electronic version at http://www.utm.edu/research/hume/wri/1enq/.
6.It is of crucial importance to have neatly separated in one’s mind “belief” from “imagination,” “subjective” from “arbitrary.” A clear analysis of the first two concepts was done by D. Hume (Ref. 5). The concept of coherence makes subjective degrees of belief not arbitrary.
7.The coherence rule is often described in the following way. Imagine that you assess the value of the probability, and hence the odds, and then another rational person chooses the direction of the bet. This situation is similar to the case where two persons wish to equally divide some goods: one makes the partition, and the other one has the choice.
8.In the axiomatic approach one does not attempt to define what probability is and how to assess it. Probability is just a real number satisfying the axioms. Using the axioms and the rules of logic, the probability of logically connected events can be evaluated. But the problem remains that probability is never well defined, which is a source of confusion mentioned in the Introduction.
9.It is obvious that, in an approach in which probability is always conditional probability, Eq. (5) cannot “define” conditional probability. The interpretation of Eqs. (4) and (5) in the subjective approach is that we are free to assess two of the three probabilities, but the third one is constrained by coherence. If the three assignments do not satisfy Eq. (5), it is possible to imagine a combination of bets in which one wins or loses with certainty, depending on the direction of the bets. Section 8.2 of Ref. 11 describes an example showing that the point of view on conditional probability described here is the same as that intuitively used by researchers.
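The coherence constraint expressed by Eq. (5) can be checked on a toy finite example. The following sketch (a hypothetical two-dice setup, not taken from the article) verifies numerically that once two of the three probabilities are assessed, the third is forced:

```python
from fractions import Fraction

# Toy check of the coherence constraint P(A and B) = P(A|B) * P(B)
# on a hypothetical finite example: two fair dice,
# A = "the sum is 7", B = "the first die shows 4".
outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]
A = {o for o in outcomes if o[0] + o[1] == 7}
B = {o for o in outcomes if o[0] == 4}

p_B = Fraction(len(B), len(outcomes))        # 6/36 = 1/6
p_AB = Fraction(len(A & B), len(outcomes))   # only (4, 3): 1/36
p_A_given_B = Fraction(len(A & B), len(B))   # 1/6

# Coherence: the third value is fixed once the other two are assessed.
assert p_AB == p_A_given_B * p_B
print(p_B, p_A_given_B, p_AB)   # 1/6 1/6 1/36
```

Any triple of assignments violating this product rule would expose the assessor to a combination of bets lost with certainty, as the note describes.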
10.One could argue that this number can also be obtained in any other approach, and this argument is formally true. The question is how to interpret it. Clearly 23% is neither a ratio of the number of favorable cases over the number of equiprobable cases, nor an evaluation from a long experiment on the relative frequency of favorable results. Only in the subjective approach is the result of each step of a probability calculation consistent with the definition.
11.G. D’Agostini, “Bayesian reasoning in high energy physics: Principles and applications,” CERN Report No. 99-03, July, 1999; electronic version at http://wwwas.cern.ch/library/cern_publications/yellow_reports.html and at author’s URL (see Ref. 1).
12.The concept of probability, well separated from the evaluation rules, is magnificently expressed in Chap. 6 of Hume’s essay (Ref. 5).
13.M. Kac, Probability and Related Topics in Physical Sciences (Interscience, New York, 1959).
14.G. Polya, Mathematics and Plausible Reasoning (Princeton University Press, Princeton, 1968), Vol. II.
15.F. Reif, Fundamentals of Statistical and Thermal Physics (McGraw–Hill, New York, 1965).
16.R. von Mises, Probability, Statistics, and Truth, 1928, 2nd ed. (George Allen and Unwin, New York, 1957).
17.The term prevision rather than expected value is the preferred term of subjectivists. Prevision is a more general concept than the well known expected value, and can be applied to uncertain numbers as well as to events. When applied to events, prevision reduces to probability.
18.The law of large numbers is certainly the best known and the most misused law of probability. Bernoulli’s theorem talks about probabilities of relative frequencies, and not about a “limit of relative frequency to probability,” an expression which could give the idea of a limit in the usual mathematical sense. The theorem does not say that if at a certain moment a number in a lottery has appeared less frequently than what is expected from probability, then it will come out a bit more often in the future in order to obey the law of large numbers. It does not even justify the frequency-based “definition” of probability. As pointed out by de Finetti (Ref. 3): “For those who seek to connect the notion of probability with that of frequency, results which relate probability and frequency in some way (and especially those results like the ‘law of large numbers’) play a pivotal role, providing support for the approach and for the identification of the concepts. Logically speaking, however, one cannot escape from the dilemma posed by the fact that the same thing cannot both be assumed first as a definition and then proved as a theorem; nor can one avoid the contradiction that arises from a definition which would assume as certain something that the theorem only states to be very probable.”
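The actual content of Bernoulli’s theorem, a statement about the probability of the frequency rather than a limit of the frequency, can be made concrete with the exact binomial calculation below (the numbers p = 0.5 and ε = 0.05 are merely illustrative):

```python
from math import comb

def prob_freq_within(p, n, eps):
    """Exact P(|X/n - p| <= eps) for X ~ Binomial(n, p).

    Bernoulli's theorem says this probability approaches 1 as n grows;
    it does not define p as a mathematical limit of frequencies.
    """
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1)
               if abs(k / n - p) <= eps)

# Illustrative values: p = 0.5, eps = 0.05; the result grows toward 1.
for n in (10, 100, 1000):
    print(n, round(prob_freq_within(0.5, n, 0.05), 4))
```

For every finite n the statement remains probabilistic, which is exactly de Finetti’s point: the theorem cannot serve as a definition of the probability it presupposes.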
19.I find that students gain much in awareness of statistical matters if a clear distinction is made between descriptive statistics, probability theory, and inferential statistics. For example, an experimental histogram of a measured quantity should never be called a “probability distribution,” but should be called by its correct name, “frequency distribution.”
20.Indicating by the subscript 1 the quantities referring to the $n_1$ remaining extractions, we have the obvious results $\mathrm{E}(f_1)=p$ and $\sigma(f_1)=\sqrt{p(1-p)/n_1}$. Note, however, that the prevision of the relative frequency of the entire ensemble is in general different from that calculated a priori. Calling $X_1$ the uncertain number of favorable results in the next $n_1$ trials, we have the uncertain frequency $f_{\mathrm{tot}}=(x+X_1)/(n+n_1)$ and hence $\mathrm{E}(f_{\mathrm{tot}})=(x+n_1\,p)/(n+n_1)$. It is easy to understand that, as $n$ approaches $n_{\mathrm{tot}}$, we become practically sure about the overall relative frequency, because it now belongs to the past.
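A numeric sanity check of the claim above, with purely illustrative numbers (x = 30 favorable results observed in n = 100 trials, p = 1/2, and n1 = 50 trials remaining): the prevision of the overall frequency is pulled away from the a priori value by the data already observed.

```python
from fractions import Fraction
from math import comb

# Illustrative numbers: x favorable results observed in n trials,
# n1 trials remaining, single-trial probability p.
p, n, x, n1 = Fraction(1, 2), 100, 30, 50

# Prevision via linearity: E(f_tot) = (x + n1 * p) / (n + n1).
e_ftot = Fraction(x + n1 * p, n + n1)

# Same result from the full binomial distribution of the remaining count.
e_direct = sum(Fraction(comb(n1, k)) * p**k * (1 - p)**(n1 - k)
               * Fraction(x + k, n + n1) for k in range(n1 + 1))
assert e_ftot == e_direct
print(e_ftot)   # 11/30, about 0.367, no longer the a priori value 1/2
```

The exact-fraction arithmetic confirms that the shortcut by linearity of prevision agrees with the full binomial expectation.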
21.The importance of this reasoning is well expressed by Poincaré: “… these problems are classified as probability of causes, and are the most interesting of all from their scientific applications” (Ref. 22).
22.H. Poincaré, Science and Hypothesis, 1905 (Dover, New York, 1952).
23.One can make frequency distributions of experimental observables (such as the readings of a scale) obtained under apparently identical conditions (the same value of the quantity to be measured, the same measurement conditions), and use them to evaluate the likelihood. Instead, it is never possible to make a frequency distribution of true values, because they refer to an idealized concept. The only way to assess probabilities of true values is a probability inversion, following the reasoning we are developing. I find it crucially important that students be taught from the beginning the distinction between the value of the reading (what is accessible to our senses) and that of the physical quantity (an abstract concept). Similarly, speaking about “data uncertainty” makes no sense (apart from pathological cases). Once the experiment is performed, the data are certain by definition. What is uncertain are the true values. The opposite reasoning is a product of frequentist teaching, according to which the true value is a constant of unknown value, and the category of probable is assigned only to the data.
24.This time-consuming procedure is not really needed, although it was introduced for teaching purposes, and one can use only the scores. Because the likelihoods in our example do not depend on the order of extraction, if at a certain moment we have observed $x$ white and $n-x$ black balls, the iterative application of Bayes’ theorem gives $P(H_j \mid x, n) \propto B(x \mid n, p_j)\, P_0(H_j)$, where $B(x \mid n, p_j)$ represents the binomial (B) probability function of parameters $n$ and $p_j$. This result corresponds to the intuitive idea that, in this problem, the inference should not depend on the order of the results. The fact that only the two numbers $x$ and $n$ are sufficient to summarize the relevant information for the inference is related to the statistical concept of sufficiency. Instead, the idea that the possible sequences are considered a priori equiprobable, though the individual events (not to be confused with the hypotheses $H_j$) are not independent (because the probability of each event depends on the score of the previous events, as is clear from Eqs. (14) and (18) and as can be easily understood from Table I), is related to the concept of exchangeability (Ref. 3), which we will not consider here.
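This order-independence can be checked directly. The sketch below assumes a six-box setup in which box H_j contains j white balls out of 5 (so p_j = j/5), with extractions made with replacement and uniform priors; the specific numbers are illustrative, not taken from the article’s tables.

```python
from math import comb

# Hypothetical six-box setup: box H_j holds j white balls out of 5.
p = [j / 5 for j in range(6)]

def sequential_posterior(sequence, prior):
    """Iterate Bayes' theorem over a sequence (1 = white, 0 = black)."""
    post = prior[:]
    for outcome in sequence:
        post = [q * (pj if outcome else 1 - pj) for q, pj in zip(post, p)]
        norm = sum(post)
        post = [q / norm for q in post]
    return post

def binomial_posterior(x, n, prior):
    """One-shot update using only the scores x and n (sufficiency)."""
    post = [q * comb(n, x) * pj**x * (1 - pj)**(n - x)
            for q, pj in zip(prior, p)]
    norm = sum(post)
    return [q / norm for q in post]

prior = [1 / 6] * 6
a = sequential_posterior([1, 0, 1], prior)   # W, B, W
b = sequential_posterior([0, 1, 1], prior)   # B, W, W: same scores
c = binomial_posterior(2, 3, prior)          # x = 2 white in n = 3
assert all(abs(u - v) < 1e-12 for u, v in zip(a, b))
assert all(abs(u - v) < 1e-12 for u, v in zip(a, c))
```

Two sequences with the same scores, and the single binomial update, all yield the same posterior, which is the content of sufficiency in this problem.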
25.K. R. Popper, The Logic of Scientific Discovery (Hutchinson, New York, 1959).
26.Poincaré’s opinion about the probability of hypotheses is very enlightening. He calls the problem of assessing the “probability of the causes” (that is, of hypotheses) “the essential problem of the experimental method” (Ref. 22).
27.The standard hypothesis test is based on the following reasoning: One formulates a basic hypothesis H₀ (the “null hypothesis”) and defines an observable θ for which one is able to calculate a probability distribution under the condition that H₀ is true. Then one defines a priori an interval in which θ has a high probability to occur and, as a consequence, a complementary region in which the probability is low. This latter probability is indicated by α, and typical values are 1% and 5%. Finally, conclusions are drawn depending on where the experimental value of θ occurs. If it falls inside the high probability region, then H₀ is accepted. If it falls in the low probability region, then “H₀ is rejected with significance α” (see, for example, Ref. 28).
28.R. J. Barlow, Statistics (Wiley, New York, 1989).
29.Because this point is rather delicate and touches concepts well rooted in all those who are accustomed to standard statistical methods, it would need a long and careful discussion. I refer the reader to Ref. 11, and references therein. For a short account see also Refs. 1 and 30. The source of confusion is that the statement “the null hypothesis is rejected with a 1% significance” is often interpreted (from my experience I would say almost always) as if H₀ had only a 1% chance of being correct. This mistake is made not only by students, but also by working scientists.
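The size of this mistake can be illustrated with hypothetical numbers. For the sake of the sketch, the 1% tail probability is treated as the likelihood of the observed data under H₀ and combined with an alternative hypothesis and a prior by Bayes’ theorem; all three input values are assumptions chosen only to make the point.

```python
# Hypothetical numbers illustrating the confusion discussed above:
# a result that "rejects H0 at 1% significance" need not mean
# P(H0 | data) = 1%.
p_data_given_H0 = 0.01   # the 1% tail probability, read as a likelihood
p_data_given_H1 = 0.10   # the data are not very likely under H1 either
prior_H0 = 0.9           # H0 was initially considered very plausible

post_H0 = (p_data_given_H0 * prior_H0) / (
    p_data_given_H0 * prior_H0 + p_data_given_H1 * (1 - prior_H0))
print(round(post_H0, 3))   # 0.474
```

With these inputs the posterior probability of H₀ is about 47%, not 1%: the significance level and P(H₀ | data) are different quantities, and conflating them is exactly the error the note describes.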
30.J. O. Berger and D. A. Berry, “Statistical analysis and the illusion of objectivity,” Am. Sci. 76, 159–165 (1988).
31.Obviously, prior knowledge is not always so vague as to have no influence. If one thinks of two sequential independent measurements of the same quantity performed with instruments of (generally speaking) similar quality, the global inference is obtained by iterating Bayes’ theorem, as was seen in the six box example. The prior of the second inference, that is, the final distribution of the first one, has a weight similar to that of the second set of data. The presence of the priors in the inference is often considered a weak point of Bayesian inference. But the criticism is not justified, because priors play a role which is consistent with what prior knowledge is expected to do. For an extensive discussion of this subject, see Ref. 32.
32.G. D’Agostini, “Overcoming priors anxiety,” Revista de la Real Academia de Ciencias 93 (1999) (to be published), special issue on Bayesian Methods in the Sciences, edited by J. M. Bernardo.
33.This result might seem trivial, because it is more or less how physicists interpret the results of measurements, even if they are not aware of Bayesian statistics. This interpretation is due to the fact that physicists’ intuition is very close to Bayesian reasoning (Ref. 1), and probability inversions of the kind $f(x \mid \mu) \Rightarrow f(\mu \mid x)$ are considered very natural. However, in other approaches this inversion is arbitrary, although researchers perform it intuitively, with a reasoning described in Refs. 1 and 11. But, unfortunately, most people are not aware of the implicit assumptions on which this intuitive probability inversion is based, namely, uniform priors and a symmetric likelihood. If these assumptions do not hold, the numerical results are mistaken.
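A minimal numeric sketch of this inversion on a grid, assuming a Gaussian likelihood with known σ (all numbers hypothetical): with a uniform prior the posterior for the true value is just the likelihood re-read as a function of μ, while a non-uniform prior reshapes it.

```python
from math import exp

# Hypothetical observation and resolution.
x_obs, sigma = 3.0, 1.0
mus = [i / 100 for i in range(-200, 1001)]   # grid of true values

def likelihood(mu):
    # Gaussian likelihood, unnormalized (constants cancel on the grid).
    return exp(-0.5 * ((x_obs - mu) / sigma) ** 2)

# Uniform prior: the posterior mode coincides with the observation.
post = [likelihood(mu) for mu in mus]
mode_uniform = mus[post.index(max(post))]
print(mode_uniform)   # 3.0

# Non-uniform prior (e.g., mu is positive by definition): the naive
# mirror-image inversion fails; probability below mu = 0 is removed,
# changing the probability intervals one would quote.
post_positive = [l if mu >= 0 else 0.0 for l, mu in zip(post, mus)]
```

When the prior is uniform and the likelihood symmetric, the inversion reproduces the intuitive result; dropping either assumption, as the note warns, changes the numbers.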
34.The fact that a consistent theory of measurement uncertainty which takes into account statistics and systematic contributions can only be achieved in the Bayesian scheme is also recognized by the metrology organizations. For example the ISO Guide (Ref. 35) states: “Type B standard uncertainty is obtained from an assumed probability density function based on the degree of belief that an event will occur [often called subjective probability… ];” “Recommendation … upon which this Guide rests implicitly adopts such a viewpoint of probability … as the appropriate way to calculate the combined standard uncertainty of a result of a measurement.” (According to the ISO recommendations, “The uncertainty in the result of a measurement generally consists of several components which may be grouped into two categories according to the way in which their numerical value is estimated: (A) those which are evaluated by statistical methods; (B) those which are evaluated by other means.” More precisely, the Type A uncertainty is evaluated from the dispersion of the results in the measurements of the physical quantity of interest, Type B is evaluated from all other information concerning the measurement, and it includes all uncertainties due to systematic errors.)
35.International Organization for Standardization, “Guide to the expression of uncertainty in measurement,” Geneva, Switzerland, 1993.
36.“Errors” within quotation marks remind the reader that “error” is often used improperly as a synonym for “uncertainty.” The metrology organizations, in particular ISO (Ref. 35) and DIN (Ref. 37), have done much work to clarify the terminology concerning measurement, measurement errors, and measurement uncertainty. The result of this work has also been adopted by NIST (Ref. 38).
37.DIN Deutsches Institut für Normung, “Grundbegriffe der Messtechnik, Behandlung von Unsicherheiten bei der Auswertung von Messungen” (DIN 1319 Teile 1–4, Beuth Verlag GmbH, Berlin, Germany, 1985). Parts 3 and 4 have been re-edited after the ISO Guide (Ref. 35).
38.B. N. Taylor and C. E. Kuyatt, “Guidelines for evaluating and expressing uncertainty of NIST measurement results,” NIST Technical Note 1297, September 1994. URL: http://physics.nist.gov/Pubs/guidelines/outline.html.
39.G. D’Agostini, “Measurement errors and measurement uncertainty: Critical review and proposals for teaching,” Internal Report No. 1094, Department of Physics, University of Rome “La Sapienza,” May, 1998 (in Italian). A copy can be found at the author’s URL (Ref. 1).
40.For example, Gauss makes explicit use of the concepts of prior and posterior probability of hypotheses in his derivation of the Gaussian distribution (Ref. 41). He derives a formula equivalent to Bayes’ theorem valid for a priori equiprobable hypotheses (a condition explicitly stated). Then, using some symmetry arguments, plus the condition that the final distribution is maximized when the true value of the quantity equals the arithmetic average of the measurements, he obtains the result that the mathematical function of the error distribution (playing the role of the likelihood) is what we now name after him.
41.C. F. Gauss, Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientium, 1809 (Werke 7, Gotha, F. A. Perthes, 1871), n.i 172–179, pp. 225–234.
42.Frequentist ideas began in the early 1900s (see, for example, Ref. 43, and references therein).
43.C. Howson and P. Urbach, Scientific Reasoning: The Bayesian Approach, 2nd ed. (Open Court, New York, 1993).
44.J. M. Bernardo and A. F. M. Smith, Bayesian Theory (Wiley, New York, 1994).
45.A. O’Hagan, Bayesian Inference, Vol. 2(B) in Kendall’s Advanced Theory of Statistics (Halsted, New York, 1994).
46.H. Jeffreys, Theory of Probability (Oxford U. P., Oxford, 1961).
47.E. T. Jaynes, “Clearing up mysteries: The original goal,” in Maximum Entropy and Bayesian Methods, edited by J. Skilling (Kluwer Academic, New York, 1989).
48.R. Scozzafava, “The role of probability in statistical physics,” Transport Theory and Statistical Physics (to be published);
48.R. Scozzafava, “A classical analog of the two-slit model of quantum probability,” Pure Math. Appl. Ser. C 2, 223–235 (1991).
49.Maximum Entropy in Action, edited by B. Buck and V. A. Macaulay (Oxford U. P., Oxford, 1991).
50.From Statistical Physics to Statistical Inference and Back, edited by P. Grassberger and J. P. Nadal (Kluwer Academic, New York, 1994).
51.The International Society for Bayesian Analysis (ISBA), URL: http://www.bayesian.org/.