Do Concept Inventories Actually Measure Anything?

References

1. Ackerman, T. A., Gierl, M. J., and Walker, C. M. 2003, “Using Multidimensional Item Response Theory to Evaluate Educational and Psychological Tests,” Educ. Meas.: Issues & Pract., 22, 37–51.
http://dx.doi.org/10.1111/j.1745-3992.2003.tb00136.x
2. Allen, K. 2007, “Getting More from Your Data: Application of Item Response Theory to the Statistics Concept Inventory,” in 2007 ASEE Annual Conference and Exposition Proceedings, Honolulu, HI.
3. Andersen, E. B. 1977, “Sufficient Statistics and Latent Trait Models,” Psychometrika, 42, 69–81.
http://dx.doi.org/10.1007/BF02293746
4. Andrich, D. 2004, “Controversy and the Rasch Model: A Characteristic of Incompatible Paradigms?,” Med. Care, 42, I7–16.
http://dx.doi.org/10.1097/01.mlr.0000103528.48582.7c
5. Bailey, J. M. 2007, “Development of a Concept Inventory to Assess Students’ Understanding and Reasoning Difficulties about the Properties and Formation of Stars,” Astron. Educ. Rev., 6, 133–9.
http://dx.doi.org/10.3847/AER2007028
6. Bailey, J. M. 2009, “Concept Inventories for ASTRO 101,” Phys. Teach., 47, 439–41.
http://dx.doi.org/10.1119/1.3225503
7. Baker, F. B. and Kim, S. 2004, Item Response Theory: Parameter Estimation Techniques, 2nd ed., New York: Dekker.
8. Bao, L. 2006, “Theoretical Comparisons of Average Normalized Gain Calculations,” Am. J. Phys., 74, 917–22.
http://dx.doi.org/10.1119/1.2213632
9. Bardar, E. M., Prather, E. E., Brecher, K., and Slater, T. F. 2007, “Development and Validation of the Light and Spectroscopy Concept Inventory,” Astron. Educ. Rev., 5, 103–13.
http://dx.doi.org/10.3847/AER2006020
10. Bejar, I. I. 1980, “A Procedure for Investigating the Unidimensionality of Achievement Tests Based on Item Parameter Estimates,” J. Educ. Meas., 17, 283–96.
http://dx.doi.org/10.1111/j.1745-3984.1980.tb00832.x
11. Bereiter, C. 1963, “Some Persisting Dilemmas in the Measurement of Change,” in Problems in Measuring Change, ed. C. W. Harris, Madison, WI: The University of Wisconsin Press, pp. 3–20.
12. Borsboom, D. 2005, Measuring the Mind: Conceptual Issues in Contemporary Psychometrics, New York: Cambridge University Press.
http://dx.doi.org/10.1017/CBO9780511490026
13. Borsboom, D. and Scholten, A. Z. 2008, “The Rasch Model and Conjoint Measurement Theory from the Perspective of Psychometrics,” Theory & Psych., 18, 111–7.
http://dx.doi.org/10.1177/0959354307086925
14. Briggs, D. C. and Wilson, M. 2003, “An Introduction to Multidimensional Measurement Using Rasch Models,” J. App. Meas., 4, 87–100.
15. Brogt, E., Sabers, D., Prather, E. E., Deming, G. L., Hufnagel, B., and Slater, T. F. 2007, “Analysis of the Astronomy Diagnostic Test,” Astron. Educ. Rev., 6, 25–42.
http://dx.doi.org/10.3847/AER2007003
16. Crocker, L. and Algina, J. 1986, Introduction to Classical and Modern Test Theory, Orlando, FL: Harcourt Brace Jovanovich.
17. Cronbach, L. J. and Furby, L. 1970, “How We Should Measure ‘Change’—Or Should We?,” Psych. Bull., 74, 68–80.
http://dx.doi.org/10.1037/h0029382
18. Ding, L. and Beichner, R. 2009, “Approaches to Data Analysis of Multiple-Choice Questions,” Phys. Rev. ST: Phys. Educ. Res., 5, 020103.
http://dx.doi.org/10.1103/PhysRevSTPER.5.020103
19. Ding, L., Chabay, R., Sherwood, B., and Beichner, R. 2006, “Evaluating an Electricity and Magnetism Assessment Tool: Brief Electricity and Magnetism Assessment,” Phys. Rev. ST: Phys. Educ. Res., 2, 010105.
http://dx.doi.org/10.1103/PhysRevSTPER.2.010105
20. Embretson, S. E. and Reise, S. P. 2000, Item Response Theory for Psychologists, Mahwah, NJ: Erlbaum.
21. Fischer, G. H. 1995, “Derivations of the Rasch Model,” in Rasch Models: Foundations, Recent Developments, and Applications, eds. G. H. Fischer and I. W. Molenaar, New York, NY: Springer-Verlag, pp. 15–38.
22. George, D. and Mallery, P. 2009, SPSS for Windows Step by Step: A Simple Guide and Reference, Boston, MA: Pearson Education.
23. Hake, R. R. 1998, “Interactive-Engagement Versus Traditional Methods: A Six-Thousand-Student Survey of Mechanics Test Data for Introductory Physics Courses,” Am. J. Phys., 66, 64–74.
http://dx.doi.org/10.1119/1.18809
24. Hambleton, R. K. and Jones, R. J. 1993, “Comparison of Classical Test Theory and Item Response Theory and Their Applications to Test Development,” Educ. Meas.: Issues & Pract., 12, 253–62.
25. Harris, D. 1989, “Comparison of 1-, 2-, and 3-Parameter IRT Models,” Educ. Meas.: Issues & Pract., 8, 157–63.
26. Herrmann-Abell, C. F., DeBoer, G. E., and Roseman, J. E. 2009, “Using Rasch Modeling to Analyze Standards-Based Assessment Items Aligned to Middle School Chemistry Ideas,” Poster presented at the DR-K12 PI Meeting (Washington, D.C.).
27. Hestenes, D., Wells, M., and Swackhamer, G. 1992, “Force Concept Inventory,” Phys. Teach., 30, 141–58.
http://dx.doi.org/10.1119/1.2343497
28. Holland, P. W. 1990, “On the Sampling Theory Foundations of Item Response Theory Models,” Psychometrika, 55, 577–601.
http://dx.doi.org/10.1007/BF02294609
29. Karabatsos, G. 2001, “The Rasch Model, Additive Conjoint Measurement, and New Models of Probabilistic Measurement Theory,” J. Appl. Meas., 2, 389–423.
30. Keller, J. M. 2006, “Development of a Concept Inventory Addressing Students’ Beliefs and Reasoning Difficulties Regarding the Greenhouse Effect,” Ph.D. thesis, University of Arizona.
31. Kyngdon, A. 2008, “The Rasch Model from the Perspective of the Representational Theory of Measurement,” Theory & Psych., 18, 89–109.
http://dx.doi.org/10.1177/0959354307086924
32. Lee, Y., Palazzo, D. J., Warnakulasooriya, R., and Pritchard, D. E. 2008, “Measuring Student Learning with Item Response Theory,” Phys. Rev. ST: Phys. Educ. Res., 4, 010102.
http://dx.doi.org/10.1103/PhysRevSTPER.4.010102
33. Libarkin, J. C. and Anderson, S. W. 2005, “Assessment of Learning in Entry-level Geoscience Courses: Results from the Geoscience Concept Inventory,” J. Geo. Educ., 53, 394–401.
34. Linacre, J. M. 2005, “Measurement, Meaning and Morality,” MESA Research Memorandum No. 71. Available at: www.rasch.org/memo71.pdf.
35. Lindell, R. S. 2001, “Enhancing College Students’ Understanding of Lunar Phases,” Ph.D. thesis, University of Nebraska.
36. Lord, F. M. 1980, Applications of Item Response Theory to Practical Testing Problems, Hillsdale, NJ: Erlbaum.
37. Lord, F. M. and Novick, M. R. 1968, Statistical Theories of Mental Test Scores, Reading, MA: Addison-Wesley.
38. Luce, R. D. and Tukey, J. W. 1964, “Simultaneous Conjoint Measurement: A New Type of Fundamental Measurement,” J. Math. Psych., 1, 1–27.
http://dx.doi.org/10.1016/0022-2496(64)90015-X
39. Maloney, D. P., O’Kuma, T. L., Hieggelke, C. J., and van Heuvelen, A. 2001, “Surveying Students’ Conceptual Knowledge of Electricity and Magnetism,” Am. J. Phys., 69, S12–23.
http://dx.doi.org/10.1119/1.1371296
40. Marshall, J. A., Hagedorn, E. A., and O’Connor, J. 2009, “Anatomy of a Physics Test: Validation of the Physics Items on the Texas Assessment of Knowledge and Skills,” Phys. Rev. ST: Phys. Educ. Res., 5, 010104.
http://dx.doi.org/10.1103/PhysRevSTPER.5.010104
41. Marx, J. D. and Cummings, K. 2007, “Normalized Change,” Am. J. Phys., 75, 87–91.
http://dx.doi.org/10.1119/1.2372468
42. Masters, G. N. 1988, “Item Discrimination: When More Is Worse,” J. Educ. Meas., 25, 15–29.
http://dx.doi.org/10.1111/j.1745-3984.1988.tb00288.x
43. Masters, G. N. 2001, “The Key to Objective Measurement,” MESA Research Memorandum No. 70. Available at: www.rasch.org/memo70.pdf.
44. Michell, J. 2008, “Conjoint Measurement and the Rasch Paradox: A Response to Kyngdon,” Theory & Psych., 18, 119–24.
http://dx.doi.org/10.1177/0959354307086926
45. Morris, G. A., Branum-Martin, L., Harshman, N., Baker, S. D., Mazur, E., Dutta, S., Mzoughi, T., and McCauley, V. 2006, “Testing the Test: Item Response Curves and Test Quality,” Am. J. Phys., 74, 449–53.
http://dx.doi.org/10.1119/1.2174053
46. Narens, L. and Luce, R. D. 1986, “Measurement: The Theory of Numerical Assignments,” Psych. Bull., 99, 166–80.
http://dx.doi.org/10.1037/0033-2909.99.2.166
47. Orlando, M. and Thissen, D. 2000, “Likelihood-Based Item-Fit Indices for Dichotomous Item Response Theory Models,” App. Psych. Meas., 24, 50–64.
http://dx.doi.org/10.1177/01466216000241003
48. Pek, P. and Poh, K. 2000, “Framework of a Decision-Theoretic Tutoring System for Learning of Mechanics,” J. Sci. Educ. & Tech., 9, 343–56.
http://dx.doi.org/10.1023/A:1009484526286
49. Perline, R., Wright, B. D., and Wainer, H. 1979, “The Rasch Model as Additive Conjoint Measurement,” Appl. Psych. Meas., 3, 237–55.
http://dx.doi.org/10.1177/014662167900300213
50. Planinic, M. 2006, “The Rasch Model-Based Analysis of the Conceptual Survey of Electricity and Magnetism,” in Proceedings of GIREP Conference 2006: Modeling in Physics and Physics Education, eds. D. van den Berg and T. Ellermeijer, Amsterdam, NL: University of Amsterdam, pp. 133–134.
51. Planinic, M., Ivanjek, L., and Susac, A. 2010, “Rasch Model Based Analysis of the Force Concept Inventory,” Phys. Rev. ST: Phys. Educ. Res., 6, 010103.
http://dx.doi.org/10.1103/PhysRevSTPER.6.010103
52. Prather, E. E., Rudolph, A. L., Brissenden, G., and Schlingman, W. 2009, “A National Study Assessing the Teaching and Learning of Introductory Astronomy. Part I. The Effect of Interactive Instruction,” Am. J. Phys., 77, 320–30.
http://dx.doi.org/10.1119/1.3065023
53. Rasch, G. 1960, Probabilistic Models for Some Intelligence and Attainment Tests, Chicago, IL: University of Chicago Press.
54. Rogosa, D. R. and Willett, J. B. 1983, “Demonstrating the Reliability of the Difference Score in the Measurement of Change,” J. Educ. Meas., 20, 335–43.
http://dx.doi.org/10.1111/j.1745-3984.1983.tb00211.x
55. Rupp, A. A. and Zumbo, B. D. 2006, “Understanding Parameter Invariance in Unidimensional IRT Models,” Educ. & Psych. Meas., 66, 63–84.
http://dx.doi.org/10.1177/0013164404273942
56. Sadler, P. M. 1998, “Psychometric Models of Student Conceptions in Science: Reconciling Qualitative Studies and Distractor-Driven Assessment Instruments,” J. Res. Sci. Teach., 35, 265–96.
http://dx.doi.org/10.1002/(SICI)1098-2736(199803)35:3<265::AID-TEA3>3.0.CO;2-P
57. Sadler, P. M., Coyle, H., Miller, J. L., Cook-Smith, N., Dussault, M., and Gould, R. R. 2010, “The Astronomy and Space Science Concept Inventory: Development and Validation of Assessment Instruments Aligned with the K-12 National Science Standards,” Astron. Educ. Rev., 8, 010111.
http://dx.doi.org/10.3847/AER2009024
58. Stevens, S. S. 1946, “On the Theory of Scales of Measurement,” Science, 103, 677–80.
http://dx.doi.org/10.1126/science.103.2684.677
59. Thompson, B. 2003, “Understanding Reliability and Coefficient alpha, Really,” in Score Reliability, ed. B. Thompson, Thousand Oaks, CA: SAGE, pp. 3–30.
60. Vogt, W. P. 2007, Quantitative Research Methods for Professionals, Boston, MA: Pearson Education.
61. Wang, J. and Bao, L. 2010, “Analyzing Force Concept Inventory with Item Response Theory,” Am. J. Phys., 78, 1064–70.
http://dx.doi.org/10.1119/1.3443565
62. Whitely, S. E. and Dawis, R. V. 1974, “The Nature of Objectivity with the Rasch Model,” J. Educ. Meas., 11, 163–78.
http://dx.doi.org/10.1111/j.1745-3984.1974.tb00988.x
63. Wilson, M. 2005, Constructing Measures: An Item Response Modeling Approach, Mahwah, NJ: Erlbaum.
64. Wright, B. D. 1997, “A History of Social Science Measurement,” Educ. Meas.: Issues & Pract., 16, 33–45.
http://dx.doi.org/10.1111/j.1745-3992.1997.tb00606.x
65. Wright, B. D. and Linacre, J. M. 1989, “Observations are Always Ordinal; Measurements, However, Must Be Interval,” Archiv. Phys. Med. & Rehab., 70, 857–60.
66. Wu, M. and Adams, R. J. 2010, “Properties of Rasch Residual Fit Statistics,” J. Appl. Meas. (in press).
67. Yen, W. M. 1984, “Effects of Local Item Dependence on the Fit and Equating Performance of the Three-Parameter Logistic Model,” Appl. Psych. Meas., 8, 125–45.
http://dx.doi.org/10.1177/014662168400800201
68. Yen, W. M. and Fitzpatrick, A. R. 2006, “Item Response Theory,” in Educational Measurement, 4th ed., ed. R. Brennan, Westport, CT: American Council on Education/Praeger, pp. 111–153.
69. Zimowski, M. F., Muraki, E., Mislevy, R. J., and Bock, R. D. 1996, BILOG-MG: Multiple-Group IRT Analysis and Test Maintenance for Binary Items, Chicago, IL: Scientific Software.

Figures

Figure 1. The Rasch model ICCs for items 16 (solid green curve) and 17 (dotted blue curve).

Figure 2. The 2PL model ICCs for items 16 (solid green curve) and 17 (dotted blue curve).

Figure 3. The 3PL model ICCs for items 16 (solid green curve) and 17 (dotted blue curve).
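
Figures 1–3 compare the same two items under the three nested logistic models. For readers who want to reproduce curves of this kind, here is a minimal sketch of the three ICC families; the item parameters below are illustrative placeholders, not the SPCI estimates.

```python
import numpy as np
import matplotlib.pyplot as plt

def icc(theta, a=1.0, b=0.0, c=0.0):
    """Item characteristic curve for the 3PL model.

    Setting c=0 gives the 2PL; setting a=1 and c=0 gives the Rasch model.
    theta: ability (logits); b: difficulty; a: discrimination; c: guessing floor.
    """
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

theta = np.linspace(-4, 4, 400)

# Illustrative parameters only -- not the SPCI estimates.
plt.plot(theta, icc(theta, b=-0.5), "g-", label="Rasch (b = -0.5)")
plt.plot(theta, icc(theta, a=1.8, b=0.3), "b--", label="2PL (a = 1.8, b = 0.3)")
plt.plot(theta, icc(theta, a=1.8, b=0.3, c=0.2), "r:", label="3PL (c = 0.2)")
plt.xlabel("Ability (logits)")
plt.ylabel("Probability of a correct response")
plt.legend()
plt.show()
```

Because the models are nested, fitting all three to the same responses (as Tables 3–5 do) shows directly what the extra discrimination and guessing parameters buy.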

Figure 4. A comparison of the difficulty parameters for each item for the Rasch (blue double triangles), 2PL (red diamonds), and 3PL (green triangles) models. Error bars represent standard errors.

Figure 5. A comparison of the discrimination parameters for each item for the 2PL (red diamonds) and 3PL (green triangles) models. Error bars represent standard errors.

Figure 6. The guessing parameters for each item in the 3PL model. Error bars represent standard errors.

Figure 7. Rasch model (upper left), 2PL (upper right), and 3PL (bottom) ability estimates as a function of percent correct on the SPCI, both pre-instruction (blue squares) and post-instruction (red diamonds). Error bars represent standard errors; they are suppressed in the 2PL and 3PL panels for clarity.
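
The tight relationship between Rasch ability and percent correct in Figure 7 follows from the fact that, under the Rasch model, the raw score is a sufficient statistic for ability (Andersen 1977): every respondent with the same raw score receives the same ability estimate. A minimal sketch, with made-up item difficulties, of how the ML ability estimate is recovered from the raw score alone:

```python
import numpy as np

def rasch_p(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def ability_from_raw_score(raw_score, difficulties, lo=-6.0, hi=6.0, tol=1e-6):
    """ML ability estimate under the Rasch model.

    Solves sum_i P_i(theta) = raw_score by bisection; the expected score
    is strictly increasing in theta, so the root is unique. Zero and
    perfect scores have no finite ML estimate and are excluded here.
    """
    n = len(difficulties)
    if raw_score <= 0 or raw_score >= n:
        raise ValueError("no finite ML estimate for zero or perfect scores")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rasch_p(mid, difficulties).sum() < raw_score:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Made-up difficulties for a short test -- not the SPCI values.
b = np.array([-1.5, -0.5, 0.0, 0.7, 1.4])
for r in range(1, len(b)):
    print(r, round(ability_from_raw_score(r, b), 3))
```

Under the 2PL the sufficient statistic is the discrimination-weighted score, and the 3PL depends on the full response pattern, which is why those panels in Figure 7 show scatter at a given percent correct.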

Figure 8. The standard error (dotted red curve) and test information (solid blue curve) as a function of ability for the Rasch model.

Figure 9. The standard error (dotted red curve) and test information (solid blue curve) as a function of ability for the 2PL model.

Figure 10. The standard error (dotted red curve) and test information (solid blue curve) as a function of ability for the 3PL model.
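
The two curves in Figures 8–10 are linked by the standard identity SE(theta) = 1/sqrt(I(theta)), where the test information I(theta) is the sum of the item informations. A sketch using Lord's (1980) 3PL item-information formula and illustrative (non-SPCI) parameters:

```python
import numpy as np

def p3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b, c=0.0):
    """3PL item information (Lord 1980).

    With c=0 this reduces to the 2PL form a^2 * P * (1 - P); with a=1
    and c=0 it reduces to the Rasch form P * (1 - P).
    """
    P = p3pl(theta, a, b, c)
    return a**2 * ((1.0 - P) / P) * ((P - c) / (1.0 - c)) ** 2

theta = np.linspace(-4, 4, 200)

# Illustrative (a, b, c) triples -- not the SPCI estimates.
items = [(1.0, -1.0, 0.20), (1.4, 0.0, 0.10), (0.8, 0.5, 0.25), (1.7, 1.2, 0.15)]

# Test information is the sum of item informations; SE = 1/sqrt(I).
test_info = sum(item_information(theta, a, b, c) for a, b, c in items)
standard_error = 1.0 / np.sqrt(test_info)  # the dotted curve in Figures 8-10
```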

Figure 11. The Wright map for the Rasch model ability and item difficulty estimates for the SPCI. Each bin is 0.2 logits wide and is labeled by the upper value of the bin (e.g., bin 1 includes logit values between 0.8 and 1). Pre-instruction ability estimates are shown in grey, while post-instruction abilities are shown in white.

Figure 12. IRT-calculated gains as a function of Hake’s normalized gain. The upper left panel shows Rasch model gains, the upper right panel shows 2PL gains, and the bottom panel shows 3PL gains.
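
The horizontal axis of Figure 12 is Hake's (1998) normalized gain, the fraction of the possible improvement that was actually realized; the IRT gains on the vertical axis are, on a natural reading of the figure, differences between post- and pre-instruction ability estimates in logits. A sketch of both quantities (the numbers are illustrative):

```python
def hake_normalized_gain(pre_percent, post_percent):
    """Hake's (1998) normalized gain: achieved gain over possible gain."""
    return (post_percent - pre_percent) / (100.0 - pre_percent)

def irt_gain(theta_pre, theta_post):
    """An IRT analog: the change in estimated ability, in logits."""
    return theta_post - theta_pre

print(hake_normalized_gain(40.0, 70.0))  # 0.5: half of the possible gain
print(irt_gain(-0.4, 0.9))               # 1.3 logits (illustrative values)
```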

Figure 13. The 2PL fit plot for item 23. The curve is the 2PL ICC. Each dot represents the proportion of respondents in an ability bin who gave the correct answer. The error bars represent the standard error of the ICC.

Figure 14. The Rasch model fit plot for item 7. The curve is the Rasch model ICC. Each dot represents the proportion of respondents in an ability bin who gave the correct answer. The error bars represent the standard error of the ICC.

Figure 15. The 3PL fit plot for item 5. The curve is the 3PL ICC. Each dot represents the proportion of respondents in an ability bin who gave the correct answer. The error bars represent the standard error of the ICC.
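
The dots in the fit plots of Figures 13–15 come from binning respondents on their ability estimates and computing the observed proportion correct in each bin, which is then compared against the model ICC. A sketch on simulated 2PL data (the equal-width binning here is an assumption; the paper's exact binning scheme may differ):

```python
import numpy as np

def empirical_icc(abilities, responses, n_bins=10):
    """Observed proportion correct within equal-width ability bins.

    abilities: one ability estimate per respondent.
    responses: 0/1 responses to a single item, in the same order.
    Returns bin centers and the proportion correct per bin (NaN if empty).
    """
    edges = np.linspace(abilities.min(), abilities.max(), n_bins + 1)
    idx = np.clip(np.digitize(abilities, edges) - 1, 0, n_bins - 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    props = np.array([responses[idx == k].mean() if np.any(idx == k) else np.nan
                      for k in range(n_bins)])
    return centers, props

# Simulate one 2PL item (a=1.5, b=0.2; illustrative) and bin the responses.
rng = np.random.default_rng(0)
theta = rng.normal(size=2000)
p = 1.0 / (1.0 + np.exp(-1.5 * (theta - 0.2)))
u = (rng.random(2000) < p).astype(int)
centers, props = empirical_icc(theta, u)  # the dots to plot against the ICC
```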

Figure 16. The Rasch model difficulties of the 13 star properties items estimated without the other ten items versus the difficulties estimated with the other ten items. Error bars represent standard errors. The solid line is where the points should lie if unidimensionality holds; the dashed line is the line on which the points actually lie.

Figure 17. The 2PL difficulties of the 13 star properties items estimated without the other ten items versus the difficulties estimated with the other ten items. Error bars represent standard errors. The solid line is where the points should lie if unidimensionality holds; the dashed line is the line on which the points actually lie.

Figure 18. The 3PL difficulties of the 13 star properties items estimated without the other ten items versus the difficulties estimated with the other ten items. Error bars represent standard errors. The solid line is where the points should lie if unidimensionality holds; the dashed line is the line on which the points actually lie.

Tables

Table 1. CTT statistics for the SPCI, calculated using students’ pre-instruction responses only, then using post-instruction responses only. Items are presented in order of ascending p-value within each group.

Table 2. A table of proportions, such as the one shown here, can motivate parameter invariance. Each item receives its own row, and each total score receives its own column. Each cell gives the proportion of respondents with that total score who correctly answered that item; these proportions are used as estimates of the probability that a respondent with a given score will correctly answer a given item.
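
A sketch of how a Table 2-style proportions table can be computed from a 0/1 response matrix; the data here are random toy responses, not the SPCI sample.

```python
import numpy as np

def proportion_table(responses):
    """Rows: items; columns: total scores 0..n_items.

    Each cell is the proportion of respondents with that total score who
    answered the item correctly (cf. Table 2); NaN where no respondent
    earned that total score.
    """
    n_items = responses.shape[1]
    totals = responses.sum(axis=1)
    table = np.full((n_items, n_items + 1), np.nan)
    for s in range(n_items + 1):
        group = responses[totals == s]
        if len(group):
            table[:, s] = group.mean(axis=0)
    return table

# Toy data: 500 respondents, 5 items.
rng = np.random.default_rng(1)
resp = (rng.random((500, 5)) < 0.6).astype(int)
print(np.round(proportion_table(resp), 2))
```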

Table 3. Rasch model item parameters for the SPCI. SE stands for standard error.

Table 4. 2PL model item parameters for the SPCI. SE stands for standard error.

Table 5. 3PL model item parameters for the SPCI. SE stands for standard error.

Table 6. p-values for the Rasch, 2PL, and 3PL models. All statistically significant p-values are bolded and italicized.

Table 7. Yen’s Q3 statistic for each pair of items. Item parameters are estimated from the Rasch model. Cells with large values are highlighted.

Table 8. Yen’s Q3 statistic for each pair of items. Item parameters are estimated from the 2PL model. Cells with large values are highlighted.

Table 9. Yen’s Q3 statistic for each pair of items. Item parameters are estimated from the 3PL model. Cells with large values are highlighted.
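
Yen's (1984) Q3, used in Tables 7–9, is the correlation, taken across respondents, of the residuals u - P(theta) for each pair of items; values far from zero flag local item dependence. A minimal sketch on simulated, locally independent 2PL data, where Q3 should hover near zero:

```python
import numpy as np

def q3_matrix(responses, expected):
    """Yen's Q3 for every pair of items.

    responses: (n_respondents, n_items) 0/1 matrix.
    expected:  model probabilities of a correct response, same shape
               (in practice these come from the fitted model).
    Returns the item-by-item correlation matrix of the residuals;
    the diagonal is trivially 1 and is ignored in practice.
    """
    residuals = responses - expected
    return np.corrcoef(residuals, rowvar=False)

# Simulate three locally independent 2PL items (illustrative parameters);
# the simulation's true probabilities stand in for fitted values here.
rng = np.random.default_rng(2)
theta = rng.normal(size=1000)
a = np.array([1.0, 1.3, 0.8]); b = np.array([-0.5, 0.0, 0.6])
P = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
U = (rng.random(P.shape) < P).astype(int)
print(np.round(q3_matrix(U, P), 2))
```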

Abstract

Although concept inventories are among the most frequently used tools in the physics and astronomy education communities, they are rarely evaluated using item response theory (IRT). When IRT models fit the data, they offer sample-independent estimates of item and person parameters. IRT may also provide a way to measure students’ learning gains that circumvents some known issues with Hake’s normalized gain. In this paper, we review the essentials of IRT while simultaneously applying it to the Star Properties Concept Inventory (SPCI). We also use IRT to explore an important psychometrics debate that has received too little attention from physics and astronomy education researchers: What do we mean when we say we “measure” a mental process? This question leads us to address the provocative question that constitutes the title of this paper: Do concept inventories actually measure anything?
