
Abstract
Although concept inventories are among the most frequently used tools in the physics and astronomy education communities, they are rarely evaluated using item response theory (IRT). When IRT models fit the data, they offer sample-independent estimates of item and person parameters. IRT may also provide a way to measure students’ learning gains that circumvents some known issues with Hake’s normalized gain. In this paper, we review the essentials of IRT while simultaneously applying it to the Star Properties Concept Inventory. We also use IRT to explore an important psychometrics debate that has received too little attention from physics and astronomy education researchers: What do we mean when we say we “measure” a mental process? This question leads us to use IRT to address the provocative question that constitutes the title of this paper: Do concept inventories actually measure anything?
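The three logistic models reviewed in the paper (Sections 3.1–3.3) share a single item characteristic curve (ICC) form. A minimal sketch in Python, with illustrative parameter values that are not SPCI estimates:

```python
import math

def icc(theta, b, a=1.0, c=0.0):
    """Probability of a correct response under the 3PL model:
    P = c + (1 - c) / (1 + exp(-a * (theta - b))).
    With a = 1 and c = 0 this reduces to the Rasch model;
    c = 0 alone gives the 2PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Illustrative values (not estimates from the SPCI):
p_rasch = icc(0.0, b=0.0)                  # ability at item difficulty -> 0.5
p_2pl   = icc(1.0, b=0.0, a=2.0)           # steeper curve from discrimination a
p_3pl   = icc(-5.0, b=0.0, a=2.0, c=0.2)   # low-ability floor near guessing c
```

Here theta is the respondent's ability, b the item difficulty, a the discrimination, and c the guessing parameter, all on the logit scale used throughout the paper.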
We owe a special thanks to Derek Briggs, from whom we learned IRT and who drew our attention to many of the fundamental issues in theories of measurement. This article would not have been possible without his support and guidance. Ed Prather and Doug Duncan also contributed helpful comments and feedback. This material is based in part upon work supported by the National Science Foundation under Grant Nos. 0833364 and 0715517, a CCLI Phase III Grant for the Collaboration of Astronomy Teaching Scholars (CATS). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
1. INTRODUCTION
2. ISSUES WITH TRADITIONAL CONCEPT TEST ANALYSES
2.1. Issues with CTT
2.2. Issues with Learning Gains
3. IRT BASICS
3.1. The Rasch Model
3.2. The Two Parameter Logistic Model
3.3. The Three Parameter Logistic Model
3.4. Assumptions of IRT
3.5. Parameter Invariance
3.6. Estimating IRT Parameters
4. IRT ANALYSIS OF THE SPCI
5. LEARNING GAINS IN IRT
6. ASSESSING MODEL FIT
7. TESTING THE ASSUMPTIONS OF IRT
8. COMPARING THE RASCH, 2PL, AND 3PL MODELS
9. SUMMARY AND CONCLUSIONS
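Sections 2.2 and 5 contrast IRT-based gains with Hake’s normalized gain, which measures the fraction of the available improvement a student achieves. A minimal sketch (scores in percent):

```python
def hake_gain(pre_pct, post_pct):
    """Hake's normalized gain: g = (post - pre) / (100 - pre),
    with pre- and post-instruction scores given in percent."""
    if pre_pct >= 100.0:
        raise ValueError("normalized gain is undefined for a perfect pretest")
    return (post_pct - pre_pct) / (100.0 - pre_pct)

# Two students with the same 20-point raw improvement get different gains,
# because g depends on the pretest score -- one motivation the abstract
# cites for seeking an IRT-based alternative:
g_low  = hake_gain(20.0, 40.0)  # 0.25
g_high = hake_gain(70.0, 90.0)  # ~0.67
```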
Figures
The Rasch model ICCs for items 16 (solid green curve) and 17 (dotted blue curve)
The 2PL model ICCs for items 16 (solid green curve) and 17 (dotted blue curve)
The 3PL model ICCs for items 16 (solid green curve) and 17 (dotted blue curve)
A comparison of the difficulty parameters for each item for the Rasch (blue double triangles), 2PL (red diamonds), and 3PL models (green triangles). Error bars represent standard errors
A comparison of the discrimination parameters for each item for the 2PL (red diamonds) and 3PL models (green triangles). Error bars represent standard errors
The guessing parameters for each item for the 3PL model. Error bars represent standard errors
Rasch model (upper left), 2PL (upper right), and 3PL (bottom) ability estimates as a function of the percent correct on the SPCI both pre- (blue squares) and post-instruction (red diamonds). Error bars represent the standard errors. The error bars are suppressed in the 2PL and 3PL graphs for clarity
The standard error (dotted red curve) and test information (solid blue curve) as a function of ability for the Rasch model
The standard error (dotted red curve) and test information (solid blue curve) as a function of ability for the 2PL model
The standard error (dotted red curve) and test information (solid blue curve) as a function of ability for the 3PL model
The Wright map for the Rasch model ability and item difficulty estimates for the SPCI. Each bin is 0.2 logits wide and is labeled by the upper value of the bin (e.g., bin 1 includes logit values between 0.8 and 1). Pre-instruction ability estimates are shown in grey while post-instruction abilities are shown in white
IRT calculated gains as a function of Hake’s normalized gain. The upper left panel shows Rasch model gains, the upper right panel shows 2PL gains, and the bottom panel shows 3PL gains
The 2PL fit plot for item 23. The curve is the 2PL ICC. Each dot represents the proportion of respondents in an ability bin who gave the correct answer. The error bars represent the standard error of the ICC
The Rasch model fit plot for item 7. The curve is the Rasch model ICC. Each dot represents the proportion of respondents in an ability bin who gave the correct answer. The error bars represent the standard error of the ICC
The 3PL fit plot for item 5. The curve is the 3PL ICC. Each dot represents the proportion of respondents in an ability bin who gave the correct answer. The error bars represent the standard error of the ICC
The Rasch model difficulties of the 13 star properties items estimated without the other ten items versus the difficulties estimated with the other ten items. Error bars represent standard errors. The solid line is where the points should lie if unidimensionality holds, while the dashed line represents the line on which the points actually lie
The 2PL difficulties of the 13 star properties items estimated without the other ten items versus the difficulties estimated with the other ten items. Error bars represent standard errors. The solid line is where the points should lie if unidimensionality holds, while the dashed line represents the line on which the points actually lie
The 3PL difficulties of the 13 star properties items estimated without the other ten items versus the difficulties estimated with the other ten items. Error bars represent standard errors. The solid line is where the points should lie if unidimensionality holds, while the dashed line represents the line on which the points actually lie
Tables
CTT statistics for the SPCI, calculated using students’ pre-instructional responses only, then using post-instructional responses only. Items are presented in order of ascending value within each group
A table of proportions, such as the one shown here, can motivate parameter invariance. Each item receives its own row, and each total score receives its own column. Each cell represents the proportion of respondents with that total score who correctly answer that item. These proportions are used as estimates of the probability that a respondent with a given score will correctly answer a given item
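The table just described is straightforward to build from raw response data. A minimal sketch (the input format and function name are illustrative, not from the paper):

```python
from collections import defaultdict

def proportion_table(responses):
    """Build the proportion table that motivates parameter invariance.

    responses: list of 0/1 lists, one per respondent (columns = items).
    Returns {(item, total_score): proportion correct}: for each item (row)
    and total score (column), the fraction of respondents with that total
    score who answered that item correctly."""
    correct = defaultdict(int)
    count = defaultdict(int)
    for resp in responses:
        total = sum(resp)
        for item, answer in enumerate(resp):
            correct[(item, total)] += answer
            count[(item, total)] += 1
    return {key: correct[key] / count[key] for key in count}

# Tiny illustrative data set: 4 respondents, 2 items
table = proportion_table([[1, 0], [1, 1], [0, 0], [1, 0]])
```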
Rasch model item parameters for the SPCI. SE stands for standard error
2PL model item parameters for the SPCI. SE stands for standard error
3PL model item parameters for the SPCI. SE stands for standard error
values for the Rasch, 2PL, and 3PL models. All values are bolded and italicized
Yen’s Q3 statistic for each pair of items. Item parameters are estimated from the Rasch model. Cells with values are highlighted
Yen’s Q3 statistic for each pair of items. Item parameters are estimated from the 2PL model. Cells with values are highlighted
Yen’s Q3 statistic for each pair of items. Item parameters are estimated from the 3PL model. Cells with values are highlighted
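Yen’s Q3 is the Pearson correlation, across respondents, between two items’ residuals (the observed 0/1 response minus the model-predicted probability); large |Q3| values flag local dependence between an item pair. A minimal sketch of the correlation step, assuming the residuals have already been computed from fitted item parameters:

```python
import math

def yen_q3(resid_i, resid_j):
    """Yen's Q3 for one item pair: Pearson correlation between the two
    items' residual vectors (observed response minus predicted probability),
    computed across respondents."""
    n = len(resid_i)
    mean_i = sum(resid_i) / n
    mean_j = sum(resid_j) / n
    cov = sum((x - mean_i) * (y - mean_j) for x, y in zip(resid_i, resid_j))
    var_i = sum((x - mean_i) ** 2 for x in resid_i)
    var_j = sum((y - mean_j) ** 2 for y in resid_j)
    return cov / math.sqrt(var_i * var_j)
```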