Psychometrics Defined by Research

Goals of This Session • Brief wrap up of brown bags • Psychometrics defined through research • Broad historical perspective • Research framework • The parallel universe concept • Some current research here at Measured Progress • Some concluding remarks

Psychometric Brown Bags • All these brown bags have been introductory in nature • Eventually, these will be posted on our company website • Staff members • Clients • Teachers & parents

Psychometric Brown Bags • We have covered a lot of ground • Statistics, classical test theory, item response theory • Standard setting, equating, adaptive testing, DIF, skills diagnosis

Psychometric Brown Bags • If you found these talks interesting, let us know • Because our presentations have been introductory, there is a lot more we could present on

We can really define psychometrics from a variety of perspectives • Historical • Assessment program • Analyzing data here at Measured Progress • Research

Historical Perspective • The history of psychometrics has deep roots at the crossroads of psychology, physiology, and philosophy • Ultimately these disciplines are trying to better understand the human experience • Psychometrics does this by quantifying behavioral observations

Historical Perspective • Early psychometricians focused primarily on the quantification of intelligence • Psychometricians have also worked extensively on the application of psychometric models to assess patients within a clinical setting

Historical Perspective • Psychometrics is ultimately a very broad discipline • Psychometrics is an example of the blending of the social sciences with the quantitative sciences • Sociometrics • Econometrics

Research in Psychometrics • Because psychometrics is a broad discipline, there are many national and international research organizations and societies • This results in many: • Peer-reviewed journals • Conferences • Opportunities for research

Research Societies • American Educational Research Association • National Council on Measurement in Education • Psychometric Society • American Psychological Association • International Test Commission • Society for Industrial and Organizational Psychology

Psychometricians at Other Organizations • Again, because our discipline is broad, psychometricians work in a variety of places: • American Institute of Certified Public Accountants • National Board of Medical Examiners • Law School Admission Council • The RAND Corporation • Research Triangle Institute

Research at Other Organizations • Research Agendas • This tends to be a laundry-list approach of ideas that are not well connected • Products and Services • This is a narrowly focused method with a specific goal • Neither approach is resource friendly, and both lead to research programs that are not well orchestrated

Psychometric Research at Measured Progress • We wanted to come up with a different way of organizing and conducting research • Our approach is an attempt at: • Connecting research projects in meaningful ways • Allowing product-based research to be done in a cost-effective manner • Connecting research with products

Psychometric Research at Measured Progress • This approach also allows for external opportunities • Interns • Through other research institutes • Center for Assessment • Center for Advanced Studies in Measurement and Assessment • Center for Educational Research and Evaluation • The Research & Evaluation Methods Program • Visiting Scholars • Clients

Research Framework • Because all assessment programs have some common structure, any research project should fit somewhere in that structure. • Most research projects relate to more than one area. Still, a framework with separately delineated areas is helpful for organizing and discussing such research.

Research Framework • Design and Modeling • Statistical Analyses • Scoring and Reporting

Design and Modeling • Included in this category is research having to do with modeling the students, the assessment tasks, the interaction of the students with the tasks, or test-centered research • The focus is on the design or modeling of the test as a whole

Design and Modeling • Task modeling • Student modeling • Modeling Student-Task interaction • Test-centered modeling research • Test design • Test assembly

Statistical Analyses • Focus is on statistics used to evaluate the individual assessment tasks, and the overall assessment instrument with respect to the psychometric model applied to the test data. • This includes research on the calibration of psychometric models, model fit analyses, estimation of reliability, and validity analyses.

Statistical Analyses • Calibration and ability estimation • Interpretation of estimated parameters • item parameters and ability distribution • Model fit • Reliability and Generalizability • Validity • internal and external
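As one concrete instance of the reliability estimation listed above, here is a minimal sketch of coefficient alpha (Cronbach's alpha), a standard classical-test-theory reliability estimate. It is only an illustration: the function name and the toy score matrix are assumptions made for this example and do not come from any Measured Progress analysis.

```python
from statistics import pvariance


def coefficient_alpha(scores):
    """Coefficient alpha for a matrix of item scores.

    scores: list of examinee rows, each a list of item scores
            (hypothetical layout chosen for this sketch).
    """
    n_items = len(scores[0])
    # Variance of each item across examinees.
    item_vars = [pvariance([row[j] for row in scores]) for j in range(n_items)]
    # Variance of the total (summed) score across examinees.
    total_var = pvariance([sum(row) for row in scores])
    # alpha = k/(k-1) * (1 - sum of item variances / total-score variance)
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Toy data: 5 examinees by 3 dichotomous items (made up for illustration).
    toy = [
        [1, 1, 1],
        [1, 1, 0],
        [1, 0, 0],
        [0, 0, 0],
        [1, 1, 1],
    ]
    print(coefficient_alpha(toy))
```

The sketch uses population variances throughout; the ratio inside the formula is the same with sample variances, provided the same kind is used for items and totals.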

Scoring and Reporting • Here the focus is on how best to score assessment tasks and the assessment instrument as a whole. • This includes how to transform the observed scores and ability estimates from the psychometric model into useful and interpretable score reports

Scoring and Reporting • Observed scores, scaled scores, & IRT ability • Equating • Linking • Standard setting • Score Reports and Interpretive Guides
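To make the step from IRT ability to a reported scaled score concrete, here is a minimal sketch of a linear transformation with rounding and truncation to the reporting range. The slope, intercept, and score range are hypothetical values chosen for illustration; actual programs set these through their own scaling and standard-setting decisions.

```python
def scale_score(theta, slope=10.0, intercept=50.0, lo=20, hi=80):
    """Map an IRT ability estimate to a reported scaled score.

    slope, intercept, lo, and hi are assumed values for this sketch only.
    """
    raw = slope * theta + intercept   # linear transformation of ability
    rounded = round(raw)              # report whole-number scores
    return max(lo, min(hi, rounded))  # truncate to the reporting range


if __name__ == "__main__":
    for theta in (-5.0, 0.0, 1.23, 4.0):
        print(theta, scale_score(theta))
```

Truncation at the ends of the range keeps extreme ability estimates from producing scores outside the published scale.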

The Parallel Universe Concept

Parallel Universe • A research project will certainly fit somewhere in the Framework – it’s helpful for organizing different research projects. • But can the converse be true? Could the Framework fit into a research project? Could the Framework help organize the research project?

Parallel Universe • Sometimes a research project is better characterized as a research program. Research Program: A set of research projects organized around a common theme and intended to address most or all of the components listed in the Framework. • There are also other parallel universes besides research programs! For example, any one of our testing programs!

Parallel Universe Example: Skills Diagnosis Research Program • Design and Modeling • How does one design a test specifically for diagnostic purposes? What’s the psychometric model? Content specifications? IRT specs? • Statistical Analyses • Need new estimation methods for new models. Also new fit statistics. How do we estimate reliability? Internal validity stats? • Scoring and Reporting • How to report diagnostic scores?

Parallel Universe Example: A State Testing Program • Design and Modeling • Which psychometric model will be used? • How many items? How many subscores? • Statistical Analyses • What calibration software is used and in what way? • What kind of supporting statistical analyses will be done? DIF? Dimensionality? Validity? Reliability? • Scoring and Reporting • Design of the score report. • Statistics to be reported. • Interpretation of the scores

Current Research • Here are 8 of the 17 papers we’re presenting at the 2007 AERA/NCME Meeting in Chicago, how each fits in the Framework, and its possible relevance to real life.

Conditional item exposure in multidimensional adaptive testing. • Researchers: Matt Finkelman, Michael Nering, & Louis Roussos. • Framework: Design and Modeling • Modeling the item selection algorithm so as to prevent items from being over-exposed. • Application: There is interest in using CAT with multidimensional IRT, but current exposure control techniques won’t work in that setting.

Generalized Mathematical Formulation for Computing Inter-Rater Inconsistency for BOW, Bookmark, and Yes/No methods • Researchers: Abdullah Ferdous & Barbara Plake (Univ. of Nebraska) • Framework: Statistical Analyses and Scoring and Reporting • Standard setting raters are part of the scoring procedure • Provide statistical support for internal validity • Application: Can be used as part of the standard setting process to improve the quality of the ratings.

Use of Subset of Test Items in Bookmark Standard Setting • Researchers: Abdullah Ferdous • Framework: Scoring and Reporting • Standard setting is part of the scoring procedure • Application: Can be used to streamline the Bookmark standard setting procedure, saving money and time, and perhaps increasing reliability by reducing fatigue.

Using the DFIT framework to evaluate equating items • Researchers: Michael Nering & Wonsuk Kim. • Framework: Statistical Analyses • Support the internal validity of the equating items. • Application: A new method that may be more sensitive to ill-suited equating items than the method currently used.

Using Person Fit in a Body of Work Standard Setting • Researchers: Matt Finkelman & Wonsuk Kim. • Framework: Statistical Analyses (major) and Scoring and Reporting (minor) • Statistical support for selecting students to be used in the body of work standard setting method. • Application: A new method for detecting aberrant students who should be excluded from the BOW standard setting.

Development and evaluation of an effect-size measure for the DIMTEST statistic • Researchers: Minhee Seo (U. of Ill.) & Louis Roussos • Framework: Statistical Analyses • DIMTEST assesses test unidimensionality, giving statistical support for test internal validity. • Application: Testing programs want us to check dimensionality. DIMTEST is a reliable hypothesis test. An effect size measure much improves the interpretation of DIMTEST results.

Variations of Body of Work. • Researchers: Kevin Sweeney and Abdullah Ferdous. • Framework: Scoring and Reporting. • Body of Work is a standard setting method, which, of course, determines cut scores for a test. • Application: To improve the efficiency of BOW by reducing the time and work activities in preparing for and conducting the standard setting.

Detection of compromised items in personnel selection examination • Researchers: Yongwei Yang (Gallup), Abdullah Ferdous, & Katherine Chin (U. of Neb). • Framework: Statistical Analyses—Validity. • Old method looked at change in p-value over time; new improved method does this conditional on ability. • Application: Improves the efficacy of personnel selection by getting rid of compromised items. Can also be used to improve item bank for appropriate assessment programs (like CAT).
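The contrast between the old and new checks can be sketched as follows. This is not the authors' method: it is a simple stratified comparison, assumed here only to illustrate what "conditional on ability" buys you. Each record is a hypothetical (ability stratum, correct) pair, where the stratum could be, for example, a band of total scores on the remaining items.

```python
from collections import defaultdict


def marginal_shift(early, late):
    """Old-style check: change in the raw p-value between two time windows."""
    def p_value(records):
        return sum(correct for _, correct in records) / len(records)
    return p_value(late) - p_value(early)


def conditional_shift(early, late):
    """Change in p-value within ability strata, weighted by late-window counts."""
    def by_stratum(records):
        cells = defaultdict(lambda: [0, 0])  # stratum -> [n_correct, n_examinees]
        for stratum, correct in records:
            cells[stratum][0] += correct
            cells[stratum][1] += 1
        return cells

    e, l = by_stratum(early), by_stratum(late)
    shared = [s for s in l if s in e]  # only strata observed in both windows
    total = sum(l[s][1] for s in shared)
    if total == 0:
        raise ValueError("no ability stratum appears in both windows")
    return sum(
        l[s][1] * (l[s][0] / l[s][1] - e[s][0] / e[s][1]) for s in shared
    ) / total


if __name__ == "__main__":
    # Made-up data: within each stratum the item behaves identically in both
    # windows, but the late window contains more high-ability examinees.
    early = [("low", 1)] + [("low", 0)] * 3 + [("high", 1)] * 3 + [("high", 0)]
    late = [("low", 1)] + [("low", 0)] * 3 + [("high", 1)] * 9 + [("high", 0)] * 3
    print(marginal_shift(early, late))     # raw p-value rises
    print(conditional_shift(early, late))  # no change once ability is held fixed
```

In the toy data the raw p-value increases only because the examinee population changed, so an unconditional check would flag the item while the conditional one would not.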

Concluding Remarks • In Educational Assessment, psychometric research and practice are interdependent. • Good communication between research and practice is essential for the efficacy of both. • Our research ideas come directly from questions and problems that arise in practice. • Our Research Framework helps give structure and completeness to both our research and practice.
