Psychometrics Defined by Research
Goals of This Session • Brief wrap up of brown bags • Psychometrics defined through research • Broad historical perspective • Research framework • The parallel universe concept • Some current research here at Measured Progress • Some concluding remarks
Psychometric Brown Bags • All of these brown bags have been introductory in nature • Eventually, they will be posted on our company website for: • Staff members • Clients • Teachers & parents
Psychometric Brown Bags • We have covered a lot of ground • Statistics, classical test theory, item response theory • Standard setting, equating, adaptive testing, DIF, skills diagnosis
Psychometric Brown Bags • If you found these talks interesting, let us know • Because our presentations have been introductory, there is a lot more we could present
We can define psychometrics from a variety of perspectives • Historical • Assessment program • Analyzing data here at Measured Progress • Research
Historical Perspective • The history of psychometrics has deep roots at the crossroads of psychology, physiology, and philosophy • Ultimately these disciplines are trying to better understand the human experience • Psychometrics does this by quantifying behavioral observations
Historical Perspective • Early psychometricians focused primarily on the quantification of intelligence • Psychometricians have also worked extensively on the application of psychometric models to assess patients within a clinical setting
Historical Perspective • Psychometrics is ultimately a very broad discipline • Psychometrics is one example of blending the social sciences with the quantitative sciences; others include • Sociometrics • Econometrics
Research in Psychometrics • Because psychometrics is a broad discipline, there are many national and international research organizations and societies • This results in many • Peer-reviewed journals • Conferences • Opportunities for research
Research Societies • American Educational Research Association • National Council on Measurement in Education • Psychometric Society • American Psychological Association • International Test Commission • Society for Industrial and Organizational Psychology
Psychometricians at Other Organizations • Again, because our discipline is broad, psychometricians work in a variety of places: • American Institute of Certified Public Accountants • National Board of Medical Examiners • Law School Admission Council • The RAND Corporation • Research Triangle Institute
Research at Other Organizations • Research Agendas • This tends to be a laundry-list approach, with ideas that are not well connected • Products and Services • This is a narrowly focused method with a specific goal • Neither approach is resource friendly, and both lead to research programs that are not well orchestrated
Psychometric Research at Measured Progress • We wanted to come up with a different way of organizing and conducting research • Our approach is an attempt at: • Connecting research projects in meaningful ways • Allowing product-based research to be done in a cost-effective manner • Connecting research with products
Psychometric Research at Measured Progress • This approach also allows for external opportunities • Interns • Through other research institutes • Center for Assessment • Center for Advanced Studies in Measurement and Assessment • Center for Educational Research and Evaluation • The Research & Evaluation Methods Program • Visiting Scholars • Clients
Research Framework • Because all assessment programs have some common structure, any research project should fit somewhere in that structure. • Most research projects relate to more than one area. Still, a framework with separately delineated areas is helpful for organizing and discussing such research.
Research Framework • Design and Modeling • Statistical Analyses • Scoring and Reporting
Design and Modeling • This category covers research on modeling the students, the assessment tasks, and the interaction of the students with the tasks, as well as test-centered research • The focus is on the design or modeling of the test as a whole
Design and Modeling • Task modeling • Student modeling • Modeling Student-Task interaction • Test-centered modeling research • Test design • Test assembly
Statistical Analyses • The focus is on statistics used to evaluate the individual assessment tasks and the overall assessment instrument with respect to the psychometric model applied to the test data. • This includes research on the calibration of psychometric models, model fit analyses, estimation of reliability, and validity analyses.
Statistical Analyses • Calibration and ability estimation • Interpretation of estimated parameters • item parameters and ability distribution • Model fit • Reliability and Generalizability • Validity • internal and external
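As one concrete instance of the reliability estimation listed above, here is a minimal sketch of coefficient alpha (Cronbach's alpha), a standard classical test theory statistic. The function name and the tiny response matrix are made up for illustration and are not taken from the talk or from any Measured Progress analysis.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Coefficient alpha for a students-by-items matrix of item scores."""
    n_items = len(scores[0])
    # Variance of each item across students (population variance throughout).
    item_vars = [pvariance([row[j] for row in scores]) for j in range(n_items)]
    # Variance of the total (summed) score across students.
    total_var = pvariance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical data: 5 students (rows), 4 dichotomously scored items (columns).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Alpha is only one of several reliability estimates; it is used here because it needs nothing beyond item and total-score variances.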
Scoring and Reporting • Here the focus is on how best to score assessment tasks and the assessment instrument as a whole. • This includes how to transform the observed scores and ability estimates from the psychometric model into useful and interpretable score reports
Scoring and Reporting • Observed scores, scaled scores, & IRT ability • Equating • Linking • Standard setting • Score Reports and Interpretive Guides
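To make the equating entry above more concrete, the following is a rough sketch of linear equating, which maps a raw score on one form to the scale of another by matching means and standard deviations. It assumes the two forms were taken by equivalent groups; the function name and score lists are hypothetical and do not describe any particular program's equating design.

```python
from statistics import mean, pstdev

def linear_equate(x, form_x_scores, form_y_scores):
    """Map a Form X raw score onto the Form Y scale by matching mean and SD."""
    mu_x, sd_x = mean(form_x_scores), pstdev(form_x_scores)
    mu_y, sd_y = mean(form_y_scores), pstdev(form_y_scores)
    # A score z standard deviations above the Form X mean is placed
    # z standard deviations above the Form Y mean.
    return mu_y + (sd_y / sd_x) * (x - mu_x)

# Hypothetical raw scores from two randomly equivalent groups.
form_x = [10, 12, 14, 16, 18]
form_y = [13, 16, 19, 22, 25]
equated = linear_equate(16, form_x, form_y)
print(equated)
```

Operational programs more often use IRT-based or anchor-item designs; the linear method is shown only because it fits in a few lines.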
Parallel Universe • A research project will certainly fit somewhere in the Framework – it’s helpful for organizing different research projects. • But can the converse be true? Could the Framework fit into a research project? Could the Framework help organize the research project?
Parallel Universe • Sometimes a research project is better characterized as a research program. Research Program: A set of research projects organized around a common theme and intended to address most or all of the components listed in the Framework. • There are also other parallel universes besides research programs! For example, any one of our testing programs!
Parallel Universe Example: Skills Diagnosis Research Program • Design and Modeling • How does one design a test specifically for diagnostic purposes? What’s the psychometric model? Content specifications? IRT specs? • Statistical Analyses • Need new estimation methods for new models. Also new fit statistics. How do we estimate reliability? Internal validity stats? • Scoring and Reporting • How to report diagnostic scores?
Parallel Universe Example: A State Testing Program • Design and Modeling • Which psychometric model will be used? • How many items? How many subscores? • Statistical Analyses • What calibration software is used and in what way? • What kind of supporting statistical analyses will be done? DIF? Dimensionality? Validity? Reliability? • Scoring and Reporting • Design of the score report. • Statistics to be reported. • Interpretation of the scores
Current Research • Here are 8 of the 17 papers we’re presenting at the 2007 AERA/NCME Meeting in Chicago, how each fits in the Framework, and its possible relevance to real life.
Conditional item exposure in multidimensional adaptive testing. • Researchers: Matt Finkelman, Michael Nering, & Louis Roussos. • Framework: Design and Modeling • Modeling the item selection algorithm so as to prevent items from being over-exposed. • Application: There is demand for using CAT with multidimensional IRT, but current exposure control techniques won’t work in that setting.
Generalized Mathematical Formulation for Computing Inter-Rater Inconsistency for BOW, Bookmark, and Yes/No methods • Researchers: Abdullah Ferdous & Barbara Plake (Univ. of Nebraska) • Framework: Statistical Analyses and Scoring and Reporting • Standard setting raters are part of the scoring procedure • Provide statistical support for internal validity • Application: Can be used as part of the standard setting process to improve the quality of the ratings.
Use of Subset of Test Items in Bookmark Standard Setting • Researchers: Abdullah Ferdous • Framework: Scoring and Reporting • Standard setting is part of the scoring procedure • Application: Can be used to streamline the Bookmark standard setting procedure, saving money and time, and perhaps increasing reliability by reducing fatigue.
Using the DFIT framework to evaluate equating items • Researchers: Michael Nering & Wonsuk Kim. • Framework: Statistical Analyses • Support the internal validity of the equating items. • Application: A new method that may be more sensitive to ill-suited equating items than the method currently in use.
Using Person Fit in a Body of Work Standard Setting • Researchers: Matt Finkelman & Wonsuk Kim. • Framework: Statistical Analyses (major) and Scoring and Reporting (minor) • Statistical support for selecting students to be used in the body of work standard setting method. • Application: A new method for detecting aberrant students who should be excluded from the BOW standard setting.
Development and evaluation of an effect-size measure for the DIMTEST statistic • Researchers: Minhee Seo (U. of Ill.) & Louis Roussos • Framework: Statistical Analyses • DIMTEST assesses test unidimensionality, giving statistical support for test internal validity. • Application: Testing programs want us to check dimensionality. DIMTEST is a reliable hypothesis test. An effect-size measure greatly improves the interpretation of DIMTEST results.
Variations of Body of Work. • Researchers: Kevin Sweeney and Abdullah Ferdous. • Framework: Scoring and Reporting. • Body of Work is a standard setting method, which, of course, determines cut scores for a test. • Application: To improve the efficiency of BOW by reducing the time and work activities in preparing for and conducting the standard setting.
Detection of compromised items in personnel selection examination • Researchers: Yongwei Yang (Gallup), Abdullah Ferdous, & Katherine Chin (U. of Neb). • Framework: Statistical Analyses: Validity. • The old method looked at the change in an item's p-value over time; the new method does this conditional on ability. • Application: Improves the efficacy of personnel selection by getting rid of compromised items. Can also be used to improve the item bank for appropriate assessment programs (like CAT).
Concluding Remarks • In Educational Assessment, psychometric research and practice are interdependent. • Good communication between research and practice is essential for the efficacy of both. • Our research ideas come directly from questions and problems that arise in practice • Our Research Framework helps give structure and completeness to both our research and practice.
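The idea of tracking p-value change conditional on ability can be illustrated with a small sketch. This is not the authors' method, only an assumed simplification: examinees are pre-grouped into ability strata, and the item's proportion correct is compared between an early and a late testing window within each stratum. All names and data below are hypothetical.

```python
from collections import defaultdict

def p_values_by_stratum(records):
    """records: iterable of (ability_stratum, correct) pairs for one item."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_correct, n_total]
    for stratum, correct in records:
        counts[stratum][0] += correct
        counts[stratum][1] += 1
    return {s: c / n for s, (c, n) in counts.items()}

def conditional_drift(early, late):
    """Late-minus-early p-value for each stratum present in both windows."""
    p_early = p_values_by_stratum(early)
    p_late = p_values_by_stratum(late)
    return {s: p_late[s] - p_early[s] for s in p_early if s in p_late}

# Hypothetical responses to one item: (stratum, 1 = correct / 0 = incorrect).
early = [("low", 1), ("low", 0), ("low", 0), ("low", 0),
         ("high", 1), ("high", 1), ("high", 1), ("high", 0)]
late = [("low", 1), ("low", 1), ("low", 1), ("low", 0),
        ("high", 1), ("high", 1), ("high", 1), ("high", 0)]
drift = conditional_drift(early, late)
print(drift)
```

In this made-up data the item becomes much easier for low-ability examinees while staying unchanged for high-ability examinees, a pattern that an unconditional p-value comparison would blur into a single overall increase.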