
Critiquing Research in Educational Technology


Presentation Transcript


    1. Critiquing Research in Educational Technology Thomas C. Hammond TLT 470 Summer, 2008 Session 6

    2. Housekeeping

    3. The Big Question (impact of tech on learning) Ed tech research (good, bad, usefully bad) Research methods, paradigms Conceptual work for tonight Last session focused on an overview of research methods, using two studies. Tonight we focus on critiquing the field of ed tech, again referring to those two studies, plus new reading for today (Boster, 2006) and other material (my dissertation research).


    5. Constructs, variables, research questions utos, UTOS, *UTOS Lab vs. field vs. manipulated field Experiment vs. non-experiment vs. quasi-experiment Varying treatment over groups and time Interpretation: Internal and external validity Medical model of research vs. patterns in ed research Snapshot review: Boster, 2006 Let's quickly review this using the Boster, 2006 piece.

    6. What we missed: Measures / Observations Classic errors / critiques State of the field in ed tech: Crisis? Today Again, I'm not putting myself above any of this: I make the exact same mistakes, and this stuff isn't easy. I'm just hoping to build up (1) your ability to read research critically (through being aware of its limitations) and (2) your level of insight when talking about or using tech for instruction.

    7. Observations = Qualitative? Interview Document analysis Measures = Quantitative? Test Survey Observations / Measures These are actually equivalent, but I'll spread them out to make a point about methods.

    8. Why are these so critical? How do we know if they're any good? What effects are they able to discern? Observations / Measures
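One standard answer to "how do we know if they're any good?" is a reliability check. Below is a minimal sketch of Cronbach's alpha, the most commonly reported internal-consistency coefficient. The respondent and item data are simulated for illustration, not drawn from any study discussed tonight.

```python
# Cronbach's alpha: an internal-consistency check on a multi-item measure.
# Data are simulated: 100 respondents, 8 items all tapping one latent trait.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: rows = respondents, columns = test items."""
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - sum_item_var / total_var)

rng = np.random.default_rng(0)
ability = rng.normal(size=(100, 1))                     # latent trait
items = ability + rng.normal(scale=1.0, size=(100, 8))  # 8 noisy items
print(f"alpha = {cronbach_alpha(items):.2f}")  # high (~.9): items share one factor
```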

    9. Pre vs. post Tests of statistical significance Measures of effect size Suggestion of practical significance Measuring effects
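A minimal sketch of this pre/post workflow, using simulated scores rather than data from any study discussed here: a paired t-test for statistical significance, plus Cohen's d so a reader can judge practical significance.

```python
# Pre/post workflow: paired t-test (statistical significance) plus
# Cohen's d on the gain scores (effect size). Scores are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
pre = rng.normal(70, 10, size=30)        # pretest scores
post = pre + rng.normal(3, 8, size=30)   # small average gain

t, p = stats.ttest_rel(post, pre)
gain = post - pre
d = gain.mean() / gain.std(ddof=1)       # gain in standard-deviation units

print(f"t = {t:.2f}, p = {p:.3f}, d = {d:.2f}")
# p says the gain is unlikely to be chance; d says whether the gain
# is large enough to matter in practice.
```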

    10. Curriculum-focused tests vulnerable to Ceiling effects Low coefficients of reliability Diffusion of construct validity? Researcher-designed tests vulnerable to Lack of alignment to curriculum = teachers unhappy! Demoralization / motivation challenges from students Measures of student learning as a special problem
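To see why ceiling effects make such tests risky, here is a small simulation with invented numbers: when most students already score near the test maximum, capping at the maximum eats much of a real treatment gain.

```python
# Why ceiling effects hide gains: scores cannot exceed the test maximum,
# so the top of the distribution is compressed. All numbers are invented.
import numpy as np

rng = np.random.default_rng(2)
control = rng.normal(92, 10, 5000)   # control group already near the top
treated = rng.normal(99, 10, 5000)   # true gain of 7 points

max_score = 100
obs_control = np.minimum(control, max_score)
obs_treated = np.minimum(treated, max_score)

print(f"true gap:     {treated.mean() - control.mean():.1f}")          # ~7
print(f"observed gap: {obs_treated.mean() - obs_control.mean():.1f}")  # noticeably smaller
```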

    11. Finding the right mix of quant and qual Quant observes an effect Measure students and teachers, run descriptive stats Run inferential stats: Which differences are significant? Qual lets you discuss why it took place Context of curriculum & instruction, teacher behaviors Context of student work Measures of student learning as a special problem

    12. Mis-use of statistical tests Critiques / classic errors: Quant Analysis of 142 original articles appearing in the AHA's Circulation during 1975 (excluding certain types of studies). Chart from Glantz, S.A. (1980). Biostatistics: how to detect, correct and prevent errors in the medical literature. Circulation, 61, 1-7 (p. 2). Sampled all articles appearing in 1975 issues of the journal; excluded non-original pieces and some specialties (radiology, clinicopathologic correlations, and case reports). Sample error: using a t-test to compare more than two groups. The point is: the t-test is popular, but more often misused than not, at least in this sample. However, the author offers many other studies that support the same basic idea: in medical research, statistics are misapplied as often as not.
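Glantz's sample error is easy to demonstrate by simulation. The sketch below (entirely simulated data, no real study) draws five identical groups and shows that running all pairwise t-tests produces far more than 5% false positives, while a single one-way ANOVA stays near the nominal rate.

```python
# Simulating the classic error: with 5 identical groups, running all 10
# pairwise t-tests yields far more than the nominal 5% rate of
# "significant" results; one ANOVA over the same groups does not.
from itertools import combinations
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
trials, fp_ttest, fp_anova = 2000, 0, 0

for _ in range(trials):
    groups = [rng.normal(0, 1, 20) for _ in range(5)]  # no true differences
    if any(stats.ttest_ind(a, b).pvalue < 0.05
           for a, b in combinations(groups, 2)):
        fp_ttest += 1
    if stats.f_oneway(*groups).pvalue < 0.05:
        fp_anova += 1

print(f"pairwise t-tests: {fp_ttest / trials:.0%} false positives")  # ~25-30%
print(f"one-way ANOVA:    {fp_anova / trials:.0%}")                  # ~5%
```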

    13. Mis-use of statistical tests Failure to observe nested effects Critiques / classic errors: Quant This is a key aspect of doing large-scale education studies: any student effect is nested in a classroom / school / district / state. You can't just aggregate all the data.
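A minimal sketch of the nesting problem, with invented classroom data: treating every student as an independent observation yields a standard error that is far too small compared with a cluster-aware analysis. Here the cluster-aware version simply analyzes classroom means; a multilevel (mixed) model would be the fuller treatment.

```python
# The nesting problem: 250 students in 10 classrooms share classroom-level
# noise, so they are not 250 independent observations. Data are invented.
import numpy as np

rng = np.random.default_rng(4)
n_classes, n_students = 10, 25
class_effects = rng.normal(0, 5, n_classes)           # classroom-level variation
scores = np.array([c + rng.normal(0, 3, n_students)   # student-level variation
                   for c in class_effects])

# Naive: pool all students as if independent -- SE is far too small.
naive_se = scores.ravel().std(ddof=1) / np.sqrt(scores.size)
# Cluster-aware: one mean per classroom.
cluster_se = scores.mean(axis=1).std(ddof=1) / np.sqrt(n_classes)

print(f"naive SE:          {naive_se:.2f}")    # ~0.4
print(f"classroom-mean SE: {cluster_se:.2f}")  # much larger
```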

    14. Mis-use of statistical tests Failure to observe nested effects Inattention to effect size / practical significance Critiques / classic errors: Quant Turning to Kingsley, 2005, or rather the write-up provided by the vendor.

    15. Failure to examine sub-groups Prior knowledge Tracking LEP SES / at-risk / under-served students Critiques / classic errors: Design

    16. Failure to examine sub-groups Effects over long-term vs. short-term: Lee & Molebash, 2004 G1 = Google G2 = Archive G3 = selected documents and scaffold Critiques / classic errors: Design

    17. Failure to examine sub-groups Effects over long-term vs. short-term Gaps in interpretability due to mix of quant and qual Boster, 2006 Dynarski et al., 2007 Contrast: Brush & Saye 1999, 2001, 2002, 2004, 2005, 2006 Critiques / classic errors: Design I guess we also have a "one-and-done" quality that is a further limitation. Brush & Saye obviously are a counter-example, and when the second Dynarski et al. comes out, it will add longitudinal observations. Note: A good mix of quant and qual, in my opinion, makes a study always at least usefully bad. Pure quant or pure qual? Eh. Usually not good, but can be great, certainly.

    18. Example of tobacco company-funded research Accelerated Reader (Oppenheimer, 2003) Ignite! Learning studies Boster, 2006? Critiques / classic errors: Vendor-funded studies?

    19. Carbon monoxide studies Purely attitudinal outcomes Survey research Easy-to-do? State of the field There's some merit here, but the point is that too much carbon monoxide is harmful. Trace elements exist naturally, and it is produced for industrial purposes, but too much of it and you asphyxiate.

    20. Carbon monoxide studies Boutique studies Early Logo studies Idiosyncratic OLEs PBL strategies Trade-off of exploring the bleeding edge vs. ecological validity (practicality) State of the field Lots of these, and I don't want to name names. However, I find a lot of merit to these: someone has to play with the bleeding edge to find out what works.

    21. Carbon monoxide studies Boutique studies Sprague, 2005: Are we talking to ourselves? State of the field Again, this is why I like TPaCK: it forces one to color outside the lines, cross the streams, what have you.

    22. Carbon monoxide studies Boutique studies Sprague, 2005: Are we talking to ourselves? Scientifically-Based Research (NCLB) Persuasive research that empirically examines important questions using appropriate methods that ensure reproducible and applicable findings (Beghetto, 2003) State of the field

    23. Carbon monoxide studies Boutique studies Sprague, 2005: Are we talking to ourselves? Scientifically-Based Research (NCLB) Persuasive Empirical Important questions (Does it work? What was it?) Appropriate methods (Privileging RCTs / medical model?) Replicable and applicable findings (Limitation of qual studies) State of the field

From the article: Persuasive. This attribute refers to research that is moving from "tentative knowledge claims generated at local research sites to become stabilized and transformed into widely accepted facts" (Smith and others 2002). Appropriate research design, methods, and techniques; logic and reasoning; and replicable results can all help to establish persuasiveness. A critical element in persuasiveness is the peer-review process, in which researchers who have been trained in research methodology review and critique each other's work to help ensure that the methods used match the research questions and conclusions. Research findings published in a peer-reviewed journal can be assumed to have undergone careful scrutiny, been considered in light of alternative explanations, and deemed sufficiently "persuasive" by a panel of individuals with expertise in research methods.

Empirical. Research that is empirical is based on measurement or observation, that is, experienced "through the senses" (NRC 2002). For example, research that measures or observes the impact of school vouchers on student achievement would be considered empirical. However, there are certain questions that cannot be addressed by empirical investigations (NRC), such as "Should school voucher programs be enacted in my state?" Questions involving "should" are typically addressed through means other than observation and measurement.

Important Questions. This refers to questions addressed by research that build upon, add to, fill a void in, or otherwise clarify what is known and practiced. The NRC explains that the importance of a question is often determined by its relationship to prior research, theory, and relevance to policy and practice.

Appropriate Methods. This refers to the use of designs, methods, and techniques that fit the nature of the question the study is attempting to answer. However, no research design, method, or analytic technique on its own makes a study or program of research scientific (NRC). If the question pertains to "Does it work?," then randomized experiments or quasi-experiments are most appropriate (Raudenbush 2002, Coalition for Evidence-Based Policy). Simply stated, randomized experiments involve randomly assigning individuals, schools, or districts to a group that receives a particular intervention (such as class-size reduction) and to a group that does not. In contrast, if the question pertains to "What was the 'it'?," then qualitative methods (such as the case study) are most appropriate (Erickson and Gutierrez 2002). Among other things, qualitative methods provide "up-close descriptions" of what is, or is not, working; how interventions are working; and what might be facilitating or impeding the effectiveness of a particular intervention (Raudenbush).

Replicable and Applicable Findings. In general, this attribute refers to consistent, meaningful findings. The research presents sufficient detail to allow for "replication or, at a minimum, ... the opportunity to build systematically on their findings" (NCLB 2002). Such findings are understandable, accessible, and applicable to a wide audience (Comprehensive School Reform Program Office). For example, a program of research should be designed and conducted to ensure that school leaders across the nation have a solid sense of whether they can expect to see similar results from implementing a school-reform program that has demonstrated increased student learning in another state.

    24. Carbon monoxide studies Boutique studies Sprague, 2005: Are we talking to ourselves? Scientifically-Based Research (NCLB) Are we in the wrong paradigm? (Reeves, 1993; Shaver, 2000) Scientific question: Why does this happen? Engineering question: Is it useful? State of the field

    25. We see your gold standard and raise you a platinum standard (Schrum et al., 2005) Here are specific kinds of studies we need to do (Roblyer, 2006): Establish relative advantage Improve implementation strategies Monitor impact on important societal goals Monitor and report on common uses and shape desired directions Reading for next week Note that Roblyer laid out the idea in 2005 and is writing a series of research highlights. I'm asking you to read her example of a type 2 study.

    26. Can you describe its design? Methods? How does this stack up? Quant fumbles Design fumbles More carbon monoxide? A boutique study? Talking only to IT people? SBR-ready? Hopelessly positivist? Time permitting: Tearing into my work

    28. Treatment: Instruction, plus end-of-unit project; different topics per project.

    29. Quantitative data All are teacher-derived tests, derived from the CG, with items aimed to mirror end-of-year SOLs. Unfortunately, pre- and post- items are NOT included on the unit tests! Same content, but different questions. On the positive side: this really reduces the threat of carry-over. On the negative side: assuming construct validity from pre/post to unit requires a leap of faith. Variable coefficients of reliability: Sem pre-post ranges from .24 to .58; end-of-unit tests range from .4 to .8.
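To give a feel for what reliabilities in the .24 to .58 range cost, here is a quick illustration of Spearman's attenuation formula, r_observed = r_true * sqrt(rel_x * rel_y). The true correlation of .50 is a made-up value for illustration only.

```python
# Spearman's attenuation formula: unreliable measures shrink the
# correlation you can observe. r_true = .50 is a made-up illustration.
r_true = 0.50
for rel in (0.24, 0.58, 0.80):
    r_obs = r_true * (rel * rel) ** 0.5   # same reliability at both ends
    print(f"reliability {rel:.2f} -> observed r = {r_obs:.2f}")
# .24 -> .12, .58 -> .29, .80 -> .40: at a reliability of .24,
# most of a true relationship disappears from the data.
```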

    30. Qualitative data Classroom obs = 80 class periods Student responses = approx 1000 total Student projects = approx 80 total

    34. Link to 271

    35. Point of including this slide: As we can see on the left, lots of off-task activity with backgrounds. As we can see from the featured slide: a content-inappropriate image. (Clip art suggests military circa the Crimean War, or pre-Civil War; content = Spanish-American War, or shortly before World War I.)

    36. Point of this slide: info copied from Wikipedia.

    37. The especially interesting bit to me is the info addressed on the semester exam count.

    38. What's due Wednesday? How is it going to get done? Where is it to be posted? What level of assistance / oversight will the instructor provide? Closure
