Unveiling Student Evaluation Misconceptions

Student Evaluation Questionnaires: Dispelling Misconceptions and Looking at the Literature. Aline Germain-Rutherford, Ph.D., University of Ottawa

“In Antioch in AD 350 … any father who was dissatisfied with the instruction given to his son could file a formal complaint to a panel of teachers and laymen, and ultimately transfer his son to another teacher if the teacher could be shown to have neglected his duty” (Doyle, 1983, quoted by Marsh, 1987)

Then …
• Harvard, 1926
• 1,055 studies between 1976 and 1984
• H. H. Remmers, first systematic research in 1927

Three distinct periods (Bernard, 2002)
• 1970-1980: formative evaluation
  - looking at indicators of teaching effectiveness
• 1980-1990: administrative use of evaluation
  - examining the validity of student ratings
  - looking for multiple ways to conduct a comprehensive evaluation
• 1990-2000: the teaching dossier
  - the impact and effectiveness of students’ evaluations

The question of accountability
“Market forces, responding to society’s perception of the failure of traditional higher education, brought forth the rapid emergence and success of private online educational institutions, as well as corporate universities” (Arreola, 2002)

The effectiveness of students’ evaluations: students’ perceptions
• Evaluation is important for improving teaching and students’ education
But
• It has little impact on students’ education
• It has little influence on faculty practices
• Faculty don’t read students’ comments
• They are not considered for promotion
• Universities are not interested in teaching
• Teaching is not very important to faculty
• Evaluation is important but not useful
• It is a sham exercise (« opération bidon »)
(Bernard, 2002)

For what purpose?
• A measure of teaching effectiveness to be used in administrative decision-making (promotion, tenure, contract renewal)
• Diagnostic feedback to faculty about the effectiveness of their teaching, useful for improving it (formative evaluation)
• Information for students to use in selecting courses and instructors
• A measure of the quality of the course, to be used in course improvement and curriculum development
• An outcome or process description for research on teaching

Are they biased?
• 72% … the course difficulty
• 68% … grading leniency
• 63% … instructor popularity
• 62% … student interest in subject before course
• 60% … course workload
• 60% … class size
• 55% … reason for taking the course
(Marsh & Overall, 1979)

What is the perception of faculty?
• All faculty must be evaluated
• All courses must be evaluated
But
• Students should not be the only source of information
• Faculty must be allowed to take part
• Evaluation helps to improve teaching
• Students’ written comments are an interesting source of information but also an outlet for venting (« lieu de défoulement »)
• Questionnaires are not a good tool for evaluation
(Bernard, 2002)

The most common questions, beliefs and concerns…
1. Aren’t student ratings just a popularity contest, based on the instructor’s style rather than on the content of their delivery?
2. Aren’t student rating forms just plainly unreliable and invalid? Don’t students’ evaluation questionnaires provide only inconclusive evidence about teaching effectiveness?
3. Aren’t students too immature, inexperienced and capricious to make any consistent judgments about the instructor and instruction?
4. Don’t students have to be away from the course, and possibly from the college/university, for several years before they are able to make correct judgments about the instructor and instruction?
5. Isn’t it true that I can “buy” good student ratings just by giving easy grades?

6. Isn’t it true that students who are required to take a course tend to rate it more harshly than those taking it as an elective?
7. Do majors in a course rate it differently than non-majors?
8. Isn’t it generally easier to get good ratings in upper-year courses?
9. Isn’t there a gender bias in student ratings? Don’t female faculty tend to get lower ratings than male faculty?
10. Isn’t it more difficult for Math and Sciences faculty to get good ratings? What is the generality of student ratings across different courses taught by the same instructor? Is there an effect of the specific course being taught on the rating of the instructor?

11. Isn’t it true that the size of the class affects student ratings?
12. Does the time of day the course is taught affect student ratings?
13. Does the rank of the instructor (instructor, assistant professor, associate professor or professor) affect student ratings?
14. Does the workload/difficulty level of a course cause a potential bias in student ratings?
15. What good are student ratings in an effort to improve instruction?
16. Is the students’ evaluation questionnaire the appropriate tool to measure the instructor’s impact on student learning?
17. Isn’t it true that the only faculty who are really qualified to teach or evaluate their peers’ teaching are those who are actively involved in conducting research in their field?

What does the research tell us …
1. Aren’t student ratings just a popularity contest, based on instructor style rather than on the content of their delivery?
• Teaching is multifaceted.
• The design of instruments to evaluate teaching effectiveness should reflect this multidimensionality.

“A large item pool was obtained from a literature review, instruments in current usage, and interviews with faculty and students about characteristics which they see as constituting effective teaching. Then, students and faculty were asked to rate the importance of items, faculty were asked to judge the potential usefulness of the items as a basis for feedback, and open-ended student comments on pilot instruments were examined to determine if important aspects had been excluded. These criteria, along with psychometric properties, were used to select items and revise subsequent versions. This systematic development constitutes evidence for the content validity of SEEQ [Students’ Evaluations of Educational Quality] and makes it unlikely that it contains any irrelevant factors.” (Marsh, 1987: 266)

2. Aren’t student rating forms just plainly unreliable and invalid?
Other criteria to be examined:
• Ratings of former students
• Changes in student behaviours
• Instructors’ self-evaluations
• Evaluations by peers and/or administrators
• Frequency of occurrence of specific behaviours observed by trained observers
• Effects of experimental manipulations

“There is warrant for ascribing validity to student ratings not merely as measures of students’ attitudes toward instructors for which validity and reliability are synonymous but also as measured by what students actually learn of the content of the course” (Remmers et al., 1949); “undergraduate judgment as a criterion of effective teaching … can no longer be waved aside as invalid and irrelevant” (Remmers, 1950)

• Study the environment (group of students, room, schedule, resources, etc.)
• Study the students’ characteristics
• Peer, chair and alumni ratings
• Examination of course outlines
• Teaching portfolio and self-assessment
(Bernard, 2002)

3. Aren’t students too immature, inexperienced and capricious to make any consistent judgments about the instructor and instruction?
4. Don’t students have to be away from the course, and possibly from the college/university, for several years before they are able to make correct judgments about the instructor and instruction?
Student ratings are quite stable, and added perspective does not alter the ratings given at the end of a course.

5. Isn’t it true that I can “buy” good student ratings just by giving easy grades, that is, that instructors who inflate grades are rewarded with positive evaluations?
“The influence of student motivation upon student performance, grades, and satisfaction [ratings] appears to be a more potent contributor to the covariation between grades and satisfaction than does the direct contaminating effect of grades upon student satisfaction.” (Howard & Maxwell, quoted by Marsh, 1987)

6. Isn’t it true that students who are required to take a course tend to rate it more harshly than those taking it as an elective?
7. Do majors in a course rate it differently than non-majors?
Reason for taking the course:
• major requirement
• major elective
• general interest
• general education requirement
• minor/related field
• other

8. Isn’t it generally easier to get good ratings in upper-year courses?
First-year students tend to rate a course more harshly than second-year students, second-year students more harshly than third-year students, and so on. Graduate-level courses tend to receive slightly higher ratings.
9. Isn’t there a gender bias in student ratings? Don’t female faculty tend to get lower ratings than male faculty?
Gender has no effect on student ratings: male and female instructors are rated the same way.

10. Isn’t it more difficult for Math and Sciences faculty to get good ratings? What is the generality of student ratings across different courses taught by the same instructor? Is there an effect of the specific course being taught on the rating of the instructor?
Humanities and arts are usually rated higher than social sciences, and social sciences higher than physical sciences and engineering.
“Students’ evaluations primarily reflect the effectiveness of the instructor rather than the influence of the course, and some instructors may be uniquely suited to teaching some specific courses.” (Marsh, 1987: 278)

11. Isn’t it true that the size of the class affects student ratings?
Class size is moderately correlated with the most logically related dimensions (Group Interaction and Individual Rapport), but not with the other dimensions, nor with the overall ratings of the course or the instructor.
12. Does the time of day the course is taught affect student ratings?
There is too little research on this aspect; the research that has been done finds no time-of-day effect on student ratings.

13. Does the rank of the instructor (instructor, assistant professor, associate professor or professor) affect student ratings?
There is no relation between rank and global ratings, but a slightly positive correlation with the dimension “Breadth of Coverage” and a slightly negative correlation with the dimension “Group Interaction”.
14. Does the workload/difficulty level of a course cause a potential bias in student ratings?
Workload/difficulty is positively correlated with student ratings: in pairs of courses taught by the same instructor, courses perceived to have a higher level of workload/difficulty also received higher ratings.

15. What good are student ratings in an effort to improve instruction?
• Global items can be used for summative administrative decisions about teaching.
• Students’ comments are more useful than class-average ratings as diagnostic feedback for the instructor.
But
• Only instructors who already have an interest in improving their teaching will use the students’ comments.

For student feedback to lead to improvement:
• The instructor has to obtain new information about his/her teaching
• He/she has to value this feedback
• He/she has to know how to make changes
• He/she has to be motivated to make them
(Bernard & Bourque, 1997; Centra, 1993)

“The key finding that emerges here is that student ratings can be used to improve instruction if used as part of a personal consultation between the faculty member and a faculty development resource person” (Arreola, 2002).

16. Is the students’ evaluation questionnaire the appropriate tool to measure the instructor’s impact on student learning?
“The assumption that effective teaching and student learning are synonymous is unwarranted. A more reasonable assumption that is consistent with the construct validation approach is that student learning is only one indicator of effective teaching, even if a very important indicator” (Marsh, 1987)

17. Isn’t it true that the only faculty who are really qualified to teach or evaluate their peers’ teaching are those who are actively involved in conducting research in their field?
Peer ratings are “1) less sensitive, reliable and valid, 2) more threatening and disruptive of faculty morale and 3) more affected by non-instructional factors such as research productivity” than student ratings (Murray, 1980).

To conclude …
A review of the literature indicates that students’ evaluation of teaching effectiveness questionnaires, if well designed, properly administered and their results carefully interpreted, are valid and reliable tools. They can provide valuable formative information to the instructor for teaching enhancement and summative information to the administration for personnel decisions.

An integrated strategy: one in which faculty development, the teaching evaluation and teaching improvement system, and the recognition of teaching (valorisation) are all interconnected and interdependent (Bernard & Bourque, 1997).
