Measuring & Evaluating Teaching

An Informal Summary of University Policies, Procedures, and Resources for Undergraduate Instruction

Defining and measuring effective teaching

Effective teaching can be defined, very simply, as activities that promote student learning. It encompasses all of those instructor behaviors that foster student learning of the instructor’s and/or of the institution’s educational objectives. Ideally, the students will also share at least some of these objectives. This definition of effective teaching includes curriculum and course development, advising, and supervision of student research as well as classroom performance. Given this broad definition, no single approach is sufficient for evaluating effective teaching. Rather, student ratings, self-reviews, peer evaluations, and objective criteria such as student performances and improvements are all useful tools for evaluating different aspects of teaching.

Table 1.1 below indicates some important sources of data that can be used to measure effective teaching. The sources fall into three main types: students, peers, and the instructor him/herself (through self-reflection). Since measuring teaching is clearly not an exact science, the more varied the data sources, the more useful the measurement is likely to be.

Since there is a great deal of focus on student evaluations used to assess teaching at UCLA, much of the remainder of this guide pertains to the implementation of this particular measure. However, the other two sources of data can provide valuable additional insight, and should be considered as part of any comprehensive approach.
 

Sources of information about teaching

The following is a list of some important sources of information about teaching and their main advantages and disadvantages for evaluation purposes.

Students

Systematic student evaluations

These are very important for a global picture of the course. The students are the ones doing the learning, so their perceptions matter. Their responses often highlight strengths and weaknesses. However, students are not subject matter experts, and their ratings are sometimes influenced by their own motivations, attitudes, and needs.

Interviews with students

This is a very useful evaluation procedure which can yield much information in a short time. A group of students from a course is interviewed by other faculty about their experience in the course. A structured format is followed and typically, a consensus view of the nature of the course -- its strengths, weaknesses, and problems -- emerges in 15 to 20 minutes. The difficulties with this technique are the training needed to conduct the interviews and report the results, and the selection and recruitment of the sample of students interviewed.

Long-term follow-up of students

Surveys or interviews with seniors and alumni can yield information based on a wider context of university and life experience than the usual end-of-course student survey provides. However, reaching alumni can be difficult, so response rates are often low.

Peer Review

Classroom visits

Visits by other faculty can provide information about the process of teaching. However, correct use of this procedure is time consuming. It is best done when training can be provided, and two or more visits can be arranged by at least two observers. In addition, this technique is most effective when prefaced by a discussion between the instructor and observer regarding the goals of the class.

Colleague evaluation of materials

Colleagues have the expertise to evaluate the quality of a course as evidenced by its content and format. Colleagues can also evaluate student achievement as indicated by performance on exams and papers.

Self reflection

Teaching activities, reports and self-reviews

The instructor’s statement of his/her goals for the course, teaching methods and philosophy, student outcomes, and plans for improvement are a critical source of information. Oftentimes, there may be external factors, bad classes, difficult teaching problems, and experiments with innovative teaching techniques (which may lower ratings initially before ultimately raising them) on which only the instructor can reflect. A systematic self-review has the potential to contribute significantly to the instructor’s teaching improvement by focusing on the strengths and weaknesses of the course in light of his/her original course objectives. If the instructor notes broad shifts from the course’s original objectives, it may lead to a reassessment of methodological approaches when drafting future courses.

Measures of student achievement

When appropriate tests are available, measures of student learning are a prime criterion of effective teaching. However, valid direct measures of student learning require considerable developmental effort. Also, interpretation of achievement tests requires some comparable measures of student motivation and interest. There are a number of informal assessment techniques which may be employed to gather this information. (See for example, Angelo and Cross, 1993.)
 

Data from students: Systematic student evaluations

The student evaluation system at UCLA

The Office of Instructional Development’s Evaluation of Instruction Program (EIP) helps to assess and improve teaching at UCLA by providing instructor evaluation services for instructors and TAs across campus. At the end of each quarter, instructors have the opportunity to solicit formal evaluations from students enrolled in their classes. EIP distributes, collects, and processes evaluation forms via a network of departmental coordinators. Interested instructors should first contact their department evaluation coordinator, or may reach the central Evaluation of Instruction Program office at 310-825-6939 (http://www.oid.ucla.edu/assessment/eip).

EIP’s standard evaluation forms, which cover most teaching situations for faculty and TAs, were designed in consultation with faculty committees and national experts on assessment, and drew on recommendations from UCLA and System-wide surveys of faculty, TAs, and students. While most departments on campus use the standard campus-wide forms, a few departments and schools have devised their own standard forms that are distributed and administered by EIP. The Faculty Consultation Service can work with departments and individuals to develop questionnaires for special needs.
 

Policy on data

The Evaluation of Instruction Program seeks to provide as secure an environment for the data as it can. The forms are stored for processing in a physically safeguarded location. Only the numerical scores are compiled centrally. In addition to individual instructor scores, larger collective comparison data (e.g., for Divisions or Schools) are also calculated. These compiled individual and comparison data are returned to Department Chairs along with the original student forms.

Each Department may devise its own policy on how data are presented, distributed, or made openly available. Most often, the forms are returned to the instructor so that the written comments may be read firsthand. Some departments choose to transcribe the comments for small seminars to protect the identity of the students. The numerical data are provided by the Evaluation of Instruction Program in electronic format, and include both departmental scores and appropriate comparison data. The Evaluation of Instruction Program does not release individual instructor data except to the Department or to the individual instructor. External requests for such data are referred to the Departments. External requests for comparison data (without any individuals identified) are generally granted.
 

Retaining data

Most departments develop a reliable system for storing numerical data from teaching evaluations. It is less common for departments to retain written comments. Because such data are often used when compiling dossiers for personnel decisions, faculty should be careful to keep copies of their own evaluations. Even normally reliable systems sometimes have unexplainable lapses, and it can be extremely difficult (if not impossible) to re-establish such data after the fact. In addition, it may be useful to annotate records with information that might provide insight into any anomalous results (e.g. “the class was scheduled into a room that was frequently impacted by construction noise,” or “it was my first attempt to develop group projects and they did not work the way I had hoped.”)

Some departments retain only overall ratings; again, instructors are better advised to keep data that encompass all the individual items on the form. Such information can often help explain why certain classes were rated higher or lower.

Most teaching evaluation data are used within six years of the time they were collected. Based on actual requests to re-establish older records, however, eight years provides a more certain time frame. Comparison data, such as ratings of other instructors in the department or the overall University mean, should likewise be kept to provide a basis for comparison should later disputes arise. Faculty who are nominated by their departments for teaching awards also find some of the written comments useful in documenting what students find particularly compelling about their teaching.

Disputed data

In the infrequent situation that the integrity of the data is disputed (e.g., if forms are intercepted by the instructor or the chair before processing, or considerably more forms are returned than the number of students enrolled in the course), the known facts are forwarded to the Committee on Teaching for any further consideration of action. Other Senate Committees may become involved as appropriate.
 

Student questionnaire administration procedures

Since student ratings are sensitive to a wide variety of situational factors, it is very important that questionnaire administration procedures are consistent. The goal is to get candid feedback focused on instructional issues from as many students in the course as possible. The following general guidelines for collecting student ratings may help instructors achieve this goal:

Protect student anonymity

It is important that students not be identified in any way with their evaluations; otherwise, less than candid responses are likely. In particular, students may fear that an adverse rating will negatively affect their course grade. It is, of course, difficult to maintain confidentiality of student raters in small classes and individual study courses. There is no simple solution to this problem. One option is to have the departmental coordinator type up the responses in such cases.
 

Have a third party administer evaluations

In order to protect anonymity, evaluation questionnaires should not be administered by instructors or TAs. Rather, a responsible student or the department evaluation coordinator should be appointed to collect the completed questionnaires and deliver them to the department office.
 

Time questionnaire administration appropriately

Ratings should be collected during the last two weeks of the quarter. Students should be forewarned that evaluations will be done on a certain date so that they will be in class and will be prepared. Administration of evaluations at the final exam or other test is not recommended.
 

Emphasize the importance of evaluation

It is advisable to give students some context for the evaluation, especially for first-year students. It is useful for them to know that the department and instructor value their comments, and to understand the use to which the comments will be put. Distributing questionnaires at the beginning of the class period and allowing sufficient time for students to complete them both contribute to the sense of importance placed upon students’ opinions, and are hence likely to produce more constructive results.
 

Interpreting quantitative student evaluations for a course

Several weeks after the end of each quarter, instructors will receive their students’ responses to the formal questionnaires, along with a summary sheet of statistical information, including mean, median, and standard deviation of the questionnaire responses. Instructors should keep the following points in mind when interpreting the results:
 

Procedures

Were reasonable procedures (such as suggested in the preceding section) used to collect the ratings?
 

Sample size/response rate

How many students in the class completed the questionnaires? Responses from fewer than two-thirds of the class should be viewed with caution. The minimum number of student raters in a class should be 8 to 10; a sample of 15 or more students increases the reliability and generalizability of the results. It should be noted, however, that even in such “small sample” situations the qualitative comments may still be extremely valuable.
 

Making comparisons

The average ratings can be interpreted according to an absolute scale or relative to the ratings of other courses and instructors. For example, a mean rating of 6.5 on a 9-point scale for overall course evaluation may seem “above average.” Taking them at their word, students rated this course as adequate. It may be, however, that 70 percent of similar courses are rated above 6.5. Thus, relative to other courses, the 6.5 rating was in the lower third. Which interpretation is correct? The answer is that both interpretations are useful. The course was judged positively, but students were not particularly excited by it.
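
To make the absolute-versus-relative reading above concrete, the short sketch below (written in Python purely for illustration; the comparison means are invented numbers, not EIP data) contrasts a 6.5 course mean with a hypothetical set of similar-course means.

    # Illustrative sketch only: comparison_means are invented values for similar
    # courses on the same 9-point scale; this is not an EIP report or tool.
    comparison_means = [5.8, 6.2, 6.5, 6.9, 7.1, 7.3, 7.5, 7.7, 7.9, 8.1]
    course_mean = 6.5

    # Absolute reading: the rating sits above the scale midpoint, so it looks positive.
    scale_midpoint = (1 + 9) / 2

    # Relative reading: the share of comparison courses rated at or below this course.
    percentile = 100 * sum(m <= course_mean for m in comparison_means) / len(comparison_means)

    print(f"Absolute: {course_mean} versus scale midpoint {scale_midpoint}")
    print(f"Relative: roughly the {percentile:.0f}th percentile of comparison courses")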

 

The campus context

In making comparisons, it may be helpful to consider the campus context. Means do vary considerably between departments and divisions, or between different kinds of courses. Departments are, therefore, encouraged to keep records regarding their own norms.

Variability of responses

The variability of student responses is important diagnostic information. For example, consider an average course rating of 7 on a 9-point scale with a small standard deviation, say of 1.0 or so. This means that most students rated the course as 6, 7, or 8. On the other hand, the same average rating with a larger standard deviation, say 3.4, indicates a greater range of ratings that may suggest problems with the course. It is also important to look at the total distribution of ratings. For example, there are sometimes bimodal distributions of ratings in which a large group of students really liked the course and another large group of students disliked it. This could indicate two different ability or interest groups in the class, which would be worth exploring further for future iterations of the course.
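
For instructors who wish to compute these diagnostics themselves, the short sketch below (Python, with invented ratings on the 9-point scale; not part of the EIP summary sheet) reports the mean, the standard deviation, and the full frequency distribution, so a bimodal pattern is visible rather than hidden behind the average.

    # Illustrative sketch only: the ratings are invented, not real evaluation data.
    from collections import Counter
    from statistics import mean, stdev

    ratings = [2, 2, 3, 3, 3, 7, 8, 8, 8, 9, 9]   # hypothetical responses, 9-point scale

    print(f"mean = {mean(ratings):.1f}, standard deviation = {stdev(ratings):.1f}")

    counts = Counter(ratings)
    for score in range(1, 10):   # print the whole distribution, 1 through 9
        print(f"{score}: {'*' * counts[score]}")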

The importance of other variables

Although student response to a course is important for evaluation and teaching improvement, it should not be used as the only measure of teaching. Student ratings are affected by many variables including course content, amount of work, expected grades, class size, and students’ own needs and attitudes.
 

Interpreting quantitative student evaluations for personnel decisions

Faculty are often concerned about how evaluation results will be used by their departments in administrative personnel decision-making. Although details of the process differ among departments, the substantial body of research literature on the use of student course and instructor evaluations suggests that the following practices be observed:

  • Guidelines for administration and explanations of how ratings will be used should be consistent and well-publicized.

  • Small differences between scores should be kept in perspective.

  • Multiple sets of student ratings should be used in administrative decision-making.

  • Global ratings (i.e., overall ratings of the instructor) are more often used than specific items (such as the instructor’s organization or communication skills) for making personnel decisions. While this may be the most convenient measure, decision-makers might note that global ratings are also those most likely to reflect personal bias on the part of students.

  • Ratings for any single instructor or course should be considered in conjunction with university, college, division, department, and even specific course norms.

  • Multiple sources of information should be used in administrative decision-making. In other words, numerical ratings should be only one piece of the larger picture.

Further help for faculty concerned about student evaluations

Perhaps the most common concern that faculty members express about the evaluation process is that students do not take the evaluations seriously and that students are not aware of the gravity of their input into the tenure and merit review process. Research and experience show that instructors who openly announce to students that they themselves take student input seriously are usually the recipients of the most constructive comments.

The vast body of research on student ratings that has accumulated over the years also suggests that student ratings correlate highly with the ratings conducted by one’s own colleagues and even, in some instances, with self-ratings. The high correlation holds up, however, only when students are asked to judge selected aspects of teaching. Students as consumers are well-equipped to assess the clarity of presentation, to comment on organizational matters, to rate the quality of student/instructor interaction, and even to assess their own learning to some extent. Students are not, however, the best judges of whether what they are learning is current, whether the instructor’s perspective is biased, or even whether the selection of course material was appropriate for the achievement of course goals.

Instructors occasionally deride quarterly evaluations because they believe that students cannot make accurate judgments regarding a course or instructor until the students have been away from the course, or even the university, for several years. While it has proven to be very difficult for researchers to obtain a representative sample in longitudinal follow-up studies, the research shows that, in general, alumni who had been out of school for five to ten years rated instructors much the same as did the students currently enrolled. Current research also provides little substantiation for the widely held belief that only warm, friendly, humorous instructors win the “ratings game.” Most students are quite able to discriminate between glossy presentations with little substance and less flashy lectures with relevant content.

If a faculty member’s number one concern about evaluation of instruction results is how they will be used in personnel decision-making, the number one concern among students is that their feedback will not be acted upon. It is, therefore, crucial that having conducted any type of feedback activity with students, instructors be seen to respond to the results. This may not be as easy as it sounds, since bimodal distributions, for example, can make obvious courses of action elusive.

It is important for faculty—particularly first-time faculty—to remember that some students can be insensitive or may lack the maturity necessary to deliver constructive criticism in a non-threatening manner. At times, their comments are overstated and short on diplomacy. While such comments can be very discouraging, if they come from only a few students, they represent only an extreme point of view. However, if such comments come from a majority of the students, advice from a trusted peer or from an objective consultant might be useful. Even if painful, they may contain insight into teaching issues that can be addressed – but again, only if they present a cogent argument, not just a personal attack.
 

Consultation

Faculty are well advised to seek consultation when deciphering their teaching evaluations. Quite often an outside perspective offers insight into evaluation data that may not be apparent to the recipient of the evaluation. Speaking with peers or mentors within one’s department or at other universities may help identify new approaches to instructional improvement.

Instructors who want advice outside of a departmental context are welcome to contact the Office of Instructional Development’s Faculty Consultation service. The office offers consultation to faculty on all aspects of teaching and learning; the objective is to help them reflect on their classroom practice and to suggest strategies for improvement. Faculty Consultation services take the form of individual consultation by appointment, workshops for groups of interested faculty, and seminars in college teaching tailored for individual schools/departments. Professional consultation is given based on an analysis of some initial inquiry, or on data such as a classroom session videotaped by Audio Visual Services or a set of evaluation results administered through the Evaluation of Instruction Program (EIP). All consultations are strictly confidential, and participation by faculty is entirely voluntary.
 

Supplemental types of evaluation

While end-of-term student questionnaires continue to be the predominant type of instructor evaluation in most departments, various instructional needs often warrant additional evaluation strategies. The usefulness of an evaluation strategy depends on the level of detail required and the timeframe within which the data are needed. The following section offers a range of techniques, from informal feedback strategies that help address current-term teaching improvement to more elaborate processes that garner in-depth evaluative data for ongoing improvement.
 

Informal feedback strategies

Questioning—A very simple tool for checking effective teaching is to incorporate specific questions within a lesson to gauge student understanding of the material. For example, an instructor may ask students to verbally answer a question similar to one that will be asked on an exam. This tool is more useful than simply asking whether students have any questions, because students who are confused may not be able to articulate their questions. Moreover, some students may falsely believe they understand the lesson and not ask questions. Checking for understanding within a lesson helps the instructor discover students’ level of learning and make adjustments during the lesson itself.

Classroom Response Systems—A problem with simple questioning is that an instructor generally will get a response from only one or two students rather than the entire class. This problem can be resolved with a few strategies that fall under the Classroom Response umbrella.

The first strategy is the easiest to implement. An instructor asks a multiple choice question or makes an agree/disagree statement about the material. Students indicate their answer by the position of their thumb: A (upright), B (sideways), or C (downward), or Agree (upright) versus Disagree (downward). The instructor can then quickly look around the room to determine how many students have the correct answer.

The second strategy involves the use of colored index cards. Its method is identical to the first strategy except that the instructor is using color coded cards for the responses. The advantage of using colored index cards is that they are easier to see than thumbs.

The third strategy involves the use of hand-held remote controls (“clickers”) to measure student responses. The technology is linked to software on a computer—either a laptop or a classroom computer—and can keep a record of student responses. Many instructors use this technology by embedding the question into their presentation software. Both the instructor and students receive immediate feedback on the responses. In addition to the recordkeeping aspect of this system, a primary advantage of clickers is the anonymity of student responses in the classroom. A major disadvantage is the cost and performance reliability of the clickers themselves.

Open Class Discussion—This technique can be used either during the class session or by monitoring student online discussion. By asking discussion questions that require critical thought, instructors are able to gauge students’ understanding of the lesson material and whether they are making necessary connections to other course material. Many times students believe they know the material but their misunderstandings are revealed during discussion.

Minute Paper—This evaluation tool is used at the end of class several times during the quarter. It derives its name from the fact that students spend no more than one minute answering a small number of questions. The instructor reads the responses before the next class meeting and responds appropriately. Examples of questions asked are:
  • What was the most important thing you learned during class?
  • What unanswered questions do you have?
  • What was the muddiest point for you?
  • At what point this week were you most engaged as a learner?
  • Can you summarize today’s lesson in one sentence? If so, please summarize it.
  • What has been most helpful to you this week in learning the course material?

Index Card—A variation on the Minute Paper is for the instructor to write responses to the following questions on a 3 x 5” index card following a lesson: “What worked? What didn’t work? What are some ideas for changing the lesson?” The 3 x 5 card limits the amount of information that can be written down and serves as a reminder to jot down ideas but spend only a few minutes doing so. Attach the card to the lesson notes to serve as a reminder the next time the lesson is taught.

Course Exams and Assignments—Student success on course exams and assignments is a powerful source of data on teaching effectiveness. A short questionnaire at the end of exams can ask students to identify which questions were the most difficult to answer and why they were difficult. A pattern may develop that can be used to make changes. Additionally, an instructor may ask students to critique assignments. Questions instructors may ask are:
  • Were instructions clear?
  • Did the assignment help students learn course material?
  • Were the expectations reasonable for the time-frame?
  • How many hours were devoted to completing the assignment?

Mid-quarter evaluation—An effective way of gauging student learning and satisfaction is via anonymous mid-quarter evaluations. The evaluations can take a variety of forms. A simple survey asking students to describe what is working, what is not working, and suggestions for change can be conducted on paper or online. Many course management systems have tools that allow anonymous feedback; instructors should check with their system administrator to find out how to set this up. Many instructors provide 15-25 minutes of class time to a neutral party for the purpose of getting feedback from students. A more formal method is to use the same forms that are used for end-of-quarter course evaluations. Note that even if course changes cannot be made during the quarter in which the evaluation takes place, mid-quarter evaluations allow instructors to engage in dialogue with their students about the teaching-learning process, and students tend to feel more positive toward the instructor as a result.

 

Data from students: Interviews with students

Interviews with students are an excellent technique for obtaining rich, in-depth information about student reactions to courses and instructors. Two methods have proven useful. The first involves an interview with a group of students. The second uses a series of interviews with individual students.

The procedure is simply to have two or more colleagues (either in an instructor’s department or from other departments) interview a group of students from a current course. Alternatively, a group of former students may be interviewed. The interview usually takes no longer than 30 minutes, and a brief two or three page report is completed by the interviewers. A group interview requires some planning, but it is really not a difficult technique to use and it can yield valuable information. An instructor may request group interviews when he or she wants some candid student feedback at midterm, or after a course is completed. A department may also want to use this technique to get information for personnel actions such as tenure decisions, accelerations, or special teaching awards.

The interviewers

Usually colleagues from the instructor’s department or other departments conduct the interviews. It is best to use two interviewers: one to ask the questions and the other to record the responses. Alternatively, the Faculty Consultation service offers consultation regarding the training of interviewers.

Conference with instructor

The interviewers should meet with the instructor to learn about the course’s characteristics and goals and the instructor’s concerns, and to make arrangements for the interview. This meeting should establish the issues to be stressed in the interview, such as course objectives, organization, workload, instructor skills, instructor-student relationships, students’ attitudes, and the like. The instructor may have special concerns that should be considered in structuring the interview. Decisions should also be made about whether the report should be written and/or oral and who should receive copies.
 

Constructing the interview schedule

After meeting with the instructor, the interviewers should formulate some of the questions for the interview. Table 1.2 provides suggested questions. Issues frequently arise during an interview that suggest a line of questioning not anticipated in the interview schedule. Probing students’ comments may sometimes be more useful and appropriate than asking all the questions on the schedule.

Sample interview schedule for a focus group with students

The instructor of this class has asked for your feedback on his/her teaching of the class. Your comments will be treated with the strictest confidence. Suggestions from this focus group will be summarized and relayed anonymously to the instructor to improve his/her teaching. Please take a few minutes to answer the questions below before we discuss them as a group. Be as specific as possible, giving examples to illustrate your points. The more constructive you are, and the more suggestions you can give, the more you will help your instructor to improve, both in this and in other classes. Thank you in advance for your time.

1. Do you feel that your instructor is well-organized in this class? (Please explain your answer with an example. You might wish to comment on time management, presentation of concepts, or clarity of explanations)

2. Do you find it easy to identify the main points from each class? (Please comment with reference to ways the instructor helps you take notes, or uses other methods to summarize or emphasize main points.)

3. Do the exams/quizzes relate well to the class material? (Please comment on whether questions are fair, and reflect concepts taught in class.)

4. Is the feedback you receive on assignments helpful? How might feedback you receive be improved to help you to learn better?

5. Please identify key strengths your instructor could build on to improve his/her teaching.

6. Please identify any barriers which prevent you from learning in this class. Wherever possible, please suggest specific ways in which the instructor could help you to overcome these barriers.

7. How appropriate was the instructor’s use of technology in this class? How did the use of technology enhance your learning experience, if at all?

Interviewing the students

A convenient time for the interview is the last 20-30 minutes of a class period. Unless class time is used for the interview, it is difficult to get a representative group. In conducting the interview, one interviewer should concentrate on asking the questions, the other on recording the answers and comments. Taping interviews is not recommended because of the problem of maintaining confidentiality.

The instructor should briefly discuss the purpose of the interview with the class before the interviewers arrive. After introducing the interviewers, the instructor should leave.

Group size

The size of the group is an important variable. In a small group (fewer than 12-15 students), the interviewers can more easily probe for in-depth information. With a larger group, the number of students who want to comment makes in-depth coverage more difficult. One option is to divide the class into smaller groups of about five and select one person from each group to act as a recorder and one as a spokesperson. If the group is divided up, first have the smaller groups meet individually and arrive at a consensus on the predetermined questions; then, after about 10 minutes of discussion time, have each spokesperson report one response to each of the questions.

Alternatively, to get a representative sample of students from large classes, the interviewers may want to use the instructor’s course list and/or grade book to select a small number of students to participate in the interview. In larger classes, students may feel more comfortable writing down their responses before participating in a group discussion.

Data from peers

While peer review of teaching may take many forms, at UCLA it most often involves class observation. Classroom visitation is a form of evaluation strongly supported by faculty as a useful source of information. When colleague observation is undertaken for instructional improvement, the most important considerations in establishing systematic and fair procedures are:

Number and timing of visits

In courses taught exclusively by the lecture method, at least two visits by each colleague evaluator are advisable. If the instructor employs a variety of teaching strategies (such as lecture, discussion, student presentations, or role playing), it becomes very difficult to choose one or two class sessions that would be typical or would give a balanced picture of the instructor’s teaching. In some small classes, the presence of an observer may be more distracting than in a larger class, and frequent observations by several colleagues during a single term might be problematic. The number and timing of visits should probably be worked out between colleague evaluators and the faculty member being evaluated, to assure an adequate evaluation with minimal disruption.

Explicit, appropriate criteria and guidelines

A set of explicit criteria by which colleague observers are asked to judge the quality of teaching will make the evaluations much more reliable and the evaluations made by different colleagues more comparable. For colleagues observing strictly for the purpose of evaluation, the criteria help to guide the observations. For colleagues who have ongoing contact and observation, they help to summarize the impressions developed over numerous observations. The number of criteria should be kept small and appropriate to the type of teaching done in the department. The format may consist of open-ended questions, or rating scales, or a combination of these.

Criteria should reflect aspects of teaching on which there is broad departmental consensus and for which faculty observers would be in the best position to provide information. For example, faculty observers might be asked how well prepared the instructor was for the class session, but should not be asked to comment on the instructor’s accessibility to students outside of class if this has not been observed. Colleagues may reasonably be asked to comment on the instructor’s coverage of a topic or on the appropriateness of the teaching strategy, but should not be asked to evaluate student motivation or satisfaction, which can only be inferred at best. Comments about actual student participation, however, would be appropriate.
 

Special teaching situations

Colleague observation and evaluation of clinical teaching present problems analogous to those which arise with small classes. Clinical teaching takes place most often in a one-to-one or small group context. In this case, the presence of several colleague observers would be intrusive and might significantly disrupt the teaching and practice situation. On the other hand, many clinical settings provide a natural situation for colleague observation. Indeed, colleagues often work side by side in clinical settings and frequently observe one another’s teaching. Observation for evaluation should take advantage of these opportunities in much the same manner as evaluation based on observation in team-taught courses. There should be some indication as to the content, frequency, and length of the observations on which the evaluation is based. Where such natural observation is not possible and special visits for colleague observation-evaluation are needed, only one observer should have an opportunity to make several observations over a period of time. The development of criteria for the evaluation of clinical teaching should follow the guidelines for regular courses, although the specific items would be different and would be sensitive to the nature and purpose of clinical teaching. For example, in medicine, the observer might be asked to comment on the instructor’s integration of biomedical theory and clinical management or on the actual demonstration of a procedure or technique.

Constructive feedback

Feedback to the faculty member is an important consideration in designing departmental peer review procedures. For evaluations to be useful for the improvement of teaching, feedback and discussion are essential, yet this may present certain problems of confidentiality when colleague evaluation is conducted as part of the personnel process.

Sensitive implementation

Many instructors are understandably anxious about peer evaluation. Departments implementing new systems of colleague observations should be sensitive to the problems and insecurities among faculty that will inevitably arise. The suggestions summarized above are also useful for instructors to keep in mind when observing their TAs in discussion or laboratory sections and writing up their evaluations.

Data from oneself

In conducting a self-review of a course, faculty members may wish to compare their own pre-course objectives or expectations with perceived post-course outcomes. A model whereby the instructor assesses the abilities and knowledge of students before and after the introduction of an innovation or improvement effort is especially useful for evaluating new courses or courses with significant changes in content or structure. Often, however, instructors may not have planned for a self-evaluation before the course began; the self-evaluation can then look only at course outcomes.

When faculty members are interested in examining their own teaching behavior, rather than course outcomes, they can follow an end-of-course format similar to that used by students. An instructor might use the same form and complete it from the point of view of self-perceived behavior. Alternatively, they might benefit from completing the student form from the perspective of what they expect, on the average, that students will say; instructors might complete a self-evaluation form at the same time that students complete their evaluations and then compare results. Also, instructors can arrange to videotape a class and then observe their performance, focusing on particular teaching skills of interest. While this latter technique can be extremely valuable, it is usually best achieved with the help of a faculty consultant, who can help the instructor to focus on the key elements. Individuals looking at videotapes of themselves often are biased towards seeing only the negative elements.

Faculty should expect that their self-review in most cases will be more favorable than reviews by students. If these self-evaluations are included as part of the dossier (see “Documenting teaching using a teaching portfolio or dossier” section below), instructors may wish to comment on any deficiencies noted or discrepancies between self and student evaluations. The comment should not attempt to defend one’s self-review as being more accurate than that of the students, or to discourse on every aspect of the course. Rather it should be considered as additional information that can assist reviewers in interpreting student data or in understanding how the self-review contributed to course changes or modifications in teaching style.
 

Suggested Readings

Angelo, T.A. & Cross, K.P. (1993). Classroom assessment techniques: A handbook for college teachers (2nd ed.). San Francisco, CA: Jossey-Bass.

Arreola, R. (2000). Developing a comprehensive faculty evaluation system: A handbook for college faculty and administrators on designing and operating a comprehensive faculty evaluation system. Bolton, MA: Anker Publishing.

Brinko, K.T. (1991). The interactions of teaching improvement. New Directions for Teaching and Learning, 1991(48), 39-49.

Cashin, W.E. (1990). Student ratings of teaching: Recommendations for use. Retrieved March 22, 2007 from http://www.idea.ksu.edu/papers/Idea_Paper_22.pdf

Cashin, W.E. (1995). Student ratings of teaching: The research revisited. Retrieved March 22, 2007 from http://www.idea.ksu.edu/papers/Idea_Paper_32.pdf

Chen, Y. & Hoshower, L.B. (2003). Student evaluation of teaching effectiveness: An assessment of student perception and motivation. Assessment and Evaluation in Higher Education, 28(1), 71-88.

Cohen, P.A. (1980). Effectiveness of student rating feedback for improving college instruction: A meta-analysis of findings. Research in Higher Education, 13(4), 321-341.

Davis, B.G. (1993). Tools for teaching. San Francisco, CA: Jossey-Bass.

Duncan, D. (2005). Clickers in the classroom. Upper Saddle River, NJ: Pearson Education.

McKeachie, W. & Svinicki, M. (2006). McKeachie’s teaching tips. Boston, MA: Houghton Mifflin.

Seldin, P. & Associates (1999). Changing practices in evaluating teaching: A practical guide to improved faculty performance and promotion/tenure decisions. Bolton, MA: Anker Publishing.