

Bill Cerbin

Project Log 6: Summary of evidence for the second iteration of the research lesson, taught March 2, 2005.

Summarize the evidence, identifying major patterns and tendencies in student performance.

Observers’ comments and evaluation: Four observers filled out an observation protocol. Below are their scores for each Likert-scale item [1 (totally disagree) to 7 (totally agree)]. The variability of the scores indicates pronounced differences among the groups in terms of their understanding of the concepts. There was little or no evidence that students understood the concept of “construct.”

1. All members participated in the process 3, 2, 6, 5

2. The group was able to stay on track with the lesson (i.e. did not derail, discussing irrelevant information) 6, 2, 5, 5

3. The group seemed confused about the technical processes of the lesson 2, 5, 2, 2

4. The group seemed confused about the concepts the lesson was addressing 2, 6, 2, 5

5. The group seemed to understand the concept of construct validity 5, 3, 5, 2

6. The group seemed to understand the concept of construct. NA, No evidence, NA, 3

7. The group seemed to understand the logic of construct validity 6, 3, 6, 2
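
The spread in these ratings can be made concrete with a quick tally. The following is a minimal sketch, not part of the observation protocol, that computes the mean and range of the observers' scores for each item using the numbers listed above. The variable and function names are hypothetical, and treating "NA" and "No evidence" as missing values is an assumption.

```python
from statistics import mean

# Observer ratings copied from the list above (1 = totally disagree, 7 = totally agree).
# None marks an entry recorded as "NA" or "No evidence" (assumed to mean "no rating").
ratings = {
    1: [3, 2, 6, 5],
    2: [6, 2, 5, 5],
    3: [2, 5, 2, 2],
    4: [2, 6, 2, 5],
    5: [5, 3, 5, 2],
    6: [None, None, None, 3],
    7: [6, 3, 6, 2],
}


def summarize(scores):
    """Return (count, mean, range) of the numeric scores, or None if fewer than two."""
    numeric = [s for s in scores if s is not None]
    if len(numeric) < 2:
        return None
    return len(numeric), mean(numeric), max(numeric) - min(numeric)


summary = {item: summarize(scores) for item, scores in ratings.items()}

for item, result in summary.items():
    if result is None:
        print(f"Item {item}: too few numeric ratings to summarize")
    else:
        n, avg, spread = result
        print(f"Item {item}: n={n}, mean={avg:.2f}, range={spread}")
```

Every fully rated item has a range of 3 or 4 points on the 7-point scale, which is consistent with the observers' impression that the groups differed markedly.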

Problems observed in some groups
• Bogged down in technical details such as how to word test items.
• Two of the observers watched groups that were particularly ineffective during the lesson. It was difficult to tell what each student actually thought or knew in these cases. This indicated that students could be disengaged, lost or inattentive during the lesson.
• Students did not discuss the concept of “construct” at any point in the lesson.

Brief Analysis of Groups’ Written Work
All nine groups described a study in which depressed and non-depressed individuals would take the group’s test. The groups’ ability to describe results that would support the validity of the test varied. One group appeared to be on track with the following:
• “Give our test to a group of 500 clinically diagnosed depressed people as well as a group of 500 people that have not been diagnosed as depressed. See if the scores of both groups are significantly different from each other. Clinically depressed people would score higher than non-depressed people.”
Another group suggested giving the test and then having a clinician evaluate the participants. Participants who scored high on the test should be diagnosed as depressed. Most other groups talked about the need for the test scores to “correlate” with a psychologist’s classification of the individual as depressed or not depressed.
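
The group-differences logic in these proposed studies can be illustrated with a small calculation. The sketch below compares the mean test scores of two groups using Welch's t statistic. The scores are HYPOTHETICAL values invented for illustration only (no student group collected data), and the choice of Welch's t rather than another test is an assumption.

```python
from math import sqrt
from statistics import mean, variance

# HYPOTHETICAL scores, invented for illustration only; not data from the lesson.
depressed_scores = [24, 27, 22, 30, 26]
non_depressed_scores = [12, 15, 10, 14, 13]


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference in means (group_a minus group_b)."""
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


difference = mean(depressed_scores) - mean(non_depressed_scores)
t_statistic = welch_t(depressed_scores, non_depressed_scores)
print(f"mean difference = {difference:.1f}, Welch t = {t_statistic:.2f}")
```

A large positive t here would mean the depressed group scored reliably higher, which is the pattern the theory of the construct predicts if the test measures depression.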

All nine groups were able to describe the expected correlations between their individual depression tests and another test of depression (i.e. scores on the two tests should be related). Eight of the nine groups correctly predicted the correlation between their individual depression test and a math achievement test would be low/not significant. One group stated they expected a low correlation, but then went on to describe a negative correlation (i.e. “if a person scores high on the depression test, they should score low on the math test.”). Only one group tied the results to the validity of their test.
• “If our test is accurate it will have a high correlation with the depression scale and a low correlation with the math achievement test.”

Results of the “minute paper” at the end of class:
What was the most difficult part of the lesson?
1) Confused by how the math test was related to determining validity of depression test.
2) Lack of clarity of assignment—how much depth, detail, direction
Most important thing learned from the lesson
1) There are multiple ways to determine the validity of a test
What is still confusing?
1) Nothing!
2) Statistics

Describe major findings and conclusions about what, how and why students met or did not meet learning goals.
Results of Related Exam Questions: Three exam questions related specifically to the lesson. Results of the three questions were mixed.
• One of the processes used to examine construct validity is examining group differences. Explain the logic behind this process (2 pts).
o Correct answers:
▪ If the theory of the construct suggests two groups have different levels of the construct, and the test actually measures the construct, then the two groups should have different scores on the test.
▪ If a test is supposed to measure depression, then when it is given to a group of depressed people and a group of non-depressed people, the depressed group should score higher than the non-depressed group.
o Only 27% of the class received full points for their answers, while 48% received no points. Those who received 0 points tended to focus on the need for a test to be valid for different demographic groups (e.g., ethnic groups).
▪ “A test must be able to measure results for all types of groups, or one group in particular. Because you can’t give a five year old a test meant for a 20-year-old and expect them to score well. Therefore groups must be examined to make up for the differences.”
▪ “Everybody is different thus when putting people into groups the groups will be different so in order to obtain the desired results from them it is impairable (sic) that the test be accurate in what it wants to measure.”
• I have developed Dr. V’s Attention Deficit Hyperactivity Disorder (ADHD) Scale, a 25 item paper and pencil self-report instrument to diagnose ADHD. I want to evaluate the construct validity of this instrument. Since it is often difficult to differentiate ADHD from anxiety, I want to be sure my test measures ADHD and not anxiety. I have collected data from 75 children. I gave each child my scale as well as a self-report anxiety scale. In addition, each child was observed by a trained research assistant for ADHD and anxiety behaviors. These data yielded the following multitrait-multimethod matrix.
(Note: a correlation matrix was provided.)

o What is the convergent validity evidence for or against Dr. W’s (ADHD Scale) test [make sure to list and explain the number(s)]? (3 pts)
▪ Correct answer: If Dr. W’s scale measures ADHD, then scores on that scale should relate to another scale that measures ADHD. The correlation is .62 (significantly different from 0); therefore, the validity of Dr. W’s test is supported.
▪ Nearly 42% of the class received full points for the answer, and no students received 0 points. Students who failed to receive all the points tended to explain the correlation but failed to include the “theory” behind the explanation.
o What is the divergent validity evidence for or against MY (ADHD Scale) test [make sure to list and explain the number(s)]? (3 pts)
▪ Correct answer: If Dr. W’s scale measures ADHD, then scores on that scale should not relate to scales that measure anxiety. The correlation between the self-report anxiety scale and Dr. W’s scale is .33 (significantly different from 0); therefore, the validity of Dr. W’s test is not supported. The correlation between the behavioral observation anxiety scale and Dr. W’s scale is .08 (not significantly different from 0); therefore, the validity of Dr. W’s test is supported.
▪ About 35% of the class received perfect scores on the question, while again, no students received 0 points. Mistakes on this question were similar in nature to those on the convergent validity question. Namely, students correctly interpreted the correlations but failed to frame them in the theory.
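
The reasoning expected in the two correct answers (state the prediction from the theory, then check the correlation against it) can be sketched in a few lines. The example below uses the three correlations quoted in the correct answers and the sample of 75 children from the exam question. How significance was actually tested is not stated in the exam, so the Fisher z approximation, the two-tailed alpha of .05, and all names in the code are assumptions.

```python
import math

N_CHILDREN = 75  # sample size stated in the exam question

# Correlations with the ADHD scale, as quoted in the correct answers.
# The second value is the theory's prediction: True means the scores should relate.
correlations = {
    "other ADHD measure (convergent)": (0.62, True),
    "self-report anxiety (divergent)": (0.33, False),
    "observed anxiety (divergent)": (0.08, False),
}


def fisher_z_p_value(r, n):
    """Approximate two-tailed p-value for H0: rho = 0, via Fisher's z transform."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def supports_validity(r, should_relate, n, alpha=0.05):
    """Evidence supports validity when significance matches the theory's prediction."""
    significant = fisher_z_p_value(r, n) < alpha
    return significant == should_relate


verdicts = {
    label: supports_validity(r, should_relate, N_CHILDREN)
    for label, (r, should_relate) in correlations.items()
}

for label, supported in verdicts.items():
    print(f"{label}: {'supports' if supported else 'does not support'} validity")
```

Under these assumptions the sketch reaches the same verdicts as the answer key. The point of the `should_relate` flag is that the number alone is not the answer: the same correlation counts for or against validity depending on what the theory predicts, which is the step most students omitted.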

Based on your analysis how will you change the lesson?

Quicker transition to Step 3. My group did nothing for 20+ minutes. Consider addressing all the groups at the same time—even if they have not completed their work.

Presentation and analysis of the studies. This segment seemed repetitive. Groups presented the same types of studies with minor differences (e.g., number of subjects). Plus, none of the students asked any questions during this segment.

Foster better analysis of the studies. Rather than have each group present its study:
1) Select groups to present on the basis of type of study. Group 1 presents; then ask for a group that has a different type of study. Ask students to point out key differences. The goal should be to categorize the studies and bring out the essential differences among them.

OR

2) Give groups a handout that identifies types of studies and their characteristics and ask the groups to categorize the studies presented.

STEP 4: Leave more time to analyze Step 4.
The group I observed discussed the Step 4 handout for one minute. They tried to find a relationship between Math Achievement and Depression. Someone offered a plausible connection, and that ended the discussion. One member wrote an answer on the worksheet while the others contributed nothing. The answer was wrong.
