CP 6691 - Week 2 (Part 2)

Evaluating Measures Used in
Social Science Research


Table of Contents

Part 2 of 2
Evaluate a Sample Study That Uses Paper-and-Pencil Measures
Evaluate a Sample Study That Uses Questionnaire Measures
Evaluate a Sample Study That Uses Interview and Observation Measures

Assignment for Week 3

Evaluating a Sample Study That Uses Paper-and-Pencil Measures

Use of Embedded Headings and Intact Outline With Videotaped Instruction by Frank, Garlinger, and Kiewra [Study #8 in your Supplemental Book (SB) (binder of supplementary materials)]. This is an example of a study that uses paper-and-pencil tests as data collection (measurement) instruments. On the first page of this study we see an abstract and the (untitled) Introduction section, which lays out the background for the study. Note that the introduction ends (on page 278) with a statement of the study's purpose. If we apply the criteria stated in Chapter 6 of the text (dealing with what constitutes a good research hypothesis, objective, or question), we find that this purpose statement is reasonably concise, although it is a long, complex sentence. The variables to be studied are stated (structural support, immediate and delayed recognition, and recall learning) but are not well defined. The statement does, however, address the relationship between these variables (what combination of structural support optimizes recognition and recall learning...). So, this statement could be considered a fair research objective. (It's not a hypothesis because it does not predict a particular outcome, and it obviously is not a question.)

Let's move to the Method section. At the top of page 278 in the study (right-hand column) we find where the researchers created the measures for this study ("We prepared a 25-item completion test and a 25-item multiple-choice test ... ."). Since these are researcher-made tests rather than standardized tests, you should realize that it is the researchers' responsibility to determine both the validity and the reliability of the tests. The first paragraph at the top of the right-hand column on page 278 gives information on both. The first sentence of the paragraph ("... based on material presented in the lecture and wrote test questions ... with each test item consisting of paraphrases of information presented in the lecture.") describes content validity (the test questions are based on the content of the lectures). The last sentence in this paragraph describes how test reliability was assessed. Therefore, the researchers met their responsibility of determining both validity and reliability. If you recognized both the validity and the reliability assessments, you should find this to be a strength of the study.

But what if you didn't recognize the validity assessment? Then you would evaluate this study by saying that the researchers did assess the reliability of the measures but you saw no evidence that they assessed validity. You should, then, see this as a weakness (threat), because you should have realized that since the researchers created the tests, they were responsible for assessing both validity and reliability.

Some students have more knowledge of validity and reliability than others. I don't expect all students to possess that kind of knowledge (it's not a prerequisite for the course). It's more important for you to recognize that the researchers are responsible for this assessment since they created the measures.

In the next paragraph (under the heading Subjects and Design), we find out about the subjects of the study, how they were selected, and how many there were. At this point in your learning about evaluation, I only want to point out that these subjects are college students (enrolled in an educational psychology course). Therefore, we can reasonably expect that these subjects are able to read (though it's not a foregone conclusion that all college students today can). And, since the tests are based on the content of the course these subjects are taking, we can expect they have the information necessary to complete the tests (although they may not all do well on the tests, which is the focus of the study). So, we can say from reading this first sentence that the measures are applied to an appropriate sample.

Even though we only covered a small fraction of this study, I hope you can begin to see that you can get a lot of information from a report if you know what to look for, which is what you'll be learning throughout this course. If there are any areas of this evaluation you don't understand, review Chapter 5 in your text and, if necessary, discuss your questions with your fellow students and the instructor.


Evaluating a Sample Study That Uses Questionnaire Measures

Teachers' Use of Homework in High Schools by Murphy and Decker [Study #9 in the SB]. This is an example of a study that makes use of questionnaires as data collection (measurement) instruments. Let's turn to the first full paragraph in the left-hand column on page 262 of this study. We find in the first sentence the objective of this study: "In this article, we present a descriptive picture of how teachers in high schools use homework." This is clearly not a hypothesis, since it doesn't predict an outcome; and it obviously isn't a question. You may think this statement doesn't meet all the criteria necessary to be a research objective since it doesn't identify the variables to be studied. But, actually, it does in a way. As you'll discover in a later lesson, this is a descriptive study. The purpose of descriptive studies is to describe attributes of some group of subjects. In this study, the group of subjects is high school teachers, and the attribute being described is how they use homework. These are the "variables" in this particular study. Don't be too concerned if you can't clearly understand what I just said -- it'll become clearer as you learn more. For now, what's most important is that you be consistent when you evaluate. That is, if you didn't think the statement at the beginning of the paragraph was a research objective (and if you also knew it wasn't a research hypothesis or research question), then you should have said something like this:

I did not see a clearly written research hypothesis, objective, or question, so I find that to be a weakness in the written report.

Notice the questions at the end of the first paragraph in the left-hand column. This is the closest we get to looking at the questionnaire used by the researchers. With these questions available to us, we can try to make some determination about whether any of them could be leading or psychologically threatening, and about the degree of subjectivity (judgment) called for in providing a response. Since the subjects of the study were teachers and principals (right-hand column in the section entitled Sample), we should expect them to have the information necessary to answer these questions. Furthermore, in looking over the questions, it appears that the first 11 are straightforward, factual questions requiring little if any judgment to answer. We could say, however, that some of the questions might be psychologically threatening to teachers who do not assign or grade homework according to the guidelines laid out by the school. Some teachers might feel that their responses could be used against them by principals (unless the questionnaires were anonymous, and we are not told whether they were). There is also the possibility that some teachers may answer these questions in ways that put them in a positive light regarding their use of homework. Either way, there is a potential for self-report bias because of the nature of the questions being asked. Since these questions are the heart of the study, it's unlikely the researchers would be willing to delete them. Therefore, you should carefully read the rest of the study to see if the researchers do anything to eliminate or compensate for this potential self-report bias. Finally, questions 12, 13, and 14 appear to require a higher level of judgment than the first 11. This could lead to another source of error. Since all of these sources of error arise from the measuring instrument, we call them collectively instrumentation threats.

Now, let's move to the Method section of the report, in the right-hand column of page 262. We see that the sample is randomly selected (a definite strength when it comes to being able to generalize the results of the study) and that it is relatively large (approximately one-seventh of the target population -- public high schools in Illinois). In the second sentence of the paragraph, it appears that there was a 92 percent return rate from the questionnaire (of the 100 schools surveyed, "teachers and principals in 92 schools agreed to contribute"). However, one look at the rest of the paragraph (or at Table 1 at the bottom of the page) shows that the individual return rates were quite different depending on the type of school system. The largest returns came from rural schools, while the smallest came from mediopolis and central city schools. We can say, from looking at the number of respondents in the table, that rural schools were over-sampled and mediopolis and central city schools were under-sampled. What all that means is that you should have the greatest confidence generalizing results to other rural schools in the state of Illinois, and the least confidence generalizing to mediopolis and central city schools.

Generalizability was strengthened by the initial random sampling of the schools. However, non-respondent bias (or volunteer sample bias, as your textbook calls it) lessens the effect of random sampling. Fortunately, the response to the surveys was so high (at least for rural schools) that the results may still be generalizable. Ultimately, you, the evaluator, must decide if the return rate is high enough that the representativeness produced by the initial random selection is still preserved. If you believe it is, then you will be confident generalizing results; if not, then you won't!
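To make the return-rate reasoning concrete, here is a minimal Python sketch of how you might compare the overall return rate with the rates for each type of school. Only the overall figure (92 of 100 schools) comes from the lesson. The per-type counts and the three school-type labels are made up for illustration and are NOT the Table 1 data; the function name response_rate is likewise just an illustrative choice.

```python
def response_rate(returned, sent):
    """Return the response (return) rate as a percentage."""
    if sent <= 0:
        raise ValueError("sent must be a positive number")
    return 100.0 * returned / sent


# Overall figure quoted in the lesson: 92 of the 100 sampled schools took part.
overall = response_rate(92, 100)

# HYPOTHETICAL (sent, returned) counts per school type.
# These are invented for illustration and are NOT taken from Table 1.
hypothetical_counts = {
    "rural": (40, 39),
    "suburban": (30, 28),
    "central city": (30, 25),
}

by_type = {
    school_type: response_rate(returned, sent)
    for school_type, (sent, returned) in hypothetical_counts.items()
}

for school_type, rate in by_type.items():
    print(f"{school_type}: {rate:.1f}%")
print(f"overall: {overall:.1f}%")
```

With numbers like these, a high overall rate can hide a noticeably lower rate for one type of school, which is why it pays to look at the breakdown before deciding how far to generalize.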

Given the less than 100 percent response to the questionnaire, we should expect the researchers to have done some follow-up effort to try to increase response rate. Unfortunately, the study provides us no evidence that follow-up was done. We cannot say with certainty that it was or wasn't done, only that there is no evidence of it. Therefore, we find this to be a weakness (threat) in the study.

In the next section, entitled Data Collection and Analysis, we can see that the researchers developed a 37-item survey (questionnaire). On page 263, we see that the researchers had pilot tested the survey (another strength). As with Sample Study #8, if there are any areas of this evaluation you don't understand, review Chapters 5 and 6 in your text and discuss your questions with your fellow students and the instructor.


Evaluating a Sample Study That Uses Interview and Observation Measures

Changes in Social Play Behavior as a Function of Preschool Programs by Wintre [Study #10 in the SB]. This is an example of a study that makes use of both interviews and direct observations. Look at the last paragraph in the Introduction section on page 295, right-hand column, just above the Method section. Here we have an example of a well-stated research objective ("... to compare directly the social play behaviors expressed at two schools by observing children in their own classrooms with their classmates and teachers."). Actually, a second research objective follows the first. These statements are concise. They also identify the variables to be studied (social play behaviors and variables within school policy), although these are not enumerated. Does the researcher expect to study all such behaviors and school policy variables? Of course she doesn't. But what she has told us in these statements is enough to give us a pretty clear idea of what the rest of the study will be about. That, after all, is the function of a research objective, question, or hypothesis.

In the Method section, the researcher tells us where the subjects for her study came from (two schools in a large metropolitan Canadian city). On page 296, the researcher continues to tell us something about how she chose the subjects for the study, as well as some descriptive information (demographic data) about them, such as their age and gender. The gender description may be a little hard to find. It's in the last sentence of the first full paragraph in the left-hand column ("I balanced the groups according to their sex."), which means she had equal numbers of males and females.

In the next paragraph, we find that the researcher did the interviewing herself. This seems like a clear violation of the principle of having the interviewer know as little as possible about the research. There is an obvious potential for interviewer-induced bias. We must now carefully read on to see if the researcher does anything to eliminate or reduce this potential bias. We don't have to read very far (the last sentence in this paragraph) to see that the researcher audiotaped and transcribed the interviews. What effect did this have, do you think? (Before reading my answer in the next paragraph, try to figure it out for yourself.)

By audiotaping and transcribing the interview questions and responses, the researcher could review each question for signs that it was leading. If she found any leading questions, she would be able to eliminate the responses to them and so remove the interviewer-induced bias. So, by doing this, the researcher turned an obvious threat to the validity of the study into a strength.

Notice also in this paragraph that the researcher asked questions of a factual nature (low inference or low judgment) as well as questions about "educators' views" (somewhat higher inference). This could lead to some degree of an instrumentation threat. It would ultimately be your personal decision whether this threat was serious enough to cause you to reject the findings of the study. Another possible concern is that there is no evidence that the interview protocol was tried out (pilot tested) before the interviews. Finally, although the report doesn't say so, we may be able to assume that the researcher was trained in interview techniques.

In the next section, Coding and Observation Period, we see that two observers (coders) were used in the study and that they had received training. We also know that they were conspicuous to the subjects of the study ("... on the fringe of the play area and in full view of the children."). It's easy to imagine that strange adults (probably holding tablets or clipboards and writing things) might be a distraction to young children. When we see what behaviors were being observed, we find that they involved different types of play (solitary or group) or watching others playing. Could these adult observers, conspicuous as they were, interfere with the normal play behaviors of the children? The answer is yes. We can't say how long one or more children might be concentrating on the adults rather than on their own play activities. But any time away from normal play would be interference. So, here we have a case of observer-induced bias (albeit unintentional).

Once again, the researcher's ingenuity is tested: she needs a way to reduce or eliminate this bias. She does it rather elegantly by (as she says in the next paragraph) modifying the Parten observation scale she is using by adding "... a seventh category, adult-directed behavior (child directs attention to or interacts with, an adult)." Adding this category takes advantage of the conspicuousness of the observers and the natural curiosity of children, because time a child spends attending to the observers is now recorded as a behavior of its own rather than being miscounted as play. Here, again, the researcher has turned a potential threat into a strength.

In the first paragraph at the top of the right-hand column of page 296, the researcher tells us about the observation schedule (order of observations). She says that each child was observed for only 30 seconds at a time (quite a short period). But a total of 60 observations was done for each child over a 5-day period. At 30 seconds each, 60 observations amount to an overall total of 30 minutes of observation per child over the 5-day period, which is plenty of time to get a good cross-section of social play behaviors. Imagine for a moment that the researcher had instead done a single 30-minute-long observation for each child. It's quite possible that some children would get so engrossed in one or two play activities that these would use up the entire observation period. The way the observations were done in this study, however, made it far more likely that a wide cross-section of behaviors would be displayed by each child. As I said in an earlier part of this lesson, longer observation periods are not necessarily better.

One final point before leaving this study. In the Results section, under Observational Data on page 296, notice that the researcher reports the interrater (inter-observer) agreement as 84 percent. This is quite good for a seven-item behavior scale, and implies that the observers must have, indeed, been trained well. As with Studies #8 and #9, review any areas of this evaluation you don't understand in Chapter 6 in your text and discuss your questions with your fellow students and the instructor.
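For readers who want to see how an interrater agreement figure like the one above can be computed, here is a minimal Python sketch of simple percent agreement between two observers. It assumes agreement is measured as the share of observations on which both coders chose the same category; the report's exact method is not described in this lesson, so treat that as an assumption. The category labels and the ten coded observations are hypothetical and are not data from the study.

```python
def percent_agreement(coder_a, coder_b):
    """Percentage of observations on which two coders chose the same category."""
    if len(coder_a) != len(coder_b):
        raise ValueError("both coders must rate the same observations")
    if not coder_a:
        raise ValueError("at least one observation is required")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * matches / len(coder_a)


# HYPOTHETICAL codes for ten 30-second observations of one child.
# The labels are illustrative only and are not the study's data.
coder_a = ["solitary", "parallel", "cooperative", "onlooker", "adult-directed",
           "parallel", "solitary", "associative", "cooperative", "unoccupied"]
coder_b = ["solitary", "parallel", "associative", "onlooker", "adult-directed",
           "parallel", "solitary", "associative", "cooperative", "onlooker"]

agreement = percent_agreement(coder_a, coder_b)
print(f"Interrater agreement: {agreement:.0f}%")
```

In this made-up example the two coders agree on 8 of the 10 observations. Simple percent agreement does not adjust for agreement that would occur by chance, which is one reason the number of categories on the scale matters when you judge whether a figure is good.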


End of Week 2 lesson

Assignment For Next Week
Gall: Chapters 7, 9, & 10 (Descriptive Designs)
SB: Study 12