CP 6691 - Week 6 (Part 1)

True and Quasi-Experimental Research Designs


Interactive Table of Contents (Click on any Block)

Part 1 of 2
Purpose of Experimental Research
How to Identify True and Quasi-Experimental Designs
Threats to Internal Validity of Experiments
Threats to External Validity of Experiments
The Value of Random Assignment in Experiments
Evaluating Sample Study #16 (Effectiveness of the DISTAR ...)
Part 2 of 2
Practice Evaluation Mini-Study 1
Practice Evaluation Mini-Study 2

Assignment for Week 7

Part 1 of 2

Purpose of Experimental Research

The purpose of experimental research (either true or quasi-experimental) is to determine actual cause-and-effect relationships between variables. Recall the purpose of causal-comparative research -- to determine possible cause-and-effect relationships between variables. When we discussed causal-comparative designs, we said that the only kind of design capable of determining cause-and-effect is a "group difference" design. Causal-comparative designs are group difference designs; so are experimental designs.


How to Identify True and Quasi-Experimental Designs

Recall, again, that when we discussed causal-comparative designs we discussed "markers" you should look for to determine that a design is causal-comparative. Well, similar markers exist for true and quasi-experimental designs to help you differentiate between these three kinds of group difference design types. There are three markers in all, and they can all be found in the Methodology section of the research report. Let's examine each of them.

The first marker is that the researcher has created two or more groups to study. If you can determine from the methodology section that the researcher has created groups, then you can eliminate descriptive and correlational designs. If, on the other hand, you determine that the researcher has not created groups, then you can eliminate causal-comparative, true experimental, and quasi-experimental designs. If the first marker shows that the researcher has created groups to study, then move on to the second marker.

The second marker is to identify the independent variable and determine if the researcher is manipulating it in real time, or if the independent variable has already occurred before the researcher did the study. Remember that in causal-comparative research, the independent variable (otherwise known as the treatment) had occurred before the researcher did the study. The researcher was, in fact, doing the study well after the subjects had self-selected to adopt or ignore the independent variable (like the cigarette smoking study). In experimental research (either true or quasi), the researcher actually manipulates the independent variable -- meaning the researcher determines which group will receive the treatment and which will not (alternatively, the researcher can give more of the treatment to one group and less to the other(s)). The group that receives the treatment (or more of the treatment) is called the experimental group, whereas the group denied the treatment (or receives less of it) is called the control group. By manipulating the independent variable in this way, the researcher has considerably more control over extraneous variables than does the researcher who does a causal-comparative research study. But the greatest amount of control comes by way of the third marker. So, if the second marker shows that the researcher manipulates the independent variable in real time (decides who gets the treatment and who doesn't), then the study is experimental. If the independent variable occurred before the researcher did the study, then you have a causal-comparative study. If the second marker shows that the study is experimental, then move on to the third marker, which will help you differentiate between a true experimental and a quasi-experimental design.

The third marker is to determine if the researcher is randomly assigning subjects to the experimental and control groups. If you determine that the researcher has randomly assigned subjects to groups, then you know that the study is a true experiment. Otherwise, it is a quasi-experiment. You will know whether or not the researcher randomly assigns subjects to groups, because the report must tell you so. If the report says only that "subjects were assigned to groups," then you cannot assume it was random assignment. Also remember one thing very well -- random assignment and random selection are NOT the same thing. Selection has to do with the way the sample is selected from the population. Assignment has to do with how subjects from the sample are assigned (or placed) into groups to be studied. If you confuse the terms "selection" and "assignment," you will most probably identify the research design incorrectly and, therefore, evaluate it incorrectly. BE VERY CAREFUL!!!!!
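Because confusing selection with assignment is so common, here is a minimal sketch of the difference in Python. The population size, sample size, and group sizes are made-up numbers for illustration only:

```python
import random

random.seed(42)  # fixed seed so the example is reproducible

# Hypothetical roster: a "population" of 500 student IDs.
population = list(range(500))

# Random SELECTION: how the sample is drawn from the population.
sample = random.sample(population, 40)

# Random ASSIGNMENT: how that sample is then split into the study's groups.
random.shuffle(sample)
experimental = sample[:20]
control = sample[20:]

print(len(experimental), len(control))  # two groups of 20
```

Notice that the two steps are independent: a researcher could do either one, both, or neither. Only the second step (the shuffle-and-split) bears on whether a study is a true experiment.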

Now, you know how to identify all five types of research designs: descriptive, causal-comparative, correlational, true experimental, and quasi-experimental. Using the markers described above will help you conclusively identify a group difference study. If it is not a group difference study, the first marker will tell you so. Then, all you need to do is decide between descriptive and correlational (where you will probably need to resort to the purpose of the research study). Now that you have the ability to discriminate between research design types, let's continue by discussing what to evaluate in experimental research designs.


Threats to Internal Validity of Experiments

Internal validity threats are extraneous variables that detract from your confidence in the results of the research by threatening their validity. To be a threat means that these extraneous variables are alternative explanations for any difference in the means of the dependent variable between the experimental and control groups. Not all extraneous variables are potential threats in all situations. It is your job as evaluator to determine which are potential threats in a given study, and what they could do to invalidate the results. At the same time, you should also give some thought to how such a threat could be eliminated or reduced to produce a stronger, more valid, study.

These extraneous variables have been categorized into eight areas. They are listed and briefly explained in your textbook on pages 306-307. I won't duplicate that information here, but I will elaborate on each of these threats.

History:

People normally think of history as something that occurred in the past. When we speak of history as a threat in an experiment, we are speaking of an extraneous variable that is occurring simultaneously with the experiment, but is beyond the control of the researcher. For example, let's say we're giving a test to students in a school building to test the hypothesis that students in brightly colored rooms will score higher on academic tests than students in pastel colored rooms. During the testing period, an automobile accident occurs outside the window of the pastel colored room, causing a lot of noise and distraction. Obviously, students in that room are distracted during the testing session. After the test, it is determined that the mean score for students in the brightly colored room is significantly higher than the mean score for students in the pastel colored room. Can we conclude that the room color is the cause of the improved test score? No, we cannot, not with a high degree of certainty at least. Why? Because there is at least one other possible explanation for why students in the brightly colored room did better --- the distraction caused by the accident could have led to poorer test scores in the pastel colored room, allowing a statistically significant difference to occur between the two groups. We call this an historic event because it occurred simultaneously with the experiment and was beyond the control of the researcher.

That historic event occurred to only one of the groups being studied. But what about historic events that affect all groups at the same time? Here's an example. A researcher wants to do an experiment to test the effectiveness of a novel reading program on first graders. The study will last for the entire school year and will involve a set period of reading instruction each day. Are there any extraneous variables that are beyond the control of the researcher which could affect children's reading ability? Yes, there are probably several. Parents could tutor their children after school. But, this is not extremely likely. Most parents leave schooling to the school unless and until academic problems arise, and even then, all parents don't get involved. A more likely extraneous variable would be the existence of reading-type programs on public television, such as "Sesame Street," "Reading Rainbow," etc. Such programs do teach children to recognize words, and, therefore, to read. It's also reasonable to assume that children would have access to such programs since most people own televisions and these programs come on during after-school hours, at times accessible to young children.

Some historic events affect only the experimental or control group (as in the first example). You cannot usually assume such events occurred; the research report will have to tell you that something out of the ordinary happened to one group and not the other, and that it was beyond the control of the researcher. Then, to establish it as a threat to the study, all you need to do is show how it could affect the dependent variable.

Most historic events will affect both the experimental and control groups (as in the second example). For these to be a threat to the study, you must argue that it is reasonable for such a threat to occur and how it could affect the dependent variable. Remember, an extraneous variable that cannot or does not affect the dependent variable is not a threat to the study.

Maturation:

Borg and Gall cover this one pretty well in the text, and we've discussed this threat before when we talked about Descriptive Research Designs. The thing to remember is that this is usually a long-term threat, meaning it is more of a threat in long studies. Also, like history, for maturation to be a threat, it MUST have a direct effect on the dependent variable. If you cannot imagine maturation having an impact on the dependent variable, then you have no basis for saying maturation is a threat in that study.

Testing:

This has to do with the problem of students becoming test-wise or remembering information from one administration of a test to the next. Obviously, for testing to be a threat in a study, the same test must be administered to the same subjects in two different sessions (pretest and post test). Furthermore, those sessions should not be too far apart in time. How far apart is too far? That's fairly easy to answer: as long as it is reasonable to expect a subject to remember answers from the test.

There's a related effect of the testing threat. By giving two groups (one that will receive a treatment and one that will not) a pretest that is similar to the treatment (say, a test of mathematics ability when the treatment is a new way of learning mathematics), the researcher runs the risk of sensitizing subjects in the experimental group to the treatment. That is, giving a math pretest primes subjects to be ready to learn math concepts. This isn't bad in and of itself, but the evaluator should realize the possibility of this bias occurring. If it does occur (and you usually have no way of knowing if it does), it could make the experimental program look better than it really is. This isn't necessarily bad either. That is, the researcher could conclude that the combination of the pretest and the new mathematics treatment leads to improved math performance in students. A problem arises, though, if the researcher doesn't recognize the possible sensitizing effect of the pretest and concludes that the treatment alone leads to the recorded improvement.

There are several ways to eliminate this threat. One is to lengthen the study to such an extent that memory will cease to be a factor (usually a few months is long enough to do this). But if you can't change the length of the study, you could remove the pretest, thereby providing the subject nothing to remember for the post test. But, if your design calls for administering a pretest, the only other way to avoid the testing threat is to make the pretest different from the post test (even a small difference will be sufficient).

Instrumentation:

There is a whole host of threats that fall under this umbrella term as we discussed in the Week 2 lesson. At the risk of being redundant, let's repeat them here. You have seen several already in Chapter 6 of your text. The questions posed there relating to the four types of measurement instruments all represent potential instrumentation threats.

Another type of instrumentation threat is when the researcher fails to assess the reliability and validity of the instruments used to collect data in a study. Keep in mind that the researcher does not have to assess the validity of standardized instruments he/she uses (that's already been done, usually by someone else, and would be available for review in a reference like the Buros Mental Measurements Yearbook, Tests in Print, or similar sources). However, the researcher should provide evidence that the instrument is appropriate for the group it is being used on. Also, the researcher is always responsible for assessing reliability of the instrument, since reliability is a function of the specific group being measured.

Yet another type of instrumentation threat is caused when slightly different forms of the same test are used in a pretest-posttest design (testing before and after a treatment). If the two tests are different, then the testing threat disappears (if you don't understand this, re-read the previous section on the Testing Threat). However, then the instrumentation threat comes into play. The question arises as to whether the two tests are equivalent (of the same level of difficulty, measuring exactly the same content, etc.). Unless tests are formally equated (which is done with alternate forms of standardized tests, making them free of this instrumentation threat), there's no way to be sure they are equal. This usually occurs when researchers make their own tests for use in a study, and make slightly different pre and post tests.

Statistical regression:

This is the most conceptually challenging of all the threats. It seems like magic, but it really isn't. It essentially works this way. Say you administer a test to a group of subjects who all score extremely low (the group mean is very low). Then, let's say you re-administer the same test at some later date (a day later, a week, a year, it really doesn't matter). What you will see is that the group mean score on the post test will mysteriously increase (or regress) toward the mean of the general population score for the test. The crude figure below illustrates this concept.

The opposite would occur if you gave the first test to a group and the group mean was extremely high. Upon subsequent testing, you would find the group mean regressed (or declined) toward the mean of the general population.

Why does this occur? Well, here's a very simplified, non-statistical explanation. Let's say we administered a test to 10 students and they all did very well on the test. Say, everyone got 100%. Then, of course, the average of the class of 10 students would be 100%. Now, if the general population of school-age children took this test, they might average 82%.

Now, let's say we give the same test a few days later. The conditions of the test are the same. But let's say that one or two of the children didn't feel very well that morning (one was sick, another didn't get much sleep the previous night and was very tired). Obviously, they won't do their very best on the test. As a result, 8 students get 100%, one gets a 97%, and one gets a 93%. That gives a class mean of 99%. What has happened is that the group score on the test has regressed (declined) toward the general population mean. We could blame this on the "ceiling effect" of the test. When people score very high on a test, there's not much room to improve. In our example, since every student got 100% on the first test, there was no room to improve. Then on subsequent testing, any of a variety of random events could occur to reduce the performance level of students, which would show up in their test scores.

We could construct a similar example with very poorly performing students, but the results would be similar. On subsequent testing the group mean would regress (increase) toward the general population mean. That's what we call statistical regression.
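The mechanism can also be demonstrated with a small simulation: give everyone a stable "true" ability plus random day-to-day luck, select only the extreme low scorers on a first test, and retest them. The specific numbers below (a population mean of 82, the noise spreads, the cutoff of 65) are arbitrary choices for illustration, not figures from the text:

```python
import random

random.seed(0)  # fixed seed for reproducibility

POP_MEAN = 82  # assumed general population mean on the test

def observed(true_ability):
    """A test score = stable true ability + day-to-day luck/noise."""
    return true_ability + random.gauss(0, 5)

# Simulate 10,000 people and test each of them once.
abilities = [random.gauss(POP_MEAN, 8) for _ in range(10_000)]
test1 = [observed(a) for a in abilities]

# Keep only the extreme low scorers from the first test, then retest them.
low = [(a, s) for a, s in zip(abilities, test1) if s < 65]
mean1 = sum(s for _, s in low) / len(low)
mean2 = sum(observed(a) for a, _ in low) / len(low)

print(f"first-test mean of low group: {mean1:.1f}")
print(f"retest mean of the same group: {mean2:.1f}")  # closer to 82
```

The retest mean rises toward the population mean even though nothing was done to the group: the people who scored lowest the first time were partly people with low ability and partly people with ordinary ability who had bad luck, and the bad luck does not repeat on average.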

To learn more about statistical regression, go to Sabermetrics 101: Regression Toward the Mean. It's a good introduction to the topic presented in an easy-to-follow manner.

Differential selection (or Differential assignment):
Very simply put, this threat is caused by the fact that the experimental and control groups in the study were not randomly assigned out of the sample. Actually, I prefer the term differential assignment because it reminds you that this threat is caused by non-random assignment of the sample to the groups in the study, NOT to how the sample was selected from the population. Don't forget that the method used by the researcher to select the sample from the population has no bearing on how subjects are assigned to groups. That is, a researcher could randomly select the sample for a study and not randomly assign the subjects to groups. Or, the researcher could not randomly select the sample, but randomly assign the subjects to groups. Or, the researcher could both randomly select subjects from the population and randomly assign them to their respective groups. Any combination is possible. But, don't get confused. Concentrate on how the researcher assigns the subjects to the groups in the study. That will tell you if differential selection is a threat or not. Simply, if the researcher randomly assigns subjects to groups, then differential selection is NOT a threat, but it is a threat if subjects are not randomly assigned.

Selection-maturation interaction:
Each of these internal threats can interact with the others. The strongest of these interactions is the one between differential selection (selection) and maturation. Essentially, the problem is this: If we do not randomly assign subjects to the experimental and control groups, then it might be possible that they are of different ages or that they will mature at different rates. Thus, not only are the individual threats (maturation and differential selection) threats in their own right, but the interaction of these two is also a threat. That's what we call "selection-maturation interaction."

It's surprisingly easy to tell if this interaction will be a threat in a given study. All you have to do is look at your list of threats: if you identified both maturation and differential selection as threats, then it follows that the interaction must be a threat. Logically, then, if you find that either maturation or differential selection is NOT a threat, then it is impossible for the interaction to be a threat (there's nothing to interact).

Experimental Mortality:
This may be the easiest threat of all to recognize. Simply look to see if subjects dropped out of the study. We saw this happen a lot in studies that used questionnaires to collect data. Those who failed to return their surveys were automatically considered to be dropouts (mortality). When you're reviewing studies for mortality, be sure to check all Tables in addition to the text of the article. The researcher should tell you if any mortality occurred. But, sometimes the researcher won't say anything in the text, but mortality will show up in the data tables. When you look at data tables, look for an indication of the number of subjects (usually indicated by the letter "n" or "N"). If the table says something like "N = 102," that means the number of subjects in the table is 102. Check this against the number of subjects mentioned in the Methodology section of the study. If the numbers don't match, that's usually an indication that mortality took place.
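This table-versus-Methodology cross-check is mechanical enough to sketch as a tiny helper function. The counts in the usage line are hypothetical, not taken from any particular study:

```python
def check_mortality(n_methodology, n_table):
    """Compare the subject count from the Methodology section
    with the N reported in a results table."""
    dropped = n_methodology - n_table
    if dropped > 0:
        return f"possible mortality: {dropped} subject(s) missing from the table"
    return "counts match: no mortality evident from this table"

# Hypothetical study: Methodology reports 120 subjects, a table reports N = 102.
print(check_mortality(120, 102))
```

Of course the helper only flags a discrepancy; it is still your job as evaluator to read the report and decide why the numbers differ.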

Of course, it's troublesome when subjects drop out of a study, because the researcher loses valuable data. When the number of dropouts is high compared to the sample size, it may inject a bias into the final results, because there may be something similar to all or most dropouts that would tend to make the remaining sample nonrepresentative. This, you should recall, will affect the ability to generalize results.

You should also pay attention to the different number of subjects that drop out of the experimental group compared to the number dropping out of the control group. Usually, there is greater mortality from the experimental group than the control group. This makes the two groups more different from each other. That is, we know the groups are supposed to be different because one gets the treatment and one does not. But, the groups should be as close to identical in every other important way as possible. So, if we see a statistically significant difference between the groups, we can then say that it must be caused by the ONE thing different between the groups (which is the treatment). However, mortality will also make the groups different on whatever variable contributed to the mortality in the first place (which is usually unknown). So, we're left with the knowledge that the groups are different because of the treatment and also because of some mysterious reason that led to the mortality. How do we, then, know which of these is the true cause for any significant difference between the groups? The answer is that we don't know. In this case, mortality becomes what is known as a confounding variable because it confounds (or confuses) the answer to the question of what caused the difference between the groups.

There is one additional internal threat category not addressed on pp. 306-307, but it is one we've already covered: the Appropriate Use of Inferential Statistics, the rule you hopefully learned from the Week 3 lesson. If you've forgotten the rule, return to the Week 3 lesson and review it.


Threats to External Validity of Experiments

External validity threats are extraneous variables that detract from your ability to confidently generalize the results of the research to other populations. Not all external validity threats are potential threats in all situations. As with the internal threats, it is your job as evaluator to determine which are potential threats in a given study, and how they would limit your ability to generalize results.

These extraneous variables have been categorized into three areas. They are listed and briefly explained in your textbook on pages 238-239. You may recall that we discussed the first two of these external validity threats in the Week 2 lesson. I'll mention them here briefly. But if you need to, please go back and review in the Week 2 lesson.

Population validity:
This is a measure of how representative the sample is of the population. It is based on how the sample was selected from the population and the size of the sample selected.

Personological variable validity (or Demographics):
This is a measure of whether the researcher is collecting data on all the relevant personological (or demographic) variables of the subjects selected for the study.

Ecological validity:
The root word here is "ecology" or environment. That's the key to this threat. It is concerned with the environment in which the study is conducted. Remember that external validity relates to the ability to generalize findings of the study to a population. When you consider the issue of generalizing, you must also stop to consider how the study was conducted and ask yourself if it was a "natural" environment. For instance, the results of a study done in a typical school environment will most likely transfer to other similar school environments. A study done in a clinical, laboratory situation might not transfer easily to a typical school situation, because in the laboratory, noise, light, temperature, etc., are all controlled. Not so in a typical school (as anyone should realize). So, an ecologically valid study is one which is conducted in the environment in which the results are likely to be used.

So, now you know all the internal and external threats to validity that can affect any type of quantitative research study. Our trusty model (figure below) shows them all.


The Value of Random Assignment in Experiments

The real value of random assignment in experiments is the ability it provides the researcher to effortlessly control for a host of extraneous (potentially confounding) variables. By randomly assigning subjects to groups, any extraneous variable that could affect the dependent variable will be randomly distributed throughout both groups. At the end of the study, when the researcher subtracts the mean scores of the two groups, the randomized extraneous variables will cancel themselves out because they were present in roughly equal amounts in both groups to begin with.

To envision this, think of a large bowl of colored candy bits (like M&Ms® or Skittles®). Let's say you randomly (blindly) grab a candy bit and put it in the "experimental" glass, grab another and put it in the "control" glass, and repeat the process 10 times. Then count the number of bits of each color in each group. You won't have equal (or even closely equal) numbers of each color in each group. But, randomly assign 90 more bits to each group. Now there are 100 candy bits in each group. The distribution of colored candy bits is much more equal now than it was before. And if you continued randomly assigning colored bits, the colors would become equally (or very nearly equally) distributed between the groups. If you think of extraneous variables like the colors of the candy bits, you'll get the idea of what I'm saying.
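The candy-bowl analogy is easy to simulate. The color list and group sizes below are arbitrary choices for illustration:

```python
import random
from collections import Counter

random.seed(1)  # fixed seed for reproducibility

COLORS = ["red", "green", "blue", "yellow", "brown"]

def max_color_gap(n_per_group):
    """Randomly deal candy bits into two glasses and return the largest
    proportional difference in any one color between the glasses."""
    bits = [random.choice(COLORS) for _ in range(2 * n_per_group)]
    random.shuffle(bits)
    exp = Counter(bits[:n_per_group])   # "experimental" glass
    ctl = Counter(bits[n_per_group:])   # "control" glass
    return max(abs(exp[c] - ctl[c]) / n_per_group for c in COLORS)

# The worst imbalance shrinks as the groups grow.
for n in (10, 100, 10_000):
    print(f"n = {n:>6}: worst color imbalance = {max_color_gap(n):.3f}")
```

With 10 bits per glass the worst imbalance is typically large; with 10,000 it is a fraction of a percent. That is exactly what random assignment does to extraneous variables as sample size grows.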

However, the reality of social science research is that researchers do not usually deal with hundreds or thousands of subjects in a single study. With only 40 or 50 or 80 or so subjects to be assigned between two or more groups, the chances are that NOT all the important extraneous variables will be randomized. That, in fact, is why some researchers use a pretest -- to measure one or more important extraneous variables to see if they are, in fact, evenly distributed across the groups.

The most powerful effect of random assignment is that it nullifies most of the threats to internal validity we discussed earlier. It is able to do this because most of the variables responsible for these threats are balanced between the groups being studied (we say, therefore, they are controlled). For example, let's say a researcher is doing a study to test a novel approach to improve writing ability among high school students. Students are randomly assigned to the experimental and control groups. The experimental group receives the novel writing treatment, while the control group receives nothing. The study lasts for a school year. Now, given this scenario, let's look at, say, the history threat. Is there anything going on simultaneously with the experiment that could have an effect on the students' writing ability? The answer, of course, is yes -- students are attending numerous classes where they have to write papers. The act of writing papers will also lead to improvements in writing ability over the course of a year. So, this might be confounded with the writing treatment. But wait --- since the researcher randomly assigned students to groups, it's just as likely that students in the experimental group will be taking courses requiring them to write papers as it is for the control group. Therefore, if the act of simply writing papers actually does improve writing ability, then we should expect the same level of improvement in both the experimental and control groups. Consequently, this history effect will cancel itself out of the study (whatever occurs equally to all groups in a study is considered to be a controlled event).

Random assignment has similar effects on other internal validity threats. You just have to reason through the effect of random assignment, as I did above, to determine if random assignment will control for the threat. One more thing, though. The threat of mortality operates against the effects of random assignment. Mortality usually affects the experimental group more than the control group. Thus, a large imbalance in the drop-out rate between the experimental and control groups will essentially unbalance the groups and undo the effects of random assignment. What this means to you the evaluator is this: if you detect a large imbalance in mortality between the experimental and control groups, then you may have to reconsider all the threats again, because the groups may no longer be equivalent, as random assignment initially made them.


Evaluating Sample Study #16
(Effectiveness of the DISTAR Reading I Program
in Developing First Graders' Language Skills
)
1. What kind of research design is this?

Rather than just blurt out the answer, let me run through the thought process you should have gone through to arrive at a correct answer. First of all, from the Methodology Section, it's clear that the researcher is forming two groups for this study. Therefore, we can rule out descriptive and correlational research designs. It should also be clear that the researcher is manipulating the independent variable, determining which group will receive which treatment (the DISTAR or the basal reading program). Therefore, we can also rule out the causal-comparative research design. That only leaves us with true or quasi-experimental designs. To decide which of these two it is, we have to check to see whether or not subjects are randomly assigned to the experimental and control groups. We can also see from the Methodology Section that the experimental group was selected from one school while the control group was selected from another school. The lack of random assignment of subjects makes this a quasi-experimental design.

Let me bring something to your attention before leaving this question. If you look on page 291 in the first paragraph under the Procedures Section, you will see where the researcher talks about randomly selecting students from each of the schools. Don't be misled by this. Random selection is not the same thing as random assignment. Regardless of what the researcher did to the subjects in each school, the fact remains that only children from one school had the opportunity to be in the experimental group, and only children in the other school had the opportunity to be in the control group. Thus, the attribute of equality was violated in assigning this sample to the various groups. This eliminates the possibility of random assignment.

2. What is the research hypothesis, objective, or question(s), or if none, so state.

This study has a research question in it, found on page 290, right-hand column, last sentence in the Introductory Section: Would the effects of DISTAR be better for children with lower initial language ability than for children with higher ability as measured by the Metropolitan Readiness Test (MRT)?

3. To what population would you feel comfortable generalizing results of this study?

If you look on page 291 in the first two paragraphs under the Procedures Section, you will see a description of the populations from which the sample in each school was selected. The researcher says he listed the names of 80 first graders, from which he randomly selected 40 after eliminating those who might have difficulties reading English. Are we to assume that this school only had 80 first graders to choose from? Lacking any other information, this is the only assumption we can make. Thus, the question to be answered is whether we can confidently generalize the results of this study to the 40 students in the experimental school who were not chosen for the study. Since subjects were randomly selected and a relatively large sample was selected (50% of the population), we could feel fairly confident in generalizing to this group (except those who are repeaters, participate in special education, and possess Spanish surnames -- because these were excluded from the study in the first place).

Now, let's expand to a larger level by asking whether we can confidently generalize to future first-grade students in these schools. To answer that question, we need some additional demographic information about the sample used in the study. From the Subjects Section on page 290, we find that the researcher provides us with the gender and racial makeup of the groups. We also know that both school districts are in relatively poor socio-economic status areas of the country (16th and 5th lowest on a scale of 164). If you believe this is sufficient demographic information to give you confidence in generalizing to future first-grade students attending these school districts, then you should use that as your rationale for justifying your answer. If, however, you believe there is insufficient demographic data provided, then you should identify what additional demographic information you would need and why it would increase your confidence. The key here is that to be necessary, demographic data should have a direct impact on reading achievement (the dependent variable in this study).

4. Identify the strengths and threats to validity in this study.

Strengths:
  • Reliability reported for the SIT and MRT tests used in the study. Also, since standardized tests were used, there is no real need for the researcher to assess the tests' validity.

  • Researcher omitted students who could potentially confound the study results because of poor language skills caused by special disabilities (special education classes), extra learning caused by repeating the same grade, and language problems caused by not having English as a native language (students with Spanish surnames).

  • Researcher used a parametric inferential statistic (Analysis of Covariance -- see first paragraph of Results Section on page 291). He used this statistic on data from the MRT and SIT. Since I am not sure whether the MRT and SIT generate continuous data, I would cite the following rule: "If the data from the MRT and SIT are continuous, then the researcher used an appropriate inferential statistic; but if the MRT and SIT produce discrete data, then the researcher used an inappropriate inferential statistic."

Internal Threats:
  • History: Since language ability is the dependent variable, we have to look for any historical event that could affect language ability. We aren't told in the report of any such events occurring during the study. So, we need to look for any historical events that it is reasonable to assume could be affecting these subjects during the year. About the only one I can think of is after-school TV programs like Sesame Street, Reading Rainbow, etc. These kinds of programs would definitely affect a child's language acquisition and reading ability. So, I've got a historical event. Now, is it reasonable to expect these subjects to watch these shows? They're from very low SES areas of the country (no matter; they may very well still have TVs). Would it be reasonable to expect their parent(s) to encourage them to watch such programs? And, if so, would it be reasonable to expect one group to make more use of these TV programs than the other (so that history would cause a differential effect between the groups)? Depending on how you answer these questions, you would say either that history could be a threat in this study or that it could not. Whichever answer you select, you must support it with reasoning like this.

  • Maturation: The study is a year long and we are dealing with first grade students. Surely they will mature mentally and physically throughout the year. The question to be answered is could that maturation explain an improvement in the subjects' reading ability. I could argue that as the human body matures, so do fine motor coordination skills. The muscle groups that control fine motor coordination also control things like eye movement and tracking -- which helps children follow words along a page as they read. So, maturation definitely has an effect on reading ability. The question to answer now is do we expect these two groups to be so different in the maturation rates that there would be a differential effect between the groups caused by maturation? From the Subjects Section on page 290, we know that both groups of students are from the same school district (even though they are from two different schools). Consequently, there should be no difference in the ages of the students (there would be no difference in the entrance age policies within the same school district). In fact, the only real difference is in the racial makeup between the two groups. Since maturation is not affected by race, and since we can expect the mean age of students in both groups to be roughly equivalent, there is probably no difference in maturational rates between the two groups. Thus, even though maturation will affect this study, it is probably not a threat to the results since its effect will cancel out between the two groups.

  • Testing: This is a pretest-post test non-equivalent group design. However, the pretest and post test are two different tests. The MRT (a reading readiness test) served as the pretest, while the SIT (an intelligence test) was the post test. Since the pretest and post test were different from one another, there is no possibility that testing could be a threat in this design.

  • Instrumentation: Reliability is reported for both the SIT and MRT tests used in the study. Also, since they are standardized tests, there is no need for the researcher to assess the tests' validity. 

  • Statistical Regression: There is no indication that those selected for this study were chosen because they were particularly at risk or exceptional in reading ability. Therefore, we can conclude that extreme-performing groups were NOT used in this study. Consequently, statistical regression cannot be a threat in this study. There is also another reason why statistical regression is not a threat -- because the pretest and post test are not the same test (a requirement for statistical regression to exert an effect). Citing either reason (or both) would bring you to the same conclusion and would earn you credit on a test.

  • Differential Selection: There are two possible answers here depending on how sophisticated you are. The simplest answer is to say that differential selection is a threat in this study because subjects were NOT randomly assigned to groups. If you were to make a more sophisticated analysis of the situation, you would ask yourself this question: "Even though subjects are not randomly assigned to groups, is there any evidence that they are still equivalent on the important variables in the study?" The "important variables" would be those that affect the dependent variable (reading ability). To answer this question, you would look at the demographic data collected and reported by the researcher and realize that the subjects were roughly the same age, gender distribution between the two groups was about equal, and their families' SES levels were about equally low (all of these variables have some effect on reading ability and readiness). Given these things being roughly equal, and since both groups were from the same school district, you could argue that despite non-random assignment, there is reason to believe that these two groups were roughly equal on the aforementioned variables. Therefore, you could conclude that differential selection was NOT a threat in this study. Either answer would receive credit if given on a test as long as it was suitably justified with the proper rationale.

  • Selection-Maturation Interaction: Since we argued that maturation is probably not a threat in this study, there is nothing for differential selection to interact with. Thus, this is not a threat. Of course, if you also reasoned that differential selection was not a threat either, then that's two reasons why this interaction could not be a threat.

  • Mortality: There is no indication from the study that anybody dropped out before the study concluded. So, we can say that there is no evidence of mortality being a threat in this study. Note that you should also check the tables by looking at the sample sizes reported there. They should add up to 80 since that's the number of subjects in the study. If you look at Table 1 or Table 2, you will see something peculiar. The column labeled df stands for degrees of freedom. The total df is always the number of subjects in the sample minus 1. So, if you add up the df column in either table, you come up with a total of 79. This means that 80 subjects were used in each of these statistical tests, and confirms that all 80 people took the post test (SIT).
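The df arithmetic can be verified in a few lines. Only the total of 79 comes from the tables; the split between treatment, covariate, and error df below is my assumption about how an ANCOVA with two groups and one covariate would partition it:

```python
# Degrees of freedom in an ANCOVA partition the total df, and the
# total df equals the number of subjects minus 1 (df_total = N - 1).
df_treatment = 1   # 2 groups -> 2 - 1 (assumed)
df_covariate = 1   # one covariate, the MRT pretest (assumed)
df_error = 77      # the remainder (assumed split consistent with N = 80)

# Summing the df column should reproduce the tables' total of 79.
df_total = df_treatment + df_covariate + df_error

# Working backwards from the table total to the sample size:
n_subjects = df_total + 1
print(df_total, n_subjects)  # 79 80
```

This is the same back-of-the-envelope check suggested above: a df column summing to 79 is consistent with all 80 subjects completing the post test.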
     
  • Appropriate Use of Inferential Statistics:  Identify the dependent variable in this study.  It is reading ability.  What form of data are being generated by the instrument that measures reading ability (the SIT)?  If you can determine whether the SIT is generating continuous or discrete data, then you can invoke the proper side of the rule we learned in Week 3.  However, if you cannot determine what form the data from the SIT are in, then simply state both sides of the rule with a statement something like this:

    I'm not sure what form the data being analyzed from the SIT are in, but if the data are continuous, then the most appropriate inferential statistic to use would be parametric; however, if the data are discrete, then the most appropriate inferential statistic to use would be non-parametric.
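The two-sided rule can be expressed as a tiny decision sketch. The function name and return labels are mine, not from the study or the Week 3 materials:

```python
def appropriate_statistic(data_form: str) -> str:
    """Week 3 rule: continuous data call for a parametric test family;
    discrete data call for a non-parametric test family."""
    if data_form == "continuous":
        return "parametric"      # e.g., t-test, ANOVA, ANCOVA
    if data_form == "discrete":
        return "non-parametric"  # e.g., chi-square, Mann-Whitney U
    raise ValueError("data form must be 'continuous' or 'discrete'")

# If the SIT yields continuous scores, the ANCOVA used here is
# appropriate; if it yields discrete data, it is not.
print(appropriate_statistic("continuous"))  # parametric
```

Stating both branches, as in the hedged sentence above, is the correct move whenever you cannot determine which branch applies.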

External Threats:
  • Population Validity: Since subjects were randomly selected from the population of first grade students at each school (see Procedures Section on page 291), and since 50 percent of the experimental population was used in the study, and about 35 percent of the control population was used, we could say that relatively large samples were used. Thus, population validity was relatively high.  This is a strength.

  • Personological Variable Validity: If you felt that all (or the majority of) the important demographic variables were collected and reported in this study, you rightly could say personological variable validity was high. On the other hand, if you could identify (and defend the importance of) one or more variables the researcher failed to collect, you could argue that personological variable validity was low. Either way, you could receive credit if your answer were logically supported.

  • Ecological Validity: Since this study was done in a typical school environment, its ecological validity is very high. This is a strength.

5. Are there any ethical problems in this study?
 
There are no glaring ethical problems in this study, since both groups are receiving legitimate instruction in reading. No group is denied anything they would not ordinarily get, nor is either group subjected to a treatment that is potentially harmful to them.

If you have any questions concerning this evaluation (if you found things I didn't discuss here, or if you don't understand something I've discussed here), talk with other members of the class to see if you can resolve the issues with them. If not, discuss your questions with the instructor in class.


Proceed to Part 2 of 2 of the Week 6 lesson.