When I heard that the College Board was going to remove Tests 1, 2, and 3, and that three of the four new tests would contain items from those three deprecated tests, I was worried that the ‘new’ tests weren’t going to be very new. Would Test 7 be the only new test? But I’ve been pleasantly surprised!
Some notes on Tests 8 - 10:
A little over half of the items (106 of 198, or about 54%) had never appeared on previous Bluebook tests.
The percentage of new items is highest on Test 8 (59% vs 52% for Test 9 and 48% for Test 10).
The percentage of new items is highest on the hard second modules (58%) and lowest on the easy second modules (48%). On the first modules, 53% of the items are new.
They reused 92 total items, but only 11 of them came from Test 1! By contrast, they reused 34 from Test 2 and 47 from Test 3.
I don’t know why, but I assumed that the old items on Test 8 would mostly come from Test 1, with Test 9 borrowing from Test 2 and Test 10 borrowing from Test 3. Instead, the items from the old tests are scattered among the new ones. They also used one item from PSAT 1.
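The counts above can be cross-checked with a few lines of Python. This is a minimal sketch that uses only the totals quoted in this post (198 items, 106 new, and 11/34/47 reused from Tests 1/2/3). The variable and function names are my own, and I've left the single PSAT 1 item out of the tally because it isn't clear which bucket it was counted in.

```python
# Totals quoted above for Tests 8-10 (Bluebook practice SATs).
TOTAL_ITEMS = 198
NEW_ITEMS = 106

# Reused items, keyed by the deprecated test they came from.
reused_by_source = {"Test 1": 11, "Test 2": 34, "Test 3": 47}

reused_total = sum(reused_by_source.values())


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    print(f"Reused items: {reused_total}")
    print(f"New + reused: {NEW_ITEMS + reused_total} of {TOTAL_ITEMS}")
    print(f"New share: {pct(NEW_ITEMS, TOTAL_ITEMS)}%")
    for source, count in reused_by_source.items():
        print(f"{source}: {pct(count, reused_total)}% of reused items")
```

Run as a script, it confirms that the 106 new and 92 reused items account for all 198, and that Test 3 alone supplied just over half of the reused items.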
Thank you very much to Danny Pernik, Noah Samotin, Josh White, and Mishkaat Rawjee, who assembled a lot of data for their own project and very nicely shared it with me.
I think the fact that the CB removed some tests and did major editing carries a key message: the practice SATs are not as strictly representative of actual tests as the released tests from the old paper SAT or the ACT were. (This is not unexpected, and actually not terrible, but it is important to recognize.) (Additional evidence for this would show up in differences between the practice tests and the test bank material not used on those tests.)
In other words, variation in a student's scores across practice tests is explained by variation in the tests themselves (as opposed to variation in student ability and progress) to a larger degree than it would have been with the more stable and representative released tests. This gives me more (possibly unearned) confidence in my belief that much of the variation across the practice tests doesn't reflect the variation students are likely to see on real tests in the future. This is important to keep in mind when discussing practice test outcomes with students.
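To make the variance argument concrete, here is a toy sketch in Python. It assumes (my assumption, not anything published by the College Board) that a score on a given practice test is the student's ability plus an independent form effect plus other noise, so the variances add. The spreads used below (30 points of non-form noise, and form effects of 10 vs. 25 points) are purely hypothetical numbers chosen for illustration, not measurements.

```python
# Toy model (hypothetical): score = ability + form_effect + other_noise,
# with form_effect and other_noise independent, so their variances add.


def share_from_forms(sd_form: float, sd_other: float) -> float:
    """Fraction of score variance across tests that comes from differences
    between the test forms themselves."""
    var_form = sd_form ** 2
    var_other = sd_other ** 2  # day-to-day noise, progress, etc.
    return var_form / (var_form + var_other)


if __name__ == "__main__":
    SD_OTHER = 30  # hypothetical spread (SAT points) not due to the form
    # Hypothetical form-to-form spreads: stable released tests vs. heavily edited ones.
    for label, sd_form in [("stable forms", 10), ("edited forms", 25)]:
        share = share_from_forms(sd_form, SD_OTHER)
        print(f"{label}: {share:.0%} of score variance comes from the forms")
```

With these made-up numbers, the share of score variance attributable to the forms rises from 10% to about 41%, even though the student's own variability is unchanged. That is the sense in which a less stable set of practice tests makes score swings harder to interpret.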
Presumably the current crop of tests does a better job of reflecting reality than the former body of tests, but we can't assume they are perfectly representative.