What can the first four SATs from 2015 tell us about the four Bluebook digital SATs?
Lessons learned from the last College Board revamp
We only have four digital SATs, with two more promised[1] for 2024. Are they just as good as real tests? I analyzed the last batch of “pre-tests” and compared them to 10 actual tests that were released through the QAS program.
Background
When the College Board changed the SAT in 2015, they released four tests[2]. These tests had not been administered to students – they didn’t produce scores that were then sent to schools. Rather, they were released to give students an idea of what was on the test. Now that this version of the SAT is about to disappear, it’s worth taking another look at these tests. How accurate were they? What lessons can we take from them as we attempt to work with the four digital SATs in Bluebook, which are also not official tests?
While I do think they were extremely useful (and I think the Bluebook tests will be too), the four tests from 2015 were quite inaccurate in a lot of ways.
The Ugly
Some questions just never[3] appeared on a real test. Polynomial long division showed up 3 times in the first four tests, but never on real ones. Also, two of the tests (Test 2 and Test 4) were eventually removed from Khan Academy, presumably because they did not reflect the content of the test.
The Weird
A lot of questions were just…off. Look at this Algebra Moves question from Test 1:
The skill is worth covering – they ask students to isolate variables all the time – but it’s rare to see such complicated expressions.
Here’s another one from Test 3:
These are still useful questions. They cover concepts that are on the test (isolating variables, vertical angles, 180 degrees in a line), but the style is strange. Many of the questions were very wordy, or took a long time to solve. In all, I marked 17 questions as weird, and, for what it’s worth, they were distributed fairly evenly between the four tests – it’s not like all of them were in the outcast Tests 2 and 4.
The Inaccurate
Let’s start with linear equations. They are a staple of the current test. If you don’t include any systems (only questions that are limited to slope and y-intercept), they still account for about 18% of the questions. Yet on Tests 1 - 4, they accounted for only 9% of the questions. Quadratics were under-represented as well: just under 9% of the questions were quadratic on Tests 1 - 4, but over 12% of the recent test questions have been related to quadratics. Meanwhile, over 10% of the questions on Tests 1 - 4 pertained to systems (on current SATs it’s closer to 7%).
The differences were even more obvious at the more granular ‘question type’ level. If you went by the first four tests, you’d think inequalities would appear 2-3 times per test (in fact, it was closer to once per test), and that no-solution questions were relatively uncommon (they showed up on almost every test). Here are several more examples:
Over-represented[4]: circle proportions, absolute value equations, radians, systems with three equations, sin(x) = cos(90 - x).
Under-represented[5]: line of best fit, median, probability with a table, slope, x and y intercepts, value/frequency tables, radicals, margin of error, infinite solutions, fractional exponents.
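For anyone who wants to run the same comparison on their own tallies, here is a minimal Python sketch. It assumes the footnoted definitions (over-represented means at least twice as frequent in the initial batch, under-represented means less than half as frequent) and uses the rounded topic-level percentages quoted above. The names `SHARES`, `classify`, and `report` are mine, invented for illustration; this is not the actual analysis behind the lists.

```python
# Rounded shares of questions quoted in the text, as fractions:
# (share on Tests 1 - 4, share on actual tests). Approximate by design.
SHARES = {
    "linear equations": (0.09, 0.18),
    "quadratics": (0.09, 0.12),
    "systems": (0.10, 0.07),
}


def classify(initial_share, actual_share):
    """Label a topic using the assumed 2x and 0.5x thresholds."""
    if actual_share == 0:
        # Showed up in the initial batch but never on an actual test.
        return "over" if initial_share > 0 else "similar"
    ratio = initial_share / actual_share
    if ratio >= 2:
        return "over"
    if ratio < 0.5:
        return "under"
    return "similar"


def report(shares):
    """Return {topic: (ratio rounded to 2 places, label)}."""
    out = {}
    for topic, (initial, actual) in shares.items():
        ratio = initial / actual if actual else float("inf")
        out[topic] = (round(ratio, 2), classify(initial, actual))
    return out


if __name__ == "__main__":
    for topic, (ratio, label) in report(SHARES).items():
        print(f"{topic}: ratio {ratio}, {label}")
```

With these rounded figures, none of the three broad topics strictly crosses either threshold (linear equations sit right on the one-half boundary), which fits the point that the gaps were more obvious at the question-type level.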
And these differences do matter. Suppose you spent 30 minutes, via instruction and homework, helping a student learn how to rationalize a denominator with imaginary numbers and how to graph and shade a system of inequalities. In the same amount of time, you could have covered no solution/infinite solution, fractional exponents, table probability, and perpendicular lines. The first two concepts were very unlikely to appear, but 3 or 4 of the latter concepts were almost certain to appear. Getting 3 or 4 more questions right will often translate to at least 30 or 40 extra points.
The Good
Frustrating as their inaccuracies were to many of us at the time, these first four tests were nonetheless extremely helpful. They provided a way for our students to take mock tests that yielded somewhat-accurate scores. They also conveyed the types of things that were likely to appear. With very few exceptions, every problem type that was to become a staple of the new SAT was represented in these four tests. The discriminant, -b/2a, completing the square, standard deviation, exponential growth – all of these question types appeared in Tests 1 - 4.
Takeaways
So how should we view and use the Bluebook tests?
I think the Bluebook tests will be more accurate than Tests 1 - 4 were, partly because the last overhaul was explicitly focused on content. This time around, the College Board has said that they have made only minor changes to the content – so minor, in fact, that they believe you should be able to superscore a paper-based SAT score and a digital SAT score (the schools will decide whether they’ll allow this). International test-takers have confirmed that the tests were fairly accurate[6]. I also revamped the Mathchops question base to match the frequencies I saw in Bluebook tests, and early reports from our test-takers have been very positive.
I still think it’s likely that these Bluebook tests are inaccurate and misleading in some ways, even if less so than Tests 1 - 4 were. Will the real tests really have so few medians and ratios? Will anything like that insane surface area question from the hard module of the fourth test show up? We won’t know for sure until more of our students take these tests or the College Board releases some real ones.
But after thinking more about Tests 1 - 4, I think I could have avoided most of their downsides and reaped all of their benefits by following four simple rules:
Take anything that shows up at least once per test very seriously.
Consider covering the ‘family members’ of these concepts – question types that don’t appear themselves, but are obviously related to the ones that do.
Anything that shows up even once is worth spending a little time on, but…
…if a student is really struggling with a question, move on as quickly as possible. In fact, if every student seems to struggle with a certain question, you might try to preemptively skip it with new students during test reviews.
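The two frequency-based rules (the first and the third) can be sketched as a simple triage over per-test counts. This is only an illustration: the function name `triage`, the labels, and the sample counts are all hypothetical, and I am reading “at least once per test” as “appears on every released test”, which is an assumption. The other two rules depend on judgment and are not modeled.

```python
def triage(counts_per_test):
    """Map each question type to a rough priority.

    counts_per_test: {question_type: [count on test 1, count on test 2, ...]}
    """
    plan = {}
    for qtype, counts in counts_per_test.items():
        if counts and all(c >= 1 for c in counts):
            # Rule 1: shows up on every test (assumed reading).
            plan[qtype] = "take very seriously"
        elif any(c >= 1 for c in counts):
            # Rule 3: shows up at least once somewhere.
            plan[qtype] = "spend a little time"
        else:
            # Never appears; the rules themselves give no guidance here.
            plan[qtype] = "not covered by these rules"
    return plan


if __name__ == "__main__":
    # Made-up counts for four tests, purely for illustration.
    sample = {
        "type A": [2, 1, 1, 3],
        "type B": [0, 1, 0, 0],
        "type C": [0, 0, 0, 0],
    }
    for qtype, priority in triage(sample).items():
        print(f"{qtype}: {priority}")
```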
Until I have more information, I’ll be following these rules with the Bluebook tests too.
[1] This was mentioned at a National Test Prep Association meeting.
[2] On Khan Academy, they refer to them as “Test 1”, “Test 2”, etc.
[3] Pranoy Mohapatra mentioned that he had seen this on an actual test. It didn’t appear on the 17 QASs I looked at, but maybe it was on one of the others. I am confident that you did not need to know this technique (picking numbers would have worked in a pinch).
[4] Question types that showed up at least twice as frequently in the initial batch as they did on actual tests.
[5] Question types that showed up less than half as frequently in the initial batch as they did on actual tests.
[6] I’ve heard rumblings that the first Bluebook test is too easy.
Awesome analysis, Mike. If it helps, someone from College Board directly assured me that they learned from those concerns about how the first four released SATs in the previous revision reflected the tests to come. According to CB, the first four Bluebook tests should absolutely reflect the official ones. That said, I wonder what our colleagues working with students overseas would say about that concordance!