Math Praxis 5161

I recently took the Math Praxis 5161 and received a total raw score of 36. What are the chances that my scaled score is higher than a 160? I was hoping you guys can give me a good "guesstimate" because I am trying to decide whether to start applying for Math jobs for next school year.

Also, it says on my account that the score will not be available until May 27. Has anyone had any luck with getting their official scaled score earlier than the date they say, or is that a for sure locked in date?

The Study Companion says that 5161 has 60 questions. The sample questions it gives suggest that, while the test isn't just a straight-ahead multiple choice test, each question is nevertheless worth one raw point. The remark that some questions "may not" count toward your score nearly always means that some questions do not count (because they're being field-tested for future use), and I think we have credible evidence from other Praxis exams that the raw score on the screen takes those questions into account. If we assume that 10% of the questions you saw don't count, that would make your score 36/54, which is about 66.7%: that's borderline for passing, but I wouldn't rule it out that you'd passed.

It does say at the end of the test that for one section I had 26/34 possible raw points and the other section 10/16 raw points, so it should be out of 50. 72% of the raw points should be a passing score, huh?
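For anyone who wants to check the arithmetic, here is a quick Python sketch. It uses only the section results quoted above, plus the earlier guess of 54 scored questions for comparison; the variable names are just for illustration.

```python
# Section results as reported on the score screen in the post above:
# (raw points earned, raw points possible)
sections = [(26, 34), (10, 16)]

earned = sum(e for e, _ in sections)
possible = sum(p for _, p in sections)
percent = 100 * earned / possible

print(f"{earned}/{possible} = {percent:.1f}%")

# The earlier guess that 54 of the 60 questions were scored, for comparison.
guess_percent = 100 * earned / 54
print(f"{earned}/54 = {guess_percent:.1f}%")
```

With a denominator of 50 the same 36 raw points come out to 72%, not the roughly 67% that the 54-question guess gives.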

Can be, yes.

Historically 32 scales to 160. It can go up/down by 1 or 2 raw points I guess, because each test is different and if a bunch of good test takers took your version of the test (i.e. test score average goes up for your version) then it would take a higher raw score to scale to 160.

Generally this isn't an issue because enough people take each version of the test (i.e. enough samples during each testing window), and the distribution of ability of those taking the test is relatively uniform during each testing window, that the historical figures should be accurate. I'm 99.9% sure that 36 would scale to 160 or above.

Scaled scores don't have to do solely with test taker performance overall, though, ReverseSpin. If you've ever tried writing test questions, you'll know how darned difficult it is to write even two questions that are exactly equivalent in difficulty, let alone 60. The algorithm that converts raw scores to scaled takes this into account as well.

It does have to do with test taker performance overall.

How does the algorithm figure out how to scale the June test to match a January test (just examples here)? There is no way an algorithm can just go through the questions and figure out if one is harder than the other - you instead do the sensible thing and go through the answers - i.e. the test taker's performance.

For instance, let's say you expect, based on historical data, that a 50 question test would have an average of 30 correct with a std dev of 6. And you peg 32, or 1/3rd of a std deviation over average, as 160 (the typical state passing standard).

Now if a new version of the test results in a group average of 28 with a s.d. of 6, the "algorithm" would likely peg a 30 raw score as 160 scaled. This makes complete sense as the scaling would equate this new test with historical standards (i.e. 1/3rd s.d. over average is passing).
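The example above can be sketched in a few lines of Python. This is only the simplified "same distance above the mean" model described in this post, using the same made-up numbers; it is not a claim about how ETS actually equates forms, and `raw_cut` is a hypothetical helper name.

```python
def raw_cut(mean, sd, z):
    """Raw score sitting z standard deviations above a form's mean."""
    return mean + z * sd

# Illustrative numbers from the post, not real ETS data.
hist_mean, hist_sd, hist_cut = 30, 6, 32
z = (hist_cut - hist_mean) / hist_sd  # 1/3 of a standard deviation

# New form: group average of 28, same standard deviation.
new_cut = raw_cut(mean=28, sd=6, z=z)
print(round(new_cut, 2))
```

Under this model the raw score pegged to 160 moves from 32 to 30 when the group average drops by two points.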

Therefore, the collective group's test taking performance will affect how raw scores are scaled, because the collective will affect the test average and test standard deviation.

This is exactly the sort of algorithm that ETS uses to scale (actually, any test company would use something similar to scale and equalize testing).

What I said above is that (a) if your test cohort is better than average then it will push up the average and mess with that test's scaling and (b) it is not likely that your test cohort is better than average because enough people take the test within each testing window to make the testing cohort average in abilities compared to prior tests.

Note that the numbers I used above (30/28, s.d. of 6) are just illustrative. I was making them up to demonstrate the concept.

Twenty years ago, you might have been right; Praxis scores varied from test to test, and "harder" tests (math, science) could be passed with lower scaled scores than "easier" ones (elementary education)... except for the nagging point that Praxis was and most likely still is quite emphatic that one's testing cohort on a given test date makes no difference to whether one passes or not: it's all about hitting the scaled score.

In recent years Praxis seems to be moving toward Pearson's model of choosing a scaled score that represents "barely passing" for a number of tests, and then figuring out how to align each of those tests' raw results to that arbitrary number. (TExES, which is an ETS product but isn't Praxis, uses 240 on a scale from 100 to 300, unless a test is pass-no pass.) Passing scores for Praxis's physics content-knowledge exam, which is old enough that it still has a paper-based version, range from a low of 126 to a high of 153 over 34 states; the mode is 141 (six states) and the next most frequent score is 130 (three states). In contrast, while 5161's passing scores do range from 132 to 160, in 33 of the 38 states and territories the required score is 160. And I remember an older distribution of math passing scores that was much more like the physics test.

It continues to be the case that scaling of scores is touted by both Praxis (and the rest of ETS' teacher testing operation) and Pearson as ensuring that each barely-passing scaled score on a given test or subtest represents the same level of achievement, no matter the version of the test. In California, the initial score-setting process for CSET subtests (or major revisions of subtests) began by running each question in the first version of a test past a panel of teachers and teacher educators, who voted on the percentage of just-barely-qualified test takers that they would expect to get each question right, from 100% down to 10%. These ratings served as input in the process of deciding which raw scores for which subtests would correspond to California's passing score of 220. That this is so makes more sense of the fact that teacher tests can be passed with 60% to 75% correct, a percentage that would be resoundingly failing in most college courses.
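To show how ratings like these can turn into a raw cut score, here is a minimal Python sketch. It assumes the ratings are simply averaged per question and then summed, which is one common way of aggregating such panel votes; the post above only says the ratings "served as input", so the aggregation rule, the four questions, and the votes are all hypothetical.

```python
# Hypothetical panel ratings: for each question, each panelist's estimate of
# the fraction of just-barely-qualified test takers who would get it right.
ratings = {
    "Q1": [0.9, 0.8, 1.0],
    "Q2": [0.5, 0.6, 0.4],
    "Q3": [0.3, 0.2, 0.4],
    "Q4": [0.7, 0.7, 0.7],
}

def expected_borderline_raw(ratings):
    """Sum of the per-question mean ratings (an assumed aggregation rule)."""
    return sum(sum(votes) / len(votes) for votes in ratings.values())

cut = expected_borderline_raw(ratings)
print(f"expected raw score of a borderline candidate: {cut:.2f} of {len(ratings)}")
```

With these made-up votes the borderline candidate is expected to get 2.4 of 4 questions right, or 60%, which is the kind of percentage that can look low until you remember what it is meant to represent.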

I should point out that a 1990s administration of a Praxis test delivered up one question that sticks in my mind to this day because it provided such a delightful challenge compared to the rest of the exam. Questions DO differ in difficulty; it makes perfect sense that the algorithm that converts raw scores to scaled would be provided fudge factors based on ratings like those provided in the score-setting process (though probably less thoroughly vetted).

(I do spend rather a lot of time thinking about teacher tests.)

Apologies, but you write a lot and say nothing.

32 will more or less scale to 160, bottom line.

ETS has a very simple way to make each version of the test more or less like the prior versions - they have those 10 potential questions that they try out on everyone, along with the 50 (???) potential questions that they ask you to take after the test for a chance at winning a gift card. Pretty easy to correlate a candidate's performance on actual questions with potential questions, do so over a large cohort and figure out which potential questions are too hard, too easy or just right. Not a perfect science mind you, as there are only 4K or so test takers yearly for the 5161.
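The correlation idea above can be sketched roughly like this in Python. The cohort data is invented, and the two statistics (fraction correct on the trial question, and its correlation with the score on the counted questions) are an assumption about what "correlate" would mean in practice, not a description of ETS's actual procedure.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# Hypothetical cohort: raw score on the counted questions, and whether each
# person got one trial question right (1) or wrong (0).
scored_totals = [22, 27, 31, 35, 38, 41, 44, 47]
trial_correct = [0, 0, 0, 1, 0, 1, 1, 1]

difficulty = sum(trial_correct) / len(trial_correct)   # fraction answering correctly
discrimination = pearson(scored_totals, trial_correct)  # tracks overall ability?

print(f"fraction correct: {difficulty:.2f}")
print(f"correlation with scored total: {discrimination:.2f}")
```

A trial question that half the cohort gets right, and that stronger test takers get right more often, would look "just right" under this kind of check; one that nearly everyone misses, or that shows no correlation, would not.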

My 31 scaled to a 165 last week.

Congratulations, zb29!

Oh, sorry: I'll write shorter paragraphs and simpler sentences.

This ETS report's standard-setting primer, on pages 57-58, shows that Praxis, at least as of 2011, follows pretty much the same method for setting scores as Pearson does. Like Pearson's method, the Praxis method has a panel of experienced teachers rate test questions for difficulty. This suggests that both companies seem to allow, and maybe even prefer, that questions will differ in difficulty.

It's quite true that fewer people take 5161 per year than take the other teacher tests that Praxis offers, so the way Praxis does things is shaped more by all of its tests than by its math tests alone. A question gets field-tested in part so Praxis can figure out whether it needs to be rewritten, before the question actually counts as part of a test taker's score. Suppose that a test includes question X that is being field tested. If test takers who score high overall, and who do well with the challenging questions, tend to choose the same wrong answer B for question X, this tells Praxis that something is wrong with the stem of question X or the answer choices or the whole thing.
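Here is a small Python sketch of the question X check described above. The responses, the cutoff for "scores high overall", and the function name `popular_wrong_answer` are all hypothetical; it just illustrates the idea of looking at which answer the strongest test takers chose.

```python
from collections import Counter

def popular_wrong_answer(responses, correct, top_fraction=0.25):
    """responses: list of (overall_score, option_chosen) for one question.

    Returns the wrong option that the top scorers chose most often,
    or None if their most common choice was the keyed answer.
    """
    ranked = sorted(responses, key=lambda r: r[0], reverse=True)
    top_n = max(1, int(len(ranked) * top_fraction))
    counts = Counter(choice for _, choice in ranked[:top_n])
    option, _ = counts.most_common(1)[0]
    return option if option != correct else None

# Hypothetical responses to field-tested question X, keyed answer "C".
responses = [
    (48, "B"), (46, "B"), (45, "C"), (44, "B"),
    (30, "C"), (28, "A"), (25, "C"), (21, "D"),
]

flagged = popular_wrong_answer(responses, correct="C", top_fraction=0.5)
print(flagged)
```

In this invented data three of the four highest scorers picked B, so the question would be flagged for a second look at its stem and answer choices.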

The more a person teaches, the more she learns to marvel at her students' ability to find bugs she can't deny in questions that she was sure were perfectly clear.

I'll get banned for this 'cause you're a moderator and all, but frankly, you just seem to regurgitate a lot of junk while thinking you are correct.

In how many threads do people say 'I just got 39 on the 5161, what does it mean' 'I just got 48 on the 5161, what does it mean' and your response is both boilerplate and additionally wrong. You say stuff like 'Well, you received 39 out of 60 so you did quite well...' or 'You received 48 out of 60...'

5161 has 60 questions, but only 50 of them are scored for your score at the end - the other 10 as you know are field tested practice questions. Folks have mentioned this on other threads and yet you still don't comprehend that the raw score is out of 50 and not 60.

And what is with this cut-and-paste job about standard setting? Do you even understand what that means?

Standard setting is the STATE's job - the various states determine, via their panel study, what the pass/fail standard is for their teacher licensure requirement. Standard setting determines what scaled score is a passing score, but does not try to determine what raw score equates to what scaled score.

ETS is very clear that a scaled score is used to equalize between testing periods (i.e. between each test form). They equalize by trying to determine how a person who just took one test (one form) would have done if they turned around and took another form, without additional studying.

Obviously this is not easy to do and at the end of the day it is just an educated guess. One pre-test way to equalize the forms is to try to create equalized test forms by using validated field tested questions. One post-test way to equalize the forms is to work with a data point you have - the average performance of test takers during each testing period.

If one testing period/test form has a pretty high average score compared to the historical rest, then ETS will think that that test form is 'easy' and therefore will scale it like an easy test. An easy test means you need a higher raw score to get the same scaled score as on a previous test form.

ETS scaling is not perfect and they fully admit this. In one of their reports they state that the same test taker can see a variance of 7.4 scaled points between test forms, for the 5161. This variance is lower on most other Praxis / Praxis II tests and only higher on a handful of other Praxis subject tests. ETS ideally wants to design their test forms with lower variance - but the variance tells you that if you failed within its window, then take the test again (and a bit more study wouldn't hurt either).
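As a rough Python sketch of the "failed within its window" rule of thumb: this assumes the 7.4 figure can be read as a band of scaled points below the passing score, which is my simplification of the point above and not something taken from the ETS report. The function name and example scores are hypothetical.

```python
def within_window(scaled, passing, spread=7.4):
    """True if a failing scaled score is within `spread` points of passing."""
    return scaled < passing and passing - scaled <= spread

print(within_window(155, 160))  # 5 points short of 160
print(within_window(150, 160))  # 10 points short of 160
```

Under that reading, a 155 against a 160 requirement falls inside the band and a 150 does not.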

I'm not trying to be argumentative here, but you are just wrong and you pile on with extraneous stuff.

When did you get your score? I am anxiously awaiting my score!

I received my raw and scaled score at the end of the exam. I was really worried at first when I saw a 31, but then it scaled to a 165, which is exactly what I needed here in Arkansas.

Oh okay... I took my last exam on May 7th, and I think that I passed, but not 100% sure... It is driving me crazy not knowing.. I dreamed one night that I scored a 159 and needed a 160 to pass.. I am really hoping to teach next year, so I need it to be a passing score!

From someone who claims not to be trying to be argumentative, that's rather a lot of argument.

I've given my sources: what are yours?

It's true that I haven't memorized how many questions Praxis doesn't count on test 5161. The number of non-scoring questions, and even the percentage, differs from test to test even within Praxis, let alone outside it, and most teacher test programs are a bit coy in making the number public. If you can direct me to a source online that lists how many questions Praxis doesn't count per test, I would be delighted to add it to my list of bookmarks.

As for whether you should be banned, I'll leave that to the site owner and the other moderators.

Good hunting in your test taking, and good luck to your future students.

Congrats zb29!
Has anyone who did not get their scaled score upon completion gotten their score earlier than the date they tell you over the phone or via email? The anticipation may kill me... or cost me a job.

The anticipation is driving me crazy as well! I check it every day and several times a day! Lol

