Recommendations for conducting high-range intelligence tests

Introduction

These recommendations apply to both supervised and unsupervised tests for high intelligence, including in particular admission tests for I.Q. societies.

Consistent scoring

There must be consistency in which answers are counted wrong and which right, so that scores of different candidates are comparable. Subjectivity and arbitrariness must be avoided, as must credit for "alternative answers".

To achieve this consistency and objectivity, test items should be created so that the intended correct answer is unique, can only be found by actually solving the problem, and is therefore very unlikely to be found by chance. This eliminates the need to require candidates to provide explanations to prove they have really solved the problem; judging explanations for that purpose is always subjective and therefore undesirable. Explanations may still be of use to detect bad items, but they must never be "scored"; only the answers must be scored.

Multiple-choice items are objective, but this is spoiled by the risk that one chooses the right answer by chance (guessing). If this risk is eliminated by cleverly and creatively inventing a multiple-choice format that greatly reduces the probability of guessing right, multiple-choice items are perfectly acceptable.
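
To illustrate how strongly the answer format affects the risk of guessing, here is a minimal sketch. It assumes that items are guessed independently and that every option is equally likely to be picked; the function name and the two formats (5 options versus 1000 possible answers per item) are hypothetical and serve only as an example.

```python
from math import comb

def p_at_least(k: int, n: int, p: float) -> float:
    """Probability of guessing at least k of n items right,
    assuming independent guesses with success chance p per item."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

# Hypothetical 10-item test: chance of at least one lucky hit by pure guessing.
conventional = p_at_least(1, 10, 1 / 5)     # 5 options per item
enlarged = p_at_least(1, 10, 1 / 1000)      # format with 1000 possible answers

print(f"5 options per item:    {conventional:.3f}")
print(f"1000 options per item: {enlarged:.4f}")
```

Under these assumptions a conventional five-option format makes a lucky hit likely, while a format with a very large answer space makes it rare.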

Acting in the interest of correct statistics

Correct statistics, norms, scores and admission procedures rest upon objective, blind, emotionless, unempathic, unyielding, unhelpful, "cold", "unkind" scoring and score reporting. One must be practicing science, not helping people.

This differs from regular psychological testing, where there is a doctor/patient or consultant/client relationship. There, one is always acting in the interest of the patient or client, resulting in practices like congratulating the candidate on the score, allowing multiple retests (but reporting as if it were a first attempt), helping the candidate during the test, giving (partial) credit for answers that are "almost right", giving in to pressure and intimidation to report scores that are too high, giving feedback on which answers were wrong or right, giving the correct answers to the candidate, and other forms of help out of empathy. All of that is against the interest of science and a corruption of scores, tests, statistics and admission procedures.

Such empathic, corrupting behaviours characterize the deep amateur test scorer, and betray that one has absolutely no idea what one is doing. We see these aberrations ever more on score reports written by the Toms, Dicks, and Harrys that reckon themselves high-range test creators in these days of decline; the decadence of right-tail psychometrics. A few of the greatest hallmarks of scorer incompetence will now be discussed:

Congratulating or praising the candidate

"Impressive! Congratulations with this excellent score!" Without exception, this betrays the deep unsuitability of the would-be test creator. An intelligence test score is objective, never "good" or "bad", never a reason for praise or congratulation. One score is not "better" than another; psychometrics knows no value judgment. In case this is not at once clear to the reader, imagine the scorer saying to one with a low score, "What a rotten performance, you imbecile!" If one does not do the one, then one should not do the other either.

Candidates who receive a report which praises and congratulates them on their score may safely discard it as the botch of a bungler and go on to take a real test from a real expert to know their real I.Q.

Revealing which answers were wrong

"Your errors were mainly in the first few problems of the test..."; "Your answers to problems 12 and 16 were brilliant and creative, but unfortunately I expected different answers so I can not give you credit for them..." Apart from the fact that the latter betrays that those items are bad and must be removed or revised, such remarks help the candidate to know what was wrong and to find out the intended answers. How dangerous that is, is explained under "Secrecy of solutions and the motivation for secrecy" below.

Revealing the actual intended answers

"Your reasoning was good and your answers very creative, but my intended answers are just a little bit more logical; to put your mind at ease I herewith attach a list of my answers with explanations, so that you can see for yourself that the intended answers are indeed better." Again, see under "Secrecy of solutions and the motivation for secrecy".

Allowing retests

The reasons for not doing so are explained under "Retesting".

Giving all candidates about the same score from a narrow range

This bizarre phenomenon occurs when bad psychologists do not trust the objective statistical results from their own tests, which in turn is caused by their poor mastery and understanding of psychometrics. They observe that incoming test scores do not match their intuitive impression of the candidates' intelligence, panic, and decide to report their intuitive notion as if it were a test score. They may include personality-type questions in what is meant as an I.Q. test for this purpose. Typically, the resulting scores display a narrower range than real scores would, as these less than skilful statisticians are unable to recognize intelligence levels above their own (which inherently is less than high), and thus assign almost all of their candidates I.Q.'s around or somewhat below their own.

Atomic items

Test items should be atomic; that is, they should be units that yield a simple wrong/right item score, rather than a multi-valued score. Complex problems with multiple aspects that can be wrong or right independently may still very well be included, but should be treated as multiple items, each with its own wrong/right score and item number. This has two advantages: (1) Statistical item analysis becomes much easier because the separate aspects of the problem are isolated from the start, and (2) The complex problem naturally receives a greater weight than a single-item problem, thus reducing the need for artificial item weighting.
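
As a minimal sketch of what atomic scoring could look like in practice, the example below enters a complex problem with three independent aspects as three separate items. The answer key, the item numbers (7a, 7b, 7c) and the function name are hypothetical, and exact string matching is an assumption made for the sake of the example.

```python
# Hypothetical answer key: problem 7 has three independently checkable
# aspects, so it is entered as three atomic items instead of one 0-3 item.
KEY = {"1": "42", "2": "B", "7a": "red", "7b": "17", "7c": "north"}

def score_items(answers: dict, key: dict) -> dict:
    """Return a wrong/right (0/1) score per atomic item.
    Missing answers count as wrong; there is no partial credit."""
    return {
        item: int(answers.get(item, "").strip() == correct)
        for item, correct in key.items()
    }

candidate = {"1": "42", "2": "C", "7a": "red", "7b": "17", "7c": "south"}
item_scores = score_items(candidate, KEY)
raw_score = sum(item_scores.values())

print(item_scores)
print(raw_score)
```

Because every aspect has its own item number, the per-item results can go straight into item analysis, and problem 7 contributes up to three points to the raw score without any explicit weighting.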

Retesting

Retests must not be allowed for the following reasons:

They are not comparable to first attempts, so accepting retests as the candidate's true score (which is usual in some circles) implies that the first score is not the true score, and therefore means to oblige all candidates to take the test twice in order to know their true score, and to require them to destroy their possible first score report (or not issue it at all).
Considering the retest score to be the true score implies that only the retest scores can be used for statistical purposes such as norming, and the first attempt scores are useless statistically; it means to throw away the biggest part of the work one is doing, of the data one is gathering. In practice, of course, those who allow retests do use the first attempt scores for statistical purposes, sometimes even in combination with the retests to arrive at a larger sample size, thus corrupting their statistics.
In practice, candidates and test scorers involved in retests use the higher of the two scores, rather than the actual retest score (which should be used in all cases, even when it is lower). This adds to the two problems mentioned above the inflation of scores caused by having "two chances", as well as the levelling between candidates that results from it. Inflation and levelling, when using the higher of two scores, are the necessary result of the imperfect test-retest correlation; and this correlation is imperfect, or there would be no point in retesting to begin with. In case it is not at once apparent why using the higher of two scores causes inflation and levelling, one may consider that "the higher of two" is on average higher than "always the retest", because the retest score is sometimes lower than the first score.
Candidates with a perfect or near-perfect score on first attempt are excluded from knowing their true score this way as there is no or too little room for their retest score to differ positively from their first score.
Through retests candidates can verify the value (score) of particular answers or answer sets (more or less like in the game "Mastermind"), which endangers the secrecy of the test's answers. From two scored submissions, very much more information can be derived than from one.
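
The inflation and levelling named in the third reason can be made concrete under a simplifying model. The following is a sketch, not part of the original argument: it assumes that the first score X_1 and the retest score X_2 are jointly normal with the same mean \mu and standard deviation \sigma and with test-retest correlation \rho, and it ignores any practice effect.

```latex
% Assumption: (X_1, X_2) bivariate normal, equal mean \mu, equal s.d. \sigma,
% test-retest correlation \rho; no practice effect.
\[
  \mathbb{E}\bigl[\max(X_1, X_2)\bigr] = \mu + \sigma\sqrt{\frac{1-\rho}{\pi}},
  \qquad
  \operatorname{Var}\bigl[\max(X_1, X_2)\bigr] = \sigma^{2}\left(1 - \frac{1-\rho}{\pi}\right).
\]
```

Under these assumptions the expected value rises (inflation) and the variance shrinks (levelling) whenever \rho < 1, and both effects vanish only when \rho = 1. For example, with \sigma = 15 and \rho = 0.8 the expected inflation is about 3.8 points.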

If a retest is allowed in some rare case for a special reason, the score report must mention it concerns a retest, to prevent it from being used for admission or for statistical purposes as if it were a first attempt. If the retest report does not mention it concerns a retest, this makes it impossible to distinguish it from a first attempt, and therefore reports from test scorers who fail to identify retests on their reports can never be trusted or accepted for admission purposes.

Secrecy of solutions and the motivation for secrecy

The correct answers to a test should be kept secret and not revealed to anyone outside the test creators and scorers. This is especially important because people who have not found a particular solution to a difficult problem themselves but have been given it lack the motivation for secrecy that comes naturally to those who have solved it. One who solves a hard problem and receives a score for it will normally not want to help others of lesser ability to undeservedly score at that level, or above their true level, as that would reduce the meaning of the first person's own score. But one who receives the solution effortlessly without having found it will almost certainly abuse that information and pass it on to others, thus, in one's own interest, reducing the meaning of the scores of those who are above one in ability. It is important that the second part of the previous sentence is well understood.

Distrust reported scores

When norming a test on reported scores from other tests, one should be aware that these reported scores are not representative but inflated and would result in (far) too generous norms if used for norming purposes. Although few people really lie about their scores, almost all who have taken multiple tests report only their higher scores and leave out the lower ones, and if they have taken a particular test multiple times they typically only report the highest attempt. For a true picture of the candidates' scores and the correlations of one's test with other tests, all of the scores must be known. My articles on the norming of high-range tests say more about this.
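
The size of this reporting bias can be illustrated with a small simulation. This is a sketch under stated assumptions, not a norming procedure: every candidate takes the same number of tests, each score is the candidate's ability plus independent normal error, and only the highest score is reported. All names and parameter values (`simulate`, the mean of 145, the standard deviations) are hypothetical.

```python
import random
import statistics

def simulate(n_candidates=20000, n_tests=3, mean=145.0,
             ability_sd=12.0, error_sd=9.0, seed=0):
    """Compare the mean of all scores with the mean of the scores that
    candidates would report if each reported only their highest score."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    all_scores, reported = [], []
    for _ in range(n_candidates):
        ability = rng.gauss(mean, ability_sd)
        scores = [ability + rng.gauss(0.0, error_sd) for _ in range(n_tests)]
        all_scores.extend(scores)
        reported.append(max(scores))  # selective reporting: highest only
    return statistics.mean(all_scores), statistics.mean(reported)

mean_all, mean_reported = simulate()
inflation = mean_reported - mean_all
print(f"mean of all scores:      {mean_all:.1f}")
print(f"mean of reported scores: {mean_reported:.1f}")
print(f"inflation:               {inflation:.1f}")
```

In this model nobody lies about a score, yet the reported scores are several points higher on average than the full set of scores, which is why norms based on reported scores alone come out too generous.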

A quotation from a candidate's message that ruthlessly betrays the matter-of-factness with which only the highest scores are reported: "Tomorrow or some day I'm going to take another IQ test and I will send you the result if it should be significantly higher than what I earned on your test."

Unacceptable conditions on unsupervised tests

Fraud through cooperation cannot be entirely eliminated with unsupervised tests, but by far the worst problems occur when a test is self-timed and/or forbids reference aids. In that case, many or most candidates will use too much time and/or use reference aids anyway, thus greatly increasing their score. These problems are solved by allowing reference aids and as much time as the candidate needs to find all the solutions they can.
