Recommendations for conducting high-range intelligence tests

© March 2007 to December 2009 Paul Cooijmans

These recommendations apply to both supervised and unsupervised tests for high intelligence, including in particular admission tests for I.Q. societies.

Consistent scoring

There must be consistency in which answers are counted wrong and which right, so that scores of different candidates are comparable. Subjectivity and arbitrariness must be avoided, as must credit for "alternative answers".

To achieve this consistency and objectivity, test items should be constructed so that the intended correct answer is unique and can only be found by actually solving the problem, which makes it very unlikely to be found by chance. This eliminates the need to require candidates to provide explanations proving they have really solved the problem; judging explanations for that purpose is always subjective and therefore undesirable. Explanations may still be useful for detecting bad items, but they must never be "scored"; only the answers must be scored.

Multiple-choice items are objective, but this advantage is spoiled by the risk that a candidate chooses the right answer by chance (guessing). If this risk is eliminated by cleverly and creatively inventing a multiple-choice format that greatly reduces the probability of guessing right, multiple-choice items are perfectly acceptable.
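To illustrate how much the format matters, the following Python sketch compares the probability of a right guess on a conventional item (one correct option out of k) with a hypothetical format in which the candidate must mark exactly the correct non-empty subset of n options. It also computes the chance of reaching a given score by pure guessing (a binomial tail). The subset format, the test length and the score threshold are assumptions for illustration only, not a format prescribed here.

```python
from fractions import Fraction
from math import comb


def p_guess_single_choice(k: int) -> Fraction:
    """Chance of a right guess when exactly one of k options is correct."""
    return Fraction(1, k)


def p_guess_exact_subset(n: int) -> Fraction:
    """Chance of a right guess when the candidate must mark exactly the correct
    non-empty subset of n options (hypothetical format: 2**n - 1 possible answers)."""
    return Fraction(1, 2 ** n - 1)


def p_at_least(items: int, p: Fraction, s: int) -> Fraction:
    """Probability of getting at least s of `items` right by guessing alone,
    assuming independent guesses with per-item success probability p."""
    return sum(
        (comb(items, j) * p ** j * (1 - p) ** (items - j) for j in range(s, items + 1)),
        Fraction(0),
    )


if __name__ == "__main__":
    items, threshold = 40, 10  # hypothetical test length and score of interest
    for label, p in [("1 of 5 options", p_guess_single_choice(5)),
                     ("exact subset of 8 options", p_guess_exact_subset(8))]:
        expected = items * p  # expected chance score, by linearity of expectation
        print(f"{label}: p = {p}, expected chance score = {float(expected):.2f}, "
              f"P(score >= {threshold}) = {float(p_at_least(items, p, threshold)):.2e}")
```

Exact fractions are used so that very small guessing probabilities are not lost to floating-point rounding.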

Acting in the interest of correct statistics

Correct statistics, norms, scores and admission procedures rest upon objective, blind, emotionless, unempathic, unyielding, unhelpful, "cold", "unkind" scoring and score reporting. One must be practicing science, not helping people.

This differs from regular psychological testing, where there is a doctor/patient or consultant/client relationship. There, one always acts in the interest of the patient or client, which leads to practices such as allowing multiple retests (but reporting them as if they were first attempts), helping the candidate during the test, giving (partial) credit for answers that are "almost right", giving in to pressure and intimidation to report inflated scores, giving feedback on which answers were wrong or right, giving the correct answers to the candidate, and other forms of help out of empathy. All of that is against the interest of science and corrupts scores, tests, statistics and admission procedures.

Atomic items

Test items should be atomic; that is, they should be units that yield a simple wrong/right item score rather than a multi-valued score. Complex problems with multiple aspects that can be wrong or right independently may still very well be included, but should be treated as multiple items, each with its own wrong/right score and item number. This has two advantages: 1) Statistical item analysis becomes much easier because the separate aspects of the problem are isolated from the start, and 2) The complex problem naturally receives a greater weight than a single-item problem, thus reducing the need for artificial item weighting.
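The first advantage can be made concrete. With atomic items, the results form a simple matrix of 0/1 scores, one row per candidate and one column per item, and standard item statistics follow directly. The Python sketch below computes, for each item, its proportion right (p-value) and its correlation with the rest score (the total minus that item). The item labels "3a" to "3c", standing for three aspects of one complex problem, and the response data are hypothetical.

```python
from math import sqrt
from typing import Optional


def pearson(x: list[float], y: list[float]) -> Optional[float]:
    """Pearson correlation; None when either variable has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def item_analysis(responses: list[list[int]], item_ids: list[str]) -> dict:
    """responses: one row per candidate, one 0/1 score per atomic item."""
    results = {}
    for j, item in enumerate(item_ids):
        column = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]  # total without this item
        results[item] = {
            "p": sum(column) / len(column),   # proportion of candidates right
            "r_rest": pearson(column, rest),  # item-rest correlation
        }
    return results


if __name__ == "__main__":
    # Hypothetical data: items 3a-3c are three independently scored aspects
    # of one complex problem, each with its own item number.
    item_ids = ["1", "2", "3a", "3b", "3c"]
    responses = [
        [1, 1, 1, 1, 0],
        [1, 0, 1, 0, 0],
        [0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1],
    ]
    for item, stats in item_analysis(responses, item_ids).items():
        print(item, stats)
```

Because problem 3 is split into three items, it can add up to three points to the total score, which is the natural weighting mentioned in the second advantage.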

Retesting

Retests must not be allowed for the following reasons:

If a retest is allowed in some rare case for a special reason, the score report MUST state that it concerns a retest, to prevent it from being used for admission or for statistical purposes as if it were a first attempt. A retest report that does not say so is indistinguishable from a first attempt; therefore, reports from test scorers who fail to identify retests on their reports can never be trusted or accepted for admission purposes.
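A minimal sketch of how a scoring database might enforce this, assuming a hypothetical `ScoreReport` record in which the attempt number is always recorded: a report counts for admission or norming only if it is explicitly marked as a first attempt, so both retests and reports that do not state the attempt are excluded. The record layout and function names are illustrative, not an existing system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScoreReport:
    """Hypothetical score report record."""
    candidate: str
    test: str
    raw_score: int
    attempt: Optional[int]  # 1 = first attempt, 2 or more = retest, None = not stated

    @property
    def is_retest(self) -> bool:
        return self.attempt is not None and self.attempt > 1


def acceptable_for_admission(report: ScoreReport) -> bool:
    # Only an explicitly identified first attempt is accepted; a report that does
    # not state the attempt cannot be told apart from a retest and is rejected.
    return report.attempt == 1


def first_attempts(reports: list[ScoreReport]) -> list[ScoreReport]:
    """Reports usable for norming and other statistics."""
    return [r for r in reports if acceptable_for_admission(r)]


if __name__ == "__main__":
    reports = [
        ScoreReport("A", "Test X", 31, 1),
        ScoreReport("A", "Test X", 36, 2),     # retest, clearly marked
        ScoreReport("B", "Test X", 28, None),  # attempt not stated
    ]
    for r in reports:
        print(r.candidate, r.raw_score, "retest" if r.is_retest else "", acceptable_for_admission(r))
    print(len(first_attempts(reports)), "report(s) usable for norming")
```

Treating a missing attempt number as unacceptable, rather than assuming a first attempt, mirrors the rule that unidentified retests make a scorer's reports untrustworthy.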

Secrecy of solutions and the motivation for secrecy

The correct answers to a test should be kept secret and not revealed to anyone other than the test creators and scorers. This is especially important because people who have not found a particular solution to a difficult problem themselves, but have been given it, lack the motivation for secrecy that comes naturally to those who have solved it. Whoever solves a hard problem and receives a score for it will normally not want to help others of lesser ability to score undeservedly at that level, or above their true level, as that would reduce the meaning of the first person's own score. But whoever receives the solution effortlessly without having found it will almost certainly abuse that information and pass it on to others, thus, in one's own interest, reducing the meaning of the scores of those who are above one in ability. It is important that the second part of the previous sentence is well understood.

Distrust reported scores

When norming a test on reported scores from other tests, one must be aware that, early in one's career as a test scorer, these reported scores are not representative but inflated, and would result in (far) too generous norms. Although few people actually LIE about their scores, almost all who have taken multiple tests report only their higher scores and leave out the lower ones, and those who have taken a particular test multiple times typically report only their highest attempt. For a true picture of the candidates' scores and the correlations of one's test with other tests, all of the scores must be known. My articles on the norming of high-range tests say more about this.
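Why selective reporting inflates norms even when nobody lies can be shown with a small simulation. In the Python sketch below, each hypothetical candidate takes several tests whose scores scatter around his true level, but reports only the highest one; the mean of the reported scores then exceeds the mean of all scores actually obtained. The score scale, the number of tests per candidate, and the spread are arbitrary assumptions, not data.

```python
import random
from statistics import mean


def simulate_reporting(n_candidates: int = 1000, tests_per_candidate: int = 5,
                       noise_sd: float = 7.5, seed: int = 1) -> tuple[float, float]:
    """Return (mean of all scores obtained, mean of the scores actually reported).

    All parameters and the assumed distribution of true levels (mean 130,
    SD 10) are hypothetical."""
    rng = random.Random(seed)
    all_scores, reported = [], []
    for _ in range(n_candidates):
        true_level = rng.gauss(130, 10)
        scores = [true_level + rng.gauss(0, noise_sd) for _ in range(tests_per_candidate)]
        all_scores.extend(scores)
        reported.append(max(scores))  # only the highest score is reported
    return mean(all_scores), mean(reported)


if __name__ == "__main__":
    obtained, reported = simulate_reporting()
    print(f"mean of all scores obtained: {obtained:.1f}")
    print(f"mean of reported scores:     {reported:.1f}")
```

With a single test per candidate the two means coincide; the gap grows with the number of tests taken and with the spread of a candidate's scores between tests, which is why all of the scores must be known.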

Unacceptable conditions on unsupervised tests

Fraud through cooperation cannot be entirely eliminated with unsupervised tests, but by far the worst problems occur when a test is self-timed and/or forbids reference aids. In that case, many or most candidates will use too much time and/or use reference aids anyway, thus greatly increasing their scores. These problems are solved by allowing reference aids and as much time as the candidate needs to find all the solutions he can.