Issues in the Norming of High-Range Tests

© 2003 Paul Cooijmans

The first issue occurring when one tries to norm a high-range test is simply that it is hard to get enough people to take a very difficult test. Only a small fraction of the population is willing to do that, and the beginning test designer tends to underestimate the difficulty of his test and the limitations of his testees. Experience shows that the harder a test is, the fewer people will take it. "Hardness" here means the percentage of the items on a test that people on average fail to solve. Therefore, relatively easy items must be included alongside very hard ones to get enough submissions for norming.

Next is the phenomenon of testees not doing their best, e.g. not using reference aids although this is allowed, or rushing too much, and thus scoring below their true level. This occurs especially when the test is free, and also when there is nothing to gain by scoring high (like society membership). Also, when testees take the test mainly to aid in norming, rather than out of a desire to score high, they tend to underperform. A norming based on such submissions will be (far) too generous. Therefore it seems advisable to charge a fee, and to offer some kind of "reward" to high scorers. And it is better to wait patiently for truly interested testees than to actively recruit a "norming sample". The best norming is one that is based on submissions from people who did their utmost. Such submissions typically drip in slowly over the years.

Then there is the problem that testees tend to withhold their lower prior scores when reporting prior scores for norming. In fact, when people have taken many tests before, it is quite rare to find them honestly reporting all those scores. This selective reporting of scores also has an inflationary effect on a possible norming. The best solution is to keep records of all testees' scores, and consult those whenever a testee occurs in a norming sample. Over the years, more and more testees will emerge whose scores are all known, so that this type of inflation is prevented. It also helps to select the prior tests used for norming by their correlation with the object test; tests that suffer the most from score withholding tend to have lower correlations and are thus kept from exercising their boosting effect on the norms.
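
The selection of prior tests by correlation can be sketched as follows. This is only an illustration under assumptions that are not in the text above: the function names, the minimum correlation of 0.5, the minimum of five score pairs, and the example data are all hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equally long lists; 0.0 if either has no variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def select_prior_tests(pairs_by_test, min_r=0.5, min_n=5):
    """Keep only prior tests that correlate well enough with the object test.

    pairs_by_test maps a prior test name to a list of
    (object_test_score, prior_test_score) pairs, one pair per testee.
    min_r and min_n are assumed thresholds, not values from the article.
    """
    selected = {}
    for name, pairs in pairs_by_test.items():
        if len(pairs) < min_n:
            continue  # too few pairs for a meaningful correlation
        object_scores = [p[0] for p in pairs]
        prior_scores = [p[1] for p in pairs]
        r = pearson(object_scores, prior_scores)
        if r >= min_r:
            selected[name] = r
    return selected

# Hypothetical data: raw scores on the object test paired with reported prior IQs.
example = {
    "Test A": [(10, 130), (14, 138), (18, 145), (22, 150), (25, 158)],
    "Test B": [(10, 150), (14, 128), (18, 155), (22, 131), (25, 140)],
    "Test C": [(12, 140), (20, 150)],
}
selected = select_prior_tests(example)
print(selected)
```

In this made-up example only "Test A" would be used for norming: "Test B" correlates poorly with the object test, and "Test C" has too few pairs to judge.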

Something to be aware of is the difference in approach between high-range tests and regular psychological tests. High-range tests basically have one set of norms for all who take them, regardless of nationality, sex or age. They are "absolute".

But regular tests, like the WAIS, have different sets of norms for different age groups, and are normed on the local (national) population. A WAIS IQ of 100 is not the same for a 20-year-old as for a 40-year-old, and not the same for an American as for a Frenchman or Korean. It's an entirely different concept; the norms are relative to a narrow population segment. On top of that, individual psychologists may deviate from the test manual in score reporting. Using prior scores from regular tests therefore is a bit like throwing dice to determine the norms. Again, selecting tests by their correlation helps, as regular tests often have lower correlations with a high-range test than do other high-range tests. When it comes to standardization of IQ at high levels, high-range tests are superior to regular tests.

One more thing one encounters with high-range tests is the discrepancy between male and female scores. In the past this has rarely been mentioned, as sex differences were more or less a taboo, but more recently societies have emerged - e.g. East Coast Mega, UltraHIQ, Grail and Mega HIQ Girls - that have separate norms per sex (some prefer the term "gender").

It is known that males and females on average have about the same level in "g", but males have greater variance, so at the low and high end we find more males than females. At or above the 98th percentile there are almost twice as many males as females, and at the 99.9th percentile about fifteen times as many. On an IQ scale with an SD of 15, the male SD is really about 16, and the female SD about 12. This is a fundamental biological difference that probably has its cause in the first months of pregnancy, when the brain is formed. The different testosterone levels in males and females regulate the formation of the brain in different ways (regardless of genetic aptitude). A male and female with otherwise identical DNA would still have different brains.
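
How a difference in variance translates into a male-to-female ratio in the high tail can be sketched as below. The sketch assumes things the text does not state: both sexes are exactly normally distributed with the same mean of 100, and the cutoff for a percentile is taken from a normal scale with SD 15. The function and parameter names are hypothetical. The resulting ratio is very sensitive to the SDs entered, so the output need not reproduce the ratios quoted above.

```python
from statistics import NormalDist

def male_female_ratio(percentile, mean=100.0, scale_sd=15.0,
                      male_sd=16.0, female_sd=12.0):
    """Ratio of the male to the female proportion above an IQ cutoff.

    The cutoff is the IQ at `percentile` on a normal scale with SD `scale_sd`
    (an assumption of this sketch). Both sexes are assumed normal with the
    same mean and with SDs `male_sd` and `female_sd`.
    """
    cutoff = NormalDist(mean, scale_sd).inv_cdf(percentile / 100.0)
    male_tail = 1.0 - NormalDist(mean, male_sd).cdf(cutoff)
    female_tail = 1.0 - NormalDist(mean, female_sd).cdf(cutoff)
    return male_tail / female_tail

for p in (98.0, 99.9):
    print(p, round(male_female_ratio(p), 1))
```

Under these assumptions the ratio grows with the cutoff, which is the qualitative pattern described above; with equal SDs it is exactly 1 at every level.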

This presents those who deal with high-range tests with a few dilemmas: should pass levels of societies be set within-sex or not? Should possible group percentiles directly based on high-range testees be normed within-sex or not? Should perhaps even IQs be normed within-sex? These things will be decided in the next few years.

Perhaps the most important question is: do high-range tests measure "g", the general factor in mental tests? Or does g break down into its factors at high levels, so that high-range tests measure various kinds of specificity rather than g? To answer this one needs to obtain a correlation matrix between a number of high-range tests of different content types, and perform some kind of factor analysis. This is a process that will take years, because it requires that each test in the matrix is taken by each individual from a group of testees. For the moment it might be useful to have a separate name for the possible general factor in high-range tests, to illustrate that we don't know if it is the same as g; I suggest "q".
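
What such an analysis might look like in its simplest form is sketched below. Note the substitution: instead of a full factor analysis, the sketch extracts only the first principal component of the correlation matrix by power iteration, as a rough stand-in for a general factor. The test names, the scores and all function names are hypothetical, and the data are complete (every testee took every test), as required above.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equally long lists; 0.0 if either has no variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(scores):
    """scores maps a test name to its list of scores, testees in the same order."""
    names = list(scores)
    matrix = [[pearson(scores[a], scores[b]) for b in names] for a in names]
    return names, matrix

def first_component(matrix, iterations=200):
    """Largest eigenvalue and loadings of the first principal component."""
    n = len(matrix)
    v = [1.0] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        eigenvalue = norm  # converges to the largest eigenvalue
    loadings = [math.sqrt(eigenvalue) * x for x in v]
    return eigenvalue, loadings

# Hypothetical raw scores of six testees on three tests of different content.
scores = {
    "verbal": [10, 12, 15, 18, 20, 25],
    "numerical": [8, 13, 14, 19, 18, 24],
    "spatial": [11, 10, 16, 17, 22, 23],
}
names, matrix = correlation_matrix(scores)
eigenvalue, loadings = first_component(matrix)
share = eigenvalue / len(names)  # proportion of variance on the first component
print(names, [round(x, 2) for x in loadings], round(share, 2))
```

A large share of variance on the first component, with all tests loading on it, would be consistent with a general factor; several components of similar size would point to specificity. With real data one would need many more testees than in this toy example.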

Finally there is the question of possible overpresence of high scores. Such overpresence has been found in childhood scores, but it is not yet known whether it also occurs in adult IQ scores. In other words, are high-range scores the upper end of a normal distribution centered at IQ 100, or is there a "bump" in the high end? And if so, what is its cause? A problem here is that when one forces IQs into a normal distribution, as is the case on regular psychological tests that give deviation IQs, there is no room for possible overpresence to show up. High-range tests, normed on known prior scores, do give room to overpresence. So we will study the distribution of IQs on high-range tests. Since these tests are typically taken by too few people to do this for individual tests (except for the Mega, Titan and LAIT), a solution may be to combine a number of them. To avoid arbitrariness, the upper ones from the correlation matrix as it is to date could be chosen. A problem is then that some testees will be present more than once in the sample; if this occurs all over the range equally it is not disastrous, but otherwise (e.g. if very high scorers tend to take more tests than lower scorers) it distorts the distribution. A possible way around this is to take only one score from each testee, to wit that from the test taken by him or her that is highest in the correlation matrix.
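
Taking one score per testee can be sketched as follows. The record layout, the function name and the ranking of tests are hypothetical; the ranking is assumed to be given as a number per test, with 1 for the test that is highest in the correlation matrix.

```python
def one_score_per_testee(records, test_rank):
    """Keep, for each testee, only the score on the highest-ranked test taken.

    records is a list of (testee_id, test_name, iq) tuples.
    test_rank maps a test name to its rank (1 = highest in the correlation
    matrix). Scores on tests without a rank are ignored.
    """
    best = {}
    for testee, test, iq in records:
        if test not in test_rank:
            continue
        current = best.get(testee)
        if current is None or test_rank[test] < test_rank[current[0]]:
            best[testee] = (test, iq)
    return best

# Hypothetical sample: testee "t1" appears twice, "t2" once.
records = [
    ("t1", "Test B", 160),
    ("t1", "Test A", 150),
    ("t2", "Test B", 140),
    ("t3", "Test Z", 155),  # unranked test, dropped
]
test_rank = {"Test A": 1, "Test B": 2}
sample = one_score_per_testee(records, test_rank)
print(sample)
```

Note that the choice is made by the rank of the test, not by the height of the score, so a testee's best score is not favoured; in the example "t1" contributes 150, not 160.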