© 2002, somewhat revised 2005 Paul Cooijmans
High-range tests aim to measure high levels of g, the general factor in mental testing, discovered by Spearman and researched by Jensen and others. However, research on standard tests, which measure IQ from about 70 to 140, has shown that at higher levels the correlation between IQ and g goes down. There may be a point of diminishing returns where saturation occurs, and beyond which only differences in specific factors or specific fields of crystallization are relevant. We don't know if g is measurable at all at the level high-range tests are aiming for. Until it has been proven whether or not they measure g, e.g. by factor analysis on a variety of high-range tests, this field of testing resides somewhere between science and the occult.
Once a high-range test has been created and submissions are coming in, two basic norming methods come into view:
Also a complex method is possible:
For method 1, at least about 30 submissions are needed. Method 2 requires about 200, and method 3 requires a total of 200 submissions over a number of tests that have already been normed with method 1 or 2.
This is self-evident, but a quick example to be certain: if a testee reports two scores, say 133 on CTMM and 140 on Mega, and has a raw score of 27, two score pairs are formed, to wit 27-CTMM133 and 27-Mega140. So the number of score pairs is not the same as the number of submissions/testees. In practice it may not be necessary to write these pairs out; a good working mode is to have a form or record for each submission, containing that testee's raw score and his/her prior scores.
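The record-per-submission working mode can be sketched in a few lines of Python. This is only an illustration: the names `Submission` and `score_pairs` are hypothetical and not part of the original method description.

```python
from dataclasses import dataclass, field

@dataclass
class Submission:
    """One record per testee: the raw score plus any reported prior scores."""
    raw_score: int
    prior_scores: dict = field(default_factory=dict)  # test name -> prior score

def score_pairs(submissions):
    """Write out one (raw score, prior test, prior score) pair per reported prior score."""
    return [
        (s.raw_score, test, score)
        for s in submissions
        for test, score in s.prior_scores.items()
    ]

# The example from the text: one testee, raw score 27, two prior scores.
testee = Submission(raw_score=27, prior_scores={"CTMM": 133, "Mega": 140})
print(score_pairs([testee]))  # [(27, 'CTMM', 133), (27, 'Mega', 140)]
```

One submission thus yields two score pairs, which is why the number of pairs and the number of testees differ.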
In selecting pairs to be used, be as objective as possible. Avoid arbitrary decisions, especially those based on subjective notions. E.g., "this is a bad test so I better not use it" is not the right approach. Neither is, "this testee probably did not do his best on this particular prior test, so I better not use it". Do not judge; let objective statistics prevail. If decisions are unavoidable, be consistent.
A few selection methods:
The scale most used in the high-IQ world is that with a total population mean of 100 and standard deviation of 16. The prior scores to be used have to be converted to this scale, if they are not already expressed on it. So you need to know what scale is employed by each particular test. This is a matter of collecting information from anywhere you can. For college admission tests, which do not give IQs, you may need to study available statistics and derive percentiles therefrom, and convert those to IQs. For tests that only report general population percentiles, convert those to IQs using the normal distribution.
For conversion of childhood ratio IQs (mental age divided by biological age), one may use "Sare's Prediction". Ratio IQs cannot be used as they stand, because the distribution of ratio IQs in the high range deviates sharply from normal.
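As a minimal sketch of the percentile-to-IQ conversion, the standard library's `statistics.NormalDist` can be used. The function names are hypothetical, and `rescale_iq` assumes the source test reports deviation IQs with a mean of 100 and a known standard deviation.

```python
from statistics import NormalDist

# Scale used in the text: population mean 100, standard deviation 16.
IQ_SCALE = NormalDist(mu=100, sigma=16)

def percentile_to_iq(percentile):
    """Convert a general-population percentile (strictly between 0 and 100) to an IQ."""
    if not 0 < percentile < 100:
        raise ValueError("percentile must lie strictly between 0 and 100")
    return IQ_SCALE.inv_cdf(percentile / 100)

def iq_to_percentile(iq):
    """Inverse conversion: IQ on the SD-16 scale to general-population percentile."""
    return IQ_SCALE.cdf(iq) * 100

def rescale_iq(iq, from_sd, to_sd=16):
    """Re-express a deviation IQ (mean 100) given on another SD, e.g. 15, on the SD-16 scale."""
    return 100 + (iq - 100) * to_sd / from_sd

print(round(percentile_to_iq(98), 1))   # IQ for the 98th percentile
print(rescale_iq(130, from_sd=15))      # 132.0
```

Because the IQs are tied to percentiles through the normal curve, rescaling a deviation IQ linearly and converting its percentile give the same result.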
A remark: note very well that the IQs used in the high-IQ world are really percentiles in disguise. They are by definition linked to percentiles by the normal distribution. Use of the actual mean and standard deviation of college admission tests will not result in proper IQs! Actual distributions of scores rarely have a true normal curve, especially not beyond about 1.5 or 2 SD above or below the mean.
To equate distributions of raw and prior scores, two basic methods exist, equating ranks and equating z-scores:
Equating ranks results in norms with a nonlinear relation between IQ and raw score. This is good, as it is unlikely that this relation would be exactly linear (unless the test is constructed purposely to be linear, which is unusual with high-range tests).
To equate ranks, two columns are formed. One contains the raw scores in numerical order from highest to lowest, the other contains the prior scores in numerical order from highest to lowest. Both of course contain the same number of scores. Instead of columns, histograms may be used, but for simplicity I speak of columns here.
To obtain the norm for a raw score, simply take the prior IQ of the corresponding rank, where "rank" means the position of the IQ or raw score in its column. When a raw score occurs more than once ("tied ranks"), determine the median of its ranks, and use for norming the prior IQ corresponding to that rank. If a raw score is missing from the column, norm it by interpolation. It is better not to extrapolate, and the norms resulting for the very highest and very lowest raw scores in the column may be off, especially when there was only one score pair for those.
Weighting of score pairs is possible with this method; simply use each score pair from a prior test that deserves a particular weight more than once, depending on the weight. If the weight is 3, use the pairs 3 times. Easiest to use for weights are whole numbers; fractions are possible too but awkward.
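The rank-equating procedure, including whole-number weighting, might be sketched as follows. All function names are hypothetical. One assumption goes beyond the text: when an even number of ties puts the median rank halfway between two positions, the two neighbouring prior IQs are averaged.

```python
from statistics import median

def weighted(pairs, weights):
    """pairs: (raw, test_name, prior_iq). Repeat each pair by its test's whole-number weight."""
    out = []
    for raw, test, iq in pairs:
        out.extend([(raw, iq)] * weights.get(test, 1))
    return out

def rank_equate(pairs):
    """pairs: (raw, prior_iq). Return {raw score: normed IQ} by equating ranks."""
    raws = sorted((raw for raw, _ in pairs), reverse=True)    # highest to lowest
    priors = sorted((iq for _, iq in pairs), reverse=True)    # highest to lowest
    norms = {}
    for raw in set(raws):
        ranks = [i for i, r in enumerate(raws) if r == raw]
        mid = median(ranks)                  # ends in .5 for an even number of ties
        lo, hi = int(mid), int(mid + 0.5)
        norms[raw] = (priors[lo] + priors[hi]) / 2   # assumption: average the two neighbours
    return norms

def norm_for(raw, norms):
    """Look up a raw score; interpolate linearly if missing, never extrapolate."""
    if raw in norms:
        return norms[raw]
    below = [r for r in norms if r < raw]
    above = [r for r in norms if r > raw]
    if not below or not above:
        return None                          # outside the observed raw scores
    r0, r1 = max(below), min(above)
    return norms[r0] + (raw - r0) * (norms[r1] - norms[r0]) / (r1 - r0)

# Invented score pairs, for illustration only.
pairs = [(27, 140), (27, 133), (20, 128), (24, 150), (18, 125)]
norms = rank_equate(pairs)
print(norms[27], norm_for(22, norms), norm_for(30, norms))  # 145.0 130.5 None
```

Note that the two columns are sorted independently, so a raw score is normed by the prior IQ at the same rank, not by the prior IQ the same testee happened to report.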
Equating z-scores always results in a linear norm table and therefore is only recommended when there is a reason to assume the object test scores are in linear relation to IQ. This is the case when the object test has internal item weighting, where items of different difficulty get different weights based on their statistical behavior (as in, e.g., Kevin Langdon's LSFIT and LIGHT). It is also the case when the object test has been constructed carefully to contain equal numbers of items of each level of difficulty (this is only possible by selecting them statistically from a larger pool of items for which certain statistics are already known).
As in rank equating, two columns are formed with the raw and prior score pairs. Weighting of score pairs is possible in the same way as described for rank equating. For each column, mean and standard deviation are computed.
To obtain norms, the mean raw score is set equal to the mean prior IQ, and for each raw score point above or below that mean, PSD/RSD is added to or subtracted from the mean IQ (PSD = Prior Standard Deviation, RSD = Raw Standard Deviation). For practicality, norms may be expressed in a formula like:
IQ = RawScore * PSD/RSD + (IQ for RawScore = 0)
It is recommended not to extrapolate outside the raw score distribution for a distance greater than half the square root of its range.
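A minimal sketch of z-score equating follows, with hypothetical function names. It uses the population standard deviation (`pstdev`). Since both columns contain the same number of scores, the ratio PSD/RSD comes out the same with the sample standard deviation. The code fails with a division by zero if all raw scores are identical.

```python
from math import sqrt
from statistics import mean, pstdev

def z_equate(pairs):
    """pairs: (raw, prior_iq). Return (slope, intercept) with IQ = raw * slope + intercept."""
    raws = [raw for raw, _ in pairs]
    priors = [iq for _, iq in pairs]
    slope = pstdev(priors) / pstdev(raws)            # PSD / RSD
    intercept = mean(priors) - slope * mean(raws)    # IQ for RawScore = 0
    return slope, intercept

def extrapolation_limits(raws):
    """Raw-score interval allowed by the rule of half the square root of the range."""
    margin = 0.5 * sqrt(max(raws) - min(raws))
    return min(raws) - margin, max(raws) + margin

# Invented (raw score, prior IQ) pairs, for illustration only.
pairs = [(10, 124), (20, 140), (30, 156)]
slope, intercept = z_equate(pairs)
print(f"IQ = RawScore * {slope:.2f} + {intercept:.1f}")   # IQ = RawScore * 1.60 + 108.0
print(extrapolation_limits([raw for raw, _ in pairs]))
```

With a raw score range of 20, the margin is about 2.2 raw score points on either side of the observed distribution.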
The simplest way of direct norming is to compute percentiles for each raw score based on the actual test submissions. To do that for a particular raw score, take the number of submissions below that score, add to it half the number of submissions with exactly that score, divide by the total number of submissions, multiply by 100 and round off to the nearest whole number (unless two or more consecutive raw scores would get the same percentile this way, in which case the value should be expressed to the nearest tenth of a percentile). About 200 submissions or more are needed to get stable percentiles.
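The percentile computation can be sketched as below, with hypothetical function names. One caveat is an assumption about rounding: Python's built-in `round` rounds exact halves to the nearest even number, which may differ from ordinary rounding off in such cases.

```python
from collections import Counter

def raw_score_percentiles(scores):
    """scores: all raw scores submitted. Return {raw score: unrounded percentile}."""
    n = len(scores)
    counts = Counter(scores)
    result = {}
    below = 0                                   # submissions below the current raw score
    for raw in sorted(counts):
        result[raw] = 100 * (below + counts[raw] / 2) / n
        below += counts[raw]
    return result

def rounded_percentiles(exact):
    """Whole numbers, except where raw scores would share a percentile: then one decimal."""
    whole = {raw: round(p) for raw, p in exact.items()}
    clash = Counter(whole.values())
    return {
        raw: whole[raw] if clash[whole[raw]] == 1 else round(p, 1)
        for raw, p in exact.items()
    }

# Invented submissions, far fewer than the roughly 200 needed in practice.
print(raw_score_percentiles([5, 7, 7, 9]))   # {5: 12.5, 7: 50.0, 9: 87.5}
```

Because percentiles rise with raw score, any raw scores that share a whole-number percentile are necessarily consecutive, so checking for duplicates suffices.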
As percentiles do not have a linear character they are intuitively unsatisfactory. To arrive at a linear scale, two methods are conceivable:
Following the assumption that if a distribution is normal, its standard scores can be taken as a linear scale, converting the percentiles to standard scores via the normal distribution should result in a good intuitive scale of high-range intelligence. The mean and SD of the scale can be defined in any desired way, but I recommend a mean of 50 and SD of 10 (t-scores).
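Converting high-range percentiles to standard scores via the normal distribution could look like this. The function name is hypothetical, and the defaults reflect the recommended mean of 50 and SD of 10.

```python
from statistics import NormalDist

def percentile_to_standard_score(percentile, mean=50.0, sd=10.0):
    """Map a percentile (strictly between 0 and 100) to a standard score via the normal curve."""
    if not 0 < percentile < 100:
        raise ValueError("percentile must lie strictly between 0 and 100")
    z = NormalDist().inv_cdf(percentile / 100)   # z-score under the standard normal curve
    return mean + sd * z

print(round(percentile_to_standard_score(50)))   # 50
print(round(percentile_to_standard_score(90)))   # 63
```

Percentiles of exactly 0 or 100 have no finite standard score, which is why they are rejected here.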
To base a standard scale directly on the actual raw scores is only recommended if the test has been constructed purposely to yield linear scores as meant in step 1.4.2. Here too, mean and SD can be defined in any desired way.
To enable scores on tests normed with method 1 to be expressed as high-range percentiles and t-scores as meant in Method 2, the submissions of several tests already normed with method 1 or 2 can be combined to arrive at a sample size sufficient to make a direct norming as in method 2. The IQ norms from the original normings are treated as "raw score" in the new method 3 norming.
If the scores from the already normed tests are simply combined, a problem is that some testees will occur more than once in the sample, which distorts the distribution. This may or may not be seen as a serious problem. A possible solution is to create a correlation matrix of the tests insofar as possible, and for each testee use only the score from the test that is highest in the matrix (that is, the test that has the highest average of correlations with the other tests).
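The selection of one score per testee might be sketched as follows. The data layout and function names are hypothetical, and the correlations shown are invented. Tests missing from the matrix are treated as ranking lowest, which is an assumption.

```python
def average_correlations(corr):
    """corr: {test: {other test: r}}. Return {test: mean correlation with the other tests}."""
    return {test: sum(row.values()) / len(row) for test, row in corr.items() if row}

def one_score_per_testee(scores, corr):
    """scores: {testee: {test: IQ}}. Keep only the IQ from the test highest in the matrix."""
    avg = average_correlations(corr)
    chosen = {}
    for testee, by_test in scores.items():
        # Assumption: a test absent from the matrix ranks below all others.
        best = max(by_test, key=lambda test: avg.get(test, float("-inf")))
        chosen[testee] = by_test[best]
    return chosen

# Invented correlation matrix and scores, for illustration only.
corr = {
    "A": {"B": 0.8, "C": 0.6},
    "B": {"A": 0.8, "C": 0.7},
    "C": {"A": 0.6, "B": 0.7},
}
scores = {
    "t1": {"A": 140, "C": 150},
    "t2": {"B": 135, "A": 130, "C": 145},
    "t3": {"C": 128},
}
print(one_score_per_testee(scores, corr))   # {'t1': 140, 't2': 135, 't3': 128}
```

Each testee then contributes exactly one "raw score" to the method 3 sample.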