Built to give you an honest score

Most free IQ tests are designed to flatter — short question sets, generous scoring, no confidence intervals. High scores get shared. AurorIQ is built differently. This page explains exactly how.

  • Item Response Theory (IRT) scoring — same framework as clinical assessments
  • Adaptive question selection — difficulty adjusts in real time to your responses
  • Normed on a representative sample — mean 100, SD 15, clinical standard scale
  • Confidence intervals reported with every score — not a single inflated number
  • Explicit limitations — we state clearly what this test cannot measure
01

Methodology

AurorIQ uses Item Response Theory — the same mathematical framework as the WAIS-IV and Stanford-Binet 5 — to estimate cognitive ability from a short sequence of adaptively selected questions.

Figure: Item Characteristic Curve (ICC), how IRT works

  • 3PL model: discrimination, difficulty and guessing parameters per item
  • 25 questions: adaptively selected at your ability level
  • ±8 pts: typical 95% CI width, reported with every score

Start at the population midpoint

The first question is calibrated to IQ 100. Your response gives the algorithm an initial signal about your ability level (θ).

Update ability estimate after every response

Using maximum likelihood estimation, the algorithm re-estimates θ and its uncertainty after each answer. It selects the next item with the highest Fisher information at that θ — the question that most reduces uncertainty.
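
The update-and-select loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not AurorIQ's production code: the item parameters in BANK are hypothetical, the 3PL model is written without a scaling constant, and the likelihood is maximised by a simple bounded grid search rather than whatever optimiser the real system uses.

```python
import math

def p_correct(theta, a, b, c):
    """3PL probability of a correct response at ability theta."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b, c):
    """Fisher information of a single 3PL item at ability theta."""
    p = p_correct(theta, a, b, c)
    return a ** 2 * ((p - c) / (1.0 - c)) ** 2 * (1.0 - p) / p

def estimate_theta(items, responses, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum likelihood estimate of theta, bounded to [lo, hi]."""
    best_theta, best_ll = lo, -math.inf
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        ll = 0.0
        for (a, b, c), correct in zip(items, responses):
            p = p_correct(theta, a, b, c)
            ll += math.log(p) if correct else math.log(1.0 - p)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta

def standard_error(theta, items):
    """Standard error of theta: 1 / sqrt(total information of answered items)."""
    return 1.0 / math.sqrt(sum(item_information(theta, *it) for it in items))

def select_next_item(theta, bank, used):
    """Index of the unused item with the highest information at theta."""
    candidates = [i for i in range(len(bank)) if i not in used]
    return max(candidates, key=lambda i: item_information(theta, *bank[i]))

# Hypothetical (a, b, c) parameters, for illustration only.
BANK = [(1.2, -1.0, 0.2), (1.5, 0.0, 0.2), (1.0, 1.5, 0.2)]

if __name__ == "__main__":
    first = select_next_item(0.0, BANK, used=set())
    theta = estimate_theta([BANK[first]], [True])
    print(first, theta, standard_error(theta, [BANK[first]]))
```

Note that a pure maximum likelihood estimate is unbounded while every answer so far is correct (or every answer is wrong), which is why this sketch clamps θ to a fixed range; a real adaptive test needs some such safeguard for the first few items.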

Converge on a stable estimate

After 25 questions the algorithm has accumulated enough information to produce a stable θ. This is converted to the IQ scale (IQ = 100 + 15θ) and reported with a 95% confidence interval.
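
As a minimal sketch of that conversion, the function below maps θ and its standard error to an IQ and a 95% interval. The function name and the standard error of 0.27 in the example are assumptions chosen for illustration; 1.96 is the usual normal multiplier for a 95% interval.

```python
def theta_to_iq(theta, se_theta, z=1.96):
    """Convert theta and its standard error to (IQ, lower bound, upper bound)."""
    iq = 100.0 + 15.0 * theta
    half_width = z * 15.0 * se_theta  # the interval is scaled by the same factor of 15
    return round(iq), round(iq - half_width), round(iq + half_width)

if __name__ == "__main__":
    # Hypothetical final estimate: theta = 1.0 with a standard error of 0.27.
    print(theta_to_iq(1.0, 0.27))  # (115, 107, 123)
```

With these illustrative numbers the interval is roughly 115 ± 8 points.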

Domain breakdown from within-category responses

Questions are drawn from 5 cognitive domains. Per-domain performance generates the breakdown shown in your results — though domain-level estimates carry wider confidence intervals than the full-scale IQ.
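
A rough way to see why domain-level intervals are wider, assuming (hypothetically) that the 25 items split evenly across the 5 domains and carry similar information on average:

```latex
\mathrm{SE}(\hat\theta) \approx \frac{1}{\sqrt{I(\hat\theta)}},
\qquad
I(\hat\theta) = \sum_{i=1}^{n} I_i(\hat\theta)
\quad\Longrightarrow\quad
\frac{\mathrm{SE}_{\text{domain}}}{\mathrm{SE}_{\text{full}}}
\approx \sqrt{\frac{25}{5}} \approx 2.24
```

Under those assumptions, a full-scale interval of ±8 points would correspond to a domain-level interval of roughly ±18 points. The actual ratio depends on which items were administered in each domain.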

The key advantage of IRT over classical test theory (CTT): an IRT ability estimate is not tied to the specific questions asked. A different 25-item adaptive test from the same item bank would produce a comparable score. CTT scores depend on the items administered, which is why easy-question tests produce inflated scores.
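
The dependence of raw scores on item difficulty can be illustrated with a short sketch. The two item sets below are hypothetical, and the 3PL form is written without a scaling constant; the point is only that the same ability produces very different raw percentages on easy and hard items.

```python
import math

def p_correct(theta, a, b, c):
    """3PL probability of a correct answer (a: discrimination, b: difficulty, c: guessing)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def expected_percent_correct(theta, items):
    """Expected raw percent-correct for a person of ability theta on a fixed item set."""
    return 100.0 * sum(p_correct(theta, a, b, c) for a, b, c in items) / len(items)

# Hypothetical item parameters, chosen only for illustration.
EASY_TEST = [(1.2, -1.5, 0.2)] * 5
HARD_TEST = [(1.2, 1.5, 0.2)] * 5

if __name__ == "__main__":
    theta = 0.0  # an average test-taker (IQ 100)
    print(expected_percent_correct(theta, EASY_TEST))
    print(expected_percent_correct(theta, HARD_TEST))
```

Here the same average test-taker is expected to answer about 89% of the easy items and about 31% of the hard items correctly. A raw-score test built from the easy set would look flattering; an IRT estimate accounts for item difficulty and targets the same θ in both cases.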

AurorIQ's item bank was calibrated on a representative adult sample. Item parameters (difficulty b, discrimination a, guessing c) were estimated via marginal maximum likelihood. Items with poor fit or low discrimination were excluded from the operational bank.

02

Cognitive Domains

Five domains drawn from the Cattell–Horn–Carroll (CHC) model of cognitive abilities. Together they provide broad coverage of the abilities that load on general intelligence (g).

03

Validity

How closely does AurorIQ measure what it claims to measure? We assess validity by comparing scores against established criterion measures.

Correlation with criterion measures (r)

  • AurorIQ vs WAIS-IV Full Scale: r = 0.82
  • AurorIQ vs Raven's Matrices: r = 0.76
  • AurorIQ vs academic achievement: r = 0.58
  • Typical free online test vs WAIS: r ≈ 0.38

Key figures

  • r = 0.82: correlation with WAIS-IV. Criterion validity against the gold-standard clinical instrument for adults aged 16–90.
  • 0.89: test-retest reliability. Correlation across two sessions 4 weeks apart with no intervening test exposure.
  • ±8 pts: typical 95% CI width. Narrower near the mean; wider at the distribution extremes, where fewer calibrated items exist.

These figures come from our internal validation study. Because participants are self-selected, estimates should be treated as approximate. The WAIS-IV criterion validity (r = 0.82) reflects only participants who took both assessments, not a fully representative adult sample.

For comparison: the WAIS-IV has published test-retest reliability of 0.94–0.96. AurorIQ's 0.89 is lower — reflecting the reduced precision of a 25-item unproctored online test versus a 90-minute clinically administered battery. We consider this an honest trade-off for accessibility.
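
To put the reliability gap in score points, the classical standard error of measurement (SEM = SD × √(1 − reliability)) offers a rough translation. This is a sketch using the classical test theory formula with test-retest reliability as the reliability coefficient; it is an average over the whole score range and is not how the IRT-based interval reported with each score is computed, so the two need not agree.

```python
import math

def sem_from_reliability(reliability, sd=15.0):
    """Classical standard error of measurement on the IQ scale."""
    return sd * math.sqrt(1.0 - reliability)

def ci_half_width(reliability, sd=15.0, z=1.96):
    """Half-width of an approximate 95% interval implied by the SEM."""
    return z * sem_from_reliability(reliability, sd)

if __name__ == "__main__":
    for label, r in [("reliability 0.89", 0.89), ("reliability 0.96", 0.96)]:
        print(label, round(sem_from_reliability(r), 2), round(ci_half_width(r), 2))
```

On this approximation, a reliability of 0.89 implies an SEM of about 5.0 points (roughly ±9.8 at 95%), while 0.96 implies an SEM of 3.0 points (roughly ±5.9).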

04

Limitations

A platform willing to state what it cannot do is more trustworthy than one that claims perfection. This section is deliberate.

Transparency notice
What this test cannot tell you

IQ tests — including AurorIQ — measure a specific and limited set of cognitive abilities under specific conditions. These are genuine limitations you should understand before interpreting your result.

  • Not a clinical assessment

    AurorIQ scores cannot be used for Mensa applications, educational placement, disability assessments, employment screening, or any clinical or legal purpose. Only a proctored assessment by a licensed psychologist using a validated instrument (WAIS-IV, Stanford-Binet 5) qualifies for those purposes.

  • Condition sensitivity

    Your score is sensitive to testing conditions. Fatigue, distraction, anxiety, time of day, and recent illness all affect performance. A single result on a single day is not definitive. If you took the test in poor conditions, retake it — your results are not stored on our servers.

  • Language and cultural assumptions

    The verbal domain has material cultural loading. The test was developed in English and normed on English-speaking adults. Non-native English speakers may receive scores that underestimate their true fluid intelligence. We partially mitigate this by down-weighting verbal items relative to pattern and spatial items.

  • What IQ doesn't measure at all

    Creativity, emotional intelligence, practical wisdom, character, motivation, domain expertise, and most of what determines whether a person lives a good life are not measured by IQ tests. A high score is an advantage in specific contexts — it is not a measure of your worth or your ceiling.

  • Extreme score reliability

    Scores below 80 or above 130 have wider confidence intervals than scores near the mean. The item bank has fewer highly discriminating items at the extremes. Treat extreme scores as directional indicators, not precise measurements.

What we do instead: Every score is reported with a 95% confidence interval and labelled explicitly as an estimate. We do not inflate scores, gate results behind a paywall, or encourage over-identification with the number.
05

Scoring

How your raw responses are converted into an IQ score on the standard mean-100, SD-15 scale.

Population distribution: AurorIQ norms (mean 100, SD 15)

IQ Range   Classification   Percentile     Population
140+       Genius           99.6th+        0.4%
130–139    Very Superior    97.8–99.6th    1.8%
120–129    Superior         91–97.8th      6.7%
110–119    High Average     75–91st        16.1%
90–109     Average          25–75th        50.0%
80–89      Low Average      9–25th         16.1%
70–79      Borderline       2.2–9th        6.7%
<70        Extremely Low    <2.2nd         2.2%

The IRT ability estimate (θ) is a z-score on the latent ability scale, converted to IQ via IQ = 100 + (15 × θ). The confidence interval around θ is derived from the test information function and transformed identically.
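
Because the IQ scale is a rescaled z-score, a percentile follows from the standard normal CDF. The sketch below assumes a normal population distribution with mean 100 and SD 15; the function name is illustrative.

```python
import math

def iq_to_percentile(iq, mean=100.0, sd=15.0):
    """Percentile rank of an IQ score under a normal distribution."""
    z = (iq - mean) / sd
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

if __name__ == "__main__":
    for iq in (85, 100, 115, 130):
        print(iq, round(iq_to_percentile(iq), 1))
```

For example, an IQ of 115 (θ = 1) sits at about the 84th percentile and an IQ of 130 (θ = 2) at about the 98th.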

Norms are based on an adult sample aged 18–65 from English-speaking countries, with approximately equal representation across age deciles. Scores are interpreted against the full adult population, not age-specific subgroups — unlike the WAIS-IV, which uses age-stratified norms.

06

AurorIQ vs Other Tests

How does AurorIQ compare to typical free online tests and to a clinical WAIS-IV assessment?

Feature AurorIQ Typical free test Clinical (WAIS-IV)
IRT scoring
Adaptive questions
Confidence interval reported
Representative norms ~
Inflation-free scoring
Cognitive domains 5 domains 1–2 domains 5+ domains
Validity vs WAIS-IV r = 0.82 r ≈ 0.38 r = 1.0
Test-retest reliability 0.89 ~0.60–0.70 0.94–0.96
Valid for Mensa / clinical use
Free, no account required ~
Typical cost Free Free–£20 £200–£600
07

Technical FAQ

Questions from sceptical users about our methodology, scoring, and claims.

Ready?

Take the most honest free IQ test online

25 adaptive questions. IRT scoring. A confidence interval with your result. No email. No paywall. No inflated score.

Free  •  No account  •  ~12 minutes