Built to give you an honest score
Most free IQ tests are designed to flatter — short question sets, generous scoring, no confidence intervals. High scores get shared. AurorIQ is built differently. This page explains exactly how.
- Item Response Theory (IRT) scoring — same framework as clinical assessments
- Adaptive question selection — difficulty adjusts in real time to your responses
- Normed on a weighted adult sample — mean 100, SD 15, clinical standard scale
- Confidence intervals reported with every score — not a single inflated number
- Explicit limitations — we state clearly what this test cannot measure
Methodology
AurorIQ uses Item Response Theory — the same mathematical framework as the WAIS-IV and Stanford-Binet 5 — to estimate cognitive ability from a short sequence of adaptively selected questions.
The first question is calibrated to IQ 100. Your response gives the algorithm an initial signal about your ability level (θ).
Using maximum likelihood estimation, the algorithm re-estimates θ and its uncertainty after each answer. It selects the next item with the highest Fisher information at that θ — the question that most reduces uncertainty.
After 25 questions the algorithm has accumulated enough information to produce a stable θ. This is converted to the IQ scale (IQ = 100 + 15θ) and reported with a 95% confidence interval.
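The loop described above can be sketched in a few lines. This is an illustrative simulation under a two-parameter logistic (2PL) model with a hypothetical item bank — AurorIQ's operational item parameters are not public — but the structure follows the steps described: maximum-likelihood estimation of θ after each answer, selection of the unused item with the highest Fisher information at the current θ, and conversion of the final estimate to the IQ scale with a 95% confidence interval.

```python
import math
import random

def prob_correct(theta, a, b):
    """2PL item response function: P(correct | theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = prob_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def estimate_theta(responses):
    """Maximum-likelihood theta over a coarse grid.
    responses: list of (a, b, correct) tuples."""
    grid = [g / 10.0 for g in range(-40, 41)]  # theta in [-4, 4]
    def log_lik(theta):
        total = 0.0
        for a, b, correct in responses:
            p = prob_correct(theta, a, b)
            total += math.log(p if correct else 1.0 - p)
        return total
    return max(grid, key=log_lik)

def run_adaptive_test(bank, true_theta, n_items=25, seed=0):
    """Simulate one adaptive session; returns (iq, (ci_low, ci_high))."""
    rng = random.Random(seed)
    responses, theta_hat = [], 0.0  # first item targeted at IQ 100 (theta = 0)
    remaining = list(bank)
    for _ in range(n_items):
        # next item = unused item with max Fisher information at current theta
        a, b = max(remaining, key=lambda it: item_information(theta_hat, *it))
        remaining.remove((a, b))
        correct = rng.random() < prob_correct(true_theta, a, b)
        responses.append((a, b, correct))
        theta_hat = estimate_theta(responses)
    # standard error from total test information; CI mapped to the IQ scale
    info = sum(item_information(theta_hat, a, b) for a, b, _ in responses)
    se = 1.0 / math.sqrt(info)
    iq = 100 + 15 * theta_hat
    return iq, (iq - 1.96 * 15 * se, iq + 1.96 * 15 * se)
```

For example, `run_adaptive_test([(1.2, -3 + 0.1 * k) for k in range(61)], true_theta=1.0)` simulates a test-taker one SD above the mean against a 61-item bank with difficulties spread over [−3, 3].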
Questions are drawn from five cognitive domains. Per-domain performance generates the breakdown shown in your results — though domain-level estimates carry wider confidence intervals than the full-scale IQ.
The key advantage of IRT over classical test theory (CTT): an IRT ability estimate is not tied to the specific questions asked. A different 25-item adaptive test from the same item bank would produce a comparable score. CTT scores depend on the items administered, which is why easy-question tests produce inflated scores.
AurorIQ's item bank was calibrated on a representative adult sample. Item parameters (difficulty b, discrimination a, guessing c) were estimated via marginal maximum likelihood. Items with poor fit or low discrimination were excluded from the operational bank.
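The parameterisation named here (difficulty b, discrimination a, guessing c) is the standard three-parameter logistic (3PL) model. As a sketch — this is the textbook form, not AurorIQ's calibration code:

```python
import math

def prob_correct_3pl(theta, a, b, c):
    """3PL item response function.
    a = discrimination (slope), b = difficulty (location),
    c = guessing floor (lower asymptote)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
```

A four-option multiple-choice item might have c ≈ 0.25: even a test-taker far below the item's difficulty answers correctly about a quarter of the time, which is why ignoring guessing inflates ability estimates.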
Cognitive Domains
Five domains drawn from the Cattell–Horn–Carroll (CHC) model of cognitive abilities. Together they provide broad coverage of the abilities that load on general intelligence (g).
Pattern recognition items assess fluid intelligence most directly. You are shown a visual or abstract matrix and must identify the rule governing the sequence to choose the missing element — closely resembling Raven's Progressive Matrices, the most widely used culture-reduced measure of fluid IQ.
These items have minimal verbal load and relatively high cross-cultural fairness compared with verbal or numeric items.
Numeric reasoning items assess quantitative reasoning (Gq) and fluid reasoning applied to numerical material. Items are designed to minimise the role of memorised procedures — a person with primary-school maths can solve most items by reasoning rather than formula.
Difficulty ranges from simple number sequences to multi-step word problems requiring quantitative modelling.
Verbal items assess both fluid reasoning applied to linguistic material and crystallised intelligence (Gc) — vocabulary and semantic knowledge. They carry the highest weight because verbal ability is the strongest single predictor of g in English-language populations.
Non-native English speakers may receive slightly lower verbal scores; this effect is partially mitigated by down-weighting the verbal domain relative to the pattern and spatial domains.
Spatial items assess visual-spatial processing (Gv) — the ability to mentally manipulate 2D and 3D shapes, identify perspectives, and reason about geometric relationships. Entirely non-verbal; low cultural loading.
Spatial ability is a significant independent predictor of STEM performance and shows distinct heritability from verbal ability, suggesting a partially separable cognitive system.
Working memory items assess holding and manipulating information simultaneously — the central executive of Baddeley's model. WM capacity correlates with g at r≈0.6–0.7, among the highest single-construct correlates of general intelligence.
These items receive the lowest weight because WM is most sensitive to testing conditions — fatigue, distraction, and anxiety affect WM disproportionately compared with other domains.
Validity
How closely does AurorIQ measure what it claims to measure? We assess validity by comparing scores against established criterion measures.
These figures come from our internal validation study. Because participants are self-selected, estimates should be treated as approximate. The WAIS-IV criterion validity (r=0.82) reflects participants who took both assessments — not a fully representative adult sample.
For comparison: the WAIS-IV has published test-retest reliability of 0.94–0.96. AurorIQ's 0.89 is lower — reflecting the reduced precision of a 25-item unproctored online test versus a 90-minute clinically administered battery. We consider this an honest trade-off for accessibility.
Limitations
A platform willing to state what it cannot do is more trustworthy than one that claims perfection. This section is deliberate.
IQ tests — including AurorIQ — measure a specific and limited set of cognitive abilities under specific conditions. These are genuine limitations you should understand before interpreting your result.
- Not a clinical assessment — AurorIQ scores cannot be used for Mensa applications, educational placement, disability assessments, employment screening, or any clinical or legal purpose. Only a proctored assessment by a licensed psychologist using a validated instrument (WAIS-IV, Stanford-Binet 5) qualifies for those purposes.
- Condition sensitivity — your score is sensitive to testing conditions. Fatigue, distraction, anxiety, time of day, and recent illness all affect performance. A single result on a single day is not definitive. If you took the test in poor conditions, retake it — your results are not stored on our servers.
- Language and cultural assumptions — the verbal domain has material cultural loading. The test was developed in English and normed on English-speaking adults. Non-native English speakers may receive scores that underestimate their true fluid intelligence. We partially mitigate this by down-weighting verbal items relative to pattern and spatial items.
- What IQ doesn't measure at all — creativity, emotional intelligence, practical wisdom, character, motivation, domain expertise, and most of what determines whether a person lives a good life are not measured by IQ tests. A high score is an advantage in specific contexts — it is not a measure of your worth or your ceiling.
- Extreme score reliability — scores below 80 or above 130 have wider confidence intervals than scores near the mean. The item bank has fewer highly discriminating items at the extremes. Treat extreme scores as directional indicators, not precise measurements.
Scoring
How your raw responses are converted into an IQ score on the standard mean-100, SD-15 scale.
The IRT ability estimate (θ) is a z-score on the latent ability scale, converted to IQ via IQ = 100 + (15 × θ). The confidence interval around θ is derived from the test information function and transformed identically.
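The conversion can be written compactly. Here `test_information` stands in for the information accumulated over the session (the exact value depends on the items administered); the linear map to the IQ scale multiplies the standard error by 15 as well.

```python
import math

def iq_from_theta(theta, test_information, z=1.96):
    """Map a latent ability estimate to the IQ scale with a 95% CI.
    SE(theta) = 1 / sqrt(I(theta)); IQ = 100 + 15 * theta scales
    the standard error by the same factor of 15."""
    se_theta = 1.0 / math.sqrt(test_information)
    iq = 100 + 15 * theta
    half_width = z * 15 * se_theta
    return iq, (iq - half_width, iq + half_width)
```

For instance, θ = 1.0 with total information 10 gives IQ 115 with a CI of roughly 106–124.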
Norms are based on an adult sample aged 18–65 from English-speaking countries, with approximately equal representation across age deciles. Scores are interpreted against the full adult population, not age-specific subgroups — unlike the WAIS-IV, which uses age-stratified norms.
AurorIQ vs Other Tests
How does AurorIQ compare to typical free online tests and to a clinical WAIS-IV assessment?
| Feature | AurorIQ | Typical free test | Clinical (WAIS-IV) |
|---|---|---|---|
| IRT scoring | ✓ | ✗ | ✓ |
| Adaptive questions | ✓ | ✗ | ✓ |
| Confidence interval reported | ✓ | ✗ | ✓ |
| Representative norms | ~ | ✗ | ✓ |
| Inflation-free scoring | ✓ | ✗ | ✓ |
| Cognitive domains | 5 domains | 1–2 domains | 5+ domains |
| Validity vs WAIS-IV | r = 0.82 | r ≈ 0.38 | — (criterion) |
| Test-retest reliability | 0.89 | ~0.60–0.70 | 0.94–0.96 |
| Valid for Mensa / clinical use | ✗ | ✗ | ✓ |
| Free, no account required | ✓ | ~ | ✗ |
| Typical cost | Free | Free–£20 | £200–£600 |
Technical FAQ
Questions from sceptical users about our methodology, scoring, and claims.
In adaptive testing, question count is less important than question selection quality. A well-calibrated adaptive test of 25 items focused near your ability level provides more measurement information than a 50-item non-adaptive test where half the questions are too easy or too hard to discriminate effectively.
The Fisher information accumulated by 25 adaptively selected items in AurorIQ typically exceeds the information from 35–40 items of a fixed-difficulty test — which is why most major computerised adaptive tests use 20–35 items rather than 50+.
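An illustrative comparison under assumed 2PL parameters (discrimination a = 1.5 and the difficulty spacings in the comments — chosen for illustration, not AurorIQ's actual bank): items targeted near the test-taker's ability each contribute far more information than items scattered across the whole scale.

```python
import math

def item_info(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

theta, a = 0.0, 1.5  # test-taker ability and an assumed discrimination

# 25 adaptive items: difficulties clustered within ~1.2 of the ability estimate
adaptive_info = sum(item_info(theta, a, theta + 0.1 * k) for k in range(-12, 13))

# 40 fixed items: difficulties spread uniformly over [-3, 3]
fixed_info = sum(item_info(theta, a, -3.0 + 6.0 * k / 39) for k in range(40))
```

Under these assumptions the 25 targeted items accumulate more total information at θ than the 40 uniformly spread items — the measurement-efficiency argument made above.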
The honest answer: we don't know with certainty. Our norming sample is self-selected — voluntary participants who chose to take the test — and self-selected samples are rarely fully representative of the adult population.
We mitigate this by: using IRT rather than classical test theory (making ability estimates less sample-dependent); applying post-stratification weights based on age and education; and anchoring norms to a subset of participants who also completed Raven's Matrices as a reference. We report this limitation in every score interpretation.
All IQ scores have measurement error — typically ±5–10 points even for clinical instruments. Two scores within that range are not contradictory; they are consistent with the same underlying ability.
If the difference is larger (15+ points), the most likely explanations are: the other test gave an inflated score (most free tests do); testing conditions differed significantly; or practice effects from multiple test exposures. To compare scores meaningfully, both tests would need documented representative norms on the same mean-100, SD-15 scale.
Mensa requires proctored testing under controlled conditions, administered by a licensed professional or at a supervised Mensa testing event. This prevents cheating and ensures the score reflects unassisted genuine performance.
No self-administered online test qualifies — including well-designed ones like AurorIQ. This is not a criticism of online test accuracy; it is a quality-control requirement that testing conditions be verifiable. If you believe you qualify, take the supervised Mensa Admission Test.
Taking more time does not increase your score — only correct answers do. The IRT model scores based on the pattern of right and wrong answers weighted by item parameters, not response time.
You can look up answers, but doing so defeats the purpose entirely. An inflated score that doesn't reflect genuine ability is useless information — and results are stored only in your browser, not our servers, so there's no credential to show anyone. The only person you'd be deceiving is yourself.
Take the most honest free IQ test online
25 adaptive questions. IRT scoring. A confidence interval with your result. No email. No paywall. No inflated score.