About This Project

Why IP geolocation needs an independent benchmark

IP geolocation underpins a remarkable amount of the modern internet — content localization, fraud detection, ad targeting, regulatory compliance. The market spans thousands of companies and billions of daily API calls. Yet the consumers of this data have no reliable, independent way to compare providers.

Different personas, shared problem

Those who depend on IP geolocation data don't have the information they need to make a confident decision:

The default user

Grabs a free tier and hopes for the best

Picks the first free API or database, embeds it, and never validates. The assumption that it is "good enough", or that "every provider is the same", goes entirely untested.

The skeptic

Distrusts every provider equally

After encountering geolocations that are simply wrong, some teams distrust the whole category. They build workarounds or avoid location-dependent features, with no data to quantify the actual risk.

The enterprise buyer

Pays thousands with no way to audit

Compliance-driven organizations spend $10K–$100K+/year on commercial geolocation. Vendor selection is based on sales engagement and self-reported whitepapers.

All three share the same problem: no neutral source of truth. Accuracy claims come from the providers themselves, tested against their own datasets, under conditions they control, with methodology they rarely disclose.

What the research says

The limitations of geolocation databases are not new. The academic literature has documented them for over a decade:

First ground-truth study using ISP data from a large European network. Conclusion: geolocation databases can claim country-level accuracy, but certainly not city-level. Finer-grained entries actually make accuracy worse.

Evaluated databases using ~100K IPs grouped into Points of Presence. Cross-database consistency is poor at city level. MaxMind reported losing accuracy at roughly 1.5% per month due to IP block reassignment.

Studied 1.64M router IPs from CAIDA's Ark dataset. Found 95.8% country-level consistency across databases, but only 71% at city level. Accuracy varies significantly by region, with ARIN (North America) performing particularly poorly at city level.

Proposed a delay-based evaluation method. Confirmed that database reliability is not uniform across regions, with significant inconsistencies remaining among leading commercial providers.

Studied RFC 8805 geofeeds — a mechanism for network operators to self-publish their IP geolocation. Even this self-reported data contains significant inaccuracies.

Each study made important contributions. But they share structural limitations: they test a fixed set of IPs at a single point in time, they rely on ground truth that is either private (ISP data) or synthetic (WHOIS, DNS hostnames, known landmarks), and they mostly focus on infrastructure IPs rather than end-user consumer traffic.

A recent paper from the University of Chicago explored using device location data from consumer speed tests as ground truth — the first study to validate IP geolocation against actual user-reported locations at scale. They found that accuracy varies significantly by geography, carrier, and access mode — nuances only visible with user-location-grade ground truth.

How IP Accuracy Arena works

The Arena builds on the user-location-as-ground-truth approach, adapted for continuous crowdsourced collection:

  1. A contributor visits the Arena and grants location permission via the browser Geolocation API. On mobile devices this typically uses GPS (5–15m accuracy). On desktops, Wi-Fi positioning or other signals provide an approximate location — less precise, but meaningful for city-level comparison.
  2. The device coordinates are reverse-geocoded to a reference city, region, and country. This is the ground truth.
  3. The contributor's public IP address is sent simultaneously to all tested providers. We capture each provider's returned coordinates, city, region, and country.
  4. Each provider's coordinates are reverse-geocoded through the same service used for ground truth, ensuring consistent naming. The reverse-geocoded results are normalized and compared for city, region, and country match. Haversine distance error is calculated between provider coordinates and device coordinates.
  5. Results are aggregated into the live leaderboard using inverse-variance weighting (1/accuracy²), so higher-accuracy GPS measurements contribute more to the rankings. Repeated tests of the same IP from the same location are deduplicated within 7-day windows, keeping only the latest result.
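The deduplication in step 5 can be sketched as follows. This is only an illustration under stated assumptions: the `Submission` type and its `location_key` field are hypothetical, and the reading of "7-day window" used here (walk each IP/location pair from newest to oldest and drop anything less than seven days older than the last kept result) is one plausible interpretation, not necessarily the Arena's exact rule.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Submission:
    # Hypothetical record shape, used only for this sketch.
    ip: str
    location_key: str  # e.g. "Lisbon|Lisboa|PT" built from the reference place
    timestamp: datetime


def deduplicate(submissions, window=timedelta(days=7)):
    """Keep the latest submission per (ip, location) within each window."""
    kept = []
    last_kept = {}  # (ip, location_key) -> timestamp of the last kept result
    # Walk newest-first so the latest result in a window is the one that survives.
    for sub in sorted(submissions, key=lambda s: s.timestamp, reverse=True):
        key = (sub.ip, sub.location_key)
        anchor = last_kept.get(key)
        if anchor is None or anchor - sub.timestamp >= window:
            kept.append(sub)
            last_kept[key] = sub.timestamp
    return kept
```

A contributor who retests the same IP from the same place three times in one week would therefore count once on the leaderboard, using the most recent result.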

Comparison logic

City name matching is non-trivial. Rather than comparing raw provider-returned names directly, we reverse-geocode every provider's coordinates through the same geocoding service used for ground truth. This ensures both sides use the same naming conventions, administrative boundaries, and language.
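As a rough illustration of the normalize-and-compare step, the sketch below assumes both places have already been reverse-geocoded by the same service. The `Place` type and the specific normalization rules (accent stripping, case folding, whitespace collapsing) are assumptions made for the example; the Arena's actual rules are not spelled out here.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    # Hypothetical container for one reverse-geocoded result.
    city: str
    region: str
    country: str


def normalize(name: str) -> str:
    """Strip accents, fold case, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_marks.casefold().split())


def compare(truth: Place, provider: Place) -> dict:
    """Compare each administrative level independently."""
    return {
        "city": normalize(truth.city) == normalize(provider.city),
        "region": normalize(truth.region) == normalize(provider.region),
        "country": normalize(truth.country) == normalize(provider.country),
    }
```

Because both inputs come from the same geocoder, the normalization only has to absorb cosmetic differences; it does not need to reconcile alternate spellings or translations across sources.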

Distance error uses the Haversine formula. We report the weighted median distance error per provider, where each submission is weighted by 1/accuracy² (inverse-variance weighting). Submissions whose reported location accuracy radius exceeds 5 km are excluded to prevent IP-based browser fallback from polluting results. Repeated tests of the same IP from the same location are deduplicated within 7-day rolling windows.
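A minimal sketch of the distance and aggregation logic described above. The function names, the mean Earth radius constant, and the choice of the lower weighted median (the first value at which cumulative weight reaches half the total) are assumptions for illustration; accuracy is taken to be the radius in metres as reported by the browser.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumed constant
MAX_ACCURACY_M = 5000.0      # the 5 km exclusion threshold from the text


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against rounding slightly above 1.0 for antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def weighted_median_error(samples):
    """samples: iterable of (distance_error_km, accuracy_m) pairs.

    Each sample is weighted by 1 / accuracy_m**2; samples with an accuracy
    radius above the threshold (or a non-positive one) are dropped.
    """
    usable = sorted(
        (dist, 1.0 / (acc * acc))
        for dist, acc in samples
        if 0 < acc <= MAX_ACCURACY_M
    )
    if not usable:
        return None
    half = sum(weight for _, weight in usable) / 2
    cumulative = 0.0
    for dist, weight in usable:
        cumulative += weight
        if cumulative >= half:
            return dist
    return usable[-1][0]
```

With this weighting, a single GPS fix with a 10 m radius carries 10,000 times the weight of a Wi-Fi fix with a 1 km radius, so the median is dominated by the most precise measurements.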

How it is different

Dimension | Traditional studies | IP Accuracy
--- | --- | ---
Ground truth | ISP data, WHOIS, DNS hostnames, landmarks | User device location (GPS, Wi-Fi positioning)
IP types | Often router / infrastructure IPs | End-user consumer IPs
Freshness | Static snapshot, published once | Continuously updated, live rankings
Scope | Typically 1–2 regions or a single ISP | Grows with contributions
Networks | Usually one network type | Residential, mobile, corporate, hotspot
Reproducibility | Often requires private data | Open methodology, public results
Providers | 2–4 databases | 15 providers, expandable

Privacy

Device location data is sensitive. The user's coordinates are used exclusively for real-time comparison and are never persisted.

Stored per test

Timestamp, IP address, ground truth city/region/country and accuracy (no coordinates), provider-returned city/region/country and coordinates, city match result, country match, distance error.

Never stored

Device coordinates, device fingerprint, and user identity.
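One way to picture the stored record is as a type that simply has no slot for device coordinates. The sketch below is hypothetical: the class and field names are invented for illustration, and the `provider` label is an assumption needed to tell results apart.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ProviderResult:
    # Hypothetical field names; one entry per tested provider.
    provider: str
    city: str
    region: str
    country: str
    latitude: float   # provider-returned coordinates, not the device's
    longitude: float
    city_match: bool
    country_match: bool
    distance_error_km: float


@dataclass(frozen=True)
class StoredTest:
    timestamp: datetime
    ip_address: str
    truth_city: str
    truth_region: str
    truth_country: str
    truth_accuracy_m: float  # reported accuracy radius; no device lat/lon
    results: tuple[ProviderResult, ...]
```

Under this shape the distance error has to be computed while the device coordinates are still in memory, since they cannot be recovered from the stored record afterwards.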

Known limitations

Location accuracy varies by device. Mobile GPS provides 5–15m accuracy. Desktop Wi-Fi positioning may achieve 50–200m. Our weighted aggregation mitigates this by giving more weight to high-accuracy measurements.

Sample bias. Crowdsourced contributions are not geographically uniform. Underrepresented regions should be interpreted with caution.

VPN and proxy traffic. Users on a VPN will show a deliberate mismatch between device location and IP location. Submissions flagged as anonymous, proxy, or hosting traffic are automatically excluded from the leaderboard.

Provider API tiers. Some providers offer different accuracy across tiers. We test the tiers we are able to access.

Contributing

The simplest way to contribute is to run a test. Tests from underrepresented regions, mobile networks, and non-Western geographies are especially valuable: this is where providers diverge most and where the least data exists.

The methodology is open. If you're a provider who wants to be included, or a researcher interested in the dataset, reach out at arena@ipaccuracy.com.