Data Coverage
Where BrandMappr's location data comes from, what's in and out of scope, and how to tune query results for precision vs recall.
Coverage transition in progress. Production is moving from a single-source feed (OpenStreetMap) to the four-source reconciled pipeline described below. Live responses today reflect the OSM-only baseline (~2.3M rows in total) and may undercount well-known chains relative to canonical figures. For example, McDonald's US currently returns ~4,600 locations against a real-world total closer to 13,800. The four-source reconciled dataset (~6.2M rows) is built and validated, but the cutover is operator-gated. Until it lands, the min_confidence, quality, and include_provenance parameters are validated but have no effect.
Sources
Four sources contribute rows. Each is licensed for commercial reuse and credited in the X-Attribution response header.
| Source | License | What it brings |
|---|---|---|
| OpenStreetMap (OSM) | ODbL 1.0 | Long-tail brand breadth. Ingested via Geofabrik PBF extracts. Strongest where the OSM community is active, weaker where brand tagging is sparse. |
| Overture Maps Places | CDLA Permissive 2.0 | Top-brand coverage and stable cross-release identity via GERS IDs. ~33% of branded rows carry Wikidata QIDs for clean brand-identity joins. |
| Foursquare Open Source Places | Apache 2.0 | Strong category metadata; complements Overture's brand-first model. |
| All The Places (ATP) | CC0 | Weekly web scrapes of retailer store-locator pages. Carries retailer-internal store IDs (ref). Highest freshness signal. |
Reconciliation
Rows from all four sources are joined per brand and region, deduplicated within a 100m bbox tolerance, and assigned a confidence score that reflects source agreement and a sparse-region noise penalty. When the same physical store appears in multiple sources, contributing source IDs are preserved on the reconciled row (see alias_sources).
The reconciliation runs as a pipeline against a recent snapshot of each source, producing a deterministic Parquet output that's then loaded into the production database. The order in which sources contribute to a brand is per-brand and per-region — there is no single "primary" source globally.
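The deduplication step can be illustrated with a minimal sketch. It assumes the rows already belong to one brand and region, treats the 100m tolerance as a square box around each kept row, and merges greedily in input order. The Row class, the reconcile function, and the sample IDs are hypothetical; the production pipeline's actual matching logic and confidence scoring are not described here and are not reproduced.

```python
import math
from dataclasses import dataclass, field

METERS_PER_DEG_LAT = 111_320.0  # approximate length of one degree of latitude
TOLERANCE_M = 100.0             # tolerance quoted in the text


@dataclass
class Row:  # hypothetical row shape, for illustration only
    source_type: str
    source_id: str
    lat: float
    lon: float
    alias_sources: list = field(default_factory=list)


def within_bbox(a, b, tol_m=TOLERANCE_M):
    """True when b lies inside a square box of +/- tol_m centred on a."""
    north_m = abs(a.lat - b.lat) * METERS_PER_DEG_LAT
    east_m = abs(a.lon - b.lon) * METERS_PER_DEG_LAT * math.cos(math.radians(a.lat))
    return north_m <= tol_m and east_m <= tol_m


def reconcile(rows):
    """Greedy merge: each row joins the first kept row within tolerance."""
    kept = []
    for row in rows:
        match = next((k for k in kept if within_bbox(k, row)), None)
        if match is None:
            kept.append(row)
            match = row
        # Preserve every contributing source ID on the surviving row.
        match.alias_sources.append(
            {"source_type": row.source_type, "source_id": row.source_id}
        )
    return kept


sample = [
    Row("overture", "ov-1", 47.6097, -122.3331),
    Row("osm", "node/1", 47.6100, -122.3333),   # roughly 35 m from the first row
    Row("atp", "ref-9", 47.6200, -122.3331),    # more than 1 km away
]
for row in reconcile(sample):
    print(row.source_type, [a["source_type"] for a in row.alias_sources])
```

Greedy merging depends on input order, so a sketch like this would need a fixed sort order before it could produce deterministic output.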
Scope
The v1 data product covers consumer-facing storefronts and POIs only. Corporate footprint is out of scope.
| In scope | Out of scope (v1) |
|---|---|
| Retail stores, restaurants, bank branches, hotels | Corporate HQ, regional offices |
| Gas stations, pharmacies, supermarkets | Distribution centers, warehouses |
| Service locations with public access | Manufacturing and processing facilities |
| EV chargers, fitness clubs, telecom retail | Private B2B sites |
Multi-company maps requested via stock ticker (a corporate identifier) return the company's storefront locations, not its corporate sites. A corporate-footprint product line is planned but separate.
Coverage strength by category
Coverage is uneven across categories. Roughly:
- Strong — fast food, coffee, gas stations, mass-market retail chains (clothing, electronics, home improvement), pharmacies in developed markets.
- Moderate — banks, hotels, supermarkets, specialty retail, telecom retail.
- Weaker — regional and local-only chains, niche specialty retail, professional services with storefronts.
- Sparse or absent — categories with little public street presence (B2B services, finance back office, manufacturing) and any non-storefront business type.
Coverage is also stronger in developed markets than in emerging markets, reflecting the underlying source data.
To check the live location count for any specific brand before committing to a query, use GET /api/v1/brands/summary?brand=X. The endpoint is public, returns counts plus a 5-row sample, and does not consume credits.
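As a rough sketch, the summary URL can be built and fetched with the Python standard library. The host is assumed to be the same one used by the curl examples on this page, and the helper names are hypothetical. The response is returned as decoded JSON without assuming any particular field names.

```python
import json
import urllib.parse
import urllib.request

# Assumption: same host as the location-map curl examples on this page.
BASE_URL = "https://api.brandmappr.com"


def summary_url(brand):
    """Build the brand-summary URL with the brand name safely URL-encoded."""
    query = urllib.parse.urlencode({"brand": brand})
    return f"{BASE_URL}/api/v1/brands/summary?{query}"


def fetch_summary(brand, timeout=10.0):
    """Fetch the summary and decode it as JSON. No API key is sent because
    the endpoint is described as public."""
    with urllib.request.urlopen(summary_url(brand), timeout=timeout) as resp:
        return json.load(resp)


print(summary_url("McDonald's"))
```

Encoding the brand through urlencode matters for names that contain apostrophes, ampersands, or spaces.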
Precision vs recall
Every reconciled row carries a confidence score on [0,1]. Defaults bias toward recall (include everything). Two request parameters let a caller shift toward precision.
quality profile
A named threshold for callers who'd rather pick a profile than a number.
| Profile | Min confidence | Use case |
|---|---|---|
| verified | 0.8 | High-accuracy work: due diligence, regulatory filings, public-facing reports |
| balanced | 0.6 | Recommended default for analytical use |
| recall | 0 | Completeness audits, wide-net research, internal tooling |
curl -X POST https://api.brandmappr.com/api/v1/location-map \
-H "Content-Type: application/json" -H "x-api-key: YOUR_KEY" \
-d '{"brand":"Starbucks","country":"US","quality":"verified"}'
min_confidence
An explicit confidence floor in [0,1]. When both min_confidence and quality are present, min_confidence wins.
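The precedence rule, combined with the profile thresholds listed under the quality profile, can be sketched as a small client-side helper. The function name is hypothetical, and the fallback of 0 when neither parameter is given is an assumption based on the recall-biased default described above, not a documented server behavior.

```python
# Thresholds copied from the quality profile table on this page.
QUALITY_PROFILES = {"verified": 0.8, "balanced": 0.6, "recall": 0.0}


def effective_min_confidence(quality=None, min_confidence=None):
    """Return the confidence floor a request would imply.

    An explicit min_confidence wins over a named quality profile.
    """
    if min_confidence is not None:
        if not 0.0 <= min_confidence <= 1.0:
            raise ValueError("min_confidence must be in [0, 1]")
        return float(min_confidence)
    if quality is not None:
        if quality not in QUALITY_PROFILES:
            raise ValueError(f"unknown quality profile: {quality!r}")
        return QUALITY_PROFILES[quality]
    # Assumption: with neither parameter set, every row is kept.
    return 0.0


print(effective_min_confidence(quality="verified"))
print(effective_min_confidence(quality="verified", min_confidence=0.75))
```

A helper like this is mainly useful for logging or validating the floor a caller expects before the request is sent.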
curl -X POST https://api.brandmappr.com/api/v1/location-map \
-H "Content-Type: application/json" -H "x-api-key: YOUR_KEY" \
-d '{"brand":"Starbucks","country":"US","min_confidence":0.75}'
storefront_type
Filter to a specific kind of consumer-facing location. Values: full_retail | small_format | pickup | studio | other. Useful when a brand operates a mix — for example, separating standard IKEA stores from IKEA Planning Studios.
Provenance fields
When include_provenance: true, JSON and GeoJSON responses gain five extra fields per row. CSV always omits these (CSV stays at the fixed 11-field shape for compatibility).
| Field | Type | Meaning |
|---|---|---|
| source_types | string array | All sources that corroborate this row, e.g. ["overture", "fsq"] |
| confidence | number (0–1) | Reconciled confidence score |
| storefront_type | enum | full_retail \| small_format \| pickup \| studio \| other |
| alias_sources | object array | Corroborating {source_type, source_id} pairs |
| last_verified | ISO timestamp | When the row was last cross-checked against a source (distinct from last_updated) |
curl -X POST https://api.brandmappr.com/api/v1/location-map \
-H "Content-Type: application/json" -H "x-api-key: YOUR_KEY" \
-d '{"brand":"Starbucks","country":"US","include_provenance":true}'
Rows that haven't yet been touched by the multi-source pipeline return null for every provenance field.
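Because provenance fields can be null, client code that filters on confidence needs a separate bucket for unscored rows. The sketch below shows one way to handle that, assuming rows have already been decoded from a JSON response into dictionaries. The function name and the sample rows are hypothetical, and only the provenance field names come from the table above.

```python
def partition_rows(rows, floor):
    """Split rows into (meets floor, below floor, unscored).

    Rows whose confidence is null (None after JSON decoding) have not been
    through the multi-source pipeline, so they are neither kept nor dropped
    on confidence grounds.
    """
    kept, dropped, unscored = [], [], []
    for row in rows:
        confidence = row.get("confidence")
        if confidence is None:
            unscored.append(row)
        elif confidence >= floor:
            kept.append(row)
        else:
            dropped.append(row)
    return kept, dropped, unscored


# Hypothetical rows carrying only provenance fields.
sample = [
    {"confidence": 0.92, "source_types": ["overture", "fsq"]},
    {"confidence": 0.41, "source_types": ["osm"]},
    {"confidence": None, "source_types": None},
]
kept, dropped, unscored = partition_rows(sample, floor=0.6)
print(len(kept), len(dropped), len(unscored))
```

Keeping unscored rows apart lets the caller decide whether to include them, rather than silently treating a missing score as zero.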
Response headers
Every response carries two attribution headers:
- X-Attribution: all four contributing sources, alphabetical (All The Places, Foursquare, Overture, OpenStreetMap contributors). Returned on every response regardless of which sources actually contributed, for license compliance.
- X-Data-Sources: comma-separated list of sources that actually contributed rows to the current response, alphabetical.
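Reading the X-Data-Sources header on the client side might look like the following sketch. The function name is hypothetical, and the sample header value is illustrative only, since the exact source names the API emits in this header are not listed here.

```python
def parse_data_sources(headers):
    """Return the sources listed in X-Data-Sources, in header order.

    Header names are matched case-insensitively; a missing or empty header
    yields an empty list.
    """
    lookup = {name.lower(): value for name, value in headers.items()}
    raw = lookup.get("x-data-sources", "")
    return [part.strip() for part in raw.split(",") if part.strip()]


# Illustrative value only; real responses may use different source names.
print(parse_data_sources({"X-Data-Sources": "Foursquare, Overture"}))
```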
A response-level completeness signal (coverage_confidence — "is this result set complete relative to ground truth?") is planned. It depends on a curated ground-truth reference set that's still being built and will return null until the metric is wired.
Refresh cadence
The reconciled dataset is refreshed on a regular cadence by a multi-stage pipeline. Per-source freshness varies: ATP is scraped weekly (the freshest source), Overture and Foursquare update monthly, and OSM is pulled on the same monthly schedule. Aliases (ticker mappings, name variants, CJK transliterations) are recomputed on the same cadence.
Every row carries a last_updated value reflecting its most recent reconciled refresh. After the multi-source cutover, last_verified distinguishes "when this row was last cross-checked against a source" from "when its fields were last touched."
Known limits
- Total dataset: ~6.2M reconciled rows covering ~75K brands in 242 countries (post-cutover).
- Long tail: the deeper into per-brand rank-order, the more coverage relies on OSM alone.
- No retroactive backfill: brands or regions added to source feeds appear in the next refresh, not retroactively in historical responses.
- Closure detection is lagging: a permanently closed store may persist in source feeds for several refresh cycles before falling out.
- Pin precision varies by source: Overture and FSQ provide resolved centroids; OSM nodes are mapper-precision; ATP locations follow whatever the retailer publishes. The 100m reconciliation tolerance reflects this.
Related
- response-json: full per-row field reference
- parameters: all request parameters in one place
- brands: brand search, aliases, and top brands by count