Data Coverage

Where BrandMappr's location data comes from, what's in and out of scope, and how to tune query results for precision vs recall.

⚠

Coverage transition in progress. Production is mid-transition from a single-source feed (OpenStreetMap) to the four-source reconciled pipeline described below. Live responses today reflect the OSM-only baseline (~2.3M rows total) and may undercount well-known chains relative to canonical figures. For example, McDonald's US currently returns ~4,600 locations against a real-world total closer to 13,800. The four-source reconciled dataset (~6.2M rows) is built and validated; the cutover is operator-gated. Until the cutover lands, the min_confidence, quality, and include_provenance parameters are accepted and validated but have no effect on the response.

Sources

Four sources contribute rows. Each is licensed for commercial reuse and credited in the X-Attribution response header.

Source	License	What it brings
OpenStreetMap (OSM)	ODbL 1.0	Long-tail brand breadth. Ingested via Geofabrik PBF extracts. Strongest where the OSM community is active, weaker where brand tagging is sparse.
Overture Maps Places	CDLA Permissive 2.0	Top-brand coverage and stable cross-release identity via GERS IDs. ~33% of branded rows carry Wikidata QIDs for clean brand-identity joins.
Foursquare Open Source Places	Apache 2.0	Strong category metadata; complements Overture's brand-first model.
All The Places (ATP)	CC0	Weekly web scrapes of retailer store-locator pages. Carries retailer-internal store IDs (`ref`). Highest freshness signal.

Reconciliation

Rows from all four sources are joined per brand and region, deduplicated within a 100m bbox tolerance, and assigned a confidence score that reflects source agreement and a sparse-region noise penalty. When the same physical store appears in multiple sources, contributing source IDs are preserved on the reconciled row (see alias_sources).
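The pipeline's actual matching logic is not documented here. As a rough illustration only, the sketch below reads "100m bbox tolerance" as an axis-aligned box of ±100 m around a point and tests whether a second point falls inside it. The function name, the metres-per-degree constant, and the box interpretation are all assumptions made for this example.

```python
import math

TOLERANCE_M = 100.0             # tolerance stated in the text above
METERS_PER_DEG_LAT = 111_320.0  # approximation; an assumption of this sketch


def within_tolerance(lat1, lon1, lat2, lon2, tolerance_m=TOLERANCE_M):
    """Return True if point 2 lies inside a +/- tolerance_m box around point 1.

    Hypothetical helper: the real pipeline may match candidates differently.
    """
    mean_lat = math.radians((lat1 + lat2) / 2.0)
    # North-south offset in metres.
    dy = abs(lat2 - lat1) * METERS_PER_DEG_LAT
    # East-west offset in metres; longitude degrees shrink with latitude.
    dx = abs(lon2 - lon1) * METERS_PER_DEG_LAT * math.cos(mean_lat)
    return dx <= tolerance_m and dy <= tolerance_m
```

A box test is cheaper than a great-circle distance and is usually close enough at this scale, but it accepts pairs up to roughly 141 m apart along the diagonal.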

The reconciliation runs as a pipeline against a recent snapshot of each source, producing a deterministic Parquet output that's then loaded into the production database. The order in which sources contribute to a brand is per-brand and per-region — there is no single "primary" source globally.

Scope

The v1 data product covers consumer-facing storefronts and POIs only. Corporate footprint is out of scope.

In scope	Out of scope (v1)
Retail stores, restaurants, bank branches, hotels	Corporate HQ, regional offices
Gas stations, pharmacies, supermarkets	Distribution centers, warehouses
Service locations with public access	Manufacturing and processing facilities
EV chargers, fitness clubs, telecom retail	Private B2B sites

Multi-company maps requested via stock ticker (a corporate identifier) return the company's storefront locations, not its corporate sites. A corporate-footprint product line is planned but separate.

Coverage strength by category

Coverage is uneven across categories. Roughly:

Strong — fast food, coffee, gas stations, mass-market retail chains (clothing, electronics, home improvement), pharmacies in developed markets.
Moderate — banks, hotels, supermarkets, specialty retail, telecom retail.
Weaker — regional and local-only chains, niche specialty retail, professional services with storefronts.
Sparse or absent — categories with little public street presence (B2B services, finance back office, manufacturing) and any non-storefront business type.

Coverage is also stronger in developed markets than in emerging markets, reflecting the underlying source data.

💡

To check the live location count for any specific brand before committing to a query, use GET /api/v1/brands/summary?brand=X. The endpoint is public, returns counts plus a 5-row sample, and does not consume credits.
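A minimal sketch of building that request URL with the standard library. It assumes the summary endpoint lives on the same host as the curl examples on this page; the helper name is hypothetical, and the response shape is not shown because it is not specified here.

```python
from urllib.parse import urlencode

# Assumption: same host as the location-map examples on this page.
BASE_URL = "https://api.brandmappr.com"


def brand_summary_url(brand):
    """Build the GET URL for the public brand summary endpoint."""
    query = urlencode({"brand": brand})  # percent-encodes spaces and punctuation
    return f"{BASE_URL}/api/v1/brands/summary?{query}"


print(brand_summary_url("Starbucks"))
```

Encoding the brand name matters for names that contain spaces, ampersands, or apostrophes.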

Precision vs recall

Every reconciled row carries a confidence score in [0, 1]. Defaults bias toward recall (include everything). Two request parameters, quality and min_confidence, let a caller shift toward precision.

`quality` profile

A named threshold for callers who'd rather pick a profile than a number.

Profile	Min confidence	Use case
`verified`	0.8	High-accuracy work — due diligence, regulatory filings, public-facing reports
`balanced`	0.6	Recommended default for analytical use
`recall`	0	Completeness audits, wide-net research, internal tooling

curl -X POST https://api.brandmappr.com/api/v1/location-map \
  -H "Content-Type: application/json" -H "x-api-key: YOUR_KEY" \
  -d '{"brand":"Starbucks","country":"US","quality":"verified"}'

`min_confidence`

An explicit confidence floor in [0,1]. When both min_confidence and quality are present, min_confidence wins.

curl -X POST https://api.brandmappr.com/api/v1/location-map \
  -H "Content-Type: application/json" -H "x-api-key: YOUR_KEY" \
  -d '{"brand":"Starbucks","country":"US","min_confidence":0.75}'
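The precedence rule can be mirrored client-side, for example to log which floor a request will effectively use. This is a sketch, not the server's implementation: the function name is hypothetical, and the fallback of 0.0 when neither parameter is sent is an assumption based on the statement that defaults bias toward recall.

```python
# Thresholds copied from the quality profile table above.
QUALITY_PROFILES = {"verified": 0.8, "balanced": 0.6, "recall": 0.0}


def effective_min_confidence(quality=None, min_confidence=None):
    """Return the confidence floor implied by the two request parameters."""
    if min_confidence is not None:
        # An explicit floor wins over any named profile.
        if not 0.0 <= min_confidence <= 1.0:
            raise ValueError("min_confidence must be in [0, 1]")
        return float(min_confidence)
    if quality is not None:
        if quality not in QUALITY_PROFILES:
            raise ValueError(f"unknown quality profile: {quality!r}")
        return QUALITY_PROFILES[quality]
    # Assumption: with neither parameter set, nothing is filtered out.
    return 0.0
```

For example, a request carrying both quality "verified" and min_confidence 0.75 resolves to 0.75.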

`storefront_type`

Filter to a specific kind of consumer-facing location. Values: full_retail | small_format | pickup | studio | other. Useful when a brand operates a mix — for example, separating standard IKEA stores from IKEA Planning Studios.

Provenance fields

When include_provenance is true, JSON and GeoJSON responses gain five extra fields per row. CSV responses always omit them, keeping the fixed 11-field shape for compatibility.

Field	Type	Meaning
`source_types`	string array	All sources that corroborate this row, e.g. `["overture", "fsq"]`
`confidence`	number (0–1)	Reconciled confidence score
`storefront_type`	enum	`full_retail | small_format | pickup | studio | other`
`alias_sources`	object array	Corroborating `{source_type, source_id}` pairs
`last_verified`	ISO timestamp	When the row was last cross-checked against a source (distinct from `last_updated`)

curl -X POST https://api.brandmappr.com/api/v1/location-map \
  -H "Content-Type: application/json" -H "x-api-key: YOUR_KEY" \
  -d '{"brand":"Starbucks","country":"US","include_provenance":true}'

Rows that haven't yet been touched by the multi-source pipeline return null for every provenance field.
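Because provenance can be null on some rows, client code that reads these fields should handle both cases. The sketch below separates rows that carry provenance from rows that do not. It assumes rows have already been parsed into dictionaries; the helper names are hypothetical.

```python
# Field names taken from the provenance table above.
PROVENANCE_FIELDS = (
    "source_types",
    "confidence",
    "storefront_type",
    "alias_sources",
    "last_verified",
)


def has_provenance(row):
    """True if at least one provenance field is present and not null."""
    return any(row.get(field) is not None for field in PROVENANCE_FIELDS)


def partition_by_provenance(rows):
    """Split rows into (with_provenance, without_provenance)."""
    with_prov, without_prov = [], []
    for row in rows:
        (with_prov if has_provenance(row) else without_prov).append(row)
    return with_prov, without_prov
```

Treating a missing confidence as "unknown" rather than as 0 avoids silently dropping rows the pipeline has not yet scored.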

Response headers

Every response carries two attribution headers:

X-Attribution: All four contributing sources, alphabetical: All The Places, Foursquare, OpenStreetMap contributors, Overture. Returned on every response regardless of which sources actually contributed, for license compliance.
X-Data-Sources: Comma-separated list of the sources that actually contributed rows to the current response, alphabetical.
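A small sketch of reading the comma-separated X-Data-Sources value into a list. The exact source tokens the header uses are not specified on this page, so the helper (a hypothetical name) only splits and trims; it does not validate names.

```python
def parse_data_sources(header_value):
    """Split a comma-separated header value into a list of source names."""
    if not header_value:
        return []  # header absent or empty
    # Trim whitespace around each entry and drop empty fragments.
    return [part.strip() for part in header_value.split(",") if part.strip()]
```

This is handy for logging which sources backed a given result set while the cutover is in progress.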

A response-level completeness signal, coverage_confidence ("is this result set complete relative to ground truth?"), is planned. It depends on a curated ground-truth reference set that is still being built, and the field will return null until the metric is implemented.

Refresh cadence

The reconciled dataset is refreshed on a regular cadence by a multi-stage pipeline. Per-source freshness varies: ATP is scraped weekly (the freshest source), Overture and Foursquare update monthly, and OSM is pulled on the same monthly schedule. Aliases (ticker mappings, name variants, CJK transliterations) are recomputed on the same cadence.

Every row carries a last_updated value reflecting its most recent reconciled refresh. After the multi-source cutover, last_verified distinguishes "when this row was last cross-checked against a source" from "when its fields were last touched."
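One way a client might use these timestamps is to flag rows that have not been checked recently. The sketch below prefers last_verified and falls back to last_updated when it is null. It assumes both values are ISO 8601 strings and that a timestamp without an offset is UTC; the helper names and the staleness rule are illustrative, not part of the API.

```python
from datetime import datetime, timedelta, timezone


def parse_iso(ts):
    """Parse an ISO 8601 string into an aware datetime, or return None."""
    if ts is None:
        return None
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive values are UTC
    return dt


def is_stale(row, max_age, now=None):
    """True if the row's newest available check is older than max_age."""
    now = now or datetime.now(timezone.utc)
    checked = parse_iso(row.get("last_verified")) or parse_iso(row.get("last_updated"))
    if checked is None:
        return True  # no timestamp at all: treat as stale
    return now - checked > max_age
```

For example, is_stale(row, timedelta(days=90)) marks rows whose most recent check is more than 90 days old.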

Known limits

Total dataset: ~6.2M reconciled rows covering ~75K brands in 242 countries (post-cutover).
Long tail: the lower a brand sits in the rank order, the more its coverage relies on OSM alone.
No retroactive backfill: brands or regions added to source feeds appear in the next refresh, not retroactively in historical responses.
Closure detection is lagging: a permanently closed store may persist in source feeds for several refresh cycles before falling out.
Pin precision varies by source: Overture and Foursquare provide resolved centroids, OSM nodes are only as precise as the mapper who placed them, and ATP locations follow whatever the retailer publishes. The 100m reconciliation tolerance reflects this.

response-json — full per-row field reference
parameters — all request parameters in one place
brands — brand search, aliases, and top brands by count
