TRREB data quality: what we found inside 1.2 million listings

An engineer's tour of TRREB's AMPRE API — what's great, what's broken, and how we ended up rewriting two of their fields before any listing showed up on a BrokerFold site.

Mousa Husseini

Founder, BrokerFold

Every agent website in the GTA is, in one way or another, a thin UI over the Toronto Regional Real Estate Board's data feed. That feed used to be a proprietary RETS server and a PDF of field mappings; in 2024 TRREB finished migrating to AMPRE, a modern RESO Web API implementation that serves the same data as JSON over HTTPS with an OAuth bearer flow. It's a big upgrade. It is also, like every MLS feed on earth, full of quiet landmines.

We built BrokerFold on top of AMPRE. Our ingest pipeline has pulled roughly 1.2 million TRREB listings — active, sold, and archived — through its nightly and incremental syncs over the last year. This is a tour of what we found, what we had to fix, and what it means for the websites Canadian realtors run.

The shape of the feed

AMPRE exposes a Property resource with 260+ fields per record, plus separate Media, OpenHouse, PropertyHistory, and Room collections. Changes flow through an incremental ModificationTimestamp filter, the same pattern every RESO-compliant feed uses. In steady state we pull deltas every 15 minutes; in catastrophic catch-up (a region we just added, a backfill after an outage) we paginate at 5,000 records per request with a bounded concurrency of eight requests.

HTTP
GET /odata/Property
  ?$filter=ModificationTimestamp ge 2026-04-24T00:00:00Z
  &$orderby=ModificationTimestamp
  &$top=5000
  &$select=ListingKey,ModificationTimestamp,...
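The request above and the bounded backfill can be sketched in TypeScript. This is a minimal sketch, not BrokerFold's actual client: `buildDeltaUrl`, `mapWithConcurrency`, and the base URL passed in are hypothetical, and it assumes the server accepts standard percent-encoded OData query options.

```typescript
// Hypothetical helper: builds the incremental-sync URL for one page.
// baseUrl is a placeholder supplied by the caller, not AMPRE's real host.
function buildDeltaUrl(
  baseUrl: string,
  since: Date,
  top: number,
  fields: string[],
): string {
  const filter = `ModificationTimestamp ge ${since.toISOString()}`;
  const query = [
    `$filter=${encodeURIComponent(filter)}`,
    `$orderby=ModificationTimestamp`,
    `$top=${top}`,
    // Encode each field name but keep the commas literal.
    `$select=${fields.map((f) => encodeURIComponent(f)).join(",")}`,
  ].join("&");
  return `${baseUrl}/odata/Property?${query}`;
}

// Hypothetical helper: runs fn over items with at most `limit` calls in flight.
// Results come back in input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    // Each worker claims the next unclaimed index until none are left.
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }
  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

A backfill in this style would split the catch-up range into time windows and call `mapWithConcurrency(windows, 8, fetchWindow)`, where `fetchWindow` is whatever function pages through one window. Because the filter uses `ge`, a record sitting exactly on a window boundary can come back twice, so writes should be upserts keyed on ListingKey.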

That's the happy path. The interesting stuff shows up the moment you try to actually render a listing on an agent's website.

Problem 1 — Coordinates TRREB is not allowed to share

The first surprise: AMPRE ships Latitude and Longitude fields, but for most Ontario listings they're null. Not because TRREB refuses — they simply aren't allowed to redistribute them.

The upstream data provider is the one sitting on the geocodes, and their agreement with TRREB doesn't include redistribution to third parties like us. The effect on our side is that about two-thirds of TRREB listings arrive without coordinates at all, which is a hard stop for any feature that needs to put a pin on a map: search-by-map, draw-to-filter, school catchment overlays, even something as basic as the "listings near me" widget.

You cannot build a modern real estate website on an MLS feed that doesn't give you coordinates. But that is the feed you've been handed — so the question is what you do next.

What we do instead

For every listing without coordinates, we geocode the UnparsedAddress field ourselves, one listing at a time, with a three-tier fallback:

  1. NRCan Geocoder. Free, official, and very good on residential Canadian addresses. Gets us about 91% rooftop accuracy on TRREB-style addresses.
  2. Geocodio. Paid, but has the cleanest postal-code routing and handles ambiguous addresses ("101 Main Street W") better than NRCan.
  3. Structured fallback from the address components. If both fail — and on some pre-construction listings they do — we geocode the PostalCode centroid and flag the listing as "approximate location" so the UI renders it conservatively.

The result is a derived coordinates column we control, with a coordinate_source column that records whether it came from NRCan, Geocodio, or centroid fallback. About 0.8% of listings end up at a postal-code centroid; everything else hits a real rooftop.
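The three-tier fallback can be sketched as a loop over injected geocoders. This is a sketch under stated assumptions: the `Geocoder` type, `geocodeListing`, and the `approximate` flag are hypothetical names, and the actual NRCan and Geocodio HTTP calls are deliberately left to the caller rather than guessed at here.

```typescript
type Coordinates = { lat: number; lng: number };
type CoordinateSource = "nrcan" | "geocodio" | "postal_centroid";

// A geocoder resolves a query to coordinates, or null when it has no match.
// The real NRCan / Geocodio clients would be wrapped to fit this shape.
type Geocoder = (query: string) => Promise<Coordinates | null>;

interface GeocodeTiers {
  nrcan: Geocoder;
  geocodio: Geocoder;
  postalCentroid: Geocoder;
}

interface GeocodeResult {
  coordinates: Coordinates;
  coordinate_source: CoordinateSource;
  approximate: boolean; // true only for the postal-code centroid fallback
}

async function geocodeListing(
  listing: { UnparsedAddress: string; PostalCode: string },
  tiers: GeocodeTiers,
): Promise<GeocodeResult | null> {
  // Ordered attempts: two address geocoders, then the postal-code centroid.
  const attempts: Array<[CoordinateSource, Geocoder, string]> = [
    ["nrcan", tiers.nrcan, listing.UnparsedAddress],
    ["geocodio", tiers.geocodio, listing.UnparsedAddress],
    ["postal_centroid", tiers.postalCentroid, listing.PostalCode],
  ];
  for (const [source, geocode, query] of attempts) {
    // Assumption: a rejected call (timeout, rate limit) is treated like a miss.
    const coordinates = await geocode(query).catch(() => null);
    if (coordinates) {
      return {
        coordinates,
        coordinate_source: source,
        approximate: source === "postal_centroid",
      };
    }
  }
  return null; // nothing resolved; the listing stays without a pin
}
```

Passing the tiers in as functions keeps the fallback order testable without network access, and the returned `coordinate_source` maps directly onto the column described above.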

Problem 2 — Photo URLs that expire

AMPRE's Media resource returns photo URLs from an Azure Blob Storage bucket behind a signed-URL CDN. Those signatures have a short TTL — sometimes a day, sometimes less than an hour. If you fetch a listing on Monday and render the photo URL on Wednesday, you'll serve your users a broken image.

The fix is to mirror every photo on ingest. For BrokerFold, that means writing every TRREB photo to our own Cloudflare R2 bucket the first time we see it, with the MLS number + order index as the R2 key. We then rewrite the URL before it ever reaches a template.

TypeScript
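// After fetching AMPRE Media for a listing
for (const media of listing.Media) {
  const r2Key = `listings/${listing.ListingKey}/${media.Order}.jpg`;
  // Mirror the photo only the first time we see it
  if (!(await r2.headObject(r2Key))) {
    const res = await fetch(media.MediaURL);
    // Don't store an error body as a photo; leave this entry untouched
    if (!res.ok) continue;
    const bytes = await res.arrayBuffer();
    await r2.putObject(r2Key, bytes, { contentType: "image/jpeg" });
  }
  // Rewrite the URL so templates only ever see our own CDN
  media.MediaURL = `${R2_CDN}/${r2Key}`;
}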

The storage cost is real — at peak, TRREB alone ships about 11 TB of photos, and the tail of archived listings adds 30 TB more — but it's the only way to build a site you won't be embarrassed by a week from now.

As a side effect, we get to serve AVIF + WebP versions of every image at edge latency, which turns out to be a meaningful SEO win for listing detail pages. Agents whose old WordPress sites hotlinked to TRREB's CDN routinely lost Core Web Vitals points to image load time; on BrokerFold, the same listings score 95+ on mobile.

Problem 3 — Address strings with structural inconsistency

The UnparsedAddress field is what you'd expect: a single string like "101 Main Street W, Toronto, ON M5V 1A1". Except when it's "101 Main St W". Or "101 MAIN STREET WEST". Or "Unit 402 - 101 Main St. W.". The same agent, the same brokerage, entering similar listings will often produce structurally different strings.

Why this matters: the address is the primary display string, the SEO slug, the og:title, the map marker label, and the key for "show me everything that's been listed at this address" queries. Inconsistency across all of those is unacceptable on a site that wants to look like it was built with intent.

We run every address through a normalizer before it lands in the database:

  • Expand compass abbreviations (W → West, NE → Northeast) consistently.
  • Normalize street-type abbreviations using a Canada Post-sourced mapping (St → Street, Ave → Avenue, Blvd → Boulevard).
  • Collapse unit designators into a canonical "Unit X" prefix so "#402", "Apt 402", and "402 -" all come out the same.
  • Title-case the result, with a short exceptions list for tokens like "PH" (penthouse) that should stay uppercase.

We keep the original string around as raw_address for audit, and power all rendering off the normalized form. The output is boring. That's the point.
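The four normalization rules can be sketched for the street line of an address (the part before the city). This is a minimal sketch, not the production normalizer: `normalizeStreetLine` is a hypothetical name, the lookup tables are tiny illustrative subsets rather than the Canada Post-sourced mapping, and the "Unit X, " output format is an assumption. It only expands a street type in the final position (or just before a trailing direction), so that "St Clair" is not turned into "Street Clair".

```typescript
// Illustrative subsets only; a real mapping would be far larger.
const COMPASS = new Map<string, string>([
  ["n", "North"], ["s", "South"], ["e", "East"], ["w", "West"],
  ["ne", "Northeast"], ["nw", "Northwest"], ["se", "Southeast"], ["sw", "Southwest"],
]);
const COMPASS_FULL = new Set(Array.from(COMPASS.values()).map((v) => v.toLowerCase()));
const STREET_TYPES = new Map<string, string>([
  ["st", "Street"], ["ave", "Avenue"], ["blvd", "Boulevard"],
]);
const KEEP_UPPER = new Set(["PH"]); // tokens that stay uppercase

function titleCase(token: string): string {
  if (KEEP_UPPER.has(token.toUpperCase())) return token.toUpperCase();
  return token.charAt(0).toUpperCase() + token.slice(1).toLowerCase();
}

function normalizeStreetLine(raw: string): string {
  let rest = raw.trim().replace(/\s+/g, " ");
  let unit: string | null = null;

  // "Unit 402 - ", "Apt 402 - ", "#402 - ", or a bare "402 - " before the civic number.
  const unitMatch =
    rest.match(/^(?:(?:unit|apt|suite)\.?\s+|#\s*)([a-z0-9]+)\s*[-,]?\s*/i) ??
    rest.match(/^([a-z0-9]+)\s*-\s*(?=\d)/i);
  if (unitMatch) {
    unit = unitMatch[1].toUpperCase();
    rest = rest.slice(unitMatch[0].length);
  }

  // Drop trailing periods ("St.", "W.") and title-case every token first.
  const tokens = rest.split(" ").map((t) => t.replace(/\.$/, ""));
  const out = tokens.map((t) => titleCase(t));

  // Expand a trailing direction, then the street type just before it.
  let i = tokens.length - 1;
  const last = (tokens[i] ?? "").toLowerCase();
  if (COMPASS.has(last) || COMPASS_FULL.has(last)) {
    out[i] = COMPASS.get(last) ?? titleCase(last);
    i -= 1;
  }
  const expandedType = STREET_TYPES.get((tokens[i] ?? "").toLowerCase());
  if (expandedType) out[i] = expandedType;

  const street = out.join(" ");
  return unit ? `Unit ${unit}, ${street}` : street;
}
```

With these rules, "101 Main St W" and "101 MAIN STREET WEST" both normalize to "101 Main Street West", and "Unit 402 - 101 Main St. W." becomes "Unit 402, 101 Main Street West".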

Problem 4 — Room dimensions that are stored as strings

TRREB rooms are stored in the Room collection, and each room has RoomDimensions — a single string. Sometimes it's "4.2 x 3.8" (metric, implied). Sometimes it's "14 x 12" (imperial, implied). Sometimes it's "4.2m x 3.8m". Sometimes it's "14ft x 12ft". Occasionally it's just "14".

We parse this into length_m, width_m, length_ft, and width_ft columns, using a best-guess unit detector for strings that carry no unit. An explicit suffix (m or ft) is taken at face value. Without one, the detector picks imperial if either dimension exceeds 10 (typical metric rooms don't) and metric otherwise. The listing page then lets the agent toggle preferred units.
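A parser for these strings can be sketched as follows. It is a sketch under stated assumptions: `parseRoomDimensions` and the `unit_inferred` flag are hypothetical, an explicit suffix is trusted, and an unlabelled value above 10 is read as feet, which matches the examples above ("14 x 12" as imperial, "4.2 x 3.8" as metric). Values are rounded to two decimals.

```typescript
const FT_TO_M = 0.3048; // one foot in metres, exact by definition

interface RoomSize {
  length_m: number;
  width_m: number | null; // null when the feed gives a single number
  length_ft: number;
  width_ft: number | null;
  unit: "m" | "ft"; // unit the source string was interpreted in
  unit_inferred: boolean; // true when no suffix was present
}

const round2 = (n: number): number => Math.round(n * 100) / 100;

function parseRoomDimensions(raw: string): RoomSize | null {
  // Matches "4.2 x 3.8", "14ft x 12ft", "4.2m x 3.8m", or a lone "14".
  const m = raw
    .trim()
    .match(/^(\d+(?:\.\d+)?)\s*(m|ft)?(?:\s*x\s*(\d+(?:\.\d+)?)\s*(m|ft)?)?$/i);
  if (!m) return null;

  const length = Number(m[1]);
  const width = m[3] !== undefined ? Number(m[3]) : null;
  const suffix = (m[2] ?? m[4] ?? "").toLowerCase();

  let unit: "m" | "ft";
  if (suffix === "m") unit = "m";
  else if (suffix === "ft") unit = "ft";
  // Heuristic: rooms rarely exceed 10 m on a side, so a larger number is feet.
  else unit = Math.max(length, width ?? 0) > 10 ? "ft" : "m";

  const toM = (v: number): number => round2(unit === "m" ? v : v * FT_TO_M);
  const toFt = (v: number): number => round2(unit === "ft" ? v : v / FT_TO_M);

  return {
    length_m: toM(length),
    width_m: width === null ? null : toM(width),
    length_ft: toFt(length),
    width_ft: width === null ? null : toFt(width),
    unit,
    unit_inferred: suffix === "",
  };
}
```

Storing `unit_inferred` alongside the parsed values lets the UI treat guessed units more cautiously than labelled ones. Anything the pattern does not recognize returns null rather than a guess.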

This looks like trivial work. It is not. Agents on competitor platforms routinely end up with listings showing "14ft x 12ft" on a metric listing page, or, worse, "4.2 x 3.8" labelled as feet, which describes a room smaller than an SUV.

Problem 5 — Status transitions that don't emit webhooks

AMPRE's event model is pure pull: there are no webhooks on status change. Judged by timestamp alone, a listing that went from Active to Sold Conditional to Sold last night is indistinguishable from a listing that had a minor price edit, because both move the ModificationTimestamp forward.

For most agent sites this is fine — you pick up the change at the next sync and the listing's "Sold" badge appears a few minutes later. But for features like "alert me when anything on my saved search changes status," the delta has to be computed on our side. We snapshot the StandardStatus field on every ingest and emit our own status-change events that the CRM and AI lead-response system subscribe to.
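The snapshot-and-diff step can be sketched as a pure function over the previous statuses and the records from one sync. This is a minimal sketch: `deriveStatusEvents`, the event shape, and the in-memory `Map` standing in for the snapshot table are all hypothetical, and emitting an event with `from: null` for a first sighting is an assumed design choice.

```typescript
interface StatusChangeEvent {
  listingKey: string;
  from: string | null; // null the first time a listing is seen
  to: string;
  observedAt: string; // when our sync noticed the change, not when it happened
}

function deriveStatusEvents(
  snapshot: Map<string, string>, // ListingKey -> last StandardStatus we stored
  records: Array<{ ListingKey: string; StandardStatus: string }>,
  observedAt: Date = new Date(),
): StatusChangeEvent[] {
  const events: StatusChangeEvent[] = [];
  for (const record of records) {
    const previous = snapshot.get(record.ListingKey) ?? null;
    // A moved ModificationTimestamp with an unchanged status emits nothing.
    if (previous !== record.StandardStatus) {
      events.push({
        listingKey: record.ListingKey,
        from: previous,
        to: record.StandardStatus,
        observedAt: observedAt.toISOString(),
      });
      snapshot.set(record.ListingKey, record.StandardStatus);
    }
  }
  return events;
}
```

One limitation follows from polling: if a listing moves from Active to Sold Conditional to Sold between two syncs, a diff like this sees only Active to Sold, so the intermediate state is never emitted.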

This is the kind of infrastructure that looks like overkill until your first customer asks, "can you text me when any of my listings get a firm offer?" — at which point it's the only way to answer honestly.

What this means for an agent website

The short version: you cannot ship a real-estate website by piping AMPRE into a templating engine. The feed is excellent raw material, but it's raw. Between you and a clean listing page is a pipeline of normalization, geocoding, image mirroring, and event derivation — and if you don't build that pipeline, you inherit all of its absence.

Most vendors in the Canadian market ship the absence. Their maps are missing pins for half the listings. Their photos 404 after a week. Their "semi-detached" filters miss listings whose agent spelled it "semi detached". Their room sizes show feet as metres. They know; it's just that fixing it isn't profitable at their price point.

Data quality is not a feature you add. It is the foundation on which the features stand. A listing page on a clean feed feels good in a way you can't fake.

We think Canadian realtors deserve better than the absence. So the work above is not a premium tier or an enterprise upsell on BrokerFold — it's the product. Every tenant, starting the moment their TRREB credentials land in our setup flow, gets the clean output.

If you're a realtor who has been quietly irritated by the small inconsistencies on your current website, they were never your fault. They're upstream. And they are fixable.

