
The 250,000-Run UK Greyhound Dataset — What We've Learned
RateThatDog has tracked over 250,000 UK greyhound runs and 188,000 pre-race prediction snapshots. Here's the dataset's shape, what it tells us about UK racing, and the patterns it surfaces.
How big is the ratethat.dog dataset?
**250,000+ settled runs** across UK greyhound racing, plus **188,000 pre-race prediction snapshots** with full feature sets — composite scores, suitability components, field speed ratings, race confidence figures. The runs span roughly two years of continuous UK greyhound racing across all 19 active venues, captured live via the GBGB API and stored against immutable race-time predictions.
It's one of the largest open analytical pictures of UK greyhound racing. The point of having that scale isn't bragging rights; it's that statistical patterns become reliable at this size in a way they aren't at 5,000 or 50,000 runs.
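To put a rough number on why scale matters, here is a minimal sketch of the sampling noise on an observed win rate at different sample sizes. It assumes a simple binomial model in which every run is an independent trial at the 16.67% six-trap baseline; that is a simplification (runs in the same race are not independent), and the function name is illustrative rather than anything from the ratethat.dog codebase.

```python
import math

def win_rate_std_error(p: float, n_runs: int) -> float:
    """Standard error of an observed win rate under a simple binomial model."""
    return math.sqrt(p * (1.0 - p) / n_runs)

BASELINE = 1 / 6  # six-trap baseline, about 16.67%

# n_runs is the number of runs behind the rate being measured.
for n_runs in (5_000, 50_000, 250_000):
    se_pp = 100 * win_rate_std_error(BASELINE, n_runs)
    print(f"{n_runs:>7} runs: +/- {se_pp:.2f}pp (one standard error)")
```

Under those assumptions the noise falls from roughly 0.53pp at 5,000 runs to roughly 0.07pp at 250,000, which is why a gap of a couple of percentage points is hard to trust on a small sample.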
What does 250k+ runs tell us about trap bias?
That trap bias is real, big, and venue-specific. **Across all UK racing**, Trap 1 wins about 20% of races (above the 16.67% baseline), Traps 2-5 sit between 17% and 19%, and Trap 6 sits at about 18.5%. Aggregated, the bias looks modest.
But broken down by venue, the picture is dramatic. **Hove, Monmore, Sheffield 500m: Trap 1 22.5%.** **Harlow 238m: Trap 6 24.3%.** **Yarmouth 462m: Trap 3 22.2%.** **Sunderland 450m: 2.26pp spread between best and worst trap, effectively no bias.** The aggregate hides the venue-specific edges that actually matter for betting.
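The venue-and-distance breakdown above comes down to counting starts and wins per trap and then comparing the best and worst trap. A minimal sketch, assuming each run is available as a `(venue, distance_m, trap, won)` tuple; that record shape and the sample rows are made up for illustration and are not the real dataset schema.

```python
from collections import defaultdict

def trap_win_rates(runs):
    """Return {(venue, distance_m): {trap: win_rate}} from run tuples."""
    starts = defaultdict(lambda: defaultdict(int))
    wins = defaultdict(lambda: defaultdict(int))
    for venue, distance_m, trap, won in runs:
        key = (venue, distance_m)
        starts[key][trap] += 1
        wins[key][trap] += int(won)
    return {
        key: {trap: wins[key][trap] / n for trap, n in by_trap.items()}
        for key, by_trap in starts.items()
    }

def spread_pp(rates):
    """Gap between the best and worst trap win rate, in percentage points."""
    return 100 * (max(rates.values()) - min(rates.values()))

# Tiny made-up sample, only to show the shape of the input.
sample_runs = [
    ("Track A", 500, 1, True), ("Track A", 500, 1, False),
    ("Track A", 500, 6, False), ("Track A", 500, 6, False),
    ("Track B", 450, 1, True), ("Track B", 450, 1, False),
    ("Track B", 450, 6, True), ("Track B", 450, 6, False),
]
rates = trap_win_rates(sample_runs)
for key, by_trap in rates.items():
    print(key, by_trap, f"spread={spread_pp(by_trap):.2f}pp")
```

Grouping by venue and distance together, rather than by venue alone, is what keeps a sprint trip and a standard trip at the same track from averaging each other out.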
What does the dataset tell us about model accuracy?
Top composite picks win **25.5% of races** in out-of-sample validation across the full dataset. At standard distance (380-480m), where most UK racing happens, the rate is **25.8%**. The Hot Dogs filter (composite 60+, sole-above-threshold) hits **28.34%**. These are the headline numbers — drawn from the 188k snapshot set, validated on a 70/30 train/test split by race date.
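For readers who want to reproduce this kind of check, here is a minimal sketch of a split by race date and a top-pick strike rate. It reads "70/30 split by race date" as a chronological split (earliest 70% of races train, latest 30% test), which is one plausible interpretation; the dictionary keys (`race_date`, `runners`, `composite`, `trap`, `winner_trap`) and the sample races are hypothetical.

```python
from datetime import date

def split_by_race_date(races, train_pct=70):
    """Chronological split: the earliest races train, the latest races test."""
    ordered = sorted(races, key=lambda race: race["race_date"])
    cut = len(ordered) * train_pct // 100
    return ordered[:cut], ordered[cut:]

def top_pick_strike_rate(races):
    """Share of races won by the runner with the highest composite score."""
    hits = 0
    for race in races:
        top = max(race["runners"], key=lambda runner: runner["composite"])
        hits += int(top["trap"] == race["winner_trap"])
    return hits / len(races) if races else 0.0

# Ten made-up races, deliberately listed newest first.
races = [
    {
        "race_date": date(2024, 1, day),
        "winner_trap": 1 if day % 2 else 3,
        "runners": [
            {"trap": 1, "composite": 62.0},
            {"trap": 3, "composite": 55.0},
        ],
    }
    for day in range(10, 0, -1)
]
train, test = split_by_race_date(races)
print(len(train), len(test), round(top_pick_strike_rate(test), 3))
```

Splitting on date rather than at random means the test races all come after the training races, so the reported strike rate is not flattered by the model having seen later form.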
More importantly, the dataset has identified where the model loses: handicaps, races with low race confidence, and formerly Dunstall Park mid-grade. Knowing the soft spots is what turns a rating into a usable filter.
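A minimal sketch of how a threshold pick and the soft-spot exclusions could be combined into one filter. It reads "sole-above-threshold" as "exactly one runner in the race scores 60 or more", which is an interpretation rather than a confirmed definition; the field names and the race-confidence cut-off of 50 are hypothetical placeholders.

```python
def sole_qualifier(runners, threshold=60.0):
    """Return the only runner at or above the threshold, or None."""
    above = [r for r in runners if r["composite"] >= threshold]
    return above[0] if len(above) == 1 else None

def filtered_pick(race, min_confidence=50.0):
    """Skip known soft spots, then apply the sole-qualifier rule."""
    if race["is_handicap"]:
        return None  # assumed exclusion: handicaps
    if race["race_confidence"] < min_confidence:
        return None  # assumed exclusion: low race confidence (cut-off is made up)
    return sole_qualifier(race["runners"])

example_race = {
    "is_handicap": False,
    "race_confidence": 71.0,
    "runners": [
        {"trap": 1, "composite": 64.2},
        {"trap": 2, "composite": 58.9},
        {"trap": 3, "composite": 51.0},
    ],
}
pick = filtered_pick(example_race)
print(pick["trap"] if pick else "no bet")
```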
What patterns surprised us most?
**The standard-distance Field Speed dominance.** The grid-search over 188k snapshots said the optimal blend at 380-480m is 65% Field Speed and 0% raw Performance. We had assumed Performance would matter more; the data said otherwise. Switching to the Field-Speed-led blend lifted top-pick strike rate from 22.6% to 25.8%.
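To show the general shape of a blend grid-search, here is a minimal sketch that tries every weight combination in 5% steps and keeps the one with the best top-pick strike rate. The component names, the 5% step, the toy races, and the choice of a third `suitability` component to take the remaining weight are all assumptions for illustration; the real search and its feature set may differ.

```python
from itertools import product

def blend(runner, weights):
    """Weighted sum of a runner's component scores."""
    return sum(weights[name] * runner[name] for name in weights)

def strike_rate(races, weights):
    """Share of races where the highest blended score belongs to the winner."""
    hits = 0
    for race in races:
        top = max(race["runners"], key=lambda runner: blend(runner, weights))
        hits += int(top["trap"] == race["winner_trap"])
    return hits / len(races)

def grid_search(races, components, step=5):
    """Try every weight combination (in `step`% increments) summing to 100%."""
    best_weights, best_rate = None, -1.0
    levels = range(0, 101, step)
    for combo in product(levels, repeat=len(components) - 1):
        remainder = 100 - sum(combo)
        if remainder < 0:
            continue  # weights would exceed 100%
        percents = dict(zip(components, (*combo, remainder)))
        weights = {name: pct / 100 for name, pct in percents.items()}
        rate = strike_rate(races, weights)
        if rate > best_rate:  # ties keep the first combination found
            best_weights, best_rate = weights, rate
    return best_weights, best_rate

# Made-up races: the winner always has the better field speed
# and the worse raw performance score.
toy_races = [
    {
        "winner_trap": 1,
        "runners": [
            {"trap": 2, "field_speed": 50, "performance": 80, "suitability": 50},
            {"trap": 1, "field_speed": 70, "performance": 40, "suitability": 50},
        ],
    }
    for _ in range(20)
]
weights, rate = grid_search(toy_races, ["field_speed", "performance", "suitability"])
print(weights, rate)
```

In practice the search would run on the training portion only, with the winning blend then checked on held-out races before it is trusted.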
**The sprint Trap 6 reversal.** We covered this in the sprint piece. The complete inversion of Trap 1 vs Trap 6 win rates between standard distance and sub-300m sprints is one of the most striking patterns in UK greyhound racing — and it's not subtle once you split the data by distance band.
**How flat Sunderland actually is.** Most UK tracks have at least some bias. Sunderland 450m is genuinely flat — 2.26 percentage points between best and worst trap over 16,038 runs. That kind of consistency is rare and worth knowing.
How does the dataset get used day-to-day?
Every racecard on ratethat.dog is generated from the live data, with composite scores computed against the validated blends and snapshot-frozen at race time. The Hindsight page shows prediction-vs-actual on every race so you can audit accuracy yourself. The Track Analysis pages and the broader Track Data overview surface per-track and per-distance trap-bias breakdowns aggregated from the same dataset, while the Dogs database lets you drill into any individual runner's full record.
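The snapshot-then-audit idea can be sketched with an immutable record that is written once at race time and later paired with the settled result. This is an illustrative sketch only: the class, its fields, and the `audit` helper are hypothetical and not the site's actual storage model.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime, timezone

@dataclass(frozen=True)
class PredictionSnapshot:
    race_id: str
    top_pick_trap: int
    composite: float
    frozen_at: datetime

def audit(snapshots, results):
    """Pair each frozen prediction with the settled winner for its race."""
    rows = []
    for snap in snapshots:
        winner = results.get(snap.race_id)
        if winner is None:
            continue  # race not settled yet
        rows.append((snap.race_id, snap.top_pick_trap, winner,
                     snap.top_pick_trap == winner))
    return rows

snap = PredictionSnapshot(
    "race-001", 1, 63.5, datetime(2024, 1, 1, 19, 0, tzinfo=timezone.utc)
)
try:
    snap.top_pick_trap = 6  # editing a frozen snapshot is rejected
except FrozenInstanceError:
    print("snapshot is immutable")

rows = audit([snap], {"race-001": 1})
print(rows)
```

Freezing the record is what makes the later prediction-vs-actual comparison meaningful: the prediction cannot be quietly revised once the result is known.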
If you want the deep dive, the methodology pieces — composite score, field speed, suitability, race confidence — explain the underlying maths. The dataset is what those pieces are validating against.
Frequently asked questions
How big is the ratethat.dog UK greyhound dataset?
Over 250,000 settled runs and 188,000+ pre-race prediction snapshots across all 19 active UK greyhound tracks, captured live via the GBGB API.
What's the best-validated insight from the dataset?
That trap bias is dramatic at venue level (e.g. Hove T1 500m at 22.5%, Harlow T6 238m at 24.3%) even though it looks modest in aggregate. Venue-specific bias is the strongest single pattern in UK greyhound racing.
How does the dataset improve the model?
By enabling out-of-sample validation. Every change to the composite score, field speed rating or race confidence is grid-searched across the snapshot set and tested on held-out data before going live.
Can I see model accuracy myself?
Yes — the Hindsight page shows prediction-vs-actual on every race. Strike rate, place rate and ROI are all auditable.
Will the dataset keep growing?
Yes. Every UK greyhound meeting is captured live; the dataset grows by roughly 1,500-2,000 settled runs per week.
