Data Resources

Datasets, APIs & Packages for Sports Analytics

A curated starting point for finding sports data. Whether you’re building a class project or exploring a research question, these sources cover the major North American sports leagues and beyond.

Public Data & Reference Sites

Free, browser-accessible statistics and historical data — no API key required.

APIs & Data Providers

Programmatic access to sports data, ranging from free tiers to professional subscriptions.

  • Sportradar — official data partner for NFL, NBA, NHL, MLB, and more; academic access available
  • StatsBomb — high-resolution soccer event data; free open data available for select competitions
  • Opta / Stats Perform — event-level data across soccer, American football, basketball, and cricket
  • SportsDataIO — multi-sport API with a free developer tier
  • The Sports DB — open, community-built sports database with a free API
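
Most of these providers expose JSON over HTTPS, so a first request tends to look similar whichever one you choose. The following is a minimal sketch using only the Python standard library. The base URL, the `teams` path, the query parameter names, and the `X-Api-Key` header are all placeholders chosen for illustration and do not describe any listed provider's actual API; check the provider's documentation for the real values.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import Request, urlopen

# Hypothetical base URL: replace with the one from your provider's documentation.
BASE_URL = "https://api.example.com/v1/"


def build_url(base_url, path, **params):
    """Join base URL and path, appending query parameters in sorted order."""
    url = urljoin(base_url, path)
    query = urlencode(sorted(params.items()))
    return f"{url}?{query}" if query else url


def fetch_json(url, api_key=None, timeout=10):
    """GET a URL and decode the JSON body, sending the key as a header if given."""
    headers = {"Accept": "application/json"}
    if api_key:
        # Assumed header name; providers differ (header, query parameter, or URL path).
        headers["X-Api-Key"] = api_key
    request = Request(url, headers=headers)
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Build (but do not send) an example request against the placeholder endpoint.
example_url = build_url(BASE_URL, "teams", league="NBA", season=2024)
print(example_url)
```

To send the request, call `fetch_json(example_url, api_key=...)` with a key read from an environment variable rather than hard-coded in the script.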

R Packages

Install via install.packages("<package>").

  • nflfastR — play-by-play NFL data with expected points and win probability models
  • nflreadr — fast loading of nflverse data including rosters, contracts, and combine results
  • baseballr — scraping tools for Baseball Reference, FanGraphs, and Statcast (Baseball Savant)
  • hoopR — NBA and men’s college basketball play-by-play via the ESPN and NBA Stats APIs
  • wehoop — WNBA and women’s college basketball data
  • worldfootballR — soccer data from FBref, Transfermarkt, and Understat
  • fastRhockey — NHL and PHF play-by-play data
  • sportyR (also available in Python) — draw to-scale versions of playing surfaces with ggplot2
  • MSUthemes (also available in Python) — colour palettes and themes for Michigan State University (MSU), with colour support for all Big Ten Conference institutions

Python Packages

Install via pip install <package>.

General Data Repositories

Broader repositories that include sports datasets alongside other domains.

  • Kaggle Datasets — search “sports” for community-shared datasets across many sports
  • Harvard Dataverse — research data deposits, many accompanying peer-reviewed studies, including sports science research
  • GitHub — many researchers publish cleaned datasets and scraping scripts publicly; search for “sports analytics data”
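
Datasets from these repositories often arrive as CSV files. As a rough sketch of a first look at one, the example below computes each team's average points using only the Python standard library. The column names (`team`, `opponent`, `points`) and the rows are invented placeholders, not taken from any real dataset.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows standing in for a downloaded file; teams and scores are invented.
SAMPLE_CSV = """team,opponent,points
Team A,Team B,101
Team B,Team A,95
Team A,Team C,88
Team C,Team A,92
"""


def mean_points_by_team(csv_file):
    """Return {team: average points} from a file-like object with team/points columns."""
    totals = defaultdict(lambda: [0, 0])  # team -> [sum of points, games played]
    for row in csv.DictReader(csv_file):
        totals[row["team"]][0] += int(row["points"])
        totals[row["team"]][1] += 1
    return {team: total / games for team, (total, games) in totals.items()}


averages = mean_points_by_team(io.StringIO(SAMPLE_CSV))
print(averages)
```

For a real download, pass an open file instead of the in-memory sample, for example `open("games.csv", newline="")`, where `games.csv` is whatever file name you saved.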