Project Overview

This site aggregates college sailing results, ratings, and probabilistic forecasts to analyze performance across regattas, teams, skippers, and crews.

Site statistics:
  • 29,502 competitors
  • 7,607 skippers
  • 10,957 crews
  • 188 schools
  • 1,975 regattas
  • 31,864 races
Data Pipeline
  1. Scrape TechScore: Regattas, divisions, races, finish orders, and sailor rosters are scraped and normalized.
  2. Load Database: CSVs and scrape outputs populate schools, competitors, regattas, races, and race results. Missing sailors are auto-created and assigned to an “Unknown School” when needed.
  3. Attach PMFs: Precomputed probability mass functions (PMFs) from analysis are linked to skipper race results.
  4. Serve Pages: Flask routes query the ORM and render Jinja templates and charts.
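The "missing sailors are auto-created" fallback in step 2 can be sketched as a get-or-create helper. A plain dict stands in for the real database here; the field names are illustrative, not the project's actual ORM:

```python
# Illustrative get-or-create for missing sailors (pipeline step 2).
# A plain dict stands in for the real database; field names are assumptions.
def get_or_create_competitor(db, sailor_id, name, school=None):
    """Return the competitor record, creating it with a placeholder
    "Unknown School" when the sailor is not already in the database."""
    if sailor_id not in db["competitors"]:
        db["competitors"][sailor_id] = {
            "name": name,
            "school": school or "Unknown School",
        }
    return db["competitors"][sailor_id]

db = {"competitors": {}}
row = get_or_create_competitor(db, "US-123", "A. Sailor")
# row["school"] == "Unknown School"
```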
Database Layout
  • schools: name, token, conference
  • competitors: sailor id, name, school, skipper/crew flags
  • regattas: name, date, location
  • races: regatta id, division, race number
  • race_results: race id, skipper competitor, crew competitor, finish position
  • ratings: weekly skipper ratings by competitor and week
  • crew_ratings: weekly crew ratings and performance metrics
  • race_result_probabilities: PMF JSON aligned to skipper race_result_id
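Two of these tables can be sketched with in-memory sqlite; any column names beyond those listed above (e.g. skipper_competitor_id) are assumptions about the actual schema:

```python
import sqlite3

# In-memory sketch of the races and race_results tables. Column names
# beyond the layout above are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE races (
    id INTEGER PRIMARY KEY,
    regatta_id INTEGER NOT NULL,
    division TEXT NOT NULL,
    race_number INTEGER NOT NULL
);
CREATE TABLE race_results (
    id INTEGER PRIMARY KEY,
    race_id INTEGER NOT NULL REFERENCES races(id),
    skipper_competitor_id INTEGER NOT NULL,
    crew_competitor_id INTEGER,
    finish_position INTEGER NOT NULL
);
""")
conn.execute("INSERT INTO races VALUES (1, 10, 'A', 1)")
conn.execute("INSERT INTO race_results VALUES (1, 1, 100, 200, 3)")
finish = conn.execute(
    "SELECT finish_position FROM race_results WHERE race_id = 1"
).fetchone()[0]
# finish == 3
```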
Skipper Rating Methodology

Skipper ratings are derived using a Plackett-Luce model with transition loss minimization to capture relative performance across regattas.

Plackett-Luce Model

The Plackett-Luce model treats each race as a ranking over competitors, where the probability of competitor i finishing in position j depends on their skill parameter θᵢ relative to all remaining competitors. This captures the sequential nature of sailing finishes—each position depends on who has already finished ahead.
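The sequential factorization described above can be sketched in a few lines (the helper name is illustrative): at each position, the finisher is chosen in proportion to exp(θ) among the boats still racing.

```python
import math

def ranking_probability(theta_in_finish_order):
    """Plackett-Luce probability of an observed finish order: the boat in
    each position is chosen with probability proportional to exp(theta)
    among the boats that have not yet finished."""
    strengths = [math.exp(t) for t in theta_in_finish_order]
    prob = 1.0
    for i in range(len(strengths)):
        prob *= strengths[i] / sum(strengths[i:])
    return prob

# Two equally skilled boats: either finish order has probability 1/2.
ranking_probability([0.0, 0.0])  # → 0.5
```

Because each factor is a probability over the remaining boats, the probabilities of all possible finish orders sum to one.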

Transition Loss Minimization

To ensure ratings evolve smoothly over time, we minimize a transition loss that penalizes large week-to-week changes in skill parameters. This regularization prevents overfitting to individual race results while allowing ratings to adapt to genuine performance trends. The objective balances fit to observed rankings with temporal consistency.
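As a sketch of the temporal term, assuming a squared week-to-week difference (the exact transition-loss form is not specified here), the penalty added to the ranking loss looks like:

```python
# Sketch of the temporal regularizer. The squared week-to-week difference
# is an assumed form; the full objective adds this to the ranking loss.
def transition_penalty(theta_by_week, lam=1.0):
    """Penalty on week-to-week jumps in one skipper's skill trajectory."""
    return lam * sum(
        (curr - prev) ** 2
        for prev, curr in zip(theta_by_week, theta_by_week[1:])
    )

# A gradual rise is penalized less than an abrupt jump to the same endpoint.
transition_penalty([0.0, 0.5, 1.0])  # 0.5
transition_penalty([0.0, 0.0, 1.0])  # 1.0
```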

Weekly Updates

Ratings are updated weekly using all available race data up to that point, with more recent results weighted more heavily. The model accounts for varying regatta sizes, division strengths, and field compositions to produce comparable ratings across different competitive contexts.
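One common way to weight recent results more heavily is exponential decay by age; the half-life form and value below are illustrative assumptions, not the site's documented scheme:

```python
def recency_weight(weeks_ago, half_life=8.0):
    """Weight for a race result that is `weeks_ago` weeks old; halves
    every `half_life` weeks. The half-life value is an assumption."""
    return 0.5 ** (weeks_ago / half_life)

recency_weight(0)   # 1.0  (this week's races count fully)
recency_weight(8)   # 0.5  (one half-life ago)
```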

Example: Skill Values vs Win Probability

The Plackett-Luce model uses an exponential (softmax) transformation: P(boat i wins) = exp(θᵢ) / Σⱼ exp(θⱼ)

Consider a 4-boat race with skill parameters θ = [3.0, 2.5, 2.0, 1.5]. The model calculates:

  • Boat A (θ=3.0): e^3.0 = 20.09, so 20.09/(20.09+12.18+7.39+4.48) = 45.5% win chance
  • Boat B (θ=2.5): e^2.5 = 12.18, so 12.18/(20.09+12.18+7.39+4.48) = 27.6% win chance
  • Boat C (θ=2.0): e^2.0 = 7.39, so 7.39/(20.09+12.18+7.39+4.48) = 16.7% win chance
  • Boat D (θ=1.5): e^1.5 = 4.48, so 4.48/(20.09+12.18+7.39+4.48) = 10.2% win chance

With negative and mixed skills θ = [-1, 1, 0.5, 0.2], the exponential transformation gives e^θ = [0.37, 2.72, 1.65, 1.22], and the win probabilities are:

  • Boat A (θ=-1): 0.37/(0.37+2.72+1.65+1.22) = 6.2% win chance
  • Boat B (θ=1): 2.72/(0.37+2.72+1.65+1.22) = 45.6% win chance
  • Boat C (θ=0.5): 1.65/(0.37+2.72+1.65+1.22) = 27.7% win chance
  • Boat D (θ=0.2): 1.22/(0.37+2.72+1.65+1.22) = 20.5% win chance

The exponential transformation ensures that small skill differences create meaningful competitive advantages. A 0.5 point increase in skill (3.0→3.5) would boost Boat A's win probability from 45.5% to 57.9%, while a 0.5 point decrease (3.0→2.5) drops it to 33.6%, tied with Boat B.
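The softmax arithmetic in these examples can be checked in a few lines of Python:

```python
import math

def win_probs(thetas):
    """Softmax of skill parameters: each boat's probability of winning."""
    exps = [math.exp(t) for t in thetas]
    total = sum(exps)
    return [e / total for e in exps]

[round(p * 100, 1) for p in win_probs([3.0, 2.5, 2.0, 1.5])]
# → [45.5, 27.6, 16.7, 10.2]
[round(p * 100, 1) for p in win_probs([-1.0, 1.0, 0.5, 0.2])]
# → [6.2, 45.6, 27.7, 20.5]
```

Note that only the differences between skill values matter: shifting every θ by the same constant leaves the probabilities unchanged.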

Crew Rating Methodology

Crew ratings are derived using a fundamentally different approach that captures the crew's contribution to team performance beyond what the skipper's skill alone would predict.

Weighted Sum Approach

Crew ratings are calculated as a weighted combination of two components:

  • Skipper Skill Component: The baseline performance expected from the skipper's established rating
  • Performance Residual Component: The difference between actual team performance and what the skipper's skill alone would predict

Residual Analysis

The performance residual represents the crew's contribution to team success. When a team performs better than the skipper's skill rating would suggest, the positive residual indicates the crew is adding value. Conversely, negative residuals suggest the crew may be limiting the team's potential performance.
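A minimal sketch of the weighted sum, assuming a simple linear residual and an illustrative weight (the actual weights and residual definition are not specified here):

```python
def crew_rating(skipper_skill, team_performance, w=0.5):
    """Weighted combination of the skipper-skill baseline and the
    performance residual. The weight w and the linear residual are
    illustrative assumptions."""
    residual = team_performance - skipper_skill  # crew's added (or lost) value
    return w * skipper_skill + (1 - w) * residual

# Team beats the skipper-only baseline → positive residual lifts the rating.
crew_rating(1.0, 1.6)  # 0.8
# Team underperforms the baseline → negative residual drags it down.
crew_rating(1.0, 0.4)  # 0.2
```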

Interpretation

This methodology recognizes that crew performance is inherently contextual—it's measured relative to the skipper they're sailing with. A crew rating reflects not just raw sailing ability, but their ability to enhance (or detract from) their skipper's performance. This approach captures the collaborative nature of sailing where crew and skipper work together as a team.

Important Limitations

Crew ratings rely on the assumption that the average crew on a team has a skill value roughly equal to the average skipper. This creates important limitations:

  • Team Context Dependency: If every crew on a team is an Olympian, then every skipper's baseline is set sailing with an Olympian crew, and no single result stands out as significant
  • Relative Performance: If most crews on a team are weak and one is strong, that crew may appear better than they actually are when compared with crews from other teams
  • Cross-Team Comparison: Crew ratings are not directly comparable across teams; they indicate how much a crew enhances or reduces performance relative to the other crews on their own team

Crew ratings are most meaningful when comparing crews within the same team or when teams have similar overall crew skill levels. They should be interpreted as relative performance indicators rather than absolute skill measures.

Rating Distributions

[Charts: distributions of latest skipper ratings and crew ratings]

Percentiles (Overall)

Percentile Skipper Rating Crew Rating
1% -2.998 -3.793
5% -2.355 -2.849
10% -1.971 -2.354
25% -1.324 -1.420
50% -0.369 -0.299
75% 0.726 0.887
90% 1.578 1.849
95% 1.968 2.371
99% 2.627 3.190
Percentiles by Conference

Skippers

Conference 1% 5% 10% 25% 50% 75% 90% 95% 99%
All Conferences -2.998 -2.355 -1.971 -1.324 -0.369 0.726 1.578 1.968 2.627
MAISA -3.093 -2.525 -2.119 -1.384 -0.369 0.834 1.635 2.046 2.601
MCSA -2.879 -2.290 -1.985 -1.462 -0.722 0.151 0.861 1.261 1.747
NEISA -2.879 -2.165 -1.742 -0.842 0.328 1.239 1.917 2.236 2.777
NWICSA -2.333 -1.989 -1.626 -0.982 -0.352 0.281 0.792 1.138 1.844
PCCSC -2.942 -2.253 -1.978 -1.495 -0.639 0.610 1.589 2.017 2.730
SAISA -3.052 -2.378 -2.035 -1.345 -0.357 0.553 1.449 1.862 2.501
SEISA -2.677 -2.371 -2.032 -1.517 -0.609 0.242 1.391 1.887 2.399
Unknown -2.967 -2.653 -2.280 -1.630 -0.952 -0.129 0.596 1.148 1.819

Crews

Conference 1% 5% 10% 25% 50% 75% 90% 95% 99%
All Conferences -3.793 -2.849 -2.354 -1.420 -0.299 0.887 1.849 2.371 3.190
MAISA -3.833 -2.935 -2.461 -1.479 -0.191 1.040 1.925 2.393 3.248
MCSA -3.921 -2.928 -2.427 -1.629 -0.683 0.242 1.110 1.615 2.375
NEISA -3.691 -2.616 -2.070 -0.961 0.391 1.535 2.353 2.766 3.395
NWICSA -3.163 -2.142 -1.640 -1.015 -0.281 0.392 1.135 1.500 2.359
PCCSC -3.576 -2.744 -2.274 -1.504 -0.519 0.706 1.797 2.399 3.348
SAISA -3.700 -2.979 -2.455 -1.410 -0.272 0.744 1.669 2.115 3.052
SEISA -3.997 -2.902 -2.436 -1.669 -0.616 0.538 1.501 1.923 2.831
Unknown -3.995 -3.257 -2.780 -1.968 -1.105 -0.129 0.866 1.422 2.420
Skipper Rating Distributions by Conference

Each curve shows a normal approximation for the distribution of latest skipper ratings within a conference. The peak aligns near the conference mean.

Competitors by Class Year

Counts of competitors by normalized graduation year.