When the Model Meets Reality: Building a Draft Helper, Part 2
In Part 1, I laid out the design for a draft analysis tool — a system that profiles team compositions across multiple dimensions and suggests picks that fill the gaps. The architecture made sense on paper. The pipeline was clean. The hypothesis was sound.
Then I built it, pulled 22,000 ranked matches from the Riot API, and tested it.
The UI came together well. The radar chart showing your team's dimensional profile was immediately readable. The ban system correctly excluded champions from recommendations. The autocomplete search was fast. The role filter dropdown worked. All the mechanical parts did what they were supposed to do.
The recommendations were the problem.
What Went Wrong
The first time I ran a draft — K'Sante top, Amumu jungle, Ahri mid — and looked at the suggestions, most of them were S-tier. That was the first red flag. When everything is S-tier, nothing is. The system was ranking champions almost entirely by raw stat contribution without enough context about what actually makes a good pick in a given situation.
Worse, the suggestions were full of off-meta picks. Champions that technically scored well on paper but that no reasonable player would pick in that slot. The system didn't know the difference between a champion that legitimately fills a role and one that showed up three times in the dataset and happened to win.
I tried filtering by role. That helped surface more reasonable picks, but it introduced a different problem. When I filtered for ADC, the system was still weighting CC and tankiness as heavily as damage. It was suggesting picks based on how much they improved the team's overall profile across every dimension equally — but an ADC's job isn't to bring CC. That's the support's job. Or the jungle's. Or nobody's, depending on the comp.
And then there was the support role. The system kept pushing tanky, CC-heavy supports even when the team already had plenty of lockdown. It didn't understand that Brand support and Vel'Koz support are real picks that serve a completely different purpose — damage from the support position, freeing up other roles to play utility or engage. Senna building full lethality is a support. Pyke is an assassin played as a support. The category "support" doesn't mean one thing.
Top lane had the same issue in reverse. It leaned toward tankiness, but Gangplank, Riven, Jayce — these are glass cannon tops with high damage and sometimes true damage or utility. They're perfectly valid, and sometimes they're exactly what the team needs. But the system treated "top" as synonymous with "tanky."
Jungle was the most interesting case. The role has so much diversity now that it's almost position-agnostic. You can jungle with tanks, assassins, mages, bruisers. The meta changes every few patches, and champions that were never designed for the jungle end up viable there. I kept going back and forth on how to handle this.
Why Simplistic Fixes Don't Work
My first instinct was to add role-specific weights. Something like: when filtering for ADC, multiply the damage gap by 2x and the tankiness gap by 0.2x. For support, boost CC and utility, dampen damage.
I stopped myself. That would have made things worse.
The problem is that every role has multiple valid archetypes, and hard-coding weights assumes there's one correct way to play each position. If I weight support toward CC, Brand support never gets recommended even when the team already has three forms of hard engage and desperately needs damage. If I weight top toward tankiness, Gangplank gets buried every time.
The model would be encoding my assumptions about how the game should be played rather than learning from how it's actually played. And my assumptions were already wrong once — that's how I got here.
What the Data Needs to Tell Us
The real fix is to let the match data define what each position contributes in winning teams. Not what I think a support should do — what winning supports actually do, at each elo, on each patch.
The Riot API gives you teamPosition for every participant — TOP, JUNGLE, MIDDLE, BOTTOM, UTILITY. So instead of just scoring champions globally, you can score them per-position. The same champion might have completely different stats when played mid versus support. That's real information the system should use.
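To make that concrete, here's a minimal sketch of pulling (champion, position, win) triples out of a match payload. The field names (`info.participants`, `championName`, `teamPosition`, `win`) follow Riot's Match-V5 schema; the sample payload is fabricated for illustration.

```python
# Sketch: extracting per-participant position data from a Match-V5 payload.
# Field names follow the Riot Match-V5 schema; the sample data is invented.

def champion_positions(match: dict) -> list[tuple[str, str, bool]]:
    """Return (champion, position, won) for each participant in a match."""
    return [
        (p["championName"], p["teamPosition"], p["win"])
        for p in match["info"]["participants"]
    ]

sample_match = {
    "info": {
        "participants": [
            {"championName": "Brand", "teamPosition": "UTILITY", "win": True},
            {"championName": "Ahri", "teamPosition": "MIDDLE", "win": False},
        ]
    }
}

print(champion_positions(sample_match))
```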
I rebuilt the scoring pipeline to aggregate per-champion, per-position stats. Now instead of one profile for Brand, there's a profile for Brand-MIDDLE and a separate profile for Brand-UTILITY. The Brand that shows up in support games does different damage, takes different amounts of punishment, and provides different CC than the Brand in mid. The data captures that.
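The aggregation itself is straightforward. A sketch, with placeholder dimension names ("damage", "cc") standing in for whatever the real pipeline derives from participant stats:

```python
from collections import defaultdict

# Sketch: averaging dimensional stats per (champion, position) key, so
# Brand-UTILITY and Brand-MIDDLE get separate profiles. Numbers are invented.

def aggregate(rows):
    """rows: iterable of (champion, position, stats_dict). Returns mean stats per key."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for champ, pos, stats in rows:
        key = (champ, pos)
        counts[key] += 1
        for dim, value in stats.items():
            sums[key][dim] += value
    return {
        key: {dim: total / counts[key] for dim, total in dims.items()}
        for key, dims in sums.items()
    }

rows = [
    ("Brand", "UTILITY", {"damage": 22000, "cc": 18.0}),
    ("Brand", "UTILITY", {"damage": 26000, "cc": 22.0}),
    ("Brand", "MIDDLE", {"damage": 31000, "cc": 15.0}),
]
profiles = aggregate(rows)
print(profiles[("Brand", "UTILITY")])  # {'damage': 24000.0, 'cc': 20.0}
```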
From there, you can build position-aware winning profiles. Instead of "winning teams have X total CC," you learn "in winning teams at Gold elo, the UTILITY position averages Y CC, Z utility, and W damage." Now when someone asks for a support recommendation, the system compares candidates against what winning supports actually look like — which naturally includes both Leona-style engage and Brand-style damage, weighted by how often each archetype appears and wins.
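The winning-profile step is just a filtered average over that same data. A sketch (the real pipeline would also split by elo and patch; all numbers here are illustrative):

```python
# Sketch: averaging each dimension over winning teams at one position.
# Stats and numbers are invented for illustration.

def winning_position_profile(rows, position):
    """rows: (position, won, stats_dict). Mean stats over winning rows at that position."""
    wins = [stats for pos, won, stats in rows if pos == position and won]
    return {d: sum(s[d] for s in wins) / len(wins) for d in wins[0]}

rows = [
    ("UTILITY", True,  {"cc": 20.0, "damage": 9000.0}),   # engage-support win
    ("UTILITY", True,  {"cc": 10.0, "damage": 21000.0}),  # damage-support win
    ("UTILITY", False, {"cc": 25.0, "damage": 7000.0}),   # losses don't count
]
print(winning_position_profile(rows, "UTILITY"))  # {'cc': 15.0, 'damage': 15000.0}
```

Note how the result already hints at the next problem: the "average winning support" blends two very different archetypes into one profile.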
The Archetype Problem
But position-aware profiles are still averages. The "average winning support" is a blend of engage tanks, enchanters, and damage supports — which means it describes none of them accurately. The average of Leona and Brand is a champion that doesn't exist.
This is where archetype clustering comes in. If you have enough data, you can group champions within each position by their dimensional profiles. Champions with high CC and tankiness cluster together (engage supports). Champions with high damage and low tankiness cluster together (damage supports). Champions with high utility and healing cluster together (enchanters).
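A tiny k-means over invented (cc, tankiness, damage) profiles shows the idea; a real run would use more dimensions, far more champions, and a proper library implementation with k-means++ initialization:

```python
# Sketch: clustering supports by dimensional profile with plain Lloyd's
# iterations. Profiles are (cc, tankiness, damage) with invented numbers.

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    return tuple(sum(p[i] for p in points) / len(points) for i in range(len(points[0])))

def kmeans(points, centroids, iters=10):
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))].append(p)
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids

supports = {
    "Leona":    (0.9, 0.8, 0.2),
    "Nautilus": (0.9, 0.9, 0.2),
    "Brand":    (0.3, 0.1, 0.9),
    "Vel'Koz":  (0.4, 0.1, 0.9),
    "Lulu":     (0.3, 0.2, 0.3),
    "Janna":    (0.3, 0.1, 0.2),
}
# Seeded with three spread-out profiles; a real run would use k-means++ init.
centroids = kmeans(list(supports.values()),
                   [supports["Leona"], supports["Brand"], supports["Lulu"]])
labels = {name: min(range(3), key=lambda i: dist2(vec, centroids[i]))
          for name, vec in supports.items()}
print(labels)
```

Engage tanks, damage supports, and enchanters fall into separate clusters even though all six carry the same UTILITY label.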
Then the recommendation changes from "your team needs a support" to "your team has enough CC and engage — what it's missing is sustained damage, and here are supports that provide that." The system understands that the team's needs determine which archetype is appropriate, not the role label.
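One way to express that logic: compute the team's gaps against a target profile, then credit each candidate only for the portion of a dimension the team actually lacks. The targets and profiles below are invented; the real targets would come from per-elo winning-team aggregates.

```python
# Sketch: gap-driven recommendation. A candidate gets credit for a dimension
# only up to the size of the team's gap, so stacking more CC on a CC-heavy
# team earns almost nothing. All numbers are illustrative.

def gaps(team_profile, target):
    return {d: max(0.0, target[d] - team_profile[d]) for d in target}

def recommend(candidates, team_profile, target):
    g = gaps(team_profile, target)
    def fill(profile):
        return sum(min(profile[d], g[d]) for d in g)
    return max(candidates, key=lambda name: fill(candidates[name]))

target = {"cc": 1.0, "damage": 1.0, "utility": 1.0}
team = {"cc": 0.9, "damage": 0.3, "utility": 0.6}  # already CC-heavy
supports = {
    "Leona": {"cc": 0.9, "damage": 0.2, "utility": 0.4},
    "Brand": {"cc": 0.3, "damage": 0.9, "utility": 0.2},
}
print(recommend(supports, team, target))  # Brand: fills the damage gap
```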
This requires volume. You can't cluster reliably from a few hundred games. I've been running a continuous data collector across Silver through Diamond, pulling matches 24/7 within the API rate limits. At 22,000 matches so far, position-level scoring is solid. Archetype clustering needs more, but it's getting there.
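The collector's pacing is the only mildly interesting part. A sketch of a sliding-window limiter in the shape of Riot's application limits (the actual numbers depend on your API key, so 100 requests per 120 s here is an assumption):

```python
import time
from collections import deque

# Sketch: sliding-window rate limiter for a 24/7 collector. The limit of
# 100 calls per 120 s mirrors the shape of Riot's application limits; the
# real values depend on your key.

class RateLimiter:
    def __init__(self, max_calls: int, window_s: float):
        self.max_calls, self.window_s = max_calls, window_s
        self.calls = deque()

    def wait(self, now=None):
        """Return seconds to sleep before the next call is allowed, and record it."""
        now = time.monotonic() if now is None else now
        while self.calls and now - self.calls[0] >= self.window_s:
            self.calls.popleft()
        delay = 0.0
        if len(self.calls) >= self.max_calls:
            delay = self.window_s - (now - self.calls[0])
        self.calls.append(now + delay)
        return delay

limiter = RateLimiter(max_calls=100, window_s=120.0)
# In the collector loop: time.sleep(limiter.wait()) before each request.
```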
The Player Problem
There's another layer that's harder to solve: player skill warps everything.
A solo lane with a tough matchup might look terrible in the early game stats, but the player could be significantly better than their opponent and end up dominating teamfights. In the right hands, every champion can seem overpowered. The data doesn't know that the Riven who went 0/3 in lane came back and carried every teamfight because the player just understands the champion deeply.
The draft helper can't account for individual player skill, and it shouldn't try. The tool is designed to be player-agnostic — it helps balance the team composition's missing dimensions so that regardless of who's playing, the team has the tools it needs. The player's job is to pick something from the suggestions that they're confident playing. The system's job is to make sure those suggestions actually fill the gaps.
This is where I want to add a "your role" input. If you tell the system you're the ADC, and optionally that you're duo with the support, it can focus its recommendations specifically on your position and factor in synergy with your duo partner. The three randoms on your team are variables you can't control — but you can make sure your own pick and your duo's pick are working together and covering what the team needs.
What's Still Missing
Testing the prototype surfaced a list of things the system needs that I didn't anticipate in the original design:
Lane matchups and counter-picks. The current system only looks at team composition holistically. It doesn't know that picking Vayne into Caitlyn is going to be a miserable lane phase regardless of how well she completes the team profile. Adding matchup data — win rates for specific champion-vs-champion lanes — would let the system warn you when a suggestion looks good on paper but is going to get you crushed in lane.
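The warning layer could sit on top of the existing recommender as a simple lookup. The win rate below is invented; real numbers would come from aggregated matchup data.

```python
# Sketch: flagging a recommendation whose direct lane matchup is poor.
# The matchup win rate and the 48% threshold are illustrative assumptions.

MATCHUP_WR = {("Vayne", "Caitlyn"): 0.46}  # P(first pick wins the lane matchup)

def matchup_warning(pick, opponent, threshold=0.48):
    wr = MATCHUP_WR.get((pick, opponent))
    if wr is not None and wr < threshold:
        return f"{pick} vs {opponent}: {wr:.0%} lane win rate - expect a rough lane"
    return None

print(matchup_warning("Vayne", "Caitlyn"))
```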
Enemy team profiling. Right now the enemy team input just excludes those champions from recommendations. But there's useful information there. If the enemy team is stacking AP damage, your team needs magic resist. If they have no engage, your team might not need as much peel. Showing the enemy team's dimensional profile alongside yours would let players see where both sides are strong and weak.
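Since per-champion profiles already exist, the enemy profile is the same sum over the other five picks. A sketch with invented damage-type numbers:

```python
# Sketch: profiling the enemy team from the same per-champion profiles the
# recommender already uses. Dimension values are invented for illustration.

def team_profile(champs, profiles):
    dims = next(iter(profiles.values())).keys()
    return {d: sum(profiles[c][d] for c in champs) for d in dims}

profiles = {
    "Ahri":  {"ap": 0.9, "ad": 0.1},
    "Amumu": {"ap": 0.6, "ad": 0.1},
    "Jinx":  {"ap": 0.0, "ad": 0.9},
}
enemy = team_profile(["Ahri", "Amumu"], profiles)
if enemy["ap"] > enemy["ad"]:
    print("Enemy comp is AP-heavy: consider magic resist")
```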
Clearer communication. The gap bars work, but they need language. "Currently sufficient in: Hard CC, Tankiness" and "Team comp currently lacking: AP damage, Utility" says more than a colored bar at 60%. The tool should speak in terms the player already thinks in.
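Generating that language from the existing gap values is cheap. A sketch, where the 60% threshold is an arbitrary choice for illustration:

```python
# Sketch: turning normalized dimension coverage into the plain-language
# summary described above. The 0.6 threshold is an illustrative assumption.

def describe(coverage, threshold=0.6):
    sufficient = [d for d, v in coverage.items() if v >= threshold]
    lacking = [d for d, v in coverage.items() if v < threshold]
    return (f"Currently sufficient in: {', '.join(sufficient)}. "
            f"Team comp currently lacking: {', '.join(lacking)}.")

print(describe({"Hard CC": 0.8, "Tankiness": 0.7, "AP damage": 0.3, "Utility": 0.4}))
```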
Joker picks. Off-meta champions shouldn't be eliminated — they should be flagged separately. A Swain ADC or a Karthus jungle might be exactly right in a specific situation. But the system needs to clearly distinguish between "this is a standard, well-supported recommendation" and "this works in the data but it's unconventional." Let the player decide how adventurous they want to be.
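One plausible way to draw that line is sample size plus pick rate within the position; the thresholds below are invented for illustration:

```python
# Sketch: separating standard picks from "joker" picks by how often a
# champion actually appears in a position. Thresholds are assumptions.

def classify(games_in_position, total_position_games,
             min_games=200, min_pick_rate=0.01):
    pick_rate = games_in_position / total_position_games
    if games_in_position >= min_games and pick_rate >= min_pick_rate:
        return "standard"
    return "joker"

print(classify(1500, 40000))  # well-sampled pick
print(classify(40, 40000))    # rare off-meta pick
```

Joker picks would stay in the recommendation list, just rendered with a distinct flag so the player knows what they're opting into.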
The Bigger Pattern
Every problem I ran into with the draft helper comes back to the same thing: I was encoding assumptions instead of learning from data. I assumed supports should bring CC. I assumed tops should be tanky. I assumed role labels meant something fixed. The game is more fluid than that, and the tool has to be too.
This is the same tension I wrote about in my first essay on this site — the difference between systems that make decisions for people and systems that help people make better decisions. A tool that hard-codes "support = CC" is making the decision. A tool that shows you "your team is heavy on CC already, here's what winning teams look like when they go damage support instead" is giving you information and letting you choose.
Building the second version is harder than building the first. The first version was an architecture problem: how do you pipe data through a scoring model into recommendations? The second version is a modeling problem: how do you represent the actual complexity of the game without either oversimplifying it or drowning in edge cases?
Once I've tested the MVP V2, I'll follow up with new analysis and with real-game tests of my own to see how well it fares.