Nydra’s Minutes: The history of GosuRankings' evolution and changes going forward
Today’s “Nydra’s Minutes” will be a bit different. Usually, these pages offer a concise opinion on hot esports topics, but this one tells the story of a feature considered fundamental to the competitive Hearthstone scene—the GosuRankings—through its many iterations, up to the most recent change, which will be implemented very soon.
The major reason behind this edition of the column is to offer transparency: the same transparency I’ve tried to abide by for all four years as head of the Hearthstone section. Although certain parts of our inner algorithms will forever remain undisclosed due to their proprietary nature, the following text should provide enough insight into how we’ve tackled the problems of ranking a scene as volatile and fast-expanding as competitive Hearthstone, and hopefully serve as a clear enough explanation for pro players and fans of the system alike.
Fundamentals of the GosuRankings
The GosuRankings have been a defining feature of GosuGamers since the earliest days, way before Hearthstone was even a thing, and were designed to track the career records of the then two most popular esports: StarCraft: Brood War and Warcraft III. As the years went by and we added more games to the GosuGamers portfolio, the GosuRankings ensconced themselves in more and more scenes.
The fundamentals of the GosuRankings are simple. The feature is built on the foundations of the Elo system, where each player earns or loses points depending on who he or she plays. Beat a highly ranked opponent and you win more points. Lose to a low-ranked player and see your points drop significantly.
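The underlying Elo update can be sketched in a few lines. The ratings and K value below are generic textbook numbers, not GosuGamers' proprietary coefficients—this only illustrates the asymmetry described above.

```python
def expected_score(rating_a, rating_b):
    """Probability that player A beats player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating_a, rating_b, a_won, k=32):
    """Return player A's new rating after a single match."""
    actual = 1.0 if a_won else 0.0
    return rating_a + k * (actual - expected_score(rating_a, rating_b))

# An underdog (1400) beating a favourite (1800) gains far more
# than the favourite does for beating the underdog:
print(round(elo_update(1400, 1800, True) - 1400, 1))  # 29.1
print(round(elo_update(1800, 1400, True) - 1800, 1))  # 2.9
```

The same asymmetry applies in reverse: the favourite who loses to the underdog sheds roughly those 29 points.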
The GosuRankings expanded on the basic Elo algorithm in a number of ways, most notably with the implementation of the so-called “tournament importance”, designed to differentiate between major and minor events and adjust point gains accordingly. Certain tournaments will always matter more in esports and we wanted the GosuRankings to reflect that. For years, we’ve been using detailed reference tables to determine the ranking importance of tournaments, based on a plethora of factors including level of participants, prize pool, format and so on. For Hearthstone specifically, tournaments were also ranked depending on their relation to the World Championship circuit, and full invitational tournaments—which were scene-defining in 2014 and 2015 but less so in the following years—saw their importance significantly reduced.
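One common way to implement a tournament-importance system like the one described above is to scale the point swing by a per-tier multiplier. The tiers and multipliers below are purely illustrative assumptions; the real reference tables are proprietary.

```python
# Hypothetical importance multipliers -- the real GosuRankings
# reference tables are proprietary, so these values are illustrative.
IMPORTANCE = {"major": 1.5, "standard": 1.0, "minor": 0.5}

def match_points(k, expected, actual, tier):
    """Scale the Elo point swing by the tournament's importance tier."""
    return k * IMPORTANCE[tier] * (actual - expected)

# The same upset is worth three times as much at a major as at a minor:
print(match_points(32, 0.25, 1.0, "major"))  # 36.0
print(match_points(32, 0.25, 1.0, "minor"))  # 12.0
```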
In the end, the main goal of the rankings is always to find a balance between recent strength and career-wide results for teams and players. In Hearthstone, finding that balance proved significantly more difficult than in other games, and we encountered no fewer than five critical problems.
Problems throughout the years
The first brick wall we encountered with Hearthstone rankings was how the game itself behaved as an esport. Traditional disciplines are defined solely by skill: reaching the high echelons is a product of long win streaks, because a better player will always—or at least very, very consistently—defeat inferior opponents. Hearthstone threw a curve ball we should’ve predicted. While skill is still integral to identifying good players, the chance of winning or losing at the will of the top deck, regardless of the players’ abilities, made for rankings that couldn’t be reined in intuitively. As in other card games, Hearthstone players compete against the game’s volatility as much as against their opponents, which meant it would take a long time and a heap of matches on record before the rankings would start making sense. Over five months between November 2014 and March 2015, all we did was record as many games as possible, before pushing the rankings live.
Content at first with the results of the backlog and the initial outline of the rankings, we stumbled into two more roadblocks endemic to our algorithm, which disagreed strongly with the nature of competitive Hearthstone. Traditional Elo systems offer an initial ranking boost to new players in their first N games to help them catch up with already established players who have accumulated a high point total. Good idea on paper. Terrible for Hearthstone.
By the time Frederik “Hoej” Nielsen won Viagame HouseCup and ascended from nowhere to the top 3 of the rankings, we knew something had to be done, as other players had made similar rocket jumps—Dima “Rdu” Radu after DreamHack Summer and Daniel “Danielctin14” Stanesku after DreamHack Bucharest among them. These meteoric rises were compounded by yet another coefficient in the GosuRankings equation: the high K-factor.
The K-factor is intrinsic to Elo-based systems and determines how sensitive the ratings are to each individual result: the higher the K, the larger the point swing after every match.
Initially using the standard coefficient values determined by FIDE (the World Chess Federation), the K-factor of the GosuRankings was subsequently increased site-wide to battle stagnation in certain esports scenes, most notably Dota 2. While it fixed Valve’s MOBA to a large extent, it came to bite the Hearthstone rankings in the butt. When a game operates on chance as much as, if not more than, on skill, and determines who is a good, great or terrible player only after hundreds of games, high-amplitude swings are counter-productive.
Therefore, in May 2015, we both abolished the Elo boost for newer players and cut the K-factor by roughly two and a half times.
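The effect of that cut is easy to see in isolation: every swing shrinks proportionally with K. The specific K values below are illustrative assumptions, not the site's actual coefficients.

```python
def elo_gain(rating, opponent, k):
    """Points a player gains for a win under the standard Elo model."""
    expected = 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))
    return k * (1.0 - expected)

# Illustrative K values only -- the real coefficients are proprietary.
# A ~2.5x reduction in K shrinks the same upset's reward accordingly:
print(round(elo_gain(1500, 1700, 50), 1))  # 38.0
print(round(elo_gain(1500, 1700, 20), 1))  # 15.2
```

With the lower K, a single lucky tournament run can no longer catapult a player up the ladder; sustained results are required instead.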
This created a ranking system that was far more focused on the career-long records of players, but also just flexible enough to reflect recent changes in power. Before the mid-point of 2016, we worked around another problem—an insufficient workforce to curate the rankings—aided in part by the slowdown of tournament activity in Hearthstone.
In all those years, we also fought a different beast altogether, one way more persistent than math. With the unmatched size of the database and sounder algorithms than its competitors, the GosuRankings had become the go-to reference point for player strength and high profile tournaments such as PGL Bucharest, DreamHack, WCA and even HCT had started using and referencing them regularly.
This highest form of recognition came with increased involvement from the pro player community, which talked about the rankings on a daily basis, boasting about their high scores or criticizing aspects they believed flawed. The need for clear communication grew immensely, and the Hearthstone section was the first to break the iron rule of “no comment”. We started—to the best of our abilities—answering pro player questions while keeping the core of the rankings secret for proprietary reasons.
To this day, however, the biggest hurdle we have had to fight is the widespread misconceptions about the rankings algorithm. For all the math, workforce and database issues, nothing has hurt the integrity of the rankings more than false assumptions.
The most recent problem emerged in the second half of 2016 and was connected with an ambiguous variable, the so-called reliability rating (RR). RR is a number which changes based on the quality of opponents a team or player encounters, as well as the frequency of said encounters. A maximum RR of 1.00 is achieved by playing other active opponents on a regular basis, but it drops when a high-ranked competitor plays semi-active “scrubs” or shows up for games once or twice a month.
RR’s nature wasn’t necessarily a bad thing and the theory behind it was solid—after all, how can you reliably call a player good if all his wins come against inferior opponents or at irregular intervals—but it overlapped with and contradicted other cogwheels in the Elo system, namely the decay factor put in place to hurt inactive competitors, and the K-factor and rating differences determining the points gain/loss based on the “quality” of the two opponents.
On top of the redundancy, RR caused a number of other issues where a high-RR player would defeat a low-RR player and gain zero points or, even worse, drop in rank. The decay factor, coupled with the RR drop during inactive periods—which are very common in sparsely covered scenes—caused massive drops in every region but Europe, which still enjoyed an active LAN and invitational life compared to APAC or even the Americas. Explaining how reliability worked in a way that made sense was near impossible.
We gave up on it and looked for solutions.
Current solution and results
As of the next GosuGamers update, RR will be no more, or more precisely—it will be forever fixed at its maximum value of 1.00. As we still want inactive players to suffer ranking drops, we have also increased the decay factor, which previously cost inactive players only a minimal number of points (think 1-2) over very long stretches of time.
The obvious result of the change is that the rankings will look different even without additional games played, as over 34,000 matches—the highest number across all GosuGamers sections—will be recalculated. Some players will rise and others will drop, but we’ve observed minimal changes for the most part, especially in the top 50—nothing that cannot be regained with a couple of matches. With three majors around the corner, including SXSW, PAX and HCT Winter, we expect the rankings to settle by the end of March.
The macro change we expect to benefit the rankings in the long run is their region-wise diversification. Currently, Europe heavily dominates the rankings due to its active LAN and invitational scene in 2015-2016. The scarce competition in the other regions previously hurt their players’ RR, which made catching up with Europe very difficult. Even though decay will be faster from now on, the decay drops will be nowhere near as impactful as the RR drops were. As a result, players from regions such as China and Latin America are expected to rank higher than before, provided they stay consistent.
Before I go, one thing ought to be made clear: no ranking system will ever be perfect. This isn’t meant as a defense of the GosuRankings, or any other ranking system for that matter, but it’s the reality of the situation. Until a closed system with a fixed number of matches for all participants is established, akin to traditional sports leagues, the data collected from hundreds of official and third-party tournaments will remain skewed. To try and fix this is a fool’s errand.
With the aforementioned changes in effect, however, we will continue to monitor how the rankings behave over the next few months and keep trying to find the balance between career-wide and recent-tournament results.
I don’t believe we’re there yet, but it’s a start.