Yesterday, I read with interest the article about the British Rowing Ranking, by Mel Harbour (@melharbour on Twitter).
As a data geek, I found this article really interesting. It describes how an Elo rating system could be applied to rowing to improve the ranking. After some email discussion with Mel, I decided to read the (long) article that Mel refers to about the TrueSkill algorithm, which is apparently used on Xbox Live to work out player rankings.
As a Masters rower from the Czech Republic, the article was interesting for a different reason. We had recently closed our season and I was looking for the final ranking. The Czech Masters ranking system works as follows:
- There are 22 races where you can gain points during the season.
- You get 6 points for winning a race, 5 points for coming second, 4 points for coming third, and so on.
- When multiple age categories start in one race, you get 6 points for winning in your age category, but a point is subtracted for each older category participant who beats you. Similarly for coming second in your category but beaten by an older person, you get 5 points.
- The Masters Nationals allocate 12 points for winning, 10 points for second place, etc.
After each race, someone from the Masters Commission sits down, fills out the results, and recalculates the entire table manually.
In total, there were 361 individuals who raced in the 22 races.
Here is the top of the ranking for 2017:
One characteristic of ranking this way is that people who race a lot, and especially people who race a lot in small races with weak opponents, float to the top of the table.
I race a lot.
So I was curious to see how the TrueSkill system would rank me. The TrueSkill system makes you gain more points if you beat a strong opponent. It basically keeps track of two things:
- Your estimated skill
- How sure it is of that skill level
Read the referenced articles if you want to know more.
How I calculated my ranking
So I had all the results in the spreadsheet from which I copied the ranking above. The results pages look something like this:
I also discovered that there is a trueskill Python library. The great thing is that the library is written in such a way that it handles individual races, team races and races with varying numbers of participants. So in the ranking above, if Malecky and Roosendaal raced Pesat and Cernak in a doubles race, it would calculate points according to the expected and real outcomes, and update the individual points for each of the athletes.
So basically the only thing I had to do to answer my question was to read in the results, parse them correctly into boats and races, run all races through the algorithm and presto!
So that is exactly what I did. Luckily, the Excel spreadsheet was pretty consistent. There was always an empty line between boats. So to get to results quickly, instead of writing a smart algorithm to read in the entire spreadsheet, I just copy/pasted the relevant cells like this:
So this was a four boat eights race won by Mikulec/Roosendaal/Pazdera/Krocil/Novotny/Hubik/Nahodil/Doubek (my team). The second group of eight names were the second finishing boat and so forth.
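Turning such a pasted block into boats only needs the "empty line between boats" rule. Here is a minimal sketch, assuming one name per line and boats listed in finishing order. The function name parse_race and the two-boat sample are made up for illustration and are not real results.

```python
def parse_race(pasted):
    """Split a pasted results block into boats, in finishing order."""
    boats, crew = [], []
    for line in pasted.splitlines():
        name = line.strip()
        if name:
            crew.append(name)
        elif crew:
            # A blank line closes the current boat.
            boats.append(crew)
            crew = []
    if crew:
        # The last boat has no trailing blank line.
        boats.append(crew)
    return boats


# Made-up sample: a doubles race with two boats.
example = """\
Malecky
Roosendaal

Pesat
Cernak
"""
print(parse_race(example))
```

Repeated blank lines are ignored, so a slightly sloppy paste still yields the right boats.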
It was easy to create a data structure containing athletes’ names and their TrueSkill scores, then match the names from the results to the athletes, and finally pass the boats, in finishing order, to the library’s rating function.
I did that for all the races and filled the data structure with the athletes’ scores as the season progressed. Of course there were a few errors to weed out (misspelled names), but it was a pretty swift process.
So here was the final result:
On this chart, from left to right, you see the TrueSkill score of the top 7 rowers evolve over the season. I apologize for the character gibberish in the names. The names printed correctly in Python, but the export to Excel messed up the non-ASCII characters.
I plotted the top 7 of the TrueSkill ranking. In the legend, the number before the athlete’s name represents their score in the official Czech ranking. A couple of observations:
- The winner, Mr Malecky, is the same. As you can see, he raced a lot. Towards the end of the season his score stabilizes, as he is winning as expected.
- There are two guys, Zitta and Vesecky, who came 5th and 6th in my ranking but were 172nd and 40th in the original ranking. Mr Zitta didn’t race a lot. He participated in only 2 of the 22 events, but he beat a lot of high-ranking people there. The same goes for Jan Vesecky.
Here is a similar chart following the top 7 of the original ranking.
Mr Viktora, 5th in the original ranking, ended up in 35th place in my ranking.
Seeding at the beginning of the season
In my original calculation, I let everybody start with 0 TrueSkill points (same mu and sigma in the TrueSkill calculation). I was wondering if this was fair. Perhaps you should calculate a running ranking that continues from season to season, like they do in tennis and chess.
To simulate this, I gave people an initial score based on the 2016 results (and a default score for new people). Interestingly, this resulted in some differences in the outcome. Here is the top 7 with the seeded process:
The top three (Malecky, Roosendaal, Mitas) are still the same, but now you can see Jiri Malecky losing some points (where he was beaten by Zitta) at the beginning of the season. You can also see Andrej Hudec, who didn’t do too well in 2016, climb up in the ranking. Here is the view of how the official top 7 fared:
Looking at Milan Viktora, you wonder if with this ranking system, he would have scored better if he hadn’t raced after May. His score is going down because he raced and didn’t finish in front of people with weaker scores.
There are different ways to rank and I cannot really say that one is better than the other. The advantage of the old system is that one can calculate it by hand. But maybe that isn’t such a big advantage. Going through the data, I discovered incorrect results as well as incorrect points being awarded.
Even the old counting system can be automated, of course.
The TrueSkill-based system must be automated, but with the Python I described above, it is pretty easy to hack together a website that does it, based on the official results. Of course, people will be suspicious of me proposing a new ranking system where I jump from fourth to second place!