-------BEGIN EDIT-------
It is now official: I am working with the QWL team to assist in the development of the ranking and ladder system.
http://www.quakeworld.nu/forum/viewtopic.php?pid=59732
-------END EDIT----------
Background
I am DG|Cyanide, a QW-obsessed North American oldschooler from the 1996-2000 era, part of a (then) popular USA clan called clan Doppelganger. I returned to QW in 2010 and unsurprisingly everyone’s skills had far surpassed mine. Since the game of QW was more fun than I ever remembered, I knew that I would improve throughout the year as I practiced and bought new hardware (instead of a ball mouse).
Part of the fun of QW for me is the process of practicing, improving, and comparing myself to others. Since I am a software engineer by day, I wanted to use this coding ability to somehow quantify my improvements throughout the year. In the spring of 2010 I wrote a crude utility to parse MVDs and extract match results. I downloaded hundreds of demos from the USA servers for several months, and I came up with a very simple rating system that would calculate a player’s ‘skill’ based upon all of the match results. It turned out to be quite accurate, but I put this project on hold. Until now.
The QR System – Introduction
There are many ways to measure a player’s performance. The most common is the Win/Loss (W/L) ratio. Other rating systems use more detailed match data such as number of frags, efficiency, number of matches, etc. I wanted to create a rating system that
• Is simple
• Is accurate
• Accounts for opponent rating
• Can be calculated after just 1 match
• Is not affected by the number of matches
One of the downsides of the W/L ratio is that it makes it almost impossible to compare the performance of two players by just looking at this ratio, since it is easy to go undefeated against much less skilled players.
QR Software
The QR rating can be applied to real world data to observe and compare player performance. To allow me to test and view player ratings, I wrote a software application to analyze match data and display results, ratings, and performance.
The left side shows a list of players ordered by their QR (only players with over 50 matches are visible). It also shows their QR.C (the average of match frag percents) and their QR.O (the average of their opponents’ average of match frag percents). From this screenshot the following observations can be made.
• ParadokS won his matches big, but played too many below average opponents
• BuLaT and locust won big and played average opponents
• slabi did well and played against very skilled opponents
• urban did decently but played against unskilled opponents
The right side shows the result of each match for the selected player on the left (in this case, rikoll).
Opponent – The name of the opponent
QR – The QR of the opponent
Result – Win or Loss
Score – The match score
QR (diff) – The difference between the selected player’s QR and the opponent’s QR.
Performance – Shows by how much a player ‘beat the spread’ (won by more, or lost by less, than expected)
The QR (diff) number is a very interesting way to see which player is ‘expected’ to win. In the screenshot above, rikoll is 13.9 points below Milton so rikoll is expected to lose. But rikoll can still improve his rating while losing to Milton by ‘beating the spread’ and losing less than expected. As an example, in rikoll’s 3rd match against Milton in the screenshot (4th line down), rikoll lost 12-11. He beat the spread by 11.7 QR points and thus it helped improve his rating. However, the screenshot shows many other games where rikoll did much worse than the spread as indicated by all of the red bars. The chart at the bottom right shows a graph of the performance of each match. The orange line is a 10 match average of the performance, so it is a useful way to see if a player is recently underperforming or outperforming. According to rikoll’s chart, he has been underperforming in his last 20 games or so.
DISCLAIMER: In no way is this an attack on rikoll’s skill. Unfortunately, these results do not take into account rikoll’s ping, if he was testing new equipment, if he was drunk during any of the games, or if he was not trying (just having fun as some people say). Please refer to the section below about QR downsides.
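The post does not spell out how the Performance column is computed. One reading that reproduces the 11.7 figure from the 12-11 example above is to center the player’s frag percentage on 50 and subtract the QR (diff). The sketch below assumes that reading; the function name `spread_performance` is hypothetical and is not taken from the QR software.

```python
def spread_performance(frags_for, frags_against, qr_diff):
    """Assumed formula: (frag percent - 50) - QR diff.

    qr_diff is the selected player's QR minus the opponent's QR,
    so it is negative when the player is expected to lose.
    """
    total = frags_for + frags_against
    frag_percent = 100.0 * frags_for / total
    return (frag_percent - 50.0) - qr_diff


# rikoll loses 11-12 to Milton while rated 13.9 points below him.
print(round(spread_performance(11, 12, -13.9), 1))  # 11.7
```

Under this reading a positive value means the player did better than the rating gap predicted, even in a loss.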
The QR System – How it Works
The QR system ([Q]uakeworld [R]ating) goes one level deeper than the W/L ratio and instead uses a player’s percentage of a match’s total score. For example, if Milton beat rikoll 6-4, Milton will have scored 60% of the match’s frags. Since the QR system uses a player’s frag percentage, if Milton were to instead have won 10-0 (thus 100% of the frags), Milton’s QR rating would be boosted even more by this better performance.
The QR system takes the average frag percent of all of a player’s matches. If Milton won 3 games and had 60%, 80%, and 100% of the total frags, his average percent would be 80%. This is effectively his ‘core’ rating, as it does not take into account opponent skills.
To take into account opponent skills, once a player’s average frag percent is calculated, one can then calculate the average frag percent of all opponents. This part can be confusing, so just try to remember that all players’ average frag percent has already been calculated. To give an example, if Milton played rikoll, ParadokS, and Cyanide, (and each of these other players had many other games), then each one of them will have an average frag percentage (for all of the players they played, not just Milton). Consider if these percents were 70% for rikoll, 80% for ParadokS, and 30% for Cyanide. The average of Milton’s opponents will be (70+80+30) / 3 = 60%. In summary, Milton averaged a frag percentage of 80% for each match, while the average of his opponents’ averages was 60%.
The QR system combines the 80% and the 60% so that both a player’s skill and his opponents’ skill are included in the final rating. If Milton were to play many unskilled opponents that lost big to many other players, his opponent average could be something like 35%. This would in turn hurt his overall rating. In summary, to increase one’s QR rating, one should either beat similarly rated players, or do well against highly rated players.
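The two building blocks described above, a frag percentage per match and a plain average, can be sketched in a few lines of Python. The helper names `frag_percent` and `average` are made up for illustration; the numbers are the ones used in the text.

```python
def frag_percent(frags_for, frags_against):
    """Share of the match's total frags scored by one player, in percent."""
    return 100.0 * frags_for / (frags_for + frags_against)


def average(values):
    """Plain arithmetic mean."""
    return sum(values) / len(values)


print(frag_percent(6, 4))         # 60.0  (Milton beats rikoll 6-4)
print(frag_percent(10, 0))        # 100.0 (a 10-0 win)
print(average([60, 80, 100]))     # 80.0  (Milton's 'core' average)
print(average([70, 80, 30]))      # 60.0  (average of his opponents' averages)
```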
The QR System – Calculations and Example
Consider the following match data of 4 players where each plays a match against the other 3:
Milton 7 - rikoll 3
Milton 7 - ParadokS 3
Milton 9 - Cyanide 1
rikoll 2 - Cyanide 8
rikoll 6 - ParadokS 4
ParadokS 10 - Cyanide 0
Based upon these results, try to rank these players in your head. Careful, there is a trick in there even though rikoll beat ParadokS.
Average frag percents of matches (fractions dropped to whole percents)
Milton = (70 + 70 + 90)/3 = 76%
rikoll = (30 + 20 + 60)/3 = 36%
ParadokS = (30 + 40 + 100)/3 = 56%
Cyanide = (10 + 80 + 0)/3 = 30%
Average opponent average frag percents (computed from the whole percents above, fractions dropped)
Milton = (36 + 56 + 30)/3 = 40%
rikoll = (76 + 56 + 30)/3 = 54%
ParadokS = (76 + 36 + 30)/3 = 47%
Cyanide = (76 + 36 + 56)/3 = 56%
Sum of Core and Opponent averages
Milton = 76 + 40 = 116
rikoll = 36 + 54 = 90
ParadokS = 56 + 47 = 103
Cyanide = 30 + 56 = 86
From these results, it is evident that even though rikoll beat ParadokS, ParadokS’s rating was higher because
• The score was close between him and rikoll
• rikoll lost big to unskilled Cyanide
• ParadokS crushed Cyanide, who beat rikoll
And even though Cyanide beat rikoll, Cyanide lost big to both Milton and ParadokS, hurting his combined rating. This example only demonstrates 4 players and 6 matches, and is already quite accurate (based upon the data). As many more players and many more matches are played, the QR system becomes an even more reliable indicator of performance.
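For anyone who wants to check the arithmetic, here is a minimal Python sketch of the same three steps applied to the six matches above. The function and variable names are my own and are only illustrative. The sketch keeps exact fractions, so its numbers come out about a point or two higher than the whole-percent figures in the tables (for example, roughly 117.8 instead of 116 for Milton), but the ranking is the same.

```python
from collections import defaultdict

# (player A, A's frags, player B, B's frags)
matches = [
    ("Milton", 7, "rikoll", 3),
    ("Milton", 7, "ParadokS", 3),
    ("Milton", 9, "Cyanide", 1),
    ("rikoll", 2, "Cyanide", 8),
    ("rikoll", 6, "ParadokS", 4),
    ("ParadokS", 10, "Cyanide", 0),
]


def core_averages(matches):
    """Average frag percent per player over all of that player's matches."""
    shares = defaultdict(list)
    for a, frags_a, b, frags_b in matches:
        total = frags_a + frags_b
        shares[a].append(100.0 * frags_a / total)
        shares[b].append(100.0 * frags_b / total)
    return {p: sum(v) / len(v) for p, v in shares.items()}


def opponent_averages(matches, core):
    """Average of the core averages of everyone a player faced."""
    faced = defaultdict(list)
    for a, _, b, _ in matches:
        faced[a].append(core[b])
        faced[b].append(core[a])
    return {p: sum(v) / len(v) for p, v in faced.items()}


core = core_averages(matches)
opponent = opponent_averages(matches, core)
combined = {p: core[p] + opponent[p] for p in core}

ranking = sorted(combined, key=combined.get, reverse=True)
for player in ranking:
    print(player, round(core[player], 1), round(opponent[player], 1),
          round(combined[player], 1))
# Order printed: Milton, ParadokS, rikoll, Cyanide
```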
The QR System – Official Calculation
To normalize QR results around a ‘center point’ of 50%, some tweaks are made to the calculations. The normalizing of the rating makes it so that 0 represents the community ‘average’ skill. Positive numbers indicate above average and negative numbers indicate below average.
QR Core = (Average frag percent per match) – 50%
QR Opponent = (Average opponent average frag percent per match) – 50%
Final QR = QR Core + QR Opponent
Milton(QR.C) = 26
Milton(QR.O) = -10
Milton(QR) = 16
rikoll(QR.C) = -14
rikoll(QR.O) = 4
rikoll(QR) = -10
ParadokS(QR.C) = 6
ParadokS(QR.O) = -3
ParadokS(QR) = 3
Cyanide(QR.C) = -20
Cyanide(QR.O) = 6
Cyanide(QR) = -14
The interpretation of these results is as follows. Milton scored extremely high against the community, but on average played less skilled opponents than everyone else (with only 4 players in the data set, and him having the best matches, this should make sense). But Milton performed so well that his total rating was high enough to top everyone’s. rikoll did poorly in his matches, but he played tough opponents. ParadokS did well in his matches, and played slightly below average opponents. Cyanide did very poorly in 2 of his 3 matches, but he played opponents that were much better than him.
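The normalization above is just a shift by 50 on each component. A short Python sketch, using the whole-percent averages from the earlier example as input (the `normalize` helper is a name I made up for illustration):

```python
def normalize(core_percent, opponent_percent):
    """Return (QR.C, QR.O, QR) from the two raw averages in percent."""
    qr_core = core_percent - 50
    qr_opponent = opponent_percent - 50
    return qr_core, qr_opponent, qr_core + qr_opponent


# (core average, opponent average) from the 4-player example
averages = {
    "Milton": (76, 40),
    "rikoll": (36, 54),
    "ParadokS": (56, 47),
    "Cyanide": (30, 56),
}

for player, (core, opponent) in averages.items():
    print(player, normalize(core, opponent))
# Milton (26, -10, 16)
# rikoll (-14, 4, -10)
# ParadokS (6, -3, 3)
# Cyanide (-20, 6, -14)
```

Since both components are shifted by 50, the final QR is simply the earlier ‘sum of Core and Opponent averages’ minus 100, so the ranking does not change.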
Proof the QR System Works
(This is only for the math nerds who might be skeptical of this system)
To test the QR system and its theory, I created a computer generated set of match results using ‘players’ of known skill. I created 100 players where each player played 10 matches against 10 random others. Each player in this list of 100 had a skill corresponding to his number. A higher number meant a higher skill. A match result was created by choosing 2 players and looping 100 times, and as each loop passed, a player’s frag count would increase by 1 based upon a probability factor. A player’s probability factor was his number so player 20 had a 20% probability of getting a frag each loop, and player 99 had a 99% probability of getting a frag each loop.
With just 10 matches per player, the QR system rated almost all 100 players in perfect order from 0 to 100. As more matches were generated, the ratings became more accurate until all players were ranked in perfect order. This concept is similar to flipping a coin. A fair coin comes up heads 50% of the time in the long run, but the first several flips may result in heads 75% of the time. Given enough flips, the result will converge toward 50%. The same is true for the imaginary players in the computer generated match results. The QR system was ‘correct’ even after one match, but it took at least several matches for the probabilities to average out and to properly rank the players. The same holds true for real world match results. It is common sense that one or two games is not enough to give a reliable performance rating of a player.
QR Downsides
• ‘Junk’ or drunk games can skew the results
• Players under alias can skew results
• Does not take into account ping or packet loss (pl)
• A low score match (2-0) holds the same weight as a higher score match (20-0)
• Playing on another continent with a much higher ping can skew results
To me the biggest flaw at the moment is the ‘low score match’ downside. A potential work-around for this problem is to add a number of frags to each player’s match result so that 2-0 becomes 7-5 and 20-0 becomes 25-5. This will allow close low scoring games to represent an even match, while still preserving most of the blow-out match results.
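A quick Python sketch of that work-around, using the padding of 5 frags per player implied by the 2-0 to 7-5 example. The function name `padded_frag_percent` and the default of 5 are only illustrative; the right padding value is an open question.

```python
def padded_frag_percent(frags_for, frags_against, pad=5):
    """Frag percent after adding `pad` frags to both players' scores."""
    padded_for = frags_for + pad
    padded_against = frags_against + pad
    return 100.0 * padded_for / (padded_for + padded_against)


for score in [(2, 0), (20, 0)]:
    print(score, round(padded_frag_percent(*score), 1))
# (2, 0) 58.3
# (20, 0) 83.3
```

Without padding both matches count as 100%. With it, the 2-0 game is treated as close to even while the 20-0 game still registers as a blowout.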
To get around the high ping foreign games skewing the results, a player could play with a .na, .eu, or .ru extension so that their original name will not be tainted with poor performance games due to high ping. I plan to play as dgcyanide.eu when I play my end games on the geeky servers.
QR Benefits
• Teams can be rated
• Individuals in 2on2s and 4on4s can still be rated
• Should work awesome for tournaments such as EQL where there is no data to skew the results
• Skill of opponents is factored into the final rating
• Can become a very good indicator even after just several games
• A QR is calculated even after 1 game (tie Milton for 1 game and you will have his rating)
• Gives an incentive for lesser skilled players to practice and improve their rating
• Allows less skilled players to enjoy games against skilled opponents by trying to ‘beat the spread’
My favorite feature is the ability for me to now 1on1 players much better than me and get an idea of a score I need to beat in order to increase my rating. No longer do I need to shy away from a game that might lead to a blowout. By preventing the blowout I can increase my rating.
Future Plans
I hope to soon find a way to allow the QR software to download match results from stats.quakeworld.nu so that anyone could run the software and track their own progress (or view others). The QR software will be a free download. Alternatively, the rating system should be simple enough that it could be integrated into the stats.quakeworld.nu site. Tournament admins could use the software for ranking players or teams.
Feedback
Feel free to offer criticism or comments. The more flaws that are found and fixed in this rating system, the more robust it becomes.