QuakeWorld.nu - A Rating System for Measuring Player or Team Performance


2011-03-06, 08:06

dgCyanide

Member
112 posts

Registered:
Nov 2010

-------BEGIN EDIT-------

It is now official that I am working with the QWL team to assist in the development of the ranking and ladder system.
http://www.quakeworld.nu/forum/viewtopic.php?pid=59732

-------END EDIT----------

Background

I am DG|Cyanide, a QW obsessed North American oldschooler from the 1996-2000 era, part of a popular USA clan (at the time) called clan Doppelganger. I returned to QW in 2010 and unsurprisingly everyone's skills had far surpassed mine. Since the game of QW was more fun than I ever remembered, I knew that I would improve throughout the year as I practiced and bought new hardware (instead of a ball mouse).

Part of the fun of QW for me is the process of practicing, improving, and comparing myself to others. Since I am a software engineer by day, I wanted to use this coding ability to somehow quantify my improvements throughout the year. In the spring of 2010 I wrote a crude utility to parse MVDs and extract match results. I downloaded hundreds of demos from the USA servers for several months, and I came up with a very simple rating system that would calculate a player's "skill" based upon all of the match results. It turned out to be quite accurate, but I put this project on hold. Until now.

The QR System - Introduction

There are many ways to measure a player's performance. The most common is the Win/Loss ratio. Other rating systems use more detailed match data such as number of frags, efficiency, number of matches, etc. I wanted to create a rating system with these properties:

• Simple
• Accurate
• Accounts for opponent rating
• Rating can be calculated after just 1 match
• Not affected by # of matches

One of the downsides of the W/L ratio is that it makes it almost impossible to compare the performance of two players by just looking at this ratio, since it is easy to go undefeated against much less skilled players.

QR Software

The QR rating can be applied to real world data to observe and compare player performance. To allow me to test and view player ratings, I wrote a software application to analyze match data and display results, ratings, and performance.

The left side shows a list of players ordered by their QR (only players with over 50 matches are visible). It also shows their QR.C (the average of match frag percents) and their QR.O (the average of their opponents' average of match frag percents). From this screenshot the following observations can be made.

• ParadokS won his matches big, but played too many below average opponents
• BuLaT and locust won big and played average opponents
• slabi did well and played against very skilled opponents
• urban did decently but played against unskilled opponents

The right side shows the result of each match for the selected player on the left (in this case, rikoll).

Opponent - The name of the opponent
QR - The QR of the opponent
Result - Win or Loss
Score - The match score
QR (diff) - The difference between the selected player and the opponent's QR.
Performance - Shows a player's amount of "beating the spread" (winning more than expected)

The QR (diff) number is a very interesting way to see which player is "expected" to win. In the screenshot above, rikoll is 13.9 points below Milton so rikoll is expected to lose. But rikoll can still improve his rating while losing to Milton by "beating the spread" and losing less than expected. As an example, in rikoll's 3rd match against Milton in the screenshot (4th line down), rikoll lost 12-11. He beat the spread by 11.7 QR points and thus it helped improve his rating. However, the screenshot shows many other games where rikoll did much worse than the spread as indicated by all of the red bars. The chart at the bottom right shows a graph of the performance of each match. The orange line is a 10 match average of the performance, so it is a useful way to see if a player is recently underperforming or outperforming. According to rikoll's chart, he has been underperforming in his last 20 games or so.
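
The numbers in the 12-11 example are consistent with a simple reading of Performance: the player's frag percent measured from the 50% midpoint, minus the QR (diff). That formula is inferred from the example and is not taken from the QR software. The function names below are made up for illustration. A minimal Python sketch under that assumption:

```python
def performance(own_frags, opp_frags, qr_diff):
    """Points by which a player beat (positive) or missed (negative) the spread.

    Assumption: the spread is the QR (diff) between the two players, and the
    actual result is the player's frag percent measured from the 50% midpoint.
    Requires at least one frag in the match.
    """
    frag_percent = 100.0 * own_frags / (own_frags + opp_frags)
    return (frag_percent - 50.0) - qr_diff


def trailing_average(values, window=10):
    """Average of the last `window` values at each position (like the orange line)."""
    averages = []
    for i in range(len(values)):
        recent = values[max(0, i - window + 1): i + 1]
        averages.append(sum(recent) / len(recent))
    return averages


# rikoll is 13.9 QR points below Milton and loses 12-11.
print(round(performance(11, 12, -13.9), 1))
```

For the 12-11 loss this gives roughly the 11.7 points quoted above.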

DISCLAIMER: In no way is this an attack on rikoll's skill. Unfortunately, these results do not take into account rikoll's ping, if he was testing new equipment, if he was drunk during any of the games, or if he was not trying (just having fun as some people say). Please refer to the section below about QR downsides.

The QR System - How it Works

The QR system ([Q]uakeworld [R]ating) goes one level deeper than the W/L ratio and instead uses a player's percentage of a match's total score. For example, if Milton beat rikoll 6-4, Milton will have scored 60% of the match's frags. Since the QR system uses a player's frag percentage, if Milton were to instead have won 10-0 (thus 100% of the frags), Milton's QR rating would be boosted even more by this better performance.

The QR system takes the average frag percent of all of a player's matches. If Milton won 3 games and had 60%, 80%, and 100% of the total frags, his average percent would be 80%. This is effectively his "core" rating, as it does not take into account opponent skills.

To take into account opponent skills, once a player's average frag percent is calculated, one can then calculate the average frag percent of all opponents. This part can be confusing, so just try to remember that all players' average frag percent has already been calculated. To give an example, if Milton played rikoll, ParadokS, and Cyanide (and each of these other players had many other games), then each one of them will have an average frag percentage (for all of the players they played, not just Milton). Consider if these percents were 70% for rikoll, 80% for ParadokS, and 30% for Cyanide. The average of Milton's opponents will be (70+80+30) / 3 = 60%. In summary, Milton averaged a frag percentage of 80% for each match, while the average of his opponents' averages was 60%.

The QR system combines the 80% and the 60% so that both a player's skill and his opponents' skill are included in the final rating. If Milton were to play many unskilled opponents that lost big to many other players, his opponent average could be something like 35%. This would in turn hurt his overall rating. In summary, to increase one's QR rating, one should either beat similar rated players, or do well against highly rated players.

The QR System � Calculations and Example

Consider the following match data of 4 players where each plays a match against the other 3:

Milton 7 - rikoll 3
Milton 7 - ParadokS 3
Milton 9 - Cyanide 1
rikoll 2 - Cyanide 8
rikoll 6 - ParadokS 4
ParadokS 10 - Cyanide 0

Based upon these results, try to rank these players in your head. Careful, there is a trick in there even though rikoll beat ParadokS.

Average frag percents of matches

Milton = (70 + 70 + 90)/3 = 76%
rikoll = (30 + 20 + 60)/3 = 36%
ParadokS = (30 + 40 + 100)/3 = 56%
Cyanide = (10 + 80 + 0)/3 = 30%

Average opponent average frag percents

Milton = (36 + 56 + 30)/3 = 40%
rikoll = (76 + 56 + 30)/3 = 54%
ParadokS = (76 + 36 + 30)/3 = 47%
Cyanide = (76 + 36 + 56)/3 = 56%

Sum of Core and Opponent averages

Milton = 76 + 40 = 116
rikoll = 36 + 54 = 90
ParadokS = 56 + 47 = 103
Cyanide = 30 + 56 = 86

From these results, it is evident that even though rikoll beat ParadokS, ParadokS's rating was higher because

• The score was close between him and rikoll
• rikoll lost big to unskilled Cyanide
• ParadokS crushed Cyanide, who beat rikoll

And even though Cyanide beat rikoll, Cyanide lost big to both Milton and ParadokS, hurting his combined rating. This example only demonstrates 4 players and 6 matches, and is already quite accurate (based upon the data). As many more players and many more matches are played, the QR system becomes an even more reliable indicator of performance.
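
The three steps above (average frag percent, average of opponents' averages, sum) can be sketched in a few lines of Python. This is only an illustration and not the QR software itself. The function and variable names are made up, and the sketch drops fractions at each step because that is what reproduces the whole-number figures in the example (76 rather than 76.67, and so on).

```python
# Match results from the example: (player_a, frags_a, player_b, frags_b).
MATCHES = [
    ("Milton", 7, "rikoll", 3),
    ("Milton", 7, "ParadokS", 3),
    ("Milton", 9, "Cyanide", 1),
    ("rikoll", 2, "Cyanide", 8),
    ("rikoll", 6, "ParadokS", 4),
    ("ParadokS", 10, "Cyanide", 0),
]


def rate(matches):
    """Return {player: (core, opponent_average, total)} in whole percents."""
    percents = {}   # player -> list of frag percents, one per match
    opponents = {}  # player -> list of opponent names, one per match
    for a, frags_a, b, frags_b in matches:
        total = frags_a + frags_b
        percents.setdefault(a, []).append(100 * frags_a // total)
        percents.setdefault(b, []).append(100 * frags_b // total)
        opponents.setdefault(a, []).append(b)
        opponents.setdefault(b, []).append(a)

    # Core rating: average frag percent over all of a player's matches.
    core = {p: sum(v) // len(v) for p, v in percents.items()}

    ratings = {}
    for player, opps in opponents.items():
        # Opponent rating: average of the opponents' core ratings.
        opp_avg = sum(core[o] for o in opps) // len(opps)
        ratings[player] = (core[player], opp_avg, core[player] + opp_avg)
    return ratings


for player, (core, opp, total) in rate(MATCHES).items():
    print(f"{player}: {core} + {opp} = {total}")
```

Run on the six matches, this reproduces the sums listed above: 116, 90, 103, and 86.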

The QR System - Official Calculation

To normalize QR results around a "center point" of 50%, some tweaks are made to the calculations. The normalizing of the rating makes it so that 0 represents the community "average" skill. Positive numbers indicate above average and negative numbers indicate below average.

QR Core = (Average frag percent per match) - 50%
QR Opponent = (Average opponent average frag percent per match) - 50%
Final QR = QR Core + QR Opponent

Milton(QR.C) = 26
Milton(QR.O) = -10
Milton(QR) = 16
rikoll(QR.C) = -14
rikoll(QR.O) = 4
rikoll(QR) = -10
ParadokS(QR.C) = 6
ParadokS(QR.O) = -3
ParadokS(QR) = 3
Cyanide(QR.C) = -20
Cyanide(QR.O) = 6
Cyanide(QR) = -14

The interpretation of these results is as follows. Milton scored extremely high against the community, but on average played less skilled opponents than everyone else (with only 4 players in the data set, and him having the best matches, this should make sense). But Milton performed so well, his total rating was high enough to top everyone's. rikoll did poorly in his matches, but he played tough opponents. ParadokS did well in his matches, and played slightly below average opponents. Cyanide did very poorly in 2 of his 3 matches, but he played opponents that were much better than him.
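
The normalization is just a shift by 50 on both averages. A short Python sketch, using the raw averages from the worked example as input (the names `CENTER`, `official_qr`, and `RAW` are made up for illustration):

```python
CENTER = 50  # the "center point", in percent


def official_qr(core_percent, opponent_percent):
    """Return (QR.C, QR.O, QR) from the two raw averages."""
    qr_core = core_percent - CENTER
    qr_opponent = opponent_percent - CENTER
    return qr_core, qr_opponent, qr_core + qr_opponent


# Raw averages from the worked example: (core, opponent average).
RAW = {
    "Milton": (76, 40),
    "rikoll": (36, 54),
    "ParadokS": (56, 47),
    "Cyanide": (30, 56),
}

for player, (core, opp) in RAW.items():
    print(player, official_qr(core, opp))
```

Because both terms are shifted by the same constant, the ordering of the players is the same as with the plain sums.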

Proof the QR System Works

(This is only for the math nerds who might be skeptical of this system)
To test the QR system and its theory, I created a computer generated set of match results using "players" of known skill. I created 100 players where each player played 10 matches against 10 random others. Each player in this list of 100 had a skill corresponding to his number. A higher number meant a higher skill. A match result was created by choosing 2 players and looping 100 times, and as each loop passed, a player's frag count would increase by 1 based upon a probability factor. A player's probability factor was his number so player 20 had a 20% probability of getting a frag each loop, and player 99 had a 99% probability of getting a frag each loop.

With just 10 matches per player, the QR system rated almost all 100 players in perfect order from 0 to 100. As more matches were generated, rating became more accurate until all players were ranked in perfect order. This concept is similar to flipping a coin. Flipping a coin will result in heads 50% of the time, but the first several flips may result in heads 75% of the time. Given enough flips, the result will eventually be 50%. The same is true for the imaginary players in the computer generated match results. The QR system was "correct" even after one match, but it took at least several matches for the probabilities to average out and to properly rank the players. The same holds true for real world match results. It is common sense that one or two games is not enough to give a reliable performance rating of a player.
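
For anyone who wants to repeat the experiment, here is a rough Python sketch of the test described above. It is not the original test code, and several details are assumptions: players are numbered 1 to 100, both players roll independently on every loop, each player starts 10 matches against 10 distinct random others (so most players end up with more than 10 matches), and a 0-0 match counts as an even result. The function names are made up.

```python
import random


def simulate(num_players=100, matches_each=10, loops=100, seed=0):
    """Generate matches between players of known skill and return their QR.

    Player n (1..num_players) frags with probability n / num_players on each
    loop. Both players roll independently each loop (an assumption).
    """
    rng = random.Random(seed)
    players = list(range(1, num_players + 1))
    percents = {p: [] for p in players}
    opponents = {p: [] for p in players}

    for a in players:
        others = [p for p in players if p != a]
        for b in rng.sample(others, matches_each):
            frags_a = sum(rng.random() < a / num_players for _ in range(loops))
            frags_b = sum(rng.random() < b / num_players for _ in range(loops))
            total = frags_a + frags_b
            # Assumption: a 0-0 match counts as an even 50/50 result.
            pct_a = 100.0 * frags_a / total if total else 50.0
            percents[a].append(pct_a)
            percents[b].append(100.0 - pct_a)
            opponents[a].append(b)
            opponents[b].append(a)

    core = {p: sum(v) / len(v) for p, v in percents.items()}
    qr = {}
    for p in players:
        opp_avg = sum(core[o] for o in opponents[p]) / len(opponents[p])
        qr[p] = (core[p] - 50) + (opp_avg - 50)
    return qr


def rank_correlation(qr):
    """Spearman rank correlation between true skill and QR (ignores ties)."""
    by_qr = sorted(qr, key=qr.get)  # players ordered from lowest to highest QR
    n = len(by_qr)
    # Player numbers are 1..n, so a player's number is also his true rank.
    d2 = sum((rank - player) ** 2 for rank, player in enumerate(by_qr, start=1))
    return 1 - 6 * d2 / (n * (n * n - 1))


print(rank_correlation(simulate()))
```

A correlation of 1.0 would mean every player is ranked in exactly the right order. Raising `matches_each` is the way to check the claim that more matches give a more accurate ordering.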

QR Downsides

• "Junk" or drunk games can skew the results
• Players under alias can skew results
• Does not take into account ping or pl
• A low score match (2-0) holds the same weight as a higher score match (20-0)
• Playing on another continent with a much higher ping can skew results

To me the biggest flaw at the moment is the "low score match" downside. A potential work-around for this problem is to add a number of frags to each player's match result so that 2-0 becomes 7-5 and 20-0 becomes 25-5. This will allow close low scoring games to represent an even match, while still preserving most of the blow-out match results.
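
A quick Python sketch of that work-around, to see what the padding does to the frag percent. The pad of 5 comes from the 7-5 example above, and the function name is made up for illustration:

```python
def padded_frag_percent(own_frags, opp_frags, pad=5):
    """Frag percent after adding `pad` frags to both players' scores."""
    own = own_frags + pad
    opp = opp_frags + pad
    return 100.0 * own / (own + opp)


for score in [(2, 0), (20, 0), (12, 11)]:
    print(score, round(padded_frag_percent(*score), 1))
```

With a pad of 5, a 2-0 win counts as about 58% instead of 100%, while a 20-0 win still counts as about 83%.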

To get around the high ping foreign games skewing the results, a player could play with a .na, .eu, or .ru extension so that their original name will not be tainted with poor performance games due to high ping. I plan to play as dgcyanide.eu when I play my end games on the geeky servers.

QR Benefits

• Teams can be rated
• Individuals in 2on2s and 4on4s can still be rated
• Should work awesome for tournaments such as EQL where there is no data to skew the results
• Skill of opponents is factored into the final rating
• Can become a very good indicator even after just several games
• A QR is calculated even after 1 game (tie Milton for 1 game and you will have his rating)
• Gives an incentive for lesser skilled players to practice and improve their rating
• Allows less skilled players to enjoy games against skilled opponents by trying to "beat the spread"

My favorite feature is the ability for me to now 1on1 players much better than me and get an idea of a score I need to beat in order to increase my rating. No longer do I need to shy away from a game that might lead to a blowout. By preventing the blowout I can increase my rating.

Future Plans
I hope to soon find a way to allow the QR software to download match results from stats.quakeworld.nu so that anyone could run the software and track their own progress (or view others'). The QR software will be a free download. Alternatively, the rating system should be simple enough that it could be integrated into the stats.quakeworld.nu site. Tournament admins could use the software for ranking players or teams.

Feedback

Feel free to offer criticism or comments. The more flaws that are found and fixed in this rating system, the more robust it becomes.

2011-03-06, 10:33

ParadokS

Administrator
334 posts

Registered:
Jan 2006

A lot of information to take in. Having such a genious as yourself working on this project for endless hours, and us mere mortals only a few moments to take it all in.

I suspect others like me are overwhelmed and have a hard time to give a proper response besides... "wow" =D

I think it's brilliant. With the data we have available from sooo many eql seasons and other tournaments i can't wait to see this implemented later and get a nice history overview of many clans and players.

Of course this will be applicable to all future tournaments and even the stats page.
Although the stats page has the largest data sample, it's also the most tainted, but it can still be used as a reference somewhat. Especially at the startup of the ladder, for example.

We don't need to play 100 games to know our rating, simply put a "stats.qwnu" rating for new players that signup, side by side on the ladder ranking for example

ready!

2011-03-06, 10:35

Kalma

Member
485 posts

Registered:
Feb 2006

Nice.

dgCyanide wrote:

• A low score match (2-0) holds the same weight as a higher score match (20-0)

Gather data from all matches played for a typical score on each map then compare to that. Yes, everyone's ratings would change for every match played, but after a while it should settle.

2011-03-06, 11:28

JohnNy_cz

Member
1435 posts

Registered:
Jan 2006

You should talk to foogs, grump and pleuraXeraphim, they're working on something related.

2011-03-06, 16:28

dgCyanide

Member
112 posts

Registered:
Nov 2010

ParadokS: yeah there is a ton of information. I would not expect anyone to read the entire thing. That is why there is a screenshot.

I just decided to write as much of it down for the record so that people like pleuraXeraphim have enough information should they desire to use it.

I have been in contact with pleuraXeraphim, and I hang out in #QWL. This full write-up was needed so I could explain it to them.

2011-03-06, 16:36

dgCyanide

Member
112 posts

Registered:
Nov 2010

And I am not a genius. I just have a fever. And the only cure is more QW.

The real geniuses are the ezQuake developers. I am blown away how far the QW client has come in 10 years.

Kalma: Thanks for the idea, I will look into that.

2011-03-06, 16:54

fenris

Member
226 posts

Registered:
Jun 2006

Really cool stuff going on here! I'm not sure if we're going to use it for our ladder that we've been working on right off the bat but maybe in the future we could implement it somehow. I'm pretty sure Grump and Pleura also saw this post.

ParadokS wrote:

genious

dgCyanide wrote:

genious

It's genius by the way. edit: Damn you cyanide you fixed it before I could post!

--irc.quakenet.org #telefrag.me and #QWL | foogsQuakeWorld Ladder

2011-03-06, 17:10

dgCyanide

Member
112 posts

Registered:
Nov 2010

What? I never typed genious. How dare you.

I am going to download and analyze povdmm4 data right now. I am curious as to who is #1 on pov.

2011-03-06, 17:44

swiNg

Administrator
181 posts

Registered:
Jan 2006

IMHO the stats presentation at stats.quakeworld.nu is pretty useless, but creds to developers for trying. Although, the match log section on the site is awesome and very useful.

Anyway... these algorithms look more accurate and will probably generate better stats, so an implementation of this system on the stats site sure would make it more interesting.

2011-03-06, 17:47

#10

dgCyanide

Member
112 posts

Registered:
Nov 2010

swiNg: Yes the stats.quakeworld.nu site is awesome for having logs of all the matches. I am hoping to get access to this data so that this QR utility can download the data and store it locally for analyzing.

2011-03-06, 17:52

#11

dgCyanide

Member
112 posts

Registered:
Nov 2010

Here are the results for POV matches in the past year:

It was interesting to scan through the matches of xenic, BuLaT, and maga. I think maga beat BuLaT 4 out of 5 games, but I do not know what the pings were. But, BuLaT beat xenic 2 out of 3. Again, pings are not factored in. What does matter is how much each player won each game, and the skill of his opponent.

2011-03-06, 18:11

#12

Hagge

Member
398 posts

Registered:
Feb 2006

Cool stuff indeed, good job 8) I would like to see some of these stats for clans! I bet tVS should have pretty neat stats 8)

2011-03-06, 23:09

#13

dimman

Administrator
1025 posts

Registered:
Apr 2006

dgCyanide wrote:

The real geniuses are the ezQuake developers. I am blown away how far the QW client has come in 10 years.

Doesn't tell the full story though. A lot of features and stuff is originally developed by other dudes from FTEQW project, DarkPlaces, FodQuake to mention a few, which should also get credit

EDIT: Lol, forgot: Great job!

2011-03-07, 05:12

#14

MatriX

Member
150 posts

Registered:
Nov 2006

I always thought a Ranking System is the key to keep players active and attract newcomers.

We could do 2 things here:

1 - Create 1 Official Ranking System where every game counts (when voted, explained below), but official Tourneys would give extra points to the ranking.

2 - Create one non-Official Ranking for daily games and one Official Ranking for tourneys only (which requires more work, I prefer the 1st one).

This is an incentive for admins to create more Tourneys, and players to become more active.

To create a fair system, some things need to be taken into account.

• Accounts for opponent rating
• Not affected by # of matches

These 2 you mention are the most important, so to keep and increase your ranking you will need to play against lower but close, or higher, skilled players.

If a player cant play often, this will help secure his ranking longer.

Can you show us how much lower-ranked an opponent a player can challenge and still gain, let's say, 10% to his ranking? (to help people know who they can play against and gain something)

If a player is ranked in 10th place, can he play against rank 15 and still improve?

But I see an issue (if people decide it's an issue) with the calculation in the long term. If Milton keeps winning and his ranking gets so high, then even playing the number 2 ranked player (assuming he is number 1) will not improve his ranking, right?

This will create a sort of barrier of how distant a ranking can go from each other.

About the QR Downsides:

• A low score match (2-0) holds the same weight as a higher score match (20-0)

I honestly don't see an issue here, because what matters is the frag %, right? (games are not played with fraglimit)

Since you calculate with frag % and not total frag numbers, this just means that while the game is in progress, if you are winning 2-0 and the enemy gets 1 frag you will only get about 67%. With 20-0 you have more margin to keep a high %: if the enemy gets 1 frag, 20-1 is not the same as 2-1. A low score game means it was intense.

So, playing to get 50 frags will give you no advantage in frag numbers, but ensures you get the most % of the game for your ranking. It's not about frag numbers, it's about frag difference.

• Does not take into account ping or pl
• Playing on another continent with a much higher ping can skew results

This is an issue that cannot be solved. Antilag helps with higher pings, so if players from another continent want to participate in the Ranking, they will have to accept their pings.

Just like in real life sports, where weather conditions can sometimes favour one team/player, ping/pl variations are part of the game (except in extreme situations).

The only thing to do is choose a server the players agree on and play as best they can.

• "Junk" or drunk games can skew the results
• Players under alias can skew results

For the Ranking System to work, it should only collect data when "asked". This would require players to vote for ranking to be active, maybe with "voteranking", with 100% of the votes needed to activate it.

So when people just want to pracc or have fun, they can do so without worrying that every match will affect their ranking.

This would also require that as soon as the match ends, the information is updated, so if I play another game it won't screw up the calculations.

If players set mode dmm4 on normal maps and vote for ranking, this should not count for ranking, or the setting should not get activated when using dmm4 with such maps.

Also, to keep nicks like "unnamed" and/or "player" out of the Ranking Table, such information would also be ignored.

As for fakenicks, unless there is some sort of login system, it will happen, but it's pointless anyway, because what a player wants is to see his nick Ranked.

2011-03-07, 08:15

#15

fenris

Member
226 posts

Registered:
Jun 2006

http://www.quakeworld.nu/forum/viewtopic.php?pid=59732

Please look at this thread too!

Me, Pleura and Grump are working on a ranking / match system for all players of QuakeWorld. So before you start on anything, could you please take a look at the above thread. We have made tremendous progress on this already but I think we are all open to suggestions and ideas. If you have some ideas and suggestions for a rank/match system you should try and join forces with us. After all, this is supposed to be a community thing!

Buzz me on IRC (quakenet foogs) and I would love to talk about this stuff. This is what we've been working on for 2 maybe 3? months now.

--irc.quakenet.org #telefrag.me and #QWL | foogsQuakeWorld Ladder

2011-03-08, 04:17

#16

dgCyanide

Member
112 posts

Registered:
Nov 2010

Hi foogs. After my aha moment today, and our discussions in IRC, I edited the post above to clarify a bit on QWL and my understanding. Thanks for all your hard work.

Thanks MatriX for the feedback. I will try to respond in detail later this week when I get more time.

2011-03-08, 12:39

#17

xer

Member
2 posts

Registered:
May 2006

Thought it about time to make my first post :) nice work cyanide

2011-03-08, 19:38

#18

dgCyanide

Member
112 posts

Registered:
Nov 2010

I am now working with QWL admins to assist the development of the ranking and ladder system.

2011-03-10, 01:40

#19

d4rin

Member
370 posts

Registered:
Mar 2008

dgCyanide - Your doings would make me want to play QW a lot more...

2011-03-10, 12:18

#20

otto

Member
21 posts

Registered:
Apr 2007

I'm speechless. great work, dude!

Can't wait to see QR reports of the regular season of the EQL divisions!
Does this accept .logs created by ezquake? Or do you need mvd parser to create logs from .mvds and then use those logs at QR?

2012-02-20, 09:11

#21

chu082011

Member
1 post

Registered:
Feb 2012

otto wrote:

I have got some ideas of my own. I'll share them when I finish. I'll be back.

We also find them more same at: Performance ranking

Rgs.