Member 112 posts
Registered: Nov 2010
EDIT: Updated 1/24/2013 to include zero vs ganon games.
EDIT #2: Updated 1/27/2013 to include a new 'Teamplay' rating algorithm.
EDIT #3: Updated 1/28/2013 to include all group-games, and div2 ratings.
EDIT #4: Updated 1/29/2013 to include extra ratings (some sub-components of 'Teamplay', which are RLs Killed, Armors, RL drop %, RL Time, and Avg Power).
EDIT #5: Updated 1/30/2013 to include some weapon ratings (RL/LG vs RL/LG, RL/LG vs noweap, etc. FYI, X stands for 'anything but RL/LG').
EDIT #6: Updated 1/31/2013 to include a bunch of new stats.
Click SHIFT-Refresh to load the latest image.
FPM - Frags per minute
FPM (w/ RL) - Frags per minute while one has an RL
FPM (w/o RL) - Frags per minute while not having an RL
DPM - Deaths per minute
Average High Fragstreak - (per game)
Average Team Damage - (per game)
Average Spawn Kills - (per game)
Average Killed RL - Average # of times you killed an RL per game
Average Killed Quad - Average # of times you killed quad per game
Average Killed Team - Average # of times you killed a teammate per game
Average Quads - Average # of times you took quad per game
Scroll down to see how the new Teamplay rating algorithm is calculated.
Rating results from div1 group-games: (Raw Data)
Demolyzer was 9 for 12 predicting group-game outcomes. The ones it got wrong were expected to be close games.
Rating results from div2 group-games: (Raw Data)
(Edited 2013-01-31, 06:03)
Administrator 1025 posts
Registered: Apr 2006
The question is whether you have some kind of rounding error in your calculations. I thought that if it expects you to get 69 frags but you get 70, you'd be +1 on performance there? That isn't the case in all situations; sometimes it calculates it right, sometimes it doesn't.
Perhaps it's just the presentation format that's in integers, but on the other hand it feels like performance should be calculated as (int)frags_actual - (int)frags_exp? Am I missing something?
Member 112 posts
Registered: Nov 2010
dimman, yes, it is just rounding errors when writing the data to file or screen... I've known about it but did not think much of fixing it. But good catch. Maybe I'll see if I can fix that in the next release.
There was also mention of the total frags of 350. The 350 frags is just an arbitrary # that I typed in, as ELO does not predict total score, but rather percentages of total score. So if I choose a score of 350, ELO might say RED will score 200 and BLUE will score 150. 350 seems like a typical 4on4 total score.
I am going to post expected scores for next week's games, using this past week's official data.
Anyone have the zero vs ganon demo?
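To make the 350 example concrete, here is a minimal Python sketch of turning an expected share into predicted scores. It assumes the textbook Elo expected-score formula and made-up team ratings (1550 vs 1500); the thread does not show Demolyzer's own expectation formula, so `expected_share` and `predicted_scores` are hypothetical names, not the tool's code.

```python
def expected_share(rating_a: float, rating_b: float) -> float:
    """Textbook Elo expected score of side A against side B (assumed formula)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def predicted_scores(rating_red: float, rating_blue: float, total_score: float = 350.0):
    """Split an arbitrary total score according to the expected share."""
    share_red = expected_share(rating_red, rating_blue)
    red = share_red * total_score
    blue = total_score - red
    return red, blue


# Hypothetical team ratings, chosen only to illustrate the split.
red, blue = predicted_scores(1550.0, 1500.0)
print(f"RED {red:.1f}  BLUE {blue:.1f}")
```

With these assumed ratings the split lands close to the RED 200 / BLUE 150 example. Changing `total_score` only rescales both numbers, which is why the total itself can be arbitrary.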
Member 386 posts
Registered: Apr 2006
All demos are required to be uploaded to the salvation site as part of the match reporting. Just click the corresponding "D" to the right of the latest games section. Here's Zero vs Ganon.
Member 459 posts
Registered: Mar 2008
Thanks for this, it is really interesting to follow the ratings. The ratings seem to be an OK approximation also.
Do you take into account number of RAs / YAs / Megas taken, RL / LG dropped and also enemy RL / LG / Quad / Ring killed, or is it just damage given, frags and efficiency? At least those factors are also good indicators on how well you have performed on a map, and are just as much a contribution to your team as frags, damage and eff.
Member 459 posts
Registered: Mar 2008
If it's not, just an idea:
mega taken equals 100 dmg done
ya taken equals 150 dmg done
ra taken equals 200 dmg done
or just give all members of one team the same score based on how many armors they took compared to the opposing team. Of course, this would just be a simple approximation; the actual value of the taken item can be worth a lot more or less than that.
The value of killing an RL or LG is a bit harder to approximate, but maybe it would be possible to calculate it based on how much damage that particular player is expected to do with RL or LG once he has it. So, for example, killing Milton with RL would result in more dmg-given points than killing someone else with RL.
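The item-to-damage idea above could be sketched roughly like this in Python. The three weights are the ones proposed in this post; the function name and the shape of the input dictionary are assumptions made for illustration only.

```python
# Damage-equivalent values proposed in the post above.
ITEM_DAMAGE_EQUIVALENT = {"mega": 100, "ya": 150, "ra": 200}


def adjusted_damage(damage_given: int, items_taken: dict) -> int:
    """Add a flat damage bonus for every mega/YA/RA the player picked up."""
    bonus = sum(
        ITEM_DAMAGE_EQUIVALENT[item] * count
        for item, count in items_taken.items()
    )
    return damage_given + bonus


# Hypothetical game: 5000 damage given, 2 megas, 3 YAs and 1 RA taken.
print(adjusted_damage(5000, {"mega": 2, "ya": 3, "ra": 1}))
```

As the post says, this is only a rough approximation: a fixed value per item ignores how much the pickup was actually worth in that situation.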
Administrator 1025 posts
Registered: Apr 2006
"dimman, yes, it is just rounding errors when writing the data to file or screen... I've known about it but did not think much of fixing it. But good catch. Maybe I'll see if I can fix that in the next release."
Yeah, I think you should fix it since it affects the ratings randomly. It doesn't count much with damage etc. since the numbers are really high, but it does with frags. I just happened to notice it when I performed 1 frag better than it expected but the "performance" was +-0; in some other game I was +4 but the result showed +3. For some reason it affected me mostly in our games, hehe.
"I am going to post expected scores for next week's games, using this past week's official data."
Sweet, keep it up!
Moderator 383 posts
Registered: Jan 2006
"Thanks for this, it is really interesting to follow the ratings. The ratings seem to be an OK approximation also.
Do you take into account number of RAs / YAs / Megas taken, RL / LG dropped and also enemy RL / LG / Quad / Ring killed, or is it just damage given, frags and efficiency? At least those factors are also good indicators on how well you have performed on a map, and are just as much a contribution to your team as frags, damage and eff."
As far as I know, besides all the things you mentioned, it also considers how many RLs you donated to teammates, how many frags opponents made using the RL/LG you dropped, how much health you picked up, etc.
Member 112 posts
Registered: Nov 2010
Stev, thanks. I actually just missed that demo when I was downloading them from the Salvation site the first time.
Member 112 posts
Registered: Nov 2010
Rikoll, these are great ideas... I am actually in the process of thinking up a new rating using ideas such as these. It would be another rating to go along with the other 3, not to replace them. I've been in talks with ParadokS about it, but I would like to get opinions from you and Milton and some others. Perhaps this weekend I can put together the details of what would make up this new rating, and you guys can make your cases for + or - features. Things like:
# pents
# eyes
# lgs killed
# rls killed
# rls dropped would be - points (only if they go to enemy, to rule out team drops on purpose)
Time with RL/LG
# quads killed
Armors picked up
Member 112 posts
Registered: Nov 2010
dimman, the ratings calculations are 100% floating point. All those rounding errors are purely cosmetic (UI). I double checked the algorithm tonight. The algorithm uses % of total frags, so this is where it gets converted to float:
    for (int i = 0; i < teamPlayerCount; ++i)
    {
        // cast to float first so the division is not an integer division
        teamAPercents[i] = (float)scoresTeamA[i] / totalScore;
        teamBPercents[i] = (float)scoresTeamB[i] / totalScore;
    }
and after that is the final ELO calculation (this is for a single match):
    for (int index = 0; index < teamPlayerCount; ++index)
    {
        // points gained or lost = coefficient * (actual share - expected share)
        eloPointsTeamA[index] = ELOCoefficient * (teamAPercents[index] - teamAExpectedPerformance[index]);
        eloPointsTeamB[index] = ELOCoefficient * (teamBPercents[index] - teamBExpectedPerformance[index]);
    }
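For anyone who wants to play with the numbers, the two loops above can be sketched in Python as follows. This is only a rough port of the posted fragment: the coefficient of 32, the total of 350 and the expected shares are assumed values, since the thread does not give the real ones.

```python
def elo_points(scores, expected_shares, total_score, elo_coefficient):
    """Per-player points: coefficient * (actual share - expected share)."""
    points = []
    for score, expected in zip(scores, expected_shares):
        actual_share = score / total_score  # true division, always a float
        points.append(elo_coefficient * (actual_share - expected))
    return points


# Hypothetical case from the thread: 69 frags expected, 70 scored, out of 350.
total = 350
points = elo_points(
    scores=[70, 105],
    expected_shares=[69 / total, 105 / total],
    total_score=total,
    elo_coefficient=32.0,  # assumed value, not taken from Demolyzer
)
print(points)
```

The first player gains a small positive fraction of a point and the second gains nothing, and none of it depends on how the frag counts are rounded for display.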
Member 112 posts
Registered: Nov 2010
Rikoll, at the moment all Demolyzer takes into account is just:
# of Frags
Damage Given
Efficiency
That is it. I am working on a fourth rating to include many finer-grained stats, which I mentioned in the other reply.
Member 112 posts
Registered: Nov 2010
OK, the screenshots/ratings were updated with the zero vs ganon games. Click SHIFT-Refresh to refresh them if needed.
Administrator 1025 posts
Registered: Apr 2006
"dimman, the ratings calculations are 100% floating point. All those rounding errors are purely cosmetic (UI)."
Alright, good to know. A fairly easy and accurate enough solution would be to just (conceptually) do: f < 0 ? f - 0.5 : f + 0.5; and then truncate the result to an integer for displaying it.
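The display fix suggested above, sketched in Python for illustration (the function name is made up, and this is not how Demolyzer's WPF front end works). The idea is to shift by 0.5 away from zero and then truncate, which rounds halves away from zero.

```python
def round_for_display(value: float) -> int:
    """Round to the nearest integer, with halves going away from zero."""
    shifted = value - 0.5 if value < 0 else value + 0.5
    return int(shifted)  # int() truncates toward zero


for sample in (0.6, -0.6, 3.4, -3.4, 2.5, -2.5):
    print(sample, "->", round_for_display(sample))
```

Note that Python's built-in `round()` rounds halves to the nearest even number instead (so `round(2.5)` is 2), which is why the sketch spells the rule out by hand.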
Member 112 posts
Registered: Nov 2010
Yeah, I don't even do any calculation for the display, I just 'bind' to the number and let WPF/.NET render it. But I will look into manually doing such a calculation on the backend. Thanks.
Member 112 posts
Registered: Nov 2010
Week 2 predictions are in! Remember, the 'total score' here is just an arbitrary number. ELO and iELO do not predict total scores, just what % of the total score each player is 'expected' to get (based upon their current ELO rating). Plus, different maps have different scoring characteristics.
zero vs kingpin - Looks like these will be some close games.
bps vs paradoks - Bps is predicted to win this series, but don't underestimate para.
milton vs ganon - Milton's team is expected to win huge here, likely 2-0. If ganon gets a win, that will be quite a feat.
carapace vs rikoll - Rikoll's team is expected to win, but not by much, so anything can happen. Should be some good games to watch.
Member 112 posts
Registered: Nov 2010
I finally added an experimental 'Teamplay' rating. Scroll to top to see the updated stats with the new rating.
This rating is not a measure of how well you work with your team. Instead, for each game, points are awarded (or taken away) based on certain statistics. For example, killing an enemy RL will get you 50 points. Dropping an RL (only to enemy) will result in -50 points. Dropping an RL to a teammate is +50 points. Then there are other more advanced metrics such as time with RL and average power. The full list of items determining points for each game is as follows:
+ 50 * # RL killed
+ 50 * # LG killed
+ 50 * # Quad killed
+ 30 * # RA Taken
+ 20 * # YA Taken
+ 10 * # GA Taken
+ 50 * # Quads Taken
+ 100 * # Pents Taken
+ 50 * # Eyes Taken
- 50 * # RL dropped to enemy
- 50 * # LG dropped to enemy
+ 50 * # RL dropped to team
+ 50 * # LG dropped to team
+ 40 * # minutes with RL during game
+ Average Power (over entire game)
Power is another calculation, which is:
(Health + Armor) * 2 [if RL or LG] * 4 [if quad or pent]
So if you run around the map with a shotgun for most of the game, your average power will be low. Or, if you are like Milton and run around stacked for the game, your average power will be high. Thus, in my opinion, average power is a good statistic to include in the 'teamplay' rating.
Let's take a look at an example where someone has a low frag rating but really contributed a lot to the team in regards to stats.
lakso
Notice how even though he has a low frag rating, he still did a lot to contribute to the team. Let's dig in to the games and stats to see how he contributed. There are 4 games, and his team lost all 4. He did not frag high, thus his frag rating is low. However, take a look at his rating in the second game down and the last game. In the second game down he had a good Teamplay rating because he killed many RL/LGs (18, most of anyone), and also took many quads (5, most of anyone). Still, his team lost (most frags win).
In the last game, again he killed many RL/LGs (8), killed 2 quads, took 4 quads, and had RL for a decent amount of time (7 minutes on e1m2).
Now let's take a look at someone with a high frag rating but a low Teamplay rating. Again, this Teamplay rating is not a measure of how well you work with your team. It is just a points-based system to estimate contribution. There is more to teamplay and winning than just this rating system. (That is why there are now 4 ratings: frags, efficiency, damage given, and now teamplay.)
moje
moje probably has a high frag rating since he is playing on Milton's team and thus scored well during the games due to some blowouts. Even though the iELO "frags" rating takes into account teammate ratings and opponent ratings, if you play on a good team and blow out some opponents, it significantly helps your frag rating. But let's take a look at moje's Teamplay stats to see why those were not very high:
It really boiled down to not taking many RAs or quads, dropping some weapons, and some other low teamplay points. But I can't stress enough that this does not mean moje is hurting his team. Milton's team worked great together, and if moje is standing next to quad guarding it for Milton or another stacked teammate, obviously that will hurt moje's Teamplay rating, but help their chance of winning.
Feel free to criticize the Teamplay algorithm and suggest new stats or point weightings. I am open to all suggestions.
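The point list and the power formula from this post can be sketched in Python as below. The weights are the ones listed above; the `GameStats` container, the field names and the reading of the power formula as two stacking multipliers are assumptions, so this is an illustration and not Demolyzer's actual code.

```python
from dataclasses import dataclass


@dataclass
class GameStats:
    """Per-game counters for one player (hypothetical field names)."""
    rl_killed: int = 0
    lg_killed: int = 0
    quad_killed: int = 0
    ra_taken: int = 0
    ya_taken: int = 0
    ga_taken: int = 0
    quads_taken: int = 0
    pents_taken: int = 0
    eyes_taken: int = 0
    rl_dropped_to_enemy: int = 0
    lg_dropped_to_enemy: int = 0
    rl_dropped_to_team: int = 0
    lg_dropped_to_team: int = 0
    minutes_with_rl: float = 0.0
    average_power: float = 0.0


def power(health: int, armor: int, has_rl_or_lg: bool, has_quad_or_pent: bool) -> float:
    """(Health + Armor), x2 with RL or LG, x4 with quad or pent (assumed to stack)."""
    value = float(health + armor)
    if has_rl_or_lg:
        value *= 2
    if has_quad_or_pent:
        value *= 4
    return value


def teamplay_points(s: GameStats) -> float:
    """Sum the weighted stats exactly as listed in the post."""
    return (
        50 * s.rl_killed + 50 * s.lg_killed + 50 * s.quad_killed
        + 30 * s.ra_taken + 20 * s.ya_taken + 10 * s.ga_taken
        + 50 * s.quads_taken + 100 * s.pents_taken + 50 * s.eyes_taken
        - 50 * s.rl_dropped_to_enemy - 50 * s.lg_dropped_to_enemy
        + 50 * s.rl_dropped_to_team + 50 * s.lg_dropped_to_team
        + 40 * s.minutes_with_rl
        + s.average_power
    )


# Made-up game: 2 RL kills, 1 quad taken, 1 RL lost to the enemy, 2.5 min with RL.
example = GameStats(rl_killed=2, quads_taken=1, rl_dropped_to_enemy=1,
                    minutes_with_rl=2.5, average_power=120.0)
print(teamplay_points(example))
```

Keeping the weights in one function makes it easy to try the alternative weightings suggested elsewhere in the thread.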
Administrator 1025 posts
Registered: Apr 2006
I think you can only make an algorithm like this "so so" accurate. I mean, I'd personally think that dropping an LG is much more OK in most cases than dropping an RL. On the other hand, if you drop LG:100 to an enemy quad, that's even worse than dropping an RL in most cases. So perhaps cells should be taken into account for LG at least?
Currently, if you kill an enemy 200/200 RL dude with a 0/100 LG but die too and drop the LG (enemy picks it up), you get +-0 if I understand your calculations correctly, which is a bit weird. Perhaps dropping an LG with more than 50 cells should give -50, otherwise minus the number of cells? I haven't thought it through, but it probably works better for the most part.
This is starting to get really interesting, keep it up, Cyanide.
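A minimal Python sketch of the cell-based LG penalty floated above, assuming the cell count at the moment of the drop is available from the demo (the thread does not confirm that it is). The function name is hypothetical.

```python
def lg_drop_penalty(cells: int) -> int:
    """-50 for an LG dropped with more than 50 cells, otherwise minus the cells."""
    return -50 if cells > 50 else -cells


# The scenario from the post: kill an enemy RL (+50), then lose an LG to the enemy.
for cells in (100, 10):
    print(cells, "cells ->", 50 + lg_drop_penalty(cells))
```

With a nearly empty LG the trade now comes out clearly positive instead of +-0, while a full LG still cancels the RL kill.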
Member 112 posts
Registered: Nov 2010
Those are some good points. So, do you think that, with its current flaws and all, I should keep the Teamplay rating, or get rid of it until I can tweak it some more?
My hope is that as long as it is 'averaged out' over the course of many games, the Teamplay stats become more meaningful. But, those scenarios you describe can't be denied as important. I think I can easily make those LG modifications.
However, the challenge with obtaining those other stats is that when parsing the demo, when a player is killed, only information at the current moment is known, but not earlier information. For example, if milton kills rikoll and rikoll had RL, it is easy to determine that milton killed an RL. But what is not known is:
1. How much health did rikoll have when the battle started?
2. When did the battle between milton and rikoll start?
3. Was it just milton vs rikoll in the battle?
Those are tough to answer unless some very intelligent logic is built into the MVD parser. Some of it may not even be extractable. One crude way of answering #1 is to keep a 3-second trailing window of health/armor stats. So if rikoll dies, I can look back to see his status 3 seconds prior. But that certainly will not cover all cases. Battles can be very complex.
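The trailing-window idea could look roughly like this in Python. It is a sketch under assumptions: the parser is imagined to call `record` with a timestamp, health and armor for each update it sees, and all names here are hypothetical rather than part of any MVD parser.

```python
from collections import deque


class TrailingWindow:
    """Keeps recent (time, health, armor) samples for one player."""

    def __init__(self, window_seconds: float = 3.0):
        self.window_seconds = window_seconds
        self._samples = deque()  # oldest sample first

    def record(self, time: float, health: int, armor: int) -> None:
        self._samples.append((time, health, armor))
        # Drop old samples, but keep the newest one at or before the window start.
        cutoff = time - self.window_seconds
        while len(self._samples) >= 2 and self._samples[1][0] <= cutoff:
            self._samples.popleft()

    def status_before(self, death_time: float):
        """Latest (health, armor) at or before death_time - window, else None."""
        target = death_time - self.window_seconds
        result = None
        for time, health, armor in self._samples:
            if time > target:
                break
            result = (health, armor)
        return result


# Made-up timeline for one player who dies at t = 4.5 seconds.
window = TrailingWindow(window_seconds=3.0)
for sample in [(0.0, 100, 0), (1.0, 150, 100), (2.0, 150, 100), (3.5, 80, 40), (4.5, 20, 0)]:
    window.record(*sample)
print(window.status_before(4.5))
```

As noted above, this only answers "what was his status a few seconds earlier"; it says nothing about when the fight started or who else was involved.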
Administrator 1025 posts
Registered: Apr 2006
I think you should keep it. Even if it's only a "rough" indication, I still like it. Perhaps it's enough just to tweak what you currently have within the given set of limitations.
Member 123 posts
Registered: Mar 2006
Average power should not apply to the teamplay rating. I think you should look at what the person did and took compared to their frags, damage and efficiency. If you take a lot of armor you are expected to get some frags, as armor > weapons usually in QW. Taking items (especially armor) is not a good thing if they are wasted. Yet there is some element there of denying them to the enemy team. For example, often in 4v4 you have a guy who runs around the whole time as the "garbage man", dying over and over, but he usually has an important job like taking LG or RL on DM3. It doesn't matter if he dies as long as he takes the items and denies them to the enemy team. I don't know if you can judge "enemy territory" or "proximity to enemy" to reward players who take items the enemy is expected to get.
I'll use an example I know, you know Cyan. In Hoes vs IMM in the final E1M2 I basically gave my team mates stuff I'd usually go for and take. I took 3 YAs to Vegeta's 17. The whole game I ran around with SNG mainly because my ping was terrible and I knew taking RL wouldn't be good. Yet I had 3 quad kills, 5 enemy RL kills along with lowest total armor taken and lowest RL time of anyone. I ended up with 2nd most kills and 2nd best efficiency. Teamplay iELO should look to boost up players who do much with little. For instance in that match my average power was 111 (2nd lowest) while Veg's was 252 (highest by far) yet I had 79% of his frags. Demolyzer seems focused on giving huge ELO to quad runners and such while ignoring the players who enable the quad runner to succeed or players who contribute a lot while not using up many resources.
When I was thinking of how to analyze teamplay in QW better I realized how complex it was so I sort of gave up. Yet I'll give my opinion on what you decided to use though I'm unsure of any point values -
+ 50 * # RL killed
+ 50 * # LG killed
+ 50 * # Quad killed
The above are good.
+ 30 * # RA Taken
+ 20 * # YA Taken
+ 10 * # GA Taken
+ 50 * # Quads Taken
+ 100 * # Pents Taken
+ 50 * # Eyes Taken
These are pointless because often the best teamplay is to let a team mate take them, especially if they have RL, etc and you don't. Taking them is only better if the enemy is nearby attempting to steal. If you do end up taking these you should get some results from them. If you end up taking many quads in a game and lag behind your team mates in frags that's probably bad teamplay. If you take these items you need to get frags, etc.
- 50 * # RL dropped to enemy
- 50 * # LG dropped to enemy
+ 50 * # RL dropped to team
+ 50 * # LG dropped to team
These are good, you should add in how many frags the dropped weapon resulted in as a major factor.
+ 40 * # minutes with RL during game
+ Average Power (over entire game)
I think these are misleading as well. Instead of focusing on who is strong, look at who is weak and contributing. We already have all the other iELO's focused on strength mainly.
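One way to put a number on "doing much with little", using the E1M2 figures from this post (average power 111 vs 252, and 79% of Vegeta's frags). The frags-per-power ratio below is an invented illustration, not something proposed in the thread or used by Demolyzer.

```python
def frags_per_power(frags: float, average_power: float) -> float:
    """Frags scored per unit of average power held (hypothetical metric)."""
    return frags / average_power


# Frags normalised so that Vegeta = 100; the other player had 79% of that.
vegeta = frags_per_power(100.0, 252.0)
low_power_player = frags_per_power(79.0, 111.0)

ratio = low_power_player / vegeta
print(f"{ratio:.2f}x frags per unit of average power")
```

By this crude measure the low-power player got noticeably more frags out of each unit of average power, which is the kind of contribution a plain "+ Average Power" term cannot see.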
News Writer 280 posts
Registered: May 2006
Table for div2? Only 1 game was moved to Monday after 1 map. The other games were played on time.
Member 112 posts
Registered: Nov 2010
Div 2 stats will go up later tonight. Div1 will be updated once the milton and zero demos are uploaded to the draft site.
Member 112 posts
Registered: Nov 2010
pg: Great points and I agree with most, and don't disagree with any. Accurately measuring performance or contribution is not a trivial cut-and-dry task. I'll think about your suggestions and see what I can come up with.
Member 112 posts
Registered: Nov 2010
Updated 1/29/2013 to include extra stats (some sub-components of 'Teamplay', which are RLs Killed, Armors, RL drop %, RL Time, and Avg Power).
Interesting how the best players have the highest RL drop percentages. I confirmed it appears accurate, as I looked at rikoll's stats and for several games he dropped a high percentage of the RLs he obtained. Sorry, rikoll.
Scroll to top: http://www.quakeworld.nu/forum/topic/5998#92909
Member 459 posts
Registered: Mar 2008
It is quite natural that the players constantly trying to apply pressure also drop the most, but it is apparent that I need to be a bit more careful. It's stats, no reason to apologize. Actually, the only thing provoking me in this post is your urge to apologize! Keep the stats coming!
News Writer 1267 posts
Registered: Jun 2007
I have been terrible. Did not need these stats to figure that out. But perhaps it would look different with some of PG's suggestions. It is hard to measure all the situations in a game that eventually result in a win for your team and how much you did to contribute.
It looks like one game is missing for bps2 at least. Me, locktar and lice should be on 9 maps. 4-5. It's the bps2-para2 game. Defcon's stats are correct, he only played 1 game, but flinta is missing 3 maps like the above-mentioned players.
Btw, maybe I missed it somewhere above, but does it take into account if you kill an enemy RL/LG and a teammate gets a weapon from the pack?
Member 112 posts
Registered: Nov 2010
Hooraytio: I will add that stat eventually (kill an enemy and a teammate gets the pack).
Member 112 posts
Registered: Nov 2010
rikoll: well spoken. One interesting thing I noticed in the stats tonight is that the good players drop the same number of RLs as everyone else. But, since they pick up fewer RLs (they hold on to them longer), they have a higher RL drop %. So that explains the bad performance of the good players in that stat. I thought about having just RLs dropped... but then if you rarely have an RL you will have a low number, and thus have a high rating. These stats can be tricky.
I just updated stats to include some weapon stats (RL/LG vs RL/LG, RL/LG vs noweap, etc.). Scroll to top and refresh to see the bigger image. The X in RL/LG vs X stands for 'anything but RL/LG'. http://www.quakeworld.nu/forum/topic/5998#92909
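The RL drop % effect described above is easy to see with two made-up players. This Python sketch assumes the stat is simply RLs dropped divided by RLs obtained, which matches the wording earlier in the thread but is not confirmed as Demolyzer's exact definition; returning 0 for a player who never had an RL is also an assumption.

```python
def rl_drop_percent(rls_dropped: int, rls_obtained: int) -> float:
    """Share of obtained RLs that were dropped, in percent."""
    if rls_obtained == 0:
        return 0.0  # assumed convention for players who never had an RL
    return 100.0 * rls_dropped / rls_obtained


# Both hypothetical players drop 4 RLs, but one holds each RL much longer.
print(rl_drop_percent(4, 5))   # picks up few RLs, keeps them a long time
print(rl_drop_percent(4, 10))  # picks up RLs twice as often
```

The same number of drops gives a much higher percentage for the player who picks up fewer RLs, so the percentage alone can make long-lived RL carriers look worse.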