Thread: [WIP] Big thread of statistical goodies - Substantial update, Dec 04

  1. #1
    Join Date
    Dec 2009
    Location
    Zurich, Switzerland
    Posts
    202

    [WIP] Big thread of statistical goodies - Substantial update, Dec 04

    Table of contents


    1. Correlation of game stats and mmr
    • 1.1 Motivation
    • 1.2 Some comments about the method
    • i) Data and sampling
    • ii) mmr distribution of players
    • 1.3 Wards and mmr correlation
    • 1.4 Duration of the game and mmr correlation
    • 1.5 Deny count and mmr correlation
    • 1.6 APM and mmr correlation
    • 1.7 GPM, XPM and mmr correlation
    • 1.8 Games played and mmr correlation
    • 1.9 Win ratio and mmr correlation
    • 1.10 (K+A)/D and mmr correlation
    • 1.11 Are the calculations consistent despite using sample or Is the sample big enough?

    2. Correlation of time when game is played and mmr

    • 2.1 Motivation
    • 2.2. Some notes about the method
    • 2.3. When is tmm played
    • 2.4 What is average mmr of these matches; conclusion

    3. Duration distribution of the matches
    • 3.1. Motivation
    • 3.2. Short comment about method and general distribution of match duration
    • 3.3. Decrease of average match time with mmr
    • 3.4. Average duration distribution in different mmr brackets; conclusion

    4. Account age and mmr correlation
    • 4.1. Motivation
    • 4.2. Method and how to deduce when was account created
    • 4.3. mmr distribution of players with respect to their account age; conclusion

    5. Heroes picked and mmr correlation
    • 5.1. Motivation
    • 5.2. Comment about method
    • 5.3. Distribution of picking rate of agility heroes
    • 5.4. Distribution of picking rate of intelligence heroes
    • 5.5. Distribution of picking rate of strength heroes
    • 5.6. Distribution of picking rate of ranged/melee heroes
    • 5.7. Distribution of picking rate for different heroes (3 examples)
    • 5.8. Distribution of picking rate for different heroes (all heroes - download)

    6. Future directions - some additional questions that might be interesting to analyze?
    7. Other threads that might interest you
    8. Contact

    1. Correlation of game stats and mmr

    1.1 Motivation

    As a regular forum user I have spent a lot of time lurking and reading different topics and posts, such as the relation of APM to MMR, the importance of denies, KDA ratio etc. Often these topics end in flames, with users defending their opinions based on personal or anecdotal evidence at best. This thread may shed a bit of light on these topics by exploring the correlation between mmr and other game statistics.

    1.2 Some comments about the method

    i) Data and sampling

    For various problems, data is acquired from various sources.

    The main source is the player ladder (heroesofnewerth.com/player_ladder.php), which was scraped on two occasions, once with data generated on Oct 29, 2012 and once with data generated on Nov 29, 2012. From the ladder we can take information about players' MMR, number of wins and losses, their average kills, deaths and assists (in further text K/D/A), their average XPM (experience per minute), average GPM (gold per minute) and average APM (actions per minute).

    The second source was heroesofnewerth.com/player_stats.php. This source gave me information about which heroes a player has picked over the lifetime of the account.

    The last source was honedge.com. Scraping this site with player names gave me the average number of wards per game, the average deny count, the average length of the game and the player's last matches (6 of them). Scraping with match IDs gave me the time of the match (when it was played) and the average mmr of the players in the match.

    The standard sample size was 3 000 unique players. All of them are what S2 calls "active players", defined by the fact that they have played more than 10 games overall and have logged in during the last 30 days.
    Players for the sample were found using a random number generator. The random number generator was modified in such a way that a "close to uniform" distribution of player mmr was achieved.

    The image below shows what the rating distribution of players would be if one chose 3000 players completely at random from the pool of players.


    That is not what I want. If I did that, I would have a mass of data points in the 1500 range and no data points at the edges of the mmr range. For instance, in the above graph one can see that the random number generator did not select a single player above the 1900 range. What I want is a sample that is uniform in mmr, so that each mmr range gives approximately the same (large) number of points.

    One possible solution would be to test a much larger number of points; pretty much all players, or a large subsection of the player base (my first estimate is that one would have to test at least 1/3 of the player base to get satisfactory results, which is around 100 000 players). That would be computationally hard, and it would still miss a large number of players with high and low mmr (the ones in which I am interested) and give me a bunch of points around the 1500 range.

    That is why the random selection is modified so that it has a larger probability of picking players with high or low mmr. I have used one set of parameters of a box distribution, which is seen below.



    A typical resulting sample looks like the image below. There is a bit of a drop-off towards the ends of the graph, simply because there are not enough players in these brackets to achieve uniformity across the board. For instance, there are only 14 players above 2000 mmr (in the October 19 data; now there are more), so naturally the algorithm cannot find approximately 100 data points for players in the 2000-2050 bracket. If you are worried that the sample is not big enough, see section 1.11.
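A minimal sketch of one way to get such a flattened sample. It is not the modified random generator used for this thread: it swaps in plain stratified sampling (an equal quota per 50-mmr bin), and the function name `flat_mmr_sample`, the `(name, mmr)` input format and the bin width are assumptions made for illustration.

```python
import random
from collections import defaultdict


def flat_mmr_sample(players, n_total, bin_width=50, seed=0):
    """Pick at most n_total players with an equal quota per MMR bin.

    players: list of (name, mmr) tuples (hypothetical input format).
    Bins holding fewer players than the quota contribute everyone they have.
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for name, mmr in players:
        bins[int(mmr // bin_width)].append((name, mmr))
    # Equal quota for every non-empty bin.
    per_bin = max(1, n_total // len(bins))
    sample = []
    for key in sorted(bins):
        members = bins[key]
        sample.extend(rng.sample(members, min(per_bin, len(members))))
    return sample
```

Bins that hold fewer players than the quota simply contribute all of their players, which gives the same kind of drop-off at the ends of the range as described above.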



    When looking at the graphs in the following sections, notice that some of them are Log-Linear, while some are Linear-Linear. This is always stated above the graph, but remember to check the scales and their values to really understand what is going on.

    The fitting curve for (almost) all graphs is made by fitting the data to 1+a*mmr+b*mmr^{2}, where a and b are constants. Where that is not the case, it is explicitly mentioned. Even though it might not be the best fit for all graphs, I have used it for all of them for the sake of simplicity and consistency. Fits have been drawn on all graphs for mmr between 1050 and 1950. Outside these values I feel that the statistics are not strong enough to allow for extension of the fits.
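The fit can be sketched as an ordinary least-squares problem. The code below assumes that the leading 1 in 1+a*mmr+b*mmr^{2} stands for a free constant term (basis {1, mmr, mmr^2}). That reading, the function names and the rescaled variable u = (mmr - 1500)/100 (used only to keep the numbers well conditioned) are assumptions, not the procedure actually used for the graphs.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = c0 + c1*u + c2*u**2 with u = (x - 1500) / 100.

    Returns (c0, c1, c2). Needs at least three distinct x values.
    """
    us = [(x - 1500.0) / 100.0 for x in xs]
    # Normal equations for the basis {1, u, u^2}.
    s = [sum(u ** k for u in us) for k in range(5)]
    b = [sum(y * u ** k for u, y in zip(us, ys)) for k in range(3)]
    a = [[s[i + j] for j in range(3)] + [b[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the 3x4 matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return tuple(a[i][3] / a[i][i] for i in range(3))


def eval_quadratic(coeffs, x):
    """Evaluate the fitted curve at a given mmr value x."""
    c0, c1, c2 = coeffs
    u = (x - 1500.0) / 100.0
    return c0 + c1 * u + c2 * u * u
```

In line with the text, the fitted curve would only be evaluated for mmr between 1050 and 1950.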



    ii) mmr distribution of players





    Distribution of players according to their MMR. This is the same graph that you can see for yourself at heroesofnewerth.com/player_ladder.php. This is the situation as recorded on Nov 29, 23:59 ETC. There are 298 534 (active) players shown. The Gaussian curve that best describes the data has parameters (x=1533.5, sigma=113.7).
    The data that record the situation on Oct 19 are best fitted with parameters (x=1527.5, sigma=112.2). There were 333 681 active players at that moment.
    Notice the small bumps of players at 1600, 1700, 1800 and 1900 -> players that have reached a new mmr milestone and have left their account at that mmr for some time (for instance, like me; I played hard from the time I hit 1680 until I reached 1700, and now I have stopped playing for some time). This effect is also noticeable at 2000, although it cannot be seen on the graph. For instance, at the time of writing (data generated on Dec 3), there are 4 players with either 2000 or 2001 MMR, but none with 1998 or 1999.
    In general, most of the analysis in sections 1 and 2 has been done with the October data, while sections 3, 4 and 5 use the November data.




    1.3 Wards and mmr correlation



    Notice that the scale is logarithmic. The upper thick black line is set at 0.4 wards per minute, which is (I believe) approximately the maximum number of wards one can place in a normal game ([2 wards at the start and another 14 wards until minute 35 - 2 wards not counted, because wards placed in the last 5 minutes do not count]/35 minutes). The grey line is set at 0.08 wards per minute, which is 0.4 wards per minute / 5, and shows the value we would expect from a perfect player who supports in every fifth match.

    The first thing one observes is that the number of wards rises steeply up to approx. 1500 mmr. Interestingly, above 1700 the number of wards starts to stagnate or even decrease. Also, you can see that the very best players have very low wards-per-game statistics. This is probably connected to the fact that it is common for orange/brown players to support in high tmm matches.

    This graph might be hard to interpret, so I suggest studying the figure below as well.



    Above one can see a heat map of this problem. The map shows where the concentration of points is higher with brighter colors. Regions where the value is very low have been cut off, for clarity. Notice that this time the scale is linear (not logarithmic as in the first figure). This figure should show clearly that players with low MMR do not ward as much as their high ranking counterparts. As one can see, high ranking players are spread over a very large area, meaning that some of them ward a lot, some with medium intensity and some very little, as one would expect with players specializing in different roles. I believe that this figure shows more clearly that high ranking players do ward more, which is not so obvious from the first figure. The drop-off toward very high and very low values of MMR is connected with the fact that fewer players were analyzed in these regions. Simply, as there are fewer players with very high and very low MMR in the sample (see section 1.2 if interested), their concentration is smaller and they are not dense enough to be prominent in this sort of analysis (they do not generate enough "heat"). That is why I recommend that you take only results between cca 1100 and cca 1900 "seriously" (because in that range the sample is relatively uniform - see section 1.2).
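A heat map of this kind can be approximated by counting points on a regular grid and hiding cells below a cutoff. This is only a sketch: the grid size, the cutoff and the function names are assumptions, and the figures in this thread may well have been produced with a smoother density estimate.

```python
def density_grid(points, x_range, y_range, nx, ny):
    """Count (x, y) points per grid cell; points outside the ranges are ignored.

    Returns a list of ny rows with nx counts each (row 0 = lowest y values).
    """
    x0, x1 = x_range
    y0, y1 = y_range
    grid = [[0] * nx for _ in range(ny)]
    for x, y in points:
        if x0 <= x < x1 and y0 <= y < y1:
            # min() guards against rounding right at the upper edge.
            i = min(int((x - x0) / (x1 - x0) * nx), nx - 1)
            j = min(int((y - y0) / (y1 - y0) * ny), ny - 1)
            grid[j][i] += 1
    return grid


def mask_low(grid, cutoff):
    """Replace cells with fewer than `cutoff` points by None (not drawn)."""
    return [[c if c >= cutoff else None for c in row] for row in grid]
```

With a fixed cutoff, sparsely sampled regions (very high and very low MMR) fall below the threshold first, which is the drop-off effect described above.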

    I believe one can again see the interesting behavior visible in the first figure - there is a slight drop-off of wards placed by high level players!





    1.4 Duration of the game and mmr correlation


    Games in general last between 30 and 40 minutes across the board. Still, an almost linear decreasing relationship between mmr and the duration of the game is noticeable. This seems to stem from the fact that higher ranked players are better at pushing and at taking advantage of opportunities when they arise. This interesting behavior is further investigated in section 3.


    1.5 Deny count and mmr correlation


    Notice that the scale is logarithmic. As we can see, higher ranked players deny more. What I think is really pleasing is that this trend is constant and does not drop towards the end of the high bracket. Also, even if the difference between the 1800 and 1600 brackets is only a few creeps, this can really make or break a game, and higher ranked players realize the importance of every creep. There are almost no high end players that do not deny.




    1.6 APM and mmr correlation


    This is one of the most debated and controversial correlations. As we can see, the dependence is almost linear, with high ranked players averaging around 120 apm. There is quite some spread, with a number of high ranked players having 90 or 80 apm.



    1.7 GPM, XPM and mmr correlation


    Orange points are GPM, blue are XPM. There is quite a steep increase of gpm and xpm as one goes from the 1600 bracket to the higher end, signaling that to get to the highest level of play one has to learn to farm very efficiently. On the other hand, the difference between the 1100 and 1400 brackets is not so much due to farming capabilities, but to other factors (perhaps positioning in fights, warding?). Also, the ratio between these two quantities is not constant, as seen below.



    Ratio of GPM and XPM. As we go higher, people get more and more gold with the same experience gain. I believe that this shows the better last-hitting capabilities of better players. Better players are not just better at finding farm, they are actually last hitting better and getting more gold with the same experience gain!



    1.8 Games played and mmr correlation


    Does playing a lot of games make you a better player? Well, in very general terms, it does. Better players do play more games. Of course, when interpreting this graph one has to take into account that smurfing and stat resets are destroying this correlation in particular. The fact that the fit does not fall below approx. 1300 is probably explained by the fact that new players start at 1500 and need quite a few games to fall that low.



    1.9 Win ratio and mmr correlation


    Notice that the scale is logarithmic. Most of the players have a win/loss ratio around 1, one more piece of evidence that the MMR system is working. This is not true for the high and low ends of the spectrum, as one would expect. In the lower bracket, we can see some players that have a low ratio and still have to fall some more to find their spot. In the upper bracket we see some players with an unusually high win/loss ratio. A large number of those are probably smurfs/stat resetters. This might be an area for future research (see section 6).



    1.10 (K+A)/D and mmr correlation


    Notice that the scale is logarithmic. The (K+A)/D ratio increases sharply as we go to the higher bracket, signaling that more and more teamplay is used to get a kill.



    1.11 Are the calculations consistent despite using sample or Is the sample big enough?


    This is an example of the APM/MMR fit (section 1.6), made for 5 different samples of 3000 players. The lines are drawn in red, orange, black, green and yellow. One sample of data is also shown, the one that was used for drawing the black curve (which was also drawn last, so it is "hiding" most of the other lines). As one can see, there is no significant difference between the different samples.
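The same kind of check can be sketched in a few lines: draw several samples, refit each one and compare the coefficients. For brevity the sketch uses a straight-line fit instead of the quadratic one, and it subsamples a single list of (mmr, apm) pairs. Both choices and all names are assumptions, not the procedure behind the figure.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def refit_on_samples(pairs, n_samples=5, sample_size=3000, seed=0):
    """Fit a line to several random subsamples of (mmr, apm) pairs."""
    rng = random.Random(seed)
    fits = []
    for _ in range(n_samples):
        sample = rng.sample(pairs, sample_size)
        fits.append(fit_line([p[0] for p in sample], [p[1] for p in sample]))
    return fits


def max_slope_spread(fits):
    """Largest difference between the fitted slopes."""
    slopes = [slope for slope, _ in fits]
    return max(slopes) - min(slopes)
```

If the spread between the refitted coefficients is small compared to the coefficients themselves, the sample size is probably adequate for that fit.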



    2. Correlation of time when game is played and mmr

    2.1 Motivation

    What is the distribution of players throughout the day and week? Do better players really play during the night and on weekdays, while evenings and weekends are reserved for scrubs? An attempt to answer these questions is given below.

    2.2. Some notes about the method

    9384 matches were selected for analysis using a random number generator. Their distribution was uniform. Matches taken into consideration had IDs between 100190454 and 103240331. Match 100190454 started on 8 September at 15h (USE time), while 103240331 started exactly 6 weeks later, on 20 October at 15h (USE time). This period was chosen because it contains no server downtimes (as far as I know), nor any special holidays that might disturb the data (e.g. Christmas). For every game the average mmr of the match has been calculated. Casual mode games were also included, but using the average mmr of the players (not their casual rating).


    2.3. When is tmm played




    Same image, just interpolated






    The images above show the distribution of matches, adjusted to the local time of the player, assuming that all Europe players are USE+6, all USW are USE-3, and all AU are USE-10. The week starts with Monday (1=Monday, 7=Sunday). On the first graph, the legend is a bit confusing -> purple denotes that the number of games played is between 0 and 15, dark blue denotes that the number of games is between 15 and 30, and so on. As one can see, the lowest number of games is played between 4 and 10 am and the most are played between 19 and 23. Even though it is hard to see, the number of matches played on Saturday and Sunday is approximately 20% higher than on Tuesday and Wednesday. This perhaps shows that most of the games are played by people who also play on weekdays, while weekend warriors do not contribute that much.
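The local-time adjustment described above can be sketched as follows. The offsets are the ones stated in the post; the `(start_time, region)` input format and the function name are assumptions, and the sketch ignores daylight saving differences between regions.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hours to add to USE server time, as assumed in the post.
REGION_OFFSET_HOURS = {"USE": 0, "EU": 6, "USW": -3, "AU": -10}


def weekday_hour_counts(matches):
    """Count matches per (weekday, hour) in the players' local time.

    matches: iterable of (start_time_in_USE, region) pairs.
    Weekday follows the post's convention: 1 = Monday, 7 = Sunday.
    """
    counts = Counter()
    for start_time, region in matches:
        local = start_time + timedelta(hours=REGION_OFFSET_HOURS[region])
        counts[(local.isoweekday(), local.hour)] += 1
    return counts
```

Adding the offset to a full timestamp (rather than to the hour alone) also moves the weekday when the shift crosses midnight.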


    2.4 What is average mmr of these matches; conclusion




    Same image, just interpolated


    Average mmr of the matches played at a particular day and time. There do not seem to be any strong deviations. It seems that there might be a slight preference for lower ranked matches between 4 and 10 and a slight preference for somewhat higher matches between 0 and 2. One has to be careful when interpreting data for the 4 am - 10 am region, as that is the region with the fewest matches played.



    MMR as a function of time played, averaged over all 7 days. Here it is easier to notice the night-time drop in the quality of games. Also, there is perhaps a small rise around midnight and 1 am.




    3. Duration distribution of matches

    3.1. Motivation

    In section 1.4, one can observe that better players finish their matches faster. Why is that? Do they recognize earlier that the game is lost and concede? Or are there fewer very long games? What is the duration distribution of matches anyway? If matches tend to last 35 minutes on average, are most of them in that range?

    3.2. Short comment about method and general distribution of match duration

    The normal 3000 player sample described in section 1.2 is used (2731 in this particular case). For each player we take note of his 6 last matches - this means any of his matches, be it normal matchmaking, casual, public etc. I remove those matches that last less than 15 minutes (mostly 1v1 matches).



    Distribution of match duration for the whole sample. Each bin is 1 minute. One can notice two peaks - one at 15 minutes, when concede is first possible, and one at 30 minutes, when concede rules are relaxed. After that there is a steady (perhaps exponential with a small constant) fall, as matches naturally tend to finish after some time.


    3.3. Decrease of average match time with mmr



    Average game duration per MMR. The black line corresponds to the black line from section 1.4. The blue line is the best fit to the blue points. The blue points have been produced by finding the mean (and variance) for all matches played by people in the 1000-1200, 1200-1400, 1400-1600, 1600-1800 and 1800+ brackets. So, for instance, for the first point I took all matches played by people below 1200, fitted them to a normal distribution and found the mean and variance. Error bars are one standard deviation. One should note that this procedure is inherently flawed, in the sense that the distribution is not normal (as seen in the first figure in this section). Additionally, I think one should not interpret the large error bars as a signal of uncertainty in the position of the points, but rather as a cosmetic result of the large spread of game durations.
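The bracket procedure above amounts to grouping match durations by the player's MMR bracket and taking the mean and standard deviation of each group (which is what a maximum-likelihood fit of a normal distribution returns). A minimal sketch, where the `(mmr, duration)` input format and the function name are assumptions:

```python
from bisect import bisect_right
from statistics import mean, pstdev


def bracket_stats(records, edges=(1200, 1400, 1600, 1800)):
    """Mean and standard deviation of match duration per MMR bracket.

    records: iterable of (player_mmr, duration_minutes) pairs.
    Bracket 0 is everything below edges[0], the last bracket is
    everything at or above edges[-1]. Returns {bracket: (mean, std, n)}.
    """
    groups = {}
    for mmr, duration in records:
        groups.setdefault(bisect_right(edges, mmr), []).append(duration)
    return {k: (mean(v), pstdev(v), len(v)) for k, v in sorted(groups.items())}
```

The standard deviation returned here describes the spread of individual game durations inside a bracket, not the uncertainty of the bracket mean, which matches the caveat about the error bars.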

    It is pleasing to note that both of these procedures, which use different methods, show that there is a slow decrease of game length as one goes to higher mmr.


    3.4. Average duration distribution in different mmr brackets; conclusion



    Game duration, separated into different mmr bins. This is a smoothed histogram - meaning that the data is first grouped in 1 minute bins (as in the first figure of this section) and then smoothed afterwards to produce nice curves. The yellow dip at cca 24 minutes and the black bump at cca 46 minutes are probably statistical fluctuations - the sample is smallest, of course, for the curves that describe the behavior of players at the ends of the MMR spectrum.
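A smoothed histogram of this kind can be sketched in two steps: count durations in 1 minute bins, then smooth the counts. The post does not say which smoothing was applied, so the centered moving average below, the window size and the function names are assumptions.

```python
def minute_histogram(durations, max_minutes=90):
    """Count match durations (in minutes) in 1 minute bins."""
    counts = [0] * max_minutes
    for d in durations:
        minute = int(d)
        if 0 <= minute < max_minutes:
            counts[minute] += 1
    return counts


def moving_average(values, window=5):
    """Centered moving average; the window shrinks at both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed
```

A wider window gives nicer curves but also flattens narrow features such as the peak at the 15 minute concede, so the window is a trade-off.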

    One cannot notice a large or consistent difference between the different MMR curves. Perhaps one could say that higher ranked players concede around 10 % more often at 15 minutes than low ranked players. But black players (1800+) finished their games in the first 20 minutes in around 20% of cases, while yellow (1000-1200) finished their games in the first 20 minutes in 19% of cases! This effect alone is not enough to account for the cca 1.5 minute shorter average duration of matches! I would conclude that the effect is cumulative: higher ranking players finish faster in small part because of more 15 min concedes, but also because there are fewer long matches, a consequence of better farming, pushing and taking advantage in game.
    Last edited by OsianII; 12-04-2012 at 08:31 AM.
    Big thread of statistical goodies (update - Dec 4) - http://bit.ly/U7Z9Hi

    Alt avatars price change http://bit.ly/V2wXIz | AltAvatars price change mod http://bit.ly/NwOp7n
    BangNinja mod fix http://bit.ly/U2VgYU | Breaky & Zyori in love http://bit.ly/PytPAi
    Quickfix for linux and mac 2.6.11 http://bit.ly/Udqwmw | Highest gpm in first 20 min calculated http://bit.ly/Qf4Q5P

  2. #2
    4. Account age and MMR correlation

    4.1. Motivation
    In section 1.8., I concluded that players with more matches, in very general terms, tend to play better. Perhaps it is also important not just how many matches you have played, but when did you start to play? In early days of HoN people would advertise their skills with "I have played x years of DotA and ...". Without reliable performance measurement, especially for more casual players, time spent with game was often one of the main indicators of how good the player was. Is it really so? When did people create their accounts and what is their mmr today?

    4.2. Method and how to deduce when was account created

    Because it is not possible to acquire account age data automatically, one has to work a bit harder to get the data. First I get the account numbers of players from heroesofnewerth.com/player_ladder.php. Then I manually read off cca 150 points from heroesofnewerth.com/player_stats (one can read them, but not scrape them, because the page is produced by JavaScript). I have noted the account age for accounts numbered 50 000, 100 000, 150 000, etc., up to 7 500 000, which is where we are today. To summarize, in that way I have created 150 points that link account number and the time when the account was created.
    After that I created an interpolation function from those points, which allows me to generate the account age if I know the account number. I have tested the function with 10 randomly chosen values and the biggest deviation was below 2%.
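An interpolation function of this kind can be sketched as below. The post does not say which kind of interpolation was used, so piecewise-linear interpolation is an assumption, as are the function names and the `(account_number, age_in_days)` format of the reference points.

```python
from bisect import bisect_right


def make_age_interpolator(reference_points):
    """Build a function mapping account number -> account age in days.

    reference_points: at least two (account_number, age_in_days) pairs
    with distinct account numbers, e.g. the manually noted ones.
    """
    points = sorted(reference_points)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]

    def age_of(account_number):
        if not xs[0] <= account_number <= xs[-1]:
            raise ValueError("account number outside the reference range")
        # Index of the right end of the segment containing the number.
        i = min(bisect_right(xs, account_number), len(xs) - 1)
        x0, x1 = xs[i - 1], xs[i]
        y0, y1 = ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (account_number - x0) / (x1 - x0)

    return age_of
```

The check described in the text (comparing interpolated ages with manually read ages for a handful of random accounts) is a reasonable way to validate whichever interpolation is chosen.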



    Growth of account number with time. Black points are manually inserted data. The red line is the interpolation function. Notice that "today" is on the left and we go back in time to the right. The first change of slope of the curve (going from the right), at approx. 970 days, is when HoN entered open beta (Mar 31, 2010). The second change, at 920 days, is when HoN was released (May 12, 2010). Wikipedia states that 3 million accounts had been registered by then - we can see that this is true. The next change, at approx. 470 days, is when HoN went to the f2p model (July 29, 2011). I notice no bigger jump at 720 days, when HoN 2.0 was released (Dec 2, 2010), or at 130 days (July 19), when all heroes went free.



    4.3. mmr distribution of players with respect to their account age; conclusion




    Account age as a function of MMR. The curve here is actually 1+a*MMR+b*MMR^{2}+c*MMR^{3}+d*MMR^{4}, which I felt did more justice to the data than the quadratic curve used in the other examples in section 1.
    The first thing that one notices is the huge spread of players. Still, there are some quite cool things to notice. First, notice the line of players at around 500 days - people who created an account when HoN went f2p. In general, these accounts are in the lower part of the bracket. Additionally, notice the lack of points at high or low MMR for a small number of days (cca below 50) - players who just created an account have not played a lot of games and are still close to the starting 1500. This is similar to the behavior noted in section 1.8.
    Even though there is a large spread, one can notice that very old accounts in general are not very low and do tend to have higher MMR. One should remember that these are only active players (those that have played a game in the last 30 days). Statistically, we see that higher ranked players are on older accounts, but as noted, the spread is huge.



    Heat map of the problem. This map is generated by calculating the density of points in a region - brighter for more players, dimmer for fewer (a somewhat longer discussion of heat maps is available in section 1.3). As we can see, there are two distinct regions of new players and old players, with a valley at the time when HoN was in the paid model. Older players are, in general, in the higher bracket. Again, note that this is created from the sample, which is only uniform from about 1100-1900, so the lack of "heat" at the edges of the map is a result of the technique used, not some interesting property.



    5. Heroes picked and mmr correlation

    5.1. Motivation
    It is common knowledge that better and/or competitive players tend to pick differently than typical "pub" players. In general, it is thought that new strategies are created at the highest level and then slowly trickle down the bracket as time goes by. Is it really so? Are there really some "pubstomp" heroes?

    5.2. Comments about method

    For this problem I use a larger sample size of 15 000 players. I was afraid that I would need larger statistics for this problem, because (assuming that there are around 100 heroes) the average hero is picked in 1/10 of the games. In retrospect that was a bad decision, which created many technical problems (sheer size of the data), decreased the clarity of the images and disrupted the uniformity of the sample. On the other hand, it maybe improves the quality of the data a bit. Let's call it an experiment! As one can see in the figure below, the sample is uniform over a much smaller range, approx. 1300-1800.

    You have to remember that this is cumulative data - it takes all of the heroes picked during the entire lifetime of the player. If some hero is popular at the highest level today, that does not mean that it was so popular a year ago! Also, we noticed that lower ranked players have been playing the game for a shorter time, so they were not able to play heroes when they were weaker or stronger in the past, while better players have been.
    Also, players who have had their account for more time have played more matches, and could pick from fewer heroes at the start. For instance, a "good" player with a lot of matches (e.g. 1000) could have played 100 games with solstice, giving him a picking rate of approx. 0.1. On the other hand, a "bad" player with fewer matches (e.g. 200) could also have played 100 matches with solstice, giving him a picking rate of approx. 0.5. They both think that the hero is good, but our statistics would miss this! This obviously compromises the results (in a nutshell, different mmr brackets have different starting conditions).
    I have ignored that effect for 2 reasons; firstly, because the effect is not that strong, with an enormous spread (section 4.3); secondly, because a full treatment of this problem would require me to take note of when heroes were released and how long the account had been active before, and adjust accordingly... well, even though you may find it hard to believe, I do have some life to lead...


    mmr spread of data for this particular problem.



    5.3. Distribution of picking rate of agility heroes




    Distribution of picking rate of agility heroes with mmr. One can see that lower bracket players love their agility heroes. We can also observe that the popularity of agility heroes then once again grows toward the higher portion of the ladder -> higher rated players in general play carries in tmm.


    5.4. Distribution of picking rate of intelligence heroes




    Distribution of picking rate of intelligence heroes. As we go higher, players realize more and more the importance of intelligence heroes in winning games. This rise is also probably connected with the rise in picking rate of ranged heroes (see section 5.6).


    5.5. Distribution of picking rate of strength heroes




    Distribution of picking rate of strength heroes. Strength heroes do not get much love toward the higher bracket!


    5.6. Distribution of picking rate of ranged/melee heroes





    Distribution of picking rate of ranged heroes. As we can see, better players like ranged heroes more and more.



    Distribution of picking rate of melee heroes. This black curve fit is basically (1-ranged curve) fit.


    5.7. Distribution of picking rate for different heroes (3 examples)


    It would be silly of me to post here all 113 different figures. Below find 3 nice examples. Figures for all heroes are available as download in section 5.8.



    Aluna is a nice example of a hero that is appreciated in the higher bracket.




    Sand wraith is taken as an example of a hero that does not fluctuate much over the whole range of MMR values. Notice from the y-axis values that it is less popular than aluna overall (even though it has been in the game longer).



    Surprise, surprise, night hound is not that popular with better players, while he is quite popular in lower brackets.


    5.8. Distribution of picking rate for different heroes (all heroes - download)


    Find all graphs at the link below (10.7 MB). Open it as a normal *.rar or *.zip (tar.gz is like zip for linux).
    https://dl.dropbox.com/u/56936034/HeroesGraphs.tar.gz

    Below find "preview" (screenshot of files in folder on my computer).



    Please note that these graphs do not show a point if it is at zero (the player has not played the hero). The best examples are new heroes like grinex and pearl; these points are still taken into account when drawing the fitting curve (although you cannot see them), leading to such weird results. Interpret results for new heroes with care.
    Personally I find jereziah graph most interesting!


    6. Future directions - some additional questions that might be interesting to analyze?


    1. In the first iteration of this thread I wrote "As noticed in various graphs, there are a number of players that we would probably call smurfs. Perhaps it would be interesting to see how many there are and what the average mmr of all players in the game would be with and without them. One could test and declare as smurfs players that have a high win ratio and/or high KD ratio and/or unusually high apm/gpm for the bracket and similar".

    Since then I have kinda cooled off towards that idea. This would involve a lot of work, with a lot of guessing and subjective choices to be made. Still, it might be interesting to see which bracket has the most smurfs, or in which bracket you are most likely to meet smurfs. With the revival of the smurf topic in the competitive section (a topic in which people have listed smurf accounts of pro players), one now has a solid base of smurf accounts on which one could test an algorithm... Still, I would put this problem on the low priority list.

    2. In the first iteration of this thread I wrote "How come games last shorter for better players? Is it because there are more 15 min cc? Or are there fewer games that go on for a very long time?"

    This question has been semi-answered in section 3. of this thread version. I am not that satisfied. Obviously differences are quite small so one has to be very, very careful when analyzing data. Perhaps larger sample size, with only matchmaking games would help? This, again, requires some amount of work.

    3. In the first iteration of this thread I wrote "What heroes are most used by the best players (for instance >1900)? What are the win ratios for heroes in the highest brackets? How much do those percentages differ from the general population?".

    This has again been only semi-answered in section 5. I used stats that cover whole life-time of account. It would be better if we could have picked matches from last month or so. Again, this will require some additional work.


    7. Other threads that might interest you


    http://forums.heroesofnewerth.com/sh...18-with-graphs
    -Naib calculates normal distribution parameters of the player base. Highly recommended.
    http://forums.heroesofnewerth.com/sh...6#post15364806
    -Which players are the highest level in HoN. Has some errors, but I believe there is still some merit to it.
    http://forums.heroesofnewerth.com/sh...-Player-Ladder
    -Top 10 GPM, XPM, APM etc. from the player ladder. The author has made a script that enables him to sort players according to stats. Unfortunately, it sank quite quickly in General Discussion.



    8. Contact


    Forum pm, ingame, IRC-quakenet #honlabs, #honlinux, #honscience


    Feel free to comment and ask questions. Suggestions in general, as well as possible ideas for future projects, are also warmly welcomed.
    Last edited by OsianII; 01-24-2013 at 01:16 PM.
    Big thread of statistical goodies (update - Dec 4) - http://bit.ly/U7Z9Hi

    Alt avatars price change http://bit.ly/V2wXIz | AltAvatars price change mod http://bit.ly/NwOp7n
    BangNinja mod fix http://bit.ly/U2VgYU | Breaky & Zyori in love http://bit.ly/PytPAi
    Quickfix for linux and mac 2.6.11 http://bit.ly/Udqwmw | Highest gpm in first 20 min calculated http://bit.ly/Qf4Q5P

  3. #3
    Great job !

  4. #4
    Where were you when I was in probability in college? :O

  5. #5
    Nicely laid out and informative. Thank you for this; shame it does go out of date in relative speed to other mechanics posts.
    #reinstateapostate

  6. #6
    Offline
    S2 Staff Member S2 Games Staff
    Join Date
    Mar 2010
    Location
    Netherlands
    Posts
    4,051
    Very interesting actually.

    Though your first graph doesn't appear to be very accurate.
    It makes it look like the number of 1800 mmr players is about the same as the number of 1500 mmr players, which is definitely not the case.


    Good job overall mate.

    S2 Games: Dedicated employees serving dedicated gamers. Continuous development. Never-ending improvement.
    -----------------------------


    I'M CURRENTLY ON VACATION - I'LL BE BACK ON THE 9TH OF AUGUST 2014
    HoNored
    | Super Beta Tester | Mechanics Moderator | Pre-purchased HoN | Skype: Necrothica | DeviantArt: Necrothic


  7. #7
    Some really nice data you got there. How did you acquire it? Is it available somewhere? Are you publishing the results? And if yes, in which conf/journal?

    And lol, a Croat! I didn't even figure it out right away!
    Last edited by psvrisak; 10-29-2012 at 05:25 AM.

  8. #8
    Offline
    Account Icon
    Join Date
    Dec 2009
    Location
    Zurich, Switzerland
    Posts
    202
    Quote Originally Posted by Necroth View Post
    Though your first graph doesn't appear to be very accurate.
    It makes it look like the number of 1800 mmr players is about the same as the number of 1500 mmr players, which is definitely not the case.
    Ok, so the first graph shows the sample used for the analysis. It has been specially constructed to be approximately uniform across all mmr, and as such it does not show the distribution of the player base according to mmr. For instance, the image below shows what the sample would be if I picked 3000 players uniformly at random, not caring about their mmr.




    That is not what I want. If I did that, I would have a mass of data points in the 1500 range and no data points at the edges of the mmr range. For instance, in the above graph one can see that the random number generator did not select a single player above the 1900 range. What I want is a sample that is uniform in mmr, so that each mmr range gives approximately the same (large) number of points.

    One possible solution would be to test a much larger number of points: pretty much all players, or a large subsection of the player base (my first estimate is that one would have to test at least 1/3 of the player base to get satisfactory results, which is around 100 000 players). That would be computationally hard, and it would again miss a large number of players with high and low mmr (the ones in which I am interested) and give me a bunch of points around the 1500 range.

    That is why the random seed is modified, so that the selection has a larger probability of picking players with high or low mmr. I have used one set of parameters of the box distribution, which is shown below.





    In this way, the sample is pretty uniform across the mmr range, which allows us to test the correlations across the whole mmr range, and that is what is shown in the first figure of the first post. I hope this clears up the confusion? Fire away if anything is left unclear!
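
    To make the idea concrete, here is a minimal sketch of one way to get a sample that is flat in mmr. It is not the exact procedure used for the graphs: it swaps in simple inverse-density weighting (each player weighted by 1 / number of players in their bracket), and the player base is simulated as a normal distribution with mean 1500 and width 150, both of which are assumptions made only for this example.

```python
import random
from collections import Counter

def bracket(mmr, width=100):
    """Lower edge of the mmr bracket that contains `mmr`."""
    return int(mmr // width) * width

def flat_sample(players, k, width=100, rng=random):
    """Draw k mmr values so that every bracket is roughly equally represented.

    Each player is weighted by 1 / (number of players in their bracket), so
    players in the crowded brackets around the mean are picked less often.
    Sampling is done with replacement to keep the sketch short.
    """
    counts = Counter(bracket(m, width) for m in players)
    weights = [1.0 / counts[bracket(m, width)] for m in players]
    return rng.choices(players, weights=weights, k=k)

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical player base: mmr ~ Normal(1500, 150), clipped to 1000..1999.
    base = [min(max(rng.gauss(1500, 150), 1000), 1999) for _ in range(100_000)]
    naive = rng.sample(base, 3000)           # ignores mmr, piles up near 1500
    flat = flat_sample(base, 3000, rng=rng)  # roughly equal count per bracket
    for name, sample in (("naive", naive), ("flat", flat)):
        print(name, sorted(Counter(bracket(m) for m in sample).items()))
```

    The naive sample leaves the edge brackets nearly empty, while the weighted one gives each bracket a similar number of points, which is what the correlation plots need.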

    Quote Originally Posted by psvrisak
    How did you acquire it? Is it available somewhere? Are you publishing the results? And if yes, in which conf/journal?
    I am sending you a pm about the data. I do not think that scientific journals are quite interested in an analysis like this. To be honest, I did this because I have some free time now (unemployed). I guess this could make an interesting talk at some gamers' conference or something like that, but I am not sure it has any value in the "real" world.

    Quote Originally Posted by SmurfinBird
    shame it does go out of date in relative speed to other mechanics posts.
    At least for the moment, I hope to update this thread in the future. For instance, it would be interesting to see whether game duration changes after the introduction of the new voting system, and whether the shape of the game duration curve changes.
    Last edited by OsianII; 10-29-2012 at 06:34 AM.

  9. #9
    Offline
    S2 Staff Member S2 Games Staff
    Join Date
    Mar 2010
    Location
    Netherlands
    Posts
    4,051
    Aaah, that makes sense.

    Job well done, good sir.



  10. #10
    good job on this, very interesting


  11. #11
    The most interesting graph (for me at least) is the one showing wards related to mmr. It seems that there's no significant change above 1500, which is not my experience at all. Maybe it's the quality of the wards placed (positioning, awareness of anti-wards, knowing whether to defend or attack, etc.), rather than their number, that distinguishes 1500s from 1800s, for instance.

    Another interesting graph is the duration of the game. The change seems cosmetic, but your analysis seems to be that better players execute strategies better, thus ending games faster. From what I've experienced, higher mmr players opt rather to be on the safe side and secure an in-game advantage by farming, giving the killing blow when they know it will be 100% successful. The shorter duration, in my opinion, comes from experience and knowing that some games are unsalvageable, so the concede vote gets through much more often.
    Just my thoughts. Really great work. Things that you feel are right, now suddenly have some statistical evidence.

  12. #12
    Ward number does not really change much at all. It's the quality that counts. You normally don't have more than 1-2 active wards up until 1650, at least from my experience, and in the higher tier games I've watched I never see the map full of wards either. The main difference is that, after a certain rank, people know they need wards up on pull camps or rune spots early on unless they want to lose the game. Those crucial early-game wards don't affect the total outcome that much and are the main difference from the 1300-1400 hell I was in months ago.

    If you notice, there's a significant dispersion in wards per game after 1600, which makes it hard to draw conclusions. I'm guessing it's probably due to game length variation, because games tend to be either one-sided stomps or better executed strategies/pushes. If you're getting stomped, supports don't really have much gold or opportunity to roam and place wards. On the other hand, if you're winning, they will most likely secure your farming spots as well as set aggressive/kong wards as the game goes on, and that is maybe why the ward number does not increase as much.

    The APM correlation I found quite funny tbh. However, correlation =/= causation. Game length probably decreases as mmr goes up because there are fewer trolls and griefers thumbing down concede votes as you progress. The CC15 zone is very populated around 1400-1600, though. Lol

    Anyway, great work here. How did you record these results? And would it be possible to compare results between regions (say US vs EU)?
    Last edited by zstarkey42; 10-29-2012 at 03:04 PM.
    Always remember you're unique, just like everyone else.

    Check my guide

  13. #13
    Online
    Account Icon
    Chat Symbol
    Join Date
    Jul 2011
    Location
    Whirlbubble
    Posts
    11,529
    Wait, where did you get all this data? How do you know it's true? Looking forward to those 3 "future directions". They seem like pretty interesting topics.
    Last edited by Clytemnestra; 10-31-2012 at 07:33 AM.

  14. #14
    Offline
    Account Icon
    Join Date
    Dec 2009
    Location
    Zurich, Switzerland
    Posts
    202
    Of course, as it happens, I have noticed some errors in my method for the 2nd exercise, so you should disregard it. The first exercise, which provoked much more interest, should be correct. I will replace the wrong results with new, hopefully correct, results "soon". To be honest, I should have put the [WIP] tag on the thread when first making it.

    @zstarkey42 I have sent you a pm, hopefully you have seen it.

  15. #15
    Perfect and brilliant

  16. #16
    Any chance for you to add confidence intervals to your graphs in the future?
    Last edited by Therkel; 11-08-2012 at 04:01 AM. Reason: I misspelled confidence oO

  17. #17
    Offline
    Account Icon
    Join Date
    Dec 2009
    Location
    Zurich, Switzerland
    Posts
    202
    Quote Originally Posted by Therkel View Post
    Any chance for you to add confidense intervals to your graphs in the future?
    Ok, I will have a proper think about it. I just have to find a way to plot it without too much clutter happening.
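
    One low-clutter option would be to draw a single interval per mmr bracket instead of one per data point. A rough sketch, assuming the data is available as (mmr, value) pairs; the 100-point brackets and the normal-approximation 95% interval (z = 1.96) are choices made for this example only.

```python
import math
import statistics
from collections import defaultdict

def binned_mean_ci(points, width=100, z=1.96):
    """Per-bracket mean of a stat with a normal-approximation interval.

    points: iterable of (mmr, value) pairs.
    Returns {bracket_lower_edge: (mean, low, high)} for every bracket that
    has at least two points (a spread cannot be estimated from one point).
    """
    bins = defaultdict(list)
    for mmr, value in points:
        bins[int(mmr // width) * width].append(value)

    result = {}
    for edge, values in sorted(bins.items()):
        if len(values) < 2:
            continue
        mean = statistics.mean(values)
        # Standard error of the mean, scaled by z for the interval half-width.
        half = z * statistics.stdev(values) / math.sqrt(len(values))
        result[edge] = (mean, mean - half, mean + half)
    return result
```

    The (low, high) pairs could then be drawn as one error bar or one shaded band per bracket on top of the existing scatter, for example with matplotlib's `errorbar` or `fill_between`.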

  18. #18
    Offline
    Account Icon
    Join Date
    Dec 2009
    Location
    Zurich, Switzerland
    Posts
    202
    Hi guys, I did one pretty massive update: eliminated mistakes, added sections and clarifications for the problems that were discussed in the first version, added some new things, added more clarifications about data and method, etc. I thought about tagging the new stuff, e.g. with [new], but I concluded that it would be too distracting, especially as only a few parts of the first section have been left totally unchanged; I would suggest treating the whole thing as absolutely new.
    Again, I would welcome comments, suggestions etc. Hopefully there will be some more healthy discussion like the one that followed the first version!

    First post link
    http://forums.heroesofnewerth.com/sh...stical-goodies

  19. #19
    holy christ this thread is amazing

  20. #20
    Posting in a fabulous thread. Silhouette graph does not surprise me and neither does the bubbles graph.
    Ophelia graph made me smile a little.

    Thank you so much OsianII for going through all the trouble and sharing this with us. This is amazing work.
    Last edited by Rayniac; 12-04-2012 at 03:46 PM.
