Wednesday, December 1, 2010

Creative and Incidental ... COMP?! ... at the NOVA Open

Ok, so I've hit on this before, but input is welcome and I think it's something of a fascinating after-effect.

Our planned total # of attendees to next year's NOVA Open 40k GT is ... 256.

The GT will encompass, therefore, 8 total rounds.  Unlike our space restricted "elimination" of last year, everyone will be participating in all 8 rounds.  Here's how it will work:

Day 1 will yield a breakdown of these records:

16 x 4-0
64 x 3-1
96 x 2-2
64 x 1-3
16 x 0-4
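
For anyone who wants to check the math, here's a rough sketch in Python of where those numbers come from. It assumes every game produces a winner and a loser (no draws) and that players are always paired against someone with the same record; the function name and parameters are made up for the example and aren't part of any NOVA software.

```python
from math import comb

def record_counts(players=256, rounds=4):
    """Count of each win-loss record after `rounds` rounds.

    Assumes no draws and same-record pairings, so exactly half of
    every record group wins each round (illustrative helper only).
    """
    per_path = players // 2 ** rounds  # players sharing one exact win/loss sequence
    return {
        f"{wins}-{rounds - wins}": comb(rounds, wins) * per_path
        for wins in range(rounds, -1, -1)
    }

print(record_counts())
# {'4-0': 16, '3-1': 64, '2-2': 96, '1-3': 64, '0-4': 16}
```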


Here's the catch and comparison of what normally would happen on the next day for this set of players.

With "standard" seeding continuing all rounds, it becomes suffocatingly difficult for the bottom bracket players to creep back up the standings; similarly, it becomes easier for single-loss players to stay high in standings, or wind up "lucky" in a bracket.  Here's what I'm getting at.

Player #241 plays against player #256 in Round 5, the first of Day 2.  The bottom seed manages to pull off the win, improving to 1-4.  The top player in the 1-3 set, player #177, loses his game ... dropping also to 1-4.  In theory, the #256 who after a rough day finally got a win, now has to play the #177.

Right back into the rough.  But what about a different approach ....

What if the 0-4's only played within their own bracket on Day 2?  What if there was a guarantee, therefore, that one of the 0-4 players was going to finish the 8th round 4-4, recovered from his crushing first day playing exclusively among players who performed very similarly (and therefore most likely had fairly comparable lists and skill levels)?

Our totals from above break down rather nicely into 16 same record brackets of 16.


Bracket 1: 16 x 4-0
Brackets 2-5: 16 x 3-1
Brackets 6-11: 16 x 2-2
Brackets 12-15: 16 x 1-3
Bracket 16: 16 x 0-4
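
As a quick sanity check on that split, here's a small Python sketch that carves the Day 1 record groups into numbered brackets of 16. The helper name and the dictionary of counts are just for illustration; the counts are the ones listed above.

```python
def bracket_ranges(counts, bracket_size=16):
    """Map each Day 1 record to its (first, last) bracket number.

    `counts` must be ordered from best record to worst; every group
    size is assumed to be a multiple of `bracket_size`.
    """
    ranges = {}
    start = 1
    for record, n_players in counts.items():
        n_brackets = n_players // bracket_size
        ranges[record] = (start, start + n_brackets - 1)
        start += n_brackets
    return ranges

day1 = {"4-0": 16, "3-1": 64, "2-2": 96, "1-3": 64, "0-4": 16}
print(bracket_ranges(day1))
# {'4-0': (1, 1), '3-1': (2, 5), '2-2': (6, 11), '1-3': (12, 15), '0-4': (16, 16)}
```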



So, here's where what I'll generally refer to as Incidental Composition comes into play.  It's biasless, and ... well, accurate. 

We'll be treating the 2nd day as 16 unique 4-round tournaments, where the 4-0 finisher in EACH bracket earns a generalship award, and where obviously the single 8-0 wins Tournament Champion, to go alongside the Renaissance Man, or our BEST OVERALL.  Herein lies the catch :)

As those who've followed the format know, the Renaissance Man is our TOP prize, and is comprised of equal parts sportsmanship, appearance and competitiveness.  While this may change to 40% competition/40% appearance/20% sports for 2011, the fact remains that people with gorgeous, well-prepared armies and great attitudes compete for Best Overall, Renaissance Man, even with one-win/two-win/three-win/four-win equivalent records.  They don't need to ace all their games to have a shot at it, at all.

So suppose you're a casual player with GORGEOUS painting skills and an awesome, fluffy army ... you're a strong tactician, as strong as the next guy, but you're unwilling to run super powerful min-maxed armies of doom ... even if you get slaughtered on Day 1, you could recover against like-minded and like-listed people through ALL of Day 2.  Even if you don't, even if you're the one unlucky fellow who goes 0-8, you'll be playing on Day 2 exclusively against the closest we can get (without subjective input) to like-minded and like-skilled opponents, all of whom had as rough a time on Day 1 as you did.

Plus, even if you DO go 0-4 on Day 2, it's GUARANTEED that one of those 4-0's, and 4 of those 3-1's ... are ALSO going to go 0-4 on Day 2.  In fact, 16 different people will, only one of whom went 0-4 on Day 1.  We'll definitely be giving that solo 0-8 a prize of some sort, probably an epic one.
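
Here's a hedged Python sketch of how the final 8-round records shake out under this setup. It assumes no draws, and that each 16-player bracket plays its own 4 rounds with same-record pairings, so every bracket finishes 1/4/6/4/1 just like Day 1 did in miniature. The function names are invented for the example.

```python
from math import comb
from collections import Counter

def final_win_totals(rounds_per_day=4):
    """Number of players finishing with each total win count."""
    totals = Counter()
    for d1 in range(rounds_per_day + 1):
        brackets = comb(rounds_per_day, d1)       # 16-player brackets with d1 Day 1 wins
        for d2 in range(rounds_per_day + 1):
            finishers = comb(rounds_per_day, d2)  # players per bracket with d2 Day 2 wins
            totals[d1 + d2] += brackets * finishers
    return dict(totals)

def winless_on_day2(rounds_per_day=4):
    """One 0-4 finisher per bracket, summed over all brackets."""
    return sum(comb(rounds_per_day, d1) for d1 in range(rounds_per_day + 1))

totals = final_win_totals()
print(totals[8], totals[7], totals[0])  # 1 8 1
print(winless_on_day2())                # 16
```

Under those assumptions the overall spread of final records is the same as a straight 8-round event; what changes is who each player faces on the way there.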


More importantly, this enables people to much more reliably and competitively climb back into the Renaissance Man competition by the end of the Second Day of the GT, because you'll be doing it against a uniform and even playing field.  Instead of the event encouraging players to either bring the most powerful list possible or be out of it by the end, it encourages you to do the best you can on Day 1, and then be the most skilled, best looking, and best BEHAVED within a properly bracketed set on the second day ... instead of laboring out of a set only to be smashed back down every time you get up into a bracket you simply can't compete with.



I hope this makes some sense ... it's a further attempt at innovation on our part, at trying to work things in a way that appeals to and rewards all player types.  It also is an attempt at coming up with a "comp" (not "true" comp, but play-field-leveling if you will) that's not unfair to the best competitors, nor broken in terms of the softer ones (the guys who walk into a "Comp" tournament expecting equivalent lists everywhere, only to get slaughtered by people who ignore the restrictions, or who break them by studying them extensively).  Players are all guaranteed 4 rounds against the 15 other players who are CLOSEST to them in both record and rating (battle points ish) on Day 2.

Food for thought and, as always, input :)

36 comments:

  1. One of the things that I liked about the NOVA Open last year was the amount of time and consideration you guys put into it. It really paid off and you all ran a tight, fair and fun tournament. Glad to see this is continuing for next year.

  2. Hear hear!

    The event is months away and it is super re-assuring to see that it's being agonised over already.

    Personally, I'm happy with the system described above. Fighting people who have done about as well as you is the best type of fair out there.

    No one likes being eaten alive, or eating someone else's army alive. The best want to beat the best. The rest of us want to have a chance.

  3. I like it a lot. It's well thought out and would benefit a lot of people. It also allows people to compete all tournament which is a huge bonus. Well thought out and again good job at innovating to create a truly excellent tournament.

    Oh and Adepticon sold out 256 tickets 4 months early. I fully expect you to do the same. And this will only enhance it.

  4. I like the idea, especially a prize for the 0-8 player. :D

  5. "...instead of laboring out of a set only to be smashed back down every time you get up into a bracket you simply can't compete with."

    I think I love the NOVA, just for that. That's my sour experience at the last tournament I attended described right there; powered out of the bottom bracket (2-0 after two rounds) and spent the rest of the day trapped in the middle being trampled by former national champions on their way to the top. Eight rounds is also a nice touch; gives you room for the effects of aberrant games against less experienced/skilled players to be absorbed by the number of games, rather than putting someone in a bracket they're not really equipped for (they just got 'lucky' with the draw).

  6. I just wrote a several page analysis of why the proposed changes aren't going to have an appreciable positive effect full of arguments and a bit of probability. I even managed to work in a dig at that guy you curb-stomped the fourth game at BFS. Luckily for everyone's eyes, signing in went poorly and I lost it. So frustrating.

    Anyhow, short version:

    The location of a player in a bracket has quite a lot to do with luck, both in dice and match-ups. So playing someone who went 1-4 isn't very likely to be an easier game than playing someone who went 0-5. Over the course of the second four games, you will probably have appreciably harder/easier games than the ones you would have had if you had been in a different bracket. But the arguments in support of the above position also support the position that you're relatively unlikely to be in the 'correct' bracket. One lucky game and you're getting crushed all day on day two. One unlucky game, and you're seal clubbing.

    You also aren't really giving people a chance at getting ren man on day two that didn't already have a shot at it. You need to go 7-1 to win ren man, as there will be enough 7-1 people that it is very likely that one of them will have a pretty army and be a decent guy. So all you're really doing is making it a bit easier for 3-1 people to come back while making it a bit harder for 4-0 people to hold on.

    No major problems with 'best in bracket'.

    If you really want to achieve the goals you've set out for yourself, make Ren Man based entirely on performance the second day. That has problems, so I'm far from sure that it is a good idea, but if you actually want to do what you're trying to do, you'll need to do something similarly radical.

  7. @ Anon above - I initially had similar thoughts as yourself. I liked the idea, but thought that out of 256 people someone may be put in the wrong bracket and then go on to club baby seals or get curb-stomped.

    But the only two brackets where I can imagine something like that happening are brackets 1 and 16. And it would take some insane luck or unluck for that to happen.

    @ Mike - I like your idea and hope that fester gives it some thoughts for the 2012 Centurion.

  8. @ Anon - No one is ever going to devise a system that is proof against all quirks of fate. All you can do is try to minimize the odds and impact of strange things happening.

    And I don't know that one lucky game is enough to get you mauled all day on day 2. After all, moving up from the bottom by a game means you are now playing people who only won ONE GAME all day the day before. Maybe they are better than you, but probably you are not running into former national champs here. You might lose all of your games again, but at least you should have a shot.

  9. Anon - I don't think anyone would advocate a system where the 0-4 person had just as good a shot as the 4-0 person at the competitive % of the Renaissance Man score.

    The fact is, the Ren Man itself is built to enable even someone without a win to statistically be in the running, but the more wins the better - just as the higher sports the better, and the higher appearance score the better.

    That said, someone who goes 2-2 on the first day can finish as high as 6-2 with a much better "shot" at doing it fairly, than the issue otherwise addressed.

    Similarly, someone who goes 0-4 is more likely to go 4-0. This enhances their shots, and keeps them much more in the running for Best Overall, than if it were simply not done, while avoiding going "too far" (weighting only Day 2 records for Ren Man) and therefore encouraging collusion and gaming of the system (i.e. intentionally losing games to knock yourself down into an easier bracket to take Ren Man from).

    If someone goes 4-0 after going 3-1, against a bunch of 3-1's (instead of weaker opposition), and has awesome appearance scores and awesome sports scores, they'll win Best Overall ... but that's deserved. The point is that we've built and balanced it so that you can win it even at 4-4, as opposed to simply being out for the next 6 rounds once you've lost a couple games.

    As for my opponent in the fourth round at BFS, while I did kick his teeth in, a lot of that was the result of rolling a "6" followed by a "6" and a failed cover save on his part every time a Multilaser hit an Immolator. Additionally, that was in the 4th round, not the 5th. In the 5th round, as in the 5th round at the Open, the remaining undefeateds were all largely equivalent, and well-balanced among each other. No easy games there, really.

    Good responses from MOD/Ben as well.

    That said - much appreciated commentary; without challenge and critique, ideas aren't properly analyzed, defended, thought out and brought out.

  10. Hey,

    So I took the opportunity to run a few simulations on the proposed system just to see what would happen. The results are pretty good (if you buy some of the assumptions).

    As is statistically expected given the assumptions in place, the "odds" of someone ending up in the "correct" bracket are pretty bad.

    Pushing that aside, the odds of someone ending up in "pretty damn close" to the correct bracket are very good.

    I will need to rerun the simulations once I come up with a more accurate way of detailing how NOVA calculates battle points based on seeding. What you can take away from those simulations as is:

    1) On average, in all three simulations, a player will end up in the correct bracket by day 2. That's on average which doesn't equate to statistically likely.... just more likely than any other scenario.

    2) The median was also spot on for a player ending up in the correct bracket.

    3) The results also did show that some players, but very few, will end up in some kind of crazy bracket where they don't belong, BOTH way too HIGH and way too LOW.

    4) The standard deviation on the results for all 3 simulations was approximately 2 brackets difference between where a person ends up and where he/she (really should be he/kelly since Kelly was the only woman playing last year :P) should actually be. This is really good considering the bracket sizes are 16 players out of 256. That means that 66% of all players end up +/- 2 brackets from where they should be. 90% of players end up +/- 4 brackets.

    That might seem like a lot, but 2 brackets = only 32 places. 4 = 64 players. Out of 256 places, that's really good for just a random seeding.

    It's clearly not perfect. It is also statistically guaranteed to end up with some players being horribly placed on day 2.

    That being said, it's also pretty good at putting MORE players where they should be than where they shouldn't be. Indeed, according to simulation, the MAJORITY of players should be decently pleased with where they end up on Day 2.

    The bottom line is, until I can run a more proper simulation, the results are that it's not perfect but it's LEAPS AND BOUNDS better than doing nothing at all. The simulation supports that the majority of people will be happy.

    Here is the link to the simulation results as well as explanation of how they got there.

    http://the11thcompany.freeforums.org/post11348.html#p11348

  11. MVB, I freaking love this idea.

    Possible issues:
    1. extra confusion: it's hard enough to keep one tournament roster sorted out and working, having 16 will make that even more complicated
    2. extra time: unless you throw more man power at running each day two group, you've got a good chance of something weird going on in at least one of your sixteen brackets every round. So, do you hold 240 players back because two guys in one bracket have some issue?

    Both of these are fixable and avoidable. Hammer on the tournament roster program during the upcoming months and that should sort out both potential issues.

    Suggestion/possible issue: Round times...could get complicated with 16 different brackets, but practice beforehand could sort that out. I suggest putting a Round Clock on a giant screen during the matches that everybody can see.

  12. @MoD and Ben (and, by extension, Mike):

    I agree that the likelihood of people ending up in the 'wrong' bracket isn't high. Neil's analysis sort of shows that (more on that later). The point is that whether the new system is effective at getting people in the correct brackets is parasitic on whether the old system was effective at getting people correct match-ups.

    I've been thinking of a way to convey the point clearly with a minimum of words, and this is what I've come up with. The new system assumes that the first four games are effective at putting people in the correct bracket via the normal w/l system. But then, for some reason, the next four games aren't good at keeping people in the same bracket via the w/l system, so a new control needs to be added (no moving between brackets). I can't see why we'd think that the w/l system is good at giving good match-ups for four games, but not after four games.

  13. @Mike:

    I'm getting the impression that the goal isn't to really make sure that people are more likely to have evenly matched games, even if that was the stated goal. The real goal is to give people who didn't do so well on day one a shot at catching up on day two. Effectively, you're making sure that a few people (whoever is 'best' in a given bracket) have artificially easy days on day two, thereby helping them catch up.

    I do have some reservations about giving people more or less chosen at random (what matters isn't how good they are, but how close to the top of a bracket they are) an artificial handicap, but whatever, Ren Man isn't a real award anyway:)

    What does bother me a bit is that you seem to think that someone who went 0-4 has a real chance of winning Ren Man on day two. Let us assume that whoever went 0-4 and then 4-0 has perfect paint/sports scores. If you go with the 40/40/20 ratio of game/paint/sports, and we assume that there are a total of 100 points, he ends day two with 80 points.

    Someone who went 7-1 has a base of 35 points from game score. That means he needs 45 points from paint/sports to beat out the 4-4 guy. There are going to be 8 people who are 7-1 after day two. If each of those has a random paint/sports score between 0-60, that gives the 4-4 guy about a 10% chance to win Ren Man. But it is totally unrealistic to think that the paint/sports scores will be distributed randomly. No one that goes 7-1 at these things shows up with a poorly painted army.
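
    The 10% figure can be reproduced with a short Python sketch. It uses only the assumptions stated in this comment: a 100-point scale split 40/40/20, eight 7-1 players, and an independent, uniformly random paint/sports score between 0 and 60 for each of them. Like the comment, it ignores every other finisher (including the 8-0 player). The function names are hypothetical.

```python
def game_points(wins, rounds=8, game_weight=40):
    """Competitive points on a 100-point scale, linear in wins."""
    return game_weight * wins / rounds

def chance_to_hold_off(challenger_total, rival_wins, n_rivals=8, soft_max=60.0):
    """Chance that no rival reaches `challenger_total`.

    Each rival's paint/sports score is assumed uniform on [0, soft_max]
    and independent of the others (a deliberately unrealistic model).
    """
    needed = challenger_total - game_points(rival_wins)
    p_one_falls_short = min(max(needed / soft_max, 0.0), 1.0)
    return p_one_falls_short ** n_rivals

# 4-4 player with perfect paint and sports: 20 + 60 = 80 points.
four_four_total = game_points(4) + 60
print(round(chance_to_hold_off(four_four_total, rival_wins=7), 3))  # 0.1
```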

    But what about the guy who goes 2-2 and then 4-0? He has ~60% chance to take Ren Man assuming a random distribution of paint/sports scores amongst the 7-1 crowd. Assume that none of the 7-1 people are getting less than a 50% on paint, and the likelihood drops to about 10%. (Keep in mind that this is on the assumption that the 6-2 guy has perfect paint/sports score).

    What have I shown? Your new system will make it more likely that people who do poorly on day one will win Ren Man. But not by much. Even the people who go 6-2 are very very unlikely to win.

    Is this a problem? Well, as you point out, we wouldn't want to make it too easy to come back from a 0-4 first day. I'm just worried that you're going to make people think that they can win Ren Man on day two when they can't. It would be like telling the guy who's gone 0-7 that he can win Best General if he wins his next game. It could happen if everyone else in the place has a coronary and dies. But it isn't actually going to happen.

    Something that I made clear in my original really long post and didn't in my shorter post above is that none of these are reasons not to adopt your new plan. It isn't any worse than the old plan (except in that it takes what was a competitive event and makes it less so. You're handicapping people, and that is contrary to the spirit of competitiveness. But, like I said, Best General is the real award anyhow). So, by all means, go with it. Just be aware that what you're doing is making it look like things are changed a fair bit (Everyone can win after day 1! You'll have evenly matched games day 2!), when things have only changed a little bit (No they can't. And everyone was quite likely to have a fair game day two anyhow).

  14. Anon - wish you'd register, so I can know who you are :)

    That said, this is largely a matter of perspective.

    To maintain the competitively EVALUATIVE integrity of the event, the Tournament Champion award must be inviolate, aka Best General. This is ensured by the way the w/l system works with or without brackets.

    Subsequently, *competitiveness* is important ... and while that can be accomplished by keeping it "pure" w/l, it can (IMO) better be accomplished (while better spreading the "wealth") by breaking down the brackets after Day 1.

    There's actually no materially attainable # of games that will give you a "guaranteed" bracketing, but you'll find - and Neil's data supports this, for whatever that's worth - that the vast majority of players within each bracket, if not ALL players within each bracket, will largely be competitively balanced against one another. Therein lies the rub. While the chance exists for outliers to "crush" the bracket they wind up in, this is minimal ... it's also "alright" if they do, b/c they earn recognition and prize support in recovery from "whacky" first day results or unlucky pair-ups that unnaturally crushed them down. More importantly, only 4 of the 15 people in their bracket have to play that random outlier, while the other 11 do not, and gain generally close/tight/competitive games - something LESS assured with a "wide open" simple w/l bracketing approach.

    While I recognize that you're not suggesting these are "bad" things or perhaps aren't reasons not to go with this approach, I'm not sure I agree at all at this point on the conclusion or analysis of the data.


    In re: the Ren Man component, that bears further scrutiny on a mathematical approach. Remember, at present it's still 33.33/33.33/33.33% .... which further enables lower competitive-scoring players to crack the top.

    As far as the "meaning" of the award, I would not necessarily agree with you - the NOVA was last year, is and will remain basically two tournaments in one. One is a much more hobby-"appreciating" event that equally weights the three tournament components of 40k - art, game, social. The other is a hard-nosed straight gaming competition. The "reality" or comparable value of each - Ren Man and TC - is for all intents and purposes in the eye of the beholder, and that's by design. If you see TC as "the" prize, word. If you see RM as "the" prize, also word, but RM will remain a slightly more weighted prize, and slightly more the "Best Overall" of the event.

    Good convo here, btw.

  15. "No one that goes 7-1 at these things shows up with a poorly painted army."

    I can point to an army that I believe finished 6-1 at BFS, yet wasn't painted or built with any creativity whatsoever. I still wonder how it qualified under the basic "3 color standard" scheme. :| So, considering my artistic opinion, it can happen.

  16. This would be Simon. Which is why it was ok for me to say you curb-stomped me. So you kicked my teeth in, eh? Such a jerk.

    First, some relatively unimportant points. I happen to be focused on the Best General prize. Which is why I tell everyone who asks that you won BFS. But that's just a preference of mine, and I'm totally cool with tournaments having other foci. What you do with Ren Man doesn't have a material impact on me, as the award itself isn't very meaningful and the prize associated with it is sufficiently small compared to the cost of attending as to be negligible (not a knock on the prize support, just a comment on the cost of travel). So I don't really care what you do with the Ren Man system, and I'm totally happy with you having whatever goals you want when you're designing it. I was only commenting in an instrumental way (if you have goals x,y,z, methods a,b,c aren't good at achieving those goals). The above was only alluded to by my use of a smiley at some point, so my apologies for not making it clearer that I don't have any problem with the goals you've espoused for the new system, just some worries about the implementation of the system designed to achieve those goals.

    As for the other stuff, I think the only thing you said above that I disagree with is that you disagree with me. Well, I don't disagree with that; I'll take your word for it. I just think you're wrong is all. Unless I missed something, you didn't indicate why you think my analysis is wrong, so I don't have much to rebut.

    But let me say the following. Neil, if you're around, do you mind running the numbers again, but this time, run them for all 8 games with no brackets? That should tell us how likely someone is to be playing the 'appropriate' opponent in game 8 with the no bracket system. Then run the numbers for all 8 games with the bracket system. That should tell us how likely it is for someone to be playing the 'appropriate' opponent with the bracket system. I'll bet that you're more likely to get 'fair' pairings in game 8 with the old system. They should be the same likelihood in game 5, and the probability should (slowly) decrease from there on out.

    Granted, that just addresses the first point (that the new system is going to somehow make it more likely that you won't get paired up against someone outside your list/skill level). I do think your second goal (giving people something to hope for day 2) is somewhat achieved by the new system. Just not as much as you seem to think (at least not as much as you claim).

  17. @Ben: Nova (and BFS) use a checklist system for judging painting. And it is really pretty difficult to score very low on those things. The army you're referring to scored only a little bit lower on painting than mine did. The way these painting scores tend to work is 'if you meet the very basic requirements, you get about a 50%. The other 50% is based on the quality'. In my (admittedly limited) experience, you have to show up with three dots of paint on each figure and one on the base to get really hurt on paint scores.

  18. FWIW, at the NOVA if you're crappily painted to the most minimal standards, you score .. well, crappy. If you screwed the pooch entirely and missed finishing painting, you basically get a 0. If you're painted average among the field, you get around a 50%, and if you're painted awesome, 75-100% (though I think no one got quite a 100, just as no one got quite a 100 in competitive score).

    It's important that paint scoring, the same as sports scoring, present an average score that is actually closer to 45-55% ... as opposed to one that is closer to 75-80% (which is what happens in all-or-most-or-nothing sports scoring, and traditional appearance scoring). I think we have a ways to go to improve these things as well, but that's my $.02 at any rate.


    Hi Simon :)

    Yeah, I kicked your teeth in, but as stated it would have been much closer if I hadn't rolled so many YOU EXPLODE, BISH, and if you hadn't rolled so many MY SISTERS OF BATTLE ARE PINNED OR BROKEN, DOUBLE YOU TEE EFF?!

    <3 ... and thanks for the commentary; I wasn't so much rebutting your analysis as disagreeing on its principal conclusion. I see it as successful if 15 out of 16 brackets have appropriate constituents. I got the feel you were suggesting "failure of mission" if one bracket had that crazy outlier of "didn't belong there" and he promptly smashed or got smashed ... I see that as actually MORE mitigated in the bracketing system, while adding benefits.

  19. I have to admit I don't actually know exactly how Nova does paint scoring, so I'm quite willing to trust your claim there. I agree that what you want to be aiming for is an average of about 50%. Well, really what you want to aim for is a healthy difference between scores. It doesn't help much if half the people score 0 and the other half score 100. You want a nice distribution so that it can act as a differentiator of the competitors.

    Just to be clear (I think you already get this, but just in case), I think the 'new' plan is fairly successful at making sure people have evenly matched games. The thing is, so was the old method. I happen to think that the old method was a slight bit better, but both are fine enough. To be honest, I'm not understanding the math behind the supposition that the new way is better. If Neil runs those numbers (please?), that should show pretty conclusively which of us is correct.

  20. I played in last year's NOVA and have to second what was said, part of what I enjoyed the most was the obvious effort/attention to detail put in to the event. Even if things weren't exactly as you wanted at the event, you knew it wasn't for a lack of time, thought, or effort.

    As a comment to the anon poster's comment about someone who is 4-4 having at best a 10% chance of winning Ren. man, if not worse.

    Should someone who is 4-4 have a better shot than that? Renaissance man isn't a "fluffy" award, it's a best overall award that represents all portions of the hobby, it doesn't deny that one portion is playing competitively and tactically, it just implies that it's not the only (or most important) part of the hobby.

    In my mind, if someone who is 4-4 has even a chance of winning Ren-Man, the goal is accomplished, and it really helps to accommodate those 1 or 2 losses that just were out of your control due to dice, match-up, etc. and still allow 6-2 and 7-1's to have a strong chance of winning a "best overall" prize.

    I really see it shining in an aspect that's less easy to quantitatively measure: player enjoyment. If the two "mindsets" of tournament play are somewhat stratified in the event, on the second day you play like-minded people, who either enjoy a tough, tough hard fought tooth and nail game with 'ard lists, or you get to play others who place more weight on theme, background, or just goofing off with toy soldiers. I think there's a lot to be said for knowing that on day 2, barring a crazy occurrence, the majority of players should be playing against people who have the same values as themselves about the hobby.

  21. Let me think about how to represent the scenario well, and I will run some simulations. You have a couple things going here that I think are worth pointing out first:

    1) Assuming the new system is adopted, the NOVA-2011 will have 256 players and a projected 16 brackets on day 2. It's the 16 brackets that is important. If the NOVA-2011 only had 64 players with a projected 4 brackets, I would say "don't bother" because it's too easy for outliers to crush the ranks in such a small scenario. The 16 brackets means that you can end up in a bracket 1-2 higher or lower than where you "should be" and still be in a fairly competitive environment.

    If that result is not acceptable and only a strict "you must be in the right bracket to match your skill" is acceptable, there simply, statistically, won't be a large enough sample of games on day 1 to effectively do this. I've been working on other ways to "fix that" issue, but that's for a 4 bracket scenario, not a 16. The larger, 16 brackets means that the field can be spread a little more.

    All the simulation has shown is that you can reliably get people "close" to the bracket they should be in, with a standard deviation of around 2 brackets and an average of being exactly in the correct bracket.

    AVERAGE DOES NOT EQUAL PERFECT... or even CLOSE to perfect for that matter. Remember, an average of 0 and 100 = 50, which is not close to either number. :P Indeed, looking at the simulations, and just entirely estimating, 80% of players won't end up in the "correct" bracket. However, 66% will end up in a bracket +/- 2 from where they should be and 90% will end up in a bracket +/- 4 from where they should be.

    With 16 brackets, +/- 2 and even +/- 3 should be pretty acceptable. +/- 4 might be a stretch, but at that point, you have "effectively" not "correctly" bracketed MOST people.

    2) That being said, I think what I am reading here is that the "bracketing" to "fix" games 5-8 will not be doing anything more than just going with the format from last year.

    Honestly, we do need to check this out because we don't have a "control" for this experiment. :) So, although the bracketing looks pretty good, is it really any better than normal? So, let's see if it really does.

    The problem is slightly different though. Before, we were measuring how accurate the brackets would be on day 2.

    Now, we want to measure how well either system will pair you up with opponents through all 8 matches. "Well" is defined as having competitive games (not blowouts).

    This is a different measurement and requires some analysis. Here's what I'm thinking off the top!

    I'll run a simulation using both a bracketed Day 2 and non-bracketed Day 2 for 8 games. Given the same assumption of 256 players, rated from 0 - 255 in terms of skill, I will measure the "difference" in player skill for each match-up for each player. I will then gather an average difference for all players and see which returns the lowest difference. That should be our answer.

    I'll get started on that this morning. Let me know if that sounds flawed!
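    For anyone who wants to poke at the idea, here is a minimal sketch of that comparison. It is not Neil's actual code, and several pieces are assumptions of mine: the higher skill number always wins, round 1 is paired at random, later rounds pair neighbours in the standings (ties in record broken at random), rematches are not prevented, and the names `play_round` and `simulate` are made up for illustration.

```python
import random

def play_round(players, wins, rng):
    """Pair neighbours in the standings; the higher skill always wins.

    Returns one skill gap per game. Ties in record are broken at random,
    so round 1 (everyone on 0 wins) is a random pairing.
    """
    order = sorted(players, key=lambda p: (-wins[p], rng.random()))
    gaps = []
    for a, b in zip(order[0::2], order[1::2]):
        gaps.append(abs(a - b))
        wins[max(a, b)] += 1          # assumption: no luck, skill decides
    return gaps

def simulate(bracketed, seed, n=256, rounds=8, bracket_size=16):
    """Return (mean Day 1 skill gap, mean Day 2 skill gap)."""
    rng = random.Random(seed)
    players = list(range(n))          # skill rating doubles as player id
    wins = {p: 0 for p in players}
    day1, day2 = [], []
    for _ in range(rounds // 2):
        day1 += play_round(players, wins, rng)
    if bracketed:
        # Cut the Day 1 standings into same-size brackets for Day 2.
        order = sorted(players, key=lambda p: (-wins[p], rng.random()))
        groups = [order[i:i + bracket_size] for i in range(0, n, bracket_size)]
    else:
        groups = [players]            # one open field, as in the old format
    for _ in range(rounds // 2):
        for group in groups:
            day2 += play_round(group, wins, rng)
    return sum(day1) / len(day1), sum(day2) / len(day2)

if __name__ == "__main__":
    for label, flag in (("bracketed", True), ("open field", False)):
        d1, d2 = simulate(flag, seed=1)
        print(f"{label}: Day 1 gap {d1:.1f}, Day 2 gap {d2:.1f}")
```

    Because both runs draw from the same seeded generator, Day 1 comes out identical in the two formats, so any difference in the Day 2 number comes from the pairing scheme alone. Averaging over many seeds would give a steadier picture than a single run.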

  22. Andrew, I don't have any opinion about the appropriate degree to which someone should be able to come back from first day defeats. I'd be happy if the only prize was best general, and I'd also be happy if Ren Man was based only on day two games. The only problem I have is that it appears that Mike's expectations for the 'new' plan aren't in line with the actual results of the 'new' plan. But then, my projected results are based on the 'new' score breakdown 40/40/20 and the assumption that Nova paint scores will be similar to other GT paint scores, and it seems like both of those assumptions may be wrong.

    Neil, that looks awesome. Thanks for doing the legwork.

  23. The problem I see with basing Ren Man on only day two games is this: would that accurately reflect the battle portion of the goal of a best overall award?

    At NOVA last year, the paint scores were very well handled, and simply bringing a just-meets-the-requirements army didn't net anyone I ran into a score near 50%. Combine that with being judged twice for painting, by two separate judges, to help remove any skew from having the "easygoing" judge do your army and the "hard" judge do someone else's (just an example; all the judges seemed to be on the same page on how to score), and I think most people found the scores pretty accurate.

    I see your point with wanting to examine whether the new design is effective at achieving its goal, and I think it's a good one. Feedback only makes these events better. As a side note... after spending half my time doing statistics at work, I can't imagine how you guys muster up the willpower to do it at home. Kudos to you guys, can't wait to see it.

  24. My gut feeling is that this new bracket system, especially over time, will minimize cases of the extreme. Anxious though to see what the number crunching produces!

    Now, what of the concern with someone sandbagging, i.e. intentionally going 0-4 on Day One with the intention of cleaning out the 16th bracket and taking home the Ren Man trophy with a 4-0 Day 2? That abomination could be a concern.

    What if then, Day 1 results are counted in part? Using random numbers to illustrate...
    40% paint
    40% sportsmanship
    10% Day 1 results (each win = 2.5%)
    10% Day 2 results (each win = 2.5%)
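    To see what those illustrative weights do to a sandbagger, here is a small sketch. The function name `ren_man_score` is hypothetical, paint and sportsmanship are assumed to be expressed as fractions from 0 to 1, and the percentages are only the made-up numbers above, not an actual NOVA scoring rule.

```python
def ren_man_score(paint, sportsmanship, day1_wins, day2_wins):
    """Overall score out of 100 using the illustrative 40/40/10/10 split.

    paint and sportsmanship are fractions in [0, 1]; each win on either
    day is worth 2.5 points (4 games per day, 10 points per day).
    """
    return (40 * paint
            + 40 * sportsmanship
            + 2.5 * day1_wins
            + 2.5 * day2_wins)

# Same soft scores, different records:
sandbagger = ren_man_score(1.0, 1.0, day1_wins=0, day2_wins=4)
steady = ren_man_score(1.0, 1.0, day1_wins=2, day2_wins=2)
unbeaten = ren_man_score(1.0, 1.0, day1_wins=4, day2_wins=4)
print(sandbagger, steady, unbeaten)
```

    Under this split, throwing Day 1 and sweeping Day 2 earns exactly the same battle points as going 2-2 on both days, and always trails anyone with more total wins and equal soft scores.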

    On painting, as brought up in the comments: I abhor the idea of checklists. I enjoy seeing well-painted minis and am a huge fan of conversions. Art and checkboxes just don't compute. Hastily assembled "creative" that exists simply to fulfill a requirement should not be rewarded. As I try to have a solution for each problem or complaint, I'd propose 3 judges, each taking two laps. First lap is observation, second lap is scoring as each judge sees fit.

  25. With respect to sandbagging, that is why Day 1 has to count. It's certainly statistically far more difficult to win Ren Man after going 0-4, albeit possible - which is the point. You *could* sandbag, lose 4 games, and clean up on Day 2 via that process, but you run 2 risks - 1, losing anyway to someone on Day 2, and 2 - losing Ren Man regardless, shooting yourself in the foot effectively. I could see someone sandbagging to 2-2, but even then they're just making life harder on themselves. Leaving the opportunity for redemption does not equal encouraging day 1 failure to achieve it.

    As for scoring, and painting ... checklists present subjectivity coming too much into play. It's almost a requirement, but that's why you have to structure a checklist that lets judges subjectively award "tops" from a certain %, and also appropriately measure people along a well-spread curve. We also do a "top 5" in each category, chosen by the panel of appearance judges (~10+ of them) and the TO, so the actual superlatives are not selected by the numbers.

  26. I would, briefly, like to just ask...what if only the top and bottom brackets were 16, and the others were 32?

  27. Since it came up in this thread, I add the following:
    At the NOVA Open, we used a scorecard system, not a checklist, which enabled our judges to fairly and impartially review many armies in the short period of time we had for the reviews. We separated into categories the various technical components that make for well painted/presented miniatures and armies, and assigned point ranges (The scorecard showed the maximum points possible. Judges were free to award fewer points.) Combined, they summed to a possible 100 points. A table quality army would score 40-60 points. [Frankly, I am not sure how a checklist would work with the NOVA Open format. Poorly I suppose.] And yes, the scorecard was intended to produce a decent distribution curve. But there were too many unpainted armies and too many "not even close to meeting the (very lax) minimum standards" armies at the NOVA Open. That throws off any distribution curve. C'mon people, you bring a painted army to a tournament!

    We will be using the same system (albeit improved) this year for 40k and Fantasy, and probably for the other games which will have tournaments at the NOVA Open 2011, Malifaux and War Machine. Of course, as always, there is room to improve. And we will improve.

    @ Benjamin:
    Our interest was in the technical skills the painted miniatures displayed rather than the artistic choices which the painters made. The former can be evaluated with some degree of standardization and impartiality. The latter, well, art is in the eye of the beholder. We evaluated the degree of skill the painter exhibited in their execution of their artistic choices. This is the fairest way to conduct large scale appearance reviews. Sending judges around to look at armies and miniatures without any standardized way to compare them one to another is, quite frankly, unfair to everyone. And with over 100 armies at NOVA Open 2010 it would have been unworkable. With over 200 armies coming to NOVA Open 2011, it would be an utter debacle.

  28. G Red, thank you very much for the above. I can see the reasoning for the way it's done. I can appreciate technique, and that's the fairest standard. Hopefully the most skilled painter isn't a complete dullard. :D

    Also, I second the charge for painted armies. I have just started myself to learn how to paint; it can be done!

  29. I'm back with the simulation results.

    http://the11thcompany.freeforums.org/post11497.html#p11497

    The simple answer is that the bracket system does give people closer games on Day 2, BUT!!!!! not by much for the majority of players.

    The standard deviations on "closeness" are very similar using both the Bracket and Non-Bracket systems.

    The bracket system is still better though and probably worth it to provide a better experience for the majority of players IF it doesn't require a whole lot more effort.

    To add to that, if you are getting some benefit out of the brackets besides just game closeness, it's probably worth it as well.

    Let me know if you see any errors in the data as this crap is pretty complicated.

  30. That is awesome Neil, thanks for doing that.

  31. I like everything except the prize for 0-8. You should never incent people to do their worst. If the intent of the overall tournament is to give everyone an opportunity for success by playing appropriately skilled and comped generals, you throw that out the window with this prize. Don't give anyone any reason to throw all of their games and the tournament will be better for it.

  32. @ Liono - Only 1 of the 256 people in the entire two-day tournament will go 0-8. It's a prize given tongue-firmly-in-cheek.

  33. Agreed Neil, that was very cool.

    Am I correct in thinking that the difference between the 2010 and 2011 versions for the first four games is due entirely to differences in the random initial seedings? It would be cool if there was some way to run both of them with the same seedings, although from the look of things it wouldn't matter overmuch.

    I figured out where my mistake was earlier (and thanks again to Neil for showing that I needed to go back and reexamine my initial assumptions). I was thinking that the brackets were going to effectively close off people with different records from each other, but it is actually a fair bit more fine-grained than that. It closes off, for instance, 'good' 2-2 players from 'bad' 2-2 players. I'd be willing to bet that is where we're getting the majority of the improvement in 'fairness' from; there's a pretty major gap in relative player abilities between the best 2-2 and the worst 2-2 going into day 2, at least with Neil's starting assignments of player skill.

  34. Hey,

    Yes, the differences between the 2010 and 2011 versions should be entirely attributable to the initial Round 1 random pairing. Both simulations use the exact same algorithm up through round 4. It's at round 5 that they diverge. The 2010 tests continue using the same algorithm across the whole field, whereas the 2011 tests apply it within each of the 16 brackets separately, treating each as sort of a microcosm.

    I think that the biggest assumption in the whole thing is that you could actually rank all players 1-256 AND that even if you did, skill is the only thing that determines a win (which it isn't. :P)

    I think if we really, really wanted to simulate, we should come up with a value which could be multiplied by player skill to show a factor of dice luck and army match-up which could potentially see Player #135 defeating Player #140 but not so much sway that Player #0 could defeat Player #232.

    I could pretty easily add such a factor to each game, but first I would need to come up with what that factor should be. That could be a whole conversation by itself. :)

  35. I've been idly thinking about the way in which the addition of randomness between rounds might be modeled. I realized two things.

    It isn't going to happen. We'd need a lot of people playing a lot of games with the same lists. And we'd have to come up with some way to control for terrain and fatigue and a host of other factors.

    But I also realized that fixing the amount that luck influences the seeding isn't necessary. Suppose we chose some conservative estimate (say luck influences 1 out of every 10 games between people within 15 steps of each other). Running the numbers with that estimate would tell us that luck has at least that much effect on the match-ups. Probably more importantly, it would tell us if, and how, luck would skew the results between the old and new methods of match-up determination.
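    A rough sketch of how that conservative estimate could be dropped into a per-game result, under my own reading of it: "luck influences a game" is taken to mean the weaker player wins outright, and only when the two players are within 15 skill steps of each other. The function name `winner` and its parameters are hypothetical.

```python
import random

def winner(a, b, rng, upset_rate=0.10, window=15):
    """Return the skill rating of the winner of a game between a and b.

    The higher-skilled player wins, except that when the two are within
    `window` steps of each other the weaker one wins with probability
    `upset_rate` (assumed reading of "1 out of every 10 games").
    """
    strong, weak = max(a, b), min(a, b)
    if strong - weak <= window and rng.random() < upset_rate:
        return weak
    return strong

rng = random.Random(2011)
close_upsets = sum(winner(135, 140, rng) == 135 for _ in range(10_000))
far_upsets = sum(winner(0, 232, rng) == 0 for _ in range(10_000))
print(close_upsets / 10_000, far_upsets / 10_000)
```

    With these settings Player #135 can occasionally beat Player #140, while Player #0 never beats Player #232. Swapping a function like this in for the "higher skill always wins" rule in both the old and new pairing schemes would show whether luck skews one more than the other.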

  36. Overall I think this is a pretty neat idea.

    I am more of a painter/hobbyist than competitive gamer. As such I tend to dread tournaments as no one really likes having their butt handed to them on a display board every three hours.

    I think the idea of spending the first day gauging your skill level vs. the populace, then spending the second day battling it out against your peers, is inspired!

    I encourage more event organizers to think "outside the box."

    On a side note: Will a similar format be used for Fantasy?

    R.G.H
    http://roguegeneralhunter.blogspot.com/
