Sagarin vs. Mvemjsunpx during the playoffs

JohnStOnge · Post by **JohnStOnge** » Sat Jan 09, 2016 5:47 pm

I can now say that I was watching this the whole time. I don't know if you guys think Mvemjsunpx is good at picking games or not. But he clearly puts a lot of thought into it. So I decided that I would keep a spreadsheet recording his predictions vs. the Sagarin system predictions.

On the won/loss thing: Sagarin was 16-7, Mvemjsunpx was 14-9. The difference is not "significant."

On bias, which is the average of actual margin minus predicted margin, with the ideal being 0: Sagarin was -4.5, Mvemjsunpx was -4.3. The difference was not "significant."

On absolute error, which is the average absolute value of the difference between predicted margin and actual margin: Sagarin was 14, Mvemjsunpx was 15. The difference was not "significant."
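For anyone who wants to reproduce the two margin measures, here is a minimal sketch. The margins in it are made-up numbers for illustration only, not the spreadsheet data from the playoffs, and the function names are hypothetical.

```python
from statistics import mean

def bias(actual, predicted):
    """Average of (actual margin - predicted margin); 0 is ideal."""
    return mean(a - p for a, p in zip(actual, predicted))

def mean_abs_error(actual, predicted):
    """Average absolute miss between predicted and actual margin."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))

# Hypothetical margins for four games (NOT the real playoff data).
actual_margins = [7, -3, 21, 10]
predicted_margins = [10, 3, 14, 17]

print(bias(actual_margins, predicted_margins))
print(mean_abs_error(actual_margins, predicted_margins))
```

A negative bias means the predictions ran high on average; the absolute error ignores direction and only measures how far off each prediction was.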

However, there is one indirect measure by which Sagarin beat Mvemjsunpx. It has to do with the question of whether or not the "predictive value" of the system is better than what you would get from chance. When you go 14-9, as Mvemjsunpx did, you can't say the outcome was "significantly" better than what can be explained by chance. You're 89 percent confident that 14 or fewer would've been picked correctly if you just flipped a coin. And the convention is that you can say the system is "significantly" better than chance when you reach 95 percent confidence.

But at 16-7, where Sagarin's system finished, the confidence level is 98 percent.
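The coin-flip comparison can be sketched with a plain binomial calculation. This assumes 23 independent games and a fair coin; the function names are hypothetical. The 89 and 98 percent figures line up with the chance that a coin flipper gets 14 or fewer, and 16 or fewer, of 23 right. The stricter one-sided tail (the chance of a coin flipper doing at least that well) is roughly 20 percent for 14-9 and roughly 4.7 percent for 16-7, so the conclusion at the 95 percent convention comes out the same either way.

```python
from math import comb

def prob_at_most(k, n):
    """P(X <= k) for X = number of correct picks from n fair coin flips."""
    return sum(comb(n, i) for i in range(k + 1)) / 2 ** n

def prob_at_least(k, n):
    """P(X >= k): one-sided p-value for picking k of n by coin flip."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

for wins in (14, 16):
    print(wins, round(prob_at_most(wins, 23), 3), round(prob_at_least(wins, 23), 3))
```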

I did not do this to pick on Mvemjsunpx. I just did it because Mvemjsunpx put his numbers out there and provides an opportunity for comparison in an unbiased situation.

Mvemjsunpx · Post by **Mvemjsunpx** » Sat Jan 09, 2016 5:54 pm

JohnStOnge wrote:I can now say that I was watching this the whole time. I don't know if you guys think Mvemjsunpx is good at picking games or not. But he clearly puts a lot of thought into it. So I decided that I would keep a spreadsheet recording his predictions vs. the Sagarin system predictions.

On the won/loss thing: Sagarin was 16-7, Mvemjsunpx was 14-9. The difference is not "significant."

On bias, which is the average of actual margin minus predicted margin, with the ideal being 0: Sagarin was -4.5, Mvemjsunpx was -4.3. The difference was not "significant."

On absolute error, which is the average absolute value of the difference between predicted margin and actual margin: Sagarin was 14, Mvemjsunpx was 15. The difference was not "significant."

However, there is one indirect measure by which Sagarin beat Mvemjsunpx. It has to do with the question of whether or not the "predictive value" of the system is better than what you would get from chance. When you go 14-9, as Mvemjsunpx did, you can't say the outcome was "significantly" better than what can be explained by chance. You're 89 percent confident that 14 or fewer would've been picked correctly if you just flipped a coin. And the convention is that you can say the system is "significantly" better than chance when you reach 95 percent confidence.

But at 16-7, where Sagarin's system finished, the confidence level is 98 percent.

I did not do this to pick on Mvemjsunpx. I just did it because Mvemjsunpx put his numbers out there and provides an opportunity for comparison in an unbiased situation.

Interesting. Not quite the same thing, but I've been tracking the Sagarins & Masseys on spread picks this year and they aren't very good (the Masseys are awful).

JohnStOnge · Post by **JohnStOnge** » Sat Jan 09, 2016 7:42 pm

When you say they are not good you need to compare them to other systems. Like for instance, at least for FBS games, both are year in and year out comparable to the closing line. The line is usually slightly better. But "slightly" is the operative word and not always.

At this point this year, for as far as the page at http://www.thepredictiontracker.com/ncaaresults.php is updated, the average absolute value of the amount by which the line is off from the actual margin (absolute error) is 12.4. The absolute error for Sagarin is 12.8. The absolute error for Massey is 13.0. BTW Massey openly acknowledges that his system is not designed for prediction.

I've said this before to people who rag on the Sagarin Ratings: If you try to beat it over a large number of games at getting closer to the actual spread, you are probably going to lose. You might win if you cheat and just use the line. But if you don't look at the line and try to do it yourself you are probably going to lose.

JohnStOnge · Post by **JohnStOnge** » Sat Jan 09, 2016 7:52 pm

And by the way Mv, you did way better than I thought you would in terms of predicting spreads. I've tested people before on this and you did better than anybody I've ever tested. I was surprised. Usually what I expect to see is that the person I'm testing has a decent chance to beat the Sagarin System in terms of picking winners and losers but they're going to be blown away in terms of picking the spreads. With you the opposite happened. You did a little worse picking winners and losers but were very close in terms of picking the spreads and also very close and even a little ahead in terms of bias. I really did not expect that. It was interesting to watch as the playoffs went on.

Mvemjsunpx · Post by **Mvemjsunpx** » Sat Jan 09, 2016 9:29 pm

JohnStOnge wrote:When you say they are not good you need to compare them to other systems. Like for instance, at least for FBS games, both are year in and year out comparable to the closing line. The line is usually slightly better. But "slightly" is the operative word and not always.

At this point this year, for as far as the page at http://www.thepredictiontracker.com/ncaaresults.php is updated, the average absolute value of the amount by which the line is off from the actual margin (absolute error) is 12.4. The absolute error for Sagarin is 12.8. The absolute error for Massey is 13.0. BTW Massey openly acknowledges that his system is not designed for prediction.

I've said this before to people who rag on the Sagarin Ratings: If you try to beat it over a large number of games at getting closer to the actual spread, you are probably going to lose. You might win if you cheat and just use the line. But if you don't look at the line and try to do it yourself you are probably going to lose.

I was comparing them to my own picks and whether they were in the black or not (acting as if the Sagarins & Masseys were picking 5dimes' point spreads) for every DI game. I was slightly in the red, but I was still higher than the Sagarins & waaayyyy higher than the Masseys. The Masseys would've lost money every single week.

Post by **clenz** » Sat Jan 09, 2016 10:16 pm

AGS runs a playoff predictor every year and Massey and Sagarin always finish in the middle of the pack (usually about 100-120 entries and Sagarin/Massey finish between 50-60 nearly every year)

JohnStOnge · Post by **JohnStOnge** » Sat Jan 09, 2016 10:25 pm

clenz wrote:AGS runs a playoff predictor every year and Massey and Sagarin always finish in the middle of the pack (usually about 100-120 entries and Sagarin/Massey finish between 50-60 nearly every year)

When you throw out a bunch of comparisons random chance plays a role. Like for instance if you were to just say you're going to randomly pick winners and do it 120 times some will do very well just by random chance. What you would probably find if you do what you're talking about and adjust for multiple comparisons is that nobody does "significantly" better than anyone else.

For example, if you have 23 games and you just have 120 people flip coins to pick winners you have about a 50:50 chance that at least one person will pick at least 18 of the 23 games correctly.
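The coin-flip illustration can be checked directly. This is a sketch that assumes every picker flips a fair coin independently on each of 23 games; the function names are hypothetical. Under those assumptions, the chance that at least one of 120 pickers reaches 18 of 23 comes out near 47 percent, and the chance that at least one reaches 17 of 23 is closer to 88 percent.

```python
from math import comb

def prob_at_least(k, n):
    """P(one coin flipper gets at least k of n games right)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

def prob_someone_reaches(k, n, pickers):
    """P(at least one of `pickers` independent coin flippers gets >= k of n)."""
    return 1 - (1 - prob_at_least(k, n)) ** pickers

for threshold in (17, 18):
    print(threshold, round(prob_someone_reaches(threshold, 23, 120), 3))
```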

For a real test you need to decide what your "standard" is and compare one on one. Like if you think there is somebody at AGS who is really good you need to pick that one person and have them pick then compare that to Sagarin.

Post by **clenz** » Sat Jan 09, 2016 10:28 pm

JohnStOnge wrote:
clenz wrote:AGS runs a playoff predictor every year and Massey and Sagarin always finish in the middle of the pack (usually about 100-120 entries and Sagarin/Massey finish between 50-60 nearly every year)
When you throw out a bunch of comparisons random chance plays a role. Like for instance if you were to just say you're going to randomly pick winners and do it 120 times some will do very well just by random chance. What you would probably find if you do what you're talking about and adjust for multiple comparisons is that nobody does "significantly" better than anyone else.

That's not randomly picking teams.

It's projecting how the bracket plays out, using projected point spreads as part of the equation.

Post by **kalm** » Sun Jan 10, 2016 7:05 am

clenz wrote:
JohnStOnge wrote:
When you throw out a bunch of comparisons random chance plays a role. Like for instance if you were to just say you're going to randomly pick winners and do it 120 times some will do very well just by random chance. What you would probably find if you do what you're talking about and adjust for multiple comparisons is that nobody does "significantly" better than anyone else.
That's not randomly picking teams.

It's projecting how the bracket plays out, using projected point spreads as part of the equation.

Sagarin and Massey get beat every year by 50 or 60 humans on AGS in predicting bracket outcomes including predicting the exact scoring in each game.

89Hen · Post by **89Hen** » Mon Jan 11, 2016 8:55 am

JohnStOnge wrote:I did not do this to pick on Mvemjsunpx. I just did it because Mvemjsunpx put his numbers out there and provides an opportunity for comparison in an unbiased situation.

Go fuck yourself JSO. I've asked you for YEARS to put up Sagarin or any other computer's picks against the GoHens Top 25 pool (a MUCH bigger sampling) but you refuse year after year.

Post by **Vidav** » Mon Jan 11, 2016 10:32 am

89Hen wrote:
JohnStOnge wrote:I did not do this to pick on Mvemjsunpx. I just did it because Mvemjsunpx put his numbers out there and provides an opportunity for comparison in an unbiased situation.
Go fuck yourself JSO. I've asked you for YEARS to put up Sagarin or any other computer's picks against the GoHens Top 25 pool (a MUCH bigger sampling) but you refuse year after year.

Post by **Grizalltheway** » Mon Jan 11, 2016 10:41 am

Post by **clenz** » Mon Jan 11, 2016 11:05 am

89Hen wrote:
JohnStOnge wrote:I did not do this to pick on Mvemjsunpx. I just did it because Mvemjsunpx put his numbers out there and provides an opportunity for comparison in an unbiased situation.
Go fuck yourself JSO. I've asked you for YEARS to put up Sagarin or any other computer's picks against the GoHens Top 25 pool (a MUCH bigger sampling) but you refuse year after year.

I've done that with the contest I run on AGS (about 40 players), and Sagarin does not finish well in it. Looking only at players who submit every single week (about 25-30 players), he usually finishes between 15th and 20th.

89Hen · Post by **89Hen** » Mon Jan 11, 2016 11:13 am

clenz wrote:
89Hen wrote: Go fuck yourself JSO. I've asked you for YEARS to put up Sagarin or any other computer's picks against the GoHens Top 25 pool (a MUCH bigger sampling) but you refuse year after year.
I've done that with the contest I run on AGS (about 40 players), and Sagarin does not finish well in it. Looking only at players who submit every single week (about 25-30 players), he usually finishes between 15th and 20th.

There you have it.

Post by **bandl** » Mon Jan 11, 2016 11:17 am

89Hen wrote:
JohnStOnge wrote:I did not do this to pick on Mvemjsunpx. I just did it because Mvemjsunpx put his numbers out there and provides an opportunity for comparison in an unbiased situation.
Go fuck yourself JSO. I've asked you for YEARS to put up Sagarin or any other computer's picks against the GoHens Top 25 pool (a MUCH bigger sampling) but you refuse year after year.

SHIT BE GETTIN' REAL UP IN HERE!

I'm putting a fiddy down on 89. Anyone else want in on this action (no homo)?

JohnStOnge · Post by **JohnStOnge** » Mon Jan 11, 2016 7:52 pm

kalm wrote:
clenz wrote: That's not randomly picking teams.

It's projecting how the bracket plays out, using projected point spreads as part of the equation.

Sagarin and Massey get beat every year by 50 or 60 humans on AGS in predicting bracket outcomes including predicting the exact scoring in each game.

Guys, I just used randomly picking teams as an illustration of what happens when you do a bunch of comparisons at once. Let's see if I can explain this. If you throw a bunch of systems into the pot and then look after the fact, random chance means that, if the systems are at all close in how good they are, any particular system you start off looking at will likely not be near the top. Could be. But probably not.

What you need to do is pick two systems ahead of time, before the run starts, then compare just those two. So what you need to do is do something like pick the best human predictor on AGS. If you think that's the one that happened to end up on top this year, pick that one. Then next year compare JUST that guy (or girl) to how Sagarin does.

Then you do some kind of statistical test to see if there is a "significant" difference. If you start off with something like 120 you're not going to get a "significant" difference because you have to adjust the significance level for multiple comparisons because of that random chance thing. For example: Let's say you do 120 comparisons and you see that the best guy would be better than Sagarin at 95% confidence if those were the only two you compared. If you compare every pair among 120 "systems" that's 7,140 possible pairwise comparisons. To make a long story short that means instead of having 95 percent confidence you've essentially got 0 percent confidence. Something like 1 x 10 to the minus 157 percent confidence. You've got nothing in terms of trying to decide if one system is better than another in terms of predicting outcomes.
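The arithmetic behind that can be sketched as follows. It treats every pairwise test as independent with 95 percent confidence each, which is a simplifying assumption (pairwise comparisons among the same entries are not really independent), so read the result as an order of magnitude only.

```python
from math import comb, log10

systems = 120
pairs = comb(systems, 2)  # number of distinct pairs among 120 systems

# Joint confidence if every one of the pairwise tests holds at 95 percent,
# computed on a log10 scale because the value is extremely small.
log10_confidence = pairs * log10(0.95)
log10_confidence_percent = log10_confidence + 2  # multiply by 100 for percent

print(pairs)
print(round(log10_confidence_percent, 1))
```

The second number is the power of ten of the joint confidence expressed as a percentage, which lands around minus 157.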

Post by **clenz** » Mon Jan 11, 2016 8:26 pm

JohnStOnge wrote:
kalm wrote:

Sagarin and Massey get beat every year by 50 or 60 humans on AGS in predicting bracket outcomes including predicting the exact scoring in each game.
Guys, I just used randomly picking teams as an illustration of what happens when you do a bunch of comparisons at once. Let's see if I can explain this. If you throw a bunch of systems into the pot and then look after the fact, random chance means that, if the systems are at all close in how good they are, any particular system you start off looking at will likely not be near the top. Could be. But probably not.

What you need to do is pick two systems ahead of time, before the run starts, then compare just those two. So what you need to do is do something like pick the best human predictor on AGS. If you think that's the one that happened to end up on top this year, pick that one. Then next year compare JUST that guy (or girl) to how Sagarin does.

Then you do some kind of statistical test to see if there is a "significant" difference. If you start off with something like 120 you're not going to get a "significant" difference because you have to adjust the significance level for multiple comparisons because of that random chance thing. For example: Let's say you do 120 comparisons and you see that the best guy would be better than Sagarin at 95% confidence if those were the only two you compared. If you compare every pair among 120 "systems" that's 7,140 possible pairwise comparisons. To make a long story short that means instead of having 95 percent confidence you've essentially got 0 percent confidence. Something like 1 x 10 to the minus 157 percent confidence. You've got nothing in terms of trying to decide if one system is better than another in terms of predicting outcomes.

Oh...you mean like this?

http://www.anygivensaturday.com/showthr ... -Challenge

Scoring
Official scoring process:
Each correct pick in a given round will be given the following points:
1st Round - 15
2nd Round - 15
Quarterfinals - 30
Semifinals - 50
Finals - 100

The scores will be used for deductions or bonuses from each total. If an exact score is guessed, there will be a ten point bonus added. There will be a tenth of a point deduction for each point a prediction for each separate team is off the actual score for that given team. This is not how many points your combined score was off the real combined score. For example, joe picked A over B 27-14. The final score was A winning 24-17. The combined scores were both 41, but joe was 3 off A's final score and 3 off B's final score. Therefore, .6 will be deducted from his points for that round. If B had won that game, joe simply would not have gotten points for that respective game.

First round scoring example:
A over B - 27-14
A beat B 24-17 15 points for the win minus .6 = 14.4
C over D - 31-10
C beat D 24-10 15 points for the win minus .7 = 14.3
E over F - 14-7
F beat E 17-14 no points awarded for loss
G over H - 24-21
G beat H 24-21 15 points for the win plus 10 for nailing the score = 25

14.4 + 14.3 + 0 + 25 = 53.7 for Round one.
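As a rough sketch, the scoring rule above can be written as a small function. The function name and the tuple layout are hypothetical, and only the per-game rule is covered (win points, a 0.1 deduction per point off each team's score, and a 10 point exact-score bonus); the round values come from the table above.

```python
def game_score(pick, actual, win_points=15):
    """Score one game.

    pick and actual are (picked_winner_score, picked_loser_score), i.e. both
    tuples list the team the entrant picked to win first.
    """
    if actual[0] <= actual[1]:
        return 0.0  # picked winner did not win: no points for this game
    if pick == actual:
        return win_points + 10.0  # exact score: ten point bonus
    points_off = abs(pick[0] - actual[0]) + abs(pick[1] - actual[1])
    return round(win_points - 0.1 * points_off, 1)

# The first-round example from the rules, as (pick, actual) pairs.
round_one = [
    ((27, 14), (24, 17)),  # A over B, A won 24-17
    ((31, 10), (24, 10)),  # C over D, C won 24-10
    ((14, 7), (14, 17)),   # E over F, but F won 17-14
    ((24, 21), (24, 21)),  # G over H, exact score
]
total = round(sum(game_score(p, a) for p, a in round_one), 1)
print(total)
```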

So, every game/prediction/possibility was entered BEFORE the playoffs started.

There were 109 entries. Results in the spoiler

1 MarkyMark 442.9
2 MTfan4life 428
3 mamberso 403.9
4 gumby013 391.1
5 TheKingpin28 389.6
6 footballer23 388
7 Milkman 381.2
8 DaBizon 377.5
9 thebootfitter 375.2
9 VT Wildcat Fan53 375.2
11 Mayville Bison 373.5
12 citdog 371.5
13 BisonTru 370
14 stevdock 360.4
15 RabidRabbit 359.9
16 jmufan999 357.9
17 IBleedYellow 356.5
18 The Yo Show 345.1
19 da_Bison 343.9
20 NDSUtk 342.1
21 seattlespider 330.1
22 herd13 328.6
23 Bison56 318.2
24 Missingnumber7 314.7
25 swaghook 311.9
26 Winindy 302.8
27 Bear84 292.8
28 BisonFan02 290.3
29 SUPharmacist 290.2
30 JSUSoutherner 289.3
31 Terry2889 289.1
32 Twentysix 288.5
33 Rollbird5 286.2
34 UIWWildthing 285.1
35 Gangtackle11 281.8
36 jacoj21 275.4
37 taper 273.1
38 MR. CHICKEN 263.4
39 FordhamFan 261.2
40 ST_Lawson 255.6
41 AGS Community 254.1
42 Bisonator 253.8
43 Prime Power 249.2
44 Lehigh'98 248.1
45 Redbird74 246.7
46 bisoninloveland 242.2
47 FUBeAR 240.8
48 THE DANIMAL 240.1
49 Griz23 237.2
50 PantherRob82 237
51 Samalum'10 234.1
52 ElCid 229.1
53 Lehigh Football Nation 228.2
54 CappinHard 227.3
55 UNHWildCats 226.6
56 FCSbuff319 225.4
57 jsualumnus 223.7
58 SportsLover 223.3
59 veinup 222
60 OhioHen 217
61 CitadelGrad 215.5
62 ming01 213.7
63 Massey 212.8
64 HailSzczur 210.8
65 cpacmel 210.6
66 JMUNJ08 210.1
67 gotts 209.5
68 WileECoyote06 208.1
69 kalm 202.6
70 Catsfan90 201.5
71 realloser 201.3
72 mmiller_34 200.6
73 KPSUL 199.6
74 grizband 199
75 melloware13 197.7
76 Hambone 197.4
77 dbackjon 195.7
78 tomq04 188.2
79 FCSwatcher 186.2
80 ngineer 184
81 Professor Chaos 183.3
82 bobcathpdevil56 182.4
83 CasualFan 181.8
84 LehighU11 180.9
85 knucklehead 180.6
86 leatherneck177 177.4
87 Original_RMC 174.9
88 World 169.7
89 McNeese75 168.8
90 smilo 168.2
91 JMU2K_DukeDawg 168.2
92 Loyl2u 167.8
93 Daytripper 163.2
94 Catbooster 159.4
95 crusader11 157
96 Casey_Orourke 156.3
97 Go Lehigh TU owl 152.4
98 4grz 146.7
99 bjtheflamesfan 146.6
100 Nodak78 141.1
101 hktribefan 137.5
102 msupokes1 134.3
103 UNHFan@RWU 131.9
104 jmu007 131.2
105 Drblankstare 130.1
106 SkinsWizDukes 129.2
107 Thumper 76 128.3
108 wmmii 127.7
109 WrenFGun 97.9

I made a couple of interesting scores stand out

Massey finished 63rd out of 109 with a score of 212.8
The AGS Community bracket finished 41st with 254.1 points. The community bracket was an "average" of all of the brackets submitted.

To put it another way...Massey (at least) isn't even close to the average FCS fan on AGS - most of whom don't follow the FCS outside of their team/conference worth dick.

Is that not statistical enough?

Post by **kalm** » Mon Jan 11, 2016 8:51 pm

clenz wrote:
JohnStOnge wrote:
Guys, I just used randomly picking teams as an illustration of what happens when you do a bunch of comparisons at once. Let's see if I can explain this. If you throw a bunch of systems into the pot and then look after the fact, random chance means that, if the systems are at all close in how good they are, any particular system you start off looking at will likely not be near the top. Could be. But probably not.

What you need to do is pick two systems ahead of time, before the run starts, then compare just those two. So what you need to do is do something like pick the best human predictor on AGS. If you think that's the one that happened to end up on top this year, pick that one. Then next year compare JUST that guy (or girl) to how Sagarin does.

Then you do some kind of statistical test to see if there is a "significant" difference. If you start off with something like 120 you're not going to get a "significant" difference because you have to adjust the significance level for multiple comparisons because of that random chance thing. For example: Let's say you do 120 comparisons and you see that the best guy would be better than Sagarin at 95% confidence if those were the only two you compared. If you compare every pair among 120 "systems" that's 7,140 possible pairwise comparisons. To make a long story short that means instead of having 95 percent confidence you've essentially got 0 percent confidence. Something like 1 x 10 to the minus 157 percent confidence. You've got nothing in terms of trying to decide if one system is better than another in terms of predicting outcomes.
Oh...you mean like this?

http://www.anygivensaturday.com/showthr ... -Challenge

Scoring
Official scoring process:
Each correct pick in a given round will be given the following points:
1st Round - 15
2nd Round - 15
Quarterfinals - 30
Semifinals - 50
Finals - 100

The scores will be used for deductions or bonuses from each total. If an exact score is guessed, there will be a ten point bonus added. There will be a tenth of a point deduction for each point a prediction for each separate team is off the actual score for that given team. This is not how many points your combined score was off the real combined score. For example, joe picked A over B 27-14. The final score was A winning 24-17. The combined scores were both 41, but joe was 3 off A's final score and 3 off B's final score. Therefore, .6 will be deducted from his points for that round. If B had won that game, joe simply would not have gotten points for that respective game.

First round scoring example:
A over B - 27-14
A beat B 24-17 15 points for the win minus .6 = 14.4
C over D - 31-10
C beat D 24-10 15 points for the win minus .7 = 14.3
E over F - 14-7
F beat E 17-14 no points awarded for loss
G over H - 24-21
G beat H 24-21 15 points for the win plus 10 for nailing the score = 25

14.4 + 14.3 + 0 + 25 = 53.7 for Round one.
So, every game/prediction/possibility was entered BEFORE the playoffs started.

There were 109 entries. Results in the spoiler

1 MarkyMark 442.9
2 MTfan4life 428
3 mamberso 403.9
4 gumby013 391.1
5 TheKingpin28 389.6
6 footballer23 388
7 Milkman 381.2
8 DaBizon 377.5
9 thebootfitter 375.2
9 VT Wildcat Fan53 375.2
11 Mayville Bison 373.5
12 citdog 371.5
13 BisonTru 370
14 stevdock 360.4
15 RabidRabbit 359.9
16 jmufan999 357.9
17 IBleedYellow 356.5
18 The Yo Show 345.1
19 da_Bison 343.9
20 NDSUtk 342.1
21 seattlespider 330.1
22 herd13 328.6
23 Bison56 318.2
24 Missingnumber7 314.7
25 swaghook 311.9
26 Winindy 302.8
27 Bear84 292.8
28 BisonFan02 290.3
29 SUPharmacist 290.2
30 JSUSoutherner 289.3
31 Terry2889 289.1
32 Twentysix 288.5
33 Rollbird5 286.2
34 UIWWildthing 285.1
35 Gangtackle11 281.8
36 jacoj21 275.4
37 taper 273.1
38 MR. CHICKEN 263.4
39 FordhamFan 261.2
40 ST_Lawson 255.6
41 AGS Community 254.1
42 Bisonator 253.8
43 Prime Power 249.2
44 Lehigh'98 248.1
45 Redbird74 246.7
46 bisoninloveland 242.2
47 FUBeAR 240.8
48 THE DANIMAL 240.1
49 Griz23 237.2
50 PantherRob82 237
51 Samalum'10 234.1
52 ElCid 229.1
53 Lehigh Football Nation 228.2
54 CappinHard 227.3
55 UNHWildCats 226.6
56 FCSbuff319 225.4
57 jsualumnus 223.7
58 SportsLover 223.3
59 veinup 222
60 OhioHen 217
61 CitadelGrad 215.5
62 ming01 213.7
63 Massey 212.8
64 HailSzczur 210.8
65 cpacmel 210.6
66 JMUNJ08 210.1
67 gotts 209.5
68 WileECoyote06 208.1
69 kalm 202.6
70 Catsfan90 201.5
71 realloser 201.3
72 mmiller_34 200.6
73 KPSUL 199.6
74 grizband 199
75 melloware13 197.7
76 Hambone 197.4
77 dbackjon 195.7
78 tomq04 188.2
79 FCSwatcher 186.2
80 ngineer 184
81 Professor Chaos 183.3
82 bobcathpdevil56 182.4
83 CasualFan 181.8
84 LehighU11 180.9
85 knucklehead 180.6
86 leatherneck177 177.4
87 Original_RMC 174.9
88 World 169.7
89 McNeese75 168.8
90 smilo 168.2
91 JMU2K_DukeDawg 168.2
92 Loyl2u 167.8
93 Daytripper 163.2
94 Catbooster 159.4
95 crusader11 157
96 Casey_Orourke 156.3
97 Go Lehigh TU owl 152.4
98 4grz 146.7
99 bjtheflamesfan 146.6
100 Nodak78 141.1
101 hktribefan 137.5
102 msupokes1 134.3
103 UNHFan@RWU 131.9
104 jmu007 131.2
105 Drblankstare 130.1
106 SkinsWizDukes 129.2
107 Thumper 76 128.3
108 wmmii 127.7
109 WrenFGun 97.9
I made a couple of interesting scores stand out

Massey finished 63rd out of 109 with a score of 212.8
The AGS Community bracket finished 41st with 254.1 points. The community bracket was an "average" of all of the brackets submitted.

To put it another way...Massey (at least) isn't even close to the average FCS fan on AGS - most of whom don't follow the FCS outside of their team/conference worth dick.

Is that not statistical enough?

KAAAAABOOOOOOM!

(I almost beat Massey and put all of 10 minutes into it)

JohnStOnge · Post by **JohnStOnge** » Mon Jan 11, 2016 9:12 pm

Massey openly says his system isn't designed for prediction. I actually had an email exchange with him on that a few years back. I think that the best ranking system is the one that predicts most accurately. He does not. He thinks "retrodiction," or getting closest to explaining past results, is the best way.

Otherwise: The important point isn't that the rules are specified ahead of time. The important point is that there are too many comparisons to allow for making an assessment as to whether any particular system or person is better than another one.

I mean, it's fine for a "contest." If you win the contest you win the contest. But if what you're trying to do is test to see if one is better than the other it's too many comparisons.

If you really want to test Sagarin vs. human knowledge pick ONE human you decide is really good before the games transpire and make a single comparison. If you put a bunch of humans in you are fudging it.

So like next year maybe you would take MarkyMark. And you'd lay out the games ahead of time each week and get his predictions then document Sagarin predictions. So on and so forth.

Post by **kalm** » Mon Jan 11, 2016 9:20 pm

JohnStOnge wrote:Massey openly says his system isn't designed for prediction. I actually had an e mail exchange with him on that a few years back. I think that the best ranking system is the one that predicts most accurately. He does not. He thinks "retrodiction," or getting closest to explaining past results, is the best way.

Otherwise: The important point isn't that the rules are specified ahead of time. The important point is that there are too many comparisons to allow for making an assessment as to whether any particular system or person is better than another one.

I mean, it's fine for a "contest." If you win the contest you win the contest. But if what you're trying to do is test to see if one is better than the other it's too many comparisons.

If you really want to test Sagarin vs. human knowledge pick ONE human you decide is really good before the games transpire and make a single comparison. If you put a bunch of humans in you are fudging it.

I mean yeah, sure, if all you're trying to do is take a season of evidence and try and figure who the best teams are and who's most likely to make it through the brackets, I SUPPOSE you have a point, but...

Post by **clenz** » Mon Jan 11, 2016 9:28 pm

JohnStOnge wrote:Massey openly says his system isn't designed for prediction. I actually had an e mail exchange with him on that a few years back. I think that the best ranking system is the one that predicts most accurately. He does not. He thinks "retrodiction," or getting closest to explaining past results, is the best way.

Otherwise: The important point isn't that the rules are specified ahead of time. The important point is that there are too many comparisons to allow for making an assessment as to whether any particular system or person is better than another one.

I mean, it's fine for a "contest." If you win the contest you win the contest. But if what you're trying to do is test to see if one is better than the other it's too many comparisons.

If you really want to test Sagarin vs. human knowledge pick ONE human you decide is really good before the games transpire and make a single comparison. If you put a bunch of humans in you are fudging it.

So like next year maybe you would take MarkyMark. And you'd lay out the games ahead of time each week and get his predictions then document Sagarin predictions. So on and so forth.

So then what my contest is?

Compare him to only the top person in my pick contest?

Okay...

I haven't looked at this year's numbers, BUT last year my top picker (and he finishes top 5 pretty much every year) finished with 276 of 325 regular season points, or 85% (25 games per week: all top 25 games and then however many FCS-only games are needed to get to 25 total games, usually 4 or 5). Sagarin finished with 198 points, or 60%.

Once again, of the 32 players that submitted every week last season, Sagarin finished 22nd. Point spread doesn't matter.

What about playoff games? Each round, due to fewer games, is worth 40 points. If there are 8 games, each game is worth 5. If there are 4 games, each is worth 10. With 2 games each is worth 20, and the title game is worth 40. For Sagarin's ratings I set up a confidence system (if Team A is projected to win by between X and Y points, it's worth a confidence of Z points).

Playoffs are worth a total of 200 potential points (if you wager full points each round). The top performer gained 179 of 200 points. Sagarin? 98. Community average? 141

Go ahead, what's the flaw with that system?

JohnStOnge · Post by **JohnStOnge** » Mon Jan 11, 2016 9:35 pm

I mean yeah, sure, if all you're trying to do is take a season of evidence and try and figure who the best teams are and who's most likely to make it through the brackets, I SUPPOSE you have a point, but...

No it's really not just that. It's the effect of having a bunch of coin flips. Let's say you have a person that is really good at picking games. If you have enough people just pick games by flipping coins you are going to reach a point where it's almost certain that some of them are going to beat this person in some set of games he picks when all they are doing is flipping coins to pick.

89Hen · Post by **89Hen** » Tue Jan 12, 2016 7:47 am

JohnStOnge · Post by **JohnStOnge** » Sat Jan 23, 2016 8:38 am

I just finally actually looked at the details of the scoring system for the AGS contest. In my opinion that's not really a system for evaluating predictive accuracy. Just the fact that it puts different weights on different games means it's not going to do that.

JohnStOnge · Post by **JohnStOnge** » Sat Jan 23, 2016 8:53 am

Go **** yourself JSO. I've asked you for YEARS to put up Sagarin or any other computer's picks against the GoHens Top 25 pool (a MUCH bigger sampling) but you refuse year after year.

You typed that before I said anything about the multiple comparisons but, as I said, it's very unlikely that you can reach any firm conclusion about whether or not one system or person is better than another one when you just throw a bunch of different systems or people into the pot then look for who did best etc., after the fact.

In order to determine whether or not it's likely that the difference in performance between two people or systems is because one is better than the other rather than just chance, you have to do some kind of statistical hypothesis test. And when you do more than one hypothesis test, the chance of a false positive goes up unless you adjust the significance level, and adjusting it costs you power. I'm not making this up. For instance, you can see the issue discussed here:

http://www.stat.berkeley.edu/~mgoldman/Section0402.pdf

So, with 20 tests being considered, we have a 64% chance of observing at least one significant result, even if all of the tests are actually not significant. In genomics and other biology-related fields, it's not unusual for the number of simultaneous tests to be quite a bit larger than 20... and the probability of getting a significant result simply due to chance keeps going up.
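The 64% figure in that passage follows from a one-line calculation, sketched here under the assumption that the tests are independent and each uses a 5 percent significance level. The function name is hypothetical.

```python
def chance_of_false_positive(tests, alpha=0.05):
    """P(at least one 'significant' result) when no real effect exists."""
    return 1 - (1 - alpha) ** tests

for m in (1, 20, 120):
    print(m, round(chance_of_false_positive(m), 3))
```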

So if you want to do the Go Hens thing you need to pick who you think is the best person who participates in that ahead of time then have that person go against Sagarin one on one.
