Author: Paraguay (Paraguay)
Board: IVERSON
Title: Fw: [Info] Career Leaders and Records for HOF
Time: Wed Mar 21 10:16:40 2012
※ [This post was cross-posted from the NBA board #1FQJTE_x ]
Author: Alfred (Keine Ahnung) Board: NBA
Title: Re: [Info] Career Leaders and Records for HOF
Time: Wed Mar 21 10:05:00 2012
Every so often the Hall of Fame debate gets dragged out again,
and unfortunately the arguments usually stay at a fairly superficial level.
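In fact, the board already has material on this.
Just search for titles containing 「名人堂」 ("Hall of Fame") and you will find:
#1DcjW7JP (NBA)
That post gives a short introduction to the Basketball Hall of Fame and also explains how inductees are selected.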
As for the Hall of Fame probabilities on basketball-reference.com, that is an ancient topic too;
back in 2008 the post
#17jgq_6- (NBA) already analyzed them.
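Note that the 2008 model differs from the current one:
http://www.basketball-reference.com/about/hof_prob.html
Besides adjustments to the coefficients,
the biggest change is that MVP awards were dropped as a predictor,
which pushed Nash's probability from over 90% down to below 60%.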
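(Compared with many of the players near the top of the list, apart from championships,
what hurts Nash most is probably his low points per game.)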
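But before criticizing these numbers, you should first understand how the model was built.
Otherwise you end up in arguments over whether a championship or an MVP is the greater achievement,
or in comments like "just look at player so-and-so and you can tell the model is broken",
which completely miss the point.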
What statistics or accomplishments have the Hall of Fame voters deemed to be
most important? This question can be answered using a technique called
logistic regression. The logistic regression model is a binary response model
where the response is classified as either a "success" (in this case, being
elected to the Hall of Fame) or a "failure" (not being elected to the Hall of
Fame). One or more predictor variables are selected and the resulting model
can be used to predict the probability of a success given certain values of
the predictor(s).
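The logistic model in the quote turns a weighted sum of predictors, z, into a probability p = 1 / (1 + e^(-z)), and a player is then classified as elected when p reaches 0.5. The Python sketch below shows only that mechanism; the function names are hypothetical, and the coefficients and feature values are made-up placeholders, not the values fitted by basketball-reference.com.

```python
import math

def hof_probability(intercept, coefs, features):
    """Logistic model: p = 1 / (1 + exp(-z)), z = intercept + sum(coef_i * x_i)."""
    z = intercept + sum(b * x for b, x in zip(coefs, features))
    # Evaluate the logistic function without overflowing for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_hof(probability, threshold=0.5):
    """Classify as 'elected' when the probability reaches the threshold."""
    return probability >= threshold

# Hypothetical coefficients and stats [points per game, titles], illustration only
p = hof_probability(-4.0, [0.2, 1.0], [18.0, 1.0])
print(f"p = {p:.3f}, predicted elected: {predict_hof(p)}")
```

The coefficients carry the whole model: a positive coefficient raises the probability as that statistic grows, a negative one lowers it.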
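Right at the start, the author states plainly that
the model is meant to explain which factors Hall of Fame voters consider most important.
Its job is therefore to find a mathematical model that can accurately "predict" the voting results.
Note that "predict" here does not carry its everyday meaning:
it is not about forecasting the future,
but about measuring how well the model explains the actual outcomes.
The last paragraph makes this perfectly clear: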
Hall of Fame probabilities are presented for all players with a minimum of
400 NBA games played. Although it can be risky to make predictions for active
players, you can think of these probabilities as answering the question "If
this player retired today, what is the probability he would be elected to the
Hall of Fame?". The model was built using a pool of 750 players. One method
to assess classification accuracy is to compare the estimated Hall of Fame
probability for the case to the actual result. Of the 750 players, 89 had
been elected to the Hall of Fame and 661 had not. If the player's predicted
probability of election was greater than or equal to 0.5, I predicted that he
was in the Hall of Fame. Of the 89 players in the Hall of Fame, 74 were
correctly classified (83.1%) and 15 were not (16.9%). Of the 661 players not
in the Hall of Fame, 651 were correctly classified (98.5%) and 10 were not
(1.5%). Overall, 725 of the 750 players (96.7%) were correctly classified by
the model.
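Among players with at least 400 games played (the author used a pool of 750),
the model predicts that a player is elected whenever his computed probability is at least 50%, and not elected otherwise.
For 725 of these 750 players the prediction matches reality,
so the model's current accuracy is as high as 96.7%.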
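The figures in the quote come from a 2×2 table of actual versus predicted outcomes. The short Python sketch below (the function name is hypothetical) recomputes them from the quoted counts, and also gives the share of players at p ≥ 0.5 who were actually inducted, which is the figure that matters when reading a single player's "above 50%" result. All of these are in-sample numbers: they are measured on the same 750 players the model was built from.

```python
def classification_summary(tp, fn, tn, fp):
    """Summarize a 2x2 confusion matrix for a 0.5-threshold classifier."""
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),  # HOF members correctly classified
        "specificity": tn / (tn + fp),  # non-members correctly classified
        "accuracy": (tp + tn) / total,  # all players correctly classified
        "precision": tp / (tp + fp),    # players at p >= 0.5 who are in the HOF
    }

# Counts quoted above: 74 of 89 members and 651 of 661 non-members correct
summary = classification_summary(tp=74, fn=15, tn=651, fp=10)
for name, value in summary.items():
    print(f"{name}: {value:.1%}")
# sensitivity: 83.1%
# specificity: 98.5%
# accuracy: 96.7%
# precision: 88.1%
```

Because only 89 of the 750 players are members, the overall 96.7% is dominated by the many easy "not elected" cases; sensitivity and precision describe the members more directly.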
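So why did the author drop MVP? And why treat height as an important predictor?
The model's author gives no concrete reason.
In fact, he says himself that he tried numerous models ("trying numerous models"),
so quite possibly the final set of predictors was found simply by trial and error,
all in order to raise the model's accuracy.
If so, arguing over that 0.0001 or over why MVPs are not counted is not very meaningful.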
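The trial-and-error process guessed at above (try many combinations, keep the most accurate) can be sketched as an exhaustive search over predictor subsets. This is only an illustration of that guess, not basketball-reference.com's actual procedure: the toy players, feature names, learning rate and epoch count are all made up, and the logistic fit is plain batch gradient descent.

```python
import itertools
import math

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def linear(b0, coefs, x):
    return b0 + sum(c * v for c, v in zip(coefs, x))

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit intercept + coefficients by batch gradient descent on the log-loss."""
    n, k = len(rows), len(rows[0])
    b0, coefs = 0.0, [0.0] * k
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * k
        for x, y in zip(rows, labels):
            err = sigmoid(linear(b0, coefs, x)) - y  # d(log-loss)/dz
            g0 += err
            for j, v in enumerate(x):
                g[j] += err * v
        b0 -= lr * g0 / n
        coefs = [c - lr * gj / n for c, gj in zip(coefs, g)]
    return b0, coefs

def accuracy(b0, coefs, rows, labels, threshold=0.5):
    """Share of rows whose thresholded prediction matches the label."""
    hits = sum(
        (sigmoid(linear(b0, coefs, x)) >= threshold) == bool(y)
        for x, y in zip(rows, labels)
    )
    return hits / len(rows)

def search_predictor_subsets(data, labels, names):
    """Try every non-empty subset of predictors and keep the most accurate one.

    Accuracy is measured on the same data used for fitting (in-sample);
    on ties the subset found first is kept.
    """
    best = None
    for size in range(1, len(names) + 1):
        for subset in itertools.combinations(range(len(names)), size):
            rows = [[row[j] for j in subset] for row in data]
            b0, coefs = fit_logistic(rows, labels)
            acc = accuracy(b0, coefs, rows, labels)
            if best is None or acc > best[0]:
                best = (acc, [names[j] for j in subset])
    return best

# Made-up toy players (hypothetical): points per game / 10, titles, height in metres
NAMES = ["ppg_div10", "titles", "height_m"]
TOY_DATA = [
    [2.5, 2, 1.98], [1.8, 3, 2.11], [1.2, 2, 2.06], [2.1, 4, 1.91],  # inducted
    [2.4, 0, 2.01], [1.9, 0, 1.85], [1.4, 0, 2.13], [1.1, 0, 1.96],  # not inducted
]
TOY_LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

best_acc, best_names = search_predictor_subsets(TOY_DATA, TOY_LABELS, NAMES)
print(f"best in-sample accuracy {best_acc:.1%} using {best_names}")
```

The search keeps whichever subset scores highest on the data it was fitted to; nothing in the procedure says why voters would care about the surviving predictors, which is exactly the post's complaint.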
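Of course, if the model really was put together this way,
its high accuracy does not necessarily make it meaningful,
because it is hard to be convinced that it actually answers the author's original question,
namely "which factors do Hall of Fame voters consider most important?"
In the end, the author has simply used a set of plausibly related predictors to build a highly accurate black box,
and the model offers no reasonable account of what Hall of Fame voters actually weigh.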
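In conclusion, the model is worth consulting, but only for the probabilities it outputs.
That is, if the model gives a player a better than 50% chance of being elected,
that player's chance of actually being inducted someday is quite high;
with a bit more faith in the model you could even claim better than 90%.
As for the predictors it uses, until the author explains them further,
they have little reference value and are not even worth much discussion,
unless you want to take a shot at building a more accurate model yourself.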
--
※ Posted from: PTT (ptt.cc)
◆ From: 99.125.165.60
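1F:Push RainCityBoy :The shorter the player, the more these numbers favor him 03/21 10:05
2F:Push angelmax :Upvoted for the explanation 03/21 10:06
3F:Push gratitude :And Nash has never played in the Finals either 03/21 10:12
4F:Push Paraguay :Upvoted 03/21 10:16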
※ Cross-posted by: Paraguay (58.114.81.143), Time: 03/21/2012 10:16:40