Re: Steamsharp Chronicles.
Posted: Tue Sep 10, 2013 10:20 am
matty, DM me your Gmail on Twitter and I'll pop you some data. Believe it or not, the window of data is relevant: pitchers go on hot and cold streaks, and there is a functional relationship between the length of the data window and the predictive efficiency of the historical data set. To be fair, I didn't explain this: we take the last 90 innings and use that data to model a distribution, which then feeds a copula-driven Monte Carlo. So really we are extrapolating hundreds of thousands of data points (random inning stat vectors) from a generator data set built from the last "x" innings. We do this, critically, to mimic the stochastic behavior of the on-field stats with bounded random numbers.
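For anyone who wants to see the shape of a copula-driven Monte Carlo, here is a rough sketch in Python rather than MATLAB. It is not our production model. It assumes just two per-inning stats (hits and walks), a Gaussian copula, rank-based normal scores to estimate the dependence, and empirical quantiles for the marginals. The names `simulate_innings`, `normal_scores` and `empirical_quantile` are made up for illustration.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for cdf / inverse cdf


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def normal_scores(values):
    """Map each value to a standard-normal score via its rank.

    Ties are broken by position, which is crude but fine for a sketch.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def empirical_quantile(sorted_vals, u):
    """Return the observed value at quantile u (0 <= u <= 1)."""
    idx = min(int(u * len(sorted_vals)), len(sorted_vals) - 1)
    return sorted_vals[idx]


def simulate_innings(hits, walks, n_sims, seed=0):
    """Draw n_sims (hits, walks) vectors from a Gaussian copula
    fitted to the observed window of innings (hypothetical helper)."""
    rng = random.Random(seed)
    # Dependence between the two stats, estimated on the normal scale.
    rho = pearson(normal_scores(hits), normal_scores(walks))
    rho = max(-0.999, min(0.999, rho))
    sorted_hits, sorted_walks = sorted(hits), sorted(walks)
    sims = []
    for _ in range(n_sims):
        # Correlated standard normals (2x2 Cholesky written out by hand).
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + (1.0 - rho * rho) ** 0.5 * rng.gauss(0.0, 1.0)
        # Push through the normal cdf to get correlated uniforms,
        # then through the empirical marginals, so every simulated
        # value stays inside what was actually observed in the window.
        u1, u2 = STD_NORMAL.cdf(z1), STD_NORMAL.cdf(z2)
        sims.append((empirical_quantile(sorted_hits, u1),
                     empirical_quantile(sorted_walks, u2)))
    return sims
```

Feed it the per-inning hits and walks from the last x innings and ask for a few hundred thousand draws. Because the marginals are empirical, the random vectors are bounded by the observed stats, which is the "bounded random numbers" idea.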
x = 45 innings
x = 90 innings
x = 180 innings
As x gets really big, the data you use to generate expected numbers gets worse after a point, and a simple fminsearch call in MATLAB finds us the optimal x.
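A toy version of the window search, again in Python and again only a sketch. Two loud assumptions: it swaps fminsearch (Nelder-Mead) for a plain grid search over the candidate windows, and it scores each window with a stand-in objective (squared error of a rolling-mean forecast), since the real objective isn't spelled out here. `backtest_error` and `best_window` are hypothetical names.

```python
def backtest_error(series, window, start):
    """Mean squared error when each inning from `start` onward is
    forecast by the mean of the previous `window` innings.

    Stand-in objective for illustration only.
    """
    errors = [
        (series[t] - sum(series[t - window:t]) / window) ** 2
        for t in range(start, len(series))
    ]
    return sum(errors) / len(errors)


def best_window(series, candidates):
    """Grid search: return the candidate window with the lowest error.

    Every window is scored on the same innings (from the largest
    candidate onward) so the comparison is fair.
    """
    start = max(candidates)
    if start >= len(series):
        raise ValueError("series is too short for the largest window")
    return min(candidates, key=lambda x: backtest_error(series, x, start))
```

Usage would be something like `best_window(per_inning_stat, [45, 90, 180])`, where `per_inning_stat` is whatever single stat you are scoring on. A grid is enough when you only have a handful of candidate windows; a simplex search like fminsearch makes more sense if you treat x as continuous.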
You do this to get a good set of 500k or so random vectors bounded by the machine learning model parameters, then plug those into the machine learning algo, which does its thing and classifies the results, which are then cast into WEs.
So to sum up: we aren't bound by a small sample size. We are modelling baseball as a stochastic process and using the work of our peers in machine learning to classify the results.
This is pretty much standard modelling work in investment banking these days, in the derivatives market.
Caveat: some people blame the Gaussian copula for the 2008 financial collapse. It's funny to google it and see people rant and rave about a math formula.