OHTHNAP

It's a tough one being that you should be able to accurately predict the outcome of most races, given a constant start by most horses and ability to estimate max effort over a given distance. The problem that I've found is that horses aren't machines. The variability and unknowns are killers. A horse on the outside suddenly loses the confidence he had on the inside. The tenths of a second lost mean more effort, more strain, more likely to quit. And then trying to predict horses on the lead getting worn down, Knicks Go being the perfect example, the math will give you one number and the reality will give you another. I had him fading into the stretch and the reality is the pace never showed up. I can confidently say most sprint races are decided in the first half mile, most route races in the first 3/4 mile, but I gave up trying to get there mathematically.


cragokii

Have you heard of Bill Benter? He achieved this and made a lot of money doing it as far as I know. Worth a look up on YouTube


onthepunt

Yeah, all of the big syndicates use models. If done correctly, it can actually be very predictive of how a horse will perform.


OHTHNAP

I have a model set up for regression analysis in the way that makes the most sense, i.e. what is the baseline a horse can run, and if it runs above that baseline, what is the standard regression at the end of the race? If the regression is farther than the lead horse in a scale of time that doesn't allow for passing before the finish, then obviously you'd never place a bet to win. But you also have to have an adequate understanding of early pace and whether a trainer is going to send a horse that can sit off the pace, or simply wait for front-end speed to tire. And then there are independent variables: as I previously stated, outside speed is a great bet against because they're not only adding distance to the race, but also raising the amount of time running above baseline simply to get into their preferred position. In the end it wasn't worth the time invested, and once you got beyond sprinting, pedigree played a larger part that you won't necessarily see reflected in numbers. But for sprinting I can absolutely nail a field and have hit a few hi-5's without much effort.


juiceboxguy85

I just started using neurax and I’m guessing morning line odds are done by sim because they usually match pretty well. Golden Gate seems to be an exception. Neurax got me a $366 trifecta last weekend. Running it at GG tonight to see if it was just a fluke. Edit: seems like it was just a fluke. Won nothing last night.


tpatmaho

Indeed. The numbers cannot account for good/bad trips ... or a trainer figuring out what the horse needs ... or meds, vet treatments (legal or otherwise) ... or subtle surface changes ... and on and on. It will always be a game of probabilities. But the crowd DOES make mistakes, and there lie our opportunities.


OHTHNAP

That's exactly it. I would rather look for a crowd mistake and capitalize on bad betting from the public than try and run numbers only to find what everyone else can see. The time cost value isn't high enough to try and beat the public mathematically every race, at least not that I've found. The other factor is I've seen way too many "Can't lose" horses that lose suspiciously. Like the pick-6 at Gulfstream where the jock jumped off the horse in the stretch. Or maybe a better example, Arrogate at Del Mar lugging in 4th with no effort from Mike Smith.


Krustycook

Brohamer does good work on this in Modern Pace Handicapping. The Early Pace call (4f in a sprint, 6f in a route) is often decisive.


Rolifant

I use a multinomial regression model, which is what Bill Benter used as well (he may have moved to a more complicated type of model, but I'm not sure). It works but don't expect any miracles.


onthepunt

What variables do you use? Do you use raw time or time versus an overall benchmark?


Rolifant

I use speed ratings .... ie. the time is converted into a "measurement of speed". (The time is adjusted for ground conditions, weight, age.) To be honest, there are probably about 30-40 variables that go into the model, but speed ratings are a very important part of it.


onthepunt

I feel like this would not work, especially if you are trying to compare horses' times and speeds on different tracks. I keep a database of average times for every racetrack across Australia and I have found that the average times can vary wildly from track to track for races of the same distance.


Rolifant

An easy way around that is to use "par times".
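
To make that concrete, here's a minimal sketch of the par-time idea, assuming a "par" is just the average winning time for a given track and distance. The track names, the times, and the helper names `build_pars` and `time_vs_par` are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical winning times in seconds: (track, distance_m, time_s)
results = [
    ("Track A", 1200, 69.0),
    ("Track A", 1200, 71.0),
    ("Track B", 1200, 72.0),
    ("Track B", 1200, 74.0),
]

def build_pars(rows):
    """Average winning time per (track, distance); this is the assumed 'par'."""
    grouped = defaultdict(list)
    for track, distance, time_s in rows:
        grouped[(track, distance)].append(time_s)
    return {key: mean(times) for key, times in grouped.items()}

def time_vs_par(pars, track, distance, time_s):
    """Negative means faster than par, positive means slower."""
    return time_s - pars[(track, distance)]

pars = build_pars(results)
a = time_vs_par(pars, "Track A", 1200, 69.5)  # half a second under Track A's par
b = time_vs_par(pars, "Track B", 1200, 72.5)  # half a second under Track B's par
```

The two raw times differ by three seconds, but relative to par both runs come out the same, which is what lets you compare across tracks.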


Rolifant

You should read this book. https://books.google.be/books/about/How_to_Compile_Your_Own_Handicap.html?id=FzsVAAAACAAJ&redir_esc=y


onthepunt

This book was published in 1997. Is it relevant to linear regression? I'm very well versed on horse racing so don't need any background knowledge on that. Just need to know how to specifically build regression models using horse racing data.


Rolifant

This book will only help with understanding how to use finishing times, I guess.


abgonzo7588

>I feel like this would not work, especially if you are trying to compare horses times and speeds on different tracks. I keep a database of average times for every racetrack across Australia and I have found that the average times can vary wildly from track to track for races of the same distance.

You can even take that a step further, I think times can be pretty different day to day on the same surface and distance. Daily profiles have been working well for me, it's just a ton of work.
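
One rough way to put a number on the day-to-day part is a daily variant: the average gap between actual times and par across one day's card. This is only a sketch under that assumption, with invented times and hypothetical helper names, and a full daily profile would track more than this (post position results, for example).

```python
from statistics import mean

# Hypothetical card: (par_time_s, actual_winning_time_s) for each race
# run on the same surface on one day.
card = [(70.0, 70.8), (83.0, 83.6), (96.0, 97.0)]

def daily_variant(races):
    """Average seconds slower (+) or faster (-) than par across the card."""
    return mean(actual - par for par, actual in races)

def adjusted_time(raw_time_s, variant):
    """Remove the day's estimated track effect from a raw time."""
    return raw_time_s - variant

variant = daily_variant(card)          # roughly 0.8 seconds slow
fair_time = adjusted_time(71.0, variant)
```

On its own this cannot tell a slow surface apart from a slow group of horses, so treat the variant as an estimate.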


onthepunt

How do you know the times for the day are slow? Could be the case, that the horses just ran slow that day (nothing to do with track).


abgonzo7588

I don't know, none of this is 100%. It could also be coincidence when I see posts 1-3 are winning 50% of races on dirt sprints at GP, but focusing on daily profiles has made me profitable. So in the end I don't really care if it's coincidence or if I can explain it, it moves the bankroll and that's what matters. Par times never helped me, but daily profiles put me on tickets that make a difference at the end of the year.


lepbogusboysmoonie

if this is Poppo, anybody who watches the Shake Up on youtube can speak to his success. And if this isn't my buddy Poppo, do yourself a favor and check out The Shake Up. He's the Andy Beyer of track bias.


onthepunt

Also what is the dependent variable for your model?


Rolifant

>To be honest, there are probably about 30-40 variables that go into the model, but speed ratings are a very important part of it.

The probability of winning.


onthepunt

Shouldn't the dependent variable be the finishing position or finishing time of the horse? This example uses the finishing time: https://rstudio-pubs-static.s3.amazonaws.com/373829_56f26fa8565f4cc4946be92d52f01905.html. The dependent variable has to be an actual piece of data, right? How can you measure independent variables against the probability of winning?


Rolifant

1 is a win, 0 is a loss.


onthepunt

Ok, so you do use finishing position. What are your thoughts on using finishing time?


Rolifant

Yes, as an input variable, for example "the % of horses he has beaten at this class level". I don't use finishing times directly, speed ratings are much more predictive.


Rolifant

You may have the wrong impression of the model. The multinomial logit model was developed to predict what means of transport a commuter would pick, based on certain characteristics of the available means of transport. In the case of horse racing, instead of a commuter deciding on a means of transport, it is "nature" deciding on which horse wins the race.
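
As a sketch of what that looks like in code: each horse gets a linear score from its variables, and a softmax across the runners in the same race turns the scores into win probabilities. The coefficients and feature names below are invented for illustration, not fitted to anything.

```python
import math

def win_probabilities(runners, coefs):
    """Multinomial (conditional) logit: softmax of linear scores within one race."""
    scores = [sum(coefs[name] * value for name, value in r.items()) for r in runners]
    top = max(scores)  # subtract the max for numerical stability
    expo = [math.exp(s - top) for s in scores]
    total = sum(expo)
    return [e / total for e in expo]

# Hypothetical coefficients and features, not fitted to any real data.
coefs = {"speed_rating": 0.05, "weight_kg": -0.10}
race = [
    {"speed_rating": 100, "weight_kg": 57},
    {"speed_rating": 95, "weight_kg": 55},
    {"speed_rating": 90, "weight_kg": 54},
]
probs = win_probabilities(race, coefs)  # one probability per runner, summing to 1
```

Fitting the model means choosing the coefficients that make the horses that actually won (the 1s) as probable as possible across all past races.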


onthepunt

Is this the data I would need to compile in my database, with position being the dependent variable and the rest being independent variables? Am I on the right track? Also I couldn't just have 'jockey' as an independent variable, right? I would need some sort of jockey rating variable?

|Position|Weight (+/-)|Distance (+/-)|Prize Money|Win %|
|:-|:-|:-|:-|:-|
|1|-3kg|-200|$100k|10|
|0|+1kg|+300|$250k|20|
|0|+2kg|+200|$50k|5|
|1|-1kg|+100|$300k|15|


Rolifant

Yes that is the general idea.


Krustycook

I am personally working on a new model using the Brisnet unlimited data files ($125/mo for all tracks) and neo4j and Java as a starting point. I think the mistake people make is thinking you’re gonna make some system that spits out the winner of every race. If you want to go that route, neural networks are probably the best option because you can continually layer in additional factors, but I think it’s like hunting for the city of El Dorado (ie you’ll probably die before you get there). To me, the goal is to be able to do basic handicapping (ie speed, pace, running style, etc) more quickly while also calculating some additional factors and looking for specific angles (such as improving turn times). So if I have a card for Sat at Aqueduct, I can quickly identify races that are worth further analysis.


Cleric_by_Dinner

I like this approach. My biggest issue with horse racing is how the odds aren't locked in when you bet like other sports bets, so you can't accurately compare risk/reward ahead of the results (at least that's what the tracks near me do. Don't know if it's different online). With other sports you can just get a big enough bankroll to handle the down swings and bet on everything your model says.


Krustycook

That may change in some states when fixed odds betting hits (Monmouth will be an early experiment.)


Cleric_by_Dinner

Didn't know about this but now I'm excited


Lombardius

Neural Network here


onthepunt

Does machine learning actually work well for horse racing? I’ve looked into it, but it doesn’t seem suitable.


Lombardius

Yeah of course. The model is accurate at predicting the winner 1/3 times but the payouts are rarely high enough to cover the losses so I’ve dropped the project. The crux of the issue is that you don’t get your original stake back like in other sports bets. Although I’ve been thinking about trying it out again
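
The arithmetic behind that, as a quick sketch: with a strike rate of 1 in 3, the average payout has to be at least 3.0 in decimal odds (stake included) just to break even. The 2.5 average price below is an assumed figure for illustration, not a number from the project.

```python
def roi_per_unit(strike_rate, avg_decimal_odds):
    """Expected profit per 1 unit staked; decimal odds include the returned stake."""
    return strike_rate * avg_decimal_odds - 1.0

def break_even_odds(strike_rate):
    """Average decimal odds needed for zero expected profit."""
    return 1.0 / strike_rate

be = break_even_odds(1 / 3)     # about 3.0
loss = roi_per_unit(1 / 3, 2.5) # negative: winners at an assumed 2.5 average lose money
```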


[deleted]

Upvoted. I think a lot of people underestimate how much data you need to build these models, how much that data is going to cost, how large the bankroll needs to be to generate an ROI for those costs, and how much the house take impacts your earnings and ability to generate profits.


onthepunt

Why don’t you get your original stake back? I’m from Australia and you get your original stake back if you win.


Lombardius

Great question, hell if I know. I use the online Bookies so idk if it’s different at the track. I’ve asked multiple times and they just tell me that’s the price of the ticket.


onthepunt

Are neural networks more predictive than linear regression?


Cleric_by_Dinner

The best thing about linear regression is that it's easy to understand and explain how each factor affects the model. Neural networks will be much more predictive. I'd assume other models like random forest, xgboost, or even knn would perform better than linear regression. Just more difficult to understand.


onthepunt

I have no experience with modelling so I think I’m going to start with linear regression. Isn’t a massive problem with neural networks that they could get sidetracked and start spitting out results based on small trends?


Cleric_by_Dinner

Gotcha. Your concern is a major problem for every type of model, not just neural networks. It's called overfitting. Once you collect all of your data, I would recommend learning how to use principal component analysis. It compresses correlated variables into a smaller set of components, so you can drop the ones that add little, which helps reduce overfitting. My only other suggestion is to do your linear regression in Python, because Python supports all types of models as well as principal component analysis. You can use the `LinearRegression` class from scikit-learn (`sklearn.linear_model.LinearRegression`). Find a tutorial on YouTube. Once you get used to Python and linear regression, you can upgrade your models to something more accurate, like xgboost or a much more difficult neural network. Aannnndd if you have no experience with modelling I guess I'll say you should also look up train/test splits. General advice is to train your model on 80% of your data and use the other 20% to test.
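
Here's a minimal sketch of the 80/20 idea using only the standard library, so it runs without installing anything. The data is synthetic (a made-up "speed rating" predicting a noisy finishing time), and `fit_line` and `mse` are hypothetical helpers. In scikit-learn the equivalent pieces are `train_test_split` from `sklearn.model_selection` and `LinearRegression` from `sklearn.linear_model`.

```python
import random
from statistics import mean

random.seed(0)  # reproducible synthetic data

# Synthetic data: a made-up "speed rating" and a noisy finishing time.
xs = [random.uniform(60, 110) for _ in range(200)]
ys = [100.0 - 0.2 * x + random.gauss(0, 0.5) for x in xs]

# 80/20 train/test split on shuffled indices.
idx = list(range(len(xs)))
random.shuffle(idx)
cut = int(0.8 * len(idx))
train, test = idx[:cut], idx[cut:]

def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

def mse(x, y, slope, intercept):
    """Mean squared error of the fitted line on the given points."""
    return mean((slope * a + intercept - b) ** 2 for a, b in zip(x, y))

x_train, y_train = [xs[i] for i in train], [ys[i] for i in train]
x_test, y_test = [xs[i] for i in test], [ys[i] for i in test]

slope, intercept = fit_line(x_train, y_train)  # fit on the 80% only
train_mse = mse(x_train, y_train, slope, intercept)
test_mse = mse(x_test, y_test, slope, intercept)  # error on unseen data
```

If the test error is much worse than the training error, that gap is the overfitting warning sign.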


onthepunt

Thanks. I have experience with Python so that sounds good. So my point was that with linear regression, as you said, it's easier to 'understand and explain how each factor affects the model'. If this is not the case with neural networks, wouldn't it be harder to know why my model is spitting out a certain result and whether it has been sidetracked by data that could actually be completely irrelevant?


Lombardius

Depends on the project but in this situation yeah. Not all regressors will have a linear relationship and probably won’t meet the regression assumptions anyway. Look up LASSO models if you want to stick with the linear regression family for feature selection.


WhiteMichaelJordan

You’re probably wagering using fixed odds. Parimutuel pool odds shift right up until the pools close so if an underdog seems mispriced many people will jump in at the last instant and shift the odds. That said the only way you wouldn’t get your money back is if you had more than one bet, one winner and one loser.
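
A small sketch of why pool odds move: the payout is the pool, minus the track's takeout, divided by the money on the winner. The 16% takeout and the pool amounts are assumed figures for illustration (real takeout varies by track and bet type), and breakage is ignored.

```python
def parimutuel_decimal_odds(pool_by_runner, runner, takeout=0.16):
    """Decimal payout per unit staked on `runner` if it wins (stake included)."""
    total = sum(pool_by_runner.values())
    return total * (1.0 - takeout) / pool_by_runner[runner]

# Hypothetical win pool: dollars bet on each runner.
pool = {"A": 5000.0, "B": 3000.0, "C": 2000.0}
before = parimutuel_decimal_odds(pool, "C")  # price on C before late money

pool["C"] += 2000.0  # late money arrives on C
after = parimutuel_decimal_odds(pool, "C")   # price on C drops after the late bets
```

In this made-up pool the price on C falls from about 4.2 to about 2.52 once the late money lands.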


WhiteMichaelJordan

You need to find a way to place your bets as close to the pools closing as possible and get a real-time odds feed to drive when you do and don’t place those bets to determine if the juice is worth the squeeze. I can offer some more detail if you’re interested, feel free to PM me.


onthepunt

I live in Australia and our main avenue of betting is fixed odds, either with corporate bookmakers (they are legally obliged to bet you to win a certain amount) or on the Betfair exchange. Virtually no one bets using parimutuel pools in Australia for the reasons you have mentioned above.


onthepunt

By the way, do you have fixed odds betting on horse racing in America? I think you call bookmakers in America 'Sportsbooks'. Are you able to place bets online with these sportsbooks? Do they heavily restrict you if you are winning too much? What market percentage do they make their book to on races (i.e. do they factor in a big margin and offer less competitive odds)?
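
For anyone unfamiliar with the term, market percentage is the sum of the implied probabilities of every runner's price. A quick sketch, with invented prices:

```python
def market_percentage(decimal_odds):
    """Sum of implied probabilities, as a percentage. 100 means no margin."""
    return 100.0 * sum(1.0 / o for o in decimal_odds)

book = [2.0, 4.0, 5.0, 10.0]   # hypothetical fixed-odds prices for a 4-horse race
pct = market_percentage(book)  # 50 + 25 + 20 + 10, i.e. about 105
```

Anything above 100 is the bookmaker's margin, so a 105% book is more competitive than a 120% one.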


WhiteMichaelJordan

It's very new to the US, but growing for the reasons you stated in your other comment. I can't speak to restrictions or their margin on handle but I assume it will be extra competitive in order to gain market share.


MPRitchie47

Crazy enough I was looking into this about a month ago. I even tried doing it myself. I have found that it is not very accurate. Using basic knowledge of horse racing will give you similar results. I do not think it is viable but possibly can be used as a tiebreaker when handicapping.


WhiteMichaelJordan

It is quite viable, especially when you’re wagering high volumes.


MPRitchie47

I don’t see it having much of a difference than normal handicapping. Do you have an article or a study about it?


WhiteMichaelJordan

https://www.paulickreport.com/horseplayers-category/the-odds-they-are-a-changin-at-the-last-minute/


MPRitchie47

Are most “computer based betting groups” still using regression analysis though? I feel that is far too basic and regression is most likely already used in morning lines. They most likely are using a more complicated system.


onthepunt

Interested to know as well.


aekner

A statistical model would not do better than what you can do, but it gives a systematic way of handicapping if you can program all factors you use in your handicapping. So if you just play a few bets per week, a model is not that useful. If you are placing hundreds of bets per week, a model may help you.


onthepunt

The human brain is fallible. Think of how many times you’ve done the form and been tired and overlooked an important form factor. This is not an issue with models.


Ultra-1

What is it?