Evening, sports fans. Hope everyone's having a wonderful weekend.
Before we get to the code, I'm happy to say that 7 out of 10 predictions were correct and the 3 that were wrong were draws!
If we had put £1 single bets on each game, then for our £10 stake, we'd have had £12.86 back. Only time will tell if this 28.6% ROI will continue.
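For anyone who wants to check the arithmetic, here's a minimal sketch of the ROI sum. The function name `roi` is just for illustration and isn't taken from the project code.

```python
def roi(total_staked, total_returned):
    """Return on investment as a fraction of the total stake."""
    return (total_returned - total_staked) / total_staked


stake = 10.00     # ten £1 single bets
returned = 12.86  # total paid back, stakes included
print(f"ROI: {roi(stake, returned):.1%}")  # prints "ROI: 28.6%"
```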
In Part 3, I spoke about limiting how far back the system would look when making its predictions and chose 100 games as a default limit. I've now added a function to backtest different values for this.
The updated code is available on my GitHub.
What I've done is take 60 days of games from a year before the current date and backtest with values from 50 to 500 games, outputting the most successful value.
I've also added a cutoff value for the predicted probability, which decides whether a game is worth betting on. The code sweeps through values for this as well, from 40 to 95.
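The cutoff idea boils down to a simple filter. This is only a sketch: the function name `worth_betting`, the (fixture, probability) tuple layout and the sample fixtures are all made up for illustration, and the real code may store its predictions differently.

```python
def worth_betting(predictions, cutoff):
    """Keep predictions whose probability (in percent) is above the cutoff."""
    return [p for p in predictions if p[1] > cutoff]


# Hypothetical (fixture, probability %) pairs, invented for illustration.
predictions = [("Team A v Team B", 82.0), ("Team C v Team D", 55.5)]
print(worth_betting(predictions, 70))
```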
There was a problem with this approach initially: it would get to 100% accuracy but only suggest betting on 1 game out of 100. In other words, it picked only games that were pretty much foregone conclusions and therefore not worth betting on.
So I've now limited it so that it has to advise betting on at least 1 game out of 10. The backtest reports accuracy somewhere in the region of 70-90%.
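Roughly, the scan with the "at least 1 game in 10" rule looks something like the sketch below. Everything named here is an assumption for illustration: `sweep`, the `backtest(history, cutoff)` callable and the tiny `results` table are stand-ins, not the functions or numbers from the actual program.

```python
def sweep(backtest, histories, cutoffs, min_one_in=10):
    """Return (accuracy, history, cutoff, bets) for the best-scoring pair.

    `backtest(history, cutoff)` is a hypothetical callable returning
    (bets_placed, bets_correct, games_seen) for one parameter pair.
    """
    best = None
    for history in histories:
        for cutoff in cutoffs:
            bets, correct, games = backtest(history, cutoff)
            # Reject pairs that bet on fewer than 1 game in `min_one_in`.
            if bets == 0 or bets * min_one_in < games:
                continue
            accuracy = correct / bets
            if best is None or accuracy > best[0]:
                best = (accuracy, history, cutoff, bets)
    return best


# Invented results, purely to show the rule in action.
results = {
    (100, 60): (20, 14, 70),  # 70% accurate on 20 bets
    (100, 90): (1, 1, 70),    # 100% accurate but only 1 bet: rejected
    (450, 70): (7, 6, 70),    # 6 from 7 correct, exactly 1 game in 10
}


def toy_backtest(history, cutoff):
    """Look up the invented results; unknown pairs place no bets."""
    return results.get((history, cutoff), (0, 0, 70))


best = sweep(toy_backtest, [100, 450], [60, 70, 90])
print(best[1:])  # history, cutoff and number of bets for the winner
```

For a full scan you'd pass something like `range(50, 501, 50)` and `range(40, 96, 5)`. The step sizes there are my guess for the example, so adjust to taste.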
Now this is a pretty naive form of machine learning, basically a brute force scan through what could be called our hyperparameters, so there's a real danger of curve fitting. To check for this, I also added a function that tests the parameters found during the scan on the following 60 days of games. If the reported accuracy still looks good on games the scan never saw, then we're golden.
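To make the two windows concrete, here's a small sketch of how the dates could be worked out. It assumes the tuning window starts exactly 365 days before today and that the validation window follows straight on. The name `backtest_windows` and the example date are made up for illustration.

```python
from datetime import date, timedelta


def backtest_windows(today, days=60):
    """Return (tuning, validation) windows as (start, end) date pairs.

    The tuning window starts 365 days before `today`; the validation
    window is the `days` immediately after it. End dates are exclusive.
    """
    tune_start = today - timedelta(days=365)
    tune_end = tune_start + timedelta(days=days)
    valid_end = tune_end + timedelta(days=days)
    return (tune_start, tune_end), (tune_end, valid_end)


# Arbitrary example date, not tied to any real run.
tuning, validation = backtest_windows(date(2020, 1, 1))
print(tuning)
print(validation)
```

The point of keeping the windows back to back and non-overlapping is that the validation games play no part in choosing the parameters.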
New command line options are "-t" or "--test" to scan through the values, and "-b" or "--cutoff" to have the program print out predictions with predicted probabilities above that value.
Running the following command line will find the best values to use for the Scottish Premiership.
python3 soccerprediction.py -c Scotland -l Premiership -t
This returns values of 450 for history and 70 for cutoff, with 100% accuracy for 7 predictions out of 70 games. Sounds too good to be true, I know! However, it also returns 100% accuracy in the validation test.
Running the tests on the English Premier League returns 400 & 70 with 71% accuracy for 7 games from 70. The validation test returns 93%.
I've only tested with the English Premier League, English Championship and the Scottish Premiership so far, but as the predictions the code made in Part 3a show, it appears to be working pretty well.
Hey, I know the code isn't pretty, efficient, elegant or any of the things it would be if a professional programmer had written it but who the hell cares if it works eh? I'll be continuing to test it and hope some of you guys give it a try too. Feel free to use or change the code in any way you want and if you've any ideas for improvements or fixes please share them here.
Maybe we can all stick it to the bookies. hehehehe.