Medal Prediction and Evaluation Model Based on Random Forests and a Difference-in-Differences with Multiple Time Periods Method

Xinyi Zhang

Abstract


As public interest in sporting events grows, fans watching individual events at the Summer Olympics are paying more and more attention to each country’s overall medal standings. This article therefore analyzes historical Summer Olympic medal-table data and the factors that affect medal counts, and predicts future trends. The goal is to help countries formulate sound strategies for developing their sports programs and to provide useful information for the International Olympic Committee (IOC).

For question 1, we first pre-process the data, for example by removing 1,484 duplicate records. Second, we construct a random forest regression model with six features: historical medal count, host status, dominant sport, number of Olympic Games attended, number of participating athletes, and number of events entered. We use it to predict the gold, silver, bronze, and total medal counts for the 2028 Summer Olympics in Los Angeles, USA. Third, using the K-means clustering algorithm, we predict that the countries that will win their first medal at the next Olympics are Angola, Antigua and Barbuda, Bolivia, El Salvador, Guam, Honduras, Liechtenstein, Madagascar, Mali, Malta, Myanmar, Nicaragua, Nepal, Papua New Guinea, Samoa and Seychelles. The clustering has an SSE of 604.70 and a silhouette coefficient of 0.69, indicating that it is good and stable, although the odds of these first medals are only 31.00%. Finally, for each country we identify the sport in which it has won the most medals, by a significant margin over its other sports, and take this to be that country’s most important sport. In addition, the host-country feature value of 0.06 shows that hosting has a positive effect on the number of medals.
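The K-means step in question 1 can be illustrated with a minimal sketch in pure Python (no scikit-learn). It assumes each country is described by a small vector of already-standardized numeric features. The feature values below are hypothetical, and the abstract does not state which features, which value of k, or which initialization the authors used. The sketch runs Lloyd’s algorithm from a deterministic farthest-point start and then reports the two quality measures quoted above. SSE is the sum of squared distances from each point to its cluster centroid, where lower means tighter clusters. The mean silhouette coefficient lies between -1 and 1, where values near 1 indicate well-separated clusters.

```python
import math

def farthest_point_init(points, k):
    """Deterministic start: first point, then repeatedly the point farthest from all chosen centroids."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def sse(points, labels, centroids):
    """Sum of squared distances from each point to its own centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

def mean_silhouette(points, labels):
    """Average of s(i) = (b - a) / max(a, b) over all points; needs at least two clusters."""
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)  # usual convention for a singleton cluster
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical, already-scaled features for six countries (e.g. Games attended, athletes sent).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
total_sse = sse(points, labels, centroids)
score = mean_silhouette(points, labels)
print(labels, round(total_sse, 4), round(score, 3))
```

The farthest-point start is used here only to make the toy run reproducible. On real data, k-means++ or several random restarts are the more usual choices.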

For question 2, we first apply a multi-period difference-in-differences (DID) method and use weighted-average treatment effects to estimate the contribution of the “great coach” effect to medal counts in volleyball and gymnastics. The results show that, on average, the effect adds approximately 3.74 medals per Games in volleyball and 5.15 in gymnastics. The p-values are close to 0, indicating that the model as a whole is statistically significant. Next, we identify three countries that have attended a relatively large number of Olympic Games and entered many athletes and events, yet have not won any medals in the relevant sport. We suggest that they consider investing in a “great coach”. Specifically, Canada and Tunisia could invest in a “great coach” for volleyball, which would be expected to raise their volleyball medal count by about 3.74 per Games. Australia could invest in a “great coach” for gymnastics, which would be expected to raise its gymnastics medal count by about 5.15 per Games.
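The abstract does not spell out the estimator behind “multi-period DID” or its weighted-average aggregation, so the following is only a minimal sketch under stated assumptions. It assumes countries adopt a “great coach” at different Games (staggered adoption) and that never-treated countries serve as controls. Each cohort’s effect at each later Games is measured against the Games just before adoption. The cell effects are then averaged with cohort-size weights, in the spirit of the group-time approach of Callaway and Sant’Anna (2021). The panel, the country labels A to D, and the function names are hypothetical. The sketch computes point estimates only, with no standard errors or p-values.

```python
from statistics import mean

def att_gt(panel, g, t):
    """Group-time ATT for the cohort first treated at period g, evaluated at period t (t >= g).

    Compares the change from base period g - 1 to t for the cohort with the same
    change for never-treated units (this relies on parallel trends)."""
    base = g - 1
    treated = [u["y"] for u in panel.values() if u["first"] == g]
    control = [u["y"] for u in panel.values() if u["first"] is None]
    change_treated = mean(y[t] - y[base] for y in treated)
    change_control = mean(y[t] - y[base] for y in control)
    return change_treated - change_control, len(treated)

def overall_att(panel, periods):
    """Weighted average of all post-treatment ATT(g, t) cells, weighted by cohort size."""
    cohorts = sorted({u["first"] for u in panel.values() if u["first"] is not None})
    weighted_sum = total_weight = 0.0
    for g in cohorts:
        for t in periods:
            if t >= g:
                att, n = att_gt(panel, g, t)
                weighted_sum += n * att
                total_weight += n
    return weighted_sum / total_weight

# Hypothetical medal counts per Games (periods 1-4) in one sport.
# "first" = Games at which a "great coach" arrived; None = never treated.
panel = {
    "A": {"first": 3, "y": {1: 2, 2: 3, 3: 7, 4: 8}},
    "B": {"first": 2, "y": {1: 1, 2: 5, 3: 6, 4: 7}},
    "C": {"first": None, "y": {1: 0, 2: 1, 3: 2, 4: 3}},
    "D": {"first": None, "y": {1: 4, 2: 5, 3: 6, 4: 7}},
}
effect = overall_att(panel, periods=[1, 2, 3, 4])
print(f"average effect per Games: {effect:.2f} medals")
```

The toy panel is constructed so that every post-coach Games adds exactly 3 medals on top of a common trend, and the script prints "average effect per Games: 3.00 medals".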

For question 3, the results of the random forest regression model, the K-means clustering algorithm, and the multi-period difference-in-differences method show that factors such as the number of participating athletes, the number of events entered, and the number of Olympic Games attended have a significant impact on medal counts. These findings can therefore inform the IOC’s work on athlete selection, sports event development planning, long-term participation planning, and the inheritance of experience and innovation.




DOI: https://doi.org/10.22158/ibes.v8n1p266



This work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright © SCHOLINK INC.  ISSN 2640-9852 (Print)  ISSN 2640-9860 (Online)