Training Characteristics of Athletes in Golden Cheetah Open Data
Keywords: sport science, data science, data analytics
Abstract
Background: Over the last decade, more and more tools for collecting and analysing training data have been developed. These tools have also made sharing data with others easier. While this feature is mostly used to share data between athletes and coaches, it also makes it possible to share data with the world. Several studies, such as Metcalfe (2017) and van Erp (2019), have investigated the training characteristics of professional World Tour level athletes, but little research is available about recreational and non-professional athletes. With the rise of open data, however, large-scale investigations into the training of non-professional athletes also become possible. One open data set is provided by the maintainers of the software package "Golden Cheetah" (GC). This data set (GCD) contains anonymised data of cyclists, runners, swimmers, triathletes and other unspecified athletes. The training metrics it contains are calculated with GC.
Methods: The GCD was downloaded from the Open Science Framework in March 2020 (OSF 2018). At that time it contained data from 4885 athletes. In a first step, corrupted data was removed. Next, the remaining data was manually inspected for outliers. Additionally, athletes with less than one year of data or fewer than 50 rides per year were removed. A total of 619 athletes (608 male, 11 female) were included in the analysis. The athletes were assigned to groups using k-means clustering.
Results: The athletes in the data set completed 167 rides per year on average. For the male athletes three clusters could be identified, while two clusters were found for the female athletes. Table 1 shows the characteristics of the male athletes, while Table 2 highlights those of the female athletes.
Discussion: Similar to the findings of van Erp (2019), the results show differences between male and female riders. Since all data is anonymous, no conclusions about the quality of the computed metrics can be drawn. Consequently, most metrics provided by GC were not included in this analysis. For male athletes, no differences in mean power exist between clusters 1 and 3. However, athletes in this cluster exhibit a far greater training frequency.
Limitations: While at first glance a data set such as GCD seems to provide deeper insight into the training habits of non-professional athletes, the anonymisation and data quality prevent this.
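The filtering and clustering steps described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not state how "one year of data" or "rides per year" were computed, which features were clustered, or which software was used. The sketch assumes a hypothetical input of (athlete_id, ride_date) pairs, treats the span between first and last ride as the amount of data, and uses a plain-Python version of Lloyd's k-means algorithm on hypothetical feature tuples such as (rides per year, mean power).

```python
import random
from collections import defaultdict
from datetime import date, timedelta


def eligible_athletes(rides, min_days=365, min_rides_per_year=50):
    """Return {athlete_id: rides_per_year} for athletes passing both filters.

    `rides` is an iterable of (athlete_id, ride_date) pairs; this layout is
    an assumption for illustration, not the actual GCD schema.
    """
    by_athlete = defaultdict(list)
    for athlete_id, ride_date in rides:
        by_athlete[athlete_id].append(ride_date)

    kept = {}
    for athlete_id, dates in by_athlete.items():
        span_days = (max(dates) - min(dates)).days
        if span_days < min_days:  # assumed meaning of "less than one year of data"
            continue
        rides_per_year = len(dates) / (span_days / 365.25)
        if rides_per_year >= min_rides_per_year:
            kept[athlete_id] = rides_per_year
    return kept


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct observations
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])  # keep the centroid of an empty cluster
        if updated == centroids:  # converged
            break
        centroids = updated
    return labels, centroids
```

In practice the features would usually be standardised before clustering so that no single unit (for example watts versus rides per year) dominates the distance, and the number of clusters would be chosen with a criterion such as the elbow method; the abstract does not say which approach was taken.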
Copyright (c) 2021 Journal of Science and Cycling
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Authors contributing to Journal of Science and Cycling agree to publish their articles under a Creative Commons CC BY-NC-ND license, allowing third parties to copy and redistribute the material in any medium or format, and to remix, transform, and build upon the material, for any purpose, even commercially, under the condition that appropriate credit is given, that a link to the license is provided, and that you indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
Authors retain copyright of their work, with first publication rights granted to Cycling Research Center.