Training Characteristics of Athletes in Golden Cheetah Open Data


  • Matias Dobiasch, University of Vienna


Keywords: sport science, data science, data analytics


Background: Over the last decade, a growing number of tools for collecting and analysing training data have been developed. These tools have also made it easier to share data with others. While this feature is mostly used to share data between athletes and coaches, it also makes it possible to share data with the world. Several studies, such as Metcalfe (2017) and van Erp (2019), have investigated the training characteristics of professional World Tour level athletes, but little research is available on recreational and non-professional athletes. With the rise of open data, however, large-scale investigations into the training of non-professional athletes also become possible. One open data set is provided by the maintainers of the software package "Golden Cheetah" (GC). This data set (GCD) contains anonymised data of cyclists, runners, swimmers, triathletes and other unspecified athletes. The training metrics it contains are calculated with GC.

Methods: The GCD was downloaded from the Open Science Framework in March 2020 (OSF 2018). At that time it contained data from 4885 athletes. In a first step, corrupted data was removed. Next, the remaining data was manually inspected for outliers. Additionally, athletes with less than one year of data or fewer than 50 rides per year were removed. A total of 619 athletes (608 male, 11 female) were included in the analysis. The athletes were assigned to groups using k-means clustering.

Results: The athletes in the data set completed 167 rides per year on average. Three clusters were identified for the male athletes and two for the female athletes. Table 1 shows the characteristics of the male athletes, while Table 2 shows those of the female athletes.

Discussion: Similar to the findings of van Erp (2019), the results show differences between male and female riders. Since all data is anonymous, no conclusions about the quality of the computed metrics can be drawn. Consequently, most metrics provided by GC were not included in this analysis. For male athletes, no differences in mean power exist between clusters 1 and 3. However, athletes in this cluster exhibit a far greater training frequency.

Limitations: While at first glance a data set such as GCD seems to provide deeper insight into the training habits of non-professional athletes, the anonymisation and data quality prevent this.
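The inclusion criteria and the clustering step described in the Methods can be sketched roughly as follows. This is a minimal illustration in plain Python, not the author's actual pipeline: the `Athlete` record, its field names, the choice of rides per year and mean power as clustering features, the z-score standardisation, and the fixed random seed are all assumptions made for the example. The abstract states only that athletes with less than one year of data or fewer than 50 rides per year were removed and that k-means clustering was used.

```python
import random
from dataclasses import dataclass


@dataclass
class Athlete:
    # Hypothetical record layout; the real GCD files are structured differently.
    athlete_id: str
    years_of_data: float
    rides_per_year: float
    mean_power_w: float


def meets_inclusion_criteria(a, min_years=1.0, min_rides_per_year=50):
    """Keep athletes with at least one year of data and at least 50 rides per year."""
    return a.years_of_data >= min_years and a.rides_per_year >= min_rides_per_year


def standardise(rows):
    """Z-score each column so that no single feature dominates the distance."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    sds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 or 1.0
           for c, m in zip(columns, means)]
    return [tuple((x - m) / s for x, m, s in zip(r, means, sds)) for r in rows]


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct data points
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable, so the algorithm has converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def cluster_athletes(athletes, k):
    """Apply the inclusion criteria, then cluster on (rides per year, mean power)."""
    included = [a for a in athletes if meets_inclusion_criteria(a)]
    if len(included) < k:
        raise ValueError("fewer included athletes than clusters")
    features = standardise([(a.rides_per_year, a.mean_power_w) for a in included])
    labels = kmeans(features, k)
    return {a.athlete_id: label for a, label in zip(included, labels)}
```

Calling `cluster_athletes(athletes, k=3)` on the male athletes and `cluster_athletes(athletes, k=2)` on the female athletes would mirror the cluster counts reported in the Results. How those counts were chosen is not stated in the abstract.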





How to Cite

Dobiasch, M. (2020). Training Characteristics of Athletes in Golden Cheetah Open Data. Journal of Science and Cycling, 9(2), 55-56. Retrieved from



Science & Cycling Conference, Leuven 2021