Ruhr-Uni-Bochum

The Fault in Our Stars: An Analysis of GitHub Stars as an Importance Metric for Web Source Code

2024

Conference / Journal

Authors

Martin Johns David Klein Simon Koch

Research Hub

Research Hub C: Sichere Systeme

Research Challenges

RC 7: Building Secure Systems
RC 8: Security with Untrusted Components

Abstract

Are GitHub stars a good surrogate metric to assess the importance of open-source code? While security research frequently uses them as a proxy for importance, the reliability of this relationship has not been studied yet. Furthermore, its relationship to download numbers provided by code registries – another commonly used metric – has yet to be ascertained. We address this research gap by analyzing the correlation between both GitHub stars and download numbers as well as their correlation with detected deployments across websites. Our data set consists of 925 978 data points across three web programming languages: PHP, Ruby, and JavaScript. We assess deployment across websites using 58 hand-crafted fingerprints for JavaScript libraries. Our results reveal a weak relationship between GitHub Stars and download numbers ranging from a correlation of 0.47 for PHP down to 0.14 for JavaScript, as well as a high amount of low star and high download projects for PHP and Ruby and an opposite pattern for JavaScript with a noticeably higher count of high star and apparently low download libraries. Concerning the relationship for detected deployments, we discovered a correlation of 0.61 and 0.63 with stars and downloads, respectively. Our results indicate that both downloads and stars pose a moderately strong indicator of the importance of client-side deployed JavaScript libraries.

Tags

Software Security
Network Measurements
Web Security