FeatRacer: Locating Features Through Assisted Traceability
2023 Conference / Journal
Authors
Jan-Philipp Steghöfer, Thorsten Berger, Kevin Hermann, Mukelabai Mukelabai
Research Hub
Research Hub D: Usability
Research Challenges
RC 7: Building Secure Systems
RC 10: Engineers and Usability
Abstract
Locating features is one of the most common software development activities. It is typically done during maintenance and evolution, when developers need to identify the exact places in a codebase where specific features are implemented. Unfortunately, locating features is laborious and error-prone, since feature knowledge fades, projects are developed by different developers, and features are often scattered across the codebase. Recognizing the need, many automated feature location techniques have been proposed, which try to retroactively recover features, i.e., very domain-specific information, from the codebase. Unfortunately, such techniques require large training datasets, only recover coarse-grained locations, and produce too many false positives to be useful in practice. An alternative is recording features during development, when they are still fresh in a developer's mind. However, recording is easily forgotten and also costly, especially when the software evolves and such recordings need to be updated. We address the infamous feature location problem (a.k.a. the concern location or concept assignment problem) differently. We present FeatRacer, which combines feature recording and automated feature location in a way that allows developers to proactively and continuously record features and their locations during development, while addressing the shortcomings of both strategies. Specifically, FeatRacer relies on embedded code annotations and a machine-learning-based recommender system. When a developer forgets to annotate, FeatRacer reminds the developer about potentially missing features, which it learned from the feature recording practices in the project at hand. FeatRacer also facilitates fine-grained locations as decided by the developer.
Our evaluation shows that FeatRacer outperforms traditional automated feature location based on Latent Semantic Indexing (LSI) and Linear Discriminant Analysis (LDA), two of the most common methods to realize such techniques, when predicting features for 4,650 commit changesets from the histories of 16 open-source projects spanning an average of three years between 1985 and 2015. Compared to the traditional techniques, FeatRacer showed a 3x higher precision and a 4.5x higher recall, with an average precision and recall of 89.6% among all 16 projects. It can accurately predict feature locations within the first five commits of our evaluation projects, so it is already effective for small datasets. On average, FeatRacer takes 1.9ms to learn from past code fragments of a project and 0.002ms to predict forgotten feature annotations in new code.
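For readers unfamiliar with the metrics quoted above, the following is a minimal sketch of how precision and recall can be computed when a recommender predicts a set of feature labels per changeset. It is not the evaluation code of the paper: the function names, the feature names, and the choice to average per changeset are assumptions made purely for illustration.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for the feature labels of one changeset."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    # Precision: share of predicted features that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of actual features that were predicted.
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


def macro_average(pairs):
    """Average precision and recall over several changesets."""
    pairs = list(pairs)
    if not pairs:
        return 0.0, 0.0
    n = len(pairs)
    return sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n


# Hypothetical (predicted, actual) feature sets, for illustration only.
changesets = [
    ({"Logging", "Undo"}, {"Logging", "Undo", "Export"}),
    ({"Export", "Search"}, {"Export"}),
]
scores = [precision_recall(pred, act) for pred, act in changesets]
avg_precision, avg_recall = macro_average(scores)
print(f"precision={avg_precision:.2f} recall={avg_recall:.2f}")
# prints: precision=0.75 recall=0.83
```

In this toy example, the first changeset misses one feature (lower recall) and the second predicts one feature too many (lower precision), which is the trade-off the two metrics are meant to expose.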