TagVet: Vetting Malware Tags using Explainable Machine Learning
2021, Conference / Journal
Authors
Christian Wressnegger, Konrad Rieck, Alexander Warnecke, Lukas Pirch
Research Hub
Research Hub C: Secure Systems
Research Challenges
RC 9: Intelligent Security Systems
Abstract
When managing large malware collections, it is common practice to use short tags for grouping and organizing samples. For example, collected malware is often tagged according to its origin, family, functionality, or clustering. While these simple tags are essential for keeping abreast of the rapid malware development, they can become disconnected from the actual behavior of the samples and, in the worst case, mislead the analyst. In particular, if tags are automatically assigned, it is often unclear whether they indeed align with the malware functionality. In this paper, we propose a method for vetting tags in malware collections. Our method builds on recent techniques of explainable machine learning, which enable us to automatically link tags to behavioral patterns observed during dynamic analysis. To this end, we train a neural network to classify different tags and trace back its decision to individual system calls and arguments. We empirically evaluate our method on tags for malware functionality, families, and clusterings. Our results demonstrate the utility of this approach and pinpoint interesting relations of malware tags in practice.
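The idea of linking a tag to individual system calls can be illustrated with a minimal sketch. It is not the paper's model or its explanation method: it assumes a single-layer logistic classifier in place of the neural network, a bag-of-calls representation without arguments, and gradient-times-input as the attribution technique. The traces, the "network" tag, and the function names `featurize`, `train`, and `explain` are all hypothetical and exist only for illustration.

```python
import math

# Hypothetical toy traces (made-up data): each sample is a list of
# system-call tokens from dynamic analysis plus a binary label
# (1 = carries the hypothetical tag "network", 0 = does not).
TRACES = [
    (["open", "read", "connect", "send"], 1),
    (["connect", "send", "recv", "close"], 1),
    (["open", "connect", "send"], 1),
    (["open", "read", "write", "close"], 0),
    (["open", "write", "close"], 0),
    (["read", "write", "open"], 0),
]

VOCAB = sorted({call for trace, _ in TRACES for call in trace})


def featurize(trace):
    """Bag-of-calls vector: count of each vocabulary token in the trace."""
    return [float(trace.count(call)) for call in VOCAB]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, epochs=500, lr=0.1):
    """Fit a single-layer logistic model with plain gradient descent."""
    weights = [0.0] * len(VOCAB)
    bias = 0.0
    for _ in range(epochs):
        for trace, label in samples:
            x = featurize(trace)
            pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = pred - label  # gradient of the log loss w.r.t. the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def explain(trace, weights):
    """Gradient-times-input relevance of each observed call for the tag.

    For a linear model the gradient of the logit with respect to
    feature i is weights[i], so the relevance is weights[i] * x[i].
    """
    x = featurize(trace)
    scores = {c: w * xi for c, w, xi in zip(VOCAB, weights, x) if xi > 0}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


weights, bias = train(TRACES)
ranking = explain(["open", "connect", "send"], weights)
```

Iterating over `ranking` lists the calls of the queried trace from most to least relevant for the tag. A tag whose top-ranked calls have no plausible connection to its meaning would be a candidate for closer inspection, which is the kind of vetting the abstract describes.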