TagVet: Vetting Malware Tags using Explainable Machine Learning

2021

Download

Conference / Journal

Authors

Christian Wressnegger Konrad Rieck Alexander Warnecke Lukas Pirch

Research Hub

Research Hub C: Sichere Systeme

Research Challenges

RC 9: Intelligent Security Systems

Abstract

When managing large malware collections, it is common practice to use short tags for grouping and organizing samples. For example, collected malware is often tagged according to its origin, family, functionality, or clustering. While these simple tags are essential for keeping abreast of the rapid malware development, they can become disconnected from the actual behavior of the samples and, in the worst case, mislead the analyst. In particular, if tags are automatically assigned, it is often unclear whether they indeed align with the malware functionality. In this paper, we propose a method for vetting tags in malware collections. Our method builds on recent techniques of explainable machine learning, which enable us to automatically link tags to behavioral patterns observed during dynamic analysis. To this end, we train a neural network to classify different tags and trace back its decision to individual system calls and arguments. We empirically evaluate our method on tags for malware functionality, families, and clusterings. Our results demonstrate the utility of this approach and pinpoint interesting relations of malware tags in practice.