"We're trying to find better ways to build trustworthy software systems," says Thorsten Holz, explaining the goal of CASA's Hub C “Secure System”. Because in practice many modern attacks exploit security vulnerabilities in software, "We try to attack and find vulnerabilities or develop defenses to make attacks harder, or even impossible."
From Security by Design to Machine Learning
Hub C is tackling three research challenges. First, how to build secure systems from the ground up - security by design. This challenge has many aspects, such as more secure programming languages, methods for adding security mechanisms during compilation, and trusted execution environments. Second, how to deal with legacy systems and make the last few decades' worth of billions of lines of code more secure - or, if that's impossible, sandbox components (and therefore attacks on them) that can't be trusted. Third, exploring machine learning and computer security.
The main part of the latter challenge is how to use machine learning to solve security problems, asking for example whether it's possible to train a deep neural network to spot vulnerabilities in a given piece of software code and exploring how to leverage machine learning's power for tasks such as detecting fake images more efficiently or identifying fake news. An additional important part of this work, led by Konrad Rieck, is making machine learning itself secure.
Deep neural networks at the core of today's voice assistants
Leveraging machine learning for security purposes is not straightforward: "Machine learning algorithms are rather brittle," Holz says, citing the hub's first-year work creating adversarial examples in the audio domain as an example.
Although most people don't realize it, at the core of today's voice assistants are deep neural networks that break apart the speech the assistant detects into tiny 200ms snippets called "phones". The voice assistant's hidden Markov model tries to transcribe words based on the sequence of phones it receives. Holz's group sought to confuse these devices by adding artificial noise - adversarial audio - that the human ear can't hear.
The resulting paper demonstrates that it's possible to hide commands for voice assistants in audio signals that are inaudible to humans. Attackers could thus gain control over the device. A second, forthcoming paper, part of the effort to make machine learning systems more robust, shows how to prevent such attacks by designing speech recognition systems to ignore sounds the human ear can't hear, just as the compression algorithm used to create MP3s discards inaudible detail.
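The two ideas above can be sketched in a few lines. This is a toy illustration, not the group's actual method: the function names, the per-sample masking threshold, and the boolean audibility mask are all hypothetical simplifications of real psychoacoustic models.

```python
import numpy as np

def embed_below_threshold(audio, perturbation, masking_threshold):
    """Attack sketch (hypothetical): keep only the part of an adversarial
    perturbation that stays below a per-sample psychoacoustic masking
    threshold, so the added signal ideally remains inaudible."""
    clipped = np.clip(perturbation, -masking_threshold, masking_threshold)
    return audio + clipped

def mp3_style_defense(signal_spectrum, audibility_mask):
    """Defense sketch (hypothetical): discard spectral components the
    human ear cannot perceive, as MP3 compression does, removing the
    hiding places an adversarial signal would use."""
    return np.where(audibility_mask, signal_spectrum, 0.0)
```

The defense inverts the attack's assumption: if inaudible components are simply thrown away before recognition, there is nowhere left to hide the adversarial signal.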
The power of machine learning algorithms
CASA PI Konrad Rieck notes that machine learning algorithms can do some things better than humans. "It's not that they're super-smart," he says, "but that they can do very boring tasks super-efficiently." Tasks at which they excel include, for example, going through thousands of machine code instructions or huge log files. Besides making the algorithms more secure, Rieck is interested in explaining algorithms' internal workings, a problem that matters when what's at stake are decisions about human lives. Rieck explored this topic in a paper presented at the IEEE Symposium on Security and Privacy 2020, finding that existing work on explaining machine learning often does not fit the security setting, and providing a method for measuring how well it fits.
Rieck's work on explainable learning and security has three aspects. First is "unlearning"; that is, finding ways to remove data points from trained learning models, such as credit card details that shouldn't have been in the training data. Second is identifying privacy leaks from explanation systems. Third, which hasn't started yet, is "parser confusion", a problem that arises when multiple parsers solve the same problem but reach different conclusions from the same data. This situation is a security vulnerability because an attacker can exploit the discrepancy, crafting input that the security system doesn't recognize as an attack but that still lands on its target.
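Parser confusion is easy to demonstrate with a contrived example. The two parsers and the duplicate-key input below are hypothetical, but they show the core problem: both are "reasonable" implementations, yet they disagree, so a filter using parser A would approve input that a backend using parser B interprets maliciously.

```python
def parse_first_wins(lines):
    """Parser A (hypothetical): the first occurrence of a key wins."""
    out = {}
    for line in lines:
        key, _, value = line.partition(":")
        out.setdefault(key.strip(), value.strip())
    return out

def parse_last_wins(lines):
    """Parser B (hypothetical): the last occurrence of a key wins."""
    out = {}
    for line in lines:
        key, _, value = line.partition(":")
        out[key.strip()] = value.strip()
    return out

# The same input yields two different interpretations:
message = ["content-type: text/plain",
           "content-type: application/x-evil"]
print(parse_first_wins(message)["content-type"])  # text/plain
print(parse_last_wins(message)["content-type"])   # application/x-evil
```

If the security system sees `text/plain` while the target system acts on `application/x-evil`, the attack passes inspection unnoticed.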
"Eventually the monkeys write Shakespeare"
Holz, who was awarded the Heinz Maier-Leibnitz prize in 2011 for his novel research approaches, says a key element of the group's work is that they do not rely on having access to source code, which is often unavailable. Instead, they observe how firmware - the embedded software that dictates how devices work - behaves at the binary level in order to understand in detail the responses of the machine.
The last two years have also seen the group work on improving methods of fuzzing, a well-established technique for finding vulnerabilities in a system by feeding in random inputs and observing how it reacts. Hub C researchers are improving on this by marking the code blocks that have been triggered in a different color, mutating the input, and sending it again to compare the results. Because the researchers can send millions of these mutations in the course of 24 hours, eventually the strategy generates interesting input sequences that reach deep into the code. "Eventually the monkeys write Shakespeare," Holz says by way of analogy. This work focuses in particular on operating system kernels, web browsers, and the hypervisors used in cloud servers; the group has found more than a hundred bugs in hypervisors alone.
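The mutate-and-compare loop described above can be sketched as a minimal coverage-guided fuzzer. This is a toy illustration of the general technique, not Hub C's actual tooling: the `target` harness, which reports the set of code blocks an input triggered (the "colors"), is hypothetical.

```python
import random

def mutate(data: bytes) -> bytes:
    """Flip one random byte of the input."""
    i = random.randrange(len(data))
    return data[:i] + bytes([random.randrange(256)]) + data[i + 1:]

def fuzz(target, seed: bytes, rounds: int = 10000):
    """Minimal coverage-guided fuzzing sketch. `target` is a hypothetical
    harness returning the set of code blocks an input triggered. Mutated
    inputs that reach previously unseen blocks join the corpus and are
    mutated further, so coverage grows over millions of iterations."""
    corpus = [seed]
    seen_blocks = set(target(seed))
    for _ in range(rounds):
        child = mutate(random.choice(corpus))
        blocks = target(child)
        if not blocks <= seen_blocks:   # new coverage: keep this input
            seen_blocks |= blocks
            corpus.append(child)
    return corpus
```

Each kept input is a stepping stone toward deeper code paths, which is why, given enough mutations, "the monkeys write Shakespeare".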
An important goal in CASA is to transfer technology into industry. For the fuzzing work, Hub C is collaborating with Intel, which is now using the fuzzing tools the project has developed and feeding back shortcomings they find so the tools can be improved.
Other CASA projects
Hub C's work on secure systems is one of four projects that make up the cluster of excellence CASA (Cyber Security in the Age of Large-Scale Adversaries). The other three are "Future Cryptography", led by Eike Kiltz (Hub A); "Embedded Security", led by Christof Paar (Hub B); and "Usability", led by Angela Sasse (Hub D).