Towards Internet-Based State Learning of TLS State Machines

2025

Conference / Journal

Authors

Juraj Somorovsky, Jörg Schwenk, Robert Merget, Nurullah Erinola, Marcel Maehren

Research Hub

Research Hub C: Sichere Systeme - CASA 1.0, 2019-2025

Abstract

State machine learning extracts a Mealy state machine hypothesis from a given implementation. This approach has been used repeatedly on open-source TLS implementations to find security vulnerabilities and bugs. Until now, TLS state learning has been conducted exclusively in controlled local environments, effectively avoiding various challenges, such as jitter, IDS interference, unknown network infrastructures (load balancers), timeouts, and most notably, non-determinism resulting from all these factors.

For the first time, we address these challenges by extending state learning beyond a controlled local environment and using it to learn TLS state machines over the Internet in a large-scale study. We broaden the scope of state-of-the-art learning approaches by considering previously excluded features and directions, such as ID-based session resumption, renegotiation, and CBC padding oracles. To enable a fully autonomous analysis of large numbers of servers, we develop novel techniques for dealing with large alphabets and automatically analyzing the retrieved Mealy automata.

We demonstrate the feasibility of our approach in a large-scale study across 7337 domains, successfully extracting 1304 state machine models. These models provide unique insights into the state machines deployed in the TLS ecosystem. Leveraging our automated analysis techniques, we uncovered a handshake transcript integrity vulnerability in Citrix NetScaler and the first CBC padding oracle vulnerabilities detected through state machine learning.
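
To illustrate what a Mealy state machine hypothesis looks like, the following is a minimal sketch in Python. It is not the authors' tooling or learning algorithm: the class `MealyMachine`, the helper `build_toy_model`, the state names, and the three-symbol input alphabet are all hypothetical and chosen only for illustration. A Mealy machine maps each (state, input) pair to a next state and an output, so a learned model predicts the server's response to every message sequence.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical, heavily simplified alphabet and state set (assumptions for
# illustration only; real TLS learning alphabets are far larger).
INPUTS = ["ClientHello", "Finished", "AppData"]
STATES = ["start", "hello_done", "established", "closed"]

Transition = Dict[Tuple[str, str], Tuple[str, str]]


class MealyMachine:
    """Deterministic Mealy machine: (state, input) -> (next state, output)."""

    def __init__(self, initial: str, transitions: Transition) -> None:
        self.initial = initial
        self.transitions = transitions

    def run(self, inputs: List[str]) -> List[str]:
        """Feed the input word from the initial state and collect outputs."""
        state = self.initial
        outputs = []
        for symbol in inputs:
            state, output = self.transitions[(state, symbol)]
            outputs.append(output)
        return outputs


def build_toy_model() -> MealyMachine:
    """Build a toy handshake model (hypothetical, not a learned result)."""
    transitions: Transition = {}
    # Default behaviour: any unexpected message is answered with an alert
    # and the connection closes; a closed connection no longer responds.
    for state, symbol in product(STATES, INPUTS):
        output = "NoResponse" if state == "closed" else "Alert"
        transitions[(state, symbol)] = ("closed", output)
    # The single accepted path through the toy handshake.
    transitions[("start", "ClientHello")] = ("hello_done", "ServerHello")
    transitions[("hello_done", "Finished")] = ("established", "Finished")
    transitions[("established", "AppData")] = ("established", "AppData")
    return MealyMachine("start", transitions)


if __name__ == "__main__":
    model = build_toy_model()
    print(model.run(["ClientHello", "Finished", "AppData"]))
    print(model.run(["Finished", "ClientHello"]))
```

In this toy model, a transition that reaches `established` without passing through `hello_done` would be the kind of unexpected path that an automated analysis of a learned automaton could flag.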