Stateful Machine Learning is Our Best (And Only) Bet

Ed Bishop
09.22.2021

Illustrations by Lily Padula

Despite thousands of cybersecurity products, data breaches are at an all-time high. The reason? For decades, businesses have focused on securing the machine layer — layering defenses on top of their networks, devices, and finally cloud applications. But these measures haven’t solved the biggest security problem — an organization’s own people.

Traditional machine learning methods that are used to detect threats at the machine layer aren’t equipped to account for the complexities of human relationships and behaviors across businesses over time. There is no concept of “state” — the additional variable that makes human-layer security problems so complex. This is why “stateful machine learning” models are critical to security stacks.

The people problem

The problem is that people make mistakes, break the rules, and are easily hacked. When faced with overwhelming workloads, constant distractions, and schedules that have us running from meeting to meeting, we rarely have cybersecurity top of mind. And things we were taught in cybersecurity training go out the window in moments of stress. But one mistake could result in someone sharing sensitive data with the wrong person or falling victim to a phishing attack.

Securing the human layer is particularly challenging because no two humans are the same. We all communicate differently — and with natural language, not static machine protocols. What’s more, our relationships and behaviors change over time. We make new connections or take on projects. These complexities make solving human-layer security problems substantially more difficult than addressing those at the machine layer — we simply cannot codify human behavior with “if-this-then-that” logic.

The time factor

We can use machine learning to identify normal patterns and signals, allowing us to detect anomalies when they arise in real time. The technology has allowed businesses to detect attacks at the machine layer more quickly and accurately than ever before.

One example of this is detecting when malware has been deployed by malicious actors to attack company networks and systems. By inputting a sequence of bytes from a computer program into a machine learning model, it is possible to predict whether there is enough commonality with previously seen malware attacks — while successfully ignoring any obfuscation techniques used by the attacker. Like many other threat detection problem areas at the machine layer, this application of machine learning is arguably “standard” because of the nature of malware: A malware program will always be malware.
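
To make that concrete, here is a minimal sketch of a stateless classifier of this kind, using hashed byte n-grams of a program as features and logistic regression as the model. The feature scheme, model choice, and function names are illustrative assumptions rather than any vendor's actual detection pipeline; the point is simply that the prediction depends only on the bytes of the program itself.

```python
# Minimal sketch of a stateless malware classifier: the prediction depends only
# on the bytes of the program itself, never on who ran it or when.
# The byte n-gram features and logistic regression model are illustrative
# assumptions, not a production detection pipeline.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import LogisticRegression

def byte_ngrams(program: bytes, n: int = 4) -> str:
    """Represent a program as space-separated hex n-grams of its raw bytes."""
    hexed = program.hex()
    return " ".join(hexed[i:i + 2 * n] for i in range(0, len(hexed) - 2 * n + 1, 2))

# Hashing the n-grams keeps the feature space a fixed size regardless of
# how long or how unusual the byte sequence is.
vectorizer = HashingVectorizer(analyzer="word", n_features=2**18)
model = LogisticRegression(max_iter=1000)

def train(samples: list[bytes], labels: list[int]) -> None:
    X = vectorizer.transform([byte_ngrams(s) for s in samples])
    model.fit(X, labels)

def is_probably_malware(program: bytes) -> float:
    """Score a new program purely from its contents: malware stays malware."""
    X = vectorizer.transform([byte_ngrams(program)])
    return model.predict_proba(X)[0, 1]
```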

Human behavior, however, changes over time. So solving the threat of data breaches caused by human error requires stateful machine learning. 

Consider the example of trying to detect and prevent data loss caused by an employee accidentally sending an email to the wrong person. That may seem like a harmless mistake, but misdirected emails were the leading cause of online data breaches reported to regulators in 2019. All it takes is a clumsy mistake, like adding the wrong person to an email chain, for data to be leaked. And it happens more often than you might think. In organizations with over 10,000 workers, employees collectively send around 130 emails a week to the wrong person. That’s nearly 7,000 data breaches a year.

Suppose an employee named Jane sends an email to her client Eva with the subject “Project Update.” To accurately predict whether this email is intended for Eva or is being sent by mistake, we need to understand — at that exact moment in time — the nature of Jane’s relationship with Eva. What do they typically discuss, and how do they normally communicate? We also need to understand Jane’s other email relationships to see if there is a more appropriate intended recipient for this email. We essentially need an understanding of all of Jane’s historical email relationships up until that moment.

Now let’s say Jane and Eva were working on a project that concluded six months ago. Jane recently started working on another project with a different client, Evan. She’s just hit send on an email accidentally addressed to Eva, which will result in sharing confidential information with Eva instead of Evan. Six months ago, our stateful model might have predicted that a “Project Update” email to Eva looked normal. But now it would treat the email as anomalous and predict that the correct and intended recipient is Evan. Understanding “state,” or the exact moment in time, is absolutely critical.
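
One way to picture this "state" is as a snapshot of the sender's relationships, rebuilt from everything they have sent before the email being scored. The sketch below keeps simple per-recipient counters (last contact, volume, and subjects discussed) and flags a recipient who has gone quiet when another contact is actively discussing the same subject. The class names, the 90-day threshold, and the scoring heuristic are illustrative assumptions, not a description of any production model.

```python
# Sketch of "state" for misdirected-email detection: a point-in-time view of the
# sender's relationships, built only from emails sent *before* the email being
# scored. Field names, the threshold, and the heuristic are illustrative assumptions.
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class RelationshipState:
    last_contact: datetime | None = None
    email_count: int = 0
    subjects: set[str] = field(default_factory=set)

class SenderHistory:
    """All of one sender's email relationships up to 'now'."""

    def __init__(self) -> None:
        self.relationships: dict[str, RelationshipState] = defaultdict(RelationshipState)

    def observe(self, recipient: str, subject: str, sent_at: datetime) -> None:
        state = self.relationships[recipient]
        state.last_contact = sent_at
        state.email_count += 1
        state.subjects.add(subject.lower())

    def likelier_recipient(self, recipient: str, subject: str, now: datetime,
                           stale_after: timedelta = timedelta(days=90)) -> str | None:
        """Return a better-matching recipient if the chosen one looks anomalous, else None."""
        chosen = self.relationships.get(recipient)
        chosen_is_stale = (chosen is None or chosen.last_contact is None
                           or now - chosen.last_contact > stale_after)
        for other, state in self.relationships.items():
            if other == recipient or state.last_contact is None:
                continue
            recently_active = now - state.last_contact <= stale_after
            if chosen_is_stale and recently_active and subject.lower() in state.subjects:
                return other
        return None

# Six months ago, "Project Update" to Eva looked normal; today the same subject
# points at Evan, so addressing Eva is flagged as a likely mistake.
history = SenderHistory()
history.observe("eva@client.com", "Project Update", datetime(2021, 3, 15))
history.observe("evan@otherclient.com", "Project Update", datetime(2021, 9, 15))
print(history.likelier_recipient("eva@client.com", "Project Update", datetime(2021, 9, 20)))
# -> evan@otherclient.com
```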

Why stateful machine learning?

With a “standard” machine learning problem, you can input raw data directly into the model, like a sequence of bytes in the malware example, and it can generate its own features and make a prediction. As previously mentioned, this application of machine learning is invaluable in helping businesses quickly and accurately detect threats at the machine layer, like malicious programs or fraudulent activity.

However, the most sophisticated and dangerous threats occur at the human layer when people use digital interfaces, like email. To predict whether an employee is about to leak sensitive data or determine whether they’ve received a message from a suspicious sender, for example, we can’t simply give that raw email data to the model. It wouldn’t understand the state or context within the individual’s email history.
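
Put another way, the two problems ask for different inputs. The hypothetical signatures below, reusing the SenderHistory sketch from earlier, only illustrate the shape of that difference: the stateless model sees the artifact alone, while the stateful model must also be handed the sender's history up to the moment of sending.

```python
# Purely illustrative signatures; the names are hypothetical, not a real API.

def score_malware(program_bytes: bytes) -> float:
    """Stateless: the artifact alone determines the prediction."""
    ...

def score_misdirected_email(email_draft: dict,
                            sender_history_before_send: "SenderHistory") -> float:
    """Stateful: the same draft can be normal or anomalous depending on the
    sender's relationships at that exact moment in time."""
    ...
```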

People are unpredictable and error prone, and training and policies won’t change that simple fact. As employees continue to control and share more sensitive company data, businesses need a more robust, people-centric approach to cybersecurity. They need advanced technologies that understand how individuals’ relationships and behaviors change over time in order to effectively detect and prevent threats caused by human error.
