24 Apr 2023 2 min read ds_course

A Note on Week 3

Welcome to the second half of the course! So far, we've learned how fraudsters think by practicing offensive attacks on Alpha Bank, and we've analyzed HTTP server log data such as request headers, response bodies, and the transaction data commonly sent in the request body.

In most cases, and especially for simpler bots, HTTP server log data alone is enough to differentiate malicious bot traffic from regular human traffic.

However, more advanced bots and lower-volume bots can be difficult to detect via HTTP logs alone. In many cases, it is also difficult to detect (low-volume) fraudulent human users if they operate within the same geographic regions as legitimate human users. To address this, we will have to consider user interaction and leverage signals that we can obtain or compute within the browser environment.

Motivation – "Keyboard Mash"

The XKCD comic that motivates this week is titled "Keyboard Mash". It follows a chat between Cueball and White Hat. White Hat happened to "mash" the keyboard, i.e. place both hands on the keyboard at random, resulting in a string of seemingly random characters. Cueball made the astute observation that there was an anomaly in these characters: FJAFJKLDSKF7JKFDJ. All of the letters were from the home row of the (QWERTY) keyboard, while the number 7 was out of place, as it sits two rows above.

Of course, the "person" who was typing in White Hat's name was clearly not White Hat, but a spider who had trapped White Hat in a web! It is likely that the spider had some "extra hands" on hand... and one of those extra hands could have accidentally pressed 7, from two rows above.
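
Cueball's observation can be expressed as a simple rule: flag any character that does not belong to the row the rest of the input came from. The sketch below is illustrative only. It assumes a US QWERTY layout, and the `HOME_ROW` set and `off_row_characters` helper are hypothetical names introduced here.

```python
from __future__ import annotations

# Keys on the home row of a US QWERTY layout (assumed layout for this sketch).
HOME_ROW = frozenset("asdfghjkl;'")


def off_row_characters(text: str, row: frozenset = HOME_ROW) -> list[tuple[int, str]]:
    """Return (index, character) pairs for characters that are not on the given row."""
    return [(i, ch) for i, ch in enumerate(text) if ch.lower() not in row]


mash = "FJAFJKLDSKF7JKFDJ"
print(off_row_characters(mash))  # [(11, '7')]
```

A rule this narrow only works because we already know what "normal" mashing looks like; most of this week is about finding signals that generalize better than a single hand-written check.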

This XKCD comic highlights several questions that we will discuss during this week:

How do we spot anomalies in user interaction?
Is it possible to profile user behavior, and distinguish between different users (or types of users) based on such profiles?
Other than keys pressed, what else can we collect from the browser that will reveal more about the nature of user interaction? (E.g. if Cueball had access to White Hat's usual keystroke timings, would he find any meaningful difference between the spider's keystroke timings and White Hat's?)
How do we construct useful features from user interaction?
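
As a rough illustration of the last two questions, keystroke timings can be turned into simple numeric features. The sketch below assumes the events have already been collected in the browser (for example from `keydown` and `keyup` listeners) and exported as `(key, event_type, timestamp_ms)` tuples. That record format, the `keystroke_features` helper, and the sample values are made up for illustration and are not a prescribed method.

```python
from __future__ import annotations

from statistics import mean, pstdev


def keystroke_features(events: list[tuple[str, str, float]]) -> dict:
    """Compute dwell-time and inter-key-interval features from key events.

    events: hypothetical (key, event_type, timestamp_ms) records, where
    event_type is "keydown" or "keyup".
    """
    pending = {}      # key -> timestamp of its unmatched keydown
    dwell = []        # how long each key was held (keyup - keydown)
    down_times = []   # timestamps of each distinct key press

    for key, kind, t in sorted(events, key=lambda e: e[2]):
        if kind == "keydown":
            if key not in pending:  # skip auto-repeat keydowns while a key is held
                pending[key] = t
                down_times.append(t)
        elif kind == "keyup" and key in pending:
            dwell.append(t - pending.pop(key))

    # Time between consecutive key presses (keydown to next keydown).
    intervals = [b - a for a, b in zip(down_times, down_times[1:])]

    return {
        "mean_dwell_ms": mean(dwell) if dwell else None,
        "mean_interval_ms": mean(intervals) if intervals else None,
        "std_interval_ms": pstdev(intervals) if intervals else None,
    }


# Made-up sample: three keys typed at a perfectly even pace.
sample_events = [
    ("f", "keydown", 0), ("f", "keyup", 80),
    ("j", "keydown", 150), ("j", "keyup", 240),
    ("a", "keydown", 300), ("a", "keyup", 370),
]
features = keystroke_features(sample_events)
print(features)
```

In this sample the spread of the intervals is zero, which is the kind of unnaturally regular rhythm one might expect from a scripted client rather than a person. Whether such a feature is actually discriminative depends on the data, so treat it as a starting point rather than a detector.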

It is important to note that such "user profiling based on user interaction" is still a developing field, and it is not a "solved" problem by any account. Let's explore both the "art" and "science" aspects of user interaction this week!

By the end of this week, you will learn:

Some useful browser-based properties and signals that can be used to identify the environment and profile user behavior, including keyboard and mouse events
How humans and bots interact differently online; and how regular (human) users and fraudsters might interact differently online
Signal availability and integrity, and how fraudsters commonly evade detection
