Personal reflections: fighting fraud with AI and data science

I had the opportunity to give a talk at Stanford on April 17 for students majoring in data science, as part of the Data Science program's "Listen. Engage. Connect." series. This is an approximate transcript of the talk.

It’s great to be back on the Stanford campus after such a long time. A lot has changed, especially with all the impressive new buildings – we are now in the brand new Computing and Data Science building, next to the Gates Computer Science building. I see many of you are seniors in the data science program, which is fantastic.

Today, I'd like to share some thoughts on data science and its application in combating fraud and enhancing security. This isn't meant to be a formal lecture, so please feel free to jump in with questions. I’ll cover a few key areas: an introduction to fraud and security issues, the challenges of applying data science in this field, some lessons I've learned over the years, and a few thoughts on the future.

First, a brief introduction. As Amanda said, I'm originally from Singapore. I studied Economics here at Stanford back when machine learning wasn't quite the phenomenon it is today. I admittedly missed the boat on computer science, ending up with a mix of math, stats, econ and CS courses before landing on Economics. Afterward, I pursued econometrics at the LSE – again, close to machine learning, but not exactly it. My career has spanned the public and private sectors, and various companies large and small, and I recently joined Stripe. Stripe provides the payments infrastructure and supporting tools for many familiar brands (and many smaller companies), powering over $1T of payments each year. My team there focuses on applying large language models across Stripe's various product surfaces.

Although I don't have a formal data science degree (there wasn't such a thing in the past!), I hope to offer a practical perspective on what data science looks like in industry, especially in relation to AI and ML.

Understanding fraud in the real world

Fraud is a pervasive issue that touches many aspects of our lives. Let me share a real incident involving a friend. A year or two ago, she listed a stroller for sale on Facebook Marketplace. She received numerous inquiries, and one interaction seemed typical: "Is this available?" "Yes," followed by a supposed Zelle payment.

However, she then received a fake email claiming to be from Zelle, stating that the $250 payment was pending.

Thankfully, Gmail's spam filters are pretty good!

The scammer was trying to get the stroller without actually paying. Fortunately, my friend spotted the telltale signs and kept the stroller.

This was not an isolated incident. On Facebook Marketplace alone, there were many enquiries for her stroller within a short span of time. It was unlikely that her stroller was that popular or well-priced!

Multiple enquiries for a baby stroller from fraudsters.

Social engineering is unfortunately commonplace. In 2022, a teen gained access to Uber's intranet not through any sophisticated technical means but by manipulating Uber's staff. We also see schemes involving "money mules". A fraudster recruits someone, gives them a fake check to deposit, and instructs them to transfer a portion of the (temporarily available) funds back, keeping the rest as payment. The bank and the mule lose money, while the fraudster gets cash, which is likely transferred out and away. This type of scheme is unfortunately common across banks.

Fraudsters target various assets – money, passwords, gift cards are obvious, but less obvious targets include web browser cookies to impersonate users. There are even marketplaces selling browser profiles to facilitate logging into accounts as someone else. Essentially, fraud is deception causing wrongful loss, often criminal. However, the lines can be blurry. Is paying for thousands of five-star Amazon reviews fraud? What about scraping airline fares like Kayak does, or using ticket bots for shoe drops? Buying Twitter followers also falls into this gray area.

Even seemingly innocuous things, like your barcode in the Starbucks app, can be vulnerabilities. That barcode represents your gift card balance and doesn't refresh. If someone snaps a photo of it, they can spend your balance. This is especially problematic if you have auto-reload enabled. I encountered this issue firsthand when a TechCrunch reporter reached out, wondering why and how Starbucks had charged him hundreds of dollars via auto-reload.

Fraud isn't just an individual problem; it thrives within organizational silos. When security, business, and trust and safety teams don't communicate effectively or share user information, it creates gaps that fraudsters exploit, especially given the slow pace of inter-team communication. The challenge is even greater between different organizations.

XKCD #538: Security

From an attacker's perspective, they only need to find the weakest link. As the XKCD comic illustrates, sometimes physical coercion is simpler than cracking complex encryption. Defenders, conversely, must secure the entire surface; one open door leaves the whole system vulnerable. The most secure computer might be one disconnected from everything, but that's impractical.

Key questions in fraud detection

When tackling fraud, we often ask fundamental questions about every user and transaction:

  • Who is performing this action? Is it the actual account holder, their family, or someone else? Which device are they using?  
  • What are they doing? What transaction are they attempting?  
  • How are they doing it? Are they connecting via a VPN or a secure network?  
  • Why are they doing it?

These questions are central to protecting systems.

Advanced techniques like deepfakes complicate identity verification. At a previous company (MetaMap), we required users to match their faces to their photo IDs to verify their identities.

Screenshot from a sample (deepfake) identity verification video. Spot the telltale editing artifacts!

We received videos like the one above, where a person seemed to follow instructions but subtle visual glitches revealed it was a deepfake created from a still photo using AI. Similarly, audio deepfakes can mimic voices convincingly.

Photoshop enthusiasts might spot something familiar from the passport page on the left. Such clues won't generally be picked up by automated facial recognition software.

We also encountered clusters of fraudulent passports – for example, different documents from Venezuela and Colombia featuring the exact same face, likely created using readily available online tools that produce surprisingly legitimate-looking fakes.  

How users interact with websites also provides clues. Fraudsters often try to exploit login pages or password recovery mechanisms. For example, trying to reset a password or sign up for a new account on a website can reveal if a given email address is associated with an account, which can then be targeted using leaked passwords.

Challenges in applying data science to fraud

Applying data science to fraud presents unique challenges:

The counting problem

Simply determining basic counts – like the number of real users versus fake accounts – is incredibly difficult. Getting this wrong has huge implications. I worked with a major US retailer that believed they had many millions of users, only to discover >90% were fake accounts. Discoveries like this are problematic, especially as they mean that growth and marketing metrics are suddenly inaccurate. What does the actual user conversion funnel look like? Where in the world are the real users located? This is especially troubling if the company is public – stock prices may be affected. Accurate counting is fundamental.

Finding ground truth

Labeled datasets (fraud vs. non-fraud) are rare and often unreliable. Take credit card chargebacks: a cardholder has 90 days to report unauthorized use. However, many people don't notice small fraudulent charges, or they might file the chargeback late or not at all. So, the absence of a chargeback doesn't definitively mean that no fraud has occurred. Similarly, determining if a login attempt is legitimate or malicious is tough. I've seen companies provide messy, poorly formatted CSV files they claimed were ground truth, which were practically unusable.  

Highly imbalanced data

Fraudulent events are typically rare compared to legitimate ones (e.g., maybe 2% fraud). Standard statistical techniques like oversampling exist, but if the absolute number of fraud cases is too low, modeling becomes very difficult. However, this isn't universal. In Mexico, for instance, the credit card fraud rate is reportedly as high as 30%, drastically changing the required approach. It is often difficult to figure out the baseline fraud rate.
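To make this concrete, here is a rough sketch (using synthetic data and scikit-learn – not any production setup) of two common ways to cope with the imbalance: reweighting the rare class, and naively oversampling it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic data with ~2% "fraud" (the positive class), mirroring the imbalance above.
X, y = make_classification(n_samples=50_000, n_features=20, weights=[0.98], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Option 1: keep the data as-is but upweight the rare class during training.
weighted = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_train, y_train)
print(classification_report(y_test, weighted.predict(X_test), digits=3))

# Option 2: naive random oversampling of the minority class (training set only!).
rng = np.random.default_rng(0)
pos = np.flatnonzero(y_train == 1)
extra = rng.choice(pos, size=9 * len(pos), replace=True)
oversampled = LogisticRegression(max_iter=1000).fit(
    np.vstack([X_train, X_train[extra]]), np.concatenate([y_train, y_train[extra]])
)
print(classification_report(y_test, oversampled.predict(X_test), digits=3))
```

Neither trick rescues you if there are only a handful of labeled fraud cases to begin with – which is exactly the situation described above.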

Adversarial nature

Fraud isn't static. When you block one method, fraudsters adapt and change tactics, forcing you to constantly readjust – essentially a cat-and-mouse game.

Fraud is often a cat-and-mouse game. Image credit: Thoughts of Tom and Jerry, by rdennis.

There is a difficult balance to strike: blocking bad actors without disrupting legitimate activity or losing valuable signals from the fraudsters themselves. I encountered this with a US telco whose website was a prime target because it provided access to users' text messages (and thus one-time password codes). They had a large IT team and budget based on perceived high traffic volumes. We discovered most of this traffic was actually bots attempting fraudulent logins; the real user traffic was much lower. After we implemented blocks, the attackers shifted techniques, but the underlying pattern of real user activity (higher during the day, lower at night) became clear. This forced a major shift in their IT strategy.

From https://www.f5.com/company/blog/ten-questions-bot-mitigation-vendor

Implementation costs and complexity

Transitioning from traditional methods (like hard-coded rules) to machine learning requires a significant upfront investment. I worked with a large fintech company that had implemented over 2,000 hard-coded fraud rules – incredibly difficult to manage. While AI/ML has a high initial cost, it typically becomes more scalable and manageable over time compared to maintaining thousands of potentially broken rules.

Looking back: ten reflections

Over the years, I've gathered some reflections on practicing data science in the fraud domain. These are not in order of priority.

[1] Attention to detail matters

Seemingly minor details can be crucial. During the distribution of COVID relief funds, one fraudster named "Sandy Tang" used multiple email addresses like sandy.tang@gmail.com, san.dy.tang@gmail.com, etc., to apply for benefits multiple times.

Because Gmail ignores dots in usernames, all these emails went to the same inbox, but many systems likely treated them as distinct accounts based on literal string matching. This simple oversight allowed significant fraud. The fraudster could have been smarter, but this detail was key. Fortunately, he was caught and brought to justice. Many other similar fraudsters were probably never caught.
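As a minimal sketch (the dot-stripping and plus-tag rules below apply to Gmail specifically; other providers differ), normalizing addresses before matching would have collapsed all of those aliases into one identity:

```python
def normalize_email(address: str) -> str:
    """Canonicalize an email address so trivially varied aliases match each other."""
    local, _, domain = address.lower().strip().partition("@")
    if domain in {"gmail.com", "googlemail.com"}:
        local = local.split("+", 1)[0]      # drop "+tag" suffixes
        local = local.replace(".", "")      # Gmail ignores dots in the local part
    return f"{local}@{domain}"

aliases = ["sandy.tang@gmail.com", "san.dy.tang@gmail.com", "Sandy.Tang+relief@gmail.com"]
assert len({normalize_email(a) for a in aliases}) == 1   # all map to the same identity
```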

[2] Visualizations can be powerful

Visualizing data helps in understanding patterns yourself and communicating insights to others (clients, bosses, etc.). Consider how users move their cursors around a button on a webpage – in the diagram below, green dots represent mouse movements, and red dots represent mouse clicks.

A "heat map" visual of mouse movements (green) and clicks (red) around a button on a webpage, created from ~6K webpage visits.

Humans move erratically and click imprecisely (green trails, spread-out red clicks). Bots, however, often move cursors in straight, predictable lines and click precisely within the target area (clean green trails, concentrated red clicks). We saw this clearly when analyzing thousands of interactions on a state courts website where users had to click to "agree" to terms of service to look up records – bots were being used to automate searches across large numbers of records.

Human vs. bot behavior on a terms of service page that gated access to a state courts records search.

Another example from online banking logins showed human fraudsters exhibiting distinctive mouse movements and keystroke patterns. Legitimate users moved their cursors everywhere around the login form, and did not really have to click on the form as their usernames and passwords were most likely already auto-filled. Fraudsters, on the other hand, did not waste any time with their cursors; they also repeatedly hit "backspace" to iterate through credit/debit card numbers (which served as the user ID for this particular bank's website), something real users don't do.

In this image, green = mouse movements and red = mouse clicks.
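For illustration, here is a rough sketch (with synthetic cursor telemetry, not the data behind these images) of how such an overlay of movements and clicks can be drawn with matplotlib:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)

# Synthetic cursor trails and click coordinates for a "human" and a "bot" session.
human_trail = np.column_stack([300 + 150 * np.sin(5 * t) + rng.normal(0, 8, t.size),
                               200 + 100 * np.cos(3 * t) + rng.normal(0, 8, t.size)])
human_clicks = rng.normal([310, 215], [12, 10], size=(30, 2))                 # imprecise clicks
bot_trail = np.column_stack([np.linspace(0, 310, 50), np.linspace(0, 215, 50)])
bot_clicks = np.tile([310.0, 215.0], (30, 1)) + rng.normal(0, 0.5, (30, 2))   # pinpoint clicks

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, trail, clicks, title in [(axes[0], human_trail, human_clicks, "Human-like"),
                                 (axes[1], bot_trail, bot_clicks, "Bot-like")]:
    ax.plot(trail[:, 0], trail[:, 1], color="green", lw=0.8, alpha=0.6)  # mouse movements
    ax.scatter(clicks[:, 0], clicks[:, 1], color="red", s=12)            # mouse clicks
    ax.set_title(title)
    ax.invert_yaxis()  # screen coordinates: y grows downward
plt.tight_layout()
plt.show()
```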

[3] Good data structures are essential

How you structure data impacts analysis. Representing relationships between applications, user identities, and emails as an identity graph revealed connections that would be hidden in a simple table.

For instance, a graph clearly showed one email address linked to numerous passports from various countries. Investigating this with SQL would require complex queries, whereas graph queries are often faster and more intuitive for uncovering such anomalies.

Graphs also easily visualize distributions, like the number of applications per person, highlighting outliers who submit an unusual number of applications.
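Here is a toy sketch (with made-up identifiers, using networkx) of how the "one email linked to many passports" pattern falls out of a graph representation almost for free:

```python
import networkx as nx

G = nx.Graph()
# Each edge links the email on an application to the passport it was submitted with.
applications = [
    ("a@example.com", "passport:VE-123"),
    ("a@example.com", "passport:CO-456"),
    ("a@example.com", "passport:VE-789"),
    ("b@example.com", "passport:SG-111"),
]
G.add_edges_from(applications)

# Flag any email node connected to an unusually large number of distinct documents.
for node, degree in G.degree():
    if "@" in node and degree >= 3:
        print(node, "is linked to", degree, "passports:", sorted(G.neighbors(node)))
```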

[4] Invest in tooling 🧰

Off-the-shelf solutions aren't always sufficient or optimal when we want to do something specific. Often, we need to build our own tools to analyze or visualize domain-specific data more effectively. For example, I built such a tool in late 2018 (codenamed "VizMonkey") to visualize how users interact with websites – their mouse movements and keystrokes. This enabled us to easily differentiate between humans and bots, and convince clients that we could do so accurately too.

Here is an example of human behavior. There is significant variation in the timing of the user's keystrokes – key-ups and key-downs represented by the thin vertical lines. The green squiggly line represents mouse movements.

On the other hand, here is how a bot interacts with the same website. Note how closely and evenly spaced the keystrokes are, and how clicks are performed at different points on the screen without any mouse movements in between.
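To make that contrast concrete, here is a rough sketch (not the actual tool's code) of how keystroke-timing regularity alone can separate the two:

```python
import numpy as np

def keystroke_regularity(keydown_times_ms):
    """Coefficient of variation of inter-key gaps: humans are noisy, scripts are metronomic."""
    gaps = np.diff(np.asarray(sorted(keydown_times_ms), dtype=float))
    return gaps.std() / gaps.mean()

human_session = [0, 140, 390, 520, 880, 1010, 1345, 1500]  # irregular typing rhythm
bot_session = list(range(0, 800, 100))                      # a keypress exactly every 100 ms

print(keystroke_regularity(human_session))  # comfortably above zero
print(keystroke_regularity(bot_session))    # ~0: suspiciously regular
```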

The videos and screenshots from this tool helped convince many customers of the value of our bot defense products. Over time, our analysts also used this tool to identify new signals and telemetry that we could collect from users' browser and device environments, as well as to develop features for our machine learning models.

Over the years, I have realized the importance of hiring data scientists (or ML engineers, or AI engineers!) who are able to develop tools to bring the team's work to the next level.

[5] Stay skeptical... and [6] separate signal from the noise

In data science, and in fraud detection particularly, it is crucial to approach data and claims with a degree of skepticism. Never take information at face value, regardless of how convincing it might seem.

When we were scoping out a project with a major telco, the client told us that the normal traffic load for their login endpoint was around 500K logins per hour. This equated to roughly 12M logins per day – a really large volume for a telco's website. (When was the last time you logged in to your telco's web portal?)

Our skepticism was validated when we onboarded the endpoint in question. During the first week of deployment, we observed that their traffic was indeed ~500K per hour, but not all of it was normal traffic. One would expect to see a nice diurnal pattern to such traffic, and that was conspicuously missing. Our technology allowed us to identify that the actual human login volume (green) was at least 90% lower, and the bulk of the traffic was automated, i.e. from bots (yellow).

The broader point is that when we are presented with some data, always question the source and accuracy of the data and look for anomalies. Maintaining this healthy skepticism allows us to challenge assumptions and dig deeper to separate genuine signals from deceptive noise. This is a crucial process, as it is quite impossible to get perfectly clean data in the real world. (Otherwise, would data science still be fun? 🤔)

Here is another example using web traffic to illustrate the point. Web application traffic (from humans) is typically diurnal – and one might be tempted to assume that the following traffic graph comprises mainly human traffic. At least 80% to 90%, perhaps?


But the actual traffic is something else entirely – at least half the traffic is probably fraudulent!

[7] Mathematical modeling and intuition

While domain expertise and practical tools are crucial, a strong foundation in mathematical modeling and intuition provides a powerful lens for understanding and detecting fraud. Human behavior often exhibits predictable patterns that can be modeled mathematically. For instance, website traffic or transaction volume might follow a regular cyclical pattern that can be approximated by a sine wave.
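For instance, here is a rough sketch (on synthetic hourly counts, not any client's data) of fitting such a sinusoidal baseline with scipy and flagging hours that drift far from it:

```python
import numpy as np
from scipy.optimize import curve_fit

def daily_cycle(t, mean, amplitude, phase):
    # Hourly traffic with a 24-hour period: baseline plus diurnal swing.
    return mean + amplitude * np.sin(2 * np.pi * t / 24 + phase)

hours = np.arange(24 * 14)  # two weeks of hourly buckets
traffic = daily_cycle(hours, 10_000, 4_000, 1.3) + np.random.default_rng(1).normal(0, 300, hours.size)

params, _ = curve_fit(daily_cycle, hours, traffic, p0=[traffic.mean(), traffic.std(), 0.0])
residuals = traffic - daily_cycle(hours, *params)

# Hours whose residuals sit far outside the usual noise band deserve a closer look.
suspicious_hours = hours[np.abs(residuals) > 4 * residuals.std()]
```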

By establishing such baseline models, we can identify significant deviations or anomalies that might indicate fraudulent activity. Even small, localized traffic anomalies can be modeled as fixed effects to gain deeper insights.

Furthermore, concepts like entropy and the modeling of inter-arrival times of events (e.g., using a Poisson process) can provide valuable frameworks for detecting deviations from expected behavior. The ability to translate real-world observations into mathematical terms and identify some objective function to be solved or optimized for is both an art and a science. (This was very much the subject of one of my favorite courses during my time at Stanford – MS&E 206: Art of Mathematical Modeling, which unfortunately is no longer offered!)
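In the same spirit, here is a small sketch (again on synthetic data) of the inter-arrival-time idea: under a Poisson process the gaps between events are exponentially distributed, so event streams whose gaps are suspiciously regular stand out immediately.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
human_like = np.cumsum(rng.exponential(scale=60, size=500))  # Poisson-ish arrivals
scripted = np.arange(500) * 60.0                             # exactly one event every 60s

for name, timestamps in [("human-like", human_like), ("scripted", scripted)]:
    gaps = np.diff(timestamps)
    # Under a Poisson process the gaps should look exponential; a tiny p-value says otherwise.
    _, p_value = stats.kstest(gaps, "expon", args=(0, gaps.mean()))
    print(name, p_value)
```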

Such mathematical modeling and intuition skills differ from the baseline math ability that is needed to understand how machine learning works and how to train and optimize models. Both sets of skills are important.

[8] Look for deep domain expertise

There is no "one-size-fits-all" approach to data science. We often discover the most interesting insights when we pair strong technical skills with deep domain expertise. The latter provides the necessary context to understand what is "normal" and therefore to spot what stands out.

My hiring philosophy is to look for "Pi-shaped" individuals: those with strong "horizontal" technical skills (e.g. software engineering, coding, ML, etc) and deep expertise in one or two "verticals".

Deep domain expertise is important when figuring out what signals or telemetry to collect and therefore what features to engineer. It also informs what questions to ask customers, and how to pitch results to customers. This is especially true when customers are deep specialists in their own domains but are not familiar with data science or ML.

For example, when trying to distinguish between human and automated traffic, it is often helpful to collect signals from the web browser. To do so requires a deep understanding of how browsers work – i.e. how they render webpages, the Document Object Model, how JavaScript works, etc. Otherwise, it would be difficult to identify subtle differences between browsers, such as how they render the canvas object...

...or emojis.

[9] Think like the attacker

The best way to defend a system is to think like an attacker. What would you attack, and how would you do it? Attackers often look for the most convenient way of breaking into a system. They just need a single way to get in (and get out), unlike defenders who need to secure the entire attack surface.

From https://www.abbottcartoons.com/cartoon-blast/classic-spectickles-4-oclock-cartoon7094599

Many attacks against online systems are automated – such as credential stuffing and account takeovers. However, there are stages of an attack strategy that might be relatively difficult to automate, or which require the "human touch" (such as phishing or social engineering). In such cases, even things like browser window positioning could provide subtle clues of an attacker. If you were trying to log in to many new accounts, would you use a password manager? Or would you have a text editor on one side of the screen, and copy/paste credentials to the login page in a browser window on another side of the screen?

Such clues might not be meaningful on their own, but could be illuminating when considered together with other clues – for example, a large number of accounts tried, an unexpectedly low success rate, high levels of pasting basic information like names and email addresses, unusual IPs/ASNs, or a lack of "account affinity" (e.g. login attempts to accounts from previously unassociated devices), etc.

As another example, imagine if you were trying to monitor an airline's flight prices for SFO-LHR, and needed to ping the airline's fare search endpoint once a minute. It would be suspicious for you to use a single IP address and a single browser to do so – you'd probably get rate limited quickly – so you want to spoof your network location and your browser. You start sending traffic across a VPN, and also start to get creative about your (purported) browser's user agent strings.

On the defense side, you might see telltale user agent strings such as these:

These might not look abnormal unless you had subject matter expertise and you were also thinking like an attacker – what possible mistakes could the attacker make, when sending scripted traffic?
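As a purely hypothetical illustration (these are not the strings from that engagement), even a handful of cheap checks will catch the more careless scripted traffic:

```python
import re

AUTOMATION_MARKERS = ("HeadlessChrome", "PhantomJS", "python-requests", "curl/")

def looks_scripted(user_agent: str) -> bool:
    """Heuristic only: flag obvious automation markers and implausibly old browser versions."""
    if any(marker in user_agent for marker in AUTOMATION_MARKERS):
        return True
    match = re.search(r"Chrome/(\d+)", user_agent)
    if match and int(match.group(1)) < 60:  # a "Chrome" far older than anything humans still run
        return True
    return False

print(looks_scripted("python-requests/2.31.0"))                 # True
print(looks_scripted("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                     "AppleWebKit/537.36 (KHTML, like Gecko) "
                     "Chrome/41.0.2272.89 Safari/537.36"))      # True (stale version)
```

Real attackers are rarely this sloppy for long, which is why such checks are only one layer among many.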

[10] Data science/ML must be paired with strong SWE skills

The last reflection is that for data science and ML to be really successful, they must be complemented by strong software engineering (SWE) skills. This is true at both organizational and personal levels.

Imagine that you are building a product that helps e-commerce platforms weed out fake user reviews. You develop a solid ML model to score user reviews, with 99% recall and 95% precision, based on millions of user reviews that you have assembled and labeled (with the help of both humans and LLMs!). Now... how do you get this model to production? How do you collect the features that you need for your model? How do you ensure that the scoring process is quick, within the 100ms budget that a large customer has given you? How do you integrate your model's scores with the customer's trust and safety processes? And how do you monitor your model in production?

If you were building out an engineering team for this product within a large company, you would typically hire a data scientist (to identify patterns and features in the data), an ML engineer (to train and improve models), and a couple of SWEs to bring all of these to production. If you were building this product within a small startup, you would probably look for a couple of "full-stack ML engineers" who are able to perform all of these roles reasonably well. And if you were doing this as a side project, you would have to do it all yourself – including bringing it to production. Some might think that this is really a "last-mile" problem, where model training is the bulk of the work – but in my experience, the engineering piece is likely just as difficult as the ML piece (if not more so!). Either way, a lack of SWE skills would mean that the work gets stuck in notebooks for much longer than necessary – even a slight imbalance would mean that the SWEs in the team become the bottleneck.
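As a toy sketch of that last mile (the model file, feature names, and choice of FastAPI here are all hypothetical), even the most minimal scoring service forces you to confront latency, feature availability, and monitoring:

```python
import time

import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("review_scorer.joblib")  # hypothetical pre-trained model artifact

class Review(BaseModel):
    review_length: int
    reviewer_account_age_days: int
    reviews_last_24h: int  # hypothetical features – each needs its own real-time pipeline

@app.post("/score")
def score(review: Review):
    start = time.perf_counter()
    features = [[review.review_length, review.reviewer_account_age_days, review.reviews_last_24h]]
    prob_fake = float(model.predict_proba(features)[0][1])
    latency_ms = (time.perf_counter() - start) * 1000
    # In production you would also log features, scores, and latencies for drift and SLA monitoring.
    return {"prob_fake": prob_fake, "latency_ms": round(latency_ms, 2)}
```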

Looking forward

What does the future hold for data science and data scientists, particularly in fields like fraud and security?

From https://a16z.com/llmflation-llm-inference-cost/

We're seeing a clear trend in the increasing cost-efficiency of LLMs. Their cost per million tokens continues to drop, suggesting they're becoming more accessible and powerful tools. GPT-3.5 Turbo used to cost ~$4/1M tokens in early 2023, while GPT-4o mini – a much more powerful model! – costs just $0.60/1M output tokens today.

But how exactly will we integrate these more affordable, capable models into our daily data science workflows? How will their growing abilities change the way we approach problems and build systems?

This naturally leads us to consider the concept of AI agents. What are these systems, and how might they reshape the data science landscape? While the term "AI agent" is often used with a lot of industry hype and can be confusingly defined, we can think of them as systems designed to act autonomously to fulfill some objectives over time. This requires them to perceive their environment, process information, reason to decide on actions, take those actions, communicate, and remember context. Systems capable of assisting throughout the software development lifecycle, including coding, are emerging – there is little doubt that AI agents will play a larger role in software engineering in the future. AI agents are also starting to creep into data science, although I have yet to see anything substantial beyond code generation. The large amounts of data that data scientists have to handle still strain the limited context windows of today's LLMs (though perhaps not for long).

Josh Payne / CS224G slides, "LLMs for Code Generation"

So, what will the role of a data scientist actually look like in the next couple of years, or even two decades from now? We can likely expect increased productivity as AI tools help automate time-consuming tasks. Will AI agents eventually handle things like data cleaning, allowing data scientists to work with cleaner data? While that would be ideal, it might still be a ways off. However, certain core human skills seem indispensable. Can AI replace the need to develop intuition about data, to maintain a healthy skepticism about the information presented, or to delve deep and understand complex problems? These skills – critical thinking, problem understanding, and skeptical analysis – appear crucial. The job market is already reflecting this shift, with new titles like "ML Engineer" and "AI Engineer" appearing, often with definitions that aren't entirely clear.

But for those focused on fraud and security, is there concern about job security? My view is that as long as humans exist, fraud and security challenges will persist. Therefore, in this domain, job security is virtually guaranteed. The specific job scopes for data scientists in fraud and security will evolve though. The key challenge will likely be adapting our skills, continuously learning, and figuring out how to best combine new AI capabilities with the enduring need for human expertise.

Thank you for listening. All the best with your last quarter at Stanford!

[Q&A session.]