How to make the most out of your machine learning model

Updated on 04.04.25

Overview

Machine learning has become one of the most talked-about concepts in fraud prevention. But as with AI in general, it's easy to get overwhelmed by the nuances and the abundance of expanding possibilities these technologies can offer.

Beyond providing an overview of the use of machine learning and AI in fraud detection, we also want to share how you can improve your data models with SEON. Add these recommended data points from the API response to your existing machine learning model to get the most out of it and make your fraud prevention as successful as possible.

It only takes a few

Needless to say, the more data you have, the more educated decisions you can make. SEON can create and connect hundreds of data points using only a handful via digital footprinting: you can get plenty of extra information on any user only from an email or IP address, which, of course, comes in handy when calculating risks.

We gather this information via an automated solution from various sources and publicly available databases when you are calling our APIs. While the data collected is primarily utilized for fighting fraud, you can theoretically use it for other purposes as well. However, as a SEON customer, you must ensure that your use of this enriched data complies with all data protection regulations in your local jurisdiction.

Stay data-hungry

So, how is this abundance of data put to good use? That's where machine learning comes into the picture. Machine learning, a subset of artificial intelligence (AI), uses algorithms to identify patterns behind fraudulent transactions and create data models. It then suggests risk rules for you to implement so that you can catch suspicious activities earlier.

It's important to note that machine learning is indeed all about learning: the more information you "feed" it and the more training it gets by accepting/flagging its suggested risk rules, the more accurate it gets. Not only does this make fighting fraud easier and faster, but it also enables you to benefit from the data models in other areas, such as alternative credit scoring, customer segmentation, or loan default risk calculations.

Where to start

While the best features for your machine learning model are specific to you and highly depend on your industry, business, and individual needs, we'd like to give you a head start with a generic list of suggested data points from the API response from which you can benefit the most.

Fraud score

fraud_score

A pillar of SEON's scoring system and logic, this data point might be the easiest to rely on, as it accumulates all the default rules and scores. However, these default settings might not entirely cover your specific needs, so it's worth refining the default scores and adding custom rules.

Tip: Don't hesitate to contact our Technical Services team if you need help with these settings.

Email and phone network scores

email_details.risk_scores.global_network_score, phone_details.risk_scores.global_network_score

While the all-in-one fraud_score covers the basics, it might be beneficial to dig deeper with more comprehensive machine learning scores that use consortium data to predict the likelihood of fraud.

Learn more: Explore how Network Scores work.
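The field names in this article are written as dotted paths, so a small helper for reading them out of a parsed API response can be handy. The following is a minimal sketch, assuming the response has already been decoded into nested Python dictionaries. The helper names get_path and network_score_features and the sample payload are hypothetical, not part of SEON's API.

```python
from typing import Any, Optional

# Field paths named in this article; the nested layout of `response` is an assumption.
NETWORK_SCORE_PATHS = [
    "email_details.risk_scores.global_network_score",
    "phone_details.risk_scores.global_network_score",
]


def get_path(response: dict, path: str) -> Optional[Any]:
    """Walk a dotted path through nested dicts; return None if any key is missing."""
    node: Any = response
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def network_score_features(response: dict) -> dict:
    """Collect both network scores, keeping None where a module was not returned."""
    return {path: get_path(response, path) for path in NETWORK_SCORE_PATHS}


# Made-up payload for illustration: only the email module is present.
sample = {"email_details": {"risk_scores": {"global_network_score": 12.5}}}
features = network_score_features(sample)
```

Returning None for a missing module, rather than 0, lets your model or preprocessing step decide how to treat missing values.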

Number of social media accounts

all_social_media_profile_count, email_social_media_profile_count, phone_social_media_profile_count

Summing up social media registrations (email/phone/both) can accurately indicate whether we are facing a real person or a fake identity. You'll have to parse your fraud API response and count the number of 'true' values in the email and phone modules.
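Counting the 'true' values described above could look roughly like the sketch below. It assumes that each module exposes a dictionary of sites under a key called account_details, and that every site entry carries a registered flag. Both the layout and the function names are assumptions made for illustration, so check them against your actual API response. Treating the combined count as the sum of the email and phone counts is also an assumption.

```python
def count_registered(accounts: dict) -> int:
    """Count site entries whose 'registered' flag is exactly True."""
    return sum(
        1
        for info in accounts.values()
        if isinstance(info, dict) and info.get("registered") is True
    )


def social_profile_counts(response: dict) -> dict:
    """Build the three count features from the email and phone modules."""
    # `or {}` guards against modules that are missing or returned as null.
    email_accounts = (response.get("email_details") or {}).get("account_details") or {}
    phone_accounts = (response.get("phone_details") or {}).get("account_details") or {}
    email_count = count_registered(email_accounts)
    phone_count = count_registered(phone_accounts)
    return {
        "email_social_media_profile_count": email_count,
        "phone_social_media_profile_count": phone_count,
        # Assumption: the combined feature is the plain sum of both modules.
        "all_social_media_profile_count": email_count + phone_count,
    }


# Made-up payload; site names are placeholders.
sample = {
    "email_details": {
        "account_details": {
            "site_a": {"registered": True},
            "site_b": {"registered": False},
            "site_c": {"registered": None},
        }
    },
    "phone_details": {
        "account_details": {
            "site_a": {"registered": True},
            "site_d": {"registered": True},
        }
    },
}
counts = social_profile_counts(sample)
```

Only values that are exactly true are counted, so a null (lookup not possible) is not mistaken for a confirmed registration.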

account_aggregates.business.total_registration, account_aggregates.personal.total_registration, and for example a sub-category: account_aggregates.personal.email_service.registered

Using Advanced Digital Footprinting, we check whether an account is registered with the email address or phone number on more than 160 sites and return the number of registrations found, categorised by industry type. Aggregated results are returned for two top-level categories: business and personal. Both are broken down into more granular groups, and the total number of registrations found is returned for the top-level categories as well as overall.

Email is older than…

minimum_age_months, earliest_profile_date

Having an estimate of when an email address was created has turned out to be an essential piece of information in many of our models. It indicates whether the email address is real and whether its owner has used it elsewhere. SEON now provides two ready-to-use fields that approximate the age of the email address. The earliest profile date is the earliest date detected in the available data, such as the first occurrence of the email in a data breach or the creation date of an associated social profile. The minimum age is calculated by subtracting earliest_profile_date from the current date.
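SEON returns minimum_age_months for you, but the subtraction behind it is easy to reproduce, for example when you want to recompute the age at training time from a stored earliest_profile_date. The sketch below assumes the date arrives as an ISO string (YYYY-MM-DD) and rounds down to whole months. Both choices are assumptions and may differ from how SEON computes the field.

```python
from datetime import date
from typing import Optional


def minimum_age_months(earliest_profile_date: str, today: Optional[date] = None) -> int:
    """Whole months elapsed between the earliest profile date and `today`."""
    earliest = date.fromisoformat(earliest_profile_date)  # assumes YYYY-MM-DD
    today = today or date.today()
    months = (today.year - earliest.year) * 12 + (today.month - earliest.month)
    if today.day < earliest.day:
        months -= 1  # the last month has not fully elapsed yet
    return max(months, 0)  # never report a negative age


# Fixed reference date so the example is reproducible.
age = minimum_age_months("2020-01-15", today=date(2025, 4, 4))
```

Passing today explicitly keeps the feature reproducible: use the transaction date rather than the date the training job runs.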

Number of data breaches

email_details.breach_details.number_of_breaches

While a data breach is never good news, it can prove that an email address exists and has been used elsewhere.

A data breach is an event where privately held information is made public. The most common type of data breach tends to affect user records, which are exchanged or sold on online marketplaces. If we can find an email in such user records, it's safer to assume it's been around for a while.

Blackbox score

blackbox_score

You might also want to consider the probability that SEON's Blackbox Machine Learning model provides.

Tip: Remember, the more transactions you label in our system, the more accurate this model gets.

IP address-related features

Take a look at the following IP-related fields:

ip_details.web_proxy
ip_details.public_proxy
ip_details.tor
ip_details.vpn
ip_details.open_ports_number

You can also check if the IP address belongs to a data center and if it's blacklisted.
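Most models expect numeric inputs, so the boolean IP flags listed above usually need encoding first. Here is a minimal sketch, assuming ip_details is the already-parsed ip_details object of the response. The function name and the output feature names are hypothetical.

```python
# Boolean fields from the list above.
IP_FLAG_FIELDS = ["web_proxy", "public_proxy", "tor", "vpn"]


def ip_features(ip_details: dict) -> dict:
    """Encode the IP flags as 0/1 and pass the open-port count through as an int."""
    features = {
        f"ip_{name}": int(bool(ip_details.get(name))) for name in IP_FLAG_FIELDS
    }
    ports = ip_details.get("open_ports_number")
    # Assumption: a missing count is treated as 0 open ports.
    features["ip_open_ports_number"] = int(ports) if ports is not None else 0
    return features


# Made-up values for illustration.
example = ip_features(
    {"web_proxy": False, "public_proxy": False, "tor": True, "vpn": True, "open_ports_number": 3}
)
```

Note that this encoding maps a missing flag to 0, the same as an explicit false. If the difference matters for your model, add a separate "missing" indicator instead.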

Important nice-to-haves

There are further additions that might not help much on their own but can play a crucial role in "making the final call", and they do so in a way a model can learn. This often makes them the most valuable players when training the machine learning model.

You should therefore keep an eye on these things, too:

Whether a rule has been triggered by an email address similar/not similar to the user's full name. You can use Default rule E123 to check this.

The type of IP address, based on the internet service provider (ip_details.type). Whether the IP address belongs to a data center, library, educational institute, organization, government, mobile or fixed line ISP, etc., can make a huge difference.
The email domain's creation date and time (UTC timezone) (the year and month value of email_details.domain_details.created).
Whether the email's domain is a free provider such as Gmail, Hotmail, etc. (email_details.domain_details.free).
Whether the email's domain is disposable or has been proven fraudulent before (email_details.domain_details.disposable).
The battery level of the used device (device_details.battery_level). You can only access this data when you use Device Fingerprinting for device intelligence with SEON's iOS or Android SDK.

Note: Certain fields in the API response depend on which API was called (email: email_details, phone: phone_details, IP address: ip_details, or the device fingerprinting SDK: device_details).
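As an illustration of the email domain items above, the sketch below turns email_details.domain_details into the year and month of creation plus two 0/1 flags. It assumes created is a string such as "2015-03-01 12:00:00" that datetime.fromisoformat can parse. The exact format in your response may differ, and the function and feature names are hypothetical.

```python
from datetime import datetime


def domain_features(domain_details: dict) -> dict:
    """Extract creation year/month and the free/disposable flags as model inputs."""
    created_raw = domain_details.get("created")
    # Assumption: an ISO-like timestamp; adjust the parsing if your format differs.
    created = datetime.fromisoformat(created_raw) if created_raw else None
    return {
        "domain_created_year": created.year if created else None,
        "domain_created_month": created.month if created else None,
        "domain_free": int(bool(domain_details.get("free"))),
        "domain_disposable": int(bool(domain_details.get("disposable"))),
    }


# Made-up values for illustration.
example = domain_features(
    {"created": "2015-03-01 12:00:00", "free": True, "disposable": False}
)
```

When the email API was not called, domain_details will be absent, and the year and month come back as None so the gap stays visible to your preprocessing.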

Only the beginning

It might seem like a lot, but this was merely an introduction to the myriad of possibilities machine learning offers. These generic additions can already get you far when fighting fraud. Still, we encourage you to dive deeper, check out further recommendations for feature engineering, and find the best picks based on your specific needs to get the best possible results.

About the author

Gellért Nacsa is the Data Science Lead at SEON. He studied applied mathematics at university and worked as a data analyst, algorithm designer, and data scientist. He has been enjoying data and machine learning for over six years.