Network Scores in Advanced Digital Footprinting

Updated on 21.01.25
5 minutes to read

SEON Blackbox Machine Learning

SEON offers ready-to-use blackbox machine learning models from day one, so you do not have to wait for your own data to accumulate while fraudulent events go undetected. Our Advanced Digital Footprint Machine Learning Network Scores use pretrained “base models” to calculate email and phone network scores as soon as you start using SEON.

These base models are developed using sanitized, cross-customer data from SEON’s proprietary consortium dataset, designed to maximize predictive accuracy. They serve as a starting point, enabling immediate fraud risk assessment of email addresses and phone numbers.

As you integrate SEON into your workflows by configuring decisioning rules and feeding back verification labels, SEON’s ML evolves to create a bespoke model tailored to your data. This customized model is tuned to the patterns in your business and typically offers better predictive accuracy. Customer-specific models replace the base model as soon as they are available, and they usually outperform it.

Model lifecycle (timeline figure):
Day 1: base models are used.
SEON decisioning and labeling are implemented: transaction data and verification labels start flowing in.
1,000 transactions are collected, with 100 declines and 100 approvals: a customer-specific model is trained.
Model management processes assess performance: models evolve continuously.
The best-performing model goes into production: usually the customer-specific model.
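The lifecycle above can be sketched as two simple steps: a readiness check before training a customer-specific model, and a model-management step that promotes the best performer. This is a minimal sketch, assuming the 1,000-transaction and 100-decline/100-approval figures from the timeline are minimum requirements and that candidate models are compared on a single held-out AUC. The names `ready_for_custom_model`, `pick_production_model`, and `CandidateModel` are hypothetical and not part of any SEON API, and the AUC values in the usage example are made up.

```python
from dataclasses import dataclass

# Figures read from the timeline above; treating them as minimums is an assumption.
MIN_TRANSACTIONS = 1000
MIN_DECLINES = 100
MIN_APPROVALS = 100


@dataclass
class CandidateModel:
    name: str
    auc: float  # performance on a held-out evaluation set (assumed comparison metric)


def ready_for_custom_model(labels: list[str]) -> bool:
    """Hypothetical check: enough labeled transactions to train a customer-specific model."""
    declines = labels.count("DECLINE")
    approvals = labels.count("APPROVE")
    return (
        len(labels) >= MIN_TRANSACTIONS
        and declines >= MIN_DECLINES
        and approvals >= MIN_APPROVALS
    )


def pick_production_model(candidates: list[CandidateModel]) -> CandidateModel:
    """Hypothetical model-management step: promote the best-scoring candidate."""
    return max(candidates, key=lambda m: m.auc)


labels = ["APPROVE"] * 850 + ["DECLINE"] * 150
print(ready_for_custom_model(labels))  # True

# Illustrative AUC values only; the base model can still win when it scores higher.
best = pick_production_model([
    CandidateModel("base", 0.90),
    CandidateModel("customer_specific", 0.93),
])
print(best.name)  # customer_specific
```

Selecting with `max` rather than always switching to the newest model mirrors the "best-performing model goes into production" step: the customer-specific model is usually, but not necessarily, the one promoted.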

This document explains the foundation of these base models and the factors that drive the scores they generate.

The data used: Sanitized, relevant, representative

We utilized sanitized data from our top customers, which included verified fraudulent transactions labeled accordingly. To ensure relevance, we focused on transaction data from August to November 2024. The dataset was curated to reflect the top-tier customer base using SEON, considering factors such as user and phone geographies, email domains, and phone carriers. The final sample included 1.5 million transactions for email-based models and 1 million transactions for phone-based models.

The Email Network Score Base Model

The factors (referred to as features) that influence the network score for email addresses fall into several categories. These include data provided directly by customers to our API (e.g., email domain), information enriched by SEON’s capabilities (e.g., total registrations), insights derived from SEON’s consortium data (e.g., hits), and calculated metrics designed to capture key fraud patterns (e.g., vowel ratio in the email). The table below lists a sample of the features that push the network score higher.

Important feature examples

| Category | Feature |
|---|---|
| Consortium data | The number of SEON customers that have seen the email and marked it as fraudulent. |
| Consortium data | The number of customers that currently have, or previously had, the email on their blacklist. |
| Email characteristics | Whether the email username is likely gibberish. |
| Email characteristics | Whether the email address is deliverable. |
| Email characteristics | The number of data breaches in which the email was seen. |
| Social media registration pattern | The total number of social media registrations with the email. |
| Social media registration pattern | The number of social media registrations by type (personal and business). |
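Calculated features such as the vowel ratio are simple string statistics over the email username. SEON does not publish its exact definitions, so the following is a minimal sketch under assumed definitions: the vowel ratio is taken as the share of vowels among the letters of the username, and a hypothetical longest-consonant-run helper stands in as one crude signal of a gibberish username.

```python
import re

VOWELS = set("aeiou")


def email_username(email: str) -> str:
    """Return the part before '@', lower-cased."""
    return email.split("@", 1)[0].lower()


def vowel_ratio(email: str) -> float:
    """Share of vowels among the alphabetic characters of the username (assumed definition)."""
    letters = [c for c in email_username(email) if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c in VOWELS for c in letters) / len(letters)


def longest_consonant_run(email: str) -> int:
    """Length of the longest run of consecutive consonants (hypothetical gibberish signal)."""
    runs = re.findall(r"[b-df-hj-np-tv-z]+", email_username(email))
    return max((len(r) for r in runs), default=0)


for address in ["anna.smith@example.com", "xkcdqwrtz91@example.com"]:
    print(address, round(vowel_ratio(address), 2), longest_consonant_run(address))
# anna.smith@example.com 0.33 2
# xkcdqwrtz91@example.com 0.0 9
```

A production model would combine many such signals rather than rely on any single one; these two only show how a string pattern becomes a numeric feature.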

The Email Network Score Behavior and Usage Suggestions

Our model training process has achieved predictive performance that we consider ready for deployment. The AUC (Area Under the ROC Curve) of 0.94 shows that the model separates fraudulent from legitimate email addresses very well: an AUC of 0.5 corresponds to random guessing (akin to flipping a coin), while 1.0 would mean perfect separation. With the right decision threshold, the model is highly effective at distinguishing between fraudulent and non-fraudulent activity.
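AUC can be read as the probability that a randomly chosen fraudulent transaction receives a higher score than a randomly chosen legitimate one, with ties counting as half. The sketch below computes it directly from that pairwise definition on a few made-up scores. It is quadratic in the number of examples and is meant only to illustrate what the number means, not how SEON evaluates its models.

```python
def pairwise_auc(scores: list[float], labels: list[int]) -> float:
    """AUC as P(score of a fraud case > score of a legitimate case); ties count 0.5."""
    fraud = [s for s, y in zip(scores, labels) if y == 1]
    legit = [s for s, y in zip(scores, labels) if y == 0]
    if not fraud or not legit:
        raise ValueError("need at least one fraudulent and one legitimate example")
    wins = 0.0
    for f in fraud:
        for g in legit:
            if f > g:
                wins += 1.0
            elif f == g:
                wins += 0.5
    return wins / (len(fraud) * len(legit))


# Made-up scores: 1 = verified fraud, 0 = legitimate.
scores = [0.95, 0.80, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 0]
print(round(pairwise_auc(scores, labels), 3))  # 0.889 (8 of 9 fraud/legit pairs ranked correctly)
```

A scorer that assigns scores at random ranks about half of the pairs correctly, which is why 0.5 is the coin-flip baseline.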
The choice of an appropriate threshold for making decisions based on the Network Score depends on your specific use case and risk tolerance. Since the Network Score is a probability metric, any value above 0.5 suggests a likely fraudulent email address. For businesses aiming to minimize false positives, we recommend using a threshold of 0.85 or higher.
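One way to turn a risk-tolerance target into a threshold is to pick, on your own labeled history, the lowest threshold whose precision meets the target. This is a minimal sketch of that approach, assuming you have Network Scores and verified fraud labels for past transactions; the function names and the candidate grid are hypothetical, and the thresholds recommended above remain SEON's guidance.

```python
from typing import Optional


def precision_at(threshold: float, scores: list[float], labels: list[int]) -> Optional[float]:
    """Precision of the rule 'flag if score >= threshold'; None if nothing is flagged."""
    flagged = [y for s, y in zip(scores, labels) if s >= threshold]
    if not flagged:
        return None
    return sum(flagged) / len(flagged)


def lowest_threshold_for_precision(
    scores: list[float], labels: list[int], target: float, grid: list[float]
) -> Optional[float]:
    """Smallest candidate threshold whose precision reaches the target, if any."""
    for t in sorted(grid):
        p = precision_at(t, scores, labels)
        if p is not None and p >= target:
            return t
    return None


# Made-up history: 1 = verified fraud, 0 = legitimate.
scores = [0.97, 0.91, 0.88, 0.70, 0.62, 0.55, 0.30, 0.10]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
grid = [0.5, 0.6, 0.7, 0.8, 0.85, 0.9]
print(lowest_threshold_for_precision(scores, labels, target=0.9, grid=grid))  # 0.8
```

Recall can only stay the same or drop as the threshold rises, so among the thresholds that meet the precision target, the lowest one keeps the most fraud in scope.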

The table below summarizes the model’s performance metrics (Precision, Recall, and Accuracy) at thresholds of 0.5 and 0.85. Higher values generally indicate better performance. However, Precision and Recall often trade off against one another as the threshold moves away from the balanced 0.5 point. Precision is the share of transactions predicted to be fraudulent that were actually fraudulent. Recall is the share of verified fraudulent transactions that the model correctly predicted as fraudulent. Accuracy is the share of all predictions, positive or negative, that were correct.
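The three definitions above map directly onto confusion-matrix counts. A minimal sketch, assuming a transaction is predicted fraudulent when its score is at or above the threshold (whether SEON uses "at or above" or "strictly above" at the boundary is not stated); the scores and labels are made up.

```python
from typing import NamedTuple


class Metrics(NamedTuple):
    precision: float
    recall: float
    accuracy: float


def metrics_at(threshold: float, scores: list[float], labels: list[int]) -> Metrics:
    """Precision, recall and accuracy for 'predict fraud if score >= threshold'."""
    if not labels:
        raise ValueError("need at least one labeled transaction")
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted_fraud = s >= threshold
        if predicted_fraud and y == 1:
            tp += 1
        elif predicted_fraud and y == 0:
            fp += 1
        elif not predicted_fraud and y == 1:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0  # correct fraud calls / all fraud calls
    recall = tp / (tp + fn) if tp + fn else 0.0      # correct fraud calls / all verified fraud
    accuracy = (tp + tn) / len(labels)               # correct predictions / all predictions
    return Metrics(precision, recall, accuracy)


scores = [0.97, 0.91, 0.88, 0.70, 0.62, 0.55, 0.30, 0.10]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
for t in (0.5, 0.85):
    print(t, metrics_at(t, scores, labels))
```

On this toy data, raising the threshold from 0.5 to 0.85 lifts precision from about 0.67 to 1.0 while recall falls from 1.0 to 0.75, the same direction of trade-off the table shows.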

| Network Score threshold | Precision | Recall | Accuracy |
|---|---|---|---|
| 0.5 | 0.6846 | 0.6654 | 0.9359 |
| 0.85 | 0.9727 | 0.1295 | 0.9126 |
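To make the trade-off in the table concrete, precision and recall can be translated into counts. For every 1,000 verified fraudulent transactions (an illustrative batch size, not a SEON figure), recall gives the number caught. Because precision = TP / (TP + FP), the number of legitimate transactions flagged alongside them is caught × (1 - precision) / precision. This sketch assumes the table's rates carry over to your traffic, which may not hold.

```python
# Precision and recall copied from the Email Network Score table above.
EMAIL_METRICS = {0.5: (0.6846, 0.6654), 0.85: (0.9727, 0.1295)}


def counts_per_batch(precision: float, recall: float, fraud_cases: int = 1000) -> tuple[float, float]:
    """Expected frauds caught and legitimate transactions flagged per batch of verified frauds."""
    if precision <= 0:
        raise ValueError("precision must be positive")
    caught = recall * fraud_cases                            # true positives
    false_positives = caught * (1 - precision) / precision   # from precision = TP / (TP + FP)
    return caught, false_positives


for threshold, (precision, recall) in EMAIL_METRICS.items():
    caught, flagged = counts_per_batch(precision, recall)
    print(f"threshold {threshold}: ~{caught:.1f} frauds caught, ~{flagged:.1f} legitimate flagged")
```

At 0.5 this comes to roughly 665 frauds caught and about 307 legitimate transactions flagged; at 0.85, roughly 130 caught and fewer than 4 flagged.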

The Phone Network Score Base Model

The factors that significantly influence the network score for phone numbers include data enriched by SEON’s capabilities (e.g., total registrations), insights derived from SEON’s consortium data (e.g., hits), and calculated metrics designed to detect key fraud patterns (e.g., whether the original carrier matches the provider carrier). The table below ranks these three categories of factors by their importance in increasing the network score.

Important factors

| Category | Factor |
|---|---|
| Consortium data | The number of times SEON has seen the phone number and marked it as fraudulent. |
| Consortium data | The number of customers that have seen the phone number and marked it as fraudulent. |
| Phone number characteristics | Phone number registration country characteristics. |
| Phone number characteristics | Mobile phone service lookup characteristics. |
| Phone number characteristics | Phone carrier characteristics. |
| Social media registration pattern | The number of social media registrations by type (personal and technology). |
| Social media registration pattern | The total number of social media registrations with the phone number. |
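The calculated carrier feature mentioned above compares the carrier a number was originally assigned to with the carrier currently providing service (the two differ, for example, after a number has been ported). A minimal sketch, assuming the lookup result is available as a dictionary; the field names `original_carrier` and `current_carrier` are hypothetical placeholders, not SEON API response fields.

```python
from typing import Optional


def normalize_carrier(name: Optional[str]) -> Optional[str]:
    """Case- and whitespace-insensitive carrier name; None when missing or blank."""
    if name is None or not name.strip():
        return None
    return " ".join(name.lower().split())


def carrier_mismatch(lookup: dict) -> Optional[bool]:
    """True if original and current carriers differ; None if either is unknown."""
    original = normalize_carrier(lookup.get("original_carrier"))  # hypothetical field
    current = normalize_carrier(lookup.get("current_carrier"))    # hypothetical field
    if original is None or current is None:
        return None
    return original != current


print(carrier_mismatch({"original_carrier": "Carrier A", "current_carrier": "carrier  a"}))  # False
print(carrier_mismatch({"original_carrier": "Carrier A", "current_carrier": "Carrier B"}))   # True
print(carrier_mismatch({"original_carrier": "Carrier A"}))                                   # None
```

Returning `None` for incomplete lookups keeps "unknown" distinct from "match", so a model can learn from missing data separately instead of treating it as a clean result.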

The Phone Network Score Behavior and Usage Suggestions

Our model training process has achieved predictive performance that we consider ready for deployment. The AUC (Area Under the ROC Curve) of 0.79 indicates a strong ability to detect fraud, where an AUC of 0.5 represents random guessing (similar to tossing a coin). With the right decision threshold, the model is effective at distinguishing between fraudulent and non-fraudulent activity.
The optimal threshold for decision-making based on the Network Score depends on your specific use case and risk tolerance. Since the Network Score is a probability metric, any value above 0.5 suggests a likely fraudulent phone number. To minimize false positives, we recommend setting the threshold at 0.90 or higher.

The table below presents the model’s Precision, Recall, and Accuracy at thresholds of 0.50 and 0.90. These metrics are defined as for the email model: higher values generally indicate better performance, but Precision and Recall often trade off against one another as the threshold moves away from the balanced 0.5 point.

| Network Score threshold | Precision | Recall | Accuracy |
|---|---|---|---|
| 0.50 | 0.7408 | 0.2373 | 0.9228 |
| 0.90 | 0.9480 | 0.0665 | 0.9144 |
