Network Scores in Advanced Digital Footprinting
Updated on 21.01.25
SEON Blackbox Machine Learning
SEON offers ready-to-use blackbox machine learning models from day one, eliminating the need to wait for data accumulation while fraudulent events might go undetected. Our Advanced Digital Footprint Machine Learning Network Scores leverage pretrained “base models” to instantly calculate email and phone network scores as soon as you start using SEON.
These base models are developed using sanitized, cross-customer data from SEON’s proprietary consortium dataset, designed to maximize predictive accuracy. They serve as a starting point, enabling immediate fraud risk assessment of email addresses and phone numbers.
As you integrate SEON into your workflows by configuring decisioning rules and feeding back verification labels, SEON’s ML evolves to create a bespoke model tailored to your data. This customized model is tuned to the patterns in your business. Once it is available, the customer-specific model replaces the base model, which it usually outperforms.
Day 1 | SEON decisioning and labeling implemented | 1,000 transactions collected, with 100 each of Declined and Approved | Model Management processes assess performance | The best-performing model goes into production |
Base Models used | Transaction data and verification labels flow in | Customer-Specific Model trained | Continuous evolution of models | Customer-Specific Model usually used |
This document explains the foundation of these base models and the factors that drive the scores they generate.
The data used: Sanitized, relevant, representative
We utilized sanitized data from our top customers, which included verified fraudulent transactions labeled accordingly. To ensure relevance, we focused on transaction data from August to November 2024. The dataset was curated to reflect the top-tier customer base using SEON, considering factors such as user and phone geographies, email domains, and phone carriers. The final sample included 1.5 million transactions for email-based models and 1 million transactions for phone-based models.
The Email Network Score Base Model
The factors (referred to as features) that influence the network score for email addresses fall into several categories. These include data provided directly by customers to our API (e.g., email domain), information enriched by SEON’s capabilities (e.g., total registrations), insights derived from SEON’s consortium data (e.g., hits), and calculated metrics designed to capture key fraud patterns (e.g., vowel ratio in the email). The table below provides a sample of these features used to determine a higher network score.
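As an illustration of a calculated metric such as the vowel ratio mentioned above, here is a minimal sketch. The function name `vowel_ratio` and the exact definition (vowels divided by letters in the part before the `@`) are assumptions made for this example; SEON's actual feature computation is not described in this document.

```python
def vowel_ratio(email: str) -> float:
    """Share of vowels among the letters of the email username.

    Hypothetical definition for illustration only: digits and
    punctuation are ignored, and only a/e/i/o/u count as vowels.
    """
    # Everything before the first "@" is treated as the username.
    username = email.split("@", 1)[0].lower()
    letters = [ch for ch in username if ch.isalpha()]
    if not letters:
        return 0.0  # no letters at all, e.g. a purely numeric username
    vowels = sum(ch in "aeiou" for ch in letters)
    return vowels / len(letters)


if __name__ == "__main__":
    for address in ("anna.smith@example.com", "xkqzvt99@example.com"):
        print(address, round(vowel_ratio(address), 2))
```

A very low ratio can hint at a randomly generated username, which is why a metric of this kind may be useful alongside the other features.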
Category | Important feature examples |
Consortium data | The number of SEON customers that have seen the email and marked it as fraudulent. |
Consortium data | The number of customers that have the email currently or previously on a blacklist. |
Email characteristics | Whether the email username is likely gibberish. |
Email characteristics | Whether the email address is deliverable. |
Email characteristics | The number of data breaches in which the email has appeared. |
Social media registration pattern | Number of social media registrations with the email in total. |
Social media registration pattern | Number of social media registrations by personal and business types. |
The Email Network Score Behavior and Usage Suggestions
Our model training process has achieved a predictive performance deemed ready for deployment. The AUC (Area Under the Curve) value of 0.94 demonstrates the model’s exceptional ability to detect fraud, where AUC = 0.5 represents random guessing (akin to flipping a coin). An AUC of 0.94 indicates that, with the right decision threshold, the model is highly effective at distinguishing between fraudulent and non-fraudulent activities.
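One way to read an AUC value is as the probability that a randomly chosen fraudulent case receives a higher score than a randomly chosen legitimate one. The sketch below computes AUC directly from that definition. The function name `pairwise_auc` and the sample data are assumptions made for this example and are not SEON results.

```python
def pairwise_auc(scores, labels):
    """AUC as the fraction of (fraud, legit) pairs ranked correctly.

    labels: 1 = verified fraud, 0 = legitimate. Ties count as half.
    """
    fraud = [s for s, y in zip(scores, labels) if y == 1]
    legit = [s for s, y in zip(scores, labels) if y == 0]
    if not fraud or not legit:
        raise ValueError("need at least one fraud and one legitimate case")
    wins = 0.0
    for f in fraud:
        for g in legit:
            if f > g:
                wins += 1.0
            elif f == g:
                wins += 0.5
    return wins / (len(fraud) * len(legit))


if __name__ == "__main__":
    # Made-up scores purely for illustration.
    scores = [0.9, 0.4, 0.6, 0.1]
    labels = [1, 1, 0, 0]
    print(pairwise_auc(scores, labels))
```

With this reading, 0.5 corresponds to scores that carry no ranking information and 1.0 to a perfect ranking. The double loop is quadratic, so it is suitable for small samples only.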
The choice of an appropriate threshold for making decisions based on the Network Score depends on your specific use case and risk tolerance. Since the Network Score is a probability metric, any value above 0.5 suggests a likely fraudulent email address. For businesses aiming to minimize false positives, we recommend using a threshold of 0.85 or higher.
The table below summarizes the model’s performance metrics — Precision, Recall, and Accuracy — at thresholds of 0.5 and 0.85. Higher values for these metrics generally indicate better performance. However, keep in mind that Precision and Recall often trade off against one another as the threshold shifts from the balanced 0.5 point. Precision stands for the ratio of transactions classified correctly as fraudulent compared to all transactions the model predicted to be fraudulent. Recall stands for the ratio of transactions correctly predicted to be fraudulent compared to all verified fraudulent transactions. Accuracy is the proportion of all predictions that were correct, whether positive or negative.
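The definitions of Precision, Recall, and Accuracy given above can be sketched in a few lines. The function name `metrics_at_threshold`, the choice to treat a score equal to the threshold as fraudulent, and the sample data are all assumptions made for this example; the numbers are toy values, not SEON's published metrics.

```python
def metrics_at_threshold(scores, labels, threshold):
    """Precision, recall and accuracy when score >= threshold means fraud.

    labels: 1 = verified fraud, 0 = legitimate.
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_fraud = score >= threshold  # assumption: inclusive cut-off
        if predicted_fraud and label == 1:
            tp += 1
        elif predicted_fraud and label == 0:
            fp += 1
        elif not predicted_fraud and label == 0:
            tn += 1
        else:
            fn += 1
    total = tp + fp + tn + fn
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }


if __name__ == "__main__":
    # Toy data for illustration only.
    scores = [0.95, 0.88, 0.70, 0.60, 0.40, 0.20, 0.10, 0.55]
    labels = [1, 1, 1, 0, 0, 0, 0, 1]
    for threshold in (0.5, 0.85):
        print(threshold, metrics_at_threshold(scores, labels, threshold))
```

On this toy sample, raising the threshold from 0.5 to 0.85 removes the single false positive but misses two of the four fraudulent cases, which is the Precision and Recall trade-off described above.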
The Phone Network Score Base Model
The factors that significantly influence the network score for phone numbers include data enriched by SEON’s capabilities (e.g., total registrations), insights derived from SEON’s consortium data (e.g., hits), and calculated metrics designed to detect key fraud patterns (e.g., whether the original carrier matches the provider carrier). The table below ranks these three categories of factors by their importance in increasing the network score.
Category | Important factors |
Consortium data | The number of times SEON has seen the phone and marked it as fraudulent. |
Consortium data | The number of customers that have seen the phone and marked it as fraudulent. |
Phone number characteristics | Phone number registration country characteristics. |
Phone number characteristics | Mobile phone service lookup characteristics. |
Phone number characteristics | Phone carrier characteristics. |
Social media registration pattern | Number of social media registrations by personal and technology types. |
Social media registration pattern | Number of social media registrations with the phone in total. |
The Phone Network Score Behavior and Usage Suggestions
Our model training process has achieved a predictive performance deemed ready for deployment. The AUC (Area Under the Curve) value of 0.79 indicates the model’s strong ability to detect fraud, where an AUC of 0.5 represents random guessing (similar to tossing a coin). An AUC of 0.79 suggests that, with the right decision threshold, the model is effective at distinguishing between fraudulent and non-fraudulent activities.
The optimal threshold for decision-making based on the Network Score depends on your specific use case and risk tolerance. Since the Network Score is a probability metric, any value above 0.5 suggests a likely fraudulent phone number. To minimize false positives, we recommend setting the threshold at 0.90 or higher.
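The threshold guidance for both scores can be turned into a simple banding rule, sketched below. The function name `risk_band`, the band labels, and the idea of a middle "elevated" band are assumptions made for this example; only the 0.5, 0.85 (email), and 0.90 (phone) values come from this document, and how you act on each band depends on your own risk tolerance.

```python
# Thresholds suggested in this document for minimizing false positives.
STRICT_THRESHOLDS = {"email": 0.85, "phone": 0.90}
# Scores above this point suggest likely fraud.
BALANCED_THRESHOLD = 0.5


def risk_band(score: float, channel: str) -> str:
    """Map a Network Score to a hypothetical risk band.

    channel is "email" or "phone"; the band names are illustrative.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score is expected to be a probability in [0, 1]")
    if score >= STRICT_THRESHOLDS[channel]:
        return "high"      # at or above the low-false-positive threshold
    if score > BALANCED_THRESHOLD:
        return "elevated"  # likely fraudulent, but below the strict cut-off
    return "low"


if __name__ == "__main__":
    print(risk_band(0.87, "email"))
    print(risk_band(0.87, "phone"))
```

The same score of 0.87 lands in different bands for email and phone because the suggested strict thresholds differ between the two models.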
The table below presents the model’s performance metrics—Precision, Recall, and Accuracy—at thresholds of 0.5 and 0.90. Higher values for these metrics generally indicate better performance. However, keep in mind that Precision and Recall often trade off against one another as the threshold shifts from the balanced 0.5 point. Precision stands for the ratio of transactions classified correctly as fraudulent compared to all transactions the model predicted to be fraudulent. Recall stands for the ratio of transactions correctly predicted to be fraudulent compared to all verified fraudulent transactions. Accuracy is the proportion of all predictions that were correct, whether positive or negative.