Using Fuzzy Search in AML queries
Updated on 14.11.24
Overview
SEON's AML tools offer a wide range of solutions and configuration options to support KYC & MLRO officers in their fight against fincrime and money laundering. Run quick checks with simple search, or use complex search to home in on the correct person. You can also choose between exact search and fuzzy search, and sift through the results quickly with our relevancy score system.
Complex search
Sometimes also known as multi-faceted search, complex search allows you to add several distinct data types to your search queries. Add a country and date of birth to your search term to enhance search results.
By default, SEON will include all name matches in your search results regardless of date of birth and country data. As official sources sometimes lack date of birth and country data, this is the best way to ensure that sanctioned individuals or criminals aren't missed by checks due to incomplete information.
Why use Fuzzy Search
Fuzzy search is a search algorithm that uses approximate string matching to find results similar to, but not exactly the same as, your search query. It helps you identify risky customers and users when exact matches fail.
Depending on your settings, fuzzy search will serve results similar to your original search query alongside any exact matches. This can help you catch high-risk individuals who'd otherwise fall through the cracks and save your MLRO officers time and effort.
Working on a global scale, as most online businesses do, you'll quickly encounter names originating from different writing systems. Transliteration – transferring written text between writing systems, for example, from Cyrillic or Arabic script to Latin characters – can quickly become a problem.
The results of transliteration depend on the languages involved. For example, Hungarian, Portuguese, and English all have different rules for recoding the letters of other writing systems, while each using the Latin alphabet. That's how Japan's historic capital becomes Kiotó, Quioto, or Kyoto, respectively. You can probably guess how easily that becomes a problem with the names of people and AML checks.
Then there's the issue of name variants or slightly different spellings of the same name. For example, the name Igor has a less common spelling, Ygor. Add typos on top of that, and exact matching quickly becomes unreliable.
How to use Fuzzy Search
You can easily use Fuzzy Search in any queries sent to SEON over your API integration.
When sending data over an API integration, you can enable and disable fuzzy search using the fuzzy_enabled parameter. You can also adjust fuzzy search settings if needed. Please be careful, as these settings can drastically increase the number of false negatives experienced by your team. The default settings will serve you best in most cases.
Relevancy score
AML API's Relevancy Score feature will help you determine how closely search results found using fuzzy search match the search name you entered. You can change your relevancy score threshold via your AML API request, to ensure your team doesn't encounter a high number of false positives.
The adjustable relevancy score threshold lets your team set the sensitivity of the fuzzy search engine. In low-risk cases, such as PEP checks, you can choose to set the threshold so strictly that only results with a full match on both name and date of birth are returned.
However, in high-risk cases such as sanction screening, we suggest that users verify every hit manually with a lower relevancy threshold where only names match using fuzzy search to ensure that the client is not a criminal or sanctioned individual.
Please be careful, as these settings can drastically increase the number of false negatives and false positives experienced by your team. A higher score threshold increases the number of false negatives (lower recall), and a lower score threshold increases the number of false positives.
But with a smart threshold policy, you can reduce manual workloads in low-risk cases, and concentrate on high-profile investigations where human decision-making is essential. The default settings will serve you best in most cases.
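To make the trade-off described above concrete, here is a minimal sketch that counts false positives and false negatives at a few thresholds. The scores and labels are invented for illustration, and the assumption that a hit is returned when its score is greater than or equal to the threshold is ours, not a documented rule.

```python
def count_errors(scored_hits, threshold):
    """Return (false_positives, false_negatives) at a given threshold.

    scored_hits: list of (relevancy_score, is_true_match) pairs.
    Assumption: a hit is returned when its score is >= threshold.
    """
    false_positives = sum(
        1 for score, is_match in scored_hits if score >= threshold and not is_match
    )
    false_negatives = sum(
        1 for score, is_match in scored_hits if score < threshold and is_match
    )
    return false_positives, false_negatives


# Invented example data: (relevancy score, whether it is really the same person)
labelled = [
    (0.95, True),
    (0.80, True),
    (0.62, False),
    (0.60, True),
    (0.55, False),
    (0.40, False),
]

for threshold in (0.5, 0.585, 0.9):
    print(threshold, count_errors(labelled, threshold))
```

On this sample, the lowest threshold returns the most irrelevant hits, while the highest one starts dropping genuine matches.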
Fuzzy Search Settings
When using AML API over an API integration, you can customize your fuzzy search settings by including parameters in your API request.
Include your fuzzy parameters nested within the config.fuzzy_config parameter. You can use the parameters below to tweak what kind of results fuzzy search returns.
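As a rough sketch, a request body with nested fuzzy settings might be assembled as follows. Only the parameter names fuzzy_enabled, fuzzy_config, phonetic_search_enabled, edit_distance_enabled, result_limit, and score_threshold come from this article. The user_fullname field, the placement of fuzzy_enabled inside config, and the nesting of the scoring.* parameters under a scoring object are assumptions, so check the AML API reference for the exact layout.

```python
import json

# Default values as documented in this article
DEFAULT_FUZZY_CONFIG = {
    "phonetic_search_enabled": False,
    "edit_distance_enabled": True,
    # Assumption: the dotted names scoring.result_limit and
    # scoring.score_threshold map to a nested "scoring" object.
    "scoring": {"result_limit": 10, "score_threshold": 0.585},
}


def build_aml_request(full_name, fuzzy_enabled=True, fuzzy_config=None):
    """Build a request body as a plain dict (hypothetical layout)."""
    config = {"fuzzy_enabled": fuzzy_enabled}
    # Fuzzy settings only make sense when fuzzy search is switched on
    if fuzzy_enabled and fuzzy_config is not None:
        config["fuzzy_config"] = fuzzy_config
    # "user_fullname" is a hypothetical field name used for illustration
    return {"user_fullname": full_name, "config": config}


payload = build_aml_request("Tetjana Donez", fuzzy_config=DEFAULT_FUZZY_CONFIG)
print(json.dumps(payload, indent=2))
```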
phonetic_search_enabled
– Default setting: False.
If enabled, this parameter turns on SEON's Phonetic Search module. This means that the tokens entered into AML API are converted into a phonetic representation using the double metaphone, koelnerphonetik, haasephonetik, beider-morse, and daitch-mokotoff algorithms. When enabled, the AML lookup will only use these phonetic representations of the entered name and of the names in the database.
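To illustrate what a phonetic representation is, the sketch below uses American Soundex. Soundex is not one of the algorithms listed above; it is a much simpler stand-in chosen because it fits in a few lines. The idea is the same: names that sound alike are reduced to the same code, and the codes are compared instead of the spellings.

```python
# Soundex digit for each consonant; vowels, H, W and Y have no digit
_GROUPS = {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"}
_CODES = {letter: digit for digit, letters in _GROUPS.items() for letter in letters}


def soundex(name):
    """Return the 4-character American Soundex code of a name."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    previous = _CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != previous:
            digits.append(code)
        previous = code  # a vowel resets the previous code
    return (first + "".join(digits) + "000")[:4]


def sounds_alike(a, b):
    """Compare two name tokens by phonetic code instead of spelling."""
    return soundex(a) == soundex(b)


print(soundex("Meyer"), soundex("Meier"))
print(sounds_alike("Robert", "Rupert"))
```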
edit_distance_enabled
– Default setting: True.
Edit distance is the number of single-character changes needed to turn one term into another (e.g. mat → bat has an edit distance of 1). When set to True, AML lookups will return names similar to the search term entered, for example, 'Anastasia' matches 'Anastasya'. When you enter a search query, our system compares it to the names in our database. If a name token (a name element) has a length equal to or greater than 7 letters, we allow for 1-character edits to find potential matches. If the token length is equal to or greater than 13, we allow for 2-character edits to find potential matches.
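The length-based rule above can be sketched with a standard Levenshtein distance. This is an illustration under stated assumptions, not SEON's implementation: we assume tokens are compared case-insensitively and that the allowed number of edits is derived from the length of the query token.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,  # delete char_a
                    current[j - 1] + 1,  # insert char_b
                    previous[j - 1] + substitution_cost,
                )
            )
        previous = current
    return previous[-1]


def allowed_edits(token):
    """Edits permitted for a token, using the length limits given above."""
    if len(token) >= 13:
        return 2
    if len(token) >= 7:
        return 1
    return 0


def tokens_match(query_token, candidate_token):
    # Assumption: comparison is case-insensitive
    query, candidate = query_token.lower(), candidate_token.lower()
    return levenshtein(query, candidate) <= allowed_edits(query)


print(levenshtein("mat", "bat"))
print(tokens_match("Anastasia", "Anastasya"))
print(tokens_match("Smith", "Smyth"))
```

In this sketch, "Anastasia" (9 letters) is allowed one edit and matches "Anastasya", while "Smith" (5 letters) is too short to be allowed any edits.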
For example, you enter the name "Tetjana Donez", which consists of two tokens. With the default value of 7, for each token that is at least this many characters long, our system will search for variations within an edit distance of 1 (a single-character change) to find potential matches.
For example, "Tetjana Donez" will be considered a match with "Tetiana Donets" because they differ by just one character.
scoring.result_limit
– Default setting: 10.
Use this parameter to define the maximum number of hits AML API should return in the result set. The result set is ordered by the source type hits are identified in: ['sanction', 'warned_entities/crimelist', 'central_bank/watchlist', 'pep']
scoring.score_threshold
– Default setting: 0.585.
Sets the relevancy score threshold. The relevancy score is a normalized probability score with possible values between 0 and 1.
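The combined effect of the two scoring parameters can be pictured with the sketch below. The source-type order and the default values are taken from this article. The shape of a hit (a dict with name, source_type, and score keys), the greater-than-or-equal comparison, and the use of the score as a tie-breaker within a source type are assumptions made for illustration.

```python
# Source-type order as listed in this article
SOURCE_PRIORITY = ["sanction", "warned_entities/crimelist", "central_bank/watchlist", "pep"]


def apply_scoring(hits, score_threshold=0.585, result_limit=10):
    """Drop low-scoring hits, order by source type, then cap the result set."""
    rank = {source: index for index, source in enumerate(SOURCE_PRIORITY)}
    kept = [hit for hit in hits if hit["score"] >= score_threshold]
    # Unknown source types sort last; higher scores first within a type (assumption)
    kept.sort(key=lambda hit: (rank.get(hit["source_type"], len(rank)), -hit["score"]))
    return kept[:result_limit]


# Invented hits for illustration
hits = [
    {"name": "A", "source_type": "pep", "score": 0.90},
    {"name": "B", "source_type": "sanction", "score": 0.70},
    {"name": "C", "source_type": "sanction", "score": 0.50},
    {"name": "D", "source_type": "central_bank/watchlist", "score": 0.80},
]

top_two = apply_scoring(hits, result_limit=2)
print([hit["name"] for hit in top_two])
```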
DOB Filter
The Date of Birth (DOB) filter is applied by default during searches to enhance accuracy. Here’s how it works:
- DOB Discrepancy Filtering: If the DOB in the search query does not match the DOB in the result, the result will be filtered out.
- Handling Missing DOB Information: If the database lacks DOB information (e.g., when authorities haven’t disclosed it), the result will not be filtered out, as it’s impossible to conclusively rule out a match based solely on name.
- Customizable Filtering Options: For specific use cases, we can configure the system to filter out all results where the DOB does not match exactly, even if the AML database lacks DOB information. This can be particularly useful in low-risk scenarios, such as in low-risk countries where companies primarily serve only domestic, low-risk users.
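The three rules above boil down to a small decision function. The sketch below is only an illustration: the strict flag is a hypothetical name for the customized behavior described in the last bullet, which is configured by SEON rather than sent in a request, and keeping every result when the query has no DOB is an assumption.

```python
from datetime import date
from typing import Optional


def keep_result(query_dob: Optional[date], record_dob: Optional[date], strict: bool = False) -> bool:
    """Decide whether a name match survives the DOB filter."""
    if query_dob is None:
        # Assumption: without a DOB in the query there is nothing to compare
        return True
    if record_dob is None:
        # Missing DOB in the database: kept by default, dropped in strict mode
        return not strict
    return query_dob == record_dob


print(keep_result(date(1980, 5, 17), date(1980, 5, 17)))
print(keep_result(date(1980, 5, 17), date(1975, 1, 2)))
print(keep_result(date(1980, 5, 17), None))
print(keep_result(date(1980, 5, 17), None, strict=True))
```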
DOB Estimation
The DOB Estimation feature is a back-end enhancement to our database designed to reduce false positives in AML screening by filtering out irrelevant results, even when an exact date of birth (DOB) isn’t available.
This improvement helps bridge data gaps commonly seen in OSINT (Open Source Intelligence) and AML data sources, where providers often have only partial DOB coverage. By using an estimation approach, we’ve effectively increased DOB coverage, enhancing match precision and reliability.
Key Features
- Reducing False Positives: Decreases irrelevant matches by estimating DOB ranges, making it easier to focus on genuine matches.
- Back-End Integration: This feature works entirely behind the scenes, requiring no input or configuration from the user.
- Improved Compliance Accuracy: Enhances the screening process by effectively filling DOB gaps, which are common in partial data from AML and OSINT sources.
How DOB Estimation Works
- Back-End Life Event Analysis: The system uses available life event data (e.g., employment, education milestones) to infer an estimated age range in cases where the exact DOB is not available.
- Increased DOB Coverage: By filling in the DOB gaps through estimation, we’re increasing our database coverage—meaning more complete profiles.
- Sharper Relevance Filtering: With estimated age ranges, results are now filtered more precisely, helping you focus on what matters and skip over irrelevant matches.
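The exact estimation logic runs on SEON's back end and is not documented here, so the following is only a conceptual sketch of how an age range could be inferred from life events. The event names and the plausible age bounds per event are invented for illustration.

```python
from datetime import date

# Hypothetical (min_age, max_age) a person plausibly had at each kind of event
AGE_AT_EVENT = {
    "university_graduation": (20, 30),
    "first_employment": (16, 30),
}


def estimate_birth_year_range(events):
    """Intersect the birth-year ranges implied by (event_type, year) pairs.

    Returns (earliest_year, latest_year), or None if nothing can be inferred.
    """
    earliest, latest = None, None
    for event_type, year in events:
        if event_type not in AGE_AT_EVENT:
            continue
        min_age, max_age = AGE_AT_EVENT[event_type]
        event_earliest, event_latest = year - max_age, year - min_age
        earliest = event_earliest if earliest is None else max(earliest, event_earliest)
        latest = event_latest if latest is None else min(latest, event_latest)
    if earliest is None or earliest > latest:
        return None
    return earliest, latest


def is_plausible(query_dob, year_range):
    """Keep a result unless the query DOB falls outside the estimated range."""
    if year_range is None:
        return True
    return year_range[0] <= query_dob.year <= year_range[1]


estimated = estimate_birth_year_range(
    [("university_graduation", 1995), ("first_employment", 1996)]
)
print(estimated)
print(is_plausible(date(1970, 3, 1), estimated))
print(is_plausible(date(2001, 3, 1), estimated))
```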
User Impact
- Reduction in False Positives: Users will notice a decrease in false positives, as the system can now filter more precisely, even without complete DOB data.
- Streamlined Screening Experience: Results will be more relevant, reducing time spent on manually dismissing non-relevant matches.
FAQ
Will I need to configure anything to benefit from DOB Estimation?
No configuration is required. DOB Estimation is implemented on the back end and is automatically applied to your searches.
How will DOB Estimation affect my search results?
DOB Estimation enhances the relevance of search results by reducing false positives, especially where DOB data is not available or incomplete, resulting in more accurate matches and a smoother screening process.
The DOB Estimation feature is designed to address common data limitations in AML and OSINT sources, enhancing our database’s DOB coverage and delivering improved compliance outcomes by reducing false positives in user screenings.
What writing systems (languages) does fuzzy search support?
The default language within the SEON system is English. Our more advanced search tools are only available in English, including the fuzzy search engine.
Exact search
We also support other languages, but only exact search will be available for them. Even so, this exact search engine provides robust text processing capabilities to handle various types of text variations and complexities, including ASCII folding for non-ASCII characters, hyphenation and punctuation differences, out-of-order name matching, missing name components, and casing differences. These features allow our search engine to deliver more accurate and relevant search results, even when dealing with challenging text inputs that would otherwise cause errors or miss relevant matches.
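The kinds of normalization listed above can be approximated with Python's standard library. This is a simplified sketch, not SEON's engine: it folds accents through Unicode decomposition (which does not cover letters without a decomposition, such as "ø"), treats punctuation and hyphens as separators, ignores token order and casing, and treats a name as matching when one token set is contained in the other.

```python
import re
import unicodedata


def fold_accents(text):
    """Strip combining marks after decomposition, e.g. 'é' becomes 'e'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def name_tokens(name):
    """Lowercased, accent-folded set of alphabetic tokens in a name."""
    folded = fold_accents(name).casefold()
    # Runs of letters only, so hyphens, commas and apostrophes act as separators
    return frozenset(re.findall(r"[^\W\d_]+", folded))


def exact_match(query, candidate):
    """Order-insensitive match that tolerates missing name components."""
    query_tokens, candidate_tokens = name_tokens(query), name_tokens(candidate)
    if not query_tokens or not candidate_tokens:
        return False
    return query_tokens <= candidate_tokens or candidate_tokens <= query_tokens


print(exact_match("DUPONT, Jean-Pierre", "jean pierre dupont"))
print(exact_match("José Álvarez", "Alvarez Jose Maria"))
print(exact_match("Jose Alvarez", "Jose Martinez"))
```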
Our exact search engine provides support for the following languages:
Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani (Latin), Basque, Belarusian, Bengali, Bosnian, Bulgarian, Burmese, Catalan, Chinese (Hans), Chinese (Hant), Croatian, Czech, Danish, Dutch, English, Estonian, Filipino (Latin), Finnish, French, Galician, Ganda, Georgian, German, Greek, Gujarati, Hausa, Hebrew, Hindi, Hungarian, Icelandic, Igbo, Indonesian, Italian, Japanese, Kannada, Kazakh, Khmer, Kinyarwanda, Konkani, Korean, Lao, Latvian, Lithuanian, Macedonian, Malay, Malayalam, Maltese, Marathi, Mongolian (Cyrillic), Nepali, Norwegian Bokmål, Norwegian Nynorsk, Oriya, Oromo, Polish, Portuguese, Punjabi, Romanian, Russian, Serbian (Latin), Serbian (Cyrillic), Sinhala, Slovak, Slovenian, Spanish, Swahili, Swedish, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, Yoruba (Latin), Zulu