Using Fuzzy Search in AML queries

Updated on 28.03.25

Overview

SEON's AML tools offer a wide range of solutions and configuration options to support KYC & MLRO officers in their fight against financial crime and money laundering. Run quick checks with simple search, or use complex search to home in on the correct person. You can also choose between exact search and fuzzy search, and sift through the results quickly with our relevancy score system.

Complex search

Sometimes also known as multi-faceted search, complex search allows you to add several distinct data types to your search queries. For example, you can add a country and date of birth to your search term to enhance search results.

By default, SEON will include all name matches in your search results regardless of date of birth and country data. As official sources sometimes lack date of birth and country data, this is the best way to ensure that sanctioned individuals or criminals aren't missed by checks due to incomplete information.

Why use Fuzzy Search

Fuzzy search is a search algorithm that uses approximate string matching to find results similar to, but not exactly the same as, your search query. It helps you identify risky customers and users when exact matches fail.

Depending on your settings, fuzzy search will serve results similar to your original search query alongside any exact matches. This can help you catch high-risk individuals who'd otherwise fall through the cracks and save your MLRO officers time and effort.

Working on a global scale, as most online businesses do, you'll quickly encounter names originating from different writing systems. Transliteration – transferring written text between writing systems, for example, from Cyrillic or Arabic script to Latin characters – can quickly become a problem.

The results of transliteration depend on the languages involved. For example, Hungarian, Portuguese and English all have different rules for recoding the letters of other writing systems, even though each uses the Latin alphabet. That's how Japan's historic capital becomes Kiotó, Quioto, or Kyoto, respectively. You can probably guess how easily this becomes a problem for people's names and AML checks.

Then there's the issue of name variants or slightly different spellings of the same name. For example, the name Igor has a less common spelling, Ygor.

How to use Fuzzy Search

Fuzzy Search allows you to refine searches in SEON, through API integration or manual lookups.

Enabling Fuzzy Search in API Requests

When sending data via API, you can enable or disable fuzzy search using the fuzzy_enabled parameter. Additionally, you can fine-tune fuzzy search settings based on your needs. However, adjusting these settings can significantly impact results, increasing the likelihood of false positives or false negatives. In most cases, the default settings provide the best balance.

Adjusting Fuzzy Search in the SEON UI

Customers can now configure Fuzzy Search settings directly in the UI, offering two ways to refine searches:

1. Setting Global Fuzzy Search Defaults

Navigate to Settings > System > AML > Fuzzy Search.
Adjust settings such as edit distance, relevancy score and token length to fit your organization’s risk tolerance.
Save settings as the default to apply them across all automated and manual searches in your account.

2. Testing and Fine-Tuning via Manual Lookup

Navigate to Manual Lookup > AML.
Scroll down and open Advanced Fuzzy Search Settings.
Adjust search thresholds to test different configurations without impacting global settings.
If desired, you can keep what you configured by selecting Save settings as default.

This sandbox approach lets you experiment with specific risk scenarios before committing changes to your account-wide settings.

Best Practices for Fuzzy Search Configuration

Use Manual Lookup first to fine-tune settings before applying them globally.
Start with broader fuzzy settings to capture more variations, then refine for accuracy.
Avoid overly strict settings, as they may block legitimate users or fail to flag risky entities.
Use different configurations for different risk levels, such as stricter thresholds for high-risk transactions.

Example: Fuzzy Search in Action

Scenario:

A customer signs up as "Johh Smith" instead of their full legal name, "Johnathan Smith."

Without Fuzzy Search

The system may not detect the match, as “Johh Smith” is too different from “Johnathan Smith.”
This could result in a false negative, allowing a potentially high-risk user to bypass screening.

With Default Fuzzy Search Settings

The system identifies partial matches, but the default settings may still miss more complex name variations.
Example Match: "John Smith" (missing full name but still partially relevant).

With Adjusted Fuzzy Search Settings

Increasing the edit distance allows for small misspellings like “Johh” instead of "John."
Reducing the relevancy score threshold ensures that names with minor variations (e.g., "Johnathan" vs. "John") are still detected.
Example Match: "Johnathan Smith" (correct match found despite a shortened name).

Relevancy score

AML API's Relevancy Score feature will help you determine how closely search results found using fuzzy search match the search name you entered. You can change your relevancy score threshold via your AML API request, to ensure your team doesn't encounter a high number of false positives.

The adjustable relevancy score threshold lets your team set the sensitivity of the fuzzy search engine. In low-risk cases, such as PEP checks, you can choose to set the threshold so strictly that only results with a full match on both name and date of birth are returned.

However, in high-risk cases such as sanctions screening, we suggest using a lower relevancy threshold, so that results matching on name alone through fuzzy search are also returned, and verifying every hit manually to ensure that the client is not a criminal or sanctioned individual.

Please be careful, as these settings can drastically increase the number of false negatives and false positives experienced by your team. A higher score threshold increases the number of false negatives (lower recall), and a lower score threshold increases the number of false positives.

But with a smart threshold policy, you can reduce manual workloads in low-risk cases, and concentrate on high-profile investigations where human decision-making is essential. The default settings will serve you best in most cases.
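
The trade-off described above can be sketched in a few lines of Python. This is only an illustration: the scores and the true-match labels below are made up for the example and do not come from SEON, and count_errors is a hypothetical helper, not part of the AML API.

```python
def count_errors(scored_hits, threshold):
    """Return (false_positives, false_negatives) at a given score threshold.

    scored_hits is a list of (relevancy_score, is_true_match) pairs.
    """
    # Kept by the threshold but not actually the same person.
    false_pos = sum(1 for score, is_match in scored_hits
                    if score >= threshold and not is_match)
    # Dropped by the threshold although it was the same person.
    false_neg = sum(1 for score, is_match in scored_hits
                    if score < threshold and is_match)
    return false_pos, false_neg


# Made-up scores and labels, purely for illustration.
hits = [(0.95, True), (0.80, True), (0.62, True), (0.60, False), (0.45, False)]

for threshold in (0.5, 0.7, 0.9):
    print(threshold, count_errors(hits, threshold))
```

With this toy data, raising the threshold removes the false positive but starts dropping genuine matches, which is the behaviour the paragraph above warns about.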

Note: Fuzzy search cannot be applied for entity searches; you will only get exact matches for these queries.

Fuzzy Search Settings

When using AML API over an API integration, you can customize your fuzzy search settings by including parameters in your API request.

Learn more: To see how you should structure your API request and the parameters below, head over to the API reference page.

Include your fuzzy parameters nested within the config.fuzzy_config parameter. You can use the parameters below to tweak what kind of results fuzzy search returns.
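
As a rough sketch of that nesting, the Python snippet below builds a request body with the fuzzy parameters documented in this section placed under config.fuzzy_config. Several details are assumptions rather than confirmed API behaviour: the user_fullname field name, the placement of fuzzy_enabled directly under config, and the representation of glued_words_splitting as a string. Check the API reference page before relying on any of them.

```python
import json


def build_aml_request(full_name: str, fuzzy_enabled: bool = True) -> dict:
    """Build an illustrative AML request body (field layout is assumed)."""
    return {
        # Assumed field name for the searched name.
        "user_fullname": full_name,
        "config": {
            # Assumed location of the fuzzy_enabled switch.
            "fuzzy_enabled": fuzzy_enabled,
            # Fuzzy parameters are nested under config.fuzzy_config.
            "fuzzy_config": {
                "phonetic_search_enabled": False,
                "edit_distance_enabled": True,
                "min_nr_token_match": 67,
                "enable_lastname_detection": False,
                # Assumed to be passed as a string: "off", "on" or "fallback".
                "glued_words_splitting": "off",
            },
        },
    }


payload = build_aml_request("Serhiy Kunitsyn")
print(json.dumps(payload, indent=2))
```

The values shown are the defaults stated in this article, so sending them should behave like sending no fuzzy_config at all, assuming the layout above is right.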

Did you know: In AML API, tokens are the name elements you enter into an AML API lookup. You can separate tokens (name elements) using a space, a comma or a full stop. For example, the search terms 'Serhiy Kunitsyn', 'Serhiy.Kunitsyn' and 'Serhiy,Kunitsyn' each become two tokens: 'Serhiy' and 'Kunitsyn'.
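
The token splitting described above can be approximated in Python as follows. The tokenize function is a hypothetical helper written for this example, not SEON's implementation; it only mirrors the three separators named in the text.

```python
import re


def tokenize(name: str) -> list[str]:
    """Split a name on runs of spaces, commas or full stops."""
    # re.split can return empty strings at the edges, so drop them.
    return [token for token in re.split(r"[ ,.]+", name) if token]


for term in ("Serhiy Kunitsyn", "Serhiy.Kunitsyn", "Serhiy,Kunitsyn"):
    print(term, "->", tokenize(term))
```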

phonetic_search_enabled – Default setting: False.
If enabled, the parameter will turn on SEON's Phonetic Search module. This means that the tokens entered into AML API are converted into a phonetic representation using the double metaphone, koelnerphonetik, haasephonetik, beider-morse, and daitch-mokotoff algorithms. When enabled, the AML lookup compares only the phonetic representations of the entered name and of the names in the database.
phonetic_term_threshold – (Default: 5) If a word in the input is shorter than this value, phonetic search won’t be applied to it. Example: With a threshold of 5, “John” is skipped, but “Albert” is included.
phonetic_character_threshold – (Default: 12) If a word has more characters than this value, phonetic search is applied to it, provided phonetic search is turned on.

Note: Phonetic Search will be disabled automatically for tokens shorter than 7 characters or if the full name entered in your search is shorter than 12 characters or the full name is shorter than 15 characters and contains a token with fewer than 5 characters.

edit_distance_enabled – Default setting: True.
Edit distance is the number of single-character changes needed to turn one term into another (e.g. mat » bat has an edit distance of 1). When set to True, AML lookups will return names similar to the search term entered, e.g. 'Anastasia' matches 'Anastasya'. When you enter a search query, our system compares it to the names in our database. If a name token (a name element) is 7 or more characters long, we allow 1-character edits to find potential matches. If the token is 13 or more characters long, we allow 2-character edits to find potential matches.
For example, you enter the name "Tetjana Donez", which consists of two tokens. With the default value of 7, for each token whose length is at or above this value, our system will search for variations within an edit distance of 1 (a single-character change) to find potential matches.
For example, "Tetjana Donez" will be considered a match with "Tetiana Donets" because they differ by just one character.
edit_distance_1_threshold – Defines the token length from which 1-character edits are allowed (7 in the description above).
edit_distance_2_threshold – Defines the token length from which 2-character edits are allowed (13 in the description above).
min_nr_token_match – Defines the percentage of name tokens that must match for a valid comparison. Default: 67 (at least 67% of tokens must match). Range: 0 to 100 (100 requires a full match). Example: With 67, "Alexander Gahon Gesmundo" matches "Alexander Gesmundo", but with 100, it does not. Adjust for stricter or more flexible name matching.
enable_lastname_detection – Default: false. When enabled, the search name and result name are split into first name(s) and last name(s). The filter only applies if a contradiction is detected.
How It Works:
- Match: "John Smith" and "John Adam Smith" → Not filtered out (since "Adam" could be a middle name).
- No Match: "Adam Smith" and "John Adam Smith" → Filtered out (since "Adam" is a first name in one case but a middle name in the other, creating a contradiction).
glued_words_splitting – Handles names that may be incorrectly merged. Options:
- off: Never used (default).
- on: Always active.
- fallback: Used only when no exact match is found.
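
To make the edit distance and min_nr_token_match rules above more concrete, here is a minimal Python sketch. It is not SEON's matching engine: it uses the classic Levenshtein distance, splits names on whitespace only, ignores phonetic search, and all function names are hypothetical. Three further assumptions are labelled in the code: the length rule is applied to the search token, the matching share is measured against the longer of the two names, and the share is rounded to a whole percentage (which is what lets 2 of 3 tokens pass a setting of 67).

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def allowed_edits(token: str, threshold_1: int = 7, threshold_2: int = 13) -> int:
    """Number of edits permitted for a token, based on its length."""
    if len(token) >= threshold_2:
        return 2
    if len(token) >= threshold_1:
        return 1
    return 0


def tokens_match(search_token: str, result_token: str,
                 threshold_1: int = 7, threshold_2: int = 13) -> bool:
    """Assumption: the length rule is applied to the search token."""
    a, b = search_token.lower(), result_token.lower()
    return levenshtein(a, b) <= allowed_edits(a, threshold_1, threshold_2)


def passes_min_token_match(search_name: str, result_name: str,
                           min_nr_token_match: int = 67) -> bool:
    search_tokens = search_name.split()
    result_tokens = result_name.split()
    if not search_tokens or not result_tokens:
        return False
    matched = sum(any(tokens_match(s, r) for r in result_tokens)
                  for s in search_tokens)
    # Assumptions: share is measured against the longer name and rounded.
    longest = max(len(search_tokens), len(result_tokens))
    return round(100 * matched / longest) >= min_nr_token_match


print(tokens_match("Anastasia", "Anastasya"))          # 9 letters, 1 edit allowed
print(tokens_match("Johh", "John"))                    # 4 letters, no edits allowed
print(tokens_match("Johh", "John", threshold_1=4))     # lowered length threshold
print(passes_min_token_match("Alexander Gahon Gesmundo", "Alexander Gesmundo", 67))
print(passes_min_token_match("Alexander Gahon Gesmundo", "Alexander Gesmundo", 100))
```

Under these assumptions, the short token "Johh" only matches "John" once the length threshold is lowered, which is in line with the "Johh Smith" scenario earlier in this article.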

Adverse media configs

Adverse Media Fuzziness: Distinct fuzzy settings should be applied for adverse media searches. A value of 0 means an exact match, while 1 indicates high fuzziness.
Adverse Media DOB Filter: Filters out results where the input date of birth does not match the result’s date of birth, based on name similarity.
Adverse Media Country Filter: Excludes results where the input country does not match the result’s country, based on name similarity.

Scoring Parameters

Scoring helps you configure each search effectively. It provides an objective measure of the probability that a search result refers to the same person as the search term.

result_limit – Limits the number of results returned.
score_threshold – Sets the minimum relevancy score for results to be included. (Recommended: 0.585)
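
The combined effect of these two parameters can be pictured with the short Python sketch below. It assumes results are ranked by score before the limit is applied, and the result_limit value of 10 is only a placeholder, since this article does not state a default. The select_results helper and the result dictionaries are hypothetical.

```python
def select_results(results, score_threshold=0.585, result_limit=10):
    """Keep results scoring at least score_threshold, best first, capped at result_limit."""
    kept = [r for r in results if r["score"] >= score_threshold]
    kept.sort(key=lambda r: r["score"], reverse=True)
    return kept[:result_limit]


# Made-up results for illustration only.
results = [
    {"name": "A", "score": 0.9},
    {"name": "B", "score": 0.5},
    {"name": "C", "score": 0.7},
]

print([r["name"] for r in select_results(results)])
print([r["name"] for r in select_results(results, result_limit=1)])
```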

Matching & Scoring Criteria

The API ranks search results based on a weighted scoring system, prioritizing:

Token Similarity (70%) – How closely names match, determined by edit distance and order.
Name IDF Score (8.5%) – Frequency of the name in the dataset.
Date of Birth (15%) – If DOB is available, it heavily influences the ranking.
Year Keyword Score (4%) – The relevance of the year in the dataset.
Country Keyword Score (1.6%) – Relevance based on country name.
Country Text IDF Score (0.9%) – Frequency of country-related terms.
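
One way to read these weights is as a weighted sum of component scores. The Python sketch below assumes each component is normalised to a value between 0 and 1 and that missing components count as 0. Neither assumption is confirmed by this article, and the component names are made up for the example.

```python
import math

# Weights taken from the list above, expressed as fractions of 1.
WEIGHTS = {
    "token_similarity": 0.70,
    "name_idf": 0.085,
    "date_of_birth": 0.15,
    "year_keyword": 0.04,
    "country_keyword": 0.016,
    "country_text_idf": 0.009,
}

# Sanity check: the published weights add up to 100%.
assert math.isclose(sum(WEIGHTS.values()), 1.0)


def relevancy_score(components: dict) -> float:
    """Weighted sum of component scores; components not supplied count as 0."""
    return sum(weight * components.get(name, 0.0)
               for name, weight in WEIGHTS.items())


# A perfect name match with no other supporting data.
print(relevancy_score({"token_similarity": 1.0}))
# A perfect name match plus a matching date of birth.
print(relevancy_score({"token_similarity": 1.0, "date_of_birth": 1.0}))
```

Under this reading, a perfect token match alone would already exceed the recommended score_threshold of 0.585, while weaker name matches would need support from date of birth or country data to stay in the results.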

Did you know: Fuzzy search is enabled by default. If you want to disable it and turn on exact search for a manual AML lookup, you can do so by using the toggle under the search details on the Manual Lookup page.

Result filtering

filter_mismatching_dob – If set to true, filters out all results that match on name similarity but whose date of birth (DOB) differs from the searched DOB.
filter_missing_dob – When enabled, if you provide a date of birth (DOB) in your request but the source data does not include a DOB, the result will be excluded from the search results.
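
The two filters can be illustrated with the following Python sketch. It assumes DOBs are compared as exact strings and that neither filter does anything when no DOB is supplied in the request. The apply_dob_filters helper, the dob key and the sample records are hypothetical.

```python
from typing import Optional


def apply_dob_filters(results, search_dob: Optional[str],
                      filter_mismatching_dob: bool = False,
                      filter_missing_dob: bool = False):
    """Drop results according to the two DOB filters described above."""
    if search_dob is None:
        # Assumption: without a searched DOB, neither filter applies.
        return list(results)
    kept = []
    for result in results:
        result_dob = result.get("dob")
        if result_dob is None:
            if filter_missing_dob:
                continue  # source record has no DOB
        elif result_dob != search_dob and filter_mismatching_dob:
            continue  # source record has a different DOB
        kept.append(result)
    return kept


# Made-up records for illustration only.
records = [
    {"name": "A", "dob": "1980-01-01"},
    {"name": "B", "dob": "1975-06-30"},
    {"name": "C", "dob": None},
]

print([r["name"] for r in apply_dob_filters(records, "1980-01-01",
                                            filter_mismatching_dob=True)])
print([r["name"] for r in apply_dob_filters(records, "1980-01-01",
                                            filter_mismatching_dob=True,
                                            filter_missing_dob=True)])
```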

DOB Estimation

The DOB Estimation feature is a back-end enhancement to our database designed to reduce false positives in AML screening by filtering out irrelevant results, even when an exact date of birth (DOB) isn’t available.

This improvement helps bridge data gaps commonly seen in OSINT (Open Source Intelligence) and AML data sources, where providers often have only partial DOB coverage. By using an estimation approach, we’ve effectively increased DOB coverage, enhancing match precision and reliability.

Key Features

Reducing False Positives: Decreases irrelevant matches by estimating DOB ranges, making it easier to focus on genuine matches.
Back-End Integration: This feature works entirely behind the scenes, requiring no input or configuration from the user.
Improved Compliance Accuracy: Enhances the screening process by effectively filling DOB gaps, which are common in partial data from AML and OSINT sources.

How DOB Estimation Works

Back-End Life Event Analysis: The system uses available life event data (e.g., employment, education milestones) to infer an estimated age range in cases where the exact DOB is not available.
Increased DOB Coverage: By filling in the DOB gaps through estimation, we’re increasing our database coverage—meaning more complete profiles.
Sharper Relevance Filtering: With estimated age ranges, results are now filtered more precisely, helping you focus on what matters and skip over irrelevant matches.

User Impact

Reduction in False Positives: Users will notice a decrease in false positives, as the system can now filter more precisely, even without complete DOB data.
Streamlined Screening Experience: Results will be more relevant, reducing time spent on manually dismissing non-relevant matches.

FAQ

Will I need to configure anything to benefit from DOB Estimation?

No configuration is required. DOB Estimation is implemented on the back end and is automatically applied to your searches.

How will DOB Estimation affect my search results?

DOB Estimation enhances the relevance of search results by reducing false positives, especially where DOB data is not available or incomplete, resulting in more accurate matches and a smoother screening process.

The DOB Estimation feature is designed to address common data limitations in AML and OSINT sources, enhancing our database’s DOB coverage and delivering improved compliance outcomes by reducing false positives in user screenings.

What writing systems (languages) does fuzzy search support?

The default language within the SEON system is English. Our more advanced search tools are only available in English, including the fuzzy search engine.

Exact search

We also support other languages, but only exact search will be available. Even so, this exact search engine provides robust text processing capabilities to handle various types of text variations and complexities, including ASCII folding for non-ASCII characters, hyphenation and punctuation differences, out-of-order name matching, missing name components, and casing differences. These features allow our search engine to deliver more accurate and relevant search results, even when dealing with challenging text inputs that would otherwise cause errors or miss relevant matches.
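
As a rough illustration of some of these text processing steps, the Python sketch below normalises a name so that accents, casing, punctuation and token order no longer affect an exact comparison. It is a simplified approximation using the standard library, not SEON's exact search engine. Folding via Unicode decomposition does not cover every non-ASCII letter, and missing name components are not handled here. The normalize_name helper is hypothetical.

```python
import re
import unicodedata


def normalize_name(name: str) -> tuple[str, ...]:
    """Return a canonical, order-independent form of a name."""
    # ASCII folding: decompose accented letters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Casing differences.
    folded = folded.casefold()
    # Hyphenation and punctuation: keep only runs of letters and digits.
    tokens = re.findall(r"[^\W_]+", folded)
    # Out-of-order names: compare tokens regardless of position.
    return tuple(sorted(tokens))


print(normalize_name("Kiotó") == normalize_name("kioto"))
print(normalize_name("Jean-Luc Picard") == normalize_name("PICARD, Jean Luc"))
```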

Our exact search engine provides support for the following languages:

Afrikaans
Albanian
Amharic
Arabic
Armenian
Assamese
Azerbaijani (Latin)
Basque
Belarusian
Bengali
Bosnian
Bulgarian
Burmese
Catalan
Chinese (Hans)
Chinese (Hant)
Croatian
Czech
Danish
Dutch
English
Estonian
Filipino (Latin)
Finnish
French
Galician
Ganda
Georgian
German
Greek
Gujarati
Hausa
Hebrew
Hindi
Hungarian
Icelandic
Igbo
Indonesian
Italian
Japanese
Kannada
Kazakh
Khmer
Kinyarwanda
Konkani
Korean
Lao
Latvian
Lithuanian
Macedonian
Malay
Malayalam
Maltese
Marathi
Mongolian (Cyrillic)
Nepali
Norwegian Bokmål
Norwegian Nynorsk
Oriya
Oromo
Polish
Portuguese
Punjabi
Romanian
Russian
Serbian (Latin)
Serbian (Cyrillic)
Sinhala
Slovak
Slovenian
Spanish
Swahili
Swedish
Tamil
Telugu
Thai
Turkish
Ukrainian
Urdu
Uzbek
Vietnamese
Welsh
Yoruba (Latin)
Zulu