Optimize screening with fuzzy search

Updated on 03.12.25

Overview

SEON's AML tools help KYC and MLRO officers in their fight against fincrime and money laundering through configurable, real-time screening.

Run quick checks with simple search or use complex search to home in on the correct person or entity.
You can also use exact search or fuzzy search and sift through data quickly with SEON’s relevancy score system.

Complex search

Complex search (also known as multi-faceted search) allows you to add several distinct data types to your search queries. You can add a country and date of birth (DOB) to your search term to enhance search results.

By default, SEON includes all name matches in your search results, regardless of date of birth and country. This ensures sanctioned individuals or criminals aren’t overlooked due to incomplete information in official databases. However, by using the date-of-birth and country filter settings, you can reduce false positives by filtering out mismatching results.

How fuzzy search works

Fuzzy search is a search algorithm that uses approximate string matching to find results similar to, but not exactly the same as, your search query. It helps you identify risky customers and users when exact matches fail.

Depending on your settings, fuzzy search will serve results similar to your original search query alongside any exact matches. This can help you catch high-risk individuals who'd otherwise fall through the cracks and save your MLRO officers time and effort.

Working on a global scale, as most online businesses do, you'll quickly encounter names originating from different writing systems. Transliteration – transferring written text between writing systems, for example, from Cyrillic or Arabic script to Latin characters – can quickly become a problem.

The results of transliteration depend on the languages involved. For example, Hungarian, Portuguese and English all have different rules for recoding the letters of other writing systems, while each uses the Latin alphabet. That's how Japan's historic capital becomes Kiotó, Quioto, or Kyoto, respectively. You can probably guess how easily that can cause a problem with names in AML checks.

Then there’s the issue of name variants or slightly different spellings of the same name. For example, the name Igor has a less common spelling, Ygor.

Benefits of using different fuzzy profiles

Supports a risk-based approach: Configure multiple fuzzy screening profiles to match different customer segments, products or geographies — ensuring screening sensitivity aligns with the specific level of risk.
Greater flexibility and control: Adjust matching thresholds, data sources, and search parameters to fine-tune results for each use case, from low-risk retail customers to high-value corporate clients.
Reduced false positives: Optimize matching rules per profile to minimize irrelevant alerts while maintaining a strong detection capability.
Improved operational efficiency: Automate screening with pre-defined profiles, saving time and reducing manual review efforts for compliance teams.
Enhanced compliance accuracy: Ensure your screening process consistently meets internal policies and regulatory expectations by tailoring profiles to distinct regulatory environments or risk appetites.
Faster adaptation to regulatory change: Quickly update or create new profiles in response to evolving compliance requirements or emerging risk patterns.
Data-driven decision-making: Gain clearer insights by comparing results across different fuzzy settings to continuously refine and improve your screening strategy.

How to use fuzzy search

Fuzzy search allows you to refine searches in SEON through API integration or fuzzy profiles.

Enabling fuzzy search in API requests

When sending data via the API, you can enable or disable fuzzy search in two ways. One option is to use the fuzzy_enabled parameter along with a fuzzy_config object. This requires defining all parameters individually in the API payload (see details below). Alternatively, you can configure a search profile in the UI and then reference that search profile in the API call. In that case, you don’t need to include the fuzzy_config or sources object in the API request payload. It is sufficient to reference the search profile using "search_profile_id": "ID-FROM-ADMIN".
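As a rough illustration of the two options, the sketch below builds both payload shapes as Python dictionaries. Only fuzzy_enabled, fuzzy_config, search_profile_id and the three nested fuzzy parameters come from this article. The user_fullname field and the example name are placeholders used for illustration, so check the API reference for the actual request structure.

```python
import json

# Option 1: define the fuzzy settings inline in the request payload.
# "user_fullname" is a placeholder field name, not confirmed by this article.
inline_payload = {
    "user_fullname": "Serhiy Kunitsyn",
    "fuzzy_enabled": True,
    "fuzzy_config": {
        "phonetic_search_enabled": False,
        "edit_distance_enabled": True,
        "min_nr_token_match": 67,
    },
}

# Option 2: reference a search profile configured in the UI.
# No fuzzy_config or sources object is needed in this case.
profile_payload = {
    "user_fullname": "Serhiy Kunitsyn",
    "search_profile_id": "ID-FROM-ADMIN",
}

print(json.dumps(inline_payload, indent=2))
print(json.dumps(profile_payload, indent=2))
```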

Adjusting fuzzy search profiles

Customers can configure multiple fuzzy settings profiles directly in the UI. These profiles can be tailored to different screening scenarios, supporting a risk-based approach and ultimately helping to reduce false positives.

1. In Settings, navigate to AML.
2. Head to the Fuzzy settings profile section.
3. Create a new profile or open the one you wish to edit.
4. Adjust the relevancy score and the token length to fit your organization's risk tolerance.
5. Scroll down to test different configurations without impacting any actual settings in use. This sandbox approach lets you experiment with specific risk scenarios before committing changes to your account-wide settings.
6. Click Save settings as a fuzzy profile to apply it across any selected automated and manual search in your account.

Fuzzy profiles can also be added to search profiles, where you can match fuzzy settings with different sources.

Best practices for fuzzy search configuration

First test your fuzzy settings in the fuzzy profile editor to fine-tune settings before applying them.
Start with broader fuzzy settings to capture more variations, then refine for accuracy.
Avoid overly strict settings, as they may block legitimate users or fail to flag risky entities.
Use different configurations for different risk levels, such as stricter thresholds for high-risk transactions.

Example: Fuzzy search in action

Scenario:

A customer signs up as "Johh Smith" instead of their full legal name, "Johnathan Smith."

With default fuzzy search settings

The system may not detect the match, as “Johh Smith” is too different from “Johnathan Smith.”
This could result in a false negative, allowing a potentially high-risk user to bypass screening.

With adjusted fuzzy search settings

Lowering the edit distance threshold for short names (e.g., 4 letters) allows the system to detect misspellings like “Johh” or “John”.
Lowering the relevancy score threshold allows the system to detect larger differences between names.
Result: The correct match is found despite the shortened or misspelled name.

Caution: Loosening fuzzy search parameters can increase the false positive rate, so adjustments should be made carefully.

Relevancy score

The relevancy score will help you determine how closely fuzzy search results match the search name you entered. You can change your relevancy score threshold via your AML API request to ensure your team doesn't encounter a high number of false positives.

The adjustable scoring thresholds for the relevancy score allow your team to set the sensitivity of the fuzzy search engine. In low-risk cases, such as PEP checks, you can set the scoring threshold so strictly that only results with a full match on both name and date of birth (DOB) are returned.

In high-risk cases, such as sanction screening, we recommend using a lower relevancy threshold and verifying every hit manually. With a lower threshold, results that match on name alone through fuzzy search are also returned, so you can confirm that the client is not a criminal or sanctioned individual.

Please be careful, as these settings can drastically increase the number of false negatives and false positives. A higher score threshold increases the number of false negatives (lower recall), and a lower score threshold increases the number of false positives.

But with a smart threshold policy, you can reduce manual workloads in low-risk cases and concentrate on high-profile investigations where human decision-making is essential. The default settings will serve you best in most cases.

Fuzzy search settings

When using the AML API through an API integration, you can customize your fuzzy search settings either via the UI in the search profile settings or by including parameters directly in your API request.

Learn more: To see how you should structure your API request and use the parameters below, head over to the API reference page.

Include your fuzzy parameters nested within the fuzzy_config parameter. You can use the parameters below to tweak what kind of results fuzzy search returns.

Did you know: Tokens are the name elements you enter into an AML API lookup. You can separate tokens using a space, a comma or a full stop. For example, the search terms 'Serhiy Kunitsyn', 'Serhiy.Kunitsyn' and 'Serhiy,Kunitsyn' each become two tokens: 'Serhiy' and 'Kunitsyn'.
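The splitting rule can be pictured with a few lines of Python. This is only a minimal sketch of the behavior described above, not SEON's tokenizer, and the function name tokenize is made up for the example.

```python
import re


def tokenize(name: str) -> list[str]:
    """Split a name on spaces, commas and full stops, dropping empty pieces."""
    return [token for token in re.split(r"[ ,.]+", name) if token]


for term in ("Serhiy Kunitsyn", "Serhiy.Kunitsyn", "Serhiy,Kunitsyn"):
    print(term, "->", tokenize(term))
```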

phonetic_search_enabled (Default: False)
If enabled, the parameter will turn on SEON’s phonetic search module. This means that the tokens entered into the AML API are converted into a phonetic representation using the double metaphone, koelnerphonetik, haasephonetik, beider-morse and daitch-mokotoff algorithms. When enabled, the AML lookup will only use these phonetic representations of the entered name and those in the database.
phonetic_term_threshold (Default: 5)
If a word in the input is shorter than this value, phonetic search won’t be applied to it. Example: With a threshold of 5, “John” is skipped, but “Albert” is included.
phonetic_character_threshold (Default: 12)
If a word has more characters than this value, phonetic search is enabled (if phonetic search is turned on).

Note: Phonetic search is disabled automatically for tokens shorter than 7 characters, if the full name entered in your search is shorter than 12 characters, or if the full name is shorter than 15 characters and contains a token with fewer than 5 characters.

edit_distance_enabled (Default: True)
Edit distance is the number of single-character changes needed to turn one term into another (e.g. mat to bat has an edit distance of 1). When set to True, AML lookups will return names similar to the search term entered: e.g.: 'Anastasia' matches 'Anastasya'. When you enter a search query, our system compares it to the names in our database. If a name token (a name element) has a length equal to or greater than 7 letters, we allow for 1-character edits to find potential matches. If the token length is equal to or greater than 13, we allow for 2-character edits to find potential matches.
For example, you enter the name "Tetjana Donez", which consists of two tokens. With the default value of 7, our system searches for variations within 1 edit distance (a single-character change) for each token whose length is at or above this value.
For example, the token "Tetjana" is considered a match with "Tetiana" because the two differ by just one character.
edit_distance_1_threshold
Sets the minimum token length at which 1-character edits are allowed (7 in the description above).
edit_distance_2_threshold
Sets the minimum token length at which 2-character edits are allowed (13 in the description above).
min_nr_token_match (Default: 67)
Defines the percentage of name tokens that must match for a valid comparison. Range: 0 - 100 (100 requires a full match). Example: With the default of 67, "Alexander Gahon Gesmundo" matches "Alexander Gesmundo", but with 100, it does not. Adjust this value for stricter or more flexible name matching.
enable_lastname_detection (Default: False)
Function: When enabled, the search and result names are split into first names and last names. The filter only applies if a contradiction is detected.
How it works:
- Match: "John Smith" and "John Adam Smith" → Not filtered out (since "Adam" could be a middle name).
- No Match: "Adam Smith" and "John Adam Smith" → Filtered out (since "Adam" is a first name in one case but a middle name in the other, creating a contradiction).
glued_words_splitting
Handles names that may be incorrectly merged. Options:
- off: Never used (default).
- on: Always active.
- fallback: Used only when no exact match is found.
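To make the edit distance and token match settings above more concrete, here is a minimal Python sketch. It assumes the token length limits of 7 and 13 described above, and it assumes that the token match percentage is the number of matching tokens divided by the token count of the longer name, rounded to the nearest whole number. That rounding rule is a guess chosen to reproduce the "Alexander Gesmundo" example, and all function names are hypothetical. It does not describe SEON's actual implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def allowed_edits(token: str, threshold_1: int = 7, threshold_2: int = 13) -> int:
    """Edits tolerated for a token, based on its length."""
    if len(token) >= threshold_2:
        return 2
    if len(token) >= threshold_1:
        return 1
    return 0


def tokens_match(token: str, other: str) -> bool:
    """True if the two tokens are within the edit budget of the first one."""
    return levenshtein(token.lower(), other.lower()) <= allowed_edits(token)


def token_match_percentage(name_a: str, name_b: str) -> int:
    """Share of matching tokens, relative to the longer name (assumed rule)."""
    shorter, longer = sorted((name_a.split(), name_b.split()), key=len)
    if not longer:
        return 0
    matched = sum(any(tokens_match(s, l) for l in longer) for s in shorter)
    return round(100 * matched / len(longer))


print(levenshtein("mat", "bat"))
print(tokens_match("Anastasia", "Anastasya"))
print(token_match_percentage("Alexander Gahon Gesmundo", "Alexander Gesmundo"))
```

Under these assumptions, the last call yields a value that passes a min_nr_token_match of 67 but not one of 100, which mirrors the example in the parameter description.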

Entity only search parameters

SEON separates company designators (e.g., Ltd, LLC, GmbH) from the base name and uses a centralized designator dictionary to handle abbreviations, translations and country-specific variants. This reduces false positives and improves cross-border matching.

allow_designator_translation (Default: False)
If this parameter is true, corresponding designators from different countries are treated as aliases (e.g. the Hungarian KFT is treated as LLC). It affects the scoring and filtering of exact search. For example, "Company KFT" will be returned even if you search for "Company LLC".
filter_mismatching_country_designator (Default: False)
Blocks matches if designator doesn’t match expected country mapping.
filter_mismatching_country (Default: True)
Blocks matches where the country differs from the input.
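The idea of separating the designator from the base name can be sketched as follows. The tiny DESIGNATOR_GROUPS dictionary and both function names are invented for this example. Only the KFT and LLC equivalence is taken from the text above, and the rule that a missing designator never causes a mismatch is an assumption.

```python
from typing import Optional, Tuple

# Hypothetical mini-dictionary: designator -> legal-form group.
# Only the KFT/LLC equivalence comes from the article.
DESIGNATOR_GROUPS = {
    "kft": "llc",
    "llc": "llc",
    "ltd": "ltd",
    "gmbh": "gmbh",
}


def split_designator(company_name: str) -> Tuple[str, Optional[str]]:
    """Return (base name, designator), both lowercased; designator may be None."""
    tokens = company_name.replace(",", " ").replace(".", "").split()
    if tokens and tokens[-1].lower() in DESIGNATOR_GROUPS:
        return " ".join(tokens[:-1]).lower(), tokens[-1].lower()
    return " ".join(tokens).lower(), None


def designators_compatible(a: Optional[str], b: Optional[str],
                           allow_designator_translation: bool = False) -> bool:
    """Compare two designators, optionally treating translations as aliases."""
    if a is None or b is None:
        return True  # assumption: a missing designator is not a contradiction
    if a == b:
        return True
    if allow_designator_translation:
        return DESIGNATOR_GROUPS.get(a, a) == DESIGNATOR_GROUPS.get(b, b)
    return False


searched = split_designator("Company LLC")
candidate = split_designator("Company KFT")
print(searched, candidate)
print(designators_compatible(searched[1], candidate[1], allow_designator_translation=True))
print(designators_compatible(searched[1], candidate[1], allow_designator_translation=False))
```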

Note: Entity parameters work only in a Fraud API integration, and their type should be set to unknown.

Adverse media configs

Adverse media fuzziness: Distinct fuzzy settings should be applied for adverse media searches. A value of 0 means an exact match, while 1 indicates high fuzziness.
Adverse media DOB filter: Filters out results where the input date of birth does not match the result’s date of birth, based on name similarity.
Adverse media country filter: Excludes results where the input country does not match the result’s country, based on name similarity.

Scoring parameters

Scoring helps you configure each search effectively. It provides an objective measure of the probability that the search term and the result refer to the same person.

result_limit: Limits the number of results returned.
score_threshold: Sets the minimum relevancy score for results to be included. (Recommended: 0.585)

Matching and scoring criteria

The API ranks search results based on a weighted scoring system, prioritizing:

Token similarity (70%): How closely names match. Determined by edit distance and order.
Name IDF score (8.5%): Frequency of the name in the dataset.
Date of birth (15%): If DOB is available, it heavily influences the ranking.
Year keyword score (4%): The relevance of the year in the dataset.
Country keyword score (1.6%): Relevance based on country name.
Country text IDF score (0.9%): Frequency of country-related terms.
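One way to picture how these weights could combine is a simple weighted sum, shown in the Python sketch below. The article lists the weights but does not say how the components are combined, so the linear formula, the 0 to 1 range for each component and all names in the code are assumptions. The 0.585 value is the recommended score_threshold mentioned above.

```python
# Weights from the list above, expressed as fractions of 1.
WEIGHTS = {
    "token_similarity": 0.70,
    "name_idf": 0.085,
    "date_of_birth": 0.15,
    "year_keyword": 0.04,
    "country_keyword": 0.016,
    "country_text_idf": 0.009,
}


def relevancy_score(components: dict[str, float]) -> float:
    """Weighted sum of component scores (each assumed to lie between 0 and 1).

    Components that are not supplied count as 0.
    """
    return sum(weight * components.get(name, 0.0)
               for name, weight in WEIGHTS.items())


def passes_threshold(score: float, score_threshold: float = 0.585) -> bool:
    """True if the result would be kept at the given score threshold."""
    return score >= score_threshold


name_only = relevancy_score({"token_similarity": 1.0})
name_and_dob = relevancy_score({"token_similarity": 1.0, "date_of_birth": 1.0})
partial_name = relevancy_score({"token_similarity": 0.8})

for label, score in (("name only", name_only),
                     ("name and DOB", name_and_dob),
                     ("partial name", partial_name)):
    print(f"{label}: {score:.3f} kept={passes_threshold(score)}")
```

Under this assumed formula, a perfect name match alone clears the recommended threshold, while a partial name match with no supporting DOB or country data does not.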

Did you know: Fuzzy search is enabled by default; however, if you want to disable it and turn on exact search for a manual AML lookup, you can do so by using the toggle under the search details on the Manual Lookup page.

Result filtering

filter_mismatching_dob: If set to true, filters out results that match on name similarity but have a date of birth (DOB) different from the one you searched for.
filter_missing_dob: When enabled, if you provide a date of birth (DOB) in your request but the source data does not include a DOB, the result will be excluded from the search results.
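The two filters can be read as a small decision function, sketched below in Python. The function name, the plain string comparison of dates and the rule that neither filter applies when no DOB is supplied are assumptions made for the example, and real matching may handle partial dates differently.

```python
from typing import Optional


def keep_result(search_dob: Optional[str],
                result_dob: Optional[str],
                filter_mismatching_dob: bool = False,
                filter_missing_dob: bool = False) -> bool:
    """Decide whether a name match stays in the results after DOB filtering."""
    if search_dob is None:
        return True  # assumption: without a searched DOB, neither filter applies
    if result_dob is None:
        return not filter_missing_dob
    if result_dob != search_dob:
        return not filter_mismatching_dob
    return True


results = [
    {"name": "John Smith", "dob": "1980-01-01"},
    {"name": "John Smith", "dob": "1975-05-05"},
    {"name": "John Smith", "dob": None},
]

kept = [r for r in results
        if keep_result("1980-01-01", r["dob"],
                       filter_mismatching_dob=True,
                       filter_missing_dob=True)]
print(kept)
```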

DOB estimation

The DOB estimation feature reduces false positives in AML screening by filtering out irrelevant results, even when an exact date of birth (DOB) isn’t available.

This helps bridge data gaps commonly seen in open source intelligence (OSINT) and AML data sources. The estimation approach increases DOB coverage, enhancing match precision and reliability.

Key features

Reducing false positives: Decreases irrelevant matches by estimating DOB ranges, making it easier to focus on genuine matches.
Back-end integration: This feature works entirely behind the scenes, requiring no input or configuration from the user.
Improved compliance accuracy: Enhances the screening process by effectively filling DOB gaps, which are common in partial data from AML and OSINT sources.
Streamlined screening experience: Results will be more relevant, reducing time spent on manually dismissing non-relevant matches.

How DOB estimation works

Back-end life event analysis: The system uses available life event data (e.g., employment, education milestones) to infer an estimated age range in cases where the exact DOB is not available.
Increased DOB coverage: By filling DOB gaps through estimation, the database coverage is increased, resulting in more complete profiles.
Sharper relevance filtering: With estimated age ranges, results are now filtered more precisely, helping you focus on what matters and skip over irrelevant matches.

FAQ

Will I need to configure anything to benefit from DOB Estimation?

No configuration is required. DOB Estimation is implemented on the back end and is automatically applied to your searches.

How will DOB Estimation affect my search results?

DOB Estimation enhances the relevance of search results by reducing false positives, especially where DOB data is not available or incomplete, resulting in more accurate matches and a smoother screening process.

The DOB Estimation feature is designed to address common data limitations in AML and OSINT sources, enhancing our database’s DOB coverage and delivering improved compliance outcomes by reducing false positives in user screenings.

What writing systems (languages) does fuzzy search support?

The default language within the SEON system is English. Our more advanced search tools are only available in English, including the fuzzy search engine.

Exact search

We also support other languages, but only exact search will be available. Even so, this exact search engine provides robust text processing capabilities to handle various types of text variations and complexities, including ASCII folding for non-ASCII characters, hyphenation and punctuation differences, out-of-order name matching, missing name components, and casing differences. These features allow our search engine to deliver more accurate and relevant search results, even when dealing with challenging text inputs that would otherwise cause errors or miss relevant matches.
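To give a feel for this kind of text processing, the Python sketch below folds accented Latin characters, ignores casing and punctuation, and compares names regardless of token order. It is a simplified illustration using Python's standard library rather than SEON's engine. It only folds characters that decompose into a base letter plus a combining mark, it is geared towards Latin script, and it does not cover missing name components.

```python
import re
import unicodedata


def fold_accents(text: str) -> str:
    """Strip combining marks so that, for example, 'ó' becomes 'o'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_name(name: str) -> tuple[str, ...]:
    """Fold accents, lowercase, split on punctuation and sort the tokens."""
    folded = fold_accents(name).lower()
    tokens = re.split(r"[\W_]+", folded)
    return tuple(sorted(token for token in tokens if token))


print(normalize_name("Kiotó"))
print(normalize_name("Smith, John") == normalize_name("john smith"))
print(normalize_name("Jean-Pierre Dupont") == normalize_name("DUPONT Jean Pierre"))
```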

Our exact search engine provides support for the following languages:

Afrikaans
Albanian
Amharic
Arabic
Armenian
Assamese
Azerbaijani (Latin)
Basque
Belarusian
Bengali
Bosnian
Bulgarian
Burmese
Catalan
Chinese (Hans)
Chinese (Hant)
Croatian
Czech
Danish
Dutch
English
Estonian
Filipino (Latin)
Finnish
French
Galician
Ganda
Georgian
German
Greek
Gujarati
Hausa
Hebrew
Hindi
Hungarian
Icelandic
Igbo
Indonesian
Italian
Japanese
Kannada
Kazakh
Khmer
Kinyarwanda
Konkani
Korean
Lao
Latvian
Lithuanian
Macedonian
Malay
Malayalam
Maltese
Marathi
Mongolian (Cyrillic)
Nepali
Norwegian Bokmål
Norwegian Nynorsk
Oriya
Oromo
Polish
Portuguese
Punjabi
Romanian
Russian
Serbian (Latin)
Serbian (Cyrillic)
Sinhala
Slovak
Slovenian
Spanish
Swahili
Swedish
Tamil
Telugu
Thai
Turkish
Ukrainian
Urdu
Uzbek
Vietnamese
Welsh
Yoruba (Latin)
Zulu