Optimize screening with fuzzy search
Updated on 14.11.25
Overview
SEON's AML tools help KYC and MLRO officers in their fight against fincrime and money laundering through configurable, real-time screening.
- Run quick checks with simple search or use complex search to home in on the correct person.
- You can also use exact search or fuzzy search and sift through data quickly with SEON’s relevancy score system.
Complex search
Complex search (also known as multi-faceted search) allows you to add several distinct data types to your search queries. For example, you can add a country and date of birth (DOB) to your search term to enhance search results.
By default, SEON includes all name matches in your search results, regardless of date of birth and country. This ensures sanctioned individuals or criminals aren’t overlooked due to incomplete information in official databases.
How fuzzy search works
Fuzzy search is a search algorithm that uses approximate string matching to find results similar to, but not exactly the same as, your search query. It helps you identify risky customers and users when exact matches fail.
Depending on your settings, fuzzy search will serve results similar to your original search query alongside any exact matches. This can help you catch high-risk individuals who'd otherwise fall through the cracks and save your MLRO officers time and effort.
Working on a global scale, as most online businesses do, you'll quickly encounter names originating from different writing systems. Transliteration – transferring written text between writing systems, for example, from Cyrillic or Arabic script to Latin characters – can quickly become a problem.
The results of transliteration depend on the languages involved. For example, Hungarian, Portuguese and English all have different rules for recoding the letters of other writing systems, while each uses the Latin alphabet. That's how Japan's historic capital becomes Kiotó, Quioto, or Kyoto, respectively. You can probably guess how easily that can cause a problem with names in AML checks.
Then there’s the issue of name variants or slightly different spellings of the same name. For example, the name Igor has a less common spelling, Ygor.
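As a rough illustration of approximate string matching, the sketch below scores the spelling variants mentioned above with Python's standard difflib module. This is a generic similarity ratio, not SEON's matching algorithm, and the helper name similarity is made up for this example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


# Name variants and transliterations taken from the text above.
pairs = [("Igor", "Ygor"), ("Kyoto", "Kiotó"), ("Kyoto", "Quioto")]
for left, right in pairs:
    print(f"{left!r} vs {right!r}: {similarity(left, right):.2f}")
```

An exact comparison treats every one of these pairs as different, while a similarity ratio lets you rank near matches and decide where to draw the line.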
Benefits of using different fuzzy profiles
- Supports a risk-based approach: Configure multiple fuzzy screening profiles to match different customer segments, products or geographies — ensuring screening sensitivity aligns with the specific level of risk.
- Greater flexibility and control: Adjust matching thresholds, data sources, and search parameters to fine-tune results for each use case, from low-risk retail customers to high-value corporate clients.
- Reduced false positives: Optimize matching rules per profile to minimize irrelevant alerts while maintaining a strong detection capability.
- Improved operational efficiency: Automate screening with pre-defined profiles, saving time and reducing manual review efforts for compliance teams.
- Enhanced compliance accuracy: Ensure your screening process consistently meets internal policies and regulatory expectations by tailoring profiles to distinct regulatory environments or risk appetites.
- Faster adaptation to regulatory change: Quickly update or create new profiles in response to evolving compliance requirements or emerging risk patterns.
- Data-driven decision-making: Gain clearer insights by comparing results across different fuzzy settings to continuously refine and improve your screening strategy.
How to use fuzzy search
Fuzzy search allows you to refine searches in SEON through API integration or fuzzy profiles.
Enabling fuzzy search in API Requests
When sending data via API, you can enable or disable fuzzy search using the fuzzy_enabled parameter. You can also fine-tune the fuzzy search settings to your needs. However, adjusting these settings can significantly change results and increase the likelihood of false positives or false negatives. In most cases, the default settings provide the best balance.
Adjusting fuzzy search profiles
Customers can configure multiple fuzzy settings profiles directly in the UI. These profiles can be tailored to different screening scenarios, supporting a risk-based approach and ultimately helping to reduce false positives.
1. In Settings, navigate to AML.
2. Head to the Fuzzy settings profile section.
3. Create a new profile or open the one you wish to edit.
4. Adjust the relevancy score and the token length to fit your organization's risk tolerance.
5. Scroll down to test different configurations without impacting any actual settings in use. This sandbox approach lets you experiment with specific risk scenarios before committing changes to your account-wide settings.
6. Click Save settings as a fuzzy profile to apply it across any selected automated and manual search in your account.
Fuzzy profiles can also be added to search profiles, where you can match fuzzy settings with different sources.
Best practices for fuzzy search configuration
- First test your fuzzy settings in the fuzzy profile editor to fine-tune settings before applying them.
- Start with broader fuzzy settings to capture more variations, then refine for accuracy.
- Avoid extreme settings: overly strict ones may fail to flag risky entities, while overly loose ones may block legitimate users.
- Use different configurations for different risk levels, such as stricter thresholds for high-risk transactions.
Example: Fuzzy search in action
Scenario:
A customer signs up as "Johh Smith" instead of their full legal name, "Johnathan Smith."
Without fuzzy search
- The system may not detect the match, as “Johh Smith” is too different from “Johnathan Smith.”
- This could result in a false negative, allowing a potentially high-risk user to bypass screening.
With default fuzzy search settings
- The system identifies partial matches, but the default settings may still miss more complex name variations.
- Example match: "John Smith" (missing full name but still partially relevant).
With adjusted fuzzy search settings
- Increasing the edit distance allows for small misspellings like “Johh” instead of "John."
- Reducing the relevancy score threshold ensures that names with minor variations (e.g., "Johnathan" vs. "John") are still detected.
- Example match: "Johnathan Smith" (correct match found despite a shortened name).
Relevancy score
The relevancy score will help you determine how closely fuzzy search results match the search name you entered. You can change your relevancy score threshold via your AML API request to ensure your team doesn't encounter a high number of false positives.
The adjustable scoring thresholds for the relevancy score allow your team to set the sensitivity of the fuzzy search engine. In low-risk cases, such as PEP checks, you can choose to set the scoring threshold so strictly that only results with a full match on name and date of birth (DOB) are returned.
In high-risk cases, such as sanction screening, we recommend using a lower relevancy threshold and verifying every hit manually. With a lower threshold, results can match on name alone through fuzzy search, which helps ensure that the client is not a criminal or sanctioned individual.
Please be careful, as these settings can drastically increase the number of false negatives and false positives. A higher score threshold increases the number of false negatives (lower recall), and a lower score threshold increases the number of false positives.
But with a smart threshold policy, you can reduce manual workloads in low-risk cases and concentrate on high-profile investigations where human decision-making is essential. The default settings will serve you best in most cases.
Fuzzy search settings
When using the AML API through an integration, you can customize your fuzzy search settings by including parameters in your API request.
Include your fuzzy parameters nested within the config.fuzzy_config parameter. You can use the parameters below to tweak what kind of results fuzzy search returns.
phonetic_search_enabled (Default: False)
If enabled, the parameter will turn on SEON’s phonetic search module. This means that the tokens entered into the AML API are converted into a phonetic representation using the double metaphone, koelnerphonetik, haasephonetik, beider-morse and daitch-mokotoff algorithms. When enabled, the AML lookup will only use these phonetic representations of the entered name and those in the database.
phonetic_term_threshold (Default: 5)
If a word in the input is shorter than this value, phonetic search won’t be applied to it. Example: With a threshold of 5, “John” is skipped, but “Albert” is included.
phonetic_character_threshold (Default: 12)
If a word has more characters than this value, phonetic search is enabled (if phonetic search is turned on).
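The sketch below illustrates how a term-length threshold can gate phonetic matching. It swaps in classic Soundex, a much simpler phonetic code than the algorithms listed above, purely for illustration; it is not SEON's implementation, and the function names soundex and phonetic_keys are assumptions made for this example.

```python
# Soundex digit for each consonant; vowels and h/w/y carry no digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(token: str) -> str:
    """Return the 4-character Soundex code of a token ("" if no letters)."""
    letters = [c for c in token.lower() if c.isalpha()]
    if not letters:
        return ""
    out = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate repeated codes
        code = CODES.get(ch, "")
        if code and code != prev:
            out += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (out + "000")[:4]


def phonetic_keys(name: str, phonetic_term_threshold: int = 5) -> dict:
    """Map each token to its phonetic key, or None if the token is too short."""
    return {
        token: soundex(token) if len(token) >= phonetic_term_threshold else None
        for token in name.split()
    }


print(phonetic_keys("John Albert"))
print(soundex("Stephen") == soundex("Steven"))
```

With the default threshold of 5, "John" gets no phonetic key while "Albert" does, which mirrors the example in the parameter description.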
edit_distance_enabled (Default: True)
Edit distance is the number of single-character changes needed to turn one term into another (e.g., "mat" to "bat" has an edit distance of 1). When set to True, AML lookups return names similar to the search term entered, e.g., "Anastasia" matches "Anastasya". When you enter a search query, our system compares it to the names in our database. If a name token (a name element) is 7 or more letters long, we allow 1-character edits to find potential matches. If the token is 13 or more letters long, we allow 2-character edits to find potential matches.
For example, you enter the name "Tetjana Donez", which consists of two tokens. With the default value of 7, our system searches for variations at an edit distance of 1 (a single-character change) for each token whose length is equal to or greater than this value.
Here, the 7-letter token "Tetjana" is considered a match for "Tetiana" because the two differ by just one character.
edit_distance_1_threshold
Defines how many character changes are allowed for short terms.
edit_distance_2_threshold
Defines how many character changes are allowed for longer terms.
min_nr_token_match (Default: 67)
Defines the percentage of name tokens that must match for a valid comparison. Range: 0 - 100 (100 requires a full match). Example: With the default of 67, at least 67% of tokens must match, so "Alexander Gahon Gesmundo" matches "Alexander Gesmundo", but with 100, it does not. Adjust for stricter or more flexible name matching.
enable_lastname_detection (Default: False)
Function: When enabled, the search and result names are split into first names and last names. The filter only applies if a contradiction is detected.
How it works:
- Match: "John Smith" and "John Adam Smith" → Not filtered out (since "Adam" could be a middle name).
- No Match: "Adam Smith" and "John Adam Smith" → Filtered out (since "Adam" is a first name in one case but a middle name in the other, creating a contradiction).
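The length-based edit distance rule and the token match percentage described above can be sketched as follows. This is a minimal illustration under stated assumptions, not SEON's implementation: the function names are hypothetical, the token lengths 7 and 13 come from the edit distance description, and rounding the percentage against the longer name's token count is an assumption chosen so that 2 of 3 tokens satisfies a threshold of 67.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def allowed_edits(token: str, one_edit_len: int = 7, two_edit_len: int = 13) -> int:
    """Edits tolerated for a token, based on its length."""
    if len(token) >= two_edit_len:
        return 2
    if len(token) >= one_edit_len:
        return 1
    return 0


def tokens_match(query_token: str, candidate_token: str) -> bool:
    q, c = query_token.casefold(), candidate_token.casefold()
    return levenshtein(q, c) <= allowed_edits(q)


def name_matches(query: str, candidate: str, min_nr_token_match: int = 67) -> bool:
    """True if a large enough share of tokens match (assumed rounding rule)."""
    q_tokens, c_tokens = query.split(), candidate.split()
    longest = max(len(q_tokens), len(c_tokens))
    if longest == 0:
        return False
    remaining = list(c_tokens)
    matched = 0
    for q in q_tokens:
        for c in remaining:
            if tokens_match(q, c):
                matched += 1
                remaining.remove(c)  # each candidate token is used once
                break
    return round(100 * matched / longest) >= min_nr_token_match


print(tokens_match("Tetjana", "Tetiana"))
print(name_matches("Alexander Gahon Gesmundo", "Alexander Gesmundo"))
print(name_matches("Alexander Gahon Gesmundo", "Alexander Gesmundo", 100))
```

Short tokens such as "Igor" stay exact-match only under this rule, which is why lowering the length thresholds widens the result set quickly.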
glued_words_splitting
Handles names that may be incorrectly merged. Options:
- off: Never used (default).
- on: Always active.
- fallback: Used only when no exact match is found.
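To show how these parameters might sit together in a request, the sketch below assembles a request body with the fuzzy parameters nested under config.fuzzy_config, as described above. The parameter names and defaults are the ones listed in this section; the user_fullname field, the placement of fuzzy_enabled under config, the value types and the helper names are assumptions made for this example, so check them against the AML API reference before use.

```python
import json

# Defaults as listed in this section.
FUZZY_DEFAULTS = {
    "phonetic_search_enabled": False,
    "phonetic_term_threshold": 5,
    "phonetic_character_threshold": 12,
    "edit_distance_enabled": True,
    "min_nr_token_match": 67,
    "enable_lastname_detection": False,
    "glued_words_splitting": "off",
}
# Parameters named in this section without an explicitly stated default.
OPTIONAL_KEYS = {"edit_distance_1_threshold", "edit_distance_2_threshold"}


def build_fuzzy_config(**overrides) -> dict:
    """Merge overrides into the defaults, rejecting misspelled parameter names."""
    unknown = set(overrides) - set(FUZZY_DEFAULTS) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"Unknown fuzzy parameters: {sorted(unknown)}")
    return {**FUZZY_DEFAULTS, **overrides}


def build_request_body(full_name: str, **overrides) -> dict:
    return {
        "user_fullname": full_name,  # hypothetical field name
        "config": {
            "fuzzy_enabled": True,   # placement under config is assumed
            "fuzzy_config": build_fuzzy_config(**overrides),
        },
    }


body = build_request_body("Tetjana Donez", min_nr_token_match=100)
print(json.dumps(body, indent=2))
```

Rejecting unknown keys up front is a design choice for the sketch: a misspelled parameter would otherwise be sent silently and the default would stay in effect.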
Entity search parameters
SEON separates company designators (e.g., Ltd, LLC, GmbH) from the base name and uses a centralized designator dictionary to handle abbreviations, translations and country-specific variants. This reduces false positives and improves cross-border matching.
allow_designator_translation (Default: False)
If this parameter is true, corresponding designators from different countries are treated as aliases (e.g. Hungarian KFT is LLC). It affects the scoring and filtering of exact search. For example, "Company KFT" will be returned even if "Company LLC" is searched for.
filter_mismatching_country_designator (Default: False)
Blocks matches if designator doesn’t match expected country mapping.
filter_mismatching_country (Default: True)
Blocks matches where the country differs from the input.
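The idea of separating a designator from the base name and optionally treating translated designators as aliases can be sketched like this. The tiny dictionaries and function names are hypothetical stand-ins for the centralized designator dictionary, and treating a designator mismatch as a hard non-match is a simplification of the scoring and filtering behavior described above.

```python
# Hypothetical, deliberately tiny designator data for illustration only.
KNOWN_DESIGNATORS = {"ltd", "llc", "gmbh", "kft"}
TRANSLATIONS = {"kft": "llc"}  # Hungarian KFT treated as LLC, per the text


def split_designator(name: str):
    """Return (base_name, designator), both lowercased; designator may be None."""
    tokens = name.replace(".", "").split()
    if len(tokens) > 1 and tokens[-1].casefold() in KNOWN_DESIGNATORS:
        return " ".join(tokens[:-1]).casefold(), tokens[-1].casefold()
    return " ".join(tokens).casefold(), None


def entity_matches(query: str, candidate: str,
                   allow_designator_translation: bool = False) -> bool:
    q_base, q_des = split_designator(query)
    c_base, c_des = split_designator(candidate)
    if q_base != c_base:
        return False
    if allow_designator_translation:
        # Map each designator to its alias, if one is known.
        q_des = TRANSLATIONS.get(q_des, q_des)
        c_des = TRANSLATIONS.get(c_des, c_des)
    return q_des == c_des


print(entity_matches("Company LLC", "Company KFT"))
print(entity_matches("Company LLC", "Company KFT", allow_designator_translation=True))
```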
Adverse media configs
- Adverse media fuzziness: Distinct fuzzy settings should be applied for adverse media searches. A value of 0 means an exact match, while 1 indicates high fuzziness.
- Adverse media DOB filter: Filters out results where the input date of birth does not match the result’s date of birth, based on name similarity.
- Adverse media country filter: Excludes results where the input country does not match the result’s country, based on name similarity.
Scoring parameters
Scoring helps you configure each search effectively. It provides an objective measure of the probability that a search term matches the same person as the result.
result_limit: Limits the number of results returned.
score_threshold: Sets the minimum relevancy score for results to be included. (Recommended: 0.585)
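The effect of these two parameters on a result list can be sketched as below. The result shape (dictionaries with a score key), the inclusive comparison, the function name and the limit of 10 used as a default here are assumptions made for the example; only the 0.585 threshold comes from the text.

```python
def select_results(results, score_threshold=0.585, result_limit=10):
    """Keep results at or above the threshold, best first, up to the limit."""
    kept = [r for r in results if r["score"] >= score_threshold]
    kept.sort(key=lambda r: r["score"], reverse=True)
    return kept[:result_limit]


hits = [
    {"name": "John Smith", "score": 0.62},
    {"name": "Johnathan Smith", "score": 0.91},
    {"name": "Joan Smythe", "score": 0.40},
]

print(select_results(hits))                        # default threshold
print(select_results(hits, score_threshold=0.3))   # lower threshold, more hits
print(select_results(hits, result_limit=1))        # only the top hit
```

Lowering the threshold brings the weakest candidate back into the list, which is the false-positive side of the trade-off described in the relevancy score section.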
Matching and scoring criteria
The API ranks search results based on a weighted scoring system, prioritizing:
- Token similarity (70%): How closely names match. Determined by edit distance and order.
- Name IDF score (8.5%): Frequency of the name in the dataset.
- Date of birth (15%): If DOB is available, it heavily influences the ranking.
- Year keyword score (4%): The relevance of the year in the dataset.
- Country keyword score (1.6%): Relevance based on country name.
- Country text IDF score (0.9%): Frequency of country-related terms.
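The weights above add up to 100%. As a worked illustration, the sketch below combines per-criterion scores between 0 and 1 into a single value using those weights. Treating the final score as a plain weighted sum is an assumption made for this example, as are the key and function names; the actual scoring formula may differ.

```python
# Weights taken from the list above, expressed as fractions of 1.
WEIGHTS = {
    "token_similarity": 0.70,
    "name_idf": 0.085,
    "date_of_birth": 0.15,
    "year_keyword": 0.04,
    "country_keyword": 0.016,
    "country_text_idf": 0.009,
}


def relevancy_score(components: dict) -> float:
    """Weighted sum of per-criterion scores; missing criteria count as 0."""
    return sum(weight * components.get(key, 0.0) for key, weight in WEIGHTS.items())


name_only = {"token_similarity": 1.0, "name_idf": 1.0}
name_and_dob = {**name_only, "date_of_birth": 1.0}

print(round(relevancy_score(name_only), 3))     # name criteria alone
print(round(relevancy_score(name_and_dob), 3))  # name plus matching DOB
```

Under this assumption, a perfect name match with no other data already clears the recommended 0.585 threshold, and a matching DOB adds a further 0.15.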
Result filtering
filter_mismatching_dob: If set to true, filters out results that match on name similarity but whose date of birth (DOB) differs from the searched DOB.
filter_missing_dob: When enabled, if you provide a DOB in your request but the source data does not include one, the result is excluded from the search results.
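The two DOB filters can be sketched as a simple post-processing step. The result shape (dictionaries with an optional dob key), the function name and the False defaults are assumptions made for this example; the text above does not state the default values.

```python
from datetime import date


def apply_dob_filters(results, searched_dob,
                      filter_mismatching_dob=False, filter_missing_dob=False):
    """Drop results according to the two DOB filters described above."""
    if searched_dob is None:
        return list(results)  # no DOB in the request, nothing to compare
    kept = []
    for result in results:
        dob = result.get("dob")
        if dob is None:
            if filter_missing_dob:
                continue  # source record has no DOB
        elif dob != searched_dob and filter_mismatching_dob:
            continue      # source record has a different DOB
        kept.append(result)
    return kept


records = [
    {"name": "John Smith", "dob": date(1980, 5, 1)},
    {"name": "John Smith", "dob": date(1975, 1, 9)},
    {"name": "John Smith", "dob": None},
]
searched = date(1980, 5, 1)

print(len(apply_dob_filters(records, searched)))
print(len(apply_dob_filters(records, searched, filter_mismatching_dob=True)))
print(len(apply_dob_filters(records, searched, True, True)))
```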
DOB estimation
The DOB estimation feature reduces false positives in AML screening by filtering out irrelevant results, even when an exact date of birth (DOB) isn’t available.
This helps bridge data gaps commonly seen in open source intelligence (OSINT) and AML data sources. The estimation approach increases DOB coverage, enhancing match precision and reliability.
Key features
- Reducing false positives: Decreases irrelevant matches by estimating DOB ranges, making it easier to focus on genuine matches.
- Back-end integration: This feature works entirely behind the scenes, requiring no input or configuration from the user.
- Improved compliance accuracy: Enhances the screening process by effectively filling DOB gaps, which are common in partial data from AML and OSINT sources.
- Streamlined screening experience: Results will be more relevant, reducing time spent on manually dismissing non-relevant matches.
How DOB estimation works
- Back-end life event analysis: The system uses available life event data (e.g., employment, education milestones) to infer an estimated age range in cases where the exact DOB is not available.
- Increased DOB coverage: By filling DOB gaps through estimation, the database coverage is increased, resulting in more complete profiles.
- Sharper relevance filtering: With estimated age ranges, results are now filtered more precisely, helping you focus on what matters and skip over irrelevant matches.
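Conceptually, filtering with an estimated range works like the sketch below: when a record has no exact DOB but does have an estimated birth-year range, the input DOB is checked against that range. The function name, its parameters and the example range are hypothetical, and how the range is actually inferred from life events is not covered here.

```python
from datetime import date


def dob_compatible(input_dob, exact_dob=None, estimated_birth_years=None):
    """Check an input DOB against an exact DOB or an estimated year range."""
    if exact_dob is not None:
        return input_dob == exact_dob
    if estimated_birth_years is not None:
        earliest, latest = estimated_birth_years
        return earliest <= input_dob.year <= latest
    return True  # nothing known about the record's DOB, keep the result


customer_dob = date(1992, 3, 14)
print(dob_compatible(customer_dob, estimated_birth_years=(1950, 1965)))
print(dob_compatible(customer_dob, estimated_birth_years=(1988, 1996)))
```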
FAQ
Will I need to configure anything to benefit from DOB Estimation?
No configuration is required. DOB Estimation is implemented on the back end and is automatically applied to your searches.
How will DOB Estimation affect my search results?
DOB Estimation enhances the relevance of search results by reducing false positives, especially where DOB data is not available or incomplete, resulting in more accurate matches and a smoother screening process.
The DOB Estimation feature is designed to address common data limitations in AML and OSINT sources, enhancing our database’s DOB coverage and delivering improved compliance outcomes by reducing false positives in user screenings.
What writing systems (languages) does fuzzy search support?
The default language within the SEON system is English. Our more advanced search tools are only available in English, including the fuzzy search engine.
Exact search
We also support other languages, but only exact search will be available. Even so, this exact search engine provides robust text processing capabilities to handle various types of text variations and complexities, including ASCII folding for non-ASCII characters, hyphenation and punctuation differences, out-of-order name matching, missing name components, and casing differences. These features allow our search engine to deliver more accurate and relevant search results, even when dealing with challenging text inputs that would otherwise cause errors or miss relevant matches.
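Several of these normalization steps can be sketched with Python's standard unicodedata module: accent folding, case folding, hyphen and punctuation handling, and order-insensitive comparison. This is a simplified illustration for Latin-script names, not SEON's text processing; the function names are made up for the example, missing name components are not handled, and letters without a Unicode decomposition (such as "ø") are left unchanged.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Strip combining marks so that accented letters become their base letters."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def name_key(name: str) -> tuple:
    """Normalize a name into a sorted tuple of tokens for exact comparison."""
    folded = fold_accents(name).casefold().replace("-", " ")
    # Drop punctuation, keep letters, digits and whitespace.
    cleaned = "".join(ch for ch in folded if ch.isalnum() or ch.isspace())
    return tuple(sorted(cleaned.split()))


print(fold_accents("Kiotó"))
print(name_key("García-López, José") == name_key("jose garcia lopez"))
```

Sorting the tokens is what makes the comparison order-insensitive, so "García-López, José" and "jose garcia lopez" produce the same key.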
Our exact search engine provides support for the following languages:
| Afrikaans | German | Norwegian Bokmål |
| Albanian | Greek | Norwegian Nynorsk |
| Amharic | Gujarati | Oriya |
| Arabic | Hausa | Oromo |
| Armenian | Hebrew | Polish |
| Assamese | Hindi | Portuguese |
| Azerbaijani (Latin) | Hungarian | Punjabi |
| Basque | Icelandic | Romanian |
| Belarusian | Igbo | Russian |
| Bengali | Indonesian | Serbian (Latin) |
| Bosnian | Italian | Serbian (Cyrillic) |
| Bulgarian | Japanese | Sinhala |
| Burmese | Kannada | Slovak |
| Catalan | Kazakh | Slovenian |
| Chinese (Hans) | Khmer | Spanish |
| Chinese (Hant) | Kinyarwanda | Swahili |
| Croatian | Konkani | Swedish |
| Czech | Korean | Tamil |
| Danish | Lao | Telugu |
| Dutch | Latvian | Thai |
| English | Lithuanian | Turkish |
| Estonian | Macedonian | Ukrainian |
| Filipino (Latin) | Malay | Urdu |
| Finnish | Malayalam | Uzbek |
| French | Maltese | Vietnamese |
| Galician | Marathi | Welsh |
| Ganda | Mongolian (Cyrillic) | Yoruba (Latin) |
| Georgian | Nepali | Zulu |