Using Fuzzy Search in AML queries

Updated on 28.03.25

Overview

SEON's AML tools offer a wide range of solutions and configuration options to support KYC & MLRO officers in their fight against financial crime and money laundering. Run quick checks with simple search, or use complex search to home in on the correct person. You can also choose between exact search and fuzzy search, and sift through the results quickly with our relevancy score system.

 

Fuzzy Search Settings

When using the AML API through an API integration, you can customize your fuzzy search settings by including parameters in your API request.

Include your fuzzy parameters nested within the config.fuzzy_config parameter. You can use the parameters below to tweak what kind of results fuzzy search returns.

  • phonetic_search_enabled – Default setting: False. 
    If enabled, this parameter turns on SEON's Phonetic Search module. The tokens entered into the AML API are converted into phonetic representations using the Double Metaphone, Kölner Phonetik, Haase Phonetik, Beider-Morse, and Daitch-Mokotoff algorithms. When enabled, the AML lookup compares only the phonetic representations of the entered name and of the names in the database.
  • phonetic_term_threshold – (Default: 5) If a word in the input is shorter than this value, phonetic search won’t be applied to it. Example: With a threshold of 5, “John” is skipped, but “Albert” is included.
  • phonetic_character_threshold – (Default: 12) If a word has more characters than this value, phonetic search is applied to it. This only takes effect when phonetic_search_enabled is set to True.
  • edit_distance_enabled – Default setting: True. 
    Edit distance is the number of single-character changes needed to turn one term into another (e.g. mat → bat has an edit distance of 1). When set to True, AML lookups also return names similar to the search term entered, e.g. 'Anastasia' matches 'Anastasya'. When you enter a search query, our system compares it to the names in our database token by token (a token is a single name element). If a token is 7 or more characters long, we allow 1-character edits to find potential matches. If a token is 13 or more characters long, we allow 2-character edits.
    For example, the name "Tetjana Donez" consists of two tokens. With the default threshold of 7, our system searches for variations within an edit distance of 1 for each token that is at least 7 characters long.
    Here, "Tetjana" is 7 characters long, so it is considered a match with "Tetiana": the two differ by just one character.
  • edit_distance_1_threshold – Sets the minimum token length at which 1-character edits are allowed (7 in the description above).
  • edit_distance_2_threshold – Sets the minimum token length at which 2-character edits are allowed (13 in the description above).
  • min_nr_token_match – Defines the percentage of name tokens that must match for a valid comparison. Default: 67 (at least 67% of tokens must match). Range: 0–100 (100 requires a full match). Example: With 67, "Alexander Gahon Gesmundo" matches "Alexander Gesmundo", but with 100, it does not. Raise the value for stricter name matching, or lower it for more flexible matching.
  • enable_lastname_detection – Default setting: False. When enabled, the search name and the result name are each split into first name(s) and last name(s). The filter only applies if a contradiction is detected.
    How it works:
    • Match: "John Smith" and "John Adam Smith" → Not filtered out (since "Adam" could be a middle name).
    • No match: "Adam Smith" and "John Adam Smith" → Filtered out (since "Adam" is a first name in one case but a middle name in the other, creating a contradiction).
  • glued_words_splitting – Handles names that may be incorrectly merged. Options:
    • off: Never used (default).
    • on: Always active.
    • fallback: Used only when no exact match is found.
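
The edit-distance rules described under edit_distance_enabled can be illustrated with a short Python sketch. This is not SEON's implementation: the helper names (`allowed_edits`, `levenshtein`, `tokens_match`) are hypothetical, plain Levenshtein distance is assumed as the edit-distance measure, and the length thresholds of 7 and 13 are taken from the description above.

```python
def allowed_edits(token: str, d1_threshold: int = 7, d2_threshold: int = 13) -> int:
    """Return how many single-character edits are tolerated for a token.

    ASSUMPTION: thresholds are minimum token lengths, as in the text above.
    """
    if len(token) >= d2_threshold:
        return 2
    if len(token) >= d1_threshold:
        return 1
    return 0


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def tokens_match(query_token: str, candidate_token: str) -> bool:
    """ASSUMPTION: the edit budget is derived from the query token's length."""
    distance = levenshtein(query_token.lower(), candidate_token.lower())
    return distance <= allowed_edits(query_token)


print(tokens_match("Anastasia", "Anastasya"))
print(tokens_match("Tetjana", "Tetiana"))
```

Under these assumptions, both calls report a match: each query token is at least 7 characters long and differs from its candidate by a single substitution.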

Adverse media configs

  • Adverse Media Fuzziness: Sets the fuzziness of adverse media searches separately from the other fuzzy settings. A value of 0 means an exact match, while 1 indicates high fuzziness.
  • Adverse Media DOB Filter: Filters out results that match on name similarity but whose date of birth does not match the input date of birth.
  • Adverse Media Country Filter: Excludes results that match on name similarity but whose country does not match the input country.

Scoring Parameters

Scoring helps you configure each search effectively. It provides an objective measure of the probability that the search term and a result refer to the same person.

  • result_limit – Limits the number of results returned.
  • score_threshold – Sets the minimum relevancy score for results to be included. (Recommended: 0.585)
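
As a rough illustration of how the fuzzy and scoring settings might be combined in one request body, here is a minimal Python sketch. Only the config.fuzzy_config nesting and the parameter names come from this article. The `user_fullname` field, the placement of the scoring parameters in a `scoring` object inside `fuzzy_config`, and the `result_limit` value of 10 are assumptions made for the example, so check the API reference before relying on them. The hypothetical helper `build_aml_request` only builds the body and does not send it.

```python
import json


def build_aml_request(full_name: str,
                      score_threshold: float = 0.585,
                      result_limit: int = 10) -> dict:
    """Build an illustrative AML request body (no network call is made)."""
    fuzzy_config = {
        # Default values as listed in this article.
        "phonetic_search_enabled": False,
        "edit_distance_enabled": True,
        "min_nr_token_match": 67,
        "enable_lastname_detection": False,
        "glued_words_splitting": "off",
        # ASSUMPTION: scoring parameters live in a nested "scoring" object.
        "scoring": {
            "result_limit": result_limit,        # illustrative value
            "score_threshold": score_threshold,  # recommended value from the article
        },
    }
    return {
        "user_fullname": full_name,  # ASSUMPTION: name field of the request
        "config": {"fuzzy_config": fuzzy_config},
    }


body = build_aml_request("Tetjana Donez")
print(json.dumps(body, indent=2))
```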

Matching & Scoring Criteria

The API ranks search results based on a weighted scoring system, prioritizing:

  • Token Similarity (70%) – How closely names match, determined by edit distance and order.
  • Name IDF Score (8.5%) – Frequency of the name in the dataset.
  • Date of Birth (15%) – If DOB is available, it heavily influences the ranking.
  • Year Keyword Score (4%) – The relevance of the year in the dataset.
  • Country Keyword Score (1.6%) – Relevance based on country name.
  • Country Text IDF Score (0.9%) – Frequency of country-related terms.
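
The weights above add up to 100%, which suggests a weighted sum. The following Python sketch shows one way such a score could be combined. It assumes that every component is already normalized to the range 0 to 1 and that the combination is linear. The article states neither, and the names `WEIGHTS` and `relevancy_score` are hypothetical.

```python
import math

# Weights as listed above, expressed as fractions of 1.
WEIGHTS = {
    "token_similarity": 0.70,
    "name_idf": 0.085,
    "date_of_birth": 0.15,
    "year_keyword": 0.04,
    "country_keyword": 0.016,
    "country_text_idf": 0.009,
}


def relevancy_score(components: dict) -> float:
    """Weighted sum of component scores; missing components count as 0.

    ASSUMPTION: each component score is normalized to [0, 1].
    """
    return sum(weight * components.get(name, 0.0)
               for name, weight in WEIGHTS.items())


# Sanity check: the listed weights cover the whole score.
assert math.isclose(sum(WEIGHTS.values()), 1.0)

name_only = relevancy_score({"token_similarity": 1.0})
name_and_dob = relevancy_score({"token_similarity": 1.0, "date_of_birth": 1.0})
print(name_only, name_and_dob)
```

Under this linear reading, a perfect token match with no other signals would contribute 0.70 on its own, and a matching date of birth would add a further 0.15.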

Result filtering

  • filter_mismatching_dob – If set to true, it filters out all results that match on name similarity but whose date of birth (DOB) differs from the searched DOB.
  • filter_missing_dob – When enabled, if you provide a date of birth (DOB) in your request but the source data does not include a DOB, the result will be excluded from the search results.
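
The interaction of the two filters can be sketched in Python as follows. The `Result` class and the `apply_dob_filters` function are hypothetical, and the sketch assumes that a mismatch means the two dates are not exactly equal. The actual comparison may be more lenient, for example when only a birth year is known.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Result:
    name: str
    dob: Optional[date]  # None when the source data has no DOB


def apply_dob_filters(results: List[Result],
                      searched_dob: Optional[date],
                      filter_mismatching_dob: bool = False,
                      filter_missing_dob: bool = False) -> List[Result]:
    """Drop results according to the two DOB filters described above."""
    if searched_dob is None:
        # Both filters only make sense when a DOB is sent with the request.
        return list(results)
    kept = []
    for result in results:
        if result.dob is None:
            if filter_missing_dob:
                continue  # source has no DOB to compare against
        elif result.dob != searched_dob and filter_mismatching_dob:
            continue  # ASSUMPTION: mismatch means the dates are not equal
        kept.append(result)
    return kept


results = [
    Result("John Smith", date(1980, 5, 17)),
    Result("John Smith", date(1975, 1, 2)),
    Result("John Smith", None),
]
strict = apply_dob_filters(results, date(1980, 5, 17),
                           filter_mismatching_dob=True,
                           filter_missing_dob=True)
print([r.dob for r in strict])
```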

 

DOB Estimation

The DOB Estimation feature is a back-end enhancement to our database designed to reduce false positives in AML screening by filtering out irrelevant results, even when an exact date of birth (DOB) isn’t available. 

This improvement helps bridge data gaps commonly seen in OSINT (Open Source Intelligence) and AML data sources, where providers often have only partial DOB coverage. By using an estimation approach, we’ve effectively increased DOB coverage, enhancing match precision and reliability.

Key Features

  • Reducing False Positives: Decreases irrelevant matches by estimating DOB ranges, making it easier to focus on genuine matches.
  • Back-End Integration: This feature works entirely behind the scenes, requiring no input or configuration from the user.
  • Improved Compliance Accuracy: Enhances the screening process by effectively filling DOB gaps, which are common in partial data from AML and OSINT sources.

How DOB Estimation Works

  • Back-End Life Event Analysis: The system uses available life event data (e.g., employment, education milestones) to infer an estimated age range in cases where the exact DOB is not available.
  • Increased DOB Coverage: By filling in the DOB gaps through estimation, we’re increasing our database coverage, which means more complete profiles.
  • Sharper Relevance Filtering: With estimated age ranges, results are now filtered more precisely, helping you focus on what matters and skip over irrelevant matches.
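
To make the filtering idea concrete, the sketch below shows how an estimated birth-year range could be used when no exact DOB is on record. It is purely illustrative: the feature runs on the back end and its internals are not documented here, so the `Profile` class, the `estimated_birth_years` field, and the `dob_is_plausible` function are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass
class Profile:
    name: str
    dob: Optional[date] = None
    # ASSUMPTION: inclusive (earliest, latest) birth years inferred from life events.
    estimated_birth_years: Optional[Tuple[int, int]] = None


def dob_is_plausible(profile: Profile, searched_dob: date) -> bool:
    """Return False only when the profile's DOB data contradicts the searched DOB."""
    if profile.dob is not None:
        return profile.dob == searched_dob
    if profile.estimated_birth_years is not None:
        earliest, latest = profile.estimated_birth_years
        return earliest <= searched_dob.year <= latest
    return True  # no DOB information at all, so nothing to contradict


searched = date(1985, 3, 9)
candidate = Profile("Alexander Gesmundo", estimated_birth_years=(1950, 1965))
print(dob_is_plausible(candidate, searched))
```

In this example the searched birth year falls outside the estimated range, so the candidate would be treated as an irrelevant match even though its exact DOB is unknown.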

User Impact

  • Reduction in False Positives: Users will notice a decrease in false positives, as the system can now filter more precisely, even without complete DOB data.
  • Streamlined Screening Experience: Results will be more relevant, reducing time spent on manually dismissing non-relevant matches.

FAQ

Will I need to configure anything to benefit from DOB Estimation?

No configuration is required. DOB Estimation is implemented on the back end and is automatically applied to your searches.

How will DOB Estimation affect my search results?

DOB Estimation enhances the relevance of search results by reducing false positives, especially where DOB data is not available or incomplete, resulting in more accurate matches and a smoother screening process.

The DOB Estimation feature is designed to address common data limitations in AML and OSINT sources, enhancing our database’s DOB coverage and delivering improved compliance outcomes by reducing false positives in user screenings.
 

What writing systems (languages) does fuzzy search support?

The default language within the SEON system is English. Our more advanced search tools are only available in English, including the fuzzy search engine.
