Using Fuzzy Search in AML queries
Updated on 18.08.23
Overview
SEON's AML tools offer a wide range of solutions and configuration options to support KYC and MLRO officers in their fight against fincrime and money laundering. Run quick checks with simple search, or use complex search to home in on the correct person. You can also choose between exact search and fuzzy search, and sift through the results quickly with our relevancy score system.
Complex search
Sometimes also known as multi-faceted search, complex search allows you to add several distinct data types to your search queries. Add a country and date of birth to your search term to enhance search results.
By default, SEON will include all name matches in your search results regardless of date of birth and country data. Because official sources sometimes lack date of birth and country data, this is the best way to ensure that sanctioned individuals or criminals aren't missed by checks due to incomplete information.
Why use Fuzzy Search
Fuzzy search is a search algorithm that uses approximate string matching to find results similar to, but not exactly the same as, your search query. It helps you identify risky customers and users when exact matches fail.
Depending on your settings, fuzzy search will serve results similar to your original search query alongside any exact matches. This can help you catch high-risk individuals who'd otherwise fall through the cracks and save your MLRO officers time and effort.
Working on a global scale, as most online businesses do, you'll quickly encounter names originating from different writing systems. Transliteration – transferring written text between writing systems, for example, from Cyrillic or Arabic script to Latin characters – can quickly become a problem.
The results of transliteration depend on the languages involved. For example, Hungarian, Portuguese, and English all have different rules for recoding the letters of other writing systems, while each using the Latin alphabet. That's how Japan's historic capital becomes Kiotó, Quioto, or Kyoto, respectively. You can probably guess how easily that becomes a problem with the names of people and AML checks.
Then there's the issue of name variants or slightly different spellings of the same name. For example, the name Igor has a less common spelling, Ygor – that's when typos become a true nemesis.
How to use Fuzzy Search
You can easily use Fuzzy Search in any queries sent to SEON over your API integration.
When sending data over an API integration, you can enable and disable fuzzy search using the fuzzy_enabled parameter. You can also adjust fuzzy search settings if needed. Please be careful, as these settings can drastically increase the number of false negatives experienced by your team. The default settings will serve you best in most cases.
Relevancy score
AML API's Relevancy Score feature will help you determine how closely search results found using fuzzy search match the search name you entered. You can change your relevancy score threshold via your AML API request, to ensure your team doesn't encounter a high number of false positives.
The adjustable scoring threshold for the relevancy score allows your team to set the sensitivity of the fuzzy search engine. In low-risk cases, such as PEP checks, you can choose to set the scoring threshold so strictly that only results with a full match of date of birth and name are returned.
However, in high-risk cases such as sanction screening, we suggest using a lower relevancy threshold, where hits only need to match on name using fuzzy search, and verifying every hit manually to ensure that the client is not a criminal or sanctioned individual.
Please be careful, as these settings can drastically increase the number of false negatives and false positives experienced by your team. A higher score threshold increases the number of false negatives (lower recall), and a lower score threshold increases the number of false positives.
But with a smart threshold policy, you can reduce manual workloads in low-risk cases, and concentrate on high-profile investigations where human decision-making is essential. The default settings will serve you best in most cases.
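The trade-off described above can be illustrated with a minimal sketch. The hit names and scores below are invented for illustration and are not taken from SEON's API; only the default threshold value of 0.585 comes from this article, and treating the threshold as inclusive is an assumption.

```python
# Hypothetical hits as (name, relevancy score) pairs, invented for illustration.
hits = [
    ("Tetiana Donets", 0.92),
    ("Tetyana Donetz", 0.71),
    ("Tatiana Doncev", 0.60),
    ("Tanja Dones", 0.41),
]


def filter_hits(hits, score_threshold):
    """Keep only hits whose score is at or above the threshold (assumed inclusive)."""
    return [name for name, score in hits if score >= score_threshold]


strict = filter_hits(hits, 0.90)    # fewer hits to review, more risk of missing a true match
default = filter_hits(hits, 0.585)  # the default threshold mentioned in this article
lenient = filter_hits(hits, 0.40)   # more hits to review, more false positives

print(len(strict), len(default), len(lenient))
```

Raising the threshold shrinks the review queue at the cost of recall, while lowering it does the opposite.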
Fuzzy Search Settings
When using AML API over an API integration, you can customize your fuzzy search settings by including parameters in your API request.
Include your fuzzy parameters nested within the config.fuzzy_config parameter. You can use the parameters below to tweak what kind of results fuzzy search returns.
phonetic_search_enabled
– Default setting: False.
If enabled, the parameter will turn on SEON's Phonetic Search module. This means that the tokens entered into AML API are converted into a phonetic representation using the double metaphone, koelnerphonetik, haasephonetik, beider-morse, and daitch-mokotoff algorithms. When enabled, the AML lookup will only use these phonetic representations of the entered name and of the names in the database.
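To illustrate what a phonetic representation is, here is a minimal sketch using classic American Soundex. Soundex is not one of the algorithms listed above; it is used here as a stand-in only because it is short enough to show in full, and SEON's module may behave quite differently.

```python
# Classic American Soundex, used purely as an illustrative stand-in.
SOUNDEX_CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(name: str) -> str:
    """Return the four-character Soundex code of an ASCII name, or '' if empty."""
    letters = [c for c in name.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    previous = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != previous:
            code += digit
        if ch not in "hw":  # 'h' and 'w' do not separate repeated codes
            previous = digit
    return (code + "000")[:4]


# Names that are spelled differently but sound alike share one code.
print(soundex("Meyer"), soundex("Maier"))
print(soundex("Smith"), soundex("Smyth"))
```

Two names are treated as phonetically equal when their codes are identical, which is the general idea behind comparing phonetic representations instead of raw spellings.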
edit_distance_enabled
– Default setting: True.
Edit distance is the number of single-character changes needed to turn one term into another (for example, mat → bat has an edit distance of 1). When set to True, AML lookups will return names similar to the search term entered, for example 'Anastasia' matches 'Anastasya'. When you enter a search query, our system compares it to the names in our database. If a name token (a name element) has a length equal to or greater than 7 letters, we allow for 1-character edits to find potential matches. If the token length is equal to or greater than 13, we allow for 2-character edits to find potential matches.
For example, suppose you enter the name "Tetjana Donez", which consists of two tokens. With the default value of 7, for each token with a character length at or above this value, our system will search for variations within an edit distance of 1 (a single-character change) to find potential matches.
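The length rule above can be sketched as follows. This is a minimal illustration, not SEON's implementation: it assumes plain Levenshtein distance (insertions, deletions, and substitutions), and it assumes the edit allowance is derived from the length of the query token. The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def allowed_edits(token: str) -> int:
    """Edits allowed for one name token under the 7 and 13 letter thresholds."""
    if len(token) >= 13:
        return 2
    if len(token) >= 7:
        return 1
    return 0


def token_matches(query_token: str, candidate_token: str) -> bool:
    # Assumption: the allowance depends on the query token's length.
    distance = levenshtein(query_token.lower(), candidate_token.lower())
    return distance <= allowed_edits(query_token)


print(levenshtein("mat", "bat"))
print(token_matches("Tetjana", "Tetiana"))
```

Under these assumptions, a seven-letter token such as "Tetjana" tolerates one changed character, while a shorter token gets no edit allowance and has to match exactly.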
For example, "Tetjana Donez" will be considered a match with "Tetiana Donets" because they differ by just one character.
scoring.result_limit
– Default setting: 10.
Use this parameter to define the maximum number of hits AML API should return in the result set. The result set is ordered by the source type hits are identified in: ['sanction', 'warned_entities/crimelist', 'central_bank/watchlist', 'pep'].
scoring.score_threshold
– Default setting: 0.585.
Set the relevancy score threshold. The relevancy score is a normalized probability score with possible values between 0 and 1.
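The parameters above could be assembled into a request body along the following lines. This is a sketch under stated assumptions: the parameter names and default values come from this article, but the exact nesting (including where fuzzy_enabled sits and whether the scoring parameters form a nested object) and the user_fullname field are assumptions, so check the AML API reference before relying on it.

```python
import json


def build_aml_request(full_name, fuzzy_enabled=True, phonetic_search_enabled=False,
                      edit_distance_enabled=True, result_limit=10,
                      score_threshold=0.585):
    """Build a request body as a dict; the structure is assumed, not confirmed."""
    if not 0 <= score_threshold <= 1:
        raise ValueError("score_threshold must be between 0 and 1")
    return {
        "user_fullname": full_name,  # hypothetical field name
        "config": {
            "fuzzy_enabled": fuzzy_enabled,  # assumed to live under config
            "fuzzy_config": {
                "phonetic_search_enabled": phonetic_search_enabled,
                "edit_distance_enabled": edit_distance_enabled,
                "scoring": {  # assumed nesting for the scoring.* parameters
                    "result_limit": result_limit,
                    "score_threshold": score_threshold,
                },
            },
        },
    }


payload = build_aml_request("Tetjana Donez")
print(json.dumps(payload, indent=2))
```

Calling the function with no overrides reproduces the default settings listed above, which this article recommends for most cases.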
What writing systems (languages) does fuzzy search support?
The default language within the SEON system is English. Our more advanced search tools are only available in English, including the fuzzy search engine.
Exact search
We also support other languages, but only exact search will be available for them. Even so, this exact search engine provides robust text processing capabilities to handle various types of text variations and complexities, including ASCII folding for non-ASCII characters, hyphenation and punctuation differences, out-of-order name matching, missing name components, and casing differences. These features allow our search engine to deliver more accurate and relevant search results, even when dealing with challenging text inputs that would otherwise cause errors or miss relevant matches.
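Several of the normalizations listed above can be sketched with the Python standard library. This is an illustration of the general techniques only, not a description of how SEON's engine works; the function names are hypothetical, and handling of missing name components is not covered.

```python
import re
import unicodedata


def fold_ascii(text: str) -> str:
    """Strip diacritics by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def name_key(name: str) -> tuple:
    """Comparison key that ignores accents, casing, punctuation, and token order."""
    folded = fold_ascii(name).lower()
    tokens = re.findall(r"[a-z0-9]+", folded)  # hyphens and commas act as separators
    return tuple(sorted(tokens))


print(fold_ascii("Kiotó"))
print(name_key("Jean-Pierre DUPONT") == name_key("dupont, jean pierre"))
```

Note that this simple folding only handles letters that decompose into a base letter plus an accent; characters such as "ø" or "ł" would need an explicit mapping table.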
Our exact search engine provides support for the following languages:
Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani (Latin), Basque, Belarusian, Bengali, Bosnian, Bulgarian, Burmese, Catalan, Chinese (Hans), Chinese (Hant), Croatian, Czech, Danish, Dutch, English, Estonian, Filipino (Latin), Finnish, French, Galician, Ganda, Georgian, German, Greek, Gujarati, Hausa, Hebrew, Hindi, Hungarian, Icelandic, Igbo, Indonesian, Italian, Japanese, Kannada, Kazakh, Khmer, Kinyarwanda, Konkani, Korean, Lao, Latvian, Lithuanian, Macedonian, Malay, Malayalam, Maltese, Marathi, Mongolian (Cyrillic), Nepali, Norwegian Bokmål, Norwegian Nynorsk, Oriya, Oromo, Polish, Portuguese, Punjabi, Romanian, Russian, Serbian (Latin), Serbian (Cyrillic), Sinhala, Slovak, Slovenian, Spanish, Swahili, Swedish, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, Yoruba (Latin), Zulu