In order to comply with increasingly stringent Anti-Money Laundering regulations and a growing list of sanctions, organizations must use the best possible name screening technology in their sanctions & AML screening processes.
Inadequate name screening can be extremely detrimental, resulting in hefty fines, reputational damage and loss of customers. If sanctioned individuals are not detected, the firm may be found negligent and fined. If a customer who is not sanctioned is mistakenly identified as such, they will have an extremely poor experience and likely take their business elsewhere.
In one example, credit reporting agency TransUnion lost a class-action lawsuit for (wrongly) flagging customers as criminals. In another, online transaction company PayPal was fined for failing to prevent transactions to sanctioned Iran due to filter failures.
This article will examine the various name matching methods available, as well as the pros and cons of each.
Methods of Name Matching and their Respective Strengths and Weaknesses
A typically structured database will use a combination of metadata to look up records, such as names, addresses, phone numbers, ID numbers and other identifying data points. However, sometimes an organization has little more than a name to use as a reference. (Human beings prefer sharing names to numbers, and privacy laws may prevent sharing ID numbers or contact information).
As such, name matching is critically important. However, it’s simply not as easy as it sounds.
International sanctions and watch lists may contain Arabic, Chinese, Russian or other names that were originally written in a non-Latin script. These names must be converted to the Latin alphabet, and the conversion can introduce inaccuracies. Titles (such as Dr, Mrs, Mr) or nicknames in input data can lead to false positives and irrelevant results that need to be verified and checked, which can tie teams up for days. Then there is the challenge of big data. Customer databases may contain hundreds of millions of names, which can require a nearly unlimited number of comparisons, all of which need to be processed accurately and quickly.
While there are numerous search and comparison tools on the market, a name search is a uniquely complicated beast that requires a unique approach. A one-size-fits-all name-matching method will not suffice: different name-matching methods are required to solve different name-matching challenges. A modern, hybrid name matching solution that uses different methods to produce the most accurate result is required to successfully screen names.
Common key method
The common key method reduces names to a key or code based on English pronunciation. In the Soundex key method, similar sounding names (e.g. Cyndi, Candy, Condie) share a single code, C530. The Metaphone and Double Metaphone key methods use phonetic algorithms to turn similar-sounding names into the same key, which identifies similar names and improves matching. The Metaphone method can be seen as the evolution of the Soundex method as it uses a wider set of pronunciation rules and allows for varying lengths of keys, whereas Soundex uses a fixed-length key.
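To make the idea of a common key concrete, here is a minimal sketch of the standard American Soundex rules in Python. The function name soundex is our own choice for illustration, and the sketch assumes names have already been converted to the Latin alphabet; it is not the implementation of any particular screening product.

```python
def soundex(name: str) -> str:
    """Return the 4-character American Soundex key for a Latin-script name."""
    # Consonant groups that share a digit; vowels, H, W and Y have no digit.
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit

    # Keep only the letters A-Z; hyphens, spaces and accents are ignored here.
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""

    key = letters[0]                      # the first letter is kept as-is
    prev = codes.get(letters[0], "")      # digit of the previous letter
    for ch in letters[1:]:
        digit = codes.get(ch, "")
        if digit and digit != prev:       # skip repeats of the same digit
            key += digit
        if ch not in "HW":                # H and W do not break a repeat
            prev = digit
    return (key + "000")[:4]              # pad or truncate to 4 characters


for name in ("Cyndi", "Candy", "Condie"):
    print(name, soundex(name))            # each name prints the key C530
```

Because every name is reduced to a short key once, candidate lookups become simple equality checks on the key, which is where the speed of this method comes from.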
Double Metaphone refines the matching process even further by introducing a primary and a secondary code for each name. It also accounts for pronunciations from languages other than English, including Slavic, Celtic, French, Spanish, Germanic and Chinese. For example, Double Metaphone codes the name “Smith” with a primary code of SM0 and a secondary code of XMT. Schmidt is tagged with a primary code of XMT and a secondary code of SMT. The shared XMT code indicates a degree of similarity that Soundex might overstate and Metaphone miss altogether.
The principal benefit of this method is that it can be executed very quickly and has high recall (meaning that the algorithm returns most of the relevant results). On the downside, it is built around names written in the Latin alphabet, which means it may not be as effective in matching non-Latin names that need to be transliterated from languages like Mandarin or Arabic. In languages like Japanese, where a single character can have several correct pronunciations, transliteration can lead to grave errors and further complicate the difficult task of matching.
For example, the Arabic name Abdal-Rachid could potentially be transliterated as Ar-Rashid. Depending on the software, the names may come back as a match or not. These errors are usually only picked up during manual checks. (It should be mentioned that the Beider-Morse Phonetic Matching algorithm accepts Russian in Cyrillic script and Hebrew in Hebrew script, but even that algorithm is mostly Latin-bound.)
List or Dictionary method
The list method lists all possible spelling variations of each name component and then looks for matches from these lists of name variations. For example, searching for the Arabic name عبد الرشيد will result in more than 3000 possible transliterations, including Abdal-Rashid, Abdal-Rashide and Abdal-Rasheed, before eventually delivering results like Abd-errchide, Abd-errcheed and so on. The drawbacks of this method are fairly obvious.
Multi-part, non-English names can produce an ever-growing list of variations that requires a good deal of time and effort to search. A simple name with three components and twenty possible variations per component will result in 20³ (20 × 20 × 20), or 8,000, possible variant matches. Add to that other variations, such as nicknames, titles and initials, and your single name search can deliver tens of thousands of results.
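The growth in variants can be checked with a short Python sketch. The helper name expand_variants and the small variant lists below are hypothetical and chosen only for illustration; a real list-based system would draw its variants from a curated dictionary.

```python
from itertools import product


def expand_variants(component_variants):
    """Build every full-name spelling by picking one variant per component."""
    return [" ".join(parts) for parts in product(*component_variants)]


# Synthetic placeholders: 3 name components with 20 variants each.
synthetic = [[f"part{c}_v{v}" for v in range(20)] for c in range(3)]
print(len(expand_variants(synthetic)))      # 8000, i.e. 20 ** 3

# A tiny, hand-made variant list (illustrative only, far from complete).
toy_components = [["Abd", "Abdal", "Abdul"], ["Rashid", "Rasheed", "Rachid"]]
lookup = {name.lower() for name in expand_variants(toy_components)}

print("abdal rasheed" in lookup)            # True: the spelling is listed
print("abdel rashed" in lookup)             # False: an unlisted spelling is missed
```

The last line shows the weakness of a pure list lookup: a spelling that nobody added to the list cannot match, however close it is to a listed one.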
The benefit of the list method is that it’s easy to maintain. Missed names can be added to the database immediately. However, speedy maintenance does not result in a speedy methodology. The sheer computational power required to process the huge volume of information, as well as the need for manual checks, makes it unsuitable for anti-money laundering, know your customer and watchlist screening checks.
It’s also worth noting that a name variation that isn’t found in the list will not result in a match, which makes it inefficient for many checks that require precision.
Edit distance method
The edit distance method is one of the easiest to implement. It considers how many character changes it takes to get from one name to another. For example, names like Carl and Karl have an edit distance of 1 since the C is replaced by a K, while Catherine and Katharine have an edit distance of 2 as the C becomes K and e becomes a.
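As an illustration, the following Python sketch computes the classic Levenshtein distance, where each insertion, deletion or substitution costs 1. The function name levenshtein is our own; the comparison is case-sensitive and assumes both names are already in the same script.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character edits needed to turn a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]                               # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1          # substitution cost
            curr.append(min(prev[j] + 1,         # delete ca
                            curr[j - 1] + 1,     # insert cb
                            prev[j - 1] + cost)) # keep or substitute
        prev = curr
    return prev[-1]


print(levenshtein("Carl", "Karl"))              # 1
print(levenshtein("Catherine", "Katharine"))    # 2
```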
There are several edit distance and related string similarity measures in use, including Levenshtein distance, Jaro-Winkler similarity and the Jaccard similarity coefficient. These approaches take the number of shared characters and the number of edit operations (needed to turn one name into another) into account. Comparisons are very quick, although they don’t capture linguistic nuance. For one thing, all edits are given the same weighting. Changing c to p is weighted the same as changing c to k, for example, even though the substitution of c with k is far more likely to indicate a similar name.
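One way to picture the weighting problem is a variant of edit distance in which selected substitutions cost less than others. The sketch below is a hypothetical illustration, not a standard algorithm from any library: the function name, the CHEAP_PAIRS set and the cost of 0.5 are all assumptions made for the example.

```python
def weighted_edit_distance(a, b, cheap_pairs, cheap_cost=0.5):
    """Edit distance where substitutions listed in cheap_pairs cost less."""
    a, b = a.lower(), b.lower()
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [float(i)]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                sub = 0.0                        # identical characters
            elif frozenset((ca, cb)) in cheap_pairs:
                sub = cheap_cost                 # plausible spelling swap
            else:
                sub = 1.0                        # unrelated characters
            curr.append(min(prev[j] + 1.0,       # deletion
                            curr[j - 1] + 1.0,   # insertion
                            prev[j - 1] + sub))  # substitution
        prev = curr
    return prev[-1]


# Assumed examples of letter pairs that often stand for the same sound.
CHEAP_PAIRS = {frozenset("ck"), frozenset("sz")}

print(weighted_edit_distance("Carl", "Karl", CHEAP_PAIRS))   # 0.5
print(weighted_edit_distance("Carl", "Parl", CHEAP_PAIRS))   # 1.0
```

With unweighted Levenshtein both pairs would score 1, so Karl and Parl would look equally close to Carl. Deciding which pairs deserve a lower cost is itself a linguistic judgment that differs from language to language.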
There are challenges with non-Latin names as well. For example, the Arabic character sheen is often mapped to sh in English, but edit distance compares names one character at a time and cannot map a single character to several, so transliteration will be required whenever non-Latin script names are used. In another example, a Vietnamese name like Hang could be matched to Heng, a common Chinese surname. The spelling is similar, and both are Asian in origin, but it may result in irrelevant matches as the method cannot account for nuances in language.
Some companies prefer using a rule-based approach. This is a labor-intensive method that incorporates real-world knowledge about different cultures, naming conventions and ethnicities. Unfortunately, it’s not very practical. Human knowledge is limited, and it requires a great deal of work to input multiple name variations into a system based on human knowledge alone. It is also one of the slower methods as it has to sift through millions of names to find a good match.
Statistical similarity model
One method that manages to address the problem of matching non-Latin script names is the statistical similarity model. This approach takes thousands of matching name pairs and trains a model to recognize what similar names look like. The model assigns a similarity score to names with high accuracy and can directly match names written in different languages without transliteration.
This method requires training data, tuning and a high level of skill, but it is one of the most accurate methods described here. Execution is slow and time-consuming, which may make it unsuitable for environments where large volumes of data need to be checked regularly.
Word embedding method for organizational and company names
Many of the methods covered in this article cannot make semantic connections the way a human being could. These semantic connections are often found in business names. In many cases, names may be phonetically different but semantically very similar. For example, a human being would quickly determine that Eagle Drugs and Eagle Pharmaceuticals are likely one and the same, but the edit distance method (for example) would not pick it up at all.
Embeddings can make the match where these standard spelling-centric name matching techniques can’t. It’s only relevant to organization name matching, however, so it will not meet all of your matching requirements.
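The idea can be sketched in a few lines of Python. The three-dimensional vectors below are invented toy values so that the example runs on its own; a real system would take vectors with hundreds of dimensions from a trained embedding model. The names TOY_VECTORS, embed and cosine are our own.

```python
import math

# Invented toy word vectors (assumption): related words point the same way.
TOY_VECTORS = {
    "eagle":           [0.9, 0.1, 0.0],
    "drugs":           [0.1, 0.9, 0.1],
    "pharmaceuticals": [0.1, 0.8, 0.2],
    "airlines":        [0.0, 0.1, 0.9],
}


def embed(name):
    """Average the vectors of the known words in an organization name."""
    vectors = [TOY_VECTORS[w] for w in name.lower().split() if w in TOY_VECTORS]
    if not vectors:
        return [0.0, 0.0, 0.0]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity: close to 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


same = cosine(embed("Eagle Drugs"), embed("Eagle Pharmaceuticals"))
other = cosine(embed("Eagle Drugs"), embed("Eagle Airlines"))
print(same > other)     # True with these toy vectors
```

Drugs and Pharmaceuticals share almost no letters, yet their vectors are close, so the two company names score as near-identical while Eagle Airlines does not.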
Clearly, there isn’t a single method that will meet every name-matching challenge or requirement. A hybrid approach that backfills the weakness of one method with the strength of another works best.
For example, a company could use the rapid-fire common key method (with its reliably high recall) to winnow the initial candidate pool of millions of names down to a smaller set of highly likely matches. Then, the high-precision statistical similarity model can whittle that set down even further by using the linguistic variations of names in each language. This produces the most accurate results possible in a short amount of time.
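A two-stage pipeline of this kind might look like the following Python sketch. It is an illustration under stated assumptions: Soundex stands in for the common key stage, and the ratio from Python's difflib.SequenceMatcher is only a placeholder for a trained statistical similarity model, which it is not. The names build_index and screen, the watchlist and the 0.5 threshold are all hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def soundex(name):
    """American Soundex key, used here as the fast first-stage filter."""
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    key, prev = letters[0], codes.get(letters[0], "")
    for ch in letters[1:]:
        digit = codes.get(ch, "")
        if digit and digit != prev:
            key += digit
        if ch not in "HW":
            prev = digit
    return (key + "000")[:4]


def build_index(watchlist):
    """Group watchlist names by key so stage one is a dictionary lookup."""
    index = defaultdict(list)
    for name in watchlist:
        index[soundex(name)].append(name)
    return index


def screen(query, index, threshold=0.5):
    """Stage 1: fetch names sharing the key. Stage 2: score and rank them."""
    candidates = index.get(soundex(query), [])
    scored = [(SequenceMatcher(None, query.lower(), c.lower()).ratio(), c)
              for c in candidates]
    return sorted((pair for pair in scored if pair[0] >= threshold),
                  reverse=True)


index = build_index(["Cyndi", "Candy", "Condie", "Robert", "Rupert"])
for score, name in screen("Cindy", index):
    print(f"{name}: {score:.2f}")
```

Only the names that share a key with the query are ever scored, so the expensive second stage runs on a handful of candidates rather than on the whole list.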
Advanced Name Matching Technology is essential for a reliable Sanctions Screening Process
One of the most important steps in any sanctions compliance process is screening names of customers and business partners against global sanctions lists to identify sanctioned individuals or entities. Misidentifications (false positives), as well as non-identifications (false negatives), can be very costly and entail criminal charges and considerable reputational damage.
The challenges in that step lie mainly in the matching process. As simple as it sounds to look up a name in a database, it can get very complex and challenging, as the technical details above show. Transliteration issues, nicknames, honorifics and phonetic similarities are just some of the real-world challenges, on top of the usual issues like typos and spelling differences. A simple fuzzy algorithm can’t mitigate those challenges or the risks that result from them.
sanctions.io’s matching technology is based on Rosette*, an industry-leading name matching technology that incorporates the latest research and achievements in AI and Natural Language Processing (NLP) and covers those matching challenges in more than 20 languages and transliteration standards.
This helps our customers to significantly reduce false positives while making sure no real matches slip through the cracks.
Do you want to avoid an overload of false positives or, even worse, false negatives? Let a sanctions.io team member show you how our new name-matching technology solves these challenges by blending machine learning with a set of traditional name-matching approaches.
*) Rosette is a product of BABEL STREET