Search fuzziness

Sanctions.io has supported fuzzy search for quite some time already, but today I’d like to introduce a new feature that will allow our customers to tweak its output more precisely.

A bit of theory

So what is fuzzy search in the first place? When you search for a name, you might have it written slightly differently from how it appears on sanctions lists. A name can be spelled in various ways across languages and pronunciations, and sometimes entities on sanctions lists deliberately use a slightly different name to try to bypass screening processes. Government agencies that create sanctions lists try to add as many name variations for each sanctioned entity as possible, but it’s not feasible to list the thousands of ways a name can be written with just a letter changed. For instance, my first name is written as Boris in most languages that use the Latin alphabet, but it can also be written as Borys in Ukrainian or Boriss in Estonian.

This is where fuzzy search comes in. By using an approximate string matching algorithm called trigram search, we can find all of these variants and more. For techies interested in a detailed description of the trigram search we use and a comparison with other similar algorithms, I’ve left a few links below.
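To give a rough feel for how trigram matching scores the name variants above, here is a minimal Python sketch. It is an illustration only, not the Sanctions.io implementation: the function names are hypothetical, and the padding and Jaccard-style scoring are assumptions borrowed from common trigram implementations.

```python
def trigrams(text: str) -> set[str]:
    """Return the set of 3-character substrings of a lowercased, padded string."""
    # Assumed padding: two spaces in front, one behind, so that the start
    # and end of the name contribute their own trigrams.
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a: str, b: str) -> float:
    """Share of trigrams the two strings have in common (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    # The padding guarantees at least one trigram per string,
    # so the union is never empty.
    return len(ta & tb) / len(ta | tb)


for variant in ("Boris", "Boriss", "Borys"):
    print(variant, round(similarity("Boris", variant), 3))
```

With this scoring, Boriss comes out closer to Boris than Borys does, because a changed letter in the middle of a short name disturbs more trigrams than an extra letter at the end. A real matcher would also need to handle multi-word names, which this sketch treats as a single string.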

Fuzziness tweaking

Of course, when you use non-exact matching as our fuzzy search does, you end up with a lot of results, some of which are useful while others are not at all. Previously we set a cutoff threshold according to what we thought best (we cut off results with over 85% dissimilarity), but some of our customers wanted an option to tweak this themselves to increase or decrease the number of false positives they get with non-exact matching.

From today they can: our API’s /search endpoint now supports an additional GET parameter called fuzziness, which accepts any value between 0 and 100 to raise or lower the fuzziness of the search from the default level of 85%. The higher the number, the more (fuzzy) results you will get, and vice versa.
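As a sketch of what such a request might look like, the Python snippet below builds a /search URL with the fuzziness parameter. Only the /search endpoint and the fuzziness parameter come from this post; the base URL, the name query parameter, and the helper function are placeholders I am assuming for illustration, and authentication is left out entirely. Check the API documentation for the real values.

```python
from urllib.parse import urlencode

# Hypothetical placeholder, not the real API host.
BASE_URL = "https://api.example.com"


def build_search_url(name: str, fuzziness: int = 85) -> str:
    """Build a GET URL for the /search endpoint with a fuzziness value."""
    if not 0 <= fuzziness <= 100:
        raise ValueError("fuzziness must be between 0 and 100")
    # "name" is an assumed query parameter; "fuzziness" is the new one.
    query = urlencode({"name": name, "fuzziness": fuzziness})
    return f"{BASE_URL}/search?{query}"


# A lower value should return fewer, closer matches than the default of 85.
print(build_search_url("Boris", fuzziness=60))
```

Validating the range on the client side keeps an out-of-range value from ever reaching the API.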

Links

[1] Fuzzy string matching with Trigram and Trigraphs 

[2] String Similarity Algorithms Compared


Boris Maryshev