What is Fuzziness?
Fuzziness (Edit/Levenshtein Distance) is a matching technique that tolerates small variations in spelling between a search term and the entities returned in the search results. Fuzziness allows one typo per word of the search term, and the fuzziness percentage is tied to the length of each word. How you set the fuzziness level depends entirely on your risk-based approach and on how confident you are that the names you search for are correct. For example, names taken directly from customers' IDs are less prone to error than names that customers enter themselves.
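The "one typo per word" idea can be sketched in Python. This is a minimal illustration under stated assumptions, not the product's implementation: it assumes the search term and the candidate name have the same number of words, compares them position by position, and accepts the candidate only if every word pair is within one edit (insertion, deletion, or substitution). It measures plain character edits, not phonetic similarity, and the names `levenshtein` and `words_match` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep) ca -> cb
            ))
        prev = curr
    return prev[-1]


def words_match(search_term: str, candidate: str, max_edits_per_word: int = 1) -> bool:
    """Hypothetical rule: each word may differ by at most `max_edits_per_word` edits."""
    query_words = search_term.lower().split()
    candidate_words = candidate.lower().split()
    if len(query_words) != len(candidate_words):
        return False  # simplification: word counts must line up
    return all(
        levenshtein(q, c) <= max_edits_per_word
        for q, c in zip(query_words, candidate_words)
    )


print(words_match("Jon Smith", "John Smyth"))   # True: one edit in each word
print(words_match("Jon Smith", "Joan Smythe"))  # False: "smith" -> "smythe" needs two edits
```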
Please note that the impact of Edit/Levenshtein Distance is inversely related to the length of the name: as the search term gets longer, a single spelling deviation matters less. For example, Leederheimer and Lexderheimer are far more likely to be misspellings of each other than Lee and Lex.
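One way to see why the same single edit matters less in a longer name is to divide the edit distance by the length of the longer string. This normalisation is an illustrative assumption, not necessarily how the fuzziness percentage is calculated, and `levenshtein` and `relative_distance` are hypothetical helper names.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def relative_distance(a: str, b: str) -> float:
    """Edit distance as a fraction of the longer string's length (0.0 = identical)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


for a, b in [("Leederheimer", "Lexderheimer"), ("Lee", "Lex")]:
    # Both pairs are one substitution apart, but the relative impact differs.
    print(a, b, levenshtein(a, b), round(relative_distance(a, b), 3))
# Leederheimer Lexderheimer 1 0.083
# Lee Lex 1 0.333
```

Both pairs need exactly one substitution, yet that edit changes about 8% of the longer name and a third of the shorter one.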
● Exact Match
Differences between 0% fuzziness and an exact match:
- An exact match does not allow extra words to be added; for example, Robert Mugabe will not match Robert Gabriel Mugabe.
- A difference of +/- 1 year in Year of Birth is allowed when fuzziness is between 10% and 100%. For an exact match and for 0% fuzziness, the Year of Birth must match exactly.
- An exact match does not apply any preprocessing; for example, honorifics and suffixes such as Mr./Ms./Dr./PhD are not stripped out.
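The differences listed above can be sketched as a few small matching functions. This is a hypothetical illustration, not the product's code: the `HONORIFICS` set is an assumed sample, the exact match is assumed to be a raw string comparison, and the 0% fuzziness path is assumed to mean no character edits after honorifics are stripped, with extra words in the candidate allowed.

```python
import re

# Assumed sample of honorifics/suffixes; the real preprocessing list is not documented here.
HONORIFICS = {"mr", "ms", "mrs", "dr", "phd"}


def normalise(name: str) -> list[str]:
    """Lower-case, keep ASCII letter runs only (a simplification), drop honorifics."""
    words = re.findall(r"[a-z]+", name.lower())
    return [w for w in words if w not in HONORIFICS]


def exact_match(search_term: str, candidate: str) -> bool:
    """No preprocessing and no extra words: assumed raw string comparison."""
    return search_term == candidate


def zero_fuzziness_match(search_term: str, candidate: str) -> bool:
    """0% fuzziness: no character edits, but honorifics are stripped and the
    candidate may contain extra words (such as a middle name)."""
    query = normalise(search_term)
    remaining = iter(normalise(candidate))
    # Every query word must appear, in order, among the candidate's words.
    return bool(query) and all(word in remaining for word in query)


def year_of_birth_match(query_yob: int, entity_yob: int,
                        fuzziness: float, exact: bool = False) -> bool:
    """+/- 1 year when fuzziness is between 10% and 100%; otherwise years must be equal."""
    tolerance = 1 if (not exact and 0.10 <= fuzziness <= 1.0) else 0
    return abs(query_yob - entity_yob) <= tolerance


print(exact_match("Robert Mugabe", "Robert Gabriel Mugabe"))           # False
print(zero_fuzziness_match("Robert Mugabe", "Robert Gabriel Mugabe"))  # True
print(zero_fuzziness_match("Dr. Robert Mugabe", "Robert Mugabe"))      # True
print(year_of_birth_match(1980, 1981, fuzziness=0.0))                  # False
print(year_of_birth_match(1980, 1981, fuzziness=0.2))                  # True
```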
Why is fuzziness useful?
It allows for variations in the spelling of the search term. If you misspell a search term, or are unsure of its spelling, you will still be returned entities whose names are spelt differently from the search term by an inserted, omitted, or replaced character.
This is also useful when searching for names written in non-Latin characters. Fuzziness is not performed on non-Latin characters themselves; instead, the search term is transliterated from the native non-Latin script into Latin characters, and fuzziness is applied to that Latin transliteration. Because transliteration can introduce spelling variations relative to the original non-Latin text, a higher fuzziness setting for non-Latin names helps prevent false negatives.
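The effect of transliteration on fuzzy matching can be sketched with two common Latin spellings of the same Arabic name, Muhammad and Mohammed, which differ by two substituted characters. The rule used here, that a fuzziness setting is the maximum share of the longer name's characters that may be edited, is a hypothetical assumption for illustration only, as are the helper names `levenshtein` and `fuzzy_match`.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def fuzzy_match(search_term: str, candidate: str, fuzziness: float) -> bool:
    """Hypothetical rule: allow up to `fuzziness` x (longer name length) edits."""
    a, b = search_term.lower(), candidate.lower()
    allowed_edits = int(fuzziness * max(len(a), len(b)))
    return levenshtein(a, b) <= allowed_edits


# Two transliterations of the same name: 2 edits apart out of 8 characters.
for fuzziness in (0.1, 0.3):
    print(fuzziness, fuzzy_match("Muhammad", "Mohammed", fuzziness))
# 0.1 False  (0 edits allowed: a false negative)
# 0.3 True   (2 edits allowed)
```

Under this assumed rule, the lower setting misses a genuine variant spelling, which is the false negative that a higher fuzziness setting for non-Latin names is meant to avoid.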