Suppose you had a task to deduplicate records on an object by exact match. Probably nothing worth a blog post. Use DataLoader and Excel, use Demandtools, spawn a text field and fill it with an MD5 hash – you name it. The most challenging part is probably the scalability on both ends – make sure your process works for millions of records, but don’t crack the proverbial walnut with a sledgehammer.
But what if requirements demand fuzzy matching? Typos, spelling mistakes, the usual “gamil” not “gmail”? That’s the wildcards’ time to shine. Wildcards exist in both SOSL and SOQL, and are somewhat intelligent search placeholders.
In SOQL,
- The % wildcard matches zero or more characters.
- The _ wildcard matches exactly one character.
In SOSL,
- The * wildcard matches zero or more characters at the middle or end of the search term.
- The ? wildcard matches only one character in the middle or end of the search term.
The decision whether to use SOQL or SOSL mostly depends on whether all fields should be scanned for the string to fuzzy match (SOSL), or selected fields only (SOQL), whereas the decision how to place wildcards define how strict your fuzzy matching model should be.
In the following, I’m doing a fuzzy match of an entry string against the Website standard field of Accounts, with levels of strictness being easily configurable through parameters at the top of the method:
public static List<Account> checkWebsiteDuplicates(String fuzzyMatchBase){
// Parameters for strictness
Boolean wrapAround = false;
Boolean strictNumberCharacters = false; // if true, the number of deviant characters needs to be equal to the number of skipped characters
Integer numberCharactersToSkip = 1; // the higher, the more lenient
// Variables derived from strictness
String wildcard = strictNumberCharacters ? '_' : '%';
Set<String> allFuzzyExpressions = new Set<String>{fuzzyMatchBase};
if(wrapAround){
allFuzzyExpressions.add(fuzzyMatchBase + wildcard);
allFuzzyExpressions.add(wildcard + fuzzyMatchBase);
allFuzzyExpressions.add(wildcard + fuzzyMatchBase + wildcard);
}
Integer baseLength = fuzzyMatchBase.length();
if(baseLength > numberCharactersToSkip){
for(Integer i = 1; i < ((baseLength + 1) - numberCharactersToSkip); i++){
String fuzzySearchString = fuzzyMatchBase.substring(0, i) + wildcard + fuzzyMatchBase.substring(i + numberCharactersToSkip, baseLength);
allFuzzyExpressions.add(fuzzySearchString);
if(wrapAround){
allFuzzyExpressions.add(wildcard + fuzzySearchString);
allFuzzyExpressions.add(fuzzySearchString + wildcard);
allFuzzyExpressions.add(wildcard + fuzzySearchString + wildcard);
}
}
}
return [SELECT Id, Name, Website FROM Account WHERE Website LIKE : allFuzzyExpressions];
}
In SOSL, you can’t limit the search to a defined list of fields, and the syntax is slightly different (and bear the 2,000 records limit in mind!), but it might come in handy especially for those who would like to scan all fields for fuzzy dupes, rather than specific and specified ones:
public static List<Account> checkWebsiteDuplicatesSosl(String fuzzyMatchBase){
// Parameters for strictness
Boolean withTrailing = false;
Boolean strictNumberCharacters = false; // if true, the number of deviant characters needs to be equal to the number of skipped characters
Integer numberCharactersToSkip = 1; // the higher, the more lenient
// Variables derived from strictness
String wildcard = strictNumberCharacters ? '?' : '*';
Set<String> allFuzzyExpressions = new Set<String>{fuzzyMatchBase};
if(withTrailing){
allFuzzyExpressions.add(fuzzyMatchBase + wildcard);
}
Integer baseLength = fuzzyMatchBase.length();
if(baseLength > numberCharactersToSkip){
for(Integer i = 1; i < ((baseLength + 1) - numberCharactersToSkip); i++){
String fuzzySearchString = fuzzyMatchBase.substring(0, i) + wildcard + fuzzyMatchBase.substring(i + numberCharactersToSkip, baseLength);
allFuzzyExpressions.add(fuzzySearchString);
if(withTrailing){
allFuzzyExpressions.add(fuzzySearchString + wildcard);
}
}
}
List<Account> returnList = new List<Account>();
String fuzzyString = String.join(allFuzzyExpressions, ' OR ');
for(List<sObject> sObjList : [FIND :fuzzyString IN ALL FIELDS Returning Account(Id, Name, Website)]){
List<Account> accs = (List<Account>) sObjList;
returnList.addAll(accs);
}
return returnList;
}