Friday, 29 March 2013

Extended Soundex

I think Soundex is great to be able to compare two names and find possible matches. One of the limitations though that I have found is that it will only return 4 characters which can mean it only checks the start of the name. This can be good when you are comparing two names and one has been shortened but what about longer names?

Anyway so I created the below one day to help out someone when they wanted to compare names and the traditional soundex wasnt enough.

It uses the same principles as Soundex but it searches through all of the name.

Wikipedia has a good explanation of how the traditional Soundex works.

Of course this is just a basic example, Id suggest to use it for larger pieces you place it into a User Defined Table Function perhaps Or a Stored Procedure which accepts tables - I believe you can do that now but haven't tried it myself yet.


--Create my Name List
IF OBJECT_ID('tempdb..#NameList') IS NOT NULL 
DROP TABLE #NameList
CREATE TABLE #NameList (ID INT IDENTITY, Surname varchar(120), GivenNames varchar(120), GivenNameLength int)
INSERT INTO #NameList (Surname, GivenNames, GivenNameLength)
Select 'Dummy 'Surname, 'Robert William Francis' as GivenNames, 22 as GivenNameLength
Declare @NameID as int = 1
Declare @Letter as int = 1

Declare @GivenNameLength as int
Declare @LetterToCheck as varchar(1)
Declare @SoundexValue as varchar(50)
Declare @Names as bigint = 0
DECLARE @ReturnList TABLE (Surname varchar(120), GivenNames varchar(120), SoundexValue varchar(50))
--Determine the number of Names to check
Select @Names = COUNT(*) from #NameList
while @NameID <= @Names
 begin 
 --The first letter of the Name is put straight in, grab the length to know how many loops to perform
 select @GivenNameLength = GivenNameLength, @SoundexValue = LEFT(GivenNames, 1)  
 from #NameList
 where ID = @NameID
 Select @Letter = 2
 --For each letter in the Name assign a Soundex Value
 while @Letter <= @GivenNameLength 
  begin
  select @LetterToCheck = UPPER(SUBSTRING(GivenNames, @Letter, 1)) from #NameList
  where ID = @NameID
 
  --Assign Soundex Values
  select @SoundexValue = @SoundexValue + case when @LetterToCheck IN ('b', 'f', 'p', 'v') then '1'
     when @LetterToCheck IN ('c', 'g', 'j', 'k', 'q', 's', 'x', 'z') then '2'
     when @LetterToCheck IN ('d', 't') then '3'
     when @LetterToCheck IN ('l') then '4'
     when @LetterToCheck IN ('m', 'n') then '5'
     when @LetterToCheck IN ('r') then '6'
     --when @LetterToCheck IN ('h', 'w') then 'hw'
     when @LetterToCheck IN ('a', 'e', 'i', 'o', 'u') then 'v'
     else '' end
  Select @Letter = @Letter + 1
  end
  --Return a record
  INSERT INTO @ReturnList
  select Surname, GivenNames, 
  --If there are two letters with the same number next to each other return only 1
  --Unless there is a vowel in which case they remain
  REPLACE( 
  REPLACE(
  REPLACE(
  REPLACE(
  REPLACE(
  REPLACE(
  Replace(@SoundexValue
  , '11', '1') 
  , '22', '2') 
  , '33', '3') 
  , '44', '4') 
  , '55', '5') 
  , '66', '6') 
  , 'v', '') as NewSoundexValue
  from #NameList
  where ID = @NameID
 select @NameID = @NameID + 1
 end
SELECT * FROM @ReturnList