牛骨文教育服务平台(让学习变的简单)

SOUNDEX 返回由四个字符组成的代码 (SOUNDEX) 以评估两个字符串的相似性。返回的第一个字符是输入字符串的第一个字符,返回的第二个字符到第四个字符是数字。

soundex 代码如下:

def soundex(name, len=4):
    """ soundex module conforming to Knuth"s algorithm
        implementation 2000-12-24 by Gregory Jorgensen
        public domain
    """

    # digits holds the soundex values for the alphabet
    digits = "01230120022455012623010202"
    sndx = ""
    fc = ""

    # translate alpha chars in name to soundex digits
    for c in name.upper():
        if c.isalpha():
            if not fc:
                fc = c   # remember first letter
            d = digits[ord(c) - ord("A")]
            # duplicate consecutive soundex digits are skipped
            if not sndx or (d != sndx[-1]):
                sndx += d
    print sndx

    # replace first digit with first alpha character
    sndx = fc + sndx[1:]

    # remove all 0s from the soundex code
    sndx = sndx.replace("0", "")

    # return soundex code padded to len characters
    return (sndx + (len * "0"))[:len]

需要注意的是代码设计为处理英文名称。