*10.15: news.*

**adds/algorithm/**efficient Levenshtein distance:. the Levenshtein distance between two words

is the minimum number of single-character edits

(insertion, deletion, substitution)

required to change one word into the other.

. finding the Lev' distance takes a long time,

on the order of (length of longer string

times length of shorter string),

so we are looking for ways to avoid it .

. Leo Polovets, a quora.com participant,

came up with an approach that computed

a character distribution histogram .

. given a string, see if it is within

a given Levenshtein distance

to any string in a dictionary.

. view a 64-bit integer as

16 nibbles (sets of 4 bits),

and assign characters to each nibble

by associating a given character

with the (character mod 16)th nibble .

. then in each nibble of a string's histogram

have a count of each time

one of that nibble's assigned characters

was used by the input string .

For example, in the string "hello",

the ASCII values of characters are

104, 101, 108, 108, 111, which modulo 16 are

8, 5, 12, 12, 15, so there is

a count of 2 in the 12th nibble;

because, 2 of the characters mod 16 = 12;

then for similar reasons,

there's a 1 in the 8th 5th and 15th nibbles,

and zeroes in every other nibble .

. with every string boiled down to

a single 64-bit histogram,

use bit arithmetic to quickly determine

how much 2 words differ in their use of

each of the mod-16 classes of characters .

. if there are many differences in

these mod-16 character classes

there is an even greater edit distance;

so, if we are looking for

an edit distance of less than 2

as a spell checker might do,

then we can rule out histogram differences

of greater than 2,

and computing the full Levenshtein distance

can be avoided most of the time .