Hello, I have an array. It contains approximately twenty elements which
are strings. I also have one string - this string was obtained using an
OCR system. One of the strings in the array should ‘match’ the string
gotten using the OCR system - unfortunately OCRs aren’t perfect!
I want to take this string, and compare it to every string in the array,
and attempt to return the closest match.
For example:
array = ['Hello there, how are you?', 'What did you do over your break?',
         'I like my coffee brown.', 'I just bought a new car.']
string = 'What did you d0 over your brcak?'
And then have my comparison function return array[1]. As you can see,
string has some ‘OCR errors’ - it’s usually 80-95% accurate, if not
dead-on.
If you know all the possibilities that your OCR system could pick up
then you could always do something like this…
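knownStrings = ['Hello', 'Goodbye']
out = []
OCR_strings = []  # fill in with the strings returned by the OCR system
OCR_strings.each do |ocr|
  knownStrings.each do |known|
    # Count the positions where both strings have the same character.
    # The counter is reset for every known string.
    matches = 0
    known.length.times do |i|
      break if i >= ocr.length
      matches += 1 if ocr[i] == known[i]
    end
    # Float division: integer division would only ever give 0 or 1.
    if matches.to_f / known.length > 0.85
      out << known
    else
      out << "!#{known}"
    end
  end
end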
…completely untested, but I think you know what I’m getting at.
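array = ['Hello there, how are you?', 'What did you do over your break?',
         'I like my coffee brown.', 'I just bought a new car.']
string = 'What did you d0 over your brcak?'

# Compare the sets of distinct characters used by the two strings.
# Returns 0.5 when both strings use exactly the same characters and
# 1.0 when they share none, so a lower score means a closer match.
def comp(str1, str2)
  a = str1.split('').uniq
  b = str2.split('').uniq
  (a + b).uniq.length * 1.0 / (a.length + b.length)
end

# Pick the array element with the lowest score.
closest = array.min_by { |candidate| comp(candidate, string) }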
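
Comparing character sets ignores the order of the characters, so two different sentences built from the same letters would score as identical. A common alternative for OCR noise is edit distance (Levenshtein distance), which counts how many single-character changes separate two strings. The following is only a sketch of that idea and is not taken from the posts above; the method names levenshtein and closest_match are made up for illustration.

```ruby
# Levenshtein (edit) distance: the minimum number of single-character
# insertions, deletions, or substitutions needed to turn s into t.
def levenshtein(s, t)
  # prev[j] holds the distance between the empty prefix of s and t[0, j].
  prev = (0..t.length).to_a
  s.each_char.with_index(1) do |sc, i|
    curr = [i] # distance between s[0, i] and the empty string
    t.each_char.with_index(1) do |tc, j|
      cost = sc == tc ? 0 : 1
      curr << [prev[j] + 1,          # delete sc
               curr[j - 1] + 1,      # insert tc
               prev[j - 1] + cost].min # substitute (or keep) the character
    end
    prev = curr
  end
  prev.last
end

# Hypothetical helper: return the candidate with the smallest edit distance.
def closest_match(candidates, ocr_string)
  candidates.min_by { |candidate| levenshtein(candidate, ocr_string) }
end

array = ['Hello there, how are you?', 'What did you do over your break?',
         'I like my coffee brown.', 'I just bought a new car.']
string = 'What did you d0 over your brcak?'

puts closest_match(array, string)
```

With the example data, the OCR string differs from array[1] by two substitutions (0 for o, c for e), so array[1] is the element returned. For twenty short strings this quadratic-time comparison should be fast enough; if the OCR text can also be garbage, it may be worth rejecting matches whose distance is a large fraction of the string length.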