Re: Longest Repeated Substring (#153)

jameskelly · January 21, 2008, 11:38pm

My solution finds substrings of length n, groups the identical ones
together in a hash of their starting indices, prunes out cases where two
of the “repeated” substring overlap each other, and then looks in the
same places for substrings of length n+1.

The aggressive pruning does, however, cause it to fail in the case where
a substring that overlaps at one length does not overlap at a higher
length. For example, take the string “aaaaaa”. The substring “aa” exists
at indicies [0,1,2,3,4]. My program will prune that down to [0,2,4]. It
will then record the substring of length 3 that starts at each of
[0,2,4], prune out the ones that start at 2 (overlaps with the one
startign at 0) and 4 (goes out of bounds), leaving only an unrepeated
substring, and would thus fall back and print “aa,” even though the
correct answer is “aaa,” because the index of 3 had already been pruned.

On the other hand, the aggressive pruning does make my program somewhat
fast. It takes about 45s on Also Sprach Zarathustra.

text = ARGF.read

substr_locs = {}

text.split(//).each_with_index do |c,idx|
if substr_locs[c]
substr_locs[c] << idx
else
substr_locs[c] = [idx]
end
end

substr_locs.each_pair do |substr,locs|
substr_locs.delete(substr) if locs.length == 1
end

len = 1
done = false
until done
len += 1
new_substr_locs = {}
substr_locs.each_pair do |substr,locs|
locs.each do |idx|
s = text[idx,len]
next if idx+len>text.length
if new_substr_locs[s]
new_substr_locs[s] << idx
else
new_substr_locs[s] = [idx]
end
end
end
##Must reduce for the case where a substring overlaps itself
new_substr_locs.each_key do |s|
locs = new_substr_locs[s]
idx = 1
while idx < locs.length
if locs[idx]-locs[idx-1] < len
locs.delete_at(idx)
else
idx += 1
end
end
new_substr_locs.delete(s) if locs.length == 1
end
unless new_substr_locs.keys.empty?
substr_locs = new_substr_locs
else
done = true
end
end
puts substr_locs.keys.first

  ____________________________________________________________________________________

Looking for last minute shopping deals?
Find them fast with Yahoo! Search.
http://tools.search.yahoo.com/newsearch/category.php?category=shopping

jameskelly · January 23, 2008, 6:36pm

cannot use regex, i suppose:

(.{aNumber}).*\1

where aNumber = 2…x (x=text.length/2), until regex ‘response’ is
‘nil’

jameskelly · January 23, 2008, 7:09pm

On Jan 23, 2008, at 11:35 AM, Raffa wrote:

cannot use regex, i suppose:

(.{aNumber}).*\1

where aNumber = 2…x (x=text.length/2), until regex ‘response’ is
‘nil’

You can use whatever you like.

Try it out on War & Peace though and see how it does.

James Edward G. II

jameskelly · January 23, 2008, 7:22pm

Why not?
The simplest solution could be:

def longest_repeated_substring str
(str.size/2).downto(1) { |i|
/(.{#{i}}).*\1/m =~ str and return $1
}
nil
end

but it’s too slow;

jameskelly · January 24, 2008, 6:40am

On Wed, 23 Jan 2008 13:21:31 -0500, SERGEY VOLKOV wrote:

but it’s too slow;

On Jan 23, 2008 12:35 PM, Raffa [email protected] wrote:

cannot use regex, i suppose:

(.{aNumber}).*\1

where aNumber = 2…x (x=text.length/2), until regex ‘response’ is ‘nil’

Well, one would think that the absolute simplest solution is
/(.+).\1/,
but that’s what I was testing when I invented the “your banana my
banana”
test case. (Which it failed, returning only “y” as the repeated
substring).

Yours is a nice extension of this basic idea.

–Ken

jameskelly · January 24, 2008, 1:18pm

It’s linear search for repeated substring length,
I’ve tried binary search, too slow too