# Re: Longest Repeated Substring (#153)

My solution finds substrings of length n, groups the identical ones
together in a hash of their starting indices, prunes out cases where two
of the “repeated” substring overlap each other, and then looks in the
same places for substrings of length n+1.

The aggressive pruning does, however, cause it to fail in the case where
a substring that overlaps at one length does not overlap at a higher
length. For example, take the string “aaaaaa”. The substring “aa” exists
at indicies [0,1,2,3,4]. My program will prune that down to [0,2,4]. It
will then record the substring of length 3 that starts at each of
[0,2,4], prune out the ones that start at 2 (overlaps with the one
startign at 0) and 4 (goes out of bounds), leaving only an unrepeated
substring, and would thus fall back and print “aa,” even though the

On the other hand, the aggressive pruning does make my program somewhat
fast. It takes about 45s on Also Sprach Zarathustra.

substr_locs = {}

text.split(//).each_with_index do |c,idx|
if substr_locs[c]
substr_locs[c] << idx
else
substr_locs[c] = [idx]
end
end

substr_locs.each_pair do |substr,locs|
substr_locs.delete(substr) if locs.length == 1
end

len = 1
done = false
until done
len += 1
new_substr_locs = {}
substr_locs.each_pair do |substr,locs|
locs.each do |idx|
s = text[idx,len]
next if idx+len>text.length
if new_substr_locs[s]
new_substr_locs[s] << idx
else
new_substr_locs[s] = [idx]
end
end
end
##Must reduce for the case where a substring overlaps itself
new_substr_locs.each_key do |s|
locs = new_substr_locs[s]
idx = 1
while idx < locs.length
if locs[idx]-locs[idx-1] < len
locs.delete_at(idx)
else
idx += 1
end
end
new_substr_locs.delete(s) if locs.length == 1
end
unless new_substr_locs.keys.empty?
substr_locs = new_substr_locs
else
done = true
end
end
puts substr_locs.keys.first

``````  ____________________________________________________________________________________
``````

Looking for last minute shopping deals?
Find them fast with Yahoo! Search.
http://tools.search.yahoo.com/newsearch/category.php?category=shopping

cannot use regex, i suppose:

(.{aNumber}).*\1

where aNumber = 2…x (x=text.length/2), until regex ‘response’ is
‘nil’

On Jan 23, 2008, at 11:35 AM, Raffa wrote:

cannot use regex, i suppose:

(.{aNumber}).*\1

where aNumber = 2…x (x=text.length/2), until regex ‘response’ is
‘nil’

You can use whatever you like.

Try it out on War & Peace though and see how it does.

James Edward G. II

Why not?
The simplest solution could be:

def longest_repeated_substring str
(str.size/2).downto(1) { |i|
/(.{#{i}}).*\1/m =~ str and return \$1
}
nil
end

but it’s too slow;

On Wed, 23 Jan 2008 13:21:31 -0500, SERGEY VOLKOV wrote:

but it’s too slow;

On Jan 23, 2008 12:35 PM, Raffa [email protected] wrote:

cannot use regex, i suppose:

(.{aNumber}).*\1

where aNumber = 2…x (x=text.length/2), until regex ‘response’ is ‘nil’

Well, one would think that the absolute simplest solution is
/(.+).\1/,
but that’s what I was testing when I invented the “your banana my
banana”
test case. (Which it failed, returning only “y” as the repeated
substring).

Yours is a nice extension of this basic idea.

–Ken

It’s linear search for repeated substring length,
I’ve tried binary search, too slow too