Forum: Ruby Determining the common prefix for several strings

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
Todd B. (Guest)
on 2007-08-03 04:31
I have several strings to analyze.  The strings are the names of a list
of things that were once ordered and now are not.  A typical array of
names might look like this:

item001
item004
item002
item002b
item002C
item 5
item 10
itemize this

I want to parse these names and determine that "item" is the common
prefix.

I know how to armstrong my way through it a character at a time, but I'm
thinking that Ruby probably has a very elegant way of doing this that
I'm not familiar with yet.

Thanks, Todd
Michael G. (Guest)
on 2007-08-03 05:32
(Received via mailing list)
On Aug 2, 2007, at 19:31 , Todd B. wrote:

> item 5
> item 10
> itemize this
>
> I want to parse these names and determine that "item" is the common
> prefix.
>
> I know how to armstrong my way through it a character at a time,
> but I'm
> thinking that Ruby probably has a very elegant way of doing this that
> I'm not familiar with yet.

I don't know of a particularly elegant way to do this in Ruby, but my
initial thoughts are

list = <<-EOS
item001
item004
item002
item002b
item002C
item 5
item 10
itemize this
EOS
# sort by length: prefix can't be longer than shortest string.
items = list.split("\n").sort_by { |i| i.length }
# use shortest as prefix
first = items.shift
items.inject(first) do |prefix, item|
   next prefix if item.match(/^#{prefix}/)
   # have to go char by char
   n = 0
   for i in 0..(prefix.length - 1) do
     break if prefix[i] != item[i]
     n = i
   end
   prefix[0..n]
end

Michael G.
grzm seespotcode net
Xavier N. (Guest)
on 2007-08-03 06:27
(Received via mailing list)
El Aug 3, 2007, a las 2:31 AM, Todd B.
escribió:
> item 5
> item 10
> itemize this
>
> I want to parse these names and determine that "item" is the common
> prefix.
>
> I know how to armstrong my way through it a character at a time,
> but I'm
> thinking that Ruby probably has a very elegant way of doing this that
> I'm not familiar with yet.

This is a solution that avoids going char by char:

   items = [
     'item001',
     'item004',
     'item002',
     'item002b',
     'item002C',
     'item 5',
     'item 10',
     'itemize this'
   ]

   min_length = items.map {|i| i.length}.min
   cuts = items.map {|i| i.split(//)[0, min_length]}
   cuts.transpose.each_with_index do |t, i|
     if t.uniq.length != 1
       puts cuts.first[0, i].join('')
       break
     end
   end

We first check which is the length of the shortest string in the
array, then split by chars to store those many ones, and then
transpose the matrix. Note that transposing is OK because we cut the
strings. That's longer to explain than to write it in Ruby as you
see. Once you get the transpose you can go row by row to detect the
one that has more than one distinct element. Again that's easy
delegating the job to uniq.

-- fxn
Xavier N. (Guest)
on 2007-08-03 06:42
(Received via mailing list)
El Aug 3, 2007, a las 4:26 AM, Xavier N.
escribió:
>   min_length = items.map {|i| i.length}.min
>   cuts = items.map {|i| i.split(//)[0, min_length]}
>   cuts.transpose.each_with_index do |t, i|
>     if t.uniq.length != 1
>       puts cuts.first[0, i].join('')
>       break
>     end
>   end

Hmmm, too convoluted. I realized you don't need the transpose and
thus the min length and cuts if you go by columns in the loop. This
is the same idea only more simple:

splits = items.map {|i| i.split(//)}
(0...splits.length).each do |i|
   if splits.map {|s| s[i]}.uniq.length != 1
     puts splits.first[0, i].join('')
     break
   end
end

-- fxn
Xavier N. (Guest)
on 2007-08-03 06:46
(Received via mailing list)
El Aug 3, 2007, a las 4:41 AM, Xavier N.
escribió:
> end
Ah, and the prefix can be extracted from the original string instead
of the split:

   splits = items.map {|i| i.split(//)}
   (0...splits.length).each do |i|
     if splits.map {|s| s[i]}.uniq.length != 1
       puts items.first[0, i]
       break
     end
   end

-- fxn
Xavier N. (Guest)
on 2007-08-03 07:23
(Received via mailing list)
El Aug 3, 2007, a las 4:46 AM, Xavier N.
escribió:
>   splits = items.map {|i| i.split(//)}
>   (0...splits.length).each do |i|
>     if splits.map {|s| s[i]}.uniq.length != 1
>       puts items.first[0, i]
>       break
>     end
>   end

There's a bug in the upper limit of the range, it should be the
length of some element of the splits

   splits = items.map {|i| i.split(//)}
   (0...splits.first.length).each do |i|
     if splits.map {|s| s[i]}.uniq.length != 1
       puts items.first[0, i]
       break
     end
   end

We could compute the length of the shortest string and use that finer
upper limit, but that's not necessary because even in the case that
the shortest string is the common prefix, we would get some nil in
the map through columns in the next iteration, and thus at least two
distinct values. On the other hand there's the corner case when all
strings are equal, that's not handled in the above code because they
way to do it depends on the details.

Please forgive those many revisions, I guess I need some sleep at
5:22 AM.

-- fxn
Harry K. (Guest)
on 2007-08-03 08:26
(Received via mailing list)
On 8/3/07, Todd B. <removed_email_address@domain.invalid> wrote:
>
> I want to parse these names and determine that "item" is the common
> prefix.
>
If I understood you correctly,  I think this will do it.

r = ["itemz14","item004","item002","item002b","item 5","itemize this"]

s = []
a ,b = r.max.split(//), r.min.split(//)
a.each_with_index{|x,y|s << y if x != b[y]}
(0...s[0]).each {|x| print a[x]}

Harry
ronald braswell (Guest)
on 2007-08-03 08:26
(Received via mailing list)
>>       puts items.first[0, i]
>       puts items.first[0, i]
>depends on the details.
>
>Please forgive those many revisions, I guess I need some sleep at  5:22 AM.
>
>-- fxn
>
>

Here is yet another way to solve this problem.

min_len = items.min{ |i,j| i.length <=> j.length }.length
common_prefix = ""
(0...min_len).each do |i|
  col = items.inject(""){ |sum,elem| sum << elem[i] }.squeeze
  if col.length > 1
    puts common_prefix
    break
  end
  common_prefix << col
end

Ron
Todd B. (Guest)
on 2007-08-03 09:25
Holy Cow!   You guys are geniuses!   Y'all are using these methods like
I've never seen or even thought of using them before.

Problem is solved - script is working - THANKS!!!

Todd
Peña, Botp (Guest)
on 2007-08-03 11:59
(Received via mailing list)
On Behalf Of Todd B.:
# Holy Cow!   You guys are geniuses!   Y'all are using these
# methods like. I've never seen or even thought of using them before.

don’t count me pls. i'm not holy nor genius, (possibly a cow :) but just
a nuby learning ruby fr the marvelous ruby community (thanks to matz,
nobu, guydecoux, mauricio, dblack, ...ml-list.size ).

ff is another example (of my nubiness). it shows niftiness of bang
methods, chains, and the underutilized all?,

CC:\family\ruby>cat test.rb
list = %w(itemone itemtwo item222 item0033 item11 itexthis
itemthisandthat itemt
hose)

min = list.sort!{|a,b| a.size <=> b.size}.first
break if list.all?{|a| a =~ /^#{min}/} while min.chop!

puts min
C:\family\ruby>ruby test.rb
ite

C:\family\ruby>

i luv my family. i luv ruby :)

kind regards -botp
Daniel DeLorme (Guest)
on 2007-08-03 14:34
(Received via mailing list)
>> item 5
>> item 10
>> itemize this
>>
>> I want to parse these names and determine that "item" is the common
>> prefix.

Haha! Regexp magic!

list = <<-EOS
item001
item004
item002
item002b
item002C
item 5
item 10
itemize this
EOS

/\A(.*).*(\n\1.*)*\Z/.match(list)[1]
=> "item"

:-D
Daniel
Xavier N. (Guest)
on 2007-08-03 15:35
(Received via mailing list)
El Aug 3, 2007, a las 6:23 AM, Harry K.
escribió:
> If I understood you correctly,  I think this will do it.
>
> r = ["itemz14","item004","item002","item002b","item 5","itemize this"]
>
> s = []
> a ,b = r.max.split(//), r.min.split(//)
> a.each_with_index{|x,y|s << y if x != b[y]}
> (0...s[0]).each {|x| print a[x]}

Good, so lexicographic order indeed helps. I would prefer to order
the array only once though, and construct the prefix right ahead as a
string so that the code reads more direct:

   prefix = ''
   min, max = items.sort.values_at(0, -1)
   min.split(//).each_with_index do |c, i|
     break if c != max[i, 1]
     prefix << c
   end
   puts prefix

My approach with splits removed and handling of the corner case where
all items are equal is:

   prefix = items.first
   (0...items.first.length).each do |i|
     if items.map {|item| item[i, 1]}.uniq.length != 1
       prefix = prefix[0, i]
       break
     end
   end
   puts prefix

That solution does not order the array, but explicitly loops over all
the columns in the prefix plus one via map, and uniqs them. The
approach based on sort seems a bit better to me.

-- fxn
Robert K. (Guest)
on 2007-08-03 15:39
(Received via mailing list)
Attachment: pref.rb (0 Bytes)
2007/8/3, Daniel DeLorme <removed_email_address@domain.invalid>:
> >> item002C
> item001
> => "item"
Amazing!  I wonder how this performs for large input sets (because of
the two ".*" and the "()*"). Could generate a lot of backtracking
(apart from the string length itself).

This is probably better in terms of individual comparisons.  The idea
is to reduce the prefix length whenever a mismatch is found
(attached). That solution also avoids sorting and multiple traversal
of the input in general. So it's suited for stream based processing
(i.e. for reading large files in one go).

Kind regards

robert
Daniel DeLorme (Guest)
on 2007-08-03 16:47
(Received via mailing list)
Robert K. wrote:
> 2007/8/3, Daniel DeLorme <removed_email_address@domain.invalid>:
>> /\A(.*).*(\n\1.*)*\Z/.match(list)[1]
>> => "item"
>
> Amazing!  I wonder how this performs for large input sets (because of
> the two ".*" and the "()*"). Could generate a lot of backtracking
> (apart from the string length itself).

I think it's ok because the .* pattern is always anchored to line
endings. In this context, .*\n is the equivalent of [^\n]*\n which
leaves no room for backtracking.

Daniel
Robert K. (Guest)
on 2007-08-03 16:50
(Received via mailing list)
2007/8/3, Daniel DeLorme <removed_email_address@domain.invalid>:
> endings. In this context, .*\n is the equivalent of [^\n]*\n which
> leaves no room for backtracking.

I believe you are wrong.  Look at "\A(.*).*" - there are multiple ways
this can match (including matching \n inside any of the .* parts).
Anchoring does not help much here IMHO.

Kind regards

robert
Daniel DeLorme (Guest)
on 2007-08-03 17:11
(Received via mailing list)
Robert K. wrote:
> I believe you are wrong.  Look at "\A(.*).*" - there are multiple ways
> this can match (including matching \n inside any of the .* parts).
> Anchoring does not help much here IMHO.

Yes, there is backtracking for the first part (there has to be, the
whole point being to find the longest matching string), but \n CANNOT
match inside any of the .* parts unless you specify the m modifier.

Daniel
Lloyd L. (Guest)
on 2007-08-03 17:13
I think that this thread explained to me just why so many people love
Ruby so much.  There are wildly different approaches to solving this
that all work.  People are very different in how they think and how they
solve problems.  Ruby can be seen here as SO flexible that, no matter
how you think things through, Ruby still covers all the bases making it
a comfortable shoe for everyone.  This may be a unique example of one
size actually fitting all.  That is just amazing and borders on magic.

While I am quite good in a few languages, I cannot wait to get good at
Ruby.
Xavier N. (Guest)
on 2007-08-03 17:41
(Received via mailing list)
El Aug 3, 2007, a las 1:35 PM, Xavier N.
escribió:
>   prefix = ''
>   min, max = items.sort.values_at(0, -1)
>   min.split(//).each_with_index do |c, i|
>     break if c != max[i, 1]
>     prefix << c
>   end
>   puts prefix

Yet another iteration: the prefix can be extracted with a regexp,
it's shorter although perhaps more obscure:

   min, max = items.sort.values_at(0, -1)
   puts min+max =~ /(.).{#{min.length-1}}(?!\1)/m ? $` : min

I don't like the ternary and the fact that we look for the first
mismatch instead of extracting directly the prefix in one shot.

-- fxn
Robert K. (Guest)
on 2007-08-03 17:50
(Received via mailing list)
Attachment: pref-rb.rb (0 Bytes)
2007/8/3, Daniel DeLorme <removed_email_address@domain.invalid>:
> Robert K. wrote:
> > I believe you are wrong.  Look at "\A(.*).*" - there are multiple ways
> > this can match (including matching \n inside any of the .* parts).
> > Anchoring does not help much here IMHO.
>
> Yes, there is backtracking for the first part (there has to be, the
> whole point being to find the longest matching string), but \n CANNOT
> match inside any of the .* parts unless you specify the m modifier.

Correct.  I always confuse Ruby's /m with other language's variants.
Thanks for the reeducation!

Still it would be interesting to do benchmarks with larger numbers of
words to see what happens....  I have attached a test. It seems the RX
approach is faster very long.

Kind regards

robert
Xavier N. (Guest)
on 2007-08-03 19:48
(Received via mailing list)
El Aug 3, 2007, a las 3:39 PM, Xavier N.
escribió:
> it's shorter although perhaps more obscure:
>
>   min, max = items.sort.values_at(0, -1)
>   puts min+max =~ /(.).{#{min.length-1}}(?!\1)/m ? $` : min
>
> I don't like the ternary and the fact that we look for the first
> mismatch instead of extracting directly the prefix in one shot.

This is a direct way:

   min, max = items.sort.values_at(0, -1)
   puts (min+max).match(/\A(.*).*(?=.{#{max.length}}\z)\1/m)[1]

Of course if you can rely on some character that does no belong to
the items that's easier, it just takes a simplification of Daniel's
in the case of "\n":

   min, max = items.sort.values_at(0, -1)
   puts "#{min}\n#{max}".match(/^(.*).*\n\1/)[1]

Albeit those positive approaches are direct they backtrack quite a
lot, perhaps the fixed-length look-ahead with \z in the latter may
get some optimization, I don't knw. On the contrary the negative
approach, the one that looks for a mismatch, does not backtrack at
all, in that sense that's the direct one.

Anyway, I doubt I'd use a regexp approach in production code.

-- fxn
Simon Kröger (Guest)
on 2007-08-04 02:28
(Received via mailing list)
Peña wrote:
> [...]
> ff is another example (of my nubiness). it shows niftiness of bang methods, chains, and 
the underutilized all?,

don't forget the any? :)

puts r.first[0...(0..r.first.size).find{|i|
r.any?{|s|r.first[0..i]!=s[0..i]}}]

cheers

Simon
Eric P. (Guest)
on 2007-08-04 02:36
(Received via mailing list)
The obligatory solution using zip...

# data contains a list of strings
aa = data.map{ |x| x.split(//)}
bz = aa[0].zip(*(aa[1 .. -1])).map{ |x| x.join("")}
target = bz.index( bz.find{ |x| x.squeeze.size != 1})
answer = (target.nil? ? aa[0] :
          target >= 1 ? aa[0][0 .. target - 1] :
          []).join("")

- Eric
Eric P. (Guest)
on 2007-08-05 20:41
(Received via mailing list)
On Aug 3, 3:30 pm, Eric P. <removed_email_address@domain.invalid> wrote:
> - Eric
Doh!  This isn't correct.  It's hard to improve over
Michael's routine back in the initial response.  You
need to find any one of the shortest strings, remove it,
and use that instead of aa[0] as above.
Wolfgang N. (Guest)
on 2007-08-06 00:46
Xavier N. wrote:
> El Aug 3, 2007, a las 2:31 AM, Todd B.
>escribi�:
> item 5
>> item 10
>> itemize this
>>
>> I want to parse these names and determine that "item" is the common
>> prefix.
>>
>> I know how to armstrong my way through it a character at a time,
>> but I'm
>> thinking that Ruby probably has a very elegant way of doing this that
>> I'm not familiar with yet.
>
> This is a solution that avoids going char by char:
>
>    items = [
>      'item001',
>      'item004',
>      'item002',
>      'item002b',
>      'item002C',
>      'item 5',
>      'item 10',
>      'itemize this'
>    ]
>
>    min_length = items.map {|i| i.length}.min
>    cuts = items.map {|i| i.split(//)[0, min_length]}
>    cuts.transpose.each_with_index do |t, i|
>      if t.uniq.length != 1
>        puts cuts.first[0, i].join('')
>        break
>      end
>    end
>
> We first check which is the length of the shortest string in the
> array, then split by chars to store those many ones, and then
> transpose the matrix. Note that transposing is OK because we cut the
> strings. That's longer to explain than to write it in Ruby as you
> see. Once you get the transpose you can go row by row to detect the
> one that has more than one distinct element. Again that's easy
> delegating the job to uniq.
>
> -- fxn

A variant of this is:

list = <<-EOS
item001
item004
item002
item002b
item002C
item 5
item 10
itemize this
EOS
s =  list.split("\n").sort{|a,b|a.length<=>b.length}
pre =
s.collect{|e|e.split('')[0...(s[0].length)]}.transpose.collect{|e|e.uniq}.reject{|e|e.length>1}.flatten.join('')
p pre # => "item"

Wolfgang N#adasi-Donner
Wolfgang N. (Guest)
on 2007-08-06 10:42
Wolfgang Nádasi-donner wrote:
EOS
> s =  list.split("\n").sort{|a,b|a.length<=>b.length}
> pre =
> 
s.collect{|e|e.split('')[0...(s[0].length)]}.transpose.collect{|e|e.uniq}.reject{|e|e.length>1}.flatten.join('')
> p pre # => "item"
>
> Wolfgang N#adasi-Donner

:oops: - I made an error!

The corrected version is...

list = <<-EOS
item0aa
item1aa
item3a
EOS
s =  list.split("\n").sort{|a,b|a.length<=>b.length}
pre =
s.collect{|e|e.split('')[0...(s[0].length)]}.transpose.collect{|e|e.uniq}.
         inject(''){|r,c|(c.length>1)?(break r):(r<<c[0])}
p pre # => "item"


Wolfgang Nádasi-Donner
Gavin S. (Guest)
on 2007-08-06 11:21
(Received via mailing list)
On Aug 3, 10:31 am, Todd B. <removed_email_address@domain.invalid> wrote:
> I want to parse these names and determine that "item" is the common
> prefix.

I know this has been done to death, but here's my preferred code:

  list = %{
    item001
    item004
    item002
    item002b
    item002C
    item 5
    item 10
    itemize this
  }.strip.map { |line| line.strip }.sort_by { |str| str.length }

  prefix = list.first.dup

  until prefix.empty? or list.all? { |str| str =~ /^#{prefix}/ }
    prefix.chop!
  end

  p prefix


I didn't think of using chop! until someone else posted a solution
with that in it.  It's the first really good use of chop! I've seen.

This solution also demonstrates the lovely #emtpy? and #all? methods.
(I know #all? has been seen before in this thread, but it's such a
handy method.)

Cheers,
Gavin
Gavin S. (Guest)
on 2007-08-06 11:26
(Received via mailing list)
On Aug 3, 5:58 pm, Peña, Botp <removed_email_address@domain.invalid> wrote:
>
> min = list.sort!{|a,b| a.size <=> b.size}.first
> break if list.all?{|a| a =~ /^#{min}/} while min.chop!
>

I just realised that the solution I just posted elsewhere in this
thread is very very similar to this one.

Be careful that using bang methods in chains is sometimes problematic,
as they can return nil if nothing changes (sort! doesn't have this
problem).  For interested readers, find the problem with this code:

  title = "the moon in june"
  title.downcase!.capitalize!

Cheers,
Gavin
Peña, Botp (Guest)
on 2007-08-06 12:42
(Received via mailing list)
From: Gavin S. [mailto:removed_email_address@domain.invalid]
# On Aug 3, 5:58 pm, Peña, Botp <removed_email_address@domain.invalid> wrote:
# > min = list.sort!{|a,b| a.size <=> b.size}.first
# > break if list.all?{|a| a =~ /^#{min}/} while min.chop!
# I just realised that the solution I just posted elsewhere in this
# thread is very very similar to this one.
# Be careful that using bang methods in chains is sometimes problematic,
# as they can return nil if nothing changes (sort! doesn't have this
# problem).

indeed. as long as the array has no nil elements, w're ok w sort!

irb(main):009:0> s="test"
=> "test"
irb(main):010:0> s.downcase!.downcase!
NoMethodError: undefined method `downcase!' for nil:NilClass
        from (irb):10
        from :0
irb(main):011:0> a=[1,1,1]
=> [1, 1, 1]
irb(main):012:0> a.sort!.sort!
=> [1, 1, 1]

i realized just now that i posted the first code.  That should be,

min = list.sort!{|a,b| a.size <=> b.size}.shift  # <--shift instead of
first
break if list.all?{|a| a =~ /^#{min}/} while min.chop!

but anyway, after a couple of tests, the sort does not help much, so

min = list.min{|a,b| a.size <=> b.size}
break if list.all?{|a| a =~ /^#{min}/} while min.chop!

hmm..  Gavin, your until code just gave me a hint. if we reverse logic,
we can save the break and make it more readable too.

min = list.min{|a,b| a.size <=> b.size}
min.chop! until list.all?{|a| a =~ /^#{min}/}

and we can use switch w #any? anytime too :)

min = list.min{|a,b| a.size <=> b.size}
min.chop! while list.any?{|a| a !~ /^#{min}/}

ruby power, indeed.

kind regards -botp
Robert K. (Guest)
on 2007-08-06 15:51
(Received via mailing list)
Attachment: prefix.rb (0 Bytes)
2007/8/6, Peña, Botp <removed_email_address@domain.invalid>:
> indeed. as long as the array has no nil elements, w're ok w sort!
> => [1, 1, 1]
>
> ruby power, indeed.
All these solutions have the issue that they modify the list which
might or might not be ok.  I tend to prefer to view the item list as
read only.  So I'd do:

min = list.min{|a,b| a.size <=> b.size}.dup
min.chop! until list.all?{|a| a =~ /^#{min}/}

I have added some interesting benchmarking:

$ ./prefix.rb
                                    user     system      total
real
1000 * list.rx                  0.016000   0.000000   0.016000 (
0.014000)
1000 * list.rx with conversion  0.031000   0.000000   0.031000 (
0.030000)
1000 * list.one_pass            0.219000   0.000000   0.219000 (
0.218000)
1000 * list.min                 0.328000   0.000000   0.328000 (
0.337000)
1000 * list.first.dup           0.640000   0.000000   0.640000 (
0.641000)
1 * large.rx                    0.000000   0.000000   0.000000 (
0.000000)
1 * large.rx with conversion    0.000000   0.000000   0.000000 (
0.000000)
1 * large.one_pass             25.985000   0.000000  25.985000 (
26.298000)
1 * large.min                  34.953000   0.000000  34.953000 (
35.098000)
1 * large.first.dup            27.719000   0.000000  27.719000 (
27.732000)


(see attachment for code)

Kind regards

robert
Stefan R. (Guest)
on 2007-08-06 19:12
Todd B. wrote:
> I have several strings to analyze.  The strings are the names of a list
> of things that were once ordered and now are not.  A typical array of
> names might look like this:
>
> item001
> item004
> item002
> item002b
> item002C
> item 5
> item 10
> itemize this
>
> I want to parse these names and determine that "item" is the common
> prefix.
>
> I know how to armstrong my way through it a character at a time, but I'm
> thinking that Ruby probably has a very elegant way of doing this that
> I'm not familiar with yet.
>
> Thanks, Todd

Wow, I'm surprised nobody had the idea of simply using Array#abbrev,
which is included in the stdlib.

Regards
Stefan
Todd B. (Guest)
on 2007-08-06 23:12
Stefan R. wrote:
>
> Wow, I'm surprised nobody had the idea of simply using Array#abbrev,
> which is included in the stdlib.
>
> Regards
> Stefan

Well that looks simple enough!   Looks like I would take the shortest
key returned and chop! the last character to get the common prefix.   If
the shortest key length was 1, there is no common prefix.

I had never used Abbrev before - thanks for pointing it out.   This
wasn't a contest, but if it were - I think you won!

Todd
Xavier N. (Guest)
on 2007-08-06 23:27
(Received via mailing list)
El Aug 6, 2007, a las 5:13 PM, Stefan R.
escribió:
> Wow, I'm surprised nobody had the idea of simply using Array#abbrev,
> which is included in the stdlib.

There's no guarantee you can obtain the maximum common prefix from
its output, because it computes _unambiguous_ prefixes:

   %w{ item1 item2 }.abbrev
   => {"item1"=>"item1", "item2"=>"item2"}

See, no "item" there, you end up basically with the information you
had before the call.

-- fxn
Peña, Botp (Guest)
on 2007-08-07 05:17
(Received via mailing list)
From: Robert K. [mailto:removed_email_address@domain.invalid]
# I have added some interesting benchmarking:
#
# $ ./prefix.rb
#                                     user     system       total
real
# 1000 * list.rx                  0.016000   0.000000    0.016000 (
0.014000)
# 1000 * list.rx with conversion  0.031000   0.000000    0.031000 (
0.030000)
# 1 * large.rx                    0.000000   0.000000    0.000000 (
0.000000)
# 1 * large.rx with conversion    0.000000   0.000000    0.000000 (
0.000000)

Robert, thanks for the benchmark. I included Brian|Martin's solution in
the mix but still the regex scheme is way a lot better w regards to
speed.

Could some kind soul enlighten me why the regex solution works better,
and even best using large strings?

kind regards -botp
Logan C. (Guest)
on 2007-08-07 05:48
(Received via mailing list)
On 8/6/07, Xavier N. <removed_email_address@domain.invalid> wrote:
>    => {"item1"=>"item1", "item2"=>"item2"}
>
> See, no "item" there, you end up basically with the information you
> had before the call.



irb(main):027:0>    items = [
irb(main):028:1*     'item001',
irb(main):029:1*     'item004',
irb(main):030:1*     'item002',
irb(main):031:1*     'item002b',
irb(main):032:1*     'item002C',
irb(main):033:1*     'item 5',
irb(main):034:1*     'item 10',
irb(main):035:1*     'itemize this'
irb(main):036:1>   ]
irb(main):039:0> Abbrev.abbrev(items).sort_by { |s,|
s.length}.first.first[0..-
2]
=> "item"
irb(main):040:0> items = ["happy", "happening", "hapless"]
=> ["happy", "happening", "hapless"]
irb(main):041:0> Abbrev.abbrev(items).sort_by { |s,|
s.length}.first.first[0..-
2]
=> "hap"
irb(main):042:0> items = ["c", "b", "a"]
=> ["c", "b", "a"]
irb(main):043:0> Abbrev.abbrev(items).sort_by { |s,|
s.length}.first.first[0..-
2]
=> ""
irb(main):044:0>

I've edited out some false starts. Please feel free to point out any
flaws
that I've missed

Abbrev.abbrev(items).sort_by { |s,| s.length }.first.first[0..-2]
Peña, Botp (Guest)
on 2007-08-07 06:26
(Received via mailing list)
From: Logan C. [mailto:removed_email_address@domain.invalid]
# I've edited out some false starts. Please feel free to point
# out any flaws
# that I've missed
#
# Abbrev.abbrev(items).sort_by { |s,| s.length }.first.first[0..-2]

I cannot prove it wrong, hacker Logan :)

i just put some cosmetics though by getting the extract fr min,

irb(main):099:0> list=[
irb(main):100:1* "item two of those",
irb(main):101:1* "item two",
irb(main):102:1* "item1",
irb(main):103:1* "item22",
irb(main):104:1* "item333",
irb(main):105:1* "item 444",
irb(main):106:1* "item 5555",
irb(main):107:1* "item6a",
irb(main):108:1* "item6b",
irb(main):109:1* "item---x",
irb(main):110:1* "itemy",
irb(main):111:1* "item whatever :)",
irb(main):112:1* ]
=> ["item two of those", "item two", "item1", "item22", "item333", "item
444", "item 5555", "item6a", "item6b", "item---x", "itemy", "item
whatever :)"]
irb(main):113:0>
list.abbrev.min{|(a1,),(b1,)|a1.length<=>b1.length}.first[0..-2]
=> "item"

so,
  list.abbrev.min{|(a1,),(b1,)|a1.length<=>b1.length}.first[0..-2]

your magic number "0..2" is an amazingly simple hack (it looks like 42
:). but i'm not sure how this abbrev scheme stands stands against the
regex solutions though.

kind regards -botp
Robert K. (Guest)
on 2007-08-07 09:16
(Received via mailing list)
On 07.08.2007 03:16, Peña wrote:
> Robert, thanks for the benchmark. I included Brian|Martin's solution in the mix but 
still the regex scheme is way a lot better w regards to speed.
>
> Could some kind soul enlighten me why the regex solution works better, and even best 
using large strings?

My theory is that it's completely done in C and there is only *one*
invocation.  All other solutions need some form of iteration implemented
in Ruby.

Kind regards

  robert
Xavier N. (Guest)
on 2007-08-07 15:11
(Received via mailing list)
On Aug 7, 2007, at 3:47 AM, Logan C. wrote:

>>    => {"item1"=>"item1", "item2"=>"item2"}
> irb(main):031:1*     'item002b',
> => ["happy", "happening", "hapless"]
> irb(main):044:0>
>
> I've edited out some false starts. Please feel free to point out
> any flaws
> that I've missed

If I understand correctly that approach that says: "since I can get
some absolute shortest (perhaps not unique as fas as length is
concerned) unambiguous minimum, belonging to string S, that shortest
minus one is ambiguous". That's the key idea, and I think there's
something into it.

What happens is that, as far as I can see, you can only conclude
there's ambiguity with some other string, which is not the same than
saying that you've got the maximum common prefix. See for example:

   %w{abc acb abd acd}.abbrev.sort_by {|s,| s.length}.first.first[0..-2]
   => "ac"

-- fxn

PS: I think abbrev does too much work compared to more specific
solutions, but it is worth exploring for the sake of thinking
nonetheless.
Peña, Botp (Guest)
on 2007-08-08 06:20
(Received via mailing list)
From: Xavier N. [mailto:removed_email_address@domain.invalid]
#    %w{abc acb abd acd}.abbrev.sort_by {|s,|
s.length}.first.first[0..-2]
#    => "ac"

duh, i should read further on what abrrev really does. Thanks for the
enlightenment.

# PS: I think abbrev does too much work compared to more specific
# solutions, but it is worth exploring for the sake of thinking
nonetheless.

yes. this has been an enlightening thread nonetheless.
kind regards -botp
This topic is locked and can not be replied to.