Matches -> regexp?


#1

Hello,

this is not exactly a ruby problem, i hope people will not consider
this spamming.

does anyone know of any reference or an algorithm to build a regular
expression (a finite automaton, in general) that would match a given
set of strings? i of course mean a regexp that would match just the
given strings, and not any other. so solutions like /^.*$/ are not
acceptable :wink:

thanks
konstantin


#2

ako… schrieb:

thanks
konstantin

nice challenge, reminds me a bit of the 4th rubyquiz :>


#3

ako… schrieb:

does anyone know of any reference or an algorithm to build a regular
expression (a finite automaton, in general) that would match a given
set of strings? i of course mean a regexp that would match just the
given strings, and not any other. so solutions like /^.*$/ are not
acceptable :wink:

This is essentially an information compression problem. There’s
always a trivial solution:

s = %w{find only these strings}
r = Regexp.new(s.map {|w| ‘\A’ + w + ‘\Z’}.join(’|’))

The real problem is finding the shortest possible regexp. :wink:


#4

well, right, that is what i mean.


#5

i suspect that from your trivial solution we can arrive at the final
solution by just performing a DFA minimization. the same thing that lex
and yacc do when they generate code. so i guess i have answered my own
question. thanks for helping me. your trivial solution just made my
brain work.

konstantin


#6

ako… wrote:

i suspect that from your trivial solution we can arrive at the final
solution by just performing a DFA minimization. the same thing that
lex and yacc do when they generate code. so i guess i have answered
my own question. thanks for helping me. your trivial solution just
made my brain work.

Another option is to build up a tree and create the RX from that. I
once
wrote that:

14:12:23 [c]: ( echo foo; echo bar; echo band ) | create-rx.rb
(?:ba(?:nd|r)|foo)

Kind regards

robert