People,
Is there some way to get agrep working with Ruby arrays? - agrep has
some nice, useful features that grep doesn’t . .
Thanks,
Phil.
Philip R.
Pricom Pty Limited (ACN 003 252 275 ABN 91 003 252 275)
GPO Box 3411
Sydney NSW 2001
Australia
Fax: +61:(0)2-8221-9599
E-mail: [email protected]
Phil R. wrote:
Is there some way to get agrep working with Ruby arrays? - agrep has
some nice, useful features that grep doesn’t . .
Perhaps if you explained what this mysterious ‘agrep’ was, we might
help.
Something from another language? A unix utility?
Give us a sample array, and what you’d like the result to be after
calling this method on that array.
On Sat, 2008-04-26 at 13:15 +0900, Phrogz wrote:
Phil R. wrote:
Is there some way to get agrep working with Ruby arrays? - agrep has
some nice, useful features that grep doesn’t . .
Perhaps if you explained what this mysterious ‘agrep’ was, we might
help.
Something from another language? A unix utility?
Give us a sample array, and what you’d like the result to be after
calling this method on that array.
NAME
agrep - print lines approximately matching a pattern
SYNOPSIS
agrep [OPTION]… PATTERN [FILE]…
DESCRIPTION
Searches for approximate matches of PATTERN in each FILE or standard
input. Example: ‘agrep -2 optimize foo.txt’ outputs all lines in file
‘foo.txt’ that match “optimize” within two errors. E.g. lines which
contain “optimise”, “optmise”, and “opitmize” all match.
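To make "match within two errors" concrete for a Ruby array, here is a minimal sketch in plain Ruby. It is not agrep's actual algorithm and it does not handle regular expressions: it uses a simple dynamic-programming variant of edit distance in which a match may start anywhere in the line. The method names min_substring_distance and approx_grep are made up for this illustration.

```ruby
# Smallest edit distance between `pattern` and any substring of `text`.
# Illustrative only; agrep itself uses a different, faster algorithm.
def min_substring_distance(pattern, text)
  # prev[i] = cost of matching the first i pattern characters against a
  # substring of the text that ends at the previous text position.
  prev = (0..pattern.length).to_a
  best = prev.last
  text.each_char do |tc|
    curr = [0] # a match may start at any text position, so row 0 is free
    pattern.each_char.with_index(1) do |pc, i|
      cost = pc == tc ? 0 : 1
      curr << [prev[i - 1] + cost, prev[i] + 1, curr[i - 1] + 1].min
    end
    best = [best, curr.last].min
    prev = curr
  end
  best
end

# Hypothetical helper: keep the lines that contain the pattern within
# at most `max_errors` edits (insertions, deletions, substitutions).
def approx_grep(lines, pattern, max_errors)
  lines.select { |line| min_substring_distance(pattern, line) <= max_errors }
end

sample = ["we optimise loops", "please optmise this", "nothing relevant here"]
puts approx_grep(sample, "optimize", 2)
```

Each line costs time proportional to pattern length times line length, so this is only a starting point for small inputs.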
NAME
agrep - print lines approximately matching a pattern
Enumerable#grep can do that, if you pass it the right block. When you pass
a block to grep it’s the block’s job to match the elements.
Now the interesting question is: What would that block look like?
mfg, simon … l
On Apr 26, 2008, at 03:35 , Simon K. wrote:
NAME
agrep - print lines approximately matching a pattern
Enumerable#grep can do that, if you pass it the right block. When you
pass a block to grep it’s the block’s job to match the elements.
no.
enum.grep(pattern) => array
enum.grep(pattern) {| obj | block } => array
Returns an array of every element in _enum_ for which
+Pattern === element+. If the optional block is supplied, each
matching element is passed to it, and the block’s result is stored
in the output array.
The block just morphs the result, it doesn’t morph the match.
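A small illustration of that point, using made-up sample names. The last part shows that matching is decided by the pattern's === method, so any object that defines === can act as the pattern; the `short` object is purely hypothetical.

```ruby
names = %w[Smith Smyth Jones Smithers]

# Without a block, grep keeps every element for which pattern === element.
plain = names.grep(/\ASm/)

# With a block, the same elements are selected, and each one is then
# replaced by the block's return value.
upcased = names.grep(/\ASm/) { |name| name.upcase }

# Custom matching belongs in the pattern object, not in the block:
# grep only ever asks the pattern whether pattern === element.
short = Object.new
def short.===(str)
  str.length <= 5
end
short_names = names.grep(short)

p plain, upcased, short_names
```

So an approximate matcher could in principle be wrapped in an object with a suitable === and handed to grep.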
hi phil!
if all you want is getting all the strings within a certain edit
distance of your pattern, have a look at [1]. it doesn’t support
regular expressions in the pattern because i don’t know how to achieve
that easily without re-implementing agrep’s algorithm. it’s
really just a quick hack that might get you started, hopefully.
[1]
http://prometheus.rubyforge.org/ruby-nuggets/classes/Enumerable.html#M000091
cheers
jens
jens,
On Sat, 2008-04-26 at 23:15 +0900, Jens W. wrote:
hi phil!
if all you want is getting all the strings within a certain edit
distance of your pattern, have a look at [1]. it doesn’t support
regular expressions in the pattern because i don’t know how to achieve
that easily without re-implementing agrep’s algorithm. it’s
really just a quick hack that might get you started, hopefully.
[1]
http://prometheus.rubyforge.org/ruby-nuggets/classes/Enumerable.html#M000091
This might work but it would be more difficult without regexes - the
current application does a system call to agrep but of course it is very
slow for large numbers of calls. A typical call is something like:
agrep -2 "Smith|J.*12345" list1.txt list2.txt list3.txt
This allows two differences on a minimum amount of information
consisting of last name, first initial and zip code. If I use the
Enumerable version, I would have to use the whole, delimited, name &
address string and increase the differences/distance number?
Did you just do that hack now? - how do I get/install it? (Fedora 8).
Thanks,
Phil.
Phil R. [2008-04-26 19:13]:
name & address string and increase the differences/distance
number?
i think something like that could work in your case (requires the
Text gem):
require 'text'  # from the Text gem

File.open('list1.txt').select { |line|
  # extract name and zip code from line; skip lines that don't fit
  line =~ /\A(.*?\|.).*\b(\d{5})\b/ or next  # adjust appropriately!
  # name may have two errors, zip only one -- or whatever...
  Text::Levenshtein.distance($1, 'Smith|J') <= 2 &&
    Text::Levenshtein.distance($2, '12345') <= 1
}
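For anyone who wants to try the idea without installing a gem, here is a self-contained sketch of the same field-by-field check. The levenshtein helper is a plain-Ruby stand-in for Text::Levenshtein.distance, and the record layout (last name|first name|street|city|zip) and sample data are assumptions made up for the example, not Phil's real format.

```ruby
# Plain Levenshtein distance between two whole strings
# (stand-in for Text::Levenshtein.distance).
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ac, i|
    curr = [i]
    b.each_char.with_index(1) do |bc, j|
      cost = ac == bc ? 0 : 1
      curr << [prev[j - 1] + cost, prev[j] + 1, curr[j - 1] + 1].min
    end
    prev = curr
  end
  prev.last
end

# Hypothetical records: last name|first name|street|city|zip
lines = [
  "Smith|John|1 High St|Springfield|12345",
  "Smyth|Jon|2 Low Rd|Springfield|12346",
  "Jones|Mary|3 Main St|Shelbyville|54321",
]

matches = lines.select do |line|
  # Capture "last name|first initial" and the last 5-digit word.
  m = line.match(/\A(.*?\|.).*\b(\d{5})\b/)
  next false unless m
  # Name part may have two errors, zip only one.
  levenshtein(m[1], "Smith|J") <= 2 && levenshtein(m[2], "12345") <= 1
end

puts matches
```

Checking the fields separately lets each one have its own error budget, which a single agrep call over the whole line cannot express.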
Did you just do that hack now?
that’s right. but i just read a bit on agrep’s algorithm and it
might be fun to implement it in ruby (though a bit slow, probably).
as an alternative, it might be even worth writing ruby bindings to
agrep. who knows, if time permits… 
- how do I get/install it? (Fedora 8).
well, i don’t think that particular implementation suits your needs
and is obviously easily adapted (after all, it’s just a select with
an appropriate block utilizing Text::Levenshtein.distance). but you
can get ruby-nuggets from rubyforge (gem install ruby-nuggets), or,
if the new version hasn’t found its way onto the mirrors yet, from
our own gem server at http://prometheus.khi.uni-koeln.de/rubygems/.
cheers
jens
jens,
On Sun, 2008-04-27 at 02:50 +0900, Jens W. wrote:
the Enumerable version, I would have to use the whole, delimited,
  # name may have two errors, zip only one -- or whatever...
  Text::Levenshtein.distance($1, 'Smith|J') <= 2 &&
    Text::Levenshtein.distance($2, '12345') <= 1
}
I see what you are doing but this would have to be repeated for the
three different lists (list1.txt, list2.txt, list3.txt) - I guess that
should still be faster than a single system call . .
Did you just do that hack now?
that’s right. but i just read a bit on agrep’s algorithm and it
might be fun to implement it in ruby (though a bit slow, probably).
I don’t know if it helps but there is this:
http://www.koders.com/ruby/fidCEAEDCAA28D4A59A76ADF20A0DA2A3858438834D.aspx
as an alternative, it might be even worth writing ruby bindings to
agrep. who knows, if time permits… 
I was wondering about something like that but I have never created a
Ruby binding before . .
- how do I get/install it? (Fedora 8).
well, i don’t think that particular implementation suits your needs
and is obviously easily adapted (after all, it’s just a select with
an appropriate block utilizing Text::Levenshtein.distance). but you
can get ruby-nuggets from rubyforge (gem install ruby-nuggets), or,
if the new version hasn’t found its way onto the mirrors yet, from
our own gem server at http://prometheus.khi.uni-koeln.de/rubygems/.
Thanks!
Phil.
Jens W. [2008-04-26 22:45]:
maybe i’ll be able to come up with something that wraps flori’s
Amatch into (Enumerable|File)#agrep.
that was actually pretty easy and is definitely an improvement (see
ruby-nuggets v0.1.9), but it still won’t give us support for regular
expression patterns 
i also added IO::agrep, so you would now be able to do:
%w[list1.txt list2.txt list3.txt].inject([]) { |matches, file|
  matches + File.agrep(file, /Smith\|J.*12345/, 2)
}
– if only you had regular expressions at your disposal!
cheers
jens
jens,
On Sun, 2008-04-27 at 07:03 +0900, Jens W. wrote:
matches + File.agrep(file, /Smith\|J.*12345/, 2)
}
– if only you had regular expressions at your disposal!
Yes, that would be nice! . . I guess it will be there sometime.
Thanks for looking at this!
Regards,
Phil.
Phil R. [2008-04-26 22:26]:
I see what you are doing but this would have to be repeated for
the three different lists (list1.txt, list2.txt, list3.txt)
well, yeah. but that’s not really a problem, is it?
%w[list1.txt list2.txt list3.txt].inject([]) { |matches, file|
  matches + File.open(file).select { |line|
    # …same as before…
  }
}
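The same pattern can be written a little more compactly on a newer Ruby. This is only a sketch: select_lines is a made-up helper name, and flat_map needs Ruby 1.9.2 or later, so it would not have been available on a stock Ruby 1.8 install.

```ruby
# Collect, across several files, every line for which the block is true.
def select_lines(paths)
  paths.flat_map do |path|
    # File.foreach closes the file once iteration finishes, unlike a
    # bare File.open(path) without a block.
    File.foreach(path).select { |line| yield line }
  end
end
```

It would be called as select_lines(%w[list1.txt list2.txt list3.txt]) { |line| ... } with the same block body as before.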
I don’t know if it helps but there is this:
http://www.koders.com/ruby/fidCEAEDCAA28D4A59A76ADF20A0DA2A3858438834D.aspx
=> http://amatch.rubyforge.org
silly me!! totally forgot about that one
thanks for the reminder!
maybe i’ll be able to come up with something that wraps flori’s
Amatch into (Enumerable|File)#agrep.
I was wondering about something like that but I have never
created a Ruby binding before . .
neither have i. but that shouldn’t stop us, right? 
cheers
jens