How to get array.agrep (NOT array.grep)

People,

Is there some way to get agrep working with Ruby arrays? - agrep has
some nice, useful features that grep doesn’t . .

Thanks,

Phil.

Philip R.

Pricom Pty Limited (ACN 003 252 275 ABN 91 003 252 275)
GPO Box 3411
Sydney NSW 2001
Australia
Fax: +61:(0)2-8221-9599
E-mail: [email protected]

Phil R. wrote:

Is there some way to get agrep working with Ruby arrays? - agrep has
some nice, useful features that grep doesn’t . .

Perhaps if you explained what this mysterious ‘agrep’ was, we might
help.
Something from another language? A unix utility?

Give us a sample array, and what you’d like the result to be after
calling this method on that array.

On Sat, 2008-04-26 at 13:15 +0900, Phrogz wrote:

Phil R. wrote:

Is there some way to get agrep working with Ruby arrays? - agrep has
some nice, useful features that grep doesn’t . .

Perhaps if you explained what this mysterious ‘agrep’ was, we might
help.
Something from another language? A unix utility?

Give us a sample array, and what you’d like the result to be after
calling this method on that array.

NAME
agrep - print lines approximately matching a pattern

SYNOPSIS
agrep [OPTION]… PATTERN [FILE]…

DESCRIPTION
Searches for approximate matches of PATTERN in each FILE or standard
input. Example: ‘agrep -2 optimize foo.txt’ outputs all lines in file
‘foo.txt’ that match “optimize” within two errors. E.g. lines which
contain “optimise”, “optmise”, and “opitmize” all match.


Philip R.


NAME
agrep - print lines approximately matching a pattern

Enumerable#grep can do that, if you pass it the right block. When you pass
a block to grep it’s the block’s job to match the elements.

Now the interesting question is: What would that block look like?

mfg, simon … l

On Apr 26, 2008, at 03:35 , Simon K. wrote:

NAME
agrep - print lines approximately matching a pattern

Enumerable#grep can do that, if you pass it the right block. When you
pass a block to grep it’s the block’s job to match the elements.

no.

 enum.grep(pattern)                   => array
 enum.grep(pattern) {| obj | block }  => array

 Returns an array of every element in _enum_ for which +Pattern ===
 element+. If the optional block is supplied, each matching
 element is passed to it, and the block’s result is stored in the
 output array.

The block just morphs the result, it doesn’t morph the match.
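
To make the distinction concrete, here is a small sketch (the sample words and the lambda are made up for illustration, they are not from the thread). The pattern decides what matches via ===, and the block only transforms what has already matched. Because the test is pattern === element, any object that responds to === can serve as the pattern; a lambda works on Ruby 1.9 and later, where Proc#=== calls the proc.

```ruby
words = %w[optimize optimise banana]

# The block only transforms elements that already matched the pattern.
shouted = words.grep(/optimi[sz]e/) { |w| w.upcase }

# Matching itself is pattern === element, so any object that responds
# to === can be the pattern. Proc#=== calls the proc (Ruby 1.9 and later).
short = ->(w) { w.length < 7 }
short_words = words.grep(short)

p shouted      # ["OPTIMIZE", "OPTIMISE"]
p short_words  # ["banana"]
```

So a fuzzy matcher would have to live in the pattern object, not in the block.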

hi phil!

if all you want is getting all the strings within a certain edit
distance of your pattern, have a look at [1]. it doesn’t support
regular expressions in the pattern because i don’t know how to achieve
that easily without re-implementing agrep’s algorithm :wink: it’s
really just a quick hack that might get you started, hopefully.

[1]
http://prometheus.rubyforge.org/ruby-nuggets/classes/Enumerable.html#M000091
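
For readers without the linked code at hand, the general idea of selecting strings within an edit distance can be sketched in plain Ruby. This is not the ruby-nuggets implementation; the names levenshtein and approx_select are hypothetical. Note also that it compares whole strings, whereas agrep looks for an approximate match anywhere inside a line.

```ruby
# Classic dynamic-programming Levenshtein distance, keeping two rows.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      # deletion, insertion, substitution (or match)
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Hypothetical helper: keep the strings within max_distance edits of pattern.
def approx_select(strings, pattern, max_distance)
  strings.select { |s| levenshtein(s, pattern) <= max_distance }
end

words = %w[optimise optmise opitmize banana]
p approx_select(words, 'optimize', 2)  # ["optimise", "optmise", "opitmize"]
```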

cheers
jens

jens,

On Sat, 2008-04-26 at 23:15 +0900, Jens W. wrote:

hi phil!

if all you want is getting all the strings within a certain edit
distance of your pattern, have a look at [1]. it doesn’t support
regular expressions in the pattern because i don’t know how to achieve
that easily without re-implementing agrep’s algorithm :wink: it’s
really just a quick hack that might get you started, hopefully.

[1]
http://prometheus.rubyforge.org/ruby-nuggets/classes/Enumerable.html#M000091

This might work but it would be more difficult without regexes - the
current application does a system call to agrep but of course it is very
slow for large numbers of calls. A typical call is something like:

    agrep -2 "Smith|J.*12345" list1.txt list2.txt list3.txt

This allows two differences on a minimum amount of information
consisting of last name, first initial and zip code. If I use the
Enumerable version, I would have to use the whole, delimited, name &
address string and increase the differences/distance number?

Did you just do that hack now? - how do I get/install it? (Fedora 8).

Thanks,

Phil.


Phil R. [2008-04-26 19:13]:

name & address string and increase the differences/distance
number?

i think something like that could work in your case (requires the
Text gem):

    require 'text'

    File.foreach('list1.txt').select { |line|
      # extract name and zip code from line; skip lines that don't fit
      next unless line =~ /\A(.*?\|.).*\b(\d{5})\b/ # adjust appropriately!

      # name may have two errors, zip only one -- or whatever...
      Text::Levenshtein.distance($1, 'Smith|J') <= 2 &&
        Text::Levenshtein.distance($2, '12345') <= 1
    }
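
To see what the two capture groups would pick up, here is a small check on a made-up record. The record layout (last name|first name|street|city|zip) and the exact pattern are assumptions for illustration; real data will need the pattern adjusted.

```ruby
# Hypothetical record layout: last name|first name|street|city|zip
line = "Smith|John|1 Main St|Springfield|12345\n"

# Group 1: everything up to the first "|" plus one more character
#          (last name, delimiter, first initial).
# Group 2: the last standalone run of exactly five digits on the line.
m = line.match(/\A(.*?\|.).*\b(\d{5})\b/)

p m[1]  # "Smith|J"
p m[2]  # "12345"
```

Those two strings are what would then be compared against 'Smith|J' and '12345' with separate error limits.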

Did you just do that hack now?

that’s right. but i just read a bit on agrep’s algorithm and it
might be fun to implement it in ruby (though a bit slow, probably).
as an alternative, it might be even worth writing ruby bindings to
agrep. who knows, if time permits… :wink:

- how do I get/install it? (Fedora 8).

well, i don’t think that particular implementation suits your needs
and is obviously easily adapted (after all, it’s just a select with
an appropriate block utilizing Text::Levenshtein.distance). but you
can get ruby-nuggets from rubyforge (gem install ruby-nuggets), or,
if the new version hasn’t found its way onto the mirrors yet, from
our own gem server at http://prometheus.khi.uni-koeln.de/rubygems/.

cheers
jens

jens,

On Sun, 2008-04-27 at 02:50 +0900, Jens W. wrote:

the Enumerable version, I would have to use the whole, delimited,
    # name may have two errors, zip only one -- or whatever...
    Text::Levenshtein.distance($1, 'Smith|J') <= 2 &&
      Text::Levenshtein.distance($2, '12345') <= 1
    }

I see what you are doing but this would have to be repeated for the
three different lists (list1.txt, list2.txt, list3.txt) - I guess that
should still be faster than a single system call . .

Did you just do that hack now?
that’s right. but i just read a bit on agrep’s algorithm and it
might be fun to implement it in ruby (though a bit slow, probably).

I don’t know if it helps but there is this:

http://www.koders.com/ruby/fidCEAEDCAA28D4A59A76ADF20A0DA2A3858438834D.aspx

as an alternative, it might be even worth writing ruby bindings to
agrep. who knows, if time permits… :wink:

I was wondering about something like that but I have never created a
Ruby binding before . .

- how do I get/install it? (Fedora 8).

well, i don’t think that particular implementation suits your needs
and is obviously easily adapted (after all, it’s just a select with
an appropriate block utilizing Text::Levenshtein.distance). but you
can get ruby-nuggets from rubyforge (gem install ruby-nuggets), or,
if the new version hasn’t found its way onto the mirrors yet, from
our own gem server at http://prometheus.khi.uni-koeln.de/rubygems/.

Thanks!

Phil.


Jens W. [2008-04-26 22:45]:

maybe i’ll be able to come up with something that wraps flori’s
Amatch into (Enumerable|File)#agrep.

that was actually pretty easy and is definitely an improvement (see
ruby-nuggets v0.1.9), but it still won’t give us support for regular
expression patterns :frowning:

i also added IO::agrep, so you would now be able to do:

    %w[list1.txt list2.txt list3.txt].inject([]) { |matches, file|
      matches + File.agrep(file, /Smith\|J.*12345/, 2)
    }

-- if only you had regular expressions at your disposal!

cheers
jens

jens,

On Sun, 2008-04-27 at 07:03 +0900, Jens W. wrote:

    matches + File.agrep(file, /Smith\|J.*12345/, 2)
    }

– if only you had regular expressions at your disposal!

Yes, that would be nice! . . I guess it will be there sometime.

Thanks for looking at this!

Regards,

Phil.


Phil R. [2008-04-26 22:26]:

I see what you are doing but this would have to be repeated for
the three different lists (list1.txt, list2.txt, list3.txt)

well, yeah. but that’s not really a problem, is it?

    %w[list1.txt list2.txt list3.txt].inject([]) { |matches, file|
      matches + File.open(file).select { |line|
        # ...same as before...
      }
    }
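
The same multi-file idea can also be written with flat_map and File.foreach, which closes each file once it has been read. The sketch below is only an illustration: select_lines is a hypothetical helper name, the temporary files stand in for list1.txt and friends, and the start_with? test is a placeholder for the real distance check.

```ruby
require 'tempfile'

# Hypothetical helper: collect every line from the given files for which
# the block returns true. File.foreach closes each file when it is done.
def select_lines(paths)
  paths.flat_map { |path| File.foreach(path).select { |line| yield line } }
end

# Two throwaway files standing in for the real lists.
files = [%w[Smith Jones], %w[Smyth Brown]].map do |names|
  f = Tempfile.new('list')
  f.puts(names)
  f.close
  f
end

matches = select_lines(files.map(&:path)) { |line| line.start_with?('Sm') }
p matches  # ["Smith\n", "Smyth\n"]

files.each(&:unlink)
```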

I don’t know if it helps but there is this:

http://www.koders.com/ruby/fidCEAEDCAA28D4A59A76ADF20A0DA2A3858438834D.aspx
=> http://amatch.rubyforge.org

silly me!! totally forgot about that one :wink: thanks for the reminder!

maybe i’ll be able to come up with something that wraps flori’s
Amatch into (Enumerable|File)#agrep.

I was wondering about something like that but I have never
created a Ruby binding before . .

neither have i. but that shouldn’t stop us, right? :wink:

cheers
jens
