Help on best way to gather/sort results [Array/Hash]?

Greetings ruby fans,

I’m a greenhorn at this cool language, Ruby. Much to learn. Perhaps you
chaps could help me with an issue I have. I’ve read through a number of
the posts on sorting Arrays and Hashes, and yet I can’t seem to put my
finger on the solution. I want to sort on the second column. So it
seemed, from what information I gathered, that I need to gather my
results into a hash. Am I on the right track? Oh, let me tell you what
you’re looking at here: I am scanning each mail file in our queue for
commonalities (spammers) instead of using the useless (in my opinion)
qmHandle we have for qmail. So, I’ve got a working prototype. If you could
help me with my sort, and if you have any other comments/suggestions to
throw my way, I’m sure I could learn a thing or two. Being new to Ruby,
there’s a lot of new ideas here. Thanks, guys.

Code:
#!/usr/local/bin/ruby -w
require 'find'

@results = Array.new

# Iterate through the child directories & call the parse file method

def scan_dirs
  root = "/var/qmail/queue/mess"
  Find.find(root) do |file|
    next unless File.file?(file) # Find.find also yields directories; skip them
    parse_file(file)
  end
  @results.sort!
  print_results
end

# Parse each file for the information we want

def parse_file(path)
  file = path[(path.length-7), path.length]
  sourceip = ""
  email = ""
  subject = ""
  line_no = 0

  # File.foreach closes the file when the block exits, even on an early return
  File.foreach(path) do |line|

    line = line.strip # Remove any \n\r, etc
    line_no += 1

    if line_no == 1
      if line.match("invoked for bounce")
        # Internal Bounce Msg
        sourceip = "SMTP"
      end
    end

    if (line_no == 2 and sourceip.empty?)
      if line.match("webmail.commspeed.net")
        sourceip = "Webmail"
      else
        # String#[] returns the first match (or nil), so sourceip stays a String
        sourceip = line[/\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/].to_s
        if sourceip.empty?
          sourceip = "No Source IP**"
        end
      end
    end

    if (line.match("SquirrelMail") and sourceip == "Webmail") or
       (line.match("From:") and sourceip != "Webmail")
      if email.empty?
        email = get_email(line)
      end
    end

    if line.match("Subject:") and subject.empty?
      subject = truncate(line, 50)
    end

    if line_no == 20 # Nothing more we want to read in the file
      @results << ["#{file}", "#{sourceip}", "#{email}", "#{subject}"]
      line_no = 0
      return
    end

  end
end

# Truncate subject line

def truncate(string, width)
  if string.length <= width
    string
  else
    string[0, width-3] + "..."
  end
end

# Print out results

def print_results
  print "\e[2J\e[f" # Clear the screen and move the cursor home

  print "Mess#".ljust(10, " ")
  print "Source".ljust(18, " ")
  print "Email Address".ljust(30, " ")
  print "Subject".ljust(50, " ")
  print "\n"
  print "-" * 111
  print "\n"

  @results.each do |line|
    print line[0].ljust(10, " ")
    print line[1].ljust(18, " ")
    print line[2].ljust(30, " ")
    print line[3].ljust(50, " ")
    print "\n"
  end
end

# Get email address from line/string

def get_email(line_to_parse)
  # Pull the first email address from the line (empty string if none)
  line_to_parse[/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i].to_s
end

# Ok, begin our scan

scan_dirs
exit

Partial results listing: (I’ve modified the content to protect privacy)
Mess#     Source            Email Address                 Subject

3360108   111.111.17.1      [email protected]
3360167   111.111.7.213     [email protected]             Subject: Removed to protect the innocent…
3360186   Webmail           [email protected]             Subject: Removed to protect the innocent
3360209   111.111.40.10     [email protected]
3360215   111.111.15.110    [email protected]             Subject: Removed to protect the innocent
3360217   111.111.9.248     [email protected]             Subject: Removed to protect the innocent
3360226   111.111.11.43     [email protected]             Subject: Removed to protect the innocent
3360228   111.111.16.34     [email protected]             Subject: Pictures
3360241   111.111.18.73     [email protected]             Subject: Removed to protect the innocent
3360242   111.111.14.109    [email protected]             Subject: Emailing: maps.htm
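
On the Array-versus-Hash question in the post above, here is a rough sketch rather than a definitive answer. It assumes rows shaped like the ones parse_file builds ([file, sourceip, email, subject]); the helper names and the sample addresses are made up for illustration. An Array of rows can be sorted on any column directly, while a Hash is mostly useful for counting or grouping rows per source:

```ruby
# Hypothetical helpers; rows use the [file, sourceip, email, subject] shape.

# Sort an Array of rows on the second column (index 1). Returns a new Array.
def sort_by_source(rows)
  rows.sort_by { |row| row[1] }
end

# Count rows per source with a Hash, then list the busiest sources first.
# Returns an Array of [source, count] pairs.
def count_by_source(rows)
  counts = Hash.new(0)
  rows.each { |row| counts[row[1]] += 1 }
  counts.sort_by { |_source, count| -count }
end

rows = [
  ["3360108", "111.111.17.1", "a@example.com", "Subject: one"],
  ["3360186", "Webmail",      "b@example.com", "Subject: two"],
  ["3360209", "111.111.17.1", "c@example.com", "Subject: three"],
]

sort_by_source(rows).each { |row| puts row.join("  ") }
count_by_source(rows).each { |source, count| puts "#{count}  #{source}" }
```

The counting variant is probably closer to the stated goal of spotting a spammer, since it shows which source appears most often in the queue.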

On Sat, Mar 29, 2008 at 12:02 AM, Tony De [email protected] wrote:

have for qmail. So, I’ve got a working prototype. If you could help me
on my sort and if you have any other comments/suggestions to throw my
way I’m sure I could learn a thing or two. Being new to ruby, there’s a
lot of new ideas here. Thanks, guys.

Being a little lazy at the moment to go through the code, have you
looked at #sort_by{}?

Todd

I believe I ran across it, but I recall it blew up when I worked with it.
That was likely more an issue with me as an unskilled Ruby coder than with
the method itself. I’ll take another look. Thanks.

tonyd

Wouldn’t you want something like @results.sort {|x, y| y[1] <=> x[1]}?

On Sat, Mar 29, 2008 at 3:49 PM, Tony De [email protected] wrote:

Ok, tried this @results.sort_by { |a| a[1] } - thinking that I want to
sort on the second element in my array “Source”. No sort was performed
at all. Scratching my head…

tonyd

“Every child has many wishes. Some include a wallet, two chicks and a
cigar,
but that’s another story.”

Ok, tried this @results.sort_by { |a| a[1] } - thinking that I want to
sort on the second element in my array “Source”. No sort was performed
at all. Scratching my head…

tonyd

Or perhaps what you’re really wanting, is something like
@results.sort_by{|a| @results[1]}?

Christian wrote:

Wouldn’t you want something like @results.sort {|x, y| y[1] <=> x[1]}?

Just tried that. No sort. Just for reference, my array structure looks
like this:
@results << ["#{file}", "#{sourceip}", "#{email}", "#{subject}"]

I want to sort on sourceip. Thanks guys.

Does the sort work if you just put in the sourceip? Say, you did
@results << "#{sourceip}", and then used @results.sort. Does it still not sort?

On Fri, Mar 28, 2008 at 10:49 PM, Tony De [email protected] wrote:

Ok, tried this @results.sort_by { |a| a[1] } - thinking that I want to
sort on the second element in my array “Source”. No sort was performed
at all. Scratching my head…

Did you capture the result? #sort and #sort_by are non-destructive, so
this:


a = [[0, 3], [4, 1], [5, 2]]
a.sort_by {|i| i[1]}
puts a.inspect

gives: "[[0, 3], [4, 1], [5, 2]]"

but this:

a = [[0, 3], [4, 1], [5, 2]]
b = a.sort_by {|i| i[1]}
puts b.inspect

gives: "[[4, 1], [5, 2], [0, 3]]"

There is a destructive version of #sort (#sort!), but not of #sort_by
(true for Ruby 1.8; Array#sort_by! was added in Ruby 1.9.2).
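
To illustrate the point above, here is a minimal sketch using the same sample array. The helper names sorted_copy and sort_in_place are made up for this example. The last part uses Array#sort_by!, which exists only in Ruby 1.9.2 and later, so it is guarded with respond_to?:

```ruby
# Hypothetical helpers, named only for this illustration.

# Returns a NEW array sorted on the second column; rows is left untouched.
def sorted_copy(rows)
  rows.sort_by { |row| row[1] }
end

# Reorders rows itself (destructive) and returns the same array.
def sort_in_place(rows)
  rows.sort! { |x, y| x[1] <=> y[1] }
end

a = [[0, 3], [4, 1], [5, 2]]

b = sorted_copy(a)
p a   # => [[0, 3], [4, 1], [5, 2]]  (unchanged)
p b   # => [[4, 1], [5, 2], [0, 3]]

sort_in_place(a)
p a   # => [[4, 1], [5, 2], [0, 3]]  (a itself was reordered)

# Ruby 1.9.2+ only: in-place variant of sort_by.
c = [[0, 3], [4, 1], [5, 2]]
c.sort_by! { |row| row[1] } if c.respond_to?(:sort_by!)
p c
```

In the original script, either `@results.sort! { ... }` or `@results = @results.sort_by { ... }` would keep the sorted order around for print_results.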

Yeah, if I only collect the "sourceip" and do a @results.sort it doesn’t
work. I have to do a .sort!. And I get the same behaviour with @results <<
["#{file}", "#{sourceip}", "#{email}", "#{subject}"]. @results.sort
does not sort. @results.sort! does. And I tried sort! as an
afterthought.

tonyd

Hi –

On Sat, 29 Mar 2008, Tony De wrote:

why? I would really like to understand this a little more. Thanks
guys, all of you, for your help.

Just to be clear: they both work. They’re different methods, though,
and they do different things. sort! stores its results back in the
original object; sort returns the results in a new object.

The significance of the ! at the end is that the method is considered
to be the “dangerous” version of the non-! method of the same name.
(This is the conventional, intended meaning of !, although it has no
language-level significance to the interpreter.) You can think of the
“danger”, in this case, as consisting of the fact that your original
object will be altered. The ! is a kind of “heads up!” sign.

David
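
A small sketch of the sort / sort! difference described above, using throwaway arrays. The last lines use Array#uniq!, which returns nil when it has nothing to remove, as a reminder that not every bang method hands back the receiver:

```ruby
original = [3, 1, 2]

copy = original.sort           # new Array; original keeps its order
p original                     # => [3, 1, 2]
p copy                         # => [1, 2, 3]
p copy.equal?(original)        # => false (two different objects)

same = original.sort!          # sorts original itself
p original                     # => [1, 2, 3]
p same.equal?(original)        # => true (sort! returns the receiver)

# Some bang methods return nil when they changed nothing, so it is
# safer not to chain calls on their return value.
p [1, 2, 3].uniq!              # => nil
p [1, 1, 2].uniq!              # => [1, 2]
```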

Oooooooo! Well, that makes perfect sense. Thanks! You know, sometimes
you read your handy pickaxe or a blog somewhere, but it slides right
past you. I appreciate the clarification.

tonyd

“As an afterthought” got me thinking. I tried @results.sort! {|x, y|
y[1] <=> x[1]}, and it works. .sort fails, .sort! works. Any ideas
why? I would really like to understand this a little more. Thanks
guys, all of you, for your help.

tonyd

On Sat, Mar 29, 2008 at 6:02 AM, Tony De [email protected] wrote:

I’ve read through a number of
the posts on sorting Arrays and Hashes. And yet I can’t seem to put my
finger on the solution. I want to sort on the second column.

sort_by is your friend. This is an example:

irb(main):002:0> result = [[1,2,3],[4,5,6],[1,3,7],[3,2,1]]
=> [[1, 2, 3], [4, 5, 6], [1, 3, 7], [3, 2, 1]]
irb(main):003:0> result.sort_by{|a| a[1]}
=> [[1, 2, 3], [3, 2, 1], [1, 3, 7], [4, 5, 6]]

Regards,

Jesus.

On Sat, Mar 29, 2008 at 1:55 AM, Tony De [email protected] wrote:

Oooooooo! Well, that makes perfect sense. Thanks! You know, sometimes
you read your handy pickaxe or a blog somewhere, but it slides right
past you. I appreciate the clarification.

tonyd

They all work. I use #sort_by all the time for legibility, and I
don’t care that much about speed for the stuff I work on. According
to the docs, sort_by is fairly expensive when the sort keys are simple
and the collection is large (that might be your case). It does, however,
perform better when computing the keys for the comparison is costly,
for example when it involves creating objects.

Todd

Thanks Jesus & Todd for your posts also. I appreciate the education.
Forums are great for getting real-world experience on language usage and
gotchas. So I do have another question on my sort. I realize that in
addition to the sort on the second element in each row of my array
(sourceip), I would also like to then sort on the third element (email).
So my current sort is:

@results.sort! {|x, y| y[1] <=> x[1]}

So this now sorts first by the second element (x[1]) and then by the third (x[2]):
new_results = @results.sort_by { |x| [x[1], x[2]] }

There are so many ways to accomplish the same result. That doesn’t
mean, however, it’s the most efficient. Would there be a more efficient
way to do this? Not that this script is costing me a great deal in
resources. But it’s nice to code tight when possible. Thanks again.

tonyd
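
A short sketch of the two sorts from the post above, side by side. The helper names and sample rows are hypothetical; the rows only mimic the [file, sourceip, email, subject] shape used in the thread. Note that `y[1] <=> x[1]` sorts in descending order, while the sort_by version sorts ascending, and that both compare the source column as plain strings, so IP addresses are ordered character by character rather than numerically:

```ruby
# Hypothetical helpers; rows use the thread's [file, sourceip, email, subject] shape.

# Ascending by source, then by email. Array#<=> compares element by element,
# so the second key only matters when the first keys are equal.
def sort_by_source_then_email(rows)
  rows.sort_by { |row| [row[1], row[2]] }
end

# Descending by source: swapping the operands of <=> reverses the order.
def sort_by_source_descending(rows)
  rows.sort { |x, y| y[1] <=> x[1] }
end

rows = [
  ["3360186", "Webmail",       "carol@example.com", "Subject: c"],
  ["3360167", "111.111.7.213", "bob@example.com",   "Subject: b"],
  ["3360108", "111.111.7.213", "alice@example.com", "Subject: a"],
]

sort_by_source_then_email(rows).each { |row| puts row.join("  ") }
# Prints 3360108 (alice) first, then 3360167 (bob), then 3360186 (Webmail).
```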

On Sat, Mar 29, 2008 at 8:45 PM, Tony De [email protected] wrote:

There are so many ways to accomplish the same result. That doesn’t
mean, however, it’s the most efficient. Would there be a more efficient
way to do this? Not that this script is costing me a great deal in
resources. But it’s nice to code tight when possible. Thanks again.

tonyd

On my machine…

a = [[3, 2, 1], [4, 5, 6], [1, 5, 7], [1, 2, 3]]

t = Time.now
10_000.times do
a.sort_by {|x| [x[1], x[2]]}
end
puts Time.now - t

t = Time.now
10_000.times do
a.sort {|x,y| [x[1], x[2]] <=> [y[1], y[2]]}
end
puts Time.now - t

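# Note: t is not reset before this loop, so the time printed below
# also includes the #sort loop above
10_000.times do
a.sort! {|x,y| [x[1], x[2]] <=> [y[1], y[2]]}
end
puts Time.now - t

=> 0.25 #sort_by
=> 0.453 #sort
=> 0.859 #sort! (cumulative: includes the 0.453 from the #sort loop)

This may be due to the creation of additional Array objects within the
block.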

Just a guess.

Todd
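
For anyone repeating the timing above, here is a sketch that times each variant separately. The time_it helper is hypothetical (not part of Ruby), and Process.clock_gettime requires Ruby 2.1 or later, so this would not run on the 1.8-era interpreters discussed in the thread. No timings are shown because they depend on the machine and Ruby version:

```ruby
# Hypothetical helper: runs the block `iterations` times, prints the elapsed
# wall-clock time, and returns it in seconds.
def time_it(label, iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  puts format("%-8s %.3f s", label, elapsed)
  elapsed
end

a = [[3, 2, 1], [4, 5, 6], [1, 5, 7], [1, 2, 3]]
n = 10_000

time_it("sort_by", n) { a.sort_by { |x| [x[1], x[2]] } }
time_it("sort", n)    { a.sort { |x, y| [x[1], x[2]] <=> [y[1], y[2]] } }
# dup first so every iteration sorts unsorted input, as the other two do;
# this adds the cost of one small copy per iteration.
time_it("sort!", n)   { a.dup.sort! { |x, y| [x[1], x[2]] <=> [y[1], y[2]] } }
```

Each call starts its own timer, so no figure includes the time of a previous loop.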

On Sun, Mar 30, 2008 at 3:47 PM, Todd B. [email protected]
wrote:

a = [[3, 2, 1], [4, 5, 6], [1, 5, 7], [1, 2, 3]]
a.sort {|x,y| [x[1], x[2]] <=> [y[1], y[2]]}
=> 0.859 #sort!

This may be due to the creation of additional Array objects within the block.

The difference between sort and sort_by is that sort calls the block
every time it needs to make a comparison between two elements, passing
both elements. sort_by, on the other hand, calls the block once for each
element in the array, and calculates and records the sort value for each
element. Then it runs the sorting algorithm against those values.

So which one is more efficient depends on the length of the array (well,
the number of comparisons made by the sorting algorithm) and the cost of
calculating the sort value.

Jesus.
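
The explanation above can be checked by counting how often each block runs. This is only a sketch: count_block_calls is a hypothetical helper, and the exact number of comparisons made by sort depends on the input and on the Ruby implementation, so only the sort_by count is predictable:

```ruby
# Hypothetical helper: counts how many times each block is invoked
# while sorting the same rows on their second column.
def count_block_calls(rows)
  sort_calls = 0
  sort_by_calls = 0

  rows.sort    { |x, y| sort_calls += 1; x[1] <=> y[1] }  # once per comparison
  rows.sort_by { |x| sort_by_calls += 1; x[1] }           # once per element

  { :sort => sort_calls, :sort_by => sort_by_calls }
end

rows = Array.new(1000) { |i| [i, rand(10_000)] }
counts = count_block_calls(rows)

puts "sort block calls:    #{counts[:sort]}"
puts "sort_by block calls: #{counts[:sort_by]}"   # always rows.size, here 1000
```

With a cheap key such as x[1], the extra bookkeeping of sort_by may outweigh the saved block calls; with an expensive key, computing it once per element tends to win.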
