Using a Ruby hash on an array

sclarke · December 17, 2008, 11:28pm

I would like to process some data from an array, using a hash to
count one aspect of the data in the array. The array holds
lines of similarly formatted data, like so:

data A data B data C data D data E data F
data A data B data C data D data E data F
data A data B data C data D data E data F

data A is the identifier of each row. data F contains a lot of text,
so I pull out the part I need with a regular expression and pass the
result to a method, for example:

"#{data F[/Name:\t(.+?)\r\n/, 1]}"

I am stuck writing the hash section of the code. The algorithm is:

if data A == 100
  load data F reg expression into hash
end
count data F to determine the number of each name extracted by the
regular expression
print bob 6, sue 12, tim 1

I hope this makes sense and many thanks in advance

sclarke · December 17, 2008, 11:49pm

I don’t know if I got you right…
If data A is the identifier, then your data structure could be something
like:

hash = { 'data A' => ['data B', 'data C', 'data D', 'data E', 'data F'],
         'data A' => ['data B', 'data C', 'data D', 'data E', 'data F'],
         'data A' => ['data B', 'data C', 'data D', 'data E', 'data F'] }

where you could look up a row by its data A value with hash['data A'],
test for it with hash.has_key?('data A'),
and reach data F with hash['data A'][4].
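
One caveat with the literal above: a Ruby Hash keeps only one value per key, so writing the same key three times leaves a single entry. A minimal sketch, assuming each row has its own identifier (the ids 100, 200, 300 and the names rows and grouped are made up for illustration):

```ruby
# Hypothetical rows keyed by their identifier (data A).
rows = {
  100 => ['data B', 'data C', 'data D', 'data E', 'data F1'],
  200 => ['data B', 'data C', 'data D', 'data E', 'data F2'],
  300 => ['data B', 'data C', 'data D', 'data E', 'data F3']
}

puts rows.key?(100)   # true
puts rows[200][4]     # data F2

# If several rows share an identifier, group them under one key instead.
grouped = Hash.new { |hash, key| hash[key] = [] }
grouped[100] << ['data B', 'data C', 'data D', 'data E', 'first F']
grouped[100] << ['data B', 'data C', 'data D', 'data E', 'second F']
puts grouped[100].size   # 2
```

The block form of Hash.new is used for the grouped variant so that every new key gets its own fresh array.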

sclarke · December 18, 2008, 9:42am

You do not say whether the format is delimited or fixed width.

A framework:

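# One attribute per field in a line.
Data = Struct.new :a, :b, :c, :d, :e, :f

# Build a Data object from one whitespace-delimited line.
def Data.parse(line)
  d = new(*line.strip.split(/\s+/))
  d.a = Integer(d.a)   # the identifier is compared with 100 below
  d
end

# Missing keys count as 0.
count = Hash.new 0

# Read every line from the files named on the command line (or stdin).
ARGF.each do |line|
  data = Data.parse(line)
  count[data.f[/Name:\t(.+?)\r\n/, 1]] += 1 if data.a == 100
end

count.each do |a, cnt|
  printf "%20s %6d\n", a, cnt
end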

Kind regards

robert

sclarke · December 18, 2008, 12:41pm

I am having a difficult time understanding what you are asking, but
perhaps this will help:

lines = <<EOF
100|data B|data C|data D|data E:Name:bob
200|data B|data C|data D|data E:Name:sue
200|data B|data C|data D|data E:Name:tim
200|data B|data C|data D|data E:Name:tim
EOF

names = Hash.new(0)
lines.each_line do |line|
  a, b, c, d, e = line.chomp.split("|")
  names[e[/Name:([a-z]+)/, 1]] += 1 if a == "200"
end

names

=> {"sue"=>1, "tim"=>2}

sclarke · December 18, 2008, 3:57pm

Sorry for not being clear. I am actually parsing data from Windows event
logs, so the data is held in structured fields. Earlier in
my program I load the contents of a number of event logs into an array
and then read the data through structs. Reading the ID number, for
example, is simply

event.event_id

while descriptions are more complicated:

event.description[/Name:\t(.+?)\r\n/, 1]

So what I would like to do is read my event log array and check for
specific event IDs (if event.event_id == 100).

When the if statement finds ID 100, it reads the name from
event.description. Everything mentioned thus far is working correctly.

When this is complete I then want to count how many times each
name occurs, e.g. bob = 2, sue = 12.
My thought was to load the event.description[/name:/] matches into a hash
and then count each name in the hash and print it out.

Does this make more sense?
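
The counting step described in the post above could be sketched like this. It is only an illustration: the Event struct, the sample descriptions, and the variable names are assumptions standing in for the real event log objects, which are not shown in the thread.

```ruby
# Hypothetical stand-in for the event log records described in the post.
Event = Struct.new(:event_id, :description)

events = [
  Event.new(100, "Logon\r\nName:\tbob\r\nDomain:\tX\r\n"),
  Event.new(100, "Logon\r\nName:\tsue\r\n"),
  Event.new(200, "Other\r\nName:\ttim\r\n"),
  Event.new(100, "Logon\r\nName:\tbob\r\n")
]

counts = Hash.new(0)   # missing keys start at 0

events.each do |event|
  next unless event.event_id == 100
  name = event.description[/Name:\t(.+?)\r\n/, 1]
  counts[name] += 1 if name   # skip descriptions without a Name field
end

counts.each { |name, total| puts "#{name} #{total}" }
```

There is no separate "load, then count" pass here: incrementing the hash entry while scanning does both at once.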

sclarke · December 18, 2008, 4:04pm

2008/12/18 Stuart C.:

Does this make more sense??

Did you actually look at my reply? You then would probably not have
sent the same answer three times…

robert

sclarke · December 18, 2008, 8:20pm

Sorry, I was getting around to replying to you but then got called off in
a hurry. I do not understand some of the code you used, as I am relatively
new to Ruby. Could you briefly outline what the code does? It would be
greatly appreciated.

sclarke · December 18, 2008, 11:35pm

I define a class, via Struct, that holds the data you want to parse from
the file. I also define a parse method on that class, which takes
a line (basically a String) and parses it into a new instance. You
will likely have to change both: field names like “a” and “b” are not
very descriptive, and you may need to parse the lines differently.

Then I initialize a Hash with default value 0. This is the value
returned for keys that are not present in the Hash.
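
A small sketch of that default-value behaviour (the keys are made up for illustration):

```ruby
count = Hash.new(0)

# Reading a missing key returns the default without storing the key.
puts count["bob"]        # 0
puts count.key?("bob")   # false

# count["bob"] += 1 expands to count["bob"] = count["bob"] + 1,
# so the first increment starts from the default 0.
count["bob"] += 1
count["bob"] += 1
count["sue"] += 1

puts count["bob"]   # 2
puts count["sue"]   # 1
```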

Then the code reads from all input files (ARGF) named on the command
line (or stdin if there is no name), parses each line and increments the
counter for the entry.

Finally the key/value pairs of the Hash are printed.
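
If the output should be ordered by key, one option is to sort the pairs before printing, since iterating a Hash directly does not sort it. A minimal sketch with made-up counts:

```ruby
count = { "sue" => 12, "tim" => 1, "bob" => 6 }

# Hash#sort returns an array of [key, value] pairs ordered by key.
sorted = count.sort
sorted.each do |name, cnt|
  printf "%20s %6d\n", name, cnt
end

# To order by count instead, highest first:
by_count = count.sort_by { |_name, cnt| -cnt }
```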

Cheers

robert

sclarke · December 19, 2008, 12:00am

Robert K. wrote earlier:
A framework:

Data = Struct.new :a, :b, :c, :d, :e, :f
(the original had an extra : after b; leave it off)

Struct is a shortcut for defining a simple class. Here he uses it to
define a class that he wants to call Data. With this statement we get
a class that has attributes a, b, c, d, e, f, all with getters and
setters, and (at least in 1.8.7, AFAIK) an initialize method that takes
them in order. For example…

MySimpleClass = Struct.new :instance_var
m = MySimpleClass.new "hi"
puts m.instance_var
m.instance_var = "bye"
puts m.instance_var
# hi
# bye

def Data.parse(line)
  d = new(*line.strip.split(/\s+/))
  d.a = Integer(d.a)   # a is compared with 100 later, so it must be an Integer
  d
end

There he is defining a class method called parse on the class Data.
This method returns a Data object filled with the info that was in the
string “line”.
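
To see the class-method pattern in action, here is a minimal sketch. The name Row, the whitespace-delimited sample line, and its contents are assumptions for illustration; the real data format was never specified in the thread.

```ruby
# Row is a hypothetical name used here in place of Data.
Row = Struct.new(:a, :b, :c, :d, :e, :f)

# def Row.parse defines a method on the class itself, so it is called
# as Row.parse(...) rather than on an instance.
def Row.parse(line)
  row = new(*line.strip.split(/\s+/))  # one field per whitespace-separated token
  row.a = Integer(row.a)               # make the identifier comparable with 100
  row
end

row = Row.parse("100 x y z w Name:bob\n")
puts row.a          # 100
puts row.f          # Name:bob
puts row.a == 100   # true
```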

count = Hash.new 0

As explained, that was a Hash initialization with a default value of 0

ARGF.each do |line|
  data = Data.parse(line)
  count[data.f[/Name:\t(.+?)\r\n/, 1]] += 1 if data.a == 100
end

Rack up the count if the conditions are met

count.each do |a, cnt|
  printf "%20s %6d\n", a, cnt
end

Print the darn thing out!

hth,
Todd

sclarke · December 21, 2008, 5:34pm

Thanks for your help Todd. I have gone for a different approach which I
will post should it work.

I was wondering whether you could explain this code:

count.each do |a, cnt|
  printf "%20s %6d\n", a, cnt
end

I have no idea what the %20s … is doing.

Many thanks

sclarke · December 21, 2008, 8:06pm

Stuart C. wrote:

count.each do |a, cnt|
  printf "%20s %6d\n", a, cnt
end

I have no idea what the %20s … is doing.

It’s just formatting/padding. %20s right-justifies the string in a field
20 characters wide, and %6d does the same for the integer in a field 6
characters wide.
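
A quick way to see the padding is to build the string with format, which takes the same directives as printf. A small sketch with made-up values:

```ruby
line = format("%20s %6d", "bob", 6)

puts line
puts line.length   # 27: 20 for the name, 1 space, 6 for the number

# A leading minus left-justifies within the same width.
left = format("%-20s %6d", "bob", 6)
puts left
```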

sclarke · December 22, 2008, 11:29am

I have no idea what the %20s … is doing.

These format specifiers are common throughout the programming world,
e.g. C's printf and so on.

For example:

"%-9s" % "12251"    # => "12251    "
"%9s" % "12251"     # => "    12251"
"%09d" % 12251      # => "000012251"
"%015d" % "123456"  # => "000000000123456"