How to do this complicated logic in ruby


#1

Dear all

I have an array of around 1000 records, and I want to check and
correct some of the data in it.

For instance, the first record of this array is a hash, as follows:
my_array[0] = {"server"=>"AHN", "hosp"=>"AHN", "loc"=>"PC1",
               "pspec"=>"ANA", "number"=>"1", "pcat"=>"1"}

server  hosp  loc  pspec  pcat
AHN     AHN   PC1  ANA    1
PWH     AHN   PC1  ANA    1
NDH     AHN   PC1  ANA    2     <= this pcat value needs to be updated
TMH     AHN   PC1  ANA    2     <= this pcat value needs to be updated



(around 1000 records)

When the keys hosp, loc and pspec have the same values, the pcat must be
identical. So there is a problem in the last two records: their pcat
should be 1, because the pcat is correct when record["server"] equals
record["hosp"].

I cannot figure out the logic for doing this in Ruby (or in any other
language). Can someone give me some hints on this? Thanks

Many thanks
Valentino


#2

Loop the array, changing the values of the hash as you go based on
some conditional. It’s not complex at all. What are you finding
difficult?



#3

Using Symbols here makes a lot of sense. Try to structure your array like
this:

my_array[0] = {:server => "AHN", :hosp => "AHN", :loc => "PC1",
               :pspec => "ANA", :number => "1", :pcat => "1"}

Also use Symbols for all the values that are frequently repeated.
Basically, a Symbol is created as one object, and every later use of the
same name is a reference to that object and NOT another object. That way
you will save memory.
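
A small illustration of the point about Symbols (a sketch, not from the original post): it compares object identity for two uses of the same Symbol and for two Strings with the same content. The variable names are made up, and String.new is used only to make sure the two strings are built separately.

```ruby
# Two uses of the same Symbol name refer to one and the same object.
sym_a = :server
sym_b = :server
same_symbol = sym_a.equal?(sym_b)      # equal? compares object identity

# Two separately built Strings with equal content are distinct objects.
str_a = String.new("server")
str_b = String.new("server")
same_string   = str_a.equal?(str_b)    # identity differs
equal_content = (str_a == str_b)       # content is still equal

puts "symbols share one object: #{same_symbol}"
puts "strings share one object: #{same_string}"
puts "strings have equal content: #{equal_content}"
```

How much memory this saves in practice depends on the data and the Ruby version, so treat it as an illustration of the idea rather than a measurement.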

Regards,
Luiz Vitor.




#4

On Mon, Feb 16, 2009 at 4:43 PM, Luiz Vitor Martinez C.
removed_email_address@domain.invalid wrote:

> Using Symbols here makes a lot of sense. Try to structure your array like
> this:
>
> my_array[0] = {:server => "AHN", :hosp => "AHN", :loc => "PC1",
>                :pspec => "ANA", :number => "1", :pcat => "1"}
>
> Also use Symbols for all the values that are frequently repeated.
> Basically, a Symbol is created as one object, and every later use of the
> same name is a reference to that object and NOT another object. That way
> you will save memory.

Even better: http://www.codeforpeople.com/lib/ruby/arrayfields/

martin


#5

2009/2/16 Valentino L. removed_email_address@domain.invalid:

> When the keys hosp, loc and pspec have the same values, the pcat must be
> identical. So there is a problem in the last two records: their pcat
> should be 1, because the pcat is correct when record["server"] equals
> record["hosp"].
>
> I cannot figure out the logic for doing this in Ruby (or in any other
> language). Can someone give me some hints on this? Thanks

IMHO this is plainly the wrong data structure for the task. Since you
identify entries by their hosp, loc, pspec you should index the
whole thing by these columns. Also, since your Hashes seem to be
uniform I would rather define a particular type for this, e.g.

Entry = Struct.new :server, :hosp, :loc, :pspec, :pcat

# the key holds only the columns that identify a group of entries
EntryKey = Struct.new :hosp, :loc, :pspec do
  def self.create(entry)
    new(*members.map {|m| entry[m]})
  end
end

index = Hash.new {|h,k| h[k] = []}

# loop reading input

entry = …
index[EntryKey.create(entry)] << entry

# now you can process them or do it while reading

See also Martin’s reply which goes into the same direction just with a
different approach.
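
To make the idea concrete, here is a minimal runnable sketch of indexing by a Struct key. It assumes the key is made of hosp, loc and pspec (as in the original question), uses the four sample rows from the first post as input, and applies the "server equals hosp" rule to pick the reference pcat. The rows array and the reference variable are made up for the example.

```ruby
Entry = Struct.new(:server, :hosp, :loc, :pspec, :pcat)

# The key holds only the identifying columns.
EntryKey = Struct.new(:hosp, :loc, :pspec) do
  def self.create(entry)
    new(*members.map { |m| entry[m] })
  end
end

# Sample input taken from the table in the first post.
rows = [
  %w[AHN AHN PC1 ANA 1],
  %w[PWH AHN PC1 ANA 1],
  %w[NDH AHN PC1 ANA 2],
  %w[TMH AHN PC1 ANA 2]
]

index = Hash.new { |h, k| h[k] = [] }
rows.each do |row|
  entry = Entry.new(*row)
  # Freezing the key guards against accidental mutation later on.
  index[EntryKey.create(entry).freeze] << entry
end

# Within each group, trust the entry whose server equals its hosp.
index.each_value do |entries|
  reference = entries.find { |e| e.server == e.hosp }
  next unless reference            # no trustworthy entry: leave the group alone
  entries.each { |e| e.pcat = reference.pcat }
end

index.each { |key, entries| puts "#{key.to_a.join('/')}: #{entries.map(&:pcat).inspect}" }
```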

Cheers

robert


#6

On Mon, Feb 16, 2009 at 5:37 PM, Robert K.
removed_email_address@domain.invalid wrote:

> now you can process them or do it while reading
>
> See also Martin’s reply which goes into the same direction just with a
> different approach.

The different approach is mostly due to the fact that I’m
uncomfortable using objects with mutable fields as hash keys. I prefer
to explicitly map them to a string, and then use that string as a hash
key.
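
A short sketch of the kind of problem meant here, using a made-up Key struct: once a key object is changed after it has been stored, lookups with the original values miss until the hash is rehashed. A String key avoids this because Hash#[]= stores a frozen copy of an unfrozen String.

```ruby
Key = Struct.new(:hosp, :loc, :pspec)

key = Key.new("AHN", "PC1", "ANA")
lookup = { key => "1" }

# Mutating the key after insertion changes the stored key object too.
key.loc = "PC2"
found_by_old_values = lookup[Key.new("AHN", "PC1", "ANA")]   # stored key no longer equals this

# Hash#rehash re-indexes the entries by their current key values.
lookup.rehash
found_after_rehash = lookup[Key.new("AHN", "PC2", "ANA")]

# With a String key, Hash#[]= keeps its own frozen copy of the key.
signature = String.new("AHN,PC1,ANA")
by_string = {}
by_string[signature] = "1"
signature << ",changed later"
found_by_string = by_string["AHN,PC1,ANA"]

puts found_by_old_values.inspect
puts found_after_rehash.inspect
puts found_by_string.inspect
```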

martin


#7

On Mon, Feb 16, 2009 at 3:30 PM, Valentino L. removed_email_address@domain.invalid wrote:

> When the keys hosp, loc and pspec have the same values, the pcat must be
> identical. So there is a problem in the last two records: their pcat
> should be 1, because the pcat is correct when record["server"] equals
> record["hosp"].

Simple way:

  1. Have a 'signature' for each row, composed of the hosp, loc and
     pspec. Could be as simple as

def signature(ary, row)
  %w(hosp loc pspec).map {|k| ary[row][k]}.join(",")
end

  2. Collect all the rows with the same signature

verify = Hash.new {|h,k| h[k] = []}
ary.each_with_index {|row, i|
  # i is the row index; signature looks the row up in ary
  verify[signature(ary, i)] << [i, row['pcat']]
}

  3. See if there are any problems

verify.each_pair {|k, v|
  if v.length > 1
    fix_array_for(v)
  end
}

  4. Write fix_array_for(v)

Note that v is an array of pairs of [index, pcat]. So for your
example, it would be
[[0,"1"], [1,"1"], [2,"2"], [3,"2"]]

You basically need to iterate over that array, see which pcat is
right, then iterate over it once more and set all the pcats to the
right value.

There are probably more efficient ways to do all this, but this has
the advantage of being straightforward.
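
One possible way to fill in step 4, as a sketch only. It assumes the rule from the original question (the row whose server equals its hosp has the right pcat), and it passes the array to fix_array_for as an extra argument, because the [index, pcat] pairs alone do not say which row to trust. That two-argument signature is a change made for this example.

```ruby
def signature(ary, row)
  %w(hosp loc pspec).map { |k| ary[row][k] }.join(",")
end

# Hypothetical helper: takes the array as well as the [index, pcat] pairs.
def fix_array_for(ary, pairs)
  # First pass: find a pair whose row has server == hosp.
  good = pairs.find { |i, _pcat| ary[i]["server"] == ary[i]["hosp"] }
  return unless good               # nothing to trust in this group
  # Second pass: set every row in the group to the trusted pcat.
  pairs.each { |i, _pcat| ary[i]["pcat"] = good[1] }
end

ary = [
  {"server"=>"AHN", "hosp"=>"AHN", "loc"=>"PC1", "pspec"=>"ANA", "pcat"=>"1"},
  {"server"=>"PWH", "hosp"=>"AHN", "loc"=>"PC1", "pspec"=>"ANA", "pcat"=>"1"},
  {"server"=>"NDH", "hosp"=>"AHN", "loc"=>"PC1", "pspec"=>"ANA", "pcat"=>"2"},
  {"server"=>"TMH", "hosp"=>"AHN", "loc"=>"PC1", "pspec"=>"ANA", "pcat"=>"2"}
]

verify = Hash.new { |h, k| h[k] = [] }
ary.each_with_index { |row, i| verify[signature(ary, i)] << [i, row["pcat"]] }
verify.each_pair { |_k, v| fix_array_for(ary, v) if v.length > 1 }

puts ary.map { |row| row["pcat"] }.inspect
```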

martin


#8

2009/2/16 Martin DeMello removed_email_address@domain.invalid:

> I prefer to explicitly map them to a string, and then use that string
> as a hash key.

Hehe, that would be something I would be uncomfortable with. :-) It
is interesting that you advertise this approach as a more robust one,
because IMHO this is more on the hackish side of things: instead of
using a structured type you lump everything into a single
unstructured object. This can break awfully (e.g. in your example, if
fields contain "," in different places).

The nice thing about Struct is that it defines #==, #eql? and #hash
properly, making generated classes suitable as Hash keys. If you are
afraid of mutations you can always freeze keys.
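
A small sketch of both points, with invented field values that contain commas: two different rows produce the same joined string, while the matching Struct keys stay distinct, and a frozen key refuses to be changed. Key and the variable names are made up for the example.

```ruby
# Joining with "," loses the field boundaries.
sig_a = ["AHN", "PC1,X", "ANA"].join(",")
sig_b = ["AHN", "PC1", "X,ANA"].join(",")
collision = (sig_a == sig_b)

# A Struct keeps the fields apart, so the same two rows stay different.
Key = Struct.new(:hosp, :loc, :pspec)
key_a = Key.new("AHN", "PC1,X", "ANA")
key_b = Key.new("AHN", "PC1", "X,ANA")
structs_differ = !key_a.eql?(key_b)

# Equal field values give eql? keys with the same hash code.
twin = Key.new("AHN", "PC1,X", "ANA")
usable_as_key = key_a.eql?(twin) && key_a.hash == twin.hash

# A frozen key raises on any attempt to change it.
frozen_key = Key.new("AHN", "PC1", "ANA").freeze
mutation_blocked = false
begin
  frozen_key.loc = "PC2"
rescue StandardError
  mutation_blocked = true
end

puts [collision, structs_differ, usable_as_key, mutation_blocked].inspect
```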

Kind regards

robert


#9

2009/2/16 Valentino L. removed_email_address@domain.invalid:

> I cannot figure out the logic for doing this in Ruby (or in any other
> language). Can someone give me some hints on this? Thanks

While I agree on what the others have said, that you should create a
better data structure, here’s a way to do what you wanted with your
array of hashes. But look at the other posts. It’s easy to build good
data structures in Ruby.

# create a key for the given record to be used in the pcat hash

def pcat_key(record)
  [record["hosp"], record["loc"], record["pspec"]]
end

# build hash with valid pcat values

pcat = {}
my_array.each do |record|
  next unless record["server"] == record["hosp"]
  pcat[pcat_key(record)] = record["pcat"]
end

# look for invalid records

my_array.each do |record|
  next if record["pcat"] == pcat[pcat_key(record)]
  # do something with the invalid record
  p record
end
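
If the goal is to correct the records rather than print them, the second loop could be adapted roughly as follows. This is only a sketch: it leaves groups without any "server equals hosp" record untouched, and the fifth sample record, valid_pcat and corrected are invented for the example.

```ruby
my_array = [
  {"server"=>"AHN", "hosp"=>"AHN", "loc"=>"PC1", "pspec"=>"ANA", "pcat"=>"1"},
  {"server"=>"PWH", "hosp"=>"AHN", "loc"=>"PC1", "pspec"=>"ANA", "pcat"=>"1"},
  {"server"=>"NDH", "hosp"=>"AHN", "loc"=>"PC1", "pspec"=>"ANA", "pcat"=>"2"},
  {"server"=>"TMH", "hosp"=>"AHN", "loc"=>"PC1", "pspec"=>"ANA", "pcat"=>"2"},
  # Invented record: its group has no row with server == hosp.
  {"server"=>"NDH", "hosp"=>"TMH", "loc"=>"PC9", "pspec"=>"ANA", "pcat"=>"3"}
]

def pcat_key(record)
  record.values_at("hosp", "loc", "pspec")
end

# Collect the trusted pcat for every group that has one.
valid_pcat = {}
my_array.each do |record|
  next unless record["server"] == record["hosp"]
  valid_pcat[pcat_key(record)] = record["pcat"]
end

# Overwrite the pcat of every record that disagrees with its group.
corrected = 0
my_array.each do |record|
  expected = valid_pcat[pcat_key(record)]
  next if expected.nil? || record["pcat"] == expected
  record["pcat"] = expected
  corrected += 1
end

puts "corrected #{corrected} records"
```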

Regards,
Pit


#10

Dear all

Thank you for your help. In the end it took me about 5 hours (>_<) to
figure out my solution, and it works… but it takes a long time to execute.

Below is my code to share with you all. I would welcome your expert
advice on whether any optimization can be done. Thank you.

# data collection: about 5000 records for each variable (lis, gcrs)

lis = ActiveRecord::Base.connection.execute(
  "select * from lis_requests order by hosp, spec, loc, pspec")
gcrs = ActiveRecord::Base.connection.execute(
  "select * from gcrs_requests order by hosp, spec, loc, pspec")

def find_correct_pcat(arr)

  server_ref = {"AHN" => "AHN", "TPH" => "AHN",
                "NDH" => "NDH", "BBH" => "NDH", "CHS" => "NDH",
                "PWH" => "PWH", "SH" => "PWH"}

  arr.each do |x|
    return x["pcat"] if x["server"] == server_ref[x["hosp"]]
  end

  # if not, then find the pcat with the largest "number"
  arr.sort_by {|y| y["number"].to_i}.last["pcat"]

end

# The result will be put in this hash

result = {}

# loop over all index keys and get the result
lis.collect {|x| [x["hosp"],x["spec"],x["loc"],x["pspec"]]}.uniq.each do |index_key|

  lis_record = lis.select {|x| x["hosp"] == index_key[0] and x["spec"] == index_key[1] and
                               x["loc"] == index_key[2] and x["pspec"] == index_key[3]}
  gcrs_record = gcrs.select {|x| x["hosp"] == index_key[0] and x["spec"] == index_key[1] and
                                 x["loc"] == index_key[2] and x["pspec"] == index_key[3]}
  lis_req_count = lis_record.inject(0) {|sum,n| sum + n["number"].to_i }
  gcrs_req_count = gcrs_record.inject(0) {|sum,n| sum + n["number"].to_i }

  if lis_record.collect {|x| x["pcat"]}.uniq.size == 1
    pcat = lis_record.first["pcat"]
  else
    pcat = find_correct_pcat(lis_record)
  end

  result[index_key] = [pcat, gcrs_req_count, lis_req_count]

end
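
One likely reason this is slow: for every unique key, the two select calls scan all of lis and gcrs again. A possible sketch of an alternative is to group each list once with group_by and then look the groups up by key. It assumes the rows behave like Hashes with String keys (plain Hashes stand in for the query results here, and the sample values, including "S1", are invented). The names SERVER_REF, key_of, total_number and summarize are made up for this example.

```ruby
SERVER_REF = {
  "AHN" => "AHN", "TPH" => "AHN",
  "NDH" => "NDH", "BBH" => "NDH", "CHS" => "NDH",
  "PWH" => "PWH", "SH" => "PWH"
}.freeze

def key_of(row)
  row.values_at("hosp", "spec", "loc", "pspec")
end

def total_number(rows)
  rows.inject(0) { |sum, row| sum + row["number"].to_i }
end

def find_correct_pcat(rows)
  match = rows.find { |row| row["server"] == SERVER_REF[row["hosp"]] }
  return match["pcat"] if match
  # otherwise take the pcat of the row with the largest "number"
  rows.max_by { |row| row["number"].to_i }["pcat"]
end

def summarize(lis, gcrs)
  # Each list is scanned once here instead of once per key.
  lis_groups  = lis.group_by { |row| key_of(row) }
  gcrs_groups = gcrs.group_by { |row| key_of(row) }

  result = {}
  lis_groups.each do |key, lis_rows|
    pcats = lis_rows.map { |row| row["pcat"] }.uniq
    pcat = pcats.size == 1 ? pcats.first : find_correct_pcat(lis_rows)
    gcrs_rows = gcrs_groups.fetch(key, [])
    result[key] = [pcat, total_number(gcrs_rows), total_number(lis_rows)]
  end
  result
end

# Invented sample rows standing in for the two query results.
lis = [
  {"server"=>"AHN", "hosp"=>"AHN", "spec"=>"S1", "loc"=>"PC1", "pspec"=>"ANA", "number"=>"3", "pcat"=>"1"},
  {"server"=>"NDH", "hosp"=>"AHN", "spec"=>"S1", "loc"=>"PC1", "pspec"=>"ANA", "number"=>"5", "pcat"=>"2"},
  {"server"=>"PWH", "hosp"=>"TPH", "spec"=>"S1", "loc"=>"PC2", "pspec"=>"ANA", "number"=>"2", "pcat"=>"3"},
  {"server"=>"NDH", "hosp"=>"TPH", "spec"=>"S1", "loc"=>"PC2", "pspec"=>"ANA", "number"=>"7", "pcat"=>"4"}
]
gcrs = [
  {"hosp"=>"AHN", "spec"=>"S1", "loc"=>"PC1", "pspec"=>"ANA", "number"=>"4"}
]

result = summarize(lis, gcrs)
result.each { |key, value| puts "#{key.inspect} => #{value.inspect}" }
```

When two rows tie on the largest "number", max_by may pick a different row than sort_by followed by last, so the tie rule may need to be spelled out. Doing the grouping and summing in SQL might be another option, but that depends on the schema.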

Thanks again
Valentino