How to do this complicated logic in Ruby

Valentino Lun (on9west)
on 2009-02-16 11:00
Dear all

I have an array of around 1000 records, and I want to check and
correct some of the data in it.

For instance, the first record of this array is a hash, as follows:
my_array[0] = {"server"=>"AHN", "hosp"=>"AHN", "loc"=>"PC1",
               "pspec"=>"ANA", "number"=>"1", "pcat"=>"1"}

server  hosp  loc   pspec   pcat
AHN     AHN   PC1   ANA     1
PWH     AHN   PC1   ANA     1
NDH     AHN   PC1   ANA     2     <= this pcat value needs to be updated in the array
TMH     AHN   PC1   ANA     2     <= this pcat value needs to be updated in the array
...
(around 1000 records)

When the keys hosp, loc and pspec have the same values, their pcat must
be identical. So there is a problem in the last two records: their pcat
should be 1, because the pcat is correct when array["server"] equals
array["hosp"].

I cannot figure out the logic for doing this in Ruby (or even in another
language). Can someone give me some hints on this? Thanks

Many thanks
Valentino
Julian Leviston (Guest)
on 2009-02-16 11:57
(Received via mailing list)
Loop the array, changing the values of the hash as you go based on
some conditional. It's not complex at all. What are you finding
difficult?

Blog: http://random8.zenunit.com/
Learn rails: http://sensei.zenunit.com/
Martin DeMello (Guest)
on 2009-02-16 11:59
(Received via mailing list)
On Mon, Feb 16, 2009 at 3:30 PM, Valentino Lun <sumwo@yahoo.com> wrote:
> AHN     AHN   PC1   ANA     1
> When keys hosp, loc, pspec has the same values, their pcat must be
> identical. So, there is problem in the last two records, the key pcat
> should be 1, because the pcat is correct if array["server"] equal to
> array["hosp"].

Simple way:

1. Have a 'signature' for each row, composed of the hosp, loc and
pspec. Could be as simple as

def signature(row)
  %w(hosp loc pspec).map {|k| row[k]}.join(",")
end

2. Collect all the rows with the same signature

verify = Hash.new {|h,k| h[k] = []}
ary.each_with_index {|row, i|
  verify[signature(row)] << [i, row['pcat']]
}

3. See if there are any problems

verify.each_pair {|k, v|
  if v.length > 1
    fix_array_for(v)
  end
}

4. Write fix_array_for(v)

Note that v is an array of pairs of [index, pcat]. So for your
example, it would be
[[0,1], [1,1], [2,2], [3,2]]

you basically need to iterate over that array, see which pcat is
right, then iterate over it once more and set all the pcats to the
right value.
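
Step 4 could be sketched roughly as below. This is only one possible reading, not Martin's own code: it assumes the "right" pcat is the one on the row whose server equals its hosp (the rule from the original question), that rows are string-keyed hashes, and that the helper receives the full array as an extra argument, which the call in step 3 does not pass. The sample rows are made up.

```ruby
# Hypothetical helper: ary is the full array of row hashes,
# pairs is one value from the verify hash, i.e. [[index, pcat], ...].
def fix_array_for(ary, pairs)
  # First pass: find a row whose server matches its hosp; its pcat is trusted.
  good = pairs.find { |i, _pcat| ary[i]["server"] == ary[i]["hosp"] }
  return if good.nil? # no trusted row in this group, so leave it alone

  right_pcat = good[1]
  # Second pass: overwrite every pcat in the group with the trusted value.
  pairs.each { |i, _pcat| ary[i]["pcat"] = right_pcat }
end

# Made-up sample data mirroring the table in the question.
ary = [
  {"server"=>"AHN", "hosp"=>"AHN", "loc"=>"PC1", "pspec"=>"ANA", "pcat"=>"1"},
  {"server"=>"PWH", "hosp"=>"AHN", "loc"=>"PC1", "pspec"=>"ANA", "pcat"=>"1"},
  {"server"=>"NDH", "hosp"=>"AHN", "loc"=>"PC1", "pspec"=>"ANA", "pcat"=>"2"},
  {"server"=>"TMH", "hosp"=>"AHN", "loc"=>"PC1", "pspec"=>"ANA", "pcat"=>"2"},
]

fix_array_for(ary, [[0, "1"], [1, "1"], [2, "2"], [3, "2"]])
p ary.map { |row| row["pcat"] }
```

If a group has no row with server equal to hosp, this sketch leaves it untouched; what to do in that case is not specified in the thread.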

There are probably more efficient ways to do all this, but this has
the advantage of being straightforward.

martin
Luiz Vitor Martinez Cardoso (Guest)
on 2009-02-16 12:14
(Received via mailing list)
Using Symbols here makes a lot of sense. Try to structure your array like:

 my_array[0] = {:server => "AHN", :hosp => "AHN", :loc => "PC1",
                :pspec => "ANA", :number => "1", :pcat => "1"}

And use Symbols for all the values that are frequently repeated.
Basically, when you use a Symbol you create one object, and every time
you use a Symbol with the same name you get a reference to that same
object and NOT another object. That way you save memory.
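
A small sketch of the point about object identity, using only core Ruby. The strings are built with String.new on purpose, so the result does not depend on any frozen-string-literal setting; the variable names are just for illustration.

```ruby
# Two symbols with the same name are the very same object.
a = :server
b = :server
puts a.equal?(b)    # identity check, not just equality

# Two separately built strings with the same content are distinct objects.
s1 = String.new("server")
s2 = String.new("server")
puts s1 == s2       # same content
puts s1.equal?(s2)  # but not the same object

# A row keyed by symbols is read back the same way.
row = {:server => "AHN", :hosp => "AHN", :pcat => "1"}
puts row[:pcat]
```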

Regards,
Luiz Vitor.


--
Regards,

Luiz Vitor Martinez Cardoso
cel.: (11) 8187-8662
blog: rubz.org
engineering student at maua.br

"I may never become the best engineer in the world, but you can be
sure that I will fight with all my strength to be the best engineer
I can be"
Martin DeMello (Guest)
on 2009-02-16 12:42
(Received via mailing list)
On Mon, Feb 16, 2009 at 4:43 PM, Luiz Vitor Martinez Cardoso
<grabber@gmail.com> wrote:
> Using Symbols here make a big sense. Try to structure your array like:
>
>  my_array[0] = {:server => "AHN", :hosp =>"AHN", :loc =>"PC1",
> :pspec=>"ANA", :number=>"1", :pcat=>"1"}
>
> And for all the values that are frequently repeated use Symbols. Basically
> when you use Symbols you create one object and all the times that you use
> one object with the same name you create a referece to this object and NOT
> another object. Making that you will free memory.

Even better: http://www.codeforpeople.com/lib/ruby/arrayfields/

martin
Robert Klemme (Guest)
on 2009-02-16 13:09
(Received via mailing list)
2009/2/16 Valentino Lun <sumwo@yahoo.com>:
> AHN     AHN   PC1   ANA     1
> When keys hosp, loc, pspec has the same values, their pcat must be
> identical. So, there is problem in the last two records, the key pcat
> should be 1, because the pcat is correct if array["server"] equal to
> array["hosp"].
>
> I cannot figure out the logic to doing this in ruby (even in other
> language). Can someone give me some hints on this? Thanks

IMHO this is plainly the wrong data structure for the task.  Since you
identify entries by their hosp, loc, pspec you should *index* the
whole thing by these columns.  Also, since your Hashes seem to be
uniform I would rather define a particular type for this, e.g.

Entry = Struct.new :server, :hosp, :loc, :pspec, :pcat

EntryKey = Struct.new :hosp, :loc, :pspec do
  def self.create(entry)
    new(*members.map {|m| entry[m]})
  end
end

index = Hash.new {|h,k| h[k] = []}
# loop reading input
  entry = ...
  index[EntryKey.create(entry)] << entry

# now you can process them or do it while reading

See also Martin's reply, which goes in the same direction, just with a
different approach.
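
A runnable sketch of this indexing idea, assuming the key is made of hosp, loc and pspec as described in the prose. The sample rows are made up and stand in for the elided input loop.

```ruby
Entry    = Struct.new(:server, :hosp, :loc, :pspec, :pcat)
EntryKey = Struct.new(:hosp, :loc, :pspec) do
  # Build a key from any object that answers [] for the member names.
  def self.create(entry)
    new(*members.map { |m| entry[m] })
  end
end

# Made-up sample rows: server, hosp, loc, pspec, pcat.
rows = [
  %w(AHN AHN PC1 ANA 1),
  %w(PWH AHN PC1 ANA 1),
  %w(NDH AHN PC1 ANA 2),
  %w(NDH NDH PC2 ANA 3),
]

# Group the entries under their key while reading them.
index = Hash.new { |h, k| h[k] = [] }
rows.each do |fields|
  entry = Entry.new(*fields)
  index[EntryKey.create(entry)] << entry
end

# Each group can now be inspected on its own.
index.each do |key, entries|
  pcats = entries.map(&:pcat).uniq
  puts "#{key.to_a.join('/')}: pcats #{pcats.inspect}"
end
```

Any group that reports more than one distinct pcat is a candidate for correction.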

Cheers

robert
Martin DeMello (Guest)
on 2009-02-16 13:23
(Received via mailing list)
On Mon, Feb 16, 2009 at 5:37 PM, Robert Klemme
<shortcutter@googlemail.com> wrote:
>
> # now you can process them or do it while reading
>
> See also Martin's reply which goes into the same direction just with a
> different approach.

The different approach is mostly due to the fact that I'm
uncomfortable using objects with mutable fields as hash keys. I prefer
to explicitly map them to a string, and then use that string as a hash
key.

martin
Robert Klemme (Guest)
on 2009-02-16 17:40
(Received via mailing list)
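2009/2/16 Martin DeMello <martindemello@gmail.com>:
> The different approach is mostly due to the fact that I'm
> uncomfortable using objects with mutable fields as hash keys. I prefer
> to explicitly map them to a string, and then use that string as a hash
> key.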
Hehe, that would be something *I* would be uncomfortable with. :-)  It
is interesting that you advertise this approach as a more robust one.
Because IMHO this is more on the hackish side of things because
instead of using a structured type you lump everything into a single
unstructured object. This can break awfully (i.e. in your example, if
fields contain "," in different places).

The nice thing about Struct is that it defines #==, #eql? and #hash
properly making generated classes suitable as Hash keys.  If you are
afraid of mutations you can always freeze keys.
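
Both points can be checked with a short sketch. RowKey and the sample field values are made up for illustration; only core Struct and Hash behaviour is used.

```ruby
RowKey = Struct.new(:hosp, :loc, :pspec)

# Two different rows whose fields contain "," in different places.
a = ["A,B", "C", "D"]
b = ["A", "B,C", "D"]

# The joined strings collide ...
puts a.join(",") == b.join(",")

# ... while the Struct keys stay distinct.
ka = RowKey.new(*a)
kb = RowKey.new(*b)
puts ka == kb

# Structs with equal members are eql? and hash alike, so a fresh key
# finds the entry stored under an equal one.
h = { RowKey.new("AHN", "PC1", "ANA").freeze => "1" }
puts h[RowKey.new("AHN", "PC1", "ANA")]

# A frozen key refuses mutation.
begin
  h.keys.first.hosp = "NDH"
rescue => e # FrozenError on Ruby 2.5+, RuntimeError on older versions
  puts e.class
end
```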

Kind regards

robert
Pit Capitain (Guest)
on 2009-02-16 19:44
(Received via mailing list)
2009/2/16 Valentino Lun <sumwo@yahoo.com>:
> I cannot figure out the logic to doing this in ruby (even in other
> language). Can someone give me some hints on this? Thanks

While I agree on what the others have said, that you should create a
better data structure, here's a way to do what you wanted with your
array of hashes. But look at the other posts. It's easy to build good
data structures in Ruby.

  # create a key for the given record to be used in the pcat hash
  def pcat_key(record)
    [record["hosp"], record["loc"], record["pspec"]]
  end

  # build hash with valid pcat values
  pcat = {}
  my_array.each do |record|
    next unless record["server"] == record["hosp"]
    pcat[pcat_key(record)] = record["pcat"]
  end

  # look for invalid records
  my_array.each do |record|
    next if record["pcat"] == pcat[pcat_key(record)]
    # do something with the invalid record
    p record
  end
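
If the invalid records should be corrected rather than just printed, the second loop could assign the valid value instead. A minimal self-contained sketch, assuming the key field is "pspec" as in the original question and using made-up sample rows; groups without any server == hosp record are left unchanged here, which is an assumption, since the thread does not say what should happen to them.

```ruby
# Made-up sample data mirroring the table in the question.
my_array = [
  {"server"=>"AHN", "hosp"=>"AHN", "loc"=>"PC1", "pspec"=>"ANA", "number"=>"1", "pcat"=>"1"},
  {"server"=>"PWH", "hosp"=>"AHN", "loc"=>"PC1", "pspec"=>"ANA", "number"=>"1", "pcat"=>"1"},
  {"server"=>"NDH", "hosp"=>"AHN", "loc"=>"PC1", "pspec"=>"ANA", "number"=>"1", "pcat"=>"2"},
  {"server"=>"TMH", "hosp"=>"AHN", "loc"=>"PC1", "pspec"=>"ANA", "number"=>"1", "pcat"=>"2"},
]

# create a key for the given record to be used in the pcat hash
def pcat_key(record)
  [record["hosp"], record["loc"], record["pspec"]]
end

# build hash with valid pcat values (taken from server == hosp records)
pcat = {}
my_array.each do |record|
  pcat[pcat_key(record)] = record["pcat"] if record["server"] == record["hosp"]
end

# overwrite the pcat of every record that disagrees with the valid value
my_array.each do |record|
  valid = pcat[pcat_key(record)]
  next if valid.nil? || record["pcat"] == valid
  record["pcat"] = valid
end

p my_array.map { |record| record["pcat"] }
```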

Regards,
Pit
Valentino Lun (on9west)
on 2009-02-17 10:00
Dear all

Thank you for your help. In the end it took me about 5 hours (>_<) to
figure out my solution, and it works. But it takes a long time to
execute.

Below is my code to share with you all. I would appreciate your expert
advice on whether any optimization can be done. Thank you.


# data collection: about 5000 records for each variable (lis, gcrs)
lis = ActiveRecord::Base.connection.execute(
  "select * from lis_requests order by hosp, spec, loc, pspec")
gcrs = ActiveRecord::Base.connection.execute(
  "select * from gcrs_requests order by hosp, spec, loc, pspec")

def find_correct_pcat(arr)

  server_ref = {"AHN" => "AHN", "TPH" => "AHN",
                "NDH" => "NDH", "BBH" => "NDH", "CHS" => "NDH",
                "PWH" => "PWH", "SH" => "PWH"}

  arr.each do |x|
    return x["pcat"] if x["server"] == server_ref[x["hosp"]]
  end

  # if not, then find the pcat with the largest "number"
  arr.sort_by {|y| y["number"].to_i}.last["pcat"]

end

# The result will be put in this hash
result = {}

# loop over every index key and build the result
lis.collect {|x| [x["hosp"],x["spec"],x["loc"],x["pspec"]]}.uniq.each do |index_key|

  lis_record = lis.select {|x| x["hosp"] == index_key[0] and
                               x["spec"] == index_key[1] and
                               x["loc"] == index_key[2] and
                               x["pspec"] == index_key[3]}
  gcrs_record = gcrs.select {|x| x["hosp"] == index_key[0] and
                                 x["spec"] == index_key[1] and
                                 x["loc"] == index_key[2] and
                                 x["pspec"] == index_key[3]}
  lis_req_count = lis_record.inject(0) {|sum,n| sum + n["number"].to_i }
  gcrs_req_count = gcrs_record.inject(0) {|sum,n| sum + n["number"].to_i }

  if lis_record.collect {|x| x["pcat"]}.uniq.size == 1
    pcat = lis_record.first["pcat"]
  else
    pcat = find_correct_pcat(lis_record)
  end

  result[index_key] = [pcat, gcrs_req_count, lis_req_count]

end
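
One likely source of the slowness is that both select calls rescan all of lis and gcrs for every index key. A possible speed-up, sketched below, is to group each data set once with Enumerable#group_by and then look the groups up by key. This is only a sketch under assumptions: the query results are taken to be Enumerable collections of string-keyed hashes (the code above already treats them that way), the sample rows and the shortened server_ref table are made up, and key_of is a name introduced here.

```ruby
# Same logic as find_correct_pcat above, with a shortened server_ref.
def find_correct_pcat(arr)
  server_ref = {"AHN" => "AHN", "TPH" => "AHN", "NDH" => "NDH"}
  arr.each do |x|
    return x["pcat"] if x["server"] == server_ref[x["hosp"]]
  end
  # if not, then find the pcat with the largest "number"
  arr.sort_by { |y| y["number"].to_i }.last["pcat"]
end

# Made-up sample rows; the real ones come from the two queries.
lis = [
  {"server"=>"AHN", "hosp"=>"AHN", "spec"=>"S1", "loc"=>"PC1", "pspec"=>"ANA", "number"=>"3", "pcat"=>"1"},
  {"server"=>"NDH", "hosp"=>"AHN", "spec"=>"S1", "loc"=>"PC1", "pspec"=>"ANA", "number"=>"5", "pcat"=>"2"},
  {"server"=>"NDH", "hosp"=>"NDH", "spec"=>"S1", "loc"=>"PC2", "pspec"=>"ANA", "number"=>"2", "pcat"=>"3"},
]
gcrs = [
  {"server"=>"AHN", "hosp"=>"AHN", "spec"=>"S1", "loc"=>"PC1", "pspec"=>"ANA", "number"=>"4", "pcat"=>"1"},
]

key_of = lambda { |x| [x["hosp"], x["spec"], x["loc"], x["pspec"]] }

# One pass over each data set instead of one select per index key.
lis_by_key  = lis.group_by(&key_of)
gcrs_by_key = gcrs.group_by(&key_of)

result = {}
lis_by_key.each do |index_key, lis_record|
  gcrs_record    = gcrs_by_key.fetch(index_key, [])
  lis_req_count  = lis_record.inject(0)  { |sum, n| sum + n["number"].to_i }
  gcrs_req_count = gcrs_record.inject(0) { |sum, n| sum + n["number"].to_i }

  pcats = lis_record.map { |x| x["pcat"] }.uniq
  pcat  = pcats.size == 1 ? pcats.first : find_correct_pcat(lis_record)

  result[index_key] = [pcat, gcrs_req_count, lis_req_count]
end

p result
```

Since the grouping key is the same set of columns used in the order by clause, the per-key sums could also be computed in SQL, but that changes more of the original design than this sketch does.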

Thanks again
Valentino