Ruby Database Query with Grouping

I am working in an application that uses a Lucene index and an Apache
Derby database. As my Ruby skills are quite new, I am not sure where
Ruby ends and the application begins.

At this point, I am trying to run some counts for data that has been
indexed in the system. Attached is the beginning of a script. I know
it’s wrong; let me preface with that. What I am trying to get is
different counts, grouped by a piece of captured metadata called
“DateProcessed”.

What I would end up with is something like

Date Processed - Count of Items - SizeBytes of Items - Immaterial Items
- SizeBytes of Immaterial Items - etc.

1/1/12 - 500 - 20934823 - 210 - 239048
1/9/12 - 1000 - 40102983 - 330 - 435875

etc.
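For concreteness, I gather the kind of per-date grouping I’m after could
be sketched like this, with a Hash keyed on the date (the item data here
is made up, since the real values come from the index):

```ruby
# Hypothetical sample items; in the real script these values would come
# from the Lucene/Derby query results.
items = [
  { date: "1/1/12", size: 100, immaterial: false },
  { date: "1/1/12", size: 250, immaterial: true },
  { date: "1/9/12", size: 400, immaterial: false },
]

# One totals row per DateProcessed value, created on first access.
totals = Hash.new do |h, k|
  h[k] = { count: 0, size: 0, imm_count: 0, imm_size: 0 }
end

items.each do |item|
  row = totals[item[:date]]
  row[:count] += 1
  row[:size]  += item[:size]
  if item[:immaterial]
    row[:imm_count] += 1
    row[:imm_size]  += item[:size]
  end
end

totals.each do |date, row|
  puts "#{date} - #{row[:count]} - #{row[:size]} - #{row[:imm_count]} - #{row[:imm_size]}"
end
```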

What I get with the current attached script is this:
1/1/12 - 500 - 20934823 - 210 - 239048
1/1/12 - 500 - 20934823 - 210 - 239048
1/1/12 - 500 - 20934823 - 210 - 239048
1/1/12 - 500 - 20934823 - 210 - 239048
1/1/12 - 500 - 20934823 - 210 - 239048
1/1/12 - 500 - 20934823 - 210 - 239048
1/1/12 - 500 - 20934823 - 210 - 239048
1/1/12 - 500 - 20934823 - 210 - 239048
1/1/12 - 500 - 20934823 - 210 - 239048
1/1/12 - 500 - 20934823 - 210 - 239048
1/1/12 - 500 - 20934823 - 210 - 239048
1/1/12 - 500 - 20934823 - 210 - 239048

etc.

Can anyone help me? I am hoping that as I learn to fish, my questions
will get better and I will become more independent.

Hi,

One of the problems is here:

t_ImmaterialSizes2 = 0

t_ImmaterialSizes2, as you wrote it, points to an Array (line 4 of your
script).

You should access its content by index, e.g.
t_ImmaterialSizes2[102]

Another one of note is here:

t_DateProcessed.each do |date|
  # PUT EACH SET OF ITEMS NEEDED FOR THE REPORT TO THE CONSOLE
  # (EVENTUALLY WILL BE OUTPUT TO A FILE)
  puts "#{t_DateProcessed} , #{t_ImmaterialSizes2}"
end

Look: each t_DateProcessed item is passed “through” the block variable
date. So, inside the each block, date is what you want to use.
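For example, assuming your dates and sizes live in parallel arrays (the
names and values below are made up to stand in for yours), you would
interpolate the block variable and index the matching size:

```ruby
# Made-up parallel arrays standing in for the ones in the real script.
t_DateProcessed    = ["1/1/12", "1/9/12"]
t_ImmaterialSizes2 = [239048, 435875]

lines = []
t_DateProcessed.each_with_index do |date, i|
  # 'date' is the current element; pair it with the size at the same index.
  lines << "#{date} , #{t_ImmaterialSizes2[i]}"
end

puts lines
```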

I’ll try to refactor it and send it to you, because I think the way
you’re going is not pointing toward what you want. (Just a minute.)

Abinoam Jr.

Hi Courtney,

Tell me if this code runs.

Abinoam Jr.

Thanks for your response! Your code is actually throwing an error in
the console:

NoMethodError: undefined method `inject' for -1:Fixnum
(root) at :26
call at org/jruby/RubyProc.java:268
call at org/jruby/RubyProc.java:228
each at /builtin/java/java.util.rb:7
(root) at :18

It goes on, but that is the gist. I would be interested in trying your
method. I totally acknowledge that I am self taught and just made these
arrays because I don’t know how else to accomplish this.
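From what I can tell, inject is an Enumerable method, so it works on an
Array but not on a bare number, which seems to match the error above:

```ruby
sizes = [100, 250, 400]

# inject comes from Enumerable, so summing an Array works fine:
total = sizes.inject(0) { |sum, n| sum + n }

# Calling it on a plain Integer raises NoMethodError, just like the
# `inject' for -1:Fixnum error (Fixnum is the old integer class name
# in JRuby / older Rubies):
error = begin
  -1.inject { |a, b| a + b }
  nil
rescue NoMethodError => e
  e
end

puts total
puts error.class
```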

They are basically attributes of the same items. We process a hard
drive of data and the metadata and text go into the index/database.

I am just querying that with ruby to get a report of what processed on
what date and group the totals by the date information.

I found that my Ruby didn’t give the exact output I mentioned last
night. It was more like this:
1/1/121/9/12 - 1500 - 60934823 - 540 - 469048
1/1/121/9/12 - 1500 - 60934823 - 540 - 469048

It’s running the two dates into each other, summing the whole total,
and then repeating the output. By the way, it took about 6 hours to get
those 2 lines to run on an index with 2.5 million records.

I hope to make this more efficient as I build it. Those are just a
couple of the numbers I need. In the end I need to get around 20
different aggregates into the report. I just know that if I can get the
best method down, I can replicate it.

Thanks for taking the time to help me!

I found this:
http://markusjais.com/the-group_by-method-from-rubys-enumerable-mixin-and-compared-with-scal/

But I don’t know how that would work with what I am doing.
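If I’m reading that page right, group_by would bucket the items by date
first, and then each bucket could be summed, something like this (the
item data is made up again, since the real items come from the index):

```ruby
# Hypothetical items; the real ones come from the index/database.
items = [
  { date: "1/1/12", size: 100 },
  { date: "1/9/12", size: 400 },
  { date: "1/1/12", size: 250 },
]

# group_by returns a Hash of date => array of the items with that date.
by_date = items.group_by { |item| item[:date] }

# Build one report line per date: date - item count - total size.
report = by_date.map do |date, group|
  total_size = group.inject(0) { |sum, item| sum + item[:size] }
  "#{date} - #{group.size} - #{total_size}"
end

puts report
```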

Sorry that I don’t know how to accomplish what I need to accomplish. I
just know how to run the report manually in the application. Hoping I
can develop the ruby script to make it faster and more efficient.

I used the other method you put in the code to run the sum, and am
trying it now. It has been running for an hour. Once it finishes, I
will report back on what the output looks like.

Thanks again. I appreciate you taking the time to look at this issue.