I’ve got a table that consists of about 2.6 million rows, each row
containing 5 columns (a short string, 3 integers, and a decimal
fraction).
A program needs to read the whole table. At first, programming without
my brain fully engaged, I just grabbed it in Perl using the DBI function
selectall_arrayref, and then looped through the results.
That uses about 1.3 GB of memory; Perl data structures are not
efficient when it comes to memory. This approach reads the whole
table (which is a mere 130 MB according to MySQL) into a Perl data
structure (an array of rows, each row an array of columns), which I then
scan to build my data structures.
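For reference, that first version was roughly the following (the
connection details and column names here are made up; only the shape of
the code matters):

use strict;
use warnings;
use DBI;

# Placeholder connection details, not the real ones.
my $dbh = DBI->connect("DBI:mysql:database=mydb;host=localhost",
                       "user", "password", { RaiseError => 1 });

# selectall_arrayref pulls the entire result set into one Perl structure:
# a reference to an array of rows, each row an array ref of column values.
my $rows = $dbh->selectall_arrayref("SELECT * FROM my_table");

# Scan the in-memory copy to build the structures the program needs.
for my $row (@$rows) {
    my ($label, $x, $y, $z, $fraction) = @$row;
    # ... build my data structures from the row here ...
}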
Recoding it to be less brain dead, using prepare and execute and then
fetching rows one at a time, cuts the memory usage to about 270 MB. That
makes sense: the table is now held in a data structure maintained by the
MySQL C client library that the Perl DBI driver (DBD::mysql) is built on.
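The recoded version is roughly this (again with made-up connection
details and column names):

use strict;
use warnings;
use DBI;

# Same placeholder connection details as above.
my $dbh = DBI->connect("DBI:mysql:database=mydb;host=localhost",
                       "user", "password", { RaiseError => 1 });

# The whole result set is still buffered client-side by the C library,
# but rows only cross into Perl one at a time.
my $sth = $dbh->prepare("SELECT * FROM my_table");
$sth->execute;

while (my $row = $sth->fetchrow_arrayref) {
    my ($label, $x, $y, $z, $fraction) = @$row;
    # ... build my data structures from the row here ...
}
$sth->finish;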
Out of curiosity, I decided to try this in other languages. Python is
the same as Perl: about 270 MB when I call the store_result() function to
grab the data from the database.
Same in C: 270 MB used when I grab the results with
mysql_store_result().
This all makes sense, assuming the Perl and Python database drivers are
using the C mysql library.
So then I tried Ruby:
require "mysql"
my = Mysql.new(…)
st = my.prepare("SELECT * from my_table")
st.execute
while row = st.fetch do
…
end
That only uses about 130 MB of memory!?
The above was on a 64-bit machine. On a 32-bit machine, the numbers are
185 MB for C, Perl, and Python, and 103 MB for Ruby.
How is Ruby able to do so much better on memory usage?
I did not test speed, but I think the Ruby version also ran a little
faster than the others.