I’ve got a table that consists of about 2.6 million rows, each row
containing 5 columns (a short string, 3 integers, and a decimal
fraction).
A program needs to read the whole table. At first, programming without
my brain fully engaged, I just grabbed it in Perl using the DBI function
selectall_arrayref, and then looped through the results.
That uses about 1.3 GB of memory; Perl data structures are not
efficient when it comes to memory. This approach reads the whole
table (which is a mere 130 MB according to MySQL) into a Perl data
structure (an array of rows, each row an array of columns), which I then
scan to build my data structures.
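For reference, that first version was roughly the following (the
connection details and column names here are made up; only the shape of
the code matters):

use strict;
use warnings;
use DBI;

# Placeholder connection details, not the real ones.
my $dbh = DBI->connect("DBI:mysql:database=mydb;host=localhost",
                       "user", "password", { RaiseError => 1 });

# selectall_arrayref pulls the entire result set into one Perl structure:
# a reference to an array of rows, each row an array ref of column values.
my $rows = $dbh->selectall_arrayref("SELECT * FROM my_table");

# Scan the in-memory copy to build the structures the program needs.
for my $row (@$rows) {
    my ($label, $x, $y, $z, $fraction) = @$row;
    # ... build my data structures from the row here ...
}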
Recoding it to be less brain dead, using prepare and execute and then
fetching rows one at a time, cuts the memory usage to about 270 MB. That
makes sense: the table is now held in a data structure maintained by the
MySQL C client library that the Perl DBI driver (DBD::mysql) is built on.
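The recoded version is roughly this (again with made-up connection
details and column names):

use strict;
use warnings;
use DBI;

# Same placeholder connection details as above.
my $dbh = DBI->connect("DBI:mysql:database=mydb;host=localhost",
                       "user", "password", { RaiseError => 1 });

# The whole result set is still buffered client-side by the C library,
# but rows only cross into Perl one at a time.
my $sth = $dbh->prepare("SELECT * FROM my_table");
$sth->execute;

while (my $row = $sth->fetchrow_arrayref) {
    my ($label, $x, $y, $z, $fraction) = @$row;
    # ... build my data structures from the row here ...
}
$sth->finish;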
Out of curiosity, I decided to try this in other languages. Python is
the same as Perl: about 270 MB when I call the store_result() function to
grab the data from the database.
Same in C: 270 MB used when I grab the results with
mysql_store_result().
This all makes sense, assuming the Perl and Python database drivers are
using the C mysql library.
So then I tried Ruby:
require "mysql"
my = Mysql.new(…)
st = my.prepare("SELECT * from my_table")
st.execute
while row = st.fetch do
…
end
That only uses about 130 MB of memory!?
The above was on a 64-bit machine. On a 32-bit machine, the numbers are
185 MB for C, Perl, and Python, and 103 MB for Ruby.
How is Ruby able to do so much better on memory usage?
I did not test speed, but I think the Ruby version also ran a little
faster than the others.