Optimizing generation of huge XML?

Hi all,

Currently, I’m developing a Rails app that does heavy XML generation for a
RESTful web service. The service uses the Nokogiri gem to generate XML
matching the format the client expects. The problem is that the data set is
quite big: around 50,000 records pulled out of a table holding millions of
records. Testing on my local machine, it takes about 20 minutes to get a
response to the request.

Do you have any ideas on how to optimize this? I’m not sure whether skipping
ActiveRecord and using pure SQL statements to pull out the data for the XML
would make performance dramatically faster or not.
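
For context, a stripped-down sketch of the kind of generation I mean; the
Book model and element names here are placeholders, not my real schema:

require 'nokogiri'

def build_catalog_xml
  builder = Nokogiri::XML::Builder.new(:encoding => 'UTF-8') do |xml|
    xml.books do
      # find_each pulls rows in batches of 1000 instead of
      # instantiating all 50,000 ActiveRecord objects at once
      Book.find_each do |book|
        xml.book(:id => book.id) do
          xml.title  book.title
          xml.author book.author
        end
      end
    end
  end
  builder.to_xml
end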

Thanks,
Samnang

On Nov 8, 3:57am, Samnang [email protected] wrote:

Do you have any ideas on how to optimize this? I’m not sure whether skipping
ActiveRecord and using pure SQL statements to pull out the data for the XML
would make performance dramatically faster or not.

Have you profiled your code to see where the bottleneck is?
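
For example, with the ruby-prof gem (Book.all.to_xml below is just a
stand-in for whatever your action really does):

require 'ruby-prof'

result = RubyProf.profile do
  Book.all.to_xml   # stand-in for the real generation code
end

# The flat listing shows where the time goes: ActiveRecord, Nokogiri,
# the database driver, garbage collection, and so on.
RubyProf::FlatPrinter.new(result).print(STDOUT)

Until you know whether the time is spent in the database, in ActiveRecord
instantiation, or in building the XML, any optimization is a guess.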

Fred

Given the scenario you describe, i.e. a huge number of records to access, it
would be best to test the ‘pure sql query’ approach yourself; a workload like
this is unusual enough that few others will have measured it.
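
A quick way to run that comparison from the Rails console, assuming a
hypothetical books table (Benchmark ships with Ruby):

require 'benchmark'

Benchmark.bm(12) do |b|
  b.report('ActiveRecord') do
    Book.all.each { |book| book.title }
  end
  b.report('pure SQL') do
    rows = ActiveRecord::Base.connection.select_all(
      'SELECT id, title, author FROM books LIMIT 50000')
    rows.each { |row| row['title'] }
  end
end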

Quoting S. [email protected]:

Do you have any ideas on how to optimize this? I’m not sure whether skipping
ActiveRecord and using pure SQL statements to pull out the data for the XML
would make performance dramatically faster or not.

Using SQL and libxml2 (via the libxml-ruby gem) directly, instead of
ActiveRecord and Nokogiri (which wraps libxml2 at a higher level), will cut
the run time. I would guess between 2x and 10x, if the code is written with
speed in mind. And your code will be bigger and uglier.
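
Roughly this shape, to make it concrete; the books table and column names
are illustrative:

require 'libxml'

# One raw query; no ActiveRecord objects are instantiated.
rows = ActiveRecord::Base.connection.select_all(
  'SELECT id, title, author FROM books')

doc = LibXML::XML::Document.new
doc.root = LibXML::XML::Node.new('books')

rows.each do |row|
  book = LibXML::XML::Node.new('book')
  book['id'] = row['id'].to_s
  book << LibXML::XML::Node.new('title', row['title'])
  book << LibXML::XML::Node.new('author', row['author'])
  doc.root << book
end

xml = doc.to_s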

What’s cheaper, computer time or programmer time? How many times will this
generation be run? And are there elapsed-time constraints? (E.g., an
excellent 24-hour weather forecast that takes 28 hours to generate isn’t
useful.)

Jeffrey

Chris, it has to be XML because I need to pass it directly to Adobe InDesign
to place the data in a document template. This is a book generation process,
so it rarely runs. Like Jeffrey mentioned above, maybe I can use pure SQL and
libxml2 to gain the speed just for this one problem.

Thanks for all of your feedback.

Samnang

On Nov 7, 10:57 pm, Samnang [email protected] wrote:

Do you have any ideas on how to optimize this problem?

Does it need to be XML? JSON is much lighter and faster. You can also use
page caching with REST, so subsequent requests are just like Apache serving
a flat file. Maybe try some sort of compression too? I’m betting the
bottleneck is getting the data over HTTP and loading it on the client, NOT
ActiveRecord getting it out of the DB and building the XML.
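
For what it’s worth, page caching is a one-liner in the controller; the names
below are made up, and compression can come from Apache’s mod_deflate rather
than Ruby:

class BooksController < ApplicationController
  # Needs config.action_controller.perform_caching = true. Rails writes
  # the rendered response to public/books.xml, so Apache serves later
  # requests as a flat file without touching Rails at all.
  caches_page :index

  def index
    # build_catalog_xml stands in for whatever builds the document
    respond_to do |format|
      format.xml { render :xml => build_catalog_xml }
    end
  end
end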