Accessing binary blob data in MySQL backend

Hi!

I am curious whether it is feasible to have Nginx deliver binary blob
data which is currently stored in a MySQL database. I was loosely
following the examples mentioned in agentzh’s blog [1], which all work
fine for me with non-binary data. But since JSON, which is not binary
safe, is used for the intermediate serialization, I was wondering
whether there are other ways (or even modules) to access the raw
payload data from its RDS representation.

In my scenario I’m trying to access image data such as JPEGs stored as
blobs in MySQL, which of course causes the JSON parser to fail:

content_by_lua aborted: runtime error: InvalidJSONInput: lexical error:
invalid bytes in UTF8 string.
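
For reference, a minimal sketch of the kind of setup I am using, loosely
following agentzh’s examples (the upstream, table and column names are
just placeholders):

    upstream my_mysql_backend {
        drizzle_server 127.0.0.1:3306 protocol=mysql
                       dbname=test user=monty password=some_pass;
    }

    server {
        location = /mysql {
            internal;
            drizzle_pass  my_mysql_backend;
            drizzle_query "select data from images where id = 1";
            rds_json      on;   # serialize the result set as JSON
        }

        location = /image {
            content_by_lua '
                local yajl = require "yajl"
                local res = ngx.location.capture("/mysql")
                -- this is where the lexical error is raised as soon as
                -- the blob contains raw (non-UTF-8) bytes:
                local rows = yajl.to_value(res.body)
                ngx.header.content_type = "image/jpeg"
                ngx.print(rows[1].data)
            ';
        }
    }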

My current software stack consists of CentOS 5.6/x86_64, Nginx 1.0.3
(plus simpl-ngx_devel_kit 0.2.17, chaoslawful-lua-nginx-module 0.1.5,
chaoslawful-drizzle-nginx-module 0.0.14, agentzh-rds-json-nginx-module
0.10), Lua 5.1.4, LuaRocks 2.0.4.1, Lua YAJL 2.0, YAJL 2.0.1, Drizzle7
libraries 2011.04.15 and MySQL database 5.1.57.

[1]

On Tue, May 31, 2011 at 07:20:34AM -0400, csg wrote:

In my scenario I’m trying to access image data such as JPEGs stored as
blobs in MySQL, which of course causes the JSON parser to fail

I cannot say anything about the subject itself; I’m just curious: are
there any advantages to storing images in MySQL instead of the file
system?


Igor S.

Ease of management is the main positive; I recently moved away from it
myself to a filesystem-based solution for one of my clients. The largest
disadvantage is the size of the table. Oh, and JSON can be binary safe
if implemented correctly with the right escape sequences, yes?

DELETE FROM images WHERE (uploaded_user='user1' OR uploaded_user='user2')
AND uploaded_date > 123 AND gallery_id=0

SQL makes querying and management easy; that’s why some people choose to
store their data in MySQL BLOBs instead of on the filesystem.

On Wed, Jun 01, 2011 at 12:06:12AM +1000, SplitIce wrote:

Ease of management is the main positive; I recently moved away from it

What do you mean by “ease of management”? For me there is no easier
management than the file system: you have a lot of tools - find, grep,
etc.


Igor S.

SplitIce Wrote:

The largest disadvantage is the size of the table.

This can be easily avoided by sharding the data across many tables and
selecting the table via a modulo operation on the numerical key.
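
For illustration, a tiny Lua sketch of that idea (the shard count of 16
and the table naming scheme are made up):

    -- pick one of 16 shard tables based on the numerical key
    local id = tonumber(ngx.var.arg_id)
    local shard = id % 16
    local sql = string.format(
        "select data from images_%02d where id = %d", shard, id)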

Oh, and JSON can be binary safe if implemented correctly with the right
escape sequences, yes?

But, if I am not mistaken, this would require touching all the data and
encoding it with Base64 (or similar), which costs you computation and
storage overhead. Additionally, this would require changes to the
application, which we are trying to avoid. Some component in between,
like MySQL Proxy, might be able to work around this, but that increases
latency, which is also not desired.

Igor S. Wrote:

I cannot say anything about the subject itself; I’m just curious: are
there any advantages to storing images in MySQL instead of the file
system?

First of all, quite a lot of backend processes and application code rely
on it, and those interfaces cannot be changed easily. But the other
reason is the ease of handling the data. For redundancy it is much
easier and more performant to do MySQL replication than file-system
based approaches. With millions of files, even when using deep hashed
subdirectory trees, a full rsync(1) run would take several hours up to
half a day to complete. Even when using tools built on the inotify(7)
framework, like lsyncd(1) [1], you have to run a full rsync(1) from time
to time (e.g., after reboots due to kernel updates, inotify event
overflow, etc.). With data backups you would face the same issues, too.
As company policies discourage the use of AGPL-licensed software,
MongoDB/GridFS is also not an option. Our current solution is Apache 2.2
MPM worker/Perl-based, but in the long term this does not scale well
anymore; that’s why we are looking for alternatives.

[1] lsyncd project page (Google Code)

mow Wrote:

You may try “fuse” (http://fuse.sourceforge.net/): you can easily
implement a tiny file system which uses MySQL as the data source and
acts as a real file system.

Correct me if I am wrong, but I reckon that the context switches between
the kernel and a userland filesystem would be detrimental to
performance. Performance is the key ;-)

On Tue, May 31, 2011 at 11:05:09AM -0400, csg wrote:

Performance is the key ;-)

If performance is the key, you should use a file system.
Or do you believe that reading from MySQL is magic zero-cost transport? :-)


Igor S.

Igor S. Wrote:

If performance is the key, you should use a file system.
Or do you believe that reading from MySQL is magic zero-cost transport? :-)

For sure not, and I didn’t say so. But instead of bringing yet another
software component into play and entering a semi-flamewar about the
pros and cons of storing blobs in databases (there are already so many
on the net, and I am looking for information I couldn’t find there), I
had rather hoped to get some background information about RDS (is there
some spec somewhere?) and why the use of the database modules for Nginx
from lua-nginx more or less enforces the use of JSON. There might be
some valid points for this design decision and I would be interested to
hear about them. It is also unclear to me whether accessing the RDS data
directly from lua-nginx is possible. Any ideas on that are more than
welcome :-)

Correct, I just found out that although it is possible to encode binary
data in JavaScript strings, it’s unfortunately not possible in JSON.

You could output from MySQL in hex form using the built-in HEX()
function and de-hex it in a Lua module? But yes, that would add
computational overhead.
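
For example, a rough (untested) sketch of that round trip, assuming the
query selects HEX(data) AS data and the JSON result has already been
decoded into a Lua table called "rows":

    -- SQL side:  select HEX(data) as data from images where id = 1
    -- Lua side:  turn the hex string back into raw bytes
    local function from_hex(s)
        return (s:gsub("%x%x", function(h)
            return string.char(tonumber(h, 16))
        end))
    end

    local jpeg = from_hex(rows[1].data)
    ngx.header.content_type = "image/jpeg"
    ngx.print(jpeg)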

Of course you have to pay a price for FUSE, but as long as you use MySQL
as your image storage, I assume the overhead of read-only access through
a FUSE file system becomes a drop in the ocean for you. Also, this was
nothing but a quick suggestion; of course, learning more about the nginx
modules you use will be the perfect solution, as far as you can modify
them to fit your needs.

You may try “fuse” (http://fuse.sourceforge.net/): you can easily
implement a tiny file system which uses MySQL as the data source and
acts as a real file system.

Hi,

I had rather hoped to get some background information about RDS (is there
some spec somewhere?)

There is no formal specification, but you can learn everything from
ngx_drizzle / ngx_postgres sources.

Take a look at those 3 files:

https://github.com/chaoslawful/drizzle-nginx-module/blob/master/t/sanity.t

and why the use of the database modules for Nginx from lua-nginx more or
less enforces the use of JSON. There might be some valid points for this
design decision and I would be interested to hear about them. It is also
unclear to me whether accessing the RDS data directly from lua-nginx is
possible.

Because JSON parsers for Lua were already available ;-)
Anyway, I believe that agentzh was/is working on an RDS parser for Lua.

Any ideas on that are more than welcome :-)

Not sure if this is helpful, but you can return binary data from
PostgreSQL (you should be using PostgreSQL instead of MySQL anyway :P)
using ngx_postgres with the “postgres_output” directive. There is a
reason why there is no “drizzle_output” directive, but I’ll let agentzh
comment on that.
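
For the record, a rough sketch of what that might look like with
ngx_postgres (connection details and the table are made up; check the
ngx_postgres README for the exact “postgres_output” syntax in your
version):

    upstream pg_backend {
        postgres_server 127.0.0.1 dbname=test user=monty password=some_pass;
    }

    server {
        location = /image {
            postgres_pass    pg_backend;
            postgres_query   "select data from images where id = 1";
            # return the raw bytes of the first field instead of RDS
            postgres_output  binary_value;
        }
    }

You will probably still want to set the Content-Type header yourself.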

Best regards,
Piotr S. < [email protected] >

On Tue, May 31, 2011 at 11:50 PM, Piotr S.
[email protected] wrote:

There is no formal specification, but you can learn everything from
ngx_drizzle / ngx_postgres sources.

There’s a specification for the RDS binary format in ngx_drizzle’s
README file:

http://github.com/chaoslawful/drizzle-nginx-module

See the Output section there.

Take a look at those 3 files:

https://github.com/chaoslawful/drizzle-nginx-module/blob/master/t/sanity.t

A generic RDS parser can be built based on these C source and header
files.

and why the use of the database modules for Nginx from lua-nginx more or
less enforces the use of JSON. There might be some valid points for this
design decision and I would be interested to hear about them. It is also
unclear to me whether accessing the RDS data directly from lua-nginx is
possible.

Because JSON parsers for Lua were already available ;-)
Anyway, I believe that agentzh was/is working on an RDS parser for Lua.

Yeah, I’ll work on the RDS API for ngx_lua, for example:

local lua_table = ngx.from_rds(some_rds_binary)
local rds_binary = ngx.to_rds(some_lua_table)

Any ideas on that are more than welcome :-)

Not sure if this is helpful, but you can return binary data from PostgreSQL
(you should be using PostgreSQL instead of MySQL anyway :P) using
ngx_postgres with “postgres_output” directive. There is a reason why there
is no “drizzle_output” directive, but I’ll let agentzh comment on that.

We surely can implement a drizzle_output directive, though it's much
more difficult than in ngx_postgres, because ngx_drizzle is designed to
work in a streaming fashion while libpq in the Pg world always buffers
the whole resultset.

It seems that I still have a lot of interesting stuff to work on,
huh? ;-) Well, the focus in the future will, however, be on developing
a full-fledged non-blocking streaming MySQL C driver from scratch for
nginx, because both libmysql and libdrizzle are sad in one way or
another.

Cheers,
-agentzh

agentzh Wrote:

On Tue, May 31, 2011 at 11:50 PM, Piotr S.
[email protected] wrote:

There’s a specification for the RDS binary format in ngx_drizzle’s
README file:

http://github.com/chaoslawful/drizzle-nginx-module

See the Output section there.

First of all, thanks to both of you, Piotr and agentzh, for your
replies!

Anyway, I believe that agentzh was/is working on an RDS parser for Lua.

Yeah, I’ll work on the RDS API for ngx_lua, for example:

local lua_table = ngx.from_rds(some_rds_binary)
local rds_binary = ngx.to_rds(some_lua_table)

That would be awesome, really :-) I would be happy to test the code as
soon as it’s available.

We surely can implement a drizzle_output directive, though it's much
more difficult than in ngx_postgres, because ngx_drizzle is designed to
work in a streaming fashion while libpq in the Pg world always buffers
the whole resultset.

Haven’t thought about that, but it makes perfect sense.

It seems that I still have a lot of interesting stuff to work on,
huh? ;-) Well, the focus in the future will, however, be on developing
a full-fledged non-blocking streaming MySQL C driver from scratch for
nginx, because both libmysql and libdrizzle are sad in one way or
another.

Sounds like a plan, but surely it can’t be expected that this will be
available soon, right?

For now, maybe another area I can look at is MySQL’s upcoming memcached
interface [1] (which would also reduce the pain of SQL query parsing,
since in my use case MySQL is, from the web server’s point of view, just
being used as a key-value store) and access it via Nginx’s memcached
module [2].

[1] http://blogs.innodb.com/wp/2011/04/nosql-to-innodb-with-memcached/
[2] http://nginx.org/en/docs/http/ngx_http_memcached_module.html
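
If that works out, the nginx side could be as simple as something along
these lines (the key naming scheme is hypothetical and depends on how
the InnoDB memcached plugin is configured to map keys to rows):

    location ~ ^/img/(?<img_id>\d+)$ {
        set              $memcached_key "images:$img_id";
        memcached_pass   127.0.0.1:11211;
        default_type     image/jpeg;
    }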

Anyway, many thanks so far!
