Mapping string data ptr to buffer in ffi

sgm · April 8, 2013, 4:05am

I’m trying to implement some “shared memory” in Ruby, but I’m not sure
how to declare it properly with ffi. The idea is to map the read-only
(in Ruby) string contents (by passing a data ptr) to external C library
threads so that they would write some data to it. Could you point out
the proper way to do it?

Thanks in advance

sgm · April 8, 2013, 10:40am

On Mon, Apr 8, 2013 at 4:06 AM, se gm wrote:

I’m trying to implement some “shared memory” in Ruby, but I’m not sure
how to declare it properly with ffi. The idea is to map the read-only
(in Ruby) string contents (by passing a data ptr) to external C library
threads so that they would write some data to it. Could you point out
the proper way to do it?

I think a better way would be to implement a Ruby class in C and provide
the information via a getter method which returns a new Ruby String and
internally uses proper synchronization to maintain thread safety.
Otherwise, if you update the String contents from arbitrary threads, you
will create all sorts of race conditions and issues which are hard to
hunt down.
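A minimal C sketch of the getter approach suggested above, assuming POSIX threads: producer threads and the getter take the same mutex, and the getter hands back a private copy, so readers never see a half-written buffer. The names `shared_buf`, `shared_buf_write` and `shared_buf_snapshot` are hypothetical, and the fixed 256-byte capacity is only for illustration; in a real extension the snapshot would be wrapped in a Ruby String (for example with `rb_str_new`) instead of being returned as a raw `malloc` buffer.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared buffer: every writer and reader holds `lock`. */
struct shared_buf {
    pthread_mutex_t lock;
    char data[256];
    size_t len;
};

void shared_buf_init(struct shared_buf *sb)
{
    pthread_mutex_init(&sb->lock, NULL);
    sb->len = 0;
}

/* Called from producer threads; truncates input to the fixed capacity. */
void shared_buf_write(struct shared_buf *sb, const char *src, size_t n)
{
    if (n > sizeof sb->data)
        n = sizeof sb->data;
    pthread_mutex_lock(&sb->lock);
    memcpy(sb->data, src, n);
    sb->len = n;
    pthread_mutex_unlock(&sb->lock);
}

/* The "getter": copies a consistent snapshot while holding the lock.
   Returns NULL if allocation fails; the caller frees the result. */
char *shared_buf_snapshot(struct shared_buf *sb, size_t *out_len)
{
    pthread_mutex_lock(&sb->lock);
    char *copy = malloc(sb->len ? sb->len : 1);
    if (copy != NULL) {
        memcpy(copy, sb->data, sb->len);
        *out_len = sb->len;
    }
    pthread_mutex_unlock(&sb->lock);
    return copy;
}
```

The copy made in `shared_buf_snapshot` is exactly the per-call cost the original poster objects to for very large buffers; the sketch only shows what the locking side of this design looks like.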

Kind regards

robert

sgm · April 8, 2013, 11:22am

I think a better way would be to implement a Ruby class in C and provide
the information via a getter method which returns a new Ruby String and
internally uses proper synchronization to maintain thread safety.
It would be too slow. The purpose is to avoid any extra operations (no
memory copy operations at all; the string changes should be immediately
visible in Ruby), regardless of thread safety issues. No setters or
getters.

sgm · April 8, 2013, 11:31am

On Mon, Apr 8, 2013 at 11:22 AM, se gm wrote:

I think a better way would be to implement a Ruby class in C and provide
the information via a getter method which returns a new Ruby String and
internally uses proper synchronization to maintain thread safety.
It would be too slow. The purpose is to avoid any extra operations (no
memory copy operations at all; the string changes should be immediately
visible in Ruby), regardless of thread safety issues. No setters or
getters.

If you simply change a String’s content from some other thread, no piece of
Ruby code will be aware of this, so nothing can really do anything with
that changed content in Ruby land. You need a method call, either from the
Ruby side (i.e. when it needs the data) or from the extension side (i.e.
after every update). If you do it from the extension side, things get more
complicated, since you need to pass in a Ruby object as a callback and also
ensure that the calling thread is a Ruby interpreter thread.

Btw, what are you trying to create here?

Cheers

robert

sgm · April 8, 2013, 11:49am

You need a method call, either from Ruby side
I need only one blocking synchronization call from Ruby to start working
with the string. Getter method calls that would copy a 512 MB string are
not a viable option. So there is no option other than the one I was asking
about.
Btw, what are you trying to create here?
File parsing and searching over a huge volume of data.

sgm · April 8, 2013, 12:14pm

I believe it would be a huge amount of work. I used shared memory in the past
from C, but when the Ruby part needed to access the data, I created a
String object (with rb_str_new).
Doing that makes the code more than 2x slower (compared with the test
results of code that finds the pointers before processing by scanning
memory). That approach is not a reliable one, but a drop in performance
for a task that takes days to complete is not an option either.

sgm · April 8, 2013, 1:02pm

On Mon, Apr 8, 2013 at 11:49 AM, se gm wrote:

You need a method call, either from Ruby side
I need only one blocking synchronization call from Ruby to start working
with the string. Getter method calls that would copy a 512 MB string are
not a viable option. So there is no option other than the one I was asking
about.

Well, you do not necessarily need to copy complete files.

Btw, what are you trying to create here?
File parsing and searching over a huge volume of data.

Please disclose more detail about the way the parsing and processing of
parse results is supposed to work. With that information we will be better
able to find a proper solution. I think you need to step back and look at
this from a bit more distance. The solution you are hooked into does not
sound viable yet you wa

sgm · April 8, 2013, 12:05pm

Subject: Re: Mapping string data ptr to buffer in ffi
Date: Mon 08 Apr 13 06:49:27 +0900

Quoting se gm:

You need a method call, either from Ruby side
I need only one blocking synchronization call from Ruby to start working
with the string. Getter method calls that would copy a 512 MB string are
not a viable option. So there is no option other than the one I was asking
about.

It remains that, if you want to manipulate strings from Ruby, you must
at some point create Ruby String objects. Either you do all your
string processing in C, or at some point you must do the bridging. The
normal way is to allow Ruby GC to manage these objects. I believe you
could twiddle the main String class from C (see string.c in the Ruby
code) to have a string whose content is unmodifiable and is held in a
memory area managed from C (say, using System V IPC shared memory), but
I believe it would be a huge amount of work. I used shared memory in the past
from C, but when the Ruby part needed to access the data, I created a
String object (with rb_str_new).

Carlo

sgm · April 8, 2013, 1:09pm

On Mon, Apr 8, 2013 at 1:02 PM, Robert K. wrote:

parse results is supposed to work. With that information we will be better
able to find a proper solution. I think you need to step back and look at
this from a bit more distance. The solution you are hooked into does not
sound viable yet you wa

Sorry, key press error.

The solution you are hooked into does not sound viable, yet you want to
do it that way. I’d say it’s unlikely that you will come to a satisfactory
solution doing it that way.

Cheers

robert

sgm · April 8, 2013, 2:29pm

On Mon, Apr 8, 2013 at 2:09 PM, se gm wrote:

The solution you are hooked into does not sound viable, yet you want to
do it that way.
It is the only way to solve the bottleneck problem. There is simply no
other acceptable solution, and I would go for it regardless of any
issues. Do you know how to pass the pointer reliably or not?

You need to properly synchronize accesses from both sides. Without knowing
the usage patterns I don’t think it’s possible to come up with a proper
solution. After all, what’s it worth to fix a bottleneck when the logic is
broken in the process?

Cheers

robert

sgm · April 8, 2013, 2:09pm

The solution you are hooked into does not sound viable, yet you want to
do it that way.
It is the only way to solve the bottleneck problem. There is simply no
other acceptable solution, and I would go for it regardless of any
issues. Do you know how to pass the pointer reliably or not?

sgm · April 8, 2013, 2:41pm

Without knowing the usage patterns I don’t think it’s possible to come up
with a proper solution.
There are many patterns, including exported ones (modules and
extensions), which is why ‘fixing’ the logic won’t work. But fixing the
bottleneck would.

sgm · April 8, 2013, 9:32pm

On 04/08/2013 07:41 AM, se gm wrote:

Without knowing the usage patterns I don’t think it’s possible to come up
with a proper solution.
There are many patterns, including exported ones (modules and
extensions), which is why ‘fixing’ the logic won’t work. But fixing the
bottleneck would.

I would be interested to hear about any solution in this vein as well.
It seems to boil down to this: can the byte buffer that backs Ruby
strings be exposed for direct use by C functions? If not, then this
discussion can only move on to alternatives. If it can be done, how so?

Could this even be done safely without some form of cooperation from the
GC? That byte buffer could potentially be moved around at any time
during a GC event, couldn’t it?

-Jeremy

sgm · April 8, 2013, 10:28pm

On 04/08/2013 03:06 PM, Carlo E. Prelz wrote:

to a String object, you can do

char *p=RSTRING_PTR(v);

Sadly, this doesn’t appear to be made available via any functionality of
FFI, so the OP would need to write at least a small amount of his own
glue as a C extension or similarly extend FFI. From the earlier
discussion, the hope is that such functionality already exists in FFI.

The next question to answer is whether or not the GC will relocate that
buffer behind the back of the C functions. If that might happen, the GC
might need to be disabled during any operations that use the buffer with
the C function.

What you can’t do if you want to keep the Ruby
environment (and yourself) happy is change the size of that buffer, or
free the buffer, or other actions of this sort. If you want to
manipulate that string from C, you are free to allocate your own
buffers, and assume the responsibility to free them, or leak memory.

In my case at least, all operations on the buffer would only change
content within the existing buffer. The size of the string that owns
the buffer would be unaffected. I think the OP has similar needs from
the sound of it.

-Jeremy

sgm · April 8, 2013, 10:06pm

Subject: Re: Mapping string data ptr to buffer in ffi
Date: Tue 09 Apr 13 04:31:39 +0900

Quoting Jeremy B.:

I would be interested to hear about any solution in this vein as well.
It seems to boil down to this: can the byte buffer that backs Ruby
strings be exposed for direct use by C functions? If not, then this
discussion can only move on to alternatives. If it can be done, how so?

The buffer can be accessed quite easily. If v is a VALUE that refers
to a String object, you can do

char *p=RSTRING_PTR(v);

(there is also

long l=RSTRING_LEN(v);

that returns the length of the string (strings in Ruby are not guaranteed
to be NUL-terminated)). What you can’t do if you want to keep the Ruby
environment (and yourself) happy is change the size of that buffer, or
free the buffer, or other actions of this sort. If you want to
manipulate that string from C, you are free to allocate your own
buffers, and assume the responsibility to free them, or leak memory.
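Because the bytes behind `RSTRING_PTR(v)` come with an explicit length and no guaranteed terminator, C code scanning them should be bounded by `RSTRING_LEN(v)` rather than rely on `strlen` or `strstr`. A minimal sketch in plain C (no `ruby.h`, so it compiles standalone); the function name `count_occurrences` is hypothetical, and inside an extension it would be called as `count_occurrences(RSTRING_PTR(v), RSTRING_LEN(v), needle, nlen)`. It only reads the buffer, so it stays within the "don't resize or free it" rule above.

```c
#include <string.h>

/* Count (possibly overlapping) occurrences of needle in a length-delimited
   buffer. Never reads past p[len - 1] and never relies on a trailing NUL,
   so embedded NUL bytes are treated as ordinary data. */
long count_occurrences(const char *p, long len, const char *needle, long nlen)
{
    long count = 0;
    if (nlen <= 0 || nlen > len)
        return 0;
    const char *end = p + (len - nlen) + 1; /* one past the last valid start */
    const char *cur = p;
    while (cur < end) {
        /* memchr jumps to the next candidate first byte within the range */
        const char *hit = memchr(cur, needle[0], (size_t)(end - cur));
        if (hit == NULL)
            break;
        if (memcmp(hit, needle, (size_t)nlen) == 0)
            count++;
        cur = hit + 1;
    }
    return count;
}
```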

When you embrace a garbage-collected language you have VAST
advantages. They come with their fair share of disadvantages, like a
performance hit, and having to forget about playing with pointers.

Then, as I already wrote, I believe it is possible to derive from the
stuff in string.c an immutable string object where content points to
some underlying storage, but it is certainly not trivial. What I am
reasonably certain is a dead end is transmitting memory pointers to
the Ruby environment. You can, of course, obtain the value of the
memory location, but there’s little you will want to do with it.

Carlo

sgm · April 9, 2013, 7:47am

Subject: Re: Mapping string data ptr to buffer in ffi
Date: Tue 09 Apr 13 05:27:43 +0900

Quoting Jeremy B.:

Sadly, this doesn’t appear to be made available via any functionality of
FFI, so the OP would need to write at least a small amount of his own
glue as a C extension or similarly extend FFI. From the earlier
discussion, the hope is that such functionality already exists in FFI.

I know nothing about FFI. Can’t help there.

The next question to answer is whether or not the GC will relocate that
buffer behind the back of the C functions. If that might happen, the GC
might need to be disabled during any operations that use the buffer with
the C function.

As far as I know, the GC won’t operate while your Ruby-called C
function is being executed.

In my case at least, all operations on the buffer would only change
content within the existing buffer. The size of the string that owns
the buffer would be unaffected. I think the OP has similar needs from
the sound of it.

The OP had quite different needs, if I got this correctly. He wanted
huge amounts of strings to be accessible read-only from some form of
global memory, and he hoped he could introduce them into the Ruby
environment without paying the price of creating String objects (and
thus without needing to duplicate the memory needs, and without
incurring the performance hit of allocation/deallocation).

Carlo

sgm · April 9, 2013, 11:10am

although he never explicitly
stated whether resizing would be necessary.
Resizing in my case is not necessary. Many fragments may be mapped into
one or more huge strings which are not changed from the Ruby environment
(they are just preallocated).
and apparently also without proper synchronization.
That is totally wrong. If it is synchronized for the particular situation,
then one can call it “right”, and the language concepts would be
irrelevant. The actual synchronization is the only thing that matters.
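The layout described in this reply (many fragments placed into one preallocated buffer that is never resized) can be sketched in C as a fixed-capacity arena plus (offset, length) views. This is only an illustration under that assumption: `struct arena`, `struct slice`, `arena_append` and `slice_ptr` are hypothetical names. Each fragment is still copied into the buffer once by the producer; readers then reach it through a pointer with no further copy.

```c
#include <string.h>

/* Hypothetical view of one fragment inside the large preallocated buffer. */
struct slice {
    size_t off;
    size_t len;
};

/* Hypothetical fixed-capacity arena: `base` is allocated once and never
   resized, so views into it stay valid for the buffer's whole lifetime. */
struct arena {
    char *base;
    size_t cap;
    size_t used;
};

/* Places a fragment after the previous ones; returns 0 on success, -1 if it
   does not fit (the arena is never grown). */
int arena_append(struct arena *a, const char *src, size_t n, struct slice *out)
{
    if (n > a->cap - a->used)
        return -1;
    memcpy(a->base + a->used, src, n);
    out->off = a->used;
    out->len = n;
    a->used += n;
    return 0;
}

/* Zero-copy access: a fragment is just a pointer into the shared buffer. */
const char *slice_ptr(const struct arena *a, struct slice s)
{
    return a->base + s.off;
}
```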

sgm · April 9, 2013, 8:53am

On Tue, Apr 9, 2013 at 7:46 AM, Carlo E. Prelz wrote:

    Subject: Re: Mapping string data ptr to buffer in ffi
    Date: Tue 09 Apr 13 05:27:43 +0900
Quoting Jeremy B.:

In my case at least, all operations on the buffer would only change
content within the existing buffer. The size of the string that owns
the buffer would be unaffected. I think the OP has similar needs from
the sound of it.

The OP had quite different needs, if I got this correctly.

I think he wanted exactly that as well - although he never explicitly
stated whether resizing would be necessary.

He wanted
huge amounts of strings to be accessible read-only from some form of
global memory, and he hoped he could introduce them into the Ruby
environment without paying the price of creating String objects (and
thus without needing to duplicate the memory needs, and without
incurring the performance hit of allocation/deallocation).

… and apparently also without proper synchronization.

Cheers

robert

sgm · April 9, 2013, 4:06pm

On Tue, Apr 9, 2013 at 11:10 AM, se gm wrote:

although he never explicitly
stated whether resizing would be necessary.
Resizing in my case is not necessary. Many fragments may be mapped into
one or more huge strings which are not changed from the Ruby environment
(they are just preallocated).
and apparently also without proper synchronization.
That is totally wrong. If it is synchronized for the particular situation,
then one can call it “right”, and the language concepts would be
irrelevant. The actual synchronization is the only thing that matters.

Well, Ruby’s String certainly has no synchronization built in with the
library you want to integrate, so how will you ensure that there is proper
synchronization? So far I have only seen that you want to share the memory
of a Ruby string’s byte array with some third-party library which will then
write to the memory from threads not controlled by Ruby. You said you need
“only one blocking synchronization call from Ruby to start working with the
string”, but what do you do afterwards, i.e. while the processing is under
way and “the string changes should be immediately visible in Ruby”?

Cheers

robert

sgm · April 9, 2013, 7:08pm

how will you ensure that there is proper synchronization?
By blocking the thread that is waiting for external data: as soon as
the data is ready, the function would return, and of course the string
changes would be immediately visible in Ruby without a major performance
drop from unneeded memory copy operations.
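A minimal C sketch of the hand-off described in this reply, assuming POSIX threads: the caller owns a preallocated buffer (standing in for the string's bytes), a worker thread (standing in for a thread of the external library) writes into it, and the caller blocks on a condition variable until the worker publishes a `ready` flag under the mutex. `fill_job`, `worker` and `fill_and_wait` are hypothetical names. The sketch covers only this single blocking hand-off; it does not address the concern raised earlier in the thread about code reading the buffer while writers are still active, and a real Ruby string would additionally have to stay unresized and unfreed while the worker holds its pointer.

```c
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical job descriptor: the caller owns buf; the worker writes only
   inside [0, cap) and never resizes or frees it. */
struct fill_job {
    char *buf;
    size_t cap;
    size_t used;
    bool ready;
    pthread_mutex_t lock;
    pthread_cond_t done;
};

/* Stands in for a thread created by the external C library. */
static void *worker(void *arg)
{
    struct fill_job *job = arg;
    const char msg[] = "parsed data";
    size_t n = sizeof msg - 1;
    if (n > job->cap)
        n = job->cap;
    memcpy(job->buf, msg, n);          /* write directly into caller's buffer */

    pthread_mutex_lock(&job->lock);
    job->used = n;
    job->ready = true;                 /* publish completion under the mutex */
    pthread_cond_signal(&job->done);
    pthread_mutex_unlock(&job->lock);
    return NULL;
}

/* The single blocking call: start the worker, then wait until it reports
   that the buffer contents are complete. Returns the number of bytes written. */
size_t fill_and_wait(char *buf, size_t cap)
{
    struct fill_job job = { .buf = buf, .cap = cap };
    pthread_mutex_init(&job.lock, NULL);
    pthread_cond_init(&job.done, NULL);

    pthread_t tid;
    if (pthread_create(&tid, NULL, worker, &job) != 0) {
        pthread_cond_destroy(&job.done);
        pthread_mutex_destroy(&job.lock);
        return 0;
    }

    pthread_mutex_lock(&job.lock);
    while (!job.ready)                 /* loop guards against spurious wakeups */
        pthread_cond_wait(&job.done, &job.lock);
    pthread_mutex_unlock(&job.lock);

    pthread_join(tid, NULL);           /* job lives on this stack frame */
    pthread_cond_destroy(&job.done);
    pthread_mutex_destroy(&job.lock);
    return job.used;
}
```

The mutex and condition variable are what make the worker's writes visible to the waiting thread once it wakes up; the final `pthread_join` is needed here only because `job` lives on the caller's stack.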