Socket Decorator

Hi, I am implementing a special kind of socket that takes an IO object
(usually a socket) as its argument and then behaves like a socket itself,
but applies a transformation before it sends and after it receives. In
pseudocode, it could look roughly like this:

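def send(text)
  # transform the outgoing text, then pass it to the wrapped IO object
  @io.write(transform(text))
end

def receive
  # read from the wrapped IO object, then undo the transformation
  inverse_transform(@io.gets)
end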

But now I would like to implement the IO interface like all those other
IO classes, and it seems like a lot of work to do it “manually” because
there are so many similar methods. Plain def_delegators is not sufficient
because I have to apply the transformations first. Simple method_missing
tricks are also bad because the different methods take different
arguments. Furthermore, I don’t really like that trick unless the
generated methods really depend on the context (like in ActiveRecord),
because, for example, a.methods in irb doesn’t return anything useful.

Since the length of the messages I send through the internal socket is
fixed (because of the kind of transformations I do), I should probably
use an internal buffer: if getc is called, return the next byte of that
buffer, and if the buffer is empty, receive the next message and put it
into the buffer. But what should I use for the buffer? I could use a
string, but I guess that +=ing and splitting strings all the time may not
be the most efficient way to do it.

Does anyone see a good solution to my problem? How could I implement
this in an elegant way, with some metaprogramming trick or so, without
having to implement gets, puts, send, recv etc. all by myself?
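One possible direction, as a rough sketch only: generate the write-side methods with define_method, so that they are real methods and show up in a.methods (unlike method_missing). The class name TransformingIO, the list of method names and the ROT13 stand-in for the transformation are all made up for illustration.

```ruby
require 'stringio'

# Hypothetical decorator: the class name, the method list and the ROT13
# transformation are illustrative assumptions, not part of any library.
class TransformingIO
  def initialize(io)
    @io = io
  end

  # Generate the write-side methods instead of typing each one by hand.
  # Each generated method transforms its arguments and then calls the
  # method of the same name on the wrapped IO object.
  %i[write print puts].each do |name|
    define_method(name) do |*args|
      @io.public_send(name, *args.map { |arg| transform(arg.to_s) })
    end
  end

  private

  # Stand-in transformation (ROT13).
  def transform(text)
    text.tr('A-Za-z', 'N-ZA-Mn-za-m')
  end
end

inner = StringIO.new
outer = TransformingIO.new(inner)
outer.puts('hello')
inner.string                    # "uryyb\n"
outer.methods.include?(:puts)   # true
```

The read side (gets, getc, read) cannot be generated this simply, because it needs a buffer between the wrapped object and the caller.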

On 09/04/2012 07:34 AM, Bernhard B. wrote:

having to implement all the gets puts send recv etc myself?

Take a look at the io-like gem:

io-like | RubyGems.org | your community gem host

It provides a fairly easy way to emulate the IO interface; however, I
haven’t updated it yet to emulate Ruby 1.9 IO details. Ruby 1.8.7 is
the limit for now.

-Jeremy

On Tue, Sep 4, 2012 at 2:34 PM, Bernhard B. wrote:

Hi, I am implementing a special kind of socket that takes an io object
(usually a socket) as its argument and then behaves like a socket, but
does some transformations before/after it sends/receives, so in pseudo
code, it could look about like this:

But now I would like to implement the io interface like all those other
io classes,

Why? I’d rather have a clean separation, i.e. you have a class
representing the connection on a higher level of abstraction and make
it contain a Socket instance. Then implement the methods you need on
the abstract level. You could also have custom classes for messages
(or only one custom class depending on your needs). That way you make
sure clients pass in only what they are allowed to etc.

If you allow the socket to be visible outside your connection class
you allow people to manipulate the socket in arbitrary ways which
might interfere with your message exchange logic. It’s usually better
to have this layered approach and to prevent manipulations of the socket
which interfere with your logic (for example, someone invoking
close_read() while you are still waiting for an answer).

Since the length of the messages I send through the internal sockets are
fixed (because of the kind of transformations I do), I should probably
use an internal buffer and for example return the next byte of that
buffer if getc is called and if the buffer is empty, receive the next
message and put it into the buffer. But what should I use for the
buffer? I could use a string, but I guess +=ing and splitting strings
all the time may not be the most efficient way to do it.

You can use String#<< and String#slice! which are more efficient.
But: since the length is fixed I’d probably just have a thread reading
messages which simply uses @socket.read(1234). That will block until
a message is complete.
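A minimal sketch of the reader-thread idea, assuming fixed-size messages. The helper name start_message_reader, the message size of 8 and the use of IO.pipe in place of a real socket are assumptions for the demo.

```ruby
# Hypothetical helper: collects fixed-size messages from io on a
# background thread and pushes them onto queue.
def start_message_reader(io, message_size, queue)
  Thread.new do
    # read(n) blocks until n bytes have arrived; it returns nil at EOF.
    while (message = io.read(message_size))
      queue << message
    end
  end
end

reader, writer = IO.pipe        # stands in for the two ends of a socket
messages = Queue.new
thread = start_message_reader(reader, 8, messages)

writer.write('abcd')            # half a message: the reader keeps waiting
writer.write('efgh')            # now the first message is complete
writer.write('12345678')
writer.close                    # EOF ends the reader loop
thread.join

first = messages.pop            # "abcdefgh"
second = messages.pop           # "12345678"
```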

Kind regards

robert

Jeremy B. wrote in post #1074640:

On 09/04/2012 07:34 AM, Bernhard B. wrote:

having to implement all the gets puts send recv etc myself?

Take a look at the io-like gem:

io-like | RubyGems.org | your community gem host

It provides a fairly easy way to emulate the IO interface; however, I
haven’t updated it yet to emulate Ruby 1.9 IO details. Ruby 1.8.7 is
the limit for now.

-Jeremy

Thanks a lot, that is almost exactly what I was looking for. But there
is a slight twist in my application: I cannot do unbuffered_read() for
an arbitrary length in my case because the transformations operate on
blocks and my transformations may even change the length of the part I
read in an unpredictable way, which makes it impossible in the first
place to read a given number of bytes. There is an easy workaround for
this, namely that I can use another internal buffer for my
transformations and read as much as necessary to achieve a given length,
but it is a little inefficient and not very elegant to do so, because
then my read operation is not really unbuffered and there exist two
buffers on the different layers.

A much larger problem is the fact that I usually work with sockets. With
sockets, I cannot do “unbuffered_read” for an arbitrarily large length
since the other side may simply not have sent enough yet. I didn’t look
at your implementation yet, but I guess that if I call gets, you
repeatedly read larger chunks (let’s say of size N) using the
unbuffered_read operation until you find a newline character. But if the
other side sends just “\n”, gets should not block but deliver “\n”
directly, whereas an attempt to read N bytes would block.
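To illustrate the blocking concern with plain Ruby IO (this says nothing about how io-like works internally): IO#read(n) waits for n bytes or EOF, while IO#readpartial(n) returns as soon as any data is available. The helper name fill_buffer is made up, and IO.pipe stands in for a socket.

```ruby
# Hypothetical fill step for a buffered reader: appends whatever is
# currently available (at most max_bytes) instead of waiting for a
# full chunk. readpartial raises EOFError at end of stream.
def fill_buffer(io, buffer, max_bytes = 4096)
  buffer << io.readpartial(max_bytes)
end

reader, writer = IO.pipe   # stands in for a connected socket
writer.write("\n")         # the peer sends a single newline, then waits

buffer = +''
# reader.read(4096) would block here until 4096 bytes or EOF arrive.
fill_buffer(reader, buffer)
buffer                     # "\n"
```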

So your library seems nice if the underlying IO object is a File or
something similar. I guess that is also the application you had in mind
when you wrote it, but for sockets, it is probably a little harder.

Ok, I started looking at the code and I noticed that the library does
much more than I expected. The code is really well documented and nicely
written. I also think I can solve all the problems I mentioned: for
example, I could set fill_size to 0 to disable buffering by your code and
then do the buffering in my code to achieve the required lengths in
unbuffered_read. I still need to figure out some things. For example,
your code relies heavily on the exceptions I raise, so I have to figure
out which ones to raise on which occasions so that things like blocking
and non-blocking reads work properly, and what I have to do to end the
connection suddenly without your code catching it. I will also have to
watch out for occasions where a call to unbuffered_read() blocks because
not enough data is available when it should not block, or should raise
an exception instead.

Bernhard B. wrote in post #1074683:

I still need to figure out some things, for example
your code heavily relies on the exceptions I throw and I have to figure
out which ones I have to throw in which occasions such that all things
like blocking and non-blocking reads work properly and what I have to do
to end the connection suddenly without your code catching it and I will
have to watch out if there are occasions where a call to
unbuffered_read() blocks because not enough data is available where it
should not or should throw an exception instead.

Ok, I found out that everything I wanted to know is in the comment of
the module itself, and it works now, so thanks a lot.

But I have one small bug report: in your LikeStringIO example, you
didn’t define duplexed? to return true.

Robert K. wrote in post #1074648:

On Tue, Sep 4, 2012 at 2:34 PM, Bernhard B. wrote:

Hi, I am implementing a special kind of socket that takes an io object
(usually a socket) as its argument and then behaves like a socket, but
does some transformations before/after it sends/receives, so in pseudo
code, it could look about like this:

But now I would like to implement the io interface like all those other
io classes,

Why? I’d rather have a clean separation, i.e. you have a class
representing the connection on a higher level of abstraction and make
it contain a Socket instance. Then implement the methods you need on
the abstract level. You could also have custom classes for messages
(or only one custom class depending on your needs). That way you make
sure clients pass in only what they are allowed to etc.

My transformed socket can send arbitrary strings, so a client can send
anything he likes. The advantage of implementing the same interface as
the other IO classes is that my transformed socket can be reused in more
ways and passed to other methods that take IO streams as arguments. For
example, YAML::load(transformed_socket) might read YAML directly from my
transformed socket, which is not possible if I have a different
interface.

If you allow the socket to be visible outside your connection class
you allow people to manipulate the socket in arbitrary ways which
might interfere with your message exchange logic. It’s usually better
to have this layered approach and to prevent manipulations of the socket
which interfere with your logic (for example, someone invoking
close_read() while you are still waiting for an answer).

But that is a totally different thing. I did not want to allow my
internal IO object to be visible from outside. That is why I could not
use def_delegators etc. I do want to use a layered approach: the lower
layer is the “real” socket, the upper layer is my class. Hence calling
close_read() would not close my internal socket, but it would close my
transformed socket (closing the internal socket would probably be one
step of that, though). If the client does that and tries to read
afterwards, he will of course get an IOError from my class because my
transformed socket is already closed for reading.
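A small sketch of that layering, assuming a wrapper that never exposes the wrapped object. The class name GuardedReader and the error message text are invented for the example, and a StringIO stands in for the socket.

```ruby
require 'stringio'

# Hypothetical wrapper: callers only ever see this object, never @io.
class GuardedReader
  def initialize(io)
    @io = io
    @read_closed = false
  end

  # Closes the reading side of the wrapper; closing the reading side of
  # the wrapped object is one step of that.
  def close_read
    @read_closed = true
    @io.close_read
  end

  def gets
    raise IOError, 'not opened for reading' if @read_closed
    @io.gets
  end
end

wrapped = GuardedReader.new(StringIO.new("line\n"))
wrapped.gets          # "line\n"
wrapped.close_read
begin
  wrapped.gets
rescue IOError => e
  e.message           # "not opened for reading"
end
```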

Since the length of the messages I send through the internal sockets are
fixed (because of the kind of transformations I do), I should probably
use an internal buffer and for example return the next byte of that
buffer if getc is called and if the buffer is empty, receive the next
message and put it into the buffer. But what should I use for the
buffer? I could use a string, but I guess +=ing and splitting strings
all the time may not be the most efficient way to do it.

You can use String#<< and String#slice! which are more efficient.
But: since the length is fixed I’d probably just have a thread reading
messages which simply uses @socket.read(1234). That will block until
a message is complete.

Kind regards

robert

Thanks a lot, I already switched to String#<< and String#slice!. The
message size sent through the internal socket is fixed, but from
outside, it appears like a normal IO stream, so a client may call gets,
and several internal messages may have to be read until a newline
character is found. Or read(n) may be called for n != block_size, so a
buffer is definitely necessary.
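Roughly, such a buffer could look like the sketch below. Everything in it is an assumption made for illustration: the class name BlockBufferedReader, a block size of 4, ROT13 as a length-preserving stand-in for the inverse transformation, and a last block that may be shorter than the others at end of stream.

```ruby
require 'stringio'

# Hypothetical reader that pulls fixed-size blocks from the wrapped IO
# object and serves gets/read(n) from an internal String buffer.
class BlockBufferedReader
  def initialize(io, block_size)
    @io = io
    @block_size = block_size
    @buffer = String.new      # binary buffer, grown with << and cut with slice!
  end

  # Returns the next line, reading as many blocks as needed.
  # Returns the unterminated rest at end of stream, then nil.
  def gets
    until (index = @buffer.index("\n"))
      return drain unless fill
    end
    @buffer.slice!(0, index + 1)
  end

  # Returns up to n bytes (n > 0), or nil at end of stream.
  def read(n)
    while @buffer.bytesize < n
      break unless fill
    end
    @buffer.empty? ? nil : @buffer.slice!(0, n)
  end

  private

  # Reads one block, undoes the transformation and appends the result.
  # Returns nil at end of stream.
  def fill
    block = @io.read(@block_size)
    return nil if block.nil?
    @buffer << inverse_transform(block)
  end

  def drain
    @buffer.empty? ? nil : @buffer.slice!(0, @buffer.length)
  end

  # Stand-in inverse transformation (ROT13 is its own inverse).
  def inverse_transform(block)
    block.tr('A-Za-z', 'N-ZA-Mn-za-m')
  end
end

wire = "first\nsecond\n".tr('A-Za-z', 'N-ZA-Mn-za-m')
reader = BlockBufferedReader.new(StringIO.new(wire), 4)
reader.gets      # "first\n"  (needed two blocks)
reader.read(3)   # "sec"
reader.gets      # "ond\n"
reader.gets      # nil
```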

On Tue, Sep 4, 2012 at 7:53 PM, Bernhard B. wrote:

Why? I’d rather have a clean separation, i.e. you have a class […]

My transformed socket can send arbitrary strings, so a client can send
anything he likes. The advantage of implementing the same interface as
the other IO classes is that my transformed socket can be reused in more
different ways and passed to other methods that take IO streams as
arguments, for example YAML::load(transformed_socket) might read yaml
directly from my transformed socket, which is not possible if I have a
different interface.

Ah, I see. Basically you are defining a different message format
across the wire but otherwise want to retain all the other
functionality. I have two ideas for this:

  • look at how Ruby’s SSL implementation does it
  • define a “default forwarding” like this:

class Wrapper # maybe inherit BasicObject
  def initialize(socket)
    @socket = socket or raise "Invalid nil socket"
  end

  def write…
  def read…

  # other methods that either accept data to send or return data read

  # forward everything else to the wrapped socket
  def method_missing(*a, &b)
    wrap(@socket.send(*a, &b))
  end

  private

  # never hand out the wrapped socket itself
  def wrap(obj)
    obj.equal?(@socket) ? self : obj
  end

  # variant: also wrap the arguments passed to the caller's block

  def method_missing(*a, &b)
    wrap(@socket.send(*a) do |*x|
      b[*x.map! {|e| wrap(e)}]
    end)
  end
end
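A runnable variation on the forwarding idea above, as a sketch only. The class name ForwardingWrapper is made up, upcasing stands in for the real transformation, and public_send plus respond_to_missing? are additions that the outline above does not contain.

```ruby
require 'stringio'

# Hypothetical forwarding wrapper; StringIO stands in for the socket.
class ForwardingWrapper
  def initialize(io)
    @io = io or raise ArgumentError, 'io must not be nil'
  end

  # Explicitly wrapped method: transform, then delegate.
  def write(text)
    @io.write(text.upcase)
  end

  # Everything else is forwarded unchanged to the wrapped object.
  def method_missing(name, *args, &block)
    return super unless @io.respond_to?(name)
    wrap(@io.public_send(name, *args, &block))
  end

  # Keeps respond_to? truthful for forwarded methods.
  def respond_to_missing?(name, include_private = false)
    @io.respond_to?(name, include_private) || super
  end

  private

  # Methods that return the wrapped object return the wrapper instead.
  def wrap(result)
    result.equal?(@io) ? self : result
  end
end

inner = StringIO.new
wrapper = ForwardingWrapper.new(inner)
wrapper.write('abc')               # goes through the explicit method
wrapper.rewind                     # forwarded via method_missing
wrapper.read                       # "ABC"
(wrapper << 'x').equal?(wrapper)   # true, the wrapped object does not leak
inner.string                       # "ABCx"
```

Note the last line: << was forwarded as is, so the data bypassed the transformation. With default forwarding, every method that sends or returns data still has to be defined explicitly.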

Thanks a lot, I already switched to String#<< and String#slice!. The
message size sent through the internal socket is fixed, but from
outside, it appears like a normal IO stream, so a client may call gets,
but several internal messages may have to be read until a newline
character is found. Or read(n) may be called for n != block_size, so a
buffer is definitely necessary.

I see.

Kind regards

robert

On 09/04/2012 02:19 PM, Bernhard B. wrote:

Ok, I found out everything I wanted to know is in the comment of the
module itself and it works now, so thanks a lot.

Good deal. The problem with documenting mixins is finding a place to
document the methods a mixin needs to have defined.

BTW, for everyone reading this thread, if there are any suggestions for
a better way to manage this, I would appreciate them.

But I have one little flaw report: In your LikeStringIO example, you
didn’t define duplexed? to return true.

That’s not actually a bug. The string upon which a LikeStringIO
instance operates could be thought of as an in-memory file opened in
read-write mode. Reading and writing to an instance of LikeStringIO are
not independent operations. Performing one operation has
consequences/side effects for the other (the file pointer moves),
contrary to a duplexed IO.

A duplexed IO might read from one file and write to a different file or
a socket, for example. :)
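The difference can be seen with Ruby's own StringIO and a socket pair. This is only an illustration with standard library classes, not io-like code, and UNIXSocket.pair assumes a Unix-like platform.

```ruby
require 'stringio'
require 'socket'

# Not duplexed: reading and writing share one position.
io = StringIO.new(+'hello world')   # in-memory "file" opened read-write
io.write('HELLO')                   # overwrites five bytes, moves the position
io.read                             # " world", continues where the write stopped
io.string                           # "HELLO world"

# Duplexed: each end reads what the other end wrote, so writing does not
# affect what the same end reads next. (Unix-like platforms only.)
left, right = UNIXSocket.pair
left.write('ping')
right.write('pong')
left.read(4)                        # "pong"
right.read(4)                       # "ping"
```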

-Jeremy