Unsafe readline(), anything better?

Looking for a combination of readpartial() and readline() in order to
safely read a line of maxlen. IO.readline() appears to suffer from
lacking
this maximum allowing DOS against servers reading HTTP headers, for
example, just by using sock.readline() alone, which is what
Net::HTTPResponse and others do:

1974 class << HTTPResponse
1975 def read_new(sock) #:nodoc: internal use only 1976
httpv,
code, msg = read_status_line(sock) 1977 res =
response_class(code).new(httpv, code, msg) 1978
each_response_header(sock) do |k,v| 1979 res.add_field k, v
1980 end
1981 res
1982 end
1983
1984 private
1985
1986 def read_status_line(sock) 1987 str =
sock.readline
1988 m =
/\AHTTP(?:/(\d+.\d+))?\s+(\d\d\d)\s*(.*)\z/in.match(str) o r
1989 raise HTTPBadResponse, “wrong status line:
#{str.dump}”
1990 m.captures
1991 end

I know this is fundamentally a problem with the popular readline C libs
out there. Wish they had an nreadline like the sprintf and snprintf
additional function. Sure I could readchar() or readbytes() watching
each
read for a newline, but that is just unfun. Have I overlooked something
obvious in my search? Hoping to not have to write my own safe/buffered
IO
layer like I’ve had to do with other langs.

If there is enough interest, maybe I’ll hack a readmaxline() method into
an IO patch to submit. Actually, on second thought, how about adding a
second parameter:

ios.readline(sep_string=$/,maxlen=nil)

The tough question would then be whether to raise an LineTooLong
exception
or just return what could be read of the line up to that point.

Thanks,

Rob

Rob M. wrote:

each_response_header(sock) do |k,v| 1979 res.add_field k, v
1990 m.captures
an IO patch to submit. Actually, on second thought, how about adding a
second parameter:

ios.readline(sep_string=$/,maxlen=nil)

The tough question would then be whether to raise an LineTooLong exception
or just return what could be read of the line up to that point.

Thanks,

Rob

class File
def safe_readline(sep_string=$/,maxlen=nil)
buf_size = 1024
line = “”
while !self.eof?
s = read( [ buf_size, maxlen - line.size ].min )
line << s
if i = line.index( sep_string )
line = line[0,i+sep_string.size]
return [ line, true ]
end
return [ line[0,maxlen], false ] if maxlen &&
line.size >= maxlen
end
[ line, true ]
end
end

open(‘junk’){|h| p h.safe_readline("\n",9) }

On Thu, 28 Dec 2006 12:10:04 -0800, William J. wrote:

response_class(code).new(httpv, code, msg) 1978
1989 raise HTTPBadResponse, "wrong status line:
If there is enough interest, maybe I’ll hack a readmaxline() method into

    return [ line, true ]
  end
  return [ line[0,maxlen], false ]  if maxlen &&
    line.size >= maxlen
end
[ line, true ]

end
end

open(‘junk’){|h| p h.safe_readline("\n",9) }

Thanks for the attempt, this is very close to the reader-flavor of
classes I’ve had to write in other langs. But it doesn’t play nice
dealing with non-seekable streams like sockets, which can cause some
nasty IO socket blocking (or the nonblocking socket read run around).
For example:

Header1: something
Header2: else, followed by blank line
Content-Length: 55 (or whatever)

Here is the data portion that could be binary or text.

If I called h.safe_readline("\r\n",1024) and was working with an IO
socket there wouldn’t be anything left by the time I want to read
the data. That is, if I ever had a chance to try since the read would
block my proggy from doing anything.

One solution is to create a Reader class and maintain a buffer which
keeps the extra overrun available. I was hoping at the binary level
during the byte read that readline does that it could keep a counter of
the number of bytes read as it reads them and throw or whatever when
maxlen exceeded. Doing a readbyte from most languages at the script/lang
level is usually way to costly, it would probably be too costly to weigh
down read() with a count and check for every byte, but perhaps not in
another safe_readline() native extension.

I do love that Ruby let’s me add to IO itself, which handles adding
@prev_read buffer to store the overrun in for next read and is what
I’ll do and post for review.

Thanks again,

On 28.12.2006 22:09, Rob M. wrote:

I do love that Ruby let’s me add to IO itself, which handles adding
@prev_read buffer to store the overrun in for next read and is what
I’ll do and post for review.

I would also consider the alternative of creating a SafeLineReader that
can be stacked on any IO object (or rather any object that implements
#read with the proper protocol) as input stream much the same way Java
streams can be stacked. You gain flexibility but of course you loose
the easy “I just need to open a file and it works”. I do not have a
clear bias either way because I love the simplicity of Ruby’s IO classes
but on the other hand you can also stuff too much into a single class.

Yet another alternative is to put your code into a Module which you can
mix in to any IO object. You can still decide to to that on a general
basis for class IO and StringIO.

Just my 0.02EUR…

Kind regards

robert

On Fri, 29 Dec 2006 10:19:52 +0100, Robert K. wrote:

bias either way because I love the simplicity of Ruby’s IO classes but on
the other hand you can also stuff too much into a single class.

Glad to read your post. Couldn’t sleep last night thinking through
this. Then this morning found that vulnerability in WEBrick based
on unsafe readlines (actually gets, but same deal). [see WEBrick DOS
Security Flaw thread]

The more I got into adding to IO the more I kept coming back to stack
design that you mention and that I’m familiar with. Mostly there are
a lot of methods in the IO and StringIO classes that would all have to
have equivalents that support the prev_read buffer. Also we face the
default duplex nature of IO and this is only for reads. So I do really
think stacking (the reader) stream is the way to go–especially with
support for things like ‘<<’ in Ruby, which I love.

Maybe we could call it SafeReader so we can have its readline calling
a read_until (like I’ve done in other langs). In my few remaining
vacation hours, I’ll try to write up a spec, perhaps in Rspec and post
for comment.

I’m still somewhat new to this Ruby stuff, but think I can get a gem
started on RubyForge for it, it would be my first. Then if there is
enough
demand, we can implement one in C. By the way, anyone know of any gem
naming conventions to distinguish a pure Ruby gem from C extension
version, like Perl’s XS suffix convention?

Also on a related topic, is there a convention or other reliable way to
avoid gem library directory and file name collisions? I did notice, for
example, webrick follows the webrick.rb and webrick/ convention as does
rspec and others. Are these names somehow registered publicly to avoid
gem
collisions with others? Perhaps Brian or Ryan would care to set us
newbies
straight on that. [I did do quite a bit of homework reading http://rubygems.org particularly section 8. Distributing Gems]

Hi,

In message “Re: unsafe readline(), anything better?”
on Fri, 29 Dec 2006 04:05:11 +0900, Rob M.
[email protected] writes:

|If there is enough interest, maybe I’ll hack a readmaxline() method into
|an IO patch to submit. Actually, on second thought, how about adding a
|second parameter:
|
| ios.readline(sep_string=$/,maxlen=nil)
|
|The tough question would then be whether to raise an LineTooLong exception
|or just return what could be read of the line up to that point.

I have added the second optional argument to specify maximum length
limit for gets, readline, readlines, etc. I will commit the change
for HEAD soon. Webrick and others should be updated using the new
feature.

						matz.

On Fri, 29 Dec 2006 10:49:41 -0500, Rob M. wrote:

Maybe we could call it SafeReader so we can have its readline calling a
read_until (like I’ve done in other langs). In my few remaining vacation
hours, I’ll try to write up a spec, perhaps in Rspec and post for
comment.

Humm, from this last recent ruby CVS commit just today makes me wonder
if
matz is listening to this thread. If so, matz, thanks for getting me
giddy
about programming again, and thanks for taking a shot at addressing
this!

matz 2006-12-30 04:21:50 +0900 (Sat, 30 Dec 2006)

New Revision: 11428

Modified files:
trunk/ChangeLog
trunk/ext/stringio/stringio.c
trunk/io.c
trunk/version.h

Log:
* ext/stringio/stringio.c (strio_gets): accepts limit argument.

* ext/stringio/stringio.c (strio_readline, strio_each,
  strio_readlines): ditto.

* ext/stringio/stringio.c (strio_getline): add limit capability.

* io.c (rb_io_gets_m): accepts limit argument.  [ruby-talk:231563]

* io.c (rb_io_readline, rb_io_readlines, rb_io_each_line,
argf_getline):
  ditto.

* io.c (appendline): add limit capability.

* io.c (rb_io_getline_fast, rb_io_getline): ditto.

* io.c (rb_io_getline): small refactoring for DRY.

* io.c (rb_io_s_foreach, rb_io_s_readlines): small refactoring.

Hi,

In message “Re: unsafe readline(), anything better?”
on Sat, 30 Dec 2006 04:55:04 +0900, Rob M.
[email protected] writes:

|Humm, from this last recent ruby CVS commit just today makes me wonder if
|matz is listening to this thread. If so, matz, thanks for getting me giddy
|about programming again, and thanks for taking a shot at addressing this!

I am listening. We still need update for Webrick. Here’s the my
personal patch (not tested at all).

						matz.

— a/lib/webrick/httprequest.rb
+++ b/lib/webrick/httprequest.rb
@@ -317,10 +317,10 @@ module WEBrick
@remaining_size = 0
end

  • def _read_data(io, method, arg)
  • def _read_data(io, method, *arg)
    begin
    WEBrick::Utils.timeout(@config[:RequestTimeout]){
  •      return io.__send__(method, arg)
    
  •      return io.__send(method, *arg)
       }
     rescue Errno::ECONNRESET
       return nil
    

@@ -330,7 +330,11 @@ module WEBrick
end

 def read_line(io)
  •  _read_data(io, :gets, LF)
    
  •  line = _read_data(io, :gets, [LF, 1024])
    
  •  if line.size == 1024 and line[-1,1] != LF
    
  •    raise HTTPStatus::RequestURITooLarge
    
  •  end
    
  •  line
    

    end

    def read_data(io, size)