Anyway, let me resend it.
I am using Ruby to run some MapReduce jobs in Hadoop Streaming.
Unfortunately, we have some dirty data with invalid byte sequences in
the input. So while running things like
I will get errors like
:in `split': invalid byte sequence in UTF-8 (ArgumentError)
I searched a little and tried to use Iconv to ignore the invalid
byte sequences:
require 'iconv'
ic = Iconv.new('UTF-8//IGNORE', 'UTF-8')
line = ic.iconv(line)
It resolves most of the invalid lines, but a couple of lines
still raise the same error.
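One thing that might be worth trying instead of Iconv is a round trip
through UTF-16 with String#encode, which should drop any bytes that are
not valid UTF-8. This is only a minimal sketch: the helper name
scrub_utf8 and the sample line are made up for illustration, and I have
not checked it against the lines that still fail here.

```ruby
# Hypothetical helper (not from any library): return a copy of the line
# with bytes that are not valid UTF-8 removed.
def scrub_utf8(line)
  # Work on a copy tagged as UTF-8 so the caller's string is untouched.
  utf8 = line.dup.force_encoding('UTF-8')
  return utf8 if utf8.valid_encoding?

  # Transcoding to UTF-16 forces every byte to be checked; invalid
  # sequences are replaced with an empty string, i.e. dropped.
  utf16 = utf8.encode('UTF-16BE', :invalid => :replace,
                                  :undef   => :replace,
                                  :replace => '')
  utf16.encode('UTF-8')
end

# Made-up sample input: \xFF can never appear in valid UTF-8.
dirty  = "a\tb\xFF\tc"
fields = scrub_utf8(dirty).split("\t")
```

The round trip is used because transcoding to a different encoding
forces the bytes to be validated. Note that this silently discards the
bad bytes, so it only makes sense if losing them is acceptable.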
I am wondering if there is a way to make String#split work in
Ruby 1.9 with invalid character sequences?
Thanks in advance
On Tue, Mar 22, 2011 at 11:09 PM, Robert K.