Reading binary Files

Hi there
I am trying to read a binary file using Java’s “FileInputStream” to
later store it in HBase.

My problem is the byte-array conversion needed to call the read-method:

inFile = File.new("/home/roger/Downloads/test.jpg")
inputStream = FileInputStream.new(inFile)

length = inFile.length()
buffer = ""

inputStream.read(buffer)

Any ideas?

You may find the entire code in the attachment.

Thanks
Roger

Would you mind clarifying your goal a little? Is it your intention to
read the bytes in a “streaming” fashion, or is it OK to read the entire
file into memory as a byte[]?

Ariel V.
website: http://blog.arielvalentin.com
skype: ariel.s.valentin
twitter: arielvalentin

*simplicity *communication
*feedback *courage *respect

Strange: based on your code, JRuby should warn you that File is already
defined by Ruby: “warning: already initialized constant File”

You need to call at least :remove_const on Object prior to importing a
Java class of the same name. Like this:
Object.send(:remove_const, :File) # put this as the first line

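Next error: NameError: no method 'read' for arguments
(org.jruby.RubyString) on Java::JavaIo::FileInputStream
Explanation: your buffer is actually a Ruby string, not a byte[].
Possible fix:
buffer = [].to_java(:byte)
Note that this array is empty, so read() has nothing to fill. To
actually receive data, allocate it with the file length, for example
Java::byte[length].new.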
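
To make the byte[] point concrete, here is a minimal sketch, not tested against the attached script. It assumes JRuby for the Java branch and simply falls back to plain Ruby everywhere else. The helper name read_file_bytes is made up for illustration. The Java array is allocated with the file length, and read() is called in a loop because a single call is not guaranteed to fill the buffer.

```ruby
# Hypothetical helper: returns the file content as a binary Ruby String.
# Under JRuby it goes through FileInputStream and a sized Java byte[];
# on any other Ruby it falls back to File.binread.
def read_file_bytes(path)
  if RUBY_PLATFORM == "java"
    require "java"
    length = java.io.File.new(path).length
    buffer = Java::byte[length].new          # byte[] with room for the whole file
    stream = java.io.FileInputStream.new(path)
    begin
      offset = 0
      while offset < length
        n = stream.read(buffer, offset, length - offset)
        break if n < 0                       # -1 signals end of stream
        offset += n
      end
    ensure
      stream.close
    end
    String.from_java_bytes(buffer)           # convert the byte[] back to a Ruby String
  else
    File.binread(path)
  end
end
```

If the receiving Java method wants a byte[] again, calling to_java_bytes on the returned string should give one under JRuby.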

I did not check the rest of HBase related code. Good luck!

Hey Ariel

It’s the latter case. These PDFs are rather small. So I just want to
read them into memory and then pass this “stream” (say byte[]) to
another function (the put-method of HBase)

Then please remove the following line from your code:

java_import "java.io.File"

Good luck!

Following the Ruby documentation, I now just use Ruby's own File class,
since my PDFs (here JPGs) are rather small. This works so far, but
might be slower than a stream-based approach. It did well importing 5
files on my test box; we'll see how it does when running on the real
site with millions of files. Even without my “puts”, the HBase shell
produces a lot of messages on screen…

java_import "org.apache.hadoop.hbase.util.Bytes"
java_import "org.apache.hadoop.hbase.client.HTable"
java_import "org.apache.hadoop.hbase.client.Put"

# Convert each argument to a Java byte[] via its string form
def jbytes(*args)
  args.map { |arg| arg.to_s.to_java_bytes }
end

files = Dir.glob("/home/roger/Downloads/*.jpg")

files.each { |x|
  puts "File #{x}"

  # Read the whole file into memory, then release the handle
  inFile = File.new(x)
  buffer = inFile.read()
  inFile.close()

  table = HTable.new(@hbase.configuration, "rb_test")
  # The row key is the file name
  p = Put.new(*jbytes(File.basename(x)))

  # Column family "inhalt", empty qualifier, file content as the value
  p.add(*jbytes("inhalt", "", buffer))

  table.put(p)

  table.close()
}
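
On the earlier question of reading the whole file versus reading it as a stream, the two styles look roughly like this in plain Ruby. This is only a sketch: the method names read_whole and read_in_chunks and the 64 KiB default chunk size are assumptions for illustration, not something from the script above. File.binread opens the file in binary mode and closes it again, so no handle is left open.

```ruby
# Read the entire file into memory as one binary (ASCII-8BIT) string.
def read_whole(path)
  File.binread(path)
end

# Read the file in fixed-size pieces and yield each piece to the block.
# IO#read with a length returns nil at end of file, which ends the loop.
def read_in_chunks(path, chunk_size = 64 * 1024)
  File.open(path, "rb") do |io|
    while (chunk = io.read(chunk_size))
      yield chunk
    end
  end
end
```

For small files that go into a single HBase cell anyway, read_whole is probably all that is needed. The chunked variant mainly helps when a file is too large to hold in memory at once.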