Working with large binary strings?

I need to work with multiple large binary strings, and to do XOR
operations on their ‘rows’. That part is fine, but getting the binary
representation this way is very slow:

a = SecureRandom.random_bytes(250_000).unpack("B*")[0].to_i.to_s(2)

I know there must be a quick way to do it, because a =
SecureRandom.random_bytes(250_000) is very fast, so I don’t know why it
would be slow to convert that into a bit string of 1’s and 0’s that will
allow me to manipulate it on a bit by bit basis. Is there a faster way
to do this? Generating that string takes about one second, but then
getting its binary form as 1’s and 0’s takes a few minutes.

bob hope writes:

I need to work with multiple large binary strings, and to do XOR
operations on their ‘rows’. This is all fine, but I am having a slow
time to get binary representation in this fashion:

a = SecureRandom.random_bytes(250_000).unpack("B*")[0].to_i.to_s(2)

You are asking Ruby to read a string of 2,000,000 digits (250,000 bytes
at 8 bits each) as one very large base-10 number, and then to convert
that number back into a string using radix 2.

I know there must be a quick way to do it, because a =
SecureRandom.random_bytes(250_000) is very fast, so I don’t know why
it would be slow to convert that into a bit string of 1’s and 0’s

SecureRandom.random_bytes(250_000).unpack("B*")[0] is already a random
string of 1’s and 0’s; unpack returns a one-element array holding it.
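A minimal sketch of the point above, using a 4-byte input so the result is easy to inspect. It shows that unpack("B*")[0] already yields the bit string, and that the extra to_i.to_s(2) step produces a different string rather than the same bits. The variable names are made up for illustration.

```ruby
require "securerandom"

bytes = SecureRandom.random_bytes(4)   # 4 random bytes
bits  = bytes.unpack("B*")[0]          # already a string of 32 '0'/'1' characters

# Reading a bit string as a decimal number and printing it in base 2
# yields a different string altogether ("101" is read as one hundred and one).
decimal_detour = "101".to_i.to_s(2)

# If an Integer is really needed, parse the bit string as base 2 instead.
as_integer = bits.to_i(2)
```

For the 250,000-byte case the same unpack call gives a 2,000,000-character string directly, with no big-number arithmetic involved.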

that will allow me to manipulate it on a bit by bit basis. Is there a
faster way to do this? Generating that string takes about one second,
but then getting its binary form as 1’s and 0’s takes a few minutes.

If you need fast bitwise operations, why are you using strings?

guns

Because I have no clue what I am doing :). Thanks. Do you know if there
is a better way to do this? It seems to be working but I bet there is a
better way. The goal is, with 4 random bit strings, to make a 5th bit
string that XORs to 0 in each row with the other strings. I don’t
know how to concatenate with integers so that is why I make it a string
here.

require "securerandom"
require "openssl"

a = SecureRandom.random_bytes(1_000_000).unpack("B*")[0]
b = SecureRandom.random_bytes(1_000_000).unpack("B*")[0]
c = SecureRandom.random_bytes(1_000_000).unpack("B*")[0]
d = SecureRandom.random_bytes(1_000_000).unpack("B*")[0]

vsb = ""
1_000_000.times do |x|
  column = a[x].to_i ^ b[x].to_i ^ c[x].to_i ^ d[x].to_i ^ 0

  case column
  when 0
    vsb << 0.to_s
  else
    vsb << 1.to_s
  end
end

puts vsb.to_i

bob hope writes:

The goal is, with 4 random strings, to make a 5th string that
XORs to 0 in each row with the other strings.

I have no idea what you’re trying to accomplish, but here are some tips:

I don’t know how to concatenate with a number so that is why I make it
a string here.

Strings are the proper format for transferring and storing binary data.
You just have to work on them in chunks.

require "securerandom"
require "openssl"

a = SecureRandom.random_bytes(1_000_000).unpack("B*")[0]
b = SecureRandom.random_bytes(1_000_000).unpack("B*")[0]
c = SecureRandom.random_bytes(1_000_000).unpack("B*")[0]
d = SecureRandom.random_bytes(1_000_000).unpack("B*")[0]

SecureRandom.random_bytes returns a string with each byte containing
a random value from 0x00 to 0xff. You are converting each byte into
an 8-byte string representation of this number, which is incredibly
wasteful.

SecureRandom.random_bytes(len).unpack('C*')

will return the string as an array of len unsigned integers, which you
can then XOR without any more string conversions.
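As a rough illustration of that suggestion, the sketch below unpacks four random strings with 'C*' and XORs them position by position. The length of 16 and the variable names are arbitrary choices for the example, not part of the original code.

```ruby
require "securerandom"

len = 16
# Four arrays of len Integers, each in the range 0..255.
a, b, c, d = Array.new(4) { SecureRandom.random_bytes(len).unpack("C*") }

# XOR the four arrays position by position, one whole byte at a time.
xored = a.zip(b, c, d).map { |w, x, y, z| w ^ x ^ y ^ z }

# Pack the Integers back into a binary string of the same byte length.
result = xored.pack("C*")
```

Each step here handles 8 bits at once, instead of one character per bit.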

vsb = ""
100_000.times do |x|

Why 100,000 when you have created strings of 8,000,000 characters in
length?

puts x
column = a[x].to_i ^ b[x].to_i ^ c[x].to_i ^ d[x].to_i ^ 0

This code actually XORs a single bit at a time.

If you have arrays of integers, you can XOR byte(s) at a time without
string conversions.

Also, n XOR 0 always returns n, so that part does nothing. If this is an
important part of your algorithm, I think you need to think this through
a bit longer.

case column
when 0
vsb << 0.to_s
else
vsb << 1.to_s
end

This is very circuitous. You already have the value 0 or 1, so just push
it onto vsb directly!

end

puts vsb.to_i

Why does vsb need to be a number? Numbers larger than your CPU’s native
bit size are very inefficient to work with. Binary data should be passed
around as a string.

If I am understanding you, this is what you want: [1]

len = 1_000_000
a, b, c, d = 4.times.map { SecureRandom.random_bytes(len).unpack 'L*' }
(len/4).times.map { |i| a[i] ^ b[i] ^ c[i] ^ d[i] }.pack 'L*'

However, this is totally useless, since the input is random, and 4
random inputs XORed together are equivalent to a single random input.

If you’re planning on supplying your own data, this smells an awful lot
like home-rolled encryption, which is either admirable or horrifying
depending on your goal.

HTH
guns

[1]: Note that I am chunking the string into 32-bit unsigned longs for
performance
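A small sketch of why the chunk size does not change the answer: XOR works bit by bit, so unpacking as bytes ('C*') or as 32-bit words ('L*') should give the same output string. The helper name xor_with is hypothetical. One caveat worth checking in your own code: with 'L*', the byte length needs to be a multiple of 4, since unpack ignores leftover bytes that do not fill a whole word.

```ruby
require "securerandom"

len = 1_000   # a multiple of 4, so 'L*' consumes every byte
strings = Array.new(4) { SecureRandom.random_bytes(len) }

# Hypothetical helper: XOR equal-length strings using the given pack directive.
def xor_with(strings, directive)
  arrays = strings.map { |s| s.unpack(directive) }
  arrays.transpose.map { |column| column.inject(:^) }.pack(directive)
end

by_byte = xor_with(strings, "C*")   # 1,000 one-byte steps
by_word = xor_with(strings, "L*")   # 250 four-byte steps

# Leftover bytes are ignored by 'L*': five bytes yield only one word.
words_in_five_bytes = "abcde".unpack("L*").size
```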

Thanks for the tips. It is a simple PIR (private information retrieval)
scheme; here is a paragraph about it in case you are curious. Thanks for
your help, you have answered all of my questions :)

“The client sends a different “random-looking” bit vector
vsb to each distributor s, for each bucket b to be retrieved.
Each bit vector has a length equal to the number of buckets
in the pool. Each distributor s then computes R(vsb) as the
XOR of all buckets whose positions is set to 1 in vsb. The
resulting value is then returned to the client.
Thus, in order to retrieve the b’th bucket, the client need
only to choose the values of vsb so that their exclusive OR is
0 at every position except b. (For security, k−1 of the vectors
should be generated randomly.) When the client receives the
corresponding R(vsb) values, she can XOR them to compute
the bucket’s contents.”
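The quoted scheme can be sketched in a few lines of Ruby. This is a toy model under stated assumptions, not the designer's implementation: buckets are single Integers, bit vectors are arrays of 0/1, and the names query_vectors and distributor_response are invented for the example.

```ruby
require "securerandom"

# Hypothetical helper: build k bit vectors over num_buckets positions whose
# XOR is 0 at every position except `wanted`. The first k-1 are random.
def query_vectors(num_buckets, k, wanted)
  random = Array.new(k - 1) do
    Array.new(num_buckets) { SecureRandom.random_number(2) }
  end
  last = Array.new(num_buckets) do |i|
    bit = random.inject(0) { |acc, vector| acc ^ vector[i] }
    i == wanted ? bit ^ 1 : bit   # flip only the wanted position
  end
  random + [last]
end

# Hypothetical distributor: XOR all buckets whose position is set to 1.
def distributor_response(buckets, vector)
  selected = buckets.each_with_index.select { |_, i| vector[i] == 1 }
  selected.map(&:first).inject(0, :^)
end

buckets   = [0x11, 0x22, 0x44, 0x88, 0xF0]   # toy pool, one Integer per bucket
vectors   = query_vectors(buckets.size, 3, 2)
responses = vectors.map { |v| distributor_response(buckets, v) }
recovered = responses.inject(:^)             # XOR of the k responses
```

Every bucket other than the wanted one is selected an even number of times across the k vectors, so it cancels out and recovered should equal buckets[2].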

Yes, yes, home-rolled encryption is horrible, but this is so simple that
I don’t think even I can screw it up … and I am in contact with the
person who designed it, and they will tell me if I screwed it up when I
am done with it.

I appreciate your advice, I am pretty sure I can do it correctly but
efficiently is where I am sure to screw up :P , and your tips have
already helped me improve it

Also yes I screwed up on the sizes, I am finding that I have a difficult
time with conversions between all of these units and keeping things
straight, but I always spend a lot of time to polish everything up and
make sure it is correct.

Hi, so in the spirit of not cluttering the forum I will ask my somewhat
unrelated but still related question here. So I have finished
implementing this, but have run into one issue, when I XOR strings
together like this:

def xor_strings(string_array)
  processed_strings = []

  string_array.each do |string|
    processed_strings << NArray.to_na(string, "byte")
  end

  processed_strings.inject(:^).to_s
end

the result is not always the same size as the strings being XORed. I
believe some optimization is being done for me, particularly since it
happens only when there is similarity between the strings, but I need to
avoid it.

However, googling for NArray and XOR has not led me to any success. I
swear this is my last question regarding this, as after I have this
fixed I am all the way done :D

On Wed, Aug 1, 2012 at 1:55 PM, bob hope wrote:

  processed_strings << NArray.to_na(string, "byte")
end

processed_strings.inject(:^).to_s

end

the result is not always the same size as the set of strings. I believe
some optimization is being done for me and I need to avoid it, however
googling for Narray and XOR has not led me to any success. I swear this
is my last question regarding this, as after I have this fixed I am all
the way done :D

I am really not sure what you’re after, but maybe this does help:

irb(main):013:0> x = SecureRandom.random_bytes(10)
=> "\r\xC8\xA9\x99\t\f\x12\xC9#]"
irb(main):014:0> y = SecureRandom.random_bytes(10)
=> "\x12\x86p?\x19q2\xA6\e\x88"
irb(main):015:0> z = "".force_encoding 'BINARY'
=> ""
irb(main):016:0> x.each_byte.zip(y.each_byte) {|a,b| z<<(a^b)}
=> nil
irb(main):017:0> z.length
=> 10
irb(main):018:0> z
=> "\x1FN\xD9\xA6\x10} o8\xD5"

Or, for multiple strings

irb(main):020:0> z = "".force_encoding 'BINARY'
=> ""
irb(main):024:0> max = arr.map(&:bytesize).max
=> 10
irb(main):025:0> max.times {|i| z[i] = arr.inject(0) {|agg, bytes| b = bytes[i]; b ? agg ^ b.ord : agg}.chr}
=> 10
irb(main):026:0> z
=> "\x1FN\xD9\xA6\x10} o8\xD5"

Kind regards

robert
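For reference, a plain-Ruby variant of xor_strings without NArray might look like the sketch below. It is one possible shape, not a diagnosis of the NArray behaviour, which the thread does not establish. It assumes all inputs should have the same byte length and raises otherwise, and it compares bytesize rather than length because length counts characters in the string's encoding.

```ruby
# Sketch: XOR any number of equal-length binary strings, byte by byte.
def xor_strings(strings)
  size = strings.first.bytesize
  unless strings.all? { |s| s.bytesize == size }
    raise ArgumentError, "all strings must have the same byte length"
  end

  # One array of byte values per string, then XOR column by column.
  columns = strings.map { |s| s.unpack("C*") }.transpose
  columns.map { |column| column.inject(:^) }.pack("C*")
end

# Identical inputs cancel to all-zero bytes, and the size is preserved.
zeros = xor_strings(["\xAA\xBB\xCC".b, "\xAA\xBB\xCC".b])
```

Checking zeros.bytesize against the input size is a quick way to confirm that similar strings do not shrink the output.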