Big endian convention in Ruby


#1

Hello,

I would like to convert an input (like a file, or a password) into
binary format. After reading my Ruby in a Nutshell book, I believe I can
use this method:
unpack('C*')

So according to the documentation, this is great for unsigned chars. But
will the binary representation respect the big-endian convention?

Thank you.


#2

On 16.10.2008 10:26, Zangief I. wrote:

I would like to convert an input (like a file, or a password) into
binary format. After reading my Ruby in a Nutshell book, I believe I can
use this method:
unpack('C*')

This will give you an array of integer byte values. I am not sure where
you see the binary format there. What exactly do you want to achieve?

So according to the documentation, this is great for unsigned chars. But
will the binary representation respect the big-endian convention?

What exactly do you mean by this? Are you referring to bits inside a
byte or to the ordering of multiple bytes? If the latter, there is no
point in talking about big or little endian when encoding byte wise
because there are not multiple bytes belonging together. If the former,
I am not sure whether there is any platform that reverses bits but it
could be possible. OTOH, how would you notice?
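
To make the byte-wise point concrete, here is a small sketch (the sample string is arbitrary, chosen only for illustration). unpack('C*') yields one Integer per byte, in string order, so there is no multi-byte value whose byte order could differ between platforms:

```ruby
# One unsigned 8-bit Integer per byte of the string.
bytes = "AB\xFF".b.unpack('C*')
p bytes

# Showing each byte as 8 binary digits is a separate, explicit formatting step.
binary = bytes.map { |b| format('%08b', b) }.join(' ')
puts binary
```

On any platform this should print [65, 66, 255] followed by 01000001 01000010 11111111.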

Kind regards

robert


#3

Thank you for your answer.
Actually I would like to rewrite the SHA1 algorithm
(http://en.wikipedia.org/wiki/SHA_hash_functions#SHA-1_pseudocode) in a
pure Ruby implementation. For this, I would need to be able to
accomplish the “Pre-processing” step by converting the input to a 64-bit
big-endian integer. I believe that could be simpler to do in Ruby
than in another language such as C. But I am not really sure about
the way to do so.


#4

Zangief I. wrote:

Thank you for your answer.
Actually I would like to rewrite the SHA1 algorithm
(http://en.wikipedia.org/wiki/SHA_hash_functions#SHA-1_pseudocode) in a
pure Ruby implementation. For this, I would need to be able to
accomplish the “Pre-processing” step by converting the input to a 64-bit
big-endian integer. I believe that could be simpler to do in Ruby
than in another language such as C. But I am not really sure about
the way to do so.

ri String#unpack

Unfortunately, the q/Q conversion character seems to use native ordering
and I don’t think there’s a network-order equivalent:

irb(main):002:0> "\000\000\000\000\000\000\000\001".unpack("Q")
=> [72057594037927936]

If all you’re concerned about is this step:

“append length of message (before pre-processing), in bits, as 64-bit
big-endian integer”

then you could do it by converting to hex first:

buff << [("%016X" % len)].pack("H*")
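
As a sketch of the alternatives (the value of len is made up for illustration): the hex route above, two big-endian 32-bit halves with "N", and the explicit big-endian modifier "Q>" that newer Ruby versions than the one used in this thread accept in pack/unpack. All three should produce the same 8 bytes:

```ruby
len = 72  # hypothetical message length in bits (9 bytes * 8)

via_hex = [('%016X' % len)].pack('H*')                 # hex string -> bytes
via_n   = [len >> 32, len & 0xFFFFFFFF].pack('N2')     # high word, then low word
via_q   = [len].pack('Q>')                             # 64-bit, big-endian (newer Rubies)

p via_hex.bytesize       # 8
p via_hex.unpack('C*')   # [0, 0, 0, 0, 0, 0, 0, 72]
p via_hex == via_n       # true
p via_n == via_q         # true
```

The "N2" form works on old and new interpreters alike, which is why it may be the safer choice if portability matters.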

BTW, I presume you’re doing this as an academic exercise. After all,
there’s already:

require 'digest/sha1'
puts Digest::SHA1.hexdigest("hello world")

HTH,

Brian.


#5

On 16.10.2008 15:25, Brian C. wrote:

ri String#unpack
big-endian integer"

then you could do it by converting to hex first:

buff << [("%016X" % len)].pack("H*")

Or use “N” and combine, e.g.

irb(main):007:0> s = "\000\000\000\000\000\000\000\001"
=> "\000\000\000\000\000\000\000\001"
irb(main):008:0> r=[];s.unpack("N*").each_slice(2) {|hi,lo| r << (hi << 32 | lo)}; r
=> [1]
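
The same combination written as a script, as a minimal sketch (the 16 sample bytes are invented). It also cross-checks the result against the "Q>" directive, which is an assumption about the Ruby version in use, since it is not available on the interpreter shown in this thread:

```ruby
# Two 64-bit big-endian values: 1 and (1 << 32) + 2.
s = "\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x02".b

# Read 32-bit big-endian words and glue each pair into one 64-bit Integer.
combined = s.unpack('N*').each_slice(2).map { |hi, lo| (hi << 32) | lo }
direct   = s.unpack('Q>*')

p combined             # [1, 4294967298]
p combined == direct   # true
```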

Kind regards

robert


#6

Zangief I. wrote:

So if I have understood correctly, is it correct to use unpack('N*') like
this?

message = "A message"
=> "A message"

message.unpack('N*').join.to_i.to_s(2)
=> "1001011110100010100001010001010100101100001110000101010101100111"

No. The message itself isn’t treated as a 64-bit integer, only the
length of the message is a 64-bit integer, which is appended to the
message. In this case the length is 9*8 = 72 bits, so you need
\x00\x00\x00\x00\x00\x00\x00\x48

Anyway, I don’t know why you are going to binary. You just want a String
of bytes. Don’t worry about the order of bits-within-bytes; it will be
correct, trust me :)

Of course, if you are trying to write an SHA1 implementation which
properly handles input streams which are not a multiple of 8 bits long
(as many don’t), then you have a little bit more work to do. But not
very much, since the padding operation makes it into whole bytes anyway.

e.g. if your input is
10101010101

this becomes

10101010 10110000 00000000 00000000 …
            ^^^^^ ^^^^^^^^ ^^^^^^^^
            padding

and hence your string just needs to be \xAA\xB0\x00\x00 … padded to
the correct length. And the length is \x00\x00\x00\x00\x00\x00\x00\x0b,
i.e. 11 bits.
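
A minimal sketch of that bit-level case, assuming the input is held as a string of '0'/'1' characters purely for illustration (the variable names are made up):

```ruby
bits = '10101010101'                 # the 11 input bits from the example
length_in_bits = bits.size

padded = bits + '1'                              # the mandatory single 1 bit
padded += '0' while padded.size % 512 != 448     # zero-fill to 448 mod 512 bits
block = [padded].pack('B*')                      # MSB-first bits -> bytes
block << [length_in_bits >> 32, length_in_bits & 0xFFFFFFFF].pack('N2')

p block.bytesize              # 64
p block[0, 4].unpack('C*')    # [170, 176, 0, 0], i.e. AA B0 00 00
p block[-1].ord               # 11
```

The bit string is only used to build the padding; once packed, everything that follows can work on bytes.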

However if your SHA1 input is just a stream of bytes, as is normally the
case, then the padding is simply \x80\x00\x00\x00\x00 … etc

Anyway, this is no longer a Ruby question, this is about reading the
SHA1 pseudocode correctly. But you could always submit it as a Ruby Quiz
idea :)


#7

Thank you all for your help.

So if I have understood correctly, is it correct to use unpack('N*') like
this?

message = "A message"
=> "A message"

message.unpack('N*').join.to_i.to_s(2)
=> "1001011110100010100001010001010100101100001110000101010101100111"


#8

My apologies, I had confused the length of the message with the length
value appended at its end… Now that’s clear, many thanks :)

I just have one last question:
Because I would like to work with the input in binary format, I would
like to convert the message at the beginning, before appending the bit ‘1’
to it. To that end, can I convert the data in message with this:

message = "A message"
=> "A message"

message.unpack('b*').join
=> "100000100000010010110110101001101100111011001110100001101110011010100110"

There is .unpack('B*') too, but with “B” the order is not correct, I
think.


#9

Zangief I. wrote:

Because I would like to work with the input in binary format, I would
like to convert the message at the beginning, before appending the bit ‘1’
to it. To that end, can I convert the data in message with this:

message = "A message"
=> "A message"

message.unpack('b*').join
=> "100000100000010010110110101001101100111011001110100001101110011010100110"

There is .unpack('B*') too, but with “B” the order is not correct, I
think.

I believe you’ll need B*. The letter “A” should unpack to 01000001 (MSB
first).
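
A quick way to see the difference between the two directives, sketched on the single letter "A" (0x41):

```ruby
msb_first = 'A'.unpack('B*')   # most significant bit first
lsb_first = 'A'.unpack('b*')   # least significant bit first

p msb_first                          # ["01000001"]
p lsb_first                          # ["10000010"]
p 'A'.ord.to_s(2).rjust(8, '0')      # "01000001", matches the 'B*' result
```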

However this is a really, really bad way to implement the SHA1
algorithm. If the input is already presented as a string of bytes, then
it is completely pointless to convert it into a string of bits, because
the SHA1 algorithm is designed to be run on bytes, as the pseudocode
demonstrates. That is one reason why the input has to be padded to a
multiple of 64 bytes; so that the core loop does not have to worry
about working at the bit level!

Of course, as an academic exercise, you’re free to do whatever you like.
If you want to experiment with binary arithmetic where the operands are
strings of 0x30 and 0x31 (representing bit 0 and bit 1 respectively),
then fine. The resulting code will be tortuous, use tons of RAM and run
extremely slowly.

(Hopefully it should also be clear from the pseudocode that you don’t
have to read in the entire message at the start at all. You can process
the message in 64-byte chunks, as it arrives)
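
The standard library's Digest::SHA1 exposes the same idea through its incremental interface. A small sketch, using a StringIO as a stand-in for a file or socket (the 1000-byte input is made up):

```ruby
require 'digest/sha1'
require 'stringio'

io = StringIO.new('x' * 1000)   # stand-in for any stream
digest = Digest::SHA1.new

# Feed the data 64 bytes at a time; read returns nil at end of input.
while (chunk = io.read(64))
  digest.update(chunk)
end

streamed = digest.hexdigest
one_shot = Digest::SHA1.hexdigest('x' * 1000)
puts streamed
puts streamed == one_shot       # true
```

A pure-Ruby implementation can follow the same shape: keep h0..h4 and a small buffer between calls, and only pad when the final chunk arrives.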


#10

Just to make this clearer: the padding operation just pads the message
up to a multiple of 64 bytes (512 bits), where the last block consists
of 56 bytes (448 bits) followed by 8 bytes of message length.

So assuming your message consists only of whole bytes, as your example
implied, then I believe the padding operation is simply this:

message = "A message"
bits = message.size * 8
message << "\x80"
message << "\x00" while (message.size & 63) != 56
message << [("%016X" % bits)].pack("H*")

Now your message is exactly n * 64 bytes long, and you can proceed.
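
The same steps wrapped in a helper, as a sketch for current Ruby versions where strings carry an encoding (sha1_pad is a hypothetical name, and working on a binary copy via String#b is my assumption, not something from the posts above):

```ruby
# Hypothetical helper: pads a whole-byte message as the pseudocode describes.
def sha1_pad(message)
  data = message.b                      # binary copy, caller's string untouched
  bits = data.bytesize * 8              # length before padding, in bits
  data << "\x80".b                      # the 1 bit plus seven 0 bits
  data << "\x00".b while data.bytesize % 64 != 56
  data << [bits >> 32, bits & 0xFFFFFFFF].pack('N2')   # 64-bit big-endian length
end

[9, 55, 56, 64].each do |n|
  puts "#{n} bytes -> #{sha1_pad('a' * n).bytesize} bytes"
end
```

This should report 64, 64, 128 and 128 bytes: once the message exceeds 55 bytes there is no room left for the 0x80 byte and the length in the same block, so a second block is needed.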


#11

Many thanks for all your answers, Brian C. I am going to work as
you said, because I think that’s really more efficient.

Regards


#12

Brian C. wrote:

Just to make this clearer: the padding operation just pads the message
up to a multiple of 64 bytes (512 bits), where the last block consists
of 56 bytes (448 bits) followed by 8 bytes of message length.

So assuming your message consists only of whole bytes, as your example
implied, then I believe the padding operation is simply this:

message = "A message"
bits = message.size * 8
message << "\x80"
message << "\x00" while (message.size & 63) != 56
message << [("%016X" % bits)].pack("H*")

Now your message is exactly n * 64 bytes long, and you can proceed.

Firstly, forgive me for continuing the topic of SHA-1 here… but I found
that this is relevant to what I am doing right now and hence wanted
to post here.

Brian C., I would like to say that my implementation of SHA-1 is
almost on the same lines as you have explained. I have also coded it so
that it is handled in hex.

Let me list down the requirements of SHA-1 implementations followed by
where we might have an issue while using ruby.

  1. There is a bit wise operation that is required between 2 Hex values
    and this will only occur if both of them are Integer-Hex or Integer
    anything.

When you use unpack in ruby to get a string into its hex values, Ruby
still thinks that it is a string but only in its hex values.
For this to undergo bitwise operation.
You will have to explicitly convert this to an Integer.

For example lets say the string is ‘abc’
a gives 61 when unpacked in hex. so now. if the array holding this is
messageHex and the position of a is [0] then we will have to explicitly
say
messageHex[0].hex.to_i this will ensure that it is a integer in hex.

Next thing… appending strings ‘0x80’ or ‘0x00’ is felt not to be
appropriate by me because… if you were to use 0x80 or 0x00 then ruby
thinks that its an integer already and you dont need to do any explicit
type casting.

Also ruby does not explicitly give you a value in hex if you do any
mathematical or bitwise operation in hex, it always defaults to dec.

These were some of the issues I faced while implementing SHA-1.

Please do let me know if there are any workarounds or easier way to
implement or typecast the hex values. Also is there an default value for
ruby to understand that any unpacking of a string to hex will tell ruby
to take them as a integer value directly.

Thanks
Ashrith


#13

Ashrith Barthur wrote:

  1. There is a bit wise operation that is required between 2 Hex values
    and this will only occur if both of them are Integer-Hex or Integer
    anything.

“Integer-Hex” doesn’t really mean anything.

The specification says it works on 32-bit unsigned integer values, and
that each 64-byte block of source data is treated as 16 x 32-bit words.
You can get this via

# data is a 64-byte string

w = data.unpack("N16")

# now w is an array of 16 Integers
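
A self-contained sketch of that unpack step (the 64-byte block here is invented: "abcd" followed by 60 zero bytes):

```ruby
data = ('abcd' + "\x00" * 60).b   # made-up 64-byte block
w = data.unpack('N16')            # sixteen 32-bit big-endian words

p w.size                          # 16
puts format('%08x', w[0])         # 61626364
p w[0].class                      # Integer, so ^, &, | and << work directly
p w[0] ^ w[1]                     # 1633837924 (w[1] is 0)
```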

For example lets say the string is ‘abc’
a gives 61 when unpacked in hex. so now. if the array holding this is
messageHex and the position of a is [0] then we will have to explicitly
say
messageHex[0].hex.to_i this will ensure that it is a integer in hex.

String#hex will give you an Integer directly; the to_i is superfluous.

But in any case, the conversion into hex-ascii in the first place is
superfluous. Unpack directly to Integers, as shown above.

Next thing… appending strings ‘0x80’ or ‘0x00’ is felt not to be
appropriate by me because… if you were to use 0x80 or 0x00 then ruby
thinks that its an integer already and you dont need to do any explicit
type casting.

Sorry, but I am unable to make any sense of that sentence at all. The
input data to SHA1 is an arbitrary-sized string of bytes (*); the padding
algorithm requires you to add more bytes (*) to the end, to achieve
alignment into 64-byte blocks. So adding padding bytes is exactly what
is required.

(*) actually an arbitrary-sized string of bits, but most
implementations assume that it’s a whole number of bytes, i.e. n * 8
bits.

Also ruby does not explicitly give you a value in hex if you do any
mathematical or bitwise operation in hex, it always defaults to dec.

I think you may have lost the distinction between a number, and its
external representation.

Doing a bitwise operation “in hex” or “in decimal” doesn’t make any
sense. The number is stored internally in binary - this is a digital
computer, after all - and the bitwise operations are done on those bits.
It is only converted into a hex or decimal representation at the point
where you input or output the number.

a = 20
a.to_s      # converts to string "20"
a.to_s(16)  # converts to string "14"
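
A few more lines in the same spirit, as a sketch: the literal's base and the base used for printing are independent of the value being operated on.

```ruby
a = 0x14                 # written in hex
b = 20                   # written in decimal
same = (a == b)
p same                   # true: one and the same Integer

x = '61'.hex & 0x0F      # String#hex already returns an Integer
p x                      # 1

y = 0xF0 | 0x0A
puts y                   # 250  (default output is decimal)
puts y.to_s(16)          # fa   (same number, hex representation)
```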

Anyway, maybe you would like to submit this as a ruby quiz, as you’d
probably get some good implementations to look at.

Brian.


#14

Ashrith Barthur wrote:

I searched far and wide on the internet and I don’t see anyone posting
this kind of an error … I really dont get it as to why the code would
work with 64 loops but not 79 loops.

The only answer I can give is trite: “because there is a bug in your
program”, or “because you are doing something wrong”. If you don’t post
the code, then we cannot guess what you are doing wrong.

At a guess: your input by this stage should consist of an array of 80
32-bit integers, and you should be indexing this array to obtain w[i].
Possibly you have set up this array wrongly.

Is it gotta do with the amount of numbers that are being inserted in
each value of the array? while unpacking I have defined it as H8*16 so
does that limit the size of each value in array? would it work if i say
N16? or is it a completely different error?

Arrays in Ruby are of unlimited size (subject only to available RAM).

FWIW, I have attached a direct translation of the pseudocode on
Wikipedia. It seems to work for the handful of test vectors I’ve tried.

$ echo -n "" | sha1sum
da39a3ee5e6b4b0d3255bfef95601890afd80709 -
$ echo -n "hello" | sha1sum
aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d -
$ echo -n "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" | sha1sum
06ced2e070e58c2c4ed9f2b8cb890f0c512ce60d -
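
The attachment itself is not reproduced in this thread. For readers following along, here is a rough sketch of what a direct translation of the Wikipedia pseudocode could look like. It is not Brian's file: rotl32 and sha1_hex are names made up here, whole-byte input is assumed, and Digest::SHA1 is loaded only to cross-check the result.

```ruby
require 'digest/sha1'   # used only for the cross-check at the end

# Rotate a 32-bit value left by n bits.
def rotl32(x, n)
  ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF
end

def sha1_hex(message)
  h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]

  # Pre-processing: 0x80, zero bytes up to 56 mod 64, 64-bit big-endian length.
  data = message.b
  bits = data.bytesize * 8
  data << "\x80".b
  data << "\x00".b while data.bytesize % 64 != 56
  data << [bits >> 32, bits & 0xFFFFFFFF].pack('N2')

  # Process the message in 64-byte (512-bit) chunks.
  0.step(data.bytesize - 1, 64) do |offset|
    w = data.byteslice(offset, 64).unpack('N16')   # sixteen big-endian words
    (16..79).each { |i| w[i] = rotl32(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1) }

    a, b, c, d, e = h
    80.times do |i|
      f, k = case i
             when 0..19  then [(b & c) | (~b & d), 0x5A827999]
             when 20..39 then [b ^ c ^ d, 0x6ED9EBA1]
             when 40..59 then [(b & c) | (b & d) | (c & d), 0x8F1BBCDC]
             else             [b ^ c ^ d, 0xCA62C1D6]
             end
      temp = (rotl32(a, 5) + f + e + k + w[i]) & 0xFFFFFFFF
      a, b, c, d, e = temp, a, rotl32(b, 30), c, d
    end

    # Add this chunk's result to the running hash, modulo 2**32.
    h = h.zip([a, b, c, d, e]).map { |x, y| (x + y) & 0xFFFFFFFF }
  end

  h.map { |word| format('%08x', word) }.join
end

puts sha1_hex('hello')
puts sha1_hex('hello') == Digest::SHA1.hexdigest('hello')
```

The first line printed should match the "hello" test vector above, and the second should be true.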


#15

Brian C. wrote:

Ashrith Barthur wrote:

  1. There is a bit wise operation that is required between 2 Hex values
    and this will only occur if both of them are Integer-Hex or Integer
    anything.

“Integer-Hex” doesn’t really mean anything.

The specification says it works on 32-bit unsigned integer values, and
that each 64-byte block of source data is treated as 16 x 32-bit words.
You can get this via

# data is a 64-byte string

w = data.unpack("N16")

# now w is an array of 16 Integers

For example lets say the string is ‘abc’
a gives 61 when unpacked in hex. so now. if the array holding this is
messageHex and the position of a is [0] then we will have to explicitly
say
messageHex[0].hex.to_i this will ensure that it is a integer in hex.

String#hex will give you an Integer directly; the to_i is superfluous.

But in any case, the conversion into hex-ascii in the first place is
superfluous. Unpack directly to Integers, as shown above.

Next thing… appending strings ‘0x80’ or ‘0x00’ is felt not to be
appropriate by me because… if you were to use 0x80 or 0x00 then ruby
thinks that its an integer already and you dont need to do any explicit
type casting.

Sorry, but I am unable to make any sense of that sentence at all. The
input data to SHA1 is an arbitrary-sized string of bytes (*); the padding
algorithm requires you to add more bytes (*) to the end, to achieve
alignment into 64-byte blocks. So adding padding bytes is exactly what
is required.

(*) actually an arbitrary-sized string of bits, but most
implementations assume that it’s a whole number of bytes, i.e. n * 8
bits.

Also ruby does not explicitly give you a value in hex if you do any
mathematical or bitwise operation in hex, it always defaults to dec.

I think you may have lost the distinction between a number, and its
external representation.

Doing a bitwise operation “in hex” or “in decimal” doesn’t make any
sense. The number is stored internally in binary - this is a digital
computer, after all - and the bitwise operations are done on those bits.
It is only converted into a hex or decimal representation at the point
where you input or output the number.

a = 20
a.to_s      # converts to string "20"
a.to_s(16)  # converts to string "14"

Anyway, maybe you would like to submit this as a ruby quiz, as you’d
probably get some good implementations to look at.

Brian.

Hi Brian…

I did it pretty much along the same lines, except that instead of N16 I
unpacked it as H8*16, just because the display has to be in hex mode.

Additionally… I am in a tight spot right now. I have coded the complete
algorithm, and I do get values out, i.e. the digest…

You know how SHA-1 needs to loop 80 times so that the bits are rotated,
a, b, c, d are changed and the keys are used… well, this is the funny
thing that happens with my code.

It works perfectly fine if I run the loop 64 times, that is

for i in 0..63

but the moment I say

for i in 0..79

the code returns with an error saying
"`[]=': index 64 out of string (IndexError)"

I earlier thought that Ruby arrays are limited to only 64 elements…
which I feel is a stupid assumption on my part, but I gave it a shot
and recoded to handle the loop beyond 64, and it still does not
work.

I searched far and wide on the internet and I don’t see anyone posting
this kind of an error … I really dont get it as to why the code would
work with 64 loops but not 79 loops.

Is it gotta do with the amount of numbers that are being inserted in
each value of the array? while unpacking I have defined it as H8*16 so
does that limit the size of each value in array? would it work if i say
N16? or is it a completely different error?


#16

Brian C. wrote:

The only answer I can give is trite: “because there is a bug in your
program”, or “because you are doing something wrong”. If you don’t post
the code, then we cannot guess what you are doing wrong.

At a guess: your input by this stage should consist of an array of 80
32-bit integers, and you should be indexing this array to obtain w[i].
Possibly you have set up this array wrongly.

Arrays in Ruby are of unlimited size (subject only to available RAM).

FWIW, I have attached a direct translation of the pseudocode on
Wikipedia. It seems to work for the handful of test vectors I’ve tried.

$ echo -n "" | sha1sum
da39a3ee5e6b4b0d3255bfef95601890afd80709 -
$ echo -n "hello" | sha1sum
aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d -
$ echo -n "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" | sha1sum
06ced2e070e58c2c4ed9f2b8cb890f0c512ce60d -

The following is the code that I have set up… I figured that the fix for
that array limit is padding more 00s and then ripping it into an 80-row
array…
The thing is the code perfectly… works but my shasum is not the same
as the one I am supposed to get… well that’s the issue…

here is the code

#!/usr/bin/ruby -w

h0=0x67452301
h1=0xefcdab89
h2=0x98badcfe
h3=0x10325476
h4=0xc3d2e1f0

message='abcde'

bit=message.size
message << '80'.hex

if(bit>64) then
  newbitlength=bit%64
else
  newbitlength=bit
end

while (newbitlength<=61) do
  message <<'00'.hex
  newbitlength=newbitlength+1
end

message << (('%016X' %(bit*8)).hex)

for i in (0..79)
  message<<'00'.hex
end
message.unpack('H8'*80)

a=h0
b=h1
c=h2
d=h3
e=h4

for i in (0..79)
  if (i>=16 or i<=79) then
    message[i]=(((message[i-3]) ^ (message[i-8]) ^ (message[i-14]) ^ (message[i-16]))<<1)
    tempmessage=(message[i])>>31
    message[i]=(message[i]<<1)+tempmessage
  end

  if (i>=0 or i<=19) then
    f=((b&c)|((~b)&d))
    k=0x5A827999
  elsif (i>=20 or i<=39) then
    f=b^c^d
    k=0x6ED9EBA1
  elsif (i>=40 or i<=59) then
    f=(b&c)|(b&d)|(c&d)
    k=0x8F1BBCDC
  else
    f=b^c^d
    k=0xCA62C1D6
  end

  tempvaluea=a>>27
  arot=(a<<5)+tempvaluea

  temp = (arot+f+e+k+(message[i]))%(2**32)

  e=d
  d=c
  tempvalueb=b>>2
  brot=(b<<30)+tempvalueb
  c=brot
  b=a
  a=temp

  h0=(h0+a)%(2**32)
  h1=(h1+b)%(2**32)
  h2=(h2+c)%(2**32)
  h3=(h3+d)%(2**32)
  h4=(h4+e)%(2**32)

  puts i
  puts "The value of H0:"<<h0.to_s(base=16)
  puts "The value of H1:"<<h1.to_s(base=16)
  puts "The value of H2:"<<h2.to_s(base=16)
  puts "The value of H3:"<<h3.to_s(base=16)
  puts "The value of H4:"<<h4.to_s(base=16)
end

puts "The digest for the given input is :"<<h0.to_s(base=16)<<h1.to_s(base=16)<<h2.to_s(base=16)<<h3.to_s(base=16)<<h4.to_s(base=16)

Regards,Ashrith


#17

Ashrith Barthur wrote:

The thing is the code perfectly… works but my shasum is not the same
as the one I am supposed to get… well that’s the issue…

Then it’s not working perfectly, is it?

OK, there’s tons wrong with this code. I’m not going to debug it fully
for you, as you need to do this yourself as a learning experience. I
suggest you debug it by putting lots of debugging statements in, e.g.

puts "h0 = #{h0.inspect}"

and comparing the values in your code at each point as it runs with
those in mine.

However, the following glaring errors stand out just from a visual
inspection:

while (newbitlength<=61) do

I think this should be <56 (448 bits)

message << (('%016X' %(bit*8)).hex)

Wrong expression: you have converted bit*8 to a hex ascii string, then
converted it straight back to decimal!!! So this is identical to

message << (bit * 8)

which will append one character to the message.

I suggest adding a check at this point to see that the padded message is
exactly a multiple of 64 bytes long, because with your code I don’t
think it is, but this is a requirement for the rest of the algorithm to
proceed.
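
Both points can be seen in a few lines. This is a sketch with the values from the code under review (a 5-byte message); the variable names round_trip and message are only for illustration:

```ruby
bit = 5                                  # message length in bytes, as in the code above
round_trip = ('%016X' % (bit * 8)).hex   # Integer -> hex text -> Integer again
p round_trip == bit * 8                  # true: the conversion changes nothing

message = 'abcde'.dup
message << (bit * 8)                     # an Integer argument appends ONE character (code 40)
p message                                # "abcde("
p message.bytesize                       # 6, not the 8 bytes a 64-bit length needs

# The suggested sanity check before entering the block loop:
puts 'padded message is not a multiple of 64 bytes' unless message.bytesize % 64 == 0
```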

for i in (0..79)
  message<<'00'.hex
end

Nowhere in the algorithm does it say add 80 zero bytes to the end of the
message.

message.unpack('H8'*80)

This is a bizarre unpack operation on the message. But not only that,
you have not assigned the result to anywhere - so this line doesn’t do
anything at all!

a=h0
b=h1
c=h2
d=h3
e=h4

All the code from this point should be inside a loop, one iteration for
each 64-byte block of the message (as the pseudocode says: “break
message into 512-bit chunks // for each chunk”)

for i in (0..79)
  if (i>=16 or i<=79) then
    message[i]=(((message[i-3]) ^ (message[i-8]) ^ (message[i-14]) ^ (message[i-16]))<<1)
    tempmessage=(message[i])>>31
    message[i]=(message[i]<<1)+tempmessage
  end

The pseudocode says: “break chunk into sixteen 32-bit big-endian words
w[i], 0 ≤ i ≤ 15”, but you have not done this.

So in your code, message[i] is a single byte, message[0] to message[63],
but actually you should have converted this to w[0] to w[15], each
element of w comprising 4 bytes from the original message.

puts "The value of H0:"<<h0.to_s(base=16)

The assignment to ‘base’ is not needed. i.e. h0.to_s(16) is all you
need.

However this won’t pad the string to 8 hex characters, so what you
really want is

("%08X" % h0)
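
The difference only shows when a word has leading zero digits. A short sketch with two made-up values:

```ruby
h0 = 0x0a1b2c3d          # hypothetical words with leading zero nibbles
h4 = 0x0000beef

short  = h0.to_s(16)                                    # leading zero is dropped
padded = format('%08x', h0)                             # always 8 hex digits
digest = [h0, h4].map { |h| format('%08x', h) }.join    # words stay aligned

puts short     # a1b2c3d
puts padded    # 0a1b2c3d
puts digest    # 0a1b2c3d0000beef
```

Without the padding, the concatenated digest would silently come out shorter than 40 hex characters for some inputs.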

That’s plenty of help - especially since you also have a working version
to compare against - so I’m not going to help you further on this.

Brian.


#18

Brian C. wrote:

I
suggest you debug it by putting lots of debugging statements in, e.g.

puts "h0 = #{h0.inspect}"

… and I also suggest you start with the message padding. For example,
the message “abcde” should pad to

“abcde\200\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000(”

That is, “abcde” followed by 0x80 followed by 50 zeros (making 56 bytes
so far), followed by 00 00 00 00 00 00 00 28 which is the length of the
message in bits as a 64-bit big-endian value, to give a 64-byte block.
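
As a sketch of how one might check this before going further, the following builds the block from the description above and compares it with the output of the padding steps (the variable names are made up, and the binary-string handling via String#b assumes a current Ruby):

```ruby
# The block exactly as described: "abcde", 0x80, 50 zero bytes, 8-byte length.
expected = ('abcde' + "\x80" + "\x00" * 50 + "\x00\x00\x00\x00\x00\x00\x00\x28").b

# The padding steps applied to the same message.
padded = 'abcde'.b
bits = padded.bytesize * 8                   # 40 == 0x28
padded << "\x80".b
padded << "\x00".b while padded.bytesize % 64 != 56
padded << [bits >> 32, bits & 0xFFFFFFFF].pack('N2')

p padded.bytesize        # 64
p padded == expected     # true
p padded[-1]             # "(" is the byte 0x28
```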

If the message doesn’t look like this before you enter your block
processing loop, then there’s no hope of getting the right answer (GIGO
principle)