Marshal's handling of floats

parki · July 9, 2006, 6:34am

I was thinking about writing a patch to modify how Marshal handles
floats, right now it dumps them using sprintf(3) and stores the
resulting string in the Marshal stream. I’d like to see it handle
floats the same way that Array#pack does:

[400.53].pack('g').length == 4
[400.53].pack('G').length == 8

while

Marshal.dump(400.53).length - 3 == 22
(and is slower, to boot)
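
A rough sketch for reproducing the size comparison above. The variable names are illustrative only, and the Marshal size is printed rather than asserted because it depends on the Ruby version: the 22-byte figure comes from the 2006-era interpreter, and newer interpreters may emit a shorter decimal string.

```ruby
value = 400.53

single = [value].pack('g')    # IEEE 754 single precision, big-endian
double = [value].pack('G')    # IEEE 754 double precision, big-endian
dumped = Marshal.dump(value)  # version bytes + 'f' tag + length-prefixed decimal text

puts "pack('g'):    #{single.bytesize} bytes"
puts "pack('G'):    #{double.bytesize} bytes"
puts "Marshal.dump: #{dumped.bytesize} bytes (varies by Ruby version)"

# Double precision round-trips exactly; single precision only approximately.
double_back = double.unpack1('G')
single_back = single.unpack1('g')
puts "double round trip exact: #{double_back == value}"
puts "single round trip error: #{(single_back - value).abs}"
```

Note that 'g' saves space by dropping precision, so it is only a fair replacement where single precision is acceptable.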

I want to make sure, though, that this would be an acceptable patch.
I can’t think why it would be OK for Array#pack to work this way and
not Marshal, but is there any particular reason why it can’t be done?
Obviously it would break backwards compatibility with older Marshal
dumps, but I don’t think they’re often used for long-term storage,
are they?

– Brian P.

parki · July 9, 2006, 7:41am

On Sun, 9 Jul 2006, Brian P. wrote:

Marshal.dump(400.53).length - 3 == 22
(and is slower, to boot)

I want to make sure, though, that this would be an acceptable patch. I can’t
think why it would be OK for Array#pack to work this way and not Marshal, but
is there any particular reason why it can’t be done? Obviously it would break
backwards compatibility with older Marshal dumps, but I don’t think they’re
often used for long-term storage, are they?

– Brian P.

i’ve never tried to use marshaled data across a big and little endian
machine

but this would break it. consider drb: if you had a mac and a linux
box talking on the wire you might see

harp:~ > ruby -e 'puts [1.44417819733316e-41].pack("g").reverse.unpack("g")[0].to_i'
42

which could be confusing. then again maybe i’m overlooking something.

cheers.

-a

parki · July 9, 2006, 9:21am

Hi Ara,

The ‘g’ and ‘G’ flags for Array#pack/String#unpack are in network
byte order, so they’re in a platform-independent format, as far as I
know. I actually tested this by packing a couple thousand floats on
my Mac, sending them in UDP packets and unpacking them on my AMD64
desktop. They all came across correctly.
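
A small sketch of the byte-order point, using only Array#pack and String#unpack1. The 'g' directive always writes big-endian bytes, 'e' always writes little-endian, and 'f' uses whatever the host uses. The variable names are illustrative.

```ruby
one = 1.0

big    = [one].pack('g')  # single precision, big-endian (network order)
little = [one].pack('e')  # single precision, little-endian
native = [one].pack('f')  # single precision, native byte order of this host

puts "big-endian:    #{big.unpack1('H*')}"
puts "little-endian: #{little.unpack1('H*')}"
puts "native matches #{native == big ? 'big' : 'little'}-endian on this machine"

# Reading 'g' data back with 'g' is safe on any host.
same = big.unpack1('g')

# Reversing the bytes by hand simulates the mix-up that only happens when
# a native-order directive is used on two hosts of different endianness.
swapped = big.reverse.unpack1('g')
puts "correct: #{same}, byte-swapped: #{swapped}"
```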

– Brian

parki · July 10, 2006, 1:55am

Hi,

In message “Re: Marshal’s handling of floats”
on Sun, 9 Jul 2006 13:31:59 +0900, Brian P.
writes:

|I was thinking about writing a patch to modify how Marshal handles
|floats, right now it dumps them using sprintf(3) and stores the
|resulting string in the Marshal stream. I’d like to see it handle
|floats the same way that Array#pack does:
|
|[400.53].pack('g').length == 4
|[400.53].pack('G').length == 8
|
|while
|
|Marshal.dump(400.53).length - 3 == 22
|(and is slower, to boot)
|
|I want to make sure, though, that this would be an acceptable patch.

There are issues:

1. pack('g') would not work on non-IEEE floating machines.
2. changing marshal format in incompatible way causes a lot of
   troubles, so that it should be avoided if possible.

I think we can merge it for 1.9 (if we address IEEE754 issue).

						matz.

parki · July 10, 2006, 2:14am

On Jul 9, 2006, at 5:52 PM, Yukihiro M. wrote:

There are issues:

1. pack('g') would not work on non-IEEE floating machines.
2. changing marshal format in incompatible way causes a lot of
   troubles, so that it should be avoided if possible.

I think we can merge it for 1.9 (if we address IEEE754 issue).
  					matz.

Yes, that’s a biggie. I didn’t realize that ruby compiled on non-IEEE
machines, but it makes sense now that I think about it. I think this
is out of my league, I suppose it would require integrating a
floating-point emulation library into ruby on such platforms, and
having that library handle the packing/marshaling, or even using that
library to back all Float objects on such platforms. I think that for
my purposes it makes more sense to just write a separate Marshal-type
extension library, since I only plan to target IA32, IA64 and Apple
G4/G5.

Thanks for the response!

– Brian

parki · July 10, 2006, 3:58am

Hey,

On Jul 9, 2006, at 7:13 PM, [email protected] wrote:

my purposes it makes more sense to just write a separate Marshal-type

Actually, I’m not terribly concerned with precision, but rather with
speed and size. The application I’m building needs to communicate
data over a wireless network every 100 milliseconds, and it needs to
use as little bandwidth as possible. I was considering just using the
Marshal methods, but Marshal’s floating-point representation killed
that idea, though for other Ruby built-in data types it seems to be
quite good, even truncating small integers to bytes and shorts. I’ve
decided that I only need a couple decimal places of accuracy, though,
so I’m just going to multiply each float by 100 and then convert it
to an int.
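
A minimal sketch of that fixed-point idea, assuming two decimal places and a signed 32-bit big-endian integer per value (the 'l>' pack directive). The helper names `encode_fixed` and `decode_fixed` and the choice of 32 bits are assumptions for illustration, not something stated in the post.

```ruby
SCALE = 100 # two decimal places of accuracy

# Scale each float, round to the nearest integer, and pack the integers
# as signed 32-bit big-endian values (4 bytes per value).
def encode_fixed(values)
  values.map { |v| (v * SCALE).round }.pack('l>*')
end

# Reverse the process: unpack the integers and divide the scale back out.
def decode_fixed(bytes)
  bytes.unpack('l>*').map { |i| i.to_f / SCALE }
end

samples = [400.53, -12.5, 0.0]
packet  = encode_fixed(samples)
decoded = decode_fixed(packet)

puts "#{samples.size} floats in #{packet.bytesize} bytes"
puts decoded.inspect
```

With this scale a 32-bit field covers roughly plus or minus 21 million; values outside that range would need a wider field or a smaller scale.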

Out of curiosity, what do you mean by ‘represent floating points in
hexadecimal format’? I’ve never heard of that before.

– Brian P.

parki · July 10, 2006, 3:15am

Hi,

At Mon, 10 Jul 2006 09:12:32 +0900,
Brian P. wrote in [ruby-talk:201018]:

Yes, that’s a biggie. I didn’t realize that ruby compiled on non-IEEE
machines, but it makes sense now that I think about it. I think this
is out of my league, I suppose it would require integrating a
floating-point emulation library into ruby on such platforms, and
having that library handle the packing/marshaling, or even using that
library to back all Float objects on such platforms. I think that for
my purposes it makes more sense to just write a separate Marshal-type
extension library, since I only plan to target IA32, IA64 and Apple
G4/G5.

What’s the reason for your proposal?

If it is a precision issue, I’d rather suggest that we represent
floating points in hexadecimal format.

parki · July 10, 2006, 5:45am

Brian P. wrote:

Actually, I’m not terribly concerned with precision, but rather with
speed and size. The application I’m building needs to communicate data
over a wireless network every 100 milliseconds, and it needs to use as
little bandwidth as possible. I was considering just using the Marshal
methods, but Marshal’s floating-point representation killed that idea,
though for other Ruby built-in data types it seems to be quite good,
even truncating small integers to bytes and shorts. I’ve decided that I
only need a couple decimal places of accuracy, though, so I’m just going
to multiply each float by 100 and then convert it to an int.

If you don’t need to send arbitrary (marshallable) ruby objects, but
only fairly well-defined struct-like packets, then you might find this
helpful. It’s built on top of pack/unpack, but it has a “dsl” flavor and
makes it easier to work with odd-length bit fields, for example:

http://redshift.sourceforge.net/bit-struct/

(I wrote this for purposes like yours: to send data over a constrained
wireless network, and also to communicate easily with non-ruby code, so
Marshal was not an option.)

parki · July 10, 2006, 4:29am

Yukihiro M. wrote:

There are issues:

1. pack('g') would not work on non-IEEE floating machines.
2. changing marshal format in incompatible way causes a lot of
   troubles, so that it should be avoided if possible.

I think we can merge it for 1.9 (if we address IEEE754 issue).
  					matz.

A couple of questions:

1. How does XML handle floats?
2. How does YAML handle floats?
3. What is the format of a float when dumped using Marshal?

My recommendation would be to dump floats as the hexadecimal
representation of IEEE 64-bit formatted numbers. This is “almost
universal” and occupies only 16 bytes. The alternative, dumping them in
decimal in some “scientific” notation, takes more bytes and loses small,
but noticeable, accuracy. Moreover, it does not capture IEEE’s “Inf” and
“NaN” values, which are very much part of the semantics and syntax of
modern numeric processing.

Non-IEEE architectures are very much the exception rather than the rule,
and they can be expected to dump IEEE hex and read IEEE hex as a penalty
for not adopting the standard.
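
One way to picture the 16-character hex form described above is to pack the float as a big-endian IEEE 754 double and hex-encode the 8 bytes. This is only a sketch; the helper names `float_to_hex` and `hex_to_float` are hypothetical and not part of Marshal.

```ruby
# Encode a Float as the 16 hex digits of its big-endian IEEE 754 double.
def float_to_hex(x)
  [x].pack('G').unpack1('H*')
end

# Decode 16 hex digits back into a Float.
def hex_to_float(hex)
  [hex].pack('H*').unpack1('G')
end

third = 1.0 / 3
hex   = float_to_hex(third)
puts "#{third} -> #{hex} (#{hex.length} characters)"
puts "exact round trip: #{hex_to_float(hex) == third}"

# Infinity and NaN survive as well, since they are ordinary bit patterns.
puts float_to_hex(Float::INFINITY)
puts hex_to_float(float_to_hex(Float::NAN)).nan?
```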

–
M. Edward (Ed) Borasky

parki · July 10, 2006, 7:10am

Hi,

In message “Re: Marshal’s handling of floats”
on Mon, 10 Jul 2006 11:26:25 +0900, “M. Edward (Ed) Borasky”
writes:

|1. How does XML handle floats?
|2. How does YAML handle floats?
|3. What is the format of a float when dumped using Marshal?

Currently all use human-readable decimal string representation.

						matz.

parki · July 10, 2006, 9:51am

Brian P. wrote:

Hi Ara,

The ‘g’ and ‘G’ flags for Array#pack/String#unpack are in network
byte order, so they’re in a platform-independent format, as far as I
know. I actually tested this by packing a couple thousand floats on
my Mac, sending them in UDP packets and unpacking them on my AMD64
desktop. They all came across correctly.

Yeah, but I still see serious drawbacks of your proposal:

1. it will break existing code (YAML data)
2. it’s not human readable which is one of the key advantages (YAML
   files seem to be frequently written by hand as a nice user interface
   to config data)
3. it’s not portable and it might not be compliant with YAML spec

In sum all these are serious showstoppers for a general change of YAML
behavior.

Another solution - if you want to make use of this: AFAIK you can
customize YAMLfication of classes. If that is true, you can pretty
easily perform the conversion during the conversion to and from YAML.
That would be the way to go IMHO.

Kind regards

robert

parki · July 10, 2006, 8:44am

Hi,

At Mon, 10 Jul 2006 10:56:55 +0900,
Brian P. wrote in [ruby-talk:201027]:

Out of curiosity, what do you mean by ‘represent floating points in
hexadecimal format’? I’ve never heard of that before.

It was introduced in C99; for example, 0xaaaa.ccccP+10.
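
For readers who have not met the notation: Ruby can produce and parse this C99-style hexadecimal form itself, via the `%a` format directive and `Kernel#Float`. A small sketch, assuming a reasonably recent Ruby; the variable names are illustrative.

```ruby
x = 1.0 / 3

# '%a' writes the mantissa in hexadecimal and the exponent as a power of two,
# so no decimal rounding is involved.
text = format('%a', x)
puts text

# Kernel#Float accepts the same notation (String#to_f does not).
back = Float(text)
puts "exact round trip: #{back == x}"

# "0x1p3" means 1 * 2**3.
puts Float('0x1p3')
```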

parki · July 10, 2006, 4:11pm

Robert K. wrote:

1. it will break existing code (YAML data)
2. it’s not human readable which is one of the key advantages (YAML
   files seem to be frequently written by hand as a nice user interface
   to config data)
3. it’s not portable and it might not be compliant with YAML spec

In sum all these are serious showstoppers for a general change of YAML
behavior.

I thought the OP wanted to change the behavior of Marshal, not YAML. Or
are you saying that the YAML rep’n would also have to change so that
both formats have the same lossiness?

parki · July 10, 2006, 5:12pm

Hey Robert, I have gone with another solution but just as an aside –
we’re talking about the Marshal class, not YAML. I definitely
wouldn’t suggest storing anything in YAML as binary data.

– Brian

parki · July 11, 2006, 6:57pm

On Mon, Jul 10, 2006 at 11:26:25AM +0900, M. Edward (Ed) Borasky wrote:

A couple of questions:

How does XML handle floats?

Not at all, or however you like to. XML can be used to define your own
formats, it just isn’t concerned with storing floats per se.

How does YAML handle floats?

Human readable decimal “scientific” format. Space inefficient, maybe
slightly inexact, but very portable.

irb(main):002:0> (1.0/3).to_yaml
=> "--- 0.333333333333333\n"
irb(main):006:0> (10.0**20/3).to_yaml
=> "--- 3.33333333333333e+19\n"

What is the format of a float when dumped using Marshal?

irb(main):013:0> Marshal.dump(1.0/3)
=> "\004\010f\e0.33333333333333331\000UU"
irb(main):014:0> Marshal.dump(10.0**20/3)
=> "\004\010f\0363.3333333333333332e+19\000\020U"

This could be changed in 1.9, maybe keeping some backwards
compatibility in Marshal.load for the old format.
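
The dumps above can be picked apart with a few lines of Ruby. This sketch assumes the layout visible in the irb output: two version bytes (\004 \010), the type tag 'f', then a length byte followed by the text. For short strings the length byte is the length plus 5, which is why \e (27) announces 22 bytes. The exact text differs between Ruby versions, so it is printed rather than checked.

```ruby
x    = 1.0 / 3
dump = Marshal.dump(x)

major, minor, tag = dump.bytes.first(3)
puts "format version #{major}.#{minor}, type tag #{tag.chr.inspect}"

# Small positive lengths are stored as (length + 5) in a single byte.
declared = dump.getbyte(3) - 5
text     = dump.byteslice(4, declared)
puts "declared length #{declared}, payload #{text.inspect}"

# Whatever the text looks like, load restores the value,
# including the special values Infinity and NaN.
puts Marshal.load(dump) == x
puts Marshal.load(Marshal.dump(Float::INFINITY))
puts Marshal.load(Marshal.dump(Float::NAN)).nan?
```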

for not adopting the standard.

I think that’s reasonable. Note that legacy machines don’t need to
implement full IEEE operations, just reading and writing.

-Jürgen