On 04.09.2007 00:32, Stefan R. wrote:
Matthias Wächter wrote:
It was the core meaning of "if two numbers are not identical, I want
to see a difference in the string representation, too". Nothing more.
And that’s where you don’t understand floats obviously. Floats are
approximations. Never test approximations for identity. That’s
destined to fail. If you want identity, use an exact system, as
mentioned often enough by now, I think: e.g. Rational, BigDecimal, or
whatever else floats your boat.
Right, floats are approximations of decimals. But they stand on their
own: a float or double is a precise representation of a
base-2-encoded floating-point number. If I ask for a float for "0.25"
I expect it to be precise, not an approximation. That is part of
IEEE 754. Certainly, for "0.1" it is an approximation, but the float
(say, double) representation of 0.1, which is only an approximation
of 0.1, converted back to decimal is precisely
"0.1000000000000000055511151231257827021181583404541015625", period.
Please accept that fact.
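To make that concrete, here is a small sketch. The full-length output
assumes IEEE 754 doubles and a sprintf that honors high precisions
(which, as discussed further down, not every platform does):

  # 0.25 is a sum of powers of two and has an exact double representation
  puts "%.55f" % 0.25
  # => 0.2500000000000000000000000000000000000000000000000000000

  # 0.1 has no finite base-2 expansion; what is stored is the nearest
  # double, whose exact decimal expansion is:
  puts "%.55f" % 0.1
  # => 0.1000000000000000055511151231257827021181583404541015625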
Now that we know the precise value of the float back in decimal, we
can distinguish three cases:
- I am interested in the high precision of converting double back
to a decimal string representation.
Well, then use "%.55g" or something like that to get it back. But
certainly, for smaller numbers this might not be enough anyway. A
full string representation of the float representation of 1e-200
requires a format of "%.515g" to be shown correctly. Interestingly,
Python allows no more than 109 decimal places using "%.109g" and
throws an OverflowError exception from 110 places onwards (!).
BTW: Float::MIN requires 715 digits for an exact representation.
Funny output awaits you from Python (2.4.4) when you ask for
“%.714e” % 0.1
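In Ruby the same experiments look like this (a sketch; the full
output again assumes a sprintf that does not truncate at high
precisions):

  # full decimal expansion of the double nearest to 1e-200
  puts "%.515g" % 1e-200
  # Float::MIN (2.0**-1022) needs 715 significant digits, i.e. "%.714e"
  puts "%.714e" % Float::MIN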
- I am not interested in the high precision of converting double
back to a decimal string representation, but I am interested in a
string representation that allows me, like for any other object, to
distinguish two different binary float values.
-> Here we have the point where Ruby lacks a feature: a
distinguishable string representation with minimum length (just what
Marshal.dump does). Note that 0.1.to_s doesn't need 55 decimals in
the output to be identifiable: for the given number of 0.1, the
least significant float (i.e., double) digit has a weight of
2.0**-56, which is about 1.4E-17, so this is the magnitude where
Ruby could stop outputting the string representation.
So the double with the binary representation 0x3FB999999999999A and
the doubles with the succeeding binary representations would have to
have these string representations:
For 0.1 (0x3FB999999999999A)
0.10000000000000000555 … exact
0.10000000000000001 distinguishable (output of Marshal.dump)
For 0.1+2.0**-56 (0x3FB999999999999B)
0.10000000000000001942 … exact
0.10000000000000002 distinguishable
For 0.1+2.0**-55 (0x3FB999999999999C)
0.10000000000000003330 … exact
0.10000000000000003 distinguishable
For 0.1+2.0**-55+2.0**-56 (0x3FB999999999999D)
0.10000000000000004718 … exact
0.10000000000000005 distinguishable (note there is no float
for an end digit of "4": "…004" would read back as the
previous double, so the last digit must be "5")
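A quick way to verify these bit patterns yourself (a sketch;
pack("G") yields the big-endian IEEE 754 double):

  def bits(f)
    [f].pack("G").unpack("H*").first
  end
  puts bits(0.1)                        # => 3fb999999999999a
  puts bits(0.1 + 2.0**-56)             # => 3fb999999999999b
  puts bits(0.1 + 2.0**-55)             # => 3fb999999999999c
  puts bits(0.1 + 2.0**-55 + 2.0**-56)  # => 3fb999999999999d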
- I am not interested in high-precision output as I know that
arbitrary decimal values are handled using approximations after all,
and I don’t want to be bothered with this level of detail.
Well, there is an argument, certainly. First you take exact decimal
"numbers" and put them into variables, which converts them to an
approximated binary floating-point format with a high precision of
around 17 significant digits. Second, you make calculations on them
which might increase the error, and when the result is intended for
output, you use "%.#{MY_precision}f" to get a result from the
imperfect calculation. So be it.
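A minimal illustration of that workflow (a sketch):

  sum = 0.0
  1000.times { sum += 0.1 }
  puts sum            # ~99.9999999999986 -- the rounding error accumulated
  puts "%.2f" % sum   # => 100.00 -- rounded for output, the error is hidden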
The question, the only question here is whether .to_s should
already make assumptions about the requested output precision.
Have I overlooked an overridable constant in Float so I can make
.to_s behave exactly like Marshal.dump?
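If there is none, a crude workaround would be to override
Float#to_s yourself (an untested sketch that simply mirrors the
17-digit format marshal.c uses, see below):

  class Float
    def to_s
      "%.17g" % self   # marshal-style 17 significant digits
    end
  end
  puts 0.1.to_s   # => 0.10000000000000001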
Here is a naive piece of code that compares the binary and string
representations of slightly differing values:
base = 0.1
(-60..-47).each do |ex|
  b = base
  n = b + 2**ex
  an, ab = [n, b].collect { |m| m.to_s }
  mn, mb = [n, b].collect { |m|
    # Ruby 1.8 dumps a Float as \004\010 'f' <len> <"%.17g" string>,
    # optionally followed by \000 plus raw mantissa bytes; strip the
    # 4-byte header and anything from the first NUL onwards
    Marshal.dump(m)[4..-1].sub(/\000.*/m, '')
  }
  ea, en = [[an, ab], [n, b]].collect { |l, r| l == r ? "==" : "!=" }
  puts "#{b}+2^#{ex}: #{ea}, #{en}, " +
       "String: #{an}#{ea}#{ab}, " +
       "Number: #{mn}#{en}#{mb}"
end
Just run this and watch the output: the Marshal.dump (i.e., the real
float) differs for 0.1+2**-56 already, but .to_s first changes at
0.1+2**-50. Tell me why .to_s drops about 2 decimal digits from the
result for no good reason.
Does no one care that on some platforms, Ruby is not able to
correctly output floating points at the asked precision?
Ruby? Are you sure you found the culprit there? Are you sure ruby
doesn’t just wrap around whatever float libs are available on the
platform?
Quite possible. I don't mind. Actually, win32-Python cannot output
more digits either, while both cygwin-Python and the one on my
gentoo box correctly output the asked precision, so it looks like a
Windows (or VC++) library "feature".
from marshal.c:
[…]
#ifdef DBL_DIG
#define FLOAT_DIG (DBL_DIG+2)
#else
#define FLOAT_DIG 17
#endif
[…]
/* xxx: should not use system's sprintf(3) */
sprintf(buf, "%.*g", FLOAT_DIG, d);
[…]
The first thing to see: if DBL_DIG is not defined, it goes for 17
digits. The second is this nice comment – I like that one.
Now a quick look to numeric.c:
[…]
sprintf(buf, "%#.15g", value); /* ensure to print decimal point */
[…]
Now why is this? Why output only 15 digits in to_s, but use 17
digits when marshaling? Is there any good reason for that? Is 15
what customers ask for to hide the rounding issues of floats, or is
it a bug?
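The practical difference between the two formats (a quick sketch;
the "#" flag and to_s's trailing-zero trimming are left out for
brevity):

  puts "%.15g" % 0.1               # => 0.1                  (numeric.c / to_s)
  puts "%.17g" % 0.1               # => 0.10000000000000001  (marshal.c)
  puts "%.15g" % (0.1 + 2.0**-55)  # => 0.1  -- collides with plain 0.1
  puts "%.17g" % (0.1 + 2.0**-55)  # => 0.10000000000000003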
Because in that case, it’s your platform that is to blame, not
ruby. I don’t know it, and I won’t go into the source to verify it, but
I wouldn’t just go around and blame ruby without being sure.
Noting the "xxx:" comment above makes me believe that the Ruby
developers are already aware of imprecise libraries but haven't had
the time to fix that yet (i.e., write their own or use a GNU
version of it).
I understood that floats are approximations, and it doesn’t bother me
much (maybe it would if I did scientific calculations - but then I’d
probably know a bit more about it and/or use different means) whether
the approximation is 0.000000000000010% or 0.000000000000006% off
(that’s the difference in your 0.1 example - notice something?).
Great words. "I don't mind the precision if it looks precise
enough." Either I can rely on correct IEEE 754 support or I can't.
Again, it's not a matter of taste how a floating-point value is
converted back to decimal. This can be done unambiguously, especially
if enough positions after the decimal point are requested, like in
print "%.60f".
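In fact, 17 significant digits are already enough to make the
conversion unambiguous for any double (a quick check):

  s = "%.17g" % 0.1
  puts s              # => 0.10000000000000001
  puts s.to_f == 0.1  # => true -- the string reads back as the same double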
It's just rather pointless, as at some point in your calculation you
have already lost precision.
Who says that? Just because you always use base-10 numbers and
can live with the approximation? Another person might use precise
binary values, i.e. powers of two, and expect the programming
language, the libraries and the floating-point processor to
manage and output them precisely. Even if the thread started with
0.1, which is stored in a float as the 55-digit value given above, I
expect to get this very value back from the programming language.
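For such values nothing is approximate at any stage, as a small
sketch shows:

  x = 2.0**-20
  puts "%.30f" % x           # => 0.000000953674316406250000000000 (exact)
  puts((x + x) == 2.0**-19)  # => true: power-of-two arithmetic stays exact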
So now let me ask you, what do you do that you need 60 places? Are you
sure floats are the right tool?
Floats are good for what floats are good for. What answer do you
expect from me?