To quote or not to quote (I think it's a bug)

Hi,

I’m not sure this is the right place to report issues about Syck, but that’s the best I’ve found so far :) I think I’ve found a problem with the way Syck quotes strings that could look like floats. Here is a short example:

irb(main):001:0> require 'yaml'
=> true
irb(main):002:0> YAML.dump(["1.2", "1.2.3", "1.2_3"])
=> "--- \n- \"1.2\"\n- 1.2.3\n- 1.2_3\n"

The first element is quoted appropriately and the second isn’t, because there’s no ambiguity, but the third should be quoted. A YAML float can contain underscores: they’re used as visual separators and should be ignored by the parser. So when seeing 1.2_3 the parser should read the float 1.23. To disambiguate, the string “1.2_3” should therefore be quoted.
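A minimal Ruby sketch of the emitter-side rule argued for here, assuming the "underscores are ignored" sentence means underscores can be dropped anywhere before deciding whether a plain scalar looks like a float. The method name `needs_quoting?` and the simplified float pattern are hypothetical illustrations, not Syck's actual logic.

```ruby
# Hypothetical, simplified base-10 float shape: optional sign, digits,
# a decimal point, at least one digit, optional exponent.
SIMPLE_FLOAT = /\A[-+]?[0-9]*\.[0-9]+(?:[eE][-+]?[0-9]+)?\z/

# Hypothetical check: under the "underscores are ignored" reading, a string
# must be quoted if it still looks like a float once underscores are removed.
def needs_quoting?(str)
  SIMPLE_FLOAT.match?(str.delete("_"))
end

%w[1.2 1.2.3 1.2_3].each do |s|
  puts "#{s.inspect} quoted? #{needs_quoting?(s)}"
end
```

Under this reading, "1.2" and "1.2_3" both need quotes while "1.2.3" does not, which is the behaviour being asked for.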

In practice this isn’t really a problem for actual floats, since the conversion happens before anything is written: YAML.dump(1.2_3) is written directly as 1.23. However, it creates an interoperability issue when the string “1.2_3” gets written unquoted by Syck and then read by another parser as 1.23. The string isn’t a string anymore in that case.
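To make the interoperability concern concrete, here is a sketch of a hypothetical reader that follows the same underscore-ignoring reading (`read_plain_scalar` is an illustrative name, not a real Syck or YAML API, and it only handles double-quoted and plain scalars). A plain `1.2_3` comes back as the float 1.23, while the quoted form stays a string. Ruby's own literal `1.2_3` also evaluates to 1.23, which is why `YAML.dump(1.2_3)` never sees the underscore.

```ruby
# Hypothetical, simplified float shape (same assumption as the emitter sketch).
SIMPLE_FLOAT = /\A[-+]?[0-9]*\.[0-9]+(?:[eE][-+]?[0-9]+)?\z/

# Hypothetical reader: double-quoted scalars stay strings; plain scalars that
# look like floats once underscores are dropped are converted to Float.
def read_plain_scalar(text)
  if text.length >= 2 && text.start_with?('"') && text.end_with?('"')
    text[1..-2]
  elsif SIMPLE_FLOAT.match?(text.delete("_"))
    text.delete("_").to_f
  else
    text
  end
end

p read_plain_scalar('"1.2_3"') # => "1.2_3"
p read_plain_scalar("1.2_3")   # => 1.23
p read_plain_scalar("1.2.3")   # => "1.2.3"
p 1.2_3 == 1.23                # => true
```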

What do you think?

Thanks,
Matthieu R.

On 2/19/07, Matthieu R. wrote:


Are you certain that 1.2_3 is a valid YAML float?

http://yaml.org/type/float.html

I do not believe underscores are valid in either YAML floats or YAML
integers. Although they are allowed in Ruby, I believe Syck is
handling your example case properly and without ambiguity.

Blessings,
TwP

Hi Tim,

In that page:

“Any “_” characters in the number are ignored, allowing a readable representation of large values.”

So clearly they’re allowed: in 1.2_3 the underscore is simply ignored and the parser should understand 1.23. If I fancy writing my own YAML by hand and making it easily readable (which is kind of the original purpose), I could write 100_000_000.03, which would be a nice-looking float. Hence the ambiguity with “1.2_3”.

Cheers,
Matthieu

But the specification is not just in English; it’s also in regex form:

 [-+]?([0-9][0-9_]*)?\.[0-9.]*([eE][-+][0-9]+)? (base 10)
|[-+]?[0-9][0-9_]*(:[0-5]?[0-9])+\.[0-9_]* (base 60)
|[-+]?\.(inf|Inf|INF) # (infinity)
|\.(nan|NaN|NAN) # (not a number)

For a base 10 number, underscores are permitted only if they appear
BEFORE the decimal point, so for 1.2_3 there is no ambiguity.
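The base-10 branch can be checked directly. A minimal Ruby sketch, assuming the regex as published on the yaml.org float page (with its `*` quantifiers and escaped `\.`), anchored here to the whole scalar with `\A`...`\z` and with the groups made non-capturing (which does not change what matches). Only the base-10 alternative is covered, and `YAML11_BASE10_FLOAT` is a hypothetical constant name.

```ruby
# Base-10 branch of the YAML 1.1 float regex, anchored to the whole string.
# Underscores are only allowed in the digits before the decimal point.
YAML11_BASE10_FLOAT = /\A[-+]?(?:[0-9][0-9_]*)?\.[0-9.]*(?:[eE][-+][0-9]+)?\z/

%w[1.2 1.2_3 100_000_000.03].each do |s|
  puts "#{s}: #{YAML11_BASE10_FLOAT.match?(s) ? 'float' : 'not a float'}"
end
```

By the regex, 100_000_000.03 is a float but 1.2_3 is not, because the `_` comes after the decimal point.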

Dan.

So clearly they’re allowed: in 1.2_3 the underscore is simply ignored and the parser should understand 1.23. If I fancy writing my own YAML by hand and making it easily readable (which is kind of the original purpose), I could write 100_000_000.03, which would be a nice-looking float. Hence the ambiguity with “1.2_3”

Hmmmm… I just noticed that the specification
(Floating-Point Language-Independent Type for YAML™ Version 1.1) isn’t just unclear, it’s inconsistent.

The examples listed include:

exponentioal: 685.230_15e+03

Notwithstanding the spelling of exponential, that example is not consistent with the regular expression, which allows underscores only if they precede the decimal point: [-+]?([0-9][0-9_]*)?\.[0-9.]*([eE][-+][0-9]+)?
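Running the spec’s own example through the anchored base-10 pattern shows the mismatch, while the "underscores are ignored" sentence would accept it. A sketch under the same assumptions as before (published regex anchored to the whole string; simply deleting underscores stands in for the English rule):

```ruby
# Base-10 branch of the YAML 1.1 float regex, anchored to the whole string.
YAML11_BASE10_FLOAT = /\A[-+]?(?:[0-9][0-9_]*)?\.[0-9.]*(?:[eE][-+][0-9]+)?\z/

example = "685.230_15e+03" # the "exponentioal" example from the spec page

# Regex reading: an underscore after the decimal point is not allowed.
regex_ok = YAML11_BASE10_FLOAT.match?(example)

# English reading: drop every underscore, then parse what remains.
english_value = example.delete("_").to_f

puts "matches regex: #{regex_ok}" # prints "matches regex: false"
puts "value if underscores ignored: #{english_value}"
```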

Dan.

On Tue, Feb 20, 2007 at 10:03:25AM +0900, Matthieu R. wrote:

The first element is quoted appropriately, the second isn’t because there’s
no ambiguity but the third should be quoted. The YAML float can have
underscores, they’re used as visual separators and should be ignored by the
parser.

Only in YAML 1.1. Syck only speaks YAML 1.0. Unfortunately, the
types repository for 1.0 is no longer at yaml.org.

See Float Language-Independent Type For YAML
for the floats which Syck should support.

_why