what’s the best way to determine if a file is yaml?
thanks,
t.
Trans wrote:
what’s the best way to determine if a file is yaml?
Naive answer:
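require 'yaml'

def File.yaml?(fname)
  YAML.load(File.read(fname))
  true
rescue ArgumentError, Psych::SyntaxError
  # Syck (Ruby 1.8) reported parse errors as ArgumentError;
  # Psych, the default engine in later Rubies, raises Psych::SyntaxError.
  false
end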
That said, open up irb -ryaml and run this line a few times:
YAML.load Array.new(60){rand 256}.pack('c*')
I’m not sure that’s what you’re after.
And I'm guessing you didn't mean:
def File.yaml?(fname)
  # extname includes the leading dot, e.g. ".yml", so match it too
  extname(fname) =~ /\A\.ya?ml\z/i
end
Devin
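A slightly stricter version of Devin's check, offered only as a sketch: besides requiring a clean parse, it asks that the top-level node be a mapping or a sequence. That is an assumption (the YAML spec says nothing of the kind): a file that parses only as one plain scalar, as stray bytes tend to, is not counted. It assumes a modern Ruby where YAML is Psych; YAML.parse builds just the syntax tree, so no Ruby objects are created. The names yaml_structure? and yaml_file? are made up for this example.

```ruby
require 'yaml'

# Hypothetical helper: true only if +text+ parses as YAML *and* the root of
# its first document is a mapping or a sequence. A lone scalar (what random
# bytes usually parse as) counts as "not really YAML".
def yaml_structure?(text)
  doc = YAML.parse(text)          # syntax tree only; no objects are built
  return false unless doc         # empty stream: no document at all
  root = doc.root
  root.is_a?(Psych::Nodes::Mapping) || root.is_a?(Psych::Nodes::Sequence)
rescue ArgumentError, Psych::SyntaxError
  # ArgumentError: what Syck-era Ruby raised; Psych::SyntaxError: modern Ruby
  false
end

# Hypothetical file wrapper around the check above.
def yaml_file?(fname)
  yaml_structure?(File.read(fname))
end
```

File.read still raises Errno::ENOENT for a missing file; that is left to the caller.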
Trans wrote:
what’s the best way to determine if a file is yaml?
Process the file using a parser meant to process YAML. If the parse fails, it means one of the following:
- The file isn't YAML.
- The chosen parser is not robust enough to process this specific, valid YAML file.
- The YAML file, although more or less valid YAML, has syntax errors not consistent with the formal YAML specification.
- The YAML specification contains ambiguities that allow a valid parser to fail on valid YAML syntax.
- Other.
In other words, you cannot really say, absolutely and unambiguously, that a particular file is a YAML file.
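When a parse does fail, the exception at least says where the parser gave up, which can help a human decide which of the cases above applies. A minimal sketch, assuming Psych (modern Ruby), whose Psych::SyntaxError carries line, column and problem readers; explain_yaml is a hypothetical name.

```ruby
require 'yaml'

# Hypothetical helper: nil if +text+ parses, otherwise a one-line report of
# where and why the parser gave up. Whether that points at a non-YAML file,
# a parser limitation or a spec ambiguity is still a human judgment.
def explain_yaml(text)
  YAML.parse(text)
  nil
rescue Psych::SyntaxError => e
  "line #{e.line}, column #{e.column}: #{e.problem}"
end
```

For example, explain_yaml(File.read(fname)) returns nil for a file that parses and a short report otherwise.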
Trans wrote:
what’s the best way to determine if a file is yaml?
In light of the other responses, which show how hard it is to do this in
general, what about a pragmatic approach that might work in most of the
cases you are interested in?
Look at the first N lines.
If any line has any non-printing characters, it’s not correct YAML and
wasn’t generated by YAML#dump.[1]
If any are longer than M chars or other binary file heuristics apply[2],
it’s probably not a manually written YAML file.
If it passes at least one of these two checks, then check to see if
80% of the (first N) lines match the following:
/^\s*(-|\?|[\w\s]*:)\s/
Maybe add some logic to skip blocks of text like this (so they don’t
count against the 80%):
a: |
  skip
  me
Also, check for > in place of |.
And also skip blanks and comments /^\s*(#|$)/.
And then finally load it and rescue any ArgumentError.
There are probably a lot of corner cases that kill this approach if you
cannot tolerate false negatives (i.e., legit yaml that gets rejected by
the above).
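The steps above can be sketched as a single Ruby method. This is an illustration under stated assumptions, not Joel's code: N, M and the 80% threshold are placeholder constants, text that is not valid in its own encoding is treated as binary, and the final check uses Psych's YAML.parse (modern Ruby) rather than YAML.load plus ArgumentError. The name looks_like_yaml? is hypothetical.

```ruby
require 'yaml'

# Placeholder values for N, M and the 80% threshold; tune them for your files.
LINES_TO_CHECK  = 50    # N: how many leading lines to inspect
MAX_LINE_LENGTH = 200   # M: longer lines suggest a non-handwritten file
MATCH_RATIO     = 0.8   # share of counted lines that must look like YAML

YAML_LINE  = /^\s*(-|\?|[\w\s]*:)\s/     # "- item", "? key", "key: value"
SKIP_LINE  = /^\s*(#|$)/                 # blank lines and comments
BLOCK_OPEN = /(:|-)\s+[|>][-+]?\s*$/     # "a: |", "b: >", "- |-" ...

def looks_like_yaml?(text)
  return false unless text.valid_encoding?   # undecodable bytes: treat as binary
  lines = text.lines.first(LINES_TO_CHECK)

  printable = lines.none? { |l| l =~ /[^[:print:]\t\r\n]/ }
  short     = lines.none? { |l| l.chomp.length > MAX_LINE_LENGTH }
  return false unless printable || short     # must pass at least one check

  counted = matched = 0
  block_indent = nil   # indentation of the line that opened a block scalar
  lines.each do |line|
    indent = line[/\A */].length
    if block_indent
      # Blank or more deeply indented lines belong to the block: skip them.
      next if line.strip.empty? || indent > block_indent
      block_indent = nil                     # dedent: the block is over
    end
    next if line =~ SKIP_LINE
    counted += 1
    matched += 1 if line =~ YAML_LINE
    block_indent = indent if line =~ BLOCK_OPEN
  end
  return false if counted.zero? || matched < MATCH_RATIO * counted

  YAML.parse(text)                           # finally, make sure it parses
  true
rescue ArgumentError, Psych::SyntaxError
  false
end
```

Only the first LINES_TO_CHECK lines feed the heuristics; the final parse still reads the whole text.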
[1] The YAML spec, http://yaml.org/spec/current.html, says nonprinting
chars are encoded (see 4.1.1. Character Set), and it seems to be true,
at least in the dump output:
a: !binary |
  Ag==
However, YAML can load unescaped binary data, as Devin showed:
irb(main):025:0> YAML.load "a: \002"
=> {"a"=>"\002"}
[2] For example,
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/52548
Joel VanderWerf wrote:
wasn’t generated by YAML#dump.[1]
count against the 80%):
There are probably a lot of corner cases that kill this approach if you
cannot tolerate false negatives (i.e., legit yaml that gets rejected by
the above).
Yikes! If that's what it takes, then I must run away! I need something snappy. Actually, it just occurred to me that as of YAML 1.1 the document declaration is mandatory. I had forgotten about that. So checking for an initial line starting with %YAML would do the trick, as long as docs were 1.1 compliant, at least in this regard. Unfortunately Syck itself isn't 1.1 compliant in this respect whatsoever.
In the meantime I'm just going to go with ara's suggestion. The use of an initial '---' is an acceptable requirement for my needs.
t.
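The "snappy" test can be sketched as follows, accepting either a %YAML directive (the YAML 1.1 idea) or a bare '---' document marker (ara's convention) on the first line that is neither blank nor a comment. It is only a sketch: yaml_header? is a made-up name, and any file without such a header is rejected, which is the trade-off accepted above.

```ruby
# A %YAML directive, or a '---' marker followed by whitespace or end of line
# ("---foo" is a plain scalar, not a document marker).
HEADER = /\A(%YAML\s|---(\s|\z))/

# Hypothetical helper: +src+ is anything with each_line (a String or an IO).
def yaml_header?(src)
  src.each_line do |line|
    next if line =~ /\A\s*(#|\z)/   # skip blank lines and comments
    return line.match?(HEADER)     # decide on the first significant line
  end
  false                            # nothing significant: no header
end
```

Because it only needs each_line, it works on a String or on an open file, e.g. File.open(fname) { |f| yaml_header?(f) }, and stops reading at the first significant line.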
On Sun, 10 Dec 2006, Trans wrote:
what’s the best way to determine if a file is yaml?
thanks,
t.
In ruby queue I detect whether stdin input is a normal list or YAML this way:
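# first_non_blank_line, load_yaml_from_stdin, process_line and next_line
# are helpers not shown here
if first_non_blank_line =~ %r/^\s*---\s*$/
  load_yaml_from_stdin
else
  process_line first_non_blank_line
  while((line = next_line[stdin]))
    process_line line
  end
end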
Not perfect, but it's worked well enough so far.
cheers.
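A self-contained version of the rq-style dispatch, as a sketch: the helpers in ara's snippet are not shown in the post, so this hypothetical read_list_or_yaml simply returns either the loaded YAML data or an array of list lines. It assumes a modern Ruby with Psych and uses YAML.safe_load, which refuses arbitrary Ruby objects.

```ruby
require 'yaml'

# Hypothetical helper: +io+ is any IO-like object with gets/read (e.g. $stdin).
def read_list_or_yaml(io)
  first = nil
  while (line = io.gets)
    next if line.strip.empty?          # skip leading blank lines
    first = line
    break
  end
  return [] unless first               # nothing but blank lines

  if first =~ /^\s*---\s*$/
    YAML.safe_load(first + io.read)    # the '---' line starts the document
  else
    lines = [first.chomp]              # plain list: one item per line
    while (line = io.gets)
      lines << line.chomp
    end
    lines
  end
end
```

In a script this would be called as read_list_or_yaml($stdin).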
Also, from the command line I've taken to this approach:
list_input_on_stdin = ARGV.delete '-'
yaml_input_on_stdin = ARGV.delete '---'
So, for example:
cat.rb - # dump stdin
cat.rb --- # load the yaml doc on stdin and dump that
Note that '--' is used to indicate the end of options, so it is not a good flag.
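The flag convention can be wrapped in a small function so it is easy to test. This is a sketch: stdin_mode is a hypothetical name, and raising when both '-' and '---' are given is an assumption, since the post does not say what should happen then. It relies on Array#delete returning the deleted value, or nil when nothing matched.

```ruby
# Hypothetical helper: returns [mode, remaining_args] without touching +argv+.
def stdin_mode(argv)
  rest = argv.dup                       # leave the caller's ARGV untouched
  list = rest.delete('-')               # '-'   => plain list on stdin
  yaml = rest.delete('---')             # '---' => YAML document on stdin
  raise ArgumentError, "use '-' or '---', not both" if list && yaml
  mode = yaml ? :yaml : (list ? :list : :none)
  [mode, rest]
end
```

Typical use: mode, files = stdin_mode(ARGV).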
regards.
-a