Understanding YAML and this practice in general

dubstep · March 15, 2011, 1:27am

Hi,

I’m currently reading the book “Learn to Program”, and I’m excited
to see a language like Ruby with a clear syntax that is probably easier
to understand, even for a newbie like me who has very little programming
experience. In fact, I am already stuck and I would like someone to
help me understand this.

In chapter 11 (Reading and Writing, Saving and Loading,…) the author
demonstrates how you can save the output of your programs and I started
wondering when and why you would need to save the output of a program?

I know it’s probably too early to ask this kind of question, but I really
started wondering when he started talking about YAML, which apparently
can be used in multiple languages, so I’m assuming it is an important
part of programming?

Can someone be so kind as to explain to me when and why you would need
YAML?

Is this a common practice when writing a program where you don’t have
access to a database and which requires little data management?

Here is a code sample; it creates a file called ListerQuote.txt, saves
a string to it, and then reads the string back.
#------------------
filename = 'ListerQuote.txt'
test_string = 'I promise that I swear absolutely that ' +
              'I will never mention gazpacho soup again.'

# Open the file for writing and save the string to it.
File.open filename, 'w' do |f|
  f.write test_string
end

# Read the file back and check that the string survived the round trip.
read_string = File.read filename
puts(read_string == test_string)
#------------------------

Sorry if my question doesn’t make too much sense but I’m coming from a
web development world with only some knowledge in Actionscript 3.0,
Javascript, CSS and HTML.

Thanks a lot!

fs_tigre · March 15, 2011, 3:42am

Fily S. wrote in post #987440:

I started
wondering when and why you would need to save the output of a program?

Sorry if my question doesn’t make too much sense but I’m coming from a
web development world with only some knowledge in Actionscript 3.0,
Javascript, CSS and HTML.

Have you ever heard of a logfile? The idea is that something monitors a
program you are running and writes information to a file about various
things that occur while the program is running. You can then open the
logfile and read it to see what happened while your program was
running. With a unix program like tail, you can even open the file and
watch the changes to the file in real time.
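
As a rough illustration of the logfile idea, here is a minimal sketch using Ruby's standard Logger class. The run_job helper and the log messages are made up for this example; a real program would log its own events.

```ruby
require 'logger'
require 'tmpdir'

# Hypothetical helper: pretends to do some work and logs each step
# to the file at log_path.
def run_job(log_path)
  logger = Logger.new(log_path)
  logger.info('job started')
  logger.warn('disk space is getting low')
  logger.info('job finished')
  logger.close
end

# Write the log into a temporary directory so the example cleans up
# after itself, then print what was logged.
Dir.mktmpdir do |dir|
  log_path = File.join(dir, 'example.log')
  run_job(log_path)
  puts File.read(log_path) # a header line, then one timestamped line per event
end
```

With a fixed path instead of a temporary directory, the file would keep growing from run to run, and something like tail -f could follow it while the program runs.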

Do you know what a cookie is? It is a file on a user’s computer that
contains a short bit of information. Sometimes a javascript program
writes output in the form of a cookie.

There are even Ruby programs that make writing HTML easier, and their
output is an HTML file.

erb is a Ruby program that will scan an HTML file (or any other file
type) and replace bits of Ruby code embedded in the file with the output
of that Ruby code. The result is an HTML file.
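
To make the erb description concrete, here is a small sketch using the ERB class from Ruby's standard library. The template text and the render_greeting helper are invented for illustration.

```ruby
require 'erb'

# Hypothetical template: ordinary HTML with Ruby embedded in <%= ... %> tags.
GREETING_TEMPLATE = <<~HTML
  <h1>Hello, <%= name %>!</h1>
  <p>2 + 3 = <%= 2 + 3 %></p>
HTML

# Evaluates the embedded Ruby and returns the finished HTML as a string.
# Passing `binding` lets the template see the local variable `name`.
def render_greeting(name)
  ERB.new(GREETING_TEMPLATE).result(binding)
end

puts render_greeting('Fily')
```

Each <%= ... %> tag is replaced by the value of the Ruby expression inside it; everything else is copied through unchanged.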

Suppose you run a business, and at the end of every day your employee
enters information about every transaction in a file. The information
contains the customer’s name and the dollar amount purchased, one line
in the file for every transaction. You ask your Ruby programmer to
write a program that reads all the transaction files and prints out the
total amount each customer spent that month. The program must read each
transaction and record all the transactions for each customer, i.e. one
customer could have 100 separate transactions spread throughout the
files. The Ruby programmer could output the totals to another file,
arranged alphabetically by customer name, and then send the file to
you by email.
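
A sketch of what such a program might look like follows. The line format ("Customer Name,amount"), the file names and both helper names are assumptions made for this example, since the post does not specify them.

```ruby
require 'tmpdir'

# Assumed line format (not specified in the post): "Customer Name,amount"
# Reads every file in paths and returns a hash of customer => total spent.
def customer_totals(paths)
  totals = Hash.new(0.0)
  paths.each do |path|
    File.foreach(path) do |line|
      name, amount = line.chomp.split(',')
      next if name.nil? || amount.nil? # skip blank or malformed lines
      totals[name.strip] += amount.to_f
    end
  end
  totals
end

# Writes one line per customer, sorted alphabetically by name.
def write_report(totals, report_path)
  File.open(report_path, 'w') do |f|
    totals.sort.each do |name, total|
      f.puts format('%-10s %8.2f', name, total)
    end
  end
end

Dir.mktmpdir do |dir|
  day1 = File.join(dir, '2011-03-01.txt')
  day2 = File.join(dir, '2011-03-02.txt')
  File.write(day1, "Zoe,10.50\nAdam,3.25\n")
  File.write(day2, "Adam,6.75\nZoe,1.00\n")

  report = File.join(dir, 'totals.txt')
  write_report(customer_totals([day1, day2]), report)
  puts File.read(report)
end
```

Floats keep the sketch short; for real money amounts, integer cents or BigDecimal would avoid rounding surprises.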

Video games store totals and high scores to files. TVs store favorite
channels and programs to be recorded in files. It would be easier to
try and count the programs that don’t store output to files…in fact I
can’t think of a single one.

If a computer program (including a database) doesn’t store output to
files, then no data can persist between runs of the program.
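
A tiny sketch of persistence between runs: a counter kept in a file. The bump_run_count helper and the file name are made up here, and the three calls in a row stand in for three separate runs of a program.

```ruby
require 'tmpdir'

# Hypothetical helper: reads the previous count from path (0 if the file
# does not exist yet), adds one, writes the new count back and returns it.
def bump_run_count(path)
  count = File.exist?(path) ? File.read(path).to_i : 0
  count += 1
  File.write(path, count.to_s)
  count
end

Dir.mktmpdir do |dir|
  counter_file = File.join(dir, 'run_count.txt')
  # Each call simulates one run; the count survives only because it is on disk.
  3.times { puts "This is run number #{bump_run_count(counter_file)}" }
end
```

In real use the file would live at a fixed path, and the script would be run several times instead of looping.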

I know its probably too early to ask this kind of questions but I really
started wondering since he started talking about YAML which apparently
can be used in multiple languages and I’m assuming this is an important
part of programming?

Can someone be so kind and explain me when and why you would need YAML?

At the heart of your question is the question of data persistence, in
other words, how can you make data persist from one program run to the
next? Anyone can make a string persist by writing it to a file and
reading it back later, but what about arrays and even objects (things
that store both data and functions/methods)? How can you store those in
a file and then read them back later? YAML and the many other formats
that “serialize” data allow you to easily store things like arrays,
hashes/dictionaries, and objects in a file and then reconstitute them
the next time you run a program. YAML’s claim to fame is that the file
it creates is human readable. In fact, you can even edit the file by
hand to change it. Many older data serializers, by contrast, stored
what looked like random characters in a file. They were actually binary
encodings that only a computer could decipher.
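
Here is a minimal sketch of that round trip using the yaml library that ships with Ruby. The sample_scores data is invented for the example; string keys are used so that the safer YAML.safe_load can read the text back.

```ruby
require 'yaml'

# Hypothetical data: a hash holding a string, a number and an array.
def sample_scores
  { 'player' => 'dubstep', 'level' => 3, 'high_scores' => [1200, 950, 400] }
end

yaml_text = sample_scores.to_yaml # serialize: Ruby object -> YAML text
puts yaml_text                    # plain, human-readable text

restored = YAML.safe_load(yaml_text) # deserialize: YAML text -> Ruby object
puts restored == sample_scores       # true if the round trip preserved the data
```

The printed YAML is ordinary text with one key per line, which is what makes it possible to open the file in an editor and change a value by hand.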

fs_tigre · March 15, 2011, 12:47pm

Thank you for your reply! I now have a better idea of what YAML does
and its big plus: it basically makes the stored objects human readable.
I still don’t have a clear picture of where or how this will be used
later, but I guess the answer will come when I gain more experience.

Again I don’t have too much experience with high level languages other
than Actionscript 3.0 and JavaScript.

Excited to be learning this new world!

Thanks a lot!

fs_tigre · March 15, 2011, 1:24pm

On Tue, Mar 15, 2011 at 11:48 AM, Fily S. wrote:

Thank you for your reply! I now have a better idea as to what YAML does
and its big plus, it basically makes the store objects human readable,
even though I still don’t know have a clear picture as to where or how
this will be used later, but I guess the answer will come later when I
gain more experience.

Again I don’t have too much experience with high level languages other
than Actionscript 3.0 and JavaScript.

Hi Fily,
it is often very useful to save the state of your program (or just a
piece of it) to the hard disk. You usually do this for the following
reasons:

- you need to restore the state of the program after you have switched
  it off and restarted it
- you need to save user operations and work in a form that you can
  easily load again

Using traditional means (databases, export formats etc.) this process
can be technically challenging. It usually involves writing a lot of
code yourself to make the transformations and to deal with any
corruption.

YAML, on the other hand, saves your Ruby objects themselves. Its
representation is standardised, so saving the state of your Ruby objects
is simply a matter of passing the object to the appropriate function.
Restoring is the reverse.
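
A sketch of that save/restore pair, assuming the state is built from plain hashes, arrays, strings and numbers. The helper names save_state and load_state, and the contents of the state hash, are made up for this example.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical helper: writes the state to path as YAML text.
def save_state(path, state)
  File.write(path, YAML.dump(state))
end

# Hypothetical helper: reads the YAML text back into Ruby objects.
def load_state(path)
  YAML.safe_load(File.read(path))
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'state.yml')
  state = {
    'open_files' => ['notes.txt', 'todo.txt'],
    'cursor'     => { 'line' => 12, 'column' => 4 }
  }
  save_state(path, state)
  puts load_state(path) == state # true if the state survived the round trip
end
```

Instances of your own classes can be dumped the same way, but recent versions of Ruby's YAML library refuse to load arbitrary classes unless you explicitly permit them, so the sketch sticks to basic types.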

It is an extremely simple procedure that is useful in a lot of contexts
(although it is not suitable for large sets of data or heavy use).

For instance, the YAML file can be moved between machines and used to
replicate the data on another instance of the running program, either by
manually transferring the file or by sending it over the network between
the computers themselves.

Also, YAML files are human readable and human editable, which is an
advantage, especially if you are using them for compiling test data or
exporting the state of a running program (could be handy for debugging
crashes).

YAML is not the only technique for dumping the state of a program in
Ruby (and these techniques are not Ruby specific either). In the
JavaScript world, JSON fulfills almost exactly the same purpose. I
believe that measures have been taken to make JSON and YAML
grammatically compatible: as of YAML 1.2, JSON is intended to be a
subset of YAML.
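
To show how close the two formats are in practice, here is a small sketch that writes the same (made-up) settings hash as JSON and as YAML and reads both back. It also feeds the JSON text to the YAML parser, which in my experience works for ordinary JSON like this, though I would not rely on it for every edge case.

```ruby
require 'json'
require 'yaml'

# Hypothetical data used only for this comparison.
def sample_settings
  { 'theme' => 'dark', 'font_size' => 12, 'plugins' => %w[spell git] }
end

json_text = JSON.pretty_generate(sample_settings)
yaml_text = sample_settings.to_yaml

puts json_text
puts yaml_text

# Both formats decode back to the same Ruby hash.
puts JSON.parse(json_text) == YAML.safe_load(yaml_text)

# The JSON text itself can be read by the YAML parser.
puts YAML.safe_load(json_text) == sample_settings
```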

regards,
Richard

fs_tigre · March 15, 2011, 3:53pm

Wow! Thanks a lot for taking the time to explain this so well.

I’m pretty sure I will have more questions as I learn more and more but
for now I’m clear.

Thanks a lot!

fs_tigre · March 15, 2011, 4:06pm

Fily S.:

I now have a better idea as to what YAML does and its big
plus, it basically makes the store objects human readable,
even though I still don’t know have a clear picture as
to where or how this will be used later, but I guess the
answer will come later when I gain more experience.

Example: I wrote a simple email signature randomiser¹ so
that my email signatures are picked at random from a given
pool (this email is to ruby-talk, so the one below is picked
from the ones that are both technical and in English).

I could store the signature database in an RDBMS, but that
would be an overkill; all I need is a simple command-line
program that reads the signatures from a file, selects those
matching the requirements and then picks one from them.

Without YAML I’d have to create a storage format that takes into
account the signature’s body, author, subject, source and tags
– with YAML I just serialise the signature object and I’m done.
On the flip side, if I ever find a typo in a signature, I can
easily edit the YAML file² directly and be done with it.
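
For readers who want to see the idea in code, here is a rough sketch of picking a random signature by tag from YAML data. It is not how signore itself works; the field names (text, author, tags), the sample entries and the pick_signature helper are all assumptions made for this illustration.

```ruby
require 'yaml'

# Hypothetical file contents; the real signore format may differ.
SIGNATURES_YAML = <<~SIGS
  - text: "The good thing about reinventing the wheel is that you can get a round one."
    author: Douglas Crockford
    tags: [tech, en]
  - text: "Placeholder signature number two."
    author: Nobody
    tags: [en]
SIGS

# Returns a random signature carrying every required tag, or nil if none match.
def pick_signature(yaml_text, required_tags)
  signatures = YAML.safe_load(yaml_text)
  matching = signatures.select { |sig| (required_tags - sig['tags']).empty? }
  matching.sample
end

sig = pick_signature(SIGNATURES_YAML, %w[tech en])
puts "#{sig['text']}\n[#{sig['author']}]"
```

Fixing a typo means editing the text line in the YAML data; no code has to change.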

¹ GitHub - chastell/signore
² dotfiles/.local/share/signore/signatures.yml at master · chastell/dotfiles · GitHub

— Piotr S.

fs_tigre · March 15, 2011, 4:28pm

On Tue, Mar 15, 2011 at 4:06 PM, Piotr S. wrote:

Without YAML I’d have to create a storage format that takes into
account the signature’s body, author, subject, source and tags
– with YAML I just serialise the signature object and I’m done.
On the flip side, if I even find a typo in a signature, I can
easily directly edit the YAML file and be done with it.

Neither of these is, of course, limited to YAML. YAML just has the
benefit of being shipped with Ruby’s standard library, thus it is
ubiquitous within the Ruby world.

That doesn’t mean it’s the best, simplest, or most efficient tool to
store any ol’ text in a programming language-compatible way (that’s
highly usage based, since sometimes performance matters, sometimes a
small footprint in LOC matters, sometimes the time is limited to solve
a problem, etc.).

YAML’s excellent at storing Ruby objects in a human-readable fashion,
and importing them again Somewhere Else. It’s excellent for
configuration data, or simple jobs (I used a Ruby website templating
engine using YAML for single pages/posts many moons ago), but falls
apart when your data doesn’t easily fit into the Ruby object mold
(i.e. Hashes, Arrays, Strings, &c.).
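
As an example of the configuration-data use, here is a minimal sketch that merges settings from YAML text over a set of defaults. The setting names, the default values and the load_config helper are invented for this illustration.

```ruby
require 'yaml'

# Hypothetical defaults for an imaginary program.
CONFIG_DEFAULTS = { 'host' => 'localhost', 'port' => 8080, 'debug' => false }.freeze

# Parses the YAML text and lays its values over the defaults.
# Empty text does not parse to a hash, so fall back to an empty one.
def load_config(yaml_text)
  overrides = YAML.safe_load(yaml_text) || {}
  CONFIG_DEFAULTS.merge(overrides)
end

config = load_config("port: 3000\ndebug: true\n")
puts config['host']  # not overridden, so the default is used
puts config['port']  # overridden by the YAML text
puts config['debug'] # overridden by the YAML text
```

In a real program the text would come from File.read on a config file that a user can edit by hand.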

Which brings me to a nicely apropos email signature:

The good thing about reinventing the wheel is that you can get a round one.
[Douglas Crockford on JSON vs XML]

The difference between JSON / YAML and XML is that JSON and YAML
do one thing, and do it well(-ish), while XML is extremely flexible.
On the downside, that means JSON or YAML doesn’t always work
well(-ish), while XML is extremely flexible.

–
Phillip G.

Though the folk I have met,
(Ah, how soon!) they forget
When I’ve moved on to some other place,
There may be one or two,
When I’ve played and passed through,
Who’ll remember my song or my face.

fs_tigre · March 16, 2011, 1:12am

I still don’t know have a clear picture as
to where or how this will be used later

When you are learning a language that really isn’t important. The most
important thing is understanding that you can do something and how to do
it–not why it is important.

Later, when you are writing a program, a light bulb may go off in your
head, and you suddenly realize that some little thing you learned about
in the past will work perfectly in your code. Or, it may be that such a
light bulb may never go off, and the only way you will ever understand
how to apply what you learned is by reading other people’s code and
seeing how they used YAML or some other feature you learned about in
their code, and then you copy that idea in your code.

You can think of YAML as a lightweight database if you want. If you
don’t need all the features of a full blown database, then you can use
YAML instead. Or, you can think of YAML as a JSON or XML
substitute–and then ponder why anyone would need to use JSON or XML.

fs_tigre · March 15, 2011, 6:50pm

Thank you all very much for your help!

fs_tigre · March 16, 2011, 3:37pm

You can think of YAML as a lightweight database if you want. If you
don’t need all the features of a full blown database, then you can use
YAML instead. Or, you can think of YAML as a JSON or XML
substitute–and then ponder why anyone would need to use JSON or XML.

Got it, it is now definitely clear. Now can the output be formatted
(make text bold etc like in MySQL and CSS) if desired?

Thanks a lot

fs_tigre · March 17, 2011, 2:27am

The data stored in a MySQL database does not have any inherent style
associated with it: the data is a string or a number or a date. You can
of course retrieve data from a database, insert the data into an HTML
page, and then add some CSS instructions for the data, which a web
browser then interprets as an instruction to display the data with a
certain style. So you see, it is the web browser that adds style to the
data, not the database. And because a YAML file also just stores data,
you can do the same thing with data stored in a YAML file.
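
Here is a small sketch of that separation: the YAML holds only data, and the bold styling is added when the HTML is generated. The item fields (name, note) and the to_html_list helper are made up for this example.

```ruby
require 'yaml'
require 'erb'

# Hypothetical helper: turns a YAML list of items into an HTML list,
# wrapping each name in <b> tags. Values are HTML-escaped first.
def to_html_list(yaml_text)
  items = YAML.safe_load(yaml_text)
  rows = items.map do |item|
    name = ERB::Util.html_escape(item['name'])
    note = ERB::Util.html_escape(item['note'])
    "  <li><b>#{name}</b>: #{note}</li>"
  end
  "<ul>\n#{rows.join("\n")}\n</ul>"
end

puts to_html_list(<<~ITEMS)
  - name: Ruby
    note: plain data, no styling stored
  - name: YAML
    note: styling is added at display time
ITEMS
```

The same data could just as well be printed as plain text or styled with CSS; nothing about the YAML would need to change.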

By the way, another lightweight database is SQLite3. I believe Ruby on
Rails 3.0 (the latest version) uses SQLite3 by default.

fs_tigre · March 17, 2011, 1:45pm

Thanks a lot for the good info!