Deep YAML


#1

I’m toying with GEDCOM again, and trying to write out the YAML
equivalent of a GEDCOM file (just for kicks, mainly).

It might look like this (edited for brevity):

0 @I1@ INDI
1 NAME Victoria /Hanover/
1 TITL Queen of England
1 FAMS @F1@
1 FAMC @F2@
0 @I2@ INDI
1 NAME Albert Augustus Charles//
1 TITL Prince
1 FAMS @F1@
1 FAMC @F3@

0 @F1@ FAM
1 HUSB @I2@
1 WIFE @I1@
1 CHIL @I3@

The stuff between @ symbols can be thought of as pointers (defined when
in front, referenced when in back). The natural way to store this in
ruby is to set up a hash for these pointer names while parsing and store
the actual object references. i.e. victoria[‘FAMS’] does not equal
@F1@” but the actual family Hash. Now, when trying to dump the ultimate
data structures (which is made up entirely of Arrays, Hashes, and
Strings), you can see that by the very nature of the data almost
everything in the file will be referenced in one long link from the root
person, who is usually at the beginning. So basically the first entry in
the YAML top-level array would nestingly include every other individual
and family. This naturally makes for a very deep structure, which is
neither human-friendly nor stack-friendly. In fact even modest GEDCOM
files break the stack during to_yaml with this approach.

Is there no way to have yaml break deep nesting with an ID reference? Or
better yet, use an ID at any point other than the shallowest point where
that object occurs?


#2

Hans F. wrote:

I’m toying with GEDCOM again, and trying to write out the YAML
equivalent of a GEDCOM file (just for kicks, mainly).

It might look like this (edited for brevity):

0 @I1@ INDI
1 WIFE @I1@
1 CHIL @I3@

Is there no way to have yaml break deep nesting with an ID reference? Or
better yet, use an ID at any point other than the shallowest point where
that object occurs?

YAML will make an ID reference rather than inserting the object if it
has already encountered that object as it traverses the object tree. You
just have to make sure that it encounters the objects in the correct
order.

Here’s my reasoning. We are trying to dump a really deep tree to YAML:

first build the tree:

deeptree = []
curr = deeptree
refs = []
5000.times do |i|
n = [i, []]
curr << n
curr = n[1]
refs << n
end

then try and dump it:

deeptree.to_yaml #=> fails with stack error
refs.to_yaml #=> fails with stack error
refs.reverse.to_yaml #=> succeeds!

Deeptree looks like:
[1, [2, [3, [4, []]]]]
which fails because its so deep.

Refs looks like:
[
[1, [2, [3, [4, []]]]],
[2, [3, [4, []]]],
[3, [4, []]],
[4, []],
[]
]
which fails because the first element is so deep.

refs.reverse looks like:
[
[],
[4, []],
[3, [4, []]],
[2, [3, [4, []]]],
[1, [2, [3, [4, []]]]]
]

which works because at each step YAML can insert an object reference
because it has already encountered the second array element in the
previous step.

So in your case, instead of just saving the family tree, save an array
containing references to all members of the tree. Make sure to start
with the leaves of the tree and work your way up through the ancestors.
When you reload the last element of the array is your complete tree.

hope this helps,
Dan


Illustrations:

irb> arr
=> [0, [1, [2, [3, [4, []]]]]]

irb> refs
=> [[1, [2, [3, [4, []]]]], [2, [3, [4, []]]], [3, [4, []]], [4, []],
[]]

irb> puts arr.to_yaml

  • 0
    • 1
      • 2
        • 3
          • 4
          • []
            => nil

irb> puts refs.to_yaml

    • 1
    • &id001
      • 2
      • &id002
        • 3
        • &id003
          • 4
          • &id004 []
  • *id001

  • *id002

  • *id003

  • *id004
    => nil

irb> puts refs.reverse.to_yaml

  • &id001 []

  • &id002

    • 4
    • *id001
  • &id003

    • 3
    • *id002
  • &id004

    • 2
    • *id003
    • 1
    • *id004
      => nil

#3

Daniel L. wrote:

then try and dump it:

deeptree.to_yaml #=> fails with stack error
refs.to_yaml #=> fails with stack error
refs.reverse.to_yaml #=> succeeds!

I thought of this, ufortunately this data is more like a doubly-linked
tree. i.e. if you reverse it you still get the same behavior, just from
a different angle.


#4

Hans F. wrote:

Daniel L. wrote:

then try and dump it:

deeptree.to_yaml #=> fails with stack error
refs.to_yaml #=> fails with stack error
refs.reverse.to_yaml #=> succeeds!

I thought of this, ufortunately this data is more like a doubly-linked
tree. i.e. if you reverse it you still get the same behavior, just from
a different angle.

Because there are links going up the tree as well as down?

Dan


#5

Daniel L. wrote:

first build the tree:

deeptree = []
curr = deeptree
refs = []
5000.times do |i|
n = [i, []]
curr << n
curr = n[1]
refs << n
end

Oops, that actually makes a tree that looks like this:
[[0, [[1, [[2, [[3, [[4, []]]]]]]]]]]

I meant this:

refs = []
deeptree = []
curr = deeptree
5000.times do |i|
n = []
curr << i
curr << n
curr = n
refs << n
end

which does look like this:
[0, [1, [2, [3, [4, []]]]]]

soz,
Dan


#6

Daniel L. wrote:

Because there are links going up the tree as well as down?

Dan

Yes, exactly. Links from parent to child, and also from child to parent
(and from person to family and family to person).