Forum: Ruby Deep YAML

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
F0a85fe10cdceda395c89279bb0850f9?d=identicon&s=25 Hans Fugal (Guest)
on 2007-08-02 05:50
(Received via mailing list)
I'm toying with GEDCOM again, and trying to write out the YAML
equivalent of a GEDCOM file (just for kicks, mainly).

It might look like this (edited for brevity):

0 @I1@ INDI
1 NAME Victoria /Hanover/
1 TITL Queen of England
1 FAMS @F1@
1 FAMC @F2@
0 @I2@ INDI
1 NAME Albert Augustus Charles//
1 TITL Prince
1 FAMS @F1@
1 FAMC @F3@
...
0 @F1@ FAM
1 HUSB @I2@
1 WIFE @I1@
1 CHIL @I3@

The stuff between @ symbols can be thought of as pointers (defined when
in front, referenced when in back). The natural way to store this in
ruby is to set up a hash for these pointer names while parsing and store
the actual object references. i.e. victoria['FAMS'] does not equal
"@F1@" but the actual family Hash. Now, when trying to dump the ultimate
data structures (which is made up entirely of Arrays, Hashes, and
Strings), you can see that by the very nature of the data almost
everything in the file will be referenced in one long link from the root
person, who is usually at the beginning. So basically the first entry in
the YAML top-level array would nestingly include every other individual
and family. This naturally makes for a very deep structure, which is
neither human-friendly nor stack-friendly. In fact even modest GEDCOM
files break the stack during to_yaml with this approach.

Is there no way to have yaml break deep nesting with an ID reference? Or
better yet, use an ID at any point other than the shallowest point where
that object occurs?
0158871402c1ecfa57952e8a379cfd10?d=identicon&s=25 Daniel Lucraft (lucraft)
on 2007-08-02 12:43
Hans Fugal wrote:
> I'm toying with GEDCOM again, and trying to write out the YAML
> equivalent of a GEDCOM file (just for kicks, mainly).
>
> It might look like this (edited for brevity):
>
> 0 @I1@ INDI
> 1 WIFE @I1@
> 1 CHIL @I3@
>
>
> Is there no way to have yaml break deep nesting with an ID reference? Or
> better yet, use an ID at any point other than the shallowest point where
> that object occurs?

YAML will make an ID reference rather than inserting the object if it
has already encountered that object as it traverses the object tree. You
just have to make sure that it encounters the objects in the correct
order.

Here's my reasoning. We are trying to dump a really deep tree to YAML:

# first build the tree:
deeptree = []
curr = deeptree
refs = []
5000.times do |i|
  n = [i, []]
  curr << n
  curr = n[1]
  refs << n
end

# then try and dump it:
deeptree.to_yaml       #=> fails with stack error
refs.to_yaml           #=> fails with stack error
refs.reverse.to_yaml   #=> succeeds!

Deeptree looks like:
[1, [2, [3, [4, []]]]]
which fails because its so deep.

Refs looks like:
[
  [1, [2, [3, [4, []]]]],
  [2, [3, [4, []]]],
  [3, [4, []]],
  [4, []],
  []
]
which fails because the first element is so deep.

refs.reverse looks like:
[
  [],
  [4, []],
  [3, [4, []]],
  [2, [3, [4, []]]],
  [1, [2, [3, [4, []]]]]
]

which works because at each step YAML can insert an object reference
because it has already encountered the second array element in the
previous step.

So in your case, instead of just saving the family tree, save an array
containing references to all members of the tree. Make sure to start
with the leaves of the tree and work your way up through the ancestors.
When you reload the last element of the array is your complete tree.

hope this helps,
Dan

--------------------
Illustrations:

irb> arr
=> [0, [1, [2, [3, [4, []]]]]]

irb> refs
=> [[1, [2, [3, [4, []]]]], [2, [3, [4, []]]], [3, [4, []]], [4, []],
[]]

irb> puts arr.to_yaml
---
- 0
- - 1
  - - 2
    - - 3
      - - 4
        - []
=> nil

irb> puts refs.to_yaml
---
- - 1
  - &id001
    - 2
    - &id002
      - 3
      - &id003
        - 4
        - &id004 []

- *id001
- *id002
- *id003
- *id004
=> nil

irb> puts refs.reverse.to_yaml
---
- &id001 []

- &id002
  - 4
  - *id001
- &id003
  - 3
  - *id002
- &id004
  - 2
  - *id003
- - 1
  - *id004
=> nil
0158871402c1ecfa57952e8a379cfd10?d=identicon&s=25 Daniel Lucraft (lucraft)
on 2007-08-02 12:49
Daniel Lucraft wrote:
> # first build the tree:
> deeptree = []
> curr = deeptree
> refs = []
> 5000.times do |i|
>   n = [i, []]
>   curr << n
>   curr = n[1]
>   refs << n
> end

Oops, that actually makes a tree that looks like this:
[[0, [[1, [[2, [[3, [[4, []]]]]]]]]]]

I meant this:

refs = []
deeptree = []
curr = deeptree
5000.times do |i|
  n = []
  curr << i
  curr << n
  curr = n
  refs << n
end

which does look like this:
[0, [1, [2, [3, [4, []]]]]]

soz,
Dan
F0a85fe10cdceda395c89279bb0850f9?d=identicon&s=25 Hans Fugal (Guest)
on 2007-08-02 15:36
(Received via mailing list)
Daniel Lucraft wrote:
> # then try and dump it:
> deeptree.to_yaml       #=> fails with stack error
> refs.to_yaml           #=> fails with stack error
> refs.reverse.to_yaml   #=> succeeds!

I thought of this, ufortunately this data is more like a doubly-linked
tree. i.e. if you reverse it you still get the same behavior, just from
a different angle.
0158871402c1ecfa57952e8a379cfd10?d=identicon&s=25 Daniel Lucraft (lucraft)
on 2007-08-02 15:50
Hans Fugal wrote:
> Daniel Lucraft wrote:
>> # then try and dump it:
>> deeptree.to_yaml       #=> fails with stack error
>> refs.to_yaml           #=> fails with stack error
>> refs.reverse.to_yaml   #=> succeeds!
>
> I thought of this, ufortunately this data is more like a doubly-linked
> tree. i.e. if you reverse it you still get the same behavior, just from
> a different angle.

Because there are links going up the tree as well as down?

Dan
F0a85fe10cdceda395c89279bb0850f9?d=identicon&s=25 Hans Fugal (Guest)
on 2007-08-02 17:16
(Received via mailing list)
Daniel Lucraft wrote:
> Because there are links going up the tree as well as down?
>
> Dan

Yes, exactly. Links from parent to child, and also from child to parent
(and from person to family and family to person).
This topic is locked and can not be replied to.