I’m working on a script that examines a DITA XML file and tries to
determine where we use conrefs (where content is being pulled in from
another file). I have most of the code working, but now I’m trying to
determine what type of element a conref points to.
Every XML element in our files has an id attribute, for example:
this is a paragraph
- list item
If I need to reference the list item in another document, for example,
its id is used to pull that content into the other document. What the
script is trying to accomplish is to create a list of the conrefs in
each file and report on them.
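To make the reporting goal concrete, here is a minimal sketch in Python (chosen only for illustration). The sample topic, the file name `shared.dita`, and the function name `list_conrefs` are all made up for this example. It assumes conref values follow the usual DITA shape of `file.dita#topicid/elementid`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample topic; the file name and ids are invented for illustration.
SAMPLE = """<topic id="t1">
  <title>Sample</title>
  <body>
    <p conref="shared.dita#shared/a1234"/>
    <ul>
      <li conref="shared.dita#shared/a4563"/>
    </ul>
  </body>
</topic>"""

def list_conrefs(xml_text):
    """Return (target file, element id) pairs for every conref attribute found."""
    root = ET.fromstring(xml_text)
    found = []
    for elem in root.iter():
        conref = elem.get("conref")
        if conref is None:
            continue
        # Assumed shape: "file.dita#topicid/elementid".
        target_file, _, fragment = conref.partition("#")
        element_id = fragment.rsplit("/", 1)[-1]
        found.append((target_file, element_id))
    return found

print(list_conrefs(SAMPLE))
```

This only collects the target file and id for each conref; it does not yet say what kind of element that id belongs to.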
It’s easy enough to determine whether a conref is in a file and then
open the referenced document to get its title. What is killing me is
determining what type of element is being referenced. For example, all
I know is that I’m looking for ‘a4563’. That is easy enough to find via
a .match, but what I really want to know is which element that id
belongs to (a paragraph, a list item, and so on).
I suspect that I’ll need to do some regex groupings, but my regex-fu in
this area is very weak!
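Roughly, this is the kind of grouping I have in mind, sketched in Python for illustration. The sample markup and the function name `element_name_for_id` are invented, and the pattern assumes the id attribute sits inside a single opening tag with a quoted value. A real XML parser would probably be more robust than this.

```python
import re

# Invented sample markup, standing in for a referenced DITA file.
SAMPLE = (
    '<p id="a1234">this is a paragraph</p>\n'
    '<ul><li id="a4563">list item</li></ul>'
)

def element_name_for_id(xml_text, element_id):
    """Return the tag name of the element carrying the given id, or None."""
    # Group 1 captures the tag name of the opening tag whose id matches.
    pattern = re.compile(
        r'<([A-Za-z_][\w.\-]*)\b[^<>]*?\bid\s*=\s*["\']'
        + re.escape(element_id)
        + r'["\']'
    )
    match = pattern.search(xml_text)
    return match.group(1) if match else None

print(element_name_for_id(SAMPLE, "a4563"))
```

With the sample above, looking up ‘a4563’ gives `li` and ‘a1234’ gives `p`; an id that is not present gives `None`.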
Anybody have some suggestions?