Search and replace

ishamid · December 2, 2006, 6:45pm

[total novice here]

Hi,

I have a series of expressions like this (shortened from verbose xml)

[<text:sequence text:ref-name="refAutoNr0">1</text:sequence>
[<text:sequence text:ref-name="refAutoNr1">2</text:sequence>
[<text:sequence text:ref-name="refAutoNr2">3</text:sequence>
[<text:sequence text:ref-name="refAutoNr3">4</text:sequence>

I want to globally replace each such line with just

====================
\head

followed by a line space so I get

====================
\head

\head

etc.

I am modifying a script with lines like

====================
data.gsub!(/.*?<(office:text).*?>(.*?)<\/\1>.*/mois) do
  '\starttext' + "\n" + $2 + "\n" + '\stoptext'
end

and don’t yet know enough to completely understand. Probably a few more
hours/days of study will get me there but I need this urgently so…

THNX in advance

Best
Idris
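
For readers puzzling over the gsub! line quoted above, here is a minimal sketch of what such a pattern does. It assumes the intended pattern is the one written out in the code below; the one-line sample document and the variable names are made up for illustration, and the o and s flags are left out because they are not needed here.

```ruby
# Hypothetical stand-in for the file contents held in `data`.
data = "<office:document><office:text><p>Hello</p></office:text></office:document>"

# .*?             skip everything before the opening tag (m lets . match newlines)
# <(office:text)  the tag name, remembered as group 1
# .*?>            the rest of the opening tag (any attributes)
# (.*?)           the element body, remembered as group 2
# <\/\1>          the closing tag, reusing the name from group 1
# .*              everything after the closing tag
result = data.gsub(/.*?<(office:text).*?>(.*?)<\/\1>.*/mi) do
  # $2 holds the text captured by group 2 for the current match.
  '\starttext' + "\n" + $2 + "\n" + '\stoptext'
end

puts result
```

In other words, the whole document is replaced by the body of the office:text element, wrapped in \starttext and \stoptext. The single quotes matter: in a double-quoted Ruby string, \s would be read as an escape sequence rather than a literal backslash followed by s.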

ishamid · December 2, 2006, 7:01pm

ishamid wrote:

/ …

and don’t yet know enough to completely understand. Probably a few more
hours/days of study will get me there but I need this urgently so…

If you will post a short, complete data example, even just one record as
it appears in your database, so we don’t have to try to read between the
lines, someone here will be happy to produce a way to filter the data in
the way you want.

ishamid · December 2, 2006, 7:17pm

ishamid wrote:

=====================
\head
I am modifying a script with lines like

Urght. ducks

Best
Idris

Regexps and XML always tend to blow up for me. The pattern you’re
searching for seems to be a complete element, so why not use an XPath?

With REXML, it should be something like:

document.elements.each('//text:sequence') {|sequence|
sequence.replace_with(REXML::Text.new("\\head\n", true))}

Substitute the XPath expression with one of desired precision. I’m a
little unsure around how REXML treats namespaces in XPath and such, but
if you know what prefix will be used in the document, that should work
out.

The script might also require a little more massaging if you’re
outputting to plaintext, but treating XML like, well, XML might get the
heavy lifting of searching for patterns in it done faster if you use a
pattern language operating on the DOM structure directly.

David V.
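
The REXML suggestion above can be sketched as a complete script. This is only an illustration under stated assumptions: the two-entry sample document is made up, and the namespace URI bound to the text prefix is a placeholder (a real OpenDocument file declares its own). It relies on the prefix being declared in the document, as the post cautions.

```ruby
require 'rexml/document'

# Made-up sample; the xmlns:text URI is a placeholder, not the real ODF one.
xml = <<~XML
  <doc xmlns:text="urn:example:text">
    <text:p>[<text:sequence text:ref-name="refAutoNr0">1</text:sequence></text:p>
    <text:p>[<text:sequence text:ref-name="refAutoNr1">2</text:sequence></text:p>
  </doc>
XML

document = REXML::Document.new(xml)

# Collect the matching elements first, then replace each one, so the tree
# is not modified while it is being searched.
REXML::XPath.match(document, '//text:sequence').each do |sequence|
  # "\\head" is a literal backslash followed by "head"; the second argument
  # asks REXML to respect whitespace, which keeps the trailing newline.
  sequence.replace_with(REXML::Text.new("\\head\n", true))
end

puts document.to_s
```

The surrounding text:p elements and the leading "[" are left in place here; removing them as well would take a broader XPath expression or a second pass.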

ishamid · December 2, 2006, 7:55pm

Hi Paul,

On Dec 2, 10:56 am, Paul L. wrote:

If you will post a short, complete data example, even just one record
as it appears in your database, so we don’t have to try to read between
the lines, someone here will be happy to produce a way to filter the
data in the way you want.

Ok, here are 4 bibliography entries. I just did a follow-up posting
with more detail (including the full script I’m trying to modify) so
you may prefer to respond to that one. Thank you very much for your
help!

======================
<text:p text:style-name="ID">[<text:sequence text:ref-name="refAutoNr0"
text:name="AutoNr" text:formula="ooow:AutoNr+1"
style:num-format="1">1</text:sequence></text:p>
<text:p text:style-name="P6">'Abd al-Râziq, Ahmad</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T3">Die al-Azhar-Moschee</text:span><text:span
text:style-name="T4">., in, </text:span><text:span
text:style-name="T3">“Schätze der Kalifen: Islamische Kunst zur
Fatimidenzeit.”</text:span><text:span text:style-name="T4">,
Herausgegeben von W. Seipel, Vienna: Kunsthistorisches Museum Wien;
Milan: Skira, 1998, pp. 144-147</text:span></text:p>
<text:p text:style-name="P7"/>
<text:p text:style-name="P7"/>
<text:p text:style-name="ID"><text:span
text:style-name="T5">[</text:span><text:sequence
text:ref-name="refAutoNr1" text:name="AutoNr"
text:formula="ooow:AutoNr+1"
style:num-format="1">2</text:sequence></text:p>
<text:p text:style-name="P8">'Abd al-Râziq, Ahmad</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T6">La mosquée al-Azhar</text:span><text:span
text:style-name="T7">., in, </text:span><text:span
text:style-name="T6">“Trésors fatimides du Caire. Exposition
présentée à l’Institut du Monde Arabe …
</text:span><text:span
text:style-name="T8">1998.”</text:span><text:span
text:style-name="T9">, Paris: Institut du Monde Arabe, 1998, pp.
147-149</text:span></text:p>
<text:p text:style-name="P7"/>
<text:p text:style-name="P7"/>
<text:p text:style-name="ID">[<text:sequence
text:ref-name="refAutoNr2" text:name="AutoNr"
text:formula="ooow:AutoNr+1"
style:num-format="1">3</text:sequence></text:p>
<text:p text:style-name="Standard"><text:s/>'Amri, Husay
'Abdallah</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T10">The Text of an Unpublished Fatwa of the Scholar
al-Maqbali (d. 1108/1728) Concerning the Legal Position of the
Batiniyyah (Isma’iliyyah) of the People of Hamdan</text:span>.,
Translated by A.B.D.R. Eagle, <text:span text:style-name="Style2">New
Arabian Studies</text:span>, 2 (1994), pp. 165-174.</text:p>
<text:p text:style-name="reference"/>
<text:p text:style-name="reference"/>
<text:p text:style-name="ID">[<text:sequence
text:ref-name="refAutoNr3" text:name="AutoNr"
text:formula="ooow:AutoNr+1"
style:num-format="1">4</text:sequence></text:p>
<text:p text:style-name="Standard">Abarahamov, Binyamin</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T10">An Isma’ili Epistemology: The Case of
Al-Da’i al-Mutlaq 'Ali b. Muhammad b. al-Walid</text:span>.,
<text:span text:style-name="Style2">Journal of Semitic
Studies</text:span>, 41ii (1996), pp. 263-273.</text:p>
<text:p text:style-name="reference"/>
<text:p text:style-name="reference"/>

======================

ishamid · December 3, 2006, 2:31am

ishamid wrote:

Ok, here are 4 bibliography entries. I just did a follow-up posting
with more detail (including the full script I’m trying to modify) so
you may prefer to respond to that one. Thank you very much for your
help!.

Okay, thanks for the data example. Now to move forward, could you please
tell us what you want to do with it? Which parts of the data end up in
the output, and in what form?

You earlier said you wanted to process the XML to get a series of

\head

\head
\head
\head

But I think you mean these to be placeholders for the actual data, and I
can’t sort out which parts of the XML are meant to end up in the “\head”
elements.

It would help if you could show an example of the data in the XML and
its literal relocation into the desired output format.

Postscript. I copied your posted data example and couldn’t parse it,
because there is a mismatch between opening and closing tags – it’s a
simple sanity check I always perform when dealing with XML, and
unfortunately the posted data isn’t a complete, internally consistent
XML sample. That would have allowed me to indent/format the XML and get
some idea of its overall structure.

Without an internally consistent XML data block with balanced tags, I
can’t parse the XML, and if I can’t parse the XML, I can’t extract any
data from it in a reliable way.
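
A balanced-tag sanity check of the kind described above could look roughly like this. It is a sketch using REXML from Ruby's standard distribution, not necessarily the tool the poster used; the helper name well_formed? and both sample strings are made up for illustration.

```ruby
require 'rexml/document'

# Returns true if the string parses as well-formed XML, false otherwise.
# (well_formed? is a hypothetical helper name, not part of REXML.)
def well_formed?(xml)
  REXML::Document.new(xml)
  true
rescue REXML::ParseException
  false
end

balanced   = '<root><p>one</p><p>two</p></root>'
unbalanced = '<root><p>one</p><p>two</root>'   # the second <p> is never closed

puts well_formed?(balanced)
puts well_formed?(unbalanced)
```

A fragment like the posted one, which has several top-level elements and uses the text: and style: prefixes without declaring them, would likely need to be wrapped in a single root element that declares those prefixes before a check like this can pass.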

ishamid · December 2, 2006, 8:06pm

Thank you, David, for your pointers. I’m still very much a novice (at
the level of Chris P.'s Learn to Program) so I could not follow them
all, but I do hope to learn more fast. I just sent a follow-up with
more detail, including the script I’m trying to modify; I hope you have
a chance to look at it…

Thank you again
Idris

ishamid · December 3, 2006, 5:55am

ishamid wrote:

Ok, here are 4 bibliography entries. …


puts DATA.read.gsub( %r{<(text:sequence)\s[^>]*>(.*?)</\1>}i,
"\\starttext\n\\2\n\\stoptext" )

--- output -----

<text:p text:style-name="ID">[\starttext
1
\stoptext</text:p>
<text:p text:style-name="P6">'Abd al-R\xC3\xA2ziq, Ahmad</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T3">Die al-Azhar-Moschee</text:span><text:span
text:style-name="T4">., in, </text:span><text:span
text:style-name="T3">“Sch\xC3\xA4tze der Kalifen: Islamische Kunst zur
Fatimidenzeit.”</text:span><text:span text:style-name="T4">,
Herausgegeben von W. Seipel, Vienna: Kunsthistorisches Museum Wien;
Milan: Skira, 1998, pp. 144-147</text:span></text:p>
<text:p text:style-name="P7"/>
<text:p text:style-name="P7"/>
<text:p text:style-name="ID"><text:span
text:style-name="T5">[</text:span>\starttext
2
\stoptext</text:p>
<text:p text:style-name="P8">'Abd al-R\xC3\xA2ziq, Ahmad</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T6">La mosqu\xC3\xA9e al-Azhar</text:span><text:span
text:style-name="T7">., in, </text:span><text:span
text:style-name="T6">“Tr\xC3\xA9sors fatimides du Caire. Exposition
pr\xC3\xA9sent\xC3\xA9e \xC3\xA0 l’Institut du Monde Arabe …
</text:span><text:span
text:style-name="T8">1998.”</text:span><text:span
text:style-name="T9">, Paris: Institut du Monde Arabe, 1998, pp.
147-149</text:span></text:p>
<text:p text:style-name="P7"/>
<text:p text:style-name="P7"/>
<text:p text:style-name="ID">[\starttext
3
\stoptext</text:p>
<text:p text:style-name="Standard"><text:s/>'Amri, Husay
'Abdallah</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T10">The Text of an Unpublished Fatwa of the Scholar
al-Maqbali (d. 1108/1728) Concerning the Legal Position of the
Batiniyyah (Isma’iliyyah) of the People of Hamdan</text:span>.,
Translated by A.B.D.R. Eagle, <text:span text:style-name="Style2">New
Arabian Studies</text:span>, 2 (1994), pp. 165-174.</text:p>
<text:p text:style-name="reference"/>
<text:p text:style-name="reference"/>
<text:p text:style-name="ID">[\starttext
4
\stoptext</text:p>
<text:p text:style-name="Standard">Abarahamov, Binyamin</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T10">An Isma’ili Epistemology: The Case of
Al-Da’i al-Mutlaq 'Ali b. Muhammad b. al-Walid</text:span>.,
<text:span text:style-name="Style2">Journal of Semitic
Studies</text:span>, 41ii (1996), pp. 263-273.</text:p>
<text:p text:style-name="reference"/>
<text:p text:style-name="reference"/>
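
If the goal is the one stated in the first post, namely each text:sequence entry replaced by a bare \head followed by a line break, the same regular-expression approach could be adapted roughly as follows. This is a sketch only: the two abbreviated entries and the author placeholders are made up, and it assumes the leading "[" should be dropped when it sits directly before the element.

```ruby
# Two abbreviated, made-up entries in the shape of the posted sample.
data = <<~XML
  <text:p text:style-name="ID">[<text:sequence text:ref-name="refAutoNr0"
  text:name="AutoNr">1</text:sequence></text:p>
  <text:p text:style-name="P6">First author</text:p>
  <text:p text:style-name="ID">[<text:sequence text:ref-name="refAutoNr1"
  text:name="AutoNr">2</text:sequence></text:p>
  <text:p text:style-name="P8">Second author</text:p>
XML

# Match an optional "[" plus the whole text:sequence element. [^>]* also
# spans newlines, so attributes wrapped over several lines are covered.
pattern = %r{\[?<(text:sequence)\s[^>]*>.*?</\1>}m

# Single quotes keep the backslash literal; using a block avoids having to
# escape backslashes in a replacement string.
result = data.gsub(pattern) { '\head' + "\n" }

puts result
```

Where the "[" is wrapped in its own text:span, as in the second posted entry, it is not adjacent to the element and would be left behind by this pattern.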
