OOo and regexp

[novice]
Hi,

Paul L. suggested that I give more detail about my problem. Ok,
here it is:

BACKGROUND:
Save a small document in OOo format, like a bibliographic entry with,
say, the article title in bold and the journal in italics

  • under options - save → disable xml size optimization
  • save the file
  • copy the file to a subdirectory
  • run “unzip filename”

content.xml has the data we want to convert to TeX. A sample
content.xml is given at the end of this message, after the script.

RUBY: I have a script provided by a colleague that does a lot of the
work needed to convert this to a sane ConTeXt file. I am trying to
teach myself enough ruby to edit this script as needed for academic
articles (I edit an academic journal in TeX). The script is reproduced
at the end of this message.

PROBLEMS: Yesterday I did learn about regexp and made progress, though
the script is still buggy:

i) In the script (l. 110–112) I have

===========
str.gsub!(/"(.*?)"/) do
  '\quotation {' + $1 + '}'
end

but on line 114 of content.xml the " pair is not converted, though
it is converted elsewhere.

ii) (really weird) In the script (l. 45–47) I have

============
@data.gsub!(/\[<(text:sequence text:ref-name="refAutoNr0").*?>.*?<\/text:sequence>/mois) do
  '\startitemize' + '\head'
end

This apparently works fine. Now I want some linespace between
‘\startitemize’ & ‘\head’, so I put a “\n\n” in between them. This
causes the xml tags to appear in the output file like this

============
<text:p text:style-name="ID">\startitemize

\head</text:p>

iii) any tips for improving this script are appreciated. I’m sure I’ll
have more questions over the next couple of days as I work on this.

Thank you all in advance for any help or pointers for this novice :)

Best
Idris

================idris.rb==============
class OpenOffice

 # using an xml parser is overkill and we need to regexp anyway

 attr_reader :display, :inline, :translate
 attr_writer :display, :inline, :translate

 def initialize
     @data = nil
     @file = ''
     @display = Hash.new
     @inline = Hash.new
     @translate = Hash.new
 end

 def load(filename)
     if not filename.empty? and FileTest.file?(filename) then
         begin
             @data, @file = IO.read(filename), filename
         rescue
             @data, @file = nil, ''
         end
     else
         @data, @file = nil, ''
     end
 end

 def save(filename='')
     if filename.empty? then
         filename = "clean-#{@file}.tex"
     end
     if f = open(filename,'w') then
         f.puts(@data)
         f.close
     end
 end

 def convert
     @translations = Hash.new
     @translate.each do |k,v|
         @translations[/#{k}/] = v
     end
     if @data then
         # first numbered entry: open the list and emit the first \head
         @data.gsub!(/\[<(text:sequence text:ref-name="refAutoNr0").*?>.*?<\/text:sequence>/mois) do
             '\startitemize' + "\n\n" + '\head' # + "\n\n"
         end
         # later entries whose "[" sits inside its own span
         @data.gsub!(/\[<\/(text:span)><(text:sequence text:ref-name="refAutoNr[^0].*?").*?>.*?<\/text:sequence>/mois) do
             '\head'
         end
         # later entries with a bare "["
         @data.gsub!(/\[<(text:sequence text:ref-name="refAutoNr[^0].*?").*?>.*?<\/text:sequence>/mois) do
             '\head'
         end
         # keep only the body of office:text and wrap it in the ConTeXt preamble
         @data.gsub!(/.*?<(office:text).*?>(.*?)<\/\1>.*/mois) do
             '\enableregime[utf]' + "\n" + '\useencoding[cyr]' + "\n\n" +
             '\definetypeface [russian]' + "\n" + ' ' +
             '[rm] [serif] [computer-modern] [default] [encoding=t2a]' + "\n\n" +
             '\starttext' + "\n\n" + '\switchtobodyfont[russian]' + "\n" + $2 + "\n" +
             '\stopitemize' + "\n\n" + '\stoptext'
         end
         @data.gsub!(/<(office:font-face-decls|office:automatic-styles|text:sequence-decls).*?>.*?<\/\1>/mois) do
             # remove
         end
         # @data.gsub!(/<(text:span text:style-name="T10")>(.*?)<\/text:span>/mois) do
         #     '{' + '\bf ' + $2 + '}'
         # end
         # @data.gsub!(/<(text:span text:style-name="Style2")>(.*?)<\/text:span>/mois) do
         #     '{' + '\it ' + $2 + '}'
         # end
         # inline styles: wrap the span text in the markup registered for its style name
         @data.gsub!(/<text:span.*?text:style-name=([\'\"])(.*?)\1>(.*?)<\/text:span>/) do
             tag, text = $2, $3
             if inline[tag] then
                 (inline[tag][0]||'') + clean_display(text) + (inline[tag][1]||'')
             else
                 clean_display(text)
             end
         end
         @data.gsub!(/<text:span.*?text:style-name=(".*?")>/) do
             # remove
         end
         @data.gsub!(/<\?.*?\?>/) do
             # remove
         end
         @data.gsub!(//) do
             # remove
         end
         # empty paragraphs
         @data.gsub!(/<text:p[^>]*?\/>/) do
             # remove
         end
         # display styles: wrap the paragraph text in the markup registered for its style name
         @data.gsub!(/<text:p.*?text:style-name=([\'\"])(.*?)\1>(.*?)<\/text:p>/) do
             tag, text = $2, $3
             if display[tag] then
                 "\n" + (display[tag][0]||'') + clean_inline(text) +
                 (display[tag][1]||'') + "\n"
             else
                 "\n" + clean_inline(text) + "\n"
             end
         end
         @data.gsub!(/<text:s[^>]*?\/>/) do
             # remove
         end
         @data.gsub!(/<text:bookmark[^>]*?\/>/) do
             # remove
         end
         @data.gsub!(/\t/,' ')
         @data.gsub!(/^ +$/,'')
         @data.gsub!(/\n\n+/moi,"\n\n")
     end
 end

 def clean_display(str)
     str.gsub!(/"(.*?)"/) do
         '\quotation {' + $1 + '}'
     end
     str.gsub!(/&/) do
         '&'
     end
     str
 end

 def clean_inline(str)
     @translations.each do |k,v|
         str.gsub!(k,v)
     end
     str
 end

end

def convert(filename)

 doc = OpenOffice.new

 doc.display['P1'] = ['\chapter{','}']
 doc.display['P2'] = ['\start'+"\n","\n"+'\stop']
 doc.display['P3'] = doc.display['P2']
# doc.display['ID'] = ['\relax']

 doc.inline['T1'] 		= ['','']
 doc.inline['T2'] 		= ['','']
doc.inline['T3'] 		= ['{\bf ','}']
doc.inline['T6'] 		= ['{\bf ','}']
doc.inline['T8'] 		= ['{\bf ','}']
doc.inline['T10'] 		= ['{\bf ','}']
doc.inline['T11'] 		= ['{\bf ','}']
doc.inline['Style2'] 	= ['{\it ','}']

 # doc.translate['¬'] 		= 'XX'
doc.translate['&apos;'] = '`'
doc.translate['&amp;'] 	= '\&'

 doc.load(filename)

 doc.convert

 doc.save

end

filename = ARGV[0]

filename = 'content.xml' if not filename or filename.empty?

convert(filename)
===========content.xml============

<?xml version="1.0" encoding="UTF-8"?>

<office:document-content
xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0"
xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0"
xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0"
xmlns:math="http://www.w3.org/1998/Math/MathML"
xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0"
xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0"
xmlns:ooo="http://openoffice.org/2004/office"
xmlns:ooow="http://openoffice.org/2004/writer"
xmlns:oooc="http://openoffice.org/2004/calc"
xmlns:dom="http://www.w3.org/2001/xml-events"
xmlns:xforms="http://www.w3.org/2002/xforms"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
office:version="1.0">
<office:scripts/>
<office:font-face-decls>
<style:font-face style:name=“Wingdings” svg:font-family=“Wingdings”
style:font-pitch=“variable” style:font-charset=“x-symbol”/>
<style:font-face style:name=“Symbol” svg:font-family=“Symbol”
style:font-family-generic=“roman” style:font-pitch=“variable”
style:font-charset=“x-symbol”/>
<style:font-face style:name=“Tahoma2” svg:font-family=“Tahoma”/>
<style:font-face style:name=“Arial Unicode MS”
svg:font-family=“'Arial Unicode MS'”
style:font-pitch=“variable”/>
<style:font-face style:name=“MS Mincho” svg:font-family=“'MS
Mincho'” style:font-pitch=“variable”/>
<style:font-face style:name=“Tahoma1” svg:font-family=“Tahoma”
style:font-pitch=“variable”/>
<style:font-face style:name=“Garamond” svg:font-family=“Garamond”
style:font-family-generic=“roman” style:font-pitch=“variable”/>
<style:font-face style:name=“Times New Roman”
svg:font-family=“'Times New Roman'”
style:font-family-generic=“roman” style:font-pitch=“variable”/>
<style:font-face style:name=“Arial” svg:font-family=“Arial”
style:font-family-generic=“swiss” style:font-pitch=“variable”/>
<style:font-face style:name=“Tahoma” svg:font-family=“Tahoma”
style:font-family-generic=“swiss” style:font-pitch=“variable”/>
</office:font-face-decls>
<office:automatic-styles>
<style:style style:name=“P1” style:family=“paragraph”
style:parent-style-name=“Standard”
style:master-page-name=“First_20_Page”>
<style:paragraph-properties fo:text-align=“center”
style:justify-single-word=“false”/>
</style:style>
<style:style style:name=“P2” style:family=“paragraph”
style:parent-style-name=“Standard”>
<style:paragraph-properties fo:text-align=“center”
style:justify-single-word=“false”/>
<style:text-properties fo:font-size=“14pt” fo:font-weight=“bold”
style:font-size-asian=“14pt” style:font-weight-asian=“bold”
style:font-size-complex=“14pt” style:font-weight-complex=“bold”/>
</style:style>
<style:style style:name=“P3” style:family=“paragraph”
style:parent-style-name=“Standard”>
<style:paragraph-properties fo:text-align=“center”
style:justify-single-word=“false”/>
<style:text-properties fo:font-size=“18pt” fo:font-weight=“bold”
style:font-size-asian=“18pt” style:font-weight-asian=“bold”
style:font-size-complex=“18pt” style:font-weight-complex=“bold”/>
</style:style>
<style:style style:name=“P4” style:family=“paragraph”
style:parent-style-name=“Standard”
style:master-page-name=“Convert_20_1”/>
<style:style style:name=“P5” style:family=“paragraph”
style:parent-style-name=“Standard”
style:master-page-name=“Convert_20_2”/>
<style:style style:name=“P6” style:family=“paragraph”
style:parent-style-name=“Standard”>
<style:text-properties style:font-name-asian=“Wingdings”
style:font-size-complex=“10pt”/>
</style:style>
<style:style style:name=“P7” style:family=“paragraph”
style:parent-style-name=“reference”>
<style:text-properties style:font-name-asian=“Wingdings”/>
</style:style>
<style:style style:name=“P8” style:family=“paragraph”
style:parent-style-name=“Standard”>
<style:text-properties fo:language=“fr” fo:country=“FR”
style:font-name-asian=“Wingdings” style:font-size-complex=“10pt”/>
</style:style>
<style:style style:name=“P9” style:family=“paragraph”
style:parent-style-name=“Standard”>
<style:text-properties style:font-size-complex=“10pt”/>
</style:style>
<style:style style:name=“P10” style:family=“paragraph”
style:parent-style-name=“reference”>
<style:text-properties fo:font-size=“11pt”
style:font-size-asian=“11pt” style:font-size-complex=“9pt”/>
</style:style>
<style:style style:name=“P11” style:family=“paragraph”
style:parent-style-name=“reference2”>
<style:text-properties fo:font-size=“11pt”
style:font-size-asian=“11pt” style:font-size-complex=“9pt”/>
</style:style>
<style:style style:name=“T1” style:family=“text”>
<style:text-properties fo:font-size=“21pt” fo:font-weight=“bold”
style:font-size-asian=“21pt” style:font-weight-asian=“bold”
style:font-size-complex=“21pt” style:font-weight-complex=“bold”/>
</style:style>
<style:style style:name=“T2” style:family=“text”>
<style:text-properties fo:font-size=“21pt” fo:font-weight=“bold”
style:font-size-asian=“21pt” style:font-weight-asian=“bold”
style:font-size-complex=“22pt” style:font-weight-complex=“bold”/>
</style:style>
<style:style style:name=“T3” style:family=“text”>
<style:text-properties fo:font-weight=“bold”
style:font-name-asian=“Wingdings” style:font-weight-asian=“bold”
style:font-weight-complex=“bold”/>
</style:style>
<style:style style:name=“T4” style:family=“text”>
<style:text-properties style:font-name-asian=“Wingdings”/>
</style:style>
<style:style style:name=“T5” style:family=“text”>
<style:text-properties fo:language=“fr” fo:country=“FR”/>
</style:style>
<style:style style:name=“T6” style:family=“text”>
<style:text-properties fo:language=“fr” fo:country=“FR”
fo:font-weight=“bold” style:font-name-asian=“Wingdings”
style:font-weight-asian=“bold” style:font-size-complex=“10pt”
style:font-weight-complex=“bold”/>
</style:style>
<style:style style:name=“T7” style:family=“text”>
<style:text-properties fo:language=“fr” fo:country=“FR”
style:font-name-asian=“Wingdings” style:font-size-complex=“10pt”/>
</style:style>
<style:style style:name=“T8” style:family=“text”>
<style:text-properties fo:font-weight=“bold”
style:font-name-asian=“Wingdings” style:font-weight-asian=“bold”
style:font-size-complex=“10pt” style:font-weight-complex=“bold”/>
</style:style>
<style:style style:name=“T9” style:family=“text”>
<style:text-properties style:font-name-asian=“Wingdings”
style:font-size-complex=“10pt”/>
</style:style>
<style:style style:name=“T10” style:family=“text”>
<style:text-properties fo:font-weight=“bold”
style:font-weight-asian=“bold” style:font-weight-complex=“bold”/>
</style:style>
<style:style style:name=“T11” style:family=“text”>
<style:text-properties fo:font-weight=“bold”
style:font-weight-asian=“bold” style:font-size-complex=“10pt”
style:font-weight-complex=“bold”/>
</style:style>
<style:style style:name=“T12” style:family=“text”>
<style:text-properties style:font-size-complex=“10pt”/>
</style:style>
</office:automatic-styles>
<office:body>
<office:text>
<text:sequence-decls>
<text:sequence-decl text:display-outline-level=“0”
text:name=“Illustration”/>
<text:sequence-decl text:display-outline-level=“0”
text:name=“Table”/>
<text:sequence-decl text:display-outline-level=“0”
text:name=“Text”/>
<text:sequence-decl text:display-outline-level=“0”
text:name=“Drawing”/>
<text:sequence-decl text:display-outline-level=“0”
text:name=“AutoNr”/>
</text:sequence-decls>
<text:p text:style-name=“P1”><text:span
text:style-name=“T1”>Isma</text:span><text:span
text:style-name=“T2”>'</text:span><text:span
text:style-name=“T1”>ilis: A Bibliography</text:span></text:p>
<text:p text:style-name=“P2”/>
<text:p text:style-name=“P2”/>
<text:p text:style-name=“P2”/>
<text:p text:style-name=“P3”>Compiled by:</text:p>
<text:p text:style-name=“P3”>Ramin Khanbagi</text:p>
<text:p text:style-name=“P4”/>
<text:p text:style-name=“Standard”/>
<text:p text:style-name=“Standard”/>
<text:p text:style-name=“P5”/>
<text:p text:style-name=“Standard”/>
<text:p text:style-name=“Standard”/>
<text:p text:style-name=“ID”>[<text:sequence
text:ref-name=“refAutoNr0” text:name=“AutoNr”
text:formula=“ooow:AutoNr+1”
style:num-format=“1”>1</text:sequence></text:p>
<text:p text:style-name=“P6”>'Abd al-Râziq, Ahmad</text:p>
<text:p text:style-name=“reference”><text:span
text:style-name=“T3”>Die al-Azhar-Moschee</text:span><text:span
text:style-name=“T4”>., in, </text:span><text:span
text:style-name=“T3”>"Schätze der Kalifen: Islamische Kunst zur
Fatimidenzeit."</text:span><text:span text:style-name=“T4”>,
Herausgegeben von W. Seipel, Vienna: Kunsthistorisches Museum Wien;
Milan: Skira, 1998, pp. 144-147</text:span></text:p>
<text:p text:style-name=“P7”/>
<text:p text:style-name=“P7”/>
<text:p text:style-name=“ID”><text:span
text:style-name=“T5”>[</text:span><text:sequence
text:ref-name=“refAutoNr1” text:name=“AutoNr”
text:formula=“ooow:AutoNr+1”
style:num-format=“1”>2</text:sequence></text:p>
<text:p text:style-name=“P8”>'Abd al-Râziq, Ahmad</text:p>
<text:p text:style-name=“reference”><text:span
text:style-name=“T6”>La mosquée al-Azhar</text:span><text:span
text:style-name=“T7”>., in, </text:span><text:span
text:style-name=“T6”>"Trésors fatimides du Caire. Exposition
présentée àl'Institut du Monde Arabe …
</text:span><text:span
text:style-name=“T8”>1998."</text:span><text:span
text:style-name=“T9”>, Paris: Institut du Monde Arabe, 1998, pp.
147-149</text:span></text:p>
<text:p text:style-name=“P7”/>
<text:p text:style-name=“P7”/>
<text:p text:style-name=“ID”>[<text:sequence
text:ref-name=“refAutoNr2” text:name=“AutoNr”
text:formula=“ooow:AutoNr+1”
style:num-format=“1”>3</text:sequence></text:p>
<text:p text:style-name="Standard"><text:s/>'Amri, Husay
'Abdallah</text:p>
<text:p text:style-name=“reference”><text:span
text:style-name=“T10”>The Text of an Unpublished Fatwa of the Scholar
al-Maqbali (d. 1108/1728) Concerning the Legal Position of the
Batiniyyah (Isma'iliyyah) of the People of Hamdan</text:span>.,
Translated by A.B.D.R. Eagle, <text:span text:style-name=“Style2”>New
Arabian Studies</text:span>, 2 (1994), pp. 165-174.</text:p>
<text:p text:style-name=“reference”/>
<text:p text:style-name=“reference”/>
<text:p text:style-name=“ID”>[<text:sequence
text:ref-name=“refAutoNr3” text:name=“AutoNr”
text:formula=“ooow:AutoNr+1”
style:num-format=“1”>4</text:sequence></text:p>
<text:p text:style-name=“Standard”>Abarahamov, Binyamin</text:p>
<text:p text:style-name=“reference”><text:span
text:style-name=“T10”>An Isma'ili Epistemology: The Case of
Al-Da'i al-Mutlaq 'Ali b. Muhammad b. al-Walid</text:span>.,
<text:span text:style-name=“Style2”>Journal of Semitic
Studies</text:span>, 41ii (1996), pp. 263-273.</text:p>
<text:p text:style-name=“reference”/>
<text:p text:style-name=“reference”/>
<text:p text:style-name=“ID”>[<text:sequence
text:ref-name=“refAutoNr4” text:name=“AutoNr”
text:formula=“ooow:AutoNr+1”
style:num-format=“1”>5</text:sequence></text:p>
<text:p text:style-name=“Standard”>Abel, A.</text:p>
<text:p text:style-name=“reference”><text:span
text:style-name=“T10”>De historische betekenis van de Loutere Broeders
van Basra (Bassorah), een wijsgerig gezelschap in de Islam van de Xe
eeuw</text:span>., <text:span text:style-name=“Style2”>Orientalia
Gandensia</text:span>, 1 (1964), pp. 157-170.</text:p>
<text:p text:style-name=“reference”/>
<text:p text:style-name=“reference”/>
<text:p text:style-name=“ID”>[<text:sequence
text:ref-name=“refAutoNr5” text:name=“AutoNr”
text:formula=“ooow:AutoNr+1”
style:num-format=“1”>6</text:sequence></text:p>
<text:p text:style-name=“P9”>Abou Said, A.C.</text:p>
<text:p text:style-name=“reference”><text:span
text:style-name=“T11”>Abbasid and Fatimid Political Relations during
the Buhawid Period</text:span><text:span text:style-name=“T12”>.,
University of Cambridge, 1967.</text:span></text:p>
<text:p text:style-name=“reference2”>[<text:span
text:style-name=“Style2”>Dissertation</text:span>]</text:p>
<text:p text:style-name=“reference2”/>
<text:p text:style-name=“reference2”/>
<text:p text:style-name=“ID”>[<text:sequence
text:ref-name=“refAutoNr6” text:name=“AutoNr”
text:formula=“ooow:AutoNr+1”
style:num-format=“1”>7</text:sequence></text:p>
<text:p text:style-name=“P9”>Abu Firas, Shihab al-Din
al-Maynaqi</text:p>
<text:p text:style-name=“reference”><text:span
text:style-name=“T11”>Ash-Shafiya': An Isma'ili
Treatise</text:span><text:span text:style-name=“T12”>., Edited and
Translated with an Introduction and Commentary by Sami Nasib Makarim,
Beirut: American University of Beirut, 1966.</text:span></text:p>
<text:p text:style-name=“reference”/>
<text:p text:style-name=“reference”/>
<text:p text:style-name=“ID”>[<text:sequence
text:ref-name=“refAutoNr7” text:name=“AutoNr”
text:formula=“ooow:AutoNr+1”
style:num-format=“1”>8</text:sequence></text:p>
<text:p text:style-name=“P9”>Abu'l-Fida, al-Malik
al-Mu'ayyad 'Imad al-Din Ismai'l b. 'Ali</text:p>
<text:p text:style-name=“reference”><text:span
text:style-name=“T11”>The Memoirs of a Syrian
Prince</text:span><text:span text:style-name=“T12”>., Translated by
Peter Malcom Holt, Wiesbaden: Franz Steiner Verlag, [Freiburger
Islamstudien], 1983.</text:span></text:p>
<text:p text:style-name=“reference”/>
<text:p text:style-name=“reference”/>
<text:p text:style-name=“ID”><text:bookmark-start
text:name=“a01”/>[<text:sequence text:ref-name=“refAutoNr8”
text:name=“AutoNr” text:formula=“ooow:AutoNr+1”
style:num-format=“1”>9</text:sequence></text:p>
<text:p text:style-name=“P9”>Abu-Lughod, J.<text:bookmark-end
text:name=“a01”/></text:p>
<text:p text:style-name=“reference”><text:span
text:style-name=“T11”>Cairo: 1001 Years of the City
Victorious</text:span><text:span text:style-name=“T12”>., Princeton:
Princeton University Press, 1971. </text:span></text:p>
<text:p text:style-name=“reference”/>
<text:p text:style-name=“reference”/>
<text:p text:style-name=“ID”><text:bookmark-start
text:name=“a02”/>[<text:sequence text:ref-name=“refAutoNr9”
text:name=“AutoNr” text:formula=“ooow:AutoNr+1”
style:num-format=“1”>10</text:sequence></text:p>
<text:p text:style-name=“P6”>Adamji, Ebrahimji N. and Sorabji M.
Darookhanawala</text:p>
<text:p text:style-name=“reference”><text:span
text:style-name=“T8”>Two Indian Travellers: East Africa, 1902-1905:
Being Accounts of Journeys Made by Ebrahimji N. Adamji, a Very Young
Bohra Merchant from Mombasa & Sorabji M. Darookhanawala, a
Middle-Aged Parsi Engineer from Zanzibar</text:span><text:span
text:style-name=“T9”>., Edited by C. Salvadori and J. Aldrick, Mombasa:
Friends of Fort Jesus, 1997.</text:span></text:p>
<text:p text:style-name=“P10”/>
<text:p text:style-name=“ID”>[<text:sequence
text:ref-name=“refAutoNr1113” text:name=“AutoNr”
text:formula=“ooow:AutoNr+1”
style:num-format=“1”>11</text:sequence></text:p>
<text:p
text:style-name=“P9”>Каландаров, Тохир
Сафарбекович</text:p>
<text:p text:style-name=“reference”><text:span
text:style-name=“T11”>РелигиознаяситуациянаПамире(кпроблемерелигиозногосинкретизма).
(Summary: The Religious Situation on the Pamirs (to the problem of
religious syncretism).)</text:span><text:span text:style-name=“T12”>.,
</text:span><text:span
text:style-name=“Style2”>Восток</text:span><text:span
text:style-name=“T12”>, 2000 vi, pp. 36-49;219</text:span></text:p>
<text:p text:style-name=“reference2”>[Ismailis in
Tajikistan.]</text:p>
<text:p text:style-name=“reference2”/>
<text:p text:style-name=“reference2”/>
<text:p text:style-name=“ID”>[<text:sequence
text:ref-name=“refAutoNr1114” text:name=“AutoNr”
text:formula=“ooow:AutoNr+1”
style:num-format=“1”>12</text:sequence></text:p>
<text:p
text:style-name=“P9”>Шохуморов, Саиданвар</text:p>
<text:p text:style-name=“reference”><text:span
text:style-name=“T11”>Исмаилизм:
традицииисовременность</text:span><text:span
text:style-name=“T12”>., </text:span><text:span
text:style-name=“Style2”>ЦентральнаяАзияиКавказ</text:span><text:span
text:style-name=“T12”>, 2000ii/8, pp. 128-138</text:span></text:p>
<text:p text:style-name=“P11”>[Also online at <text:span
text:style-name=“T10”>Central Asia and the Caucasus – An Open Access Journal</text:span>]</text:p>
</office:text>
</office:body>
</office:document-content>

ishamid wrote:

> [novice]
> Hi,
>
> Paul L. suggested that I give more detail about my problem. Ok,
> here it is:
>
> BACKGROUND:
> Save a small document in OOo format,

Do you mean an OpenOffice OpenDocument format? The sort of data file that
typically has a suffix of “.odt” and consists of a compressed set of XML
files for various purposes?

> like a bibliographic entry with,
> say, the article title in bold and the journal in italics
>
>   • under options - save -> disable xml size optimization
>   • save the file
>   • copy the file to a subdirectory
>   • run “unzip filename”

I think this answers my first question.

> the script is still buggy:
> it is converted elsewhere.

I am unable to correlate this line number with a " sequence in the
corresponding line of the XML sample you provided. Are the two quote
sequences on separate lines? If so, use this form:

str.gsub!(/"(.*?)"/m) do
  '\quotation {' + $1 + '}'
end

Note the added ‘m’, which lets “.” match newlines so that a pair can span a
line break. This won’t work if you are parsing the file line by line and the
two " sequences are on different XML lines.

If (1) you have two " sequences on different lines, and (2) you are
processing the XML content line by line, then you will have to change how
you process the file in the most fundamental way to get this particular TeX
conversion to work.
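
A minimal sketch of the difference, using a fragment adapted from the posted sample in which a quoted title wraps across a line break. The string and the variable names are illustrative only, and this assumes the whole text is held in one string rather than read line by line:

```ruby
# Illustrative input: a quoted title that wraps across a line break.
text = "in, \"Islamische Kunst zur\nFatimidenzeit.\", Herausgegeben von W. Seipel"

# Without /m, "." does not match "\n", so the quote pair is never matched.
single_line = text.gsub(/"(.*?)"/) { '\quotation {' + $1 + '}' }

# With /m, "." also matches "\n", so the pair is matched across the break.
multi_line = text.gsub(/"(.*?)"/m) { '\quotation {' + $1 + '}' }

puts single_line == text   # true: nothing was converted
puts multi_line
```

This only helps when both quote sequences end up in the same string that gsub sees.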

> This apparently works fine. Now I want some linespace between
> ‘\startitemize’ & ‘\head’, so I put a “\n\n” in between them. This
> causes the xml tags to appear in the output file like this
>
> ============
> <text:p text:style-name="ID">\startitemize
>
> \head</text:p>

Yes. This is what you instructed the computer to do, and apparently the
computer succeeded in meeting your request. I assume this is on the TeX
side of the conversion process. I don’t happen to know how a linefeed
is represented in TeX, but I believe that a TeX linefeed is what you want
to insert, not bare linefeeds (unless I have completely misunderstood you).
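
One other thing that may be worth checking, offered as a guess from the posted script rather than a diagnosis: the paragraph pattern used later in convert has no ‘m’ flag. Once a replacement contains "\n", the <text:p> element spans several lines, and a pattern without ‘m’ would no longer match it, which would leave the tags in the output. The sketch below reproduces that effect in isolation. The pattern is my reading of the one in the script, and strip_paragraph is a hypothetical helper, not part of it:

```ruby
# Assumed reading of the paragraph pattern in the posted script (no /m flag).
paragraph = /<text:p.*?text:style-name=(['"])(.*?)\1>(.*?)<\/text:p>/

# Hypothetical helper: replace each matched paragraph with its text content.
def strip_paragraph(str, pattern)
  str.gsub(pattern) { Regexp.last_match(3) }
end

one_line  = '<text:p text:style-name="ID">\startitemize\head</text:p>'
two_lines = "<text:p text:style-name=\"ID\">\\startitemize\n\n\\head</text:p>"

puts strip_paragraph(one_line, paragraph)    # tags removed
puts strip_paragraph(two_lines, paragraph)   # tags remain: "." stops at "\n"

# The same pattern with the multiline option spans the inserted blank line.
paragraph_m = Regexp.new(paragraph.source, Regexp::MULTILINE)
puts strip_paragraph(two_lines, paragraph_m)
```

If this is what is happening, adding ‘m’ to the paragraph pattern, or inserting the line breaks after the paragraph tags have been stripped, may be enough.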

> iii) any tips for improving this script are appreciated. I’m sure I’ll
> have more questions over the next couple of days as I work on this.

I had hoped for a list of desired conversions, rather than a script that
needs work. Most people are reluctant to dig into someone else’s code; such
an approach normally takes much longer than starting over.

I was able to format your XML this time, because the example was complete,
and having taken a look at it, I assume this is an OpenOffice OpenDocument
format file, yes?

Postscript. Have you considered all your options? OpenOffice will save its
documents in many formats, several of which preserve the original
formatting. For example, you could save the document as RTF, then use the
utility “rtf2TeX” to perform the conversion to TeX.

I haven’t actually done this, but I can see your effort level and I thought
I would alert you to some other options.

I would also have mentioned saving as HTML and using html2tex, but I doubt
you would be pleased with the outcome (no paging or footnotes AFAIK).

Post-postscript. The TeX output is a requirement, yes? There are many
excellent output formats that are in wider use today.