Regexp working with mixed line endings

Hey all,

I have an audio file (WAV) containing some XML metadata at the start or
end of the audio data.

My regexp works fine with Unix line endings.

However, some recorders write mixed line endings, and with those my regexp
doesn’t work.

Is there a special option that works with all kinds of line endings?

My regexps:

rgxstart=Regexp.new("")
rgxstop=Regexp.new("")

The comparison I do:

rgxstart === l.chomp

with l coming from:

File.open().each { |l| …}
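A plausible explanation, sketched in Ruby with a made-up sample (the real regexps are elided above, so the tags here are hypothetical): `IO#each` splits on `"\n"` only, so a line ending in a lone `\r` is not split off, while `chomp` itself already strips `\n`, `\r\n` or `\r`. A pattern anchored with `^...$` then misses a tag buried inside such a merged line.

```ruby
require "stringio"

# Hypothetical contents: a Unix line, an old-Mac (\r) line, a Windows line.
sample = "<start>\n<meta>\r<stop>\r\n"

# IO#each (here StringIO#each_line) splits on "\n", the default $/,
# so "<meta>\r<stop>\r\n" arrives as a single line.
lines = StringIO.new(sample).each_line.map(&:chomp)
p lines                     # => ["<start>", "<meta>\r<stop>"]

# chomp with the default separator removes a trailing \n, \r\n or \r ...
p "<meta>\r".chomp          # => "<meta>"

# ... but an anchored pattern cannot see a tag that follows an inner \r.
p(/^<stop>$/ === lines[1])  # => false
```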

Une bévue wrote:


Is there a special option that works with all kinds of line endings?

Sure. For mixed Windows and Unix/Linux line endings, just delete the
carriage returns:

data.gsub!(/\r/, "")
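A quick check of this one-liner on made-up strings: deleting every `\r` collapses `\r\n` and `\n\r` to `\n` and leaves Unix text alone, but it glues together lines that end in a lone `\r`.

```ruby
# Non-destructive version of data.gsub!(/\r/, "") so each result can be shown.
def strip_cr(data)
  data.gsub(/\r/, "")
end

p strip_cr("a\r\nb\r\n")  # => "a\nb\n"  (Windows)
p strip_cr("a\n\rb")      # => "a\nb"    (\n\r)
p strip_cr("a\nb\n")      # => "a\nb\n"  (Unix, unchanged)
p strip_cr("a\rb\r")      # => "ab"      (lone \r: the lines merge)
```

Note that `gsub!` returns `nil` when nothing was replaced, so its return value should not be used in place of `data`.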

with l coming from:

File.open().each { |l| …}

Try this instead:

data = File.read(filename)

# delete all carriage returns
data.gsub!(/\r/, "")

array = []

data.split("\n").each do |line|
  # process lines here
  array << line
end

By using this approach, all your XML lines will be made uniform. At the end
of the processing, you will need to rejoin the lines into a block for
storage:

data = array.join("\n")

File.open(filename, "w") { |f| f.write data }

Paul L. wrote:

Sure. For mixed Windows and Unix/Linux line endings, just delete the
carriage returns:

data.gsub!(/\r/, "")


OK, fine, thanks very much! It’s a nice solution, somehow “normalizing”
Windows line endings ;-)

In fact, I’ve slightly modified what you wrote:
data.gsub!(/\r\n/, "\n")
data.gsub!(/\r/, "\n")

because I’ve discovered in the meantime that I could have:
\r
\n
\r\n

line endings :-)
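The order of the two substitutions matters. A small check on a made-up string: replacing `\r\n` first leaves nothing for the second pass to double up, whereas running the lone-`\r` pass first would turn every Windows ending into two line breaks.

```ruby
# Two-step normalization, \r\n first, then any remaining lone \r.
def normalize(data)
  data.gsub(/\r\n/, "\n").gsub(/\r/, "\n")
end

mixed = "unix\nmac\rwin\r\nend"
p normalize(mixed)                           # => "unix\nmac\nwin\nend"

# Reversed order: the \r of "\r\n" becomes "\n", leaving "\n\n" (a blank line).
p mixed.gsub(/\r/, "\n").gsub(/\r\n/, "\n")  # => "unix\nmac\nwin\n\nend"
```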

Does \n\r exist? (Wikipedia says no.)

Also, because most of the audio input file is “binary” data, line endings
are meaningless there, I suppose.

Anyway, thanks a lot; I’m now “armed” to face any situation ;-)

Right now, with the first two example files, running my wav2xml and
opening the resulting XML file gives me syntax-colored results (in two
different text editors), so I think that proves the problem is cured!

Une bévue wrote:

In fact, I’ve slightly modified what you wrote:
data.gsub!(/\r\n/,"\n")
data.gsub!(/\r/,"\n")

What’s the point? You have the following possibilities:

\r\n

\n\r

\n

All of these cases are handled by my posted method.

because I’ve discovered in the meantime that I could have:
\r
\n
\r\n

line endings :-)

Okay, the first ("\r") is probably the old-style Macintosh line ending.
Here is a solution for all the possibilities:

data.gsub!(%r{(\r\n|\n\r|\r)},"\n")
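Checking this regexp against each ending mentioned in the thread (the sample string is made up). Ruby tries the alternatives left to right at each position, so `\r\n` is consumed as a unit before the lone-`\r` branch can match its `\r`. The shorter `/\r\n?|\n\r/` is shown only as an equivalent alternative spelling.

```ruby
# Same alternation, same order, as in the post above.
PATTERN = %r{(\r\n|\n\r|\r)}

# Made-up sample covering \r\n, \n\r, \r and a plain \n.
sample = "win\r\nrev\n\rmac\runix\nend"

p sample.gsub(PATTERN, "\n")  # => "win\nrev\nmac\nunix\nend"

# Equivalent shorter pattern (no capture group needed for a plain replace).
p sample.gsub(/\r\n?|\n\r/, "\n") == sample.gsub(PATTERN, "\n")  # => true
```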

Does \n\r exist? (Wikipedia says no.)

Doesn’t matter. Someone might type it in manually. If it exists, the above
method will handle it.

Also, because most of the audio input file is “binary” data, line endings
are meaningless there, I suppose.

What? You are reading binary files? Then don’t try to filter line endings.

If the file is text, you can filter line endings. Use the above method.

If the file is not text, do not filter anything.

Une bévue wrote:

Then don’t try to filter line endings.

BUT I DON’T have the choice: the audio files I get have metadata written
in XML mixed with binary audio data. The line endings are “correct” within
the XML. I have to deal with the output of various recorders.

If you read a file that is part text and part binary, DO NOT filter line
endings. Instead, write your parsing code to accommodate different line
endings on the fly. One way to do this is to read a specific block from the
file (located by a delimiter that separates the text from the binary
parts), work on that block, then reattach the block to the file.
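A sketch of the “on the fly” idea, assuming the XML sits between two known markers. `<metadata>` and `</metadata>` are hypothetical placeholders (the real delimiters depend on the recorder, and the original regexps are elided). The bytes are searched as binary, and only the extracted text is split on any of the endings; the data itself is never rewritten.

```ruby
# Hypothetical markers; the real delimiters depend on the recorder's format.
START_TAG = "<metadata>".b
STOP_TAG  = "</metadata>".b
# Any of the line endings discussed in this thread, two-byte forms first.
EOL = /\r\n|\n\r|\r|\n/

# Returns the lines of the marked text part, or [] if the markers are missing.
# The byte string itself is never modified.
def xml_lines(bytes)
  first = bytes.index(START_TAG) or return []
  stop  = bytes.index(STOP_TAG, first) or return []
  text  = bytes[first...(stop + STOP_TAG.bytesize)]
  text.split(EOL)
end

# Made-up contents: a few binary bytes, then XML with mixed line endings.
bytes = "\x00\xFF\x10".b + "<metadata>\r\n<take>1</take>\r<scene>A</scene>\n</metadata>".b
p xml_lines(bytes)
# => ["<metadata>", "<take>1</take>", "<scene>A</scene>", "</metadata>"]
```

Each returned line could then be tested with `rgxstart === line` as before, with no `chomp` needed.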

Anyway, thanks a lot for helping me with those line endings ;-)

If the file is text, you can filter line endings. Use the above method.

If the file is not text, do not filter anything.

Then it doesn’t work…

Treat the text part differently from the binary part. Read the entire file,
split it up based on some kind of delimiter, edit the text part, recombine
the separated parts, and save the file.
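A sketch of read, split, edit, recombine, save, again with hypothetical `<metadata>` markers and a temporary file standing in for a recording. `File.binread` and `File.binwrite` keep Ruby from translating newlines in the binary parts, and only the text between the markers is normalized.

```ruby
require "tempfile"

# Hypothetical markers around the text part.
START_MARK = "<metadata>".b
STOP_MARK  = "</metadata>".b

# Normalizes line endings inside the marked text part only.
# Returns false (and leaves the file alone) if the markers are missing.
def normalize_text_part(path)
  bytes = File.binread(path)                # binary: no newline translation
  first = bytes.index(START_MARK) or return false
  stop  = bytes.index(STOP_MARK, first) or return false
  last  = stop + STOP_MARK.bytesize

  head = bytes[0...first]                   # binary data before the text
  text = bytes[first...last]                # the XML part
  tail = bytes[last..-1]                    # binary data after the text

  text = text.gsub(/\r\n|\n\r|\r/, "\n")    # edit the text part only
  File.binwrite(path, head + text + tail)   # recombine and save
  true
end

# Demo: binary bytes, mixed-ending XML, then more binary (including a \r).
tmp = Tempfile.new("demo")
tmp.binmode
tmp.write("\x00\x01".b + "<metadata>\r\n<a/>\r</metadata>".b + "\xFF\r".b)
tmp.close

normalize_text_part(tmp.path)
result   = File.binread(tmp.path)
expected = "\x00\x01<metadata>\n<a/>\n</metadata>\xFF\r".b
p result == expected  # => true (the binary \r after the XML is untouched)
tmp.unlink
```

One caveat: a WAV file is a RIFF container, and every chunk, including one holding XML, stores its byte length in a header. Shrinking the text this way would leave that length wrong, so for WAV files it is probably safer to normalize a copy of the extracted XML than to write it back into the recording.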

BTW, how is the binary data mixed with the text data? Is this an XML file
that uses the CDATA blocking convention? That scheme is quite manageable.

Paul L. wrote:

If it exists, the above method will handle it.

OK, thanks, I’ll try that ASAP.

Also, because most of the audio input file is “binary” data, line endings
are meaningless there, I suppose.

What? You are reading binary files? Then don’t try to filter line endings.

BUT I DON’T have the choice: the audio files I get have metadata written
in XML mixed with binary audio data. The line endings are “correct” within
the XML. I have to deal with the output of various recorders.

I’ve uploaded to http://thoraval.yvon.free.fr/Audio

a *** truncated *** version of one of the files I’m extracting the XML
part from. This file is named “bidule-truncated.wav”. Don’t play it as an
audio file, because I’ve written:

[audio part truncated]

in the middle of the audio part to make it lighter (4 KB instead of
several MB).

Anyway, thanks a lot for helping me with those line endings ;-)

If the file is text, you can filter line endings. Use the above method.

If the file is not text, do not filter anything.

Then it doesn’t work…