Normalizing newlines

I have a script that processes text from the clipboard. Depending on the
application the text is copied from, (even on the same platform) lines
sometimes end in lf, cr, or possibly cr/lf. What’s the best way of
normalizing the lines so that my script will work independent of the
actual newlines chosen?

Thanks

Alan

On Sun, Sep 13, 2009 at 8:10 PM, Alan M. [email protected] wrote:

I have a script that processes text from the clipboard. Depending on the
application the text is copied from, (even on the same platform) lines
sometimes end in lf, cr, or possibly cr/lf. What’s the best way of
normalizing the lines so that my script will work independent of the
actual newlines chosen?

Is the script line-oriented, or does it slurp the entire text?

Alan M. wrote:

I have a script that processes text from the clipboard. Depending on the
application the text is copied from, (even on the same platform) lines
sometimes end in lf, cr, or possibly cr/lf. What’s the best way of
normalizing the lines so that my script will work independent of the
actual newlines chosen?

Look at the original strings in the examples:

----------------------------------------------------------- String#chomp
str.chomp(separator=$/) => new_str

 Returns a new +String+ with the given record separator removed from
 the end of _str_ (if present). If +$/+ has not been changed from
 the default Ruby record separator, then +chomp+ also removes
 carriage return characters (that is, it will remove +\n+, +\r+, and
 +\r\n+).

    "hello".chomp            #=> "hello"
    "hello\n".chomp          #=> "hello"
    "hello\r\n".chomp        #=> "hello"
    "hello\n\r".chomp        #=> "hello\n"
    "hello\r".chomp          #=> "hello"
    "hello \n there".chomp   #=> "hello \n there"
    "hello".chomp("llo")     #=> "he"

Hi,

Am Montag, 14. Sep 2009, 03:10:12 +0900 schrieb Alan M.:

I have a script that processes text from the clipboard. Depending on the
application the text is copied from, (even on the same platform) lines
sometimes end in lf, cr, or possibly cr/lf. What’s the best way of
normalizing the lines so that my script will work independent of the
actual newlines chosen?

str.gsub /(\r?\n)|\r/, $/
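
For example, assuming $/ still has its default value of "\n" (the sample string here is made up):

    text = "one\rtwo\r\nthree\nfour"
    text.gsub(/(\r?\n)|\r/, $/)   #=> "one\ntwo\nthree\nfour"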

Bertram

In article
[email protected],
Xavier N. [email protected] wrote:

On Sun, Sep 13, 2009 at 8:10 PM, Alan M. [email protected] wrote:

I have a script that processes text from the clipboard. Depending on the
application the text is copied from, (even on the same platform) lines
sometimes end in lf, cr, or possibly cr/lf. What’s the best way of
normalizing the lines so that my script will work independent of the
actual newlines chosen?

Is the script line-oriented, or does it slurp the entire text?

I’m not sure I know exactly what you mean. The script takes copied cells
in csv format and massages them into different LaTeX table styles for
insertion into a LaTeX document. I first split the lines and then use
csv methods to do the rest.
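
In rough outline it’s something like this (a made-up sketch with stand-in data, not the actual script):

    require 'csv'

    text = "a,b\nc,d\n"   # stand-in for the clipboard text
    rows = text.split("\n").map { |line| CSV.parse_line(line) }
    puts rows.map { |r| r.join(' & ') + ' \\\\' }.join("\n")
    # a & b \\
    # c & d \\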

Alan

In article [email protected],
7stud – [email protected] wrote:

----------------------------------------------------------- String#chomp
"hello\r\n".chomp        #=> "hello"
"hello\n\r".chomp        #=> "hello\n"
"hello\r".chomp          #=> "hello"
"hello \n there".chomp   #=> "hello \n there"
"hello".chomp("llo")     #=> "he"

Thanks. I’d prefer to normalize the lines and then use .split on \r
rather than chomp each line.

Alan

In article [email protected],
Bertram S. [email protected] wrote:

str.gsub /(\r?\n)|\r/, $/

Bertram

Thanks, that got me on the right track.

Alan

In article
[email protected],
Xavier N. [email protected] wrote:

lines = csv.split(/[\015\012]+/)

The exact regexp depends on what you need. That one, in particular,
prevents blank items in lines, which may be what you want.

Yes, this will do what I want.

That said, I’d normally use a CSV library to handle the input.

Well, the basic idea is simply to allow cut-and-paste from a
spreadsheet into a LaTeX document, so I’m generally not dealing with
huge amounts of data.

Thanks for the help.

Alan

On Mon, Sep 14, 2009 at 1:30 AM, Alan M. [email protected] wrote:

Well, the basic idea is simply to allow cut-and-paste from a
spreadsheet into a LaTeX document, so I’m generally not dealing with
huge amounts of data.

The problem is not the size of the data. CSV is brittle: record
separators, field separators, quoting, escaping. With a library you
may write about the same amount of code AND have a robust script for
the same price.

If the data is known and dead simple, though, a split would do.
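
As a sketch of the kind of input that trips up a naive split (the data here is hypothetical): a quoted field may legally contain a line break.

    data = "a,\"one\ntwo\",c\n"    # one record; the quoted field spans two lines
    data.split(/[\015\012]+/)      #=> ["a,\"one", "two\",c"]

The single record comes back torn into two pieces; a CSV parser keeps the quoted field intact.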

In article
[email protected],
Xavier N. [email protected] wrote:

If the data is known and dead simple, though, a split would do.

I see. I think that for its intended use the split should work fine
(and it certainly hasn’t failed yet in testing). If I begin getting
reports of failure on more complicated data, I may need to rethink
things.

Thanks for your comments.

Alan

On Sun, Sep 13, 2009 at 9:15 PM, Alan M. [email protected] wrote:

I’m not sure I know exactly what you mean. The script takes copied cells
in csv format and massages them into different LaTeX table styles for
insertion into a LaTeX document. I first split the lines and then use
csv methods to do the rest.

I mean, do you have the entire CSV in memory? I guess the split means
you do. In that case you could split with a regexp like this:

lines = csv.split(/[\015\012]+/)

The exact regexp depends on what you need. That one, in particular,
prevents blank items in lines, which may be what you want.
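
For instance, with mixed and doubled line endings (made-up sample data):

    csv = "a,b\r\nc,d\re,f\n\ng,h"
    csv.split(/[\015\012]+/)   #=> ["a,b", "c,d", "e,f", "g,h"]

The \n\n run collapses into a single split, so no empty strings show up in lines.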

That said, I’d normally use a CSV library to handle the input.

On Sunday 13 September 2009 07:55:05 pm Alan M. wrote:

I see. I think that for its intended use the split should work fine
(and it certainly hasn’t failed yet in testing). If I begin getting
reports of failure on more complicated data, I may need to rethink
things.

I suppose so, but take a look at CSV in the standard library. It’s
really incredibly simple to use, and it’s in the standard library, so
it’s not as if this is another dependency.

Even if you want to slurp the whole thing into RAM, you could do:

require 'csv'

rows = []
CSV.foreach('foo.csv') { |row| rows << row }   # yields each parsed row

Parsing a string you’ve got in RAM isn’t any harder.
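
With the standard library that is just CSV.parse; a minimal sketch (the string stands in for whatever came off the clipboard):

    require 'csv'

    text = "x,y\n1,2\n"
    rows = CSV.parse(text)   #=> [["x", "y"], ["1", "2"]]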

Look at it another way, too: if there is some weird failure case (as
an example, try putting a double quote in the CSV and see what
happens), you can work with the standard library people to get it
fixed.