Fromdos dos2unix in ruby

how can I achieve in ruby the result of running:
fromdos dos_file.txt unix_file.txt

or in vim:
set ff=unix

?

thanks,
chris

On 18 Aug., 13:38, Krzysztof C. [email protected] wrote:

how can I achieve in ruby the result of running:
fromdos dos_file.txt unix_file.txt

or in vim:
set ff=unix

?

thanks,
chris

just to add, I need to do that conversion under windows.

thanks,
chris

krzysztof cierpisz [email protected] writes:

On 18 Aug., 13:38, Krzysztof C. [email protected] wrote:

how can I achieve in ruby the result of running:
fromdos dos_file.txt unix_file.txt

or in vim:
set ff=unix

?

just to add, I need to do that conversion under windows.

Well, you would read from the input file, replace the dos/windows line
endings with unix ones and write to the output file.

2009/8/18 krzysztof cierpisz [email protected]:

File.open(ARGV[0]).each {|line|
Windows d is a file with 0 bytes
You are not closing the File object properly so your output might
never get flushed to disk…

Cheers

robert

Well, you would read from the input file, replace the dos/windows line
endings with unix ones and write to the output file.

I tried with following dos2unix.rb script

dos2unix.rb

out = File.open(ARGV[1],"w")

File.open(ARGV[0]).each {|line|
  out << line.gsub!(/\r$/,'')
}

out.close
#########################################

this:
ruby dos2unix.rb u8nl_utf8_tab.dos.txt d

works fine on Linux (d with length 408 bytes) but not on Windows, on
Windows d is a file with 0 bytes

input file u8nl_utf8_tab.dos.txt looks like this:
col1,col2|~|
“first line of cell 1
second line of cell 1”,only line in 2|~|
"Czy specjalny telefon przeznaczony dla dzieci w wieku od 3 do 7 lat
podbije rynek? Jest prosty, bezpieczny i ma tylko 4 klawisze.
Sprzedawać go chce między innymi telefonia ojca Rydzyka. więcej
",“Copyright © World Group.
Реклама
Help
Сделать World стартовой”|~|
äöüб фыва,“asdf,эжх”|~|

thanks,
chris

On Aug 18, 2009, at 11:21 AM, Robert K. wrote:

out = File.open(ARGV[1],"w")

File.open(ARGV[0]).each {|line|
  out << line.gsub!(/\r$/,'')

You open the file with the default mode of ‘r’ here so the File class
is going to do the line-ending conversion for you. Then you use
String#gsub! which returns nil when no changes are made. You are never
going to get output this way.
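
To make the gsub! point concrete, here is a small sketch. The helper name strip_cr is made up for this illustration and is not part of the original script. It shows that the bang version returns nil when nothing matched, while plain gsub always returns a string.

```ruby
# strip_cr is a hypothetical helper name used only for this illustration.
def strip_cr(line)
  # gsub (no bang) always returns a String, whether or not it changed anything.
  line.gsub(/\r$/, '')
end

with_cr    = "first line\r\n"
without_cr = "first line\n"

p with_cr.dup.gsub!(/\r$/, '')     # returns the modified string
p without_cr.dup.gsub!(/\r$/, '')  # returns nil because nothing matched
p strip_cr(without_cr)             # returns the unchanged string, never nil
```

So on a platform where the text-mode read has already dropped the \r, every call to gsub! yields nil and nothing useful is appended to the output.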

You are not closing the File object properly so your output might
never get flushed to disk…

Cheers

robert


remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Try something like this:

buffer = ''
File.open(ARGV[1], 'wb') do |out|          # open for writing binary
  File.open(ARGV[0], 'rb') do |input|      # open for reading binary
    while input.read(1024, buffer)         # read up to 1024 bytes into buffer
      out.write buffer.gsub(/\r\n/, "\n")  # change ending and write out
    end
  end                                      # end of block closes input
end                                        # end of block closes output

-Rob

P.S. This is untested straight from my head.

Rob B. http://agileconsultingllc.com
[email protected]

#########################################
Cheers

robert

can you let me know how to close it properly?

thanks,
chris
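
One way to get the file closed properly is the block form of File.open, which closes the file when the block ends. The following is only a sketch of the original script rewritten that way. The method name dos2unix_file is an assumption made for this example, and binary mode is used on both sides so that it behaves the same on Windows and Linux.

```ruby
# dos2unix_file is a hypothetical name; it wraps the original script in a method.
def dos2unix_file(src, dst)
  # 'wb' and 'rb' switch off newline translation on Windows.
  File.open(dst, 'wb') do |out|
    File.open(src, 'rb') do |input|
      input.each_line do |line|
        # gsub (not gsub!) always returns a String, so out never receives nil.
        out << line.gsub(/\r$/, '')
      end
    end # input is closed here, even if an exception was raised
  end   # out is flushed and closed here
end
```

It could be called as dos2unix_file(ARGV[0], ARGV[1]) from a script.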

end # end of block closes input
end # end of block closes output

-Rob

thanks Rob,

I just added binary mode to what I had, and now it’s working under
Windows as well.
I am always forgetting about “b” mode under windows.

thanks
chris

2009/8/18 Robert D. [email protected]:

If performance can be an issue we could use File#each with 10.chr as a separator

  input.each 10.chr do | line |
    out.print line.sub( /\r\n\z/, 10.chr )
  end

Just for the record… in Ruby “\n” == 10.chr on all platforms. I find
“\n” to be more obvious.

On Tue, Aug 18, 2009 at 7:19 PM, Xavier N.[email protected] wrote:

I wanted to point out the subtle bug because I thought it useful. But
I hate backslashes and use 10.chr often, this however is not good
practice, because it is unconventional, it is just me ;).
In the infinitesimal hope that 10.chr is useful for some folks anyway.
Cheers
Robert

On Tue, Aug 18, 2009 at 5:50 PM, Rob
Biedenharn[email protected] wrote:

Try something like this:

buffer = ''
File.open(ARGV[1], 'wb') do |out| # open for writing binary
  File.open(ARGV[0], 'rb') do |input| # open for reading binary
    while input.read(1024, buffer) # read up to 1024 bytes into buffer
      out.write buffer.gsub(/\r\n/, "\n") # change ending and write out
fancy little bug here Rob, do you spot it?

What if \r is the 1024th char?
This will happen, one day ;-)
Unless the file is huge I would try
File.open…
File.open …
out.print input.read.gsub( /\r\n/, "\n" )
end
end

If performance can be an issue we could use File#each with 10.chr as a
separator

 input.each 10.chr do | line |
     out.print line.sub( /\r\n\z/, 10.chr )
 end

HTH
Robert
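
For files too large to read in one go, the chunk-boundary problem described above can also be handled by holding back a trailing \r until the next chunk arrives. This is only a sketch. The method name dos2unix_chunked and the chunk_size parameter are assumptions for this example, not something posted in the thread.

```ruby
# dos2unix_chunked is a hypothetical name; chunk_size is an illustrative parameter.
def dos2unix_chunked(src, dst, chunk_size = 1024)
  File.open(dst, 'wb') do |out|
    File.open(src, 'rb') do |input|
      carry = ''.b # bytes held back from the previous chunk
      while (chunk = input.read(chunk_size))
        data = carry + chunk
        if data.end_with?("\r")
          # The matching "\n" may be the first byte of the next chunk,
          # so keep this "\r" back instead of writing it now.
          carry = "\r".b
          data = data[0...-1]
        else
          carry = ''.b
        end
        out.write(data.gsub("\r\n", "\n"))
      end
      out.write(carry) # a lone "\r" at the very end of the file is kept
    end
  end
end
```

A \r that is not followed by \n is written unchanged, so only real CRLF pairs are converted.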

On 18.08.2009 21:46, Robert D. wrote:

I wanted to point out the subtle bug because I thought it useful. But
I hate backslashes and use 10.chr often, this however is not good
practice, because it is unconventional, it is just me ;).
In the infinitesimal hope that 10.chr is useful for some folks anyway.

I would let Ruby do the line detection to avoid the issue Robert pointed
out. For the record, this is what I’d probably be doing:

WIN_LE = "\r\n".freeze

File.open ARGV[0] do |input|
  File.open ARGV[1], "wb" do |out|
    input.each do |line|
      line.chomp!
      out.print line, WIN_LE
      # or:
      # out.write(line)
      # out.write(WIN_LE)
    end
  end
end

In this particular case I would not use File.foreach because then “out”
is created even if the input file isn’t there.

Kind regards

robert

On Tue, Aug 18, 2009 at 10:39 PM, Xavier N.[email protected] wrote:

spurious \015 that may come up.
Yup, I thought my code solved the issue: tell Ruby that a line ends
with “\n” (that was tough to type ;-) ) in each, and replace a potential
“\r” before it?
But maybe this does not work on binary files under Windows, no way to
test, sorry.

Cheers
Robert

On Tue, Aug 18, 2009 at 11:39 PM, Robert D.[email protected]
wrote:

Yup, I thought my code solved the issue: tell Ruby that a line ends
with “\n” (that was tough to type ;-) ) in each, and replace a potential
“\r” before it?
But maybe this does not work on binary files under Windows, no way to
test, sorry.

The idea is good, but this topic is brittle (though easy when you get
the facts straight).

Problem is on CRLF platforms the I/O system filters out the CR of any
pair CRLF before the string arrives to Ruby land. That is, if you work
in text-mode. In fact that is the definition of text-mode, that the
conversion is on.

When you write in text mode on a CRLF platform, the I/O system
monitors the stream of bytes, and inserts a CR every time it sees an
LF. Unconditionally.

On Unix these conversions do not happen, text-mode and binary-mode are
the same, and Unix uses LF on disk to mean a newline.

And the point is that those conversions happen in text-mode no matter
what the input record separator is, so in those solutions the file
opened for reading should be opened in binary mode anyway. If you
don’t do this, a file that has on disk

\r\r\n

will go up as \r\n on Windows, and that gsubed to \n, so you’ve lost a
\r that didn’t belong to the newline.

In a portable script you have to work in binary mode, and in a
Windows-only script it is enough to read in text-mode and write
verbatim in binary-mode.
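
A minimal sketch of the portable, binary-mode variant for files that fit in memory. The names crlf_to_lf and dos2unix_whole are made up for this example. It also shows the \r\r\n case: because the bytes are read untranslated, only the \r that belongs to the newline is dropped.

```ruby
# crlf_to_lf and dos2unix_whole are hypothetical names for this sketch.
def crlf_to_lf(bytes)
  # Only a CR directly followed by LF is removed; other CRs stay.
  bytes.gsub("\r\n", "\n")
end

def dos2unix_whole(src, dst)
  # binread/binwrite bypass any text-mode newline translation.
  File.binwrite(dst, crlf_to_lf(File.binread(src)))
end

p crlf_to_lf("a\r\r\nb\r\n") # => "a\r\nb\n"
```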

On Tue, Aug 18, 2009 at 10:15 PM, Robert
Klemme[email protected] wrote:

WIN_LE = "\r\n".freeze

File.open ARGV[0] do |input|
 File.open ARGV[1], "wb" do |out|
  input.each do |line|
   line.chomp!
   out.print line, WIN_LE

Hey but this is dos2unix :-).

You can’t read in text-mode just like that in a portable way, because
chomp! only chomps “\n”.

If you can assume the program is gonna run only on Windows then the
solution is trivial: read in text-mode, and write in binary mode. No
chomping or gsubs needed, just read and write.

If the program has to be portable then you need to deal with the
spurious \015 that may come up.

2009/8/18 Xavier N. [email protected]:

Hey but this is dos2unix :-).

Ooops, make that then

LE = "\n".freeze

and of course

out.print line, LE

You can’t read in text-mode just like that in a portable way, because
chomp! only chomps “\n”.

No.

$ allruby -e 'p "a\r\n".chomp'
CYGWIN_NT-5.1 padrklemme1 1.5.25(0.156/4/2) 2008-06-12 19:34 i686 Cygwin

ruby 1.8.7 (2008-08-11 patchlevel 72) [i386-cygwin]
"a"

ruby 1.9.1p129 (2009-05-12 revision 23412) [i386-cygwin]
"a"

If you can assume the program is gonna run only on Windows then the
solution is trivial: read in text-mode, and write in binary mode. No
chomping or gsubs needed, just read and write.

If the program has to be portable then you need to deal with the
spurious \015 that may come up.

String#chomp does that nicely.

Kind regards

robert
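
Putting the corrected pieces together, a chomp-based version might look like the sketch below. The method name dos2unix_chomp is an assumption for this example, and the input is opened in binary mode so that the same code can run on Windows and Unix. It relies on String#chomp removing a trailing "\r\n" as well as "\n" while $/ is at its default.

```ruby
LE = "\n".freeze

# dos2unix_chomp is a hypothetical name for this sketch.
def dos2unix_chomp(src, dst)
  File.open(src, 'rb') do |input|
    File.open(dst, 'wb') do |out|
      input.each_line do |line|
        if line.end_with?("\n")
          # chomp drops "\n" or "\r\n"; the Unix line ending is then re-added.
          out.write(line.chomp + LE)
        else
          # Last line without a newline: copy it through untouched.
          out.write(line)
        end
      end
    end
  end
end
```

Opening the input first means the output file is not created when the input is missing.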

On Wed, Aug 19, 2009 at 9:14 AM, Robert
Klemme[email protected] wrote:

If the program has to be portable then you need to deal with the
spurious \015 that may come up.

String#chomp does that nicely.

Oh you are right. I thought chomp only chomped the input record separator,
but I see in the Pickaxe that when $/ is left at its default, chomp also
removes a trailing \r\n or \r.

On Wed, Aug 19, 2009 at 12:38 AM, Xavier N.[email protected] wrote:

the same, and Unix uses LF on disk to mean a newline.

And the point is that those conversions happen in text-mode no matter
what the input record separator is, so in those solutions the file
opened for reading should be opened in binary mode anyway. If you
don’t do this, a file that has on disk

But I did open it in binary mode, did I not?
Anyway, if I had a typo in my snippet, thanks for the correction.

The only issue I can see is the following

Newline = "\n" || 10.chr || "\012" || ";-)"

File.open( "…", "rb"){ | f |
f.each( Newline ) { …
####### ^
####### Does this work on Windows?

Cheers
Robert
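
A sketch of what the question is about, under the assumption (from the text-mode discussion earlier in the thread) that a file opened with "rb" gets no newline translation on any platform. The helper name lines_with_separator is made up for this example. With an explicit "\n" separator, each line should then still carry its "\r" and can be cleaned up afterwards.

```ruby
# lines_with_separator is a hypothetical helper for this illustration.
def lines_with_separator(path, sep = "\n")
  # Binary mode: the bytes arrive exactly as they are on disk.
  File.open(path, 'rb') { |f| f.each(sep).to_a }
end

# Converts each collected line by replacing a trailing CRLF with LF.
def unix_lines(path)
  lines_with_separator(path).map { |line| line.sub(/\r\n\z/, "\n") }
end
```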