Ruby code style / Regular Expression to remove blank lines / test case

Hi Group,

Forgive me to put 3 seemingly irrelevant words in the subject line. But
they are the purposes of this post.

This is basically my first Ruby class, which is to remove all empty
lines and blank times from a txt file. a simple test case is also
included.

I tested the code and it works fine. But I did not have much idea about
if the code style is good, or if there is some potential issues. Can
someone pls do a quick review and kindly provide some feedback?

I want to know -

  1. Is there any potential issue in the code? did I miss some boundary
    check?

  2. Is the code style good in general?
    e.g, name convention, indent, etc.

  3. what can i do if i want to make it more professional?
    e.g., better choice of functions, error-handling, object-oriented,
    test-driven, etc.

I am open to any feedback - as far as they help make better Ruby
programing.

==================================================================

a copy is included as below

require ‘test/unit’

this class is to remove all the empty and blank lines.

Usage: ruby Blanker.rb input.txt output.txt

class BlankLineRemover
def initialize()
puts “BlankLineRemover initialized.”
end

def generate(input_filename, output_filename)

reader = File.read(input_filename)

f = File.new(output_filename, "w")

f.puts remove(reader)

f.close

end

def remove(stringToremove)

regEx = /^[\s]*$\n/

stringToReturn = stringToremove.gsub(regEx, '')
stringToReturn.strip!

if stringToReturn.length >= 1
  return stringToReturn
else
  return ""
end

end

end

class TestCleaner < Test::Unit::TestCase
def test_basic

sInput1 = "line1\n\t\nline4\n\tline5\n"
sExpectedOutput1 = "line1\nline4\n\tline5"

sInput2=""
sExpectedOutput2 = ""

sInput3="\n\t\t\n"
sExpectedOutput3 = ""

testCleaner = BlankLineRemover.new
assert_equal(sExpectedOutput1, testCleaner.remove(sInput1))
assert_equal(sExpectedOutput2, testCleaner.remove(sInput2))
assert_equal(sExpectedOutput3, testCleaner.remove(sInput3))

end
end

unless ARGV.length == 2
puts “Usage: ruby Blanker.rb input.txt output.txt\n”
exit
end

simpleRemover = BlankLineRemover.new

simpleRemover.generate(ARGV[0],ARGV[1])

==================================================================

On Thu, Sep 22, 2011 at 5:45 AM, Jim W. [email protected] wrote:

Hi Group,

Forgive me to put 3 seemingly irrelevant words in the subject line. But
they are the purposes of this post.

This is basically my first Ruby class, which is to remove all empty
lines and blank times from a txt file. a simple test case is also
included.

Welcome to the wonderful world of Ruby!

e.g, name convention, indent, etc.
I usually indent with two spaces per level but I guess that’s a matter
of taste.

require ‘test/unit’

this class is to remove all the empty and blank lines.

Usage: ruby Blanker.rb input.txt output.txt

class BlankLineRemover
def initialize()
puts “BlankLineRemover initialized.”

I wouldn’t put an unconditional output to stdout here since users of
the class might not be interested in that information.

end

def generate(input_filename, output_filename)

reader = File.read(input_filename)

You are reading the whole file into memory which can lead to dramatic
failure if the file is large. However, it’s not needed since you can
easily pull this off by reading one line at a time.

f = File.new(output_filename, “w”)

f.puts remove(reader)

f.close

Here you are not using the block form of File.open which will leave an
open file descriptor in case of exceptions being thrown from method
#remove.
http://blog.rubybestpractices.com/posts/rklemme/001-Using_blocks_for_Robustness.html

end

def remove(stringToremove)

regEx = /^[\s]*$\n/

You do not need the character class. So you can do /^\s*$\n/.
However I would pick another approach. I would just read lines and
decide based on regular expression whether I write them to the output
file. (see below)

stringToReturn = stringToremove.gsub(regEx, ‘’)

It’s more efficient to use the regular expression directly in the gsub
call.

stringToReturn.strip!

Basically #strip is sufficient - you don’t need the #gsub:

irb(main):002:0> " \n".strip
=> “”
irb(main):003:0> " \t \n".strip
=> “”

For methods and local variables we usually use “string_to_return”
instead of camel case.

if stringToReturn.length >= 1
return stringToReturn
else
return “”
end

This distinction is useless and even a bit harmful: the else branch
will always create a new instance which won’t happen if you simply
return stringToReturn.

sInput2=“”
sExpectedOutput2 = “”

sInput3=“\n\t\t\n”
sExpectedOutput3 = “”

I would not define these as variables but directly use the literals in
method calls. Much more readable because the information is more
local.

exit
end

simpleRemover = BlankLineRemover.new

simpleRemover.generate(ARGV[0],ARGV[1])

==================================================================

Attachments:
http://www.ruby-forum.com/attachment/6616/Blanker.rb

=== SPOILER ALERT ===

For just removing empty lines a whole class is not needed and might be
a bit heavy. You can do it in a few lines like this

File.open ARGV[1], “w” do |out|
File.foreach ARGV[0] do |line|
line.strip!
out.puts line unless line.empty?
end
end

:slight_smile:

It’s a bit more robust to do it like this though, because then you
avoid opening (and thus creating) the output file if the input is
missing:

File.open ARGV[0] do |in|
File.open ARGV[1], “w” do |out|
in.each do |line|
line.strip!
out.puts line unless line.empty?
end
end
end

If you want to get more fancy and decouple the reading and writing
from the transformation you could define a method for the purpose of
reading and writing and leave the transformation and decision whether
to output to a block passed:

def copy_file(file_in, file_out)
File.open file_out, “w” do |out|
File.foreach file_in do |line|
line = yield(line) and out.puts line
end
end
end

copy_file ARGV[0], ARGV[1] do |line|
line.strip!
line.empty? ? nil : line
end

… or for inserting comments (i.e. prefix each line with "# ")

copy_file ARGV[0], ARGV[1] do |line|
"# " << line
end

But I’m digressing… Hope that was not too much detail. :-}

Kind regards

robert