Alternative Bash commands in Ruby

Hello.

I’m learning Ruby. And I have a question with respect to working with
files. How can I get the number of rows in a file similar to the wc -l
from Bash?

If you want to make coding simple, you could slurp the file into a Ruby
array (see IO.readlines) and take the length of the array.

For production code this is not advisable, though, because you need to
hold the whole file in memory (unless you know from the architectural
design that the files won’t be very large). A better way would therefore
be to loop through the file (for instance using IO.foreach) and count
the rows.
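
A minimal sketch of both approaches, for comparison. The helper names count_lines_slurp and count_lines_stream are made up for this example, and the temporary file only serves as demo input:

```ruby
require "tempfile"

# Slurping variant: the whole file is read into an array first.
def count_lines_slurp(path)
  IO.readlines(path).length
end

# Streaming variant: IO.foreach yields one line at a time, so memory
# use does not grow with the size of the file.
def count_lines_stream(path)
  IO.foreach(path).count
end

# Small demonstration on a temporary file with three lines.
Tempfile.create("demo") do |f|
  f.write("line 1\nline 2\nline 3\n")
  f.flush
  puts count_lines_slurp(f.path)   # 3
  puts count_lines_stream(f.path)  # 3
end
```

Both return the same number; the difference only shows up in memory use on large files.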

A note aside: bash does NOT have a ‘wc -l’ command. wc is a program
which exists independently from bash. Therefore you could also shell out
and call wc from Ruby, i.e.

number_of_lines = %x(wc -l <#{myfile}).to_i

but creating a subprocess just for this purpose is probably not worth
the effort. Also, error handling will be more tricky. If you do it
entirely within Ruby, you get an exception if, for instance, the file
does not exist.
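
To illustrate the error handling point, here is a rough sketch. The helper name safe_line_count and the choice to return nil for a missing file are assumptions made for this example, not something the thread prescribes:

```ruby
# Hypothetical helper: returns the line count, or nil if the file
# does not exist.
def safe_line_count(path)
  IO.foreach(path).count
rescue Errno::ENOENT
  # Ruby raises Errno::ENOENT when the file is missing, so the failure
  # can be handled explicitly instead of being silently turned into 0.
  nil
end

p safe_line_count("/no/such/file/here")  # nil
```

With the shell-out variant, by contrast, a missing file just gives you an empty string from the subprocess, and "".to_i is 0, so you would have to inspect $? yourself to notice the failure.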

How can I get the number of rows in a file similar to
the wc -l from Bash?

Do you mean lines? Or rows separated by \t or similar
like in a .csv file or so?

Anyway, both are easy; several ways exist.

If the file is small, you can do:

File.readlines('path to file here').size + 1

(I think it has to be +1 since ruby arrays start
at 0 and lines would start to be counted at 1)

If you mean horizontal separators, this is also easy.
First decide which separator you need, then you
can .split() on it; for fancier input you may have
to use a regex, which .split() also accepts.
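
A small sketch of what that could look like. The sample strings and the particular regex are made up for illustration:

```ruby
# Tab-separated fields on one row.
line = "alice\t30\tberlin"
fields = line.split("\t")
p fields       # ["alice", "30", "berlin"]
p fields.size  # 3

# A regex handles mixed separators, here a comma or a semicolon
# with optional surrounding whitespace.
mixed = "a, b;c".split(/\s*[,;]\s*/)
p mixed        # ["a", "b", "c"]
```

For real CSV files with quoted fields, a plain split will not be enough, and a proper CSV parser is the safer route.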

You could essentially re-create all UNIX/Linux ecosystem
programs in ruby. It will be slower naturally, in particular
for larger datasets, but doable.

Robert H. wrote in post #1185276:

Do you mean lines? Or rows separated by \t or similar
like in a .csv file or so?

Anyway, both are easy; several ways exist.

If the file is small, you can do:

File.readlines('path to file here').size + 1

The OP surely means lines, because s/he refers to the wc command; but
I have one style (i.e. readability) question regarding your code: You
use File.readlines instead of IO.readlines. Of course, in both cases the
same method is actually invoked, but wouldn’t it be more natural to
write IO.readlines, since the method is defined in IO and not in File?

With class methods, I tend to write the name of the class where the
method is actually defined. That makes it easier to find if the reader
of your code wants to look it up in the Ruby core docs. The way you
wrote it, a reader would first look up readlines in the File class and,
not finding it there, hunt for it in the parent class.

Ronald

Robert H. wrote in post #1185276:
[…]

If the file is small, you can do:

File.readlines('path to file here').size + 1

(I think it has to be +1 since ruby arrays start
at 0 and lines would start to be counted at 1)

How do “start at 0” and “lines would start … at 1” lead to +1?

edited:

Sorry, never mind, I got put off by “number of rows” (whereas wc -l
prints the newline count).

Thank you for your responses!

Interesting remark.

The file has 4 lines (4/4):
line 1
line 2
line 3
line 4

wc -l command result:
$ cat testfile | wc -l
3

Ruby with and without " + 1":
$ ruby -e "puts File.readlines('testfile').size + 1"
5

$ ruby -e "puts File.readlines('testfile').size"
4

First, I just noticed that Robert H.'s idea of adding 1 was wrong. I
think he confused the semantics of “size” (number of elements in an
array) with the “index of the last element”. But that’s just a minor
point.

That wc reports a different value is most likely because the last line
of the file doesn’t end in a newline. You can easily demonstrate it
like this:

$ echo -n dd >tmp/x
$ od -cx tmp/x
0000000   d   d
       6464
0000002
$ wc -l tmp/x
0 tmp/x

As you can see, it reports zero lines, although the file is not empty.
One could argue that the man-page of wc is lying: wc doesn’t count the
lines of a file, it counts the newline characters found in the file. You
have to decide for your Ruby program how you define what a “line” in a
file is.
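
Here is a sketch of the two definitions side by side. The helper names newline_count and ruby_line_count and the 64 KiB chunk size are choices made for this example only:

```ruby
require "tempfile"

# Counts newline characters, which is what wc -l reports.
# Reads in fixed-size chunks so large files are not held in memory.
def newline_count(path)
  count = 0
  File.open(path, "rb") do |f|
    while (chunk = f.read(64 * 1024))
      count += chunk.count("\n")
    end
  end
  count
end

# Counts lines as Ruby's line iteration sees them: a final line
# without a trailing newline still counts as a line.
def ruby_line_count(path)
  File.foreach(path).count
end

Tempfile.create("demo") do |f|
  f.write("line 1\nline 2\nline 3\nline 4")  # no trailing newline
  f.flush
  puts newline_count(f.path)    # 3
  puts ruby_line_count(f.path)  # 4
end
```

This reproduces the 3 versus 4 from the test file above: the two numbers only differ when the last line is not terminated by a newline.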