Iteration through File.file? misses entries for which File.file?(entry) == true

Kyle_Barbour · August 2, 2010, 2:36pm

Hello everyone,

I’m new to Ruby, so I’m most likely making an elementary mistake.
However, searching Google didn’t help me find my answer, and after
mucking around with irb I still haven’t figured it out.

I wrote a method that was intended to take all of the files in a given
directory and put them into an array. Here’s the code I wrote:

def getFiles(dir)
pwdFiles = Array.new

Dir.foreach(dir) do |entry|
    pwdFiles.push(entry) if File.file?(entry) == true
end

end

This works fine when called on the working directory — that is,
getFiles(Dir.pwd). However, when I call it on a subdirectory — for
example, getFiles(“doc”) — it only returns one file. I tested to
make sure that the files being ignored in the subdirectory actually
gave “true” when tested with File.file?(entry), which they do.

Anyone know what I’m doing wrong?

Kyle

Kyle_Barbour · August 2, 2010, 3:05pm

Kyle Barbour wrote:

Hello everyone,

I’m new to Ruby, so I’m most likely making an elementary mistake.
However, searching Google didn’t help me find my answer, and after
mucking around with irb I still haven’t figured it out.

I wrote a method that was intended to take all of the files in a given
directory and put them into an array. Here’s the code I wrote:

def getFiles(dir)
pwdFiles = Array.new
Dir.foreach(dir) do |entry|
    pwdFiles.push(entry) if File.file?(entry) == true
end
end

This works fine when called on the working directory — that is,
getFiles(Dir.pwd). However, when I call it on a subdirectory — for
example, getFiles(“doc”) — it only returns one file. I tested to
make sure that the files being ignored in the subdirectory actually
gave “true” when tested with File.file?(entry), which they do.

Anyone know what I’m doing wrong?

I’m guessing you need to add this line to the end of your method:

return pwdFiles

or just

pwdFiles

since the return value of a method is the value of the last expression
evaluated.

As an alternative, Dir[“etc/*”] builds an Array directly. So you could
have:

def get_files(dir)
Dir["#{dir}/*"].select { |x| File.file?(x) }
end

(Aside: getFiles works fine, but get_files is a more conventional Ruby
method name. Use CamelCase for constants/classes)

Kyle_Barbour · August 2, 2010, 3:06pm

Oh, and another hint: try things out in irb to find out what’s
happening.

/$ irb --simple-prompt

Dir.chdir("/")
=> 0

Dir.foreach(“etc”) { |x| puts x }
.
…
fstab
X11
<>
=> nil

Kyle_Barbour · August 2, 2010, 3:24pm

    pwdFiles.push(entry) if File.file?(entry) == true

While not an answer to your question, I thought I’d comment on the line
above – you don’t need to add the “== true”.

pwdFiles.push(entry) if File.file?(entry)

A little more:

Now, this is just me, but I also try to avoid writing negative logic…
if I wanted only the items where File.file?(entry) was false, I would
write:

pwdFiles.push(entry) unless File.file?(entry)

instead of

pwdFiles.push(entry) if not File.file?(entry)
or
pwdFiles.push(entry) if File.file?(entry) == false

Please note: this is all just personal preference – your code will work
just fine. I only mention it as you indicated you were new, and I
thought I’d share.

Kyle_Barbour · August 2, 2010, 3:27pm

James H.on wrote:

The symbolic constant for truth is True, not true.

That is incorrect, as 5 seconds with irb will show you.

$ irb --simple-prompt

true
=> true

True
NameError: uninitialized constant True
from (irb):2

(You may have been thinking of TrueClass and FalseClass, perhaps?)

In either case, because File.file? returns either True or False, you can
drop the comparison:

It returns true or false, but yes the comparison can be removed.

If this is truly a method in an object, though, bear in mind scoping
issues. pwdFiles is only available inside this method definition. When
you declare it in your intialize statement, prepend with a
scope-changing symbol. The most common in this case is @

def initialize
#will contain all file entries in the directory
@pwdFiles = []
end

@ is not a “scope changing symbol”. @pwdFiles is an instance variable of
the object, whereas pwdFiles is a local variable. They are completely
different concepts.

Since the OP wanted to return the value from the getFiles method, and
not set any instance variables in the object, then a local variable is
appropriate here.

Kyle_Barbour · August 2, 2010, 3:21pm

def getFiles(dir)
pwdFiles = Array.new

Dir.foreach(dir) do |entry|
pwdFiles.push(entry) if File.file?(entry) == true
end
end

The symbolic constant for truth is True, not true.

In either case, because File.file? returns either True or False, you can
drop the comparison:

def getFiles(dir)
pwdFiles = Array.new

Dir.foreach(dir) do |entry|
pwdFiles.push(entry) if File.file?(entry)
end
end

If this is truly a method in an object, though, bear in mind scoping
issues. pwdFiles is only available inside this method definition. When
you declare it in your intialize statement, prepend with a
scope-changing symbol. The most common in this case is @

def initialize
#will contain all file entries in the directory
@pwdFiles = []
end

def get_files(dir)
@pwdFiles = Array.new

Dir.foreach(dir) do |entry|
    @pwdFiles.push(entry) if File.file?(entry) == true
end

end

And pay attention to the advice about camelCase versus snake_case for
method names: because Ruby’s duck typed, it’s helped me out sometimes to
be able to glance at my code and tell what is a variable and what isn’t
very quickly.

Best

James

Kyle_Barbour · August 2, 2010, 3:54pm

If you’re going to write an email and be wrong, be wrong on every point,
that’s what I was always taught and I’m sticking by it goddamit!

On Aug 2, 2010, at 7:28 AM 8/2/10, Brian C. wrote:

from (irb):2

(You may have been thinking of TrueClass and FalseClass, perhaps?)

You’re 100% correct. Total brainfart.

In either case, because File.file? returns either True or False, you can
drop the comparison:

It returns true or false, but yes the comparison can be removed.

@ is not a “scope changing symbol”. @pwdFiles is an instance variable of
the object, whereas pwdFiles is a local variable. They are completely
different concepts.

Since the OP wanted to return the value from the getFiles method, and
not set any instance variables in the object, then a local variable is
appropriate here.

In future, I’ll have to stop myself whenever the thought “that mightn’t
be the right word” crops through my head. I figured it’d be close enough
to get the idea across, but given that a symbol is actually a language
construct I should’ve taken a few more seconds out.

In either case, you’ve hit something that interests me. Conceptually,
I’d understood that the difference between an @-prepended variable and a
non-@-prepended variable is that the former is global to the object, so
it’s not impacted by the change in scope introduced by defining a method
inside the object. So I figured that describing the @ character as
changing the internal scope of a variable from local to global inside
the object wouldn’t be entirely missing the mark. But it’s possible I’m
missing something.

Thanks for the clarifications and corrections, dude.

Best

James

Kyle_Barbour · August 2, 2010, 4:01pm

Kyle Barbour wrote:

def getFiles(dir)
pwdFiles = Array.new
Dir.foreach(dir) do |entry|
    pwdFiles.push(entry) if File.file?(entry) == true
end
end

That can’t work since Dir.foreach yields only the filenames of the files
in the directory to the block, not the directory the files are actually
in. So suppose you have:

dir/my_file
dir/my_second_file

By calling Dir.foreach(“dir”) you’ll get “my_file” and “my_second_file”.
Your File.file? statement then checks for the presence of the filenames
in the current working directory – where they don’t reside. Solution:
Append “dir” to the filename.

def get_files(dir)
pwd_files = Array.new

Dir.foreach(dir) do |entry|
path = File.join(dir, entry)
pwd_files.push(path) if File.file?(path)
end
pwd_files
end

And please use snake_case for your method and variable names.

Marvin

Kyle_Barbour · August 2, 2010, 4:15pm

James H.on wrote:

In either case, you’ve hit something that interests me. Conceptually,
I’d understood that the difference between an @-prepended variable and a
non-@-prepended variable is that the former is global to the object, so
it’s not impacted by the change in scope introduced by defining a method
inside the object. So I figured that describing the @ character as
changing the internal scope of a variable from local to global inside
the object wouldn’t be entirely missing the mark. But it’s possible I’m
missing something.

They are different in both scope and lifetime.

Instance variables (@foo) are a property of the object instance itself,
and once set, they remain for as long as the object exists, or until
reassigned. Calls to methods of the same object will see the same value
of @foo - even multiple concurrent calls in different threads.

Local variables are created in an activation record which is created
when the method is invoked, and so the values are not tied to the object
itself. If there are multiple calls to the method from multiple threads,
they will have their own activation records, and so independent local
variables. Normally the activation record is garbage-collected when the
method returns.

Now, I’m not sure I should be complicating things here, but activation
records may have extended lifetimes too, when closures are involved.

class Foo
def initialize(n)
@n = n
end
def make_adder(y)
lambda { |x| y + x }
end
end

f = Foo.new(1) # #<Foo @n=1>

a1 = f.make_adder(10)
a2 = f.make_adder(20)
a1.call(3) # 13
a1.call(4) # 14
a2.call(5) # 25

y is a local variable in make_adder, but it persists in the lambdas,
even after make_adder has returned. The two lambdas, a1 and a2, have
different values bound to y. But there is still only a single object of
class Foo.

My apologies if this has confused matters further But hopefully it
highlights how different these things are.

Regards,

Brian.

Kyle_Barbour · August 2, 2010, 10:10pm

Thanks for all the responses! I feel so thoroughly helped

On Aug 2, 7:01 am, Marvin Gülker [email protected] wrote:

By calling Dir.foreach(“dir”) you’ll get “my_file” and “my_second_file”.
Your File.file? statement then checks for the presence of the filenames
in the current working directory – where they don’t reside. Solution:
Append “dir” to the filename.

Martin - thanks for the diagnosis, that makes perfect sense. Awesome.
That works very well and solves my problem!

On Aug 2, 6:05 am, Brian C. [email protected] wrote:

As an alternative, Dir[“etc/*”] builds an Array directly. So you could
have:

def get_files(dir)
Dir[“#{dir}/*”].select { |x| File.file?(x) }
end

Interesting, thanks! Is there any reason to prefer either this
solution or the one Martin gave in the above post?

Also: everyone, thanks for the newbie friendliness and for the general
advice (snake_case, negative logic, etc.). Thoroughly appreciated.

Kyle

Kyle_Barbour · August 2, 2010, 10:55pm

Kyle Barbour wrote:

Interesting, thanks! Is there any reason to prefer either this
solution or the one Martin gave in the above post?

I’d prefer the Dir[] variant, since it’s more compact. And if you want
to search for files via wildcards, it’s the best way to go I think.

Oh, and one last note: My name is “Marvin”, not “Martin”. Note the v.

Vale,
Marvin