Handling of arrays

owc · May 12, 2008, 11:00am

A snippet of my code is as follows:
require 'fileutils'

$selections = ["",""]
$file_exception = ["RiskViewer*","*.xls"]

$source = ["C:/Test", "C:/Test"]

$dest = ["U:/Test","U:/Test"]

sd_a=$source.zip($dest,$selections,$file_exception)

sd_a.each do |sd|
  $source, $destination, $selections, $file_exception = sd
  src = File.join $source, $selections
  puts src
  d= $d1
  dst= File.join $destination, d
  test = File.join $source, $file_exception
  src1 = Dir.glob(src) - Dir.glob(test)

  Dir.glob(src1) do |file|
    FileUtils.mv file, dst
  end
end

I’m developing a script to move files to the dest paths. However, I can
only put one exception for each file path, and in some scenarios I’ll
need to have 2 exceptions for one file path, hence the above code.
But the problem with the code is that it executes twice: the first pass
moves everything except the RiskViewer files, so by the time the second
pass runs, all the files have already been moved.

So is there any way for the script to check for 2 or more exceptions
before executing the move command, presuming that we still use arrays?
Thanks in advance for any help rendered!

owc · May 12, 2008, 2:32pm

On Mon, May 12, 2008 at 11:00 AM, Clement Ow wrote:

sd_a.each do |sd|
$source, $destination, $selections, $file_exception = sd
src = File.join $source, $selections
puts src
d= $d1
dst= File.join $destination, d
test = File.join $source, $file_exception

Untested but change this:

 src1 = Dir.glob(src) - Dir.glob(test)

to:

src1 = $file_exception.inject(Dir.glob(src)) {|result, ex| result - Dir.glob(ex)}
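
A hedged sketch of the subtraction approach follows. As written, the suggested line globs each exception relative to the current directory, so the sketch assumes each exception should first be joined onto the source directory, the way the original `test` variable was built. The helper name `files_except` and the file names are made up for illustration.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: list the files in +dir+ matching +selection+, minus
# everything matched by any glob pattern in +exceptions+.
def files_except(dir, selection, exceptions)
  exceptions.inject(Dir.glob(File.join(dir, selection))) do |result, ex|
    # Join each exception onto the directory so it is globbed in the same
    # place as the selection, then subtract those paths.
    result - Dir.glob(File.join(dir, ex))
  end
end

# Demo in a throwaway directory with made-up file names.
Dir.mktmpdir do |dir|
  %w[RiskViewer1.txt report.xls notes.txt].each do |name|
    FileUtils.touch(File.join(dir, name))
  end
  kept = files_except(dir, "*", ["RiskViewer*", "*.xls"])
  p kept.map { |f| File.basename(f) }   # only notes.txt is left
end
```

Because both exceptions are applied before anything is moved, a single pass over the source directory is enough.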

Also, doing a Dir.glob for each exception can be expensive. Why not use
regular expressions to remove entries from the result of the first glob?
You will have to change the exceptions a little bit, but it might be
worth it:

[".txt", ".sql"].inject(Dir.glob("/home/jesus/*")) {|result, ex| result.reject{|x| x =~ Regexp.new(ex)}}

This removes from the listing of my home folder all files that match “.txt” or “.sql”.
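
One caveat, shown as a small sketch with made-up file names: in a regexp an unescaped "." matches any character, and an unanchored pattern can match anywhere in the name. Escaping the dot and anchoring with \z is one way to restrict the match to the extension; whether that is what you want depends on your exceptions.

```ruby
# Hypothetical file names standing in for the result of a single Dir.glob.
names = ["notes.txt", "dump.sql", "txt_to_sql.rb", "readme"]

# Unanchored patterns, as in the one-liner above: "." matches any character
# and the pattern may match anywhere in the name.
loose = [".txt", ".sql"].inject(names) do |result, ex|
  result.reject { |x| x =~ Regexp.new(ex) }
end

# Escaped dot plus the end-of-string anchor \z: only real extensions match.
strict = ['\.txt\z', '\.sql\z'].inject(names) do |result, ex|
  result.reject { |x| x =~ Regexp.new(ex) }
end

p loose   # ["readme"]  ("txt_to_sql.rb" is dropped because "_sql" matches)
p strict  # ["txt_to_sql.rb", "readme"]
```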

Hope this helps,

Jesus.

owc · May 13, 2008, 11:42am

On Tue, May 13, 2008 at 5:22 AM, Clement Ow wrote:

 dst= File.join $destination, d
file exception?
If what you want is to associate different information with each source
path, I would look into a hash of hashes, a hash of arrays, or a hash of
structs, where you could have a complex object for the info related to
an entity of your program. For example:

A hash of arrays, if you only need exceptions:

exceptions_by_path = {}
exceptions_by_path["/home/jesus"] = [".txt", ".sql"]
exceptions_by_path["/home/jesus/applications"] = [".sh", ".bin"]

[…]

and then

paths.each do |path|
  files = exceptions_by_path.inject(Dir.glob("#{path}/*")) {|result, ex| result.reject{|x| x =~ Regexp.new(ex)}}
end

If for a path you need several different things, I would go with a
Struct or a custom class:

FileInfo = Struct.new :exceptions, :other_value, :yet_another
paths_info = {}
paths_info["/home/jesus"] = FileInfo.new([".txt", ".sql"], "other value", "another")

and then access the exceptions array as
paths_info["/home/jesus"].exceptions
and use it as before.

Hope this helps,

Jesus.

owc · May 13, 2008, 5:22am

Jesús Gabriel y Galán wrote:

Hi Jesus,
nice slick code there using regexps. Just wondering if it would be
possible to have a different set of file exceptions for different file
paths, because at the moment there can only be one set of file
exceptions for every file path, which might be hard in the event that
we need to cater to different source paths. Probably a few arrays in
file exception?

owc · May 14, 2008, 6:38am

Jesús Gabriel y Galán wrote:

Hope this helps,

Jesus.

Thanks Jesus, that was really helpful! However, I altered the code a
little because Ruby gave me an error saying that it can’t convert Array
into String (hmm, I don’t know if that’s normal) for this line (I
suppose it’s because of the arrays in the hash, exceptions_by_path):

files = exceptions_by_path.inject(Dir.glob("#{path}/*")) {|result, ex| result.reject{|x| x =~ Regexp.new(ex)}}

So I decided to name my arrays in the hash file_exception[0],
file_exception[1] … and used this instead with an incrementing index:

i = 0
src1 = $file_exception[i].inject(Dir.glob(src)) {|result, ex| result.reject{|x| x =~ Regexp.new(ex)}}
i = 1 + i

But I had the problem of accidentally keying in a wrong path name, and
nothing was shown for the 2nd source path. (Only after troubleshooting
for a whole hour did I realise.) So is there any way to have a
condition, something like: if result == nil, puts “wrong pathname”?
(Just an idea, because I tried it and it doesn’t work.)
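
A minimal sketch of such a check, for what it is worth: Dir.glob returns an empty array rather than nil when nothing matches, so a comparison with nil never fires. The sketch below tests the directory with File.directory? and the result with empty? instead. The helper name `checked_glob` and the messages are made up for illustration.

```ruby
# Hypothetical helper: glob +pattern+ inside +dir+, warning on stderr when
# the directory does not exist or when nothing matches.
def checked_glob(dir, pattern)
  unless File.directory?(dir)
    warn "wrong pathname: #{dir}"
    return []
  end
  files = Dir.glob(File.join(dir, pattern))
  warn "no files match #{pattern} in #{dir}" if files.empty?
  files
end

# A mistyped path now produces a warning instead of silently doing nothing.
checked_glob("C:/Tset", "*")
```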

And by the way, don’t get me wrong, Struct is sweet too, it’s just that
I didn’t want to have a fixed number of exceptions for each path.

Thanks again.

owc · May 14, 2008, 3:11pm

On Wed, May 14, 2008 at 6:38 AM, Clement Ow wrote:

files = exceptions_by_path.inject(Dir.glob("#{path}/*")) {|result, ex| result.reject{|x| x =~ Regexp.new(ex)}}

Sorry, that’s what I get for not testing the code. I think (I’m a bit
dense right now, so this might not work) that what I meant was this:

exceptions_by_path = {}
exceptions_by_path["/home/jesus"] = [".txt", ".sql"]
exceptions_by_path["/home/jesus/applications"] = [".sh", ".bin"]

[…]

and then

paths.each do |path|
  files = exceptions_by_path[path].inject(Dir.glob("#{path}/*")) {|result, ex| result.reject{|x| x =~ Regexp.new(ex)}}
end

so you get the exceptions for that path, which is an Array that should
work with inject as I was expecting.
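
Put together as a runnable sketch, under some assumptions: the helper name `files_by_path`, the temporary directory layout, and the anchored patterns are all made up for illustration, and the hash is iterated directly instead of going through a separate paths array.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: for each path in the hash, list its entries and drop
# those matching any of that path's exception patterns.
def files_by_path(exceptions_by_path)
  exceptions_by_path.each_with_object({}) do |(path, exceptions), files|
    files[path] = exceptions.inject(Dir.glob("#{path}/*")) do |result, ex|
      result.reject { |x| x =~ Regexp.new(ex) }
    end
  end
end

# Demo in a throwaway directory tree with made-up file names.
Dir.mktmpdir do |root|
  apps = File.join(root, "applications")
  FileUtils.mkdir(apps)
  FileUtils.touch([File.join(root, "a.txt"), File.join(root, "b.rb"),
                   File.join(apps, "run.sh"), File.join(apps, "tool.rb")])

  result = files_by_path({ root => ['\.txt\z', '\.sql\z'],
                           apps => ['\.sh\z', '\.bin\z'] })
  result.each do |path, files|
    puts "#{path}: #{files.map { |f| File.basename(f) }.inspect}"
  end
end
```

Each path carries its own array of exceptions, and the arrays can have different lengths.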

And by the way, don’t get me wrong, Struct is sweet too, it’s just that
I didn’t want to have a fixed number of exceptions for each path.

FileInfo = Struct.new :exceptions, :other_value, :yet_another
paths_info = {}
paths_info["/home/jesus"] = FileInfo.new([".txt", ".sql"], "other value", "another")

The number of exceptions is not fixed, as you can see above,
I am storing an array in the :exceptions field, so each path can
have a different number of exception patterns.

Jesus.

owc · May 14, 2008, 3:13pm

On Wed, May 14, 2008 at 9:46 AM, Clement Ow wrote:

Also, I have problems making some file exceptions. For example, I have
some files whose names start with 2007, something like 20070131, and a
file called “Risk 20070131”, but when I put 2007 in the file_exception
variable, it deselects any file that has 2007 anywhere in the filename,
which is not what I want. I know that the regexp doesn’t allow something
like 2007* to be put in file_exception to deselect any file which
starts with 2007. Any ideas at all, anyone?

I’m not sure if I’m understanding you correctly, but you can tweak the
regexps so that they actually match what you want. If you want
to deselect only the files that start with 2007 you can do this:

irb(main):001:0> a = %w{20071212 20073445 risk20072341}
=> ["20071212", "20073445", "risk20072341"]
irb(main):003:0> a.reject {|x| x =~ /\A2007/}
=> ["risk20072341"]

Jesus.

owc · May 14, 2008, 9:46am

Also, I have problems making some file exceptions. For example, I have
some files whose names start with 2007, something like 20070131, and a
file called “Risk 20070131”, but when I put 2007 in the file_exception
variable, it deselects any file that has 2007 anywhere in the filename,
which is not what I want. I know that the regexp doesn’t allow something
like 2007* to be put in file_exception to deselect any file which
starts with 2007. Any ideas at all, anyone?

Regards

owc · May 15, 2008, 3:47am

Sorry, that’s what I get for not testing the code. I think (I’m a bit
dense right now, so this might not work) that what I meant was this:

Nah, it’s fine, we all know it can get a little tiring sometimes. As it
is with this script I’m writing, lol.

  files = exceptions_by_path[path].inject(Dir.glob("#{path}/*")) {|result, ex| result.reject{|x| x =~ Regexp.new(ex)}}
end

so you get the exceptions for that path, which is an Array that should
work with inject as I was expecting.

Anyway, I get what you mean, but because I’m using a config file to hold
all the source paths, dest paths and file exceptions, which are keyed in
by the user, I don’t want to make it too complicated to fill in the
paths, so I decided to just use numbers to name the arrays in the hash.

I’m not sure if I’m understanding you correctly, but you can tweak the
regexps so that they actually match what you want. If you want
to deselect only the files that start with 2007 you can do this:

irb(main):001:0> a = %w{20071212 20073445 risk20072341}
=> ["20071212", "20073445", "risk20072341"]
irb(main):003:0> a.reject {|x| x =~ /\A2007/}
=> ["risk20072341"]

Oh yes, I did try this, but somehow it doesn’t work (scratches head). It
just shows all files to be moved, so it obviously did not apply the
exceptions. But if it’s done this way, there won’t be any point in
keying in the various different exceptions any more, hence I got stuck.

owc · May 15, 2008, 5:05am

Oh, I realise what was wrong with the matching of this regexp, because in:

  files = exceptions_by_path[path].inject(Dir.glob("#{path}/*")) {|result, ex| result.reject{|x| x =~ Regexp.new(ex)}}
end

After much playing around with this statement, result here is the whole
path name, which apparently doesn’t match the filename itself, hence
/\A2007/ is not able to work. (I probably need to use
File.basename(result).) But can you do me a favour and explain what this
statement means, as I don’t know how the variables in it get assigned?
Thanks!

owc · May 15, 2008, 9:45am

Clement Ow wrote:

Oh yes, I did try this, but somehow it doesn’t work (scratches head). It
just shows all files to be moved, so it obviously did not apply the
exceptions. But if it’s done this way, there won’t be any point in
keying in the various different exceptions any more, hence I got stuck.

Oh, I realise what was wrong with the matching of this regexp, because in:

  files = exceptions_by_path[path].inject(Dir.glob("#{path}/*")) {|result, ex| result.reject{|x| x =~ Regexp.new(ex)}}
end

I found out what is actually going on in this statement. Because x is
the whole path name, e.g. //sins1234/home/file_name, the regexp /\A2007/
doesn’t match the beginning of the string. Also, the exceptions array
must contain “\A2007” when passing it into Regexp.new. So my code now
looks like this:

src1 = $file_exception[i].inject(Dir.glob(src)) {|result, ex| result.reject {|x| File.basename(x) =~ Regexp.new(ex, Regexp::IGNORECASE)}}
src1.each do |file|
  # do sth
end
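
A small self-contained sketch of the same idea, assuming the exceptions are written as Ruby string literals: single quotes keep the backslash, whereas the double-quoted "\A2007" turns into "A2007" before Regexp.new ever sees it. The paths below are made up for illustration.

```ruby
# Hypothetical full paths standing in for the result of Dir.glob(src).
paths = ["//sins1234/home/20070131",
         "//sins1234/home/Risk 20070131",
         "//sins1234/home/2007 notes.TXT"]

# Single quotes keep the backslash, so Regexp.new receives \A2007.
exceptions = ['\A2007']

# Match against the file name only, not the whole path.
kept = exceptions.inject(paths) do |result, ex|
  result.reject { |x| File.basename(x) =~ Regexp.new(ex, Regexp::IGNORECASE) }
end

p kept   # ["//sins1234/home/Risk 20070131"]
```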

However, for education’s sake, do you mind explaining how the whole
inject statement works? Thanks!

owc · May 16, 2008, 8:50am

On Thu, May 15, 2008 at 9:45 AM, Clement Ow wrote:

Clement Ow wrote:

However, for education’s sake, do you mind explaining how the whole
inject statement works? Thanks!

Enumerable#inject is a very powerful iterator (in my opinion at
least). What it does is iterate over all elements in an enumerable,
yielding to the block an accumulator and the next element in the
enumerable. The accumulator then gets updated by the result of the
block, so the next iteration will be yielded that value. If you specify
a parameter to inject, that will be the first accumulator. If not, the
first element of the enumerable is used instead. Some examples:

irb(main):003:0> [1,2,3].inject(0) {|total,x| p [total, x]; total + x}
[0, 1]
[1, 2]
[3, 3]
=> 6
irb(main):004:0> [1,2,3].inject {|total,x| p [total, x]; total + x}
[1, 2]
[3, 3]
=> 6

Another one (although this is just to show how inject works, cause
the functionality would be better achieved by map):

irb(main):011:0> [1,2,3,4,5].inject([]) {|total,x| p [total,x]; total + [x**2]}
[[], 1]
[[1], 2]
[[1, 4], 3]
[[1, 4, 9], 4]
[[1, 4, 9, 16], 5]
=> [1, 4, 9, 16, 25]

The p [total,x]; helps in showing what gets passed to the block each
time. Just remember: the result of the block will be the next “total”.

In our case, the result of the block was the original array minus the
files that matched the exceptions. So each time that array was injected
(well, a copy) along with the next exception, and the result of the
block would be another array with fewer elements, etc.
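
To make that concrete, here is a sketch that writes the same filtering once with inject and once as an explicit loop; the file names and patterns are made up for illustration.

```ruby
names = ["a.txt", "b.sql", "c.rb"]
exceptions = ['\.txt\z', '\.sql\z']

# inject version, as used earlier in the thread.
with_inject = exceptions.inject(names) do |result, ex|
  result.reject { |x| x =~ Regexp.new(ex) }
end

# Equivalent explicit loop: result starts as the initial value and is
# replaced by the block's return value on every pass.
result = names
exceptions.each do |ex|
  result = result.reject { |x| x =~ Regexp.new(ex) }
end

p with_inject   # ["c.rb"]
p result        # ["c.rb"]
p names         # ["a.txt", "b.sql", "c.rb"], since reject returns a new array
```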

Hope this helps,

Jesus.

owc · May 16, 2008, 8:54am

On Fri, May 16, 2008 at 8:49 AM, Jesús Gabriel y Galán wrote:

The accumulator then gets updated by
the result of the block, so the next iteration will be yielded that
value.

I have realized that this sentence can be confusing: the accumulator
doesn’t get updated. The next value for the accumulator will be the
result of the block, not necessarily the same object.

I have read many times that you shouldn’t use the same accumulator by
applying destructive methods to it, but I can’t remember what the pros
and cons were. So this should not be done:

irb(main):012:0> [1,2,3].inject([]) {|total,x| total << x**2}
=> [1, 4, 9]

Instead you should do this:

irb(main):013:0> [1,2,3].inject([]) {|total,x| total + [x**2]}
=> [1, 4, 9]

Maybe someone can chime in and explain this a little bit better?

Jesus.

owc · May 16, 2008, 9:19am

2008/5/16 Jesús Gabriel y Galán:

On Fri, May 16, 2008 at 8:49 AM, Jesús Gabriel y Galán wrote:

The accumulator then gets updated by
the result of the block, so the next iteration will be yielded that
value.

I have realized that this sentence can be confusing: the accumulator doesn’t
get updated. The next value for the accumulator will be the result of the block,
not necessarily the same object.

Correct.

I have read many times that you shouldn’t use the same accumulator by
applying destructive methods to it, but I can’t remember what the pros and
cons were.

Do you remember where you read that?

Maybe someone can chime in and explain this a little bit better?

Sorry, but this is nonsense. It’s completely safe and even reasonable
to reuse an accumulator value. Your second solution creates new
Arrays all the time and then throws them away. It is much more
efficient to use Array#<< as in your first example.

If, of course the original accumulator value must not be changed
because side effects will do harm, then of course you cannot modify it
but need to create new objects. But in the scenario above, where the
Array is solely created for #inject it is the most reasonable thing to
directly append.
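
A short sketch of the difference, with made-up variable names: appending with << is harmless when the initial array exists only for the inject call, but it mutates an array that is also used elsewhere.

```ruby
# Fresh array created only for inject: appending in place is safe.
squares = [1, 2, 3].inject([]) { |total, x| total << x**2 }

# Pre-existing array used as the initial value: << mutates it, and the
# return value is the very same object.
seed = [0]
appended = [1, 2, 3].inject(seed) { |total, x| total << x**2 }

# Non-destructive version: + builds a new array on every step and leaves
# the initial value alone.
untouched = [0]
combined = [1, 2, 3].inject(untouched) { |total, x| total + [x**2] }

p squares                # [1, 4, 9]
p seed                   # [0, 1, 4, 9]
p appended.equal?(seed)  # true
p untouched              # [0]
p combined               # [0, 1, 4, 9]
```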

Kind regards

robert

owc · May 16, 2008, 9:41am

On Fri, May 16, 2008 at 9:18 AM, Robert K. wrote:

2008/5/16 Jesús Gabriel y Galán:

I have read many times that you shouldn’t use the same accumulator by
applying destructive methods to it, but I can’t remember what the pros and
cons were.

Do you remember where you read that?

No, I probably misunderstood something.

Maybe someone can chime in and explain this a little bit better?

Sorry, but this is nonsense. It’s completely safe and even reasonable
to reuse an accumulator value. Your second solution creates new
Arrays all the time and then throws them away. It is much more
efficient to use Array#<< as in your first example.

Yep, I saw that and that’s why I refused to even try to explain it

If, of course the original accumulator value must not be changed
because side effects will do harm, then of course you cannot modify it
but need to create new objects.

This might be what I had in mind.

But in the scenario above, where the
Array is solely created for #inject it is the most reasonable thing to
directly append.

It’s clear that the example injecting a newly created array makes the
above explanation even worse :-).

Thanks !

Jesus.