Ruby Forum > Ruby > Handling of arrays

Posted by Clement Ow (owc)
on 12.05.2008 11:00
A snippet of my code is as follows:

  require 'fileutils'

  $selections = ["*", "*"]
  $file_exception = ["RiskViewer*", "*.xls"]
  $source = ["C:/Test", "C:/Test"]
  $dest = ["U:/Test", "U:/Test"]

  sd_a = $source.zip($dest, $selections, $file_exception)

  sd_a.each do |sd|
    $source, $destination, $selections, $file_exception = sd
    src = File.join $source, $selections
    puts src
    d = $d1   # $d1 is set elsewhere in the script (not shown)
    dst = File.join $destination, d
    test = File.join $source, $file_exception
    src1 = Dir.glob(src) - Dir.glob(test)

    Dir.glob(src1) do |file|
      FileUtils.mv file, dst
    end
  end

I'm developing a script to move files to the destination paths. However, I can only put one exception for each file path, but in some scenarios I'll need two exceptions for one file path, hence the code above.
The problem with the code is that it runs the loop twice: the first pass moves everything except the RiskViewer files, and by the second pass all the files have already been moved.

So is there any way for the script to check for two or more exceptions before executing the move command, assuming we still use arrays? Thanks in advance for any help rendered!
Posted by Jesús Gabriel y Galán (Guest)
on 12.05.2008 14:32
(Received via mailing list)
On Mon, May 12, 2008 at 11:00 AM, Clement Ow
<clement.ow@asia.bnpparibas.com> wrote:
>
>   sd_a.each do |sd|
>   $source, $destination, $selections, $file_exception = sd
>      src = File.join $source, $selections
>      puts src
>      d= $d1
>      dst= File.join $destination, d
>      test = File.join $source, $file_exception

Untested but change this:

>      src1 = Dir.glob(src) - Dir.glob(test)

to:

src1 = $file_exception.inject(Dir.glob(src)) {|result, ex| result - Dir.glob(ex)}
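A minimal sketch of that idea, assuming all the exceptions for one source path are kept together in one array (instead of being zipped one per path) and that each exception is a glob relative to the source directory. The helper name `files_to_move` is hypothetical, not from the thread; joining each exception to the source directory means full paths are subtracted from full paths.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: files under +source+ matching +selection+
# but none of the glob patterns in +exceptions+.
def files_to_move(source, selection, exceptions)
  candidates = Dir.glob(File.join(source, selection))
  exceptions.inject(candidates) do |result, ex|
    # Join the exception to the source dir so both sides are full paths.
    result - Dir.glob(File.join(source, ex))
  end
end

# Usage: both exceptions are applied in one pass over the same directory.
Dir.mktmpdir do |dir|
  %w[RiskViewer1.txt report.xls notes.txt].each do |name|
    FileUtils.touch(File.join(dir, name))
  end
  kept = files_to_move(dir, "*", ["RiskViewer*", "*.xls"])
  p kept.map { |f| File.basename(f) }   # => ["notes.txt"]
end
```

Only after this filtering would the script call `FileUtils.mv` on each remaining file, so nothing is moved before every exception has been checked.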

Also, doing a Dir.glob for each exception can be expensive. Why not use regular expressions to filter the result of the first glob instead? You will have to change the exceptions a little bit, but it might be worth it:

['\.txt', '\.sql'].inject(Dir.glob("/home/jesus/*")) {|result, ex|
  result.reject{|x| x =~ Regexp.new(ex)}}

This returns the listing of my home folder minus all files whose names match ".txt" or ".sql" (nothing is deleted from disk).
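A small sketch of the regexp filter that runs without touching the filesystem: the array `listing` is a made-up stand-in for `Dir.glob("/home/jesus/*")`. It also shows why the quoting of the patterns matters: in a double-quoted Ruby string `"\.txt"` is just `".txt"`, so the dot matches any character. The `\z` anchor (end of string) is my own addition, so that only the extension is matched.

```ruby
# Regexp sources in single quotes keep the backslash: '\.txt\z' means
# "a literal dot, then txt, at the end of the name".
exceptions = ['\.txt\z', '\.sql\z']

# Stand-in for Dir.glob("/home/jesus/*") so the sketch runs anywhere.
listing = ["/home/jesus/notes.txt", "/home/jesus/dump.sql",
           "/home/jesus/mytxt", "/home/jesus/app.rb"]

remaining = exceptions.inject(listing) do |result, ex|
  pattern = Regexp.new(ex)
  result.reject { |x| x =~ pattern }
end

p remaining        # => ["/home/jesus/mytxt", "/home/jesus/app.rb"]

# With double quotes the backslash disappears, so the pattern would be
# ".txt" and "mytxt" (any char + "txt") would be rejected as well.
p "\.txt" == ".txt"   # => true
```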

Hope this helps,

Jesus.
Posted by Clement Ow (owc)
on 13.05.2008 05:22
Jesús Gabriel y Galán wrote:
> On Mon, May 12, 2008 at 11:00 AM, Clement Ow
> <clement.ow@asia.bnpparibas.com> wrote:
>>
>>   sd_a.each do |sd|
>>   $source, $destination, $selections, $file_exception = sd
>>      src = File.join $source, $selections
>>      puts src
>>      d= $d1
>>      dst= File.join $destination, d
>>      test = File.join $source, $file_exception
> 
> Untested but change this:
> 
>>      src1 = Dir.glob(src) - Dir.glob(test)
> 
> to:
> 
> src1 = $file_exception.inject(Dir.glob(src)) {|result, ex| result -
> Dir.glob(ex)}
> 
> Also, doing a Dir.glob for each exception can be a lot, why not use
> regular expressions
> to remove from the result of the first glob? You will have to change
> the exceptions
> a little bit, but it might be worth it:
> 
> ["\.txt", "\.sql"].inject(Dir.glob("/home/jesus/*")) {|result, ex|
> result.reject{|x| x =~ Regexp.new(ex)}}
> 
> This removes from my home folder all files that match ".txt" and ".sql"
> 
> Hope this helps,
> 
> Jesus.

Hi Jesus,
Nice slick code there using regexps ;) I'm just wondering whether it's possible to have a different set of file exceptions for each file path, because at the moment the same set of file exceptions applies to every file path, which might be a problem when we need to cater to different source paths. Probably a few arrays in file exception?
Posted by Jesús Gabriel y Galán (Guest)
on 13.05.2008 11:42
(Received via mailing list)
On Tue, May 13, 2008 at 5:22 AM, Clement Ow
<clement.ow@asia.bnpparibas.com> wrote:
>  file exception?
If what you want is to associate different information with each source path, I would look into a hash of hashes, a hash of arrays, or a hash of structs, where you could have a complex object holding the info related to each entity of your program. For example:

A hash of arrays, if you only need exceptions:

exceptions_by_path = {}
exceptions_by_path["/home/jesus"] = ['\.txt', '\.sql']
exceptions_by_path["/home/jesus/applications"] = ['\.sh', '\.bin']
# [...]

and then

paths.each do |path|
   files = exceptions_by_path.inject(Dir.glob("#{path}/*")) {|result, ex| result.reject{|x| x =~ Regexp.new(ex)}}
end

If for a path you need several different things, I would go with a
Struct or a custom class:

FileInfo = Struct.new :exceptions, :other_value, :yet_another
paths_info = {}
paths_info["/home/jesus"] = FileInfo.new(['\.txt', '\.sql'], "other value", "another")

and then access the exceptions array as paths_info["/home/jesus"].exceptions and use it as before.
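A runnable sketch of the Struct approach, under a few assumptions: the field names come from the post, the second path and its single exception are invented for illustration, and `without_exceptions` is a hypothetical helper wrapping the same inject/reject filter. It also shows that each path's `exceptions` array can have a different length.

```ruby
# Same Struct as above; other_value/yet_another are just placeholders.
FileInfo = Struct.new(:exceptions, :other_value, :yet_another)

paths_info = {
  "/home/jesus"              => FileInfo.new(['\.txt\z', '\.sql\z'], "other value", "another"),
  "/home/jesus/applications" => FileInfo.new(['\.sh\z'], "other value", "another")
}

# Hypothetical helper: drop every entry matching one of the exceptions.
def without_exceptions(entries, exceptions)
  exceptions.inject(entries) do |result, ex|
    result.reject { |x| x =~ Regexp.new(ex) }
  end
end

listing = ["/home/jesus/a.txt", "/home/jesus/b.rb"]   # stand-in for a glob
p without_exceptions(listing, paths_info["/home/jesus"].exceptions)
# => ["/home/jesus/b.rb"]
```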

Hope this helps,

Jesus.
Posted by Clement Ow (owc)
on 14.05.2008 06:38
Jesús Gabriel y Galán wrote:
> 
> Hope this helps,
> 
> Jesus.

Thanks Jesus, that was really helpful! However, I altered the code a little, because Ruby gave me an error saying that it can't convert Array into String (hmm, dunno if that's normal) for this line (I suppose it's because of the arrays in the hash, exceptions_by_path):
> files = exceptions_by_path.inject(Dir.glob("#{path}/*")) {|result, ex| result.reject{|x| x =~ Regexp.new(ex)}}

So I decided to name my arrays in the hash file_exception[0], file_exception[1], ... and used this instead, with an incrementing index:

i = 0
src1 = $file_exception[i].inject(Dir.glob(src)) {|result, ex| result.reject{|x| x =~ Regexp.new(ex)}}
i += 1

But I had the problem of accidentally keying in a wrong path name, and nothing was shown for the second source path. (Only after troubleshooting for a whole hour did I realise.) So is there any way we can have a condition, something like: if result == nil puts "wrong pathname"? (Just an idea; I tried it and it doesn't work.)
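The `result == nil` check cannot fire because `Dir.glob` never returns `nil`: a pattern under a missing directory just matches nothing and gives `[]`. A sketch of a check done before filtering, assuming a hypothetical helper named `check_source` and that an empty source folder is also worth a warning:

```ruby
require 'tmpdir'

# Hypothetical helper: warn about a bad or empty source path.
# Returns true only when the path is a directory with at least one entry.
def check_source(path)
  if !File.directory?(path)
    puts "wrong pathname: #{path}"
    false
  elsif Dir.glob(File.join(path, "*")).empty?
    puts "no files found in: #{path}"
    false
  else
    true
  end
end

Dir.mktmpdir do |dir|
  p Dir.glob(File.join(dir, "typo", "*"))   # => [] (not nil)
  p check_source(File.join(dir, "typo"))    # prints the warning, then false
end
```

In the move loop, a path for which `check_source` returns false could simply be skipped with `next`.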

And by the way, don't get me wrong, Struct is sweet too; it's just that I didn't want to have a fixed number of exceptions for each path. ;)

Thanks again.
Posted by Clement Ow (owc)
on 14.05.2008 09:46
Also, I have problems making some file exceptions. For example, I have some files whose names start with 2007, something like 20070131, and a file called "Risk 20070131". When I put 2007 in the file_exception variable, it deselects every file that has 2007 anywhere in the filename, which is not what I want. I know the regexp doesn't let me put something like 2007* in file_exception to deselect only the files that start with 2007. Any ideas at all, anyone?

Regards
Posted by Jesús Gabriel y Galán (Guest)
on 14.05.2008 15:11
(Received via mailing list)
On Wed, May 14, 2008 at 6:38 AM, Clement Ow
<clement.ow@asia.bnpparibas.com> wrote:
>
> >files = exceptions_by_path.inject(Dir.glob("#{path}/*")) {|result,
>  ex| result.reject{|x| x =~ Regexp.new(ex)}}

Sorry, that's what I get for not testing the code. I think (I'm a bit dense right now, so this might not work) that what I meant was this:

exceptions_by_path = {}
exceptions_by_path["/home/jesus"] = ['\.txt', '\.sql']
exceptions_by_path["/home/jesus/applications"] = ['\.sh', '\.bin']
# [...]

and then

paths.each do |path|
  files = exceptions_by_path[path].inject(Dir.glob("#{path}/*")) {|result, ex| result.reject{|x| x =~ Regexp.new(ex)}}
end

so you get the exceptions for that path, which is an Array that should work with inject as I was expecting.
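A sketch of why the first version failed and why the lookup fixes it. The `listing` hash is a made-up stand-in for `Dir.glob("#{path}/*")`, and the `\z` anchors are my own addition. Calling inject on the Hash itself hands the block `[path, patterns]` pairs, so `Regexp.new` receives an Array and raises a TypeError (the "can't convert Array into String" error); looking up the Array for one path first gives inject what it expects.

```ruby
exceptions_by_path = {
  "/home/jesus"              => ['\.txt\z', '\.sql\z'],
  "/home/jesus/applications" => ['\.sh\z', '\.bin\z']
}

# Stand-in for Dir.glob("#{path}/*") so the sketch runs without those folders.
listing = {
  "/home/jesus"              => ["/home/jesus/a.txt", "/home/jesus/b.rb"],
  "/home/jesus/applications" => ["/home/jesus/applications/run.sh",
                                 "/home/jesus/applications/app.rb"]
}

# inject over the Hash: each ex is a [path, patterns] Array -> TypeError.
begin
  exceptions_by_path.inject(["x"]) { |result, ex| result.reject { |x| x =~ Regexp.new(ex) } }
rescue TypeError => e
  puts "TypeError: #{e.message}"
end

# inject over the Array of patterns for each path works as intended.
results = {}
exceptions_by_path.keys.each do |path|
  results[path] = exceptions_by_path[path].inject(listing[path]) do |result, ex|
    result.reject { |x| x =~ Regexp.new(ex) }
  end
end
p results
```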

>  And btw dont get me wrong, Struct is sweet too, just that i didnt want
>  to have a fixed number of exceptions for each path. ;)

FileInfo = Struct.new :exceptions, :other_value, :yet_another
paths_info = {}
paths_info["/home/jesus"] = FileInfo.new(['\.txt', '\.sql'], "other value", "another")

The number of exceptions is not fixed, as you can see above,
I am storing an array in the :exceptions field, so each path can
have a different number of exception patterns.

Jesus.
Posted by Jesús Gabriel y Galán (Guest)
on 14.05.2008 15:13
(Received via mailing list)
On Wed, May 14, 2008 at 9:46 AM, Clement Ow
<clement.ow@asia.bnpparibas.com> wrote:
> Also, I have problems making some file exceptions. For example, i ahve
>  some files that start with 2007 which goes something like, 20070131 and
>  a file which is called "Risk 20070131" but when i put 2007 in the
>  file_exception variable, it deselects any file that has 2007 in the
>  filename, which not what i want. I know that the regexp doesnt allow any
>  like 2007* to be put in the file_exception to deselect any file which
>  starts with 2007. Any ideas at all, anyone?

I'm not sure if I'm understanding you correctly, but you can tweak the
regexps so that they actually match what you want. If you want
to deselect only the files that start with 2007 you can do this:

irb(main):001:0> a = %w{20071212 20073445 risk20072341}
=> ["20071212", "20073445", "risk20072341"]
irb(main):003:0> a.reject {|x| x =~ /\A2007/}
=> ["risk20072341"]
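A short sketch contrasting the unanchored and anchored patterns on names like the ones in the question (the file names are illustrative). `\A` anchors at the start of the string; when the pattern is read from a string, for example from a config file, the backslash has to survive, so it needs single quotes or a doubled backslash.

```ruby
names = ["20070131", "Risk 20070131", "notes.txt"]

# Unanchored: "2007" matches anywhere, so the Risk file is dropped too.
p names.reject { |x| x =~ /2007/ }     # => ["notes.txt"]

# Anchored with \A: only names that start with 2007 are dropped.
p names.reject { |x| x =~ /\A2007/ }   # => ["Risk 20070131", "notes.txt"]

# Building the same regexp from a string: both spellings keep the backslash.
p Regexp.new('\A2007') == Regexp.new("\\A2007")   # => true
```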

Jesus.
Posted by Clement Ow (owc)
on 15.05.2008 03:47
>Sorry, that's what I get for not testing the code. I think (I'm a bit
>dense right
>now, so this might not work) that what I meant was this:

Nah, it's fine, we all know it can get a little tiring sometimes, as it does with this script I'm writing, lol.

> files = exceptions_by_path[path].inject(Dir.glob("#{path}/*")) 
>{|result,
>ex| result.reject{|x| x =~ Regexp.new(ex)}}
>end

>so you get the exceptions for that path, which is an Array that should
>work with
>inject as I was expecting.

Anyway, I get what you mean, but because I'm using a config file to hold all the source paths, destination paths and file exceptions, which are keyed in by the user, I don't want to make filling in the paths too complicated, so I decided to just use numbers to name the arrays in the hash.

> I'm not sure if I'm understanding you correctly, but you can tweak the
> regexps so that they actually match what you want. If you want
> to deselect only the files that start with 2007 you can do this:
> 
> irb(main):001:0> a = %w{20071212 20073445 risk20072341}
> => ["20071212", "20073445", "risk20072341"]
> irb(main):003:0> a.reject {|x| x =~ /\A2007/}
> => ["risk20072341"]

Oh yeah, I did try this, but somehow it doesn't work *scratches head*. It just shows all the files to be moved, so it obviously did not apply the exceptions. But if it's done this way, there won't be any point in keying in all the different exceptions, hence I got stuck. :/
Posted by Clement Ow (owc)
on 15.05.2008 05:05
Oh, I realise what was wrong with the matching of this regexp. In:
>files = exceptions_by_path[path].inject(Dir.glob("#{path}/*")) 
>{|result,
>ex| result.reject{|x| x =~ Regexp.new(ex)}}
>end
After much playing around with this statement, I found that the values being matched here are whole path names, which don't match on the file name itself, so /\A2007/ can't work. (I probably need to use File.basename.) But could you do me a favour and explain what this statement means? I don't understand how some of the variables get assigned by certain commands, etc. Thanks!
Posted by Clement Ow (owc)
on 15.05.2008 09:45
Clement Ow wrote:

>> Oh yea, i did try this but it doesnt work, somehow *scratches head* It 
>> just shows all files to be moved, and it obviously did not carry out the 
>> exceptions. But if it's done this way, there wont be any point in keying 
>> in the various different exceptions alr, hence i got stuck. :/
> 
> Oh i realise what was wrong in the matching of this regexp, because in:
>>files = exceptions_by_path[path].inject(Dir.glob("#{path}/*")) 
>>{|result,
>>ex| result.reject{|x| x =~ Regexp.new(ex)}}
>>end
I found what is actually going on in this statement. Because x is the whole path name, e.g. //sins1234/home/file_name, the regexp /\A2007/ doesn't match at the beginning of the string. Also, the exceptions array must contain "\\A2007" (backslash escaped) when passing it into Regexp.new. So my code now looks like this:

src1 = $file_exception[i].inject(Dir.glob(src)) {|result, ex| result.reject{|x| File.basename(x) =~ Regexp.new(ex, Regexp::IGNORECASE)}}
Dir.glob(src1).each do |file|
  # do sth
end
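A self-contained sketch of the basename fix, using made-up full paths in place of `Dir.glob(src)` and an invented `'\ARiskViewer'` exception to show `Regexp::IGNORECASE`. Matching `File.basename(x)` lets `\A` anchor at the start of the file name rather than at the leading `//` of the path.

```ruby
# Stand-in for Dir.glob(src): real entries are full paths.
entries = ["//sins1234/home/20070131.xls",
           "//sins1234/home/Risk 20070131.xls",
           "//sins1234/home/riskviewer.log"]
exceptions = ['\A2007', '\ARiskViewer']

src1 = exceptions.inject(entries) do |result, ex|
  pattern = Regexp.new(ex, Regexp::IGNORECASE)
  # Match against the file name only, not the directory part.
  result.reject { |x| File.basename(x) =~ pattern }
end

p src1   # => ["//sins1234/home/Risk 20070131.xls"]

# Matching the whole path instead: \A sits before "//", so nothing is removed.
p entries.reject { |x| x =~ /\A2007/ }.size   # => 3
```

Since `src1` is already an array of paths, it could also be iterated directly with `src1.each`; passing it through `Dir.glob` again treats each path as a pattern, which matters if a file name contains glob characters such as `[` or `*`.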
For education's sake, though, would you mind explaining how the whole inject statement works? Thanks! ;)
Posted by Jesús Gabriel y Galán (Guest)
on 16.05.2008 08:50
(Received via mailing list)
On Thu, May 15, 2008 at 9:45 AM, Clement Ow
<clement.ow@asia.bnpparibas.com> wrote:
> Clement Ow wrote:

> However, for education sake, do u mind explaining how the whole inject
> statement works? thanks! ;)

Enumerable#inject is a very powerful iterator (in my opinion at least). What it does is iterate over all elements in an enumerable, yielding to the block an accumulator and the next element of the enumerable. The accumulator then gets updated by the result of the block, so the next iteration will be yielded that value. If you pass an argument to inject, that will be the first accumulator. If not, the first element of the enumerable is used instead. Some examples:

irb(main):003:0> [1,2,3].inject(0) {|total,x| p [total, x]; total + x}
[0, 1]
[1, 2]
[3, 3]
=> 6
irb(main):004:0> [1,2,3].inject {|total,x| p [total, x]; total + x}
[1, 2]
[3, 3]
=> 6

Another one (although this is just to show how inject works, because the same result would be better achieved with map):

irb(main):011:0> [1,2,3,4,5].inject([]) {|total,x| p [total,x]; total + [x**2]}
[[], 1]
[[1], 2]
[[1, 4], 3]
[[1, 4, 9], 4]
[[1, 4, 9, 16], 5]
=> [1, 4, 9, 16, 25]
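For comparison, the map version mentioned above, which produces the same array without an accumulator:

```ruby
# Same result as the inject example: one squared value per element.
squares = [1, 2, 3, 4, 5].map { |x| x**2 }
p squares   # => [1, 4, 9, 16, 25]
```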

The p [total,x]; helps show what gets passed to the block each time.
Just remember: the result of the block will be the next "total".

In our case, the result of the block was the array minus the files that matched the current exception. So each time, the array left over from the previous step (a new array, since reject does not modify its receiver) was passed in along with the next exception, and the result of the block was another array with fewer elements, and so on.
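A traced version of the exception filter, with invented file names, to show what each block call receives; the `p [result, ex]` line plays the same role as `p [total, x]` in the irb examples.

```ruby
files = ["a.txt", "b.sql", "c.rb", "d.txt"]
exceptions = ['\.txt\z', '\.sql\z']

remaining = exceptions.inject(files) do |result, ex|
  p [result, ex]                          # what the block receives each time
  result.reject { |x| x =~ Regexp.new(ex) }
end
# [["a.txt", "b.sql", "c.rb", "d.txt"], "\\.txt\\z"]
# [["b.sql", "c.rb"], "\\.sql\\z"]

p remaining   # => ["c.rb"]
p files       # => ["a.txt", "b.sql", "c.rb", "d.txt"] (reject built new arrays)
```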

Hope this helps,

Jesus.
Posted by Jesús Gabriel y Galán (Guest)
on 16.05.2008 08:54
(Received via mailing list)
On Fri, May 16, 2008 at 8:49 AM, Jesús Gabriel y Galán <jgabrielygalan@gmail.com> wrote:

> The accumulator then gets updated by
> the result of the block, so the next iteration will be yielded that
> value.

I have realized that this sentence can be confusing: the accumulator doesn't get updated. The next value for the accumulator will be the result of the block, which is not necessarily the same object.

I have read many times that you shouldn't use the same accumulator by applying destructive methods to it, but I can't remember what the pros and cons were. So this should not be done:

irb(main):012:0> [1,2,3].inject([]) {|total,x| total << x**2}
=> [1, 4, 9]

Instead you should do this:

irb(main):013:0> [1,2,3].inject([]) {|total,x| total + [x**2]}
=> [1, 4, 9]
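A quick check of the object identity in the two versions, using `equal?` (same object) to make the difference visible: `<<` keeps returning the accumulator passed in, while `+` returns a new Array at every step and leaves the initial one untouched.

```ruby
shared = []
appended = [1, 2, 3].inject(shared) { |total, x| total << x**2 }
p appended                  # => [1, 4, 9]
p appended.equal?(shared)   # => true: << appended to the very same Array

start = []
built = [1, 2, 3].inject(start) { |total, x| total + [x**2] }
p built                     # => [1, 4, 9]
p built.equal?(start)       # => false: + created a new Array on every step
p start                     # => [] (never touched)
```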

Maybe someone can chime in and explain this a little bit better?

Jesus.
Posted by Robert Klemme (Guest)
on 16.05.2008 09:19
(Received via mailing list)
2008/5/16 Jesús Gabriel y Galán <jgabrielygalan@gmail.com>:
> On Fri, May 16, 2008 at 8:49 AM, Jesús Gabriel y Galán
> <jgabrielygalan@gmail.com> wrote:
>
>> The accumulator then gets updated by
>> the result of the block, so the next iteration will be yielded that
>> value.
>
> I have realized that this sentence can be confusing: the accumulator doesn't
> get updated. The next value for the accumulator will be the result of the block,
> not necesarily the same object.

Correct.

> I have read many times that you shouldn't use the same accumulator by
> applying destructive methods to it, but I can't remember what the pros and
> cons were.

Do you remember where you read that?

> Maybe someone can chime in and explain this a little bit better?

Sorry, but this is nonsense.  It's completely safe and even reasonable
to reuse an accumulator value.  Your second solution creates new
Arrays all the time and then throws them away.  It is much more
efficient to use Array#<< as in your first example.

If, of course the original accumulator value must not be changed
because side effects will do harm, then of course you cannot modify it
but need to create new objects.  But in the scenario above, where the
Array is solely created for #inject it is the most reasonable thing to
directly append.
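A sketch of the exception described above, where the initial value is shared with other code (the `defaults` array is hypothetical). Appending to it mutates the caller's array; starting from a private copy with `dup` keeps the cheap `<<` while avoiding the side effect.

```ruby
defaults = ["readme.txt"]   # hypothetical list that other code also uses

# Reusing the caller's array as the accumulator: << mutates defaults itself.
[1, 2].inject(defaults) { |acc, x| acc << "file#{x}" }
p defaults   # => ["readme.txt", "file1", "file2"]

# Start from a private copy instead; appending with << is then safe and cheap.
shared = ["readme.txt"]
list = [1, 2].inject(shared.dup) { |acc, x| acc << "file#{x}" }
p shared     # => ["readme.txt"]
p list       # => ["readme.txt", "file1", "file2"]
```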

Kind regards

robert
Posted by Jesús Gabriel y Galán (Guest)
on 16.05.2008 09:41
(Received via mailing list)
On Fri, May 16, 2008 at 9:18 AM, Robert Klemme
<shortcutter@googlemail.com> wrote:
> 2008/5/16 Jesús Gabriel y Galán <jgabrielygalan@gmail.com>:

>> I have read many times that you shouldn't use the same accumulator by
>> applying destructive methods to it, but I can't remember what the pros and
>> cons were.
>
> Do you remember where you read that?

No, I probably misunderstood something.

>> Maybe someone can chime in and explain this a little bit better?
>
> Sorry, but this is nonsense.  It's completely safe and even reasonable
> to reuse an accumulator value.  Your second solution creates new
> Arrays all the time and then throws them away.  It is much more
> efficient to use Array#<< as in your first example.

Yep, I saw that and that's why I refused to even try to explain it :-)

> If, of course the original accumulator value must not be changed
> because side effects will do harm, then of course you cannot modify it
> but need to create new objects.

This might be what I had in mind.

> But in the scenario above, where the
> Array is solely created for #inject it is the most reasonable thing to
> directly append.

It's clear that the example injecting a newly created array makes the
above explanation even worse :-).

Thanks !

Jesus.