How to identify unpaired files in a list

paul · July 18, 2009, 11:21pm

Hello there. I have a problem where I am trying to identify unpaired
files in a directory.

The directory may have files like the following:

abcd0001.txt
abcd0001.def.bak
abcd0002.txt
abcd0003.txt
abcd0003.ghi.bak
abcd0004.xyz.bak

What I’d like to do is identify the unpaired files ‘abcd0002.txt’ and
‘abcd0004.xyz.bak’ in the list above.

I’ve tried reading all the filenames into a single array, but I’m not
sure how to remove items that share a root name. Removing exact
duplicates is trivial, of course, but the second file of each pair
always has a different extension.
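If you compare only the root name, the changing extensions stop mattering. Here is a minimal sketch. It assumes the root is everything before the first dot, which holds for every name in the list above. `root_name` is a hypothetical helper, not a library method.

```ruby
# Hypothetical helper (assumption): the root is everything before the
# first dot, so 'abcd0001.txt' and 'abcd0001.def.bak' share 'abcd0001'.
def root_name(filename)
  filename.split('.', 2).first
end

files = %w[abcd0001.txt abcd0001.def.bak abcd0002.txt]

# uniq only drops identical strings, so all three names survive.
p files.uniq                      # => ["abcd0001.txt", "abcd0001.def.bak", "abcd0002.txt"]

# Mapping to roots exposes the pair: 'abcd0001' appears twice.
p files.map { |f| root_name(f) }  # => ["abcd0001", "abcd0001", "abcd0002"]
```

Once every name maps to a root, finding unpaired files comes down to counting how many names share each root.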

I also tried creating two separate arrays (one for each file type) so
that I could compare the root filenames, but then I’m left looping
through one array once for every entry in the other, which will be a
huge performance hit when there are thousands of filenames.

Does anyone have any good suggestions that they can offer?

Please let me know. Thanks. Paul.
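For reference, the two-array approach in the post above does not require nested loops. This sketch assumes the first file of each pair ends in `.txt` and the second in `.bak`, as in the example list. It puts each array's roots in a `Set`, so each membership check is a hash lookup instead of a scan of the other array.

```ruby
require 'set'

files = %w[abcd0001.txt abcd0001.def.bak abcd0002.txt
           abcd0003.txt abcd0003.ghi.bak abcd0004.xyz.bak]

# Assumption: the root is everything before the first dot.
root = ->(name) { name.split('.', 2).first }

# Assumption: '.txt' marks the first file of a pair and '.bak' the second.
txt_files = files.select { |f| f.end_with?('.txt') }
bak_files = files.select { |f| f.end_with?('.bak') }

txt_roots = txt_files.map(&root).to_set
bak_roots = bak_files.map(&root).to_set

# Each include? is a hash lookup, so no array is rescanned per entry.
unpaired = txt_files.reject { |f| bak_roots.include?(root.call(f)) } +
           bak_files.reject { |f| txt_roots.include?(root.call(f)) }

p unpaired # => ["abcd0002.txt", "abcd0004.xyz.bak"]
```

Building both sets and checking every name takes one pass over each array. The cost therefore grows roughly linearly with the number of files.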

paul · July 18, 2009, 11:36pm

> I also tried creating two separate arrays (one for each file type) so
> that I could compare the root filenames, but then I’m left looping
> through one array once for every entry in the other, which will be a
> huge performance hit when there are thousands of filenames.
>
> Does anyone have any good suggestions that they can offer?

Perhaps nested hashes.
=r

paul · July 18, 2009, 11:39pm

> Perhaps nested hashes.

Oops, I meant hashes of arrays, i.e. a final data structure like:

{'abcd0001' => ['abcd0001.txt', 'abcd0001.def.bak'], 'abcd0002' =>
['abcd0002.txt']}

then iterate through it looking for arrays of length 1 only.
GL.
=r
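Here is a minimal Ruby sketch of the hash-of-arrays idea, run on the list from the first post. It assumes the root is everything before the first dot. The `Hash.new` block is just one way to default each key to an empty array.

```ruby
files = %w[abcd0001.txt abcd0001.def.bak abcd0002.txt
           abcd0003.txt abcd0003.ghi.bak abcd0004.xyz.bak]

# Map each root name to every file that shares it; the block gives
# each new key its own empty array.
by_root = Hash.new { |hash, key| hash[key] = [] }
files.each { |f| by_root[f.split('.', 2).first] << f }

# Roots with exactly one file are the unpaired ones.
unpaired = by_root.values.select { |group| group.length == 1 }.flatten

p unpaired # => ["abcd0002.txt", "abcd0004.xyz.bak"]
```

This treats any root with two or more files as paired. A root with three files would need its own check.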

paul · July 19, 2009, 2:05am

On Jul 18, 5:39 pm, Roger P. wrote:

>> Perhaps nested hashes.
>
> Oops, I meant hashes of arrays, i.e. a final data structure like:
>
> {'abcd0001' => ['abcd0001.txt', 'abcd0001.def.bak'], 'abcd0002' =>
> ['abcd0002.txt']}
>
> then iterate through it looking for arrays of length 1 only.
> GL.
> =r

That’s cool. I’ll give it a try. Thanks.

paul · July 19, 2009, 12:34am

I had the same requirement and ended up using Roger’s method, except
with an array of arrays:
[['abcd0001.txt', 'abcd0001.def.bak'], ['abcd0002.txt'], ...]
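Ruby's `Enumerable#group_by` returns exactly that root-to-files hash, and `.values` turns it into an array of arrays. This sketch is for illustration and may not match how the poster built theirs. It again assumes the root is everything before the first dot.

```ruby
files = %w[abcd0001.txt abcd0001.def.bak abcd0002.txt
           abcd0003.txt abcd0003.ghi.bak abcd0004.xyz.bak]

# group_by builds the root => files hash; .values keeps only the
# arrays, giving one inner array per root name.
groups = files.group_by { |f| f.split('.', 2).first }.values

# Any group holding a single file has no partner.
unpaired = groups.select { |group| group.size == 1 }.map(&:first)

p groups.length # => 4
p unpaired      # => ["abcd0002.txt", "abcd0004.xyz.bak"]
```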
