Array of strings - finding letter combinations

dubstep · May 2, 2012, 7:52am

Hi All,

I am new to ruby and I want to learn by examples and here is one from my
everyday work. I have an array of strings where each string contains a
combination of an ampersand and a letter (&e). The trick is, this
combination may only appear once in the whole array, so none of the
other strings may contain the same combination.

I need to solve this problem so that ruby would suggest a possible and
not yet taken combination where duplicates occur. In case where no more
possible combinations exist, an error would be raised.

My question is: could someone give me some general pointers how would I
achieve this (just enough to get me going:)

Thank you kindly.

regards
seba

sebastjan_h · May 2, 2012, 8:26am

Sebastjan H. wrote in post #1059172:

The way I am doing this manually (yes, it is a tedious job:) is that I
compare the letters used with the alphabet (which could be defined as
another array). That gives me the result of taken and free letters. Then
I start the reassignment process.

regards,
seba
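The manual comparison described in this post maps fairly directly onto array subtraction in Ruby. A minimal sketch, assuming the used letters have already been collected into an array (the name `used_letters` and its contents are made up for illustration):

```ruby
# Hypothetical sample data: letters that already follow an ampersand somewhere.
used_letters = %w[a b c e]

# The alphabet as an array, as suggested in the post.
alphabet = ('a'..'z').to_a

# Array#- returns the elements of the first array that are not in the second.
free_letters = alphabet - used_letters

puts "taken: #{used_letters.join(' ')}"
puts "free:  #{free_letters.join(' ')}"
```

The reassignment step would then pick letters from `free_letters`.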

sebastjan_h · May 2, 2012, 11:55am

Hi,

Sebastjan H. wrote in post #1059172:

I am new to ruby and I want to learn by examples and here is one from my
everyday work. I have an array of strings where each string contains a
combination of an ampersand and a letter (&e). The trick is, this
combination may only appear once in the whole array, so none of the
other strings may contain the same combination.

You could start off by selecting the letter after “&” in a string. This
can be done with a regex:

str = 'abc&exyz'
str[/(?<=&)[a-z]/]

Then you iterate over the array, collecting the duplicate strings and
the used characters:

strings = ['&a', '&b', '&c', '&a']
used_chars = []
duplicates = []
strings.each_with_index do |str, i|
  char = str[/(?<=&)[a-z]/]
  duplicates << i if used_chars.include? char
  used_chars << char
end

And after this you iterate over the duplicate strings and suggest an
unused character (or raise an error, if there are none left):

duplicates.each do |index|
  # inclusive range, so that 'z' is considered as well
  unused = ('a'..'z').find { |char| not used_chars.include? char }
  if unused.nil?
    raise 'all letters taken'
  else
    puts "combination #{strings[index][/&[a-z]/]} already taken.",
         "Use &#{unused} instead?"
    used_chars << unused
  end
end

The actual processing of the user input has to be added, of course.
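Put together, the snippets from this post could look roughly like the following. It is only a sketch: the method name `suggest_replacements` is made up here, it returns the suggestions instead of printing them, and it swaps the lookbehind for a capture group so that it does not depend on lookbehind support.

```ruby
# Sketch only: `suggest_replacements` is a hypothetical helper name.
# Returns a hash mapping the index of each duplicate string to a suggested
# unused letter. Raises if no letter is left.
def suggest_replacements(strings)
  used_chars = []
  duplicates = []

  strings.each_with_index do |str, i|
    char = str[/&([a-z])/, 1]        # the letter right after "&", or nil
    next if char.nil?
    duplicates << i if used_chars.include?(char)
    used_chars << char
  end

  suggestions = {}
  duplicates.each do |index|
    unused = ('a'..'z').find { |c| !used_chars.include?(c) }
    raise 'all letters taken' if unused.nil?
    suggestions[index] = unused
    used_chars << unused             # do not hand out the same letter twice
  end
  suggestions
end

strings = ['&a', '&b', '&c', '&a']
suggest_replacements(strings).each do |index, letter|
  puts "#{strings[index]} (index #{index}) is already taken. Use &#{letter} instead?"
end
```

Returning a hash keeps the detection separate from whatever user interaction gets added later.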

sebastjan_h · May 2, 2012, 2:26pm

Sebastjan H. wrote in post #1059226:

Hi Jan,

thank you so much for this. I went through the code and I think I
understand most of it. I am only having hard time understanding the
beginning:

str = 'abc&exyz'
str[/(?<=&)[a-z]/]

What do these two lines define. I guess the second one is the regex,
what about the first one. And moreover, how are these two related?

The first line simply defines a test string of the format you described
in your first post (it contains the combination “&e”). The other
characters are just gibberish to show that the substring is found
independent of its position.

The regex basically says: Look for a lowercase letter from a to z which
is preceded by an ampersand. Using this regex in the “[]” method will
return the first matching substring (or character in this case).
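To see what the lookbehind changes, it can be compared with a plain pattern on the same test string. This is just an illustration; the variable names are made up, and the `(?<=...)` syntax needs Ruby 1.9 or later:

```ruby
str = 'abc&exyz'

# "&" must come right before the letter, but is not part of the match.
with_lookbehind = str[/(?<=&)[a-z]/]

# Here "&" is part of the match.
without_lookbehind = str[/&[a-z]/]

# No "&" in the string at all, so String#[] returns nil.
no_match = 'abcxyz'[/(?<=&)[a-z]/]

p with_lookbehind     # "e"
p without_lookbehind  # "&e"
p no_match            # nil
```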

sebastjan_h · May 2, 2012, 8:50pm

Ok, thx. Just so I understand correctly, these two lines are just
examples and are not part of the actual code. Since the array of strings
to be searched in is actually strings in the code below. Right?

Furthermore, when I used the regex as you put it, I got this error:

duplicates.rb:5: undefined (?..) sequence: /(?<=&)[a-z]/

So I changed it to /&[a-z]/ and the error was gone.

Since the ampersand can also be followed by a capital letter I tried
this:
/&[a-z]i/ - I searched the net for modifiers, but I guess this would
only work on a specific letter? So how to add the possibility of capital
letter in the regex?

Finally, I ran this in the terminal and I got the message that &a is
taken and that &a should be used instead, which doesn’t make any sense.

I also putsed the used_chars and duplicates to check it and it’s not
clear to me why the puts duplicates returns: a 3. Is 3 the index?

thx and kind regards,
seba

sebastjan_h · May 2, 2012, 9:32pm

2012/5/2 Sebastjan H.:

Ok, thx. Just so I understand correctly, these two lines are just
examples and are not part of the actual code. Since the array of strings
to be searched in is actually strings in the code below. Right?

Furthermore, when I used the regex as you put it, I got this error:

duplicates.rb:5: undefined (?..) sequence: /(?<=&)[a-z]/

So I changed it to /&[a-z]/ and the error was gone.

It looks like you’re using an old version of Ruby; this regex won’t
work on Ruby 1.8 or earlier.

Since the ampersand can also be followed by a capital letter I tried
this:
/&[a-z]i/ - I searched the net for modifiers, but I guess this would
only work on a specific letter? So how to add the possibility of capital
letter in the regex?

That would be /&[a-z]/i - note that “i” is outside of “//”.
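A quick way to check the difference, as a small sketch with made-up sample strings:

```ruby
samples = ['menu &File', 'menu &edit']

samples.each do |s|
  strict  = s[/&[a-z]/]    # lowercase letters only
  relaxed = s[/&[a-z]/i]   # the i after the closing slash ignores case
  puts "#{s.inspect}: strict=#{strict.inspect} relaxed=#{relaxed.inspect}"
end

# With the i inside the slashes it is just a literal "i" to be matched:
literal_i = 'x&ai'[/&[a-z]i/]
p literal_i
```

Since the matched letters get compared later on, it probably makes sense to downcase them after matching.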

– Matma R.

sebastjan_h · May 2, 2012, 2:15pm

Hi Jan,

thank you so much for this. I went through the code and I think I
understand most of it. I am only having hard time understanding the
beginning:

str = 'abc&exyz'
str[/(?<=&)[a-z]/]

What do these two lines define. I guess the second one is the regex,
what about the first one. And moreover, how are these two related?

I apologise if my questions are puzzling…

I plan to have the file with strings uploaded and then execute this code
and probably print the results in a file as well.

regards,
seba

sebastjan_h · May 2, 2012, 9:59pm

Sebastjan H. wrote in post #1059285:

Ok, thx. Just so I understand correctly, these two lines are just
examples and are not part of the actual code. Since the array of strings
to be searched in is actually strings in the code below. Right?

Yes.

Furthermore, when I used the regex as you put it, I got this error:

duplicates.rb:5: undefined (?..) sequence: /(?<=&)[a-z]/

So I changed it to /&[a-z]/ and the error was gone.

Yes, but this regex will include the ampersand. To get the actual
letter, you have to write something like str[/&[a-z]/][1].

Anyway, you should definitely update your Ruby, if possible. Version 1.8
is more or less dead.
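If updating is not possible right away, one alternative that is not from the posts above is a capture group combined with the two-argument form of String#[], which returns only the captured letter and needs no lookbehind. (On Ruby 1.8, indexing a string with a single integer gives a character code rather than a one-character string, so the `[1]` variant would yield a number there.) The sketch below also shows why collecting the full "&x" match led to the odd suggestion; all variable names are made up for illustration:

```ruby
str = 'abc&exyz'

# Two-argument String#[]: the 1 selects capture group 1, the letter in parentheses.
letter = str[/&([a-z])/, 1]

# Why the full match causes trouble: "&a" and "a" are different strings,
# so a list of "&x" matches never marks the bare letter as taken.
with_ampersand = ['&a', '&b']
bare_letters   = ['a', 'b']
first_free_wrong = ('a'..'z').find { |c| !with_ampersand.include?(c) }
first_free_right = ('a'..'z').find { |c| !bare_letters.include?(c) }

puts letter
puts first_free_wrong
puts first_free_right
```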

I also putsed the used_chars and duplicates to check it and it’s not
clear to me why the puts duplicates returns: a 3. Is 3 the index?

Yes, but you might as well save the actual strings in the array.

sebastjan_h · May 2, 2012, 9:48pm

On 02.05.2012 21:32, Bartosz Dziewoński wrote:

2012/5/2 Sebastjan H.:
| Since the ampersand can also be followed by a capital letter I tried
| this:
| /&[a-z]i/ - I searched the net for modifiers, but I guess this would
| only work on a specific letter? So how to add the possibility of capital
| letter in the regex?

That would be /&[a-z]/i - note that “i” is outside of “//”.

Alternatively, you could use /&[[:alpha:]]/, which will also match
non-English letters.
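A small sketch of how the POSIX bracket class behaves, assuming Ruby 1.9 or later and UTF-8 strings (the sample strings are made up; `\u0161` is the letter š):

```ruby
samples = ['&e', '&E', "&\u0161", '&1']

samples.each do |s|
  # [[:alpha:]] matches any alphabetic character, upper or lower case.
  puts "#{s}: #{s[/&[[:alpha:]]/].inspect}"
end
```

The digit in the last sample is not alphabetic, so that lookup returns nil.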

More information here:

sebastjan_h · May 2, 2012, 11:25pm

for some reason the mailing list server rejected this post, so i’ll
paste it in here:

One way is:

1. If there are more than 26 entries, return an error and exit - you
   clearly can’t assign a unique letter to each.
2. Go through the whole array, counting how many times you see each
   letter (use a hash that starts off with {"a" => 0, "b" => 0, ...}).
3. Iterate through the hash, examining each letter and its count:
   - if the count is exactly 1, ignore it
   - if the count is 0, add it to an “unused” array
   - if the count is 2 or more, add it to a “duplicates” hash, along with
     the count

You should now have the following two structures:

unused = ["e", "h", "l", "p", ...]
duplicates = { "a" => 2, "f" => 3, "q" => 2, ...}

4. Now iterate through your original array again. For each entry,
   check to see if its letter is in the duplicates hash. If it is, remove
   the first letter from unused (see the Array#shift method) and use it as
   a replacement. Decrease the original letter’s count in the duplicates
   hash by 1, and if its count is now 1 remove it from the hash.

So for instance if your original array was

["&f", "&a", "&f", "&q", ...]

you’d see the first "&f", note that it was in duplicates with a count
of 3, replace it by "&e" (using the first letter from unused) and reduce
its count. You would now have

array = ["&e", "&a", "&f", "&q", ...]
unused = ["h", "l", "p", ...]
duplicates = { "a" => 2, "f" => 2, "q" => 2, ...}

When you are done, all the duplicates will have been replaced by unused
letters.
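The steps in this post could be sketched in Ruby roughly as follows. The method name `reassign_duplicates` and the sample data are made up, and the sketch assumes each string contains at most one lowercase "&x" combination:

```ruby
# Sketch of the counting approach; `reassign_duplicates` is a hypothetical name.
# Returns a new array in which every "&x" combination is unique.
def reassign_duplicates(strings)
  raise ArgumentError, 'more than 26 entries' if strings.length > 26

  # Count how often each letter is used, starting from zero for every letter.
  counts = {}
  ('a'..'z').each { |c| counts[c] = 0 }
  strings.each do |s|
    letter = s[/&([a-z])/, 1]
    counts[letter] += 1 if letter
  end

  unused = counts.keys.select { |c| counts[c] == 0 }
  duplicates = {}
  counts.each { |c, n| duplicates[c] = n if n >= 2 }

  strings.map do |s|
    letter = s[/&([a-z])/, 1]
    if letter && duplicates.key?(letter)
      replacement = unused.shift                 # first unused letter
      duplicates[letter] -= 1
      duplicates.delete(letter) if duplicates[letter] == 1
      s.sub("&#{letter}", "&#{replacement}")
    else
      s                                          # unique, or the last remaining copy
    end
  end
end

p reassign_duplicates(['&f', '&a', '&f', '&q', '&a', '&f'])
```

With this scheme the last occurrence of a duplicated letter keeps its original combination and the earlier ones get replaced.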

martin

sebastjan_h · May 3, 2012, 10:15am

Thank you all for your kind help. I’ll try to complete this task and
when I am done, I’ll post back if maybe someone else can benefit from
it.

regards,
seba

sebastjan_h · May 17, 2012, 2:14pm

Hello again,
It seems I’m stuck:)

After rethinking what I actually need, I’ve come up with the script below.
For some reason, it only works when there are 26 or more strings in the
array. I’ve been looking like crazy, but I can’t find the error. Also
any optimisation or pointing out additional errors is most welcome. Keep
in mind, I am a complete beginner:)

Note: I only need a list of unused characters, which may be used for
assignments.

# 1. Import strings from a file and add them to the array;
# if there are more than 26 strings imported from the file,
# an alert is given and user decision is required
# User may decide to continue since there may still be a char
# unused if others are used more than once.

file = ARGV[0]
strings = []
used_chars = []
duplicates = []
all = ["&a", "&b", "&c", "&d", "&e", "&f", "&g", "&h", "&i", "&j", "&k",
       "&l", "&m", "&n", "&o", "&p", "&r", "&s", "&t", "&u", "&v", "&w",
       "&x", "&y", "&z"]

input_file = File.open(file, "a+")
input_file.each_line do |line|
  strings << line
end

strings.map!{|c| c.downcase.strip} # Eliminates the whole upcase issue.

if strings.length >= 26
  puts "There are more than 26 strings, unique assignments are not possible."
  puts "Do you still want to continue with the process? Enter y or n"
  print ">"
  answer = STDIN.gets.chomp()
  if answer == "n"
    puts "Process aborted by user."
    exit
  else

    # 2. Finds the ampersand combinations; adds them to used and duplicates;
    # checks for number of occurrences ---- still under consideration if
    # needed

    strings.each do |str|
      char = str[/&[a-z]/]
      duplicates << char if used_chars.include? char
      used_chars << char
    end

    # 3. Compares the all and used arrays and adds the list of unused
    # chars to the file

    unused = [(all)-(used_chars)]

    input_file.write("Here are the unused characters:" "\n")
    unused.each do |char|
      input_file.write(char)
      input_file.write("\n")
    end
  end
end
input_file.close()

# ToDo:
# 1. The automatic suggestion would only be possible if the unused
# characters would be compared with the characters in the strings with
# duplicates. If the unused character exists in the string, suggest the
# character from the unused array, otherwise move on to the next. If no
# match is found, move to the next string with duplicates.
# 2. use Shoes for GUI.

thx
kind regards
seba

sebastjan_h · May 17, 2012, 4:34pm

Ok, I hope this does it:)

   if answer == "n"
   puts "Process aborted by user."
   exit
   else
   end
   else
end

regards,
seba

sebastjan_h · May 18, 2012, 3:45am

On Thu, May 17, 2012 at 10:34 PM, Sebastjan H. wrote:

Ok, I hope this does it:)

  if answer == "n"
  puts "Process aborted by user."
  exit
  else
  end
  else

end

the sequence

else
end
else
end

looks very dubious…
kind regards -botp
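For comparison: a conditional in Ruby needs exactly one `end` per `if`, and the `else` branch is optional, so an empty `else` can simply be dropped. A minimal sketch (the method names `confirm?` and `describe` are invented for this example):

```ruby
# Hypothetical helper: returns false when the user answered "n".
def confirm?(answer)
  if answer == 'n'
    puts 'Process aborted by user.'
    return false
  end                # closes the if; no else branch is required
  true
end

# Nested form: each if is closed by its own end, innermost first.
def describe(count, answer)
  if count > 26
    if answer == 'n'
      'aborted'
    else
      'continuing despite more than 26 strings'
    end              # closes the inner if/else
  else
    'continuing'
  end                # closes the outer if/else
end

puts describe(30, 'y')
puts describe(3, 'y')
```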

sebastjan_h · May 17, 2012, 2:21pm

I think I found it:

   if answer == "n"
   puts "Process aborted by user."
   exit
   END -> this one was at the bottom instead of here
   else

Any other mistakes I’ve made?

regards,
seba

sebastjan_h · May 18, 2012, 9:01am

I know it looks odd, but as I understand each IF block has to have an
END and it seems to me this is the only way the script works.

But since I am new to this I posted here so maybe someone will find
errors I made.

regards,
seba
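For anyone finding this thread later, here is one possible cleaned-up version of the script posted on May 17. It is a sketch under a few assumptions rather than a definitive fix: the names `ALPHABET` and `analyse` are made up, the question is only asked when there really are more than 26 strings, and the list of unused combinations is produced in every case. It also addresses two details of the posted script: the `all` array has only 25 entries because "&q" is missing, and `[(all)-(used_chars)]` wraps the result in a second array.

```ruby
# Sketch only: the constant and method names are illustrative.
ALPHABET = ('a'..'z').map { |c| "&#{c}" }   # all 26 combinations, including "&q"

# Returns [used, duplicates, unused] for an array of strings.
def analyse(strings)
  used = []
  duplicates = []
  strings.each do |str|
    combo = str.downcase[/&[a-z]/]
    next if combo.nil?                      # line without a combination
    duplicates << combo if used.include?(combo)
    used << combo
  end
  [used, duplicates.uniq, ALPHABET - used]  # plain Array#-, no extra brackets
end

# Only run the file handling when called as a script with a file name.
if __FILE__ == $0 && ARGV[0]
  strings = File.readlines(ARGV[0]).map(&:strip)

  if strings.length > 26
    puts 'There are more than 26 strings, unique assignments are not possible.'
    print 'Do you still want to continue with the process? Enter y or n > '
    if STDIN.gets.to_s.chomp == 'n'
      puts 'Process aborted by user.'
      exit
    end
  end

  # This part runs whether or not the question above was asked.
  _used, _duplicates, unused = analyse(strings)
  File.open(ARGV[0], 'a') do |f|
    f.puts 'Here are the unused characters:'
    f.puts unused
  end
end
```

Keeping the analysis in its own method makes it easy to try out in irb without touching any file.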
