Split a string at a certain character

Dobai-Pataky_BSSSSl · December 4, 2010, 8:27pm

I am writing an app where the user will enter a string and then I need to
split that string into two parts and then I can process the two parts
from there and show them some output.

I need help with the best Ruby way to split the string into the two
parts…

Here are examples of some typical user inputs, and then what it must be
split into for further processing. Hopefully you will see a pattern to
it and can help me. (Note: sometimes there will be a space between the
two parts, but some users may omit the space between [part 1] and [part
2], so I must handle both cases.)

(They will not enter the quotes; I just added them for clarity.)

Examples (several to help you see the pattern):

“80e6” must split to “80” and “e6”

“80 e6” must split to “80” and “e6”

“12.5H7” must split to “12.5” and “H7”

“120 JS11” must split to “120” and “JS11”

“20.8a11” must split to “20.8” and “a11”

“45.50 h2” must split to “45.50” and “h2”

“90.2F3” must split to “90.2” and “F3”

“45js4” must split to “45” and “js4”

Here is the basic pattern, in words:

[part 1] followed by [part 2]

which is to say:

[part 1 = an integer or floating point number] followed by [part 2 = a
single or double set of letters (a-z or A-Z), which is then followed by an
integer]

If there is a space between [part 1] and [part 2], it needs to be
ignored.
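
The verbal description above maps fairly directly onto a regular expression. The sketch below is not from the thread: it assumes "an integer or floating point number" means digits with at most one decimal point, and the names `INPUT_PATTERN` and `split_input` are hypothetical. Extended mode (`/x`) lets each piece of the pattern carry a comment, and `/i` makes the letter class accept both cases.

```ruby
# Sketch only: INPUT_PATTERN and split_input are hypothetical names.
INPUT_PATTERN = /
  \A
  (?<number> \d+ (?: \. \d+ )? )   # part 1: an integer or a decimal number
  \s*                              # optional space between the parts, ignored
  (?<code> [a-z]{1,2} \d+ )        # part 2: one or two letters, then an integer
  \z
/xi

# Returns [part1, part2] as strings, or nil when the input does not fit.
def split_input(str)
  m = INPUT_PATTERN.match(str)
  m && [m[:number], m[:code]]
end

p split_input("80e6")      # → ["80", "e6"]
p split_input("45.50 h2")  # → ["45.50", "h2"]
p split_input("120 JS11")  # → ["120", "JS11"]
```

Named groups are a matter of taste here; plain numbered groups with `m[1]` and `m[2]` work just as well.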

If you are really interested in what all this is for, you can read on…
(it may help you see the overall picture).

Eventually, [part 1] will be converted to a floating point number, and
[part 2] will be used to look up some other floating point number in a
database table which is then used as a variance amount that will be
applied to [part 1].
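
The overall flow described above (parse, convert part 1, look up a variance by part 2, apply it) could be sketched as follows. This is an illustration under loud assumptions: a plain Hash stands in for the database table, "applied" is assumed to mean "added to part 1", the method name `apply_variance` is hypothetical, and the variance values in the usage lines are invented placeholders, not real data.

```ruby
# Sketch only. ASSUMPTIONS: a Hash replaces the database table, and
# "applying" the variance means adding it to part 1.
def apply_variance(input, variances)
  m = /\A(\d+(?:\.\d+)?)\s*([a-z]{1,2}\d+)\z/i.match(input)
  raise ArgumentError, "unrecognized input: #{input.inspect}" unless m

  base = Float(m[1])               # part 1 as a floating point number
  variance = variances.fetch(m[2]) # raises KeyError for an unknown code
  base + variance
end

# Hypothetical lookup values, invented purely for illustration.
fake_table = { "e6" => -0.5, "H7" => 0.25 }
p apply_variance("80 e6", fake_table)   # → 79.5
p apply_variance("12.5H7", fake_table)  # → 12.75
```

Using `fetch` rather than `[]` means an unknown code fails loudly instead of silently producing `nil`.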

matthew · December 4, 2010, 9:16pm

On Sat, Dec 4, 2010 at 8:31 PM, Matt S. [email protected]
wrote:

[part 1] followed by [part 2]

which is to say:

[part 1 = an integer or floating point number] followed by [part 2 = a
single or double set of letters (a-z or A-Z), which is then followed by an
integer]

If there is a space between [part 1] and [part 2], it needs to be
ignored.

Pickaxe (Programming Ruby), pages 70-75:

ruby-1.9.2-head > s = '12.7 AB36'
=> "12.7 AB36"
ruby-1.9.2-head > pattern = /\A([0-9.]+)\s*([a-zA-Z]{1,2}[0-9]+)\Z/
=> /\A([0-9.]+)\s*([a-zA-Z]{1,2}[0-9]+)\Z/
ruby-1.9.2-head > s.match pattern
=> #<MatchData "12.7 AB36" 1:"12.7" 2:"AB36">
ruby-1.9.2-head > $1
=> "12.7"
ruby-1.9.2-head > $2
=> "AB36"

HTH,

Peter
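
One thing the irb session does not show: `String#match` returns `nil` when the input does not fit, so code that goes straight to `[1]` on the result raises `NoMethodError` for bad input. A small sketch, using Peter's pattern unchanged; the constant and method names are hypothetical.

```ruby
# Peter's pattern, unchanged; PETER_PATTERN and peter_split are hypothetical names.
PETER_PATTERN = /\A([0-9.]+)\s*([a-zA-Z]{1,2}[0-9]+)\Z/

# Returns [part1, part2], or nil when the input does not fit the pattern.
def peter_split(s)
  m = s.match(PETER_PATTERN)
  m ? m.captures : nil
end

p peter_split("12.7 AB36")  # → ["12.7", "AB36"]
p peter_split("hello")      # → nil
```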

matthew · December 4, 2010, 9:30pm

On Sat, Dec 4, 2010 at 1:31 PM, Matt S.
[email protected] wrote:

–
Posted via http://www.ruby-forum.com/.

This meets all of your cases, though none of them include negative signs
at the beginning, and you haven’t specified how it should behave for bad
data (i.e. input that is malformed or not as pristine as the examples
you’ve supplied).

require 'test/unit'

def mysplit(str)
  # String#[] with a regexp sets $1 and $2 as a side effect
  str[/^(\d+(?:\.\d+)?)\s*([^$]*)$/]
  return $1, $2
end

class MysplitTester < Test::Unit::TestCase
  def test_1
    assert_equal ["80", "e6"], mysplit("80e6")
  end
  def test_2
    assert_equal ["80", "e6"], mysplit("80 e6")
  end
  def test_3
    assert_equal ["12.5", "H7"], mysplit("12.5H7")
  end
  def test_4
    assert_equal ["120", "JS11"], mysplit("120 JS11")
  end
  def test_5
    assert_equal ["20.8", "a11"], mysplit("20.8a11")
  end
  def test_6
    assert_equal ["45.50", "h2"], mysplit("45.50 h2")
  end
  def test_7
    assert_equal ["90.2", "F3"], mysplit("90.2F3")
  end
  def test_8
    assert_equal ["45", "js4"], mysplit("45js4")
  end
end
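
To make the "bad data" caveat concrete, here is the same `mysplit` probed with inputs outside the examples. The second group `([^$]*)` accepts anything after the number, so part 2 is not validated, and input with no leading number yields `[nil, nil]` rather than an error. The probe inputs are my own choices, not from the thread.

```ruby
# Josh's mysplit, with the decimal point escaped, probed with unusual input.
def mysplit(str)
  str[/^(\d+(?:\.\d+)?)\s*([^$]*)$/]  # sets $1 and $2, or clears them on failure
  return $1, $2
end

p mysplit("80e6")    # → ["80", "e6"]
p mysplit("80 xyz")  # → ["80", "xyz"]  part 2 is not checked
p mysplit("abc")     # → [nil, nil]     no match leaves $1 and $2 nil
```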

matthew · December 4, 2010, 11:13pm

Peter!!! You are THE man!!! I actually just bought the 1.9.2 version
of the book for just $10. Regular Expressions are now on page 97
(Chapter 7). I’m a FoxPro programmer and am just beginning to work in
Ruby.

I threw your pattern and match command into a Controller action in my
Rails app, and guess what… It works!!

Here’s my code, after just 1 minute of work, thanks to you. (Some
refinements are now needed for stupid user input, but I’ll get there.)

I’m so excited to see this come to life so easily. The Ruby community is
awesome.

def create
  @conversion = Conversion.new(params[:conversion])
  pattern = /\A([0-9.]+)\s*([a-zA-Z]{1,2}[0-9]+)\Z/
  user_input = @conversion.shaft_size
  @size = user_input.match pattern

  respond_to do |format|
    format.html { render :show }
  end
end

Now I can access @size[1] and @size[2] to get what I need done.
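
Outside Rails, the same controller logic can be tried as plain Ruby. This sketch assumes part 1 should end up as a Float (as described in the original question); `SHAFT_PATTERN` and `shaft_parts` are hypothetical names. `MatchData#captures` returns the same strings as `@size[1]` and `@size[2]`, and returning early on `nil` avoids calling `[]` on a failed match.

```ruby
# Plain-Ruby sketch of the controller logic; the names are hypothetical.
SHAFT_PATTERN = /\A([0-9.]+)\s*([a-zA-Z]{1,2}[0-9]+)\Z/

def shaft_parts(user_input)
  size = user_input.match(SHAFT_PATTERN)
  return nil unless size               # bad input: nothing to index into

  number_text, code = size.captures    # same strings as size[1] and size[2]
  [Float(number_text), code]
end

p shaft_parts("45.50 h2")  # → [45.5, "h2"]
p shaft_parts("nonsense")  # → nil
```

Because `[0-9.]+` also accepts strings like `45..50`, an input such as `"45..50h2"` matches the pattern but makes `Float` raise `ArgumentError`, which is one of the "stupid user input" cases still to handle.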

How many times can I say “Thanks”???

matthew · December 4, 2010, 11:20pm

Josh C. wrote in post #966218:

def mysplit(str)
  str[/^(\d+(?:\.\d+)?)\s*([^$]*)$/]
  return $1, $2
end

Josh - Well, this looks both simple and powerful. You all are
obviously some smart Ruby experts.

I’m gonna test your regular expression and compare it to the one from
Peter to see if I can spot the fine points that distinguish them.

Thanks for taking the time to respond.

I especially like that testing thing you showed me. I need to study that
as well.

matthew · December 5, 2010, 1:52pm

On 04.12.2010 23:14, Matt S. wrote:

Now I can access the @size[1] and @size[2] to get what I need done.

How many times can I say “Thanks”???

There are a few things to say about this solution though:

The float part does not match only floating point numbers; it
would also match sequences such as 127.0.0.1, 1...10, 4...3 etc.
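
A quick way to see the difference: the two patterns below share the same second half and differ only in the number part (Peter's `[0-9.]+` versus `\d+(?:\.\d+)?`, which allows at most one decimal point with digits on both sides). The sample strings and the helper name `accepts?` are illustrative choices.

```ruby
# Only the number part differs between these two patterns.
LOOSE  = /\A([0-9.]+)\s*([a-z]{1,2}\d+)\z/i
STRICT = /\A(\d+(?:\.\d+)?)\s*([a-z]{1,2}\d+)\z/i

def accepts?(pattern, input)
  !(pattern =~ input).nil?
end

# "12.5H7" passes both; the other three pass only the loose pattern.
["12.5H7", "127.0.0.1 h7", "1...10 h7", ". h7"].each do |s|
  puts "#{s.inspect}: loose=#{accepts?(LOOSE, s)} strict=#{accepts?(STRICT, s)}"
end
```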

“[0-9]” can be replaced by “\d”.
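
In Ruby 1.9 and later, `\d` matches only the ASCII digits 0 to 9, the same set as `[0-9]`, so the substitution does not change behavior. The sketch below checks that claim against the sample inputs from the original question; the constant and method names are hypothetical.

```ruby
# Peter's pattern as posted, and the same pattern written with \d.
BRACKET_DIGITS = /\A([0-9.]+)\s*([a-zA-Z]{1,2}[0-9]+)\Z/
BACKSLASH_D    = /\A([\d.]+)\s*([a-zA-Z]{1,2}\d+)\Z/

SAMPLES = ["80e6", "80 e6", "12.5H7", "120 JS11",
           "20.8a11", "45.50 h2", "90.2F3", "45js4"]

# True when both patterns capture the same groups (or both fail) for input.
def same_captures?(input)
  a = BRACKET_DIGITS.match(input)
  b = BACKSLASH_D.match(input)
  (a && a.captures) == (b && b.captures)
end

SAMPLES.each { |s| puts "#{s.inspect}: #{same_captures?(s)}" }
```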

Assigning the pattern to a variable and then using that for matching is
generally less efficient than directly using the pattern.

The pattern “\Z” allows for a newline at the end of the string. This
may be OK or not (you can also use #chomp on the input to remove
trailing line terminators before handing the input to the method).
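
The `\Z` versus `\z` difference shows up as soon as input arrives with its line terminator still attached, for example from `gets`. A short sketch; the pattern body follows Robert's version and only the final anchor differs.

```ruby
BIG_Z   = /\A(\d+(?:\.\d+)?)\s*([a-z]{1,2}\d+)\Z/i  # \Z: a final "\n" may follow
SMALL_Z = /\A(\d+(?:\.\d+)?)\s*([a-z]{1,2}\d+)\z/i  # \z: the true end of string

line = "80e6\n"  # e.g. a line read with gets, terminator still attached

p(BIG_Z =~ line)          # → 0    matches despite the newline
p(SMALL_Z =~ line)        # → nil  the newline is rejected
p(SMALL_Z =~ line.chomp)  # → 0    matches once the terminator is removed
```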

I would rather go with a combination of Peter’s and Josh’s solutions:

def parse(str)
  if /\A(\d+(?:\.\d+)?)\s*([a-z]{1,2}\d+)\z/i =~ str
    return Float($1), $2
  else
    raise "Invalid input: %p" % str
  end
end
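
A few example calls show what the caller gets back: a Float plus the code string on success, and a `RuntimeError` (from the bare `raise "..."`) whose message includes the offending input via `%p`, which formats with `inspect`. The method body is Robert's `parse` as corrected above; the sample inputs are illustrative.

```ruby
def parse(str)
  if /\A(\d+(?:\.\d+)?)\s*([a-z]{1,2}\d+)\z/i =~ str
    return Float($1), $2
  else
    raise "Invalid input: %p" % str
  end
end

p parse("12.5H7")    # → [12.5, "H7"]
p parse("120 JS11")  # → [120.0, "JS11"]

begin
  parse("12.5.1 H7")
rescue RuntimeError => e
  puts e.message     # → Invalid input: "12.5.1 H7"
end
```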

If you need to cope with negative floats and signs in general you can
change the initial part to

\A([-+]?\d+(?:\.\d+)?)
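
Dropped into the full pattern, the signed prefix might look like this. `parse_signed` is a hypothetical name for this variant; `Float()` itself accepts a leading `+` or `-`, so no extra conversion logic is needed.

```ruby
# Hypothetical signed variant of Robert's parse.
def parse_signed(str)
  if /\A([-+]?\d+(?:\.\d+)?)\s*([a-z]{1,2}\d+)\z/i =~ str
    return Float($1), $2
  else
    raise "Invalid input: %p" % str
  end
end

p parse_signed("-12.5H7")  # → [-12.5, "H7"]
p parse_signed("+80 e6")   # → [80.0, "e6"]
p parse_signed("45js4")    # → [45.0, "js4"]
```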

Few notes and explanations:

The code does the conversion inside the method which may be desirable or
not - depending on your context.

I picked Float() for added robustness: it will raise an exception if the
matching was flawed (which should not be the case here).
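
The robustness point is easiest to see next to `String#to_f`, which converts as much of the text as it can and never fails, whereas `Kernel#Float` insists the whole string is a number and raises `ArgumentError` otherwise. `strict_float` below is a hypothetical helper, not something from the thread.

```ruby
# String#to_f silently ignores what it cannot parse.
p "12.5".to_f        # → 12.5
p "127.0.0.1".to_f   # → 127.0
p "abc".to_f         # → 0.0

# Hypothetical helper: the Float, or nil when the text is not a clean number.
def strict_float(text)
  Float(text)
rescue ArgumentError
  nil
end

p strict_float("12.5")       # → 12.5
p strict_float("127.0.0.1")  # → nil
```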

I added error checking with an exception.

Btw, your original description of the sequence is pretty good for direct
translation into a regular expression.

Kind regards

robert

matthew · December 5, 2010, 9:57pm

On Sun, Dec 5, 2010 at 1:50 PM, Robert K.
[email protected] wrote:
…

There are a few things to say about this solution though:

Indeed … I was showing a lazily coded quick first approach (and a
hint to the documentation). Thanks for all the proposed improvements.

Peter

matthew · December 15, 2010, 4:21pm

Matt S. wrote in post #968427:

However, I’ve noticed that the server console is spitting out this
message below with every request hit that comes in:

warning: nested repeat operator + and ? was replaced with '*':
/\A((?:\d+)?(?:\.\d+)?)\s*([a-z]{1,2}\d+)\z/

The whole process works fine, but I am confused as to what this
means.

\d+ means “a digit, one or more times”

(?: … ) makes a non-capturing group

(?: … )? means “this group 0 or 1 times”

So the part of your regexp where you have

(?:\d+)?

says “one or more digits, 0 or 1 times”. Ruby is pointing out that
this is a convoluted way of saying “0 or more digits”, and is optimising
it to:

\d*

(where ‘*’ means 0 or more times)
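
With `\d*` in place, the pattern also accepts a number with no leading digits such as `.5`. The sketch assumes the `\s*` and the `/i` flag from Robert's version; `REVISED` is a hypothetical constant name. Note one side effect: because both halves of the number part are now optional, group 1 can be empty, so input like `e6` is accepted with an empty part 1 (and `Float("")` would then raise `ArgumentError`).

```ruby
# (?:\d+)? rewritten as \d*; \s* and /i are assumed from Robert's version.
REVISED = /\A(\d*(?:\.\d+)?)\s*([a-z]{1,2}\d+)\z/i

p REVISED.match(".5 h7").captures   # → [".5", "h7"]
p REVISED.match("12.5H7").captures  # → ["12.5", "H7"]
p REVISED.match("e6").captures      # → ["", "e6"]   part 1 may be empty
```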

matthew · December 15, 2010, 4:45pm

I see now. I made this change and the message has gone away.

Thanks for taking the time to give such a thorough explanation.

matthew · December 14, 2010, 11:15pm

Thanks, all… This has solved many problems for me.

However, I’ve noticed that the server console is spitting out this
message below with every request hit that comes in:

warning: nested repeat operator + and ? was replaced with '*':
/\A((?:\d+)?(?:\.\d+)?)\s*([a-z]{1,2}\d+)\z/

The whole process works fine, but I am confused as to what this
means.