Method to groom a string to floating point representation

alexd · April 13, 2010, 4:34am

I have a program that asks the user to enter a string that
represents a floating point number. Every time a new character is typed
I want a method that checks to make sure the string makes sense as a
floating point number, and if not, deletes any bad characters. For
instance, if the user enters ‘4.5e+6.7’ I want the method to delete the
extra decimal point and return ‘4.5e+67’. Or, if the user enters
something like ‘4.5+e7’ it deletes the misplaced plus sign and returns
‘4.5e7’. In short, I want the method to only allow correct
representations of floating point numbers, but I want the result to
remain a string. Anything other than a digit, +, -, ., e, or E should be
deleted.
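
The last rule in the post above (drop every character that is not a digit, sign, dot, or e/E) can be sketched as a single substitution. This is only a first pass under that assumption: the helper name strip_foreign_chars is made up for illustration, and it does not check the order or count of the characters it keeps.

```ruby
# Hypothetical helper: keeps only characters that can ever appear in a
# float literal. It does not check their order or how often they occur.
def strip_foreign_chars(str)
  str.gsub(/[^0-9eE.+\-]/, '')
end

puts strip_foreign_chars('-24.5fge4x5')  # prints -24.5e45
puts strip_foreign_chars('4.5+e7')       # prints 4.5+e7 (the misplaced + survives)
```

The second call shows why a whitelist alone is not enough: the misplaced plus sign is a legal character, so position rules still have to be applied afterwards.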

I wrote a method that works like I want (attached), but it is long and
cumbersome. I’m wondering if anyone has a shorter, better way to do
this.

–Alex

alexd · April 13, 2010, 8:41am

On Mon, Apr 12, 2010 at 9:34 PM, Alex DeCaria wrote:

deleted.

Posted via http://www.ruby-forum.com/.

It would probably be easier if you provided a set of tests we could
check our function against, where we could be confident our function
was correct once it passed all the tests.

alexd · April 13, 2010, 1:20pm

Josh C. wrote:

It would probably be easier if you provided a set of tests we could
check our function against, where we could be confident our function
was correct once it passed all the tests.

Here are some examples of what it should do:

Delete any characters other than digits, +, -, e, E, or .:
‘-24.5fge4x5’ => ‘-24.5e45’

Delete any extra decimals:
‘2.4.5’ => ‘2.45’
‘2...45’ => ‘2.45’

Delete any decimals in an exponent:
‘245e7.6’ => ‘2.45e76’

Delete any extra or misplaced + or – signs:
‘+45-68+e+45-’ => ‘4568e+45’

Delete any extra or misplaced ‘e’ or ‘E’ characters (first occurrence of
‘e’ or ‘E’ has precedence unless it doesn’t make sense):
‘4.67e6e-7’ => ‘4.67e67’
‘+e4.67e-7’ => ‘+4.67e-7’

The motivation for this is for a GUI input textbox, so that if the user
enters a bad string it automatically corrects it to a valid
floating-point representation in string form before converting to a
floating-point for calculations. I toyed with just doing
str = str.to_f.to_s
and letting Ruby figure out the floating point representation, but I’d
like more control over how the string is converted to floating point
representation. For example, I want
‘2...45e9’ => ‘2.45e9’, whereas ‘2...45e9’.to_f.to_s => ‘2.0’
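
To make the to_f.to_s behaviour described above concrete, here is a small sketch. String#to_f parses the longest numeric prefix it can find and silently ignores the rest, so the round trip loses whatever came after the first bad character. The helper name round_trip is invented for this illustration.

```ruby
# Hypothetical helper: the "let Ruby figure it out" round trip.
# String#to_f reads the longest valid numeric prefix and ignores the rest.
def round_trip(str)
  str.to_f.to_s
end

['4.5e+6.7', '2.4.5', 'abc'].each do |s|
  puts "#{s.inspect} -> #{round_trip(s).inspect}"
end
# prints:
# "4.5e+6.7" -> "4500000.0"
# "2.4.5" -> "2.4"
# "abc" -> "0.0"
```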

–Alex

alexd · April 14, 2010, 4:37am

On Tue, Apr 13, 2010 at 6:20 AM, Alex DeCaria wrote:

Delete any decimals in an exponent:
‘245e7.6’ => ‘2.45e76’

Where did the dot in between 2 and 4 come from? Am I interpreting the
String or just cleaning it?

Delete any extra or misplaced + or – signs:
‘+45-68+e+45-’ => ‘4568e+45’

Delete any extra or misplaced ‘e’ or ‘E’ characters (first occurrence of
‘e’ or ‘E’ has precedence unless it doesn’t make sense):
‘+e4.67e-7’ => ‘+4.67e-7’

Why does the plus in front of 45 in the first one go away, but the plus
in front of the e in the second one stays?

This is what I have so far; please check and correct any tests that
should be different.

def clean_string( str , options = Hash.new )
  # groups: sign, digits up to the first dot, rest of the mantissa,
  # e with optional sign, exponent
  str =~ /\A([-+]?)([^eE.]*\.?)([^eE]*)((?:[eE][+-]?)?)([^Z]*)\Z/
  posneg , prepre , postpre , e , post = $1 , $2 , $3 , $4 , $5
  posneg + prepre + postpre.gsub(/[^0-9]/,'') + e + post.gsub(/[^0-9]/,'')
end

require 'test/unit'
class TestCleanString < Test::Unit::TestCase
  def test_delete_chars
    assert_equal '-24.5e45' , clean_string('-24.5fge4x5')
  end
  def test_delete_extra_decimal
    assert_equal '2.45' , clean_string('2.4.5')
    assert_equal '2.45' , clean_string('2..45')
    assert_equal '2.45' , clean_string('2...45')
  end
  def test_delete_extra_decimal_in_exponent
    # you said this should be '2.45e76' , but where did first dot come from?
    assert_equal '245e76' , clean_string('245e7.6')
  end
  def test_delete_extra_or_misplaced_pos_and_neg_signs
    assert_equal '4568e+45' , clean_string('+45-68+e+45-')
  end
  def test_delete_extra_or_misplaced_e_or_E
    assert_equal '4.67e67' , clean_string('4.67e6e-7')
    assert_equal '+4.67e-7' , clean_string('+e4.67e-7')
  end
end

alexd · April 14, 2010, 2:06pm

Delete any decimals in an exponent:
‘245e7.6’ => ‘2.45e76’

Where did the dot in between 2 and 4 come from? Am I interpreting the
String or just cleaning it?

This was a typo on my part. It should have read:
‘245e7.6’ => ‘245e76’

Delete any extra or misplaced + or – signs:
‘+45-68+e+45-’ => ‘4568e+45’

Delete any extra or misplaced ‘e’ or ‘E’ characters (first occurrence of
‘e’ or ‘E’ has precedence unless it doesn’t make sense):
‘+e4.67e-7’ => ‘+4.67e-7’

Why does the plus in front of 45 in the first one go away, but the plus
in front of the e in the second one stays?

Again, a typo on my part. It should have been:
‘+45-68+e+45-’ => ‘+4568e+45’

This is what I have so far; please check and correct any tests that
should be different.

Thanks! I’ll check the code you gave me and see how it does.

–Alex

alexd · April 14, 2010, 8:47am

Hello,

2010/4/14 Josh C.:

On Tue, Apr 13, 2010 at 6:20 AM, Alex DeCaria wrote:

Delete any decimals in an exponent:
‘245e7.6’ => ‘2.45e76’

Where did the dot in between 2 and 4 come from? Am I interpreting the String
or just cleaning it?

As Josh said, here you are interpreting the string rather than
cleaning it. 245e76 is a valid float, just not in the usual 2.45e78
form.

BTW, I would rather not do any cleaning under the hood: let the user
correct his input himself. For example, give the input to Float() and
if an error is raised (which Float() does, as opposed to to_f, which
never raises an error), rescue it by giving feedback to the user (where
you could use your method to propose an alternative if you want), but do
not continue without letting the user know he has made a mistake and
giving him the ability to change his mind.
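
A minimal sketch of the Float() approach suggested above, for comparison with to_f. Kernel#Float raises ArgumentError when the whole string is not a valid float literal, whereas String#to_f returns whatever prefix it could parse. The helper name parse_float_strictly is hypothetical.

```ruby
# Hypothetical helper: returns the parsed Float, or nil when the whole
# string is not a valid float literal (nil input raises TypeError).
def parse_float_strictly(str)
  Float(str)
rescue ArgumentError, TypeError
  nil
end

p parse_float_strictly('1.0e3')     # prints 1000.0
p parse_float_strictly('4.5e+6.7')  # prints nil
p '4.5e+6.7'.to_f                   # prints 4500000.0
```

A nil result is the point where the program could show feedback, or offer a cleaned-up alternative, instead of silently continuing.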

Cheers,

alexd · April 14, 2010, 2:13pm

BTW, I would rather not do any cleaning under the hood: let the user
correct his input himself. For example, give the input to Float() and
if an error is raised (which Float() does, as opposed to to_f, which
never raises an error), rescue it by giving feedback to the user (where
you could use your method to propose an alternative if you want), but do
not continue without letting the user know he has made a mistake and
giving him the ability to change his mind.

Cheers,

I didn’t realize the difference between Float() and .to_f. Thanks for
the suggestion.

The user is still aware if they entered an incorrect string, since they
are entering it into a GUI textbox, and the string cleaning is done
after each character is entered. Thus, if they try to enter a misplaced
sign or another bad character, they won’t see it appear in the
textbox, which should cause them to notice it.

–Alex

alexd · April 14, 2010, 3:40pm

Hello Alex,

The user is still aware if they entered an incorrect string, since they
are entering it into a GUI textbox, and the string cleaning is done
after each character is entered. Thus, if they try to enter a misplaced
sign or another bad character, they won’t see it appear in the
textbox, which should cause them to notice it.

Well, then you can’t use the Float() trick, because 1.0e3 is valid
but 1.0e is not.
Then there will be a lot of strings your user won’t be able to type,
even though they would be valid once finished.

Cheers,

alexd · April 14, 2010, 3:33pm

Josh C. wrote:

This is what I have so far; please check and correct any tests that
should be different.

Josh,

Your code works great! I knew there had to be a more elegant way to do
this rather than my brute force method.

The only test it didn’t seem to work on was eliminating extra + or -
signs, such as ‘+45-2+8’ => ‘+4528’, but now that I see what you are
doing I can probably figure out how to do that. I definitely need to
learn more about regular expressions!

Thanks for your time and effort.

–Alex

alexd · April 14, 2010, 3:48pm

Jean-Julien F. wrote:

Hello Alex,

The user is still aware if they entered an incorrect string, since they
are entering it into a GUI textbox, and the string cleaning is done
after each character is entered. Thus, if they try to enter a misplaced
sign or another bad character, they won’t see it appear in the
textbox, which should cause them to notice it.

Well, then you can’t use the Float() trick, because 1.0e3 is valid
but 1.0e is not.
Then there will be a lot of strings your user won’t be able to type,
even though they would be valid once finished.

Cheers,

Yes, there has to be some additional logic to allow a trailing ‘e’ with
the assumption that the user will next enter a valid character
afterward. That’s what makes it a little complicated (and fun) to
figure out. The goal is, as the user is entering data, to not allow
them to enter anything that is obviously not going to work as a floating
point representation.
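
One way to express the "allow a trailing e" logic described above is to test whether the text typed so far could still grow into a valid float, rather than whether it already is one. The pattern below is an assumption made for illustration (it is not from the thread), and the names PLAUSIBLE_PREFIX and plausible_prefix? are hypothetical.

```ruby
# Assumed pattern (not from the thread): an optional sign, then either a
# mantissa followed by an optional, possibly unfinished exponent, or
# nothing but an optional decimal point.
PLAUSIBLE_PREFIX = /\A[-+]?(?:(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d*)?|\.?)\z/

def plausible_prefix?(str)
  PLAUSIBLE_PREFIX.match?(str)
end

p plausible_prefix?('1.0e')     # prints true  (an exponent may still follow)
p plausible_prefix?('1.0e-')    # prints true
p plausible_prefix?('1.0e3.2')  # prints false (a dot inside the exponent)
p plausible_prefix?('4.5+e7')   # prints false (a sign before the e)
```

A textbox handler could then reject a keystroke whenever the new contents fail this check, while still running a strict check such as Float() once editing is finished.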

–Alex

alexd · April 14, 2010, 4:08pm

Hello Alex,

Yes, there has to be some additional logic to allow a trailing ‘e’ with
the assumption that the user will next enter a valid character
afterward. That’s what makes it a little complicated (and fun) to
figure out. The goal is, as the user is entering data, to not allow
them to enter anything that is obviously not going to work as a floating
point representation.

Sure, fun it is :o)
But that’s exactly the kind of software that could drive me mad (as a
user). You assume that your user is making a typo, but what if he is
not? What if he truly believes that what he is writing is a perfectly
correct float? He will retry again and again until he decides that the
whole software is just a fraud :o) So IMHO, it is more efficient to let
your user know what kind of error he is (possibly repeatedly) making
and propose an alternative rather than erase what he believes could be
right.

Cheers,

alexd · April 14, 2010, 4:59pm

On Wed, Apr 14, 2010 at 8:33 AM, Alex DeCaria wrote:

The only test it didn’t seem to work on was eliminating extra + or -

It wasn’t done, because I wanted clarification on the tests first.

Anyway, this one passes all tests.

def clean_string(str)
  # groups: sign, misplaced e, digits up to the first dot, rest of the
  # mantissa, e with optional sign, exponent
  str =~ /\A([-+]?)([eE]?)([^eE.]*\.?)([^eE]*)((?:[eE][+-]?)?)([^Z]*)\Z/
  posneg , misplaced_e , before_dec , after_dec , e , exponent =
    $1 , $2 , $3 , $4 , $5 , $6
  posneg + before_dec.gsub(/[^0-9.]/,'') + after_dec.gsub(/[^0-9]/,'') +
    e + exponent.gsub(/[^0-9]/,'')
end

require 'test/unit'
class TestCleanString < Test::Unit::TestCase
  def test_delete_chars
    assert_equal '-24.5e45' , clean_string('-24.5fge4x5')
  end
  def test_delete_extra_decimal
    assert_equal '2.45' , clean_string('2.4.5')
    assert_equal '2.45' , clean_string('2..45')
    assert_equal '2.45' , clean_string('2...45')
  end
  def test_delete_extra_decimal_in_exponent
    assert_equal '245e76' , clean_string('245e7.6')
  end
  def test_delete_extra_or_misplaced_pos_and_neg_signs
    assert_equal '+4568e+45' , clean_string('+45-68+e+45-')
  end
  def test_delete_extra_or_misplaced_e_or_E
    assert_equal '4.67e67' , clean_string('4.67e6e-7')
    assert_equal '+4.67e-7' , clean_string('+e4.67e-7')
  end
end

alexd · April 14, 2010, 5:09pm

Jean-Julien F. wrote:

Sure, fun it is :o)
But that’s exactly the kind of software that could drive me mad (as a
user). You assume that your user is making a typo, but what if he is
not? What if he truly believes that what he is writing is a perfectly
correct float? He will retry again and again until he decides that the
whole software is just a fraud :o) So IMHO, it is more efficient to let
your user know what kind of error he is (possibly repeatedly) making
and propose an alternative rather than erase what he believes could be
right.

Cheers,

I can’t argue with the point you are making. I will continue to use the
automatic string grooming, but will probably include a message to the
user letting them know why what they are typing isn’t showing up in the
textbox.
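
A rough sketch of how the grooming and the user message mentioned above could be combined. It reuses the clean_string approach posted in this thread (with the last group written as .*); the wrapper groom_with_message and its message text are hypothetical, and a real GUI would show the message in whatever way its toolkit provides.

```ruby
def digits_only(str)
  str.gsub(/[^0-9]/, '')
end

# clean_string along the lines posted in this thread.
def clean_string(str)
  str =~ /\A([-+]?)([eE]?)([^eE.]*)(\.?)([^eE]*)((?:[eE][+-]?)?)(.*)\Z/
  $1 << digits_only($3) << $4 << digits_only($5) << $6 << digits_only($7)
end

# Hypothetical wrapper: returns the groomed text and a message for the
# user; the message is nil when nothing had to be removed.
def groom_with_message(input)
  cleaned = clean_string(input)
  return [cleaned, nil] if cleaned == input
  [cleaned, 'Some characters were removed because they cannot appear ' \
            'at that position in a floating-point number.']
end

p groom_with_message('4.5e7').first     # prints "4.5e7"
p groom_with_message('4.5e+6.7').first  # prints "4.5e+67"
```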

–Alex

alexd · April 14, 2010, 5:06pm

Josh C. wrote:

On Wed, Apr 14, 2010 at 8:33 AM, Alex DeCaria wrote:

The only test it didn’t seem to work on was eliminating extra + or -

It wasn’t done, because I wanted clarification on the tests first.

Anyway, this one passes all tests.

Thanks again, Josh! May I use your code in my (non-commercial,
educational-use-only) app?

–Alex

alexd · April 14, 2010, 5:19pm

On Wed, Apr 14, 2010 at 10:06 AM, Alex DeCaria wrote:

Thanks again, Josh! May I use your code in my (non-commercial,
educational-use-only) app?

–Alex


Sure, go ahead and throw the wtfpl on there, if you feel more
comfortable with that. http://sam.zoy.org/wtfpl/

And I guarantee that it does nothing other than pass the set of tests it
was posted with, on my machine, with the settings that were used at the
time of testing. So no warranty of any kind.

Have fun

alexd · April 14, 2010, 5:47pm

On Wed, Apr 14, 2010 at 9:58 AM, Josh C. wrote:

Your code works great! I knew there had to be a more elegant way to do


Found a bug: the [^Z] in the last capture group should be a [^\Z] (or,
if you prefer, you could just swap it out with .*; I don’t know if it
makes a difference, I just usually try to match based on the next thing
I want to hit, in this case the end of the string).

Here is another version. It does the same thing, but I think it’s
prettier. I swapped out the plusses for << because they’re much quicker
when you don’t need a new object.

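def digits_only(str)
  str.gsub(/[^0-9]/, '')
end

def clean_string(str)
  # the last group uses .* (the alternative mentioned above) to take
  # everything up to the end of the string
  str =~ /\A([-+]?)([eE]?)([^eE.]*)(\.?)([^eE]*)((?:[eE][+-]?)?)(.*)\Z/
  $1 << digits_only($3) << $4 << digits_only($5) << $6 << digits_only($7)
end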

And here is the same thing, but it assigns them to variables first. It’s
uglier, but if you have to sort through it later, it can be nice to know
what the regex is supposed to be capturing.

def digits_only(str)
  str.gsub(/[^0-9]/, '')
end

def clean_string(str)
  # the last group uses .* (the alternative mentioned above) to take
  # everything up to the end of the string
  str =~ /\A([-+]?)([eE]?)([^eE.]*)(\.?)([^eE]*)((?:[eE][+-]?)?)(.*)\Z/
  # misplaced_e is captured only so that it can be dropped
  posneg , misplaced_e , before_dec , dec , after_dec , e , exponent =
    $1 , $2 , digits_only($3) , $4 , digits_only($5) , $6 , digits_only($7)
  posneg << before_dec << dec << after_dec << e << exponent
end
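
As a side note on the + versus << remark above, this small sketch shows the difference the post relies on: String#+ returns a new String each time, while String#<< appends to the receiver and returns that same object. The method name append_demo is made up for the illustration.

```ruby
# Hypothetical demo of String#+ versus String#<<.
def append_demo
  a = String.new('4.5')
  b = a + 'e7'   # + builds a new String; a is still '4.5' at this point
  c = a << 'e7'  # << appends in place and returns the receiver
  { plus_made_new_object: !b.equal?(a),
    shovel_returned_receiver: c.equal?(a),
    a: a }
end

p append_demo
```

Because << mutates its receiver, it is only safe when the left-hand string is not needed again in its original form, which is the case for the freshly created $1 string in clean_string.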