Data type for a string value

When humans look at a string, they know whether it is a date, a
float, a percentage, a currency amount, or simply a string.

For example, when you see 3,232, you know it is meant to be a
number.

I wonder if there is an easy way in Ruby to figure out the data type of a
string. I guess I could run each string through a list of regular
expressions, but if I have a lot of strings to process, that is likely to
be costly.

Any ideas on a good solution?
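For reference, a minimal Ruby sketch of the regex approach described above. The `TYPE_PATTERNS` list, the type names and the `classify` method are hypothetical, cover only US-style formats, and are tried in order, so the first match wins:

```ruby
# Hypothetical type sniffer: tries anchored regular expressions in order
# and returns the first matching type name, or :string if none match.
# Only US-style formats are covered; the list is an assumption.
TYPE_PATTERNS = [
  [:date,       /\A\d{4}-\d{2}-\d{2}\z/],                                   # 2010-09-09
  [:date,       /\A\d{2}\/\d{2}\/\d{4}\z/],                                 # 09/09/2010
  [:currency,   /\A\$\s?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?\z/],           # $1.23, $ 1.02
  [:percentage, /\A-?\d+(?:\.\d+)?%\z/],                                    # 32.1%
  [:integer,    /\A-?(?:\d{1,3}(?:,\d{3})+|\d+)\z/],                        # 3,232 or 3232
  [:float,      /\A-?\d+\.\d+\z/]                                           # 1.02
].freeze

def classify(str)
  s = str.strip
  TYPE_PATTERNS.each { |type, pattern| return type if pattern.match?(s) }
  :string
end

%w[2010-09-09 09/09/2010 $1.23 32.1% 3,232 1.02 hello].each do |value|
  puts "#{value} => #{classify(value)}"
end
```

Because every pattern is anchored with `\A` and `\z`, a value such as “3,232 apples” falls through to `:string` instead of being half-matched.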

On Mon, Oct 18, 2010 at 3:53 AM, Jack Su wrote:

> When humans look at a string, they know whether it is a date, a
> float, a percentage, a currency amount, or simply a string.
>
> For example, when you see 3,232, you know it is meant to be a
> number.

While true, what sort of number is it? Is it “three point two three two”
or “three thousand two hundred thirty-two”?

(Germany uses “.” to split large numbers into three-digit groups,
while the UK and US use “,”. For decimal fractions, Germany uses “,”,
while the US and UK use “.”.)
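To make the ambiguity concrete, here is a small Ruby sketch that parses the same text under both conventions. The `SEPARATORS` table and the `parse_number` method are hypothetical and only know two conventions; a localization library, as suggested below, would supply such rules properly:

```ruby
# Hypothetical separator table covering just two conventions.
SEPARATORS = {
  us: { thousands: ",", decimal: "." },
  de: { thousands: ".", decimal: "," }
}.freeze

# Strips the thousands separator, turns the decimal separator into ".",
# and lets Float() reject anything that is still not a number.
def parse_number(str, convention)
  sep = SEPARATORS.fetch(convention)
  Float(str.delete(sep[:thousands]).tr(sep[:decimal], "."))
end

puts parse_number("3,232", :us) # prints 3232.0
puts parse_number("3,232", :de) # prints 3.232
```

The string alone cannot decide between the two results; the convention has to come from outside the data.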

How we interpret the data we see depends on context. In the US, a 2x4
has different dimensions than in almost every other nation in the
world.

So you are already stumbling into localization issues, which can be
compounded by standards, SI vs. Imperial vs. colloquial units, and so
on, and so forth. ;)

Parsing your data (numbers, etc.) with the help of a localization
library would already get you quite a long way.

> I wonder if there is an easy way in Ruby to figure out the data type of a
> string. I guess I could run each string through a list of regular
> expressions, but if I have a lot of strings to process, it is likely to
> be costly.

It also breaks if you get data in formats that you didn’t anticipate.

And I don’t have a good solution for this, apart from enforcing a
particular data-entry format, but then you already know which field
contains which data type.
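When data entry is controlled like that, each field's type is known in advance and parsing reduces to a lookup. A Ruby sketch, assuming a hypothetical `SCHEMA` with made-up column names:

```ruby
require "date"

# Hypothetical schema: every column's type is fixed up front, so each
# value is handed to its column's parser instead of being guessed.
SCHEMA = {
  "name"     => ->(s) { s },
  "quantity" => ->(s) { Integer(s.delete(","), 10) }, # "3,232" -> 3232
  "price"    => ->(s) { Float(s.delete("$, ")) },      # "$ 1.02" -> 1.02
  "shipped"  => ->(s) { Date.iso8601(s) }              # "2010-09-09"
}.freeze

# Raises KeyError for an unknown column and ArgumentError for a value
# that does not fit its column, so bad input surfaces early.
def parse_row(row)
  row.to_h { |field, value| [field, SCHEMA.fetch(field).call(value)] }
end

row = { "name" => "widget", "quantity" => "3,232",
        "price" => "$ 1.02", "shipped" => "2010-09-09" }
p parse_row(row)
```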


Phillip G.

Though the folk I have met,
(Ah, how soon!) they forget
When I’ve moved on to some other place,
There may be one or two,
When I’ve played and passed through,
Who’ll remember my song or my face.

This problem is too generic. Are we talking only about numbers vs.
strings, or are there more data types involved? If the former, do you
need to tell integers from floats? Are the number-formatting
conventions known beforehand?

Xavier N. wrote in post #955025:

> This problem is too generic. Are we talking only numbers vs strings or
> are there more data types involved?

As I mentioned, all data types, including dates, date-times, percentages,
currency, etc. For example, there can be “$ 1.02”. Basically, any values
you can see in a spreadsheet.

> If the former, do you need to tell
> integers from floats?

No.

> Are the conventions on number formatting known
> beforehand?

No and yes. There is a set of them, such as ‘1,234’, ‘2010-09-09’
and ‘09/09/2010’. That is the point: a human can see it right away, but
a computer needs to figure this out.
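The slash format has a catch, though: ‘09/09/2010’ reads the same either way, but a value like ‘03/04/2010’ does not. A short Ruby sketch with `Date.strptime` (the sample date is made up) shows that the chosen format, not the string, decides the result:

```ruby
require "date"

# The same text under two conventions:
# month/day/year (US) vs. day/month/year (UK and much of Europe).
us_reading = Date.strptime("03/04/2010", "%m/%d/%Y")
eu_reading = Date.strptime("03/04/2010", "%d/%m/%Y")

puts us_reading # prints 2010-03-04 (March 4)
puts eu_reading # prints 2010-04-03 (3 April)
```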

On 10/17/2010 08:00 PM, Jack Su wrote:

> No and yes. There is a set of them, such as ‘1,234’, ‘2010-09-09’
> and ‘09/09/2010’. That is the point: a human can see it right away, but
> a computer needs to figure this out.

Regexes may not be as costly as you assume for simple formats like
these. Or did you run benchmarks?
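One way to answer that is to measure. A rough Ruby sketch using the standard `benchmark` library; the patterns and the sample data are assumptions, and the printed timing will vary with the machine and Ruby version:

```ruby
require "benchmark"

# Illustrative patterns only; tried in order, first match wins.
PATTERNS = {
  date:    /\A\d{4}-\d{2}-\d{2}\z/,
  integer: /\A-?(?:\d{1,3}(?:,\d{3})+|\d+)\z/,
  float:   /\A-?\d+\.\d+\z/
}.freeze

def classify(str)
  PATTERNS.each { |type, pattern| return type if pattern.match?(str) }
  :string
end

# 100,000 short sample strings (made-up data).
samples = %w[2010-09-09 1,234 3.14 hello] * 25_000

seconds = Benchmark.realtime do
  samples.each { |s| classify(s) }
end
puts format("classified %d strings in %.3f s", samples.size, seconds)
```

Anchored patterns like these only attempt a match at the start of the string, so for short spreadsheet cells the cost tends to be small; timing it on your own data is the only way to know for sure.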

On 10/18/2010 05:00 AM, Jack Su wrote:

> Xavier N. wrote in post #955025:
>
>> Are the conventions on number formatting known
>> beforehand?
>
> No and yes. There is a set of them, such as ‘1,234’, ‘2010-09-09’
> and ‘09/09/2010’. That is the point: a human can see it right away, but
> a computer needs to figure this out.

The usual solution to this is to expect a number of types (possibly only
one), parse them and deal with them as needed. Even if there were a
magical mechanism that could determine arbitrary types, you would still
need code to process each type properly.
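A sketch of that approach in Ruby: a fixed, ordered list of expected types, each with its own parser, falling back to the raw string. Everything here (`PARSERS`, `parse_value`, the accepted formats, and reading “32.1%” as 0.321) is an illustrative assumption, not a standard API:

```ruby
require "date"

# Hypothetical parser chain. Each entry returns a parsed value, or nil if
# the string does not have that shape. The formats are assumptions.
PARSERS = [
  ->(s) { Date.strptime(s, "%Y-%m-%d") if s.match?(/\A\d{4}-\d{2}-\d{2}\z/) },
  ->(s) { Date.strptime(s, "%m/%d/%Y") if s.match?(/\A\d{2}\/\d{2}\/\d{4}\z/) },
  ->(s) { Float(s.delete("$, ")) if s.match?(/\A\$\s?[\d,]+(?:\.\d+)?\z/) },
  ->(s) { Float(s.chomp("%")) / 100 if s.match?(/\A-?\d+(?:\.\d+)?%\z/) },
  ->(s) { Integer(s.delete(","), 10) if s.match?(/\A-?(?:\d{1,3}(?:,\d{3})+|\d+)\z/) },
  ->(s) { Float(s) if s.match?(/\A-?\d+\.\d+\z/) }
].freeze

# Returns the first successful parse, or the original string unchanged.
def parse_value(str)
  s = str.strip
  PARSERS.each do |parser|
    value = begin
      parser.call(s)
    rescue ArgumentError
      nil # right shape but invalid value, e.g. "13/45/2010"
    end
    return value unless value.nil?
  end
  str
end

%w[2010-09-09 12/31/1999 $1.23 32.1% 3,232 1.02 hello].each do |v|
  p parse_value(v)
end
```

Keeping this in an ordinary method rather than on String keeps the choice of recognized types local to the application that needs them.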

What kind of problem are you really trying to solve?

Kind regards

robert

On Tue, Oct 19, 2010 at 12:06 AM, Jack Su wrote:

>> The usual solution to this is to expect a number of types (possibly only
>> one), parse them and deal with them as needed. Even if there were a
>> magical mechanism that could determine arbitrary types, you would still
>> need code to process each type properly.
>
> It would be nice to have a method String#parse.
>
> '12/31/1999'.parse() => a date object
> '$1.23'.parse() => a float object
> '32.1%'.parse() => a float object

No, that would be a catastrophe! Class String is responsible for
string handling in general. Everybody has different requirements for
parsing strings (if at all), so there would be no way to implement this
in the standard library in any reasonable way. Just think about types
which have different representations (e.g. Time and DateTime, Float
and BigDecimal), let alone the question of how many types are detected.
As Xavier said already: the problem is too generic.

>> What kind of problem are you really trying to solve?
>
> For parsing data from spreadsheets (or CSVs, for that matter) to sort
> data based on their types, e.g. 21 > 3, not ‘21’ < ‘3’.

If you pull it directly from a spreadsheet, there seems to be at least a
small chance that you can obtain type information from it. If you pull it
from CSV, you need to define which types you want to recognize and code
accordingly.
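For the sorting case, a sketch that assumes the column has already been identified as holding integers (with optional thousands separators). Ruby's standard CSV library can also convert fields while reading through its `converters:` option (for example `:numeric`):

```ruby
# Assumes this column is already known to hold integers, possibly with
# thousands separators.
column = ["21", "3", "1,200"]

p column.sort                                       # ["1,200", "21", "3"] (text order)
p column.sort_by { |s| Integer(s.delete(","), 10) } # ["3", "21", "1,200"] (numeric order)
```

If a column mixes numbers and plain strings, `sort_by` raises an ArgumentError when it compares an Integer with a String, which is another reason to settle each column's type up front.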

Kind regards

robert

