External to internal conversion

wpvanpaassen · January 16, 2007, 4:17pm

Hi,

I have been following this list off and on but have not found the
solution to my problem. I want to convert data supplied by a file into
their corresponding internal values.

I have an external file with data that are in the form “variable name”,
tab, followed by their value. The 'value' can be a single number, an
array of numbers, or a matrix of numbers. For example, the file might
look like this:
a 2.4
b 1.0, 3.0, 5.0
c 5, 6, 11
d [1, 2], [5, 6], [45,55]
e 2*[7, 8]

I can get a, b, and c to convert to single values and arrays, thanks to
this forum, but I cannot figure out a way to convert d into an array of
arrays. I haven’t tackled e yet. Does anyone have any suggestions?

Ideally I would like to match the variable name in the file to the
corresponding and same name already coded in the program. Any ideas how
to do that?

Thanks

Pete Versteegen

wpvanpaassen · January 16, 2007, 7:45pm

Hello All,

I would like to rephrase my problem with the following interactive
session:

x = "1, 12, 500"
=> "1, 12, 500"

a = x.split(",").map {|k| k.to_i}
=> [1, 12, 500]

y = "[25.0, 26], [30, 31]"
=> "[25.0, 26], [30, 31]"

b = y.split(",").map {|k| k.to_i}
=> [0, 26, 0, 31]

I get what I want in the 4th line: an array of numbers.
In the last line I don’t get what I want (nor did I expect to), i.e., an
array of arrays.
How can I convert to an array of arrays?

Thanks,

wpvanpaassen · January 16, 2007, 7:52pm

On 1/16/07, Peter V. wrote:

b = y.split(",").map {|k| k.to_i}
=> [0, 26, 0, 31]

I get what I want in the 4th line, an array of numbers
I don’t get, nor did I expect to, what I want in the last line, i.e., an
array of arrays.
How can I convert to an array of arrays?

input = "[25.0, 26], [30, 31]"
=> "[25.0, 26], [30, 31]"
input.scan(/\d+\.?\d*/)
=> ["25.0", "26", "30", "31"]

Something like that?

wpvanpaassen · January 16, 2007, 9:51pm

On 1/16/07, Peter V. wrote:

Wilson,

I was looking more for an output like [[25.0, 26], [30, 31]], an array
of two arrays.

Thanks,

Probably easiest to do it in two steps, since bracket-matching is a
real chore in a regular expression.

input = "[25.0, 26], [30, 31]"
=> "[25.0, 26], [30, 31]"
groups = input.scan(/\[.*?\]/)
=> ["[25.0, 26]", "[30, 31]"]
output = groups.map {|g| g.scan(/\d+\.?\d*/).map {|n| n.to_i} }
=> [[25, 26], [30, 31]]

Do you really need 25.0 instead of 25? Mixing integers and floats can
be a bad idea. Adjust the ‘to_i’ call accordingly.
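
If the fractional parts do matter, one way to adjust the call is to use to_f so that every element comes back as a Float. This is only a sketch of the two-step idea above: the method name parse_groups is made up for illustration, and it assumes unsigned decimal numbers with no exponents.

```ruby
# Hypothetical helper (name invented for this example): turns a string such as
# "[25.0, 26], [30, 31]" into an array of arrays of Floats.
def parse_groups(text)
  # Step 1: collect each bracketed group, matching as little as possible.
  text.scan(/\[.*?\]/).map do |group|
    # Step 2: pull the number-like substrings out of the group and convert
    # them all to Float, so integers and floats are not mixed.
    group.scan(/\d+\.?\d*/).map { |n| n.to_f }
  end
end

p parse_groups("[25.0, 26], [30, 31]")
```

Using to_f everywhere sidesteps the integer/float mix; swapping it back to to_i gives the integer version shown above.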

wpvanpaassen · January 16, 2007, 11:35pm

If your data really is in that format, you can add an extra set of
brackets around the whole thing to read it in as an array literal.
Just be sure that the data source is trustworthy if you’re
going to eval it! If you aren’t generating or strongly filtering the
inputs this is probably a bad idea.

irb> input = "[25.0, 26], [30, 31]"
=> "[25.0, 26], [30, 31]"
irb> input = "[" + input + "]"
=> "[[25.0, 26], [30, 31]]"
irb> eval input
=> [[25.0, 26], [30, 31]]
irb> result = eval input
=> [[25.0, 26], [30, 31]]
irb> result[0][1]
=> 26

But maybe you were just simplifying the input data format to present
the problem?
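
For untrusted input, a possible alternative to eval is the JSON parser from Ruby's standard library, since this bracketed format happens to be valid JSON array syntax once the outer brackets are added. This is a sketch under that assumption; it would not accept anything JSON does not, such as the 2*[7, 8] form from the first post.

```ruby
require "json"

input = "[25.0, 26], [30, 31]"

# JSON.parse only builds data; it never executes the text it is given.
# Input that is not valid JSON raises JSON::ParserError instead.
result = JSON.parse("[" + input + "]")

p result
p result[0][1]
```

Anything that is not a plain nested list of numbers is rejected with an exception rather than run as Ruby code, which is the main reason to prefer this over eval when the file comes from elsewhere.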

wpvanpaassen · January 17, 2007, 12:03am

Thanks Wilson,

That did it for me. I know not to mix integers and floats; that was a
typing error. I know very little about regular expressions. Could you
put in words what you did?

Thanks again.

wpvanpaassen · January 16, 2007, 9:29pm

Wilson,

I was looking more for an output like [[25.0, 26], [30, 31]], an array
of two arrays.

Thanks,

Pete Versteegen

wpvanpaassen · January 17, 2007, 12:21am

Thanks Jim for another approach.

I saw the use of “eval” in Hal Fulton’s “The Ruby Way” but didn’t quite
understand it. I do plan to filter the data to make sure that the
numbers are there and the format is consistent.

The input data format is a natural and easy way to specify the problem
I’m trying to solve, so I’ll keep the format but do the evaluation
inside the code.

Thanks again.

Pete Versteegen

wpvanpaassen · January 17, 2007, 6:16am

On 1/16/07, Peter V. wrote:

Thanks Wilson,

That did it for me. I know not to mix integers and floats; that was a
typing error. I know very little about regular expressions. Could you
put in words what you did?

Sure.

groups = input.scan(/\[.*?\]/)

The forward slashes just delimit the regular expression
[ and ] are characters with special meaning in regexps; they create
‘character classes’. [a-z] means, for example, every lowercase letter
The backslashes escape them, because we mean literal brackets

.* means ‘any character at all, zero or more times’
The ? after it means, ‘match that as FEW times as possible’.
Without that, we would get everything up to the last ], when what we
want is everything up to the NEXT ].

Together that expression means ‘an opening bracket, anything
(or nothing), and then a closing bracket’

Next we take the array of strings we just built, and scan each one again.
This time, the pattern is: \d+\.?\d*
\d means “a digit”, and is shorthand for [0-9], which would be the
class of all digits, 0 to 9.

+ after an expression means ‘1 or more times’. (Note the difference
between this and *, which means 0 or more.)
‘.’ is a reserved character meaning ‘any single character’, so we
again escape it here with a backslash.
\.? means ‘a period, zero or one times’
Finally, we have: \d*, which means ‘zero or more consecutive digits’

Combined, \d+\.?\d* means ‘a series of digits, optionally followed by
a decimal point and more digits’. Arguably that pattern needs to be
enhanced, because at the moment it would match “25.”
You only get so much from a free email, though. Heh
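
One possible way to tighten the pattern so that a bare trailing period is left out is to make the decimal point and the digits after it a single optional group. This is a sketch, not a full number grammar: signs and exponents are still ignored, and the variable names are only for illustration.

```ruby
loose  = /\d+\.?\d*/        # the pattern explained above
strict = /\d+(?:\.\d+)?/    # a period only counts when digits follow it

sample = "25. 26.5 30"

p sample.scan(loose)    # => ["25.", "26.5", "30"]
p sample.scan(strict)   # => ["25", "26.5", "30"]
```

The (?: ... ) group is non-capturing, so scan still returns whole matches rather than the contents of the group.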

To sum up: first scan for “[ stuff ]” and then scan the contents of
those for things that look like numbers.
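
Putting the pieces together for the file described in the first post, here is a rough sketch that reads "name, tab, value" lines into a Hash keyed by variable name, which is one simple way to look values up by the name used in the file. The helper names (NUMBER, parse_value, parse_lines) are invented for this example, all numbers are converted to Float, and the repeat form 2*[7, 8] is deliberately refused because its intended meaning is not stated in the thread.

```ruby
# Assumed number format: unsigned digits with an optional decimal part.
NUMBER = /\d+(?:\.\d+)?/

# Hypothetical helper: converts the text after the tab into a Float,
# a flat array, or an array of arrays.
def parse_value(text)
  # The repeat syntax is not handled here, so fail loudly instead of guessing.
  raise ArgumentError, "repeat syntax not supported: #{text}" if text.include?("*")

  if text.include?("[")
    # Matrix: one inner array per bracketed group.
    text.scan(/\[.*?\]/).map { |g| g.scan(NUMBER).map { |n| n.to_f } }
  else
    numbers = text.scan(NUMBER).map { |n| n.to_f }
    # A lone number stays a scalar; anything else is a flat array.
    numbers.length == 1 ? numbers.first : numbers
  end
end

# Hypothetical helper: builds { "name" => value } from tab-separated lines.
def parse_lines(lines)
  lines.each_with_object({}) do |line, table|
    name, value = line.chomp.split("\t", 2)
    next if value.nil?   # skip lines that have no tab
    table[name] = parse_value(value)
  end
end

data = parse_lines(["a\t2.4", "c\t5, 6, 11", "d\t[1, 2], [5, 6], [45,55]"])
p data["a"]
p data["d"]
```

For a real file, something like parse_lines(File.readlines(path)) should work the same way, with path standing in for wherever the data lives.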