Parsing Data from Row using Regexp

Hi All,

The following line is a row from one of my parsing routines:

137Tulsa138Bowling Green State-110-110

Here’s another example as well:

307Illinois State308Ball State-120105

Here's a breakdown of the information each of these rows contains:


137
Tulsa
138
Bowling Green State
-110
-110


307
Illinois State
308
Ball State
-120
105


I've been trying to use a regexp to break these down, but I'm not having
much luck. The schema for these rows is:

Rotation Number Team 1
Team 1 Name
Rotation Number Team 2
Team 2 Name
Moneyline Team 1
Moneyline Team 2

What suggestions would you offer for parsing this line to segment out
the information I need?

Thanks.

If your numeric digits are always 3, then this would work:

/^(\d{3})([a-z\s]+)(-?\d{3})([a-z\s]+)(-?\d{3})(-?\d{3})$/i
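A quick sketch (plain Ruby, variable names illustrative) of what that pattern yields on the first sample row:

```ruby
# Sketch: applying the suggested pattern to one of the sample rows.
ROW_RE = /^(\d{3})([a-z\s]+)(-?\d{3})([a-z\s]+)(-?\d{3})(-?\d{3})$/i

m = "137Tulsa138Bowling Green State-110-110".match(ROW_RE)
rot1, team1, rot2, team2, ml1, ml2 = m.captures
# team1 == "Tulsa", team2 == "Bowling Green State", ml1 == "-110"
```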

Joel P. wrote in post #1117153:

If your numeric digits are always 3, then this would work:
Rubular: ^(\d{3})([a-z\s]+)(\-?\d{3})([a-z\s]+)(\-?\d{3})(\-?\d{3})$

/^(\d{3})([a-z\s]+)(-?\d{3})([a-z\s]+)(-?\d{3})(-?\d{3})$/i

I was very close but your regexp is accurate. Thank you so much.

To account for multiple books and moneylines, I did:

@rotation_line = /^(\d{3})([a-z\s]+)(-?\d{3})([a-z\s]+)
  (-?\d{3})(-?\d{3})(-?\d{3})?(-?\d{3})?(-?\d{3})?(-?\d{3})?
  (-?\d{3})?(-?\d{3})?(-?\d{3})?(-?\d{3})?(-?\d{3})?(-?\d{3})?
  (-?\d{3})?(-?\d{3})?(-?\d{3})?(-?\d{3})?$/ix

That should cover up to 10 games and produce empty matches for lines
that aren't updated yet. (The x flag lets the literal span multiple
source lines by ignoring the whitespace in the pattern.)
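With trailing optional groups like these, the unmatched ones come back as nil, so you can compact the captures to keep only the moneylines that were actually present. A minimal sketch (shortened to two optional groups, names illustrative):

```ruby
# Sketch: optional trailing groups return nil when unmatched;
# compact the captures to keep only the moneylines that exist.
LONG_RE = /^(\d{3})([a-z\s]+)(-?\d{3})([a-z\s]+)
            (-?\d{3})(-?\d{3})(-?\d{3})?(-?\d{3})?$/ix

m = "137Tulsa138Bowling Green State-110-110-105".match(LONG_RE)
moneylines = m.captures[4..-1].compact
# moneylines == ["-110", "-110", "-105"]
```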

Thanks again.

That's why I used "^" and "$": the pattern will repeat for as many
lines as necessary when using String#scan.
You can make parts of the expression optional if you need to.
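A sketch of that String#scan approach over multi-line input (names illustrative):

```ruby
# Sketch: ^ and $ anchor each line, so scan returns one
# array of captures per row of scraped text.
ROW_RE = /^(\d{3})([a-z\s]+)(-?\d{3})([a-z\s]+)(-?\d{3})(-?\d{3})$/i

text = "137Tulsa138Bowling Green State-110-110\n" \
       "307Illinois State308Ball State-120105"

rows = text.scan(ROW_RE)
rows.each do |rot1, team1, rot2, team2, ml1, ml2|
  puts "#{team1} (#{rot1}) #{ml1} vs #{team2} (#{rot2}) #{ml2}"
end
```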

That looks like a hideous way to store and reference data. I’d look into
alternatives if possible.

I added some optional entries and so far it's testing out fine. If I
could find an alternative approach I would, but scraping this particular
site is pretty hideous in itself. The good news is the optional
parameters are in and it's moving along fairly fast.

I don't normally use a regexp this long, but in this particular case
it's a temporary band-aid until other alternatives are found.