Newbie regexp question

Hello,

I’m trying to split a formatted text file into four separate columns.
Each line of the data consists of four columns: a “Required versus
Optional” flag, a requirement number, a requirement classification
(R1=Rev 1, F=Future, I=Internal), and a textual description of the
requirement.

My raw data looks like this in the input text file:

R [01] R1 The system shall support “emergency call processing”
R [02] R1 The system shall support “local call processing”
R [08] F The system shall provide a command-line user interface
R [723] F The system shall provide 6 10/100/1000 Ethernet interfaces
R [11] F The system shall support VoIP networks
R [398] R1 The system shall contain 2 control boards
O [327] I The system should support hotswapping of all internal boards
R [19] I The system shall be able to detect transmission errors
R [631] F The system shall continue processing data as long as a call is
active.

I’ve set up a loop to process each line in the input file, and what I’d
like to get is four separate variables containing on a line-by-line
basis the data corresponding to the four distinct columns. The problem
is my regexp experience is next to nothing, and I can’t figure out how
to extract the data I want since my fourth column contains whitespace
(I’d have used that as my column separator otherwise).

Here’s my loop:

File.open(textfile, "r") do |input_file|
  while line = input_file.gets
    output_file << line
  end
end

What can I replace the simple copy statement (output_file << line) with
in order to get what I want?

Thanks in advance, I hope this question makes some sense.

James

James C. wrote:

R [01] R1 The system shall support “emergency call processing”
R [02] R1 The system shall support “local call processing”
R [08] F The system shall provide a command-line user interface
R [723] F The system shall provide 6 10/100/1000 Ethernet interfaces
R [11] F The system shall support VoIP networks
R [398] R1 The system shall contain 2 control boards
O [327] I The system should support hotswapping of all internal boards
R [19] I The system shall be able to detect transmission errors
R [631] F The system shall continue processing data as long as a call is
active.

try this one

open("file").read.scan(/(\w)\s+(.+?)\s+(\w+)\s+(.*?)\n?$/){|req,num,cls,dsc|
…}

lopex

On Sep 14, 2006, at 6:37 PM, James C. wrote:

What can I replace the simple copy statement (output_file << line)
with
in order to get what I want?

My wife, Dana Gray, is still learning Ruby so I gave her this problem
as a test. ;) She suggests the code below.

James Edward G. II

DATA.each do |line|
  line =~ /^(\w)\s+(\S+)\s+(\S+)\s+(.+)/
  p [$1, $2, $3, $4]
end

__END__
R [01] R1 The system shall support “emergency call processing”
R [02] R1 The system shall support “local call processing”
R [08] F The system shall provide a command-line user interface
R [723] F The system shall provide 6 10/100/1000 Ethernet interfaces
R [11] F The system shall support VoIP networks
R [398] R1 The system shall contain 2 control boards
O [327] I The system should support hotswapping of all internal boards
R [19] I The system shall be able to detect transmission errors
R [631] F The system shall continue processing data as long as a call is active.
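
For the original question (what to put in place of output_file << line), here is a minimal sketch of the same regex dropped into the loop. The helper name parse_requirement, the tab-separated output, and the StringIO objects standing in for the real files are all assumptions for illustration, not part of the posts above. Lines that do not fit the pattern, such as a wrapped continuation line, are skipped.

```ruby
require "stringio"

# Hypothetical helper: returns the four columns of one line as an array,
# or nil when the line does not fit the pattern.
def parse_requirement(line)
  m = /^(\w)\s+(\S+)\s+(\S+)\s+(.+)/.match(line)
  m && m.captures
end

# StringIO stands in for the input and output files (an assumption).
input_file = StringIO.new(<<EOS)
R [01] R1 The system shall support "emergency call processing"
R [631] F The system shall continue processing data as long as a call is
active.
EOS
output_file = StringIO.new

while line = input_file.gets
  required, number, classification, description = parse_requirement(line)
  next unless required # skip lines that do not match, e.g. "active."
  output_file << [required, number, classification, description].join("\t") << "\n"
end

print output_file.string
```

Since . does not match a newline by default, the description comes back without the trailing line break.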

Marcin Mielżyński wrote:

Ooops,

the newline in regexp is not needed…

try this one

open("file").read.scan(/(\w)\s+(.+?)\s+(\w+)\s+(.*?)$/){|req,num,cls,dsc|
…}
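
To see what the block receives, here is a small runnable sketch of the same scan call. The sample text is held in a string rather than read from a file, and collecting the fields into a rows array is only one possible block body, chosen for illustration.

```ruby
# Sample input in a string so the sketch runs without a file (an assumption;
# the post above reads from open("file")).
text = <<EOS
R [01] R1 The system shall support "emergency call processing"
O [327] I The system should support hotswapping of all internal boards
EOS

rows = []
# In Ruby, $ matches at the end of every line, so scan yields once per line.
text.scan(/(\w)\s+(.+?)\s+(\w+)\s+(.*?)$/) do |req, num, cls, dsc|
  rows << [req, num, cls, dsc]
end

rows.each { |row| p row }
```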

lopex


On 14-Sep-06, at 7:37 PM, James C. wrote:

active.
Here’s my loop:

Thanks in advance, I hope this question makes some sense.

You have a number of options - if your data is tab delimited (i.e.
the first “two” columns are really one):

s = "R [01]\tR1\tThe system shall support \"emergency call processing\""
p s.split(/\t/)

=> ["R [01]", "R1", "The system shall support \"emergency call processing\""]

or you can just split on whitespace and specify a limit on the number
of fields:

s = 'R [01] R1 The system shall support "emergency call processing"'
p s.split(/\s+/, 4)

=> ["R", "[01]", "R1", "The system shall support \"emergency call processing\""]
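
As a rough sketch of how the limited split gives the four separate variables asked for, the result can be destructured directly. The variable names and the conversion of the requirement number to an integer are assumptions added for illustration.

```ruby
line = "R [723] F The system shall provide 6 10/100/1000 Ethernet interfaces\n"

# chomp first so the trailing newline does not end up in the description.
required, number, classification, description = line.chomp.split(/\s+/, 4)

# Optional: drop the brackets and convert the requirement number.
# Note that this turns "[01]" into 1, losing the leading zero.
number = number[/\d+/].to_i

p [required, number, classification, description]
```

The limit of 4 is what keeps the description intact: after three splits, the rest of the line is returned as the last field, whitespace included.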

Or you can use a regex (ick ;-)

Hope this helps,

Mike

Mike S.
http://www.stok.ca/~mike/

The “`Stok’ disclaimers” apply.

I suck at regex too, I tried this as an exercise and came up with the
below. It’s less concise than previous solutions, but it works as far
as I can tell:

Row = Struct.new(:col1, :col2, :col3, :col4)
rows = Array.new
# The literal brackets around the requirement number must be escaped.
regex = /([A-Z])\s(\[[0-9]+\])\s([A-Z1-9]+)\s(.+)/

File.open("file.txt") do |file|
  while (line = file.gets)
    m = line.match(regex)
    next unless m # skip lines that do not match the pattern
    rows << Row.new(m[1], m[2], m[3], m[4])
  end
end

puts rows.flatten

#output =>

#<struct Row col1="R", col2="[01]", col3="R1", col4="The system shall support \"emergency call processing\"">
#<struct Row col1="R", col2="[02]", col3="R1", col4="The system shall support \"local call processing\"">
#<struct Row col1="R", col2="[08]", col3="F", col4="The system shall provide a command-line user interface">
#<struct Row col1="R", col2="[723]", col3="F", col4="The system shall provide 6 10/100/1000 Ethernet interfaces">
#<struct Row col1="R", col2="[11]", col3="F", col4="The system shall support VoIP networks">
#<struct Row col1="R", col2="[398]", col3="R1", col4="The system shall contain 2 control boards">
#<struct Row col1="O", col2="[327]", col3="I", col4="The system should support hotswapping of all internal boards">
#<struct Row col1="R", col2="[19]", col3="I", col4="The system shall be able to detect transmission errors">
#<struct Row col1="R", col2="[631]", col3="F", col4="The system shall continue processing data as long as a call is active.">
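
A possible variation on the Struct approach, sketched with named captures (these need Ruby 1.9 or later, which is an assumption about the reader's setup). The field names, the Requirement struct, and the parse helper are illustrative and not from the post above. Here the brackets are matched but left out of the captured number.

```ruby
# Field names are illustrative; the post above uses col1..col4.
Requirement = Struct.new(:required, :number, :classification, :description)

PATTERN = /\A(?<required>[A-Z])\s+\[(?<number>\d+)\]\s+(?<classification>[A-Z0-9]+)\s+(?<description>.+)/

# Hypothetical helper: returns a Requirement, or nil if the line does not match.
def parse(line)
  m = PATTERN.match(line)
  return nil unless m
  Requirement.new(m[:required], m[:number], m[:classification], m[:description])
end

req = parse("O [327] I The system should support hotswapping of all internal boards\n")
puts req.number
puts req.description
```

Returning nil for a non-matching line leaves it to the caller to decide whether to skip it or report it.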

-Steven