I have some text documents containing series of questions and answers.
I need to extract all the questions and answers to load them into a database.
I have no problem reading the text documents or writing to the database.
I need help with the regexps to parse the documents (please note that
some questions and answers extend over two or more lines).
This is an example of the documents I have to deal with:
*1) Is this the first question ?
a. Yes
b. No
*2) Is this question composed
of two lines ?
a. Yes, indeed
b. Maybe
c. I dont’ know
Many thanks in advance,
Bruno
On May 1, 2008, at 3:41 AM, Brubix wrote:
> I have some text documents containing series of questions and answers.
> I need to extract all the questions and answers to load them into a database.
> I have no problem reading the text documents or writing to the database.
> I need help with the regexps to parse the documents (please note that
> some questions and answers extend over two or more lines).
I would probably do it without leaning on regular expressions in this
case:
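#!/usr/bin/env ruby -wKU

# Paragraph mode: an empty-string separator yields one
# blank-line-delimited question block per iteration.
DATA.each("") do |qna|
  answers  = qna.lines.to_a
  question = answers.shift
  # Append lines until the question text ends with a "?".
  question << answers.shift until question.strip[-1] == ??
  puts "Question: #{question}"
  puts "Answers:"
  puts answers
end

__END__
*1) Is this the first question ?
a. Yes
b. No

*2) Is this question composed
of two lines ?
a. Yes, indeed
b. Maybe
c. I dont’ know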
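For reference, `each("")` is Ruby's paragraph mode: with an empty-string separator each record runs up to the next blank line, so this approach assumes the question blocks are separated by blank lines. A minimal sketch of the same idea as a reusable method, under that assumption; the name `parse_paragraphs` is hypothetical, and unlike the script above it joins a multi-line question with a space instead of keeping the newline:

```ruby
# Hypothetical helper: parse blank-line-separated question blocks.
# Assumes every question ends with "?" and each answer is on its own line.
def parse_paragraphs(text)
  # An empty-string separator puts each_line into paragraph mode.
  text.each_line("").map do |block|
    lines = block.lines.map(&:strip).reject(&:empty?)
    question = lines.shift
    # Join continuation lines until the question ends with "?"
    # (or the block runs out of lines, so a malformed block cannot loop forever).
    question += " " + lines.shift until question.end_with?("?") || lines.empty?
    [question, lines]
  end
end

sample = "*1) Is this the first question ?\na. Yes\nb. No\n\n" \
         "*2) Is this question composed\nof two lines ?\na. Yes, indeed\nb. Maybe\n"

parse_paragraphs(sample).each do |question, answers|
  puts "Question: #{question}"
  puts "Answers: #{answers.join(' | ')}"
end
```

Returning `[question, answers]` pairs rather than printing keeps the parsing separate from whatever code writes the rows to the database.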
If you really want a regular expression, though, this seems to work:
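#!/usr/bin/env ruby -wKU

# Group 1: "*N) " plus everything up to the first "?";
# group 2: the "x. answer" lines that follow it.
DATA.read.scan(/^(\*\d+\) [\s\S]+?\?) *\n((?:^[a-z]\. .+?\n)+)/m) do |q, a|
  puts "Question: #{q}"
  puts "Answers:"
  puts a.lines.to_a
end

__END__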
*1) Is this the first question ?
a. Yes
b. No
*2) Is this question composed
of two lines ?
a. Yes, indeed
b. Maybe
c. I dont’ know
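The one-line pattern is easier to check when spread out with Ruby's `/x` (extended) flag, which ignores whitespace and allows `#` comments; literal spaces then have to be written as `\ `. A sketch of the same pattern in that form, wrapped in a hypothetical `scan_questions` method that returns `[question, answers]` pairs instead of printing them:

```ruby
# The pattern above, rewritten with /x so each piece can carry a comment.
# (Comments inside a /.../ literal must not contain a slash.)
QA_PATTERN = /
  ^(\*\d+\)\ [\s\S]+?\?)   # group 1: "*N) " then the shortest text up to a "?"
  \ *\n                     # optional trailing spaces and the end of that line
  ((?:^[a-z]\.\ .+?\n)+)    # group 2: one or more "x. answer" lines
/mx

# Hypothetical helper: return [question, [answer, ...]] pairs.
def scan_questions(text)
  text.scan(QA_PATTERN).map { |q, a| [q, a.lines.map(&:chomp)] }
end

qa_text = <<~TEXT
  *1) Is this the first question ?
  a. Yes
  b. No
  *2) Is this question composed
  of two lines ?
  a. Yes, indeed
  b. Maybe
TEXT

scan_questions(qa_text).each { |q, answers| p [q, answers] }
```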
Hope that helps.
James Edward G. II
On May 1, 3:41 am, Brubix wrote:
> I have some text documents containing series of questions and answers.
> I need to extract all the questions and answers to load them into a database.
> I have no problem reading the text documents or writing to the database.
> I need help with the regexps to parse the documents (please note that
> some questions and answers extend over two or more lines).
> This is an example of the documents I have to deal with:
qa = "*1) Is this the first question ?
a. Yes
b. No
*2) Is this question composed
of two lines ?
a. Yes, indeed
b. Maybe
c. I dont’ know"
qa.scan(/(\*\d+\).*?\?.*?)([^*]*)/m).map do |q, ans|
  [q, ans.scan(/\n([a-z]\..*)/).flatten]
end
=> [["*1) Is this the first question ?", ["a. Yes", "b. No"]],
    ["*2) Is this question composed\nof two lines ?", ["a. Yes, indeed", "b. Maybe", "c. I dont’ know"]]]
That makes various assumptions about the appearance of ‘*’ and how
answers start/end, but it should be a start.
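To make the `*` assumption concrete, here is a sketch that wraps the two `scan` calls above in a hypothetical `parse_starred` method and feeds it an answer that itself contains an asterisk. Because `[^*]*` stops at the first `*`, that answer is cut short and the answers after it are dropped:

```ruby
# Hypothetical wrapper around the two scan calls from the reply above.
def parse_starred(text)
  text.scan(/(\*\d+\).*?\?.*?)([^*]*)/m).map do |q, ans|
    [q, ans.scan(/\n([a-z]\..*)/).flatten]
  end
end

good = "*1) Is this the first question ?\na. Yes\nb. No"
bad  = "*1) Pick one ?\na. 2*3\nb. 6"

p parse_starred(good) # => [["*1) Is this the first question ?", ["a. Yes", "b. No"]]]
p parse_starred(bad)  # => [["*1) Pick one ?", ["a. 2"]]]  ("2*3" is cut, "b. 6" is lost)
```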
All the proposed solutions work perfectly!
Thanks to both of you.