Forum: Ruby - How to extract something in between a pattern

Kn Ta (horizon)
on 2017-02-17 12:55
I was wondering how to extract certain data from a text file using Ruby.
For example, my text file contains:

"Response by <a
href="https://www.helloworld.com/body/test_service_ltd">... Service
Limited</a> to <a href="https://www.helloworld.com/user/Joe_Bloggs_3">Jo
Bloggs</a> on <time datetime="2016-09-13T14:43:42+01:00"
title="2016-09-13 14:43:42 +0100">13 September 2016</time>.

Follow up sent to <a
href="https://www.helloworld.com/body/test_service_ltd">... Service
Limited</a> by <a href="https://www.helloworld.com/user/Jane_doe_4">Jane
Doe</a> on <time datetime="2017-02-03T16:48:38+00:00" title="2017-02-03
16:48:38 +0000"> 3 February 2017</time>."

How can I extract 'Joe_Bloggs_3' and the date '2016-09-13', then
'Jane_doe_4' and '2017-02-03', and so on?

I need to write the extracted data to an output file, so the output should be:

Joe_Bloggs_3, 2016-09-13
Jane_doe_4, 2017-02-03
Ronald Fischer (rovf)
on 2017-02-20 11:02
There are several ways to do it with regular expressions, but in any
case, the patterns you want to extract need to be enclosed in
parentheses (which makes them capturing groups).
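A small illustration of what the parentheses change, using String#scan on a made-up one-line sample (the variable names are hypothetical): without a group, scan returns the whole matched text; with a group, it returns only the captured part.

```ruby
# Hypothetical one-line sample in the same shape as the question's file.
line = '<a href="https://www.helloworld.com/user/Joe_Bloggs_3">Jo Bloggs</a>'

# No parentheses: each scan result is the entire match.
whole = line.scan(%r{/user/[^"/]+})
p whole   # prints ["/user/Joe_Bloggs_3"]

# One capturing group: each scan result is an array of the captured groups.
groups = line.scan(%r{/user/([^"/]+)})
p groups  # prints [["Joe_Bloggs_3"]]
```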

One way would then be to use String#match on your input string (see
http://ruby-doc.org/core-1.9.3/String.html#method-i-match), which
returns a MatchData object. The example at that URL shows how to
extract the matched strings from the MatchData object. Note that
String#match returns only the first match found from the given start
position.
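A minimal sketch of that approach for several records, assuming each /user/ link is followed by its own datetime attribute as in the sample. String#match accepts an optional start position, so the loop resumes just past the previous match (MatchData#end(0)). The method name user_date_pairs, the constant USER_DATE_PATTERN and the sample string are hypothetical.

```ruby
# Hypothetical sample: two records in the shape of the question's file.
text = '<a href="https://www.helloworld.com/user/Joe_Bloggs_3">Jo Bloggs</a> on ' \
       '<time datetime="2016-09-13T14:43:42+01:00">13 September 2016</time>. ' \
       '<a href="https://www.helloworld.com/user/Jane_doe_4">Jane Doe</a> on ' \
       '<time datetime="2017-02-03T16:48:38+00:00"> 3 February 2017</time>.'

# Named capturing groups: the segment after /user/ and the YYYY-MM-DD date.
# The m flag lets .*? cross line breaks in a real multi-line file.
USER_DATE_PATTERN = %r{/user/(?<user>[^"/]+)".*?datetime="(?<date>\d{4}-\d{2}-\d{2})}m

# Collects [user, date] pairs by calling String#match repeatedly.
def user_date_pairs(str)
  results = []
  pos = 0
  while (md = str.match(USER_DATE_PATTERN, pos))
    results << [md[:user], md[:date]]  # MatchData#[] accepts group names
    pos = md.end(0)                    # offset just past the whole match
  end
  results
end

p user_date_pairs(text)
# prints [["Joe_Bloggs_3", "2016-09-13"], ["Jane_doe_4", "2017-02-03"]]
```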
Regis d'Aubarede (raubarede)
on 2017-02-20 16:42
Kn Ta wrote in post #1185565:

txt=<<EEND
Response by <a
href="https://www.helloworld.com/body/test_service_ltd">... Service
Limited</a> to <a href="https://www.helloworld.com/user/Joe_Bloggs_3">Jo
Bloggs</a> on <time datetime="2016-09-13T14:43:42+01:00"
title="2016-09-13 14:43:42 +0100">13 September 2016</time>.

Follow up sent to <a
href="https://www.helloworld.com/body/test_service_ltd">... Service
Limited</a> by <a href="https://www.helloworld.com/user/Jane_doe_4">Jane
Doe</a> on <time datetime="2017-02-03T16:48:38+00:00" title="2017-02-03
16:48:38 +0000"> 3 February 2017</time>.
EEND

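# Join the wrapped lines, then collect every href="..." / datetime="..." value.
# Joining without a space turns "<a\nhref" into "<ahref", so \bhref skips the
# /body/ links here and each remaining href is followed by its datetime.
txt.gsub(/\r?\n/, "").scan(/\b(href|datetime)="(.*?)"/).
  each_slice(2) do |(k, v), (k1, v1)|
    # Last URL path segment is the user; the part before "T" is the date.
    p [v.split('/').last, v1.split('T').first] if k == "href" && k1 == "datetime"
  end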

Output:
["Joe_Bloggs_3", "2016-09-13"]
["Jane_doe_4", "2017-02-03"]
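Two things the one-liner above leaves open: it relies on the line joining turning "<a\nhref" into "<ahref" (which is what keeps the /body/ links out of the pairs), and the question also asks for an output file. A minimal sketch that instead pairs each href containing /user/ with the first datetime after it, assuming every user link is followed by its own <time> element as in the sample. The names USER_DATE, extract_pairs and write_pairs, and the command-line file names, are hypothetical.

```ruby
# Captures (1) the path segment after /user/ and (2) YYYY-MM-DD from the
# first datetime="..." that follows; /body/ links fail the /user/ part.
USER_DATE = %r{href="[^"]*/user/([^"/]+)".*?datetime="(\d{4}-\d{2}-\d{2})}m

# Returns [[user, date], ...] in document order.
def extract_pairs(html)
  html.scan(USER_DATE)
end

# Writes one "user, date" line per pair, in the format the question asks for.
def write_pairs(pairs, path)
  File.open(path, "w") do |f|
    pairs.each { |user, date| f.puts "#{user}, #{date}" }
  end
end

# Usage (hypothetical names): ruby extract.rb input.html output.txt
if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
  write_pairs(extract_pairs(File.read(ARGV[0])), ARGV[1])
end
```

Because the pattern does not depend on how the lines are wrapped, it can be run on the file contents as read, with no gsub step first.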