Marc H. wrote:
I have a slight problem. I have strings with some tags such as
‘<b><lightblue>name:</>’
I need to match “name:” and “lightblue”
In other words:
- What is between <> </>
and
- What is inside the first <> right next to “name:”
The following regex does not work:
'<b><lightblue>name:</>' =~ /<([a-zA-Z]+)>(.+?)<\/>/
$1 # => "b"
This is your string:
‘<b><lightblue>name:</>’
and the first part of your regex says to look for a ‘<’, followed by one
or more letters, followed by a ‘>’. That certainly describes the
string ‘<b>’.
$2 # => "<lightblue>name:"
This is your string again:
‘<b>                    <-- already matched this
<lightblue>name:</>’
The second part of your regex says to look for any character one or
more times, followed by ‘</>’. That certainly describes the string
‘<lightblue>name:</>’.
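To see the two captures directly, here is a small sketch that runs your regex against the sample string. It assumes the string is ‘<b><lightblue>name:</>’ (the tags appear to have been stripped when the message was rendered) and escapes the ‘/’ in ‘</>’, which Ruby needs inside a /.../ literal.

```ruby
# Assumed sample string; the original tags seem to have been eaten by the renderer.
str = "<b><lightblue>name:</>"

# The original regex, with the "/" in "</>" escaped so Ruby parses it.
md = str.match(/<([a-zA-Z]+)>(.+?)<\/>/)

tag   = md[1] # first <letters> tag in the string
inner = md[2] # everything after that tag, up to the first "</>"

puts tag    # prints: b
puts inner  # prints: <lightblue>name:
```

So the regex does exactly what it says; it just anchors on ‘<b>’, which is not the tag you wanted.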
Note that since the characters ‘</>’ appear only once in your string,
the non-greedy ‘?’ has no effect here. By default, regex quantifiers such
as ‘+’ are greedy, so if your string looked like this:
‘<b><lightblue>name:</>xxxxxxxxxxxxxxx</>’
then the greedy version of your regex:
/<([a-zA-Z]+)>(.+)<\/>/ <----(no ‘?’)
would match:
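<b><lightblue>name:</>xxxxxxxxxxxxxxx</>
That’s because the portion:
<lightblue>name:</>xxxxxxxxxxxxxxx
is matched by (.+), “any character (.) one or more times (+)”, and the
greedy ‘+’ grabs as much as it can, giving characters back only until
the last ‘</>’ can match.
On the other hand, your non-greedy regex (i.e. with the ‘?’) would match:
<b><lightblue>name:</>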
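A quick way to compare the two behaviours is String#[] with a regex, which returns the whole matched substring. This sketch assumes the longer sample string from the paragraph above.

```ruby
# Assumed sample with two "</>" closers, as in the example above.
str = "<b><lightblue>name:</>xxxxxxxxxxxxxxx</>"

greedy = str[/<([a-zA-Z]+)>(.+)<\/>/]   # (.+)  : runs to the LAST "</>"
lazy   = str[/<([a-zA-Z]+)>(.+?)<\/>/]  # (.+?) : stops at the FIRST "</>"

puts greedy # prints: <b><lightblue>name:</>xxxxxxxxxxxxxxx</>
puts lazy   # prints: <b><lightblue>name:</>
```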
If you examine your string again:
‘<b><lightblue>name:</>’
the ‘lightblue’ substring is preceded by the characters ‘><’, and that
is different from what precedes ‘b’. You can use that fact to get
‘lightblue’ instead of ‘b’. This regex will get ‘lightblue’:
/><([^>]+)/
That says to look for ‘><’ followed by one or more characters that are
not a ‘>’, capturing those characters in group 1. That will match:
‘><lightblue’
and group 1 holds ‘lightblue’.
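Here is that step on its own, as a sketch assuming the sample string ‘<b><lightblue>name:</>’. The first ‘><’ in the string sits between ‘<b>’ and ‘<lightblue>’, so that is where the match starts.

```ruby
str = "<b><lightblue>name:</>" # assumed sample string

md = str.match(/><([^>]+)/)

puts md[0] # whole match, prints: ><lightblue
puts md[1] # group 1 only, prints: lightblue
```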
To get ‘name:’, you can do something similar. This is the rest of the
string after ‘lightblue’:
‘>name:</>’
Here is a regex to get ‘name:’:
/>([^<]+)/
That says to look for a ‘>’, followed by one or more characters that are
not a ‘<’. Here it is altogether:
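pattern = /><([^>]+)>([^<]+)/
str = "<b><lightblue>name:</>"
match_obj = pattern.match(str)
puts match_obj[1]   # group 1: the tag name that follows '><'
puts match_obj[2]   # group 2: the text after that tag, up to the next '<'
--output:--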
lightblue
name:
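One thing to watch: if a string has no ‘><’ pair, pattern.match returns nil and match_obj[1] raises NoMethodError. A minimal sketch that guards against that, using Ruby's named groups (the names color and label, and the helper color_and_label, are my own, purely illustrative):

```ruby
# Same pattern as above, with hypothetical group names instead of numbers.
PATTERN = /><(?<color>[^>]+)>(?<label>[^<]+)/

# Returns [color, label], or nil when the string has no such tag pair.
def color_and_label(str)
  md = PATTERN.match(str)
  return nil unless md
  [md[:color], md[:label]]
end

p color_and_label("<b><lightblue>name:</>") # prints: ["lightblue", "name:"]
p color_and_label("plain text")             # prints: nil
```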