Returning text between two markers without markers included

scudco · March 16, 2008, 9:02pm

Hi, if I have

str = "ruby is ((Great))"

how do I use a regex to find the text between the start marker (( and the end
marker ))?

I'm new to regex, but I tried this:

((([^((|^))]*)))

but it includes the markers in the capture, i.e. it gives ((Great)) when
I just want Great.

I've tried searching the board and googling but couldn't find anything
suitable.

scudco · March 16, 2008, 9:24pm

Thanks Xavier,

I haven’t seen the /mx term before. Is that responsible for not including
the markers themselves?

scudco · March 16, 2008, 9:18pm

On Mar 16, 2008, at 21:02 , Adam A. wrote:

((([^((|^))]*)))

but it includes the markers in the capture, i.e. it gives ((Great)) when
I just want Great.

The fact that the parens are delimiters obscures the regexp a bit.
Assuming “))” unconditionally ends the text to extract, you can simply
use .*? like this:

irb(main):001:0> "ruby is ((Great))".match(/ \(\( (.*?) \)\) /mx)[1]
=> "Great"

– fxn
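To see why the ? in .*? matters, here is a minimal sketch comparing the greedy and lazy forms. The two-marker sample string is an assumption made up for illustration, not something from the original post.

```ruby
# Hypothetical sample with two marked sections.
str = "a ((fruit)) and an ((animal))"

# Greedy: .* runs on to the LAST "))" in the string.
greedy = str.match(/\(\((.*)\)\)/m)[1]

# Lazy: .*? stops at the FIRST "))" after the opening marker.
lazy = str.match(/\(\((.*?)\)\)/m)[1]

puts greedy  # fruit)) and an ((animal
puts lazy    # fruit
```

With only one marked section in the string both forms return the same text, so the difference only shows up once a second "))" appears later on.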

scudco · March 16, 2008, 10:03pm

Ahh, I just noticed the inclusion of the ? to make the match non-greedy. I
understand your use of groups there with [1], but I'm not too sure of that
syntax. I was thinking of using String#slice or String#scan to return
the text, but I'm not sure how I would do so with groups. How would I go
about doing this?

scudco · March 16, 2008, 10:22pm

On Mar 16, 2008, at 21:23 , Adam A. wrote:

I haven’t seen the /mx term before. Is that responsible for not
including the markers themselves?

Those are two regexp modifiers stacked together:

With /m the dot matches newlines. I couldn’t assume the text to
extract doesn’t contain newlines, so I added it just in case.
With /x literal whitespace in the regexp is ignored. Since the
regexp uses so many backslashes, that gives some air for readability.
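A small sketch of what the two modifiers change, assuming a sample string where the marked text is split by a newline (the sample is made up for illustration):

```ruby
# Hypothetical input: the marked text contains a newline.
text = "ruby is ((Gre\nat))"

# Without /m the dot refuses to match "\n", so nothing matches.
no_m = text.match(/\(\((.*?)\)\)/)

# With /m the dot also matches "\n".
with_m = text.match(/\(\((.*?)\)\)/m)

# /x only changes how the pattern may be written: unescaped
# whitespace inside the regexp is ignored.
spaced  = "ruby is ((Great))".match(/ \(\( (.*?) \)\) /mx)[1]
compact = "ruby is ((Great))".match(/\(\((.*?)\)\)/m)[1]

p no_m               # nil
p with_m[1]          # "Gre\nat"
p spaced == compact  # true
```

Neither modifier has anything to do with excluding the markers; that comes from the capture group sitting between the escaped parens.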

As for the other mail: String#match returns a MatchData object. Those
objects support indexing by [], and the first capture is at index 1.
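As a rough illustration of the indexing, and of the String#slice route asked about earlier, something like the following should work. The variable names are placeholders chosen for this sketch.

```ruby
str = "ruby is ((Great))"
re  = /\(\((.*?)\)\)/m

md = str.match(re)
whole   = md[0]  # the entire match, markers included
capture = md[1]  # the first capture group only

# String#slice (also spelled String#[]) accepts a regexp plus a
# capture index, which skips the explicit MatchData step.
sliced = str.slice(re, 1)

# match returns nil when nothing matches, so guard before indexing.
missing  = "no markers here".match(re)
fallback = missing ? missing[1] : nil

p whole, capture, sliced, fallback
```

The guard matters in practice: calling [1] directly on a failed match raises NoMethodError because the result is nil.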

scudco · March 16, 2008, 10:59pm

I just wondered, if I had multiple marked sections in a string, how would
I capture all of them?

So if my sentence was

“a bannana is a type of ((fruit)) and a dog is a type of ((animal))”, how
could I store fruit and animal for later use via regex?

scudco · March 16, 2008, 11:27pm

On Mar 16, 2008, at 4:59 PM, Adam A. wrote:

I just wondered, if I had multiple marked sections in a string, how
would I capture all of them?

So if my sentence was

“a bannana is a type of ((fruit)) and a dog is a type of ((animal))”,
how could I store fruit and animal for later use via regex?

irb(main):004:0> str = "a bannana is a type of ((fruit)) and a dog is a type of ((animal))"
=> "a bannana is a type of ((fruit)) and a dog is a type of ((animal))"

irb(main):005:0> str.scan(/\(\((.*?)\)\)/)
=> [["fruit"], ["animal"]]

scudco · March 16, 2008, 10:29pm

Ahhh, that's great. I like the readability one. I'll be using that a lot
from now on, and I'll use match instead of scan and slice for this
particular problem. Thanks, Xavier.

scudco · March 17, 2008, 12:37am

Great, I'm closer to solving my problem. Though I realised that this regex
wouldn't work if the marked text was split by a newline, so I went away
and modified it so that if it were split it would still be picked up. I
did it with this:

\({2}(?s)(.*?)(?s)\){2}

I'm wondering if there's a neater way of saying “ignore any newlines that
split the marked text up”.

Is there an operator that tells it to ignore newlines, and is the above
robust?

Thanks so much for the help so far.

scudco · March 16, 2008, 11:43pm

On Sun, Mar 16, 2008 at 4:59 PM, Adam A. wrote:

I just wondered, if I had multiple marked sections in a string, how would
I capture all of them?

So if my sentence was

“a bannana is a type of ((fruit)) and a dog is a type of ((animal))”, how
could I store fruit and animal for later use via regex?

There’s #scan…

(str.scan /\({2} (.*?) \){2}/x).flatten
=> ["fruit", "animal"]

Todd
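For anyone wondering why the flatten is there, a short sketch of what scan returns with and without a capture group (same sample sentence as in the question):

```ruby
str = "a bannana is a type of ((fruit)) and a dog is a type of ((animal))"

# With a capture group, scan returns one inner array per match,
# holding that match's captures.
nested = str.scan(/\(\((.*?)\)\)/m)

# flatten collapses the one-element inner arrays into a plain list.
flat = nested.flatten

# Without any capture group, scan returns the whole matches instead,
# markers included.
whole = str.scan(/\(\(.*?\)\)/m)

p nested  # [["fruit"], ["animal"]]
p flat    # ["fruit", "animal"]
p whole   # ["((fruit))", "((animal))"]
```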

scudco · March 17, 2008, 1:04am

On Sun, Mar 16, 2008 at 6:37 PM, Adam A. wrote:

is there an operator that tells it to ignore newlines and is the above
robust?

I don’t know about robustness, but throwing an ‘m’ after the regex
like Xavier did might do the trick…

(str.scan /\({2} (.*?) \){2}/mx).flatten

…which could also be written as…

str.scan(/\({2} (.*?) \){2}/mx).flatten

Notice the m and x after the last forward slash. If you are concerned
about there being spaces on either side of the string between (( and
)), then…

str.scan(/\({2} \s*(.*?)\s* \){2}/mx).flatten

Todd
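A sketch of how that behaves on padded and newline-split markers. The sample string is made up, and the final delete step is only one possible reading of "ignore any newlines": the regex itself cannot drop characters from a capture, so any newline has to be removed afterwards.

```ruby
# Hypothetical input: one section padded with spaces, one split by a newline.
str = "a ((  fruit  )) and an ((ani\nmal))"

# \s* outside the group soaks up padding; /m lets the dot cross "\n".
words = str.scan(/\({2} \s* (.*?) \s* \){2}/mx).flatten

# Post-process if the newline should be dropped from the captured text.
joined = words.map { |w| w.delete("\n") }

p words   # ["fruit", "ani\nmal"]
p joined  # ["fruit", "animal"]
```

Whether the newline should be deleted or replaced with a space depends on how the text was wrapped, which the thread doesn't say.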

scudco · March 17, 2008, 4:18am

Hi –

On Mon, 17 Mar 2008, Xavier N. wrote:

With /x literal whitespace in the regexp is ignored. Since the regexp uses
so many backslashes, that gives some air for readability.

I know I’m in the minority, but I’ll just mention that I find most
regexes that make use of /x very hard to read. The reason is that I’ve
trained my brain how to read a pattern, so if I encounter this:

/ string (.*) another ? string /x

it’s a considerable effort to “not see” the spaces. I think it’s
better to stick to the basic pattern language, which after all is the
only set of rules that we all learn and all share.

I would recommend saving /x for cases where you want to break a regex
out into multiple lines and include comments:

re = /
  \(      # opening paren
  \d{3}   # area code
  \)      # closing paren

etc. (Not a great example of an obscure pattern that’s made more clear
by /x but you get the idea.)

David
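A completed version of that multi-line style might look like the sketch below. The capture group around the digits and the sample phone string are assumptions added so the snippet runs on its own; they are not part of David's example.

```ruby
# Multi-line regexp with comments, enabled by the /x modifier.
re = /
  \(       # opening paren
  (\d{3})  # area code (captured; the group is an addition for this sketch)
  \)       # closing paren
/x

# Hypothetical input string.
area = "(212) 555-0100"[re, 1]

p area  # "212"
```

Under /x a # starts a comment that runs to the end of the line, so a literal # or space in the pattern has to be escaped.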