Gathering Links

Hello,
I am looking for some help with a regular expression. I would like a
regexp that matches HTML links. I have tried, but I can't seem to get
anything working. I would appreciate any help.

Thanks
Joey

You might just want to run the HTML through htmltidy
to generate an XML document and parse that, or use
the htree library for the same purpose. Either would
probably be the more robust solution.
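As a rough sketch of the parse-it-instead route: the snippet below uses REXML (which ships with Ruby) rather than htree, and assumes the input has already been tidied into well-formed XML. The method name links_from_xhtml and the sample markup are made up for illustration.

```ruby
require 'rexml/document'

# Hypothetical helper: collects [href, text] pairs from well-formed XHTML.
# Assumes the markup has already been cleaned up (e.g. by htmltidy).
def links_from_xhtml(xhtml)
  doc = REXML::Document.new(xhtml)
  links = []
  # Walk every element and keep the anchors; comparing the local name
  # avoids having to deal with the XHTML default namespace in XPath.
  doc.each_recursive do |el|
    next unless el.name == 'a'
    # el.text is only the first text node, so nested markup is not included.
    links << [el.attributes['href'], el.text]
  end
  links
end

xhtml = <<~XML
  <html xmlns="http://www.w3.org/1999/xhtml">
    <body>
      <p><a href="http://example.com/">Example</a></p>
      <p><a href="/docs" class="nav">Docs</a></p>
    </body>
  </html>
XML

links_from_xhtml(xhtml).each { |href, text| puts "#{href} => #{text}" }
```

Unlike a regexp, this will not be confused by a > inside an attribute value, but it will raise a parse error on markup that is not well-formed.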

On the other hand, if you want to use regexps,
something like this would work (though not tested).

First you have to match the beginning of the tag
(there might be some whitespace after the <; the \b
keeps the pattern from matching tags such as <abbr>):

/<\s*a\b

Next, gather any attributes in the opening tag:

([^>]*)>

The link text comes next:

(.*?)
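The ? after .* is what keeps the match from running on to the last closing tag on the page. A small sketch of the difference, using simplified patterns and a made-up sample string (the constant names are only for illustration):

```ruby
SAMPLE = '<a href="/one">One</a> and <a href="/two">Two</a>'

# Greedy: .* grabs as much as it can, then backs up to the LAST </a>.
GREEDY_RE = /<a[^>]*>(.*)<\/a>/i
# Non-greedy: .*? stops at the FIRST </a> it can reach.
LAZY_RE   = /<a[^>]*>(.*?)<\/a>/i

puts SAMPLE[GREEDY_RE, 1]  # One</a> and <a href="/two">Two
puts SAMPLE[LAZY_RE, 1]    # One
```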

The text section is ended by the closing anchor
tag (no other tags are appropriate):

<\s*\/\s*a\s*>

Finally, we want to match case-insensitively
(A vs. a) and let . match newlines, so that link
text spanning several lines is still captured:

/im
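To see what each flag buys you, here is a small check in Ruby, where m makes . match newlines (in Perl that job belongs to s). The pattern is a simplified stand-in and the sample string is made up:

```ruby
# An upper-case tag whose link text spans two lines.
FLAG_SAMPLE = "<A HREF=\"/x\">two\nlines</A>"

NO_FLAGS   = /<a[^>]*>(.*?)<\/a>/
ONLY_I     = /<a[^>]*>(.*?)<\/a>/i
BOTH_FLAGS = /<a[^>]*>(.*?)<\/a>/im

p NO_FLAGS.match?(FLAG_SAMPLE)    # false: <A does not match <a
p ONLY_I.match?(FLAG_SAMPLE)      # false: . cannot cross the newline
p BOTH_FLAGS.match?(FLAG_SAMPLE)  # true
```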

So, $1 will be the attributes and $2 the link text.
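Put together, it might look something like the sketch below. The closing-tag part is written to allow optional whitespace, and a \b after the a keeps tags like <abbr> out; LINK_RE, extract_links and the sample HTML are names and data I made up for the example. With scan, the two groups come back as pairs instead of $1 and $2.

```ruby
# The pieces above assembled into one pattern:
#   group 1 = attributes of the opening tag, group 2 = link text.
LINK_RE = /<\s*a\b([^>]*)>(.*?)<\s*\/\s*a\s*>/im

# Hypothetical helper: returns an array of [attributes, link_text] pairs.
def extract_links(html)
  html.scan(LINK_RE).map { |attrs, text| [attrs.strip, text.strip] }
end

html = <<~HTML
  <p>See <a href="http://example.com/">Example</a> and
  <A HREF="/docs"
     class="nav">the
  docs</A>.</p>
HTML

extract_links(html).each do |attrs, text|
  puts "#{attrs.inspect} => #{text.inspect}"
end
```

This still trips over a > inside an attribute value, anchors inside comments, and similar edge cases, which is why a real parser is the safer choice for arbitrary pages.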

Thanks
Joey

E