On Oct 18, 2007, at 6:28 AM, Ranieri T. wrote:
Hi, I need a concise regexp pattern to validate a complete http
URL. Can someone help me with this?
Thanks,
--
Ranieri Barros Teixeira
Undergraduate student in Computer Science - Faculdade de Computação -
Universidade Federal do Pará (UFPA)
LABES.UFPA Research Group - http://www.labes.ufpa.br/ranieri
Well, I don’t know if “concise” is the right word, but the regexp in
http://www.ietf.org/rfc/rfc2396.txt (page 29) will help you. Quoting
the RFC:
The following line is the regular expression for breaking-down a URI
reference into its components.
^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
 12            3  4          5       6  7        8 9
The numbers in the second line above are only to assist readability;
they indicate the reference points for each subexpression (i.e., each
paired parenthesis). We refer to the value matched for subexpression
<n> as $<n>. For example, matching the above expression to
http://www.ics.uci.edu/pub/ietf/uri/#Related
results in the following subexpression matches:
$1 = http:
$2 = http
$3 = //www.ics.uci.edu
$4 = www.ics.uci.edu
$5 = /pub/ietf/uri/
$6 = <undefined>
$7 = <undefined>
$8 = #Related
$9 = Related
where <undefined> indicates that the component is not present, as is
the case for the query component in the above example. Therefore, we
can determine the value of the four components and fragment as
scheme = $2
authority = $4
path = $5
query = $7
fragment = $9
and, going in the opposite direction, we can recreate a URI reference
from its components using the algorithm in step 7 of Section 5.2.
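To make the quoted passage concrete, here is a minimal sketch in Python (my choice of language, not something from the RFC). It applies the regexp above, maps the numbered groups to the five components, and puts them back together in the order that step 7 of Section 5.2 describes. Note that the regexp only splits a string apart: every group is optional, so it matches any input and does not validate anything by itself. The names split_uri, join_uri and looks_like_http_url are hypothetical, and the last function is only one possible assumption about what "valid http URL" could mean.

```python
import re

# Regular expression from RFC 2396, Appendix B (quoted above).
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(uri):
    """Return the five components as a dict; absent components are None."""
    m = URI_RE.match(uri)  # always succeeds, because every part is optional
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


def join_uri(parts):
    """Recompose a URI reference in the order given by step 7 of Section 5.2."""
    result = ""
    if parts["scheme"] is not None:
        result += parts["scheme"] + ":"
    if parts["authority"] is not None:
        result += "//" + parts["authority"]
    result += parts["path"]
    if parts["query"] is not None:
        result += "?" + parts["query"]
    if parts["fragment"] is not None:
        result += "#" + parts["fragment"]
    return result


def looks_like_http_url(uri):
    """Assumed check, not from the RFC: http/https scheme and a non-empty authority."""
    parts = split_uri(uri)
    scheme = parts["scheme"]
    return (
        scheme is not None
        and scheme.lower() in ("http", "https")
        and bool(parts["authority"])
    )
```

For the RFC's own example, split_uri("http://www.ics.uci.edu/pub/ietf/uri/#Related") gives scheme "http", authority "www.ics.uci.edu", path "/pub/ietf/uri/", no query, and fragment "Related", and join_uri returns the original string. A stricter validation (host syntax, port range, allowed characters) would need additional rules on top of this split.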
Rob B. http://agileconsultingllc.com
+1 513-295-4739
Skype: rob.biedenharn