Boyd, Todd M. wrote:
>> -----Original Message-----
>> From: farn...@googlemail.com [mailto:farn...@googlemail.com] On Behalf
>> Of Edmund Hertle
>> Sent: Thursday, January 15, 2009 4:13 PM
>> To: PHP - General
>> Subject: [PHP] Parsing HTML href-Attribute
>>
>> Hey,
>> I want to "parse" a href-attribute in a given String to check if there
>> is a relative link and then adding an absolute path.
>> Example:
>> $string = '<a class="sample" [...additional attributes...]
>> href="/foo/bar.php" >';
>>
>> I tried using regular expressions but my knowledge of RegEx is very
>> limited.
>> Things to consider:
>> - $string could be quite long but my concern are only those href
>> attributes (so working with explode() would be not very handy)
>> - Should also work if href= is not using quotes or using single quotes
>> - link could already be an absolute path, so just searching for href=
>> and then inserting absolute path could mess up the link
>>
>> Any ideas? Or can someone create a RegEx to use?
>
> Just spitballing here, but this is probably how I would start:
>
> RegEx pattern: /<a.*? href=(.+?)>/i
>
> Then, using the capture group, determine if the href attribute uses quotes
> (single or double, doesn't matter). If it does, you don't need to worry about
> splitting the capture group at the first white space. If it doesn't, then you
> must assume the first whitespace is the end of the URL and the beginning of
> additional attributes, and just grab the URL up to (but not including) the
> first whitespace.
>
> So...
>
> <?php
>
> # here is where $anchorText (text for the <a> tag) would be assigned
> # here is where $curDir (text for the current directory) would be assigned
>
> # find the href attribute
> # (PHP's PCRE has no "g" modifier; use preg_match_all() for every match)
> $matches = array();
> if (preg_match('#<a.*? href=(.+?)>#i', $anchorText, $matches))
> {
>     $href = trim($matches[1]);
>
>     # determine if it has surrounding quotes
>     if ($href[0] == '\'' || $href[0] == '"')
>     {
>         # pull everything between the opening quote and its closing match
>         $endPos = strpos($href, $href[0], 1);
>         $href = ($endPos === false) ? substr($href, 1)
>                                     : substr($href, 1, $endPos - 1);
>     }
>     else
>     {
>         # pull up to the first space (if there is one)
>         $spacePos = strpos($href, ' ');
>         if ($spacePos !== false)
>             $href = substr($href, 0, $spacePos);
>     }
>
>     # now, check to see if it is relative or absolute
>     # (regex pattern searches for protocol spec (i.e., http://), which will be
>     # treated as an absolute path for the purpose of this algorithm)
>     if ($href !== '' && $href[0] != '/' && preg_match('#^\w+://#', $href) == 0)
>     {
>         # add current directory to the beginning of the relative path
>         # (nothing is done to absolute paths or URLs with protocol spec)
>         $href = $curDir . '/' . $href;
>     }
>
>     echo $href;
> }
>
> ?>
>
> ...UNTESTED.
>
> HTH,
>
>
> // Todd

Wow, that's a lot! This should work with or without quotes and assumes no
spaces in the URL:

$prefix = "http://example.com/";
$html = preg_replace("|(href=['\"]?)(?!$prefix)([^>'\"\s]+)(\s)?|", "$1$prefix$2$3", $html);

--
Thanks!
-Shawn
http://www.spidean.com
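
For what it's worth, the one-liner above also prefixes links that already point at some other host, and it produces a double slash when the href starts with "/". Below is a minimal sketch of a callback-based variant that tries to cover the original requirements (double, single, or no quotes; absolute links left alone). The function name absolutize_hrefs() and the $base parameter are made up for illustration and are not from the thread. It assumes PHP 7 or later and, like the one-liner, that URLs contain no spaces. It is a regex sketch, not a full HTML parser.

```php
<?php
// Hypothetical helper (not from the thread): prefix relative href values
// with $base and leave absolute links untouched.
function absolutize_hrefs(string $html, string $base): string
{
    // Group 1: "href=", group 2: optional opening quote, group 3: the URL.
    // \2 requires the same quote (or nothing) to close the value.
    $pattern = '~((?<![\w-])href\s*=\s*)(["\']?)([^"\'\s>]+)\2~i';

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($base): string {
            $url = $m[3];
            // Skip anything with a scheme (http:, mailto:, ...),
            // protocol-relative URLs (//host/...), and bare fragments (#top).
            if (preg_match('~^(?:[a-z][a-z0-9+.-]*:|//|#)~i', $url)) {
                return $m[0];
            }
            // Join base and path with exactly one slash between them.
            return $m[1] . $m[2] . rtrim($base, '/') . '/' . ltrim($url, '/') . $m[2];
        },
        $html
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

echo absolutize_hrefs('<a class="sample" href="/foo/bar.php">', 'http://example.com/');
```

Note that this treats "/foo/bar.php" as relative to the site root given in $base, which matches the original question. It differs from Todd's version, which leaves root-relative paths alone.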