Sure, I'll break it apart a little:
'{(?=<p(?:>|\s)(?!.*<p(?:>|\s)))}is'
$regex = '{' . // opening delimiter
'(?=' . // positive lookahead: match a position that is followed
// by this pattern (without consuming it):
'<p' . // first part of an opening <p> tag
'(?:' . // non-capturing parentheses (same as normal
// parentheses, but a bit faster, since we don't
// need to capture what they match for later use)
'>|\s' . // match a closing > or a whitespace character
')' . // end non-capturing parentheses
'(?!' . // negative lookahead: the match will fail if the
// following pattern matches from the current position
'.*' . // match everything up to the end of the string
'<p(?:>|\s)' . // same as above: look for another <p> tag
')' . // end negative lookahead
')' . // end positive lookahead
'}is'; // closing delimiter, followed by the modifiers i and s
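If the concatenated version is still hard to read, PCRE's x (extended) modifier is another option. This is a swapped-in layout, not what I posted above: in x mode unescaped whitespace in the pattern is ignored and # starts a comment, so the explanation can live inside the pattern itself. The variable names $verbose and $compact are just for illustration; the check at the end confirms both forms give the same result on your sample text.

```php
<?php
// Hypothetical alternative layout: the same pattern, written with the x
// (extended) modifier. In x mode, unescaped whitespace in the pattern is
// ignored and # starts a comment that runs to the end of the line.
$verbose = <<<'REGEX'
{
    (?=              # positive lookahead (zero-width)
        <p(?:>|\s)   # an opening <p> tag: bare, or followed by whitespace
        (?!          # negative lookahead: fail if...
            .*       # ...anything (s lets this cross newlines)...
            <p(?:>|\s)  # ...is followed by another opening <p> tag
        )
    )
}isx
REGEX;

// The compact form from the original post.
$compact = '{(?=<p(?:>|\s)(?!.*<p(?:>|\s)))}is';

$text = "<p>First paragraph</p>\n<p>More text</p>\n<p>Some more text</p>\n<p>End of story</p>";
$new  = "<p>new paragraph goes here</p>\n";

// Both patterns should insert $new before the last <p> tag.
$a = preg_replace($compact, $new, $text);
$b = preg_replace($verbose, $new, $text);
var_dump($a === $b); // bool(true)
```

The trade-off is mostly readability: the extended form is longer, but the comments stay attached to the pieces they describe.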
About the modifiers: i makes the match case-insensitive (so <P> counts too),
and s turns on dot-matches-all mode, so . also matches newlines. Without s,
the . would only match up to the next newline.
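A quick way to see what each modifier does is to try a tiny pattern with and without it. This is a minimal sketch; $html is made-up sample text. As a side benefit of the {} delimiters, the / in </p> doesn't need escaping.

```php
<?php
// Sketch: how the i and s modifiers change what a pattern matches.
$html = "<P>Line one\nline two</P>";

// Without i, '<p>' does not match the uppercase '<P>'.
var_dump(preg_match('{<p>}', $html));   // int(0)
var_dump(preg_match('{<p>}i', $html));  // int(1)

// Without s, '.' stops at "\n", so '.*' cannot reach the closing tag.
var_dump(preg_match('{<p>.*</p>}i', $html));   // int(0)
var_dump(preg_match('{<p>.*</p>}is', $html));  // int(1)
```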
The regex has two parts: matching a <p> tag, and then making sure there
aren't any more <p> tags in the string after it. The positive lookahead
is (hopefully) pretty straightforward. The negative lookahead works by using
a greedy (regular) .*, which forces the regex engine to match all the way to
the end of the haystack. Then it encounters the <p(?:>|\s) part, forcing it
to backtrack until it finds a <p> tag. If it doesn't find one before
returning to the 'current' position (directly after the opening of the <p>
tag we just matched), then we know we have found the last <p> tag.
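To see where the pattern actually lands, you can ask preg_match for the offset of the (empty) match with PREG_OFFSET_CAPTURE. In this sketch the sample $text is made up and ends with a <pre> block: because of the (?:>|\s) part, <pre> isn't treated as a <p> tag, so the match is the <p class="x"> tag. It also hints at the cost you asked about: at every earlier <p> tag, the .* runs to the end of the string before backtracking, so the work probably grows with (number of <p> tags) times (string length).

```php
<?php
// Sketch: find *where* the pattern matches, using PREG_OFFSET_CAPTURE.
$pattern = '{(?=<p(?:>|\s)(?!.*<p(?:>|\s)))}is';
$text = "<p>First</p>\n<p class=\"x\">Second</p>\n<pre>code</pre>";

preg_match($pattern, $text, $m, PREG_OFFSET_CAPTURE);
$offset = $m[0][1];                      // byte offset of the empty match
echo $offset, "\n";                      // 13
echo substr($text, $offset, 13), "\n";   // <p class="x">
```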
The positive and negative lookaheads are 'zero-width' assertions, which
means they don't advance the regex engine's position in the haystack string.
Since the entire regex is zero-width, the match is an empty string just
before the last <p> tag, and "replacing" that empty match inserts the
replacement string at that position.
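Replacing a zero-width match is really just an insertion, so the same result can be had with substr_replace() and a length of 0. The sketch below is a swapped-in alternative, not the regex above: the helper name insertBeforeLastP is hypothetical, and it collects the offsets of all opening <p> tags in one preg_match_all() pass instead of re-scanning from each tag, which may help with the efficiency question. It checks that both approaches agree on your sample text.

```php
<?php
// Sketch: a zero-width match "replaced" by a string is an insertion.
$text = "<p>First paragraph</p>\n<p>More text</p>\n<p>Some more text</p>\n<p>End of story</p>";
$new  = "<p>new paragraph goes here</p>\n";

$story = preg_replace('{(?=<p(?:>|\s)(?!.*<p(?:>|\s)))}is', $new, $text);

// Hypothetical alternative without the nested lookahead: collect the offsets
// of every opening <p> tag, then insert before the last one.
function insertBeforeLastP(string $html, string $insert): string
{
    if (!preg_match_all('{<p(?:>|\s)}i', $html, $m, PREG_OFFSET_CAPTURE)) {
        return $html; // no <p> tag (or a regex error): leave the text unchanged
    }
    $last = end($m[0]); // [matched text, byte offset] of the last <p> tag
    return substr_replace($html, $insert, $last[1], 0); // length 0 = insert
}

echo $story, "\n";
var_dump($story === insertBeforeLastP($text, $new)); // bool(true)
```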
I hope that made at least a little bit of sense :) If you're doing a lot of
regex work, I would strongly recommend reading the book Mastering Regular
Expressions by Jeffrey Friedl... it's very well written and very helpful.
-Brian
-----Original Message-----
From: Dotan Cohen [mailto:[EMAIL PROTECTED]
Sent: Monday, August 27, 2007 3:45 PM
To: Brian Rue
Cc: [email protected]
Subject: Re: [PHP] Adding text before last paragraph
On 27/08/07, Brian Rue <[EMAIL PROTECTED]> wrote:
> Dotan, try this:
>
> $text="<p>First paragraph</p>\n<p>More text</p>\n<p>Some more text</p>\n<p>End of story</p>";
>
> $story = preg_replace('{(?=<p(?:>|\s)(?!.*<p(?:>|\s)))}is', "<p>new paragraph goes here</p>\n", $text);
>
> This matches a position that has an opening <p> tag (with or without
> parameters), which is NOT followed anywhere in $text by another opening <p>
> tag. The replacement string will be inserted at the matched position, which
> will be directly before the last <p> tag. Not sure if this is the most
> efficient regex, but it should get the job done. Let me know how it goes...
> I'd also be interested to hear any comments on that regex's efficiency.
>
> -Brian Rue
>
Thank you, Brian. This most certainly works. I'm having a very hard
time deciphering your regex, and I'd like to learn from it. I'm going
over PCRE again, but I think that I may hit Google soon. Thank you
very, very much for the working code. As usual, I have another night
of regex waiting for me...
Dotan Cohen
http://lyricslist.com/
http://what-is-what.com/
--
PHP General Mailing List (http://www.php.net/)