The manual page for ereg_replace(), eregi_replace(), or preg_replace()
probably has a full working script that does this, returning pretty much
plain text.
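For example, a minimal sketch using preg_replace() -- the pattern here is my own assumption (it matches any <...> run, which covers simple well-formed markup), not a battle-tested one:

```php
<?php
// Strip all HTML tags by replacing any <...> run with a space,
// then collapse the leftover whitespace into single spaces.
$html = '<html><body><h1>Hello</h1><p>World</p></body></html>';
$text = preg_replace('/<[^>]*>/', ' ', $html);
$text = trim(preg_replace('/\s+/', ' ', $text));
echo $text; // Hello World
?>
```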

There's also the strip_tags() function, which strips out all PHP
and HTML tags -- perhaps not enough, since you'd probably want to remove
*some* other stuff too, but it's a good start, and can be used in
conjunction with other tools.
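A quick sketch of strip_tags() in action (the sample HTML is just made up for illustration):

```php
<?php
// strip_tags() removes all HTML and PHP tags in one call.
$html = '<html><head><title>Links</title></head>'
      . '<body><p>Some <b>useful</b> links.</p></body></html>';
echo strip_tags($html);        // prints "LinksSome useful links."
// An optional second argument lists tags you want to keep:
echo strip_tags($html, '<b>'); // prints "LinksSome <b>useful</b> links."
?>
```

Note that the text nodes get run together ("LinksSome") -- one reason you may want some whitespace handling around it.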

You haven't said if you want:

- all the stuff between the body tags, OR
- all the stuff that isn't tags (which would include the title, and perhaps
other stuff)
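If it's the first option, a rough sketch would be to grab what's between the body tags with preg_match() and then strip the remaining markup (again, the sample HTML is just illustrative):

```php
<?php
// Pull out only the contents of <body>...</body>, then strip
// the leftover tags. The /i makes it case-insensitive and the
// /s lets . match across newlines.
$html = '<html><head><title>My Title</title></head>'
      . '<body><p>Page text here.</p></body></html>';
if (preg_match('/<body[^>]*>(.*?)<\/body>/is', $html, $m)) {
    echo strip_tags($m[1]); // prints "Page text here."
}
?>
```

Note the title is dropped, which is exactly the difference between the two options above.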


As per usual, asking specifically for what you want helps, but there are
HEAPS of ways of doing this.


More than likely you'll find/build the components you need in different
places:

- recursively run through a directory for each HTML file
- stripping each HTML file
- possibly presenting the raw text in a TEXTAREA for previewing/modifying
- adding the text to the DB, probably assigning the ID based on the original
filename, or something

Etc etc
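The first two steps above might be sketched roughly like this -- function and path names are my own invention, and the "ID from filename" idea is taken straight from the list above:

```php
<?php
// Recurse through a directory, strip each .html/.htm file,
// and collect filename => plain text pairs, ready to preview
// in a TEXTAREA or insert into the DB.
function collect_text($dir, &$results)
{
    $handle = opendir($dir);
    while (($entry = readdir($handle)) !== false) {
        if ($entry == '.' || $entry == '..') {
            continue;
        }
        $path = "$dir/$entry";
        if (is_dir($path)) {
            collect_text($path, $results);     // recurse into subdirs
        } elseif (preg_match('/\.html?$/i', $entry)) {
            $raw = implode('', file($path));   // read the whole file
            // Use the filename as the ID, as suggested above.
            $results[$entry] = trim(strip_tags($raw));
        }
    }
    closedir($handle);
}

$pages = array();
collect_text('/path/to/site', $pages);
?>
```

From there, looping over $pages to INSERT each row (or write the flat file) is straightforward.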


Good luck,

Justin



on 28/08/02 11:58 PM, Charles Fowler ([EMAIL PROTECTED]) wrote:

> This may be an interesting challenge for someone or has it been done
> before
> 
> Can someone help me?
> 
> I am looking for a laboursaving method of extracting the contents of a
> web page (Text only) and dumping the rest of the html code.
> 
> I need the contents to rework the pages and put the contents into a flat
> file database. Large, but only two columns of data. Simple to work with
> (no need for a DB) -- they are just a lot of links on a links page.
> 
> Scripts would be welcome.
> 
> Ciao, Carlos
> 
> 
> 


-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php