The manual page for either ereg_replace(), eregi_replace(), or preg_replace() has a full working script that does this, returning pretty much plain text.

There's also the strip_tags() function, which strips out all PHP and HTML tags -- perhaps not enough on its own, since you'd want to remove *some* other stuff too, but it's a good start, and can be used in conjunction with other things.
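For instance, a minimal sketch of the strip_tags() route (the filename here is just a placeholder):

<?php
// A minimal sketch: read one HTML file and reduce it to plain text.
$html = file_get_contents('links.html');

$text = strip_tags($html);                       // remove all HTML/PHP tags
$text = html_entity_decode($text);               // turn &amp;, &lt; etc. back into characters
$text = trim(preg_replace('/\s+/', ' ', $text)); // collapse runs of whitespace

echo $text;
?>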
You haven't said if you want:

- all the stuff between the body tags, OR
- all the stuff that isn't tags (which would include the title, and perhaps other stuff)

As per usual, specifically asking for what you want helps, but there are HEAPS of ways of doing this. More than likely you'll find/build the components you need in different places:

- recursively run through a directory for each HTML file
- stripping each HTML file
- possibly presenting the raw text in a TEXTAREA for previewing/modifying
- adding the text to the DB, probably assigning the ID based on the original filename, or something

(See the sketch after the quoted message below for roughly how those pieces fit together.)

Etc etc

Good luck,
Justin

on 28/08/02 11:58 PM, Charles Fowler ([EMAIL PROTECTED]) wrote:

> This may be an interesting challenge for someone -- or has it been done
> before?
>
> Can someone help me?
>
> I am looking for a labour-saving method of extracting the contents of a
> web page (text only) and dumping the rest of the HTML code.
>
> I need the contents to rework the pages and put the contents into a flat
> file database. Large, but only two columns of data. Simple to work with
> (no need for a DB) -- they are just a lot of links on a links page.
>
> Scripts would be welcome.
>
> Ciao, Carlos
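Here's that rough sketch of the whole pipeline: walk a directory tree, strip each HTML file down to text, and append one row per file to a flat file. The directory name, output filename, and the two-column ID|text layout are all just assumptions -- adjust to suit:

<?php
// Rough sketch: recurse through a directory, strip each HTML file,
// and build a two-column flat file of ID|text rows.

function html_to_text($path)
{
    $text = strip_tags(file_get_contents($path));
    $text = html_entity_decode($text);
    return trim(preg_replace('/\s+/', ' ', $text));
}

function process_dir($dir, $out)
{
    $dh = opendir($dir);
    while (($entry = readdir($dh)) !== false) {
        if ($entry == '.' || $entry == '..') {
            continue;
        }
        $path = $dir . '/' . $entry;
        if (is_dir($path)) {
            process_dir($path, $out);              // recurse into subdirectories
        } elseif (preg_match('/\.html?$/i', $entry)) {
            // use the filename (minus extension) as the row's ID
            $id = preg_replace('/\.html?$/i', '', $entry);
            fwrite($out, $id . '|' . html_to_text($path) . "\n");
        }
    }
    closedir($dh);
}

$out = fopen('links.txt', 'w');
process_dir('pages', $out);
fclose($out);
?>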