John Gilmore wrote:
> 
> [I think we need software for automatically extracting the words from PDF
> and MS-Word documents so they can be found in web searches.  

        Oh, what will those nutty hackers think of next? :-)

John, Adobe actually has just such a tool; it's called pdfsearch. The URL
escapes me, but I used it many times a few months ago. I'll dig
around.

I don't think this is insidious; it's just the usual tendency of rigid,
large institutions to come up with wacky ideas: "Well, we have these
files that we need to share with that workgroup over at Andrews AFB. Why
not save them all as MS Word files and put them on our web site so
they'll be easier for them to get to, yuk yuk."

> the bad guys are deliberately putting lots of interesting stuff in PDF
> to make it hard to find and read.    --gnu]

        Nah, not hard to read, though that might be an interesting side
effect. PDF is a good format for dumping scanned documents into; it
preserves formatting and layout pretty well. It is printer-ready, like
PostScript.

Interesting link. I'd hate to see such a spider publicized, since that
would cause a lot of interesting data to disappear.


--
 "It is the responsibility of intellectuals to speak the truth and to 
 expose lies."  -Noam Chomsky
