How can I change it so that the cached page reads from <segment>/parse_text instead of <segment>/content?
On 5/31/07, Doğacan Güney <[EMAIL PROTECTED]> wrote:
> Hi,
>
> On 5/31/07, Manoharam Reddy <[EMAIL PROTECTED]> wrote:
> > A few points of confusion regarding plugin.includes:
> >
> > 1. I find a "parse-oo" in the plugins folder. What is that for?
>
> Plugin parse-oo has something to do with parsing OpenOffice.org
> documents; I am not sure exactly what.
>
> > 2. I have enabled "parse-pdf" by including it in "plugin.includes" in
> > nutch-site.xml. The PDF pages now show up in the search results. But
> > when I visit the cached page of a result, it shows a message like this:
> >
> > The cached content has mime type "application/pdf", click this link to
> > download it directly.
> >
> > Is it not possible to display the parsed content of the PDF instead of
> > this message?
>
> As its name implies, the cached page shows the URL's content :). What you
> want to see is its parse text. Nutch doesn't do this, but it is simple
> to change it so that it reads from <segment>/parse_text instead of
> <segment>/content.
>
> --
> Doğacan Güney

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
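The change Doğacan suggests amounts to a branch in the page that renders cached content (presumably cached.jsp in the Nutch web app of this era). It would pass HTML through as before and show the extracted parse text for everything else, instead of offering a download link. The Java sketch below illustrates only that decision. `SegmentStore`, `InMemoryStore` and `render` are hypothetical names, not Nutch APIs. In a real patch the data would come from the segment's content and parse_text directories via whatever accessor your Nutch version's search bean provides, which is worth confirming against its source.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the cached-page decision: show raw HTML as-is,
 * otherwise show the extracted parse text. Not Nutch code.
 */
public class CachedView {

    /** Hypothetical per-URL view of one segment (illustrative only). */
    interface SegmentStore {
        byte[] content(String url);      // raw fetched bytes, cf. <segment>/content
        String contentType(String url);  // e.g. "text/html", "application/pdf"
        String parseText(String url);    // extracted text, cf. <segment>/parse_text
    }

    /** HTML is passed through; any other type (e.g. PDF) shows its parse text. */
    static String render(SegmentStore store, String url) {
        String type = store.contentType(url);
        if (type != null && type.startsWith("text/html")) {
            // Assumes UTF-8; a real page would honour the detected charset.
            return new String(store.content(url), StandardCharsets.UTF_8);
        }
        String text = store.parseText(url);
        return "<pre>" + escapeHtml(text == null ? "" : text) + "</pre>";
    }

    /** Minimal HTML escaping so extracted text cannot inject markup. */
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Tiny in-memory store, for demonstration only. */
    static class InMemoryStore implements SegmentStore {
        final Map<String, byte[]> bytes = new HashMap<>();
        final Map<String, String> types = new HashMap<>();
        final Map<String, String> texts = new HashMap<>();

        public byte[] content(String url) { return bytes.get(url); }
        public String contentType(String url) { return types.get(url); }
        public String parseText(String url) { return texts.get(url); }
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        store.types.put("http://example.com/a.pdf", "application/pdf");
        store.texts.put("http://example.com/a.pdf", "Annual report <draft>");
        // prints: <pre>Annual report &lt;draft&gt;</pre>
        System.out.println(render(store, "http://example.com/a.pdf"));
    }
}
```

Escaping matters on the parse-text path. Parse text is plain text, so without escaping, a PDF whose text contains `<` or `&` would be interpreted as markup by the browser.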
