How can I change the cached page to read from <segment>/parse_text
instead of <segment>/content?

On 5/31/07, Doğacan Güney <[EMAIL PROTECTED]> wrote:
> Hi,
>
> On 5/31/07, Manoharam Reddy <[EMAIL PROTECTED]> wrote:
> > Some confusion regarding plugin.includes:
> >
> > 1. I find a "parse-oo" in the plugins folder. What is that for?
>
> The parse-oo plugin has to do with parsing OpenOffice.org documents;
> I am not sure of the details.
>
> >
> > 2. I have enabled "parse-pdf" by including it in "plugin.includes" in
> > nutch-site.xml. The PDF pages now appear in the search results, but when
> > I visit the cached page of a result, it shows a message like this:
> >
> > The cached content has mime type "application/pdf", click this link to
> > download it directly.
> >
> > Is it not possible to display the parsed content of the PDF instead of
> > this message?
> >
>
> As its name implies, cached content shows the URL's raw content :). What
> you want to see is its parse text. Nutch doesn't do this out of the box,
> but it is simple to change it so that it reads from <segment>/parse_text
> instead of <segment>/content.
>
> --
> Doğacan Güney
>
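The change suggested above can be sketched roughly as follows. This is only an illustration of the idea, not Nutch code: `ParseTextSource` and `CachedView` are hypothetical names standing in for whatever reads <segment>/parse_text and whatever renders the cached page in your Nutch version. The assumption is that the cached page can ask for the extracted text of a URL and fall back to the existing download notice when none is available.

```java
// Hypothetical stand-in for a reader of <segment>/parse_text; not a Nutch class.
interface ParseTextSource {
    // Returns the extracted plain text for the URL, or null if none was stored.
    String parseText(String url);
}

final class CachedView {
    // Renders the cached page body: extracted text when available,
    // otherwise the same kind of download notice shown for raw content.
    static String render(ParseTextSource source, String url, String mimeType) {
        String text = source.parseText(url);
        if (text != null && !text.trim().isEmpty()) {
            return "<pre>" + escapeHtml(text) + "</pre>";
        }
        return "The cached content has mime type \"" + mimeType
                + "\", click this link to download it directly.";
    }

    // Minimal escaping so extracted text cannot inject markup into the page.
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}

final class CachedViewDemo {
    public static void main(String[] args) {
        // Assumed sample data: only the PDF URL has parse text.
        ParseTextSource source =
                url -> url.endsWith(".pdf") ? "Text extracted from the PDF" : null;
        System.out.println(CachedView.render(source, "http://example.com/a.pdf", "application/pdf"));
        System.out.println(CachedView.render(source, "http://example.com/b.bin", "application/octet-stream"));
    }
}
```

Escaping the text before display matters here because parse text is plain text taken from arbitrary documents, so it should not be written into the page as markup.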
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
