page number as a
> > > payload on each term?
> > >
> > > James Dyer
> > > Ingram Content Group
> > > (615) 213-4311
> > >
> > > -Original Message-
> > From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
> > Sent: Thursday, February 28, 2013 3:33 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Get page number of searchresult of a pdf in solr
> >
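> > My guess is the best way to do this is to index each page separately
> > and to store a link to the PDF/page in each document.
> >
> > That would probably require you to preprocess the PDFs to turn each
> > one into a single page per PDF, or to extract the text per page
> > another way.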
Is it possible to write a plugin that converts each page
separately with Tika and saves all pages in one document (maybe in a
dynamic field like "page_*")? I would like to have only one document
stored in Solr for each PDF (it fits better with the way my web
application is managing the
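To make the one-document-per-PDF idea concrete, here is a rough Python sketch. It assumes the per-page text has already been extracted somehow (the extraction step is not shown), that the schema declares a stored dynamic field matching "page_*", and that highlighting is requested on those fields so the response is keyed by document id and then by field name. The function names and the "id" field are made up for illustration, not part of any Solr or Tika API.

```python
import re


def build_pdf_doc(pdf_id, page_texts):
    """Build one document that holds every page in its own page_<n> field."""
    doc = {"id": pdf_id}
    for number, text in enumerate(page_texts, start=1):
        # Assumed dynamic field naming: page_1, page_2, ...
        doc[f"page_{number}"] = text
    return doc


def matching_pages(highlighting, pdf_id):
    """Return the sorted page numbers whose page_<n> field has snippets.

    `highlighting` is assumed to look like {doc_id: {field_name: [snippets]}}.
    """
    pages = []
    for name, snippets in highlighting.get(pdf_id, {}).items():
        match = re.fullmatch(r"page_(\d+)", name)
        if match and snippets:
            pages.append(int(match.group(1)))
    return sorted(pages)
```

With this layout the page number is never stored as a value. It is recovered from the name of the field that produced the highlight, which may or may not suit the way your application reads results.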
You can get the paragraph of the search result via highlights. You'd have to
mark your field as stored (re-indexing required) and then specify it in the
highlighting parameters.
http://wiki.apache.org/solr/HighlightingParameters#hl
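For illustration, a request with highlighting switched on could be assembled roughly like this in Python. The select URL, the query, and the field name "content" are placeholders for whatever your setup uses; hl, hl.fl, hl.snippets and hl.fragsize are the parameters described on the wiki page above.

```python
from urllib.parse import urlencode


def build_highlight_url(select_url, query, field, snippets=1, fragsize=200):
    """Return a select URL that asks Solr to highlight `field`."""
    params = {
        "q": query,
        "wt": "json",
        "hl": "true",             # turn highlighting on
        "hl.fl": field,           # field to highlight; it must be stored
        "hl.snippets": snippets,  # maximum snippets per field
        "hl.fragsize": fragsize,  # approximate snippet size in characters
    }
    return select_url + "?" + urlencode(params)


# Placeholder host and field name, adjust to your installation.
url = build_highlight_url(
    "http://localhost:8983/solr/select", "content:lucene", "content"
)
```

The snippets come back in a separate highlighting section of the response, keyed by document id, rather than inside the matching documents themselves.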
As for getting the page number, I am not sure if there is more
My guess is the best way to do this is to index each page separately
and to store a link to the PDF/page in each document.
That would probably require you to preprocess the PDFs to turn each
one into a single page per PDF, or to extract the text per page
another way.
Michael Della Bitta
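
A minimal Python sketch of the page-per-document approach, assuming the text of each page has already been extracted by some other tool. The field names (pdf_id, page, link, content) and the "#page=N" link style are illustrative assumptions and would need to match your own schema and PDF viewer.

```python
def build_page_docs(pdf_id, pdf_url, page_texts):
    """Build one Solr document per PDF page, each with a link to that page."""
    docs = []
    for number, text in enumerate(page_texts, start=1):
        docs.append({
            "id": f"{pdf_id}_page_{number}",     # unique key per page
            "pdf_id": pdf_id,                    # lets you group hits by PDF
            "page": number,
            "link": f"{pdf_url}#page={number}",  # assumed link convention
            "content": text,
        })
    return docs
```

Because every hit is a single page, the page number comes straight from the matching document. Keeping pdf_id on each page document leaves the option of collapsing hits back to one entry per PDF on the application side.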