Can you index every page as a separate doc? All pages of a PDF can share a docID, and the Solr ID field is just docID+pageno. Then use highlighting to get snippets, and use result grouping to group the pages by their docID. That way all pages from a single document come back grouped together.
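A minimal sketch of that setup, in Python with only the standard library. It assumes the per-page text has already been extracted and that the schema has hypothetical fields `id` (uniqueKey), `doc_id`, `page_no` and `text`. For highlighting to return snippets, `text` must be stored. The code builds the page documents and the /select parameters, then reads a grouped, highlighted JSON response. Sending the documents to Solr's update handler is left out, and the host and core name in the demo URL are assumptions.

```python
from urllib.parse import urlencode

# Hypothetical schema: id (uniqueKey), doc_id (string), page_no (int),
# text (indexed + stored). Adjust the names to your own schema.xml.


def page_docs(doc_id, pages):
    """Build one Solr document per page; all pages share the same doc_id.

    The unique id joins docID and page number with "_" so that, say,
    doc "1" page 12 and doc "11" page 2 cannot collide.
    """
    return [
        {"id": f"{doc_id}_{n}", "doc_id": doc_id, "page_no": n, "text": text}
        for n, text in enumerate(pages, start=1)
    ]


def search_params(user_query, pages_per_doc=10):
    """Query parameters: highlight the page text and group pages by doc_id."""
    return {
        "q": user_query,
        "df": "text",                        # default field searched by q
        "fl": "id,doc_id,page_no",
        "hl": "true",
        "hl.fl": "text",
        "hl.snippets": "1",
        "group": "true",
        "group.field": "doc_id",
        "group.limit": str(pages_per_doc),   # max matching pages per PDF
        "wt": "json",
    }


def pages_with_snippets(response):
    """Map doc_id -> [(page_no, snippet), ...] from a grouped, highlighted response."""
    highlighting = response.get("highlighting", {})
    results = {}
    for group in response["grouped"]["doc_id"]["groups"]:
        hits = []
        for doc in group["doclist"]["docs"]:
            # Highlighting is keyed by the uniqueKey, i.e. the per-page id.
            snippets = highlighting.get(doc["id"], {}).get("text", [])
            hits.append((doc["page_no"], snippets[0] if snippets else ""))
        results[group["groupValue"]] = hits
    return results


if __name__ == "__main__":
    # Assumed local Solr 4.x core; change host and core name to match yours.
    base = "http://localhost:8983/solr/collection1/select"
    print(base + "?" + urlencode(search_params("payload")))
```

Each group then gives the PDF to open, and each (page_no, snippet) pair gives the page to jump to in pdf.js and the text to show in the result list.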
You could instead keep one doc per PDF and use the page number as the suffix of a page_* dynamic field, but then you'd have to query against page_1, page_2, page_3 ... page_n for every query, which wouldn't work too well.

Upayavira

On Sat, Mar 2, 2013, at 03:59 PM, Anirudha Jadhav wrote:
> If you increase the granularity of your document in the index to a single
> page instead of an entire pdf, it becomes an easy problem.
>
> Your description states that you are not searching for a term in a pdf
> but for a term in a page of a pdf.
>
> I assume you load the pdf externally for rendering.
>
> Not sure why you need the combined doc. Search against the document pages,
> and use faceting on the filenameID to return the unique docs matched per
> search.
>
> On Sat, Mar 2, 2013 at 1:46 AM, Aloke Ghoshal <alghos...@gmail.com> wrote:
>
> > Hi,
> >
> > We are solving this problem by splitting an N-page document into N
> > separate documents (one per page, type=Page) plus 1 additional combined
> > document that has all the pages (type=Combined). All N+1 documents
> > have the same doc_id.
> >
> > The search is first performed against the combined documents
> > (type=Combined) to identify the documents that match. For each search
> > result, a second search is performed against the separate pages
> > (type=Page AND doc_id) to identify the pages within that document that
> > match.
> >
> > Keen to know how others have solved this.
> >
> > Regards,
> > Aloke
> >
> > On Fri, Mar 1, 2013 at 8:51 PM, Dyer, James <james.d...@ingramcontent.com> wrote:
> >
> > > Is there an easy (enough) way to do this, storing the page number as a
> > > payload on each term?
> > >
> > > James Dyer
> > > Ingram Content Group
> > >
> > > -----Original Message-----
> > > From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
> > > Sent: Thursday, February 28, 2013 3:33 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Get page number of searchresult of a pdf in solr
> > >
> > > My guess is the best way to do this is to index each page separately
> > > and to store a link to the PDF/page in each document.
> > >
> > > That would probably require you to preprocess the PDFs to turn each
> > > one into a single page per PDF, or to extract the text per page
> > > another way.
> > >
> > > Michael Della Bitta
> > > Appinions
> > >
> > > On Thu, Feb 28, 2013 at 3:26 PM, <d...@geschan.de> wrote:
> > > > Hello,
> > > >
> > > > I'm building a web application where users can search for pdf
> > > > documents and view them with pdf.js. I would like to display the
> > > > search results with a short snippet of the paragraph where the search
> > > > term was found and a link to open the document at the right page.
> > > >
> > > > So what I need is the page number and a short text snippet of every
> > > > search result.
> > > >
> > > > I'm using Solr 4.1 for indexing pdf documents. The indexing itself
> > > > works fine, but I don't know how to get the page number and paragraph
> > > > of a search result. I only get the document in which the search term
> > > > was found.
> > > >
> > > > -Gesh
>
> --
> Anirudha P. Jadhav
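For comparison, Aloke's two-pass approach quoted above (search the type=Combined docs first, then search type=Page AND doc_id for each hit) might look roughly like this Python sketch. The `type` and `doc_id` field names come from his description; `text` and `page_no` are assumptions. Adding highlighting to the second pass is my own addition, made for the snippet requirement in the original question. `search` is a hypothetical callable that sends the parameters to Solr's /select handler and returns the parsed JSON, so no particular HTTP client is assumed.

```python
from urllib.parse import urlencode


def _phrase(value):
    """Quote a value for a field query, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def combined_params(user_query):
    """Pass 1: match against the combined (whole-PDF) documents."""
    return {"q": user_query, "df": "text", "fq": "type:Combined",
            "fl": "doc_id", "wt": "json"}


def page_params(user_query, doc_id):
    """Pass 2: the matching pages of one PDF.

    Highlighting here is an assumption added for the snippet requirement.
    """
    return {"q": user_query, "df": "text",
            "fq": ["type:Page", "doc_id:" + _phrase(doc_id)],
            "fl": "id,page_no", "hl": "true", "hl.fl": "text", "wt": "json"}


def two_pass_search(user_query, search):
    """Return {doc_id: [page_no, ...]} using 1 + N requests via `search`."""
    first = search(combined_params(user_query))
    results = {}
    for doc in first["response"]["docs"]:
        second = search(page_params(user_query, doc["doc_id"]))
        results[doc["doc_id"]] = [d["page_no"] for d in second["response"]["docs"]]
    return results


if __name__ == "__main__":
    # fq is a list in pass 2, so doseq=True emits one fq=... per filter.
    print(urlencode(page_params("payload", "report42"), doseq=True))
```

The trade-off is one extra request per matching PDF, whereas Upayavira's grouping suggestion gets the pages and snippets back in a single request.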