Re: Get page number of searchresult of a pdf in solr

dev Fri, 01 Mar 2013 02:13:12 -0800

Is it possible to write a plugin that converts each page separately with Tika and saves all pages in one document (maybe in a dynamic field like "page_*")? I would like to have only one document stored in SOLR for each PDF (it fits better with the way my web application manages these documents, and I would like to use the same id as the unique identifier).
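Here is a minimal client-side sketch of the single-document idea. It is not a Solr plugin. It assumes the per-page text has already been extracted somehow and that the schema declares a dynamic field such as `<dynamicField name="page_*" type="text_general" indexed="true" stored="true"/>`, which is an assumption. If the query asks Solr to highlight the page fields (for example `hl=true&hl.fl=page_*`, since `hl.fl` accepts field globs), the field names in the `highlighting` section of the JSON response reveal which pages matched. The function names are hypothetical.

```python
import re

# Assumption: the schema has a stored dynamic field matching "page_*".
PAGE_FIELD = re.compile(r"^page_(\d+)$")


def build_pdf_document(doc_id, page_texts):
    """Build one Solr document for a whole PDF, with one dynamic field per page.

    page_texts is a list of strings where page_texts[0] is page 1
    (hypothetical input from whatever per-page extraction you use).
    """
    doc = {"id": doc_id}
    for number, text in enumerate(page_texts, start=1):
        doc[f"page_{number}"] = text
    return doc


def pages_from_highlighting(highlighting, doc_id):
    """Return (page, snippet) pairs for one document, sorted by page number.

    highlighting is the "highlighting" section of a Solr JSON response,
    which maps document id -> field name -> list of snippets.
    """
    results = []
    for field, snippets in highlighting.get(doc_id, {}).items():
        match = PAGE_FIELD.match(field)
        if match and snippets:
            results.append((int(match.group(1)), snippets[0]))
    return sorted(results)
```

Highlighting only works on stored fields, so the page fields have to be stored for this to work. The trade-off is that every page text is stored twice over the life of a result (in the index and in the response), and relevance is scored for the whole PDF rather than per page.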

To be honest, I can't understand why SOLR is not able to find the pages where the search term was found. It's quite a common task, in my opinion.


-Gesh

Quoting Michael Della Bitta <michael.della.bi...@appinions.com>:

My guess is the best way to do this is to index each page separately
and to store a link to the PDF/page in each document.

That would probably require you to preprocess the PDFs, either splitting
each one into single-page PDFs or extracting the text per page some
other way.
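A minimal sketch of the one-document-per-page approach, assuming the per-page text is already available. The field names (`pdf_id`, `page`, `text`, `link`) and the id scheme are hypothetical and must match your own schema. The link relies on pdf.js's viewer understanding a `#page=N` fragment; how `pdf_url` itself is built depends on the application.

```python
def build_page_documents(pdf_id, pdf_url, page_texts):
    """Turn one PDF into one Solr document per page.

    page_texts is a list of strings where page_texts[0] is page 1.
    """
    docs = []
    for number, text in enumerate(page_texts, start=1):
        docs.append({
            # Unique key per page, derived from the PDF id (assumed naming scheme).
            "id": f"{pdf_id}_page_{number}",
            # Shared id so all pages of one PDF can be filtered or grouped.
            "pdf_id": pdf_id,
            "page": number,
            "text": text,
            # Opens the PDF at this page in a pdf.js-based viewer.
            "link": f"{pdf_url}#page={number}",
        })
    return docs
```

Keeping the original PDF id in its own field preserves a stable identifier for the web application. If only one hit per PDF is wanted in the result list, Solr's result grouping (`group=true&group.field=pdf_id`) can collapse pages by that field, assuming `pdf_id` is a single-valued, indexed string field.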

Michael Della Bitta

------------------------------------------------
Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn't a Game


On Thu, Feb 28, 2013 at 3:26 PM,  <d...@geschan.de> wrote:

Hello,

I'm building a web application where users can search for pdf documents and
view them with pdf.js. I would like to display the search results with a
short snippet of the paragraph where the search term was found and a link
to open the document at the right page.

So what I need is the page number and a short text snippet of every search
result.

I'm using SOLR 4.1 for indexing pdf documents. The indexing itself works
fine but I don't know how to get the page number and paragraph of a search
result. I only get the document in which the search term was found.
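If each page is indexed as its own document, the page number comes back as an ordinary field and the snippet comes from Solr's highlighter. A minimal sketch, assuming the hypothetical per-page fields `pdf_id`, `page`, `link` and `text`, and a hypothetical core URL such as `http://localhost:8983/solr/pdfs`. It uses the standard `q`, `df`, `fl`, `hl`, `hl.fl`, `hl.snippets`, `hl.fragsize`, `rows` and `wt` request parameters and reads the `highlighting` section of the JSON response.

```python
from urllib.parse import urlencode


def build_search_url(solr_core_url, query, rows=10):
    """Build a /select URL asking for matching pages plus one highlighted snippet each."""
    params = {
        "q": query,
        "df": "text",                  # default field to search (assumed field name)
        "fl": "id,pdf_id,page,link",   # stored fields to return per hit
        "hl": "true",
        "hl.fl": "text",               # field to build snippets from (must be stored)
        "hl.snippets": "1",
        "hl.fragsize": "150",          # approximate snippet length in characters
        "rows": str(rows),
        "wt": "json",
    }
    return f"{solr_core_url}/select?{urlencode(params)}"


def search_results(response):
    """Combine docs and highlighting from a parsed Solr JSON response."""
    highlighting = response.get("highlighting", {})
    results = []
    for doc in response["response"]["docs"]:
        snippets = highlighting.get(doc["id"], {}).get("text", [])
        results.append({
            "pdf_id": doc["pdf_id"],
            "page": doc["page"],
            "link": doc["link"],
            # The highlighter may return nothing for a hit; fall back to "".
            "snippet": snippets[0] if snippets else "",
        })
    return results
```

Each entry then carries everything the result list needs: the page number, a short snippet with the matched terms marked up, and a link that opens the PDF at that page.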

-Gesh
