Re: Indexing several parts of PDF file

2013-02-05 Thread Jorge Luis Betancourt Gonzalez
Thanks for the replies! - Original message - From: "Upayavira" To: solr-user@lucene.apache.org Sent: Tuesday, February 5, 2013 9:05:58 Subject: Re: Indexing several parts of PDF file This would involve you querying against every page in your document, which will be too many

Re: Indexing several parts of PDF file

2013-02-05 Thread VIGNESH S
Yes, I think the same. Better to index each page as a document. On Tue, Feb 5, 2013 at 7:35 PM, Upayavira wrote: > This would involve you querying against every page in your document, > which will be too many fields and will break quickly. > > The best way to do it is to index pages as documents

Re: Indexing several parts of PDF file

2013-02-05 Thread Upayavira
This would involve you querying against every page in your document, which will be too many fields and will break quickly. The best way to do it is to index pages as documents. You can use field collapsing to group pages from the same document together. Upayavira On Tue, Feb 5, 2013, at 02:00 PM
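A minimal sketch of the approach suggested above, assuming each PDF has already been split into per-page text. The field names (`id`, `doc_id`, `page`, `content`) and the base URL are hypothetical and would need to match your own schema and core. The `group`, `group.field`, and `group.limit` parameters are Solr's result grouping (field collapsing) parameters.

```python
from urllib.parse import urlencode


def page_documents(doc_id, pages):
    """Build one Solr document per page of a PDF.

    `pages` is a list of page texts, in page order. The field names
    used here are assumptions, not part of any standard schema.
    """
    return [
        {
            "id": f"{doc_id}_p{number}",  # unique key for this page
            "doc_id": doc_id,             # shared by all pages of one PDF
            "page": number,
            "content": text,
        }
        for number, text in enumerate(pages, start=1)
    ]


def grouped_query_url(base_url, terms, group_field="doc_id", pages_per_doc=3):
    """Build a select URL that collapses matching pages by source document."""
    params = {
        "q": f"content:({terms})",
        "group": "true",               # enable result grouping
        "group.field": group_field,    # one group per source PDF
        "group.limit": pages_per_doc,  # matching pages returned per group
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    docs = page_documents("manual", ["Introduction ...", "Installation ..."])
    print(docs[1]["id"])
    # Hypothetical core URL; adjust to your own Solr instance.
    print(grouped_query_url("http://localhost:8983/solr/collection1", "installation"))
```

Each group in the response then corresponds to one PDF, and the documents inside the group are the pages that matched.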

Indexing several parts of PDF file

2013-02-05 Thread Jorge Luis Betancourt Gonzalez
Hi: I'm working on a search engine for several PDF documents. Right now one of the requirements is that we provide not only the documents matching the search criteria but also the pages that match. Normally Tika only extracts the text content and does not make this distinction, but usi