Thanks for the replies!
- Original message -
From: "Upayavira"
To: solr-user@lucene.apache.org
Sent: Tuesday, February 5, 2013 9:05:58
Subject: Re: Indexing several parts of PDF file
This would involve you querying against every page in your document,
which will be too many fields and will break quickly.
Yes, I think the same. Better to index each page as a document.
On Tue, Feb 5, 2013 at 7:35 PM, Upayavira wrote:
> This would involve you querying against every page in your document,
> which will be too many fields and will break quickly.
>
> The best way to do it is to index pages as documents
This would involve you querying against every page in your document,
which will be too many fields and will break quickly.
The best way to do it is to index pages as documents. You can use field
collapsing to group pages from the same document together.
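
To make the suggestion concrete, here is a rough sketch of what it could look like. It is only an illustration under assumptions: the field names (doc_id, page_number, page_text) and the helper functions are hypothetical and not from any existing schema. The group, group.field and group.limit parameters are Solr's standard result grouping (field collapsing) parameters; the field you group on should be single-valued.

```python
from urllib.parse import urlencode


def pages_to_docs(doc_id, pages):
    """Build one Solr document per page (field names are hypothetical)."""
    return [
        {
            "id": f"{doc_id}_p{n}",   # unique key per page
            "doc_id": doc_id,         # shared by all pages of the same PDF
            "page_number": n,
            "page_text": text,
        }
        for n, text in enumerate(pages, start=1)
    ]


def grouped_query_params(terms, pages_per_doc=3):
    """Build a query string that collapses matching pages by source PDF."""
    return urlencode({
        "q": f"page_text:({terms})",
        "group": "true",              # turn on result grouping
        "group.field": "doc_id",      # one group per source document
        "group.limit": pages_per_doc, # top matching pages kept per group
        "fl": "id,doc_id,page_number",
        "wt": "json",
    })
```

Each group in the response then corresponds to one PDF, and the documents inside the group are the matching pages, so you get both the document and the page numbers from a single query.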
Upayavira
On Tue, Feb 5, 2013, at 02:00 PM
Hi:
I'm working on a search engine for several PDF documents. Right now one of the
requirements is that we return not only the documents matching the search
criteria but also the pages that match. Normally Tika only extracts the
text content and does not make this distinction, but usi