Yes, I think the same. Better to index each page as a document.

On Tue, Feb 5, 2013 at 7:35 PM, Upayavira <u...@odoko.co.uk> wrote:

> This would involve you querying against every page in your document,
> which will be too many fields and will break quickly.
>
> The best way to do it is to index pages as documents. You can use field
> collapsing to group pages from the same document together.
>
> Upayavira
>
> On Tue, Feb 5, 2013, at 02:00 PM, Jorge Luis Betancourt Gonzalez wrote:
>> Hi:
>>
>> I'm working on a search engine for several PDF documents. Right now one
>> of the requirements is that we provide not only the documents
>> matching the search criteria but also the pages that match the criteria.
>> Normally Tika only extracts the text content and does not make this
>> distinction, but using some custom library this could be achieved. My
>> question is how to structure the schema. From what I've seen, one approach
>> could be to use dynamic fields:
>>
>> <dynamicField name="page_*" type="text" indexed="true" stored="true"/>
>>
>> So at query time I could extract the page number from the field name. Is
>> this the best approach? Is there any way of storing the page number in
>> an attribute instead of using the dynamic fields?
>>
>> Thanks in advance!
>>
>> Greetings
>> --
>> "It is only in the mysterious equation of love that any
>> logical reasons can be found."
>> "Good programmers often confuse halloween (31 OCT) with
>> christmas (25 DEC)"
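
To make the page-per-document suggestion concrete, here is a rough sketch in Python. It only builds the data structures, it does not talk to Solr. The field names `doc_id`, `page` and `content`, the `_p<N>` id scheme, and both helper functions are assumptions for illustration and are not from the original schema. The `group`, `group.field` and `group.limit` request parameters are Solr's result grouping (field collapsing) parameters; as far as I know `doc_id` would need to be a single-valued string field for grouping to work.

```python
from typing import Dict, List


def pages_to_solr_docs(doc_id: str, pages: List[str]) -> List[Dict[str, object]]:
    """Turn the per-page text of one PDF into one Solr document per page.

    `pages` is assumed to be the text of each page, in order, as produced
    by whatever custom extraction step splits the PDF by page.
    """
    docs = []
    for page_number, text in enumerate(pages, start=1):
        docs.append({
            "id": f"{doc_id}_p{page_number}",  # hypothetical unique key per page
            "doc_id": doc_id,                  # shared by all pages of one PDF
            "page": page_number,               # page number stored as a plain field
            "content": text,
        })
    return docs


def grouped_query_params(user_query: str, pages_per_doc: int = 3) -> Dict[str, str]:
    """Build request parameters that collapse page hits by parent document."""
    return {
        "q": user_query,
        "df": "content",                  # search the page text by default
        "group": "true",                  # enable result grouping
        "group.field": "doc_id",          # one group per source PDF
        "group.limit": str(pages_per_doc),  # top matching pages per PDF
        "fl": "doc_id,page",
    }
```

With this layout the page number comes back as an ordinary stored field on each hit, so there is no need to parse it out of a dynamic field name, and each group in the response lists the matching pages of one PDF.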
--
Thanks and Regards
Vignesh Srinivasan
9739135640