Hi,

I'm working on a search engine for a collection of PDF documents. One of the requirements is that we return not only the documents matching the search criteria but also the pages that match. Tika normally extracts only the text content and does not make this distinction. With a custom library this could be achieved, but my question is how to structure the schema. From what I've seen, one approach would be to use dynamic fields:
<dynamicField name="page_*" type="text" indexed="true" stored="true"/>

So at query time I could extract the page number from the field's name. Is this the best approach? Is there a way to store the page number in an attribute instead of using dynamic fields?

Thanks in advance!

Greetings

--
"It is only in the mysterious equation of love that any logical reasons can be found."
"Good programmers often confuse halloween (31 OCT) with christmas (25 DEC)"
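
P.S. To make the dynamic-field idea concrete, here is a minimal sketch of what I have in mind. It does not talk to Solr at all: it only shows how one document could carry a page_<n> field per page, and how the page number could be recovered from a field name. The helper names (build_page_fields, pages_from_field_names) and the sample data are hypothetical; only the page_* naming pattern comes from the schema line above. How the matching field names are obtained at query time (for example from per-field highlighting results) is an assumption on my part.

```python
import re

# Hypothetical helpers for illustration; only the "page_*" pattern
# comes from the dynamicField definition in the schema.
PAGE_FIELD = re.compile(r"^page_(\d+)$")


def build_page_fields(doc_id, pages):
    """Build one document dict with a page_<n> field per page (1-based)."""
    doc = {"id": doc_id}
    for number, text in enumerate(pages, start=1):
        doc[f"page_{number}"] = text
    return doc


def pages_from_field_names(field_names):
    """Recover page numbers from field names such as 'page_12'.

    Names that do not follow the page_<n> pattern are ignored.
    """
    numbers = []
    for name in field_names:
        match = PAGE_FIELD.match(name)
        if match:
            numbers.append(int(match.group(1)))
    return sorted(numbers)


# Sample data (made up): a three-page document.
doc = build_page_fields("report.pdf", ["intro text", "schema notes", "appendix"])

# Assumed input: the names of the fields in which the search found a match.
hit_fields = ["page_2", "id", "page_10"]
print(pages_from_field_names(hit_fields))  # prints [2, 10]
```

The alternative I am asking about would avoid parsing field names altogether, since the page number would live in its own attribute.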