Hi:

I'm working on a search engine for several PDF documents. One of the 
requirements is that we return not only the documents matching the search 
criteria but also the pages that match. Normally Tika only extracts the 
text content and does not make this distinction, but it could be achieved 
with some custom library. My question is how to structure the schema. From 
what I've seen, one approach could be to use dynamic fields:

<dynamicField name="page_*" type="text" indexed="true"  stored="true"/>

So at query time I could extract the page number from the field name. Is this 
the best approach? Is there any way of storing the page number in an 
attribute instead of using dynamic fields?
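
To make the dynamic-field idea concrete, here is a minimal Python sketch of the 
query-time step I have in mind. It assumes the fields of one matching document 
(for example its highlighting entry) are already available as a dict keyed by 
field name. The function name pages_with_hits and the sample data are 
hypothetical, not part of any Solr client API.

```python
import re

# Matches dynamic field names such as "page_12" and captures the page number.
PAGE_FIELD = re.compile(r"^page_(\d+)$")


def pages_with_hits(fields):
    """Return the sorted page numbers of all page_* fields in `fields`.

    `fields` is assumed to be a dict of field name -> value(s) for one
    document (hypothetical input shape; adapt it to the client in use).
    """
    pages = []
    for name in fields:
        match = PAGE_FIELD.match(name)
        if match:
            pages.append(int(match.group(1)))
    return sorted(pages)


# Hypothetical sample: snippets returned for one matching document.
sample = {
    "page_12": ["... <em>lucene</em> ..."],
    "page_3": ["... <em>lucene</em> index ..."],
    "title": ["Some title"],
}
print(pages_with_hits(sample))
```

Fields that do not follow the page_<number> pattern, such as title above, are 
ignored, so the same dict can carry other stored fields.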

Thanks in advance!

Greetings
--
"It is only in the mysterious equation of love that any 
logical reasons can be found."
"Good programmers often confuse halloween (31 OCT) with 
christmas (25 DEC)"
