Well, given the structure of an inverted index, how would you have a clue what
page the hit was on? You could conceivably index enough data with payloads and
the like, but that’d cause a lot more bloat than just indexing each page.
Using grouping would allow you to show, say, the top three pages
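
To make the page-per-document idea concrete, here is a rough sketch, not a tested setup. It assumes you already have the text of each page as a separate string, and the field names (doc_id, page_number, page_text) are made up for illustration. The group, group.field and group.limit parameters are Solr's standard result grouping parameters.

```python
from urllib.parse import urlencode


def page_docs(doc_id, pages):
    """Build one Solr document per PDF page.

    doc_id, page_number and page_text are hypothetical field names;
    use whatever your schema defines.
    """
    return [
        {
            "id": f"{doc_id}_p{n}",   # unique key per page
            "doc_id": doc_id,         # shared by every page of one PDF
            "page_number": n,
            "page_text": text,
        }
        for n, text in enumerate(pages, start=1)
    ]


def grouped_query_params(user_query, pages_per_doc=3):
    """Query parameters that group page hits by their parent PDF."""
    return {
        "q": user_query,
        "df": "page_text",            # assumed default search field
        "group": "true",              # turn on result grouping
        "group.field": "doc_id",      # one group per PDF
        "group.limit": pages_per_doc, # top N pages shown per PDF
        "fl": "id,doc_id,page_number",
    }


# Example: two pages of one (hypothetical) PDF and the matching query string.
docs = page_docs("report-42", ["intro text", "methods text"])
query_string = urlencode(grouped_query_params("methods"))
```

The list from page_docs would be posted to the collection's update handler, and query_string appended to its select handler. Each group in the response is then one PDF with up to three of its best-matching pages.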
Hi!
I'm looking for some guidance on engineering a solution for searching
individual pages of PDF documents. I currently have a SolrCloud setup that uses
an external Tika server to extract text data from PDFs. I'd like to be able to
search individual pages for search results and for the overall