Here is my use case:
I have a large number of HTML documents, ranging in size from 0.5K to 50M, with most around, say, 10M. I want to be able to present the user with the formatted HTML document with the hits tagged, so that they can step through the hits and see each one in the context of the document. The document should look as it would when rendered by a browser: fully formatted, with its tables, italics, font sizes and all.

This is something the user would explicitly request from within a set of search results, not something I'd expect to have returned from an initial search; the initial search merely returns the snippets around the hits. But if the user wants to dive into one of the returned results and see its hits in context, I need to be able to go and get that.

We are currently solving this problem with an entirely separate search engine (dtSearch), which tags the hits in the HTML just fine. But the solution is unsatisfactory, because there are Solr searches that dtSearch's capabilities cannot reasonably match.

Can anyone suggest a good way to use Solr/Lucene for this instead? I'm thinking a separate core for this purpose might make sense, so as not to burden the primary search core with the full contents of each document. But beyond that, I'm stuck. How can I get Solr to express the highlighting in the context of the formatted HTML document?

If Solr does not currently do this, and anyone can suggest ways to add the feature, any tips on how it might best be incorporated into the implementation would be welcome.

Thanks,
-- Bryan