Here is my use case:


I have a large number of HTML documents, ranging in size from about 0.5K to
50M, with most around 10M.



I want to present the user with the formatted HTML document, with the hits
tagged, so that the user can step through them and see each one in the
context of the document. The document should look as a browser would
present it: fully formatted, with its tables, italics, font sizes and all.
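
To make the desired output concrete, here is a rough Python sketch of
"hits tagged inside the formatted HTML". It is only an illustration under
stated assumptions: it is not how Solr or dtSearch produces highlights, it
matches plain terms rather than running a real query, and the class name
HitTagger and the hit-N ids are invented for this example. It rewrites only
text nodes, so tags, attributes, scripts and styles pass through unchanged.

```python
import re
from html.parser import HTMLParser


class HitTagger(HTMLParser):
    """Re-emit an HTML document, wrapping term matches in text nodes with <span>."""

    def __init__(self, terms):
        # convert_charrefs=False: entities such as &amp; are reported separately
        # and can be written back out instead of being decoded.
        super().__init__(convert_charrefs=False)
        alternatives = "|".join(re.escape(t) for t in terms)
        self.pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
        self.out = []
        self.hit_count = 0
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        self.out.append(self.get_starttag_text())  # original tag text, untouched

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth:
            self.out.append(data)  # never tag inside script or style bodies
        else:
            self.out.append(self.pattern.sub(self._wrap, data))

    def _wrap(self, match):
        # Number each hit so a viewer can jump from hit-1 to hit-2 and so on.
        self.hit_count += 1
        return f'<span class="hit" id="hit-{self.hit_count}">{match.group(0)}</span>'

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def tag_hits(html_text, terms):
    """Return (tagged_html, number_of_hits) for the given terms."""
    tagger = HitTagger(terms)
    tagger.feed(html_text)
    tagger.close()
    return "".join(tagger.out), tagger.hit_count
```

Calling tag_hits(page, ["solr"]) returns the page with each match wrapped in
a numbered span, which a viewer could use as anchors for next/previous hit
navigation. A hit that spans a tag boundary (a phrase that is partly
italicized, for instance) is not handled by this sketch.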



This is something that the user would explicitly request from within a set
of search results, not something I’d expect to have returned from an initial
search; the initial search merely returns the snippets around the hits. But
if the user wants to dive into one of the returned results and see its hits
in context, I need to be able to go get that.



We are currently solving this problem by using an entirely separate search
engine (dtSearch), which performs the tagging of the hits in the HTML just
fine. But the solution is unsatisfactory because there are Solr searches
that dtSearch’s capabilities cannot reasonably match.



Can anyone suggest a good way to use Solr/Lucene for this instead? I’m
thinking a separate core for this purpose might make sense, so as not to
burden the primary search core with the full contents of the document. But
after that, I’m stuck. How can I get Solr to express the highlighting in the
context of the formatted HTML document?
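
For reference, the standard Solr highlighting parameters for a single
document would look roughly like the Python sketch below. It only builds the
request URL. The core name, the field name content_html and the function
name are assumptions made up for this example, and whether whole-field
highlighting behaves sensibly on stored raw HTML of this size is exactly the
open question.

```python
from urllib.parse import urlencode


def build_highlight_url(base_url, core, doc_id, user_query, field="content_html"):
    """Build a Solr select URL that asks for one document's field, highlighted.

    Assumes a (hypothetical) core whose `field` stores the full HTML.
    """
    params = {
        "q": user_query,
        "fq": f'id:"{doc_id}"',       # restrict the query to the chosen document
        "fl": "id",                   # the stored body comes back via highlighting
        "hl": "true",
        "hl.fl": field,
        "hl.fragsize": "0",           # original highlighter: 0 = whole field, no fragments
        "hl.maxAnalyzedChars": "60000000",  # default is far smaller than a 50M document
        "hl.simple.pre": '<span class="hit">',   # original highlighter tag parameters
        "hl.simple.post": "</span>",
        "wt": "json",
    }
    return f"{base_url}/{core}/select?{urlencode(params)}"
```

For example, build_highlight_url("http://localhost:8983/solr", "fulltext",
"doc-42", "solr") yields a URL that could be requested when the user opens a
result. The host, core and id in that call are placeholders.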



If Solr does not do this currently, and anyone can suggest ways to add the
feature, any tips on how this might best be incorporated into the
implementation would be welcome.



Thanks,



-- Bryan
