To take our Solr usage to the next step, we really need to improve its highlighting abilities. I'm trying to write a new component that returns the fields that matched the search (including numeric fields) and the start/end positions of the alphanumeric matches.
I see three different approaches to take. Whichever one I choose, it will require some modifications to the Lucene/Solr code itself, as this does not appear to be doable as a completely standalone component.

1) At initial search time. This seemed like a good approach. I can follow IndexSearcher creating the TermContext, which goes through the AtomicReaderContexts to see whether each contains a match and then adds it to the contexts available for later. However, at this point, inside SegmentTermsEnum.seekExact(), it seems like Solr is not really looking for matching terms as such; it's just scanning what looks like the raw index. So I don't think I can easily extract term positions at this point.

2) Write a modified HighlighterComponent. We have managed to get phrases to highlight properly, but it seems like getting the full field matches would be more difficult in this module. However, because it does its highlighting oblivious to any other criteria, we can't use it as is. For example, this search:

(body:large+AND+user_id:7)+OR+user_id:346

will highlight "large" in records that have user_id = 346, when technically (for our purposes at least) it should not be considered a hit, because the "large" clause was accompanied by the user_id = 7 criterion. It's not immediately clear to me how difficult it would be to change this.

3) Make a modified DebugComponent and enhance the existing explain() methods (at least in the query types we require) to include more information, such as the start/end positions of the term that was hit. I'm exploring this now, but I don't easily see how I can figure out what those positions might be from the explain() information.

Any pointers on how, at the point that TermQuery.explain() is being called, I can figure out which indexed token was the actual hit?

Craig Longman
C++ Developer
iCONECT Development, LLC
519-645-1663
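To make the requirement in approach 2 concrete, here is a toy sketch in plain Java of the clause-aware behaviour I'm after. It does not use any Lucene or Solr API: the names ClauseAwareMatches, Node, term, and, or and doc are hypothetical, documents are just field-to-text maps, and tokens are split on whitespace. The only point is that a hit is reported when the clause containing it actually matched, so "large" is dropped for a record that matched solely through user_id:346.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClauseAwareMatches {

    /** Returns the hits that contributed to a match, or null if the node did not match. */
    interface Node {
        Set<String> match(Map<String, String> doc);
    }

    /** Matches a whole whitespace-delimited token; each hit is reported as "field:start-end". */
    static Node term(String field, String value) {
        return doc -> {
            String text = doc.get(field);
            if (text == null) return null;
            Set<String> hits = new LinkedHashSet<>();
            Matcher m = Pattern.compile("\\S+").matcher(text);
            while (m.find()) {
                if (m.group().equalsIgnoreCase(value)) {
                    hits.add(field + ":" + m.start() + "-" + m.end());
                }
            }
            return hits.isEmpty() ? null : hits;
        };
    }

    /** All clauses must match; if any fails, none of their hits are reported. */
    static Node and(Node... clauses) {
        return doc -> {
            Set<String> hits = new LinkedHashSet<>();
            for (Node clause : clauses) {
                Set<String> h = clause.match(doc);
                if (h == null) return null;
                hits.addAll(h);
            }
            return hits;
        };
    }

    /** At least one clause must match; only the matching clauses contribute hits. */
    static Node or(Node... clauses) {
        return doc -> {
            Set<String> hits = new LinkedHashSet<>();
            boolean matched = false;
            for (Node clause : clauses) {
                Set<String> h = clause.match(doc);
                if (h != null) {
                    matched = true;
                    hits.addAll(h);
                }
            }
            return matched ? hits : null;
        };
    }

    /** Builds a two-field toy document (hypothetical helper). */
    static Map<String, String> doc(String body, String userId) {
        Map<String, String> d = new HashMap<>();
        d.put("body", body);
        d.put("user_id", userId);
        return d;
    }

    public static void main(String[] args) {
        // (body:large AND user_id:7) OR user_id:346
        Node q = or(and(term("body", "large"), term("user_id", "7")),
                    term("user_id", "346"));
        System.out.println(q.match(doc("a large file", "346"))); // [user_id:0-3]
        System.out.println(q.match(doc("a large file", "7")));   // [body:2-7, user_id:0-1]
    }
}
```

In the first record the AND clause fails, so its "large" hit is discarded and only user_id is reported. How to get the equivalent per-clause information out of the real scorers or explain() is exactly what I'm asking about.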