In order to take our Solr usage to the next step, we really need to
improve its highlighting abilities.  I would like to write a new
component that returns the fields that matched the search (including
numeric fields) and the start/end positions for the alphanumeric
matches.



I see three different approaches I could take.  Any of them will require
making some modifications to the Lucene/Solr code, as this just does not
appear to be doable as a completely standalone component.



1) At initial search time.

This seemed like a good approach.  I can follow IndexSearcher creating
the TermContext that parses through AtomicReaderContexts to see if it
contains a match and then adds it to the contexts available for later.
However, at this point, inside SegmentTermsEnum.seekExact(), it seems
like Solr is not really looking for matching terms as such; it's just
scanning what looks like the raw index.  So, I don't think I can easily
extract term positions at this point.



2) Write a modified HighlighterComponent.  We have managed to get phrases
to highlight properly, but getting the full field matches looks more
difficult in this module.  Also, because it does its highlighting
oblivious to any other criteria in the query, we can't use it as is.
For example, this search:



  (body:large+AND+user_id:7)+OR+user_id:346



will highlight "large" in records that have user_id = 346 when
technically (for our purposes at least) it should not be considered a
hit, because "large" was required together with the user_id = 7
criterion.  It's not immediately clear to me how difficult it would be
to change this.
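
To make the behaviour I'm after concrete, here is a minimal sketch in
plain Java.  It is a toy model only: the Node type and the term/and/or
helpers are hypothetical names made up for illustration and are not
Lucene or Solr classes.  Each node reports whether it matched a document
and which term clauses were responsible, and an AND discards its partial
hits when any of its clauses fails.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these types are hypothetical, NOT Lucene or Solr classes.
final class ContributingClausesSketch {

    /** A query node; on a match it appends the responsible term clauses to 'hits'. */
    interface Node {
        boolean match(Map<String, List<String>> doc, List<String> hits);
    }

    static Node term(final String field, final String value) {
        return (doc, hits) -> {
            List<String> tokens = doc.get(field);
            if (tokens == null || !tokens.contains(value)) return false;
            hits.add(field + ":" + value);
            return true;
        };
    }

    static Node and(final Node... clauses) {
        return (doc, hits) -> {
            List<String> local = new ArrayList<>();
            for (Node c : clauses) {
                if (!c.match(doc, local)) return false; // drop partial hits
            }
            hits.addAll(local);
            return true;
        };
    }

    static Node or(final Node... clauses) {
        return (doc, hits) -> {
            boolean any = false;
            for (Node c : clauses) {
                any |= c.match(doc, hits); // evaluate every clause, no short-circuit
            }
            return any;
        };
    }

    /** Returns the term clauses that actually caused 'doc' to match 'query'. */
    static List<String> contributing(Node query, Map<String, List<String>> doc) {
        List<String> hits = new ArrayList<>();
        query.match(doc, hits);
        return hits;
    }

    /** Builds a toy document: whitespace-split lower-cased body plus one user_id. */
    static Map<String, List<String>> doc(String body, String userId) {
        Map<String, List<String>> d = new HashMap<>();
        d.put("body", Arrays.asList(body.toLowerCase().split("\\s+")));
        d.put("user_id", Arrays.asList(userId));
        return d;
    }

    public static void main(String[] args) {
        // (body:large AND user_id:7) OR user_id:346
        Node q = or(and(term("body", "large"), term("user_id", "7")),
                    term("user_id", "346"));
        System.out.println(contributing(q, doc("a large file", "346"))); // [user_id:346]
        System.out.println(contributing(q, doc("a large file", "7")));   // [body:large, user_id:7]
    }
}
```

For the user_id = 346 record, only user_id:346 is reported, so "large"
would not be highlighted there; for the user_id = 7 record, both clauses
of the AND are reported.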



3) Make a modified DebugComponent and enhance the existing explain()
methods (at least in the query types we require) to include more
information, such as the start/end positions of the term that was hit.
I'm exploring this now, but I don't easily see how I can figure out what
those positions might be from the explain() information.  Any pointers
on how, at the point where TermQuery.explain() is called, I can figure
out which indexed token was the actual hit?
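
As an illustration of the kind of data I would like back for each
matched term, here is a small stand-alone sketch in plain Java.  It does
not use Lucene at all: TermOffsetsSketch and offsetsOf are hypothetical
names, and the simple letters-and-digits tokenizer is an assumption that
stands in for whatever analyzer the field really uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-alone illustration; hypothetical names, not part of Lucene or Solr.
final class TermOffsetsSketch {

    // Assumed tokenizer: a token is a run of letters and digits.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    /** Returns {start, end} pairs (end exclusive) for each token equal to 'term', ignoring case. */
    static List<int[]> offsetsOf(String text, String term) {
        List<int[]> spans = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            if (m.group().equalsIgnoreCase(term)) {
                spans.add(new int[] { m.start(), m.end() });
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        // Whole-token matches only: "larger" is not reported.
        for (int[] s : offsetsOf("A large file, larger than Large.", "large")) {
            System.out.println(s[0] + "-" + s[1]); // prints 2-7, then 26-31
        }
    }
}
```

Something equivalent to these start/end pairs, attached to each matching
term, is what I am hoping to surface.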





Craig Longman

C++ Developer

iCONECT Development, LLC
519-645-1663





