Re: Advice on highlighting

Ramkumar R. Aiyengar Sat, 13 Sep 2014 23:09:13 -0700

https://issues.apache.org/jira/plugins/servlet/mobile#issue/LUCENE-2878
provides a Lucene API for what you are trying to do, but it has not been
committed yet. There's a fork that includes the change at
https://github.com/flaxsearch/lucene-solr-intervals

On 12 Sep 2014 21:24, "Craig Longman" <clong...@iconect.com> wrote:


> In order to take our Solr usage to the next step, we really need to
> improve its highlighting abilities.  What I'm trying to do is write a
> new component that returns the fields that matched the search
> (including numeric fields) and the start/end positions for the
> alphanumeric matches.
>
>
>
> I see three different approaches I could take.  Each will require
> some modifications to the Lucene/Solr parts, as it just does not appear
> to be doable as a completely standalone component.
>
>
>
> 1) At initial search time.
>
> This seemed like a good approach.  I can follow IndexSearcher creating
> the TermContext that parses through AtomicReaderContexts to see if it
> contains a match and then adds it to the contexts available for later.
> However, at this point, inside SegmentTermsEnum.seekExact() it seems
> like Solr is not really looking for matching terms as such, it's just
> scanning what looks like the raw index.  So, I don't think I can easily
> extract term positions at this point.
>
>
>
> 2) Write a modified HighlighterComponent.  We have managed to get phrases
> to highlight properly, but it seems like getting the full field matches
> would be more difficult in this module.  Also, because it does its
> highlighting oblivious to any other criteria, we can't use it as is.
> For example, this search:
>
>
>
>   (body:large+AND+user_id:7)+OR+user_id:346
>
>
>
> will highlight "large" in records that have user_id = 346 when
> technically (for our purposes at least) it should not be considered a
> hit, because the "large" was accompanied by the user_id = 7 criterion.
> It's not immediately clear to me how difficult it would be to change
> this.
>
>
>
> 3) Make a modified DebugComponent and enhance the existing explain()
> methods (in the query types we require it at least) to include more
> information such as the start/end positions of the term that was hit.
> I'm exploring this now, but I don't easily see how I can figure out what
> those positions might be from the explain() information.  Any pointers
> on how, at the point that TermQuery.explain() is being called, I can
> figure out which indexed token was the actual hit?
>
>
>
>
>
> Craig Longman
>
> C++ Developer
>
> iCONECT Development, LLC
> 519-645-1663
>
>
>
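
To make the problem in approach 2 concrete, here is a minimal sketch in
plain Java of "clause-aware" hit collection for the example search
(body:large AND user_id:7) OR user_id:346. It does not use any Lucene or
Solr API. Every name in it (ClauseAwareHits, Hit, Node, term, and, or,
sampleQuery, sampleDoc) is hypothetical, and whitespace tokenization
stands in for real analysis. The only idea it shows is that an AND node
discards its children's hits unless all of its children match.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not Lucene/Solr code.
public class ClauseAwareHits {

    /** One contributing match: field name plus [start, end) character offsets. */
    public static final class Hit {
        final String field;
        final int start;
        final int end;

        Hit(String field, int start, int end) {
            this.field = field;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return field + "[" + start + "," + end + ")";
        }
    }

    /** A query node adds hits to 'out' only when the node itself matches. */
    public interface Node {
        boolean collect(Map<String, String> doc, List<Hit> out);
    }

    // Assumption: tokens are whitespace-delimited; a real analyzer does more.
    private static final Pattern TOKEN = Pattern.compile("\\S+");

    /** Matches when any token of the field equals the value (case-insensitive). */
    public static Node term(String field, String value) {
        return (doc, out) -> {
            String text = doc.get(field);
            if (text == null) {
                return false;
            }
            boolean matched = false;
            Matcher m = TOKEN.matcher(text);
            while (m.find()) {
                if (m.group().equalsIgnoreCase(value)) {
                    out.add(new Hit(field, m.start(), m.end()));
                    matched = true;
                }
            }
            return matched;
        };
    }

    /** Keeps the children's hits only if every child matches. */
    public static Node and(Node... children) {
        return (doc, out) -> {
            List<Hit> pending = new ArrayList<>();
            for (Node child : children) {
                if (!child.collect(doc, pending)) {
                    return false; // drop hits from the clauses that did match
                }
            }
            out.addAll(pending);
            return true;
        };
    }

    /** Keeps the hits of every child that matches on its own. */
    public static Node or(Node... children) {
        return (doc, out) -> {
            boolean matched = false;
            for (Node child : children) {
                matched |= child.collect(doc, out);
            }
            return matched;
        };
    }

    /** The example search: (body:large AND user_id:7) OR user_id:346 */
    public static Node sampleQuery() {
        return or(and(term("body", "large"), term("user_id", "7")),
                  term("user_id", "346"));
    }

    public static Map<String, String> sampleDoc(String body, String userId) {
        Map<String, String> doc = new HashMap<>();
        doc.put("body", body);
        doc.put("user_id", userId);
        return doc;
    }

    public static List<Hit> hits(Node query, Map<String, String> doc) {
        List<Hit> out = new ArrayList<>();
        query.collect(doc, out);
        return out;
    }

    public static void main(String[] args) {
        // "large" is not reported: the AND clause failed on user_id.
        System.out.println(hits(sampleQuery(), sampleDoc("a large file", "346")));
        // Both clauses of the AND matched, so both are reported.
        System.out.println(hits(sampleQuery(), sampleDoc("a large file", "7")));
    }
}
```

For a record with user_id = 346 this reports only the user_id match; for
a record with user_id = 7 it reports both "large" (offsets 2 to 7) and
the user_id match. Doing the same inside Solr would need access to
per-clause match information, which is what the question is really
about.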
