zacharymorn commented on pull request #16:
URL: https://github.com/apache/lucene/pull/16#issuecomment-798974587


   No problem Dawid and thanks for the feedback! I agree that re-analyzing the 
token stream does seems to be wasteful if the index already has position and 
offset information. I think the central issue here is that the current 
`Intervals#extend` API here only takes before / after values based on position 
instead of offset. But on the other hand, additional API that takes in / 
provides offset based before / after values may not be very meaningful to most 
client applications (I think), as offset varies based on each token length.  So 
highlighting offset range computed based on non-positional APIs may always 
require adjustment here if the before / after values are provided based on 
position (this may also include `OffsetsFromTokens` and `OffsetsFromTokens` 
strategies as well). 
   
   
https://github.com/apache/lucene/blob/471f38c0310671f9856ad86ad98c4238d2b770dc/lucene/queries/src/java/org/apache/lucene/queries/intervals/Intervals.java#L251-L252
   
   However, I'm also wondering if the position to / from offset mappings were 
already computed / stored for other purposes and readily available in 
OffsetRetrievalStrategy classes? If so, we could skip the re-analysis as well.
   
   Alan, could you please let me know if you have any pointer here? I'm happy 
to try out different approaches to find the best solution here.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to