[PR] Improve handling of NullPointerException in MMapDirectory's IndexInputs (check that the "closed" condition) [lucene]

2023-10-21 Thread via GitHub


uschindler opened a new pull request, #12705:
URL: https://github.com/apache/lucene/pull/12705

   See the dev thread by @msokolov  @ 
https://lists.apache.org/thread/qts8wvrjs54gkgz04pk4p93fg0wjbq3o
   
   The handling of NPE is very special in ByteBufferIndexInput and also 
MemorySegmentIndexInput: To signal a closed input we set the buffers to null so 
any code trying to work on the inputs hits a NullPointerException. This is to 
avoid null checks or isOpen checks everywhere in the code, which might be 
expensive, as the variable/field is not constant.
   
   On the other hand, the NPE must not become visible outside of the 
IndexInputs, because it looks like a buggy null check and may cause support 
issues. So we have to hide it by all means, as the NPE is *not* an error but a 
signal that the IndexInput was closed.
   
   There are a few cases where a real NPE can still happen, e.g., when somebody 
accidentally passes a null array to one of the read methods. In that case the 
NPE is important and should remain visible to the caller.
   
   To avoid adding the NPE as a cause (which would make it visible to calling 
code), the workaround is a safety check: when the NPE is caught, the code now 
checks whether the "closed" condition applies (the buffers are null or the byte 
buffer guard is invalidated). If and only if the closed check holds, it throws 
AlreadyClosedException. In all other cases it rethrows the original NPE.
   
   I also added a test which failed before my change.
   
   The changes were applied to all MemorySegmentIndexInput variants in the MR 
JAR and the original ByteBufferIndexInput (+ its guard).
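
   The catch-and-verify idea described above can be sketched roughly as 
follows. This is a minimal illustration, not the code from the PR: 
`SketchInput`, `handleNpe` and the local `AlreadyClosedException` class are 
hypothetical stand-ins, and a plain heap `ByteBuffer` replaces the mapped 
buffers and the byte buffer guard.

```java
import java.nio.ByteBuffer;

/** Hypothetical stand-in for Lucene's AlreadyClosedException (assumption for this sketch). */
class AlreadyClosedException extends IllegalStateException {
  AlreadyClosedException(String message) {
    super(message);
  }
}

/** Simplified input that signals "closed" by nulling its buffer instead of checking a flag. */
final class SketchInput implements AutoCloseable {
  private ByteBuffer buffer; // set to null by close(); the hot path never checks it explicitly

  SketchInput(byte[] data) {
    this.buffer = ByteBuffer.wrap(data);
  }

  void readBytes(byte[] dst, int offset, int len) {
    try {
      // No isOpen/null check here: a closed input fails with an NPE on 'buffer'.
      buffer.get(dst, offset, len);
    } catch (NullPointerException e) {
      throw handleNpe(e);
    }
  }

  /** Translates the NPE only if the "closed" condition really applies. */
  private RuntimeException handleNpe(NullPointerException e) {
    if (buffer == null) {
      // The NPE was the "closed" signal: hide it and do not attach it as a cause.
      return new AlreadyClosedException("Already closed: " + this);
    }
    // A genuine NPE (for example a null destination array) stays visible to the caller.
    return e;
  }

  @Override
  public void close() {
    buffer = null;
  }
}
```

   With this shape, calling `readBytes(null, 0, 1)` on an open input still 
surfaces a NullPointerException, while any read after `close()` surfaces 
AlreadyClosedException without the NPE attached.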


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] [DRAFT] Load vector data directly from the memory segment [lucene]

2023-10-21 Thread via GitHub


uschindler commented on PR #12703:
URL: https://github.com/apache/lucene/pull/12703#issuecomment-1773761875

   Hi,
   I was also thinking about this but arrived at a slightly different design. 
My problem here is that it directly links the pieces of the Java 20+ code to 
each other and adds instanceof checks.
   
   My idea was to have a new method on IndexInput, returning a ByteBuffer slice 
on RandomAccessInput for the vector, which can be passed around and handled in 
a typesafe way, without hacking "shortcut" paths between the various components 
that remove the abstractions introduced to separate IO from Lucene logic.
   
   I am out of office the next week, I'd like to participate in the discussion; 
we should not rush anything.





Re: [PR] Remove direct dependency of NodeHash to FST [lucene]

2023-10-21 Thread via GitHub


mikemccand commented on PR #12690:
URL: https://github.com/apache/lucene/pull/12690#issuecomment-1773767253

   Thanks @dungba88 -- looks great, I'll merge soon!





Re: [PR] Optimize computing number of levels in MultiLevelSkipListWriter#bufferSkip [lucene]

2023-10-21 Thread via GitHub


mikemccand merged PR #12653:
URL: https://github.com/apache/lucene/pull/12653





Re: [PR] Optimize computing number of levels in MultiLevelSkipListWriter#bufferSkip [lucene]

2023-10-21 Thread via GitHub


mikemccand commented on PR #12653:
URL: https://github.com/apache/lucene/pull/12653#issuecomment-1773768806

   I merged to `main` and `9.x` (9.9)!  Thanks @shubhamvishu.





Re: [PR] Remove direct dependency of NodeHash to FST [lucene]

2023-10-21 Thread via GitHub


mikemccand merged PR #12690:
URL: https://github.com/apache/lucene/pull/12690





Re: [PR] Remove direct dependency of NodeHash to FST [lucene]

2023-10-21 Thread via GitHub


mikemccand commented on PR #12690:
URL: https://github.com/apache/lucene/pull/12690#issuecomment-1773775196

   Thanks @dungba88 -- I'll wait to backport this until after backporting 
https://github.com/apache/lucene/pull/12633





Re: [PR] Refactor ByteBlockPool so it is just a "shift/mask big array" [lucene]

2023-10-21 Thread via GitHub


stefanvodita commented on PR #12625:
URL: https://github.com/apache/lucene/pull/12625#issuecomment-1773790152

   I've rebased #12506. I like having a separate class for slice allocation, 
but if there's disagreement over that, I can put the code back in 
`TermsHashPerField`.





Re: [PR] Clean up ByteBlockPool [lucene]

2023-10-21 Thread via GitHub


stefanvodita commented on PR #12506:
URL: https://github.com/apache/lucene/pull/12506#issuecomment-1773789994

   The last commit is a large rebase + conflict resolution after #12625 got 
merged. What this PR does hasn't really changed. 





Re: [PR] Clean up ByteBlockPool [lucene]

2023-10-21 Thread via GitHub


mikemccand commented on PR #12506:
URL: https://github.com/apache/lucene/pull/12506#issuecomment-1773797296

   Thanks @stefanvodita -- I'll try to have a look soon!  And thank you for 
gracefully handling the "two people made very similar changes" situation :)
   
   This happens often in open source, but it's actually a good thing since you 
get two very different perspectives and the final solution is best of both.





Re: [PR] Refactor ByteBlockPool so it is just a "shift/mask big array" [lucene]

2023-10-21 Thread via GitHub


mikemccand commented on PR #12625:
URL: https://github.com/apache/lucene/pull/12625#issuecomment-1773797762

   Thanks @stefanvodita -- I'll try to have a look soon at your rebased PR 
#12506.
   
   And thank you for gracefully handling the "two people made very similar 
changes" situation :)
   
   This happens often in open source, but it's actually a good thing since you 
get two very different perspectives and the final solution is best of both.





Re: [PR] [DRAFT] Load vector data directly from the memory segment [lucene]

2023-10-21 Thread via GitHub


ChrisHegarty commented on PR #12703:
URL: https://github.com/apache/lucene/pull/12703#issuecomment-1773837768

   > I am out of office the next week, I'd like to participate in the 
discussion; we should not rush anything.
   
   Take your time. Your input and ideas are very much welcome. We will continue 
to iterate here and try out different things, but ultimately will not finalize 
until you have had time to spend on it.
   
   





Re: [PR] Random access term dictionary [lucene]

2023-10-21 Thread via GitHub


bruno-roustant commented on PR #12688:
URL: https://github.com/apache/lucene/pull/12688#issuecomment-1773923204

   This is some code I wrote a long time ago. It has been tested and used, so
   I'm confident on the functional aspect, and it might benefit from a
   benchmark for perf.
   
   On Fri, Oct 20, 2023 at 19:20, Tony-X ***@***.***> wrote:
   
   > Thanks @bruno-roustant  ! If you're
   > okay to share it feel free to share it here.
   >
   > I'm in the process of baking my own specific implementation (which
   > internally uses a single long as bit buffer), but I might absorb some
   > interesting ideas from your impl.
   





Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-21 Thread via GitHub


rmuir commented on issue #12696:
URL: https://github.com/apache/lucene/issues/12696#issuecomment-1773935712

   Should we just do more tests and start writing indexes without patching? 
Only a 4 percent disk savings? Patching is a lot of complexity, especially to 
vectorize. A runtime option is more expensive because we then have to make sure 
indexes encoded both ways can be read; it only adds more complexity, IMO.





Re: [PR] Avoid object construction when linear searching arcs [lucene]

2023-10-21 Thread via GitHub


gf2121 commented on PR #12692:
URL: https://github.com/apache/lucene/pull/12692#issuecomment-1773995253

   The nightly benchmark shows that fuzzy queries benefit a bit from this 
change: https://home.apache.org/~mikemccand/lucenebench/2023.10.19.18.03.18.html.

