Re: [PR] Fix 9.12.0 backcompat break (Lucene 9.12.0 cannot read 9.11.x indices written with quantized HNSW, `Lucene99HnswScalarQuantizedVectorsFormat`) [lucene]

2024-10-14 Thread via GitHub


javanna commented on PR #13874:
URL: https://github.com/apache/lucene/pull/13874#issuecomment-2412274304

   Heya, I am working on generating the backwards indices after releasing 
Lucene 10, and here are my observations:
   
   - I think that we need to forward port this change to main as well?
   - We need to update the tooling to generate the backwards indices 
accordingly (python code and java code are no longer in sync)
   
   I am happy to work on this, I think I understand what needs to happen but I 
wanted to confirm first. Perhaps the forward port is just a cherry-pick to 
main, hopefully? And once the branches are aligned, I can adjust the tooling 
and test them direclty as I generate the needed bwc indices for 10.0.0.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Make dynamic range facets value collection and sorting faster [lucene]

2024-10-14 Thread via GitHub


HoustonPutman commented on issue #13760:
URL: https://github.com/apache/lucene/issues/13760#issuecomment-2412577287

   @timgrein , I've posted a PR for my idea that @stefanvodita mentioned. If 
you have the JMH benchmark, I'd love to test it out on mine as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Fix 9.12.0 backcompat break (Lucene 9.12.0 cannot read 9.11.x indices written with quantized HNSW, `Lucene99HnswScalarQuantizedVectorsFormat`) [lucene]

2024-10-14 Thread via GitHub


javanna commented on PR #13874:
URL: https://github.com/apache/lucene/pull/13874#issuecomment-2412365840

   I did find above a diff that is exactly the same as the PR I opened to 
update AddBackcompatindices :)  (#13911) . I also opened a PR against main to 
cherry-pick this change and make the necessary adjustments: #13912


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Align TestGenerateBwcIndices.java with AddBackcompatindices.py [lucene]

2024-10-14 Thread via GitHub


javanna opened a new pull request, #13911:
URL: https://github.com/apache/lucene/pull/13911

   We updated TestGenerateBwcIndices to create int7 HNSW indices instead of 
int8 with #13874. The corresponding python code part of the release wizard 
needs to be updated accordingly.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Try using Murmurhash 3 for bloom filters [lucene]

2024-10-14 Thread via GitHub


vsop-479 commented on PR #12868:
URL: https://github.com/apache/lucene/pull/12868#issuecomment-2412698518

   BTW, why `StringHelper` is a abstract class? Can we make it final? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Add an S3-based directory. [lucene]

2024-10-14 Thread via GitHub


atris commented on issue #13868:
URL: https://github.com/apache/lucene/issues/13868#issuecomment-2412914455

   @jpountz Interestingly, I have been spending some time on making this work - 
if its ok, I will self assign this issue?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] [DISCUSS] Could we have a different ANN algorithm for Learned Sparse Vectors? [lucene]

2024-10-14 Thread via GitHub


atris commented on issue #13675:
URL: https://github.com/apache/lucene/issues/13675#issuecomment-2412918581

   I have recently been interested in this direction and plan on spending non 
trivial amount of time on this over the next few weeks. Assuming we haven't 
started dev on this, I am assigning it to myself.
   
   What I have been hacking on is impact sorted prioritisation of docIDs during 
the ranking phase (especially for custom rankers), so that would probably be 
the first thing that comes out of this ticket.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add BaseKnnVectorsFormatTestCase.testRecall() and fix old codecs [lucene]

2024-10-14 Thread via GitHub


msokolov commented on PR #13910:
URL: https://github.com/apache/lucene/pull/13910#issuecomment-2412421807

   I wonder whether we should backport the fixes to the 
`Lucene90HnswVectorsReader`? I tend to think we ought to, although the usage 
might be tiny to nonexistent


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[I] Support multi-tenant RAM buffers for IndexWriter [lucene]

2024-10-14 Thread via GitHub


mdmarshmallow opened a new issue, #13913:
URL: https://github.com/apache/lucene/issues/13913

   ### Description
   
   This is related to https://github.com/apache/lucene/issues/13883. The idea 
is to allow users to specify the RAM usage once and it will be automatically 
spread across N IndexWriter's so they will each have 1/N RAM allocated to them.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Use multi-select instead of a full sort for DynamicRange creation [lucene]

2024-10-14 Thread via GitHub


HoustonPutman opened a new pull request, #13914:
URL: https://github.com/apache/lucene/pull/13914

   Resolves #13760
   
   ### Description
   
   
   
   This is using a similar approach to how Solr used to compute multiple 
percentiles at a single time. Basically utilize the quick select method, but 
instead of following a single path, follow a path for each of the `k`s that is 
requested. Multi-quickselect.
   
   That's what I originally made, until I realized that the `DynamicRangeUtil` 
is weighted, so I refactored it to choose by weights instead, and also capture 
the running-value-total and running-weight-total, because that information is 
used in the `DynamicRangeInfo`.
   
   My goal was to add this as a generic capability of the `Selector` (or 
`IntroSelector`) class, but because of the limitations above, it is currently a 
separate class to handle this. If there's any suggestions on how to make this 
generic enough to be put in the generic class, that would be great. But it 
might not be worth the effort if it wouldn't be used anywhere else.
   
   As for the original multi-quickSelect algorithm I mentioned, I looked for 
other multi-select use cases across Lucene, but I only found one instance (The 
quantized does two select calls in succession). If there's more instances we 
can find, I would be happy to add multiSelect as an option on the `Selector` 
class, and implement it in all provided classes.
   
   ### To-Do
   
   - The code needs to be cleaned up and better documented, this is just a POC
   - Benchmarks comparing this to the full-sorting implementation.
   
   ### Caveat
   
   The implement is slightly different, as it will pick the groups according to 
"The first value for which the running weight is <= weight-range-boundary". The 
old logic would start counting again after a weight range was complete, which 
removes information from the overflow of previous weight-ranges. I'm not sure 
either approach is right or wrong, but I wanted to explicitly state how the 
results would be different and why I had to alter a unit test to pass.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Remove synchronization from IndexWriter.isClosed [lucene]

2024-10-14 Thread via GitHub


github-actions[bot] commented on PR #13834:
URL: https://github.com/apache/lucene/pull/13834#issuecomment-2412576598

   This PR has not had activity in the past 2 weeks, labeling it as stale. If 
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you 
for your contribution!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Use RandomAccessInput instead of seeking in Lucene90DocValuesProducer [lucene]

2024-10-14 Thread via GitHub


uschindler commented on PR #13894:
URL: https://github.com/apache/lucene/pull/13894#issuecomment-2410749511

   I dont think this is a real slowdown caused by the commit here. It is more 
caused by the Hotspot optimizer misinterpreting something. We should get the 
assembly code from the benchmark and compare them to figure out why it gets 
worse.
   Actually the non-random access modifies global state (moves position), so 
actually optimizer does a harder job, so the slowdown is very strange.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Try using Murmurhash 3 for bloom filters [lucene]

2024-10-14 Thread via GitHub


shubhamvishu commented on PR #12868:
URL: https://github.com/apache/lucene/pull/12868#issuecomment-2410748497

   @jpountz I simplified the expression now. Let me know if the change looks 
good? Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Try using Murmurhash 3 for bloom filters [lucene]

2024-10-14 Thread via GitHub


shubhamvishu commented on code in PR #12868:
URL: https://github.com/apache/lucene/pull/12868#discussion_r1799172633


##
lucene/codecs/src/java/org/apache/lucene/codecs/bloom/FuzzySet.java:
##
@@ -150,9 +149,10 @@ private FuzzySet(FixedBitSet filter, int bloomSize, int 
hashCount) {
* @return NO or MAYBE
*/
   public ContainsResult contains(BytesRef value) {
-long hash = hashFunction.hash(value);
-int msb = (int) (hash >>> Integer.SIZE);
-int lsb = (int) hash;
+long[] hash = StringHelper.murmurhash3_x64_128(value);
+
+int msb = (((int) hash[0] >>> Integer.SIZE) >>> 1) + (((int) hash[1] >>> 
Integer.SIZE) >>> 1);
+int lsb = ((int) hash[0] >>> 1) + ((int) hash[1] >>> 1);

Review Comment:
   Done!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Use RandomAccessInput instead of seeking in Lucene90DocValuesProducer [lucene]

2024-10-14 Thread via GitHub


uschindler commented on PR #13894:
URL: https://github.com/apache/lucene/pull/13894#issuecomment-2410803086

   I was a bit afraid that this might have been caused by missing "prefetch()" 
implementation, but for MemorySegmentIndexInput also the random access slice 
uses the correct implementation.
   
   My only idea could be that there's a difference between the inlining 
decisions for *interface* `RandomAccessInput` and the class `IndexInput` (both 
implemented by MemorySegmentIndexInput, but the first is using 
`invokeinterface` which the latter is using `invokevirtual` bytecode). This 
could make a difference here, so we need a dump of which methods were inlined 
before/after.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Create a bot to check if there is a CHANGES entry for new PRs [lucene]

2024-10-14 Thread via GitHub


javanna commented on issue #13898:
URL: https://github.com/apache/lucene/issues/13898#issuecomment-2410385807

   Yes to this! It would be great to combine this with setting the milestone :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] PR 13757 follow-up: add missing with-discountOverlaps Similarity constructor variants, CHANGES.txt entries (#13845) [lucene]

2024-10-14 Thread via GitHub


javanna commented on code in PR #13891:
URL: https://github.com/apache/lucene/pull/13891#discussion_r1798961357


##
lucene/CHANGES.txt:
##
@@ -47,6 +52,9 @@ API Changes
   the entire segment should be scored. Subclasses that override the method 
should instead override its replacement.
   (Luca Cavanna)
 
+* GITHUB#13757: For similarities, provide default computeNorm implementation 
and remove remaining discountOverlaps setters.
+  (Christine Poerschke, Adrien Grand, Robert Muir)

Review Comment:
   I am afraid is is now too late to modify the changelog for 9.12, as it's 
already been released?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Replace Map with IntObjectHashMap for KnnVectorsReader [lucene]

2024-10-14 Thread via GitHub


bugmakerr commented on PR #13763:
URL: https://github.com/apache/lucene/pull/13763#issuecomment-2410922885

   hi @jpountz ,since we have moved to lucene 10, should we merge this and add 
back #13686?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Use RandomAccessInput instead of seeking in Lucene90DocValuesProducer [lucene]

2024-10-14 Thread via GitHub


jpountz commented on PR #13894:
URL: https://github.com/apache/lucene/pull/13894#issuecomment-2410306410

   Looks like there was a slowdown on some taxo facets tasks (which use binary 
doc values to store the taxonomy). `OrHighMedDayTaxoFacets`, 
`AndHighHighDayTaxoFacets` and `MedTermDayTaxoFacets` had a slowdown with a low 
p-value. On the other hand, `BrowseMonthTaxoFacets` went a bit faster (p-value 
is not so low, but this is the best data point in a long time, which suggests 
it's not noise). https://benchmarks.mikemccandless.com/2024.10.12.18.20.19.html
   
   We should not revert, but I'll annotate. I'm curious if someone has ideas on 
the cause. Also cc @stefanvodita @gsmiller since you have interest in facets.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Better handle dynamic pruning when the leading clause has a single impact block. [lucene]

2024-10-14 Thread via GitHub


jpountz opened a new pull request, #13904:
URL: https://github.com/apache/lucene/pull/13904

   `BlockMaxConjunctionBulkScorer` only checks if it can early exit based on 
impacts once per window, and windows are computed using impact blocks of the 
leading clause. So this logic is defeated if the leading clause produces a 
single block (e.g. `ConstantScoreQuery`). This commit addresses this problem by 
artificially lowering the window size to contain ~128 docs of the leading 
clause.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Use RandomAccessInput instead of seeking in Lucene90DocValuesProducer [lucene]

2024-10-14 Thread via GitHub


original-brownbear commented on PR #13894:
URL: https://github.com/apache/lucene/pull/13894#issuecomment-2410525472

   Looks to me like a few inlining decisions changed (judging by the profiling 
and the relative weight of inlined versions vs non-inlined versions of some 
method). I'm not even entirely sure it's because of this change and not one of 
the other changes. The std-deviation for `BrowseMonthTaxoFacets` also dropped 
from 20%+ to 0.1%+, which has me wondering if maybe we have too little warmup 
time here before measuring?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Added support for highlighting `IndexOrDocValuesQuery` [lucene]

2024-10-14 Thread via GitHub


jpountz merged PR #13902:
URL: https://github.com/apache/lucene/pull/13902


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Make MaxScoreBulkScorer repartition scorers when the min competitive increases. [lucene]

2024-10-14 Thread via GitHub


jpountz commented on PR #13800:
URL: https://github.com/apache/lucene/pull/13800#issuecomment-243632

   Here's the fix for conjunctions: https://github.com/apache/lucene/pull/13904.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Test changelog verifier [lucene]

2024-10-14 Thread via GitHub


stefanvodita opened a new pull request, #13908:
URL: https://github.com/apache/lucene/pull/13908

   test


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Add changelog verifier [lucene]

2024-10-14 Thread via GitHub


stefanvodita opened a new pull request, #13909:
URL: https://github.com/apache/lucene/pull/13909

   ### Description
   
   This runs along the checks we already have for PR creation/update and warns 
us if there is no CHANGES.txt entry.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Create a bot to check if there is a CHANGES entry for new PRs [lucene]

2024-10-14 Thread via GitHub


stefanvodita commented on issue #13898:
URL: https://github.com/apache/lucene/issues/13898#issuecomment-2411705472

   @prudhvigodithi - I opened #13898 inspired by your PR, tested it on my fork, 
and it's working correctly.
   @javanna  - I like that idea! Maybe we can do that as a follow-up.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] [SOLR-11191] SolrIndexSplitter: Init support for routing docs by _root_ when available [lucene-solr]

2024-10-14 Thread via GitHub


dsmiley commented on code in PR #2686:
URL: https://github.com/apache/lucene-solr/pull/2686#discussion_r1799803611


##
solr/core/src/java/org/apache/solr/update/SolrIndexSplitter.java:
##
@@ -130,9 +131,26 @@ public SolrIndexSplitter(SplitIndexCommand cmd) {
 }
 routeFieldName = cmd.routeFieldName;
 if (routeFieldName == null) {
-  field = searcher.getSchema().getUniqueKeyField();
+  // To support routing child documents, use the root field if it exists 
(which would be populated with unique field),
+  // otherwise use the unique key field
+  field = searcher.getSchema().getFieldOrNull(IndexSchema.ROOT_FIELD_NAME);
+  if(field == null) {
+field = searcher.getSchema().getUniqueKeyField();
+  }
 } else  {
-  field = searcher.getSchema().getField(routeFieldName);
+  SchemaField uniqueField = searcher.getSchema().getUniqueKeyField();
+  if (uniqueField.getName().equals(routeFieldName)) {
+// Explicitly routing based on unique field
+// To support routing child documents, use the root field if it exists 
(which would be populated with unique field),
+// otherwise use the unique key field
+field = 
searcher.getSchema().getFieldOrNull(IndexSchema.ROOT_FIELD_NAME);
+if (field == null) {
+  field = searcher.getSchema().getUniqueKeyField();
+}
+  } else {
+// Custom routing
+field = searcher.getSchema().getField(routeFieldName);
+  }

Review Comment:
   Notice it's conditional based on the router having a "field" in the JSON.  
It's rare to use that.
   
   Also if we don't support something (child docs + whatever), we ought to 
throw an exception instead of do the wrong thing.  See 
`org.apache.solr.schema.IndexSchema#isUsableForChildDocs`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Avoid slicing memory segments unnecessarily [lucene]

2024-10-14 Thread via GitHub


jpountz commented on code in PR #13906:
URL: https://github.com/apache/lucene/pull/13906#discussion_r1799801328


##
lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java:
##
@@ -563,6 +563,13 @@ public final MemorySegmentIndexInput slice(String 
sliceDescription, long offset,
 return buildSlice(sliceDescription, offset, length);
   }
 
+  public RandomAccessInput randomAccessSlice(long offset, long length) throws 
IOException {
+if (offset == 0 && length == this.length) {
+  return this;

Review Comment:
   I'm curious if this code path is properly tested, I would assume most call 
sites of `randomAccessSlice` to only use a subset of the file/clone.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Test changelog verifier [lucene]

2024-10-14 Thread via GitHub


stefanvodita closed pull request #13908: Test changelog verifier
URL: https://github.com/apache/lucene/pull/13908


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Test changelog verifier [lucene]

2024-10-14 Thread via GitHub


stefanvodita commented on PR #13908:
URL: https://github.com/apache/lucene/pull/13908#issuecomment-2411670171

   Testing a Github workflow, not intended to merge.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Avoid slicing memory segments unnecessarily [lucene]

2024-10-14 Thread via GitHub


original-brownbear commented on PR #13906:
URL: https://github.com/apache/lucene/pull/13906#issuecomment-2411864517

   > One thing that is a bit awkward to me is that it makes clones cheaper than 
slices, so e.g. refactoring TermsEnum#postings to work on a slice that contains 
just the postings of the current term would be a regression vs. cloning the 
whole file like today. Not a blocker, it just gets me thinking.
   
   I had the same thought :D that's what got me here in part. Moving the offset 
handling into whatever object that uses an `IndexInput` now and replacing that 
with a stateless shared `RandomAccessInput` should allow for some performance 
gains here and there and more importantly, should allow for massive heap 
savings.
   It's still quite wild today for ES, I took a heap dump of a http_logs track 
benchmark node and somehow tens of shards today translate into MBs of 
`IndexInput` instances for us. I get that there's two sides to this, ES should 
probably have fewer segments but also still there's no need to redundantly 
clone (stateful) `IndexInput` when we often also track file pointers in the 
users of those things anyway in Lucene?
   
   > I am not sure if this really helps much
   
   See above. But also, even in the Lucene benchmarks it's about 1% of 
allocations in wikimedium saved for me. Not a massive win, but not entirely 
irrelevant either :)
   Also, this is somewhat helpful as far as making heap-dumps easier to 
interpret actually. Always easier to find redundancies if object sharing isn't 
super far down the reference chains :)  
   
   On it, looking looking into a test :)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Add BaseKnnVectorsFormatTestCase.testRecall() and fix old codecs [lucene]

2024-10-14 Thread via GitHub


msokolov opened a new pull request, #13910:
URL: https://github.com/apache/lucene/pull/13910

   While exploring some recall-related failures in another PR I went looking 
for a unit test that checks HNSW/KNN recall and couldn't find any. I think we 
used to have one but maybe we removed it because it was flaky? But we really do 
need such a test since it is possible to make changes that preserve all the 
formal properties of the codecs and queries yet destroy recall. I thought if we 
can create such a test with known data and vectors it would be more predictable 
than one using random data, so I made one, and it uncovered a couple of bugs: 
   
   In Lucene90HnswVectorsReader we messed up (removed) ord-to-doc mappings so 
we were returning vector ords instead of docids in search results. I guess this 
would have totally borked back-compat for Lucene90 indexes. Probably there are 
none in the wild, and this was never noticed?
   
   In Lucene91RWFormat (used only for back-compat testing) we messed up 
diversity check so we were producing bad graphs.
   
   This PR fixes these things and adds the new test
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Avoid slicing memory segments unnecessarily [lucene]

2024-10-14 Thread via GitHub


uschindler commented on PR #13906:
URL: https://github.com/apache/lucene/pull/13906#issuecomment-2411866037

   P.P.S.: Just some background: The code was copied from ByteBufferIndexInput 
where the clones were necessary. With MemorySegment we no longer create 
`dup()`s, as MemorySegment is stateless.
   
   So the changes here look fine, but need more tests. Maybe we can refactor 
the method a bit, so the `clone()` does not go through `buildSlice()` at all.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Better handle dynamic pruning when the leading clause has a single impact block. [lucene]

2024-10-14 Thread via GitHub


jpountz commented on PR #13904:
URL: https://github.com/apache/lucene/pull/13904#issuecomment-2410663119

   I confirmed that this makes things better when all clauses are 
`ConstantScoreQuery`s, luceneutil doesn't report a slowdown:
   
   ```
   TaskQPS baseline  StdDevQPS 
my_modified_version  StdDevPct diff p-value
   OrNotHighLow  585.86  (7.2%)  566.68  
(5.1%)   -3.3% ( -14% -9%) 0.095
  HighTermDayOfYearSort  306.99  (8.9%)  301.37  
(8.6%)   -1.8% ( -17% -   17%) 0.506
  And3Terms  226.85  (5.6%)  222.90  
(6.6%)   -1.7% ( -13% -   11%) 0.369
   OrHighNotLow  307.70  (6.4%)  302.80  
(7.2%)   -1.6% ( -14% -   12%) 0.461
  OrNotHighHigh  185.78  (5.2%)  183.47  
(6.6%)   -1.2% ( -12% -   11%) 0.510
   Or3Terms  223.86  (7.1%)  221.64  
(7.7%)   -1.0% ( -14% -   14%) 0.674
 Phrase   26.60  (2.8%)   26.37  
(4.0%)   -0.9% (  -7% -6%) 0.423
  OrHighNotHigh  220.17  (5.3%)  218.70  
(5.8%)   -0.7% ( -11% -   10%) 0.702
   HighTerm  294.16  (7.0%)  292.23  
(6.7%)   -0.7% ( -13% -   13%) 0.761
 AndHighLow  595.88  (4.3%)  591.99  
(4.7%)   -0.7% (  -9% -8%) 0.648
 OrHighHigh  117.92  (2.3%)  117.19  
(5.0%)   -0.6% (  -7% -6%) 0.614
 AndHighMed  271.88  (7.2%)  270.24  
(7.9%)   -0.6% ( -14% -   15%) 0.800
AndHighHigh  119.61  (3.9%)  118.96  
(4.5%)   -0.5% (  -8% -8%) 0.683
OrStopWords   66.94  (7.5%)   66.58  
(7.0%)   -0.5% ( -13% -   15%) 0.814
  OrHighLow  468.84  (3.9%)  466.32  
(3.9%)   -0.5% (  -7% -7%) 0.662
  HighTermMonthSort  763.50  (2.3%)  761.25  
(2.9%)   -0.3% (  -5% -5%) 0.724
 TermDTSort  216.36  (5.8%)  215.86  
(4.0%)   -0.2% (  -9% -   10%) 0.885
  OrHighMed  197.37  (4.8%)  197.03  
(7.4%)   -0.2% ( -11% -   12%) 0.929
   HighTermTitleBDVSort   52.15  (7.5%)   52.11  
(4.7%)   -0.1% ( -11% -   13%) 0.963
And2Terms2StopWords  207.53  (4.6%)  207.84  
(5.7%)0.1% (  -9% -   10%) 0.927
   OrHighNotMed  284.45  (5.8%)  284.99  
(6.8%)0.2% ( -11% -   13%) 0.925
  HighTermTitleSort   97.13  (4.9%)   97.34  
(6.5%)0.2% ( -10% -   12%) 0.904
   AndStopWords   61.72  (5.1%)   61.86  
(5.5%)0.2% (  -9% -   11%) 0.892
  CountTerm 5714.94  (3.6%) 5737.59  
(5.3%)0.4% (  -8% -9%) 0.782
MedTerm  376.20  (6.7%)  377.74  
(7.4%)0.4% ( -12% -   15%) 0.855
   PKLookup  191.27  (3.0%)  192.19  
(3.5%)0.5% (  -5% -7%) 0.647
 IntNRQ  163.19 (12.4%)  164.32  
(8.5%)0.7% ( -18% -   24%) 0.837
 Or2Terms2StopWords  209.38  (3.9%)  210.84  
(6.3%)0.7% (  -9% -   11%) 0.673
LowTerm  472.83  (6.8%)  476.30  
(6.1%)0.7% ( -11% -   14%) 0.718
 OrHighRare  254.55  (4.9%)  256.78  
(7.7%)0.9% ( -11% -   14%) 0.667
   Wildcard   69.55  (3.4%)   70.32  
(5.0%)1.1% (  -7% -9%) 0.418
CountAndHighMed  233.44  (6.1%)  236.36  
(5.3%)1.3% (  -9% -   13%) 0.487
   CountAndHighHigh   95.44  (3.6%)   96.85  
(3.7%)1.5% (  -5% -9%) 0.201
Prefix3  180.59  (5.4%)  183.71  
(7.7%)1.7% ( -10% -   15%) 0.410
 CountOrHighMed  177.68  (6.6%)  181.10  
(7.5%)1.9% ( -11% -   17%) 0.387
   OrNotHighMed  264.61  (6.9%)  270.30  
(7.9%)2.1% ( -11% -   18%) 0.358
CountOrHighHigh  113.18  (7.4%)  118.20 
(11.8%)4.4% ( -13% -   25%) 0.155
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Remove redundant code in PointInSetQuery [lucene]

2024-10-14 Thread via GitHub


easyice opened a new pull request, #13905:
URL: https://github.com/apache/lucene/pull/13905

   Clean up unused variable `MergePointVisitor#sortedPackedPoints`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Make dynamic range facets value collection and sorting faster [lucene]

2024-10-14 Thread via GitHub


timgrein commented on issue #13760:
URL: https://github.com/apache/lucene/issues/13760#issuecomment-2411631641

   Cool, I've found some performance improvements (~10-15%), which can be 
reproduced through a new `jmh` benchmark I've added. I'll open a PR the next 
few days and tag you :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] [SOLR-11191] SolrIndexSplitter: Init support for routing docs by _root_ when available [lucene-solr]

2024-10-14 Thread via GitHub


zkendall commented on code in PR #2686:
URL: https://github.com/apache/lucene-solr/pull/2686#discussion_r1799751372


##
solr/core/src/java/org/apache/solr/update/SolrIndexSplitter.java:
##
@@ -130,9 +131,26 @@ public SolrIndexSplitter(SplitIndexCommand cmd) {
 }
 routeFieldName = cmd.routeFieldName;
 if (routeFieldName == null) {
-  field = searcher.getSchema().getUniqueKeyField();
+  // To support routing child documents, use the root field if it exists 
(which would be populated with unique field),
+  // otherwise use the unique key field
+  field = searcher.getSchema().getFieldOrNull(IndexSchema.ROOT_FIELD_NAME);
+  if(field == null) {
+field = searcher.getSchema().getUniqueKeyField();
+  }
 } else  {
-  field = searcher.getSchema().getField(routeFieldName);
+  SchemaField uniqueField = searcher.getSchema().getUniqueKeyField();
+  if (uniqueField.getName().equals(routeFieldName)) {
+// Explicitly routing based on unique field
+// To support routing child documents, use the root field if it exists 
(which would be populated with unique field),
+// otherwise use the unique key field
+field = 
searcher.getSchema().getFieldOrNull(IndexSchema.ROOT_FIELD_NAME);
+if (field == null) {
+  field = searcher.getSchema().getUniqueKeyField();
+}
+  } else {
+// Custom routing
+field = searcher.getSchema().getField(routeFieldName);
+  }

Review Comment:
   It kinda seems like the opposite. The routeFieldName seems to be 
pre-populated by the zk collection state (not from split caller (api doesn't 
support setting afaict)). Doesn't that mean it'll usually be set?
   
   
https://github.com/apache/lucene-solr/blob/2d63a9d1208bbf950135b90496268b0a40e119b5/solr/core/src/java/org/apache/solr/handler/admin/SplitOp.java#L118-L135



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] [SOLR-11191] SolrIndexSplitter: Init support for routing docs by _root_ when available [lucene-solr]

2024-10-14 Thread via GitHub


zkendall commented on code in PR #2686:
URL: https://github.com/apache/lucene-solr/pull/2686#discussion_r1799751372


##
solr/core/src/java/org/apache/solr/update/SolrIndexSplitter.java:
##
@@ -130,9 +131,26 @@ public SolrIndexSplitter(SplitIndexCommand cmd) {
 }
 routeFieldName = cmd.routeFieldName;
 if (routeFieldName == null) {
-  field = searcher.getSchema().getUniqueKeyField();
+  // To support routing child documents, use the root field if it exists 
(which would be populated with unique field),
+  // otherwise use the unique key field
+  field = searcher.getSchema().getFieldOrNull(IndexSchema.ROOT_FIELD_NAME);
+  if(field == null) {
+field = searcher.getSchema().getUniqueKeyField();
+  }
 } else  {
-  field = searcher.getSchema().getField(routeFieldName);
+  SchemaField uniqueField = searcher.getSchema().getUniqueKeyField();
+  if (uniqueField.getName().equals(routeFieldName)) {
+// Explicitly routing based on unique field
+// To support routing child documents, use the root field if it exists 
(which would be populated with unique field),
+// otherwise use the unique key field
+field = 
searcher.getSchema().getFieldOrNull(IndexSchema.ROOT_FIELD_NAME);
+if (field == null) {
+  field = searcher.getSchema().getUniqueKeyField();
+}
+  } else {
+// Custom routing
+field = searcher.getSchema().getField(routeFieldName);
+  }

Review Comment:
   It kinda seems like the opposite. The routeFieldName seems to be 
pre-populated by the zk collection state (not from split caller (api doesn't 
support setting afaict)). Doesn't that mean it'll usually be set or at least 
could be set enough to support it? (I believe my use-case does, but I'll learn 
more when I setup e2e test in this thing.)
   
   
https://github.com/apache/lucene-solr/blob/2d63a9d1208bbf950135b90496268b0a40e119b5/solr/core/src/java/org/apache/solr/handler/admin/SplitOp.java#L118-L135



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] [SOLR-11191] SolrIndexSplitter: Init support for routing docs by _root_ when available [lucene-solr]

2024-10-14 Thread via GitHub


zkendall commented on code in PR #2686:
URL: https://github.com/apache/lucene-solr/pull/2686#discussion_r1799751372


##
solr/core/src/java/org/apache/solr/update/SolrIndexSplitter.java:
##
@@ -130,9 +131,26 @@ public SolrIndexSplitter(SplitIndexCommand cmd) {
 }
 routeFieldName = cmd.routeFieldName;
 if (routeFieldName == null) {
-  field = searcher.getSchema().getUniqueKeyField();
+  // To support routing child documents, use the root field if it exists 
(which would be populated with unique field),
+  // otherwise use the unique key field
+  field = searcher.getSchema().getFieldOrNull(IndexSchema.ROOT_FIELD_NAME);
+  if(field == null) {
+field = searcher.getSchema().getUniqueKeyField();
+  }
 } else  {
-  field = searcher.getSchema().getField(routeFieldName);
+  SchemaField uniqueField = searcher.getSchema().getUniqueKeyField();
+  if (uniqueField.getName().equals(routeFieldName)) {
+// Explicitly routing based on unique field
+// To support routing child documents, use the root field if it exists 
(which would be populated with unique field),
+// otherwise use the unique key field
+field = 
searcher.getSchema().getFieldOrNull(IndexSchema.ROOT_FIELD_NAME);
+if (field == null) {
+  field = searcher.getSchema().getUniqueKeyField();
+}
+  } else {
+// Custom routing
+field = searcher.getSchema().getField(routeFieldName);
+  }

Review Comment:
   It kinda seems like the opposite. The routeFieldName seems to be 
pre-populated by the zk collection state (not from split caller (api doesn't 
support setting afaict)). Doesn't that mean it'll usually be set or at least 
could be set enough to support it? (I believe my use-case does, but I'll learn 
more when I setup more thorough test in this thing.)
   
   
https://github.com/apache/lucene-solr/blob/2d63a9d1208bbf950135b90496268b0a40e119b5/solr/core/src/java/org/apache/solr/handler/admin/SplitOp.java#L118-L135



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] [SOLR-11191] SolrIndexSplitter: Init support for routing docs by _root_ when available [lucene-solr]

2024-10-14 Thread via GitHub


zkendall commented on code in PR #2686:
URL: https://github.com/apache/lucene-solr/pull/2686#discussion_r1799751372


##
solr/core/src/java/org/apache/solr/update/SolrIndexSplitter.java:
##
@@ -130,9 +131,26 @@ public SolrIndexSplitter(SplitIndexCommand cmd) {
 }
 routeFieldName = cmd.routeFieldName;
 if (routeFieldName == null) {
-  field = searcher.getSchema().getUniqueKeyField();
+  // To support routing child documents, use the root field if it exists 
(which would be populated with unique field),
+  // otherwise use the unique key field
+  field = searcher.getSchema().getFieldOrNull(IndexSchema.ROOT_FIELD_NAME);
+  if(field == null) {
+field = searcher.getSchema().getUniqueKeyField();
+  }
 } else  {
-  field = searcher.getSchema().getField(routeFieldName);
+  SchemaField uniqueField = searcher.getSchema().getUniqueKeyField();
+  if (uniqueField.getName().equals(routeFieldName)) {
+// Explicitly routing based on unique field
+// To support routing child documents, use the root field if it exists 
(which would be populated with unique field),
+// otherwise use the unique key field
+field = 
searcher.getSchema().getFieldOrNull(IndexSchema.ROOT_FIELD_NAME);
+if (field == null) {
+  field = searcher.getSchema().getUniqueKeyField();
+}
+  } else {
+// Custom routing
+field = searcher.getSchema().getField(routeFieldName);
+  }

Review Comment:
   It kinda seems like the opposite. The routeFieldName seems to be 
pre-populated by the zk collection state (not from split caller (api doesn't 
support setting afaict)). Doesn't that mean it'll usually be set or at least 
could be set enough to support it?
   
   My own collection has this in zk state from cursory glance. I'll learn more 
how this behaves when I setup a broader test-case.
   ```
   "router":{
 "field":"id",
 "name":"compositeId"},
   ```
   
   
   
https://github.com/apache/lucene-solr/blob/2d63a9d1208bbf950135b90496268b0a40e119b5/solr/core/src/java/org/apache/solr/handler/admin/SplitOp.java#L118-L135



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Avoid allocating liveDocs for no soft-deletes (#13895) [lucene]

2024-10-14 Thread via GitHub


dnhatn merged PR #13903:
URL: https://github.com/apache/lucene/pull/13903


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] [SOLR-11191] SolrIndexSplitter: Init support for routing docs by _root_ when available [lucene-solr]

2024-10-14 Thread via GitHub


zkendall commented on code in PR #2686:
URL: https://github.com/apache/lucene-solr/pull/2686#discussion_r1799751372


##
solr/core/src/java/org/apache/solr/update/SolrIndexSplitter.java:
##
@@ -130,9 +131,26 @@ public SolrIndexSplitter(SplitIndexCommand cmd) {
 }
 routeFieldName = cmd.routeFieldName;
 if (routeFieldName == null) {
-  field = searcher.getSchema().getUniqueKeyField();
+  // To support routing child documents, use the root field if it exists 
(which would be populated with unique field),
+  // otherwise use the unique key field
+  field = searcher.getSchema().getFieldOrNull(IndexSchema.ROOT_FIELD_NAME);
+  if(field == null) {
+field = searcher.getSchema().getUniqueKeyField();
+  }
 } else  {
-  field = searcher.getSchema().getField(routeFieldName);
+  SchemaField uniqueField = searcher.getSchema().getUniqueKeyField();
+  if (uniqueField.getName().equals(routeFieldName)) {
+// Explicitly routing based on unique field
+// To support routing child documents, use the root field if it exists 
(which would be populated with unique field),
+// otherwise use the unique key field
+field = 
searcher.getSchema().getFieldOrNull(IndexSchema.ROOT_FIELD_NAME);
+if (field == null) {
+  field = searcher.getSchema().getUniqueKeyField();
+}
+  } else {
+// Custom routing
+field = searcher.getSchema().getField(routeFieldName);
+  }

Review Comment:
   It kinda seems like the opposite. The routeFieldName seems to be 
pre-populated by the zk collection state (not from split caller (api doesn't 
support setting afaict)). Doesn't that mean it'll usually be set or at least 
could be set enough to support it?
   
   
https://github.com/apache/lucene-solr/blob/2d63a9d1208bbf950135b90496268b0a40e119b5/solr/core/src/java/org/apache/solr/handler/admin/SplitOp.java#L118-L135
   
   My own collection has this in zk state from cursory glance. I'll learn more 
how this behaves when I setup a broader test-case.
   ```
   "router":{
 "field":"id",
 "name":"compositeId"},
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Avoid slicing memory segments unnecessarily [lucene]

2024-10-14 Thread via GitHub


uschindler commented on code in PR #13906:
URL: https://github.com/apache/lucene/pull/13906#discussion_r1799826647


##
lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java:
##
@@ -563,6 +563,13 @@ public final MemorySegmentIndexInput slice(String 
sliceDescription, long offset,
 return buildSlice(sliceDescription, offset, length);
   }
 
+  public RandomAccessInput randomAccessSlice(long offset, long length) throws 
IOException {
+if (offset == 0 && length == this.length) {
+  return this;

Review Comment:
   Yes, we need a test!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Avoid slicing memory segments unnecessarily [lucene]

2024-10-14 Thread via GitHub


uschindler commented on code in PR #13906:
URL: https://github.com/apache/lucene/pull/13906#discussion_r1799835249


##
lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java:
##
@@ -563,6 +563,13 @@ public final MemorySegmentIndexInput slice(String 
sliceDescription, long offset,
 return buildSlice(sliceDescription, offset, length);
   }
 
+  public RandomAccessInput randomAccessSlice(long offset, long length) throws 
IOException {
+if (offset == 0 && length == this.length) {
+  return this;

Review Comment:
   But this code is perfectly fine as random access slices are stateless, so we 
can simply return the current instance.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Avoid slicing memory segments unnecessarily [lucene]

2024-10-14 Thread via GitHub


original-brownbear opened a new pull request, #13906:
URL: https://github.com/apache/lucene/pull/13906

   No need to slice when it's a clone that is to be used with random access, we 
already enforce thread access rules anyway. Also, no point in copying the 
memory segment instance via noop slicing.
   
   This doesn't really have much of a runtime impact (looking at wikimedium):
   
   ```
  BrowseDayOfYearSSDVFacets6.18  (8.5%)6.07  
(8.7%)   -1.8% ( -17% -   16%) 0.347
 IntNRQ   21.39  (3.3%)   21.13  
(3.8%)   -1.2% (  -8% -6%) 0.115
   HighIntervalsOrdered9.76  (4.3%)9.69  
(5.7%)   -0.7% ( -10% -9%) 0.511
  HighTermDayOfYearSort  156.54  (2.4%)  155.52  
(2.9%)   -0.7% (  -5% -4%) 0.279
LowSpanNear   61.87  (2.3%)   61.48  
(2.8%)   -0.6% (  -5% -4%) 0.273
 TermDTSort   53.99  (1.3%)   53.71  
(1.6%)   -0.5% (  -3% -2%) 0.108
   HighSpanNear6.69  (4.4%)6.66  
(4.3%)   -0.4% (  -8% -8%) 0.697
  OrHighLow  479.99  (2.1%)  479.02  
(2.2%)   -0.2% (  -4% -4%) 0.670
 Fuzzy1   62.30  (1.2%)   62.17  
(1.3%)   -0.2% (  -2% -2%) 0.480
  HighTermMonthSort  556.56  (1.3%)  555.47  
(1.6%)   -0.2% (  -3% -2%) 0.546
   AndHighHighDayTaxoFacets5.60  (4.7%)5.59  
(3.7%)   -0.2% (  -8% -8%) 0.844
 Fuzzy2   61.63  (1.3%)   61.54  
(1.6%)   -0.1% (  -3% -2%) 0.651
Respell   34.27  (1.5%)   34.22  
(1.4%)   -0.1% (  -3% -2%) 0.702
LowSloppyPhrase   19.76  (3.9%)   19.74  
(3.5%)   -0.1% (  -7% -7%) 0.880
MedIntervalsOrdered   19.12  (2.6%)   19.10  
(2.6%)   -0.1% (  -5% -5%) 0.872
   Wildcard  115.59  (2.8%)  115.51  
(2.8%)   -0.1% (  -5% -5%) 0.905
Prefix3  318.40  (1.4%)  318.82  
(1.9%)0.1% (  -3% -3%) 0.727
AndHighMedDayTaxoFacets   52.44  (2.5%)   52.51  
(2.3%)0.1% (  -4% -5%) 0.794
   BrowseDateSSDVFacets1.32  (6.3%)1.33  
(4.8%)0.2% ( -10% -   11%) 0.895
   PKLookup  157.21  (3.4%)  157.48  
(3.8%)0.2% (  -6% -7%) 0.829
 HighPhrase   95.48  (2.3%)   95.69  
(2.5%)0.2% (  -4% -5%) 0.682
  BrowseMonthTaxoFacets4.89  (0.4%)4.90  
(0.4%)0.2% (   0% -1%) 0.013
MedSloppyPhrase   13.62  (2.8%)   13.66  
(2.5%)0.3% (  -4% -5%) 0.615
  MedPhrase   23.07  (2.1%)   23.15  
(2.4%)0.3% (  -4% -4%) 0.495
   HighSloppyPhrase   18.02  (5.3%)   18.09  
(4.0%)0.4% (  -8% -   10%) 0.713
MedSpanNear8.17  (2.9%)8.20  
(2.9%)0.4% (  -5% -6%) 0.552
  LowPhrase  112.30  (2.0%)  112.75  
(2.0%)0.4% (  -3% -4%) 0.373
 AndHighLow  607.00  (5.8%)  609.50  
(5.4%)0.4% ( -10% -   12%) 0.740
LowTerm  204.43  (2.0%)  205.31  
(2.7%)0.4% (  -4% -5%) 0.425
  HighTermTitleSort   17.83  (1.4%)   17.91  
(1.4%)0.4% (  -2% -3%) 0.159
 OrHighMedDayTaxoFacets3.09  (3.9%)3.11  
(3.8%)0.5% (  -6% -8%) 0.599
LowIntervalsOrdered   20.57  (3.1%)   20.67  
(3.9%)0.5% (  -6% -7%) 0.550
   HighTermTitleBDVSort5.50  (3.4%)5.53  
(3.7%)0.6% (  -6% -7%) 0.468
  OrHighMed   89.37  (4.0%)   89.89  
(5.4%)0.6% (  -8% -   10%) 0.583
 AndHighMed   71.17  (4.3%)   71.61  
(5.2%)0.6% (  -8% -   10%) 0.555
   MedTermDayTaxoFacets   20.66  (4.0%)   20.81  
(4.4%)0.7% (  -7% -9%) 0.458
 OrHighHigh   32.44  (6.3%)   32.70  
(8.0%)0.8% ( -12% -   16%) 0.625
  BrowseMonthSSDVFacets6.29  (7.4%)6.34 
(10.5%)0.8% ( -15% -   20%) 0.684
   OrNotHighLow  581.89  (2.8%)  586.83  
(2.6%)0.8% (  -4% -6%) 0.164
  OrNotHighHigh  208.03  (5.2%)  209.97  
(6.0%)0.9% (  -9% -   12%) 0.457
  

Re: [PR] Dry up EverythingEnum and BlockDocsEnum in Lucene912PostingsReader [lucene]

2024-10-14 Thread via GitHub


original-brownbear commented on code in PR #13901:
URL: https://github.com/apache/lucene/pull/13901#discussion_r1799704584


##
lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java:
##
@@ -429,9 +403,44 @@ public PostingsEnum reset(IntBlockTermState termState, int 
flags) throws IOExcep
   }
   level1DocCountUpto = 0;
   docBufferUpto = BLOCK_SIZE;
-  freqFP = -1;
   return this;
 }
+  }
+
+  final class BlockDocsEnum extends AbstractPostingsEnum {
+
+private final long[] freqBuffer = new long[BLOCK_SIZE];
+
+private boolean needsFreq; // true if the caller actually needs frequencies
+private long freqFP;
+
+public BlockDocsEnum(FieldInfo fieldInfo) {
+  super(fieldInfo);
+}
+
+public boolean canReuse(IndexInput docIn, FieldInfo fieldInfo) {
+  final IndexOptions options = fieldInfo.getIndexOptions();
+  return docIn == Lucene912PostingsReader.this.docIn
+  && indexHasFreq == (options.compareTo(IndexOptions.DOCS_AND_FREQS) 
>= 0);
+}
+
+public PostingsEnum reset(IntBlockTermState termState, int flags) throws 
IOException {
+  startReset(termState);
+  if (pforUtil == null && docFreq >= BLOCK_SIZE) {
+pforUtil = new PForUtil(new ForUtil());
+forDeltaUtil = new ForDeltaUtil();
+  }
+  totalTermFreq = indexHasFreq ? termState.totalTermFreq : docFreq;
+
+  this.needsFreq = PostingsEnum.featureRequested(flags, 
PostingsEnum.FREQS);
+  if (indexHasFreq == false || needsFreq == false) {
+// Filling this buffer may not be cheap when doing primary key 
lookups, so we make sure to
+// not fill more than `docFreq` entries.
+Arrays.fill(freqBuffer, 0, Math.min(ForUtil.BLOCK_SIZE, docFreq), 1);
+  }
+  freqFP = -1;
+  return finishReset(termState);

Review Comment:
   I tried :) I found it quite helpful to dedup this, even if the reader has to 
jump around a little more, they also have some upside from being able to rely 
on the thing working the same for both versions?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Only call madvise when necessary. [lucene]

2024-10-14 Thread via GitHub


jpountz opened a new pull request, #13907:
URL: https://github.com/apache/lucene/pull/13907

   This commit tries to save calls to `madvise` which are not necessary, either 
because they map to the OS' default, or because the advice would be overridden 
later on anyway. I have not noticed specific problems with this, but it seems 
desirable to keep calls to `madvise` to a minimum.
   
   As a consequence:
- Files that are open with `ReadAdvice.NORMAL` do not call `madvise` since 
this is the OS' default.
- Compound files are always open with `ReadAdvice.NORMAL`, and the actual 
is only set when opening sub files of these compound files.
   
   To make the latter less trappy, the `IOContext` parameter has been removed 
from `CompoundFormat#getCompoundReader`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Dry up EverythingEnum and BlockDocsEnum in Lucene912PostingsReader [lucene]

2024-10-14 Thread via GitHub


jpountz commented on code in PR #13901:
URL: https://github.com/apache/lucene/pull/13901#discussion_r1799409756


##
lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java:
##
@@ -429,9 +403,44 @@ public PostingsEnum reset(IntBlockTermState termState, int 
flags) throws IOExcep
   }
   level1DocCountUpto = 0;
   docBufferUpto = BLOCK_SIZE;
-  freqFP = -1;
   return this;
 }
+  }
+
+  final class BlockDocsEnum extends AbstractPostingsEnum {
+
+private final long[] freqBuffer = new long[BLOCK_SIZE];
+
+private boolean needsFreq; // true if the caller actually needs frequencies
+private long freqFP;
+
+public BlockDocsEnum(FieldInfo fieldInfo) {
+  super(fieldInfo);
+}
+
+public boolean canReuse(IndexInput docIn, FieldInfo fieldInfo) {
+  final IndexOptions options = fieldInfo.getIndexOptions();
+  return docIn == Lucene912PostingsReader.this.docIn
+  && indexHasFreq == (options.compareTo(IndexOptions.DOCS_AND_FREQS) 
>= 0);
+}
+
+public PostingsEnum reset(IntBlockTermState termState, int flags) throws 
IOException {
+  startReset(termState);
+  if (pforUtil == null && docFreq >= BLOCK_SIZE) {
+pforUtil = new PForUtil(new ForUtil());
+forDeltaUtil = new ForDeltaUtil();
+  }
+  totalTermFreq = indexHasFreq ? termState.totalTermFreq : docFreq;
+
+  this.needsFreq = PostingsEnum.featureRequested(flags, 
PostingsEnum.FREQS);
+  if (indexHasFreq == false || needsFreq == false) {
+// Filling this buffer may not be cheap when doing primary key 
lookups, so we make sure to
+// not fill more than `docFreq` entries.
+Arrays.fill(freqBuffer, 0, Math.min(ForUtil.BLOCK_SIZE, docFreq), 1);
+  }
+  freqFP = -1;
+  return finishReset(termState);

Review Comment:
   Now readers of this code need to look up 3 different places to understand 
what `reset()` does: `startReset()`, `reset()` and `finishReset()`. Can we find 
more meaningful names to the two helper methods or keep the code duplicated for 
the sake of making it easier to read?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Try using Murmurhash 3 for bloom filters [lucene]

2024-10-14 Thread via GitHub


jpountz merged PR #12868:
URL: https://github.com/apache/lucene/pull/12868


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Create a community metrics dashboard [lucene]

2024-10-14 Thread via GitHub


stefanvodita commented on issue #13896:
URL: https://github.com/apache/lucene/issues/13896#issuecomment-2411304207

   That sounds like a plan! I'd be thrilled if you were driving this 
@prudhvigodithi - thank you for offering your help.
   It would also be great to get feedback from others on which metrics they'd 
find most useful.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Create a community metrics dashboard [lucene]

2024-10-14 Thread via GitHub


prudhvigodithi commented on issue #13896:
URL: https://github.com/apache/lucene/issues/13896#issuecomment-2411592601

   Thanks @stefanvodita, I would like to have the maintainers of the repo to 
take the initial stab at this metrics dashboard (come up the server, app, 
choosing the database and deploying the resources), and later as mentioned I 
will be happy to contribute to the dashboard design and features.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] `IndexOrDocValuesQuery` does not support query highlighting [lucene]

2024-10-14 Thread via GitHub


prudhvigodithi commented on issue #12686:
URL: https://github.com/apache/lucene/issues/12686#issuecomment-2411600774

   Hey @harshavamsi the PR to support the `IndexOrDocValuesQuery` for query 
highlighting is now merged.
   Thanks.
   @getsaurabh02 @msfroh


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Dry up EverythingEnum and BlockDocsEnum in Lucene912PostingsReader [lucene]

2024-10-14 Thread via GitHub


original-brownbear commented on PR #13901:
URL: https://github.com/apache/lucene/pull/13901#issuecomment-241157

   Thanks Adrien!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Dry up EverythingEnum and BlockDocsEnum in Lucene912PostingsReader [lucene]

2024-10-14 Thread via GitHub


original-brownbear merged PR #13901:
URL: https://github.com/apache/lucene/pull/13901


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Added support for highlighting `IndexOrDocValuesQuery` [lucene]

2024-10-14 Thread via GitHub


prudhvigodithi commented on PR #13902:
URL: https://github.com/apache/lucene/pull/13902#issuecomment-2411600165

   Thanks for the review and approval @mkhludnev @jpountz.
   @getsaurabh02 @dblock


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org