[GitHub] [lucene] javanna opened a new pull request, #11985: ExitableTerms to override getMin and getMax

2022-11-29 Thread GitBox


javanna opened a new pull request, #11985:
URL: https://github.com/apache/lucene/pull/11985

   ExitableTerms should not iterate through the terms to retrieve min and max 
when the wrapped implementation has the values cached (e.g. OrdsFieldReader)
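The override named in the title amounts to delegation. The sketch below models it with hypothetical, simplified classes (`SimpleTerms`, `CachedBoundsTerms`, `FilterSimpleTerms`, `DelegatingTerms`). These are not Lucene's `Terms` or `FilterLeafReader.FilterTerms` API. The point is narrow: a wrapper that inherits a computed min/max walks the terms even though the wrapped reader already knows its bounds, while a wrapper that overrides `getMin`/`getMax` simply asks the wrapped instance.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified stand-in for a terms dictionary (not Lucene's Terms API). */
abstract class SimpleTerms {
  int scanned = 0; // terms visited by the default getMin/getMax

  /** All terms in sorted order. */
  abstract List<String> sortedTerms();

  /** Default: derive the minimum by iterating. */
  String getMin() {
    for (String t : sortedTerms()) {
      scanned++;
      return t;
    }
    return null;
  }

  /** Default: derive the maximum by walking every term. */
  String getMax() {
    String last = null;
    for (String t : sortedTerms()) {
      scanned++;
      last = t;
    }
    return last;
  }
}

/** A reader that already knows its bounds (e.g. loaded together with field metadata). */
final class CachedBoundsTerms extends SimpleTerms {
  private final List<String> terms;

  CachedBoundsTerms(String... sortedTerms) {
    this.terms = Arrays.asList(sortedTerms);
  }

  @Override
  List<String> sortedTerms() { return terms; }

  @Override
  String getMin() { return terms.isEmpty() ? null : terms.get(0); }

  @Override
  String getMax() { return terms.isEmpty() ? null : terms.get(terms.size() - 1); }
}

/** A wrapper that only forwards the term list and inherits the default min/max. */
class FilterSimpleTerms extends SimpleTerms {
  protected final SimpleTerms in;

  FilterSimpleTerms(SimpleTerms in) { this.in = in; }

  @Override
  List<String> sortedTerms() { return in.sortedTerms(); }
}

/** The same wrapper with getMin/getMax delegated to the wrapped instance. */
final class DelegatingTerms extends FilterSimpleTerms {
  DelegatingTerms(SimpleTerms in) { super(in); }

  @Override
  String getMin() { return in.getMin(); }

  @Override
  String getMax() { return in.getMax(); }
}

final class ExitableTermsSketch {
  public static void main(String[] args) {
    SimpleTerms cached = new CachedBoundsTerms("apple", "kiwi", "pear");
    FilterSimpleTerms plain = new FilterSimpleTerms(cached);
    DelegatingTerms delegating = new DelegatingTerms(cached);
    System.out.println(plain.getMax() + " after scanning " + plain.scanned + " terms");
    System.out.println(delegating.getMax() + " after scanning " + delegating.scanned + " terms");
  }
}
```

Delegation is only safe because this kind of wrapper does not change which terms are visible. A wrapper that actually filters terms has to keep a computed implementation.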


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] iverase opened a new issue, #11986: Polygons failing to tessellate

2022-11-29 Thread GitBox


iverase opened a new issue, #11986:
URL: https://github.com/apache/lucene/issues/11986

   ### Description
   
   A user of Elasticsearch reported a few polygons that fail to tessellate, for example:
   
   ```
   { "type": "MultiPolygon", "coordinates": [ [ [ [ 145.8376722, -41.3625237 ], 
[ 145.9119483, -41.3625237 ], [ 145.9124671, -41.3604861 ], [ 145.9165094, 
-41.3592481 ], [ 145.918536, -41.3618518 ], [ 145.9268395, -41.3620262 ], [ 
145.9268834, -41.3625237 ], [ 145.9296057, -41.3625237 ], [ 145.930067, 
-41.3596353 ], [ 145.9359791, -41.3604495 ], [ 145.9338908, -41.3623443 ], [ 
145.9337218, -41.3625237 ], [ 145.9406629, -41.3625237 ], [ 145.9443986, 
-41.3614739 ], [ 145.9452851, -41.355895 ], [ 145.9399764, -41.3553728 ], [ 
145.9386004, -41.3536032 ], [ 145.9338835, -41.3528597 ], [ 145.9329233, 
-41.3550551 ], [ 145.9312052, -41.3560932 ], [ 145.9291784, -41.3534897 ], [ 
145.9278489, -41.3527145 ], [ 145.9269703, -41.350457 ], [ 145.9290943, 
-41.3508852 ], [ 145.9416658, -41.3508101 ], [ 145.9427361, -41.3480878 ], [ 
145.9451005, -41.3493199 ], [ 145.9451182, -41.3511633 ], [ 145.9475533, 
-41.3501138 ], [ 145.9492273, -41.3481171 ], [ 145.9510286, -41.348866 ], [ 
145.9510476, -41.3507345 ], [ 145.9539715, -41.3514479 ], [ 145.9540082, -41.3551093 ], [ 
145.9570346, -41.3552711 ], [ 145.9569774, -41.3494518 ], [ 145.9596279, 
-41.3485645 ], [ 145.9601364, -41.347088 ], [ 145.9574997, -41.3464911 ], [ 
145.956217, -41.344634 ], [ 145.9536965, -41.3434193 ], [ 145.9543658, 
-41.3396175 ], [ 145.9602405, -41.33979 ], [ 145.9608139, -41.3352189 ], [ 
145.9742096, -41.3356161 ], [ 145.9763708, -41.332821 ], [ 145.980498, 
-41.3332742 ], [ 145.9817811, -41.3351309 ], [ 145.9866889, -41.3355163 ], [ 
145.987522, -41.3378172 ], [ 145.9892258, -41.3372247 ], [ 145.9911234, 
-41.3352083 ], [ 145.9995898, -41.3360013 ], [ 145.9990297, -41.3400796 ], [ 
145.9955299, -41.3402743 ], [ 145.9969167, -41.3421148 ], [ 146.0, -41.3424174 
], [ 146.0, -41.3174825 ], [ 145.9960043, -41.3171988 ], [ 145.9969866, 
-41.3153884 ], [ 146.0, -41.3155285 ], [ 146.0, -41.2276172 ], [ 145.9324631, 
-41.2276172 ], [ 145.9327736, -41.2277122 ], [ 145.9316015, -41.2315586 ], [ 
145.9286354, -41.2311096 ], [ 
 145.9292451, -41.2276172 ], [ 145.8632073, -41.2276172 ], [ 145.8630275, 
-41.2282061 ], [ 145.8600617, -41.2277558 ], [ 145.860086, -41.2276172 ], [ 
145.8485946, -41.2276172 ], [ 145.8484023, -41.2281236 ], [ 145.8456234, 
-41.2293415 ], [ 145.8455167, -41.2347984 ], [ 145.849049, -41.2356306 ], [ 
145.8478493, -41.2411272 ], [ 145.8455135, -41.2421517 ], [ 145.845, 
-41.2533017 ], [ 145.8376722, -41.2529579 ], [ 145.8376722, -41.3018446 ], [ 
145.8380557, -41.3025436 ], [ 145.8376722, -41.3024854 ], [ 145.8376722, 
-41.3504352 ], [ 145.8381764, -41.3505628 ], [ 145.8376722, -41.3509186 ], [ 
145.8376722, -41.3625237 ] ], [ [ 145.9176756, -41.3534899 ], [ 145.9186036, 
-41.3469601 ], [ 145.921422, -41.3459251 ], [ 145.9211254, -41.3480053 ], [ 
145.9262939, -41.349043 ], [ 145.9269703, -41.350457 ], [ 145.9216876, 
-41.3512921 ], [ 145.920889, -41.3524534 ], [ 145.9176756, -41.3534899 ] ], [ [ 
145.933483, -41.3211897 ], [ 145.9338125, -41.319729 ], [ 145.9396134, 
-41.3183894 ], [ 145.941287,
  -41.3163932 ], [ 145.9428066, -41.3187685 ], [ 145.939747, -41.3201429 ], [ 
145.9380912, -41.3221196 ], [ 145.933483, -41.3211897 ] ], [ [ 145.8947524, 
-41.3300813 ], [ 145.8970756, -41.3298815 ], [ 145.8974562, -41.3326266 ], [ 
145.8951329, -41.3328264 ], [ 145.8947524, -41.3300813 ] ], [ [ 145.8677917, 
-41.3343222 ], [ 145.8690228, -41.3318872 ], [ 145.8733923, -41.3323222 ], [ 
145.8742228, -41.3346236 ], [ 145.8677917, -41.3343222 ] ], [ [ 145.8558352, 
-41.3244434 ], [ 145.8576672, -41.3227166 ], [ 145.8587641, -41.3248396 ], [ 
145.8558352, -41.3244434 ] ], [ [ 145.8584175, -41.3336907 ], [ 145.8594025, 
-41.3318809 ], [ 145.8619189, -41.3323305 ], [ 145.861191, -41.3338904 ], [ 
145.8584175, -41.3336907 ] ], [ [ 145.8733418, -41.3405334 ], [ 145.8751737, 
-41.3388063 ], [ 145.8762712, -41.3409293 ], [ 145.8733418, -41.3405334 ] ], [ 
[ 145.8762712, -41.3409293 ], [ 145.8795968, -41.3419639 ], [ 145.8785325, 
-41.3457195 ], [ 145.8772987, -41.3451165 ], [ 145.8762712, -41.3409293 ] ]
 , [ [ 145.8849106, -41.343891 ], [ 145.8883959, -41.3435914 ], [ 145.8886495, 
-41.3454216 ], [ 145.8851641, -41.3457212 ], [ 145.8849106, -41.343891 ] ], [ [ 
145.8817582, -41.3421182 ], [ 145.8859167, -41.3405961 ], [ 145.8849106, 
-41.343891 ], [ 145.8817582, -41.3421182 ] ], [ [ 145.8788123, -41.3378897 ], [ 
145.8819175, -41.3363389 ], [ 145.883115, -41.3394154 ], [ 145.8788123, 
-41.3378897 ] ], [ [ 145.8853326, -41.3299653 ], [ 145.175, -41.3296657 ], 
[ 145.8891978, -41.3324108 ], [ 145.8857129, -41.3327104 ], [ 145.8853326, 
-41.3299653 ] ], [ [ 145.9059876, -41.3263368 ], [ 145.9094723, -41.3260368 ], 
[ 145.9097262, -41.3278668 ], [ 145.9062415, -41.3281668 ], [ 145.9059876, 
-41.3263368 ] ], [ [ 145.897955

[GitHub] [lucene] jpountz commented on a diff in pull request #11984: Add exponential growth to TimeLimitingBulkScorer

2022-11-29 Thread GitBox


jpountz commented on code in PR #11984:
URL: https://github.com/apache/lucene/pull/11984#discussion_r1034826332


##
lucene/core/src/test/org/apache/lucene/search/TestTimeLimitingBulkScorer.java:
##
@@ -62,6 +66,44 @@ public void testTimeLimitingBulkScorer() throws Exception {
     directory.close();
   }
 
+  public void testExponentialRate() throws Exception {
+    var bulkScorer =
+        new BulkScorer() {
+          int expectedInterval = TimeLimitingBulkScorer.INTERVAL;
+          int lastInterval = 0;
+          int runs = TestUtil.nextInt(random(), 1, 100);
+
+          @Override
+          public int score(LeafCollector collector, Bits acceptDocs, int min, int max)
+              throws IOException {
+            var difference = max - min;
+            // the rate shouldn't overflow - only increase or remain equal
+            assertTrue("Rate should only go up", difference >= lastInterval);
+            assertEquals("Incorrect rate encountered", expectedInterval, difference);
+
+            lastInterval = difference;
+            // use integer sum since the exponential growth formula yields different result due to
+            // rounding
+            expectedInterval = expectedInterval + expectedInterval / 2;
+            // overflow - stop at the previous one
+            if (expectedInterval < 0) {
+              expectedInterval = lastInterval;
+            }
+            // keep going or finish the test?
+            return --runs == 0 ? DocIdSetIterator.NO_MORE_DOCS : 0;

Review Comment:
   Why would we end prematurely instead of doing the checks on the full range? 
Is it to avoid the corner case of the last range, which might be smaller than 
the expected interval? Maybe the bulk scorer could record the segment's 
`maxDoc` and allow the difference to be less than `expectedInterval` if the 
`max` doc is equal to `maxDoc`?
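The suggestion could look roughly like the sketch below, written outside Lucene with hypothetical names (`WindowCheck`, `INITIAL_INTERVAL`). It generates windows over `[0, maxDoc)` that grow by about 1.5x, matching the test's `expectedInterval + expectedInterval / 2`. It then checks every window over the full range and lets only the final window, the one whose `max` equals `maxDoc`, be shorter than expected. The starting interval of 100 is an assumption, not the real `TimeLimitingBulkScorer.INTERVAL`.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: check every window, allowing only the one ending at maxDoc to be short. */
final class WindowCheck {
  static final int INITIAL_INTERVAL = 100; // hypothetical starting interval

  /** Next interval with ~1.5x growth, keeping the previous value if the int addition overflows. */
  static int grow(int interval) {
    int next = interval + interval / 2;
    return next > interval ? next : interval;
  }

  /** Splits [0, maxDoc) into [min, max) windows whose sizes grow by ~1.5x. */
  static List<int[]> windows(int maxDoc) {
    List<int[]> result = new ArrayList<>();
    int interval = INITIAL_INTERVAL;
    for (int min = 0; min < maxDoc; ) {
      int max = (int) Math.min((long) min + interval, maxDoc); // clamp to the segment end
      result.add(new int[] {min, max});
      min = max;
      interval = grow(interval);
    }
    return result;
  }

  /** True if each window has the expected size, except that a final window ending at maxDoc may be shorter. */
  static boolean check(List<int[]> windows, int maxDoc) {
    int expected = INITIAL_INTERVAL;
    for (int[] w : windows) {
      int difference = w[1] - w[0];
      boolean ok = w[1] == maxDoc ? difference <= expected : difference == expected;
      if (!ok) {
        return false;
      }
      expected = grow(expected);
    }
    // the windows must cover the whole segment
    return windows.isEmpty() ? maxDoc == 0 : windows.get(windows.size() - 1)[1] == maxDoc;
  }

  public static void main(String[] args) {
    int maxDoc = 1000;
    List<int[]> ws = windows(maxDoc);
    for (int[] w : ws) {
      System.out.println("[" + w[0] + ", " + w[1] + ")");
    }
    System.out.println("valid: " + check(ws, maxDoc));
  }
}
```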






[GitHub] [lucene] luyuncheng opened a new pull request, #11987: Make Decompressor release memory buffer

2022-11-29 Thread GitBox


luyuncheng opened a new pull request, #11987:
URL: https://github.com/apache/lucene/pull/11987

   ### Description
   We have an Elasticsearch cluster (31 GB heap, 96 GB RAM, 30 nodes) with many shards 
per node (4,000 per node). When the nodes serve many bulk and search requests 
concurrently, JVM heap usage climbs and the memory is not released, even with 
frequent GC and after stopping all write/search requests. We have to restart the 
node to recover the heap, as the following GC metrics show:
   
![image](https://user-images.githubusercontent.com/12760367/204531778-0c8e24ce-a927-492c-a173-cb2905a43c41.png)
   
   We dumped the heap, and it shows that `CompressingStoredFieldsReader` occupies 70% of the heap: 
   
![image](https://user-images.githubusercontent.com/12760367/204548626-3cfe59b0-f007-4695-802e-0ed542f8f4a5.png)
   
   All of these readers' paths to GC roots look like the following (possibly from search or write 
threads):
   
![image](https://user-images.githubusercontent.com/12760367/204550346-21a7b219-2051-4333-910d-27138def8f3b.png)
   
   ### Root cause
   I think the root cause is that these thread-locals hold the referents: 
`SegmentReader#getFieldsReader` calls the following code, and Elasticsearch 
always uses fixed thread pools and never __calls 
`CloseableThreadLocal#purge`__.
   
   `lucene/core/src/java/org/apache/lucene/index/SegmentCoreReaders.java` defines `fieldsReaderLocal`:
   
   ```
   final CloseableThreadLocal<StoredFieldsReader> fieldsReaderLocal =
       new CloseableThreadLocal<StoredFieldsReader>() {
         @Override
         protected StoredFieldsReader initialValue() {
           return fieldsReaderOrig.clone();
         }
       };
   ```
   
   We searched related issues such as [LUCENE-9959](https://issues.apache.org/jira/browse/LUCENE-9959) 
and [LUCENE-10519](https://issues.apache.org/jira/browse/LUCENE-10519), but found no 
answer to this problem.
   
   ---
   Comparing different JVM heap sizes and Lucene versions, I think the root cause is 
that `LZ4WithPresetDictDecompressor` keeps its buffers as instance fields, initialized 
in its constructor:
   ```
   LZ4WithPresetDictDecompressor() {
     compressedLengths = new int[0];
     buffer = new byte[0];
   }
   ```
   
   When the Elasticsearch instance performs stored-fields reads, these buffers grow 
on the JVM heap but are never released, because Elasticsearch's 
`currentEngineReference` keeps the readers referenced:
   
![image](https://user-images.githubusercontent.com/12760367/204552928-9e8f2b5f-ce61-4cbb-93eb-bc1fee4a597a.png)
   
   ### Proposal
   I think we can release this buffer memory once decompression is done, which lets 
the JVM hold more segment readers in the heap. With these buffers released, the 
heap metrics look like this:
   
![image](https://user-images.githubusercontent.com/12760367/204555346-fd6be181-eb8b-4014-9cd1-1e17aee4282e.png)
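The proposal can be illustrated with a hypothetical scratch-buffer class (`ScratchBuffer` and `ReleaseAfterUse` are not Lucene's `LZ4WithPresetDictDecompressor`). Its array grows on demand, and `release()` swaps it for an empty array once decoding is done. The array copy stands in for real LZ4 decompression.

```java
import java.util.Arrays;

/** Hypothetical per-reader scratch space, standing in for a decompressor's reusable buffer. */
final class ScratchBuffer {
  private byte[] buffer = new byte[0];

  /** Returns an array of at least minSize bytes, growing (and keeping) it when too small. */
  byte[] grow(int minSize) {
    if (buffer.length < minSize) {
      // over-allocate by 1/8 to amortize repeated growth
      buffer = Arrays.copyOf(buffer, minSize + (minSize >>> 3));
    }
    return buffer;
  }

  /** Drops the large array so the GC can reclaim it while this object stays reachable. */
  void release() {
    buffer = new byte[0];
  }

  int capacity() {
    return buffer.length;
  }
}

final class ReleaseAfterUse {
  /** Copies a block through the scratch buffer; the copy stands in for real LZ4 decoding. */
  static byte[] decode(ScratchBuffer scratch, byte[] block, boolean releaseAfter) {
    try {
      byte[] buf = scratch.grow(block.length);
      System.arraycopy(block, 0, buf, 0, block.length);
      return Arrays.copyOf(buf, block.length);
    } finally {
      if (releaseAfter) {
        scratch.release();
      }
    }
  }

  public static void main(String[] args) {
    byte[] block = new byte[1 << 20]; // a 1 MiB "decompressed" block
    ScratchBuffer kept = new ScratchBuffer();
    ScratchBuffer released = new ScratchBuffer();
    decode(kept, block, false);
    decode(released, block, true);
    System.out.println("retained by kept:     " + kept.capacity() + " bytes");
    System.out.println("retained by released: " + released.capacity() + " bytes");
  }
}
```

The trade-off is that releasing forces a fresh allocation on every read. Keeping the buffer costs memory per clone, and with one `fieldsReaderLocal` clone per thread per segment, that retained memory multiplies with thousands of shards.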
   





[GitHub] [lucene] iverase opened a new pull request, #11988: Fix algorithm that chooses the bridge between a polygon and a hole

2022-11-29 Thread GitBox


iverase opened a new pull request, #11988:
URL: https://github.com/apache/lucene/pull/11988

   The current algorithm seems to fail when the bridge is located on the first 
node of the iteration and there is another vertex with the same x and y. In 
that case we don't seem to be able to find the right node, because we never 
actually compute the tangent of the first node. 
   This change modifies the iteration so that we compute the tangent for all 
nodes of the polygon, which seems to always find the right bridge. We also 
remove some logic that we (well, it was me) added in an earlier attempt to fix this issue.
   
   fixes #11986 
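The loop-shape problem described above can be illustrated with a hypothetical ring of nodes (`RingNode` and `BridgeScan` are not Lucene's `Tessellator` classes). The score is a simplified |dy|/dx tangent toward a hole vertex, and only nodes to the left of that vertex are considered. It exists only to show two things. A loop that starts at `start.next` and stops when it wraps never evaluates `start`. A do/while loop evaluates every node, including `start`.

```java
/** Hypothetical ring node for a polygon stored as a circular linked list. */
final class RingNode {
  final double x, y;
  RingNode next;

  RingNode(double x, double y) {
    this.x = x;
    this.y = y;
  }

  /** Links the points into a closed ring and returns the first node. */
  static RingNode ring(double[][] pts) {
    RingNode first = new RingNode(pts[0][0], pts[0][1]);
    RingNode prev = first;
    for (int i = 1; i < pts.length; i++) {
      prev.next = new RingNode(pts[i][0], pts[i][1]);
      prev = prev.next;
    }
    prev.next = first; // close the ring
    return first;
  }
}

final class BridgeScan {
  /** Simplified stand-in score: |dy| / dx from a candidate to the hole vertex; lower is better. */
  static double tangent(RingNode p, double hx, double hy) {
    return Math.abs(hy - p.y) / (hx - p.x);
  }

  /** Starts at start.next and stops when it wraps, so start is never scored. */
  static RingNode bestSkippingStart(RingNode start, double hx, double hy) {
    RingNode best = null;
    double bestTan = Double.POSITIVE_INFINITY;
    for (RingNode p = start.next; p != start; p = p.next) {
      if (p.x < hx && tangent(p, hx, hy) < bestTan) {
        bestTan = tangent(p, hx, hy);
        best = p;
      }
    }
    return best;
  }

  /** do/while: every node, including start, gets its tangent computed. */
  static RingNode bestAllNodes(RingNode start, double hx, double hy) {
    RingNode best = null;
    double bestTan = Double.POSITIVE_INFINITY;
    RingNode p = start;
    do {
      if (p.x < hx && tangent(p, hx, hy) < bestTan) {
        bestTan = tangent(p, hx, hy);
        best = p;
      }
      p = p.next;
    } while (p != start);
    return best;
  }

  public static void main(String[] args) {
    RingNode start = RingNode.ring(new double[][] {{0, 0}, {10, 0}, {10, 10}, {0, 10}});
    RingNode a = bestSkippingStart(start, 5, 0);
    RingNode b = bestAllNodes(start, 5, 0);
    System.out.println("skipping start: (" + a.x + ", " + a.y + ")");
    System.out.println("all nodes:      (" + b.x + ", " + b.y + ")");
  }
}
```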
   
   





[GitHub] [lucene] jpountz commented on pull request #11982: Change ByteBuffersDataInput and ByteBuffersIndexInput to use absolute addressing

2022-11-29 Thread GitBox


jpountz commented on PR #11982:
URL: https://github.com/apache/lucene/pull/11982#issuecomment-1330752728

   > This shows 10-20% improvement in SSDVFacets and IntNRQ tests in 
lucenebench,
   
   Woah, impressive! Can you share the luceneutil output?
   
   The change looks good to me but I'd also like @uschindler to have a look.





[GitHub] [lucene] iverase commented on issue #11883: Spatial3d: Wrong intersection detected between small polygons

2022-11-29 Thread GitBox


iverase commented on issue #11883:
URL: https://github.com/apache/lucene/issues/11883#issuecomment-1330777875

   Should we backport it to branch_9x?





[GitHub] [lucene] thecoop commented on pull request #11982: Change ByteBuffersDataInput and ByteBuffersIndexInput to use absolute addressing

2022-11-29 Thread GitBox


thecoop commented on PR #11982:
URL: https://github.com/apache/lucene/pull/11982#issuecomment-1330803516

   
[bytebuffer-get-output.log](https://github.com/apache/lucene/files/10114303/bytebuffer-get-output.log)
 - used wikimedium1m on my local machine
   
   Whilst some tests show as much as 40% improvement, there are also some tests 
that show a 5-6% regression. I can't say I fully understand what this means.





[GitHub] [lucene] rmuir commented on pull request #11985: ExitableTerms to override getMin and getMax

2022-11-29 Thread GitBox


rmuir commented on PR #11985:
URL: https://github.com/apache/lucene/pull/11985#issuecomment-1330867303

   I don't really think it is especially trappy, since the default 
implementation is `O(log N)` and works consistently with FilterTerms classes by 
default even if they are actually filtering the terms data in some way.
   
   But it seems fine to look at making it abstract (as a separate change), as long 
as there is an easy way to opt in to the existing binary search impl. It would not 
be good to see that duplicated across a bunch of simple Terms subclasses (e.g. 
in the indexer, term vectors, doc values, whatever).
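To make the `O(log N)`-style fallback concrete, here is a sketch with hypothetical names (`MaxTermSearch`, `existsAtOrAfter`), using a `TreeSet` in place of a terms dictionary. It finds the largest term using only seekCeil-style probes of the form "is there a term at or after X?". The max term is built one byte at a time by binary-searching each byte value. This is a simplified model of the binary-search idea, not Lucene's `Terms#getMax` source.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.TreeSet;

/** Hypothetical model: find the largest term using only "is there a term >= X?" probes. */
final class MaxTermSearch {
  // Sorted like a terms dictionary: unsigned byte-wise order.
  final TreeSet<byte[]> terms = new TreeSet<byte[]>((a, b) -> Arrays.compareUnsigned(a, b));
  int seeks = 0; // probes issued, each standing in for one seekCeil call

  /** Stand-in for seekCeil: false when the probe lands past the last term. */
  boolean existsAtOrAfter(byte[] probe) {
    seeks++;
    return terms.ceiling(probe) != null;
  }

  /** Builds the max term one byte at a time, binary-searching each byte value. */
  byte[] getMax() {
    if (terms.isEmpty()) {
      return null;
    }
    byte[] prefix = new byte[0];
    while (true) {
      byte[] probe = Arrays.copyOf(prefix, prefix.length + 1); // prefix followed by 0x00
      if (!existsAtOrAfter(probe)) {
        return prefix; // nothing sorts at or after prefix + 0x00, so the max term is the prefix
      }
      int lo = 0;
      int hi = 255; // invariant: some term is >= prefix + [lo]
      while (lo < hi) {
        int mid = (lo + hi + 1) >>> 1;
        probe[prefix.length] = (byte) mid;
        if (existsAtOrAfter(probe)) {
          lo = mid;
        } else {
          hi = mid - 1;
        }
      }
      probe[prefix.length] = (byte) lo;
      prefix = probe; // the max term starts with this one-byte-longer prefix
    }
  }

  public static void main(String[] args) {
    MaxTermSearch s = new MaxTermSearch();
    for (String t : new String[] {"app", "apple", "banana", "peach", "pear"}) {
      s.terms.add(t.getBytes(StandardCharsets.US_ASCII));
    }
    byte[] max = s.getMax();
    System.out.println(new String(max, StandardCharsets.US_ASCII) + " found with " + s.seeks + " probes");
  }
}
```

Each byte of the result costs one existence probe plus eight probes to binary-search 256 values. The probe count therefore follows the length of the largest term rather than the number of terms. Each probe is still a seek, though, which is where I/O can come in.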
   
   
   





[GitHub] [lucene] rmuir commented on pull request #11985: ExitableTerms to override getMin and getMax

2022-11-29 Thread GitBox


rmuir commented on PR #11985:
URL: https://github.com/apache/lucene/pull/11985#issuecomment-1330875083

   Also I don't know what `OrdsFieldsReader` is, but the default Terms 
implementation is `O(1)` when the Terms subclass supports seek-by-ord. So what 
am I missing that makes it a trap?





[GitHub] [lucene] hendrikmuhs commented on pull request #460: LUCENE-10247 - reduce size of FSTs by relative coding

2022-11-29 Thread GitBox


hendrikmuhs commented on PR #460:
URL: https://github.com/apache/lucene/pull/460#issuecomment-1330949224

   Update after a long time. This branch was outdated, and the best way to 
resurrect it was a _rebase_. However, it seems GitHub managed to keep the 
comments in the right places.
   
   If I can trust test coverage this is fully functional now.
   
   I will try to find a good benchmark, to see how much storage can be saved 
with this in a real scenario. If someone has a hint, let me know.
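For intuition about where savings could come from, consider a rough cost model. It assumes that "relative coding" means storing each arc's target as a distance back from the arc's own position rather than as an absolute address. That interpretation is an assumption, and the PR may do more. Nearby targets then need fewer variable-length-int bytes. `RelativeAddressing` and the addresses in `main` are made up for illustration.

```java
/** Hypothetical cost model for storing FST arc targets as absolute vs. relative addresses. */
final class RelativeAddressing {
  /** Bytes used by a variable-length encoding with 7 payload bits per byte (non-negative input). */
  static int vLongSize(long v) {
    if (v < 0) {
      throw new IllegalArgumentException("negative: " + v);
    }
    int bytes = 1;
    while ((v & ~0x7FL) != 0) {
      v >>>= 7;
      bytes++;
    }
    return bytes;
  }

  /** Each arc stores its target's address as-is. */
  static int absoluteCost(long[] targets) {
    int total = 0;
    for (long t : targets) {
      total += vLongSize(t);
    }
    return total;
  }

  /** Each arc stores (arcPosition - target); assumes targets are written before the arcs pointing at them. */
  static int relativeCost(long[] arcPositions, long[] targets) {
    int total = 0;
    for (int i = 0; i < targets.length; i++) {
      total += vLongSize(arcPositions[i] - targets[i]);
    }
    return total;
  }

  public static void main(String[] args) {
    // made-up addresses: arcs deep in a large FST pointing at nearby nodes
    long[] arcs = {1_000_000, 1_000_040, 2_500_000};
    long[] targets = {999_950, 999_990, 2_499_000};
    System.out.println("absolute: " + absoluteCost(targets) + " bytes");
    System.out.println("relative: " + relativeCost(arcs, targets) + " bytes");
  }
}
```

Real savings would depend on how far targets typically are from their arcs, which is what a benchmark on a real index would measure.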





[GitHub] [lucene] javanna commented on pull request #11985: ExitableTerms to override getMin and getMax

2022-11-29 Thread GitBox


javanna commented on PR #11985:
URL: https://github.com/apache/lucene/pull/11985#issuecomment-1331185580

   Credit goes to @dnhatn for pointing me to the bug, thanks! I am more than 
happy to fix it and see what else we can do to avoid this in the future. 





[GitHub] [lucene] dnhatn commented on pull request #11985: ExitableTerms to override getMin and getMax

2022-11-29 Thread GitBox


dnhatn commented on PR #11985:
URL: https://github.com/apache/lucene/pull/11985#issuecomment-1331199248

   `FieldReader` (i.e., blocktree implementation) returns minTerm and 
[maxTerm](https://github.com/apache/lucene/blob/0cc6f695363419ab0f89e2bef5e7595ace077345/lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/FieldReader.java)
 without doing any I/O, while the default implementation in `Terms` might use 
I/O for retrieving them.





[GitHub] [lucene] javanna commented on pull request #11985: ExitableTerms to override getMin and getMax

2022-11-29 Thread GitBox


javanna commented on PR #11985:
URL: https://github.com/apache/lucene/pull/11985#issuecomment-1331233058

   Besides the I/O aspect, I found it counterintuitive that min and max are 
known and we end up doing work to compute them again. Even if it's not a lot of 
work, it seems like we could avoid it?





[GitHub] [lucene] uschindler commented on pull request #11982: Change ByteBuffersDataInput and ByteBuffersIndexInput to use absolute addressing

2022-11-29 Thread GitBox


uschindler commented on PR #11982:
URL: https://github.com/apache/lucene/pull/11982#issuecomment-1331252650

   Hi, I can only look at this on the weekend as I am at a customer this week. 
When reading the description I was also on the wrong track, because I did not 
understand what you wanted to change: the getLongs() and getFloats() calls on 
IndexInput are always relative. I was not aware that you were talking about that 
crazy set of view buffers for long/float that currently use the position() call 
to copy the position from the main buffer.
   
   My humble opinion: I never agreed to that code and it is/was heavily broken 
(my personal opinion). Whenever I see that code I quickly look at other places 
so I don't have to look at it any longer (I get an "I need to puke!" reaction 
every time I see it). In short: I would not spend too much time on byte buffers, 
sorry. MemorySegment is the way to go. In MemorySegmentIndexInput, reading is a 
one-liner and no views are needed, because MemorySegments allow unaligned 
access. In Java 19 you can also convert a ByteBuffer to a MemorySegment, so you 
no longer need views; I was thinking about at least fixing those 2 methods in 
ByteBuffer*s*IndexInput.
   
   About your benchmark: I do not trust it at the moment, because your code 
shows the following warning on startup: "WARNING: Using incubator modules: 
jdk.incubator.vector". I had the feeling there was something in your code that 
uses this incubator module and may affect results. Where does the message 
come from? I grepped through your patches and, luckily, you do not use incubator 
modules!
   
   When reading your attached log, where do you see a 10-20% improvement on 
IntNRQ or SSDV Facets?
                            IntNRQ        776.60  (21.0%)         795.66  (21.1%)      2.5% ( -32% -  56%) 0.712
   That says only 2.5%.
   
   In short, I have to take a closer look at the code, but I do not see much 
improvement, sorry. The numbers are now roughly identical to MemorySegmentIndexInput 
for the given candidate queries.





[GitHub] [lucene] uschindler commented on pull request #11982: Change ByteBuffersDataInput and ByteBuffersIndexInput to use absolute addressing

2022-11-29 Thread GitBox


uschindler commented on PR #11982:
URL: https://github.com/apache/lucene/pull/11982#issuecomment-1331265646

   In addition: the patch only removes the position() call on the duplicate, so the 
duplicates are still there. Unless you only read small arrays with 1 or 2 
longs, the position() call cannot have much overhead.
   
   The PR here is more of a cleanup: it only replaces position() with the 
absolute parameter. As we still work on a duplicate, we have to read the 
position() anyway. So this PR is just a cleanup, but (see the std dev) not an 
improvement. What I like is your code that creates the buffer views: using 
streams with method references makes it easier to read. But using a positional 
read on the view vs. position() + relative read can't have much effect.
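The two read styles being compared can be shown with a hypothetical helper (`LongViewsSketch`, not the PR's code). It builds one little-endian `LongBuffer` view per byte alignment and reads longs at an arbitrary byte offset in two ways. The first sets the view's position and then does a relative bulk `get`. The second uses the absolute bulk `get(int index, long[] dst)` added in Java 13, which leaves the view's position untouched. Both return the same values as scalar `getLong` calls, so the difference is mainly whether the shared view's position is mutated.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.LongBuffer;
import java.util.Arrays;

/** Sketch of per-alignment LongBuffer views over a little-endian ByteBuffer (hypothetical helper). */
final class LongViewsSketch {
  /** One view per byte alignment 0..7, so a read at any byte offset maps to a whole-long index. */
  static LongBuffer[] views(ByteBuffer bb) {
    LongBuffer[] views = new LongBuffer[Long.BYTES];
    for (int a = 0; a < Long.BYTES; a++) {
      // slice() resets the byte order to BIG_ENDIAN, so set LITTLE_ENDIAN after slicing
      views[a] = bb.duplicate().position(a).slice().order(ByteOrder.LITTLE_ENDIAN).asLongBuffer();
    }
    return views;
  }

  /** Relative style: move the shared view's position first (mutates view state). */
  static long[] readRelative(LongBuffer[] views, int bytePos, int count) {
    long[] dst = new long[count];
    LongBuffer view = views[bytePos & 7];
    view.position(bytePos >>> 3);
    view.get(dst);
    return dst;
  }

  /** Absolute style (Java 13+ bulk get with an index): the view's position is untouched. */
  static long[] readAbsolute(LongBuffer[] views, int bytePos, int count) {
    long[] dst = new long[count];
    views[bytePos & 7].get(bytePos >>> 3, dst);
    return dst;
  }

  /** Reference: element-by-element absolute getLong on the byte buffer itself. */
  static long[] readScalar(ByteBuffer bb, int bytePos, int count) {
    long[] dst = new long[count];
    for (int i = 0; i < count; i++) {
      dst[i] = bb.getLong(bytePos + i * Long.BYTES);
    }
    return dst;
  }

  /** 64 bytes holding the values 0..63, little-endian. */
  static ByteBuffer sample() {
    ByteBuffer bb = ByteBuffer.allocate(64).order(ByteOrder.LITTLE_ENDIAN);
    for (int i = 0; i < bb.capacity(); i++) {
      bb.put(i, (byte) i);
    }
    return bb;
  }

  public static void main(String[] args) {
    ByteBuffer bb = sample();
    LongBuffer[] views = views(bb);
    System.out.println(Arrays.equals(readRelative(views, 5, 3), readScalar(bb, 5, 3)));
    System.out.println(Arrays.equals(readAbsolute(views, 5, 3), readScalar(bb, 5, 3)));
  }
}
```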





[GitHub] [lucene] uschindler commented on pull request #11982: Change ByteBuffersDataInput and ByteBuffersIndexInput to use absolute addressing

2022-11-29 Thread GitBox


uschindler commented on PR #11982:
URL: https://github.com/apache/lucene/pull/11982#issuecomment-1331279562

   With 5 seconds of runtime, the HotSpot compiler has not even started optimizing 
with tiered compilation unless you add more command-line flags, so it's too 
short to be "warm".





[GitHub] [lucene] gsmiller commented on pull request #11928: GH#11922: Allow DisjunctionDISIApproximation to short-circuit

2022-11-29 Thread GitBox


gsmiller commented on PR #11928:
URL: https://github.com/apache/lucene/pull/11928#issuecomment-1331288870

   @jpountz I re-ran some internal benchmarking with this change to highlight 
the speedup in cases where scoring isn't needed (at least some specific 
use-cases I'm looking at). These use-cases all involve a "disjunction filter," 
meaning a disjunction of terms that is used as a required clause. So something 
like `(+ (foo:bar foo:baz foo:zed) (...))`, where the `foo` field must take on 
one of the specified values to be considered a candidate match. To provide a 
sense of scale, on average, these filters have 40 different terms in them. 
Since these "filters" don't participate in scoring at all, they are good 
candidates for this short-circuiting.
   
   In these benchmarks, I'm observing a 2.3% QPS improvement, and a 3.5% avg. 
latency reduction (5.9% p50 reduction / 3.5% p99 reduction). So the change 
appears to be helping this type of situation.
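The short-circuit idea can be modeled with hypothetical `DocList` and `DisjunctionSketch` classes rather than Lucene's `DisjunctionDISIApproximation`. When only matching matters, `advance(target)` can stop as soon as one clause lands exactly on `target`. The other clauses stay behind until a later target needs them. The PR's actual condition may differ from this model.

```java
import java.util.PriorityQueue;

/** Hypothetical sorted doc-ID list with a counting advance() (not Lucene's DocIdSetIterator). */
final class DocList {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;
  private final int[] docs;
  private int idx = -1;
  int advanceCalls = 0;

  DocList(int... docs) {
    this.docs = docs;
  }

  int docID() {
    if (idx < 0) {
      return -1;
    }
    return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
  }

  /** Moves to the first doc >= target; only called while docID() < target. */
  int advance(int target) {
    advanceCalls++;
    do {
      idx++; // a linear scan is enough for a sketch; real postings use skip data
    } while (idx < docs.length && docs[idx] < target);
    return docID();
  }
}

/** Union of doc lists ordered by current doc, optionally stopping once target is matched. */
final class DisjunctionSketch {
  private final PriorityQueue<DocList> heap =
      new PriorityQueue<>((a, b) -> Integer.compare(a.docID(), b.docID()));
  private final boolean shortCircuit;

  DisjunctionSketch(boolean shortCircuit, DocList... lists) {
    this.shortCircuit = shortCircuit;
    for (DocList l : lists) {
      heap.add(l);
    }
  }

  /** Returns the smallest doc >= target present in any list. */
  int advance(int target) {
    while (heap.peek().docID() < target) {
      DocList top = heap.poll();
      int d = top.advance(target);
      heap.add(top); // re-insert after mutating so the heap order stays valid
      if (shortCircuit && d == target) {
        // one clause matches target exactly, so target is the answer; lagging
        // clauses stay behind and are only advanced if a later target needs them
        return target;
      }
    }
    return heap.peek().docID();
  }

  public static void main(String[] args) {
    for (boolean sc : new boolean[] {false, true}) {
      DocList a = new DocList(1, 4, 10), b = new DocList(1, 6, 10), c = new DocList(2, 10);
      DisjunctionSketch d = new DisjunctionSketch(sc, a, b, c);
      int first = d.advance(1); // targets as a leading required clause might supply them
      int second = d.advance(10);
      int calls = a.advanceCalls + b.advanceCalls + c.advanceCalls;
      System.out.println("shortCircuit=" + sc + ": " + first + ", " + second + " (advance calls: " + calls + ")");
    }
  }
}
```

Both variants return the same docs. The short-circuiting one issues fewer sub-iterator `advance` calls when targets skip ahead. That saving only applies to clauses whose scores are not needed.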
   
   As for whether or not this change would actually hurt other common use-cases 
that require scoring or second-phase checks, I re-ran the `luceneutil` benchmarks 
(wikimedium10m tasks) and don't observe any regressions there (results below). 
It's possible there's a gap in our benchmarks though; maybe there are some 
common use-cases not covered?
   
   ```
                          Task  QPS baseline   StdDev  QPS candidate   StdDev  Pct diff                p-value
               MedSloppyPhrase        115.00   (4.9%)         112.91   (5.1%)     -1.8% ( -11% -   8%) 0.249
             HighTermTitleSort        242.48   (3.2%)         238.66   (4.3%)     -1.6% (  -8% -   6%) 0.189
              HighSloppyPhrase         36.47   (3.8%)          36.12   (3.9%)     -0.9% (  -8% -   7%) 0.439
                       LowTerm       1766.82   (3.4%)        1752.12   (3.3%)     -0.8% (  -7% -   6%) 0.436
                    HighPhrase        263.74   (3.4%)         261.71   (2.3%)     -0.8% (  -6% -   5%) 0.404
                     OrHighLow        796.71   (2.7%)         790.71   (2.6%)     -0.8% (  -5% -   4%) 0.367
          BrowseDateSSDVFacets          3.46   (6.4%)           3.44   (7.1%)     -0.7% ( -13% -  13%) 0.755
             HighTermMonthSort       3070.86   (4.5%)        3051.39   (3.9%)     -0.6% (  -8% -   8%) 0.635
                       Prefix3        111.76   (4.6%)         111.08   (4.2%)     -0.6% (  -8% -   8%) 0.658
                 OrNotHighHigh       1249.27   (3.4%)        1242.30   (3.8%)     -0.6% (  -7% -   6%) 0.627
         BrowseMonthTaxoFacets         35.43   (1.6%)          35.23   (2.2%)     -0.6% (  -4% -   3%) 0.367
               LowSloppyPhrase         61.49   (2.4%)          61.18   (2.5%)     -0.5% (  -5% -   4%) 0.512
                  OrHighNotMed       1139.05   (3.9%)        1133.29   (3.6%)     -0.5% (  -7% -   7%) 0.671
   BrowseRandomLabelTaxoFacets         20.33   (4.5%)          20.23   (5.6%)     -0.5% ( -10% -  10%) 0.760
                      HighTerm       1635.45   (4.2%)        1628.28   (3.9%)     -0.4% (  -8% -   8%) 0.735
                     MedPhrase         46.41   (2.3%)          46.22   (1.6%)     -0.4% (  -4% -   3%) 0.529
                     OrHighMed        193.55   (2.8%)         192.79   (2.9%)     -0.4% (  -5% -   5%) 0.663
                 OrHighNotHigh        865.20   (3.0%)         862.38   (3.9%)     -0.3% (  -6% -   6%) 0.766
                    AndHighLow       1566.83   (2.7%)        1562.47   (2.7%)     -0.3% (  -5% -   5%) 0.745
          MedTermDayTaxoFacets         48.00   (3.5%)          47.89   (3.6%)     -0.2% (  -7% -   7%) 0.836
         HighTermDayOfYearSort        812.55   (2.8%)         811.49   (2.5%)     -0.1% (  -5% -   5%) 0.878
                       MedTerm       2390.59   (3.5%)        2387.70   (3.5%)     -0.1% (  -6% -   7%) 0.912
          BrowseDateTaxoFacets         25.05   (9.2%)          25.03   (9.3%)     -0.1% ( -16% -  20%) 0.984
         BrowseMonthSSDVFacets         16.00  (18.9%)          15.99  (19.0%)     -0.1% ( -31% -  46%) 0.992
           LowIntervalsOrdered        276.77   (3.5%)         276.66   (3.6%)     -0.0% (  -6% -   7%) 0.972
                    OrHighHigh         26.41   (4.4%)          26.40   (4.4%)     -0.0% (  -8% -   9%) 0.978
                   MedSpanNear         35.38   (1.0%)          35.38   (1.1%)     -0.0% (  -2% -   2%) 0.969
                    TermDTSort        648.08   (1.0%)         648.35   (1.3%)      0.0% (  -2% -   2%) 0.909
     BrowseDayOfYearTaxoFacets         25.12   (9.5%)          25.13   (9.5%)      0.0% ( -17% -  21%) 0.988
                  OrNotHighLow       1192.74   (3.1%)        1193.69   (2.6%)      0.1% (  -5% -   5%) 0.929
          HighIntervalsOrdered          1.79   (3.3%)           1.79   (3.3%)      0.1% (  -6

[GitHub] [lucene] uschindler commented on pull request #11982: Change ByteBuffersDataInput and ByteBuffersIndexInput to use absolute addressing

2022-11-29 Thread GitBox


uschindler commented on PR #11982:
URL: https://github.com/apache/lucene/pull/11982#issuecomment-1331324696

   P.S. My comments are mostly about ByteBufferIndexInput, used in 
MMapDirectory. I did not write the other code.





[GitHub] [lucene-solr] zkendall commented on a diff in pull request #976: SOLR-13749: Implement support for joining across collections with multiple shards

2022-11-29 Thread GitBox


zkendall commented on code in PR #976:
URL: https://github.com/apache/lucene-solr/pull/976#discussion_r1035387914


##
solr/core/src/java/org/apache/solr/search/join/XCJFQuery.java:
##
@@ -0,0 +1,380 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.search.join;
+
+import java.io.IOException;
+import java.util.Locale;
+import java.util.Map;
+import java.util.Objects;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.index.PostingsEnum;
+import org.apache.lucene.index.Terms;
+import org.apache.lucene.index.TermsEnum;
+import org.apache.lucene.search.ConstantScoreScorer;
+import org.apache.lucene.search.ConstantScoreWeight;
+import org.apache.lucene.search.DocIdSet;
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.search.IndexSearcher;
+import org.apache.lucene.search.Query;
+import org.apache.lucene.search.QueryVisitor;
+import org.apache.lucene.search.ScoreMode;
+import org.apache.lucene.search.Scorer;
+import org.apache.lucene.search.Weight;
+import org.apache.lucene.util.BytesRefBuilder;
+import org.apache.lucene.util.FixedBitSet;
+import org.apache.solr.client.solrj.io.SolrClientCache;
+import org.apache.solr.client.solrj.io.Tuple;
+import org.apache.solr.client.solrj.io.eq.FieldEqualitor;
+import org.apache.solr.client.solrj.io.stream.CloudSolrStream;
+import org.apache.solr.client.solrj.io.stream.SolrStream;
+import org.apache.solr.client.solrj.io.stream.StreamContext;
+import org.apache.solr.client.solrj.io.stream.TupleStream;
+import org.apache.solr.client.solrj.io.stream.UniqueStream;
+import org.apache.solr.client.solrj.io.stream.expr.StreamExpression;
+import org.apache.solr.client.solrj.io.stream.expr.StreamExpressionNamedParameter;
+import org.apache.solr.cloud.CloudDescriptor;
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.cloud.ClusterState;
+import org.apache.solr.common.cloud.DocRouter;
+import org.apache.solr.common.cloud.Slice;
+import org.apache.solr.common.params.CommonParams;
+import org.apache.solr.common.params.ModifiableSolrParams;
+import org.apache.solr.common.params.SolrParams;
+import org.apache.solr.schema.FieldType;
+import org.apache.solr.search.BitDocSet;
+import org.apache.solr.search.DocSet;
+import org.apache.solr.search.DocSetUtil;
+import org.apache.solr.search.Filter;
+import org.apache.solr.search.SolrIndexSearcher;
+
+public class XCJFQuery extends Query {
+
+  protected final String query;
+  protected final String zkHost;
+  protected final String solrUrl;
+  protected final String collection;
+  protected final String fromField;
+  protected final String toField;
+  protected final boolean routedByJoinKey;
+
+  protected final long timestamp;
+  protected final int ttl;
+
+  protected SolrParams otherParams;
+  protected String otherParamsString;
+
+  public XCJFQuery(String query, String zkHost, String solrUrl, String collection, String fromField, String toField,
+      boolean routedByJoinKey, int ttl, SolrParams otherParams) {
+
+    this.query = query;
+    this.zkHost = zkHost;
+    this.solrUrl = solrUrl;
+    this.collection = collection;
+    this.fromField = fromField;
+    this.toField = toField;
+    this.routedByJoinKey = routedByJoinKey;
+
+    this.timestamp = System.nanoTime();
+    this.ttl = ttl;
+
+    this.otherParams = otherParams;
+    // SolrParams doesn't implement equals(), so use this string to compare them
+    if (otherParams != null) {
+      this.otherParamsString = otherParams.toString();
+    }
+  }
+
+  private interface JoinKeyCollector {
+    void collect(Object value) throws IOException;
+    DocSet getDocSet() throws IOException;
+  }
+
+  private class TermsJoinKeyCollector implements JoinKeyCollector {
+
+    FieldType fieldType;
+    SolrIndexSearcher searcher;
+
+    TermsEnum termsEnum;
+    BytesRefBuilder bytes;
+    PostingsEnum postingsEnum;
+
+    FixedBitSet bitSet;
+
+    public TermsJoinKeyCollector(FieldType fieldType, Terms terms, SolrIndexSearcher searcher) throws IOException {
+      this.fieldType = fieldType;
+      this.searcher = searcher;
+
+      termsEnum = terms.i

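The TermsJoinKeyCollector in the diff holds a TermsEnum, a PostingsEnum and a FixedBitSet. This suggests that it marks every local document whose join field contains an incoming key. The following is a minimal sketch of that idea under stated assumptions: an in-memory Map from term to doc IDs stands in for TermsEnum/PostingsEnum, java.util.BitSet stands in for FixedBitSet, keys are Strings rather than Object, and the class name is hypothetical. It is not the code from the PR.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not Solr's TermsJoinKeyCollector.
public class JoinKeyCollectorSketch {

  // Toy stand-in for a field's terms dictionary plus postings:
  // term -> doc IDs that contain the term (assumption, not Lucene's API).
  private final Map<String, int[]> postingsByTerm;
  private final BitSet matches;

  JoinKeyCollectorSketch(Map<String, int[]> postingsByTerm, int maxDoc) {
    this.postingsByTerm = postingsByTerm;
    this.matches = new BitSet(maxDoc);
  }

  // For each join key coming from the "from" side, look the term up on the
  // "to" field and mark every local document that contains it.
  void collect(String joinKey) {
    int[] docs = postingsByTerm.get(joinKey);
    if (docs == null) {
      return; // key absent from this index: nothing to mark
    }
    for (int doc : docs) {
      matches.set(doc);
    }
  }

  BitSet getDocSet() {
    return matches;
  }

  public static void main(String[] args) {
    Map<String, int[]> index = Map.of(
        "a", new int[] {0, 3},
        "b", new int[] {1},
        "c", new int[] {3, 4});
    JoinKeyCollectorSketch collector = new JoinKeyCollectorSketch(index, 5);
    for (String key : List.of("a", "c", "zzz")) {
      collector.collect(key);
    }
    System.out.println(collector.getDocSet()); // {0, 3, 4}
  }
}
```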
[GitHub] [lucene] DaddyWri commented on issue #11883: Spatial3d: Wrong intersection detected between small polygons

2022-11-29 Thread GitBox


DaddyWri commented on issue #11883:
URL: https://github.com/apache/lucene/issues/11883#issuecomment-1331472645

   Feel free to.  It should be a simple cherry-pick.  I'm tied up with work 
escalations myself.


[GitHub] [lucene] rmuir commented on pull request #11985: ExitableTerms to override getMin and getMax

2022-11-29 Thread GitBox


rmuir commented on PR #11985:
URL: https://github.com/apache/lucene/pull/11985#issuecomment-1331604920

   I'm not opposed to the PR; I just disagree with the bug or trap aspect. To me 
this is just a micro-optimization and I'm questioning the need to make anything 
abstract in our APIs (which makes them harder to implement).
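The PR description says a wrapper should not iterate through the terms to compute min and max when the wrapped implementation already has those values cached. Here is a minimal sketch of that difference, assuming simplified hypothetical types (SimpleTerms and the wrappers below are not Lucene's Terms, FilterTerms or ExitableTerms). A wrapper that only delegates iterator() falls back to a scan, while one that also forwards getMin()/getMax() reuses the delegate's cached values.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration; none of these classes are Lucene's.
public class DelegatingMinMaxSketch {

  // Simplified terms abstraction whose default getMin()/getMax() walk the
  // sorted terms, mirroring the idea of a non-abstract fallback.
  abstract static class SimpleTerms {
    abstract Iterator<String> iterator();

    String getMin() {
      Iterator<String> it = iterator();
      return it.hasNext() ? it.next() : null;
    }

    String getMax() {
      String last = null;
      for (Iterator<String> it = iterator(); it.hasNext(); ) {
        last = it.next(); // linear scan over every term
      }
      return last;
    }
  }

  // Implementation that already knows its bounds and counts iterator() calls.
  static class CachedBoundsTerms extends SimpleTerms {
    final List<String> sorted;
    int iteratorCalls = 0;

    CachedBoundsTerms(List<String> sorted) {
      this.sorted = sorted;
    }

    @Override
    Iterator<String> iterator() {
      iteratorCalls++;
      return sorted.iterator();
    }

    @Override
    String getMin() {
      return sorted.isEmpty() ? null : sorted.get(0);
    }

    @Override
    String getMax() {
      return sorted.isEmpty() ? null : sorted.get(sorted.size() - 1);
    }
  }

  // Wrapper that only delegates iterator(): min/max fall back to the scan.
  static class NaiveWrapper extends SimpleTerms {
    final SimpleTerms in;

    NaiveWrapper(SimpleTerms in) {
      this.in = in;
    }

    @Override
    Iterator<String> iterator() {
      return in.iterator();
    }
  }

  // Wrapper that also forwards getMin()/getMax() to the delegate.
  static class ForwardingWrapper extends NaiveWrapper {
    ForwardingWrapper(SimpleTerms in) {
      super(in);
    }

    @Override
    String getMin() {
      return in.getMin();
    }

    @Override
    String getMax() {
      return in.getMax();
    }
  }

  public static void main(String[] args) {
    CachedBoundsTerms terms = new CachedBoundsTerms(List.of("apple", "kiwi", "pear"));
    new NaiveWrapper(terms).getMax();
    System.out.println("naive iterator calls: " + terms.iteratorCalls); // 1
    new ForwardingWrapper(terms).getMax();
    System.out.println("forwarding iterator calls: " + terms.iteratorCalls); // 1
  }
}
```

In this sketch, forwarding is an optional override and not an abstract method, which matches the concern above: making the methods abstract would force every implementation to provide them.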


[GitHub] [lucene] rmuir commented on pull request #11985: ExitableTerms to override getMin and getMax

2022-11-29 Thread GitBox


rmuir commented on PR #11985:
URL: https://github.com/apache/lucene/pull/11985#issuecomment-1331608776

   Definitely not a bug.


[GitHub] [lucene] rmuir commented on pull request #11987: Make Decompressor release memory buffer

2022-11-29 Thread GitBox


rmuir commented on PR #11987:
URL: https://github.com/apache/lucene/pull/11987#issuecomment-1331625234

   Too many shards. We need to make sure this doesn't cause a performance regression 
for normal use cases.


[GitHub] [lucene] rmuir commented on pull request #11987: Make Decompressor release memory buffer

2022-11-29 Thread GitBox


rmuir commented on PR #11987:
URL: https://github.com/apache/lucene/pull/11987#issuecomment-1331627557

   FWIW, assigning the 0-length array just creates even more waste. Keeping the 
logic that uses ArrayUtil.grow to oversize arrays that won't be reused only adds 
more waste.

   It's better to assign null and create an array of the correct size if it won't be 
reused.
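The advice above separates two buffer strategies: oversize a buffer that will be reused, and allocate exactly the needed size (retaining nothing) for a buffer that won't be. This is a minimal sketch under assumptions: the class is hypothetical and not Lucene's Decompressor, and the 1.5x growth rule is an illustrative stand-in for Lucene's ArrayUtil.grow, whose actual oversizing policy differs.

```java
// Hypothetical sketch; not Lucene's Decompressor or ArrayUtil.
public class DecompressBufferSketch {

  // Scratch buffer retained between calls; null when nothing is retained.
  private byte[] buffer;

  // Reused path: grow with headroom so repeated calls with slowly increasing
  // lengths don't reallocate every time. The 1.5x rule is an assumption.
  // Contents are not preserved because the buffer is treated as scratch space.
  byte[] bufferForReuse(int length) {
    if (buffer == null || buffer.length < length) {
      int extra = length >> 1;
      int newSize = length > Integer.MAX_VALUE - extra ? length : length + extra;
      buffer = new byte[newSize];
    }
    return buffer;
  }

  // One-shot path: the array won't be reused, so allocate exactly what is
  // needed and drop any retained reference (null, not a 0-length array).
  byte[] bufferForOneShot(int length) {
    buffer = null;
    return new byte[length];
  }

  boolean retainsBuffer() {
    return buffer != null;
  }

  public static void main(String[] args) {
    DecompressBufferSketch s = new DecompressBufferSketch();
    System.out.println(s.bufferForReuse(100).length);   // 150
    System.out.println(s.bufferForReuse(120).length);   // 150, no reallocation
    System.out.println(s.bufferForOneShot(120).length); // 120
    System.out.println(s.retainsBuffer());              // false
  }
}
```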


[GitHub] [lucene] DaddyWri commented on issue #11883: Spatial3d: Wrong intersection detected between small polygons

2022-11-29 Thread GitBox


DaddyWri commented on issue #11883:
URL: https://github.com/apache/lucene/issues/11883#issuecomment-1331740074

   If not, I may have time this weekend; we'll see. Perhaps we should also 
look at GeoConcavePolygon first, though.