[GitHub] [lucene] javanna opened a new pull request, #11985: ExitableTerms to override getMin and getMax
javanna opened a new pull request, #11985: URL: https://github.com/apache/lucene/pull/11985 ExitableTerms should not iterate through the terms to retrieve min and max when the wrapped implementation has the values cached (e.g. OrdsFieldReader) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] iverase opened a new issue, #11986: Polygons failing to tessellate
iverase opened a new issue, #11986: URL: https://github.com/apache/lucene/issues/11986 ### Description A user of Elasticsearch reported a few polygons that are failing tessellation, for example: ``` { "type": "MultiPolygon", "coordinates": [ [ [ [ 145.8376722, -41.3625237 ], [ 145.9119483, -41.3625237 ], [ 145.9124671, -41.3604861 ], [ 145.9165094, -41.3592481 ], [ 145.918536, -41.3618518 ], [ 145.9268395, -41.3620262 ], [ 145.9268834, -41.3625237 ], [ 145.9296057, -41.3625237 ], [ 145.930067, -41.3596353 ], [ 145.9359791, -41.3604495 ], [ 145.9338908, -41.3623443 ], [ 145.9337218, -41.3625237 ], [ 145.9406629, -41.3625237 ], [ 145.9443986, -41.3614739 ], [ 145.9452851, -41.355895 ], [ 145.9399764, -41.3553728 ], [ 145.9386004, -41.3536032 ], [ 145.9338835, -41.3528597 ], [ 145.9329233, -41.3550551 ], [ 145.9312052, -41.3560932 ], [ 145.9291784, -41.3534897 ], [ 145.9278489, -41.3527145 ], [ 145.9269703, -41.350457 ], [ 145.9290943, -41.3508852 ], [ 145.9416658, -41.3508101 ], [ 145.9427361, -41.3480878 ], [ 145.9451005, -41.3493199 ], [ 145.9451182, -41.3511633 ], [ 145.9475533, -41.3501138 ], [ 145.9492273, -41.3481171 ], [ 145.9510286, -41.348866 ], [ 145.9510476, -41.3507 345 ], [ 145.9539715, -41.3514479 ], [ 145.9540082, -41.3551093 ], [ 145.9570346, -41.3552711 ], [ 145.9569774, -41.3494518 ], [ 145.9596279, -41.3485645 ], [ 145.9601364, -41.347088 ], [ 145.9574997, -41.3464911 ], [ 145.956217, -41.344634 ], [ 145.9536965, -41.3434193 ], [ 145.9543658, -41.3396175 ], [ 145.9602405, -41.33979 ], [ 145.9608139, -41.3352189 ], [ 145.9742096, -41.3356161 ], [ 145.9763708, -41.332821 ], [ 145.980498, -41.3332742 ], [ 145.9817811, -41.3351309 ], [ 145.9866889, -41.3355163 ], [ 145.987522, -41.3378172 ], [ 145.9892258, -41.3372247 ], [ 145.9911234, -41.3352083 ], [ 145.9995898, -41.3360013 ], [ 145.9990297, -41.3400796 ], [ 145.9955299, -41.3402743 ], [ 145.9969167, -41.3421148 ], [ 146.0, -41.3424174 ], [ 146.0, -41.3174825 ], [ 145.9960043, -41.3171988 ], [ 
145.9969866, -41.3153884 ], [ 146.0, -41.3155285 ], [ 146.0, -41.2276172 ], [ 145.9324631, -41.2276172 ], [ 145.9327736, -41.2277122 ], [ 145.9316015, -41.2315586 ], [ 145.9286354, -41.2311096 ], [ 145.9292451, -41.2276172 ], [ 145.8632073, -41.2276172 ], [ 145.8630275, -41.2282061 ], [ 145.8600617, -41.2277558 ], [ 145.860086, -41.2276172 ], [ 145.8485946, -41.2276172 ], [ 145.8484023, -41.2281236 ], [ 145.8456234, -41.2293415 ], [ 145.8455167, -41.2347984 ], [ 145.849049, -41.2356306 ], [ 145.8478493, -41.2411272 ], [ 145.8455135, -41.2421517 ], [ 145.845, -41.2533017 ], [ 145.8376722, -41.2529579 ], [ 145.8376722, -41.3018446 ], [ 145.8380557, -41.3025436 ], [ 145.8376722, -41.3024854 ], [ 145.8376722, -41.3504352 ], [ 145.8381764, -41.3505628 ], [ 145.8376722, -41.3509186 ], [ 145.8376722, -41.3625237 ] ], [ [ 145.9176756, -41.3534899 ], [ 145.9186036, -41.3469601 ], [ 145.921422, -41.3459251 ], [ 145.9211254, -41.3480053 ], [ 145.9262939, -41.349043 ], [ 145.9269703, -41.350457 ], [ 145.9216876, -41.3512921 ], [ 145.920889, -41.3524534 ], [ 145.9176756, -41.3534899 ] ], [ [ 145.933483, -41.3211897 ], [ 145.9338125, -41.319729 ], [ 145.9396134, -41.3183894 ], [ 145.941287, -41.3163932 ], [ 145.9428066, -41.3187685 ], [ 145.939747, -41.3201429 ], [ 145.9380912, -41.3221196 ], [ 145.933483, -41.3211897 ] ], [ [ 145.8947524, -41.3300813 ], [ 145.8970756, -41.3298815 ], [ 145.8974562, -41.3326266 ], [ 145.8951329, -41.3328264 ], [ 145.8947524, -41.3300813 ] ], [ [ 145.8677917, -41.3343222 ], [ 145.8690228, -41.3318872 ], [ 145.8733923, -41.3323222 ], [ 145.8742228, -41.3346236 ], [ 145.8677917, -41.3343222 ] ], [ [ 145.8558352, -41.3244434 ], [ 145.8576672, -41.3227166 ], [ 145.8587641, -41.3248396 ], [ 145.8558352, -41.3244434 ] ], [ [ 145.8584175, -41.3336907 ], [ 145.8594025, -41.3318809 ], [ 145.8619189, -41.3323305 ], [ 145.861191, -41.3338904 ], [ 145.8584175, -41.3336907 ] ], [ [ 145.8733418, -41.3405334 ], [ 145.8751737, -41.3388063 ], [ 145.8762712, 
-41.3409293 ], [ 145.8733418, -41.3405334 ] ], [ [ 145.8762712, -41.3409293 ], [ 145.8795968, -41.3419639 ], [ 145.8785325, -41.3457195 ], [ 145.8772987, -41.3451165 ], [ 145.8762712, -41.3409293 ] ] , [ [ 145.8849106, -41.343891 ], [ 145.8883959, -41.3435914 ], [ 145.8886495, -41.3454216 ], [ 145.8851641, -41.3457212 ], [ 145.8849106, -41.343891 ] ], [ [ 145.8817582, -41.3421182 ], [ 145.8859167, -41.3405961 ], [ 145.8849106, -41.343891 ], [ 145.8817582, -41.3421182 ] ], [ [ 145.8788123, -41.3378897 ], [ 145.8819175, -41.3363389 ], [ 145.883115, -41.3394154 ], [ 145.8788123, -41.3378897 ] ], [ [ 145.8853326, -41.3299653 ], [ 145.175, -41.3296657 ], [ 145.8891978, -41.3324108 ], [ 145.8857129, -41.3327104 ], [ 145.8853326, -41.3299653 ] ], [ [ 145.9059876, -41.3263368 ], [ 145.9094723, -41.3260368 ], [ 145.9097262, -41.3278668 ], [ 145.9062415, -41.3281668 ], [ 145.9059876, -41.3263368 ] ], [ [ 145.897955
[GitHub] [lucene] jpountz commented on a diff in pull request #11984: Add exponential growth to TimeLimitingBulkScorer
jpountz commented on code in PR #11984: URL: https://github.com/apache/lucene/pull/11984#discussion_r1034826332

## lucene/core/src/test/org/apache/lucene/search/TestTimeLimitingBulkScorer.java: ##

@@ -62,6 +66,44 @@ public void testTimeLimitingBulkScorer() throws Exception {
     directory.close();
   }
+  public void testExponentialRate() throws Exception {
+    var bulkScorer =
+        new BulkScorer() {
+          int expectedInterval = TimeLimitingBulkScorer.INTERVAL;
+          int lastInterval = 0;
+          int runs = TestUtil.nextInt(random(), 1, 100);
+
+          @Override
+          public int score(LeafCollector collector, Bits acceptDocs, int min, int max)
+              throws IOException {
+            var difference = max - min;
+            // the rate shouldn't overflow - only increase or remain equal
+            assertTrue("Rate should only go up", difference >= lastInterval);
+            assertEquals("Incorrect rate encountered", expectedInterval, difference);
+
+            lastInterval = difference;
+            // use integer sum since the exponential growth formula yields different result due to
+            // rounding
+            expectedInterval = expectedInterval + expectedInterval / 2;
+            // overflow - stop at the previous one
+            if (expectedInterval < 0) {
+              expectedInterval = lastInterval;
+            }
+            // keep going or finish the test?
+            return --runs == 0 ? DocIdSetIterator.NO_MORE_DOCS : 0;

Review Comment: Why would we end prematurely instead of doing the checks on the full range? Is it to avoid the corner case of the last range, which might be smaller than the expected interval? Maybe the bulk scorer could record the segment's `maxDoc` and allow the difference to be less than `expectedInterval` if the `max` doc is equal to `maxDoc`?
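The growth rule exercised by the test above (add half of the current interval using integer arithmetic, and hold at the previous value when the sum overflows) can be sketched in isolation. This is a minimal sketch, not the PR's code: the class `IntervalGrowth`, its method names, and the starting value 100 used in the usage note are hypothetical, since the value of `TimeLimitingBulkScorer.INTERVAL` is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the interval growth rule from the test.
final class IntervalGrowth {

  /** Grows the interval by 50% with integer arithmetic; keeps the current value on overflow. */
  static int next(int current) {
    int grown = current + current / 2;
    // For a positive int, an overflowing sum wraps around to a negative value.
    return grown < 0 ? current : grown;
  }

  /** Returns the first {@code steps} intervals, starting from {@code start}. */
  static List<Integer> sequence(int start, int steps) {
    List<Integer> out = new ArrayList<>();
    int interval = start;
    for (int i = 0; i < steps; i++) {
      out.add(interval);
      interval = next(interval);
    }
    return out;
  }
}
```

With a starting interval of 100, `IntervalGrowth.sequence(100, 4)` yields 100, 150, 225, 337. The integer division is why the test tracks the expected value with a running sum instead of a closed-form power of 1.5.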
[GitHub] [lucene] luyuncheng opened a new pull request, #11987: Make Decompressor release memory buffer
luyuncheng opened a new pull request, #11987: URL: https://github.com/apache/lucene/pull/11987

### Description

We have an Elasticsearch cluster (31G heap, 96G memory, 30 instance nodes) with many shards per node (about 4000). When the nodes serve many bulk and search requests concurrently, JVM heap usage climbs and the memory is not released, even with frequent GC and after stopping all write and search requests. We have to restart the node to recover the heap, as our GC metrics show.

A heap dump shows that `CompressingStoredFieldsReader` occupied 70% of the heap. The paths to GC roots for all of these readers go through thread locals (in search or write threads).

### Root cause

I think the root cause is that these thread locals hold the referent, because `SegmentReader#getFieldsReader` calls the following code, and Elasticsearch always uses fixed thread pools and never __calls `CloseableThreadLocal#purge`__:

```
// Defined in lucene/core/src/java/org/apache/lucene/index/SegmentCoreReaders.java
final CloseableThreadLocal<StoredFieldsReader> fieldsReaderLocal =
    new CloseableThreadLocal<StoredFieldsReader>() {
      @Override
      protected StoredFieldsReader initialValue() {
        return fieldsReaderOrig.clone();
      }
    };
```

We searched for related issues such as [LUCENE-9959](https://issues.apache.org/jira/browse/LUCENE-9959) and [LUCENE-10419](https://issues.apache.org/jira/browse/LUCENE-10519), but found no answer to this problem.

---

After comparing different JVM heap sizes and different Lucene versions, I think the root cause is that `LZ4WithPresetDictDecompressor` allocates a buffer as a field of the class and initializes it in the constructor:

```
LZ4WithPresetDictDecompressor() {
  compressedLengths = new int[0];
  buffer = new byte[0];
}
```

When the Elasticsearch instance performs stored-fields reads, this buffer is reallocated on the JVM heap but never released, because the Elasticsearch `currentEngineReference` keeps the reference alive.

### Proposal

I think we can release this buffer memory when decompression is done. Our heap metrics show that the JVM can hold more segment readers in the heap once this buffer memory can be released.
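The proposal can be sketched independently of Lucene. The following is a minimal illustration under stated assumptions, not the code of `LZ4WithPresetDictDecompressor`: the class `ReleasingBufferSketch` and its methods are hypothetical, and a plain copy stands in for real decompression. It only shows the idea of resetting a grown scratch buffer to an empty array once the call has finished.

```java
import java.util.Arrays;

// Hypothetical stand-in for a decompressor that keeps a scratch buffer as a field.
final class ReleasingBufferSketch {
  private static final byte[] EMPTY = new byte[0];

  // Scratch buffer; starts empty, as in the constructor quoted above.
  private byte[] buffer = EMPTY;

  /** Copies {@code src} through the scratch buffer (a stand-in for decompression). */
  byte[] process(byte[] src) {
    if (buffer.length < src.length) {
      buffer = new byte[src.length]; // grow on demand
    }
    System.arraycopy(src, 0, buffer, 0, src.length);
    byte[] result = Arrays.copyOf(buffer, src.length);
    // Drop the grown array so that a long-lived instance does not pin it on the heap.
    buffer = EMPTY;
    return result;
  }

  /** Number of scratch bytes still referenced by this instance. */
  int retainedBytes() {
    return buffer.length;
  }
}
```

The trade-off is that every call allocates again, so a change like this exchanges steady-state heap usage for extra allocation work per read.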
[GitHub] [lucene] iverase opened a new pull request, #11988: Fix algorithm that chooses the bridge between a polygon and a hole
iverase opened a new pull request, #11988: URL: https://github.com/apache/lucene/pull/11988 The current algorithm seems to fail when the bridge is located on the first node of the iteration and there is another vertex with the same x and y. In that case we seem unable to find the right node because we never actually compute the tangent of the first node. This change alters the iteration to make sure we compute the tangent for all nodes of the polygon, which seems to help always find the right bridge. We remove some logic we (well, it was me) added to try to fix this issue. fixes #11986
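The kind of iteration problem described here, where a loop over a circular node list never processes its starting node, can be shown with a small sketch. This is not the tessellator's code: `RingIteration`, `Node`, and both visit methods are hypothetical, and counting nodes stands in for computing a tangent per node.

```java
// Hypothetical circular linked list, used only to contrast two loop shapes.
final class RingIteration {

  static final class Node {
    final double x, y;
    Node next;

    Node(double x, double y) {
      this.x = x;
      this.y = y;
    }
  }

  /** Builds a circular singly linked ring from a non-empty array of [x, y] pairs. */
  static Node ring(double[][] pts) {
    Node first = null;
    Node prev = null;
    for (double[] p : pts) {
      Node n = new Node(p[0], p[1]);
      if (first == null) {
        first = n;
      } else {
        prev.next = n;
      }
      prev = n;
    }
    prev.next = first; // close the ring
    return first;
  }

  /** Loop shape that starts at start.next and stops at start: the start node is never processed. */
  static int visitSkippingStart(Node start) {
    int visited = 0;
    for (Node n = start.next; n != start; n = n.next) {
      visited++;
    }
    return visited;
  }

  /** do-while shape: every node, including the start node, is processed exactly once. */
  static int visitAll(Node start) {
    int visited = 0;
    Node n = start;
    do {
      visited++;
      n = n.next;
    } while (n != start);
    return visited;
  }
}
```

On a ring of four vertices, `visitSkippingStart` processes three nodes and `visitAll` processes four, which is the difference that matters when the correct answer happens to be the starting node.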
[GitHub] [lucene] jpountz commented on pull request #11982: Change ByteBuffersDataInput and ByteBuffersIndexInput to use absolute addressing
jpountz commented on PR #11982: URL: https://github.com/apache/lucene/pull/11982#issuecomment-1330752728 > This shows 10-20% improvement in SSDVFacets and IntNRQ tests in lucenebench, Woah, impressive! Can you share the luceneutil output? The change looks good to me but I'd also like @uschindler to have a look.
[GitHub] [lucene] iverase commented on issue #11883: Spatial3d: Wrong intersection detected between small polygons
iverase commented on issue #11883: URL: https://github.com/apache/lucene/issues/11883#issuecomment-1330777875 Should we backport it to branch_9x?
[GitHub] [lucene] thecoop commented on pull request #11982: Change ByteBuffersDataInput and ByteBuffersIndexInput to use absolute addressing
thecoop commented on PR #11982: URL: https://github.com/apache/lucene/pull/11982#issuecomment-1330803516 [bytebuffer-get-output.log](https://github.com/apache/lucene/files/10114303/bytebuffer-get-output.log) - used wikimedium1m on my local machine Whilst some tests show as much as 40% improvement, there are also some tests that show a 5-6% regression. I can't say I fully understand what this means.
[GitHub] [lucene] rmuir commented on pull request #11985: ExitableTerms to override getMin and getMax
rmuir commented on PR #11985: URL: https://github.com/apache/lucene/pull/11985#issuecomment-1330867303 I don't really think it is especially trappy, since the default implementation is `O(log N)` and works consistently with FilterTerms classes by default even if they are actually filtering the terms data in some way. But seems fine to look at making it abstract (as separate change), as long as there is an easy way to opt-in to the existing binary search impl. Would not be good to see that duplicated across a bunch of simple Terms subclasses (e.g. in indexer, in term vectors, docvalues, whatever).
[GitHub] [lucene] rmuir commented on pull request #11985: ExitableTerms to override getMin and getMax
rmuir commented on PR #11985: URL: https://github.com/apache/lucene/pull/11985#issuecomment-1330875083 Also I don't know what `OrdsFieldsReader` is, but the default Terms implementation is `O(1)` when the Terms subclass supports seek-by-ord. So what am I missing that makes it a trap?
[GitHub] [lucene] hendrikmuhs commented on pull request #460: LUCENE-10247 - reduce size of FSTs by relative coding
hendrikmuhs commented on PR #460: URL: https://github.com/apache/lucene/pull/460#issuecomment-1330949224 Update after a long time. This branch was outdated, the best way to resurrect it was a _rebase_. However it seems github managed to keep the comments at the right places. If I can trust test coverage this is fully functional now. I will try to find a good benchmark, to see how much storage can be saved with this in a real scenario. If someone has a hint, let me know.
[GitHub] [lucene] javanna commented on pull request #11985: ExitableTerms to override getMin and getMax
javanna commented on PR #11985: URL: https://github.com/apache/lucene/pull/11985#issuecomment-1331185580 Credit goes to @dnhatn for pointing me to the bug, thanks! I am more than happy to fix it and see what else we can do to avoid this in the future.
[GitHub] [lucene] dnhatn commented on pull request #11985: ExitableTerms to override getMin and getMax
dnhatn commented on PR #11985: URL: https://github.com/apache/lucene/pull/11985#issuecomment-1331199248 `FieldReader` (i.e., blocktree implementation) returns minTerm and [maxTerm](https://github.com/apache/lucene/blob/0cc6f695363419ab0f89e2bef5e7595ace077345/lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/FieldReader.java) without doing any I/O, while the default implementation in `Terms` might use I/O for retrieving them.
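The concern raised in this thread, a wrapper falling back to a default min/max computation even though the wrapped instance already knows the answer, can be sketched without Lucene. Everything below is a hypothetical, heavily simplified model: `SimpleTerms` is not Lucene's `Terms` API, and scanning a sorted list stands in for whatever work (possibly I/O) the default implementation does.

```java
import java.util.List;

// Hypothetical model of a terms dictionary with cached min/max and two wrappers.
final class MinMaxDelegation {

  /** Simplified stand-in for a terms dictionary; the defaults derive min/max from the terms. */
  interface SimpleTerms {
    List<String> sortedTerms();

    default String getMin() {
      List<String> t = sortedTerms();
      return t.isEmpty() ? null : t.get(0);
    }

    default String getMax() {
      List<String> t = sortedTerms();
      return t.isEmpty() ? null : t.get(t.size() - 1);
    }
  }

  /** Implementation that caches min and max, and counts how often the terms are accessed. */
  static final class CachedTerms implements SimpleTerms {
    int scans = 0;
    private final List<String> terms;
    private final String min;
    private final String max;

    CachedTerms(List<String> sorted) {
      this.terms = sorted;
      this.min = sorted.isEmpty() ? null : sorted.get(0);
      this.max = sorted.isEmpty() ? null : sorted.get(sorted.size() - 1);
    }

    @Override
    public List<String> sortedTerms() {
      scans++; // stands in for the cost of touching the terms data
      return terms;
    }

    @Override
    public String getMin() {
      return min;
    }

    @Override
    public String getMax() {
      return max;
    }
  }

  /** Wrapper that does not forward getMin/getMax, so the interface defaults run. */
  static class NaiveWrapper implements SimpleTerms {
    final SimpleTerms in;

    NaiveWrapper(SimpleTerms in) {
      this.in = in;
    }

    @Override
    public List<String> sortedTerms() {
      return in.sortedTerms();
    }
  }

  /** Wrapper that forwards getMin/getMax to the wrapped instance. */
  static final class ForwardingWrapper extends NaiveWrapper {
    ForwardingWrapper(SimpleTerms in) {
      super(in);
    }

    @Override
    public String getMin() {
      return in.getMin();
    }

    @Override
    public String getMax() {
      return in.getMax();
    }
  }
}
```

Calling `getMin()` on a `NaiveWrapper` increments `scans` on the wrapped `CachedTerms`, while the same call on a `ForwardingWrapper` returns the cached value without touching the terms.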
[GitHub] [lucene] javanna commented on pull request #11985: ExitableTerms to override getMin and getMax
javanna commented on PR #11985: URL: https://github.com/apache/lucene/pull/11985#issuecomment-1331233058 Besides the I/O aspect, I found it counterintuitive that min and max are known and we end up doing work to compute them again. Even if it's not a lot of work, it seems like we could avoid it?
[GitHub] [lucene] uschindler commented on pull request #11982: Change ByteBuffersDataInput and ByteBuffersIndexInput to use absolute addressing
uschindler commented on PR #11982: URL: https://github.com/apache/lucene/pull/11982#issuecomment-1331252650 Hi, I can only look at this on the weekend as I am at a customer this week. When reading the description I was on the wrong path anyway: I did not understand what you wanted to change, because the getLongs() and getFloats() calls on IndexInput are always relative. I was not aware that you were talking about that crazy set of different view buffers for long/float that currently use the position() call to copy the position from the main buffer. MY humble OPINION: I never agreed to that code and it is/was heavily broken (my personal opinion). Whenever I see that code I quickly look at other places just so I don't have to look at it any longer (I get some "I need to puke!" reaction every time I see it). In short: I would not spend too much time on byte buffers, sorry. MemorySegment is the way to go. In MemorySegmentIndexInput reading is a one-liner and no views are needed, because MemorySegments allow unaligned accesses. In Java 19 you can also convert a ByteBuffer to a MemorySegment so you need no views anymore - I was thinking about at least fixing those 2 methods in ByteBuffer*s*IndexInput. About your benchmark, I do not trust it at the moment, because your code shows the following warning on startup: "WARNING: Using incubator modules: jdk.incubator.vector". I have the feeling there's something in your code that uses this incubator module and may affect results. Where does the message come from? I grepped through your patches; luckily you do not use incubator modules! When reading your attached log, where do you see a 10-20% improvement on IntNRQ or SSDV Facets? IntNRQ 776.60 (21.0%) 795.66 (21.1%) 2.5% ( -32% - 56%) 0.712 That says only 2.5%. In short, I have to look closer at the code, but I do not see much improvement, sorry. The numbers are now +/- identical to MemorySegmentIndexInput for the given candidate queries.
[GitHub] [lucene] uschindler commented on pull request #11982: Change ByteBuffersDataInput and ByteBuffersIndexInput to use absolute addressing
uschindler commented on PR #11982: URL: https://github.com/apache/lucene/pull/11982#issuecomment-1331265646 In addition: The patch only removes the position call on the duplicate, so the duplicates are still there. Unless you only read small arrays with 1 or 2 longs, the position call cannot have too much overhead. The PR here is more a cleanup, but actually it only replaces position() by the absolute parameter. As we still work on a duplicate, we have to read the position() anyway. So this PR is just a cleanup, but (see std dev) not an improvement. What I like is your code to create the buffer views: using streams with method references makes it easier to read, but the use of a positional read on the view vs. position + relative read can't have much effect.
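The two access styles compared in this thread can be illustrated with plain `java.nio.ByteBuffer`. This is a minimal sketch and not the code of `ByteBuffersDataInput`: the class `AbsoluteVsRelative` and its methods are hypothetical, and it reads single longs from a byte buffer instead of bulk reads through long/float view buffers.

```java
import java.nio.ByteBuffer;

// Hypothetical helper contrasting relative and absolute reads on a ByteBuffer.
final class AbsoluteVsRelative {

  /** Relative style: duplicate the buffer, move the duplicate's position, read at the cursor. */
  static long readRelative(ByteBuffer buf, int pos) {
    ByteBuffer dup = buf.duplicate(); // shares content, has its own position
    dup.position(pos);
    return dup.getLong();
  }

  /** Absolute style: pass the index directly; no position is read or changed. */
  static long readAbsolute(ByteBuffer buf, int pos) {
    return buf.getLong(pos);
  }

  /** Builds a 24-byte big-endian buffer holding the longs 11, 22, 33. */
  static ByteBuffer sample() {
    ByteBuffer buf = ByteBuffer.allocate(24);
    buf.putLong(0, 11L).putLong(8, 22L).putLong(16, 33L);
    return buf;
  }
}
```

Both methods return the same value for the same offset. The only difference is whether a position is set before the read, which matches the point made above that the change is mainly a cleanup.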
[GitHub] [lucene] uschindler commented on pull request #11982: Change ByteBuffersDataInput and ByteBuffersIndexInput to use absolute addressing
uschindler commented on PR #11982: URL: https://github.com/apache/lucene/pull/11982#issuecomment-1331279562 With 5 seconds runtime, the hotspot compiler did not even start to optimize using tiered compilation unless you add more command line flags, so it's too short to be "warm".
[GitHub] [lucene] gsmiller commented on pull request #11928: GH#11922: Allow DisjunctionDISIApproximation to short-circuit
gsmiller commented on PR #11928: URL: https://github.com/apache/lucene/pull/11928#issuecomment-1331288870 @jpountz I re-ran some internal benchmarking with this change to highlight the speedup in cases where scoring isn't needed (at least some specific use-cases I'm looking at). These use-cases all involve a "disjunction filter," meaning a disjunction of terms that is used as a required clause. So something like `(+ (foo:bar foo:baz foo:zed) (...))`, where the `foo` field must take on one of the specified values to be considered a candidate match. To provide a sense of scale, on average, these filters have 40 different terms in them. Since these "filters" don't participate in scoring at all, it's a good candidate for this short-circuiting. In these benchmarks, I'm observing a 2.3% QPS improvement, and a 3.5% avg. latency reduction (5.9% p50 reduction / 3.5% p99 reduction). So the change appears to be helping this type of situation. As for whether-or-not this change would actually hurt other common use-cases that require scoring or second-phase checks, I re-ran `luceneutil` benchmarks (wikimedium10m) task and don't observe any regressions there (results below). It's possible there's a gap in our benchmarks though, and maybe there are some common use-cases not covered? 
```
Task                          QPS baseline  StdDev   QPS candidate  StdDev   Pct diff                 p-value
MedSloppyPhrase                     115.00  (4.9%)          112.91  (5.1%)   -1.8% ( -11% - 8%)       0.249
HighTermTitleSort                   242.48  (3.2%)          238.66  (4.3%)   -1.6% ( -8% - 6%)        0.189
HighSloppyPhrase                     36.47  (3.8%)           36.12  (3.9%)   -0.9% ( -8% - 7%)        0.439
LowTerm                            1766.82  (3.4%)         1752.12  (3.3%)   -0.8% ( -7% - 6%)        0.436
HighPhrase                          263.74  (3.4%)          261.71  (2.3%)   -0.8% ( -6% - 5%)        0.404
OrHighLow                           796.71  (2.7%)          790.71  (2.6%)   -0.8% ( -5% - 4%)        0.367
BrowseDateSSDVFacets                  3.46  (6.4%)            3.44  (7.1%)   -0.7% ( -13% - 13%)      0.755
HighTermMonthSort                  3070.86  (4.5%)         3051.39  (3.9%)   -0.6% ( -8% - 8%)        0.635
Prefix3                             111.76  (4.6%)          111.08  (4.2%)   -0.6% ( -8% - 8%)        0.658
OrNotHighHigh                      1249.27  (3.4%)         1242.30  (3.8%)   -0.6% ( -7% - 6%)        0.627
BrowseMonthTaxoFacets                35.43  (1.6%)           35.23  (2.2%)   -0.6% ( -4% - 3%)        0.367
LowSloppyPhrase                      61.49  (2.4%)           61.18  (2.5%)   -0.5% ( -5% - 4%)        0.512
OrHighNotMed                       1139.05  (3.9%)         1133.29  (3.6%)   -0.5% ( -7% - 7%)        0.671
BrowseRandomLabelTaxoFacets          20.33  (4.5%)           20.23  (5.6%)   -0.5% ( -10% - 10%)      0.760
HighTerm                           1635.45  (4.2%)         1628.28  (3.9%)   -0.4% ( -8% - 8%)        0.735
MedPhrase                            46.41  (2.3%)           46.22  (1.6%)   -0.4% ( -4% - 3%)        0.529
OrHighMed                           193.55  (2.8%)          192.79  (2.9%)   -0.4% ( -5% - 5%)        0.663
OrHighNotHigh                       865.20  (3.0%)          862.38  (3.9%)   -0.3% ( -6% - 6%)        0.766
AndHighLow                         1566.83  (2.7%)         1562.47  (2.7%)   -0.3% ( -5% - 5%)        0.745
MedTermDayTaxoFacets                 48.00  (3.5%)           47.89  (3.6%)   -0.2% ( -7% - 7%)        0.836
HighTermDayOfYearSort               812.55  (2.8%)          811.49  (2.5%)   -0.1% ( -5% - 5%)        0.878
MedTerm                            2390.59  (3.5%)         2387.70  (3.5%)   -0.1% ( -6% - 7%)        0.912
BrowseDateTaxoFacets                 25.05  (9.2%)           25.03  (9.3%)   -0.1% ( -16% - 20%)      0.984
BrowseMonthSSDVFacets                16.00  (18.9%)          15.99  (19.0%)  -0.1% ( -31% - 46%)      0.992
LowIntervalsOrdered                 276.77  (3.5%)          276.66  (3.6%)   -0.0% ( -6% - 7%)        0.972
OrHighHigh                           26.41  (4.4%)           26.40  (4.4%)   -0.0% ( -8% - 9%)        0.978
MedSpanNear                          35.38  (1.0%)           35.38  (1.1%)   -0.0% ( -2% - 2%)        0.969
TermDTSort                          648.08  (1.0%)          648.35  (1.3%)    0.0% ( -2% - 2%)        0.909
BrowseDayOfYearTaxoFacets            25.12  (9.5%)           25.13  (9.5%)    0.0% ( -17% - 21%)      0.988
OrNotHighLow                       1192.74  (3.1%)         1193.69  (2.6%)    0.1% ( -5% - 5%)        0.929
HighIntervalsOrdered                  1.79  (3.3%)            1.79  (3.3%)    0.1% ( -6
[GitHub] [lucene] uschindler commented on pull request #11982: Change ByteBuffersDataInput and ByteBuffersIndexInput to use absolute addressing
uschindler commented on PR #11982: URL: https://github.com/apache/lucene/pull/11982#issuecomment-1331324696 P.S. My comments are mostly about ByteBufferIndexInput used in MMapDirectory. The other code I did not write.
[GitHub] [lucene-solr] zkendall commented on a diff in pull request #976: SOLR-13749: Implement support for joining across collections with multiple shards
zkendall commented on code in PR #976: URL: https://github.com/apache/lucene-solr/pull/976#discussion_r1035387914
## solr/core/src/java/org/apache/solr/search/join/XCJFQuery.java: ##
@@ -0,0 +1,380 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.search.join;
+
+import java.io.IOException;
+import java.util.Locale;
+import java.util.Map;
+import java.util.Objects;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.index.PostingsEnum;
+import org.apache.lucene.index.Terms;
+import org.apache.lucene.index.TermsEnum;
+import org.apache.lucene.search.ConstantScoreScorer;
+import org.apache.lucene.search.ConstantScoreWeight;
+import org.apache.lucene.search.DocIdSet;
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.search.IndexSearcher;
+import org.apache.lucene.search.Query;
+import org.apache.lucene.search.QueryVisitor;
+import org.apache.lucene.search.ScoreMode;
+import org.apache.lucene.search.Scorer;
+import org.apache.lucene.search.Weight;
+import org.apache.lucene.util.BytesRefBuilder;
+import org.apache.lucene.util.FixedBitSet;
+import org.apache.solr.client.solrj.io.SolrClientCache;
+import org.apache.solr.client.solrj.io.Tuple;
+import org.apache.solr.client.solrj.io.eq.FieldEqualitor;
+import org.apache.solr.client.solrj.io.stream.CloudSolrStream;
+import org.apache.solr.client.solrj.io.stream.SolrStream;
+import org.apache.solr.client.solrj.io.stream.StreamContext;
+import org.apache.solr.client.solrj.io.stream.TupleStream;
+import org.apache.solr.client.solrj.io.stream.UniqueStream;
+import org.apache.solr.client.solrj.io.stream.expr.StreamExpression;
+import org.apache.solr.client.solrj.io.stream.expr.StreamExpressionNamedParameter;
+import org.apache.solr.cloud.CloudDescriptor;
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.cloud.ClusterState;
+import org.apache.solr.common.cloud.DocRouter;
+import org.apache.solr.common.cloud.Slice;
+import org.apache.solr.common.params.CommonParams;
+import org.apache.solr.common.params.ModifiableSolrParams;
+import org.apache.solr.common.params.SolrParams;
+import org.apache.solr.schema.FieldType;
+import org.apache.solr.search.BitDocSet;
+import org.apache.solr.search.DocSet;
+import org.apache.solr.search.DocSetUtil;
+import org.apache.solr.search.Filter;
+import org.apache.solr.search.SolrIndexSearcher;
+
+public class XCJFQuery extends Query {
+
+  protected final String query;
+  protected final String zkHost;
+  protected final String solrUrl;
+  protected final String collection;
+  protected final String fromField;
+  protected final String toField;
+  protected final boolean routedByJoinKey;
+
+  protected final long timestamp;
+  protected final int ttl;
+
+  protected SolrParams otherParams;
+  protected String otherParamsString;
+
+  public XCJFQuery(String query, String zkHost, String solrUrl, String collection, String fromField, String toField,
+                   boolean routedByJoinKey, int ttl, SolrParams otherParams) {
+
+    this.query = query;
+    this.zkHost = zkHost;
+    this.solrUrl = solrUrl;
+    this.collection = collection;
+    this.fromField = fromField;
+    this.toField = toField;
+    this.routedByJoinKey = routedByJoinKey;
+
+    this.timestamp = System.nanoTime();
+    this.ttl = ttl;
+
+    this.otherParams = otherParams;
+    // SolrParams doesn't implement equals(), so use this string to compare them
+    if (otherParams != null) {
+      this.otherParamsString = otherParams.toString();
+    }
+  }
+
+  private interface JoinKeyCollector {
+    void collect(Object value) throws IOException;
+    DocSet getDocSet() throws IOException;
+  }
+
+  private class TermsJoinKeyCollector implements JoinKeyCollector {
+
+    FieldType fieldType;
+    SolrIndexSearcher searcher;
+
+    TermsEnum termsEnum;
+    BytesRefBuilder bytes;
+    PostingsEnum postingsEnum;
+
+    FixedBitSet bitSet;
+
+    public TermsJoinKeyCollector(FieldType fieldType, Terms terms, SolrIndexSearcher searcher) throws IOException {
+      this.fieldType = fieldType;
+      this.searcher = searcher;
+
+      termsEnum = terms.i
[GitHub] [lucene] DaddyWri commented on issue #11883: Spatial3d: Wrong intersection detected between small polygons
DaddyWri commented on issue #11883: URL: https://github.com/apache/lucene/issues/11883#issuecomment-1331472645 Feel free to. It should be a simple cherry-pick. I'm tied up with work escalations myself.
[GitHub] [lucene] rmuir commented on pull request #11985: ExitableTerms to override getMin and getMax
rmuir commented on PR #11985: URL: https://github.com/apache/lucene/pull/11985#issuecomment-1331604920 I'm not opposed to the PR, I just disagree with the bug or trap aspect. To me this is just a micro-optimization, and I'm questioning the need to make anything abstract in our APIs (which makes them harder to implement).
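The trade-off discussed in this thread can be illustrated with a minimal sketch. It assumes a hypothetical `SimpleTerms` class standing in for a terms API; none of the types below are Lucene's actual `Terms`, `ExitableTerms`, or `OrdsFieldReader`. The base class keeps concrete default `getMin`/`getMax` methods that scan the iterator, so implementers are not forced to write them, while a wrapper can still delegate to an implementation that has the values cached.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a terms API; NOT Lucene's Terms class. */
abstract class SimpleTerms {
  /** Returns the terms in sorted order. */
  abstract Iterator<String> iterator();

  /** Concrete default: the first term of the sorted iteration is the minimum. */
  String getMin() {
    Iterator<String> it = iterator();
    return it.hasNext() ? it.next() : null;
  }

  /** Concrete default: walks every term to find the last one. */
  String getMax() {
    String last = null;
    for (Iterator<String> it = iterator(); it.hasNext(); ) {
      last = it.next();
    }
    return last;
  }
}

/** Hypothetical implementation that already knows its min and max. */
final class CachedTerms extends SimpleTerms {
  private final List<String> sorted;
  int iteratorCalls = 0; // counts how often a full iteration was requested

  CachedTerms(List<String> sorted) {
    this.sorted = sorted;
  }

  @Override
  Iterator<String> iterator() {
    iteratorCalls++;
    return sorted.iterator();
  }

  @Override
  String getMin() {
    return sorted.isEmpty() ? null : sorted.get(0);
  }

  @Override
  String getMax() {
    return sorted.isEmpty() ? null : sorted.get(sorted.size() - 1);
  }
}

/** Hypothetical wrapper: delegates min/max instead of inheriting the scan. */
final class WrappingTerms extends SimpleTerms {
  private final SimpleTerms in;

  WrappingTerms(SimpleTerms in) {
    this.in = in;
  }

  @Override
  Iterator<String> iterator() {
    return in.iterator(); // a real wrapper might add timeout checks here
  }

  @Override
  String getMin() {
    return in.getMin();
  }

  @Override
  String getMax() {
    return in.getMax();
  }
}
```

Without the two overrides in `WrappingTerms`, the wrapper would still return correct values through the inherited defaults; it would only do more work, which is consistent with reading the change as an optimization.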
[GitHub] [lucene] rmuir commented on pull request #11985: ExitableTerms to override getMin and getMax
rmuir commented on PR #11985: URL: https://github.com/apache/lucene/pull/11985#issuecomment-1331608776 definitely not a bug.
[GitHub] [lucene] rmuir commented on pull request #11987: Make Decompressor release memory buffer
rmuir commented on PR #11987: URL: https://github.com/apache/lucene/pull/11987#issuecomment-1331625234 too many shards. need to make sure this doesn't cause performance regression for normal use-cases.
[GitHub] [lucene] rmuir commented on pull request #11987: Make Decompressor release memory buffer
rmuir commented on PR #11987: URL: https://github.com/apache/lucene/pull/11987#issuecomment-1331627557 FWIW, assigning the 0-length array just creates even more waste. Keeping the logic that uses ArrayUtil.grow to oversize the arrays when they won't be reused adds still more waste. Better to assign null and create an array of the correct size if it won't be reused.
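A minimal sketch of the allocation strategy suggested in this comment, under stated assumptions: `ScratchBuffer` and its methods are hypothetical names, this is not the PR's code or Lucene's decompressor, and the one-eighth slack is only a stand-in for what an oversizing helper such as `ArrayUtil.grow` might add.

```java
import java.util.Arrays;

/** Hypothetical buffer holder; NOT Lucene's actual decompressor code. */
final class ScratchBuffer {
  private byte[] buffer; // null means "nothing retained"

  /** Reuse path: grow with slack so repeated calls rarely reallocate. */
  byte[] ensureReusable(int minSize) {
    if (buffer == null || buffer.length < minSize) {
      // Assumed oversizing policy: one eighth of extra room.
      int oversized = minSize + (minSize >>> 3);
      buffer = buffer == null ? new byte[oversized] : Arrays.copyOf(buffer, oversized);
    }
    return buffer;
  }

  /** One-shot path: allocate exactly what is needed and retain nothing. */
  byte[] allocateOnce(int size) {
    buffer = null; // drop the retained array rather than parking a 0-length one
    return new byte[size];
  }

  /** Number of bytes this holder currently keeps alive. */
  int retainedBytes() {
    return buffer == null ? 0 : buffer.length;
  }
}
```

The oversizing only pays off when the array is kept for later calls; on the one-shot path the extra room would be allocated and then discarded, which is the waste the comment points at.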
[GitHub] [lucene] DaddyWri commented on issue #11883: Spatial3d: Wrong intersection detected between small polygons
DaddyWri commented on issue #11883: URL: https://github.com/apache/lucene/issues/11883#issuecomment-1331740074 If not - I may have time this weekend, we'll see. Perhaps we should also look at GeoConcavePolygon first though.