[GitHub] [lucene] javanna opened a new pull request, #11985: ExitableTerms to override getMin and getMax

2022-11-29 Thread GitBox
javanna opened a new pull request, #11985: URL: https://github.com/apache/lucene/pull/11985 ExitableTerms should not iterate through the terms to retrieve min and max when the wrapped implementation has the values cached (e.g. OrdsFieldReader) -- This is an automated message from the Apac
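
To make the idea in this PR description concrete, here is a minimal, self-contained sketch of a wrapper that delegates min and max to the wrapped instance instead of recomputing them. It does not use Lucene's API: `SimpleTerms`, `CachedTerms`, and `WrappingTerms` are hypothetical stand-ins, and the scanning default below is a simplification chosen for illustration, not a description of how Lucene's default implementation works.

```java
import java.util.Iterator;
import java.util.List;

public class DelegatingMinMaxSketch {

    /** Hypothetical stand-in for a sorted terms dictionary; NOT Lucene's Terms class. */
    public interface SimpleTerms {
        Iterator<String> iterator();

        /** Simplified default (an assumption for this sketch): scan all terms. */
        default String getMin() {
            String min = null;
            for (Iterator<String> it = iterator(); it.hasNext(); ) {
                String term = it.next();
                if (min == null || term.compareTo(min) < 0) {
                    min = term;
                }
            }
            return min;
        }

        /** Simplified default (an assumption for this sketch): scan all terms. */
        default String getMax() {
            String max = null;
            for (Iterator<String> it = iterator(); it.hasNext(); ) {
                String term = it.next();
                if (max == null || term.compareTo(max) > 0) {
                    max = term;
                }
            }
            return max;
        }
    }

    /** An implementation that already knows its min and max. */
    public static final class CachedTerms implements SimpleTerms {
        private final List<String> sorted;
        private final String min;
        private final String max;
        private int iteratorCalls = 0;

        public CachedTerms(List<String> sortedTerms) {
            this.sorted = sortedTerms;
            this.min = sortedTerms.isEmpty() ? null : sortedTerms.get(0);
            this.max = sortedTerms.isEmpty() ? null : sortedTerms.get(sortedTerms.size() - 1);
        }

        @Override
        public Iterator<String> iterator() {
            iteratorCalls++; // counts how often callers fall back to iteration
            return sorted.iterator();
        }

        @Override
        public String getMin() {
            return min;
        }

        @Override
        public String getMax() {
            return max;
        }

        public int iteratorCalls() {
            return iteratorCalls;
        }
    }

    /** A wrapper that forwards getMin/getMax so the cached values are reused. */
    public static final class WrappingTerms implements SimpleTerms {
        private final SimpleTerms in;

        public WrappingTerms(SimpleTerms in) {
            this.in = in;
        }

        @Override
        public Iterator<String> iterator() {
            return in.iterator(); // a real wrapper could add cancellation checks here
        }

        @Override
        public String getMin() {
            return in.getMin(); // delegate instead of inheriting the scanning default
        }

        @Override
        public String getMax() {
            return in.getMax();
        }
    }
}
```

Without the two overrides in `WrappingTerms`, the wrapper would inherit the interface defaults and go through `iterator()` even though the wrapped object already holds both values.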

[GitHub] [lucene] iverase opened a new issue, #11986: Polygons failing to tessellate

2022-11-29 Thread GitBox
iverase opened a new issue, #11986: URL: https://github.com/apache/lucene/issues/11986 ### Description A user of Elasticsearch reported a few polygons that are failing tessellation, for example: ``` { "type": "MultiPolygon", "coordinates": [ [ [ [ 145.8376722, -41.3625237 ],

[GitHub] [lucene] jpountz commented on a diff in pull request #11984: Add exponential growth to TimeLimitingBulkScorer

2022-11-29 Thread GitBox
jpountz commented on code in PR #11984: URL: https://github.com/apache/lucene/pull/11984#discussion_r1034826332 ## lucene/core/src/test/org/apache/lucene/search/TestTimeLimitingBulkScorer.java: ## @@ -62,6 +66,44 @@ public void testTimeLimitingBulkScorer() throws Exception {

[GitHub] [lucene] luyuncheng opened a new pull request, #11987: Make Decompressor release memory buffer

2022-11-29 Thread GitBox
luyuncheng opened a new pull request, #11987: URL: https://github.com/apache/lucene/pull/11987 ### Description We have an ES cluster (31G heap, 96G Mem, 30 instance nodes) with many shards per node (4000 per node). When nodes do many bulk and search requests concurrently, we can see the j

[GitHub] [lucene] iverase opened a new pull request, #11988: Fix algorithm that chooses the bridge between a polygon and a hole

2022-11-29 Thread GitBox
iverase opened a new pull request, #11988: URL: https://github.com/apache/lucene/pull/11988 The current algorithm seems to fail when the bridge is located on the first node of the iteration and there is another vertex with the same x and y. In that case we seem not to be able to find the r

[GitHub] [lucene] jpountz commented on pull request #11982: Change ByteBuffersDataInput and ByteBuffersIndexInput to use absolute addressing

2022-11-29 Thread GitBox
jpountz commented on PR #11982: URL: https://github.com/apache/lucene/pull/11982#issuecomment-1330752728 > This shows 10-20% improvement in SSDVFacets and IntNRQ tests in lucenebench, Woah, impressive! Can you share the luceneutil output? The change looks good to me but I'd als

[GitHub] [lucene] iverase commented on issue #11883: Spatial3d: Wrong intersection detected between small polygons

2022-11-29 Thread GitBox
iverase commented on issue #11883: URL: https://github.com/apache/lucene/issues/11883#issuecomment-1330777875 Should we backport it to branch_9x?

[GitHub] [lucene] thecoop commented on pull request #11982: Change ByteBuffersDataInput and ByteBuffersIndexInput to use absolute addressing

2022-11-29 Thread GitBox
thecoop commented on PR #11982: URL: https://github.com/apache/lucene/pull/11982#issuecomment-1330803516 [bytebuffer-get-output.log](https://github.com/apache/lucene/files/10114303/bytebuffer-get-output.log) - used wikimedium1m on my local machine Whilst some tests show as much as 40

[GitHub] [lucene] rmuir commented on pull request #11985: ExitableTerms to override getMin and getMax

2022-11-29 Thread GitBox
rmuir commented on PR #11985: URL: https://github.com/apache/lucene/pull/11985#issuecomment-1330867303 I don't really think it is especially trappy, since the default implementation is `O(log N)` and works consistently with FilterTerms classes by default even if they are actually filtering

[GitHub] [lucene] rmuir commented on pull request #11985: ExitableTerms to override getMin and getMax

2022-11-29 Thread GitBox
rmuir commented on PR #11985: URL: https://github.com/apache/lucene/pull/11985#issuecomment-1330875083 Also I don't know what `OrdsFieldsReader` is, but the default Terms implementation is `O(1)` when the Terms subclass supports seek-by-ord. So what am I missing that makes it a trap?

[GitHub] [lucene] hendrikmuhs commented on pull request #460: LUCENE-10247 - reduce size of FSTs by relative coding

2022-11-29 Thread GitBox
hendrikmuhs commented on PR #460: URL: https://github.com/apache/lucene/pull/460#issuecomment-1330949224 Update after a long time. This branch was outdated, and the best way to resurrect it was a _rebase_. However, it seems GitHub managed to keep the comments at the right places. If I can

[GitHub] [lucene] javanna commented on pull request #11985: ExitableTerms to override getMin and getMax

2022-11-29 Thread GitBox
javanna commented on PR #11985: URL: https://github.com/apache/lucene/pull/11985#issuecomment-1331185580 Credit goes to @dnhatn for pointing me to the bug, thanks! I am more than happy to fix it and see what else we can do to avoid this in the future.

[GitHub] [lucene] dnhatn commented on pull request #11985: ExitableTerms to override getMin and getMax

2022-11-29 Thread GitBox
dnhatn commented on PR #11985: URL: https://github.com/apache/lucene/pull/11985#issuecomment-1331199248 `FieldReader` (i.e., blocktree implementation) returns minTerm and [maxTerm](https://github.com/apache/lucene/blob/0cc6f695363419ab0f89e2bef5e7595ace077345/lucene/core/src/java/org/apache/

[GitHub] [lucene] javanna commented on pull request #11985: ExitableTerms to override getMin and getMax

2022-11-29 Thread GitBox
javanna commented on PR #11985: URL: https://github.com/apache/lucene/pull/11985#issuecomment-1331233058 Besides the I/O aspect, I found it counterintuitive that min and max are known and we end up doing work to compute them again. Even if it's not a lot of work, it seems like we could avo

[GitHub] [lucene] uschindler commented on pull request #11982: Change ByteBuffersDataInput and ByteBuffersIndexInput to use absolute addressing

2022-11-29 Thread GitBox
uschindler commented on PR #11982: URL: https://github.com/apache/lucene/pull/11982#issuecomment-1331252650 Hi, I can only look at this on the weekend as I am at a customer this week. When reading the description I was also on the wrong path anyways, because I did not understand what you wa

[GitHub] [lucene] uschindler commented on pull request #11982: Change ByteBuffersDataInput and ByteBuffersIndexInput to use absolute addressing

2022-11-29 Thread GitBox
uschindler commented on PR #11982: URL: https://github.com/apache/lucene/pull/11982#issuecomment-1331265646 In addition: The patch only removes the position on the duplicate, so the duplicates are still there. Unless you only read small arrays with 1 or 2 longs, the position call cannot ha
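
For readers unfamiliar with the two access styles discussed in this thread, the following sketch contrasts a relative read (duplicate, set the position, then read) with an absolute read (read at an index) using only `java.nio.ByteBuffer`. It is not taken from the PR; the class and method names are hypothetical, and it makes no claim about the relative performance of the two styles.

```java
import java.nio.ByteBuffer;

public class AbsoluteReadSketch {

    /** Relative style: duplicate the buffer, move the duplicate's position, then read. */
    public static long readLongRelative(ByteBuffer buffer, int offset) {
        ByteBuffer dup = buffer.duplicate();
        dup.order(buffer.order()); // a duplicate does not inherit the byte order
        dup.position(offset);
        return dup.getLong();      // reads at the position and advances it by 8
    }

    /** Absolute style: read at the given index without touching any position. */
    public static long readLongAbsolute(ByteBuffer buffer, int offset) {
        return buffer.getLong(offset);
    }
}
```

Both methods leave the position of the original buffer unchanged: the first because it works on a duplicate, the second because absolute gets never modify the position.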

[GitHub] [lucene] uschindler commented on pull request #11982: Change ByteBuffersDataInput and ByteBuffersIndexInput to use absolute addressing

2022-11-29 Thread GitBox
uschindler commented on PR #11982: URL: https://github.com/apache/lucene/pull/11982#issuecomment-1331279562 With 5 seconds runtime, the hotspot compiler did not even start to optimize using tiered compilation unless you add more command line flags, so it's too short to be "warm".

[GitHub] [lucene] gsmiller commented on pull request #11928: GH#11922: Allow DisjunctionDISIApproximation to short-circuit

2022-11-29 Thread GitBox
gsmiller commented on PR #11928: URL: https://github.com/apache/lucene/pull/11928#issuecomment-1331288870 @jpountz I re-ran some internal benchmarking with this change to highlight the speedup in cases where scoring isn't needed (at least some specific use-cases I'm looking at). These use-c

[GitHub] [lucene] uschindler commented on pull request #11982: Change ByteBuffersDataInput and ByteBuffersIndexInput to use absolute addressing

2022-11-29 Thread GitBox
uschindler commented on PR #11982: URL: https://github.com/apache/lucene/pull/11982#issuecomment-1331324696 P.S. My comments are mostly about ByteBufferIndexInput used in MMapDirectory. The other code I did not write.

[GitHub] [lucene-solr] zkendall commented on a diff in pull request #976: SOLR-13749: Implement support for joining across collections with multiple shards

2022-11-29 Thread GitBox
zkendall commented on code in PR #976: URL: https://github.com/apache/lucene-solr/pull/976#discussion_r1035387914 ## solr/core/src/java/org/apache/solr/search/join/XCJFQuery.java: ## @@ -0,0 +1,380 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + *

[GitHub] [lucene] DaddyWri commented on issue #11883: Spatial3d: Wrong intersection detected between small polygons

2022-11-29 Thread GitBox
DaddyWri commented on issue #11883: URL: https://github.com/apache/lucene/issues/11883#issuecomment-1331472645 Feel free to. It should be a simple cherry-pick. I'm tied up with work escalations myself.

[GitHub] [lucene] rmuir commented on pull request #11985: ExitableTerms to override getMin and getMax

2022-11-29 Thread GitBox
rmuir commented on PR #11985: URL: https://github.com/apache/lucene/pull/11985#issuecomment-1331604920 I'm not opposed to the PR, I just disagree with the bug or trap aspect. To me this is just a micro-optimization and I'm questioning the need to make anything abstract in our APIs (which make

[GitHub] [lucene] rmuir commented on pull request #11985: ExitableTerms to override getMin and getMax

2022-11-29 Thread GitBox
rmuir commented on PR #11985: URL: https://github.com/apache/lucene/pull/11985#issuecomment-1331608776 Definitely not a bug.

[GitHub] [lucene] rmuir commented on pull request #11987: Make Decompressor release memory buffer

2022-11-29 Thread GitBox
rmuir commented on PR #11987: URL: https://github.com/apache/lucene/pull/11987#issuecomment-1331625234 Too many shards. Need to make sure this doesn't cause a performance regression for normal use-cases.

[GitHub] [lucene] rmuir commented on pull request #11987: Make Decompressor release memory buffer

2022-11-29 Thread GitBox
rmuir commented on PR #11987: URL: https://github.com/apache/lucene/pull/11987#issuecomment-1331627557 FWIW, assigning the 0-length array just creates even more waste. Still keeping the logic to use ArrayUtil.grow to oversize the arrays when they won't be reused just adds more waste.

[GitHub] [lucene] DaddyWri commented on issue #11883: Spatial3d: Wrong intersection detected between small polygons

2022-11-29 Thread GitBox
DaddyWri commented on issue #11883: URL: https://github.com/apache/lucene/issues/11883#issuecomment-1331740074 If not - I may have time this weekend, we'll see. Perhaps we should also look at GeoConcavePolygon first though.