Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]

2023-12-29 Thread via GitHub


dweiss commented on issue #12907:
URL: https://github.com/apache/lucene/issues/12907#issuecomment-1871829048

   Apache Infra has bumped the default for us to 60k. Let's see if these stack 
traces still show up. 
   https://issues.apache.org/jira/browse/INFRA-25269


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]

2023-12-29 Thread via GitHub


dweiss commented on issue #12907:
URL: https://github.com/apache/lucene/issues/12907#issuecomment-1871830389

   Answering myself - yes, still a thing, 
https://ci-builds.apache.org/job/Lucene/job/Lucene-Check-main/10935/
   ```
   Caused: java.lang.InterruptedException: hudson.FilePath$FileMaskNoMatchesFoundException: no matches found within 6
   ```
   It's a bit insane that so many files are there and need to be scanned 
through. Maybe this pattern can be tweaked to something more reasonable?
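To make the failure mode in the subject line concrete, here is a hypothetical sketch (not Jenkins' actual `hudson.FilePath` code) of a glob search that gives up after visiting a fixed number of files. The class name, the exception type, and the example mask `**/*.hprof` are made up; only the message text mirrors the one in the stack trace. With a mask that starts with `**`, every file in the workspace counts toward the bound, so a large build tree can exhaust it before anything matches.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

/** Hypothetical sketch: a glob search that stops after `bound` non-matching files. */
class BoundedGlobScan {

  /** Hypothetical exception type; the message format mimics the Jenkins one. */
  static class NoMatchWithinBoundException extends Exception {
    NoMatchWithinBoundException(int bound) {
      super("no matches found within " + bound);
    }
  }

  /**
   * Returns the first file whose path relative to root matches the glob, or null if the
   * whole tree was walked without a match. Throws if `bound` files were visited first.
   */
  static Path findFirst(Path root, String glob, int bound)
      throws IOException, NoMatchWithinBoundException {
    PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
    Path[] found = {null};
    int[] visited = {0};
    boolean[] exhausted = {false};
    Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
      @Override
      public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
        if (matcher.matches(root.relativize(file))) {
          found[0] = file;
          return FileVisitResult.TERMINATE;
        }
        // Every non-matching file counts toward the bound, wherever it lives.
        if (++visited[0] >= bound) {
          exhausted[0] = true;
          return FileVisitResult.TERMINATE;
        }
        return FileVisitResult.CONTINUE;
      }
    });
    if (exhausted[0]) {
      throw new NoMatchWithinBoundException(bound);
    }
    return found[0];
  }
}
```

Raising the bound (as Infra did, from 10k to 60k) only moves the point at which the walk gives up; it does not reduce the number of files that have to be visited.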





Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]

2023-12-29 Thread via GitHub


uschindler commented on issue #12907:
URL: https://github.com/apache/lucene/issues/12907#issuecomment-1871830725

   I don't think the setting is live yet. You need to restart everything.





Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]

2023-12-29 Thread via GitHub


dweiss commented on issue #12907:
URL: https://github.com/apache/lucene/issues/12907#issuecomment-1871831161

   The exception message would indicate it's live though, right? They bumped it 
to 60k.
   https://github.com/apache/infrastructure-p6/pull/1747/files





Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]

2023-12-29 Thread via GitHub


uschindler commented on issue #12907:
URL: https://github.com/apache/lucene/issues/12907#issuecomment-1871832304

   Yes. The file mask cannot be improved: it counts every file it sees. Look at 
the other code. Unless the mask starts with a fixed path, it needs to go through 
all files. It's like with Lucene wildcards: a star at the beginning forces a linear 
scan.
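The point about fixed prefixes can be sketched as follows: a scanner can jump straight to the literal directory prefix of a mask and walk only below it, while a mask that starts with a wildcard leaves nothing to skip. This is a hypothetical illustration of the idea, not the Jenkins implementation; the class name and the example masks are made up, and only `*` and `?` are treated as wildcard characters.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: extract the fixed directory prefix of an Ant-style mask, i.e. the
 * part a scanner could seek to directly instead of walking the whole tree.
 */
class MaskPrefix {

  /** Returns the leading directory segments that contain no wildcard characters. */
  static String literalPrefix(String mask) {
    String[] segments = mask.split("/");
    List<String> fixed = new ArrayList<>();
    // The last segment names files, so it is never treated as a directory to descend into.
    for (int i = 0; i < segments.length - 1; i++) {
      String s = segments[i];
      if (s.indexOf('*') >= 0 || s.indexOf('?') >= 0) {
        break; // first wildcard segment: everything below must be walked
      }
      fixed.add(s);
    }
    return String.join("/", fixed);
  }
}
```

For example, `literalPrefix("lucene/build/**/*.hprof")` returns `"lucene/build"`, while `literalPrefix("**/*.hprof")` returns the empty string, meaning the walk has to start at the workspace root, just as a leading `*` in a Lucene wildcard query rules out a prefix seek in the terms dictionary.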





[I] Introduce Bloom Filter as non-experimental/core postings format [lucene]

2023-12-29 Thread via GitHub


mgodwan opened a new issue, #12986:
URL: https://github.com/apache/lucene/issues/12986

   ### Description
   
   Today, 
[BloomFilteringPostingsFormat](https://github.com/apache/lucene/blob/main/lucene/codecs/src/java/org/apache/lucene/codecs/bloom/BloomFilteringPostingsFormat.java)
 in Lucene is marked experimental.
   
   Based on our analysis of the data structure in OpenSearch for the 
Primary Key field using the `nyc_taxis` workload [[See 
Issue](https://github.com/opensearch-project/OpenSearch/issues/4489#issuecomment-1724998489)],
 we have found that it proves very useful for indexing performance, as well as for 
certain term queries/Get Document calls on the PK.
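For readers unfamiliar with why a Bloom filter helps primary-key lookups: each segment can keep a small bit set over its PK terms, and a lookup that gets a "definitely absent" answer can skip that segment's terms dictionary entirely. The class below is a hypothetical, minimal sketch using `java.util.BitSet` with double hashing; it is not Lucene's `BloomFilteringPostingsFormat`, and the class name, the `numBits`/`numHashes` parameters, and the choice of hash functions are arbitrary assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical minimal Bloom filter over primary-key terms (illustration only). */
class PkBloomFilter {
  private final BitSet bits;
  private final int numBits;
  private final int numHashes;

  PkBloomFilter(int numBits, int numHashes) {
    this.bits = new BitSet(numBits);
    this.numBits = numBits;
    this.numHashes = numHashes;
  }

  /** Bit positions for a term, via double hashing: h1 + i * h2 (mod numBits). */
  private int[] probes(String term) {
    byte[] key = term.getBytes(StandardCharsets.UTF_8);
    int h1 = Arrays.hashCode(key);
    // Odd step: with a power-of-two numBits, the numHashes probes are distinct.
    int h2 = fnv1a(key) | 1;
    int[] out = new int[numHashes];
    for (int i = 0; i < numHashes; i++) {
      out[i] = Math.floorMod(h1 + i * h2, numBits);
    }
    return out;
  }

  /** 32-bit FNV-1a, used here as the second hash. */
  private static int fnv1a(byte[] key) {
    int h = 0x811c9dc5;
    for (byte b : key) {
      h ^= (b & 0xff);
      h *= 0x01000193;
    }
    return h;
  }

  /** Records a term that exists in this (hypothetical) segment. */
  void add(String term) {
    for (int p : probes(term)) {
      bits.set(p);
    }
  }

  /** false means "definitely absent": the segment's terms dictionary can be skipped. */
  boolean mightContain(String term) {
    for (int p : probes(term)) {
      if (!bits.get(p)) {
        return false;
      }
    }
    return true;
  }
}
```

A false positive only costs the terms-dictionary seek that would have happened anyway, while false negatives cannot occur, which is what makes skipping a segment safe.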
   
   We want to introduce this as an opt-in feature in OpenSearch so that 
customers can take advantage of the performance improvements, and would like 
input from the community on the following:
   
   1. Why is the BloomFilteringPostingsFormat in experimental status?
   2. Is it possible to contribute and mark it as a core feature with support 
for backward compatibility in Lucene?
   
   We've made a few changes in OpenSearch to support an off-heap implementation 
and to introduce certain knobs which may prove useful for Lucene users, and we 
would like to see whether they can be contributed to Lucene as well 
[opensearch-project/OpenSearch/pull/11027].
   
   





Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-29 Thread via GitHub


uschindler commented on PR #12841:
URL: https://github.com/apache/lucene/pull/12841#issuecomment-1872008672

   Hi @easyice,
   I backported the PR. The only change needed was in the test, because Java 11 
does not have random() with two parameters. We have TestUtil for that.
   Uwe
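For context on the backport note: `Random.nextInt(int origin, int bound)` arrived with the `RandomGenerator` interface in Java 17, so Java 11 code has to build a range out of `nextInt(int bound)`. The helper below is a hypothetical sketch of that approach, not the code of Lucene's `TestUtil`; its name and its inclusive upper bound are assumptions.

```java
import java.util.Random;

/** Hypothetical Java 11-compatible stand-in for Random.nextInt(origin, bound) (Java 17+). */
class RandomRange {

  /** Returns a value in [start, end], both inclusive. Requires start <= end. */
  static int nextIntInclusive(Random r, int start, int end) {
    if (start > end) {
      throw new IllegalArgumentException("start > end");
    }
    long span = (long) end - start + 1; // computed in long to avoid int overflow
    if (span <= Integer.MAX_VALUE) {
      return start + r.nextInt((int) span); // nextInt(n) is uniform over [0, n)
    }
    // Span too wide for nextInt(int): rejection-sample full 32-bit values instead.
    int v;
    do {
      v = r.nextInt();
    } while (v < start || v > end);
    return v;
  }
}
```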





Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-29 Thread via GitHub


easyice commented on PR #12841:
URL: https://github.com/apache/lucene/pull/12841#issuecomment-1872023239

   Thank you for the backport and all great suggestions!





Re: [I] Introduce Bloom Filter as non-experimental/core postings format [lucene]

2023-12-29 Thread via GitHub


rmuir commented on issue #12986:
URL: https://github.com/apache/lucene/issues/12986#issuecomment-1872038887

   Supporting back compat is a one-way door and a big deal. Back compat has a 
heavy price and is responsible for lots of bugs (e.g. the Lucene 9.9.1 release). It 
can't be done on a whim, and IMO we already support way "too much" here by far. 
Too much commercial influence on the project...
   
   I also don't think this bloom format is ready. You argue it should have 
Lucene's backwards compatibility support while at the same time pointing to an 
unmerged PR of thousands of lines with recommended changes to it! That's not 
stability.
   





[PR] Output binary doc values as hex array in SimpleTextCodec [lucene]

2023-12-29 Thread via GitHub


msfroh opened a new pull request, #12987:
URL: https://github.com/apache/lucene/pull/12987

   ### Description
   
   Binary doc values were being written directly in SimpleTextCodec, though 
they may not be valid UTF-8 (i.e. they may not be "text"). This change encodes 
them as a string representing an array of hexadecimal bytes.
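A minimal sketch of the idea: check whether a value is well-formed UTF-8, and write it as a bracketed list of two-digit hex bytes that round-trips back to the original bytes. The `[ff, 00, 41]` layout and the class name are assumptions for illustration; the exact format PR #12987 writes may differ.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical hex-array encoding for binary values in a text format (illustration only). */
class HexArray {

  /** True if the bytes decode as UTF-8 without malformed or unmappable sequences. */
  static boolean isValidUtf8(byte[] bytes) {
    try {
      StandardCharsets.UTF_8.newDecoder()
          .onMalformedInput(CodingErrorAction.REPORT)
          .onUnmappableCharacter(CodingErrorAction.REPORT)
          .decode(ByteBuffer.wrap(bytes));
      return true;
    } catch (CharacterCodingException e) {
      return false;
    }
  }

  /** Encodes bytes as e.g. "[ff, 00, 41]": always plain ASCII, whatever the input. */
  static String encode(byte[] bytes) {
    StringBuilder sb = new StringBuilder("[");
    for (int i = 0; i < bytes.length; i++) {
      if (i > 0) {
        sb.append(", ");
      }
      sb.append(String.format("%02x", bytes[i] & 0xff));
    }
    return sb.append(']').toString();
  }

  /** Parses the output of encode() back into the original bytes. */
  static byte[] decode(String s) {
    String body = s.substring(1, s.length() - 1).trim();
    if (body.isEmpty()) {
      return new byte[0];
    }
    String[] parts = body.split(",\\s*");
    byte[] out = new byte[parts.length];
    for (int i = 0; i < parts.length; i++) {
      out[i] = (byte) Integer.parseInt(parts[i], 16);
    }
    return out;
  }
}
```

For example, the bytes `0xff 0x00 0x41` are not valid UTF-8 (a lone `0xff` never is), but `encode` turns them into the ASCII string `[ff, 00, 41]`, which `decode` maps back to the same three bytes.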





Re: [PR] Output well-formed UTF-8 bytes in SimpleTextCodec's segmentinfos [lucene]

2023-12-29 Thread via GitHub


msfroh commented on PR #12897:
URL: https://github.com/apache/lucene/pull/12897#issuecomment-1872283044

   I implemented a similar change for binary doc values at 
https://github.com/apache/lucene/pull/12987





Re: [I] Make ByteBufferIndexInput public [LUCENE-8406] [lucene]

2023-12-29 Thread via GitHub


uschindler closed issue #9453: Make ByteBufferIndexInput public [LUCENE-8406]
URL: https://github.com/apache/lucene/issues/9453





Re: [I] Replace ByteBuffersIndexInput with ByteBufferIndexInput (replace and rename) [LUCENE-8661] [lucene]

2023-12-29 Thread via GitHub


uschindler commented on issue #9707:
URL: https://github.com/apache/lucene/issues/9707#issuecomment-1872409165

   We should do this cleanup at some point. We still have 2 implementations...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org