Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]
dweiss commented on issue #12907: URL: https://github.com/apache/lucene/issues/12907#issuecomment-1871829048 Apache Infra has bumped the default for us to 60k. Let's see if these stack traces still show up. https://issues.apache.org/jira/browse/INFRA-25269 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]
dweiss commented on issue #12907: URL: https://github.com/apache/lucene/issues/12907#issuecomment-1871830389 Answering myself: yes, it's still a thing, https://ci-builds.apache.org/job/Lucene/job/Lucene-Check-main/10935/
```
Caused: java.lang.InterruptedException: hudson.FilePath$FileMaskNoMatchesFoundException: no matches found within 6
```
It's a bit insane that so many files are there and need to be scanned through. Maybe this pattern can be tweaked to something more reasonable?
Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]
uschindler commented on issue #12907: URL: https://github.com/apache/lucene/issues/12907#issuecomment-1871830725 I don't think the setting is live yet. You need to restart everything.
Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]
dweiss commented on issue #12907: URL: https://github.com/apache/lucene/issues/12907#issuecomment-1871831161 The exception message would indicate it's live though, right? They bumped it to 60k. https://github.com/apache/infrastructure-p6/pull/1747/files
Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]
uschindler commented on issue #12907: URL: https://github.com/apache/lucene/issues/12907#issuecomment-1871832304 Yes. The file mask cannot be improved: it counts every file it sees. Look at the other code. Unless the pattern starts with a fixed path, it has to go through all files. It's like Lucene wildcards: a star at the beginning forces a linear scan.
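A hedged illustration of the point about fixed path prefixes. The class `GlobWalk` and its methods are hypothetical, and this is not how Jenkins' `FilePath` is implemented. The sketch shows that only the leading literal directories of a glob can serve as the walk root; once the first segment containing `*`, `?`, `[` or `{` is reached, every file below it has to be examined, and those are the files that count toward a limit like the one in the exception.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

/**
 * Hypothetical helper (illustration only, not Jenkins' FilePath implementation): a glob can
 * only narrow a directory walk by its leading literal directories. Everything under the first
 * segment containing a wildcard has to be visited file by file.
 */
public final class GlobWalk {
  private GlobWalk() {}

  /** Returns the leading '/'-separated directory segments that contain no glob metacharacters. */
  public static String fixedPrefix(String glob) {
    String[] segments = glob.split("/");
    List<String> fixed = new ArrayList<>();
    // The last segment names files, so only the segments before it can become the walk root.
    for (int i = 0; i < segments.length - 1; i++) {
      if (segments[i].matches(".*[*?\\[{].*")) {
        break; // first wildcard segment: from here on every file must be examined
      }
      fixed.add(segments[i]);
    }
    return String.join("/", fixed);
  }

  /** Counts the regular files a walk rooted at the fixed prefix would have to examine. */
  public static long filesToScan(Path root, String glob) throws IOException {
    Path start = root.resolve(fixedPrefix(glob)); // resolve("") returns root itself
    if (!Files.isDirectory(start)) {
      return 0;
    }
    try (Stream<Path> paths = Files.walk(start)) {
      return paths.filter(Files::isRegularFile).count();
    }
  }

  public static void main(String[] args) throws IOException {
    String glob = args.length > 0 ? args[0] : "**/*.xml";
    System.out.println("walk root: '" + fixedPrefix(glob) + "'");
    System.out.println("files to scan: " + filesToScan(Paths.get("."), glob));
  }
}
```

With a pattern such as `**/*.xml` the fixed prefix is empty, so the whole workspace is walked; this is the same effect as a leading `*` in a Lucene wildcard query.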
[I] Introduce Bloom Filter as non-experimental/core postings format [lucene]
mgodwan opened a new issue, #12986: URL: https://github.com/apache/lucene/issues/12986
### Description
Today, [BloomFilteringPostingsFormat](https://github.com/apache/lucene/blob/main/lucene/codecs/src/java/org/apache/lucene/codecs/bloom/BloomFilteringPostingsFormat.java) in Lucene is marked experimental. Based on our analysis of the data structure in OpenSearch for the primary key (PK) field using the `nyc_taxis` workload [[See Issue](https://github.com/opensearch-project/OpenSearch/issues/4489#issuecomment-1724998489)], we have found that it is very useful for indexing performance, as well as for certain term queries and Get Document calls on the PK. We want to introduce this as an opt-in feature in OpenSearch so that customers can take advantage of the performance improvements, and we would like input from the community on the following:
1. Why is BloomFilteringPostingsFormat in experimental status?
2. Is it possible to contribute it and mark it as a core feature with backward-compatibility support in Lucene?
We've made a few changes in OpenSearch to support an off-heap implementation and to introduce certain knobs that may prove useful for Lucene customers, and we would like to see whether they can be contributed to Lucene as well [opensearch-project/OpenSearch/pull/11027].
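For context on why a Bloom filter can help a PK field: an id lives in at most one segment, so most per-segment PK lookups are misses, and a filter can reject those misses without consulting the terms dictionary. Below is a minimal sketch of the data structure under stated assumptions. The class `SimpleBloomFilter` is hypothetical, and FNV-1a with double hashing was chosen only for illustration; it is not the hashing, sizing, or on-disk format used by Lucene's BloomFilteringPostingsFormat or the OpenSearch off-heap variant.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/**
 * Hypothetical, minimal on-heap Bloom filter over byte[] keys, for illustration only. It is not
 * Lucene's BloomFilteringPostingsFormat and makes no claim about its hashing or format.
 */
public final class SimpleBloomFilter {
  private static final int SEED_2 = 0x5bd1e995;

  private final BitSet bits;
  private final int numBits;
  private final int numHashes;

  public SimpleBloomFilter(int numBits, int numHashes) {
    if (numBits <= 0 || numHashes <= 0) {
      throw new IllegalArgumentException("numBits and numHashes must be positive");
    }
    this.bits = new BitSet(numBits);
    this.numBits = numBits;
    this.numHashes = numHashes;
  }

  /** 32-bit FNV-1a, with the offset basis XOR-ed with a seed to derive a second hash. */
  private static int fnv1a(byte[] key, int seed) {
    int h = 0x811C9DC5 ^ seed;
    for (byte b : key) {
      h ^= (b & 0xFF);
      h *= 0x01000193;
    }
    return h;
  }

  public void add(byte[] key) {
    int h1 = fnv1a(key, 0);
    int h2 = fnv1a(key, SEED_2);
    for (int i = 0; i < numHashes; i++) {
      // Double hashing: probe i is h1 + i * h2 (int overflow wraps; floorMod keeps it >= 0).
      bits.set(Math.floorMod(h1 + i * h2, numBits));
    }
  }

  /** false: the key was never added. true: it may have been (false positives are possible). */
  public boolean mightContain(byte[] key) {
    int h1 = fnv1a(key, 0);
    int h2 = fnv1a(key, SEED_2);
    for (int i = 0; i < numHashes; i++) {
      if (!bits.get(Math.floorMod(h1 + i * h2, numBits))) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    SimpleBloomFilter filter = new SimpleBloomFilter(1 << 16, 5);
    for (int i = 0; i < 1000; i++) {
      filter.add(("doc-" + i).getBytes(StandardCharsets.UTF_8));
    }
    // An added id is always reported as possibly present.
    System.out.println(filter.mightContain("doc-42".getBytes(StandardCharsets.UTF_8)));
    // A new id is usually reported absent, so the terms-dictionary lookup can be skipped.
    System.out.println(filter.mightContain("doc-new".getBytes(StandardCharsets.UTF_8)));
  }
}
```

A real postings format also has to persist the bit set per segment and stay readable across versions. That persistence requirement is what the backward-compatibility question in this issue is about.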
Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]
uschindler commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1872008672 Hi @easyice, I backported the PR. The only change was in the test, because Java 11 does not have random() with two parameters. We have TestUtil for that. Uwe
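Background on the backport issue: since Java 17, `java.util.Random` implements `RandomGenerator`, which provides `nextInt(int origin, int bound)`. Java 11's `Random` only has `nextInt()` and `nextInt(int bound)`. Inside Lucene's tests the answer is `TestUtil`. The standalone helper below (the class name `RandomRange` is hypothetical) is only a sketch of the equivalent arithmetic on Java 11, using Java 17's semantics where `bound` is exclusive.

```java
import java.util.Random;

/** Hypothetical Java 11 stand-in for Java 17's Random.nextInt(origin, bound); illustration only. */
public final class RandomRange {
  private RandomRange() {}

  /** Returns a value in [origin, bound), the same range contract as RandomGenerator.nextInt(int, int). */
  public static int nextInt(Random r, int origin, int bound) {
    if (origin >= bound) {
      throw new IllegalArgumentException("origin must be < bound");
    }
    int n = bound - origin;
    if (n > 0) {
      // The range size fits in a positive int: one bounded draw is uniform.
      return origin + r.nextInt(n);
    }
    // bound - origin overflowed (range wider than Integer.MAX_VALUE): reject out-of-range draws.
    int v;
    do {
      v = r.nextInt();
    } while (v < origin || v >= bound);
    return v;
  }

  public static void main(String[] args) {
    Random r = new Random(42);
    for (int i = 0; i < 5; i++) {
      System.out.println(nextInt(r, 10, 20));
    }
  }
}
```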
Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]
easyice commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1872023239 Thank you for the backport and all the great suggestions!
Re: [I] Introduce Bloom Filter as non-experimental/core postings format [lucene]
rmuir commented on issue #12986: URL: https://github.com/apache/lucene/issues/12986#issuecomment-1872038887 Supporting back compat is a one-way door and a big deal. Back compat has a heavy price and is responsible for lots of bugs (e.g. the Lucene 9.9.1 release). It can't be done on a whim, and IMO we already support way "too much" here by far. Too much commercial influence on the project... I also don't think this bloom format is ready. You argue it should have Lucene's backwards compatibility support while at the same time pointing to an unmerged PR with thousands of lines of recommended changes to it! That's not stability.
[PR] Output binary doc values as hex array in SimpleTextCodec [lucene]
msfroh opened a new pull request, #12987: URL: https://github.com/apache/lucene/pull/12987 ### Description Binary doc values were being written directly in SimpleTextCodec, though they may not be valid UTF-8 (i.e. they may not be "text"). This change encodes them as a string representing an array of hexadecimal bytes.
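A minimal sketch of the idea in this PR, assuming a bracketed, space-separated hex format such as `[0a ff 41]`. The class `HexArray` is hypothetical, and the exact text format the PR writes may differ. It shows why hex is safe for arbitrary bytes: every byte value maps to two ASCII characters and back, whether or not the original bytes are valid UTF-8.

```java
import java.util.Arrays;

/** Hypothetical helper: render arbitrary bytes as "[0a ff 41]" and parse them back. Illustration only. */
public final class HexArray {
  private HexArray() {}

  public static String encode(byte[] bytes) {
    StringBuilder sb = new StringBuilder("[");
    for (int i = 0; i < bytes.length; i++) {
      if (i > 0) {
        sb.append(' ');
      }
      // Mask to 0..255 so negative bytes print as two hex digits.
      sb.append(String.format("%02x", bytes[i] & 0xFF));
    }
    return sb.append(']').toString();
  }

  public static byte[] decode(String text) {
    if (!text.startsWith("[") || !text.endsWith("]")) {
      throw new IllegalArgumentException("not a hex array: " + text);
    }
    String body = text.substring(1, text.length() - 1).trim();
    if (body.isEmpty()) {
      return new byte[0];
    }
    String[] parts = body.split(" ");
    byte[] out = new byte[parts.length];
    for (int i = 0; i < parts.length; i++) {
      out[i] = (byte) Integer.parseInt(parts[i], 16); // "ff" -> 255 -> (byte) -1
    }
    return out;
  }

  public static void main(String[] args) {
    // 0xC3 0x28 is not valid UTF-8: 0xC3 must be followed by a continuation byte (0x80-0xBF).
    byte[] notUtf8 = {(byte) 0xC3, 0x28};
    String text = encode(notUtf8);
    System.out.println(text); // [c3 28]
    System.out.println(Arrays.equals(notUtf8, decode(text))); // true
  }
}
```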
Re: [PR] Output well-formed UTF-8 bytes in SimpleTextCodec's segmentinfos [lucene]
msfroh commented on PR #12897: URL: https://github.com/apache/lucene/pull/12897#issuecomment-1872283044 I implemented a similar change for binary doc values at https://github.com/apache/lucene/pull/12987
Re: [I] Make ByteBufferIndexInput public [LUCENE-8406] [lucene]
uschindler closed issue #9453: Make ByteBufferIndexInput public [LUCENE-8406] URL: https://github.com/apache/lucene/issues/9453
Re: [I] Replace ByteBuffersIndexInput with ByteBufferIndexInput (replace and rename) [LUCENE-8661] [lucene]
uschindler commented on issue #9707: URL: https://github.com/apache/lucene/issues/9707#issuecomment-1872409165 We should do this cleanup at some point. We still have 2 implementations...