[GitHub] [lucene] dweiss commented on issue #12057: ban finalizers in the build somehow (worst-case: use error-prone)
dweiss commented on issue #12057: URL: https://github.com/apache/lucene/issues/12057#issuecomment-1368723771 Even a simplistic regexp/substring expression added to the rat checks would work here, I think? This method is fairly uniquely named. There are better alternatives, but they will add overhead (parsing classes, etc.). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
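The substring/regexp idea can be sketched as a tiny standalone checker. This is a hypothetical sketch, not Lucene's build code: the class name `FinalizerCheck` and the exact pattern are assumptions. Because it is a plain text match, it can also fire on comments or string literals, which is the trade-off against the heavier option of parsing classes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FinalizerCheck {
  // Matches a no-argument "void finalize()" declaration, e.g.
  // "protected void finalize() throws Throwable". Plain text match only,
  // so it may also fire on comments or string literals (accepted trade-off).
  static final Pattern FINALIZE_DECL = Pattern.compile("\\bvoid\\s+finalize\\s*\\(\\s*\\)");

  /** Returns true if the source text appears to declare a finalizer. */
  static boolean declaresFinalizer(String source) {
    return FINALIZE_DECL.matcher(source).find();
  }

  /** Collects all regular .java files under root that appear to declare finalize(). */
  static List<Path> findViolations(Path root) throws IOException {
    List<Path> javaFiles;
    try (Stream<Path> files = Files.walk(root)) {
      javaFiles = files
          .filter(p -> Files.isRegularFile(p) && p.toString().endsWith(".java"))
          .collect(Collectors.toList());
    }
    List<Path> violations = new ArrayList<>();
    for (Path p : javaFiles) {
      if (declaresFinalizer(Files.readString(p))) {
        violations.add(p);
      }
    }
    return violations;
  }

  public static void main(String[] args) throws IOException {
    Path root = Path.of(args.length > 0 ? args[0] : ".");
    List<Path> violations = findViolations(root);
    for (Path p : violations) {
      System.err.println("finalize() declared in: " + p);
    }
    // A non-zero exit code is what would fail the build step.
    System.exit(violations.isEmpty() ? 0 : 1);
  }
}
```

Wiring this into the actual build (for example next to the existing rat checks) is not shown; in this sketch the exit code alone signals a violation.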
[GitHub] [lucene] uschindler commented on issue #12057: ban finalizers in the build somehow (worst-case: use error-prone)
uschindler commented on issue #12057: URL: https://github.com/apache/lucene/issues/12057#issuecomment-1368738480 Yeah, similar to the LOG statement checks in Solr?
[GitHub] [lucene] sebastiano1972 opened a new issue, #12059: Recurring index corruption
sebastiano1972 opened a new issue, #12059: URL: https://github.com/apache/lucene/issues/12059 We are experimenting with Elasticsearch deployed in Azure Container Instances (Debian + OpenJDK). The ES indexes are stored on an Azure file share mounted via SMB (3.0). The Elasticsearch cluster is made up of 4 nodes, each of which has a separate file share to store its indices. We are experiencing recurring index corruption, specifically a "read past EOF" exception. I asked on the Elasticsearch forum, but the answer I got was a bit generic and not really helpful, other than confirming that, from ES's point of view, ES should work on an SMB share as long as it behaves like a local drive. As the underlying exception relates to an issue with a Lucene index, I was wondering if you could help out. Specifically, can Lucene work on SMB? I can only find sparse information on this configuration and, while NFS seems to be a no-no, for SMB it is not that clear. Below is the exception we are getting. Many thanks. Seb
```
java.io.IOException: read past EOF: NIOFSIndexInput(path="/bitnami/elasticsearch/data/indices/mS2bUbLtSeG0FSAMuKX7JQ/0/index/_ldsn_1.fnm") buffer: java.nio.HeapByteBuffer[pos=0 lim=1024 cap=1024] chunkLen: 1024 end: 2331: NIOFSIndexInput(path="/bitnami/elasticsearch/data/indices/mS2bUbLtSeG0FSAMuKX7JQ/0/index/_ldsn_1.fnm")
	at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:200) ~[lucene-core-9.3.0.jar:?]
	at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:291) ~[lucene-core-9.3.0.jar:?]
	at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:55) ~[lucene-core-9.3.0.jar:?]
	at org.apache.lucene.store.BufferedChecksumIndexInput.readByte(BufferedChecksumIndexInput.java:39) ~[lucene-core-9.3.0.jar:?]
	at org.apache.lucene.codecs.CodecUtil.readBEInt(CodecUtil.java:667) ~[lucene-core-9.3.0.jar:?]
	at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:184) ~[lucene-core-9.3.0.jar:?]
	at org.apache.lucene.codecs.CodecUtil.checkIndexHeader(CodecUtil.java:253) ~[lucene-core-9.3.0.jar:?]
	at org.apache.lucene.codecs.lucene90.Lucene90FieldInfosFormat.read(Lucene90FieldInfosFormat.java:128) ~[lucene-core-9.3.0.jar:?]
	at org.apache.lucene.index.SegmentReader.initFieldInfos(SegmentReader.java:205) ~[lucene-core-9.3.0.jar:?]
	at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:156) ~[lucene-core-9.3.0.jar:?]
	at org.apache.lucene.index.ReadersAndUpdates.createNewReaderWithLatestLiveDocs(ReadersAndUpdates.java:738) ~[lucene-core-9.3.0.jar:?]
	at org.apache.lucene.index.ReadersAndUpdates.swapNewReaderWithLatestLiveDocs(ReadersAndUpdates.java:754) ~[lucene-core-9.3.0.jar:?]
	at org.apache.lucene.index.ReadersAndUpdates.writeFieldUpdates(ReadersAndUpdates.java:678) ~[lucene-core-9.3.0.jar:?]
	at org.apache.lucene.index.ReaderPool.writeAllDocValuesUpdates(ReaderPool.java:251) ~[lucene-core-9.3.0.jar:?]
	at org.apache.lucene.index.IndexWriter.writeReaderPool(IndexWriter.java:3743) ~[lucene-core-9.3.0.jar:?]
	at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:591) ~[lucene-core-9.3.0.jar:?]
	at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:381) ~[lucene-core-9.3.0.jar:?]
	at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:355) ~[lucene-core-9.3.0.jar:?]
	at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:345) ~[lucene-core-9.3.0.jar:?]
	at org.apache.lucene.index.FilterDirectoryReader.doOpenIfChanged(FilterDirectoryReader.java:112) ~[lucene-core-9.3.0.jar:?]
	at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:170) ~[lucene-core-9.3.0.jar:?]
	at org.elasticsearch.index.engine.ElasticsearchReaderManager.refreshIfNeeded(ElasticsearchReaderManager.java:48) ~[elasticsearch-8.4.1.jar:?]
	at org.elasticsearch.index.engine.ElasticsearchReaderManager.refreshIfNeeded(ElasticsearchReaderManager.java:27) ~[elasticsearch-8.4.1.jar:?]
	at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:167) ~[lucene-core-9.3.0.jar:?]
	at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:240) ~[lucene-core-9.3.0.jar:?]
	at org.elasticsearch.index.engine.InternalEngine$ExternalReaderManager.refreshIfNeeded(InternalEngine.java:355) ~[elasticsearch-8.4.1.jar:?]
	at org.elasticsearch.index.engine.InternalEng
```
[GitHub] [lucene] uschindler commented on issue #12059: Recurring index corruption
uschindler commented on issue #12059: URL: https://github.com/apache/lucene/issues/12059#issuecomment-1368787343 Hi, Samba/CIFS generally works as a file store, in contrast to NFS, which should never ever be used as a file system; but the problems you describe here are surely related to problems with a shared file system. Whenever possible, please put the store on a local disk; don't use shared/network filesystems to store Lucene indexes. We can't give you any recommendations here; there are surely no known bugs in Lucene that could create the above index corruption. In short: please avoid NFS (under all circumstances) and avoid CIFS (where possible, especially under high load). Shared/network file systems should only be used for read-only indexes. Related question: we are wondering why you do not use MMapDirectory but instead use NIOFSDirectory?
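For context on the question: `NIOFSDirectory` reads index files with positional `FileChannel` reads, while `MMapDirectory` maps them into memory. The stdlib-only sketch below (hypothetical class and method names, not Lucene code) shows both access styles on a plain file and how a file that is shorter than the reader expects surfaces as an `EOFException`, loosely analogous to the "read past EOF" in the report.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ReadStrategies {
  /** NIO style: positional reads on a FileChannel, with no shared file pointer. */
  static byte[] readPositional(Path file, long pos, int len) throws IOException {
    try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
      ByteBuffer buf = ByteBuffer.allocate(len);
      while (buf.hasRemaining()) {
        int n = ch.read(buf, pos + buf.position());
        if (n < 0) {
          // The file ended before len bytes were read: it is shorter than expected.
          throw new EOFException("read past EOF: " + file + " pos=" + pos + " len=" + len);
        }
      }
      return buf.array();
    }
  }

  /** Mmap style: map the requested region read-only and copy it out. */
  static byte[] readMapped(Path file, long pos, int len) throws IOException {
    try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
      if (pos < 0 || pos + len > ch.size()) {
        throw new EOFException("read past EOF: " + file + " pos=" + pos + " len=" + len);
      }
      MappedByteBuffer region = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
      byte[] out = new byte[len];
      region.get(out);
      return out;
    }
  }
}
```

For a valid range both methods return the same bytes; the strategies differ in how the I/O is performed (system calls per read versus page-cache-backed memory), not in what is read.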
[GitHub] [lucene] uschindler closed issue #12059: Recurring index corruption
uschindler closed issue #12059: Recurring index corruption URL: https://github.com/apache/lucene/issues/12059
[GitHub] [lucene] uschindler commented on issue #12059: Recurring index corruption
uschindler commented on issue #12059: URL: https://github.com/apache/lucene/issues/12059#issuecomment-1368788012 This issue would be better discussed on the mailing list; it is not a bug/issue at all.
[GitHub] [lucene] uschindler commented on issue #11701: Deadlock in AnalysisSPILoader [LUCENE-10665]
uschindler commented on issue #11701: URL: https://github.com/apache/lucene/issues/11701#issuecomment-1368789210 Thanks for feedback, well appreciated.
[GitHub] [lucene] sebastiano1972 commented on issue #12059: Recurring index corruption
sebastiano1972 commented on issue #12059: URL: https://github.com/apache/lucene/issues/12059#issuecomment-1368804331 Hi Uwe, thank you for your kind reply. To answer your question, we are experimenting with Azure Container Instances because of their relative simplicity, but they come with some limitations:
- we cannot set the max_map_count value, as we do not have access to the underlying host (https://www.elastic.co/guide/en/elasticsearch/reference/current/vm-max-map-count.html). Unfortunately, this is required to run an ES cluster, therefore we were forced to use NIOFS;
- ACIs only allow volume mappings using Azure File Shares, which only work with NFS or SMB.
I will move this to the suggested mailing list. Thank you again. Seb
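Background on the `max_map_count` constraint, as a rough model: on Linux every memory-mapped region counts toward the `vm.max_map_count` limit (65530 by default; Elasticsearch asks for at least 262144), and `MMapDirectory` maps large files in chunks. The sketch below estimates how many mappings a set of files would need; the class name, the 1 GiB chunk size and the file sizes are all assumptions made up for illustration.

```java
import java.util.Arrays;

public class MapCountEstimate {
  /** Mappings needed for one file when it is mapped in chunks of at most chunkSize bytes. */
  static long mappingsForFile(long fileSize, long chunkSize) {
    if (chunkSize <= 0 || fileSize < 0) {
      throw new IllegalArgumentException("fileSize must be >= 0 and chunkSize > 0");
    }
    // Ceiling division: a tail past a chunk boundary still needs its own mapping.
    // A zero-length file needs no mapping in this simplified model.
    return (fileSize + chunkSize - 1) / chunkSize;
  }

  /** Total mappings, assuming every file stays mapped at the same time. */
  static long totalMappings(long[] fileSizes, long chunkSize) {
    long total = 0;
    for (long size : fileSizes) {
      total += mappingsForFile(size, chunkSize);
    }
    return total;
  }

  public static void main(String[] args) {
    long chunk = 1L << 30;            // assumed 1 GiB chunk size, illustration only
    long[] sizes = new long[100_000]; // hypothetical: 100,000 files...
    Arrays.fill(sizes, 4L << 20);     // ...of 4 MiB each
    long needed = totalMappings(sizes, chunk);
    System.out.println("mappings needed: " + needed);
    System.out.println("fits Linux default (65530): " + (needed <= 65_530));
    System.out.println("fits ES minimum (262144): " + (needed <= 262_144));
  }
}
```

With these made-up numbers the estimate is 100,000 mappings: above the Linux default, below the value Elasticsearch asks for.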
[GitHub] [lucene] rmuir commented on issue #12057: ban finalizers in the build somehow (worst-case: use error-prone)
rmuir commented on issue #12057: URL: https://github.com/apache/lucene/issues/12057#issuecomment-1368933723 That's true, it is another option. I am still investigating the ECJ one. It is a bit sad that we are so messy about deprecations and can't simply use the compiler's support. For example, there are many deprecations in `main`, which makes no sense. All the big-endian varhandles in `BitUtil` are deprecated, but they are in use across `main`, and from what I can tell, usage is only growing. Is that because Java uses them internally? So what is the point of the deprecation here? It is doing no good at all and just prevents us from using our compilers properly.
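For readers unfamiliar with the handles being discussed: a big-endian `VarHandle` over a `byte[]` can be created with the JDK's `MethodHandles.byteArrayViewVarHandle`. The sketch below uses a hypothetical class name and is not the actual `BitUtil` source. The compiler angle is that javac's `-Xlint:deprecation` (combined with `-Werror`) reports every use of a `@Deprecated` member from outside its declaring top-level class, so widely used deprecated fields block that option.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

public class BigEndianHandles {
  // Views a byte[] as big-endian ints; plain get/set accept any byte offset.
  static final VarHandle BE_INT =
      MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.BIG_ENDIAN);

  // Same idea for longs.
  static final VarHandle BE_LONG =
      MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.BIG_ENDIAN);

  /** Reads 4 bytes starting at offset as a big-endian int. */
  static int readIntBE(byte[] bytes, int offset) {
    return (int) BE_INT.get(bytes, offset);
  }

  /** Writes value as 4 big-endian bytes starting at offset. */
  static void writeIntBE(byte[] bytes, int offset, int value) {
    BE_INT.set(bytes, offset, value);
  }

  /** Reads 8 bytes starting at offset as a big-endian long. */
  static long readLongBE(byte[] bytes, int offset) {
    return (long) BE_LONG.get(bytes, offset);
  }
}
```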
[GitHub] [lucene] gsmiller commented on pull request #12055: Better skipping for multi-term queries with a FILTER rewrite.
gsmiller commented on PR #12055: URL: https://github.com/apache/lucene/pull/12055#issuecomment-1369212139 +1 to this approach in general. I do wonder if the distribution assumptions generally hold if we start looking at "term in set" queries, though. That's sort of irrelevant right now since that implementation is still separate (`TermInSetQuery`), but this may add another reason to keep that implementation separate going forward. I think the difference with "term in set" is that it may not follow natural language distributions in general, while the current MultiTermQuery implementations most likely do. I also wonder if we could be more aggressive with the number of clauses we build into a `BooleanQuery` if we leverage the short-circuiting idea in #11928. Might be a nice fit for this "filtering" case. Just a couple of thoughts, but certainly nothing blocking or anything that needs to be included as part of this PR. Just wanted to toss them out there.
[GitHub] [lucene] vatsalpatel3689 opened a new pull request, #12060: fixing fkltr change'
vatsalpatel3689 opened a new pull request, #12060: URL: https://github.com/apache/lucene/pull/12060 ### Description
[GitHub] [lucene] rmuir commented on pull request #12055: Better skipping for multi-term queries with a FILTER rewrite.
rmuir commented on PR #12055: URL: https://github.com/apache/lucene/pull/12055#issuecomment-1369222376 There's no reason to duplicate a bunch of code just because of minor changes to a rewrite method. We can have more than one or two of these rewrite methods and use different ones for different queries; this fact seems to have been forgotten here. And maybe we should have, e.g., a simple FILTER rewrite that just does that, while this one, with lots of magic, could have a different name.
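The "several rewrite methods, picked per query" idea can be illustrated with a plain strategy interface. Everything here is hypothetical: the types are not Lucene's `MultiTermQuery.RewriteMethod` API, postings are modeled as `BitSet`s, and the top-k heuristic in the second strategy is an invented stand-in for "a rewrite with lots of magic", not the logic of PR #12055.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;

public class RewriteStrategies {
  /** Hypothetical rewrite result: terms kept as separate clauses plus one pre-merged doc set. */
  public static final class Rewritten {
    public final List<String> separateTerms;
    public final BitSet merged;

    Rewritten(List<String> separateTerms, BitSet merged) {
      this.separateTerms = separateTerms;
      this.merged = merged;
    }
  }

  /** Hypothetical strategy interface: each query type can be configured with its own strategy. */
  public interface RewriteStrategy {
    Rewritten rewrite(Map<String, BitSet> postings);
  }

  /** Plain FILTER-style strategy: OR every term's docs into a single set. */
  public static final RewriteStrategy SIMPLE_FILTER = postings -> {
    BitSet all = new BitSet();
    for (BitSet docs : postings.values()) {
      all.or(docs);
    }
    return new Rewritten(List.of(), all);
  };

  /** Separately named strategy with an invented heuristic: keep the k most frequent terms apart. */
  public static RewriteStrategy keepTopKSeparate(int k) {
    if (k < 0) {
      throw new IllegalArgumentException("k must be >= 0");
    }
    return postings -> {
      List<String> terms = new ArrayList<>(postings.keySet());
      // Sort by doc frequency, descending; break ties by term for determinism.
      terms.sort((a, b) -> {
        int cmp = Integer.compare(postings.get(b).cardinality(), postings.get(a).cardinality());
        return cmp != 0 ? cmp : a.compareTo(b);
      });
      int split = Math.min(k, terms.size());
      BitSet merged = new BitSet();
      for (String t : terms.subList(split, terms.size())) {
        merged.or(postings.get(t));
      }
      return new Rewritten(List.copyOf(terms.subList(0, split)), merged);
    };
  }
}
```

The simple strategy stays available under its own name, and a query type can opt into the heuristic one instead of every query inheriting it.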
[GitHub] [lucene] vatsalpatel3689 closed pull request #12060: fixing fkltr change'
vatsalpatel3689 closed pull request #12060: fixing fkltr change' URL: https://github.com/apache/lucene/pull/12060
[GitHub] [lucene] aykutfirat commented on issue #11507: Increase the number of dims for KNN vectors to 2048 [LUCENE-10471]
aykutfirat commented on issue #11507: URL: https://github.com/apache/lucene/issues/11507#issuecomment-1369369480 Lots of things have happened since August, like the arrival of ChatGPT and people's increased desire to use OpenAI's state-of-the-art embeddings, which have 1536 dimensions. Can you at least please increase the limit to 2048 for now, while you discuss upper limits?
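A back-of-the-envelope sketch of what the request means: a float32 vector costs 4 bytes per dimension, so a 1536-dimension embedding takes 6144 bytes of raw vector data, ignoring any graph or index overhead. The class and the dimension check below are hypothetical; the limit is passed in as a parameter, with 1024 used only as an assumed current value for comparison with the 2048 being asked for.

```java
public class VectorSizing {
  /** Raw bytes for one float32 vector (4 bytes per dimension), ignoring index overhead. */
  static long bytesPerVector(int dims) {
    return 4L * dims;
  }

  /** Hypothetical validation: rejects dimensions outside [1, maxDims]. */
  static void checkDims(int dims, int maxDims) {
    if (dims <= 0 || dims > maxDims) {
      throw new IllegalArgumentException(
          "vector dimension " + dims + " must be in [1, " + maxDims + "]");
    }
  }

  public static void main(String[] args) {
    int dims = 1536; // size of the embeddings mentioned in the comment
    for (int limit : new int[] {1024, 2048}) { // 1024 is an assumed current limit
      try {
        checkDims(dims, limit);
        System.out.println(dims + " dims accepted with limit " + limit);
      } catch (IllegalArgumentException e) {
        System.out.println(dims + " dims rejected with limit " + limit);
      }
    }
    long docs = 1_000_000L; // hypothetical corpus size
    System.out.println("raw bytes per vector: " + bytesPerVector(dims)); // 6144
    System.out.println("raw bytes for " + docs + " docs: " + bytesPerVector(dims) * docs);
  }
}
```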
[GitHub] [lucene] LuXugang merged pull request #12046: Out of boundary in CombinedFieldQuery#addTerm
LuXugang merged PR #12046: URL: https://github.com/apache/lucene/pull/12046