[GitHub] [lucene] dweiss commented on issue #12057: ban finalizers in the build somehow (worst-case: use error-prone)

2023-01-02 Thread GitBox


dweiss commented on issue #12057:
URL: https://github.com/apache/lucene/issues/12057#issuecomment-1368723771

   Even a simplistic regexp/substring check added to the rat checks would work 
here, I think; the method is fairly uniquely named. There are better 
alternatives, but they would add overhead (parsing classes, etc.).
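
A rough sketch of the regexp idea in the comment above, not the check that ended up in the Lucene build. The class name `FinalizerCheck`, the exact pattern, and the "scan every `.java` file under a root directory" behaviour are assumptions made for illustration only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: flags lines that look like a finalizer declaration.
final class FinalizerCheck {
  // Matches a no-arg declaration such as "protected void finalize()".
  private static final Pattern FINALIZER =
      Pattern.compile("\\bvoid\\s+finalize\\s*\\(\\s*\\)");

  /** Returns "name:line" for every line of source that matches the pattern. */
  static List<String> findFinalizers(String name, String source) {
    List<String> hits = new ArrayList<>();
    String[] lines = source.split("\\R", -1);
    for (int i = 0; i < lines.length; i++) {
      if (FINALIZER.matcher(lines[i]).find()) {
        hits.add(name + ":" + (i + 1));
      }
    }
    return hits;
  }

  public static void main(String[] args) throws IOException {
    // Root directory to scan; defaults to the current directory.
    Path root = Path.of(args.length > 0 ? args[0] : ".");
    List<Path> javaFiles;
    try (Stream<Path> files = Files.walk(root)) {
      javaFiles = files.filter(f -> f.toString().endsWith(".java"))
          .collect(Collectors.toList());
    }
    List<String> hits = new ArrayList<>();
    for (Path p : javaFiles) {
      hits.addAll(findFinalizers(p.toString(), Files.readString(p)));
    }
    hits.forEach(System.err::println);
    if (!hits.isEmpty()) {
      System.exit(1); // fail the build step when a finalizer is found
    }
  }
}
```

Being a plain text match, this also flags the pattern inside comments and string literals, which is the trade-off against the heavier class-parsing alternatives mentioned in the comment.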


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on issue #12057: ban finalizers in the build somehow (worst-case: use error-prone)

2023-01-02 Thread GitBox


uschindler commented on issue #12057:
URL: https://github.com/apache/lucene/issues/12057#issuecomment-1368738480

   Yeah, similar to the LOG statement checks in Solr?





[GitHub] [lucene] sebastiano1972 opened a new issue, #12059: Recurring index corruption

2023-01-02 Thread GitBox


sebastiano1972 opened a new issue, #12059:
URL: https://github.com/apache/lucene/issues/12059

   We are experimenting with Elasticsearch deployed in Azure Container 
Instances (Debian + OpenJDK). The ES indexes are stored on an Azure file 
share mounted via SMB (3.0). The Elasticsearch cluster is made up of 4 nodes, 
each with a separate file share to store its indices. We are experiencing 
recurring index corruption, specifically a "read past EOF" exception. I asked 
on the Elasticsearch forum, but the answer I got was a bit generic and not 
really helpful, other than confirming that, from the ES point of view, ES 
should work on an SMB share as long as it behaves like a local drive. As the 
underlying exception relates to an issue with a Lucene index, I was wondering 
if you could help out? Specifically, can Lucene work on SMB? I can only find 
sparse information on this configuration and, while NFS seems to be a no-no, 
for SMB it is not that clear. Below is the exception we are getting.
   
   Many thanks.
   
   Seb
   
   ```
   java.io.IOException: read past EOF: NIOFSIndexInput(path="/bitnami/elasticsearch/data/indices/mS2bUbLtSeG0FSAMuKX7JQ/0/index/_ldsn_1.fnm") buffer: java.nio.HeapByteBuffer[pos=0 lim=1024 cap=1024] chunkLen: 1024 end: 2331: NIOFSIndexInput(path="/bitnami/elasticsearch/data/indices/mS2bUbLtSeG0FSAMuKX7JQ/0/index/_ldsn_1.fnm")
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:200) ~[lucene-core-9.3.0.jar:?]
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:291) ~[lucene-core-9.3.0.jar:?]
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:55) ~[lucene-core-9.3.0.jar:?]
    at org.apache.lucene.store.BufferedChecksumIndexInput.readByte(BufferedChecksumIndexInput.java:39) ~[lucene-core-9.3.0.jar:?]
    at org.apache.lucene.codecs.CodecUtil.readBEInt(CodecUtil.java:667) ~[lucene-core-9.3.0.jar:?]
    at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:184) ~[lucene-core-9.3.0.jar:?]
    at org.apache.lucene.codecs.CodecUtil.checkIndexHeader(CodecUtil.java:253) ~[lucene-core-9.3.0.jar:?]
    at org.apache.lucene.codecs.lucene90.Lucene90FieldInfosFormat.read(Lucene90FieldInfosFormat.java:128) ~[lucene-core-9.3.0.jar:?]
    at org.apache.lucene.index.SegmentReader.initFieldInfos(SegmentReader.java:205) ~[lucene-core-9.3.0.jar:?]
    at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:156) ~[lucene-core-9.3.0.jar:?]
    at org.apache.lucene.index.ReadersAndUpdates.createNewReaderWithLatestLiveDocs(ReadersAndUpdates.java:738) ~[lucene-core-9.3.0.jar:?]
    at org.apache.lucene.index.ReadersAndUpdates.swapNewReaderWithLatestLiveDocs(ReadersAndUpdates.java:754) ~[lucene-core-9.3.0.jar:?]
    at org.apache.lucene.index.ReadersAndUpdates.writeFieldUpdates(ReadersAndUpdates.java:678) ~[lucene-core-9.3.0.jar:?]
    at org.apache.lucene.index.ReaderPool.writeAllDocValuesUpdates(ReaderPool.java:251) ~[lucene-core-9.3.0.jar:?]
    at org.apache.lucene.index.IndexWriter.writeReaderPool(IndexWriter.java:3743) ~[lucene-core-9.3.0.jar:?]
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:591) ~[lucene-core-9.3.0.jar:?]
    at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:381) ~[lucene-core-9.3.0.jar:?]
    at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:355) ~[lucene-core-9.3.0.jar:?]
    at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:345) ~[lucene-core-9.3.0.jar:?]
    at org.apache.lucene.index.FilterDirectoryReader.doOpenIfChanged(FilterDirectoryReader.java:112) ~[lucene-core-9.3.0.jar:?]
    at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:170) ~[lucene-core-9.3.0.jar:?]
    at org.elasticsearch.index.engine.ElasticsearchReaderManager.refreshIfNeeded(ElasticsearchReaderManager.java:48) ~[elasticsearch-8.4.1.jar:?]
    at org.elasticsearch.index.engine.ElasticsearchReaderManager.refreshIfNeeded(ElasticsearchReaderManager.java:27) ~[elasticsearch-8.4.1.jar:?]
    at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:167) ~[lucene-core-9.3.0.jar:?]
    at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:240) ~[lucene-core-9.3.0.jar:?]
    at org.elasticsearch.index.engine.InternalEngine$ExternalReaderManager.refreshIfNeeded(InternalEngine.java:355) ~[elasticsearch-8.4.1.jar:?]
    at org.elasticsearch.index.engine.InternalEng

[GitHub] [lucene] uschindler commented on issue #12059: Recurring index corruption

2023-01-02 Thread GitBox


uschindler commented on issue #12059:
URL: https://github.com/apache/lucene/issues/12059#issuecomment-1368787343

   Hi,
   Samba/CIFS generally works as a file store, in contrast to NFS, which 
should never be used as the file system for an index. Still, the problems you 
describe here are surely related to the shared file system. Whenever possible, 
please make the store a local disk; don't use shared/network file systems to 
store Lucene indexes. We can't give you any recommendations here, and there 
are no known bugs in Lucene that could cause the index corruption above.
   
   In short: please avoid NFS (under all circumstances) and avoid CIFS (where 
possible, especially under high load). Shared/network file systems should only 
be used for read-only indexes.
   
   Related question: we are wondering why you are not using MMapDirectory but 
NIOFSDirectory instead?
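
For context on the question above, a minimal JDK-only sketch of the two file access styles involved. It assumes, as a simplification, that NIOFSDirectory reads through positional `FileChannel` reads while MMapDirectory reads through memory-mapped buffers. The class `FileAccessDemo` and its methods are hypothetical, and no Lucene API is used.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not part of Lucene.
final class FileAccessDemo {
  /** Positional read through the channel: a read call per access, no mapping. */
  static byte readByteWithChannel(Path file, long pos) throws IOException {
    try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
      ByteBuffer buf = ByteBuffer.allocate(1);
      if (ch.read(buf, pos) != 1) {
        // The file is shorter than the caller expected.
        throw new EOFException("read past EOF: pos=" + pos + " size=" + ch.size());
      }
      return buf.get(0);
    }
  }

  /** Memory-mapped read: map the file once, then read it like a buffer. */
  static byte readByteWithMmap(Path file, long pos) throws IOException {
    try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
      MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
      return map.get((int) pos);
    }
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempFile("demo", ".bin");
    Files.write(file, new byte[] {10, 20, 30});
    System.out.println(readByteWithChannel(file, 1)); // prints 20
    System.out.println(readByteWithMmap(file, 2));    // prints 30
  }
}
```

On Linux each mapping counts against the `vm.max_map_count` limit, which is why the mapped style depends on a host-level setting and the channel style does not.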





[GitHub] [lucene] uschindler closed issue #12059: Recurring index corruption

2023-01-02 Thread GitBox


uschindler closed issue #12059: Recurring index corruption
URL: https://github.com/apache/lucene/issues/12059





[GitHub] [lucene] uschindler commented on issue #12059: Recurring index corruption

2023-01-02 Thread GitBox


uschindler commented on issue #12059:
URL: https://github.com/apache/lucene/issues/12059#issuecomment-1368788012

   This would be better discussed on the mailing list; it is not a 
bug/issue at all.





[GitHub] [lucene] uschindler commented on issue #11701: Deadlock in AnalysisSPILoader [LUCENE-10665]

2023-01-02 Thread GitBox


uschindler commented on issue #11701:
URL: https://github.com/apache/lucene/issues/11701#issuecomment-1368789210

   Thanks for the feedback, much appreciated. 





[GitHub] [lucene] sebastiano1972 commented on issue #12059: Recurring index corruption

2023-01-02 Thread GitBox


sebastiano1972 commented on issue #12059:
URL: https://github.com/apache/lucene/issues/12059#issuecomment-1368804331

   Hi Uwe,
   
   thank you for your kind reply.
   
   To answer your question, we are experimenting with Azure Container 
Instances because of their relative simplicity, but they come with some 
limitations:
   
   - we cannot set the max_map_count value, as we do not have access to the 
underlying host 
(https://www.elastic.co/guide/en/elasticsearch/reference/current/vm-max-map-count.html).
 Unfortunately, setting it is required to run an ES cluster, so we were forced 
to use NIOFS
   - ACIs only allow volume mappings using Azure File Shares, which only work 
with NFS or SMB.
   
   I will move this to the suggested mailing list.
   
   Thank you again.
   
   Seb
   
   





[GitHub] [lucene] rmuir commented on issue #12057: ban finalizers in the build somehow (worst-case: use error-prone)

2023-01-02 Thread GitBox


rmuir commented on issue #12057:
URL: https://github.com/apache/lucene/issues/12057#issuecomment-1368933723

   that's true, it is another option. I am still investigating the ECJ one. 
   
   It is a bit sad that we are so messy about deprecations and can't simply use 
the compiler's support. For example there are many deprecations in `main`, 
which makes no sense.
   
   As an example, all the big-endian varhandles in `BitUtil` are deprecated, 
but they are in use across `main`, and from what I can tell, usage is only 
growing. That's because java uses it internally? So what is the point of the 
deprecation here? It is doing no good at all and just preventing us from using 
our compilers properly.
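
For readers unfamiliar with the term, a minimal JDK-only sketch of what a big-endian varhandle over a byte array does. It calls `MethodHandles.byteArrayViewVarHandle` directly; the class name and helper methods are hypothetical, and this is not a copy of the `BitUtil` code.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical demo class, not part of Lucene.
final class BigEndianVarHandleDemo {
  // Views a byte[] as ints stored most-significant byte first.
  static final VarHandle BE_INT =
      MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.BIG_ENDIAN);
  // Same view with the least-significant byte first, for comparison.
  static final VarHandle LE_INT =
      MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

  /** Reads 4 bytes starting at byte offset off as a big-endian int. */
  static int readBE(byte[] bytes, int off) {
    return (int) BE_INT.get(bytes, off);
  }

  /** Reads 4 bytes starting at byte offset off as a little-endian int. */
  static int readLE(byte[] bytes, int off) {
    return (int) LE_INT.get(bytes, off);
  }

  public static void main(String[] args) {
    byte[] bytes = {0x12, 0x34, 0x56, 0x78};
    System.out.println(Integer.toHexString(readBE(bytes, 0))); // prints 12345678
    System.out.println(Integer.toHexString(readLE(bytes, 0))); // prints 78563412
  }
}
```

If a constant like `BE_INT` were annotated with `@Deprecated`, every call site would produce a deprecation warning under `javac -Xlint:deprecation`, which is the noise the comment above describes.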
   
   





[GitHub] [lucene] gsmiller commented on pull request #12055: Better skipping for multi-term queries with a FILTER rewrite.

2023-01-02 Thread GitBox


gsmiller commented on PR #12055:
URL: https://github.com/apache/lucene/pull/12055#issuecomment-1369212139

   +1 to this approach in general.
   
   I do wonder if the distribution assumptions generally hold if we start 
looking at "term in set" queries though. That's sort of irrelevant right now 
since that implementation is still separate (`TermInSetQuery`), but this may 
add another reason to keep that implementation separate going forward. I think 
the difference with "term in set" is that it may not follow natural language 
distributions in general, while the current MultiTermQuery implementations most 
likely do.
   
   I also wonder if we could be more aggressive with the number of clauses we 
build into a `BooleanQuery` if we leverage the short-circuiting idea in #11928. 
Might be a nice fit for this "filtering" case.
   
   Just a couple thoughts but certainly nothing blocking or anything that needs 
to be included as part of this PR. Just wanted to toss them out there.





[GitHub] [lucene] vatsalpatel3689 opened a new pull request, #12060: fixing fkltr change'

2023-01-02 Thread GitBox


vatsalpatel3689 opened a new pull request, #12060:
URL: https://github.com/apache/lucene/pull/12060

   ### Description
   
   
   





[GitHub] [lucene] rmuir commented on pull request #12055: Better skipping for multi-term queries with a FILTER rewrite.

2023-01-02 Thread GitBox


rmuir commented on PR #12055:
URL: https://github.com/apache/lucene/pull/12055#issuecomment-1369222376

   there's no reason to duplicate a bunch of code just because of minor changes 
to a rewrite method. we can have more than one or two of these rewrite methods, 
and use different ones for different queries: this fact seems to have been 
forgotten here. and maybe we should have e.g. a simple FILTER rewrite that just 
does that, while this one with lots of magic could have a different name.
   





[GitHub] [lucene] vatsalpatel3689 closed pull request #12060: fixing fkltr change'

2023-01-02 Thread GitBox


vatsalpatel3689 closed pull request #12060: fixing fkltr change'
URL: https://github.com/apache/lucene/pull/12060





[GitHub] [lucene] aykutfirat commented on issue #11507: Increase the number of dims for KNN vectors to 2048 [LUCENE-10471]

2023-01-02 Thread GitBox


aykutfirat commented on issue #11507:
URL: https://github.com/apache/lucene/issues/11507#issuecomment-1369369480

   A lot has happened since August, like the arrival of ChatGPT and people's 
increased desire to use OpenAI's state-of-the-art embeddings, which have 1536 
dimensions. Can you please at least increase the limit to 2048 for now, while 
you discuss upper limits? 





[GitHub] [lucene] LuXugang merged pull request #12046: Out of boundary in CombinedFieldQuery#addTerm

2023-01-02 Thread GitBox


LuXugang merged PR #12046:
URL: https://github.com/apache/lucene/pull/12046

