[GitHub] [lucene] uschindler commented on pull request #11840: GITHUB-11838 Add api to allow concurrent query rewrite

2022-10-14 Thread GitBox


uschindler commented on PR #11840:
URL: https://github.com/apache/lucene/pull/11840#issuecomment-1278656494

   I think the general idea looks well.
   This should me Lucene 10 only, as the changes in query API may break lots of 
code downstream. We could theoretically add a backwards combatiility layer 
using VirtualMethod (says the policeman), if that's wanted, I can help. 
`VirtualMethod` utility class would be needed to figure out which of both 
methods (`rewrite(IndexReader)` or `rewrite(IndexSearcher)`) was overridden by 
a subclass downstream.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jpountz commented on pull request #11840: GITHUB-11838 Add api to allow concurrent query rewrite

2022-10-14 Thread GitBox


jpountz commented on PR #11840:
URL: https://github.com/apache/lucene/pull/11840#issuecomment-1278661850

   Agreed on making this Lucene 10.
   
   As far as bw compat is concerned, I was thinking of updating Lucene 9 this 
way:
- introduce `Query#rewrite(IndexSearcher)` and make it delegate to 
`Query#rewrite(IndexReader)`
- keep existing implementations as they are, except possibly the few ones 
where we'd like to take advantage of the executor
- deprecate `Query#rewrite(IndexReader)`
- make `IndexSearcher#rewrite(Query)` use `Query#rewrite(IndexSearcher)`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #11847: Add a method allowing canonical strings to be returned from DataInput

2022-10-14 Thread GitBox


rmuir commented on PR #11847:
URL: https://github.com/apache/lucene/pull/11847#issuecomment-1278844737

   it absolutely is. Look at the fucking diff dude. You won't sneak this one 
past me.
   
   -1 to this interning stuff. I remember what things were like when we made 
this mistake before and had to fix it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] dsmiley commented on pull request #11847: Add a method allowing canonical strings to be returned from DataInput

2022-10-14 Thread GitBox


dsmiley commented on PR #11847:
URL: https://github.com/apache/lucene/pull/11847#issuecomment-1278967192

   Could you comment on the pertinent line of code please?  `canonicalStrings` 
is not static.  Maybe you are thinking of the ThreadLocal?  That's some 
temporary byte buffer.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] almogtavor commented on pull request #11852: Luke Webapp

2022-10-14 Thread GitBox


almogtavor commented on PR #11852:
URL: https://github.com/apache/lucene/pull/11852#issuecomment-1279032576

   @msokolov What are the reasons for choosing to create a web app without a 
usage of a web framework such as React? Writing Vanilla will get pretty 
cumbersome when things get complex


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] YmBIgo opened a new issue, #11853: Make CJKAnalyzer that use Trigram instead of Bigram

2022-10-14 Thread GitBox


YmBIgo opened a new issue, #11853:
URL: https://github.com/apache/lucene/issues/11853

   ### Description
   
   Hi. I want to make Trigram Filter for [CJK 
Analyzer](https://github.com/apache/lucene/tree/cc342ea7407c729a743123d8f7957aff6c6f9792/lucene/analysis/common/src/java/org/apache/lucene/analysis/cjk)
 since I am Japanese.
   And I have written some simple code to achieve it. It is below inspired by 
original 
[CJKBigramFilter.java](https://github.com/apache/lucene/blob/cc342ea7407c729a743123d8f7957aff6c6f9792/lucene/analysis/common/src/java/org/apache/lucene/analysis/cjk/CJKBigramFilter.java)
 file. It is work fine at my environment.
   
   But I still cannot understand how I should do conditional branch for 
`flushUnigram` ( and if necessary `flushBigram`) instead of add-on method that 
flush trigram `flushTrigram`.
   
   Is there any idea that teach me how and why I should flushUnigram at `if 
(offsetAtt.startOffset() != lastEndOffset)` case3.
   
   Thank you.
   
   ```patch
   /*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements.  See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License.  You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
   package org.apache.lucene.analysis.cjk;
   
   import java.io.IOException;
   import org.apache.lucene.analysis.TokenFilter;
   import org.apache.lucene.analysis.TokenStream;
   import org.apache.lucene.analysis.standard.StandardTokenizer;
   import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
   import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
   import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
   import org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute;
   import org.apache.lucene.analysis.tokenattributes.TypeAttribute;
   import org.apache.lucene.util.ArrayUtil;
   
   /**
* Forms bigrams of CJK terms that are generated from StandardTokenizer or 
ICUTokenizer.
*
* CJK types are set by these tokenizers, but you can also use {@link
* #CJKBigramFilter(TokenStream, int)} to explicitly control which of the 
CJK scripts are turned
* into bigrams.
*
* By default, when a CJK character has no adjacent characters to form a 
bigram, it is output in
* unigram form. If you want to always output both unigrams and bigrams, set 
the 
* outputUnigrams flag in {@link 
CJKBigramFilter#CJKBigramFilter(TokenStream, int, boolean)}.
* This can be used for a combined unigram+bigram approach.
*
* Unlike ICUTokenizer, StandardTokenizer does not split at script 
boundaries. Korean Hangul
* characters are treated the same as many other scripts' letters, and as a 
result,
* StandardTokenizer can produce tokens that mix Hangul and non-Hangul 
characters, e.g. "한국abc".
* Such mixed-script tokens are typed as  
rather than 
* , and as a result, will not be converted to bigrams 
by CJKBigramFilter.
*
* In all cases, all non-CJK input is passed thru unmodified.
*/
   public final class CJKTrigramFilter extends TokenFilter {
 // configuration
 /** bigram flag for Han Ideographs */
 public static final int HAN = 1;
 /** bigram flag for Hiragana */
 public static final int HIRAGANA = 2;
 /** bigram flag for Katakana */
 public static final int KATAKANA = 4;
 /** bigram flag for Hangul */
 public static final int HANGUL = 8;
   
 /** when we emit a bigram, it's then marked as this type */
 public static final String DOUBLE_TYPE = "";
 /** when we emit a unigram, it's then marked as this type */
 public static final String SINGLE_TYPE = "";
 /*  */
   +  public static final String TRIPLE_TYPE = "";
   
 // the types from standardtokenizer
 private static final String HAN_TYPE =
 StandardTokenizer.TOKEN_TYPES[StandardTokenizer.IDEOGRAPHIC];
 private static final String HIRAGANA_TYPE =
 StandardTokenizer.TOKEN_TYPES[StandardTokenizer.HIRAGANA];
 private static final String KATAKANA_TYPE =
 StandardTokenizer.TOKEN_TYPES[StandardTokenizer.KATAKANA];
 private static final String HANGUL_TYPE = 
StandardTokenizer.TOKEN_TYPES[StandardTokenizer.HANGUL];
   
 // sentinel value for ignoring a script
 private static final Object NO = new Object();
   
  

[GitHub] [lucene] benwtrent commented on issue #11830: Store HNSW graph connections more compactly

2022-10-14 Thread GitBox


benwtrent commented on issue #11830:
URL: https://github.com/apache/lucene/issues/11830#issuecomment-1279207529

   OK, I ran this patch against minst, sift, glove, and deep image. Recall is 
the exact same in all cases, so no mistakes there. 
   
   There are indeed slight differences in latency, below are images. I ran 
search 5 times for each parameter combination and chose the fastest. All of 
these were with `"M": 16,  "efConstruction": 100`.
   
   For each, I specifically point out the QPS at the lowest recall.
   
   The deltas in QPS vary from near 0% change to 11% decrease. 11% does seem 
large but I think its an outlier. The median decrease is 2%, the average change 
is 0% (or 0.002%).
   [ann_qps_comparison - 
ann_scores.csv](https://github.com/apache/lucene/files/9788126/ann_qps_comparison.-.ann_scores.csv)
   
   For every data set, the memory required reduced at around a similar amount.
   
   This change seems like a good step forward to me. 
   
   @jtibshirani 
   
   # Deep Image
   
   
![deep-image-96-angular-batch](https://user-images.githubusercontent.com/4357155/195885355-70fa0c17-23bb-4389-af70-f7c3e5668f64.png)
   
   # Glove
   
   
![glove-100-angular-batch](https://user-images.githubusercontent.com/4357155/195885357-0776bc5e-ed4d-41af-9109-6a3764c3ba5d.png)
   
   # Minst
   
   
![mnist-784-euclidean-batch](https://user-images.githubusercontent.com/4357155/195885361-c3267fd0-8a79-4184-953c-a6cc41f1f7dd.png)
   
   # Sift
   
   
![sift-128-euclidean-batch](https://user-images.githubusercontent.com/4357155/195885362-fa71d343-f75f-4d51-9dfc-b202e75f319a.png)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on a diff in pull request #11852: Luke Webapp

2022-10-14 Thread GitBox


rmuir commented on code in PR #11852:
URL: https://github.com/apache/lucene/pull/11852#discussion_r995989985


##
lucene/luke/src/java/org/apache/lucene/luke/app/web/LukeWebMain.java:
##
@@ -17,31 +17,78 @@
 
 package org.apache.lucene.luke.app.web;
 
+import java.net.InetSocketAddress;
 import java.util.concurrent.CountDownLatch;
+import java.util.HashMap;
+import java.util.Map;
 import org.apache.lucene.luke.app.IndexHandler;
 import org.apache.lucene.luke.util.LoggerFactory;
 
 /** Entry class for web Luke */
-public class LukeWebMain {
+public final class LukeWebMain {
+
+  private LukeWebMain() {
+  }
 
   static {
 LoggerFactory.initGuiLogging();
   }
 
   public static void main(String[] args) throws Exception {
-String index = null;
-if (args.length == 2 && args[0].equals("--index")) {
-  index = args[1];
-} else {
-  System.err.println("usage: LukeWebMain --index ");
-  Runtime.getRuntime().exit(1);
+Map parsed = null;
+try {
+  parsed = parseArgs(args);
+} catch (Exception e) {
+  usage(e.getMessage());
 }
-
 IndexHandler indexHandler = IndexHandler.getInstance();
-indexHandler.open(index, "org.apache.lucene.store.FSDirectory", true, 
true, false);
+indexHandler.open(getIndex(parsed), "org.apache.lucene.store.FSDirectory", 
true, true, false);
 CountDownLatch tombstone = new CountDownLatch(1);
-HttpService httpService = new HttpService(indexHandler, tombstone);
+HttpService httpService = new HttpService(getSockAddr(parsed), 
indexHandler, tombstone);
 httpService.start();
 tombstone.await();
   }
+
+  private static String getIndex(Map args) {
+String index = (String) args.get("index");
+if (index == null) {
+  usage("index arg is required");
+}
+return index;
+  }
+
+  private static InetSocketAddress getSockAddr(Map args) {
+String host = (String) args.get("host");
+int port = (Integer) args.getOrDefault("port", 8080);
+if (host == null) {
+  return new InetSocketAddress(port);

Review Comment:
   Maybe just allow this to run on localhost only? Always?
   
   Otherwise I am concerned about security implications. PPl will be reporting 
vulnerabilities that there is no TLS, no authentication, that you can do bad 
things, etc.
   
   If we only let it run on localhost (or unix socket), then if someone wants 
to expose it to the network, they have to do that themselves (e.g. put nginx or 
haproxy in front of it, where the TLS, auth, etc can sit).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on a diff in pull request #11852: Luke Webapp

2022-10-14 Thread GitBox


rmuir commented on code in PR #11852:
URL: https://github.com/apache/lucene/pull/11852#discussion_r995991513


##
lucene/luke/src/java/org/apache/lucene/luke/app/web/LukeWebMain.java:
##
@@ -17,31 +17,78 @@
 
 package org.apache.lucene.luke.app.web;
 
+import java.net.InetSocketAddress;
 import java.util.concurrent.CountDownLatch;
+import java.util.HashMap;
+import java.util.Map;
 import org.apache.lucene.luke.app.IndexHandler;
 import org.apache.lucene.luke.util.LoggerFactory;
 
 /** Entry class for web Luke */
-public class LukeWebMain {
+public final class LukeWebMain {
+
+  private LukeWebMain() {
+  }
 
   static {
 LoggerFactory.initGuiLogging();
   }
 
   public static void main(String[] args) throws Exception {
-String index = null;
-if (args.length == 2 && args[0].equals("--index")) {
-  index = args[1];
-} else {
-  System.err.println("usage: LukeWebMain --index ");
-  Runtime.getRuntime().exit(1);
+Map parsed = null;
+try {
+  parsed = parseArgs(args);
+} catch (Exception e) {
+  usage(e.getMessage());
 }
-
 IndexHandler indexHandler = IndexHandler.getInstance();
-indexHandler.open(index, "org.apache.lucene.store.FSDirectory", true, 
true, false);
+indexHandler.open(getIndex(parsed), "org.apache.lucene.store.FSDirectory", 
true, true, false);
 CountDownLatch tombstone = new CountDownLatch(1);
-HttpService httpService = new HttpService(indexHandler, tombstone);
+HttpService httpService = new HttpService(getSockAddr(parsed), 
indexHandler, tombstone);
 httpService.start();
 tombstone.await();
   }
+
+  private static String getIndex(Map args) {
+String index = (String) args.get("index");
+if (index == null) {
+  usage("index arg is required");
+}
+return index;
+  }
+
+  private static InetSocketAddress getSockAddr(Map args) {
+String host = (String) args.get("host");
+int port = (Integer) args.getOrDefault("port", 8080);
+if (host == null) {
+  return new InetSocketAddress(port);

Review Comment:
   At the minimum at least default to a localhost only bind if the user doesn't 
specify.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #11847: Add a method allowing canonical strings to be returned from DataInput

2022-10-14 Thread GitBox


rmuir commented on PR #11847:
URL: https://github.com/apache/lucene/pull/11847#issuecomment-1279364308

   both are an issue.
   
   The threadlocal will leak badly, especially for cases that "churn" lots of 
threads (e.g. jetty defaults, e.g. java webapps). It will cause more harm than 
good.
   
   the private map just leaks for the caller by unnecessarily saving strings. 
it's a lot of overhead to datainput to add this map. keep in mind datainput 
isn't closeable and has other implementations such as ByteArrayDataInput.
   
   Sorry, I don't see advantages to this stuff.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mdmarshmallow commented on pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-10-14 Thread GitBox


mdmarshmallow commented on PR #11796:
URL: https://github.com/apache/lucene/pull/11796#issuecomment-1279371679

   I resolved a merge conflict with `CHANGES.txt`. I think I've also addressed 
all the issues people have pointed out as well. Would it be possible for 
someone to take a look at this and merge it if there are no more objections? 
Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] msokolov commented on a diff in pull request #11852: Luke Webapp

2022-10-14 Thread GitBox


msokolov commented on code in PR #11852:
URL: https://github.com/apache/lucene/pull/11852#discussion_r996170730


##
lucene/luke/src/java/org/apache/lucene/luke/app/web/LukeWebMain.java:
##
@@ -17,31 +17,78 @@
 
 package org.apache.lucene.luke.app.web;
 
+import java.net.InetSocketAddress;
 import java.util.concurrent.CountDownLatch;
+import java.util.HashMap;
+import java.util.Map;
 import org.apache.lucene.luke.app.IndexHandler;
 import org.apache.lucene.luke.util.LoggerFactory;
 
 /** Entry class for web Luke */
-public class LukeWebMain {
+public final class LukeWebMain {
+
+  private LukeWebMain() {
+  }
 
   static {
 LoggerFactory.initGuiLogging();
   }
 
   public static void main(String[] args) throws Exception {
-String index = null;
-if (args.length == 2 && args[0].equals("--index")) {
-  index = args[1];
-} else {
-  System.err.println("usage: LukeWebMain --index ");
-  Runtime.getRuntime().exit(1);
+Map parsed = null;
+try {
+  parsed = parseArgs(args);
+} catch (Exception e) {
+  usage(e.getMessage());
 }
-
 IndexHandler indexHandler = IndexHandler.getInstance();
-indexHandler.open(index, "org.apache.lucene.store.FSDirectory", true, 
true, false);
+indexHandler.open(getIndex(parsed), "org.apache.lucene.store.FSDirectory", 
true, true, false);
 CountDownLatch tombstone = new CountDownLatch(1);
-HttpService httpService = new HttpService(indexHandler, tombstone);
+HttpService httpService = new HttpService(getSockAddr(parsed), 
indexHandler, tombstone);
 httpService.start();
 tombstone.await();
   }
+
+  private static String getIndex(Map args) {
+String index = (String) args.get("index");
+if (index == null) {
+  usage("index arg is required");
+}
+return index;
+  }
+
+  private static InetSocketAddress getSockAddr(Map args) {
+String host = (String) args.get("host");
+int port = (Integer) args.getOrDefault("port", 8080);
+if (host == null) {
+  return new InetSocketAddress(port);

Review Comment:
   thanks for checking this. I changed it to default to localhost. I think we 
should allow users to expose this if they want to.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] msokolov commented on pull request #11852: Luke Webapp

2022-10-14 Thread GitBox


msokolov commented on PR #11852:
URL: https://github.com/apache/lucene/pull/11852#issuecomment-1279488548

   > What are the reasons for choosing to create a web app without a usage of a 
web framework such as React? Writing Vanilla will get pretty cumbersome when 
things get complex
   
   I don't want to get into endless discussions about what is the best 
framework, that's one reason. I think you can do a lot with vanilla without all 
that much trouble, and I actually don't want things to get complex. The whole 
idea of this is simple. Another reason is I want this to last a long time, and 
JS frameworks seem to have a habit of going out of style after a minute or two. 
Another reason is if you pull in dependencies, then you have to keep them up to 
date.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] msokolov commented on pull request #11852: Luke Webapp

2022-10-14 Thread GitBox


msokolov commented on PR #11852:
URL: https://github.com/apache/lucene/pull/11852#issuecomment-1279498308

   Re: JS frameworks - I recognize my position is from Ludd, and it might be 
untenable. If it gets out of hand we can always add something like jQuery, but 
we can never remove, so let's start here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] patsonluk opened a new pull request, #2673: Flag exception during core creation if the node defined in state.json is not the same as current node

2022-10-14 Thread GitBox


patsonluk opened a new pull request, #2673:
URL: https://github.com/apache/lucene-solr/pull/2673

   ## Description
   It's found that our prod env have certain data nodes have "ghost replicas" 
that do not have data dir but has the core.properties file and core directory. 
Replica with same name is actually reside on a different node as defined in the 
`state.json`. Such "ghost replicas" can trigger  `DOWN` replica state being 
published, which the real replica (with same name) is actually healthy in 
another node.
   
   More details of the issue can be found in 
https://app.shortcut.com/fullstory/story/217252/investigate-replica-that-failed-to-come-up-as-active-during-restart-deployment#activity-217734
   
   ## Solution
   While we do not yet know the exact cause of those "ghost replicas" (probably 
from some migration hiccup during c82 creation?), it seems to be a rare 
occurrence now (8 replicas in c82). 
   
   Therefore we will add a new exception `InconsistentClusterStateException`, 
which would be thrown from `checkStateInZk` if node name of a replica defined 
in state.json is different from the current node which tries to spin up such 
core. Such exception would interrupt the core creation, and no longer publish a 
`DOWN` state.
   
   For now, we will NOT provide an cleanup in the Solr code, as this seems to 
be an edge case and cleanup (ie unload core and remove the physical directory) 
could be risky. 
   
   Take note that we will probably still need to "clean up" those ghost 
replicas later on perhaps by manually purging them. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] patsonluk closed pull request #2673: Flag exception during core creation if the node defined in state.json is not the same as current node

2022-10-14 Thread GitBox


patsonluk closed pull request #2673: Flag exception during core creation if the 
node defined in state.json is not the same as current node
URL: https://github.com/apache/lucene-solr/pull/2673


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jtibshirani commented on issue #11830: Store HNSW graph connections more compactly

2022-10-14 Thread GitBox


jtibshirani commented on issue #11830:
URL: https://github.com/apache/lucene/issues/11830#issuecomment-1279577436

   Thanks @benwtrent , the results look promising! It seems like a good time to 
open a PR. We might even have ideas for optimizations once we see how the 
change looks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] dsmiley commented on pull request #11847: Add a method allowing canonical strings to be returned from DataInput

2022-10-14 Thread GitBox


dsmiley commented on PR #11847:
URL: https://github.com/apache/lucene/pull/11847#issuecomment-1279653616

   The ThreadLocal aspect was a needless addition here; it could be reverted.
   
   > the private map just leaks for the caller by unnecessarily saving strings. 
it's a lot of overhead to datainput to add this map. keep in mind datainput 
isn't closeable and has other implementations such as ByteArrayDataInput.
   
   Let's imagine a small improvement to the patch to lazily create the map on 
first use.  Some usages of ByteArrayDataInput and perhaps others would never 
use it (e.g. SynonymGraphFilter).  Are there cases you see that would lead to 
undesirable use of these canonical strings?  That it's not closable doesn't 
seem to be an issue to me since DataInput should not be outliving the things it 
was used to read; quite the reverse, no?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org