[GitHub] [lucene] uschindler commented on pull request #11840: GITHUB-11838 Add api to allow concurrent query rewrite
uschindler commented on PR #11840: URL: https://github.com/apache/lucene/pull/11840#issuecomment-1278656494 I think the general idea looks well. This should me Lucene 10 only, as the changes in query API may break lots of code downstream. We could theoretically add a backwards combatiility layer using VirtualMethod (says the policeman), if that's wanted, I can help. `VirtualMethod` utility class would be needed to figure out which of both methods (`rewrite(IndexReader)` or `rewrite(IndexSearcher)`) was overridden by a subclass downstream. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on pull request #11840: GITHUB-11838 Add api to allow concurrent query rewrite
jpountz commented on PR #11840: URL: https://github.com/apache/lucene/pull/11840#issuecomment-1278661850 Agreed on making this Lucene 10. As far as bw compat is concerned, I was thinking of updating Lucene 9 this way: - introduce `Query#rewrite(IndexSearcher)` and make it delegate to `Query#rewrite(IndexReader)` - keep existing implementations as they are, except possibly the few ones where we'd like to take advantage of the executor - deprecate `Query#rewrite(IndexReader)` - make `IndexSearcher#rewrite(Query)` use `Query#rewrite(IndexSearcher)` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #11847: Add a method allowing canonical strings to be returned from DataInput
rmuir commented on PR #11847: URL: https://github.com/apache/lucene/pull/11847#issuecomment-1278844737 it absolutely is. Look at the fucking diff dude. You won't sneak this one past me. -1 to this interning stuff. I remember what things were like when we made this mistake before and had to fix it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dsmiley commented on pull request #11847: Add a method allowing canonical strings to be returned from DataInput
dsmiley commented on PR #11847: URL: https://github.com/apache/lucene/pull/11847#issuecomment-1278967192 Could you comment on the pertinent line of code please? `canonicalStrings` is not static. Maybe you are thinking of the ThreadLocal? That's some temporary byte buffer. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] almogtavor commented on pull request #11852: Luke Webapp
almogtavor commented on PR #11852: URL: https://github.com/apache/lucene/pull/11852#issuecomment-1279032576 @msokolov What are the reasons for choosing to create a web app without a usage of a web framework such as React? Writing Vanilla will get pretty cumbersome when things get complex -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] YmBIgo opened a new issue, #11853: Make CJKAnalyzer that use Trigram instead of Bigram
YmBIgo opened a new issue, #11853: URL: https://github.com/apache/lucene/issues/11853 ### Description Hi. I want to make Trigram Filter for [CJK Analyzer](https://github.com/apache/lucene/tree/cc342ea7407c729a743123d8f7957aff6c6f9792/lucene/analysis/common/src/java/org/apache/lucene/analysis/cjk) since I am Japanese. And I have written some simple code to achieve it. It is below inspired by original [CJKBigramFilter.java](https://github.com/apache/lucene/blob/cc342ea7407c729a743123d8f7957aff6c6f9792/lucene/analysis/common/src/java/org/apache/lucene/analysis/cjk/CJKBigramFilter.java) file. It is work fine at my environment. But I still cannot understand how I should do conditional branch for `flushUnigram` ( and if necessary `flushBigram`) instead of add-on method that flush trigram `flushTrigram`. Is there any idea that teach me how and why I should flushUnigram at `if (offsetAtt.startOffset() != lastEndOffset)` case3. Thank you. ```patch /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.lucene.analysis.cjk; import java.io.IOException; import org.apache.lucene.analysis.TokenFilter; import org.apache.lucene.analysis.TokenStream; import org.apache.lucene.analysis.standard.StandardTokenizer; import org.apache.lucene.analysis.tokenattributes.CharTermAttribute; import org.apache.lucene.analysis.tokenattributes.OffsetAttribute; import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute; import org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute; import org.apache.lucene.analysis.tokenattributes.TypeAttribute; import org.apache.lucene.util.ArrayUtil; /** * Forms bigrams of CJK terms that are generated from StandardTokenizer or ICUTokenizer. * * CJK types are set by these tokenizers, but you can also use {@link * #CJKBigramFilter(TokenStream, int)} to explicitly control which of the CJK scripts are turned * into bigrams. * * By default, when a CJK character has no adjacent characters to form a bigram, it is output in * unigram form. If you want to always output both unigrams and bigrams, set the * outputUnigrams flag in {@link CJKBigramFilter#CJKBigramFilter(TokenStream, int, boolean)}. * This can be used for a combined unigram+bigram approach. * * Unlike ICUTokenizer, StandardTokenizer does not split at script boundaries. Korean Hangul * characters are treated the same as many other scripts' letters, and as a result, * StandardTokenizer can produce tokens that mix Hangul and non-Hangul characters, e.g. "한국abc". * Such mixed-script tokens are typed asrather than * , and as a result, will not be converted to bigrams by CJKBigramFilter. * * In all cases, all non-CJK input is passed thru unmodified. */ public final class CJKTrigramFilter extends TokenFilter { // configuration /** bigram flag for Han Ideographs */ public static final int HAN = 1; /** bigram flag for Hiragana */ public static final int HIRAGANA = 2; /** bigram flag for Katakana */ public static final int KATAKANA = 4; /** bigram flag for Hangul */ public static final int HANGUL = 8; /** when we emit a bigram, it's then marked as this type */ public static final String DOUBLE_TYPE = ""; /** when we emit a unigram, it's then marked as this type */ public static final String SINGLE_TYPE = ""; /* */ + public static final String TRIPLE_TYPE = ""; // the types from standardtokenizer private static final String HAN_TYPE = StandardTokenizer.TOKEN_TYPES[StandardTokenizer.IDEOGRAPHIC]; private static final String HIRAGANA_TYPE = StandardTokenizer.TOKEN_TYPES[StandardTokenizer.HIRAGANA]; private static final String KATAKANA_TYPE = StandardTokenizer.TOKEN_TYPES[StandardTokenizer.KATAKANA]; private static final String HANGUL_TYPE = StandardTokenizer.TOKEN_TYPES[StandardTokenizer.HANGUL]; // sentinel value for ignoring a script private static final Object NO = new Object();
[GitHub] [lucene] benwtrent commented on issue #11830: Store HNSW graph connections more compactly
benwtrent commented on issue #11830: URL: https://github.com/apache/lucene/issues/11830#issuecomment-1279207529 OK, I ran this patch against minst, sift, glove, and deep image. Recall is the exact same in all cases, so no mistakes there. There are indeed slight differences in latency, below are images. I ran search 5 times for each parameter combination and chose the fastest. All of these were with `"M": 16, "efConstruction": 100`. For each, I specifically point out the QPS at the lowest recall. The deltas in QPS vary from near 0% change to 11% decrease. 11% does seem large but I think its an outlier. The median decrease is 2%, the average change is 0% (or 0.002%). [ann_qps_comparison - ann_scores.csv](https://github.com/apache/lucene/files/9788126/ann_qps_comparison.-.ann_scores.csv) For every data set, the memory required reduced at around a similar amount. This change seems like a good step forward to me. @jtibshirani # Deep Image  # Glove  # Minst  # Sift  -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on a diff in pull request #11852: Luke Webapp
rmuir commented on code in PR #11852: URL: https://github.com/apache/lucene/pull/11852#discussion_r995989985 ## lucene/luke/src/java/org/apache/lucene/luke/app/web/LukeWebMain.java: ## @@ -17,31 +17,78 @@ package org.apache.lucene.luke.app.web; +import java.net.InetSocketAddress; import java.util.concurrent.CountDownLatch; +import java.util.HashMap; +import java.util.Map; import org.apache.lucene.luke.app.IndexHandler; import org.apache.lucene.luke.util.LoggerFactory; /** Entry class for web Luke */ -public class LukeWebMain { +public final class LukeWebMain { + + private LukeWebMain() { + } static { LoggerFactory.initGuiLogging(); } public static void main(String[] args) throws Exception { -String index = null; -if (args.length == 2 && args[0].equals("--index")) { - index = args[1]; -} else { - System.err.println("usage: LukeWebMain --index "); - Runtime.getRuntime().exit(1); +Map parsed = null; +try { + parsed = parseArgs(args); +} catch (Exception e) { + usage(e.getMessage()); } - IndexHandler indexHandler = IndexHandler.getInstance(); -indexHandler.open(index, "org.apache.lucene.store.FSDirectory", true, true, false); +indexHandler.open(getIndex(parsed), "org.apache.lucene.store.FSDirectory", true, true, false); CountDownLatch tombstone = new CountDownLatch(1); -HttpService httpService = new HttpService(indexHandler, tombstone); +HttpService httpService = new HttpService(getSockAddr(parsed), indexHandler, tombstone); httpService.start(); tombstone.await(); } + + private static String getIndex(Map args) { +String index = (String) args.get("index"); +if (index == null) { + usage("index arg is required"); +} +return index; + } + + private static InetSocketAddress getSockAddr(Map args) { +String host = (String) args.get("host"); +int port = (Integer) args.getOrDefault("port", 8080); +if (host == null) { + return new InetSocketAddress(port); Review Comment: Maybe just allow this to run on localhost only? Always? Otherwise I am concerned about security implications. PPl will be reporting vulnerabilities that there is no TLS, no authentication, that you can do bad things, etc. If we only let it run on localhost (or unix socket), then if someone wants to expose it to the network, they have to do that themselves (e.g. put nginx or haproxy in front of it, where the TLS, auth, etc can sit). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on a diff in pull request #11852: Luke Webapp
rmuir commented on code in PR #11852: URL: https://github.com/apache/lucene/pull/11852#discussion_r995991513 ## lucene/luke/src/java/org/apache/lucene/luke/app/web/LukeWebMain.java: ## @@ -17,31 +17,78 @@ package org.apache.lucene.luke.app.web; +import java.net.InetSocketAddress; import java.util.concurrent.CountDownLatch; +import java.util.HashMap; +import java.util.Map; import org.apache.lucene.luke.app.IndexHandler; import org.apache.lucene.luke.util.LoggerFactory; /** Entry class for web Luke */ -public class LukeWebMain { +public final class LukeWebMain { + + private LukeWebMain() { + } static { LoggerFactory.initGuiLogging(); } public static void main(String[] args) throws Exception { -String index = null; -if (args.length == 2 && args[0].equals("--index")) { - index = args[1]; -} else { - System.err.println("usage: LukeWebMain --index "); - Runtime.getRuntime().exit(1); +Map parsed = null; +try { + parsed = parseArgs(args); +} catch (Exception e) { + usage(e.getMessage()); } - IndexHandler indexHandler = IndexHandler.getInstance(); -indexHandler.open(index, "org.apache.lucene.store.FSDirectory", true, true, false); +indexHandler.open(getIndex(parsed), "org.apache.lucene.store.FSDirectory", true, true, false); CountDownLatch tombstone = new CountDownLatch(1); -HttpService httpService = new HttpService(indexHandler, tombstone); +HttpService httpService = new HttpService(getSockAddr(parsed), indexHandler, tombstone); httpService.start(); tombstone.await(); } + + private static String getIndex(Map args) { +String index = (String) args.get("index"); +if (index == null) { + usage("index arg is required"); +} +return index; + } + + private static InetSocketAddress getSockAddr(Map args) { +String host = (String) args.get("host"); +int port = (Integer) args.getOrDefault("port", 8080); +if (host == null) { + return new InetSocketAddress(port); Review Comment: At the minimum at least default to a localhost only bind if the user doesn't specify. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #11847: Add a method allowing canonical strings to be returned from DataInput
rmuir commented on PR #11847: URL: https://github.com/apache/lucene/pull/11847#issuecomment-1279364308 both are an issue. The threadlocal will leak badly, especially for cases that "churn" lots of threads (e.g. jetty defaults, e.g. java webapps). It will cause more harm than good. the private map just leaks for the caller by unnecessarily saving strings. it's a lot of overhead to datainput to add this map. keep in mind datainput isn't closeable and has other implementations such as ByteArrayDataInput. Sorry, I don't see advantages to this stuff. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mdmarshmallow commented on pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor
mdmarshmallow commented on PR #11796: URL: https://github.com/apache/lucene/pull/11796#issuecomment-1279371679 I resolved a merge conflict with `CHANGES.txt`. I think I've also addressed all the issues people have pointed out as well. Would it be possible for someone to take a look at this and merge it if there are no more objections? Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] msokolov commented on a diff in pull request #11852: Luke Webapp
msokolov commented on code in PR #11852: URL: https://github.com/apache/lucene/pull/11852#discussion_r996170730 ## lucene/luke/src/java/org/apache/lucene/luke/app/web/LukeWebMain.java: ## @@ -17,31 +17,78 @@ package org.apache.lucene.luke.app.web; +import java.net.InetSocketAddress; import java.util.concurrent.CountDownLatch; +import java.util.HashMap; +import java.util.Map; import org.apache.lucene.luke.app.IndexHandler; import org.apache.lucene.luke.util.LoggerFactory; /** Entry class for web Luke */ -public class LukeWebMain { +public final class LukeWebMain { + + private LukeWebMain() { + } static { LoggerFactory.initGuiLogging(); } public static void main(String[] args) throws Exception { -String index = null; -if (args.length == 2 && args[0].equals("--index")) { - index = args[1]; -} else { - System.err.println("usage: LukeWebMain --index "); - Runtime.getRuntime().exit(1); +Map parsed = null; +try { + parsed = parseArgs(args); +} catch (Exception e) { + usage(e.getMessage()); } - IndexHandler indexHandler = IndexHandler.getInstance(); -indexHandler.open(index, "org.apache.lucene.store.FSDirectory", true, true, false); +indexHandler.open(getIndex(parsed), "org.apache.lucene.store.FSDirectory", true, true, false); CountDownLatch tombstone = new CountDownLatch(1); -HttpService httpService = new HttpService(indexHandler, tombstone); +HttpService httpService = new HttpService(getSockAddr(parsed), indexHandler, tombstone); httpService.start(); tombstone.await(); } + + private static String getIndex(Map args) { +String index = (String) args.get("index"); +if (index == null) { + usage("index arg is required"); +} +return index; + } + + private static InetSocketAddress getSockAddr(Map args) { +String host = (String) args.get("host"); +int port = (Integer) args.getOrDefault("port", 8080); +if (host == null) { + return new InetSocketAddress(port); Review Comment: thanks for checking this. I changed it to default to localhost. I think we should allow users to expose this if they want to. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] msokolov commented on pull request #11852: Luke Webapp
msokolov commented on PR #11852: URL: https://github.com/apache/lucene/pull/11852#issuecomment-1279488548 > What are the reasons for choosing to create a web app without a usage of a web framework such as React? Writing Vanilla will get pretty cumbersome when things get complex I don't want to get into endless discussions about what is the best framework, that's one reason. I think you can do a lot with vanilla without all that much trouble, and I actually don't want things to get complex. The whole idea of this is simple. Another reason is I want this to last a long time, and JS frameworks seem to have a habit of going out of style after a minute or two. Another reason is if you pull in dependencies, then you have to keep them up to date. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] msokolov commented on pull request #11852: Luke Webapp
msokolov commented on PR #11852: URL: https://github.com/apache/lucene/pull/11852#issuecomment-1279498308 Re: JS frameworks - I recognize my position is from Ludd, and it might be untenable. If it gets out of hand we can always add something like jQuery, but we can never remove, so let's start here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] patsonluk opened a new pull request, #2673: Flag exception during core creation if the node defined in state.json is not the same as current node
patsonluk opened a new pull request, #2673: URL: https://github.com/apache/lucene-solr/pull/2673 ## Description It's found that our prod env have certain data nodes have "ghost replicas" that do not have data dir but has the core.properties file and core directory. Replica with same name is actually reside on a different node as defined in the `state.json`. Such "ghost replicas" can trigger `DOWN` replica state being published, which the real replica (with same name) is actually healthy in another node. More details of the issue can be found in https://app.shortcut.com/fullstory/story/217252/investigate-replica-that-failed-to-come-up-as-active-during-restart-deployment#activity-217734 ## Solution While we do not yet know the exact cause of those "ghost replicas" (probably from some migration hiccup during c82 creation?), it seems to be a rare occurrence now (8 replicas in c82). Therefore we will add a new exception `InconsistentClusterStateException`, which would be thrown from `checkStateInZk` if node name of a replica defined in state.json is different from the current node which tries to spin up such core. Such exception would interrupt the core creation, and no longer publish a `DOWN` state. For now, we will NOT provide an cleanup in the Solr code, as this seems to be an edge case and cleanup (ie unload core and remove the physical directory) could be risky. Take note that we will probably still need to "clean up" those ghost replicas later on perhaps by manually purging them. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] patsonluk closed pull request #2673: Flag exception during core creation if the node defined in state.json is not the same as current node
patsonluk closed pull request #2673: Flag exception during core creation if the node defined in state.json is not the same as current node URL: https://github.com/apache/lucene-solr/pull/2673 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jtibshirani commented on issue #11830: Store HNSW graph connections more compactly
jtibshirani commented on issue #11830: URL: https://github.com/apache/lucene/issues/11830#issuecomment-1279577436 Thanks @benwtrent , the results look promising! It seems like a good time to open a PR. We might even have ideas for optimizations once we see how the change looks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dsmiley commented on pull request #11847: Add a method allowing canonical strings to be returned from DataInput
dsmiley commented on PR #11847: URL: https://github.com/apache/lucene/pull/11847#issuecomment-1279653616 The ThreadLocal aspect was a needless addition here; it could be reverted. > the private map just leaks for the caller by unnecessarily saving strings. it's a lot of overhead to datainput to add this map. keep in mind datainput isn't closeable and has other implementations such as ByteArrayDataInput. Let's imagine a small improvement to the patch to lazily create the map on first use. Some usages of ByteArrayDataInput and perhaps others would never use it (e.g. SynonymGraphFilter). Are there cases you see that would lead to undesirable use of these canonical strings? That it's not closable doesn't seem to be an issue to me since DataInput should not be outliving the things it was used to read; quite the reverse, no? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org