[GitHub] [lucene] romseygeek commented on issue #11851: Luke web interface
romseygeek commented on issue #11851: URL: https://github.com/apache/lucene/issues/11851#issuecomment-1280453204 That was marple @dsmiley! Which might be a useful start but is indeed very out of date. It uses node to build the UI, but I'm sure there are better ways to do things now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] matriv opened a new pull request, #11856: Fix nanos to millis conversion for tests
matriv opened a new pull request, #11856: URL: https://github.com/apache/lucene/pull/11856 Fix some wrong division to calculate millis from nanos in tests, after commit that removes System.currentTimeMillis() calls. Follows: fb40a43c0de65950e51380e9bc0b409030c53068 Follows: https://github.com/apache/lucene/pull/11749 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] matriv commented on issue #11459: Remove uses of wall-clock time in codebase [LUCENE-10423]
matriv commented on issue #11459: URL: https://github.com/apache/lucene/issues/11459#issuecomment-1280660003 Thank you @donnerpeter! Opened a PR to address the rest of the incorrect divisions: https://github.com/apache/lucene/pull/11856 /cc @rmuir -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss commented on a diff in pull request #11856: Fix nanos to millis conversion for tests
dweiss commented on code in PR #11856: URL: https://github.com/apache/lucene/pull/11856#discussion_r996928969 ## lucene/analysis/common/src/test/org/apache/lucene/analysis/sinks/TestTeeSinkTokenFilter.java: ## @@ -199,7 +199,11 @@ public void performance() throws Exception { } long finish = System.nanoTime(); System.out.println( -"ModCount: " + modCounts[j] + " Two fields took " + (finish - start) / 100_000 + " ms"); +"ModCount: " ++ modCounts[j] ++ " Two fields took " ++ (finish - start) / 1_000_000 Review Comment: Maybe we could be less old-fashioned and just use TimeUnit.seconds(1).toNanos() here and in other places? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] matriv commented on a diff in pull request #11856: Fix nanos to millis conversion for tests
matriv commented on code in PR #11856: URL: https://github.com/apache/lucene/pull/11856#discussion_r996932110 ## lucene/analysis/common/src/test/org/apache/lucene/analysis/sinks/TestTeeSinkTokenFilter.java: ## @@ -199,7 +199,11 @@ public void performance() throws Exception { } long finish = System.nanoTime(); System.out.println( -"ModCount: " + modCounts[j] + " Two fields took " + (finish - start) / 100_000 + " ms"); +"ModCount: " ++ modCounts[j] ++ " Two fields took " ++ (finish - start) / 1_000_000 Review Comment: You mean maybe: `TimeUnit.NANOSECONDS.toMillis(finish-start)` ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss commented on a diff in pull request #11856: Fix nanos to millis conversion for tests
dweiss commented on code in PR #11856: URL: https://github.com/apache/lucene/pull/11856#discussion_r996934411 ## lucene/analysis/common/src/test/org/apache/lucene/analysis/sinks/TestTeeSinkTokenFilter.java: ## @@ -199,7 +199,11 @@ public void performance() throws Exception { } long finish = System.nanoTime(); System.out.println( -"ModCount: " + modCounts[j] + " Two fields took " + (finish - start) / 100_000 + " ms"); +"ModCount: " ++ modCounts[j] ++ " Two fields took " ++ (finish - start) / 1_000_000 Review Comment: Either way (constant to nanos or the value from nanos to millis). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #11856: Fix nanos to millis conversion for tests
rmuir commented on PR #11856: URL: https://github.com/apache/lucene/pull/11856#issuecomment-1280767858 thank you @matriv for following up here. I like the use of TimeUnit conversion (thanks @dweiss), very easy to read. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] matriv commented on pull request #11856: Fix nanos to millis conversion for tests
matriv commented on PR #11856: URL: https://github.com/apache/lucene/pull/11856#issuecomment-1280771031 I pushed only for the wrong calculations, will let you know once I've converted all such divisions to use `TimeUnit` instead. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] msokolov commented on a diff in pull request #11852: Luke Webapp
msokolov commented on code in PR #11852: URL: https://github.com/apache/lucene/pull/11852#discussion_r997034628 ## lucene/luke/src/java/org/apache/lucene/luke/app/web/LukeWebMain.java: ## @@ -17,31 +17,78 @@ package org.apache.lucene.luke.app.web; +import java.net.InetSocketAddress; import java.util.concurrent.CountDownLatch; +import java.util.HashMap; +import java.util.Map; import org.apache.lucene.luke.app.IndexHandler; import org.apache.lucene.luke.util.LoggerFactory; /** Entry class for web Luke */ -public class LukeWebMain { +public final class LukeWebMain { + + private LukeWebMain() { + } static { LoggerFactory.initGuiLogging(); } public static void main(String[] args) throws Exception { -String index = null; -if (args.length == 2 && args[0].equals("--index")) { - index = args[1]; -} else { - System.err.println("usage: LukeWebMain --index "); - Runtime.getRuntime().exit(1); +Map parsed = null; +try { + parsed = parseArgs(args); +} catch (Exception e) { + usage(e.getMessage()); } - IndexHandler indexHandler = IndexHandler.getInstance(); -indexHandler.open(index, "org.apache.lucene.store.FSDirectory", true, true, false); +indexHandler.open(getIndex(parsed), "org.apache.lucene.store.FSDirectory", true, true, false); CountDownLatch tombstone = new CountDownLatch(1); -HttpService httpService = new HttpService(indexHandler, tombstone); +HttpService httpService = new HttpService(getSockAddr(parsed), indexHandler, tombstone); httpService.start(); tombstone.await(); } + + private static String getIndex(Map args) { +String index = (String) args.get("index"); +if (index == null) { + usage("index arg is required"); +} +return index; + } + + private static InetSocketAddress getSockAddr(Map args) { +String host = (String) args.get("host"); +int port = (Integer) args.getOrDefault("port", 8080); +if (host == null) { + return new InetSocketAddress(port); Review Comment: I think that has already been addressed and we are not talking about the latest version here. (Note the "Outdated" tag on the code snippet above). GitHub UI is a little confusing this way -- I'll add a comment next to the updated version to draw your attention over there -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] msokolov commented on a diff in pull request #11852: Luke Webapp
msokolov commented on code in PR #11852: URL: https://github.com/apache/lucene/pull/11852#discussion_r997035422 ## lucene/luke/src/java/org/apache/lucene/luke/app/web/LukeWebMain.java: ## @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.lucene.luke.app.web; + +import java.net.InetSocketAddress; +import java.util.HashMap; +import java.util.Map; +import java.util.concurrent.CountDownLatch; +import java.util.logging.Level; +import java.util.logging.Logger; +import org.apache.lucene.luke.app.IndexHandler; +import org.apache.lucene.luke.util.LoggerFactory; + +/** Entry class for web Luke */ +public final class LukeWebMain { + + private LukeWebMain() {} + + static { +LoggerFactory.initGuiLogging(); + } + + private static final Logger LOG = LoggerFactory.getLogger(LukeWebMain.class); + + public static void main(String[] args) throws Exception { +Map parsed = null; +try { + parsed = parseArgs(args); +} catch (Exception e) { + usage(e.getMessage()); +} +IndexHandler indexHandler = IndexHandler.getInstance(); +indexHandler.open(getIndex(parsed), "org.apache.lucene.store.FSDirectory", true, true, false); +CountDownLatch tombstone = new CountDownLatch(1); +HttpService httpService = new HttpService(getSockAddr(parsed), indexHandler, tombstone); +httpService.start(); +tombstone.await(); + } + + private static String getIndex(Map args) { +String index = (String) args.get("index"); +if (index == null) { + usage("index arg is required"); +} +return index; + } + + private static InetSocketAddress getSockAddr(Map args) { +String host = (String) args.get("host"); +int port = (Integer) args.getOrDefault("port", 8080); +if (host == null) { + return new InetSocketAddress("127.0.0.1", port); Review Comment: Note: updated to use `localhost` by default -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #11852: Luke Webapp
uschindler commented on code in PR #11852: URL: https://github.com/apache/lucene/pull/11852#discussion_r997045001 ## lucene/luke/src/java/org/apache/lucene/luke/app/web/LukeWebMain.java: ## @@ -17,31 +17,78 @@ package org.apache.lucene.luke.app.web; +import java.net.InetSocketAddress; import java.util.concurrent.CountDownLatch; +import java.util.HashMap; +import java.util.Map; import org.apache.lucene.luke.app.IndexHandler; import org.apache.lucene.luke.util.LoggerFactory; /** Entry class for web Luke */ -public class LukeWebMain { +public final class LukeWebMain { + + private LukeWebMain() { + } static { LoggerFactory.initGuiLogging(); } public static void main(String[] args) throws Exception { -String index = null; -if (args.length == 2 && args[0].equals("--index")) { - index = args[1]; -} else { - System.err.println("usage: LukeWebMain --index "); - Runtime.getRuntime().exit(1); +Map parsed = null; +try { + parsed = parseArgs(args); +} catch (Exception e) { + usage(e.getMessage()); } - IndexHandler indexHandler = IndexHandler.getInstance(); -indexHandler.open(index, "org.apache.lucene.store.FSDirectory", true, true, false); +indexHandler.open(getIndex(parsed), "org.apache.lucene.store.FSDirectory", true, true, false); CountDownLatch tombstone = new CountDownLatch(1); -HttpService httpService = new HttpService(indexHandler, tombstone); +HttpService httpService = new HttpService(getSockAddr(parsed), indexHandler, tombstone); httpService.start(); tombstone.await(); } + + private static String getIndex(Map args) { +String index = (String) args.get("index"); +if (index == null) { + usage("index arg is required"); +} +return index; + } + + private static InetSocketAddress getSockAddr(Map args) { +String host = (String) args.get("host"); +int port = (Integer) args.getOrDefault("port", 8080); +if (host == null) { + return new InetSocketAddress(port); Review Comment: Sorry. Haven't seen the update. Anyways, I'd prefer "::1" as IP address (says the IPv4-must-die-policeman). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] matriv commented on a diff in pull request #11856: Fix nanos to millis conversion for tests
matriv commented on code in PR #11856: URL: https://github.com/apache/lucene/pull/11856#discussion_r997068202 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -4190,7 +4190,7 @@ private static Status.SoftDeletsStatus checkSoftDeletes( } private static double nsToSec(long ns) { -return ns / 10.0; +return ns / (double) TimeUnit.SECONDS.toNanos(1); Review Comment: Heads up for such changes in production code, could this potentially add overhead? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] risdenk opened a new pull request, #2674: SOLR-16464: Upgrade commons-text to 1.10.0
risdenk opened a new pull request, #2674: URL: https://github.com/apache/lucene-solr/pull/2674 https://issues.apache.org/jira/browse/SOLR-16464 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on a diff in pull request #11852: Luke Webapp
rmuir commented on code in PR #11852: URL: https://github.com/apache/lucene/pull/11852#discussion_r997309676 ## lucene/luke/src/java/org/apache/lucene/luke/app/web/LukeWebMain.java: ## @@ -17,31 +17,78 @@ package org.apache.lucene.luke.app.web; +import java.net.InetSocketAddress; import java.util.concurrent.CountDownLatch; +import java.util.HashMap; +import java.util.Map; import org.apache.lucene.luke.app.IndexHandler; import org.apache.lucene.luke.util.LoggerFactory; /** Entry class for web Luke */ -public class LukeWebMain { +public final class LukeWebMain { + + private LukeWebMain() { + } static { LoggerFactory.initGuiLogging(); } public static void main(String[] args) throws Exception { -String index = null; -if (args.length == 2 && args[0].equals("--index")) { - index = args[1]; -} else { - System.err.println("usage: LukeWebMain --index "); - Runtime.getRuntime().exit(1); +Map parsed = null; +try { + parsed = parseArgs(args); +} catch (Exception e) { + usage(e.getMessage()); } - IndexHandler indexHandler = IndexHandler.getInstance(); -indexHandler.open(index, "org.apache.lucene.store.FSDirectory", true, true, false); +indexHandler.open(getIndex(parsed), "org.apache.lucene.store.FSDirectory", true, true, false); CountDownLatch tombstone = new CountDownLatch(1); -HttpService httpService = new HttpService(indexHandler, tombstone); +HttpService httpService = new HttpService(getSockAddr(parsed), indexHandler, tombstone); httpService.start(); tombstone.await(); } + + private static String getIndex(Map args) { +String index = (String) args.get("index"); +if (index == null) { + usage("index arg is required"); +} +return index; + } + + private static InetSocketAddress getSockAddr(Map args) { +String host = (String) args.get("host"); +int port = (Integer) args.getOrDefault("port", 8080); +if (host == null) { + return new InetSocketAddress(port); Review Comment: sorry maybe i'm unclear. i just dont think we should allow anything except localhost. please, let's avoid security issues of the fact that such a thing could read any file on the system (maybe /etc/passwd) and even leak shit about non-indexes via exceptions. or that it wouldn't have any TLS at all leaving it vulnerable to anything in-flight. or that it wouldn't have authentication, etc. we could always "open it up" later if we want to do that, but I really think there is value in not attaching ourselves to the network. again, if someone wants, they can place reverse proxy in front of it that will handle all these things better (TLS, auth, request filtering, rate-limiting, DDOS, ...) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on a diff in pull request #11852: Luke Webapp
rmuir commented on code in PR #11852: URL: https://github.com/apache/lucene/pull/11852#discussion_r997311191 ## lucene/luke/src/java/org/apache/lucene/luke/app/web/LukeWebMain.java: ## @@ -17,31 +17,78 @@ package org.apache.lucene.luke.app.web; +import java.net.InetSocketAddress; import java.util.concurrent.CountDownLatch; +import java.util.HashMap; +import java.util.Map; import org.apache.lucene.luke.app.IndexHandler; import org.apache.lucene.luke.util.LoggerFactory; /** Entry class for web Luke */ -public class LukeWebMain { +public final class LukeWebMain { + + private LukeWebMain() { + } static { LoggerFactory.initGuiLogging(); } public static void main(String[] args) throws Exception { -String index = null; -if (args.length == 2 && args[0].equals("--index")) { - index = args[1]; -} else { - System.err.println("usage: LukeWebMain --index "); - Runtime.getRuntime().exit(1); +Map parsed = null; +try { + parsed = parseArgs(args); +} catch (Exception e) { + usage(e.getMessage()); } - IndexHandler indexHandler = IndexHandler.getInstance(); -indexHandler.open(index, "org.apache.lucene.store.FSDirectory", true, true, false); +indexHandler.open(getIndex(parsed), "org.apache.lucene.store.FSDirectory", true, true, false); CountDownLatch tombstone = new CountDownLatch(1); -HttpService httpService = new HttpService(indexHandler, tombstone); +HttpService httpService = new HttpService(getSockAddr(parsed), indexHandler, tombstone); httpService.start(); tombstone.await(); } + + private static String getIndex(Map args) { +String index = (String) args.get("index"); +if (index == null) { + usage("index arg is required"); +} +return index; + } + + private static InetSocketAddress getSockAddr(Map args) { +String host = (String) args.get("host"); +int port = (Integer) args.getOrDefault("port", 8080); +if (host == null) { + return new InetSocketAddress(port); Review Comment: ... security headers, compression, just would love for us to stay out of the business of all this. and avoid hassles like security reports as a project. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on a diff in pull request #11857: hunspell: speedup suggestions by caching speller and compound stemming requests
rmuir commented on code in PR #11857: URL: https://github.com/apache/lucene/pull/11857#discussion_r997318423 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Hunspell.java: ## @@ -614,11 +617,27 @@ private void doSuggest( Runnable checkCanceled) { Hunspell suggestionSpeller = new Hunspell(dictionary, policy, checkCanceled) { + final Map>> compoundCache = new HashMap<>(); Review Comment: can we add a code comment above the map maybe? to me, looks like this is a cache on a per-query basis essentially? due to how compounds get decomposed internally we'll do a bunch of redundant work otherwise? but an unbounded cache looks scary as hell, if you look at any code for the first time? :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] risdenk merged pull request #2674: SOLR-16464: Upgrade commons-text to 1.10.0
risdenk merged PR #2674: URL: https://github.com/apache/lucene-solr/pull/2674 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] donnerpeter commented on a diff in pull request #11857: hunspell: speedup suggestions by caching speller and compound stemming requests
donnerpeter commented on code in PR #11857: URL: https://github.com/apache/lucene/pull/11857#discussion_r997326314 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Hunspell.java: ## @@ -614,11 +617,27 @@ private void doSuggest( Runnable checkCanceled) { Hunspell suggestionSpeller = new Hunspell(dictionary, policy, checkCanceled) { + final Map>> compoundCache = new HashMap<>(); Review Comment: It's a per-`suggest`-call cache, it will be GCed afterwards. It caches the prefixes tried as the first parts of potential compounds. The cached `findStem` calls are relatively expensive, and there can be thousands of them for each `suggest`. If such a call returns `null`, the rest of a word won't be split for compounds, so caching nulls is essential. I hope this makes things clearer. Which parts of this would you like in the comment? :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on a diff in pull request #11857: hunspell: speedup suggestions by caching speller and compound stemming requests
rmuir commented on code in PR #11857: URL: https://github.com/apache/lucene/pull/11857#discussion_r997328633 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Hunspell.java: ## @@ -614,11 +617,27 @@ private void doSuggest( Runnable checkCanceled) { Hunspell suggestionSpeller = new Hunspell(dictionary, policy, checkCanceled) { + final Map>> compoundCache = new HashMap<>(); Review Comment: above works for me! thank you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss commented on a diff in pull request #11857: hunspell: speedup suggestions by caching speller and compound stemming requests
dweiss commented on code in PR #11857: URL: https://github.com/apache/lucene/pull/11857#discussion_r997341034 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Hunspell.java: ## @@ -614,11 +617,27 @@ private void doSuggest( Runnable checkCanceled) { Hunspell suggestionSpeller = new Hunspell(dictionary, policy, checkCanceled) { + final Map>> compoundCache = new HashMap<>(); + @Override boolean acceptsStem(int formID) { return !dictionary.hasFlag(formID, dictionary.noSuggest) && !dictionary.hasFlag(formID, dictionary.subStandard); } + + @Override + Root findStem( + char[] chars, int offset, int length, WordCase originalCase, WordContext context) { +if (context == COMPOUND_BEGIN && originalCase == null) { + return compoundCache + .computeIfAbsent( + new String(chars, offset, length), + __ -> + Optional.ofNullable(super.findStem(chars, offset, length, null, context))) Review Comment: Sorry for being dim but what's the point of storing those Optionals in the map? Is storing just null values somehow out of fashion? :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss commented on a diff in pull request #11856: Fix nanos to millis conversion for tests
dweiss commented on code in PR #11856: URL: https://github.com/apache/lucene/pull/11856#discussion_r997342151 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -4190,7 +4190,7 @@ private static Status.SoftDeletsStatus checkSoftDeletes( } private static double nsToSec(long ns) { -return ns / 10.0; +return ns / (double) TimeUnit.SECONDS.toNanos(1); Review Comment: If you're concerned about it, move it to a constant? I think it'll be negligible. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] donnerpeter commented on a diff in pull request #11857: hunspell: speedup suggestions by caching speller and compound stemming requests
donnerpeter commented on code in PR #11857: URL: https://github.com/apache/lucene/pull/11857#discussion_r997349204 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Hunspell.java: ## @@ -614,11 +617,27 @@ private void doSuggest( Runnable checkCanceled) { Hunspell suggestionSpeller = new Hunspell(dictionary, policy, checkCanceled) { + final Map>> compoundCache = new HashMap<>(); + @Override boolean acceptsStem(int formID) { return !dictionary.hasFlag(formID, dictionary.noSuggest) && !dictionary.hasFlag(formID, dictionary.subStandard); } + + @Override + Root findStem( + char[] chars, int offset, int length, WordCase originalCase, WordContext context) { +if (context == COMPOUND_BEGIN && originalCase == null) { + return compoundCache + .computeIfAbsent( + new String(chars, offset, length), + __ -> + Optional.ofNullable(super.findStem(chars, offset, length, null, context))) Review Comment: computeIfAbsent is called even if there's an associated null value. And containsKey+get+put finds the hash entry several times ## lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Hunspell.java: ## @@ -614,11 +617,27 @@ private void doSuggest( Runnable checkCanceled) { Hunspell suggestionSpeller = new Hunspell(dictionary, policy, checkCanceled) { + final Map>> compoundCache = new HashMap<>(); + @Override boolean acceptsStem(int formID) { return !dictionary.hasFlag(formID, dictionary.noSuggest) && !dictionary.hasFlag(formID, dictionary.subStandard); } + + @Override + Root findStem( + char[] chars, int offset, int length, WordCase originalCase, WordContext context) { +if (context == COMPOUND_BEGIN && originalCase == null) { + return compoundCache + .computeIfAbsent( + new String(chars, offset, length), + __ -> + Optional.ofNullable(super.findStem(chars, offset, length, null, context))) Review Comment: computeIfAbsent lambda is called even if there's an associated null value. And containsKey+get+put finds the hash entry several times -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] matriv commented on a diff in pull request #11856: Fix nanos to millis conversion for tests
matriv commented on code in PR #11856: URL: https://github.com/apache/lucene/pull/11856#discussion_r997357120 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -4190,7 +4190,7 @@ private static Status.SoftDeletsStatus checkSoftDeletes( } private static double nsToSec(long ns) { -return ns / 10.0; +return ns / (double) TimeUnit.SECONDS.toNanos(1); Review Comment: Any suggestion for a "central" place to put this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss commented on a diff in pull request #11857: hunspell: speedup suggestions by caching speller and compound stemming requests
dweiss commented on code in PR #11857: URL: https://github.com/apache/lucene/pull/11857#discussion_r997417873 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Hunspell.java: ## @@ -614,11 +617,27 @@ private void doSuggest( Runnable checkCanceled) { Hunspell suggestionSpeller = new Hunspell(dictionary, policy, checkCanceled) { + final Map>> compoundCache = new HashMap<>(); + @Override boolean acceptsStem(int formID) { return !dictionary.hasFlag(formID, dictionary.noSuggest) && !dictionary.hasFlag(formID, dictionary.subStandard); } + + @Override + Root findStem( + char[] chars, int offset, int length, WordCase originalCase, WordContext context) { +if (context == COMPOUND_BEGIN && originalCase == null) { + return compoundCache + .computeIfAbsent( + new String(chars, offset, length), + __ -> + Optional.ofNullable(super.findStem(chars, offset, length, null, context))) Review Comment: Had to look it up to believe it: "If the specified key is not already associated with a value (or is mapped to null)". Damn, that's a non-intuitive spec... I've always had a clear perception that it's only the key association (key presence) that is validated here... and it's just been ruined. Sorry for the noise, @donnerpeter, LGTM. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] donnerpeter merged pull request #11857: hunspell: speedup suggestions by caching speller and compound stemming requests
donnerpeter merged PR #11857: URL: https://github.com/apache/lucene/pull/11857 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent commented on pull request #11849: Fix failure to load larger data sets in KnnGraphTest
benwtrent commented on PR #11849: URL: https://github.com/apache/lucene/pull/11849#issuecomment-1281604399 @jtibshirani I confirmed that on my M1 Macbook, there is no significant change in QPS, I tested glove-100-angular and deep-image-96-angular -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jtibshirani merged pull request #11849: Fix failure to load larger data sets in KnnGraphTest
jtibshirani merged PR #11849: URL: https://github.com/apache/lucene/pull/11849 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jtibshirani opened a new issue, #11858: Lucene94HnswVectorsFormat validation fails with large datasets
jtibshirani opened a new issue, #11858: URL: https://github.com/apache/lucene/issues/11858 I ran a test on Lucene 9.4 where I tried to force merge 2 million vectors with dimension 768. It failed with ``` java.lang.IllegalStateException: Vector data length 3070061568 not matching size=999369 * dim=768 * byteSize=4 = -1224905728 ``` The problem is that we use an integer to represent the size, which is too small to hold it. The bug snuck in during the work to enable int8 values, which switched a long to an int: #1054. This error doesn't occur before version 9.4. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jtibshirani commented on issue #11858: Lucene94HnswVectorsFormat validation fails with large datasets
jtibshirani commented on issue #11858: URL: https://github.com/apache/lucene/issues/11858#issuecomment-1281793284 Here's the stack trace: ``` java.lang.IllegalStateException: Vector data length 3070061568 not matching size=999369 * dim=768 * byteSize=4 = -1224905728 at org.apache.lucene.core@9.4.0/org.apache.lucene.codecs.lucene94.Lucene94HnswVectorsReader.validateFieldEntry(Lucene94HnswVectorsReader.java:185) at org.apache.lucene.core@9.4.0/org.apache.lucene.codecs.lucene94.Lucene94HnswVectorsReader.readFields(Lucene94HnswVectorsReader.java:156) at org.apache.lucene.core@9.4.0/org.apache.lucene.codecs.lucene94.Lucene94HnswVectorsReader.readMetadata(Lucene94HnswVectorsReader.java:103) at org.apache.lucene.core@9.4.0/org.apache.lucene.codecs.lucene94.Lucene94HnswVectorsReader.(Lucene94HnswVectorsReader.java:64) at org.apache.lucene.core@9.4.0/org.apache.lucene.codecs.lucene94.Lucene94HnswVectorsFormat.fieldsReader(Lucene94HnswVectorsFormat.java:157) ``` Thanks to @ebadyano and @benwtrent for helping me track this down so quickly. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] harishankar-gopalan commented on pull request #239: LUCENE-10040: Handle deletions in nearest vector search
harishankar-gopalan commented on PR #239: URL: https://github.com/apache/lucene/pull/239#issuecomment-1281844998 Hi @jtibshirani / @msokolov , had a quick doubt on the deletion implementation. From the discussion and changes I understand that for now we have only marked the docs as deleted with the Bits value and ignored those documents while selecting our candidates. So with this basic understanding my doubts are: 1. Is the search space being expanded to meet the top-k criteria ? 2. Is there any active work going on to prune these "marked-for-delete" nodes in the graph i.e implementing the version of delete suggested by the Weaviate author(s) [here](https://github.com/nmslib/hnswlib/issues/4#issuecomment-678315156) ? Another unrelated doubt: 1. How are segment merges in general handled as I suppose graphs of the corresponding segments also would have to be merged ? And in the same context does deleted nodes have any effect on the merge operations in terms of high time complexity for some edge cases that the users should be aware of ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org