[GitHub] [lucene] romseygeek commented on issue #11851: Luke web interface

2022-10-17 Thread GitBox


romseygeek commented on issue #11851:
URL: https://github.com/apache/lucene/issues/11851#issuecomment-1280453204

   That was marple @dsmiley! Which might be a useful start but is indeed very 
out of date.  It uses node to build the UI, but I'm sure there are better ways 
to do things now. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] matriv opened a new pull request, #11856: Fix nanos to millis conversion for tests

2022-10-17 Thread GitBox


matriv opened a new pull request, #11856:
URL: https://github.com/apache/lucene/pull/11856

   Fix some wrong division to calculate millis from nanos in tests, after 
commit that removes System.currentTimeMillis() calls.
   
   Follows: fb40a43c0de65950e51380e9bc0b409030c53068
   Follows: https://github.com/apache/lucene/pull/11749


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] matriv commented on issue #11459: Remove uses of wall-clock time in codebase [LUCENE-10423]

2022-10-17 Thread GitBox


matriv commented on issue #11459:
URL: https://github.com/apache/lucene/issues/11459#issuecomment-1280660003

   Thank you @donnerpeter!
   Opened a PR to address the rest of the incorrect divisions: 
https://github.com/apache/lucene/pull/11856
   /cc @rmuir 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] dweiss commented on a diff in pull request #11856: Fix nanos to millis conversion for tests

2022-10-17 Thread GitBox


dweiss commented on code in PR #11856:
URL: https://github.com/apache/lucene/pull/11856#discussion_r996928969


##
lucene/analysis/common/src/test/org/apache/lucene/analysis/sinks/TestTeeSinkTokenFilter.java:
##
@@ -199,7 +199,11 @@ public void performance() throws Exception {
 }
 long finish = System.nanoTime();
 System.out.println(
-"ModCount: " + modCounts[j] + " Two fields took " + (finish - 
start) / 100_000 + " ms");
+"ModCount: "
++ modCounts[j]
++ " Two fields took "
++ (finish - start) / 1_000_000

Review Comment:
   Maybe we could be less old-fashioned and just use 
TimeUnit.seconds(1).toNanos() here and in other places?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] matriv commented on a diff in pull request #11856: Fix nanos to millis conversion for tests

2022-10-17 Thread GitBox


matriv commented on code in PR #11856:
URL: https://github.com/apache/lucene/pull/11856#discussion_r996932110


##
lucene/analysis/common/src/test/org/apache/lucene/analysis/sinks/TestTeeSinkTokenFilter.java:
##
@@ -199,7 +199,11 @@ public void performance() throws Exception {
 }
 long finish = System.nanoTime();
 System.out.println(
-"ModCount: " + modCounts[j] + " Two fields took " + (finish - 
start) / 100_000 + " ms");
+"ModCount: "
++ modCounts[j]
++ " Two fields took "
++ (finish - start) / 1_000_000

Review Comment:
   You mean maybe: `TimeUnit.NANOSECONDS.toMillis(finish-start)` ?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] dweiss commented on a diff in pull request #11856: Fix nanos to millis conversion for tests

2022-10-17 Thread GitBox


dweiss commented on code in PR #11856:
URL: https://github.com/apache/lucene/pull/11856#discussion_r996934411


##
lucene/analysis/common/src/test/org/apache/lucene/analysis/sinks/TestTeeSinkTokenFilter.java:
##
@@ -199,7 +199,11 @@ public void performance() throws Exception {
 }
 long finish = System.nanoTime();
 System.out.println(
-"ModCount: " + modCounts[j] + " Two fields took " + (finish - 
start) / 100_000 + " ms");
+"ModCount: "
++ modCounts[j]
++ " Two fields took "
++ (finish - start) / 1_000_000

Review Comment:
   Either way (constant to nanos or the value from nanos to millis).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #11856: Fix nanos to millis conversion for tests

2022-10-17 Thread GitBox


rmuir commented on PR #11856:
URL: https://github.com/apache/lucene/pull/11856#issuecomment-1280767858

   thank you @matriv for following up here. I like the use of TimeUnit 
conversion (thanks @dweiss), very easy to read.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] matriv commented on pull request #11856: Fix nanos to millis conversion for tests

2022-10-17 Thread GitBox


matriv commented on PR #11856:
URL: https://github.com/apache/lucene/pull/11856#issuecomment-1280771031

   I pushed only for the wrong calculations, will let you know once I've 
converted all such divisions to use `TimeUnit` instead.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] msokolov commented on a diff in pull request #11852: Luke Webapp

2022-10-17 Thread GitBox


msokolov commented on code in PR #11852:
URL: https://github.com/apache/lucene/pull/11852#discussion_r997034628


##
lucene/luke/src/java/org/apache/lucene/luke/app/web/LukeWebMain.java:
##
@@ -17,31 +17,78 @@
 
 package org.apache.lucene.luke.app.web;
 
+import java.net.InetSocketAddress;
 import java.util.concurrent.CountDownLatch;
+import java.util.HashMap;
+import java.util.Map;
 import org.apache.lucene.luke.app.IndexHandler;
 import org.apache.lucene.luke.util.LoggerFactory;
 
 /** Entry class for web Luke */
-public class LukeWebMain {
+public final class LukeWebMain {
+
+  private LukeWebMain() {
+  }
 
   static {
 LoggerFactory.initGuiLogging();
   }
 
   public static void main(String[] args) throws Exception {
-String index = null;
-if (args.length == 2 && args[0].equals("--index")) {
-  index = args[1];
-} else {
-  System.err.println("usage: LukeWebMain --index ");
-  Runtime.getRuntime().exit(1);
+Map parsed = null;
+try {
+  parsed = parseArgs(args);
+} catch (Exception e) {
+  usage(e.getMessage());
 }
-
 IndexHandler indexHandler = IndexHandler.getInstance();
-indexHandler.open(index, "org.apache.lucene.store.FSDirectory", true, 
true, false);
+indexHandler.open(getIndex(parsed), "org.apache.lucene.store.FSDirectory", 
true, true, false);
 CountDownLatch tombstone = new CountDownLatch(1);
-HttpService httpService = new HttpService(indexHandler, tombstone);
+HttpService httpService = new HttpService(getSockAddr(parsed), 
indexHandler, tombstone);
 httpService.start();
 tombstone.await();
   }
+
+  private static String getIndex(Map args) {
+String index = (String) args.get("index");
+if (index == null) {
+  usage("index arg is required");
+}
+return index;
+  }
+
+  private static InetSocketAddress getSockAddr(Map args) {
+String host = (String) args.get("host");
+int port = (Integer) args.getOrDefault("port", 8080);
+if (host == null) {
+  return new InetSocketAddress(port);

Review Comment:
   I think that has already been addressed and we are not talking about the 
latest version here. (Note the "Outdated" tag on the code snippet above). 
GitHub UI is a little confusing this way -- I'll add a comment next to the 
updated version to draw your attention over there



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] msokolov commented on a diff in pull request #11852: Luke Webapp

2022-10-17 Thread GitBox


msokolov commented on code in PR #11852:
URL: https://github.com/apache/lucene/pull/11852#discussion_r997035422


##
lucene/luke/src/java/org/apache/lucene/luke/app/web/LukeWebMain.java:
##
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.luke.app.web;
+
+import java.net.InetSocketAddress;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.concurrent.CountDownLatch;
+import java.util.logging.Level;
+import java.util.logging.Logger;
+import org.apache.lucene.luke.app.IndexHandler;
+import org.apache.lucene.luke.util.LoggerFactory;
+
+/** Entry class for web Luke */
+public final class LukeWebMain {
+
+  private LukeWebMain() {}
+
+  static {
+LoggerFactory.initGuiLogging();
+  }
+
+  private static final Logger LOG = LoggerFactory.getLogger(LukeWebMain.class);
+
+  public static void main(String[] args) throws Exception {
+Map parsed = null;
+try {
+  parsed = parseArgs(args);
+} catch (Exception e) {
+  usage(e.getMessage());
+}
+IndexHandler indexHandler = IndexHandler.getInstance();
+indexHandler.open(getIndex(parsed), "org.apache.lucene.store.FSDirectory", 
true, true, false);
+CountDownLatch tombstone = new CountDownLatch(1);
+HttpService httpService = new HttpService(getSockAddr(parsed), 
indexHandler, tombstone);
+httpService.start();
+tombstone.await();
+  }
+
+  private static String getIndex(Map args) {
+String index = (String) args.get("index");
+if (index == null) {
+  usage("index arg is required");
+}
+return index;
+  }
+
+  private static InetSocketAddress getSockAddr(Map args) {
+String host = (String) args.get("host");
+int port = (Integer) args.getOrDefault("port", 8080);
+if (host == null) {
+  return new InetSocketAddress("127.0.0.1", port);

Review Comment:
   Note: updated to use `localhost` by default



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a diff in pull request #11852: Luke Webapp

2022-10-17 Thread GitBox


uschindler commented on code in PR #11852:
URL: https://github.com/apache/lucene/pull/11852#discussion_r997045001


##
lucene/luke/src/java/org/apache/lucene/luke/app/web/LukeWebMain.java:
##
@@ -17,31 +17,78 @@
 
 package org.apache.lucene.luke.app.web;
 
+import java.net.InetSocketAddress;
 import java.util.concurrent.CountDownLatch;
+import java.util.HashMap;
+import java.util.Map;
 import org.apache.lucene.luke.app.IndexHandler;
 import org.apache.lucene.luke.util.LoggerFactory;
 
 /** Entry class for web Luke */
-public class LukeWebMain {
+public final class LukeWebMain {
+
+  private LukeWebMain() {
+  }
 
   static {
 LoggerFactory.initGuiLogging();
   }
 
   public static void main(String[] args) throws Exception {
-String index = null;
-if (args.length == 2 && args[0].equals("--index")) {
-  index = args[1];
-} else {
-  System.err.println("usage: LukeWebMain --index ");
-  Runtime.getRuntime().exit(1);
+Map parsed = null;
+try {
+  parsed = parseArgs(args);
+} catch (Exception e) {
+  usage(e.getMessage());
 }
-
 IndexHandler indexHandler = IndexHandler.getInstance();
-indexHandler.open(index, "org.apache.lucene.store.FSDirectory", true, 
true, false);
+indexHandler.open(getIndex(parsed), "org.apache.lucene.store.FSDirectory", 
true, true, false);
 CountDownLatch tombstone = new CountDownLatch(1);
-HttpService httpService = new HttpService(indexHandler, tombstone);
+HttpService httpService = new HttpService(getSockAddr(parsed), 
indexHandler, tombstone);
 httpService.start();
 tombstone.await();
   }
+
+  private static String getIndex(Map args) {
+String index = (String) args.get("index");
+if (index == null) {
+  usage("index arg is required");
+}
+return index;
+  }
+
+  private static InetSocketAddress getSockAddr(Map args) {
+String host = (String) args.get("host");
+int port = (Integer) args.getOrDefault("port", 8080);
+if (host == null) {
+  return new InetSocketAddress(port);

Review Comment:
   Sorry. Haven't seen the update. Anyways, I'd prefer "::1" as IP address 
(says the IPv4-must-die-policeman).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] matriv commented on a diff in pull request #11856: Fix nanos to millis conversion for tests

2022-10-17 Thread GitBox


matriv commented on code in PR #11856:
URL: https://github.com/apache/lucene/pull/11856#discussion_r997068202


##
lucene/core/src/java/org/apache/lucene/index/CheckIndex.java:
##
@@ -4190,7 +4190,7 @@ private static Status.SoftDeletsStatus checkSoftDeletes(
   }
 
   private static double nsToSec(long ns) {
-return ns / 10.0;
+return ns / (double) TimeUnit.SECONDS.toNanos(1);

Review Comment:
   Heads up for such changes in production code, could this potentially add 
overhead?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] risdenk opened a new pull request, #2674: SOLR-16464: Upgrade commons-text to 1.10.0

2022-10-17 Thread GitBox


risdenk opened a new pull request, #2674:
URL: https://github.com/apache/lucene-solr/pull/2674

   https://issues.apache.org/jira/browse/SOLR-16464


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on a diff in pull request #11852: Luke Webapp

2022-10-17 Thread GitBox


rmuir commented on code in PR #11852:
URL: https://github.com/apache/lucene/pull/11852#discussion_r997309676


##
lucene/luke/src/java/org/apache/lucene/luke/app/web/LukeWebMain.java:
##
@@ -17,31 +17,78 @@
 
 package org.apache.lucene.luke.app.web;
 
+import java.net.InetSocketAddress;
 import java.util.concurrent.CountDownLatch;
+import java.util.HashMap;
+import java.util.Map;
 import org.apache.lucene.luke.app.IndexHandler;
 import org.apache.lucene.luke.util.LoggerFactory;
 
 /** Entry class for web Luke */
-public class LukeWebMain {
+public final class LukeWebMain {
+
+  private LukeWebMain() {
+  }
 
   static {
 LoggerFactory.initGuiLogging();
   }
 
   public static void main(String[] args) throws Exception {
-String index = null;
-if (args.length == 2 && args[0].equals("--index")) {
-  index = args[1];
-} else {
-  System.err.println("usage: LukeWebMain --index ");
-  Runtime.getRuntime().exit(1);
+Map parsed = null;
+try {
+  parsed = parseArgs(args);
+} catch (Exception e) {
+  usage(e.getMessage());
 }
-
 IndexHandler indexHandler = IndexHandler.getInstance();
-indexHandler.open(index, "org.apache.lucene.store.FSDirectory", true, 
true, false);
+indexHandler.open(getIndex(parsed), "org.apache.lucene.store.FSDirectory", 
true, true, false);
 CountDownLatch tombstone = new CountDownLatch(1);
-HttpService httpService = new HttpService(indexHandler, tombstone);
+HttpService httpService = new HttpService(getSockAddr(parsed), 
indexHandler, tombstone);
 httpService.start();
 tombstone.await();
   }
+
+  private static String getIndex(Map args) {
+String index = (String) args.get("index");
+if (index == null) {
+  usage("index arg is required");
+}
+return index;
+  }
+
+  private static InetSocketAddress getSockAddr(Map args) {
+String host = (String) args.get("host");
+int port = (Integer) args.getOrDefault("port", 8080);
+if (host == null) {
+  return new InetSocketAddress(port);

Review Comment:
   sorry maybe i'm unclear. i just dont think we should allow anything except 
localhost. please, let's avoid security issues of the fact that such a thing 
could read any file on the system (maybe /etc/passwd) and even leak shit about 
non-indexes via exceptions. or that it wouldn't have any TLS at all leaving it 
vulnerable to anything in-flight. or that it wouldn't have authentication, etc. 
we could always "open it up" later if we want to do that, but I really think 
there is value in not attaching ourselves to the network. again, if someone 
wants, they can place reverse proxy in front of it that will handle all these 
things better (TLS, auth, request filtering, rate-limiting, DDOS, ...)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on a diff in pull request #11852: Luke Webapp

2022-10-17 Thread GitBox


rmuir commented on code in PR #11852:
URL: https://github.com/apache/lucene/pull/11852#discussion_r997311191


##
lucene/luke/src/java/org/apache/lucene/luke/app/web/LukeWebMain.java:
##
@@ -17,31 +17,78 @@
 
 package org.apache.lucene.luke.app.web;
 
+import java.net.InetSocketAddress;
 import java.util.concurrent.CountDownLatch;
+import java.util.HashMap;
+import java.util.Map;
 import org.apache.lucene.luke.app.IndexHandler;
 import org.apache.lucene.luke.util.LoggerFactory;
 
 /** Entry class for web Luke */
-public class LukeWebMain {
+public final class LukeWebMain {
+
+  private LukeWebMain() {
+  }
 
   static {
 LoggerFactory.initGuiLogging();
   }
 
   public static void main(String[] args) throws Exception {
-String index = null;
-if (args.length == 2 && args[0].equals("--index")) {
-  index = args[1];
-} else {
-  System.err.println("usage: LukeWebMain --index ");
-  Runtime.getRuntime().exit(1);
+Map parsed = null;
+try {
+  parsed = parseArgs(args);
+} catch (Exception e) {
+  usage(e.getMessage());
 }
-
 IndexHandler indexHandler = IndexHandler.getInstance();
-indexHandler.open(index, "org.apache.lucene.store.FSDirectory", true, 
true, false);
+indexHandler.open(getIndex(parsed), "org.apache.lucene.store.FSDirectory", 
true, true, false);
 CountDownLatch tombstone = new CountDownLatch(1);
-HttpService httpService = new HttpService(indexHandler, tombstone);
+HttpService httpService = new HttpService(getSockAddr(parsed), 
indexHandler, tombstone);
 httpService.start();
 tombstone.await();
   }
+
+  private static String getIndex(Map args) {
+String index = (String) args.get("index");
+if (index == null) {
+  usage("index arg is required");
+}
+return index;
+  }
+
+  private static InetSocketAddress getSockAddr(Map args) {
+String host = (String) args.get("host");
+int port = (Integer) args.getOrDefault("port", 8080);
+if (host == null) {
+  return new InetSocketAddress(port);

Review Comment:
   ... security headers, compression, 
   
   just would love for us to stay out of the business of all this. and avoid 
hassles like security reports as a project.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on a diff in pull request #11857: hunspell: speedup suggestions by caching speller and compound stemming requests

2022-10-17 Thread GitBox


rmuir commented on code in PR #11857:
URL: https://github.com/apache/lucene/pull/11857#discussion_r997318423


##
lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Hunspell.java:
##
@@ -614,11 +617,27 @@ private void doSuggest(
   Runnable checkCanceled) {
 Hunspell suggestionSpeller =
 new Hunspell(dictionary, policy, checkCanceled) {
+  final Map>> compoundCache = new 
HashMap<>();

Review Comment:
   can we add a code comment above the map maybe?
   
   to me, looks like this is a cache on a per-query basis essentially? due to 
how compounds get decomposed internally we'll do a bunch of redundant work 
otherwise?  
   
   but an unbounded cache looks scary as hell, if you look at any code for the 
first time? :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] risdenk merged pull request #2674: SOLR-16464: Upgrade commons-text to 1.10.0

2022-10-17 Thread GitBox


risdenk merged PR #2674:
URL: https://github.com/apache/lucene-solr/pull/2674


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] donnerpeter commented on a diff in pull request #11857: hunspell: speedup suggestions by caching speller and compound stemming requests

2022-10-17 Thread GitBox


donnerpeter commented on code in PR #11857:
URL: https://github.com/apache/lucene/pull/11857#discussion_r997326314


##
lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Hunspell.java:
##
@@ -614,11 +617,27 @@ private void doSuggest(
   Runnable checkCanceled) {
 Hunspell suggestionSpeller =
 new Hunspell(dictionary, policy, checkCanceled) {
+  final Map>> compoundCache = new 
HashMap<>();

Review Comment:
   It's a per-`suggest`-call cache, it will be GCed afterwards. It caches the 
prefixes tried as the first parts of potential compounds. The cached `findStem` 
calls are relatively expensive, and there can be thousands of them for each 
`suggest`. If such a call returns `null`, the rest of a word won't be split for 
compounds, so caching nulls is essential.
   
   I hope this makes things clearer. Which parts of this would you like in the 
comment? :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on a diff in pull request #11857: hunspell: speedup suggestions by caching speller and compound stemming requests

2022-10-17 Thread GitBox


rmuir commented on code in PR #11857:
URL: https://github.com/apache/lucene/pull/11857#discussion_r997328633


##
lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Hunspell.java:
##
@@ -614,11 +617,27 @@ private void doSuggest(
   Runnable checkCanceled) {
 Hunspell suggestionSpeller =
 new Hunspell(dictionary, policy, checkCanceled) {
+  final Map>> compoundCache = new 
HashMap<>();

Review Comment:
   above works for me! thank you!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] dweiss commented on a diff in pull request #11857: hunspell: speedup suggestions by caching speller and compound stemming requests

2022-10-17 Thread GitBox


dweiss commented on code in PR #11857:
URL: https://github.com/apache/lucene/pull/11857#discussion_r997341034


##
lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Hunspell.java:
##
@@ -614,11 +617,27 @@ private void doSuggest(
   Runnable checkCanceled) {
 Hunspell suggestionSpeller =
 new Hunspell(dictionary, policy, checkCanceled) {
+  final Map>> compoundCache = new 
HashMap<>();
+
   @Override
   boolean acceptsStem(int formID) {
 return !dictionary.hasFlag(formID, dictionary.noSuggest)
 && !dictionary.hasFlag(formID, dictionary.subStandard);
   }
+
+  @Override
+  Root findStem(
+  char[] chars, int offset, int length, WordCase originalCase, 
WordContext context) {
+if (context == COMPOUND_BEGIN && originalCase == null) {
+  return compoundCache
+  .computeIfAbsent(
+  new String(chars, offset, length),
+  __ ->
+  Optional.ofNullable(super.findStem(chars, offset, 
length, null, context)))

Review Comment:
   Sorry for being dim but what's the point of storing those Optionals in the 
map? Is storing just null values somehow out of fashion? :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] dweiss commented on a diff in pull request #11856: Fix nanos to millis conversion for tests

2022-10-17 Thread GitBox


dweiss commented on code in PR #11856:
URL: https://github.com/apache/lucene/pull/11856#discussion_r997342151


##
lucene/core/src/java/org/apache/lucene/index/CheckIndex.java:
##
@@ -4190,7 +4190,7 @@ private static Status.SoftDeletsStatus checkSoftDeletes(
   }
 
   private static double nsToSec(long ns) {
-return ns / 10.0;
+return ns / (double) TimeUnit.SECONDS.toNanos(1);

Review Comment:
   If you're concerned about it, move it to a constant? I think it'll be 
negligible. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] donnerpeter commented on a diff in pull request #11857: hunspell: speedup suggestions by caching speller and compound stemming requests

2022-10-17 Thread GitBox


donnerpeter commented on code in PR #11857:
URL: https://github.com/apache/lucene/pull/11857#discussion_r997349204


##
lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Hunspell.java:
##
@@ -614,11 +617,27 @@ private void doSuggest(
   Runnable checkCanceled) {
 Hunspell suggestionSpeller =
 new Hunspell(dictionary, policy, checkCanceled) {
+  final Map>> compoundCache = new 
HashMap<>();
+
   @Override
   boolean acceptsStem(int formID) {
 return !dictionary.hasFlag(formID, dictionary.noSuggest)
 && !dictionary.hasFlag(formID, dictionary.subStandard);
   }
+
+  @Override
+  Root findStem(
+  char[] chars, int offset, int length, WordCase originalCase, 
WordContext context) {
+if (context == COMPOUND_BEGIN && originalCase == null) {
+  return compoundCache
+  .computeIfAbsent(
+  new String(chars, offset, length),
+  __ ->
+  Optional.ofNullable(super.findStem(chars, offset, 
length, null, context)))

Review Comment:
   computeIfAbsent is called even if there's an associated null value. And 
containsKey+get+put finds the hash entry several times



##
lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Hunspell.java:
##
@@ -614,11 +617,27 @@ private void doSuggest(
   Runnable checkCanceled) {
 Hunspell suggestionSpeller =
 new Hunspell(dictionary, policy, checkCanceled) {
+  final Map>> compoundCache = new 
HashMap<>();
+
   @Override
   boolean acceptsStem(int formID) {
 return !dictionary.hasFlag(formID, dictionary.noSuggest)
 && !dictionary.hasFlag(formID, dictionary.subStandard);
   }
+
+  @Override
+  Root findStem(
+  char[] chars, int offset, int length, WordCase originalCase, 
WordContext context) {
+if (context == COMPOUND_BEGIN && originalCase == null) {
+  return compoundCache
+  .computeIfAbsent(
+  new String(chars, offset, length),
+  __ ->
+  Optional.ofNullable(super.findStem(chars, offset, 
length, null, context)))

Review Comment:
   computeIfAbsent lambda is called even if there's an associated null value. 
And containsKey+get+put finds the hash entry several times



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] matriv commented on a diff in pull request #11856: Fix nanos to millis conversion for tests

2022-10-17 Thread GitBox


matriv commented on code in PR #11856:
URL: https://github.com/apache/lucene/pull/11856#discussion_r997357120


##
lucene/core/src/java/org/apache/lucene/index/CheckIndex.java:
##
@@ -4190,7 +4190,7 @@ private static Status.SoftDeletsStatus checkSoftDeletes(
   }
 
   private static double nsToSec(long ns) {
-return ns / 10.0;
+return ns / (double) TimeUnit.SECONDS.toNanos(1);

Review Comment:
   Any suggestion for a "central" place to put this?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] dweiss commented on a diff in pull request #11857: hunspell: speedup suggestions by caching speller and compound stemming requests

2022-10-17 Thread GitBox


dweiss commented on code in PR #11857:
URL: https://github.com/apache/lucene/pull/11857#discussion_r997417873


##
lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Hunspell.java:
##
@@ -614,11 +617,27 @@ private void doSuggest(
   Runnable checkCanceled) {
 Hunspell suggestionSpeller =
 new Hunspell(dictionary, policy, checkCanceled) {
+  final Map>> compoundCache = new 
HashMap<>();
+
   @Override
   boolean acceptsStem(int formID) {
 return !dictionary.hasFlag(formID, dictionary.noSuggest)
 && !dictionary.hasFlag(formID, dictionary.subStandard);
   }
+
+  @Override
+  Root findStem(
+  char[] chars, int offset, int length, WordCase originalCase, 
WordContext context) {
+if (context == COMPOUND_BEGIN && originalCase == null) {
+  return compoundCache
+  .computeIfAbsent(
+  new String(chars, offset, length),
+  __ ->
+  Optional.ofNullable(super.findStem(chars, offset, 
length, null, context)))

Review Comment:
   Had to look it up to believe it: "If the specified key is not already 
associated with a value (or is mapped to null)". Damn, that's a non-intuitive 
spec... I've always had a clear perception that it's only the key association 
(key presence) that is validated here... and it's just been ruined. Sorry for 
the noise, @donnerpeter, LGTM.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] donnerpeter merged pull request #11857: hunspell: speedup suggestions by caching speller and compound stemming requests

2022-10-17 Thread GitBox


donnerpeter merged PR #11857:
URL: https://github.com/apache/lucene/pull/11857


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] benwtrent commented on pull request #11849: Fix failure to load larger data sets in KnnGraphTest

2022-10-17 Thread GitBox


benwtrent commented on PR #11849:
URL: https://github.com/apache/lucene/pull/11849#issuecomment-1281604399

   @jtibshirani I confirmed that on my M1 Macbook, there is no significant 
change in QPS, I tested glove-100-angular and deep-image-96-angular


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jtibshirani merged pull request #11849: Fix failure to load larger data sets in KnnGraphTest

2022-10-17 Thread GitBox


jtibshirani merged PR #11849:
URL: https://github.com/apache/lucene/pull/11849


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jtibshirani opened a new issue, #11858: Lucene94HnswVectorsFormat validation fails with large datasets

2022-10-17 Thread GitBox


jtibshirani opened a new issue, #11858:
URL: https://github.com/apache/lucene/issues/11858

   I ran a test on Lucene 9.4 where I tried to force merge 2 million vectors 
with
   dimension 768. It failed with
   
   ```
   java.lang.IllegalStateException: Vector data length 3070061568 not matching
   size=999369 * dim=768 * byteSize=4 = -1224905728
   ```
   
   The problem is that we use an integer to represent the size, which is too 
small
   to hold it. The bug snuck in during the work to enable int8 values, which
   switched a long to an int: #1054. This error doesn't occur before version 
9.4.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jtibshirani commented on issue #11858: Lucene94HnswVectorsFormat validation fails with large datasets

2022-10-17 Thread GitBox


jtibshirani commented on issue #11858:
URL: https://github.com/apache/lucene/issues/11858#issuecomment-1281793284

   Here's the stack trace:
   ```
   java.lang.IllegalStateException: Vector data length 3070061568 not matching 
size=999369 * dim=768 * byteSize=4 = -1224905728
at 
org.apache.lucene.core@9.4.0/org.apache.lucene.codecs.lucene94.Lucene94HnswVectorsReader.validateFieldEntry(Lucene94HnswVectorsReader.java:185)
at 
org.apache.lucene.core@9.4.0/org.apache.lucene.codecs.lucene94.Lucene94HnswVectorsReader.readFields(Lucene94HnswVectorsReader.java:156)
at 
org.apache.lucene.core@9.4.0/org.apache.lucene.codecs.lucene94.Lucene94HnswVectorsReader.readMetadata(Lucene94HnswVectorsReader.java:103)
at 
org.apache.lucene.core@9.4.0/org.apache.lucene.codecs.lucene94.Lucene94HnswVectorsReader.(Lucene94HnswVectorsReader.java:64)
at 
org.apache.lucene.core@9.4.0/org.apache.lucene.codecs.lucene94.Lucene94HnswVectorsFormat.fieldsReader(Lucene94HnswVectorsFormat.java:157)
   ```
   
   Thanks to @ebadyano and @benwtrent for helping me track this down so quickly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] harishankar-gopalan commented on pull request #239: LUCENE-10040: Handle deletions in nearest vector search

2022-10-17 Thread GitBox


harishankar-gopalan commented on PR #239:
URL: https://github.com/apache/lucene/pull/239#issuecomment-1281844998

   Hi @jtibshirani / @msokolov , had a quick doubt on the deletion 
implementation. From the discussion and changes I understand that for now we 
have only marked the docs as deleted with the Bits value and ignored those 
documents while selecting our candidates.
   So with this basic understanding my doubts are:
   1. Is the search space being expanded to meet the top-k criteria ? 
   2. Is there any active work going on to prune these "marked-for-delete" 
nodes in the graph i.e implementing the version of delete suggested by the 
Weaviate author(s) 
[here](https://github.com/nmslib/hnswlib/issues/4#issuecomment-678315156) ?
   
   Another unrelated doubt:
   1. How are segment merges in general handled as I suppose graphs of the 
corresponding segments also would have to be merged ? And in the same context 
does deleted nodes have any effect on the merge operations in terms of high 
time complexity for some edge cases that the users should be aware of ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org