[GitHub] [lucene] iverase commented on pull request #12460: Allow reading binary doc values as a DataInput

2023-08-21 Thread via GitHub


iverase commented on PR #12460:
URL: https://github.com/apache/lucene/pull/12460#issuecomment-1685783623

   I am currently not planning to replace any of the usages as I am not familiar 
with them. Note that some of them encode data in big endian while 
DataOutput/DataInput uses little endian since 9.0, so they might not be 
compatible. The `SerializedDVStrategy` uses a `java.io.ByteArrayInputStream` so 
it is not a good candidate either.
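
   To illustrate the byte-order mismatch mentioned above, here is a minimal 
sketch using only `java.nio.ByteBuffer`. It does not use Lucene's 
`DataInput`/`DataOutput`; the class and method names (`EndiannessSketch`, 
`encodeInt`, `decodeInt`) are made up for this example. It only shows that 
bytes written in one order decode to a different value when read in the other.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical helper, not part of Lucene: shows why data encoded in big endian
// cannot be read back correctly by a little-endian reader.
final class EndiannessSketch {

  /** Encodes an int into 4 bytes using the given byte order. */
  static byte[] encodeInt(int value, ByteOrder order) {
    return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
  }

  /** Decodes the first 4 bytes as an int using the given byte order. */
  static int decodeInt(byte[] bytes, ByteOrder order) {
    return ByteBuffer.wrap(bytes).order(order).getInt();
  }

  public static void main(String[] args) {
    int original = 0x01020304;
    byte[] bigEndianBytes = encodeInt(original, ByteOrder.BIG_ENDIAN);
    System.out.println("bytes: " + Arrays.toString(bigEndianBytes));

    // Reading with the matching order round-trips; the other order does not.
    int sameOrder = decodeInt(bigEndianBytes, ByteOrder.BIG_ENDIAN);
    int otherOrder = decodeInt(bigEndianBytes, ByteOrder.LITTLE_ENDIAN);
    System.out.println("read as big endian:    0x" + Integer.toHexString(sameOrder));
    System.out.println("read as little endian: 0x" + Integer.toHexString(otherOrder));
  }
}
```

   Any existing usage that wrote its payload in big endian would therefore need 
its own decoding step (or a format change) before it could be read through a 
little-endian input.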
   
   
   My use case is more similar to 
[ShapeDocValues](https://github.com/apache/lucene/blob/fad3108b27b7c9b9514a5b96e26295da3f7c8723/lucene/core/src/java/org/apache/lucene/document/ShapeDocValues.java#L578)
 and that would be a good candidate. I am not familiar with the implementation 
and it seems to require some signature changes, so I have left that work to 
whoever is interested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] azagniotov commented on pull request #935: LUCENE-4056: Japanese Tokenizer (Kuromoji) cannot build UniDic dictionary

2023-08-21 Thread via GitHub


azagniotov commented on PR #935:
URL: https://github.com/apache/lucene-solr/pull/935#issuecomment-1685887305

   Hello Team,
   
   May I ask where we are on this?
   
   ### TL;DR
   
   In the meantime, I managed to build the 
[unidic-cwj-202302_full](https://clrd.ninjal.ac.jp/unidic_archive/2302/) 
dictionary from Ninjal. I used the tweaks that @johtani added in his PR three 
years ago, plus a few minor tweaks of my own. See the attached screenshot 
(**Disclaimer**: I have not tested tokenizing text with the built dictionary; I 
only built it).
   
   Shall I try to open a new PR under https://github.com/apache/lucene in order 
to get the conversation restarted? cc: @mocobeta 🙇🏼‍♀️ 
   
   ### Build command
   
   The following was performed on a fresh clone of 
https://github.com/apache/lucene:
   
   My build command leveraged the new Gradle setup and the 
[DictionaryBuilder](https://github.com/apache/lucene/blob/main/lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/DictionaryBuilder.java)
 JavaDoc comment about how to do it.
   
   I added an `application` block to `lucene/analysis/kuromoji/build.gradle` to 
get a `run` task:
   ```
   application {
     // Module name as defined in module-info.java
     mainModule = 'org.apache.lucene.analysis.kuromoji'
     mainClass = 'org.apache.lucene.analysis.ja.dict.DictionaryBuilder'
   }
   ```
   
   I ran the following Gradle command from the root directory of the `lucene` 
clone, where `gradlew` is located:
   ```
   ./gradlew -p lucene/analysis/kuromoji run --args='unidic "/Users/azagniotov/Downloads/unidic-cwj-202302_full" "/Users/azagniotov/Downloads/unidic-cwj-202302_full/lucene-kuromoji-built" "UTF-8" false'
   ```
   
   ### Screenshot
   https://github.com/apache/lucene-solr/assets/989900/2f31f2ad-3715-4abb-9f77-0c559cea200d
   
   





[GitHub] [lucene] SevenCss commented on issue #7820: CheckIndex cannot "fix" indexes that have individual segments with missing or corrupt .si files because sanity checks will fail trying to read the

2023-08-21 Thread via GitHub


SevenCss commented on issue #7820:
URL: https://github.com/apache/lucene/issues/7820#issuecomment-1685896438

   @mikemccand 
   Thanks for your response. 
   Exactly: after I manually removed the broken `segments_a7` file, the index 
recovered successfully. However, I am trying to find a way to fix the problem 
programmatically, so I tried `CheckIndex`, but it failed to detect the problem 
and fix the index. (Then I found this issue.)
   
   I checked the logs and found no indication that an OS or JVM crash 
happened. Unfortunately, we could not reproduce the issue either. No, we did 
not deploy our index on a mounted drive; the index is deployed locally with my 
program (on Windows Server). There is no index replication. I also checked the 
code and found the comments regarding the Windows issue ( 
https://github.com/apache/lucene/blob/releases/lucene-solr/8.8.1/lucene/core/src/java/org/apache/lucene/index/IndexFileDeleter.java#L694C1-L707C4
 ). However, I am curious why we do not log anything there, which could give 
the end user some hints. It seems that Windows has no plan to fix the 
OS-specific issue, right? 
   
   





[GitHub] [lucene] gsmiller commented on pull request #12417: forutil add vectorized and scalar code

2023-08-21 Thread via GitHub


gsmiller commented on PR #12417:
URL: https://github.com/apache/lucene/pull/12417#issuecomment-1686776595

   @ChrisHegarty I was considering some experimentation with [vectorized prefix 
sum implementations](https://en.algorithmica.org/hpc/algorithms/prefix/), but 
saw your comment above stating:
   
   > What bothers me even more is that we cannot easily integrate the prefix 
sum calculation into the unpack - as we run into Panama bounds check issues 
that make the performance very poor.
   
   I also came across some 
[benchmarks](https://github.com/jpountz/vectorized-prefix-sum) that it looks 
like you may have collaborated on with @jpountz, covering several different 
SIMD approaches to prefix sums.
   
   Can you elaborate any more on the performance issues related to these 
vectorized attempts? I assume the benchmark results were poor for you as well 
(I've tested on a few different machines with pretty horrid results, but I 
don't really understand why they're so bad).
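
   For readers following along, the scalar operation under discussion can be 
sketched as below. This is only an illustrative baseline, not the `ForUtil` code 
from the PR; the class and method names (`PrefixSumSketch`, `prefixSum`) and the 
`base` parameter are assumptions made for the example. Each output depends on 
the previous one, which is the loop-carried dependency that vectorized variants 
have to work around.

```java
import java.util.Arrays;

// Hypothetical baseline, not Lucene's ForUtil: an in-place inclusive prefix sum,
// e.g. for turning a block of deltas back into absolute values.
final class PrefixSumSketch {

  /** Replaces values[i] with base + values[0] + ... + values[i]. */
  static void prefixSum(long[] values, long base) {
    long sum = base;
    for (int i = 0; i < values.length; i++) {
      sum += values[i]; // depends on the previous iteration's result
      values[i] = sum;
    }
  }

  public static void main(String[] args) {
    long[] deltas = {3, 1, 4, 1, 5};
    prefixSum(deltas, 10);
    System.out.println(Arrays.toString(deltas));
  }
}
```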





[GitHub] [lucene] stefanvodita commented on pull request #12337: Index arbitrary fields in taxonomy docs

2023-08-21 Thread via GitHub


stefanvodita commented on PR #12337:
URL: https://github.com/apache/lucene/pull/12337#issuecomment-1686832362

   The commit I pushed makes `DirectoryTaxonomyReader.getInternalIndexReader` 
public. We also stop relying on the full path field. I'm not sure why I thought 
we needed it; we can use `getPath`/`getBulkPath` to get labels if we have the 
corresponding ordinal.





[GitHub] [lucene] javanna commented on pull request #12499: Simplify task executor for concurrent operations

2023-08-21 Thread via GitHub


javanna commented on PR #12499:
URL: https://github.com/apache/lucene/pull/12499#issuecomment-1686902882

   @sohami I will open a follow-up to offload single slices too.





[GitHub] [lucene] javanna merged pull request #12499: Simplify task executor for concurrent operations

2023-08-21 Thread via GitHub


javanna merged PR #12499:
URL: https://github.com/apache/lucene/pull/12499





[GitHub] [lucene] javanna commented on pull request #12499: Simplify task executor for concurrent operations

2023-08-21 Thread via GitHub


javanna commented on PR #12499:
URL: https://github.com/apache/lucene/pull/12499#issuecomment-1686951882

   Thanks all for looking!





[GitHub] [lucene] almogtavor commented on issue #12406: Register nested queries (ToParentBlockJoinQuery) to Lucene Monitor

2023-08-21 Thread via GitHub


almogtavor commented on issue #12406:
URL: https://github.com/apache/lucene/issues/12406#issuecomment-1687057323

   @romseygeek @dweiss @uschindler @dsmiley @gsmiller @javanna @benwtrent I'd 
love to get feedback from you on the subject





[GitHub] [lucene] javanna opened a new pull request, #12515: Offload single slice to executor

2023-08-21 Thread via GitHub


javanna opened a new pull request, #12515:
URL: https://github.com/apache/lucene/pull/12515

   When an executor is set on the IndexSearcher, we should try to offload most 
of the computation to that executor. Ideally, the caller thread would only do 
light coordination work, and the executor would be responsible for the heavier 
workload. If we don't offload sequential execution to the executor, it becomes 
very difficult to make any distinction about the type of workload performed on 
the two sides.
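
   The idea described above can be sketched with plain `java.util.concurrent` 
types. This is not the `IndexSearcher` or `TaskExecutor` code from the PR; 
`OffloadSketch` and `offload` are hypothetical names, and the exception handling 
is simplified for the example. The point is that even a single task is submitted 
to the executor, so the caller thread only waits for the result.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not Lucene code: always hand the work to the executor,
// even when there is only one task, so the caller thread just coordinates.
final class OffloadSketch {

  /** Submits the task to the executor and blocks the caller until it completes. */
  static <T> T offload(ExecutorService executor, Callable<T> task) {
    Future<T> future = executor.submit(task);
    try {
      return future.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      throw new RuntimeException(e);
    } catch (ExecutionException e) {
      throw new RuntimeException(e.getCause());
    }
  }

  public static void main(String[] args) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      // The "heavy" work here just reports which thread ran it.
      String worker = offload(executor, () -> Thread.currentThread().getName());
      System.out.println("caller thread: " + Thread.currentThread().getName());
      System.out.println("worker thread: " + worker);
    } finally {
      executor.shutdown();
    }
  }
}
```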
   
   





[GitHub] [lucene] javanna commented on pull request #12499: Simplify task executor for concurrent operations

2023-08-21 Thread via GitHub


javanna commented on PR #12499:
URL: https://github.com/apache/lucene/pull/12499#issuecomment-1687122736

   @sohami here it is: #12515 





[GitHub] [lucene] javanna opened a new pull request, #12516: Unwrap execution exceptions cause and rethrow as is when possible

2023-08-21 Thread via GitHub


javanna opened a new pull request, #12516:
URL: https://github.com/apache/lucene/pull/12516

   When performing concurrent search, we may get an execution exception from 
one or more slices. In that case, we'd like to rethrow the cause of the 
execution exception, which we currently do by wrapping it into a new runtime 
exception. Instead, we can rethrow runtime exceptions as-is, and the same is 
true for IO exceptions. Any other exception is still wrapped into a new runtime 
exception. This unifies the exceptions thrown by the sequential code path (when 
no executor is provided) and the concurrent code path (when an executor is 
provided).
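
   A minimal standalone sketch of the unwrapping described above, using only 
JDK classes. It is not the `TaskExecutor` change itself: `UnwrapSketch` and 
`getUnwrapped` are hypothetical names, and the interrupt handling uses a plain 
`RuntimeException` to keep the example self-contained.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not Lucene code: rethrow the cause of an
// ExecutionException as-is when it is an IOException or RuntimeException.
final class UnwrapSketch {

  /** Waits for the future and unwraps ExecutionException where possible. */
  static <T> T getUnwrapped(Future<T> future) throws IOException {
    try {
      return future.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      throw new RuntimeException(e);
    } catch (ExecutionException e) {
      Throwable cause = e.getCause();
      if (cause instanceof IOException) {
        throw (IOException) cause; // same type the sequential path would see
      }
      if (cause instanceof RuntimeException) {
        throw (RuntimeException) cause;
      }
      throw new RuntimeException(cause); // anything else is still wrapped
    }
  }

  public static void main(String[] args) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      Callable<String> failingTask = () -> {
        throw new IOException("simulated read failure");
      };
      getUnwrapped(executor.submit(failingTask));
    } catch (IOException e) {
      System.out.println("caught unwrapped exception: " + e);
    } finally {
      executor.shutdown();
    }
  }
}
```

   With this shape, a caller that already catches `IOException` around a 
sequential search would see the same exception type when an executor is used.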





[GitHub] [lucene] javanna commented on pull request #12516: Unwrap execution exceptions cause and rethrow as is when possible

2023-08-21 Thread via GitHub


javanna commented on PR #12516:
URL: https://github.com/apache/lucene/pull/12516#issuecomment-1687152027

   Another one that you may be interested in @reta @sohami 





[GitHub] [lucene] iverase commented on pull request #12512: Remove unused variable in BKDWriter

2023-08-21 Thread via GitHub


iverase commented on PR #12512:
URL: https://github.com/apache/lucene/pull/12512#issuecomment-1687195803

   Sure, it is probably a leftover from another change. While we are here, 
should we also rename `scratch1` to `scratch`?





[GitHub] [lucene] reta commented on a diff in pull request #12516: Unwrap execution exceptions cause and rethrow as is when possible

2023-08-21 Thread via GitHub


reta commented on code in PR #12516:
URL: https://github.com/apache/lucene/pull/12516#discussion_r1300778596


##
lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java:
##
@@ -57,6 +58,12 @@ final  List invokeAll(Collection> 
tasks) {
   } catch (InterruptedException e) {
 throw new ThreadInterruptedException(e);
   } catch (ExecutionException e) {
+if (e.getCause() instanceof IOException ioException) {
+  throw ioException;

Review Comment:
   ```suggestion
 throw e.getCause();
   ```









[GitHub] [lucene] reta commented on a diff in pull request #12516: Unwrap execution exceptions cause and rethrow as is when possible

2023-08-21 Thread via GitHub


reta commented on code in PR #12516:
URL: https://github.com/apache/lucene/pull/12516#discussion_r1300779063


##
lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java:
##
@@ -57,6 +58,12 @@ final  List invokeAll(Collection> 
tasks) {
   } catch (InterruptedException e) {
 throw new ThreadInterruptedException(e);
   } catch (ExecutionException e) {
+if (e.getCause() instanceof IOException ioException) {
+  throw ioException;
+}
+if (e.getCause() instanceof RuntimeException runtimeException) {
+  throw runtimeException;
+}
 throw new RuntimeException(e.getCause());

Review Comment:
   You may check for `Error` as well, to be safe:
   
   ```
   if (e.getCause() instanceof Error error) {
     throw error;
   }
   ```
   






[GitHub] [lucene] iverase commented on issue #12514: Could we add more index for BKD LeafNode?

2023-08-21 Thread via GitHub


iverase commented on issue #12514:
URL: https://github.com/apache/lucene/issues/12514#issuecomment-1687280508

   I am not sure this is the right trade-off. The BKD tree was developed to 
perform efficient range queries. If your use case is to run `PointInSetQuery` 
efficiently, you might be better off indexing your data in the inverted 
index, as performance should be better for this type of query.
   
   Another option might be to lower `maxPointsInLeafNode` from 512 to a 
smaller value. That might give you a similar effect without having to 
introduce an extra index structure. The trade-off here will be the index 
size.
   
   
   
   





[GitHub] [lucene] easyice commented on pull request #12512: Remove unused variable in BKDWriter

2023-08-21 Thread via GitHub


easyice commented on PR #12512:
URL: https://github.com/apache/lucene/pull/12512#issuecomment-1687334045

   @iverase Good idea, this is clearer. I've renamed `scratch1` to 
`scratch`.





[GitHub] [lucene] iverase commented on pull request #12512: Remove unused variable in BKDWriter

2023-08-21 Thread via GitHub


iverase commented on PR #12512:
URL: https://github.com/apache/lucene/pull/12512#issuecomment-1687356896

   LGTM, Thanks @easyice !
   
   Could you please add a CHANGES entry under 9.8.0?





[GitHub] [lucene] easyice commented on pull request #12512: Remove unused variable in BKDWriter

2023-08-21 Thread via GitHub


easyice commented on PR #12512:
URL: https://github.com/apache/lucene/pull/12512#issuecomment-1687412992

   Thanks @iverase and @benwtrent, CHANGES.txt has been updated.

