[GitHub] [lucene] iverase commented on pull request #12460: Allow reading binary doc values as a DataInput
iverase commented on PR #12460: URL: https://github.com/apache/lucene/pull/12460#issuecomment-1685783623 I am currently not planning to replace any of the usages, as I am not familiar with them. Note that some of them encode data in big endian, while DataOutput/DataInput use little endian since 8.0, so they might not be compatible. The `SerializedDVStrategy` uses a `java.io.ByteArrayInputStream`, so it is not a good candidate either. My use case is more similar to [ShapeDocValues](https://github.com/apache/lucene/blob/fad3108b27b7c9b9514a5b96e26295da3f7c8723/lucene/core/src/java/org/apache/lucene/document/ShapeDocValues.java#L578), which would be a good candidate. I am not familiar with its implementation, and it seems to require some signature changes, so I left the implementation to whoever is interested. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
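To make the big-endian vs. little-endian compatibility concern concrete: the same `int` written in each byte order produces different bytes, so a reader that assumes the wrong order silently decodes a different value. This is a minimal sketch using only `java.nio.ByteBuffer`; it does not call Lucene's `DataInput`/`DataOutput`, and the class name `EndiannessDemo` is made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class EndiannessDemo {
  // Encodes a single int using the given byte order.
  static byte[] encode(int value, ByteOrder order) {
    return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
  }

  // Decodes four bytes as an int using the given byte order.
  static int decode(byte[] bytes, ByteOrder order) {
    return ByteBuffer.wrap(bytes).order(order).getInt();
  }

  public static void main(String[] args) {
    int value = 0x01020304;
    byte[] big = encode(value, ByteOrder.BIG_ENDIAN);       // [1, 2, 3, 4]
    byte[] little = encode(value, ByteOrder.LITTLE_ENDIAN); // [4, 3, 2, 1]
    System.out.println(Arrays.toString(big));
    System.out.println(Arrays.toString(little));
    // Reading big-endian bytes as little-endian yields 0x04030201, not 0x01020304.
    System.out.println(Integer.toHexString(decode(big, ByteOrder.LITTLE_ENDIAN)));
  }
}
```

Any existing big-endian encoder would therefore need a matching reader (or a format change) before it could be consumed through a little-endian `DataInput`.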
[GitHub] [lucene-solr] azagniotov commented on pull request #935: LUCENE-4056: Japanese Tokenizer (Kuromoji) cannot build UniDic dictionary
azagniotov commented on PR #935: URL: https://github.com/apache/lucene-solr/pull/935#issuecomment-1685887305 Hello Team, may I ask where we are on this?

### TL;DR

In the meantime, I attempted and succeeded in building the [unidic-cwj-202302_full](https://clrd.ninjal.ac.jp/unidic_archive/2302/) dictionary from NINJAL. I used the tweaks that @johtani added in his PR three years ago, plus a few minor tweaks of my own. See the attached screenshot. (**Disclaimer**: I only built the dictionary; I did not test it by tokenizing any text.)

Shall I try to open a new PR under https://github.com/apache/lucene in order to restart the conversation on this? cc: @mocobeta

### Build command

The following was performed on a fresh clone of https://github.com/apache/lucene. My build command leveraged the new Gradle setup and the [DictionaryBuilder](https://github.com/apache/lucene/blob/main/lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/DictionaryBuilder.java) JavaDoc comment about how to do it. I added a `run` task to `lucene/analysis/kuromoji/build.gradle`:

```
application {
  mainModule = 'org.apache.lucene.analysis.kuromoji' // name defined in module-info.java
  mainClass = 'org.apache.lucene.analysis.ja.dict.DictionaryBuilder'
}
```

I then executed the following Gradle command from the root `lucene` directory, where `gradlew` is located:

```
./gradlew -p lucene/analysis/kuromoji run --args='unidic "/Users/azagniotov/Downloads/unidic-cwj-202302_full" "/Users/azagniotov/Downloads/unidic-cwj-202302_full/lucene-kuromoji-built" "UTF-8" false'
```

### Screenshot

https://github.com/apache/lucene-solr/assets/989900/2f31f2ad-3715-4abb-9f77-0c559cea200d
[GitHub] [lucene] SevenCss commented on issue #7820: CheckIndex cannot "fix" indexes that have individual segments with missing or corrupt .si files because sanity checks will fail trying to read the
SevenCss commented on issue #7820: URL: https://github.com/apache/lucene/issues/7820#issuecomment-1685896438 @mikemccand Thanks for your response. Exactly: after I manually removed the broken `segments_a7` file, the index recovered successfully. However, I'm trying to find a way to fix the problem programmatically, so I tried `CheckIndex`, but it failed to detect the problem and fix the index. (That is how I found this issue.) I checked the logs and found no clue indicating that an OS or JVM crash happened. Unfortunately, we could not reproduce this issue either. No, we did not deploy our index on a mounted drive. Instead, the index is deployed locally with my program (on Windows Server). There is no index replication. I also checked the code and found the comments regarding the Windows issue ( https://github.com/apache/lucene/blob/releases/lucene-solr/8.8.1/lucene/core/src/java/org/apache/lucene/index/IndexFileDeleter.java#L694C1-L707C4 ). However, I'm curious why nothing is logged in that case, since a log message could give end users some hints. It seems that Windows has no plan to fix this OS-specific issue, right?
[GitHub] [lucene] gsmiller commented on pull request #12417: forutil add vectorized and scalar code
gsmiller commented on PR #12417: URL: https://github.com/apache/lucene/pull/12417#issuecomment-1686776595 @ChrisHegarty I was considering some experimentation with [vectorized prefix sum implementations](https://en.algorithmica.org/hpc/algorithms/prefix/), but saw your comment above stating: > What bothers me even more is that we cannot easily integrate the prefix sum calculation into the unpack - as we run into Panama bounds check issues that make the performance very poor. I also came across some [benchmarks](https://github.com/jpountz/vectorized-prefix-sum) it looks like you may have collaborated on with @jpountz related to some different prefix sum SIMD approaches. Can you elaborate any more on the performance issues related to these vectorized attempts? I assume the benchmark results were poor for you as well (I've tested on a few different machines with pretty horrid results, but I don't really understand why they're so bad).
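For context on what the vectorized variants compete against, here is the scalar in-place prefix sum next to a scalar emulation of the block-wise shift-and-add pattern described in the linked algorithmica article. This is plain Java without the Panama Vector API; `LANES` is a hypothetical register width, and the sketch says nothing about Lucene's actual `ForUtil` code or about why the Panama versions benchmarked poorly.

```java
import java.util.Arrays;

public class PrefixSumSketch {
  // Hypothetical SIMD width, e.g. 8 ints in a 256-bit register.
  static final int LANES = 8;

  // Scalar baseline: in-place inclusive prefix sum. Every step depends on
  // the previous one, which is what makes this loop hard to vectorize.
  static void scalarPrefixSum(int[] a) {
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i];
      a[i] = sum;
    }
  }

  // Block-wise scan: inside each block of LANES values, run log2(LANES)
  // shift-and-add steps (the in-register pattern a SIMD version would use),
  // then add the running total carried over from the previous block.
  // Assumes a.length is a multiple of LANES.
  static void blockedPrefixSum(int[] a) {
    int carry = 0;
    for (int start = 0; start < a.length; start += LANES) {
      for (int shift = 1; shift < LANES; shift <<= 1) {
        // Walk backwards so each lane still sees the pre-step value of
        // a[i - shift], emulating one simultaneous vector shift + add.
        for (int i = LANES - 1; i >= shift; i--) {
          a[start + i] += a[start + i - shift];
        }
      }
      for (int i = 0; i < LANES; i++) {
        a[start + i] += carry;
      }
      // The last lane now holds the prefix sum up to the end of this block.
      carry = a[start + LANES - 1];
    }
  }

  public static void main(String[] args) {
    int[] x = new int[16];
    for (int i = 0; i < x.length; i++) {
      x[i] = i + 1;
    }
    int[] y = x.clone();
    scalarPrefixSum(x);
    blockedPrefixSum(y);
    System.out.println(Arrays.equals(x, y)); // true
    System.out.println(Arrays.toString(x));
  }
}
```

Note that the blocked form performs about log2(LANES) additions per element instead of one, and still has a serial dependency between blocks through `carry`, so a vectorized prefix sum is not automatically faster than the scalar loop.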
[GitHub] [lucene] stefanvodita commented on pull request #12337: Index arbitrary fields in taxonomy docs
stefanvodita commented on PR #12337: URL: https://github.com/apache/lucene/pull/12337#issuecomment-1686832362 The commit I pushed makes `DirectoryTaxonomyReader.getInternalIndexReader` public. We also stop relying on the full path field. I'm not sure why I thought we needed it; we can use `getPath`/`getBulkPath` to get labels if we have the corresponding ordinal.
[GitHub] [lucene] javanna commented on pull request #12499: Simplify task executor for concurrent operations
javanna commented on PR #12499: URL: https://github.com/apache/lucene/pull/12499#issuecomment-1686902882 @sohami I will open a follow-up to offload single slices too.
[GitHub] [lucene] javanna merged pull request #12499: Simplify task executor for concurrent operations
javanna merged PR #12499: URL: https://github.com/apache/lucene/pull/12499
[GitHub] [lucene] javanna commented on pull request #12499: Simplify task executor for concurrent operations
javanna commented on PR #12499: URL: https://github.com/apache/lucene/pull/12499#issuecomment-1686951882 Thanks all for looking!
[GitHub] [lucene] almogtavor commented on issue #12406: Register nested queries (ToParentBlockJoinQuery) to Lucene Monitor
almogtavor commented on issue #12406: URL: https://github.com/apache/lucene/issues/12406#issuecomment-1687057323 @romseygeek @dweiss @uschindler @dsmiley @gsmiller @javanna @benwtrent I'd love to get feedback from you on the subject
[GitHub] [lucene] javanna opened a new pull request, #12515: Offload single slice to executor
javanna opened a new pull request, #12515: URL: https://github.com/apache/lucene/pull/12515 When an executor is set on the IndexSearcher, we should try to offload most of the computation to that executor. Ideally, the caller thread would only do light coordination work, while the executor is responsible for the heavier workload. If we don't offload sequential execution (the single-slice case) to the executor, it becomes very difficult to make any distinction between the types of workload performed on the two sides.
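A minimal sketch of the idea, assuming a plain `java.util.concurrent.ExecutorService` rather than Lucene's `IndexSearcher`/`TaskExecutor` internals: `OffloadSketch.runAll` is a hypothetical helper that submits every slice task to the executor, even when there is only one, so the caller thread only submits and waits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class OffloadSketch {
  // Hypothetical helper: every task, including a lone single-slice task, is
  // submitted to the executor. The caller thread only coordinates.
  static <T> List<T> runAll(ExecutorService executor, List<Callable<T>> tasks)
      throws InterruptedException, ExecutionException {
    List<Future<T>> futures = new ArrayList<>();
    for (Callable<T> task : tasks) {
      futures.add(executor.submit(task));
    }
    // Collect results in submission order; get() blocks until each task is done.
    List<T> results = new ArrayList<>();
    for (Future<T> future : futures) {
      results.add(future.get());
    }
    return results;
  }

  public static void main(String[] args) throws Exception {
    ExecutorService executor = Executors.newFixedThreadPool(2);
    try {
      String caller = Thread.currentThread().getName();
      // A single "slice" that records which thread actually ran it.
      List<Callable<String>> oneSlice = List.of(() -> Thread.currentThread().getName());
      String worker = runAll(executor, oneSlice).get(0);
      System.out.println("ran on executor: " + !worker.equals(caller));
    } finally {
      executor.shutdown();
    }
  }
}
```

The alternative the PR moves away from would call a lone task directly (`task.call()`) on the caller thread. That saves one hand-off, but it mixes heavy work into the coordinating thread, which is exactly what makes the two sides hard to tell apart.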
[GitHub] [lucene] javanna commented on pull request #12499: Simplify task executor for concurrent operations
javanna commented on PR #12499: URL: https://github.com/apache/lucene/pull/12499#issuecomment-1687122736 @sohami here it is: #12515
[GitHub] [lucene] javanna opened a new pull request, #12516: Unwrap execution exceptions cause and rethrow as is when possible
javanna opened a new pull request, #12516: URL: https://github.com/apache/lucene/pull/12516 When performing concurrent search, we may get an execution exception from one or more slices. In that case, we rethrow the cause of the execution exception, which we currently do by wrapping it in a new runtime exception. Instead, we can rethrow runtime exceptions as-is, and the same is true for IO exceptions. Any other exception is still wrapped in a new runtime exception. This unifies the exceptions thrown by the sequential code path (when no executor is provided) and the concurrent code path (when an executor is provided).
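The unwrapping rule described in the PR can be sketched with a plain `Future`. `UnwrapSketch.getUnwrapped` is a hypothetical helper, not Lucene's `TaskExecutor` code; in particular, Lucene throws its own `ThreadInterruptedException` on interruption, whereas this sketch restores the interrupt flag and throws a plain `RuntimeException` to stay dependency-free.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class UnwrapSketch {
  // Rethrows IOException and RuntimeException causes as-is, so callers see the
  // same exception types as on the sequential path; anything else is wrapped.
  static <T> T getUnwrapped(Future<T> future) throws IOException {
    try {
      return future.get();
    } catch (InterruptedException e) {
      // Assumption: restore the flag and wrap (Lucene uses ThreadInterruptedException).
      Thread.currentThread().interrupt();
      throw new RuntimeException(e);
    } catch (ExecutionException e) {
      Throwable cause = e.getCause();
      if (cause instanceof IOException) {
        throw (IOException) cause;
      }
      if (cause instanceof RuntimeException) {
        throw (RuntimeException) cause;
      }
      throw new RuntimeException(cause);
    }
  }

  public static void main(String[] args) throws Exception {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      Future<Void> failing = executor.submit((Callable<Void>) () -> {
        throw new IOException("disk error");
      });
      try {
        getUnwrapped(failing);
      } catch (IOException e) {
        // Same exception type the sequential code path would have thrown.
        System.out.println("caught IOException: " + e.getMessage());
      }
    } finally {
      executor.shutdown();
    }
  }
}
```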
[GitHub] [lucene] javanna commented on pull request #12516: Unwrap execution exceptions cause and rethrow as is when possible
javanna commented on PR #12516: URL: https://github.com/apache/lucene/pull/12516#issuecomment-1687152027 Another one that you may be interested in @reta @sohami
[GitHub] [lucene] iverase commented on pull request #12512: Remove unused variable in BKDWriter
iverase commented on PR #12512: URL: https://github.com/apache/lucene/pull/12512#issuecomment-1687195803 Sure, it is probably a leftover from another change. While we are here, should we also rename `scratch1` to `scratch`?
[GitHub] [lucene] reta commented on a diff in pull request #12516: Unwrap execution exceptions cause and rethrow as is when possible
reta commented on code in PR #12516: URL: https://github.com/apache/lucene/pull/12516#discussion_r1300778596
## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ##
@@ -57,6 +58,12 @@ final List invokeAll(Collection> tasks) {
     } catch (InterruptedException e) {
       throw new ThreadInterruptedException(e);
     } catch (ExecutionException e) {
+      if (e.getCause() instanceof IOException ioException) {
+        throw ioException;
Review Comment:
```suggestion
        throw e.getCause();
```
[GitHub] [lucene] reta commented on a diff in pull request #12516: Unwrap execution exceptions cause and rethrow as is when possible
reta commented on code in PR #12516: URL: https://github.com/apache/lucene/pull/12516#discussion_r1300779063
## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ##
@@ -57,6 +58,12 @@ final List invokeAll(Collection> tasks) {
     } catch (InterruptedException e) {
       throw new ThreadInterruptedException(e);
     } catch (ExecutionException e) {
+      if (e.getCause() instanceof IOException ioException) {
+        throw ioException;
+      }
+      if (e.getCause() instanceof RuntimeException runtimeException) {
+        throw runtimeException;
+      }
       throw new RuntimeException(e.getCause());
Review Comment:
You may want to check for `Error` as well, to be safe:
```
if (e.getCause() instanceof Error error) {
  throw error;
}
```
[GitHub] [lucene] iverase commented on issue #12514: Could we add more index for BKD LeafNode?
iverase commented on issue #12514: URL: https://github.com/apache/lucene/issues/12514#issuecomment-1687280508 I am not sure this is the right tradeoff. The BKD tree was developed to perform efficient range queries. If your use case is to run efficient `PointInSetQuery` lookups, you might be better off indexing your data in the inverted index, as performance should be better for this type of query. Another option might be to lower `maxPointsInLeafNode` from 512 to a smaller value. That might give you a similar effect without having to introduce an extra index data structure. The tradeoff here will be index size.
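A back-of-the-envelope sketch of the leaf-size tradeoff, assuming points are simply packed into ceil(N / maxPointsInLeafNode) leaves. This is a simplification, not Lucene's exact BKD layout, and the 1,000,000-point segment is a hypothetical figure. Smaller leaves mean fewer candidate points to scan in each matching leaf, but more leaves and therefore more per-leaf overhead in the index.

```java
public class LeafSizeEstimate {
  // Rough estimate only: number of leaves when points are packed into leaves
  // of at most maxPointsInLeafNode points each (ceiling division).
  static long leafCount(long numPoints, int maxPointsInLeafNode) {
    return (numPoints + maxPointsInLeafNode - 1) / maxPointsInLeafNode;
  }

  public static void main(String[] args) {
    long numPoints = 1_000_000L; // hypothetical segment size
    for (int leafSize : new int[] {512, 128, 32}) {
      System.out.println(leafSize + " points/leaf -> ~" + leafCount(numPoints, leafSize)
          + " leaves, up to " + leafSize + " points scanned per matching leaf");
    }
  }
}
```

With these numbers, going from 512 to 128 points per leaf roughly quadruples the leaf count (1,954 to 7,813) while cutting the worst-case scan inside a matching leaf by 4x.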
[GitHub] [lucene] easyice commented on pull request #12512: Remove unused variable in BKDWriter
easyice commented on PR #12512: URL: https://github.com/apache/lucene/pull/12512#issuecomment-1687334045 @iverase Good idea, this seems clearer. I've renamed `scratch1` to `scratch`.
[GitHub] [lucene] iverase commented on pull request #12512: Remove unused variable in BKDWriter
iverase commented on PR #12512: URL: https://github.com/apache/lucene/pull/12512#issuecomment-1687356896 LGTM, Thanks @easyice ! Could you please add a CHANGES entry under 9.8.0?
[GitHub] [lucene] easyice commented on pull request #12512: Remove unused variable in BKDWriter
easyice commented on PR #12512: URL: https://github.com/apache/lucene/pull/12512#issuecomment-1687412992 Thanks @iverase and @benwtrent, I've updated CHANGES.txt.