[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468417#comment-17468417 ] Adrien Grand commented on LUCENE-8739: -- I updated block sizes so that ZSTD uses the same block sizes as BEST_COMPRESSION and it looks much better now. ||Codec ||Indexing time (ms) ||Disk usage (MB) || Retrieval time per 10k docs (ms) || | BEST_SPEED (LZ4 with small blocks) | 35383 | 90.175 | 190.17524 | | BEST_COMPRESSION (vanilla zlib, DEFLATE level=6) | 76671 | 58.682 | 1910.42106 | | BEST_COMPRESSION (Cloudflare zlib, DEFLATE level=6) | 54791 | 58.601 | 1395.53593 | | ZSTD dict (level=1) | 24687 | 63.324 | 928.73997 | | ZSTD dict (level=2) | 24934 | 63.722 | 977.29911 | | ZSTD dict (level=3) | 28285 | 62.072 | 938.10886 | | ZSTD dict (level=4) | 37863 | 60.427 | 969.18655 | | ZSTD dict (level=5) | 45479 | 59.317 | 941.20922 | | ZSTD dict (level=6) | 57842 | 58.481 | 881.69049 | | ZSTD dict (level=7) | 65796 | 58.107 | 886.42249 | On this dataset, the main benefit seems to be the retrieval speed. Regarding indexing times and space efficiency, either you go with level 5 and you are faster to index data but less space-efficient than DEFLATE (with the Cloudflare zlib), or you go with level 6 and you are more space-efficient but slower to index. > ZSTD Compressor support in Lucene > - > > Key: LUCENE-8739 > URL: https://issues.apache.org/jira/browse/LUCENE-8739 > Project: Lucene - Core > Issue Type: New Feature > Components: core/codecs >Reporter: Sean Torres >Priority: Minor > Labels: features > Time Spent: 1.5h > Remaining Estimate: 0h > > ZStandard has a great speed and compression ratio tradeoff. > ZStandard is open source compression from Facebook. > More about ZSTD > [https://github.com/facebook/zstd] > [https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468482#comment-17468482 ] Tobias Ibounig commented on LUCENE-8739: Would it make sense to increase the block size until retrieval times approach those of zlib (between CF and vanilla)? Would such an increase even make sense or would this cause other issues? Then there also could be 3 presets BEST_SPEED --> stays LZ4 BALANCED --> low level ZSTD + dict (maybe even slightly smaller block size, for slightly faster retrial) BEST_COMPRESSION --> ZSTD with higher block size and higher level (maybe 5-9) Or would 3 presets be too much choice? Anyway I see potential for good tradeoffs here. > ZSTD Compressor support in Lucene > - > > Key: LUCENE-8739 > URL: https://issues.apache.org/jira/browse/LUCENE-8739 > Project: Lucene - Core > Issue Type: New Feature > Components: core/codecs >Reporter: Sean Torres >Priority: Minor > Labels: features > Time Spent: 1.5h > Remaining Estimate: 0h > > ZStandard has a great speed and compression ratio tradeoff. > ZStandard is open source compression from Facebook. > More about ZSTD > [https://github.com/facebook/zstd] > [https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #579: LUCENE-10283: Bump minimum required Java version to 17.
uschindler commented on pull request #579: URL: https://github.com/apache/lucene/pull/579#issuecomment-1004639920 > I've been wondering what to do with the security manager. If we're on Java 17, where it's officially deprecated, perhaps we should just get rid of it? I know it sounds terrible but it's inevitable in the long run anyway - if main is the future version then perhaps it's the time to commit to such a decision? Hi, I was thinking about the same thing. But we should not doing this on this Commit. So let's start with Suppression and later decide how to fix the test sandbox. I agree, we should remove the AccessController parts in productive code. But one step after the other (first fix tests then remove). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-10352) Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module system
Uwe Schindler created LUCENE-10352: -- Summary: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module system Key: LUCENE-10352 URL: https://issues.apache.org/jira/browse/LUCENE-10352 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Reporter: Uwe Schindler Assignee: Uwe Schindler Currently TestAllAnalyzersHaveFactories and TestRandomChains only work on the analysis-commons module, but e.g. we do not do a random chain with kuromoji and ICU. Also both tests rely on some hacky classpath-inspection and the tests fail if ran on a JAR file. This issue tracks progress I am currently doing to refactor this: - Move those 2 classes to a new gradle subproject :lucene:analysis:integration.tests and add a module-info referring to all other analysis packages - Rewrite the class discovery to use ModuleReader - Run TestAllAnalyzersHaveFactories per module (using one module reader), so it discovers all classes and ensures that factory and stream are in same module (there are some core vs. analysis.common discrepancies) - RunTestRandomChains on the whole module graph. The classes are discovered from all module readers in the graph (filtering on module name starting with "org.apache.lucene.analysis." - Also compare that the SPI factories returned by discovery match those we have in the module graphs While doing this I disovered some bad things: - TestRandomChains depends on test-only resources. We may need to replicate those (it is about 5 files that are fed into the ctors) - We have 5 different StrickMockResourceLoaders: Originally it was only in analysis common, now its everywhere. I will move this class to test-framework. This is unrelated but can be done here. The background of this was that analysis factories and resource loaders were not part of lucene core, so the resourceloader interface couldn't be in test-framework. You can watch progress here: [https://github.com/uschindler/lucene/compare/jira/LUCENE-10348...uschindler:draft/random-chains] (it currently is a branch against LUCENE-10348 because it was not yet merged, but it depends on that one) -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468498#comment-17468498 ] Adrien Grand commented on LUCENE-8739: -- bq. Would such an increase even make sense or would this cause other issues? It would require reading more data from disk. This read would be sequential so I suspect it wouldn't hurt much, including on slower I/O. The main drawback is probably that it would trash a bit more of filesystem cache. That said I agree with you that we should probably look into increasing the block size with ZStandard. I just did a run with 1.5x larger blocks and level=6, it slightly outperforms our current BEST_COMPRESSION mode across indexing time, disk usage and compression. ||Codec ||Indexing time (ms) ||Disk usage (MB) || Retrieval time per 10k docs (ms) || | ZSTD dict level=6 1.5x larger blocks | 43228 | 57.455 | 1269.22127 | bq. Or would 3 presets be too much choice? IMO it would be too much, but I like the fact that ZSTD could help us have two options for compression that share the exact same read logic, e.g. if we replaced BEST_SPEED with what you suggested for BALANCED: low level ZSTD compression with a small block size. bq. Anyway I see potential for good tradeoffs here. +1 ZSTD is quite great. I wouldn't use it in the Lucene default codec yet, because lucene-core shouldn't have dependencies and we don't want to use JNI in the lucere-core build. Maybe we can reconsider when Project Panama lands and it gets easier to interact with native libraries. > ZSTD Compressor support in Lucene > - > > Key: LUCENE-8739 > URL: https://issues.apache.org/jira/browse/LUCENE-8739 > Project: Lucene - Core > Issue Type: New Feature > Components: core/codecs >Reporter: Sean Torres >Priority: Minor > Labels: features > Time Spent: 1.5h > Remaining Estimate: 0h > > ZStandard has a great speed and compression ratio tradeoff. > ZStandard is open source compression from Facebook. > More about ZSTD > [https://github.com/facebook/zstd] > [https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on pull request #579: LUCENE-10283: Bump minimum required Java version to 17.
jpountz commented on pull request #579: URL: https://github.com/apache/lucene/pull/579#issuecomment-1004649045 Sorry @uschindler I should have made it clear that this PR was not passing precommit checks yet. Agreed on handling the SecurityManager changes in a separate PR. I only fixed usage of Long/Integer constructors because it looked easier than I had thought initially. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #581: LUCENE-10348: Make stopwords resources from analyzers modules visible to ClasspathResourceLoader and ModuleResourceLoader
uschindler commented on pull request #581: URL: https://github.com/apache/lucene/pull/581#issuecomment-1004649040 > Change looks good to me. I like that tests are improving. We may wish to someday migrate the `TestAllAnalyzerHaveFactories` and maybe even `TestRandomChains` to the modular layer. This way we know for sure all the things really work with the module system. But that's for another day. I started https://issues.apache.org/jira/browse/LUCENE-10352, you can watch progress here: https://github.com/uschindler/lucene/compare/jira/LUCENE-10348...uschindler:draft/random-chains -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10352) Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module system
[ https://issues.apache.org/jira/browse/LUCENE-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468504#comment-17468504 ] Uwe Schindler commented on LUCENE-10352: You can watch progress here: [https://github.com/uschindler/lucene/compare/jira/LUCENE-10348...uschindler:draft/random-chains] (it currently is a branch against LUCENE-10348 because it was not yet merged, but it depends on that one) > Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global > integration test and discover classes to check from module system > > > Key: LUCENE-10352 > URL: https://issues.apache.org/jira/browse/LUCENE-10352 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Uwe Schindler >Assignee: Uwe Schindler >Priority: Major > > Currently TestAllAnalyzersHaveFactories and TestRandomChains only work on the > analysis-commons module, but e.g. we do not do a random chain with kuromoji > and ICU. Also both tests rely on some hacky classpath-inspection and the > tests fail if ran on a JAR file. > This issue tracks progress I am currently doing to refactor this: > - Move those 2 classes to a new gradle subproject > :lucene:analysis:integration.tests and add a module-info referring to all > other analysis packages > - Rewrite the class discovery to use ModuleReader > - Run TestAllAnalyzersHaveFactories per module (using one module reader), so > it discovers all classes and ensures that factory and stream are in same > module (there are some core vs. analysis.common discrepancies) > - RunTestRandomChains on the whole module graph. The classes are discovered > from all module readers in the graph (filtering on module name starting with > "org.apache.lucene.analysis." > - Also compare that the SPI factories returned by discovery match those we > have in the module graphs > While doing this I disovered some bad things: > - TestRandomChains depends on test-only resources. We may need to replicate > those (it is about 5 files that are fed into the ctors) > - We have 5 different StrickMockResourceLoaders: Originally it was only in > analysis common, now its everywhere. I will move this class to > test-framework. This is unrelated but can be done here. The background of > this was that analysis factories and resource loaders were not part of lucene > core, so the resourceloader interface couldn't be in test-framework. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10352) Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module system
[ https://issues.apache.org/jira/browse/LUCENE-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468505#comment-17468505 ] Uwe Schindler commented on LUCENE-10352: CC [~rmuir] > Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global > integration test and discover classes to check from module system > > > Key: LUCENE-10352 > URL: https://issues.apache.org/jira/browse/LUCENE-10352 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Uwe Schindler >Assignee: Uwe Schindler >Priority: Major > > Currently TestAllAnalyzersHaveFactories and TestRandomChains only work on the > analysis-commons module, but e.g. we do not do a random chain with kuromoji > and ICU. Also both tests rely on some hacky classpath-inspection and the > tests fail if ran on a JAR file. > This issue tracks progress I am currently doing to refactor this: > - Move those 2 classes to a new gradle subproject > :lucene:analysis:integration.tests and add a module-info referring to all > other analysis packages > - Rewrite the class discovery to use ModuleReader > - Run TestAllAnalyzersHaveFactories per module (using one module reader), so > it discovers all classes and ensures that factory and stream are in same > module (there are some core vs. analysis.common discrepancies) > - RunTestRandomChains on the whole module graph. The classes are discovered > from all module readers in the graph (filtering on module name starting with > "org.apache.lucene.analysis." > - Also compare that the SPI factories returned by discovery match those we > have in the module graphs > While doing this I disovered some bad things: > - TestRandomChains depends on test-only resources. We may need to replicate > those (it is about 5 files that are fed into the ctors) > - We have 5 different StrickMockResourceLoaders: Originally it was only in > analysis common, now its everywhere. I will move this class to > test-framework. This is unrelated but can be done here. The background of > this was that analysis factories and resource loaders were not part of lucene > core, so the resourceloader interface couldn't be in test-framework. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-10352) Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module system
[ https://issues.apache.org/jira/browse/LUCENE-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-10352: --- Description: Currently TestAllAnalyzersHaveFactories and TestRandomChains only work on the analysis-commons module, but e.g. we do not do a random chain with kuromoji and ICU. Also both tests rely on some hacky classpath-inspection and the tests fail if ran on a JAR file. This issue tracks progress I am currently doing to refactor this: - Move those 2 classes to a new gradle subproject :lucene:analysis:integration.tests and add a module-info referring to all other analysis packages - Rewrite the class discovery to use ModuleReader - Run TestAllAnalyzersHaveFactories per module (using one module reader), so it discovers all classes and ensures that factory and stream are in same module (there are some core vs. analysis.common discrepancies) - RunTestRandomChains on the whole module graph. The classes are discovered from all module readers in the graph (filtering on module name starting with "org.apache.lucene.analysis." - Also compare that the SPI factories returned by discovery match those we have in the module graphs While doing this I disovered some bad things: - TestRandomChains depends on test-only resources. We may need to replicate those (it is about 5 files that are fed into the ctors) - We have 5 different StrickMockResourceLoaders: Originally it was only in analysis common, now its everywhere. I will move this class to test-framework. This is unrelated but can be done here. The background of this was that analysis factories and resource loaders were not part of lucene core, so the resourceloader interface couldn't be in test-framework. was: Currently TestAllAnalyzersHaveFactories and TestRandomChains only work on the analysis-commons module, but e.g. we do not do a random chain with kuromoji and ICU. Also both tests rely on some hacky classpath-inspection and the tests fail if ran on a JAR file. This issue tracks progress I am currently doing to refactor this: - Move those 2 classes to a new gradle subproject :lucene:analysis:integration.tests and add a module-info referring to all other analysis packages - Rewrite the class discovery to use ModuleReader - Run TestAllAnalyzersHaveFactories per module (using one module reader), so it discovers all classes and ensures that factory and stream are in same module (there are some core vs. analysis.common discrepancies) - RunTestRandomChains on the whole module graph. The classes are discovered from all module readers in the graph (filtering on module name starting with "org.apache.lucene.analysis." - Also compare that the SPI factories returned by discovery match those we have in the module graphs While doing this I disovered some bad things: - TestRandomChains depends on test-only resources. We may need to replicate those (it is about 5 files that are fed into the ctors) - We have 5 different StrickMockResourceLoaders: Originally it was only in analysis common, now its everywhere. I will move this class to test-framework. This is unrelated but can be done here. The background of this was that analysis factories and resource loaders were not part of lucene core, so the resourceloader interface couldn't be in test-framework. You can watch progress here: [https://github.com/uschindler/lucene/compare/jira/LUCENE-10348...uschindler:draft/random-chains] (it currently is a branch against LUCENE-10348 because it was not yet merged, but it depends on that one) > Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global > integration test and discover classes to check from module system > > > Key: LUCENE-10352 > URL: https://issues.apache.org/jira/browse/LUCENE-10352 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Uwe Schindler >Assignee: Uwe Schindler >Priority: Major > > Currently TestAllAnalyzersHaveFactories and TestRandomChains only work on the > analysis-commons module, but e.g. we do not do a random chain with kuromoji > and ICU. Also both tests rely on some hacky classpath-inspection and the > tests fail if ran on a JAR file. > This issue tracks progress I am currently doing to refactor this: > - Move those 2 classes to a new gradle subproject > :lucene:analysis:integration.tests and add a module-info referring to all > other analysis packages > - Rewrite the class discovery to use ModuleReader > - Run TestAllAnalyzersHaveFactories per module (using one module reader), so > it discovers all classes and ensures that factory and stream are in same > module (there are some core vs. analysis.common discrepancies) > -
[jira] [Updated] (LUCENE-10352) Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module system
[ https://issues.apache.org/jira/browse/LUCENE-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-10352: --- Description: Currently TestAllAnalyzersHaveFactories and TestRandomChains only work on the analysis-commons module, but e.g. we do not do a random chain with kuromoji and ICU. Also both tests rely on some hacky classpath-inspection and the tests fail if ran on a JAR file. This issue tracks progress I am currently doing to refactor this: - Move those 2 classes to a new gradle subproject :lucene:analysis:integration.tests and add a module-info referring to all other analysis packages - Rewrite the class discovery to use ModuleReader - Run TestAllAnalyzersHaveFactories per module (using one module reader), so it discovers all classes and ensures that factory and stream are in same module (there are some core vs. analysis.common discrepancies) - RunTestRandomChains on the whole module graph. The classes are discovered from all module readers in the graph (filtering on module name starting with "org.apache.lucene.analysis." - Also compare that the SPI factories returned by discovery match those we have in the module graphs While doing this I disovered some bad things: - TestRandomChains depends on test-only resources. We may need to replicate those (it is about 5 files that are fed into the ctors) - We have 5 different StringMockResourceLoaders: Originally it was only in analysis common, now its everywhere. I will move this class to test-framework. This is unrelated but can be done here. The background of this was that analysis factories and resource loaders were not part of lucene core, so the resourceloader interface couldn't be in test-framework. was: Currently TestAllAnalyzersHaveFactories and TestRandomChains only work on the analysis-commons module, but e.g. we do not do a random chain with kuromoji and ICU. Also both tests rely on some hacky classpath-inspection and the tests fail if ran on a JAR file. This issue tracks progress I am currently doing to refactor this: - Move those 2 classes to a new gradle subproject :lucene:analysis:integration.tests and add a module-info referring to all other analysis packages - Rewrite the class discovery to use ModuleReader - Run TestAllAnalyzersHaveFactories per module (using one module reader), so it discovers all classes and ensures that factory and stream are in same module (there are some core vs. analysis.common discrepancies) - RunTestRandomChains on the whole module graph. The classes are discovered from all module readers in the graph (filtering on module name starting with "org.apache.lucene.analysis." - Also compare that the SPI factories returned by discovery match those we have in the module graphs While doing this I disovered some bad things: - TestRandomChains depends on test-only resources. We may need to replicate those (it is about 5 files that are fed into the ctors) - We have 5 different StrickMockResourceLoaders: Originally it was only in analysis common, now its everywhere. I will move this class to test-framework. This is unrelated but can be done here. The background of this was that analysis factories and resource loaders were not part of lucene core, so the resourceloader interface couldn't be in test-framework. > Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global > integration test and discover classes to check from module system > > > Key: LUCENE-10352 > URL: https://issues.apache.org/jira/browse/LUCENE-10352 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Uwe Schindler >Assignee: Uwe Schindler >Priority: Major > > Currently TestAllAnalyzersHaveFactories and TestRandomChains only work on the > analysis-commons module, but e.g. we do not do a random chain with kuromoji > and ICU. Also both tests rely on some hacky classpath-inspection and the > tests fail if ran on a JAR file. > This issue tracks progress I am currently doing to refactor this: > - Move those 2 classes to a new gradle subproject > :lucene:analysis:integration.tests and add a module-info referring to all > other analysis packages > - Rewrite the class discovery to use ModuleReader > - Run TestAllAnalyzersHaveFactories per module (using one module reader), so > it discovers all classes and ensures that factory and stream are in same > module (there are some core vs. analysis.common discrepancies) > - RunTestRandomChains on the whole module graph. The classes are discovered > from all module readers in the graph (filtering on module name starting with > "org.apache.lucene.analysis." > - Also compare that the SPI factories returned
[GitHub] [lucene] gf2121 commented on pull request #578: LUCENE-10350: Avoid some null checking for FastTaxonomyFacetCounts#countAll()
gf2121 commented on pull request #578: URL: https://github.com/apache/lucene/pull/578#issuecomment-1004760012 Indeed, I am very suprised to see such a big QPS improvement just by removing some null check and method call. It is also interesting that BrowseMonthTaxoFacets can be much faster then BrowseMonthSSDVFacets now but I do think they should have similar QPS... I noticed that the stddev is also becoming much higher, so I wonder if there is a amazing optimization in JVM that is triggered from time to time after this change? It seems delicious to find this out and try to make it stable. To double check, I rerun a wikimediumall benchmark, getting a similar result (higher QPS & higher stddev) ``` TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value BrowseMonthSSDVFacets4.67 (9.8%)4.50 (6.0%) -3.6% ( -17% - 13%) 0.161 BrowseRandomLabelSSDVFacets3.09 (5.6%)3.03 (4.6%) -2.0% ( -11% -8%) 0.211 HighPhrase 15.15 (3.3%) 14.84 (3.8%) -2.0% ( -8% -5%) 0.074 LowPhrase 37.67 (3.2%) 36.95 (3.6%) -1.9% ( -8% -5%) 0.080 LowTerm 1671.58 (4.1%) 1644.34 (4.8%) -1.6% ( -10% -7%) 0.245 HighTerm 1210.15 (5.5%) 1191.05 (7.5%) -1.6% ( -13% - 12%) 0.448 BrowseDayOfYearSSDVFacets4.19 (6.9%)4.12 (5.5%) -1.5% ( -13% - 11%) 0.436 LowSloppyPhrase 11.70 (3.4%) 11.52 (1.7%) -1.5% ( -6% -3%) 0.069 MedPhrase 17.15 (2.5%) 16.90 (3.6%) -1.5% ( -7% -4%) 0.140 HighSloppyPhrase8.52 (5.1%)8.40 (3.2%) -1.4% ( -9% -7%) 0.287 AndHighMedDayTaxoFacets 45.58 (2.8%) 44.97 (2.2%) -1.3% ( -6% -3%) 0.093 OrNotHighLow 999.69 (3.1%) 987.25 (3.6%) -1.2% ( -7% -5%) 0.242 AndHighHighDayTaxoFacets 11.16 (3.2%) 11.02 (3.8%) -1.2% ( -8% -6%) 0.271 AndHighLow 662.82 (2.5%) 655.64 (3.0%) -1.1% ( -6% -4%) 0.212 Respell 41.99 (2.7%) 41.58 (3.2%) -1.0% ( -6% -5%) 0.290 OrHighNotMed 1074.76 (4.8%) 1065.13 (6.7%) -0.9% ( -11% - 11%) 0.626 MedSloppyPhrase6.08 (3.4%)6.02 (2.0%) -0.9% ( -6% -4%) 0.317 AndHighHigh 30.48 (4.4%) 30.26 (4.2%) -0.7% ( -8% -8%) 0.588 OrHighLow 451.34 (2.9%) 448.36 (4.2%) -0.7% ( -7% -6%) 0.562 OrNotHighHigh 807.68 (4.5%) 803.96 (5.2%) -0.5% ( -9% -9%) 0.764 LowIntervalsOrdered 21.59 (5.9%) 21.49 (5.6%) -0.5% ( -11% - 11%) 0.804 AndHighMed 82.97 (3.9%) 82.62 (4.1%) -0.4% ( -8% -7%) 0.737 PKLookup 196.30 (4.4%) 195.53 (4.0%) -0.4% ( -8% -8%) 0.769 OrHighNotHigh 815.15 (4.4%) 812.52 (5.8%) -0.3% ( -10% - 10%) 0.843 OrNotHighMed 1132.97 (3.6%) 1130.84 (6.7%) -0.2% ( -10% - 10%) 0.912 MedIntervalsOrdered 18.77 (9.9%) 18.74 (9.2%) -0.2% ( -17% - 21%) 0.958 OrHighHigh 15.86 (5.1%) 15.88 (3.5%)0.1% ( -8% -9%) 0.916 Fuzzy1 63.19 (2.6%) 63.33 (2.8%)0.2% ( -5% -5%) 0.799 HighIntervalsOrdered4.97 (10.2%)4.98 (9.5%)0.2% ( -17% - 22%) 0.938 OrHighMed 43.11 (4.8%) 43.25 (3.8%)0.3% ( -7% -9%) 0.819 IntNRQ 47.52 (27.6%) 47.67 (29.2%)0.3% ( -44% - 79%) 0.971 Fuzzy2 74.23 (2.7%) 74.48 (3.3%)0.3% ( -5% -6%) 0.722 LowSpanNear 85.95 (3.7%) 86.34 (4.5%)0.5% ( -7% -9%) 0.731 OrHighMedDayTaxoFacets1.91 (6.9%)1.93 (8.0%)1.1% ( -12% - 17%) 0.655 OrHighNotLow 1181.50 (3.9%) 1194.56 (6.5%)1.1% ( -8% - 11%) 0.511 MedTerm 1734.50 (4.6%) 1754.47 (5.9%)1.2% ( -8% - 12%) 0.4
[GitHub] [lucene] gf2121 removed a comment on pull request #578: LUCENE-10350: Avoid some null checking for FastTaxonomyFacetCounts#countAll()
gf2121 removed a comment on pull request #578: URL: https://github.com/apache/lucene/pull/578#issuecomment-1004760012 Indeed, I am very suprised to see such a big QPS improvement just by removing some null check and method call. It is also interesting that BrowseMonthTaxoFacets can be much faster then BrowseMonthSSDVFacets now but I do think they should have similar QPS... I noticed that the stddev is also becoming much higher, so I wonder if there is a amazing optimization in JVM that is triggered from time to time after this change? It seems delicious to find this out and try to make it stable. To double check, I rerun a wikimediumall benchmark, getting a similar result (higher QPS & higher stddev) ``` TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value BrowseMonthSSDVFacets4.67 (9.8%)4.50 (6.0%) -3.6% ( -17% - 13%) 0.161 BrowseRandomLabelSSDVFacets3.09 (5.6%)3.03 (4.6%) -2.0% ( -11% -8%) 0.211 HighPhrase 15.15 (3.3%) 14.84 (3.8%) -2.0% ( -8% -5%) 0.074 LowPhrase 37.67 (3.2%) 36.95 (3.6%) -1.9% ( -8% -5%) 0.080 LowTerm 1671.58 (4.1%) 1644.34 (4.8%) -1.6% ( -10% -7%) 0.245 HighTerm 1210.15 (5.5%) 1191.05 (7.5%) -1.6% ( -13% - 12%) 0.448 BrowseDayOfYearSSDVFacets4.19 (6.9%)4.12 (5.5%) -1.5% ( -13% - 11%) 0.436 LowSloppyPhrase 11.70 (3.4%) 11.52 (1.7%) -1.5% ( -6% -3%) 0.069 MedPhrase 17.15 (2.5%) 16.90 (3.6%) -1.5% ( -7% -4%) 0.140 HighSloppyPhrase8.52 (5.1%)8.40 (3.2%) -1.4% ( -9% -7%) 0.287 AndHighMedDayTaxoFacets 45.58 (2.8%) 44.97 (2.2%) -1.3% ( -6% -3%) 0.093 OrNotHighLow 999.69 (3.1%) 987.25 (3.6%) -1.2% ( -7% -5%) 0.242 AndHighHighDayTaxoFacets 11.16 (3.2%) 11.02 (3.8%) -1.2% ( -8% -6%) 0.271 AndHighLow 662.82 (2.5%) 655.64 (3.0%) -1.1% ( -6% -4%) 0.212 Respell 41.99 (2.7%) 41.58 (3.2%) -1.0% ( -6% -5%) 0.290 OrHighNotMed 1074.76 (4.8%) 1065.13 (6.7%) -0.9% ( -11% - 11%) 0.626 MedSloppyPhrase6.08 (3.4%)6.02 (2.0%) -0.9% ( -6% -4%) 0.317 AndHighHigh 30.48 (4.4%) 30.26 (4.2%) -0.7% ( -8% -8%) 0.588 OrHighLow 451.34 (2.9%) 448.36 (4.2%) -0.7% ( -7% -6%) 0.562 OrNotHighHigh 807.68 (4.5%) 803.96 (5.2%) -0.5% ( -9% -9%) 0.764 LowIntervalsOrdered 21.59 (5.9%) 21.49 (5.6%) -0.5% ( -11% - 11%) 0.804 AndHighMed 82.97 (3.9%) 82.62 (4.1%) -0.4% ( -8% -7%) 0.737 PKLookup 196.30 (4.4%) 195.53 (4.0%) -0.4% ( -8% -8%) 0.769 OrHighNotHigh 815.15 (4.4%) 812.52 (5.8%) -0.3% ( -10% - 10%) 0.843 OrNotHighMed 1132.97 (3.6%) 1130.84 (6.7%) -0.2% ( -10% - 10%) 0.912 MedIntervalsOrdered 18.77 (9.9%) 18.74 (9.2%) -0.2% ( -17% - 21%) 0.958 OrHighHigh 15.86 (5.1%) 15.88 (3.5%)0.1% ( -8% -9%) 0.916 Fuzzy1 63.19 (2.6%) 63.33 (2.8%)0.2% ( -5% -5%) 0.799 HighIntervalsOrdered4.97 (10.2%)4.98 (9.5%)0.2% ( -17% - 22%) 0.938 OrHighMed 43.11 (4.8%) 43.25 (3.8%)0.3% ( -7% -9%) 0.819 IntNRQ 47.52 (27.6%) 47.67 (29.2%)0.3% ( -44% - 79%) 0.971 Fuzzy2 74.23 (2.7%) 74.48 (3.3%)0.3% ( -5% -6%) 0.722 LowSpanNear 85.95 (3.7%) 86.34 (4.5%)0.5% ( -7% -9%) 0.731 OrHighMedDayTaxoFacets1.91 (6.9%)1.93 (8.0%)1.1% ( -12% - 17%) 0.655 OrHighNotLow 1181.50 (3.9%) 1194.56 (6.5%)1.1% ( -8% - 11%) 0.511 MedTerm 1734.50 (4.6%) 1754.47 (5.9%)1.2% ( -8% -
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468610#comment-17468610 ] Tobias Ibounig commented on LUCENE-8739: Ok this all sounds very good. Just one more thing for further trade off considerations: ZSTD also supports negative compression levels (but I don't know how those are exposed in JNI library), [see benchmark table|https://github.com/facebook/zstd]. So level=-1 could be another consideration to get closer to LZ4 Retrieval Speed for BEST_SPEED. > ZSTD Compressor support in Lucene > - > > Key: LUCENE-8739 > URL: https://issues.apache.org/jira/browse/LUCENE-8739 > Project: Lucene - Core > Issue Type: New Feature > Components: core/codecs >Reporter: Sean Torres >Priority: Minor > Labels: features > Time Spent: 1.5h > Remaining Estimate: 0h > > ZStandard has a great speed and compression ratio tradeoff. > ZStandard is open source compression from Facebook. > More about ZSTD > [https://github.com/facebook/zstd] > [https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler merged pull request #581: LUCENE-10348: Make stopwords resources from analyzers modules visible to ClasspathResourceLoader and ModuleResourceLoader
uschindler merged pull request #581: URL: https://github.com/apache/lucene/pull/581 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10348) Make stopwords resources from analyzers modules visible to ClasspathResourceLoader and ModuleResourceLoader
[ https://issues.apache.org/jira/browse/LUCENE-10348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468644#comment-17468644 ] ASF subversion and git services commented on LUCENE-10348: -- Commit 4bacf93c7ee925fe42bde52af801ed6f494afdd4 in lucene's branch refs/heads/main from Uwe Schindler [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=4bacf93 ] LUCENE-10348: Make stopwords resources from analyzers modules visible to ClasspathResourceLoader and ModuleResourceLoader (#581) > Make stopwords resources from analyzers modules visible to > ClasspathResourceLoader and ModuleResourceLoader > --- > > Key: LUCENE-10348 > URL: https://issues.apache.org/jira/browse/LUCENE-10348 > Project: Lucene - Core > Issue Type: New Feature > Components: core/other >Reporter: Uwe Schindler >Assignee: Uwe Schindler >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > This is a followup after LUCENE-10335: > In LUCENE-10335 we fixed all Analyzers static constructors to not rely on the > analysis modules to export stopwords files. Nevertheless, it is currently > impossible to use CustomAnalyzer and refer to stopwords files in the > analyzers module, because those are hidden, [~rmuir] showed some examples of > how this is uesed. This would also make it impossible for Solr to reuse the > resources files! > Basically we should make the stopwords files (and other resources) visible. > We have to options: > - open all packages with resources unconditionally > - open all packages with resources only to module "org.apache.lucene.core" > I prefer the second one. The problem with Java's module system is the > following: If you open a package, it is also open for (deep) reflection, > which we don't want. So my proposal is to only open those packages only to > lucene core, so the ModuleResourceLoader and ClasspathResourceLoader can see > those resources (and Solr would be able to delegate to these resource loaders > if needed). > I will adds a test in the distribution.tests package that verifies that > resources in all analysis packages are open. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10348) Make stopwords resources from analyzers modules visible to ClasspathResourceLoader and ModuleResourceLoader
[ https://issues.apache.org/jira/browse/LUCENE-10348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468647#comment-17468647 ] ASF subversion and git services commented on LUCENE-10348: -- Commit d17c54b5ff56ffadd8692e45629215eaf479d8e3 in lucene's branch refs/heads/branch_9x from Uwe Schindler [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=d17c54b ] LUCENE-10348: Make stopwords resources from analyzers modules visible to ClasspathResourceLoader and ModuleResourceLoader (#581) > Make stopwords resources from analyzers modules visible to > ClasspathResourceLoader and ModuleResourceLoader > --- > > Key: LUCENE-10348 > URL: https://issues.apache.org/jira/browse/LUCENE-10348 > Project: Lucene - Core > Issue Type: New Feature > Components: core/other >Reporter: Uwe Schindler >Assignee: Uwe Schindler >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > This is a followup after LUCENE-10335: > In LUCENE-10335 we fixed all Analyzers static constructors to not rely on the > analysis modules to export stopwords files. Nevertheless, it is currently > impossible to use CustomAnalyzer and refer to stopwords files in the > analyzers module, because those are hidden, [~rmuir] showed some examples of > how this is uesed. This would also make it impossible for Solr to reuse the > resources files! > Basically we should make the stopwords files (and other resources) visible. > We have to options: > - open all packages with resources unconditionally > - open all packages with resources only to module "org.apache.lucene.core" > I prefer the second one. The problem with Java's module system is the > following: If you open a package, it is also open for (deep) reflection, > which we don't want. So my proposal is to only open those packages only to > lucene core, so the ModuleResourceLoader and ClasspathResourceLoader can see > those resources (and Solr would be able to delegate to these resource loaders > if needed). > I will adds a test in the distribution.tests package that verifies that > resources in all analysis packages are open. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler opened a new pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from mo
uschindler opened a new pull request #582: URL: https://github.com/apache/lucene/pull/582 See: https://issues.apache.org/jira/browse/LUCENE-10352 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a change in pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to c
uschindler commented on a change in pull request #582: URL: https://github.com/apache/lucene/pull/582#discussion_r778117015 ## File path: gradle/java/modules.gradle ## @@ -214,7 +214,7 @@ allprojects { } // Configure (tasks.test, sourceSets.test) -tasks.matching { it.name == "test" }.all { Test task -> +tasks.matching { it.name ==~ /test(_[0-9]+)?/ }.all { Test task -> Review comment: Hi @dweiss, this was needed to make the `gradlew beast` working. I just patched this here. If you have a better way to fix this (maybe select tests based on type not name) just commit to main. It may also be that you have rewritten it anyways in your PR. This is just to inform you that something was missing. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a change in pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to c
uschindler commented on a change in pull request #582: URL: https://github.com/apache/lucene/pull/582#discussion_r778121113 ## File path: lucene/analysis/integration.tests/src/test/org/apache/lucene/analysis/tests/da_UTF8.xml ## @@ -0,0 +1,1208 @@ + Review comment: Tis file was copied from common's tests. Maybe we can use a more simplified one -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a change in pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to c
uschindler commented on a change in pull request #582: URL: https://github.com/apache/lucene/pull/582#discussion_r778121113 ## File path: lucene/analysis/integration.tests/src/test/org/apache/lucene/analysis/tests/da_UTF8.xml ## @@ -0,0 +1,1208 @@ + Review comment: This file was copied from common's tests. Maybe we can use a more simplified one. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a change in pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to c
uschindler commented on a change in pull request #582: URL: https://github.com/apache/lucene/pull/582#discussion_r778121625 ## File path: lucene/analysis/integration.tests/src/test/org/apache/lucene/analysis/tests/simple.aff ## @@ -0,0 +1,20 @@ +SET UTF-8 Review comment: This file was copied from common's tests. ## File path: lucene/analysis/integration.tests/src/test/org/apache/lucene/analysis/tests/simple.dic ## @@ -0,0 +1,11 @@ +9 Review comment: This file was copied from common's tests. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-10348) Make stopwords resources from analyzers modules visible to ClasspathResourceLoader and ModuleResourceLoader
[ https://issues.apache.org/jira/browse/LUCENE-10348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved LUCENE-10348. Fix Version/s: 9.1 10.0 (main) Resolution: Fixed > Make stopwords resources from analyzers modules visible to > ClasspathResourceLoader and ModuleResourceLoader > --- > > Key: LUCENE-10348 > URL: https://issues.apache.org/jira/browse/LUCENE-10348 > Project: Lucene - Core > Issue Type: New Feature > Components: core/other >Reporter: Uwe Schindler >Assignee: Uwe Schindler >Priority: Major > Fix For: 9.1, 10.0 (main) > > Time Spent: 50m > Remaining Estimate: 0h > > This is a followup after LUCENE-10335: > In LUCENE-10335 we fixed all Analyzers static constructors to not rely on the > analysis modules to export stopwords files. Nevertheless, it is currently > impossible to use CustomAnalyzer and refer to stopwords files in the > analyzers module, because those are hidden, [~rmuir] showed some examples of > how this is uesed. This would also make it impossible for Solr to reuse the > resources files! > Basically we should make the stopwords files (and other resources) visible. > We have to options: > - open all packages with resources unconditionally > - open all packages with resources only to module "org.apache.lucene.core" > I prefer the second one. The problem with Java's module system is the > following: If you open a package, it is also open for (deep) reflection, > which we don't want. So my proposal is to only open those packages only to > lucene core, so the ModuleResourceLoader and ClasspathResourceLoader can see > those resources (and Solr would be able to delegate to these resource loaders > if needed). > I will adds a test in the distribution.tests package that verifies that > resources in all analysis packages are open. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10352) Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module system
[ https://issues.apache.org/jira/browse/LUCENE-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468664#comment-17468664 ] Uwe Schindler commented on LUCENE-10352: Here is now a first PR: https://github.com/apache/lucene/pull/582 - It compiles and I only had to copy 3 test resource files. We may simplify them, so the modular integration test can use a limited set of resources. - The StringMockResourceLoader was deduplicated and moved to test-framework - A unused class inside TestRandomChains used only by TestBugInSomething was moved - The available snowball stemmers are also discovered from module system with same code. This spares to copy the stemmers.txt file (which is autogenerated and I did not want to change this) What's broken: - A few components have no factory. The problem was that tests were only running in common, not all modules - We need more argument providers for TestRandomChains, e.g. for ICU or Morphologic Ctors. To do a quick test, you can restrict the module discovery to only read org.apache.lucene.core and org.apache.lucene.analysis.common. I marked a place to comment in/out code > Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global > integration test and discover classes to check from module system > > > Key: LUCENE-10352 > URL: https://issues.apache.org/jira/browse/LUCENE-10352 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Uwe Schindler >Assignee: Uwe Schindler >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Currently TestAllAnalyzersHaveFactories and TestRandomChains only work on the > analysis-commons module, but e.g. we do not do a random chain with kuromoji > and ICU. Also both tests rely on some hacky classpath-inspection and the > tests fail if ran on a JAR file. > This issue tracks progress I am currently doing to refactor this: > - Move those 2 classes to a new gradle subproject > :lucene:analysis:integration.tests and add a module-info referring to all > other analysis packages > - Rewrite the class discovery to use ModuleReader > - Run TestAllAnalyzersHaveFactories per module (using one module reader), so > it discovers all classes and ensures that factory and stream are in same > module (there are some core vs. analysis.common discrepancies) > - RunTestRandomChains on the whole module graph. The classes are discovered > from all module readers in the graph (filtering on module name starting with > "org.apache.lucene.analysis." > - Also compare that the SPI factories returned by discovery match those we > have in the module graphs > While doing this I disovered some bad things: > - TestRandomChains depends on test-only resources. We may need to replicate > those (it is about 5 files that are fed into the ctors) > - We have 5 different StringMockResourceLoaders: Originally it was only in > analysis common, now its everywhere. I will move this class to > test-framework. This is unrelated but can be done here. The background of > this was that analysis factories and resource loaders were not part of lucene > core, so the resourceloader interface couldn't be in test-framework. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a change in pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to c
uschindler commented on a change in pull request #582: URL: https://github.com/apache/lucene/pull/582#discussion_r778140193 ## File path: lucene/analysis/integration.tests/src/test/org/apache/lucene/analysis/tests/ModuleClassDiscovery.java ## @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.analysis.tests; + +import java.io.IOException; +import java.lang.module.ResolvedModule; +import java.util.ArrayList; +import java.util.Collections; +import java.util.List; +import java.util.Map; +import java.util.Objects; +import java.util.SortedMap; +import java.util.TreeMap; +import org.apache.lucene.tests.util.LuceneTestCase; +import org.junit.Assert; + +/** Discovers all classes from the module graph and loads them (without initialization) */ +abstract class ModuleClassDiscovery { + + private static final Module THIS_MODULE = ModuleClassDiscovery.class.getModule(); + private static final ModuleLayer LAYER = THIS_MODULE.getLayer(); + private static final SortedMap ALL_ANALYSIS_MODULES; + + static { +Assert.assertTrue( +"Analysis integration tests must run in Java Module System as named module", +THIS_MODULE.isNamed()); +Assert.assertNotNull("Module layer is missing", LAYER); + +var mods = new TreeMap(); +discoverAnalysisModules(LAYER, mods); +ALL_ANALYSIS_MODULES = Collections.unmodifiableSortedMap(mods); +if (LuceneTestCase.VERBOSE) { + System.out.println( + "Discovered the following analysis modules: " + ALL_ANALYSIS_MODULES.keySet()); +} + } + + private static void discoverAnalysisModules( + ModuleLayer layer, Map result) { +for (var mod : layer.configuration().modules()) { + String name = mod.name(); + if (Objects.equals(name, THIS_MODULE.getName())) { +continue; + } + if (name.equals("org.apache.lucene.core") || name.equals("org.apache.lucene.analysis.common")) { + //nocommit (enable all modules): if (name.equals("org.apache.lucene.core") || name.startsWith("org.apache.lucene.analysis.")) { Review comment: enable all modules here! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-10328) Module path for compiling and running tests is wrong
[ https://issues.apache.org/jira/browse/LUCENE-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-10328: - Attachment: image-2022-01-04-16-04-56-563.png > Module path for compiling and running tests is wrong > > > Key: LUCENE-10328 > URL: https://issues.apache.org/jira/browse/LUCENE-10328 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Priority: Major > Attachments: image-2021-12-19-12-29-21-737.png, > image-2022-01-04-16-04-56-563.png > > Time Spent: 4.5h > Remaining Estimate: 0h > > Uwe noticed that the module path for compiling and running tests is empty - > indeed, the modular configurations we create for the test sourceset do not > inherit from their main counterparts. This is not a standard thing created > for a sourceset - the test-main connection link is created by gradle's java > plugin. We need to do a similar thing for modular configurations. > !image-2021-12-19-12-29-21-737.png|width=490,height=280! > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10328) Module path for compiling and running tests is wrong
[ https://issues.apache.org/jira/browse/LUCENE-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468676#comment-17468676 ] Dawid Weiss commented on LUCENE-10328: -- I've been working on this for too long and I am nowhere nearer a perfect solution so I think we should just merge what's currently done on PR 571. Here is what is currently implemented: !image-2022-01-04-16-04-56-563.png|width=589,height=532! The diagram above shows two of gradle's main source sets: the main source set (a project's "jar") and the test source set (test classes). In theory (and often in practice) there can be more source sets but let's limit ourselves to just this. Each source set can be class-based or modular (containing a top-level module-info.java). On an intersection of the matching option you can see how this particular arrangement of the main and test source set is currently resolved for compilation and runtime tasks. In short: we do have full support for either full classpath mode testing (right lower corner) or full modular tests (upper left corner). A mix of the two requires module patching magic that I simply don't believe the infrastructure (or people) are ready for. I couldn't get IDEs to work with those options, I couldn't get other tasks and dependencies like ECJ to parse them gracefully. I am exhausted looking through various options and, in all honesty, I don't think this bit is so much needed. Let me explain. The current way tests work is fine in my opinion. They work in classpath mode (so they can access package-scope, etc.) with the main source set also working in "classpath mode". This is what majority of downstream Lucene consumers will be using for a long time in the future (my guess) so leaving tests there seems like a reasonable option. For any tests requiring a full modular setup, we also have a working infrastructure - you have to create a separate subproject with a suffix ".tests", then create a fully modular test source set referencing the module you're interested in testing. Examples are provided in the form of "lucene/core.tests" and "lucene/analysis/morfologik.tests" - these are fully fledged modular tests, running on of the test-framework (module, yay!). I realize this is a bit inconvenient but if you want the upper right corner of the diagram to work I have to warn you I've tried all the options below and none of them worked well (again: IDE support, various other build tasks): * main source set patching, * test source set patching, * combining main+test into a separate artifact, * plus perhaps a gazillion other try-reverse attempts I made. > Module path for compiling and running tests is wrong > > > Key: LUCENE-10328 > URL: https://issues.apache.org/jira/browse/LUCENE-10328 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Priority: Major > Attachments: image-2021-12-19-12-29-21-737.png, > image-2022-01-04-16-04-56-563.png > > Time Spent: 4.5h > Remaining Estimate: 0h > > Uwe noticed that the module path for compiling and running tests is empty - > indeed, the modular configurations we create for the test sourceset do not > inherit from their main counterparts. This is not a standard thing created > for a sourceset - the test-main connection link is created by gradle's java > plugin. We need to do a similar thing for modular configurations. > !image-2021-12-19-12-29-21-737.png|width=490,height=280! > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module
rmuir commented on pull request #582: URL: https://github.com/apache/lucene/pull/582#issuecomment-1004897909 This is gonna be fun :) Thanks for starting this! It is almost guaranteed to find some bugs. Previously only `analyzers-common` got this level of testing. I will help wrestle these tests some this evening. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss commented on pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module
dweiss commented on pull request #582: URL: https://github.com/apache/lucene/pull/582#issuecomment-1004900949 @uschindler - can you take a look at #571? Or we can merge that branch here too. I've converted non-distribution .tests projects there to utilize the test framework and generally improved the handling of modular paths (even though the module patching defeated me). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mayya-sharipova commented on a change in pull request #580: LUCENE-10351 Correct knn search failure with deleted docs
mayya-sharipova commented on a change in pull request #580: URL: https://github.com/apache/lucene/pull/580#discussion_r778170401 ## File path: lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseKnnVectorsFormatTestCase.java ## @@ -559,6 +561,12 @@ public void testDeleteAllVectorDocs() throws Exception { VectorValues values = getOnlyLeafReader(r).getVectorValues("v"); assertNotNull(values); assertEquals(0, values.size()); + +// assert that knn search doesn't fail on a field with all deleted docs +IndexSearcher searcher = newSearcher(r); Review comment: @jtibshirani Thanks for the review. I've adjusted the test in b4dfe087bf4a2f8686c1d90f1f658389b94f3f1f -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss commented on a change in pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check
dweiss commented on a change in pull request #582: URL: https://github.com/apache/lucene/pull/582#discussion_r778171354 ## File path: lucene/analysis/integration.tests/src/test/module-info.java ## @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Test module for global integration tests of all {@code org.apache.lucene.analysis} + * packages/modules. + */ +@SuppressWarnings({"requires-automatic"}) +module org.apache.lucene.analysis.tests { + requires java.xml; + requires org.apache.lucene.core; + requires org.apache.lucene.analysis.common; + requires org.apache.lucene.analysis.icu; + requires org.apache.lucene.analysis.kuromoji; + requires org.apache.lucene.analysis.morfologik; + requires org.apache.lucene.analysis.nori; + requires org.apache.lucene.analysis.opennlp; + requires org.apache.lucene.analysis.phonetic; + requires org.apache.lucene.analysis.smartcn; + requires org.apache.lucene.analysis.stempel; + requires org.apache.lucene.test_framework; + requires junit; Review comment: Nice! I made the test_framework expose this as a transitive module (and randomized testing). It's the API layer of that module and makes other descriptors simpler. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a change in pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to c
uschindler commented on a change in pull request #582: URL: https://github.com/apache/lucene/pull/582#discussion_r778140193 ## File path: lucene/analysis/integration.tests/src/test/org/apache/lucene/analysis/tests/ModuleClassDiscovery.java ## @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.analysis.tests; + +import java.io.IOException; +import java.lang.module.ResolvedModule; +import java.util.ArrayList; +import java.util.Collections; +import java.util.List; +import java.util.Map; +import java.util.Objects; +import java.util.SortedMap; +import java.util.TreeMap; +import org.apache.lucene.tests.util.LuceneTestCase; +import org.junit.Assert; + +/** Discovers all classes from the module graph and loads them (without initialization) */ +abstract class ModuleClassDiscovery { + + private static final Module THIS_MODULE = ModuleClassDiscovery.class.getModule(); + private static final ModuleLayer LAYER = THIS_MODULE.getLayer(); + private static final SortedMap ALL_ANALYSIS_MODULES; + + static { +Assert.assertTrue( +"Analysis integration tests must run in Java Module System as named module", +THIS_MODULE.isNamed()); +Assert.assertNotNull("Module layer is missing", LAYER); + +var mods = new TreeMap(); +discoverAnalysisModules(LAYER, mods); +ALL_ANALYSIS_MODULES = Collections.unmodifiableSortedMap(mods); +if (LuceneTestCase.VERBOSE) { + System.out.println( + "Discovered the following analysis modules: " + ALL_ANALYSIS_MODULES.keySet()); +} + } + + private static void discoverAnalysisModules( + ModuleLayer layer, Map result) { +for (var mod : layer.configuration().modules()) { + String name = mod.name(); + if (Objects.equals(name, THIS_MODULE.getName())) { +continue; + } + if (name.equals("org.apache.lucene.core") || name.equals("org.apache.lucene.analysis.common")) { + //nocommit (enable all modules): if (name.equals("org.apache.lucene.core") || name.startsWith("org.apache.lucene.analysis.")) { Review comment: enable all modules here! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a change in pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to c
uschindler commented on a change in pull request #582: URL: https://github.com/apache/lucene/pull/582#discussion_r778172990 ## File path: lucene/analysis/integration.tests/src/test/module-info.java ## @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Test module for global integration tests of all {@code org.apache.lucene.analysis} + * packages/modules. + */ +@SuppressWarnings({"requires-automatic"}) +module org.apache.lucene.analysis.tests { + requires java.xml; + requires org.apache.lucene.core; + requires org.apache.lucene.analysis.common; + requires org.apache.lucene.analysis.icu; + requires org.apache.lucene.analysis.kuromoji; + requires org.apache.lucene.analysis.morfologik; + requires org.apache.lucene.analysis.nori; + requires org.apache.lucene.analysis.opennlp; + requires org.apache.lucene.analysis.phonetic; + requires org.apache.lucene.analysis.smartcn; + requires org.apache.lucene.analysis.stempel; + requires org.apache.lucene.test_framework; + requires junit; Review comment: Ok I wanted to cleanup the requires clauses at end. We may need to add more transitive stuff, especially when we expose APIs or required modules, otherwise we will get "exports" warnings. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from mo
uschindler commented on pull request #582: URL: https://github.com/apache/lucene/pull/582#issuecomment-1004909845 > @uschindler - can you take a look at #571? Or we can merge that branch here too. I've converted non-distribution .tests projects there to utilize the test framework and generally improved the handling of modular paths (even though the module patching defeated me). I will check later, I am a bit busy with argument providers now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss commented on a change in pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check
dweiss commented on a change in pull request #582: URL: https://github.com/apache/lucene/pull/582#discussion_r778173223 ## File path: lucene/analysis/integration.tests/src/test/org/apache/lucene/analysis/tests/ModuleClassDiscovery.java ## @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.analysis.tests; + +import java.io.IOException; +import java.lang.module.ResolvedModule; +import java.util.ArrayList; +import java.util.Collections; +import java.util.List; +import java.util.Map; +import java.util.Objects; +import java.util.SortedMap; +import java.util.TreeMap; +import org.apache.lucene.tests.util.LuceneTestCase; +import org.junit.Assert; + +/** Discovers all classes from the module graph and loads them (without initialization) */ +abstract class ModuleClassDiscovery { + + private static final Module THIS_MODULE = ModuleClassDiscovery.class.getModule(); + private static final ModuleLayer LAYER = THIS_MODULE.getLayer(); + private static final SortedMap ALL_ANALYSIS_MODULES; + + static { Review comment: it'd be great if this wasn't a static final but a lazily-computed static - static finals are, from test framework point of view, sort of evil because they cannot be ordered reasonably with respect to test execution lifecycle. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss commented on a change in pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check
dweiss commented on a change in pull request #582: URL: https://github.com/apache/lucene/pull/582#discussion_r778176703 ## File path: lucene/analysis/integration.tests/src/test/module-info.java ## @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Test module for global integration tests of all {@code org.apache.lucene.analysis} + * packages/modules. + */ +@SuppressWarnings({"requires-automatic"}) +module org.apache.lucene.analysis.tests { + requires java.xml; + requires org.apache.lucene.core; + requires org.apache.lucene.analysis.common; + requires org.apache.lucene.analysis.icu; + requires org.apache.lucene.analysis.kuromoji; + requires org.apache.lucene.analysis.morfologik; + requires org.apache.lucene.analysis.nori; + requires org.apache.lucene.analysis.opennlp; + requires org.apache.lucene.analysis.phonetic; + requires org.apache.lucene.analysis.smartcn; + requires org.apache.lucene.analysis.stempel; + requires org.apache.lucene.test_framework; + requires junit; Review comment: I needed that particular one so I piggybacked it on the PR. There is LUCENE-10323 to tackle the entire set of module descriptors. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a change in pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to c
uschindler commented on a change in pull request #582: URL: https://github.com/apache/lucene/pull/582#discussion_r778176445 ## File path: lucene/analysis/integration.tests/src/test/module-info.java ## @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Test module for global integration tests of all {@code org.apache.lucene.analysis} + * packages/modules. + */ +@SuppressWarnings({"requires-automatic"}) +module org.apache.lucene.analysis.tests { + requires java.xml; + requires org.apache.lucene.core; + requires org.apache.lucene.analysis.common; + requires org.apache.lucene.analysis.icu; + requires org.apache.lucene.analysis.kuromoji; + requires org.apache.lucene.analysis.morfologik; + requires org.apache.lucene.analysis.nori; + requires org.apache.lucene.analysis.opennlp; + requires org.apache.lucene.analysis.phonetic; + requires org.apache.lucene.analysis.smartcn; + requires org.apache.lucene.analysis.stempel; + requires org.apache.lucene.test_framework; + requires junit; Review comment: I will wait for the other branch to be merged and remove the requires here. If I see other places that needs transitive, i will fix. E.g. ICU may need to be transitive. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a change in pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to c
uschindler commented on a change in pull request #582: URL: https://github.com/apache/lucene/pull/582#discussion_r778176916 ## File path: lucene/analysis/integration.tests/src/test/module-info.java ## @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Test module for global integration tests of all {@code org.apache.lucene.analysis} + * packages/modules. + */ +@SuppressWarnings({"requires-automatic"}) +module org.apache.lucene.analysis.tests { + requires java.xml; + requires org.apache.lucene.core; + requires org.apache.lucene.analysis.common; + requires org.apache.lucene.analysis.icu; + requires org.apache.lucene.analysis.kuromoji; + requires org.apache.lucene.analysis.morfologik; + requires org.apache.lucene.analysis.nori; + requires org.apache.lucene.analysis.opennlp; + requires org.apache.lucene.analysis.phonetic; + requires org.apache.lucene.analysis.smartcn; + requires org.apache.lucene.analysis.stempel; + requires org.apache.lucene.test_framework; + requires junit; Review comment: At least this "integration test" shows very well what is missing in our modules. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] (LUCENE-10344) ECJ batch compiler does not support --patch-module at all
[ https://issues.apache.org/jira/browse/LUCENE-10344 ] Dawid Weiss deleted comment on LUCENE-10344: -- was (Author: JIRAUSER282805): My account is hacked. I'm not sure what is going on. > ECJ batch compiler does not support --patch-module at all > - > > Key: LUCENE-10344 > URL: https://issues.apache.org/jira/browse/LUCENE-10344 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Priority: Minor > > There is no way to use module patching with ecj. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10344) ECJ batch compiler does not support --patch-module at all
[ https://issues.apache.org/jira/browse/LUCENE-10344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468698#comment-17468698 ] Dawid Weiss commented on LUCENE-10344: -- Sorry, I just noticed your comment. There is no easy way to untie all the relationships between tasks, gradle configurations and sourcesets - so if you modify one, it typically entails all others. I gave up on module patching - couldn't get it to work on many other fronts as well. > ECJ batch compiler does not support --patch-module at all > - > > Key: LUCENE-10344 > URL: https://issues.apache.org/jira/browse/LUCENE-10344 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Priority: Minor > > There is no way to use module patching with ecj. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-10344) ECJ batch compiler does not support --patch-module at all
[ https://issues.apache.org/jira/browse/LUCENE-10344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss resolved LUCENE-10344. -- Resolution: Won't Fix > ECJ batch compiler does not support --patch-module at all > - > > Key: LUCENE-10344 > URL: https://issues.apache.org/jira/browse/LUCENE-10344 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Priority: Minor > > There is no way to use module patching with ecj. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mayya-sharipova commented on a change in pull request #580: LUCENE-10351 Correct knn search failure with deleted docs
mayya-sharipova commented on a change in pull request #580: URL: https://github.com/apache/lucene/pull/580#discussion_r778185295 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorsReader.java ## @@ -239,6 +239,9 @@ public TopDocs search(String field, float[] target, int k, Bits acceptDocs) thro if (fieldEntry == null || fieldEntry.dimension == 0) { return null; } +if (fieldEntry.size() == 0) { Review comment: @jpountz Thanks for the review. > the above code that checks if (fieldEntry == null || fieldEntry.dimension == 0) is dead code I've removed this dead code in f380e54fea020dd4502d36043d33d09996f72277 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-10353) Add null injection to analyzer integration tests (e.g. TestRandomChains)
Robert Muir created LUCENE-10353: Summary: Add null injection to analyzer integration tests (e.g. TestRandomChains) Key: LUCENE-10353 URL: https://issues.apache.org/jira/browse/LUCENE-10353 Project: Lucene - Core Issue Type: Task Reporter: Robert Muir These tests inject random parameter values (from argumentProviders). Some generated values may be illegal and IllegalArgumentException is "allowed" if the constructor returns it. None of the values should cause failures at runtime. But for object types, we never inject null values (unless the argumentProvider were to do it itself). We should do this some low % of the time, and "allow" ctors to return NPE too. I see bugs in some of the analyzers where they are just a missing null check in the constructor. It is important to fail on invalid configuration up-front in the ctor, rather than failing e.g. at index time. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-10354) Clarify contract of codec APIs with missing/disabled fields
Adrien Grand created LUCENE-10354: - Summary: Clarify contract of codec APIs with missing/disabled fields Key: LUCENE-10354 URL: https://issues.apache.org/jira/browse/LUCENE-10354 Project: Lucene - Core Issue Type: Task Reporter: Adrien Grand The question has come up a few times of how codec APIs should react to fields that are missing or do not have the relevant feature enabled. This issue proposes that we improve javadocs and AssertingCodec following the same model as doc values and norms: - The behavior of codec APIs on fields that are missing or don't have the feature enabled is undefined. - CodecReader is responsible for checking FieldInfos before delegating to codec APIs. - AssertingCodec ensures that we never call codec APIs on missing/disabled fields. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz opened a new pull request #583: LUCENE-10354: Clarify contract of codec APIs with missing/disabled fields.
jpountz opened a new pull request #583: URL: https://github.com/apache/lucene/pull/583 See https://issues.apache.org/jira/browse/LUCENE-10354 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on a change in pull request #580: LUCENE-10351 Correct knn search failure with deleted docs
jpountz commented on a change in pull request #580: URL: https://github.com/apache/lucene/pull/580#discussion_r778216878 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorsReader.java ## @@ -239,6 +239,9 @@ public TopDocs search(String field, float[] target, int k, Bits acceptDocs) thro if (fieldEntry == null || fieldEntry.dimension == 0) { return null; } +if (fieldEntry.size() == 0) { Review comment: Thanks Mayya. I opened an issue to clarify these contracts. https://issues.apache.org/jira/browse/LUCENE-10354 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-10355) Remove EMPTY LongValues in favor of LongValues#ZERO
Feng Guo created LUCENE-10355: - Summary: Remove EMPTY LongValues in favor of LongValues#ZERO Key: LUCENE-10355 URL: https://issues.apache.org/jira/browse/LUCENE-10355 Project: Lucene - Core Issue Type: Improvement Reporter: Feng Guo -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-10355) Remove EMPTY LongValues in favor of LongValues#ZERO
[ https://issues.apache.org/jira/browse/LUCENE-10355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo updated LUCENE-10355: -- Component/s: core/codecs > Remove EMPTY LongValues in favor of LongValues#ZERO > --- > > Key: LUCENE-10355 > URL: https://issues.apache.org/jira/browse/LUCENE-10355 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Trivial > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-10355) Remove EMPTY LongValues in favor of LongValues#ZERO
[ https://issues.apache.org/jira/browse/LUCENE-10355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo updated LUCENE-10355: -- Description: Remove EMPTY LongValues in favor of LongValues#ZEROS > Remove EMPTY LongValues in favor of LongValues#ZERO > --- > > Key: LUCENE-10355 > URL: https://issues.apache.org/jira/browse/LUCENE-10355 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Trivial > > Remove EMPTY LongValues in favor of LongValues#ZEROS -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gf2121 closed pull request #584: LUCENE-10355: Remove EMPTY LongValues in favor of LongValues#ZEROS
gf2121 closed pull request #584: URL: https://github.com/apache/lucene/pull/584 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10352) Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module system
[ https://issues.apache.org/jira/browse/LUCENE-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468745#comment-17468745 ] Uwe Schindler commented on LUCENE-10352: I found a missing TokenFilter in META_INF/services and also the module: {{org.apache.lucene.analysis.ko.KoreanNumberFilterFactory}} > Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global > integration test and discover classes to check from module system > > > Key: LUCENE-10352 > URL: https://issues.apache.org/jira/browse/LUCENE-10352 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Uwe Schindler >Assignee: Uwe Schindler >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > Currently TestAllAnalyzersHaveFactories and TestRandomChains only work on the > analysis-commons module, but e.g. we do not do a random chain with kuromoji > and ICU. Also both tests rely on some hacky classpath-inspection and the > tests fail if ran on a JAR file. > This issue tracks progress I am currently doing to refactor this: > - Move those 2 classes to a new gradle subproject > :lucene:analysis:integration.tests and add a module-info referring to all > other analysis packages > - Rewrite the class discovery to use ModuleReader > - Run TestAllAnalyzersHaveFactories per module (using one module reader), so > it discovers all classes and ensures that factory and stream are in same > module (there are some core vs. analysis.common discrepancies) > - RunTestRandomChains on the whole module graph. The classes are discovered > from all module readers in the graph (filtering on module name starting with > "org.apache.lucene.analysis." > - Also compare that the SPI factories returned by discovery match those we > have in the module graphs > While doing this I disovered some bad things: > - TestRandomChains depends on test-only resources. We may need to replicate > those (it is about 5 files that are fed into the ctors) > - We have 5 different StringMockResourceLoaders: Originally it was only in > analysis common, now its everywhere. I will move this class to > test-framework. This is unrelated but can be done here. The background of > this was that analysis factories and resource loaders were not part of lucene > core, so the resourceloader interface couldn't be in test-framework. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10352) Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module system
[ https://issues.apache.org/jira/browse/LUCENE-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468760#comment-17468760 ] Uwe Schindler commented on LUCENE-10352: I also added a new annotation to lucene.core that allows to mark classes and constructors that should not be tested with random chains: {{@IgnoreRandomChains}} > Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global > integration test and discover classes to check from module system > > > Key: LUCENE-10352 > URL: https://issues.apache.org/jira/browse/LUCENE-10352 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Uwe Schindler >Assignee: Uwe Schindler >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > Currently TestAllAnalyzersHaveFactories and TestRandomChains only work on the > analysis-commons module, but e.g. we do not do a random chain with kuromoji > and ICU. Also both tests rely on some hacky classpath-inspection and the > tests fail if ran on a JAR file. > This issue tracks progress I am currently doing to refactor this: > - Move those 2 classes to a new gradle subproject > :lucene:analysis:integration.tests and add a module-info referring to all > other analysis packages > - Rewrite the class discovery to use ModuleReader > - Run TestAllAnalyzersHaveFactories per module (using one module reader), so > it discovers all classes and ensures that factory and stream are in same > module (there are some core vs. analysis.common discrepancies) > - RunTestRandomChains on the whole module graph. The classes are discovered > from all module readers in the graph (filtering on module name starting with > "org.apache.lucene.analysis." > - Also compare that the SPI factories returned by discovery match those we > have in the module graphs > While doing this I disovered some bad things: > - TestRandomChains depends on test-only resources. We may need to replicate > those (it is about 5 files that are fed into the ctors) > - We have 5 different StringMockResourceLoaders: Originally it was only in > analysis common, now its everywhere. I will move this class to > test-framework. This is unrelated but can be done here. The background of > this was that analysis factories and resource loaders were not part of lucene > core, so the resourceloader interface couldn't be in test-framework. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a change in pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to c
uschindler commented on a change in pull request #582: URL: https://github.com/apache/lucene/pull/582#discussion_r778280755 ## File path: lucene/analysis/integration.tests/src/test/org/apache/lucene/analysis/tests/ModuleClassDiscovery.java ## @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.analysis.tests; + +import java.io.IOException; +import java.lang.module.ResolvedModule; +import java.util.ArrayList; +import java.util.Collections; +import java.util.List; +import java.util.Map; +import java.util.Objects; +import java.util.SortedMap; +import java.util.TreeMap; +import org.apache.lucene.tests.util.LuceneTestCase; +import org.junit.Assert; + +/** Discovers all classes from the module graph and loads them (without initialization) */ +abstract class ModuleClassDiscovery { + + private static final Module THIS_MODULE = ModuleClassDiscovery.class.getModule(); + private static final ModuleLayer LAYER = THIS_MODULE.getLayer(); + private static final SortedMap ALL_ANALYSIS_MODULES; + + static { Review comment: This is not a test! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a change in pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to c
uschindler commented on a change in pull request #582: URL: https://github.com/apache/lucene/pull/582#discussion_r778283613 ## File path: lucene/analysis/integration.tests/src/test/org/apache/lucene/analysis/tests/ModuleClassDiscovery.java ## @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.analysis.tests; + +import java.io.IOException; +import java.lang.module.ResolvedModule; +import java.util.ArrayList; +import java.util.Collections; +import java.util.List; +import java.util.Map; +import java.util.Objects; +import java.util.SortedMap; +import java.util.TreeMap; +import org.apache.lucene.tests.util.LuceneTestCase; +import org.junit.Assert; + +/** Discovers all classes from the module graph and loads them (without initialization) */ +abstract class ModuleClassDiscovery { + + private static final Module THIS_MODULE = ModuleClassDiscovery.class.getModule(); + private static final ModuleLayer LAYER = THIS_MODULE.getLayer(); + private static final SortedMap ALL_ANALYSIS_MODULES; + + static { Review comment: The initialization of this class is only done on first call, which is not inside test's class laoding. the code initializes when the first test uses the getter. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mayya-sharipova commented on a change in pull request #580: LUCENE-10351 Correct knn search failure with deleted docs
mayya-sharipova commented on a change in pull request #580: URL: https://github.com/apache/lucene/pull/580#discussion_r778314409 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorsReader.java ## @@ -239,6 +239,9 @@ public TopDocs search(String field, float[] target, int k, Bits acceptDocs) thro if (fieldEntry == null || fieldEntry.dimension == 0) { return null; } +if (fieldEntry.size() == 0) { Review comment: Thanks Adrien, since the removal of this dead code is already addressed in your PR, I will revert my last commit from this PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jtibshirani commented on a change in pull request #583: LUCENE-10354: Clarify contract of codec APIs with missing/disabled fields.
jtibshirani commented on a change in pull request #583: URL: https://github.com/apache/lucene/pull/583#discussion_r778317069 ## File path: lucene/core/src/java/org/apache/lucene/codecs/FieldsProducer.java ## @@ -42,6 +45,14 @@ protected FieldsProducer() {} */ public abstract void checkIntegrity() throws IOException; + /** + * Get the {@link Terms} for this field. The behavior is undefined if the field doesn't have Review comment: Do we need to update the javadoc on `Fields` too? It says "This will return null if the field does not exist." -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] sonatype-lift[bot] commented on a change in pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover clas
sonatype-lift[bot] commented on a change in pull request #582: URL: https://github.com/apache/lucene/pull/582#discussion_r778334098 ## File path: lucene/core/src/java/org/apache/lucene/util/IgnoreRandomChains.java ## @@ -0,0 +1,34 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.util; + +import java.lang.annotation.ElementType; +import java.lang.annotation.Retention; +import java.lang.annotation.RetentionPolicy; +import java.lang.annotation.Target; + +/** + * Annotation to not test a class or constructor with {@code TestRandomChains} integration test. + * + * @lucene.internal Review comment: *InvalidBlockTag:* Tag name `lucene.internal` is unknown. If this is a commonly-used custom tag, please click 'not useful' and file a bug. [(details)](https://errorprone.info/bugpattern/InvalidBlockTag) (at-me [in a reply](https://help.sonatype.com/lift/talking-to-lift) with `help` or `ignore`) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] akkig opened a new pull request #2636: SOLR-15892: Adding Randomization in selecting shard leaders for delete and commit requests for RoutedAlias
akkig opened a new pull request #2636: URL: https://github.com/apache/lucene-solr/pull/2636 …e and commit requests for RoutedAlias -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mayya-sharipova merged pull request #580: LUCENE-10351 Correct knn search failure with deleted docs
mayya-sharipova merged pull request #580: URL: https://github.com/apache/lucene/pull/580 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10351) Correct knn search failure with all deleted docs
[ https://issues.apache.org/jira/browse/LUCENE-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468847#comment-17468847 ] ASF subversion and git services commented on LUCENE-10351: -- Commit 78da7030370102d024afc9945965ba3ae7195823 in lucene's branch refs/heads/main from Mayya Sharipova [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=78da703 ] LUCENE-10351 Correct knn search failure with deleted docs (#580) Current when doing knn search on an segment where all documents with knn field were deleted, we get the following error: maxSize must be > 0 and < 2147483630; got: 0 java.lang.IllegalArgumentException: maxSize must be > 0 and < 2147483630; got: 0 at __randomizedtesting.SeedInfo.seed([43F1F124D7076A4E:1B860BFCCB9B0BB5]:0) at org.apache.lucene.util.LongHeap.(LongHeap.java:57) at org.apache.lucene.util.LongHeap$1.(LongHeap.java:69) at org.apache.lucene.util.LongHeap.create(LongHeap.java:69) at org.apache.lucene.util.hnsw.NeighborQueue.(NeighborQueue.java:41) at org.apache.lucene.util.hnsw.HnswGraph.search(HnswGraph.java:105)# This patch fixes this error and ensures empty TopDocs are returned when knn field doesn't have any documents left. > Correct knn search failure with all deleted docs > - > > Key: LUCENE-10351 > URL: https://issues.apache.org/jira/browse/LUCENE-10351 > Project: Lucene - Core > Issue Type: Bug >Reporter: Mayya Sharipova >Priority: Trivial > Time Spent: 1h 20m > Remaining Estimate: 0h > > Current when doing knn search on an segment where all documents with knn > field were deleted, we get the following error: > maxSize must be > 0 and < 2147483630; got: 0 > java.lang.IllegalArgumentException: maxSize must be > 0 and < 2147483630; > got: 0 > at > __randomizedtesting.SeedInfo.seed([43F1F124D7076A4E:1B860BFCCB9B0BB5]:0) > at org.apache.lucene.util.LongHeap.(LongHeap.java:57) > at org.apache.lucene.util.LongHeap$1.(LongHeap.java:69) > at org.apache.lucene.util.LongHeap.create(LongHeap.java:69) > at > org.apache.lucene.util.hnsw.NeighborQueue.(NeighborQueue.java:41) > at > org.apache.lucene.util.hnsw.HnswGraph.search(HnswGraph.java:105)# > A desired behaviour: instead of an error, an empty TopDocs should be > returned. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10351) Correct knn search failure with all deleted docs
[ https://issues.apache.org/jira/browse/LUCENE-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468876#comment-17468876 ] ASF subversion and git services commented on LUCENE-10351: -- Commit 0a1cf3108468e31f06cabfd4154b69189dcf6e79 in lucene's branch refs/heads/branch_9x from Mayya Sharipova [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=0a1cf31 ] LUCENE-10351 Correct knn search failure with deleted docs (#580) Current when doing knn search on an segment where all documents with knn field were deleted, we get the following error: maxSize must be > 0 and < 2147483630; got: 0 java.lang.IllegalArgumentException: maxSize must be > 0 and < 2147483630; got: 0 at __randomizedtesting.SeedInfo.seed([43F1F124D7076A4E:1B860BFCCB9B0BB5]:0) at org.apache.lucene.util.LongHeap.(LongHeap.java:57) at org.apache.lucene.util.LongHeap$1.(LongHeap.java:69) at org.apache.lucene.util.LongHeap.create(LongHeap.java:69) at org.apache.lucene.util.hnsw.NeighborQueue.(NeighborQueue.java:41) at org.apache.lucene.util.hnsw.HnswGraph.search(HnswGraph.java:105)# This patch fixes this error and ensures empty TopDocs are returned when knn field doesn't have any documents left. > Correct knn search failure with all deleted docs > - > > Key: LUCENE-10351 > URL: https://issues.apache.org/jira/browse/LUCENE-10351 > Project: Lucene - Core > Issue Type: Bug >Reporter: Mayya Sharipova >Priority: Trivial > Time Spent: 1.5h > Remaining Estimate: 0h > > Current when doing knn search on an segment where all documents with knn > field were deleted, we get the following error: > maxSize must be > 0 and < 2147483630; got: 0 > java.lang.IllegalArgumentException: maxSize must be > 0 and < 2147483630; > got: 0 > at > __randomizedtesting.SeedInfo.seed([43F1F124D7076A4E:1B860BFCCB9B0BB5]:0) > at org.apache.lucene.util.LongHeap.(LongHeap.java:57) > at org.apache.lucene.util.LongHeap$1.(LongHeap.java:69) > at org.apache.lucene.util.LongHeap.create(LongHeap.java:69) > at > org.apache.lucene.util.hnsw.NeighborQueue.(NeighborQueue.java:41) > at > org.apache.lucene.util.hnsw.HnswGraph.search(HnswGraph.java:105)# > A desired behaviour: instead of an error, an empty TopDocs should be > returned. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-10351) Correct knn search failure with all deleted docs
[ https://issues.apache.org/jira/browse/LUCENE-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayya Sharipova resolved LUCENE-10351. -- Fix Version/s: 9.1 Resolution: Fixed > Correct knn search failure with all deleted docs > - > > Key: LUCENE-10351 > URL: https://issues.apache.org/jira/browse/LUCENE-10351 > Project: Lucene - Core > Issue Type: Bug >Reporter: Mayya Sharipova >Priority: Trivial > Fix For: 9.1 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Current when doing knn search on an segment where all documents with knn > field were deleted, we get the following error: > maxSize must be > 0 and < 2147483630; got: 0 > java.lang.IllegalArgumentException: maxSize must be > 0 and < 2147483630; > got: 0 > at > __randomizedtesting.SeedInfo.seed([43F1F124D7076A4E:1B860BFCCB9B0BB5]:0) > at org.apache.lucene.util.LongHeap.(LongHeap.java:57) > at org.apache.lucene.util.LongHeap$1.(LongHeap.java:69) > at org.apache.lucene.util.LongHeap.create(LongHeap.java:69) > at > org.apache.lucene.util.hnsw.NeighborQueue.(NeighborQueue.java:41) > at > org.apache.lucene.util.hnsw.HnswGraph.search(HnswGraph.java:105)# > A desired behaviour: instead of an error, an empty TopDocs should be > returned. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a change in pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to c
uschindler commented on a change in pull request #582: URL: https://github.com/apache/lucene/pull/582#discussion_r778421121 ## File path: lucene/core/src/java/org/apache/lucene/util/IgnoreRandomChains.java ## @@ -0,0 +1,34 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.util; + +import java.lang.annotation.ElementType; +import java.lang.annotation.Retention; +import java.lang.annotation.RetentionPolicy; +import java.lang.annotation.Target; + +/** + * Annotation to not test a class or constructor with {@code TestRandomChains} integration test. + * + * @lucene.internal Review comment: ignore -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] sonatype-lift[bot] commented on a change in pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover clas
sonatype-lift[bot] commented on a change in pull request #582: URL: https://github.com/apache/lucene/pull/582#discussion_r778421146 ## File path: lucene/core/src/java/org/apache/lucene/util/IgnoreRandomChains.java ## @@ -0,0 +1,34 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.util; + +import java.lang.annotation.ElementType; +import java.lang.annotation.Retention; +import java.lang.annotation.RetentionPolicy; +import java.lang.annotation.Target; + +/** + * Annotation to not test a class or constructor with {@code TestRandomChains} integration test. + * + * @lucene.internal Review comment: I've recorded this as ignored for this pull request. If you change your mind, just comment `@sonatype-lift unignore`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gsmiller commented on pull request #578: LUCENE-10350: Avoid some null checking for FastTaxonomyFacetCounts#countAll()
gsmiller commented on pull request #578: URL: https://github.com/apache/lucene/pull/578#issuecomment-1005246240 Nice find! With this kind of speedup just for avoiding the null check inside the counting loop, I wonder if it's worth generalizing beyond countAll? What about relaxing visibility of both `values` and `sparseValues` to `protected` and writing specialized counting logic for both `count` and `countAll` based on which is non-null? Essentially by-pass the `increment` abstraction provided by `IntTaxonomyFacets` for all cases? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module
rmuir commented on pull request #582: URL: https://github.com/apache/lucene/pull/582#issuecomment-1005247521 elsewhere we have the integration tests as `core.tests` and `distribution.tests`, should this be `analysis.tests`? Currently I am messing around in this branch with `./gradlew -p lucene/analysis/integration.tests test` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gsmiller commented on pull request #574: LUCENE-10346: Specially treat SingletonSortedNumericDocValues in FastTaxonomyFacetCounts#countAll()
gsmiller commented on pull request #574: URL: https://github.com/apache/lucene/pull/574#issuecomment-1005247653 Great find! For what it's worth, this was overlooked when specializing SSDV in #255 because it was still using binary doc values at the time. When I switched over to numeric doc values, it looks like I overlooked adding this specialization. Thanks for the improvement! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-10356) Special-case singleton doc values for general taxonomy facet counting
Greg Miller created LUCENE-10356: Summary: Special-case singleton doc values for general taxonomy facet counting Key: LUCENE-10356 URL: https://issues.apache.org/jira/browse/LUCENE-10356 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Greg Miller Inspired by [https://github.com/apache/lucene/pull/574,] we should also special-case singleton dvs in the general count path (#573 specialized it for countAll). -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10356) Special-case singleton doc values for general taxonomy facet counting
[ https://issues.apache.org/jira/browse/LUCENE-10356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468917#comment-17468917 ] Greg Miller commented on LUCENE-10356: -- PR coming shortly :) > Special-case singleton doc values for general taxonomy facet counting > - > > Key: LUCENE-10356 > URL: https://issues.apache.org/jira/browse/LUCENE-10356 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Reporter: Greg Miller >Priority: Minor > > Inspired by [https://github.com/apache/lucene/pull/574,] we should also > special-case singleton dvs in the general count path (#573 specialized it for > countAll). -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-10356) Special-case singleton doc values for general taxonomy facet counting
[ https://issues.apache.org/jira/browse/LUCENE-10356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Miller reassigned LUCENE-10356: Assignee: Greg Miller > Special-case singleton doc values for general taxonomy facet counting > - > > Key: LUCENE-10356 > URL: https://issues.apache.org/jira/browse/LUCENE-10356 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Reporter: Greg Miller >Assignee: Greg Miller >Priority: Minor > > Inspired by [https://github.com/apache/lucene/pull/574,] we should also > special-case singleton dvs in the general count path (#573 specialized it for > countAll). -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from mo
uschindler commented on pull request #582: URL: https://github.com/apache/lucene/pull/582#issuecomment-1005287391 > elsewhere we have the integration tests as `core.tests` and `distribution.tests`, should this be `analysis.tests`? Currently I am messing around in this branch with `./gradlew -p lucene/analysis/integration.tests test` The name does not matter, only the suffix `.tests` matters. There are 2 possibilities: - below the `analysis/` subdirectory (like at moment) with a more generic name `integration.tests` - next to `analysis` top level folder as `analysis.tests` To me it does not matter, both works. I like name "integration" more. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from mo
uschindler commented on pull request #582: URL: https://github.com/apache/lucene/pull/582#issuecomment-1005287982 I ran the whole thing the following way: ``` gradlew :lucene:analysis:integration.tests:beast -Dtests.dups=1000 ``` No failures yet. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from mo
uschindler commented on pull request #582: URL: https://github.com/apache/lucene/pull/582#issuecomment-1005291698 @rmuir I found a failure: ``` $ gradlew :lucene:analysis:integration.tests:test --tests TestRandomChains.testRandomChainsWithLargeStrings -Dtests.seed=AA632771CC823702 -Dtests.slow=true -Dtests.locale=fr-MF -Dtests.timezone=America/Panama -Dtests.asserts=true -Dtests.file.encoding=UTF-8 org.apache.lucene.analysis.tests.TestRandomChains > test suite's output saved to C:\Users\Uwe Schindler\Projects\lucene\lucene\lucene\analysis\integration.tests\build\test-results\test\outputs\OUTPUT-org.apache.lucene.analysis.tests.TestRandomChains.txt, copied below: 2> stage 0: ÉÆû<[0-2] +1> ÉÆä<[4-6] +1> ppkarrpf<[7-14] +1> 1<[16-17] +1> 5<[18-19] +1> 2> stage 1: ÉÆû<[0-2] +1> ÉÆä<[4-6] +1> 00<[4-6] +0> ppkarrpf<[7-14] +1> 759700<[7-14] +0> 1<[16-17] +1> 5<[18-19] +1> 00<[18-19] +0> 2> stage 2: ÉÆû<[0-2] +1> ÉÆä<[4-6] +1> 00<[4-6] +0> ppkarrpf<[7-14] +1> 759700<[7-14] +0> 1<[16-17] +1> 00<[18-19] +0> 2> TEST FAIL: useCharFilter=true text='\ud801\udc96\ud801\udcaa\ud801\udc84 ppkarpf {1,5}g?)u em mbm hbil' 2> Exception from random analyzer: 2> charfilters= 2> org.apache.lucene.analysis.ja.JapaneseIterationMarkCharFilter(java.io.StringReader@105e6aa7, true, false) 2> tokenizer= 2> org.apache.lucene.analysis.th.ThaiTokenizer() 2> filters= 2> Conditional:org.apache.lucene.analysis.phonetic.DaitchMokotoffSoundexFilter(OneTimeWrapper@79889b7f term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1, true) 2> org.apache.lucene.analysis.ja.JapaneseNumberFilter(ValidatingTokenFilter@53a9e96c term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,keyword=false) 2> org.apache.lucene.analysis.miscellaneous.StemmerOverrideFilter(ValidatingTokenFilter@6cb4578d term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,keyword=false, org.apache.lucene.analysis.miscellaneous.StemmerOverrideFilter$StemmerOverrideMap@51fc8124) > java.lang.IllegalStateException: stage 2: inconsistent startOffset at pos=3: 16 vs 18; token=00 > at __randomizedtesting.SeedInfo.seed([AA632771CC823702:C038986095CC17F1]:0) > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:138) > at org.apache.lucene.analysis.common@10.0.0-SNAPSHOT/org.apache.lucene.analysis.miscellaneous.StemmerOverrideFilter.incrementToken(StemmerOverrideFilter.java:67) > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:81) > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:1130) > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:1028) > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:922) > at org.apache.lucene.analysis.tests@10.0.0-SNAPSHOT/org.apache.lucene.analysis.tests.TestRandomChains.testRandomChainsWithLargeStrings(TestRandomChains.java:943) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754) > at randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942) > at randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978) > at randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992) > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44) > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43) >
[GitHub] [lucene] uschindler commented on pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from mo
uschindler commented on pull request #582: URL: https://github.com/apache/lucene/pull/582#issuecomment-1005293671 Another one: ``` $ gradlew :lucene:analysis:integration.tests:test --tests TestRandomChains.testRandomChains -Dtests.seed=3A0D0E91E0CA5BFC -Dtests.slow=true -Dtests.locale=nmg-CM -Dtests.timezone=Antarctica/Vostok -Dtests.asserts=true -Dtests.file.encoding=UTF-8 org.apache.lucene.analysis.tests.TestRandomChains > test suite's output saved to C:\Users\Uwe Schindler\Projects\lucene\lucene\lucene\analysis\integration.tests\build\test-results\test_17\outputs\OUTPUT-org.apache.lucene.analysis.tests.TestRandomChains.txt, copied below: 2> TEST FAIL: useCharFilter=false text='' 2> Exception from random analyzer: 2> charfilters= 2> org.apache.lucene.analysis.ja.JapaneseIterationMarkCharFilter(java.io.StringReader@7ee7c045) 2> org.apache.lucene.analysis.charfilter.HTMLStripCharFilter(org.apache.lucene.analysis.ja.JapaneseIterationMarkCharFilter@66bba53d, []) 2> tokenizer= 2> org.apache.lucene.analysis.core.KeywordTokenizer(27) 2> filters= 2> org.apache.lucene.analysis.ja.JapaneseKatakanaStemFilter(ValidatingTokenFilter@1a8623a7 term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,keyword=false, -21) > java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 32 > at __randomizedtesting.SeedInfo.seed([3A0D0E91E0CA5BFC:7EC27F0A7D8463C]:0) > at org.apache.lucene.analysis.kuromoji@10.0.0-SNAPSHOT/org.apache.lucene.analysis.ja.JapaneseKatakanaStemFilter.stem(JapaneseKatakanaStemFilter.java:76) > at org.apache.lucene.analysis.kuromoji@10.0.0-SNAPSHOT/org.apache.lucene.analysis.ja.JapaneseKatakanaStemFilter.incrementToken(JapaneseKatakanaStemFilter.java:59) > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:81) > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:1130) > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:1028) > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:922) > at org.apache.lucene.analysis.tests@10.0.0-SNAPSHOT/org.apache.lucene.analysis.tests.TestRandomChains.testRandomChains(TestRandomChains.java:911) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler edited a comment on pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check
uschindler edited a comment on pull request #582: URL: https://github.com/apache/lucene/pull/582#issuecomment-1005291698 @rmuir I found a failure: ``` $ gradlew :lucene:analysis:integration.tests:test --tests TestRandomChains.testRandomChainsWithLargeStrings -Dtests.seed=AA632771CC823702 -Dtests.slow=true -Dtests.locale=fr-MF -Dtests.timezone=America/Panama -Dtests.asserts=true -Dtests.file.encoding=UTF-8 org.apache.lucene.analysis.tests.TestRandomChains > test suite's output saved to C:\Users\Uwe Schindler\Projects\lucene\lucene\lucene\analysis\integration.tests\build\test-results\test\outputs\OUTPUT-org.apache.lucene.analysis.tests.TestRandomChains.txt, copied below: 2> stage 0: ÉÆû<[0-2] +1> ÉÆä<[4-6] +1> ppkarrpf<[7-14] +1> 1<[16-17] +1> 5<[18-19] +1> 2> stage 1: ÉÆû<[0-2] +1> ÉÆä<[4-6] +1> 00<[4-6] +0> ppkarrpf<[7-14] +1> 759700<[7-14] +0> 1<[16-17] +1> 5<[18-19] +1> 00<[18-19] +0> 2> stage 2: ÉÆû<[0-2] +1> ÉÆä<[4-6] +1> 00<[4-6] +0> ppkarrpf<[7-14] +1> 759700<[7-14] +0> 1<[16-17] +1> 00<[18-19] +0> 2> TEST FAIL: useCharFilter=true text='\ud801\udc96\ud801\udcaa\ud801\udc84 ppkarpf {1,5}g?)u em mbm hbil' 2> Exception from random analyzer: 2> charfilters= 2> org.apache.lucene.analysis.ja.JapaneseIterationMarkCharFilter(java.io.StringReader@105e6aa7, true, false) 2> tokenizer= 2> org.apache.lucene.analysis.th.ThaiTokenizer() 2> filters= 2> Conditional:org.apache.lucene.analysis.phonetic.DaitchMokotoffSoundexFilter(OneTimeWrapper@79889b7f term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1, true) 2> org.apache.lucene.analysis.ja.JapaneseNumberFilter(ValidatingTokenFilter@53a9e96c term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,keyword=false) 2> org.apache.lucene.analysis.miscellaneous.StemmerOverrideFilter(ValidatingTokenFilter@6cb4578d term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,keyword=false, org.apache.lucene.analysis.miscellaneous.StemmerOverrideFilter$StemmerOverrideMap@51fc8124) > java.lang.IllegalStateException: stage 2: inconsistent startOffset at pos=3: 16 vs 18; token=00 > at __randomizedtesting.SeedInfo.seed([AA632771CC823702:C038986095CC17F1]:0) > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:138) > at org.apache.lucene.analysis.common@10.0.0-SNAPSHOT/org.apache.lucene.analysis.miscellaneous.StemmerOverrideFilter.incrementToken(StemmerOverrideFilter.java:67) > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:81) > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:1130) > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:1028) > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:922) > at org.apache.lucene.analysis.tests@10.0.0-SNAPSHOT/org.apache.lucene.analysis.tests.TestRandomChains.testRandomChainsWithLargeStrings(TestRandomChains.java:943) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9537) Add Indri Search Engine Functionality to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468942#comment-17468942 ] Cameron VandenBerg commented on LUCENE-9537: Hello [~jpountz]! Thank you for finding this. I have fixed this issue in my latest PR (LUCENE-10157), which I just updated tonight. My original thought was that I needed the IndriScorer because I wanted to use the boosts outside of the Similarity just like we do in our Indri Search Engine. However, I now feel that it makes more sense for the Indri scoring to use boosts in the same way that lucene does (inside the Similarity), which allows me to remove the `if (scorer instanceof IndriScorer)` logic from the IndriScorers. I have added tests and found that this works as expected now. I have also added the IndriOr and the IndriWeightedSum scorers as well as an IndriQueryParser to this new PR. Thank you! > Add Indri Search Engine Functionality to Lucene > --- > > Key: LUCENE-9537 > URL: https://issues.apache.org/jira/browse/LUCENE-9537 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Reporter: Cameron VandenBerg >Priority: Major > Labels: patch > Fix For: 8.9 > > Attachments: LUCENE-9537.patch, LUCENE-INDRI.patch > > Time Spent: 4h 40m > Remaining Estimate: 0h > > Indri ([http://lemurproject.org/indri.php]) is an academic search engine > developed by The University of Massachusetts and Carnegie Mellon University. > The major difference between Lucene and Indri is that Indri will give a > document a "smoothing score" to a document that does not contain the search > term, which has improved the search ranking accuracy in our experiments. I > have created an Indri patch, which adds the search code needed to implement > the Indri AND logic as well as Indri's implementation of Dirichlet Smoothing. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10157) Add Additional Indri Search Engine Functionality to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-10157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468945#comment-17468945 ] Cameron VandenBerg commented on LUCENE-10157: - Hello [~cpoerschke]! I have added and updated the new PR (LUCENE-10157). I have added test cases. I am happy to work with you if you have any suggestions. Thank you! > Add Additional Indri Search Engine Functionality to Lucene > -- > > Key: LUCENE-10157 > URL: https://issues.apache.org/jira/browse/LUCENE-10157 > Project: Lucene - Core > Issue Type: New Feature > Components: core/queryparser, core/search >Reporter: Cameron VandenBerg >Priority: Major > Attachments: LUCENE-10157.patch > > Time Spent: 20m > Remaining Estimate: 0h > > In Jira issue LUCENE-9537, basic functionality from the Indri search engine > ([http://lemurproject.org/indri.php]) was added to Lucene. With that > functionality in place, we would love to build upon that to add additional > Indri queries and an Indri query parser to Lucene to broaden the Indri > functionality within Lucene. In this patch, I have added the Indri NOT, the > INDRI OR, and the Indri WeightedSum functionality. I have also included an > IndriQueryParser for accessing this functionality. More information on these > query operators can be seen here: > [https://sourceforge.net/p/lemur/wiki/Belief%20Operations/] and here: > [https://sourceforge.net/p/lemur/wiki/Indri%20Query%20Language%20Reference/.|https://sourceforge.net/p/lemur/wiki/Indri%20Query%20Language%20Reference/] > > I would be very excited to work with the Lucene community again to try to add > this functionality. I am open to suggestions, and I am happy to make any > changes that might be suggested. Thank you! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gf2121 commented on pull request #578: LUCENE-10350: Avoid some null checking for FastTaxonomyFacetCounts#countAll()
gf2121 commented on pull request #578: URL: https://github.com/apache/lucene/pull/578#issuecomment-1005403431 Thanks @gsmiller for suggestion! > With this kind of speedup just for avoiding the null check inside the counting loop. Yes, this is indeed suprising. I thought `BrowseMonthTaxoFacets` should get a similar QPS as `BrowseMonthSSDVFacets` since SSDV is already using a `count[ord]++` and its total cardinality is just 12 (converting to global ordinal should be cheap). I also noticed that the stdDev is becoming much higher. It seems some of the JVM rounds are getting speed up but some not, I suspect some optimization in JVM was triggered from time to time but i'm not really sure what it is. I tried JDK11/16/17 and all get similar benchmark results (higher QPS & higer stdDev). > What about relaxing visibility of both values and sparseValues to protected and writing specialized counting logic for both count and countAll based on which is non-null? Nice idea, will do. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org