[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2022-01-04 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468417#comment-17468417
 ] 

Adrien Grand commented on LUCENE-8739:
--

I updated block sizes so that ZSTD uses the same block sizes as 
BEST_COMPRESSION and it looks much better now.

||Codec ||Indexing time (ms) ||Disk usage (MB) || Retrieval time per 10k docs 
(ms) ||
| BEST_SPEED (LZ4 with small blocks) | 35383 | 90.175 | 190.17524 |
| BEST_COMPRESSION (vanilla zlib, DEFLATE level=6) | 76671 | 58.682 | 
1910.42106 |
| BEST_COMPRESSION (Cloudflare zlib, DEFLATE level=6) | 54791 | 58.601 | 
1395.53593 |
| ZSTD dict (level=1) | 24687 | 63.324 | 928.73997 |
| ZSTD dict (level=2) | 24934 | 63.722 | 977.29911 |
| ZSTD dict (level=3) | 28285 | 62.072 | 938.10886 |
| ZSTD dict (level=4) | 37863 | 60.427 | 969.18655 |
| ZSTD dict (level=5) | 45479 | 59.317 | 941.20922 |
| ZSTD dict (level=6) | 57842 | 58.481 | 881.69049 |
| ZSTD dict (level=7) | 65796 | 58.107 | 886.42249 |

On this dataset, the main benefit seems to be the retrieval speed. Regarding 
indexing times and space efficiency, either you go with level 5 and you are 
faster to index data but less space-efficient than DEFLATE (with the Cloudflare 
zlib), or you go with level 6 and you are more space-efficient but slower to 
index.

> ZSTD Compressor support in Lucene
> -
>
> Key: LUCENE-8739
> URL: https://issues.apache.org/jira/browse/LUCENE-8739
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/codecs
>Reporter: Sean Torres
>Priority: Minor
>  Labels: features
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> ZStandard has a great speed and compression ratio tradeoff. 
> ZStandard is open source compression from Facebook.
> More about ZSTD
> [https://github.com/facebook/zstd]
> [https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2022-01-04 Thread Tobias Ibounig (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468482#comment-17468482
 ] 

Tobias Ibounig commented on LUCENE-8739:


Would it make sense to increase the block size until retrieval times approach 
those of zlib (between CF and vanilla)?
Would such an increase even make sense or would this cause other issues?

Then there also could be 3 presets

BEST_SPEED --> stays LZ4
BALANCED --> low level ZSTD + dict (maybe even slightly smaller block size, for 
slightly faster retrial)
BEST_COMPRESSION --> ZSTD with higher block size and higher level (maybe 5-9)

Or would 3 presets be too much choice?

Anyway I see potential for good tradeoffs here.

> ZSTD Compressor support in Lucene
> -
>
> Key: LUCENE-8739
> URL: https://issues.apache.org/jira/browse/LUCENE-8739
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/codecs
>Reporter: Sean Torres
>Priority: Minor
>  Labels: features
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> ZStandard has a great speed and compression ratio tradeoff. 
> ZStandard is open source compression from Facebook.
> More about ZSTD
> [https://github.com/facebook/zstd]
> [https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #579: LUCENE-10283: Bump minimum required Java version to 17.

2022-01-04 Thread GitBox


uschindler commented on pull request #579:
URL: https://github.com/apache/lucene/pull/579#issuecomment-1004639920


   > I've been wondering what to do with the security manager. If we're on Java 
17, where it's officially deprecated, perhaps we should just get rid of it? I 
know it sounds terrible but it's inevitable in the long run anyway - if main is 
the future version then perhaps it's the time to commit to such a decision?
   
   Hi, I was thinking about the same thing. But we should not doing this on 
this Commit. So let's start with Suppression and later decide how to fix the 
test sandbox.
   
   I agree, we should remove the AccessController parts in productive code. But 
one step after the other (first fix tests then remove).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-10352) Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module system

2022-01-04 Thread Uwe Schindler (Jira)
Uwe Schindler created LUCENE-10352:
--

 Summary: Convert TestAllAnalyzersHaveFactories and 
TestRandomChains to a global integration test and discover classes to check 
from module system
 Key: LUCENE-10352
 URL: https://issues.apache.org/jira/browse/LUCENE-10352
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Uwe Schindler
Assignee: Uwe Schindler


Currently TestAllAnalyzersHaveFactories and TestRandomChains only work on the 
analysis-commons module, but e.g. we do not do a random chain with kuromoji and 
ICU. Also both tests rely on some hacky classpath-inspection and the tests fail 
if ran on a JAR file.

This issue tracks progress I am currently doing to refactor this:
- Move those 2 classes to a new gradle subproject 
:lucene:analysis:integration.tests and add a module-info referring to all other 
analysis packages
- Rewrite the class discovery to use ModuleReader
- Run TestAllAnalyzersHaveFactories per module (using one module reader), so it 
discovers all classes and ensures that factory and stream are in same module 
(there are some core vs. analysis.common discrepancies)
- RunTestRandomChains on the whole module graph. The classes are discovered 
from all module readers in the graph (filtering on module name starting with 
"org.apache.lucene.analysis."
- Also compare that the SPI factories returned by discovery match those we have 
in the module graphs

While doing this I disovered some bad things:
- TestRandomChains depends on test-only resources. We may need to replicate 
those (it is about 5 files that are fed into the ctors)
- We have 5 different StrickMockResourceLoaders: Originally it was only in 
analysis common, now its everywhere. I will move this class to test-framework. 
This is unrelated but can be done here. The background of this was that 
analysis factories and resource loaders were not part of lucene core, so the 
resourceloader interface couldn't be in test-framework.

You can watch progress here: 
[https://github.com/uschindler/lucene/compare/jira/LUCENE-10348...uschindler:draft/random-chains]

(it currently is a branch against LUCENE-10348 because it was not yet merged, 
but it depends on that one)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2022-01-04 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468498#comment-17468498
 ] 

Adrien Grand commented on LUCENE-8739:
--

bq. Would such an increase even make sense or would this cause other issues?

It would require reading more data from disk. This read would be sequential so 
I suspect it wouldn't hurt much, including on slower I/O. The main drawback is 
probably that it would trash a bit more of filesystem cache. That said I agree 
with you that we should probably look into increasing the block size with 
ZStandard. I just did a run with 1.5x larger blocks and level=6, it slightly 
outperforms our current BEST_COMPRESSION mode across indexing time, disk usage 
and compression.

||Codec ||Indexing time (ms) ||Disk usage (MB) || Retrieval time per 10k docs 
(ms) ||
| ZSTD dict level=6 1.5x larger blocks | 43228 | 57.455 | 1269.22127 |

bq. Or would 3 presets be too much choice?

IMO it would be too much, but I like the fact that ZSTD could help us have two 
options for compression that share the exact same read logic, e.g. if we 
replaced BEST_SPEED with what you suggested for BALANCED: low level ZSTD 
compression with a small block size.

bq. Anyway I see potential for good tradeoffs here.

+1 ZSTD is quite great. I wouldn't use it in the Lucene default codec yet, 
because lucene-core shouldn't have dependencies and we don't want to use JNI in 
the lucere-core build. Maybe we can reconsider when Project Panama lands and it 
gets easier to interact with native libraries.

> ZSTD Compressor support in Lucene
> -
>
> Key: LUCENE-8739
> URL: https://issues.apache.org/jira/browse/LUCENE-8739
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/codecs
>Reporter: Sean Torres
>Priority: Minor
>  Labels: features
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> ZStandard has a great speed and compression ratio tradeoff. 
> ZStandard is open source compression from Facebook.
> More about ZSTD
> [https://github.com/facebook/zstd]
> [https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jpountz commented on pull request #579: LUCENE-10283: Bump minimum required Java version to 17.

2022-01-04 Thread GitBox


jpountz commented on pull request #579:
URL: https://github.com/apache/lucene/pull/579#issuecomment-1004649045


   Sorry @uschindler I should have made it clear that this PR was not passing 
precommit checks yet.
   
   Agreed on handling the SecurityManager changes in a separate PR. I only 
fixed usage of Long/Integer constructors because it looked easier than I had 
thought initially.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #581: LUCENE-10348: Make stopwords resources from analyzers modules visible to ClasspathResourceLoader and ModuleResourceLoader

2022-01-04 Thread GitBox


uschindler commented on pull request #581:
URL: https://github.com/apache/lucene/pull/581#issuecomment-1004649040


   > Change looks good to me. I like that tests are improving. We may wish to 
someday migrate the `TestAllAnalyzerHaveFactories` and maybe even 
`TestRandomChains` to the modular layer. This way we know for sure all the 
things really work with the module system. But that's for another day.
   
   I started https://issues.apache.org/jira/browse/LUCENE-10352, you can watch 
progress here: 
https://github.com/uschindler/lucene/compare/jira/LUCENE-10348...uschindler:draft/random-chains


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10352) Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module system

2022-01-04 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468504#comment-17468504
 ] 

Uwe Schindler commented on LUCENE-10352:


You can watch progress here: 
[https://github.com/uschindler/lucene/compare/jira/LUCENE-10348...uschindler:draft/random-chains]

(it currently is a branch against LUCENE-10348 because it was not yet merged, 
but it depends on that one)

> Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global 
> integration test and discover classes to check from module system
> 
>
> Key: LUCENE-10352
> URL: https://issues.apache.org/jira/browse/LUCENE-10352
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Major
>
> Currently TestAllAnalyzersHaveFactories and TestRandomChains only work on the 
> analysis-commons module, but e.g. we do not do a random chain with kuromoji 
> and ICU. Also both tests rely on some hacky classpath-inspection and the 
> tests fail if ran on a JAR file.
> This issue tracks progress I am currently doing to refactor this:
> - Move those 2 classes to a new gradle subproject 
> :lucene:analysis:integration.tests and add a module-info referring to all 
> other analysis packages
> - Rewrite the class discovery to use ModuleReader
> - Run TestAllAnalyzersHaveFactories per module (using one module reader), so 
> it discovers all classes and ensures that factory and stream are in same 
> module (there are some core vs. analysis.common discrepancies)
> - RunTestRandomChains on the whole module graph. The classes are discovered 
> from all module readers in the graph (filtering on module name starting with 
> "org.apache.lucene.analysis."
> - Also compare that the SPI factories returned by discovery match those we 
> have in the module graphs
> While doing this I disovered some bad things:
> - TestRandomChains depends on test-only resources. We may need to replicate 
> those (it is about 5 files that are fed into the ctors)
> - We have 5 different StrickMockResourceLoaders: Originally it was only in 
> analysis common, now its everywhere. I will move this class to 
> test-framework. This is unrelated but can be done here. The background of 
> this was that analysis factories and resource loaders were not part of lucene 
> core, so the resourceloader interface couldn't be in test-framework.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10352) Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module system

2022-01-04 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468505#comment-17468505
 ] 

Uwe Schindler commented on LUCENE-10352:


CC [~rmuir]

> Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global 
> integration test and discover classes to check from module system
> 
>
> Key: LUCENE-10352
> URL: https://issues.apache.org/jira/browse/LUCENE-10352
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Major
>
> Currently TestAllAnalyzersHaveFactories and TestRandomChains only work on the 
> analysis-commons module, but e.g. we do not do a random chain with kuromoji 
> and ICU. Also both tests rely on some hacky classpath-inspection and the 
> tests fail if ran on a JAR file.
> This issue tracks progress I am currently doing to refactor this:
> - Move those 2 classes to a new gradle subproject 
> :lucene:analysis:integration.tests and add a module-info referring to all 
> other analysis packages
> - Rewrite the class discovery to use ModuleReader
> - Run TestAllAnalyzersHaveFactories per module (using one module reader), so 
> it discovers all classes and ensures that factory and stream are in same 
> module (there are some core vs. analysis.common discrepancies)
> - RunTestRandomChains on the whole module graph. The classes are discovered 
> from all module readers in the graph (filtering on module name starting with 
> "org.apache.lucene.analysis."
> - Also compare that the SPI factories returned by discovery match those we 
> have in the module graphs
> While doing this I disovered some bad things:
> - TestRandomChains depends on test-only resources. We may need to replicate 
> those (it is about 5 files that are fed into the ctors)
> - We have 5 different StrickMockResourceLoaders: Originally it was only in 
> analysis common, now its everywhere. I will move this class to 
> test-framework. This is unrelated but can be done here. The background of 
> this was that analysis factories and resource loaders were not part of lucene 
> core, so the resourceloader interface couldn't be in test-framework.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-10352) Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module system

2022-01-04 Thread Uwe Schindler (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-10352:
---
Description: 
Currently TestAllAnalyzersHaveFactories and TestRandomChains only work on the 
analysis-commons module, but e.g. we do not do a random chain with kuromoji and 
ICU. Also both tests rely on some hacky classpath-inspection and the tests fail 
if ran on a JAR file.

This issue tracks progress I am currently doing to refactor this:
- Move those 2 classes to a new gradle subproject 
:lucene:analysis:integration.tests and add a module-info referring to all other 
analysis packages
- Rewrite the class discovery to use ModuleReader
- Run TestAllAnalyzersHaveFactories per module (using one module reader), so it 
discovers all classes and ensures that factory and stream are in same module 
(there are some core vs. analysis.common discrepancies)
- RunTestRandomChains on the whole module graph. The classes are discovered 
from all module readers in the graph (filtering on module name starting with 
"org.apache.lucene.analysis."
- Also compare that the SPI factories returned by discovery match those we have 
in the module graphs

While doing this I disovered some bad things:
- TestRandomChains depends on test-only resources. We may need to replicate 
those (it is about 5 files that are fed into the ctors)
- We have 5 different StrickMockResourceLoaders: Originally it was only in 
analysis common, now its everywhere. I will move this class to test-framework. 
This is unrelated but can be done here. The background of this was that 
analysis factories and resource loaders were not part of lucene core, so the 
resourceloader interface couldn't be in test-framework.

  was:
Currently TestAllAnalyzersHaveFactories and TestRandomChains only work on the 
analysis-commons module, but e.g. we do not do a random chain with kuromoji and 
ICU. Also both tests rely on some hacky classpath-inspection and the tests fail 
if ran on a JAR file.

This issue tracks progress I am currently doing to refactor this:
- Move those 2 classes to a new gradle subproject 
:lucene:analysis:integration.tests and add a module-info referring to all other 
analysis packages
- Rewrite the class discovery to use ModuleReader
- Run TestAllAnalyzersHaveFactories per module (using one module reader), so it 
discovers all classes and ensures that factory and stream are in same module 
(there are some core vs. analysis.common discrepancies)
- RunTestRandomChains on the whole module graph. The classes are discovered 
from all module readers in the graph (filtering on module name starting with 
"org.apache.lucene.analysis."
- Also compare that the SPI factories returned by discovery match those we have 
in the module graphs

While doing this I disovered some bad things:
- TestRandomChains depends on test-only resources. We may need to replicate 
those (it is about 5 files that are fed into the ctors)
- We have 5 different StrickMockResourceLoaders: Originally it was only in 
analysis common, now its everywhere. I will move this class to test-framework. 
This is unrelated but can be done here. The background of this was that 
analysis factories and resource loaders were not part of lucene core, so the 
resourceloader interface couldn't be in test-framework.

You can watch progress here: 
[https://github.com/uschindler/lucene/compare/jira/LUCENE-10348...uschindler:draft/random-chains]

(it currently is a branch against LUCENE-10348 because it was not yet merged, 
but it depends on that one)


> Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global 
> integration test and discover classes to check from module system
> 
>
> Key: LUCENE-10352
> URL: https://issues.apache.org/jira/browse/LUCENE-10352
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Major
>
> Currently TestAllAnalyzersHaveFactories and TestRandomChains only work on the 
> analysis-commons module, but e.g. we do not do a random chain with kuromoji 
> and ICU. Also both tests rely on some hacky classpath-inspection and the 
> tests fail if ran on a JAR file.
> This issue tracks progress I am currently doing to refactor this:
> - Move those 2 classes to a new gradle subproject 
> :lucene:analysis:integration.tests and add a module-info referring to all 
> other analysis packages
> - Rewrite the class discovery to use ModuleReader
> - Run TestAllAnalyzersHaveFactories per module (using one module reader), so 
> it discovers all classes and ensures that factory and stream are in same 
> module (there are some core vs. analysis.common discrepancies)
> -

[jira] [Updated] (LUCENE-10352) Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module system

2022-01-04 Thread Uwe Schindler (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-10352:
---
Description: 
Currently TestAllAnalyzersHaveFactories and TestRandomChains only work on the 
analysis-commons module, but e.g. we do not do a random chain with kuromoji and 
ICU. Also both tests rely on some hacky classpath-inspection and the tests fail 
if ran on a JAR file.

This issue tracks progress I am currently doing to refactor this:
- Move those 2 classes to a new gradle subproject 
:lucene:analysis:integration.tests and add a module-info referring to all other 
analysis packages
- Rewrite the class discovery to use ModuleReader
- Run TestAllAnalyzersHaveFactories per module (using one module reader), so it 
discovers all classes and ensures that factory and stream are in same module 
(there are some core vs. analysis.common discrepancies)
- RunTestRandomChains on the whole module graph. The classes are discovered 
from all module readers in the graph (filtering on module name starting with 
"org.apache.lucene.analysis."
- Also compare that the SPI factories returned by discovery match those we have 
in the module graphs

While doing this I disovered some bad things:
- TestRandomChains depends on test-only resources. We may need to replicate 
those (it is about 5 files that are fed into the ctors)
- We have 5 different StringMockResourceLoaders: Originally it was only in 
analysis common, now its everywhere. I will move this class to test-framework. 
This is unrelated but can be done here. The background of this was that 
analysis factories and resource loaders were not part of lucene core, so the 
resourceloader interface couldn't be in test-framework.

  was:
Currently TestAllAnalyzersHaveFactories and TestRandomChains only work on the 
analysis-commons module, but e.g. we do not do a random chain with kuromoji and 
ICU. Also both tests rely on some hacky classpath-inspection and the tests fail 
if ran on a JAR file.

This issue tracks progress I am currently doing to refactor this:
- Move those 2 classes to a new gradle subproject 
:lucene:analysis:integration.tests and add a module-info referring to all other 
analysis packages
- Rewrite the class discovery to use ModuleReader
- Run TestAllAnalyzersHaveFactories per module (using one module reader), so it 
discovers all classes and ensures that factory and stream are in same module 
(there are some core vs. analysis.common discrepancies)
- RunTestRandomChains on the whole module graph. The classes are discovered 
from all module readers in the graph (filtering on module name starting with 
"org.apache.lucene.analysis."
- Also compare that the SPI factories returned by discovery match those we have 
in the module graphs

While doing this I disovered some bad things:
- TestRandomChains depends on test-only resources. We may need to replicate 
those (it is about 5 files that are fed into the ctors)
- We have 5 different StrickMockResourceLoaders: Originally it was only in 
analysis common, now its everywhere. I will move this class to test-framework. 
This is unrelated but can be done here. The background of this was that 
analysis factories and resource loaders were not part of lucene core, so the 
resourceloader interface couldn't be in test-framework.


> Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global 
> integration test and discover classes to check from module system
> 
>
> Key: LUCENE-10352
> URL: https://issues.apache.org/jira/browse/LUCENE-10352
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Major
>
> Currently TestAllAnalyzersHaveFactories and TestRandomChains only work on the 
> analysis-commons module, but e.g. we do not do a random chain with kuromoji 
> and ICU. Also both tests rely on some hacky classpath-inspection and the 
> tests fail if ran on a JAR file.
> This issue tracks progress I am currently doing to refactor this:
> - Move those 2 classes to a new gradle subproject 
> :lucene:analysis:integration.tests and add a module-info referring to all 
> other analysis packages
> - Rewrite the class discovery to use ModuleReader
> - Run TestAllAnalyzersHaveFactories per module (using one module reader), so 
> it discovers all classes and ensures that factory and stream are in same 
> module (there are some core vs. analysis.common discrepancies)
> - RunTestRandomChains on the whole module graph. The classes are discovered 
> from all module readers in the graph (filtering on module name starting with 
> "org.apache.lucene.analysis."
> - Also compare that the SPI factories returned 

[GitHub] [lucene] gf2121 commented on pull request #578: LUCENE-10350: Avoid some null checking for FastTaxonomyFacetCounts#countAll()

2022-01-04 Thread GitBox


gf2121 commented on pull request #578:
URL: https://github.com/apache/lucene/pull/578#issuecomment-1004760012


   Indeed, I am very suprised to see such a big QPS improvement just by 
removing some null check and method call. It is also interesting that 
BrowseMonthTaxoFacets can be much faster then BrowseMonthSSDVFacets now but I 
do think they should have similar QPS... I noticed that the stddev is also 
becoming much higher, so I wonder if there is a amazing optimization in JVM 
that is triggered from time to time after this change? It seems delicious to 
find this out and try to make it stable.
   
   To double check, I rerun a wikimediumall benchmark, getting a similar result 
(higher QPS & higher stddev)
   
   ```
   TaskQPS baseline  StdDevQPS 
my_modified_version  StdDevPct diff p-value
  BrowseMonthSSDVFacets4.67  (9.8%)4.50  
(6.0%)   -3.6% ( -17% -   13%) 0.161
BrowseRandomLabelSSDVFacets3.09  (5.6%)3.03  
(4.6%)   -2.0% ( -11% -8%) 0.211
 HighPhrase   15.15  (3.3%)   14.84  
(3.8%)   -2.0% (  -8% -5%) 0.074
  LowPhrase   37.67  (3.2%)   36.95  
(3.6%)   -1.9% (  -8% -5%) 0.080
LowTerm 1671.58  (4.1%) 1644.34  
(4.8%)   -1.6% ( -10% -7%) 0.245
   HighTerm 1210.15  (5.5%) 1191.05  
(7.5%)   -1.6% ( -13% -   12%) 0.448
  BrowseDayOfYearSSDVFacets4.19  (6.9%)4.12  
(5.5%)   -1.5% ( -13% -   11%) 0.436
LowSloppyPhrase   11.70  (3.4%)   11.52  
(1.7%)   -1.5% (  -6% -3%) 0.069
  MedPhrase   17.15  (2.5%)   16.90  
(3.6%)   -1.5% (  -7% -4%) 0.140
   HighSloppyPhrase8.52  (5.1%)8.40  
(3.2%)   -1.4% (  -9% -7%) 0.287
AndHighMedDayTaxoFacets   45.58  (2.8%)   44.97  
(2.2%)   -1.3% (  -6% -3%) 0.093
   OrNotHighLow  999.69  (3.1%)  987.25  
(3.6%)   -1.2% (  -7% -5%) 0.242
   AndHighHighDayTaxoFacets   11.16  (3.2%)   11.02  
(3.8%)   -1.2% (  -8% -6%) 0.271
 AndHighLow  662.82  (2.5%)  655.64  
(3.0%)   -1.1% (  -6% -4%) 0.212
Respell   41.99  (2.7%)   41.58  
(3.2%)   -1.0% (  -6% -5%) 0.290
   OrHighNotMed 1074.76  (4.8%) 1065.13  
(6.7%)   -0.9% ( -11% -   11%) 0.626
MedSloppyPhrase6.08  (3.4%)6.02  
(2.0%)   -0.9% (  -6% -4%) 0.317
AndHighHigh   30.48  (4.4%)   30.26  
(4.2%)   -0.7% (  -8% -8%) 0.588
  OrHighLow  451.34  (2.9%)  448.36  
(4.2%)   -0.7% (  -7% -6%) 0.562
  OrNotHighHigh  807.68  (4.5%)  803.96  
(5.2%)   -0.5% (  -9% -9%) 0.764
LowIntervalsOrdered   21.59  (5.9%)   21.49  
(5.6%)   -0.5% ( -11% -   11%) 0.804
 AndHighMed   82.97  (3.9%)   82.62  
(4.1%)   -0.4% (  -8% -7%) 0.737
   PKLookup  196.30  (4.4%)  195.53  
(4.0%)   -0.4% (  -8% -8%) 0.769
  OrHighNotHigh  815.15  (4.4%)  812.52  
(5.8%)   -0.3% ( -10% -   10%) 0.843
   OrNotHighMed 1132.97  (3.6%) 1130.84  
(6.7%)   -0.2% ( -10% -   10%) 0.912
MedIntervalsOrdered   18.77  (9.9%)   18.74  
(9.2%)   -0.2% ( -17% -   21%) 0.958
 OrHighHigh   15.86  (5.1%)   15.88  
(3.5%)0.1% (  -8% -9%) 0.916
 Fuzzy1   63.19  (2.6%)   63.33  
(2.8%)0.2% (  -5% -5%) 0.799
   HighIntervalsOrdered4.97 (10.2%)4.98  
(9.5%)0.2% ( -17% -   22%) 0.938
  OrHighMed   43.11  (4.8%)   43.25  
(3.8%)0.3% (  -7% -9%) 0.819
 IntNRQ   47.52 (27.6%)   47.67 
(29.2%)0.3% ( -44% -   79%) 0.971
 Fuzzy2   74.23  (2.7%)   74.48  
(3.3%)0.3% (  -5% -6%) 0.722
LowSpanNear   85.95  (3.7%)   86.34  
(4.5%)0.5% (  -7% -9%) 0.731
 OrHighMedDayTaxoFacets1.91  (6.9%)1.93  
(8.0%)1.1% ( -12% -   17%) 0.655
   OrHighNotLow 1181.50  (3.9%) 1194.56  
(6.5%)1.1% (  -8% -   11%) 0.511
MedTerm 1734.50  (4.6%) 1754.47  
(5.9%)1.2% (  -8% -   12%) 0.4

[GitHub] [lucene] gf2121 removed a comment on pull request #578: LUCENE-10350: Avoid some null checking for FastTaxonomyFacetCounts#countAll()

2022-01-04 Thread GitBox


gf2121 removed a comment on pull request #578:
URL: https://github.com/apache/lucene/pull/578#issuecomment-1004760012


   Indeed, I am very suprised to see such a big QPS improvement just by 
removing some null check and method call. It is also interesting that 
BrowseMonthTaxoFacets can be much faster then BrowseMonthSSDVFacets now but I 
do think they should have similar QPS... I noticed that the stddev is also 
becoming much higher, so I wonder if there is a amazing optimization in JVM 
that is triggered from time to time after this change? It seems delicious to 
find this out and try to make it stable.
   
   To double check, I rerun a wikimediumall benchmark, getting a similar result 
(higher QPS & higher stddev)
   
   ```
   TaskQPS baseline  StdDevQPS 
my_modified_version  StdDevPct diff p-value
  BrowseMonthSSDVFacets4.67  (9.8%)4.50  
(6.0%)   -3.6% ( -17% -   13%) 0.161
BrowseRandomLabelSSDVFacets3.09  (5.6%)3.03  
(4.6%)   -2.0% ( -11% -8%) 0.211
 HighPhrase   15.15  (3.3%)   14.84  
(3.8%)   -2.0% (  -8% -5%) 0.074
  LowPhrase   37.67  (3.2%)   36.95  
(3.6%)   -1.9% (  -8% -5%) 0.080
LowTerm 1671.58  (4.1%) 1644.34  
(4.8%)   -1.6% ( -10% -7%) 0.245
   HighTerm 1210.15  (5.5%) 1191.05  
(7.5%)   -1.6% ( -13% -   12%) 0.448
  BrowseDayOfYearSSDVFacets4.19  (6.9%)4.12  
(5.5%)   -1.5% ( -13% -   11%) 0.436
LowSloppyPhrase   11.70  (3.4%)   11.52  
(1.7%)   -1.5% (  -6% -3%) 0.069
  MedPhrase   17.15  (2.5%)   16.90  
(3.6%)   -1.5% (  -7% -4%) 0.140
   HighSloppyPhrase8.52  (5.1%)8.40  
(3.2%)   -1.4% (  -9% -7%) 0.287
AndHighMedDayTaxoFacets   45.58  (2.8%)   44.97  
(2.2%)   -1.3% (  -6% -3%) 0.093
   OrNotHighLow  999.69  (3.1%)  987.25  
(3.6%)   -1.2% (  -7% -5%) 0.242
   AndHighHighDayTaxoFacets   11.16  (3.2%)   11.02  
(3.8%)   -1.2% (  -8% -6%) 0.271
 AndHighLow  662.82  (2.5%)  655.64  
(3.0%)   -1.1% (  -6% -4%) 0.212
Respell   41.99  (2.7%)   41.58  
(3.2%)   -1.0% (  -6% -5%) 0.290
   OrHighNotMed 1074.76  (4.8%) 1065.13  
(6.7%)   -0.9% ( -11% -   11%) 0.626
MedSloppyPhrase6.08  (3.4%)6.02  
(2.0%)   -0.9% (  -6% -4%) 0.317
AndHighHigh   30.48  (4.4%)   30.26  
(4.2%)   -0.7% (  -8% -8%) 0.588
  OrHighLow  451.34  (2.9%)  448.36  
(4.2%)   -0.7% (  -7% -6%) 0.562
  OrNotHighHigh  807.68  (4.5%)  803.96  
(5.2%)   -0.5% (  -9% -9%) 0.764
LowIntervalsOrdered   21.59  (5.9%)   21.49  
(5.6%)   -0.5% ( -11% -   11%) 0.804
 AndHighMed   82.97  (3.9%)   82.62  
(4.1%)   -0.4% (  -8% -7%) 0.737
   PKLookup  196.30  (4.4%)  195.53  
(4.0%)   -0.4% (  -8% -8%) 0.769
  OrHighNotHigh  815.15  (4.4%)  812.52  
(5.8%)   -0.3% ( -10% -   10%) 0.843
   OrNotHighMed 1132.97  (3.6%) 1130.84  
(6.7%)   -0.2% ( -10% -   10%) 0.912
MedIntervalsOrdered   18.77  (9.9%)   18.74  
(9.2%)   -0.2% ( -17% -   21%) 0.958
 OrHighHigh   15.86  (5.1%)   15.88  
(3.5%)0.1% (  -8% -9%) 0.916
 Fuzzy1   63.19  (2.6%)   63.33  
(2.8%)0.2% (  -5% -5%) 0.799
   HighIntervalsOrdered4.97 (10.2%)4.98  
(9.5%)0.2% ( -17% -   22%) 0.938
  OrHighMed   43.11  (4.8%)   43.25  
(3.8%)0.3% (  -7% -9%) 0.819
 IntNRQ   47.52 (27.6%)   47.67 
(29.2%)0.3% ( -44% -   79%) 0.971
 Fuzzy2   74.23  (2.7%)   74.48  
(3.3%)0.3% (  -5% -6%) 0.722
LowSpanNear   85.95  (3.7%)   86.34  
(4.5%)0.5% (  -7% -9%) 0.731
 OrHighMedDayTaxoFacets1.91  (6.9%)1.93  
(8.0%)1.1% ( -12% -   17%) 0.655
   OrHighNotLow 1181.50  (3.9%) 1194.56  
(6.5%)1.1% (  -8% -   11%) 0.511
MedTerm 1734.50  (4.6%) 1754.47  
(5.9%)1.2% (  -8% -   

[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2022-01-04 Thread Tobias Ibounig (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468610#comment-17468610
 ] 

Tobias Ibounig commented on LUCENE-8739:


Ok this all sounds very good.

Just one more thing for further trade off considerations:
ZSTD also supports negative compression levels (but I don't know how those are 
exposed in JNI library), [see benchmark table|https://github.com/facebook/zstd].
So level=-1 could be another consideration to get closer to LZ4 Retrieval Speed 
for BEST_SPEED.

> ZSTD Compressor support in Lucene
> -
>
> Key: LUCENE-8739
> URL: https://issues.apache.org/jira/browse/LUCENE-8739
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/codecs
>Reporter: Sean Torres
>Priority: Minor
>  Labels: features
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> ZStandard has a great speed and compression ratio tradeoff. 
> ZStandard is open source compression from Facebook.
> More about ZSTD
> [https://github.com/facebook/zstd]
> [https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler merged pull request #581: LUCENE-10348: Make stopwords resources from analyzers modules visible to ClasspathResourceLoader and ModuleResourceLoader

2022-01-04 Thread GitBox


uschindler merged pull request #581:
URL: https://github.com/apache/lucene/pull/581


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10348) Make stopwords resources from analyzers modules visible to ClasspathResourceLoader and ModuleResourceLoader

2022-01-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468644#comment-17468644
 ] 

ASF subversion and git services commented on LUCENE-10348:
--

Commit 4bacf93c7ee925fe42bde52af801ed6f494afdd4 in lucene's branch 
refs/heads/main from Uwe Schindler
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=4bacf93 ]

LUCENE-10348: Make stopwords resources from analyzers modules visible to 
ClasspathResourceLoader and ModuleResourceLoader (#581)



> Make stopwords resources from analyzers modules visible to 
> ClasspathResourceLoader and ModuleResourceLoader
> ---
>
> Key: LUCENE-10348
> URL: https://issues.apache.org/jira/browse/LUCENE-10348
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/other
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This is a followup after LUCENE-10335:
> In LUCENE-10335 we fixed all Analyzers static constructors to not rely on the 
> analysis modules to export stopwords files. Nevertheless, it is currently 
> impossible to use CustomAnalyzer and refer to stopwords files in the 
> analyzers module, because those are hidden, [~rmuir] showed some examples of 
> how this is uesed. This would also make it impossible for Solr to reuse the 
> resources files!
> Basically we should make the stopwords files (and other resources) visible. 
> We have to options:
> - open all packages with resources unconditionally
> - open all packages with resources only to module "org.apache.lucene.core"
> I prefer the second one. The problem with Java's module system is the 
> following: If you open a package, it is also open for (deep) reflection, 
> which we don't want. So my proposal is to only open those packages only to 
> lucene core, so the ModuleResourceLoader and ClasspathResourceLoader can see 
> those resources (and Solr would be able to delegate to these resource loaders 
> if needed).
> I will adds a test in the distribution.tests package that verifies that 
> resources in all analysis packages are open.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10348) Make stopwords resources from analyzers modules visible to ClasspathResourceLoader and ModuleResourceLoader

2022-01-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468647#comment-17468647
 ] 

ASF subversion and git services commented on LUCENE-10348:
--

Commit d17c54b5ff56ffadd8692e45629215eaf479d8e3 in lucene's branch 
refs/heads/branch_9x from Uwe Schindler
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=d17c54b ]

LUCENE-10348: Make stopwords resources from analyzers modules visible to 
ClasspathResourceLoader and ModuleResourceLoader (#581)



> Make stopwords resources from analyzers modules visible to 
> ClasspathResourceLoader and ModuleResourceLoader
> ---
>
> Key: LUCENE-10348
> URL: https://issues.apache.org/jira/browse/LUCENE-10348
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/other
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This is a followup after LUCENE-10335:
> In LUCENE-10335 we fixed all Analyzers static constructors to not rely on the 
> analysis modules to export stopwords files. Nevertheless, it is currently 
> impossible to use CustomAnalyzer and refer to stopwords files in the 
> analyzers module, because those are hidden, [~rmuir] showed some examples of 
> how this is uesed. This would also make it impossible for Solr to reuse the 
> resources files!
> Basically we should make the stopwords files (and other resources) visible. 
> We have to options:
> - open all packages with resources unconditionally
> - open all packages with resources only to module "org.apache.lucene.core"
> I prefer the second one. The problem with Java's module system is the 
> following: If you open a package, it is also open for (deep) reflection, 
> which we don't want. So my proposal is to only open those packages only to 
> lucene core, so the ModuleResourceLoader and ClasspathResourceLoader can see 
> those resources (and Solr would be able to delegate to these resource loaders 
> if needed).
> I will adds a test in the distribution.tests package that verifies that 
> resources in all analysis packages are open.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler opened a new pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from mo

2022-01-04 Thread GitBox


uschindler opened a new pull request #582:
URL: https://github.com/apache/lucene/pull/582


   See: https://issues.apache.org/jira/browse/LUCENE-10352


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a change in pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to c

2022-01-04 Thread GitBox


uschindler commented on a change in pull request #582:
URL: https://github.com/apache/lucene/pull/582#discussion_r778117015



##
File path: gradle/java/modules.gradle
##
@@ -214,7 +214,7 @@ allprojects {
 }
 
 // Configure (tasks.test, sourceSets.test)
-tasks.matching { it.name == "test" }.all { Test task ->
+tasks.matching { it.name ==~ /test(_[0-9]+)?/ }.all { Test task ->

Review comment:
   Hi @dweiss,
   this was needed to make the `gradlew beast` working. I just patched this 
here. If you have a better way to fix this (maybe select tests based on type 
not name) just commit to main.
   It may also be that you have rewritten it anyways in your PR.
   This is just to inform you that something was missing.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a change in pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to c

2022-01-04 Thread GitBox


uschindler commented on a change in pull request #582:
URL: https://github.com/apache/lucene/pull/582#discussion_r778121113



##
File path: 
lucene/analysis/integration.tests/src/test/org/apache/lucene/analysis/tests/da_UTF8.xml
##
@@ -0,0 +1,1208 @@
+

Review comment:
   Tis file was copied from common's tests. Maybe we can use a more 
simplified one




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a change in pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to c

2022-01-04 Thread GitBox


uschindler commented on a change in pull request #582:
URL: https://github.com/apache/lucene/pull/582#discussion_r778121113



##
File path: 
lucene/analysis/integration.tests/src/test/org/apache/lucene/analysis/tests/da_UTF8.xml
##
@@ -0,0 +1,1208 @@
+

Review comment:
   This file was copied from common's tests. Maybe we can use a more 
simplified one.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a change in pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to c

2022-01-04 Thread GitBox


uschindler commented on a change in pull request #582:
URL: https://github.com/apache/lucene/pull/582#discussion_r778121625



##
File path: 
lucene/analysis/integration.tests/src/test/org/apache/lucene/analysis/tests/simple.aff
##
@@ -0,0 +1,20 @@
+SET UTF-8

Review comment:
   This file was copied from common's tests.

##
File path: 
lucene/analysis/integration.tests/src/test/org/apache/lucene/analysis/tests/simple.dic
##
@@ -0,0 +1,11 @@
+9

Review comment:
   This file was copied from common's tests.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-10348) Make stopwords resources from analyzers modules visible to ClasspathResourceLoader and ModuleResourceLoader

2022-01-04 Thread Uwe Schindler (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-10348.

Fix Version/s: 9.1
   10.0 (main)
   Resolution: Fixed

> Make stopwords resources from analyzers modules visible to 
> ClasspathResourceLoader and ModuleResourceLoader
> ---
>
> Key: LUCENE-10348
> URL: https://issues.apache.org/jira/browse/LUCENE-10348
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/other
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Major
> Fix For: 9.1, 10.0 (main)
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This is a followup after LUCENE-10335:
> In LUCENE-10335 we fixed all Analyzers static constructors to not rely on the 
> analysis modules to export stopwords files. Nevertheless, it is currently 
> impossible to use CustomAnalyzer and refer to stopwords files in the 
> analyzers module, because those are hidden, [~rmuir] showed some examples of 
> how this is uesed. This would also make it impossible for Solr to reuse the 
> resources files!
> Basically we should make the stopwords files (and other resources) visible. 
> We have to options:
> - open all packages with resources unconditionally
> - open all packages with resources only to module "org.apache.lucene.core"
> I prefer the second one. The problem with Java's module system is the 
> following: If you open a package, it is also open for (deep) reflection, 
> which we don't want. So my proposal is to only open those packages only to 
> lucene core, so the ModuleResourceLoader and ClasspathResourceLoader can see 
> those resources (and Solr would be able to delegate to these resource loaders 
> if needed).
> I will adds a test in the distribution.tests package that verifies that 
> resources in all analysis packages are open.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10352) Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module system

2022-01-04 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468664#comment-17468664
 ] 

Uwe Schindler commented on LUCENE-10352:


Here is now a first PR: https://github.com/apache/lucene/pull/582

- It compiles and I only had to copy 3 test resource files. We may simplify 
them, so the modular integration test can use a limited set of resources.
- The StringMockResourceLoader was deduplicated and moved to test-framework
- A unused class inside TestRandomChains used only by TestBugInSomething was 
moved
- The available snowball stemmers are also discovered from module system with 
same code. This spares to copy the stemmers.txt file (which is autogenerated 
and I did not want to change this)

What's broken:
- A few components have no factory. The problem was that tests were only 
running in common, not all modules
- We need more argument providers for TestRandomChains, e.g. for ICU or 
Morphologic Ctors.

To do a quick test, you can restrict the module discovery to only read 
org.apache.lucene.core and org.apache.lucene.analysis.common. I marked a place 
to comment in/out code

> Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global 
> integration test and discover classes to check from module system
> 
>
> Key: LUCENE-10352
> URL: https://issues.apache.org/jira/browse/LUCENE-10352
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently TestAllAnalyzersHaveFactories and TestRandomChains only work on the 
> analysis-commons module, but e.g. we do not do a random chain with kuromoji 
> and ICU. Also both tests rely on some hacky classpath-inspection and the 
> tests fail if ran on a JAR file.
> This issue tracks progress I am currently doing to refactor this:
> - Move those 2 classes to a new gradle subproject 
> :lucene:analysis:integration.tests and add a module-info referring to all 
> other analysis packages
> - Rewrite the class discovery to use ModuleReader
> - Run TestAllAnalyzersHaveFactories per module (using one module reader), so 
> it discovers all classes and ensures that factory and stream are in same 
> module (there are some core vs. analysis.common discrepancies)
> - RunTestRandomChains on the whole module graph. The classes are discovered 
> from all module readers in the graph (filtering on module name starting with 
> "org.apache.lucene.analysis."
> - Also compare that the SPI factories returned by discovery match those we 
> have in the module graphs
> While doing this I disovered some bad things:
> - TestRandomChains depends on test-only resources. We may need to replicate 
> those (it is about 5 files that are fed into the ctors)
> - We have 5 different StringMockResourceLoaders: Originally it was only in 
> analysis common, now its everywhere. I will move this class to 
> test-framework. This is unrelated but can be done here. The background of 
> this was that analysis factories and resource loaders were not part of lucene 
> core, so the resourceloader interface couldn't be in test-framework.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a change in pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to c

2022-01-04 Thread GitBox


uschindler commented on a change in pull request #582:
URL: https://github.com/apache/lucene/pull/582#discussion_r778140193



##
File path: 
lucene/analysis/integration.tests/src/test/org/apache/lucene/analysis/tests/ModuleClassDiscovery.java
##
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.analysis.tests;
+
+import java.io.IOException;
+import java.lang.module.ResolvedModule;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.SortedMap;
+import java.util.TreeMap;
+import org.apache.lucene.tests.util.LuceneTestCase;
+import org.junit.Assert;
+
+/** Discovers all classes from the module graph and loads them (without 
initialization) */
+abstract class ModuleClassDiscovery {
+
+  private static final Module THIS_MODULE = 
ModuleClassDiscovery.class.getModule();
+  private static final ModuleLayer LAYER = THIS_MODULE.getLayer();
+  private static final SortedMap ALL_ANALYSIS_MODULES;
+
+  static {
+Assert.assertTrue(
+"Analysis integration tests must run in Java Module System as named 
module",
+THIS_MODULE.isNamed());
+Assert.assertNotNull("Module layer is missing", LAYER);
+
+var mods = new TreeMap();
+discoverAnalysisModules(LAYER, mods);
+ALL_ANALYSIS_MODULES = Collections.unmodifiableSortedMap(mods);
+if (LuceneTestCase.VERBOSE) {
+  System.out.println(
+  "Discovered the following analysis modules: " + 
ALL_ANALYSIS_MODULES.keySet());
+}
+  }
+
+  private static void discoverAnalysisModules(
+  ModuleLayer layer, Map result) {
+for (var mod : layer.configuration().modules()) {
+  String name = mod.name();
+  if (Objects.equals(name, THIS_MODULE.getName())) {
+continue;
+  }
+  if (name.equals("org.apache.lucene.core") || 
name.equals("org.apache.lucene.analysis.common")) {
+  //nocommit (enable all modules): if 
(name.equals("org.apache.lucene.core") || 
name.startsWith("org.apache.lucene.analysis.")) {

Review comment:
   enable all modules here!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-10328) Module path for compiling and running tests is wrong

2022-01-04 Thread Dawid Weiss (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-10328:
-
Attachment: image-2022-01-04-16-04-56-563.png

> Module path for compiling and running tests is wrong
> 
>
> Key: LUCENE-10328
> URL: https://issues.apache.org/jira/browse/LUCENE-10328
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Dawid Weiss
>Priority: Major
> Attachments: image-2021-12-19-12-29-21-737.png, 
> image-2022-01-04-16-04-56-563.png
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Uwe noticed that the module path for compiling and running tests is empty - 
> indeed, the modular configurations we create for the test sourceset do not 
> inherit from their main counterparts. This is not a standard thing created 
> for a sourceset - the test-main connection link is created by gradle's java 
> plugin. We need to do a similar thing for modular configurations.
> !image-2021-12-19-12-29-21-737.png|width=490,height=280!
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10328) Module path for compiling and running tests is wrong

2022-01-04 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468676#comment-17468676
 ] 

Dawid Weiss commented on LUCENE-10328:
--

I've been working on this for too long and I am nowhere nearer a perfect 
solution so I think we should just merge what's currently done on PR 571. Here 
is what is currently implemented:

!image-2022-01-04-16-04-56-563.png|width=589,height=532!

The diagram above shows two of gradle's main source sets: the main source set 
(a project's "jar") and the test source set (test classes). In theory (and 
often in practice) there can be more source sets but let's limit ourselves to 
just this.

Each source set can be class-based or modular (containing a top-level 
module-info.java). On an intersection of the matching option you can see how 
this particular arrangement of the main and test source set is currently 
resolved for compilation and runtime tasks. In short: we do have full support 
for either full classpath mode testing (right lower corner) or full modular 
tests (upper left corner). A mix of the two requires module patching magic that 
I simply don't believe the infrastructure (or people) are ready for. I couldn't 
get IDEs to work with those options, I couldn't get other tasks and 
dependencies like ECJ to parse them gracefully. I am exhausted looking through 
various options and, in all honesty, I don't think this bit is so much needed. 
Let me explain.

The current way tests work is fine in my opinion. They work in classpath mode 
(so they can access package-scope, etc.) with the main source set also working 
in "classpath mode". This is what majority of downstream Lucene consumers will 
be using for a long time in the future (my guess) so leaving tests there seems 
like a reasonable option.

For any tests requiring a full modular setup, we also have a working 
infrastructure - you have to create a separate subproject with a suffix 
".tests", then create a fully modular test source set referencing the module 
you're interested in testing. Examples are provided in the form of 
"lucene/core.tests" and "lucene/analysis/morfologik.tests" - these are fully 
fledged modular tests, running on of the test-framework (module, yay!).

I realize this is a bit inconvenient but if you want the upper right corner of 
the diagram to work I have to warn you I've tried all the options below and 
none of them worked well (again: IDE support, various other build tasks):
 * main source set patching,
 * test source set patching,
 * combining main+test into a separate artifact,
 * plus perhaps a gazillion other try-reverse attempts I made.

 

> Module path for compiling and running tests is wrong
> 
>
> Key: LUCENE-10328
> URL: https://issues.apache.org/jira/browse/LUCENE-10328
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Dawid Weiss
>Priority: Major
> Attachments: image-2021-12-19-12-29-21-737.png, 
> image-2022-01-04-16-04-56-563.png
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Uwe noticed that the module path for compiling and running tests is empty - 
> indeed, the modular configurations we create for the test sourceset do not 
> inherit from their main counterparts. This is not a standard thing created 
> for a sourceset - the test-main connection link is created by gradle's java 
> plugin. We need to do a similar thing for modular configurations.
> !image-2021-12-19-12-29-21-737.png|width=490,height=280!
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module

2022-01-04 Thread GitBox


rmuir commented on pull request #582:
URL: https://github.com/apache/lucene/pull/582#issuecomment-1004897909


   This is gonna be fun :) Thanks for starting this! It is almost guaranteed to 
find some bugs. Previously only `analyzers-common` got this level of testing. I 
will help wrestle these tests some this evening.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] dweiss commented on pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module

2022-01-04 Thread GitBox


dweiss commented on pull request #582:
URL: https://github.com/apache/lucene/pull/582#issuecomment-1004900949


   @uschindler - can you take a look at #571? Or we can merge that branch here 
too. I've converted non-distribution .tests projects there to utilize the test 
framework and generally improved the handling of modular paths (even though the 
module patching defeated me).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mayya-sharipova commented on a change in pull request #580: LUCENE-10351 Correct knn search failure with deleted docs

2022-01-04 Thread GitBox


mayya-sharipova commented on a change in pull request #580:
URL: https://github.com/apache/lucene/pull/580#discussion_r778170401



##
File path: 
lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseKnnVectorsFormatTestCase.java
##
@@ -559,6 +561,12 @@ public void testDeleteAllVectorDocs() throws Exception {
 VectorValues values = getOnlyLeafReader(r).getVectorValues("v");
 assertNotNull(values);
 assertEquals(0, values.size());
+
+// assert that knn search doesn't fail on a field with all deleted docs
+IndexSearcher searcher = newSearcher(r);

Review comment:
   @jtibshirani Thanks for the review. I've adjusted the test in 
b4dfe087bf4a2f8686c1d90f1f658389b94f3f1f




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] dweiss commented on a change in pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check

2022-01-04 Thread GitBox


dweiss commented on a change in pull request #582:
URL: https://github.com/apache/lucene/pull/582#discussion_r778171354



##
File path: lucene/analysis/integration.tests/src/test/module-info.java
##
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * Test module for global integration tests of all {@code 
org.apache.lucene.analysis}
+ * packages/modules.
+ */
+@SuppressWarnings({"requires-automatic"})
+module org.apache.lucene.analysis.tests {
+  requires java.xml;
+  requires org.apache.lucene.core;
+  requires org.apache.lucene.analysis.common;
+  requires org.apache.lucene.analysis.icu;
+  requires org.apache.lucene.analysis.kuromoji;
+  requires org.apache.lucene.analysis.morfologik;
+  requires org.apache.lucene.analysis.nori;
+  requires org.apache.lucene.analysis.opennlp;
+  requires org.apache.lucene.analysis.phonetic;
+  requires org.apache.lucene.analysis.smartcn;
+  requires org.apache.lucene.analysis.stempel;
+  requires org.apache.lucene.test_framework;
+  requires junit;

Review comment:
   Nice! I made the test_framework expose this as a transitive module (and 
randomized testing). It's the API layer of that module and makes other 
descriptors simpler.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a change in pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to c

2022-01-04 Thread GitBox


uschindler commented on a change in pull request #582:
URL: https://github.com/apache/lucene/pull/582#discussion_r778140193



##
File path: 
lucene/analysis/integration.tests/src/test/org/apache/lucene/analysis/tests/ModuleClassDiscovery.java
##
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.analysis.tests;
+
+import java.io.IOException;
+import java.lang.module.ResolvedModule;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.SortedMap;
+import java.util.TreeMap;
+import org.apache.lucene.tests.util.LuceneTestCase;
+import org.junit.Assert;
+
+/** Discovers all classes from the module graph and loads them (without 
initialization) */
+abstract class ModuleClassDiscovery {
+
+  private static final Module THIS_MODULE = 
ModuleClassDiscovery.class.getModule();
+  private static final ModuleLayer LAYER = THIS_MODULE.getLayer();
+  private static final SortedMap ALL_ANALYSIS_MODULES;
+
+  static {
+Assert.assertTrue(
+"Analysis integration tests must run in Java Module System as named 
module",
+THIS_MODULE.isNamed());
+Assert.assertNotNull("Module layer is missing", LAYER);
+
+var mods = new TreeMap();
+discoverAnalysisModules(LAYER, mods);
+ALL_ANALYSIS_MODULES = Collections.unmodifiableSortedMap(mods);
+if (LuceneTestCase.VERBOSE) {
+  System.out.println(
+  "Discovered the following analysis modules: " + 
ALL_ANALYSIS_MODULES.keySet());
+}
+  }
+
+  private static void discoverAnalysisModules(
+  ModuleLayer layer, Map result) {
+for (var mod : layer.configuration().modules()) {
+  String name = mod.name();
+  if (Objects.equals(name, THIS_MODULE.getName())) {
+continue;
+  }
+  if (name.equals("org.apache.lucene.core") || 
name.equals("org.apache.lucene.analysis.common")) {
+  //nocommit (enable all modules): if 
(name.equals("org.apache.lucene.core") || 
name.startsWith("org.apache.lucene.analysis.")) {

Review comment:
   enable all modules here!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a change in pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to c

2022-01-04 Thread GitBox


uschindler commented on a change in pull request #582:
URL: https://github.com/apache/lucene/pull/582#discussion_r778172990



##
File path: lucene/analysis/integration.tests/src/test/module-info.java
##
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * Test module for global integration tests of all {@code 
org.apache.lucene.analysis}
+ * packages/modules.
+ */
+@SuppressWarnings({"requires-automatic"})
+module org.apache.lucene.analysis.tests {
+  requires java.xml;
+  requires org.apache.lucene.core;
+  requires org.apache.lucene.analysis.common;
+  requires org.apache.lucene.analysis.icu;
+  requires org.apache.lucene.analysis.kuromoji;
+  requires org.apache.lucene.analysis.morfologik;
+  requires org.apache.lucene.analysis.nori;
+  requires org.apache.lucene.analysis.opennlp;
+  requires org.apache.lucene.analysis.phonetic;
+  requires org.apache.lucene.analysis.smartcn;
+  requires org.apache.lucene.analysis.stempel;
+  requires org.apache.lucene.test_framework;
+  requires junit;

Review comment:
   Ok I wanted to cleanup the requires clauses at end. We may need to add 
more transitive stuff, especially when we expose APIs or required modules, 
otherwise we will get "exports" warnings.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from mo

2022-01-04 Thread GitBox


uschindler commented on pull request #582:
URL: https://github.com/apache/lucene/pull/582#issuecomment-1004909845


   > @uschindler - can you take a look at #571? Or we can merge that branch 
here too. I've converted non-distribution .tests projects there to utilize the 
test framework and generally improved the handling of modular paths (even 
though the module patching defeated me).
   
   I will check later, I am a bit busy with argument providers now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] dweiss commented on a change in pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check

2022-01-04 Thread GitBox


dweiss commented on a change in pull request #582:
URL: https://github.com/apache/lucene/pull/582#discussion_r778173223



##
File path: 
lucene/analysis/integration.tests/src/test/org/apache/lucene/analysis/tests/ModuleClassDiscovery.java
##
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.analysis.tests;
+
+import java.io.IOException;
+import java.lang.module.ResolvedModule;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.SortedMap;
+import java.util.TreeMap;
+import org.apache.lucene.tests.util.LuceneTestCase;
+import org.junit.Assert;
+
+/** Discovers all classes from the module graph and loads them (without 
initialization) */
+abstract class ModuleClassDiscovery {
+
+  private static final Module THIS_MODULE = 
ModuleClassDiscovery.class.getModule();
+  private static final ModuleLayer LAYER = THIS_MODULE.getLayer();
+  private static final SortedMap ALL_ANALYSIS_MODULES;
+
+  static {

Review comment:
   it'd be great if this wasn't a static final but a lazily-computed static 
- static finals are, from test framework point of view, sort of evil because 
they cannot be ordered reasonably with respect to test execution lifecycle. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] dweiss commented on a change in pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check

2022-01-04 Thread GitBox


dweiss commented on a change in pull request #582:
URL: https://github.com/apache/lucene/pull/582#discussion_r778176703



##
File path: lucene/analysis/integration.tests/src/test/module-info.java
##
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * Test module for global integration tests of all {@code 
org.apache.lucene.analysis}
+ * packages/modules.
+ */
+@SuppressWarnings({"requires-automatic"})
+module org.apache.lucene.analysis.tests {
+  requires java.xml;
+  requires org.apache.lucene.core;
+  requires org.apache.lucene.analysis.common;
+  requires org.apache.lucene.analysis.icu;
+  requires org.apache.lucene.analysis.kuromoji;
+  requires org.apache.lucene.analysis.morfologik;
+  requires org.apache.lucene.analysis.nori;
+  requires org.apache.lucene.analysis.opennlp;
+  requires org.apache.lucene.analysis.phonetic;
+  requires org.apache.lucene.analysis.smartcn;
+  requires org.apache.lucene.analysis.stempel;
+  requires org.apache.lucene.test_framework;
+  requires junit;

Review comment:
   I needed that particular one so I piggybacked it on the PR. There is 
LUCENE-10323 to tackle the entire set of module descriptors.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a change in pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to c

2022-01-04 Thread GitBox


uschindler commented on a change in pull request #582:
URL: https://github.com/apache/lucene/pull/582#discussion_r778176445



##
File path: lucene/analysis/integration.tests/src/test/module-info.java
##
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * Test module for global integration tests of all {@code 
org.apache.lucene.analysis}
+ * packages/modules.
+ */
+@SuppressWarnings({"requires-automatic"})
+module org.apache.lucene.analysis.tests {
+  requires java.xml;
+  requires org.apache.lucene.core;
+  requires org.apache.lucene.analysis.common;
+  requires org.apache.lucene.analysis.icu;
+  requires org.apache.lucene.analysis.kuromoji;
+  requires org.apache.lucene.analysis.morfologik;
+  requires org.apache.lucene.analysis.nori;
+  requires org.apache.lucene.analysis.opennlp;
+  requires org.apache.lucene.analysis.phonetic;
+  requires org.apache.lucene.analysis.smartcn;
+  requires org.apache.lucene.analysis.stempel;
+  requires org.apache.lucene.test_framework;
+  requires junit;

Review comment:
   I will wait for the other branch to be merged and remove the requires 
here. If I see other places that needs transitive, i will fix. E.g. ICU may 
need to be transitive.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a change in pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to c

2022-01-04 Thread GitBox


uschindler commented on a change in pull request #582:
URL: https://github.com/apache/lucene/pull/582#discussion_r778176916



##
File path: lucene/analysis/integration.tests/src/test/module-info.java
##
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * Test module for global integration tests of all {@code 
org.apache.lucene.analysis}
+ * packages/modules.
+ */
+@SuppressWarnings({"requires-automatic"})
+module org.apache.lucene.analysis.tests {
+  requires java.xml;
+  requires org.apache.lucene.core;
+  requires org.apache.lucene.analysis.common;
+  requires org.apache.lucene.analysis.icu;
+  requires org.apache.lucene.analysis.kuromoji;
+  requires org.apache.lucene.analysis.morfologik;
+  requires org.apache.lucene.analysis.nori;
+  requires org.apache.lucene.analysis.opennlp;
+  requires org.apache.lucene.analysis.phonetic;
+  requires org.apache.lucene.analysis.smartcn;
+  requires org.apache.lucene.analysis.stempel;
+  requires org.apache.lucene.test_framework;
+  requires junit;

Review comment:
   At least this "integration test" shows very well what is missing in our 
modules.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] (LUCENE-10344) ECJ batch compiler does not support --patch-module at all

2022-01-04 Thread Dawid Weiss (Jira)


[ https://issues.apache.org/jira/browse/LUCENE-10344 ]


Dawid Weiss deleted comment on LUCENE-10344:
--

was (Author: JIRAUSER282805):
My account is hacked. I'm not sure what is going on.




> ECJ batch compiler does not support --patch-module at all
> -
>
> Key: LUCENE-10344
> URL: https://issues.apache.org/jira/browse/LUCENE-10344
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Dawid Weiss
>Priority: Minor
>
> There is no way to use module patching with ecj. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10344) ECJ batch compiler does not support --patch-module at all

2022-01-04 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468698#comment-17468698
 ] 

Dawid Weiss commented on LUCENE-10344:
--

Sorry, I just noticed your comment. There is no easy way to untie all the 
relationships between tasks, gradle configurations and sourcesets - so if you 
modify one, it typically entails all others. I gave up on module patching - 
couldn't get it to work on many other fronts as well.

> ECJ batch compiler does not support --patch-module at all
> -
>
> Key: LUCENE-10344
> URL: https://issues.apache.org/jira/browse/LUCENE-10344
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Dawid Weiss
>Priority: Minor
>
> There is no way to use module patching with ecj. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-10344) ECJ batch compiler does not support --patch-module at all

2022-01-04 Thread Dawid Weiss (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved LUCENE-10344.
--
Resolution: Won't Fix

> ECJ batch compiler does not support --patch-module at all
> -
>
> Key: LUCENE-10344
> URL: https://issues.apache.org/jira/browse/LUCENE-10344
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Dawid Weiss
>Priority: Minor
>
> There is no way to use module patching with ecj. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mayya-sharipova commented on a change in pull request #580: LUCENE-10351 Correct knn search failure with deleted docs

2022-01-04 Thread GitBox


mayya-sharipova commented on a change in pull request #580:
URL: https://github.com/apache/lucene/pull/580#discussion_r778185295



##
File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorsReader.java
##
@@ -239,6 +239,9 @@ public TopDocs search(String field, float[] target, int k, 
Bits acceptDocs) thro
 if (fieldEntry == null || fieldEntry.dimension == 0) {
   return null;
 }
+if (fieldEntry.size() == 0) {

Review comment:
   @jpountz Thanks for the review. 
   
   > the above code that checks if (fieldEntry == null || fieldEntry.dimension 
== 0) is dead code
   
   I've removed this dead code in f380e54fea020dd4502d36043d33d09996f72277




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-10353) Add null injection to analyzer integration tests (e.g. TestRandomChains)

2022-01-04 Thread Robert Muir (Jira)
Robert Muir created LUCENE-10353:


 Summary: Add null injection to analyzer integration tests (e.g. 
TestRandomChains)
 Key: LUCENE-10353
 URL: https://issues.apache.org/jira/browse/LUCENE-10353
 Project: Lucene - Core
  Issue Type: Task
Reporter: Robert Muir


These tests inject random parameter values (from argumentProviders). Some 
generated values may be illegal and IllegalArgumentException is "allowed" if 
the constructor returns it. None of the values should cause failures at runtime.

But for object types, we never inject null values (unless the argumentProvider 
were to do it itself). We should do this some low % of the time, and "allow" 
ctors to return NPE too.

I see bugs in some of the analyzers where they are just a missing null check in 
the constructor. It is important to fail on invalid configuration up-front in 
the ctor, rather than failing e.g. at index time.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-10354) Clarify contract of codec APIs with missing/disabled fields

2022-01-04 Thread Adrien Grand (Jira)
Adrien Grand created LUCENE-10354:
-

 Summary: Clarify contract of codec APIs with missing/disabled 
fields
 Key: LUCENE-10354
 URL: https://issues.apache.org/jira/browse/LUCENE-10354
 Project: Lucene - Core
  Issue Type: Task
Reporter: Adrien Grand


The question has come up  a few times of how codec APIs should react to fields 
that are missing or do not have the relevant feature enabled.

This issue proposes that we improve javadocs and AssertingCodec following the 
same model as doc values and norms:
 - The behavior of codec APIs on fields that are missing or don't have the 
feature enabled is undefined.
 - CodecReader is responsible for checking FieldInfos before delegating to 
codec APIs.
 - AssertingCodec ensures that we never call codec APIs on missing/disabled 
fields.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jpountz opened a new pull request #583: LUCENE-10354: Clarify contract of codec APIs with missing/disabled fields.

2022-01-04 Thread GitBox


jpountz opened a new pull request #583:
URL: https://github.com/apache/lucene/pull/583


   See https://issues.apache.org/jira/browse/LUCENE-10354


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jpountz commented on a change in pull request #580: LUCENE-10351 Correct knn search failure with deleted docs

2022-01-04 Thread GitBox


jpountz commented on a change in pull request #580:
URL: https://github.com/apache/lucene/pull/580#discussion_r778216878



##
File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorsReader.java
##
@@ -239,6 +239,9 @@ public TopDocs search(String field, float[] target, int k, 
Bits acceptDocs) thro
 if (fieldEntry == null || fieldEntry.dimension == 0) {
   return null;
 }
+if (fieldEntry.size() == 0) {

Review comment:
   Thanks Mayya. I opened an issue to clarify these contracts. 
https://issues.apache.org/jira/browse/LUCENE-10354




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-10355) Remove EMPTY LongValues in favor of LongValues#ZERO

2022-01-04 Thread Feng Guo (Jira)
Feng Guo created LUCENE-10355:
-

 Summary: Remove EMPTY LongValues in favor of LongValues#ZERO
 Key: LUCENE-10355
 URL: https://issues.apache.org/jira/browse/LUCENE-10355
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Feng Guo






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-10355) Remove EMPTY LongValues in favor of LongValues#ZERO

2022-01-04 Thread Feng Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feng Guo updated LUCENE-10355:
--
Component/s: core/codecs

> Remove EMPTY LongValues in favor of LongValues#ZERO
> ---
>
> Key: LUCENE-10355
> URL: https://issues.apache.org/jira/browse/LUCENE-10355
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Reporter: Feng Guo
>Priority: Trivial
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-10355) Remove EMPTY LongValues in favor of LongValues#ZERO

2022-01-04 Thread Feng Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feng Guo updated LUCENE-10355:
--
Description: Remove EMPTY LongValues in favor of LongValues#ZEROS

> Remove EMPTY LongValues in favor of LongValues#ZERO
> ---
>
> Key: LUCENE-10355
> URL: https://issues.apache.org/jira/browse/LUCENE-10355
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Reporter: Feng Guo
>Priority: Trivial
>
> Remove EMPTY LongValues in favor of LongValues#ZEROS



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gf2121 closed pull request #584: LUCENE-10355: Remove EMPTY LongValues in favor of LongValues#ZEROS

2022-01-04 Thread GitBox


gf2121 closed pull request #584:
URL: https://github.com/apache/lucene/pull/584


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10352) Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module system

2022-01-04 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468745#comment-17468745
 ] 

Uwe Schindler commented on LUCENE-10352:


I found a missing TokenFilter in META_INF/services and also the module: 
{{org.apache.lucene.analysis.ko.KoreanNumberFilterFactory}}

> Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global 
> integration test and discover classes to check from module system
> 
>
> Key: LUCENE-10352
> URL: https://issues.apache.org/jira/browse/LUCENE-10352
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Currently TestAllAnalyzersHaveFactories and TestRandomChains only work on the 
> analysis-commons module, but e.g. we do not do a random chain with kuromoji 
> and ICU. Also both tests rely on some hacky classpath-inspection and the 
> tests fail if ran on a JAR file.
> This issue tracks progress I am currently doing to refactor this:
> - Move those 2 classes to a new gradle subproject 
> :lucene:analysis:integration.tests and add a module-info referring to all 
> other analysis packages
> - Rewrite the class discovery to use ModuleReader
> - Run TestAllAnalyzersHaveFactories per module (using one module reader), so 
> it discovers all classes and ensures that factory and stream are in same 
> module (there are some core vs. analysis.common discrepancies)
> - RunTestRandomChains on the whole module graph. The classes are discovered 
> from all module readers in the graph (filtering on module name starting with 
> "org.apache.lucene.analysis."
> - Also compare that the SPI factories returned by discovery match those we 
> have in the module graphs
> While doing this I disovered some bad things:
> - TestRandomChains depends on test-only resources. We may need to replicate 
> those (it is about 5 files that are fed into the ctors)
> - We have 5 different StringMockResourceLoaders: Originally it was only in 
> analysis common, now its everywhere. I will move this class to 
> test-framework. This is unrelated but can be done here. The background of 
> this was that analysis factories and resource loaders were not part of lucene 
> core, so the resourceloader interface couldn't be in test-framework.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10352) Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module system

2022-01-04 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468760#comment-17468760
 ] 

Uwe Schindler commented on LUCENE-10352:


I also added a new annotation to lucene.core that allows to mark classes and 
constructors that should not be tested with random chains: 
{{@IgnoreRandomChains}}

> Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global 
> integration test and discover classes to check from module system
> 
>
> Key: LUCENE-10352
> URL: https://issues.apache.org/jira/browse/LUCENE-10352
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Currently TestAllAnalyzersHaveFactories and TestRandomChains only work on the 
> analysis-commons module, but e.g. we do not do a random chain with kuromoji 
> and ICU. Also both tests rely on some hacky classpath-inspection and the 
> tests fail if ran on a JAR file.
> This issue tracks progress I am currently doing to refactor this:
> - Move those 2 classes to a new gradle subproject 
> :lucene:analysis:integration.tests and add a module-info referring to all 
> other analysis packages
> - Rewrite the class discovery to use ModuleReader
> - Run TestAllAnalyzersHaveFactories per module (using one module reader), so 
> it discovers all classes and ensures that factory and stream are in same 
> module (there are some core vs. analysis.common discrepancies)
> - RunTestRandomChains on the whole module graph. The classes are discovered 
> from all module readers in the graph (filtering on module name starting with 
> "org.apache.lucene.analysis."
> - Also compare that the SPI factories returned by discovery match those we 
> have in the module graphs
> While doing this I disovered some bad things:
> - TestRandomChains depends on test-only resources. We may need to replicate 
> those (it is about 5 files that are fed into the ctors)
> - We have 5 different StringMockResourceLoaders: Originally it was only in 
> analysis common, now its everywhere. I will move this class to 
> test-framework. This is unrelated but can be done here. The background of 
> this was that analysis factories and resource loaders were not part of lucene 
> core, so the resourceloader interface couldn't be in test-framework.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a change in pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to c

2022-01-04 Thread GitBox


uschindler commented on a change in pull request #582:
URL: https://github.com/apache/lucene/pull/582#discussion_r778280755



##
File path: 
lucene/analysis/integration.tests/src/test/org/apache/lucene/analysis/tests/ModuleClassDiscovery.java
##
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.analysis.tests;
+
+import java.io.IOException;
+import java.lang.module.ResolvedModule;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.SortedMap;
+import java.util.TreeMap;
+import org.apache.lucene.tests.util.LuceneTestCase;
+import org.junit.Assert;
+
+/** Discovers all classes from the module graph and loads them (without 
initialization) */
+abstract class ModuleClassDiscovery {
+
+  private static final Module THIS_MODULE = 
ModuleClassDiscovery.class.getModule();
+  private static final ModuleLayer LAYER = THIS_MODULE.getLayer();
+  private static final SortedMap ALL_ANALYSIS_MODULES;
+
+  static {

Review comment:
   This is not a test!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a change in pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to c

2022-01-04 Thread GitBox


uschindler commented on a change in pull request #582:
URL: https://github.com/apache/lucene/pull/582#discussion_r778283613



##
File path: 
lucene/analysis/integration.tests/src/test/org/apache/lucene/analysis/tests/ModuleClassDiscovery.java
##
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.analysis.tests;
+
+import java.io.IOException;
+import java.lang.module.ResolvedModule;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.SortedMap;
+import java.util.TreeMap;
+import org.apache.lucene.tests.util.LuceneTestCase;
+import org.junit.Assert;
+
+/** Discovers all classes from the module graph and loads them (without 
initialization) */
+abstract class ModuleClassDiscovery {
+
+  private static final Module THIS_MODULE = 
ModuleClassDiscovery.class.getModule();
+  private static final ModuleLayer LAYER = THIS_MODULE.getLayer();
+  private static final SortedMap ALL_ANALYSIS_MODULES;
+
+  static {

Review comment:
   The initialization of this class is only done on first call, which is 
not inside test's class laoding. the code initializes when the first test uses 
the getter.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mayya-sharipova commented on a change in pull request #580: LUCENE-10351 Correct knn search failure with deleted docs

2022-01-04 Thread GitBox


mayya-sharipova commented on a change in pull request #580:
URL: https://github.com/apache/lucene/pull/580#discussion_r778314409



##
File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorsReader.java
##
@@ -239,6 +239,9 @@ public TopDocs search(String field, float[] target, int k, 
Bits acceptDocs) thro
 if (fieldEntry == null || fieldEntry.dimension == 0) {
   return null;
 }
+if (fieldEntry.size() == 0) {

Review comment:
   Thanks Adrien, since the removal of this dead code is already addressed 
in your PR, I will revert my last commit from this PR. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jtibshirani commented on a change in pull request #583: LUCENE-10354: Clarify contract of codec APIs with missing/disabled fields.

2022-01-04 Thread GitBox


jtibshirani commented on a change in pull request #583:
URL: https://github.com/apache/lucene/pull/583#discussion_r778317069



##
File path: lucene/core/src/java/org/apache/lucene/codecs/FieldsProducer.java
##
@@ -42,6 +45,14 @@ protected FieldsProducer() {}
*/
   public abstract void checkIntegrity() throws IOException;
 
+  /**
+   * Get the {@link Terms} for this field. The behavior is undefined if the 
field doesn't have

Review comment:
   Do we need to update the javadoc on `Fields` too? It says "This will 
return null if the field does not exist."




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] sonatype-lift[bot] commented on a change in pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover clas

2022-01-04 Thread GitBox


sonatype-lift[bot] commented on a change in pull request #582:
URL: https://github.com/apache/lucene/pull/582#discussion_r778334098



##
File path: lucene/core/src/java/org/apache/lucene/util/IgnoreRandomChains.java
##
@@ -0,0 +1,34 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util;
+
+import java.lang.annotation.ElementType;
+import java.lang.annotation.Retention;
+import java.lang.annotation.RetentionPolicy;
+import java.lang.annotation.Target;
+
+/**
+ * Annotation to not test a class or constructor with {@code TestRandomChains} 
integration test.
+ *
+ * @lucene.internal

Review comment:
   *InvalidBlockTag:*  Tag name `lucene.internal` is unknown. If this is a 
commonly-used custom tag, please click 'not useful' and file a bug. 
[(details)](https://errorprone.info/bugpattern/InvalidBlockTag)
   (at-me [in a reply](https://help.sonatype.com/lift/talking-to-lift) with 
`help` or `ignore`)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] akkig opened a new pull request #2636: SOLR-15892: Adding Randomization in selecting shard leaders for delete and commit requests for RoutedAlias

2022-01-04 Thread GitBox


akkig opened a new pull request #2636:
URL: https://github.com/apache/lucene-solr/pull/2636


   …e and commit requests for RoutedAlias


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mayya-sharipova merged pull request #580: LUCENE-10351 Correct knn search failure with deleted docs

2022-01-04 Thread GitBox


mayya-sharipova merged pull request #580:
URL: https://github.com/apache/lucene/pull/580


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10351) Correct knn search failure with all deleted docs

2022-01-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468847#comment-17468847
 ] 

ASF subversion and git services commented on LUCENE-10351:
--

Commit 78da7030370102d024afc9945965ba3ae7195823 in lucene's branch 
refs/heads/main from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=78da703 ]

LUCENE-10351 Correct knn search failure with deleted docs (#580)

Current when doing knn search on an segment where all documents
with knn field were deleted, we get the following error:

maxSize must be > 0 and < 2147483630; got: 0
java.lang.IllegalArgumentException: maxSize must be > 0 and < 2147483630; got: 0
at 
__randomizedtesting.SeedInfo.seed([43F1F124D7076A4E:1B860BFCCB9B0BB5]:0)
at org.apache.lucene.util.LongHeap.(LongHeap.java:57)
at org.apache.lucene.util.LongHeap$1.(LongHeap.java:69)
at org.apache.lucene.util.LongHeap.create(LongHeap.java:69)
at 
org.apache.lucene.util.hnsw.NeighborQueue.(NeighborQueue.java:41)
at org.apache.lucene.util.hnsw.HnswGraph.search(HnswGraph.java:105)#

This patch fixes this error and ensures empty TopDocs are returned when
knn field doesn't have any documents left.

> Correct knn search failure with  all deleted docs
> -
>
> Key: LUCENE-10351
> URL: https://issues.apache.org/jira/browse/LUCENE-10351
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Mayya Sharipova
>Priority: Trivial
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Current when doing knn search on an segment where all documents with knn 
> field were deleted, we get the following error:
> maxSize must be > 0 and < 2147483630; got: 0
> java.lang.IllegalArgumentException: maxSize must be > 0 and < 2147483630; 
> got: 0
> at 
> __randomizedtesting.SeedInfo.seed([43F1F124D7076A4E:1B860BFCCB9B0BB5]:0)
> at org.apache.lucene.util.LongHeap.(LongHeap.java:57)
> at org.apache.lucene.util.LongHeap$1.(LongHeap.java:69)
> at org.apache.lucene.util.LongHeap.create(LongHeap.java:69)
> at 
> org.apache.lucene.util.hnsw.NeighborQueue.(NeighborQueue.java:41)
> at 
> org.apache.lucene.util.hnsw.HnswGraph.search(HnswGraph.java:105)#
> A desired behaviour: instead of an error,  an empty TopDocs should be 
> returned. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10351) Correct knn search failure with all deleted docs

2022-01-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468876#comment-17468876
 ] 

ASF subversion and git services commented on LUCENE-10351:
--

Commit 0a1cf3108468e31f06cabfd4154b69189dcf6e79 in lucene's branch 
refs/heads/branch_9x from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=0a1cf31 ]

LUCENE-10351 Correct knn search failure with deleted docs (#580)

Current when doing knn search on an segment where all documents
with knn field were deleted, we get the following error:

maxSize must be > 0 and < 2147483630; got: 0
java.lang.IllegalArgumentException: maxSize must be > 0 and < 2147483630; got: 0
at 
__randomizedtesting.SeedInfo.seed([43F1F124D7076A4E:1B860BFCCB9B0BB5]:0)
at org.apache.lucene.util.LongHeap.(LongHeap.java:57)
at org.apache.lucene.util.LongHeap$1.(LongHeap.java:69)
at org.apache.lucene.util.LongHeap.create(LongHeap.java:69)
at 
org.apache.lucene.util.hnsw.NeighborQueue.(NeighborQueue.java:41)
at org.apache.lucene.util.hnsw.HnswGraph.search(HnswGraph.java:105)#

This patch fixes this error and ensures empty TopDocs are returned when
knn field doesn't have any documents left.

> Correct knn search failure with  all deleted docs
> -
>
> Key: LUCENE-10351
> URL: https://issues.apache.org/jira/browse/LUCENE-10351
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Mayya Sharipova
>Priority: Trivial
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Current when doing knn search on an segment where all documents with knn 
> field were deleted, we get the following error:
> maxSize must be > 0 and < 2147483630; got: 0
> java.lang.IllegalArgumentException: maxSize must be > 0 and < 2147483630; 
> got: 0
> at 
> __randomizedtesting.SeedInfo.seed([43F1F124D7076A4E:1B860BFCCB9B0BB5]:0)
> at org.apache.lucene.util.LongHeap.(LongHeap.java:57)
> at org.apache.lucene.util.LongHeap$1.(LongHeap.java:69)
> at org.apache.lucene.util.LongHeap.create(LongHeap.java:69)
> at 
> org.apache.lucene.util.hnsw.NeighborQueue.(NeighborQueue.java:41)
> at 
> org.apache.lucene.util.hnsw.HnswGraph.search(HnswGraph.java:105)#
> A desired behaviour: instead of an error,  an empty TopDocs should be 
> returned. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-10351) Correct knn search failure with all deleted docs

2022-01-04 Thread Mayya Sharipova (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayya Sharipova resolved LUCENE-10351.
--
Fix Version/s: 9.1
   Resolution: Fixed

> Correct knn search failure with  all deleted docs
> -
>
> Key: LUCENE-10351
> URL: https://issues.apache.org/jira/browse/LUCENE-10351
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Mayya Sharipova
>Priority: Trivial
> Fix For: 9.1
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Current when doing knn search on an segment where all documents with knn 
> field were deleted, we get the following error:
> maxSize must be > 0 and < 2147483630; got: 0
> java.lang.IllegalArgumentException: maxSize must be > 0 and < 2147483630; 
> got: 0
> at 
> __randomizedtesting.SeedInfo.seed([43F1F124D7076A4E:1B860BFCCB9B0BB5]:0)
> at org.apache.lucene.util.LongHeap.(LongHeap.java:57)
> at org.apache.lucene.util.LongHeap$1.(LongHeap.java:69)
> at org.apache.lucene.util.LongHeap.create(LongHeap.java:69)
> at 
> org.apache.lucene.util.hnsw.NeighborQueue.(NeighborQueue.java:41)
> at 
> org.apache.lucene.util.hnsw.HnswGraph.search(HnswGraph.java:105)#
> A desired behaviour: instead of an error,  an empty TopDocs should be 
> returned. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a change in pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to c

2022-01-04 Thread GitBox


uschindler commented on a change in pull request #582:
URL: https://github.com/apache/lucene/pull/582#discussion_r778421121



##
File path: lucene/core/src/java/org/apache/lucene/util/IgnoreRandomChains.java
##
@@ -0,0 +1,34 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util;
+
+import java.lang.annotation.ElementType;
+import java.lang.annotation.Retention;
+import java.lang.annotation.RetentionPolicy;
+import java.lang.annotation.Target;
+
+/**
+ * Annotation to not test a class or constructor with {@code TestRandomChains} 
integration test.
+ *
+ * @lucene.internal

Review comment:
   ignore




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] sonatype-lift[bot] commented on a change in pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover clas

2022-01-04 Thread GitBox


sonatype-lift[bot] commented on a change in pull request #582:
URL: https://github.com/apache/lucene/pull/582#discussion_r778421146



##
File path: lucene/core/src/java/org/apache/lucene/util/IgnoreRandomChains.java
##
@@ -0,0 +1,34 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util;
+
+import java.lang.annotation.ElementType;
+import java.lang.annotation.Retention;
+import java.lang.annotation.RetentionPolicy;
+import java.lang.annotation.Target;
+
+/**
+ * Annotation to not test a class or constructor with {@code TestRandomChains} 
integration test.
+ *
+ * @lucene.internal

Review comment:
   I've recorded this as ignored for this pull request. If you change your 
mind, just comment `@sonatype-lift unignore`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gsmiller commented on pull request #578: LUCENE-10350: Avoid some null checking for FastTaxonomyFacetCounts#countAll()

2022-01-04 Thread GitBox


gsmiller commented on pull request #578:
URL: https://github.com/apache/lucene/pull/578#issuecomment-1005246240


   Nice find! With this kind of speedup just for avoiding the null check inside 
the counting loop, I wonder if it's worth generalizing beyond countAll? What 
about relaxing visibility of both `values` and `sparseValues` to `protected` 
and writing specialized counting logic for both `count` and `countAll` based on 
which is non-null? Essentially by-pass the `increment` abstraction provided by 
`IntTaxonomyFacets` for all cases?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module

2022-01-04 Thread GitBox


rmuir commented on pull request #582:
URL: https://github.com/apache/lucene/pull/582#issuecomment-1005247521


   elsewhere we have the integration tests as `core.tests` and 
`distribution.tests`, should this be `analysis.tests`? Currently I am messing 
around in this branch with `./gradlew -p lucene/analysis/integration.tests test`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gsmiller commented on pull request #574: LUCENE-10346: Specially treat SingletonSortedNumericDocValues in FastTaxonomyFacetCounts#countAll()

2022-01-04 Thread GitBox


gsmiller commented on pull request #574:
URL: https://github.com/apache/lucene/pull/574#issuecomment-1005247653


   Great find! For what it's worth, this was overlooked when specializing SSDV 
in #255 because it was still using binary doc values at the time. When I 
switched over to numeric doc values, it looks like I overlooked adding this 
specialization. Thanks for the improvement!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-10356) Special-case singleton doc values for general taxonomy facet counting

2022-01-04 Thread Greg Miller (Jira)
Greg Miller created LUCENE-10356:


 Summary: Special-case singleton doc values for general taxonomy 
facet counting
 Key: LUCENE-10356
 URL: https://issues.apache.org/jira/browse/LUCENE-10356
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Greg Miller


Inspired by [https://github.com/apache/lucene/pull/574,] we should also 
special-case singleton dvs in the general count path (#573 specialized it for 
countAll).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10356) Special-case singleton doc values for general taxonomy facet counting

2022-01-04 Thread Greg Miller (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468917#comment-17468917
 ] 

Greg Miller commented on LUCENE-10356:
--

PR coming shortly :)

> Special-case singleton doc values for general taxonomy facet counting
> -
>
> Key: LUCENE-10356
> URL: https://issues.apache.org/jira/browse/LUCENE-10356
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Greg Miller
>Priority: Minor
>
> Inspired by [https://github.com/apache/lucene/pull/574,] we should also 
> special-case singleton dvs in the general count path (#573 specialized it for 
> countAll).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-10356) Special-case singleton doc values for general taxonomy facet counting

2022-01-04 Thread Greg Miller (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Miller reassigned LUCENE-10356:


Assignee: Greg Miller

> Special-case singleton doc values for general taxonomy facet counting
> -
>
> Key: LUCENE-10356
> URL: https://issues.apache.org/jira/browse/LUCENE-10356
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Greg Miller
>Assignee: Greg Miller
>Priority: Minor
>
> Inspired by [https://github.com/apache/lucene/pull/574,] we should also 
> special-case singleton dvs in the general count path (#573 specialized it for 
> countAll).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from mo

2022-01-04 Thread GitBox


uschindler commented on pull request #582:
URL: https://github.com/apache/lucene/pull/582#issuecomment-1005287391


   > elsewhere we have the integration tests as `core.tests` and 
`distribution.tests`, should this be `analysis.tests`? Currently I am messing 
around in this branch with `./gradlew -p lucene/analysis/integration.tests test`
   
   The name does not matter, only the suffix `.tests` matters.
   
   There are 2 possibilities:
   - below the `analysis/` subdirectory (like at moment) with a more generic 
name `integration.tests`
   - next to `analysis` top level folder as `analysis.tests`
   
   To me it does not matter, both works. I like name "integration" more.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from mo

2022-01-04 Thread GitBox


uschindler commented on pull request #582:
URL: https://github.com/apache/lucene/pull/582#issuecomment-1005287982


   I ran the whole thing the following way:
   
   ```
   gradlew :lucene:analysis:integration.tests:beast -Dtests.dups=1000
   ```
   
   No failures yet.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from mo

2022-01-04 Thread GitBox


uschindler commented on pull request #582:
URL: https://github.com/apache/lucene/pull/582#issuecomment-1005291698


   @rmuir I found a failure:
   
   ```
   $ gradlew :lucene:analysis:integration.tests:test --tests 
TestRandomChains.testRandomChainsWithLargeStrings -Dtests.seed=AA632771CC823702 
-Dtests.slow=true -Dtests.locale=fr-MF -Dtests.timezone=America/Panama 
-Dtests.asserts=true -Dtests.file.encoding=UTF-8
   org.apache.lucene.analysis.tests.TestRandomChains > test suite's output 
saved to C:\Users\Uwe 
Schindler\Projects\lucene\lucene\lucene\analysis\integration.tests\build\test-results\test\outputs\OUTPUT-org.apache.lucene.analysis.tests.TestRandomChains.txt,
 copied below:
 2> stage 0: ­ÉÆû<[0-2] +1> ­ÉÆä<[4-6] +1> ppkarrpf<[7-14] +1> 1<[16-17] 
+1> 5<[18-19] +1>
 2> stage 1: ­ÉÆû<[0-2] +1> ­ÉÆä<[4-6] +1> 00<[4-6] +0> ppkarrpf<[7-14] 
+1> 759700<[7-14] +0> 1<[16-17] +1> 5<[18-19] +1> 00<[18-19] +0>
 2> stage 2: ­ÉÆû<[0-2] +1> ­ÉÆä<[4-6] +1> 00<[4-6] +0> ppkarrpf<[7-14] 
+1> 759700<[7-14] +0> 1<[16-17] +1> 00<[18-19] +0>
 2> TEST FAIL: useCharFilter=true 
text='\ud801\udc96\ud801\udcaa\ud801\udc84 ppkarpf {1,5}g?)u em mbm hbil'
 2> Exception from random analyzer:
 2> charfilters=
 2>   
org.apache.lucene.analysis.ja.JapaneseIterationMarkCharFilter(java.io.StringReader@105e6aa7,
 true, false)
 2> tokenizer=
 2>   org.apache.lucene.analysis.th.ThaiTokenizer()
 2> filters=
 2>   
Conditional:org.apache.lucene.analysis.phonetic.DaitchMokotoffSoundexFilter(OneTimeWrapper@79889b7f
 
term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,
 true)
 2>   
org.apache.lucene.analysis.ja.JapaneseNumberFilter(ValidatingTokenFilter@53a9e96c
 
term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,keyword=false)
 2>   
org.apache.lucene.analysis.miscellaneous.StemmerOverrideFilter(ValidatingTokenFilter@6cb4578d
 
term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,keyword=false,
 
org.apache.lucene.analysis.miscellaneous.StemmerOverrideFilter$StemmerOverrideMap@51fc8124)
  > java.lang.IllegalStateException: stage 2: inconsistent startOffset 
at pos=3: 16 vs 18; token=00
  > at 
__randomizedtesting.SeedInfo.seed([AA632771CC823702:C038986095CC17F1]:0)
  > at 
org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:138)
  > at 
org.apache.lucene.analysis.common@10.0.0-SNAPSHOT/org.apache.lucene.analysis.miscellaneous.StemmerOverrideFilter.incrementToken(StemmerOverrideFilter.java:67)
  > at 
org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:81)
  > at 
org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:1130)
  > at 
org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:1028)
  > at 
org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:922)
  > at 
org.apache.lucene.analysis.tests@10.0.0-SNAPSHOT/org.apache.lucene.analysis.tests.TestRandomChains.testRandomChainsWithLargeStrings(TestRandomChains.java:943)
  > at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  > at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  > at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  > at java.base/java.lang.reflect.Method.invoke(Method.java:566)
  > at 
randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
  > at 
randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
  > at 
randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
  > at 
randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
  > at 
org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
  > at 
org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  >   

[GitHub] [lucene] uschindler commented on pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from mo

2022-01-04 Thread GitBox


uschindler commented on pull request #582:
URL: https://github.com/apache/lucene/pull/582#issuecomment-1005293671


   Another one:
   
   ```
   $ gradlew :lucene:analysis:integration.tests:test --tests 
TestRandomChains.testRandomChains -Dtests.seed=3A0D0E91E0CA5BFC 
-Dtests.slow=true -Dtests.locale=nmg-CM -Dtests.timezone=Antarctica/Vostok 
-Dtests.asserts=true -Dtests.file.encoding=UTF-8
   org.apache.lucene.analysis.tests.TestRandomChains > test suite's output 
saved to C:\Users\Uwe 
Schindler\Projects\lucene\lucene\lucene\analysis\integration.tests\build\test-results\test_17\outputs\OUTPUT-org.apache.lucene.analysis.tests.TestRandomChains.txt,
 copied below:
 2> TEST FAIL: useCharFilter=false text=''
 2> Exception from random analyzer:
 2> charfilters=
 2>   
org.apache.lucene.analysis.ja.JapaneseIterationMarkCharFilter(java.io.StringReader@7ee7c045)
 2>   
org.apache.lucene.analysis.charfilter.HTMLStripCharFilter(org.apache.lucene.analysis.ja.JapaneseIterationMarkCharFilter@66bba53d,
 [])
 2> tokenizer=
 2>   org.apache.lucene.analysis.core.KeywordTokenizer(27)
 2> filters=
 2>   
org.apache.lucene.analysis.ja.JapaneseKatakanaStemFilter(ValidatingTokenFilter@1a8623a7
 
term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,keyword=false,
 -21)
  > java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds 
for length 32
  > at 
__randomizedtesting.SeedInfo.seed([3A0D0E91E0CA5BFC:7EC27F0A7D8463C]:0)
  > at 
org.apache.lucene.analysis.kuromoji@10.0.0-SNAPSHOT/org.apache.lucene.analysis.ja.JapaneseKatakanaStemFilter.stem(JapaneseKatakanaStemFilter.java:76)
  > at 
org.apache.lucene.analysis.kuromoji@10.0.0-SNAPSHOT/org.apache.lucene.analysis.ja.JapaneseKatakanaStemFilter.incrementToken(JapaneseKatakanaStemFilter.java:59)
  > at 
org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:81)
  > at 
org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:1130)
  > at 
org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:1028)
  > at 
org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:922)
  > at 
org.apache.lucene.analysis.tests@10.0.0-SNAPSHOT/org.apache.lucene.analysis.tests.TestRandomChains.testRandomChains(TestRandomChains.java:911)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler edited a comment on pull request #582: LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check

2022-01-04 Thread GitBox


uschindler edited a comment on pull request #582:
URL: https://github.com/apache/lucene/pull/582#issuecomment-1005291698


   @rmuir I found a failure:
   
   ```
   $ gradlew :lucene:analysis:integration.tests:test --tests 
TestRandomChains.testRandomChainsWithLargeStrings -Dtests.seed=AA632771CC823702 
-Dtests.slow=true -Dtests.locale=fr-MF -Dtests.timezone=America/Panama 
-Dtests.asserts=true -Dtests.file.encoding=UTF-8
   org.apache.lucene.analysis.tests.TestRandomChains > test suite's output 
saved to C:\Users\Uwe 
Schindler\Projects\lucene\lucene\lucene\analysis\integration.tests\build\test-results\test\outputs\OUTPUT-org.apache.lucene.analysis.tests.TestRandomChains.txt,
 copied below:
 2> stage 0: ­ÉÆû<[0-2] +1> ­ÉÆä<[4-6] +1> ppkarrpf<[7-14] +1> 1<[16-17] 
+1> 5<[18-19] +1>
 2> stage 1: ­ÉÆû<[0-2] +1> ­ÉÆä<[4-6] +1> 00<[4-6] +0> ppkarrpf<[7-14] 
+1> 759700<[7-14] +0> 1<[16-17] +1> 5<[18-19] +1> 00<[18-19] +0>
 2> stage 2: ­ÉÆû<[0-2] +1> ­ÉÆä<[4-6] +1> 00<[4-6] +0> ppkarrpf<[7-14] 
+1> 759700<[7-14] +0> 1<[16-17] +1> 00<[18-19] +0>
 2> TEST FAIL: useCharFilter=true 
text='\ud801\udc96\ud801\udcaa\ud801\udc84 ppkarpf {1,5}g?)u em mbm hbil'
 2> Exception from random analyzer:
 2> charfilters=
 2>   
org.apache.lucene.analysis.ja.JapaneseIterationMarkCharFilter(java.io.StringReader@105e6aa7,
 true, false)
 2> tokenizer=
 2>   org.apache.lucene.analysis.th.ThaiTokenizer()
 2> filters=
 2>   
Conditional:org.apache.lucene.analysis.phonetic.DaitchMokotoffSoundexFilter(OneTimeWrapper@79889b7f
 
term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,
 true)
 2>   
org.apache.lucene.analysis.ja.JapaneseNumberFilter(ValidatingTokenFilter@53a9e96c
 
term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,keyword=false)
 2>   
org.apache.lucene.analysis.miscellaneous.StemmerOverrideFilter(ValidatingTokenFilter@6cb4578d
 
term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,keyword=false,
 
org.apache.lucene.analysis.miscellaneous.StemmerOverrideFilter$StemmerOverrideMap@51fc8124)
  > java.lang.IllegalStateException: stage 2: inconsistent startOffset 
at pos=3: 16 vs 18; token=00
  > at 
__randomizedtesting.SeedInfo.seed([AA632771CC823702:C038986095CC17F1]:0)
  > at 
org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:138)
  > at 
org.apache.lucene.analysis.common@10.0.0-SNAPSHOT/org.apache.lucene.analysis.miscellaneous.StemmerOverrideFilter.incrementToken(StemmerOverrideFilter.java:67)
  > at 
org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:81)
  > at 
org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:1130)
  > at 
org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:1028)
  > at 
org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:922)
  > at 
org.apache.lucene.analysis.tests@10.0.0-SNAPSHOT/org.apache.lucene.analysis.tests.TestRandomChains.testRandomChainsWithLargeStrings(TestRandomChains.java:943)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9537) Add Indri Search Engine Functionality to Lucene

2022-01-04 Thread Cameron VandenBerg (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468942#comment-17468942
 ] 

Cameron VandenBerg commented on LUCENE-9537:


Hello [~jpountz]!  Thank you for finding this.  I have fixed this issue in my 
latest PR (LUCENE-10157), which I just updated tonight.  My original thought 
was that I needed the IndriScorer because I wanted to use the boosts outside of 
the Similarity just like we do in our Indri Search Engine.  However, I now feel 
that it makes more sense for the Indri scoring to use boosts in the same way 
that lucene does (inside the Similarity), which allows me to remove the `if 
(scorer instanceof IndriScorer)` logic from the IndriScorers.  I have added 
tests and found that this works as expected now.  I have also added the IndriOr 
and the IndriWeightedSum scorers as well as an IndriQueryParser to this new PR. 
 Thank you!

> Add Indri Search Engine Functionality to Lucene
> ---
>
> Key: LUCENE-9537
> URL: https://issues.apache.org/jira/browse/LUCENE-9537
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Cameron VandenBerg
>Priority: Major
>  Labels: patch
> Fix For: 8.9
>
> Attachments: LUCENE-9537.patch, LUCENE-INDRI.patch
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Indri ([http://lemurproject.org/indri.php]) is an academic search engine 
> developed by The University of Massachusetts and Carnegie Mellon University.  
> The major difference between Lucene and Indri is that Indri will give a 
> document a "smoothing score" to a document that does not contain the search 
> term, which has improved the search ranking accuracy in our experiments.  I 
> have created an Indri patch, which adds the search code needed to implement 
> the Indri AND logic as well as Indri's implementation of Dirichlet Smoothing.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10157) Add Additional Indri Search Engine Functionality to Lucene

2022-01-04 Thread Cameron VandenBerg (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468945#comment-17468945
 ] 

Cameron VandenBerg commented on LUCENE-10157:
-

Hello [~cpoerschke]!  I have added and updated the new PR (LUCENE-10157).  I 
have added test cases.  I am happy to work with you if you have any 
suggestions.  Thank you!

> Add Additional Indri Search Engine Functionality to Lucene
> --
>
> Key: LUCENE-10157
> URL: https://issues.apache.org/jira/browse/LUCENE-10157
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/queryparser, core/search
>Reporter: Cameron VandenBerg
>Priority: Major
> Attachments: LUCENE-10157.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In Jira issue LUCENE-9537, basic functionality from the Indri search engine 
> ([http://lemurproject.org/indri.php]) was added to Lucene.  With that 
> functionality in place, we would love to build upon that to add additional 
> Indri queries and an Indri query parser to Lucene to broaden the Indri 
> functionality within Lucene.  In this patch, I have added the Indri NOT, the 
> INDRI OR, and the Indri WeightedSum functionality.  I have also included an 
> IndriQueryParser for accessing this functionality.  More information on these 
> query operators can be seen here: 
> [https://sourceforge.net/p/lemur/wiki/Belief%20Operations/] and here: 
> [https://sourceforge.net/p/lemur/wiki/Indri%20Query%20Language%20Reference/.|https://sourceforge.net/p/lemur/wiki/Indri%20Query%20Language%20Reference/]
>  
> I would be very excited to work with the Lucene community again to try to add 
> this functionality.  I am open to suggestions, and I am happy to make any 
> changes that might be suggested.  Thank you!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gf2121 commented on pull request #578: LUCENE-10350: Avoid some null checking for FastTaxonomyFacetCounts#countAll()

2022-01-04 Thread GitBox


gf2121 commented on pull request #578:
URL: https://github.com/apache/lucene/pull/578#issuecomment-1005403431


   Thanks @gsmiller for suggestion!
   
   > With this kind of speedup just for avoiding the null check inside the 
counting loop.
   
   Yes, this is indeed suprising. I thought `BrowseMonthTaxoFacets` should get 
a similar QPS as `BrowseMonthSSDVFacets` since SSDV is already using a 
`count[ord]++` and its total cardinality is just 12 (converting to global 
ordinal should be cheap).  
   
   I also noticed that the stdDev is becoming much higher. It seems some of the 
JVM rounds are getting speed up but some not, I suspect some optimization in 
JVM was triggered from time to time but i'm not really sure what it is. I tried 
JDK11/16/17 and all get similar benchmark results (higher QPS & higer stdDev).
   
   > What about relaxing visibility of both values and sparseValues to 
protected and writing specialized counting logic for both count and countAll 
based on which is non-null?
   
   Nice idea, will do.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org