[GitHub] [lucene-solr] dweiss commented on a change in pull request #1950: LUCENE-9564: add spotless and gjf.

2020-10-08 Thread GitBox


dweiss commented on a change in pull request #1950:
URL: https://github.com/apache/lucene-solr/pull/1950#discussion_r501492773



##
File path: gradle/validation/spotless.gradle
##
@@ -0,0 +1,31 @@
+
+def resources = scriptResources(buildscript)
+
+allprojects { prj ->
+  plugins.withType(JavaPlugin) {
+    prj.apply plugin: 'com.diffplug.spotless'
+
+    spotless {
+      java {
+        licenseHeaderFile file("${resources}/asl-header.txt"), '^(\\s*package)'
+        lineEndings 'UNIX'
+        endWithNewline()
+        googleJavaFormat('1.9')
+
+        // Known problematic files.
+        targetExclude "**/HTMLStripCharFilter.java", "**/UAX29URLEmailTokenizerImpl.java",
+          "**/PatternParser.java", "**/BuildNavDataFiles.java", "**/CheckLinksAndAnchors.java",
+          "**/TestSubQueryTransformer.java"

Review comment:
   Ok, fair enough. I also suggested elsewhere that we might want to move 
those generated files into a different sourceset, so that it's clear they are 
generated and they can be wiped more easily upon regeneration. Or is there 
value in keeping them together, do you think?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-9570) Review code diffs after automatic formatting and correct problems before it is applied

2020-10-08 Thread Dawid Weiss (Jira)
Dawid Weiss created LUCENE-9570:
---

 Summary: Review code diffs after automatic formatting and correct 
problems before it is applied
 Key: LUCENE-9570
 URL: https://issues.apache.org/jira/browse/LUCENE-9570
 Project: Lucene - Core
  Issue Type: Sub-task
Reporter: Dawid Weiss
Assignee: Dawid Weiss


Review and correct all the javadocs before they're messed up by automatic 
formatting. Apply project-by-project, review diff, correct. Lots of diffs but 
it should be relatively quick.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Created] (LUCENE-9571) Handle generated files (exclude from automated formatting)

2020-10-08 Thread Dawid Weiss (Jira)
Dawid Weiss created LUCENE-9571:
---

 Summary: Handle generated files (exclude from automated formatting)
 Key: LUCENE-9571
 URL: https://issues.apache.org/jira/browse/LUCENE-9571
 Project: Lucene - Core
  Issue Type: Sub-task
Reporter: Dawid Weiss
Assignee: Dawid Weiss


Handle generated files by excluding them from automatic formatting.






[GitHub] [lucene-solr] rmuir commented on a change in pull request #1950: LUCENE-9564: add spotless and gjf.

2020-10-08 Thread GitBox


rmuir commented on a change in pull request #1950:
URL: https://github.com/apache/lucene-solr/pull/1950#discussion_r501511928



##
File path: gradle/validation/spotless.gradle
##

Review comment:
   It's worth thinking about, yeah. If there were a consolidated list of 
them, maybe it would make other tasks more intuitive: e.g. remove the crazy 
"Generated" matching that's in RAT and simply exclude them from license 
analysis too.








[GitHub] [lucene-solr] dweiss commented on a change in pull request #1950: LUCENE-9564: add spotless and gjf.

2020-10-08 Thread GitBox


dweiss commented on a change in pull request #1950:
URL: https://github.com/apache/lucene-solr/pull/1950#discussion_r501519479



##
File path: gradle/validation/spotless.gradle
##

Review comment:
   Ok, I'll see what can be done when I get there.








[jira] [Resolved] (SOLR-14691) Metrics reporting should avoid creating objects

2020-10-08 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-14691.
-
Resolution: Fixed

Thanks Noble for review and fixes.

> Metrics reporting should avoid creating objects
> ---
>
> Key: SOLR-14691
> URL: https://issues.apache.org/jira/browse/SOLR-14691
> Project: Solr
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Critical
> Fix For: 8.7
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> {{MetricUtils}} unnecessarily creates a lot of short-lived objects (maps and 
> lists). This affects GC, especially since metrics are frequently polled by 
> clients. We should refactor it to use {{MapWriter}} as much as possible.
> Alternatively we could provide our wrappers or subclasses of Codahale metrics 
> that implement {{MapWriter}}, then a lot of complexity in {{MetricUtils}} 
> wouldn't be needed at all.
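
The idea in the description above can be illustrated with a minimal sketch. It assumes a simplified callback interface: `EntrySink`, `StreamedMap`, `RequestStats` and `JsonishSink` are hypothetical names invented for illustration, not Solr's actual `MapWriter` or `MetricUtils` API. The metric streams its entries straight to the consumer instead of building a short-lived map on every poll.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified callback interfaces (NOT Solr's actual MapWriter).
interface EntrySink {
  void put(String key, Object value);
}

interface StreamedMap {
  void writeTo(EntrySink sink);
}

// Hypothetical metric holder standing in for a timer or counter.
final class RequestStats implements StreamedMap {
  private long count;
  private long totalMillis;

  void record(long millis) {
    count++;
    totalMillis += millis;
  }

  // Allocating style: a fresh map is built on every poll.
  Map<String, Object> toMap() {
    Map<String, Object> m = new LinkedHashMap<>();
    m.put("count", count);
    m.put("meanMillis", count == 0 ? 0.0 : (double) totalMillis / count);
    return m;
  }

  // Streaming style: entries go directly to the sink, no intermediate map.
  // Values are still boxed here; only the map allocation is avoided.
  @Override
  public void writeTo(EntrySink sink) {
    sink.put("count", count);
    sink.put("meanMillis", count == 0 ? 0.0 : (double) totalMillis / count);
  }
}

// Toy sink that renders entries into a single reusable buffer.
final class JsonishSink implements EntrySink {
  private final StringBuilder out = new StringBuilder("{");
  private boolean first = true;

  @Override
  public void put(String key, Object value) {
    if (!first) out.append(',');
    first = false;
    out.append('"').append(key).append("\":").append(value);
  }

  // Call once, after all entries have been written.
  String finish() {
    return out.append('}').toString();
  }
}

class MetricsWriterDemo {
  public static void main(String[] args) {
    RequestStats stats = new RequestStats();
    stats.record(10);
    stats.record(30);
    JsonishSink sink = new JsonishSink();
    stats.writeTo(sink);
    System.out.println(sink.finish());
  }
}
```

Whether this actually reduces GC pressure depends on how often metrics are polled and on what the consumer does with the entries, so it would need to be measured.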






[GitHub] [lucene-solr] sigram opened a new pull request #1962: SOLR-14749 Provide a clean API for cluster-level event processing

2020-10-08 Thread GitBox


sigram opened a new pull request #1962:
URL: https://github.com/apache/lucene-solr/pull/1962


   This is just the API part of the ticket, separated from PR-1758 to 
facilitate review of the API alone.






[jira] [Commented] (SOLR-14749) Provide a clean API for cluster-level event processing

2020-10-08 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210131#comment-17210131
 ] 

Andrzej Bialecki commented on SOLR-14749:
-

[~noble.paul] please see PR-1962 that contains just the proposed APIs, no other 
changes (and no implementation).

> Provide a clean API for cluster-level event processing
> --
>
> Key: SOLR-14749
> URL: https://issues.apache.org/jira/browse/SOLR-14749
> Project: Solr
>  Issue Type: Improvement
>  Components: AutoScaling
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>  Labels: clean-api
> Fix For: master (9.0)
>
>  Time Spent: 15h 20m
>  Remaining Estimate: 0h
>
> This is a companion issue to SOLR-14613 and it aims at providing a clean, 
> strongly typed API for the functionality formerly known as "triggers" - that 
> is, a component for generating cluster-level events corresponding to changes 
> in the cluster state, and a pluggable API for processing these events.
> The 8x triggers have been removed so this functionality is currently missing 
> in 9.0. However, this functionality is crucial for implementing the automatic 
> collection repair and re-balancing as the cluster state changes (nodes going 
> down / up, becoming overloaded / unused / decommissioned, etc).
> For this reason we need this API and a default implementation of triggers 
> that at least can perform automatic collection repair (maintaining the 
> desired replication factor in presence of live node changes).
> As before, the actual changes to the collections will be executed using 
> existing CollectionAdmin API, which in turn may use the placement plugins 
> from SOLR-14613.
> h3. Division of responsibility
>  * built-in Solr components (non-pluggable):
>  ** cluster state monitoring and event generation,
>  ** simple scheduler to periodically generate scheduled events
>  * plugins:
>  ** automatic collection repair on {{nodeLost}} events (provided by default)
>  ** re-balancing of replicas (periodic or on {{nodeAdded}} events)
>  ** reporting (eg. requesting additional node provisioning)
>  ** scheduled maintenance (eg. removing inactive shards after split)
> h3. Other considerations
> These plugins (unlike the placement plugins) need to execute on one 
> designated node in the cluster. Currently the easiest way to implement this 
> is to run them on the Overseer leader node.
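
The "one designated node" requirement above might look roughly like the following sketch. All names here (`SingletonPlugin`, `SingletonRunner`, `RepairPlugin`, `onDesignationChange`) are hypothetical and are not taken from PR-1962 or any Solr API; the sketch only assumes that some external mechanism, such as Overseer leader election, tells the node when it gains or loses the designated role.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plugin contract; illustrative names only.
interface SingletonPlugin {
  String name();
  void start();
  void stop();
  boolean isRunning();
}

// Runs registered plugins only while this node holds the designated role.
final class SingletonRunner {
  private final List<SingletonPlugin> plugins = new ArrayList<>();
  private boolean designated;

  synchronized void register(SingletonPlugin plugin) {
    plugins.add(plugin);
    if (designated) {
      plugin.start();
    }
  }

  // Assumed to be called by the leader-election mechanism on role changes.
  synchronized void onDesignationChange(boolean nowDesignated) {
    if (nowDesignated == designated) {
      return; // no change, nothing to start or stop
    }
    designated = nowDesignated;
    for (SingletonPlugin plugin : plugins) {
      if (nowDesignated) {
        plugin.start();
      } else {
        plugin.stop();
      }
    }
  }
}

// Toy plugin that counts the nodeLost events it handled while running.
final class RepairPlugin implements SingletonPlugin {
  private boolean running;
  private int handled;

  @Override public String name() { return "repair"; }
  @Override public void start() { running = true; }
  @Override public void stop() { running = false; }
  @Override public boolean isRunning() { return running; }

  void onNodeLost(String nodeName) {
    if (running) {
      handled++; // a real plugin would schedule collection repair here
    }
  }

  int handled() { return handled; }
}

class SingletonRunnerDemo {
  public static void main(String[] args) {
    SingletonRunner runner = new SingletonRunner();
    RepairPlugin repair = new RepairPlugin();
    runner.register(repair);
    runner.onDesignationChange(true);
    repair.onNodeLost("node1");
    System.out.println(repair.name() + " handled " + repair.handled() + " event(s)");
  }
}
```

Events that arrive while the plugin is stopped are simply dropped in this sketch; a real implementation would have to decide whether they should be queued or replayed after a leader change.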






[GitHub] [lucene-solr] sigram commented on pull request #1951: SOLR-14691: Reduce object creation by using MapWriter / IteratorWriter.

2020-10-08 Thread GitBox


sigram commented on pull request #1951:
URL: https://github.com/apache/lucene-solr/pull/1951#issuecomment-705484706


   Merged - thanks!






[GitHub] [lucene-solr] sigram closed pull request #1951: SOLR-14691: Reduce object creation by using MapWriter / IteratorWriter.

2020-10-08 Thread GitBox


sigram closed pull request #1951:
URL: https://github.com/apache/lucene-solr/pull/1951


   






[jira] [Commented] (SOLR-14749) Provide a clean API for cluster-level event processing

2020-10-08 Thread Noble Paul (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210134#comment-17210134
 ] 

Noble Paul commented on SOLR-14749:
---

[~ab]

The APIs themselves are pretty clean.

Can we have a {{ClusterSingleton}} impl with a testcase and merge it in a single PR?

That way we can get one thing out of the way soon.




[GitHub] [lucene-solr] noblepaul commented on pull request #1953: SOLR-14917: Move DOMUtil and PropertiesUtil to SolrJ

2020-10-08 Thread GitBox


noblepaul commented on pull request #1953:
URL: https://github.com/apache/lucene-solr/pull/1953#issuecomment-705486732


   I have no objection to moving them. 
   
   But is there any particular reason why you wish to move them? 






[jira] [Resolved] (SOLR-14094) TestSolrCachePerf is flaky

2020-10-08 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-14094.
-
Fix Version/s: master (9.0)
   Resolution: Fixed

This hasn't been failing for a while - resolving as Fixed.

> TestSolrCachePerf is flaky
> --
>
> Key: SOLR-14094
> URL: https://issues.apache.org/jira/browse/SOLR-14094
> Project: Solr
>  Issue Type: Test
>Reporter: Adrien Grand
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
>
> I hit the below failure while building the RC.
> {noformat}
>[junit4] Suite: org.apache.solr.search.TestSolrCachePerf
>[junit4]   2> 921086 INFO  
> (SUITE-TestSolrCachePerf-seed#[1D407DF9BFA38129]-worker) [ ] 
> o.a.s.SolrTestCaseJ4 Created dataDir: 
> /home/jpountz/.lucene-releases/8.4.0/lucene-solr/solr/build/solr-core/test/J2/temp/solr.search.TestSolrCachePerf_1D407DF9BFA38129-001/data-dir-97-001
>[junit4]   2> 921086 WARN  
> (SUITE-TestSolrCachePerf-seed#[1D407DF9BFA38129]-worker) [ ] 
> o.a.s.SolrTestCaseJ4 startTrackingSearchers: numOpens=16 numCloses=16
>[junit4]   2> 921086 INFO  
> (SUITE-TestSolrCachePerf-seed#[1D407DF9BFA38129]-worker) [ ] 
> o.a.s.SolrTestCaseJ4 Using PointFields (NUMERIC_POINTS_SYSPROP=true) 
> w/NUMERIC_DOCVALUES_SYSPROP=false
>[junit4]   2> 921087 INFO  
> (SUITE-TestSolrCachePerf-seed#[1D407DF9BFA38129]-worker) [ ] 
> o.a.s.SolrTestCaseJ4 Randomized ssl (false) and clientAuth (true) via: 
> @org.apache.solr.util.RandomizeSSL(reason=, value=NaN, ssl=NaN, 
> clientAuth=NaN)
>[junit4]   2> 921087 INFO  
> (SUITE-TestSolrCachePerf-seed#[1D407DF9BFA38129]-worker) [ ] 
> o.a.s.SolrTestCaseJ4 SecureRandom sanity checks: 
> test.solr.allowed.securerandom=null & java.security.egd=file:/dev/./urandom
>[junit4]   2> 921088 INFO  
> (TEST-TestSolrCachePerf.testGetPutCompute-seed#[1D407DF9BFA38129]) [ ] 
> o.a.s.SolrTestCaseJ4 ###Starting testGetPutCompute
>[junit4]   2> 927857 INFO  
> (TEST-TestSolrCachePerf.testGetPutCompute-seed#[1D407DF9BFA38129]) [ ] 
> o.a.s.SolrTestCaseJ4 ###Ending testGetPutCompute
>[junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestSolrCachePerf 
> -Dtests.method=testGetPutCompute -Dtests.seed=1D407DF9BFA38129 
> -Dtests.slow=true -Dtests.locale=ar-QA -Dtests.timezone=AGT 
> -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
>[junit4] FAILURE 6.77s J2 | TestSolrCachePerf.testGetPutCompute <<<
>[junit4]> Throwable #1: java.lang.AssertionError: Cache FastLRUCache: 
> compute ratio should be higher or equal to get/put ratio: 0.9917 >= 0.9941
>[junit4]>at 
> __randomizedtesting.SeedInfo.seed([1D407DF9BFA38129:9DE8FAF1672A99B6]:0)
>[junit4]>at 
> org.apache.solr.search.TestSolrCachePerf.assertGreaterThanOrEqual(TestSolrCachePerf.java:84)
>[junit4]>at 
> org.apache.solr.search.TestSolrCachePerf.lambda$testGetPutCompute$0(TestSolrCachePerf.java:75)
>[junit4]>at java.util.HashMap.forEach(HashMap.java:1289)
>[junit4]>at 
> org.apache.solr.search.TestSolrCachePerf.testGetPutCompute(TestSolrCachePerf.java:73)
>[junit4]>at java.lang.Thread.run(Thread.java:748)
>[junit4]   2> NOTE: leaving temporary files on disk at: 
> /home/jpountz/.lucene-releases/8.4.0/lucene-solr/solr/build/solr-core/test/J2/temp/solr.search.TestSolrCachePerf_1D407DF9BFA38129-001
>[junit4]   2> NOTE: test params are: codec=Asserting(Lucene84): {}, 
> docValues:{}, maxPointsInLeafNode=825, maxMBSortInHeap=7.419122986645501, 
> sim=Asserting(org.apache.lucene.search.similarities.AssertingSimilarity@d7facc8),
>  locale=ar-QA, timezone=AGT
>[junit4]   2> NOTE: Linux 4.4.0-104-generic amd64/Oracle Corporation 
> 1.8.0_151 (64-bit)/cpus=12,threads=1,free=188791576,total=528482304
>[junit4]   2> NOTE: All tests run in this JVM: [TestCryptoKeys, 
> AtomicUpdatesTest, TestNestedUpdateProcessor, SolrPluginUtilsTest, 
> TestStressCloudBlindAtomicUpdates, DirectoryFactoryTest, TaggerTest, 
> IndexSizeTriggerMixedBoundsTest, NumberUtilsTest, TestZkChroot, 
> HdfsDirectoryTest, RollingRestartTest, TestSolrCLIRunExample, 
> ClusterStateTest, DocValuesMultiTest, TestLeaderElectionWithEmptyReplica, 
> TestCustomSort, TestSchemaManager, TestInPlaceUpdatesDistrib, 
> TestReloadAndDeleteDocs, HighlighterConfigTest, MBeansHandlerTest, 
> PeerSyncWithIndexFingerprintCachingTest, SubstringBytesRefFilterTest, 
> TestChildDocTransformer, IndexSizeTriggerTest, HealthCheckHandlerTest, 
> SuggesterFSTTest, TestLuceneIndexBackCompat, TestDelegationWithHadoopAuth, 
> TestSolrCoreProperties, DistanceFunctionTest, 
> SignatureUpdateProcessorFactoryTest, CdcrBootstrapTest, DebugComponentTest, 
> TriggerEventQueueTest, TestLegacyBM25SimilarityFactory, 
> Solr

[jira] [Commented] (SOLR-14576) HttpCacheHeaderUti.etagCoreCache should not use a SolrCore as key

2020-10-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210149#comment-17210149
 ] 

ASF subversion and git services commented on SOLR-14576:


Commit 8c41418c0fb33fdc0ee6aedbd52917c63b447756 in lucene-solr's branch 
refs/heads/master from Noble Paul
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8c41418 ]

SOLR-14576 : Do not use SolrCore as keys in a WeakHashMap (#1586)



> HttpCacheHeaderUti.etagCoreCache should not use a SolrCore as key
> -
>
> Key: SOLR-14576
> URL: https://issues.apache.org/jira/browse/SOLR-14576
> Project: Solr
>  Issue Type: Bug
>Reporter: Noble Paul
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> GC performance is affected when the key is a complex data structure. We can 
> make it
> {code}
> private static WeakIdentityMap etagCoreCache = 
> WeakIdentityMap.newConcurrentHashMap();
> {code}
> instead of
>  {code}
> private static WeakIdentityMap etagCoreCache = 
> WeakIdentityMap.newConcurrentHashMap();
> {code}
>  
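
One way to avoid a heavyweight object as a cache key is to key the cache by a small identifier instead. The sketch below illustrates that general idea only; it is not the change made in #1586. `HeavyResource` and `EtagCache` are hypothetical names, and the explicit `evict` call is an assumption that replaces the weak-reference cleanup of the original map.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Hypothetical stand-in for a heavyweight object such as a core; not a Solr class.
final class HeavyResource {
  private final String name;

  HeavyResource(String name) {
    this.name = name;
  }

  String name() {
    return name;
  }
}

// Cache keyed by the resource name, so the map never references the resource itself.
final class EtagCache {
  private final ConcurrentMap<String, String> etags = new ConcurrentHashMap<>();

  // Returns the cached value for this resource name, computing it on first use.
  String etagFor(HeavyResource resource, Supplier<String> compute) {
    return etags.computeIfAbsent(resource.name(), n -> compute.get());
  }

  // Must be called when the resource is closed; there is no weak-reference cleanup here.
  void evict(HeavyResource resource) {
    etags.remove(resource.name());
  }

  int size() {
    return etags.size();
  }
}

class EtagCacheDemo {
  public static void main(String[] args) {
    EtagCache cache = new EtagCache();
    HeavyResource core = new HeavyResource("core1");
    System.out.println(cache.etagFor(core, () -> "etag-1"));
    cache.evict(core);
    System.out.println(cache.size());
  }
}
```

The trade-off is that two distinct resources with the same name share one entry, and stale entries stay in the map until they are evicted explicitly.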






[GitHub] [lucene-solr] noblepaul opened a new pull request #1963: SOLR-14827: Refactor schema loading to not use XPath

2020-10-08 Thread GitBox


noblepaul opened a new pull request #1963:
URL: https://github.com/apache/lucene-solr/pull/1963


   






[jira] [Commented] (SOLR-14749) Provide a clean API for cluster-level event processing

2020-10-08 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210165#comment-17210165
 ] 

Andrzej Bialecki commented on SOLR-14749:
-

[~noble.paul] PR-1964 contains just the {{ClusterSingleton}} part with the 
testcase and the changes needed to support loading this type of plugin via 
{{CustomContainerPlugins}}.

> Provide a clean API for cluster-level event processing
> --
>
> Key: SOLR-14749
> URL: https://issues.apache.org/jira/browse/SOLR-14749
> Project: Solr
>  Issue Type: Improvement
>  Components: AutoScaling
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>  Labels: clean-api
> Fix For: master (9.0)
>
>  Time Spent: 15.5h
>  Remaining Estimate: 0h
>
> This is a companion issue to SOLR-14613 and it aims at providing a clean, 
> strongly typed API for the functionality formerly known as "triggers" - that 
> is, a component for generating cluster-level events corresponding to changes 
> in the cluster state, and a pluggable API for processing these events.
> The 8x triggers have been removed so this functionality is currently missing 
> in 9.0. However, this functionality is crucial for implementing the automatic 
> collection repair and re-balancing as the cluster state changes (nodes going 
> down / up, becoming overloaded / unused / decommissioned, etc).
> For this reason we need this API and a default implementation of triggers 
> that at least can perform automatic collection repair (maintaining the 
> desired replication factor in presence of live node changes).
> As before, the actual changes to the collections will be executed using 
> existing CollectionAdmin API, which in turn may use the placement plugins 
> from SOLR-14613.
> h3. Division of responsibility
>  * built-in Solr components (non-pluggable):
>  ** cluster state monitoring and event generation,
>  ** simple scheduler to periodically generate scheduled events
>  * plugins:
>  ** automatic collection repair on {{nodeLost}} events (provided by default)
>  ** re-balancing of replicas (periodic or on {{nodeAdded}} events)
>  ** reporting (eg. requesting additional node provisioning)
>  ** scheduled maintenance (eg. removing inactive shards after split)
> h3. Other considerations
> These plugins (unlike the placement plugins) need to execute on one 
> designated node in the cluster. Currently the easiest way to implement this 
> is to run them on the Overseer leader node.






[jira] [Commented] (SOLR-14749) Provide a clean API for cluster-level event processing

2020-10-08 Thread Noble Paul (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210168#comment-17210168
 ] 

Noble Paul commented on SOLR-14749:
---

Thanks [~ab]. I shall review it soon.




[jira] [Commented] (SOLR-14576) HttpCacheHeaderUti.etagCoreCache should not use a SolrCore as key

2020-10-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210154#comment-17210154
 ] 

ASF subversion and git services commented on SOLR-14576:


Commit ad7ad02238de8a61bc32673e553178792f711445 in lucene-solr's branch 
refs/heads/branch_8x from Noble Paul
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ad7ad02 ]

SOLR-14576 : Do not use SolrCore as keys in a WeakHashMap (#1586)






[jira] [Commented] (SOLR-14576) HttpCacheHeaderUti.etagCoreCache should not use a SolrCore as key

2020-10-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210157#comment-17210157
 ] 

ASF subversion and git services commented on SOLR-14576:


Commit ad7ad02238de8a61bc32673e553178792f711445 in lucene-solr's branch 
refs/heads/branch_8x from Noble Paul
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ad7ad02 ]

SOLR-14576 : Do not use SolrCore as keys in a WeakHashMap (#1586)






[jira] [Created] (SOLR-14920) Format code automatically and enforce it in Solr

2020-10-08 Thread Erick Erickson (Jira)
Erick Erickson created SOLR-14920:
-

 Summary: Format code automatically and enforce it in Solr
 Key: SOLR-14920
 URL: https://issues.apache.org/jira/browse/SOLR-14920
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Erick Erickson
Assignee: Erick Erickson


See the discussion at: LUCENE-9564.

This is a placeholder for now. I'm reluctant to do this to the Solr code 
base until:
 * we have some Solr-specific consensus
 * we have some clue what this means for the reference impl.

Reconciling the reference impl will be difficult enough without a zillion 
format changes to add to the confusion.

So my proposal is:

1> Do this.

2> Postpone this until after the reference impl is merged.

3> Do this in one single commit, for reasons like being able to conveniently 
separate it out from git blame.

Assigning to myself so it doesn't get lost, but anyone who wants to take it 
over, please feel free.






[jira] [Commented] (SOLR-14576) HttpCacheHeaderUti.etagCoreCache should not use a SolrCore as key

2020-10-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210159#comment-17210159
 ] 

ASF subversion and git services commented on SOLR-14576:


Commit ad7ad02238de8a61bc32673e553178792f711445 in lucene-solr's branch 
refs/heads/branch_8x from Noble Paul
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ad7ad02 ]

SOLR-14576 : Do not use SolrCore as keys in a WeakHashMap (#1586)



> HttpCacheHeaderUti.etagCoreCache should not use a SolrCore as key
> -
>
> Key: SOLR-14576
> URL: https://issues.apache.org/jira/browse/SOLR-14576
> Project: Solr
>  Issue Type: Bug
>Reporter: Noble Paul
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> GC performance is affected when the key is a complex data structure. We can 
> make it
> {code}
> private static WeakIdentityMap etagCoreCache = 
> WeakIdentityMap.newConcurrentHashMap();
> {code}
> instead of
>  {code}
> private static WeakIdentityMap etagCoreCache = 
> WeakIdentityMap.newConcurrentHashMap();
> {code}
>  






[jira] [Commented] (SOLR-14576) HttpCacheHeaderUti.etagCoreCache should not use a SolrCore as key

2020-10-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210160#comment-17210160
 ] 

ASF subversion and git services commented on SOLR-14576:


Commit ad7ad02238de8a61bc32673e553178792f711445 in lucene-solr's branch 
refs/heads/branch_8x from Noble Paul
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ad7ad02 ]

SOLR-14576 : Do not use SolrCore as keys in a WeakHashMap (#1586)



> HttpCacheHeaderUti.etagCoreCache should not use a SolrCore as key
> -
>
> Key: SOLR-14576
> URL: https://issues.apache.org/jira/browse/SOLR-14576
> Project: Solr
>  Issue Type: Bug
>Reporter: Noble Paul
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> GC performance is affected when the key is a complex data structure. We can 
> make it
> {code}
> private static WeakIdentityMap etagCoreCache = 
> WeakIdentityMap.newConcurrentHashMap();
> {code}
> instead of
>  {code}
> private static WeakIdentityMap etagCoreCache = 
> WeakIdentityMap.newConcurrentHashMap();
> {code}
>  






[jira] [Commented] (SOLR-14576) HttpCacheHeaderUti.etagCoreCache should not use a SolrCore as key

2020-10-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210161#comment-17210161
 ] 

ASF subversion and git services commented on SOLR-14576:


Commit ad7ad02238de8a61bc32673e553178792f711445 in lucene-solr's branch 
refs/heads/branch_8x from Noble Paul
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ad7ad02 ]

SOLR-14576 : Do not use SolrCore as keys in a WeakHashMap (#1586)



> HttpCacheHeaderUti.etagCoreCache should not use a SolrCore as key
> -
>
> Key: SOLR-14576
> URL: https://issues.apache.org/jira/browse/SOLR-14576
> Project: Solr
>  Issue Type: Bug
>Reporter: Noble Paul
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> GC performance is affected when the key is a complex data structure. We can 
> make it
> {code}
> private static WeakIdentityMap etagCoreCache = 
> WeakIdentityMap.newConcurrentHashMap();
> {code}
> instead of
>  {code}
> private static WeakIdentityMap etagCoreCache = 
> WeakIdentityMap.newConcurrentHashMap();
> {code}
>  






[GitHub] [lucene-solr] sigram opened a new pull request #1964: SOLR-14749: Cluster singleton part of PR-1785

2020-10-08 Thread GitBox


sigram opened a new pull request #1964:
URL: https://github.com/apache/lucene-solr/pull/1964


   This is just the `ClusterSingleton` part, with an example implementation 
that loads the ClusterSingleton plugins via the `CustomContainerPlugins` API. See 
`TestContainerPlugin` to see how it works.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org






[GitHub] [lucene-solr] dsmiley commented on pull request #1953: SOLR-14917: Move DOMUtil and PropertiesUtil to SolrJ

2020-10-08 Thread GitBox


dsmiley commented on pull request #1953:
URL: https://github.com/apache/lucene-solr/pull/1953#issuecomment-705546872


   I leave JIRA for context/motivation and PRs for the details, and so my PRs 
are often lacking anything other than a JIRA link.  This issue, SOLR-14917 is a 
child task of https://issues.apache.org/jira/browse/SOLR-14915 which removes 
the Prometheus Exporter's dependency on Solr-core.






[GitHub] [lucene-solr] muse-dev[bot] commented on a change in pull request #1963: SOLR-14827: Refactor schema loading to not use XPath

2020-10-08 Thread GitBox


muse-dev[bot] commented on a change in pull request #1963:
URL: https://github.com/apache/lucene-solr/pull/1963#discussion_r501702700



##
File path: solr/core/src/java/org/apache/solr/schema/IndexSchema.java
##
@@ -474,23 +481,28 @@ protected Analyzer getWrappedAnalyzer(String fieldName) {
 }
   }
 
-  protected void readSchema(InputSource is) {
+  protected void readSchema(ConfigSetService.ConfigResource is) {
 assert null != is : "schema InputSource should never be null";
 try {
-  // pass the config resource loader to avoid building an empty one for no 
reason:
-  // in the current case though, the stream is valid so we wont load the 
resource by name
-  XmlConfigFile schemaConf = new XmlConfigFile(loader, SCHEMA, is, 
SLASH+SCHEMA+SLASH, substitutableProperties);
-  Document document = schemaConf.getDocument();
-  final XPath xpath = schemaConf.getXPath();
-  String expression = stepsToPath(SCHEMA, AT + NAME);
-  Node nd = (Node) xpath.evaluate(expression, document, 
XPathConstants.NODE);
+  rootNode = is.getParsed();
+  if(rootNode == null) {
+// pass the config resource loader to avoid building an empty one for 
no reason:
+// in the current case though, the stream is valid so we wont load the 
resource by name
+XmlConfigFile schemaConf = new XmlConfigFile(loader, SCHEMA, 
is.getSource(), SLASH+SCHEMA+SLASH, null);
+//  Document document = schemaConf.getDocument();
+//  final XPath xpath = schemaConf.getXPath();
+//  String expression = stepsToPath(SCHEMA, AT + NAME);
+//  Node nd = (Node) xpath.evaluate(expression, document, 
XPathConstants.NODE);
+rootNode = new DataConfigNode(new 
DOMConfigNode(schemaConf.getDocument().getDocumentElement())) ;

Review comment:
   *NULL_DEREFERENCE:*  object returned by `getDocument(schemaConf)` could 
be null and is dereferenced at line 496.
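
One way to address a finding like this is to check the parsed document before dereferencing it. The following is only a minimal sketch using the JDK DOM types; `RootElementGuard` and `requireRoot` are hypothetical names for illustration and are not part of the PR or of Solr.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical helper: fails with a descriptive message instead of a bare NPE. */
public class RootElementGuard {

  /** Returns the root element, or throws if the document or its root is missing. */
  public static Element requireRoot(Document doc, String resourceName) {
    if (doc == null) {
      throw new IllegalStateException("No parsed document for " + resourceName);
    }
    Element root = doc.getDocumentElement();
    if (root == null) {
      throw new IllegalStateException(
          "Document for " + resourceName + " has no root element");
    }
    return root;
  }
}
```

Whether a null document can actually occur at that call site depends on how `XmlConfigFile` is constructed, which the analyzer cannot always see.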

##
File path: solr/core/src/java/org/apache/solr/schema/IndexSchema.java
##
@@ -608,7 +629,7 @@ protected void readSchema(InputSource is) {
   // expression = "/schema/copyField";
 
   dynamicCopyFields = new DynamicCopy[] {};

Review comment:
   *THREAD_SAFETY_VIOLATION:*  Unprotected write. Non-private method 
`IndexSchema.readSchema(...)` writes to field `this.dynamicCopyFields` outside 
of synchronization.
Reporting because another access to the same memory occurs on a background 
thread, although this access may not.

##
File path: 
solr/core/src/java/org/apache/solr/schema/ManagedIndexSchemaFactory.java
##
@@ -174,8 +175,8 @@ public ManagedIndexSchema create(String resourceName, 
SolrConfig config) {
 }
 InputSource inputSource = new InputSource(schemaInputStream);
 
inputSource.setSystemId(SystemIdResolver.createSystemIdFromResourceName(loadedResource));

Review comment:
   *THREAD_SAFETY_VIOLATION:*  Read/Write race. Non-private method 
`ManagedIndexSchemaFactory.create(...)` reads without synchronization from 
`this.loadedResource`. Potentially races with write in method 
`ManagedIndexSchemaFactory.create(...)`.
Reporting because this access may occur on a background thread.

##
File path: 
solr/core/src/java/org/apache/solr/schema/ManagedIndexSchemaFactory.java
##
@@ -174,8 +175,8 @@ public ManagedIndexSchema create(String resourceName, 
SolrConfig config) {
 }
 InputSource inputSource = new InputSource(schemaInputStream);
 
inputSource.setSystemId(SystemIdResolver.createSystemIdFromResourceName(loadedResource));
-schema = new ManagedIndexSchema(config, loadedResource, inputSource, 
isMutable,
-managedSchemaResourceName, 
schemaZkVersion, getSchemaUpdateLock());
+schema = new ManagedIndexSchema(config, loadedResource, () -> inputSource, 
isMutable,
+managedSchemaResourceName, schemaZkVersion, getSchemaUpdateLock());
 if (shouldUpgrade) {

Review comment:
   *THREAD_SAFETY_VIOLATION:*  Read/Write race. Non-private method 
`ManagedIndexSchemaFactory.create(...)` reads without synchronization from 
`this.shouldUpgrade`. Potentially races with write in method 
`ManagedIndexSchemaFactory.create(...)`.
Reporting because this access may occur on a background thread.

##
File path: 
solr/core/src/java/org/apache/solr/schema/ManagedIndexSchemaFactory.java
##
@@ -174,8 +175,8 @@ public ManagedIndexSchema create(String resourceName, 
SolrConfig config) {
 }
 InputSource inputSource = new InputSource(schemaInputStream);
 
inputSource.setSystemId(SystemIdResolver.createSystemIdFromResourceName(loadedResource));
-schema = new ManagedIndexSchema(config, loadedResource, inputSource, 
isMutable,
-managedSchemaResourceName, 
schemaZkVersion, getSchemaUpdateLock());
+schema = new ManagedIndexSchema(config, loadedResource, () -> inputSource, 
isMutable,
+managedSchemaResourceName, s

[GitHub] [lucene-solr] noblepaul merged pull request #1586: SOLR-14576 : Do not use SolrCore as keys in a WeakHashMap

2020-10-08 Thread GitBox


noblepaul merged pull request #1586:
URL: https://github.com/apache/lucene-solr/pull/1586


   






[jira] [Commented] (SOLR-14787) Inequality support in Payload Check query parser

2020-10-08 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210252#comment-17210252
 ] 

Gus Heck commented on SOLR-14787:
-

New Syntax with latest change (one less parameter, can check multiple tokens): 

for payloads such as
{code:java}
"one|1.0 two|2.0 three|3.0"
{code}
This does not match
{code:java}
{!payload_check f=vals_dpf payloads='0.75 3' op='gt'}one two
{code}
but this does match
{code:java}
{!payload_check f=vals_dpf payloads='0.75 1.5' op='gt'}one two
{code}
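
To make the two examples above concrete, here is a rough sketch of the comparison they imply: each query token is paired by position with one value from `payloads`, and every pair has to satisfy the operator. This is an assumption about the semantics drawn from the examples, not the actual PayloadCheckQueryParser code, and the class, enum and method names are hypothetical.

```java
import java.util.List;

/** Hypothetical sketch of the per-token comparison; not the Solr implementation. */
public class PayloadCheckSketch {

  /** Operators named in the ticket; EQ stands in for the default "equals" behaviour. */
  public enum Op { EQ, GT, GTE, LT, LTE }

  /** Compares one indexed payload against the threshold supplied for that position. */
  public static boolean compare(float indexed, float threshold, Op op) {
    switch (op) {
      case GT:  return indexed > threshold;
      case GTE: return indexed >= threshold;
      case LT:  return indexed < threshold;
      case LTE: return indexed <= threshold;
      default:  return indexed == threshold;
    }
  }

  /** True only if every indexed payload satisfies op against the threshold at the same position. */
  public static boolean matches(List<Float> indexedPayloads, List<Float> thresholds, Op op) {
    if (indexedPayloads.size() != thresholds.size()) {
      return false;
    }
    for (int i = 0; i < thresholds.size(); i++) {
      if (!compare(indexedPayloads.get(i), thresholds.get(i), op)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Tokens "one two" carry payloads 1.0 and 2.0 in the example field.
    List<Float> oneTwo = List.of(1.0f, 2.0f);
    System.out.println(matches(oneTwo, List.of(0.75f, 3.0f), Op.GT)); // false: 2.0 is not > 3
    System.out.println(matches(oneTwo, List.of(0.75f, 1.5f), Op.GT)); // true
  }
}
```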

> Inequality support in Payload Check query parser
> 
>
> Key: SOLR-14787
> URL: https://issues.apache.org/jira/browse/SOLR-14787
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Kevin Watters
>Assignee: Gus Heck
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The goal of this ticket/pull request is to support a richer set of matching 
> and filtering based on term payloads.  This patch extends the 
> PayloadCheckQueryParser to add a new local param for "op"
> The value of OP could be one of the following
>  * gt - greater than
>  * gte - greater than or equal
>  * lt - less than
>  * lte - less than or equal
> If "op" is not specified, the default is the current behavior of 
> equals.
> In addition to the operation, you can specify a threshold local parameter.
> This will provide the ability to search for the term "cat" so long as the 
> payload has a value of greater than 0.75.  
> One use case is to classify a document into various categories with an 
> associated confidence or probability that the classification is correct.  
> That can be indexed into a delimited payload field.  The searches can find 
> and match documents that were tagged with the "cat" category with a 
> confidence of greater than 0.5.
> Example Document
> {code:java}
> { 
>   "id":"doc_1",
>   "classifications_payload":["cat|0.75 dog|2.0"]
> }
> {code}
> Example Syntax
> {code:java}
> {!payload_check f=classifications_payload payloads='1' op='gt' 
> threshold='0.5'}cat  {code}
>  
>  
>  
>  






[jira] [Commented] (SOLR-14870) gradle build does not validate ref-guide -> javadoc links

2020-10-08 Thread Chris M. Hostetter (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210254#comment-17210254
 ] 

Chris M. Hostetter commented on SOLR-14870:
---

{quote}...If you can make it work then please go ahead and I can follow up with 
a functionality-preserving cleanup later on.
{quote}
Oh, oh ... i'm sorry if i wasn't clear before: it 100% works right now, it 
works exactly like i hoped it would work.  I'm _SUPER_ happy with the 
functionality (i just wasn't sure if i was doing something horribly "bad") ...
 * syncing the source files and building the data files is now a single task – 
which is ideal in my opinion
 ** the jekyll build flat out can't work w/o the data files, so this way 
there's no "partially usable" content directory
 ** the only reason they were distinct ant tasks in the past was because the 
"bare bones" build didn't need them
 * all of the input/output caching seems dialed in on the right things
 ** checking file contents, not just existence of directory, no unnecessary 
"replace/delete this file on every run, even if the 'source' hasn't changed", 
etc...
 ** running any task twice back to back is a no-op the second time
 * dependencies on top level documentation hooked into the "local javadoc" link 
checking
 * no duplication in the definitions of the "dual" tasks – no risk (i can see) 
that someone can break the "html" site in a way that won't cause a failure 
when checking the "local links" site

{quote}I rarely venture into writing Java classes for gradle builds. The 
benefit of doing this is precompilation (if you move it to buildSrc)...
{quote}
Right ... that's interesting ... maybe that's why i was seeing docs suggesting 
building java classes for re-usable logic in tasks that you want multiple 
instances? ... with that possibility in mind i'm now less inclined to convert 
this to DSL, and more interested in figuring out how to pre-compile it down the 
road :)

At this point the main thing i need clarity on is less about gradle, and more 
about how our javadoc "output" structure seems to have changed with the move to 
gradle  – and if that's intentional and expected (so i can fix the links in the 
ref-guide to point to the new paths) or if it isn't expected and needs fixed 
(so the existing links start working again)

/cc: [~uschindler] (see previous comment about ".../8_6_0/solr-core/..." vs 
".../9_0_0/core/...")

> gradle build does not validate ref-guide -> javadoc links
> -
>
> Key: SOLR-14870
> URL: https://issues.apache.org/jira/browse/SOLR-14870
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14870.patch
>
>
> the ant build had (has on 8x) a feature that ensured we didn't have any 
> broken links between the ref guide and the javadocs...
> {code}
>  depends="javadocs,changes-to-html,process-webpages">
>  inheritall="false">
>   
>   
> 
>   
> {code}
> ...by default {{cd solr/solr-ref-guide && ant bare-bones-html-validation}} 
> just did internal validation of the structure of the guide, but this hook 
> meant that {{cd solr && ant documentation}} (or {{ant precommit}}) would first 
> build the javadocs; then build the ref-guide; then validate _all_ links in 
> the ref-guide, even those to (local) javadocs
> While the "local.javadocs" property logic _inside_ the 
> solr-ref-guide/build.xml was ported to build.gradle, the logic to leverage 
> this functionality from the "solr" project doesn't seem to have been 
> preserved -- so currently, {{gradle check}} doesn't know/care if someone adds 
> a nonsense javadoc link to the ref-guide (or removes a class/method whose 
> javadoc is already linked to from the ref guide)






[jira] [Commented] (SOLR-14870) gradle build does not validate ref-guide -> javadoc links

2020-10-08 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210262#comment-17210262
 ] 

Uwe Schindler commented on SOLR-14870:
--

Hi [~hossman],
the javadocs won't change anymore. We cleaned it up to have module names and 
directory names identical to gradle build module names. This mainly affected 
analyzers, and the solr prefix is now gone.

So fix refguide to correctly link to docs.

> gradle build does not validate ref-guide -> javadoc links
> -
>
> Key: SOLR-14870
> URL: https://issues.apache.org/jira/browse/SOLR-14870
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14870.patch
>
>
> the ant build had (has on 8x) a feature that ensured we didn't have any 
> broken links between the ref guide and the javadocs...
> {code}
>  depends="javadocs,changes-to-html,process-webpages">
>  inheritall="false">
>   
>   
> 
>   
> {code}
> ...by default {{cd solr/solr-ref-guide && ant bare-bones-html-validation}} 
> just did internal validation of the structure of the guide, but this hook 
> meant that {{cd solr && ant documentation}} (or {{ant precommit}}) would first 
> build the javadocs; then build the ref-guide; then validate _all_ links in 
> the ref-guide, even those to (local) javadocs
> While the "local.javadocs" property logic _inside_ the 
> solr-ref-guide/build.xml was ported to build.gradle, the logic to leverage 
> this functionality from the "solr" project doesn't seem to have been 
> preserved -- so currently, {{gradle check}} doesn't know/care if someone adds 
> a nonsense javadoc link to the ref-guide (or removes a class/method whose 
> javadoc is already linked to from the ref guide)






[jira] [Commented] (SOLR-14870) gradle build does not validate ref-guide -> javadoc links

2020-10-08 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210264#comment-17210264
 ] 

Dawid Weiss commented on SOLR-14870:


bq.  with that possibility in mind i'm now less inclined to convert this to 
DSL, and more interested in figuring out how to pre-compile it down the road

Not everything you can read in the docs is worth adhering to... My opinion on 
the matter is that moving things to precompiled classes and plugins makes the 
build less clear, actually... But mileage may vary of course.

The paths/ output folders for javadocs are expected, Chris. They reflect module 
names.

> gradle build does not validate ref-guide -> javadoc links
> -
>
> Key: SOLR-14870
> URL: https://issues.apache.org/jira/browse/SOLR-14870
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14870.patch
>
>
> the ant build had (has on 8x) a feature that ensured we didn't have any 
> broken links between the ref guide and the javadocs...
> {code}
>  depends="javadocs,changes-to-html,process-webpages">
>  inheritall="false">
>   
>   
> 
>   
> {code}
> ...by default {{cd solr/solr-ref-guide && ant bare-bones-html-validation}} 
> just did internal validation of the structure of the guide, but this hook 
> meant that {{cd solr && ant documentation}} (or {{ant precommit}}) would first 
> build the javadocs; then build the ref-guide; then validate _all_ links in 
> the ref-guide, even those to (local) javadocs
> While the "local.javadocs" property logic _inside_ the 
> solr-ref-guide/build.xml was ported to build.gradle, the logic to leverage 
> this functionality from the "solr" project doesn't seem to have been 
> preserved -- so currently, {{gradle check}} doesn't know/care if someone adds 
> a nonsense javadoc link to the ref-guide (or removes a class/method whose 
> javadoc is already linked to from the ref guide)






[jira] [Commented] (SOLR-14870) gradle build does not validate ref-guide -> javadoc links

2020-10-08 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210265#comment-17210265
 ] 

Uwe Schindler commented on SOLR-14870:
--

bq. My opinion on the matter is that moving things to precompiled classes and 
plugins makes the build less clear, actually... 

+1

> gradle build does not validate ref-guide -> javadoc links
> -
>
> Key: SOLR-14870
> URL: https://issues.apache.org/jira/browse/SOLR-14870
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14870.patch
>
>
> the ant build had (has on 8x) a feature that ensured we didn't have any 
> broken links between the ref guide and the javadocs...
> {code}
>  depends="javadocs,changes-to-html,process-webpages">
>  inheritall="false">
>   
>   
> 
>   
> {code}
> ...by default {{cd solr/solr-ref-guide && ant bare-bones-html-validation}} 
> just did internal validation of the structure of the guide, but this hook 
> meant that {{cd solr && ant documentation}} (or {{ant precommit}}) would first 
> build the javadocs; then build the ref-guide; then validate _all_ links in 
> the ref-guide, even those to (local) javadocs
> While the "local.javadocs" property logic _inside_ the 
> solr-ref-guide/build.xml was ported to build.gradle, the logic to leverage 
> this functionality from the "solr" project doesn't seem to have been 
> preserved -- so currently, {{gradle check}} doesn't know/care if someone adds 
> a nonsense javadoc link to the ref-guide (or removes a class/method whose 
> javadoc is already linked to from the ref guide)






[jira] [Commented] (SOLR-14870) gradle build does not validate ref-guide -> javadoc links

2020-10-08 Thread Chris M. Hostetter (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210268#comment-17210268
 ] 

Chris M. Hostetter commented on SOLR-14870:
---

{quote}My opinion on the matter is that moving things to precompiled classes 
and plugins makes the build less clear, actually... But mileage may vary of 
course.
{quote}
Sure sure ... understood – i was just saying that _I'm_ less inclined to go 
Java->DSL knowing that going the opposite direction (Java->precompiled) is also 
possible.  I certainly don't object if you want to pursue Java->DSL to see if 
it makes the build more clear.
{quote}The paths/ output folders for javadocs is expected, Chris. They reflect 
module names.
{quote}
perfect, thanks for confirming guys ... i'm heads down in something else right 
now but i'll come back and change/fix all the javadoc links and remove the 
nocommits ASAP.

> gradle build does not validate ref-guide -> javadoc links
> -
>
> Key: SOLR-14870
> URL: https://issues.apache.org/jira/browse/SOLR-14870
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14870.patch
>
>
> the ant build had (has on 8x) a feature that ensured we didn't have any 
> broken links between the ref guide and the javadocs...
> {code}
>  depends="javadocs,changes-to-html,process-webpages">
>  inheritall="false">
>   
>   
> 
>   
> {code}
> ...by default {{cd solr/solr-ref-guide && ant bare-bones-html-validation}} 
> just did internal validation of the structure of the guide, but this hook 
> meant that {{cd solr && ant documentation}} (or {{ant precommit}}) would first 
> build the javadocs; then build the ref-guide; then validate _all_ links in 
> the ref-guide, even those to (local) javadocs
> While the "local.javadocs" property logic _inside_ the 
> solr-ref-guide/build.xml was ported to build.gradle, the logic to leverage 
> this functionality from the "solr" project doesn't seem to have been 
> preserved -- so currently, {{gradle check}} doesn't know/care if someone adds 
> a nonsense javadoc link to the ref-guide (or removes a class/method whose 
> javadoc is already linked to from the ref guide)






[jira] [Commented] (SOLR-14870) gradle build does not validate ref-guide -> javadoc links

2020-10-08 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210277#comment-17210277
 ] 

Dawid Weiss commented on SOLR-14870:


For the record - you can also write precompiled tasks/ classes in Groovy (or 
Kotlin) if this makes you happier. 

> gradle build does not validate ref-guide -> javadoc links
> -
>
> Key: SOLR-14870
> URL: https://issues.apache.org/jira/browse/SOLR-14870
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14870.patch
>
>
> the ant build had (has on 8x) a feature that ensured we didn't have any 
> broken links between the ref guide and the javadocs...
> {code}
>  depends="javadocs,changes-to-html,process-webpages">
>  inheritall="false">
>   
>   
> 
>   
> {code}
> ...by default {{cd solr/solr-ref-guide && ant bare-bones-html-validation}} 
> just did internal validation of the structure of the guide, but this hook 
> meant that {{cd solr && ant documentation}} (or {{ant precommit}}) would first 
> build the javadocs; then build the ref-guide; then validate _all_ links in 
> the ref-guide, even those to (local) javadocs
> While the "local.javadocs" property logic _inside_ the 
> solr-ref-guide/build.xml was ported to build.gradle, the logic to leverage 
> this functionality from the "solr" project doesn't seem to have been 
> preserved -- so currently, {{gradle check}} doesn't know/care if someone adds 
> a nonsense javadoc link to the ref-guide (or removes a class/method whose 
> javadoc is already linked to from the ref guide)






[jira] [Commented] (SOLR-14759) Separate the Lucene and Solr builds

2020-10-08 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210281#comment-17210281
 ] 

Dawid Weiss commented on SOLR-14759:


I took a tentative look at how difficult it would be. Code-wise it's fairly 
simple. I have created two branches from the current master - one with just 
Lucene and one with Solr, here:

https://github.com/dweiss/lucene-solr/tree/lucene-standalone
https://github.com/dweiss/lucene-solr/tree/solr-standalone

These branches remove the counterpart project but leave much of "everything 
else" in the same places. Projects compile and pass tests although full check 
doesn't work because of documentation and site generation inter-dependencies 
(which I will need some help with I think).

Solr fully depends on Lucene binary snapshot JARs fetched from Apache Nexus 
(snapshots repo). I had to comment out 2 or 3 classes which had direct 
dependencies on Lucene test classes but these can be sorted out later I think.

The question is when to move forward with this; I'm guessing it has to be done 
atomically with the TLP move because of site and documentation generation?

> Separate the Lucene and Solr builds
> ---
>
> Key: SOLR-14759
> URL: https://issues.apache.org/jira/browse/SOLR-14759
> Project: Solr
>  Issue Type: Sub-task
>  Components: Build
>Reporter: Jan Høydahl
>Assignee: Dawid Weiss
>Priority: Major
>
> While still in same git repo, separate the builds, so Lucene and Solr can be 
> built independently.
> This is a preparation step which will make it easier to prune the new git 
> repos post-split.






[jira] [Created] (LUCENE-9572) Allow TypeAsSynonymFilter to propagate selected flags and Ignore some types

2020-10-08 Thread Gus Heck (Jira)
Gus Heck created LUCENE-9572:


 Summary: Allow TypeAsSynonymFilter to propagate selected flags and 
Ignore some types
 Key: LUCENE-9572
 URL: https://issues.apache.org/jira/browse/LUCENE-9572
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis, modules/test-framework
Reporter: Gus Heck
Assignee: Gus Heck


(Breaking this off of SOLR-14597 for independent review)

TypeAsSynonymFilter converts a token's type attribute to a synonym. In some cases the 
original token may have already had flags set on it and it may be useful to 
propagate some or all of those flags to the synonym we are generating. This 
ticket provides that ability and lets the user supply a bitmask that selects 
which flags are retained.

Additionally there may be some set of types that should not be converted to 
synonyms, and this change allows the user to specify a comma separated list of 
types to ignore (most common case will be to ignore a common default type of 
'word' I suspect)
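
To make the two options concrete, here is a minimal plain-Java sketch of the 
logic described above. It is not the code from the patch: the class and method 
names are made up for illustration, and it only assumes that flags are an int 
bit set and that the ignore list arrives as a comma-separated string.

```java
import java.util.Set;
import java.util.TreeSet;

// Plain-Java sketch; none of these names are Lucene API.
final class TypeSynonymSketch {

  /** Keep only the flag bits selected by the mask. */
  static int retainedFlags(int originalFlags, int flagsMask) {
    return originalFlags & flagsMask;
  }

  /** Parse a comma-separated ignore list such as "word, shingle". */
  static Set<String> parseIgnoreList(String csv) {
    Set<String> ignored = new TreeSet<>();
    for (String part : csv.split(",")) {
      String type = part.trim();
      if (!type.isEmpty()) {
        ignored.add(type);
      }
    }
    return ignored;
  }

  /** A token gets a type synonym only if its type is not on the ignore list. */
  static boolean emitsSynonym(String type, Set<String> ignored) {
    return !ignored.contains(type);
  }
}
```

With a mask of 0b0110, a token carrying flags 0b1011 would hand only bit 
0b0010 to its synonym, and a token of type 'word' would get no synonym at all 
when 'word' is on the ignore list.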






[jira] [Commented] (SOLR-14759) Separate the Lucene and Solr builds

2020-10-08 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210287#comment-17210287
 ] 

Ishan Chattopadhyaya commented on SOLR-14759:
-

This sounds very cool! Can we move forward with this immediately after 8.7 
release? Or maybe 8.8? Or is there any reason why we need to wait until 9.0?

> Separate the Lucene and Solr builds
> ---
>
> Key: SOLR-14759
> URL: https://issues.apache.org/jira/browse/SOLR-14759
> Project: Solr
>  Issue Type: Sub-task
>  Components: Build
>Reporter: Jan Høydahl
>Assignee: Dawid Weiss
>Priority: Major
>
> While still in same git repo, separate the builds, so Lucene and Solr can be 
> built independently.
> This is a preparation step which will make it easier to prune the new git 
> repos post-split.






[jira] [Created] (LUCENE-9573) Add back compat tests for VectorFormat to TestBackwardsCompatibility

2020-10-08 Thread Michael Sokolov (Jira)
Michael Sokolov created LUCENE-9573:
---

 Summary: Add back compat tests for VectorFormat to 
TestBackwardsCompatibility
 Key: LUCENE-9573
 URL: https://issues.apache.org/jira/browse/LUCENE-9573
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael Sokolov


In LUCENE-9322 we add a new VectorFormat to the index. This issue is about 
adding backwards compatibility tests for it once the index format has 
crystallized into its 9.0 form






[GitHub] [lucene-solr] msokolov commented on a change in pull request #1930: LUCENE-9322: add VectorValues to new Lucene90 codec

2020-10-08 Thread GitBox


msokolov commented on a change in pull request #1930:
URL: https://github.com/apache/lucene-solr/pull/1930#discussion_r501828351



##
File path: lucene/core/src/java/org/apache/lucene/index/VectorValues.java
##
@@ -0,0 +1,264 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.index;
+
+import java.io.IOException;
+
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.search.TopDocs;
+import org.apache.lucene.util.BytesRef;
+
+/**
+ * Access to per-document vector value.
+ */
+public abstract class VectorValues extends DocIdSetIterator {
+
+  /** The maximum length of a vector */
+  public static int MAX_DIMENSIONS = 1024;
+
+  /** Sole constructor */
+  protected VectorValues() {}
+
+  /**
+   * Return the dimension of the vectors
+   */
+  public abstract int dimension();
+
+  /**
+   * TODO: should we use cost() for this? We rely on its always being exactly 
the number
+   * of documents having a value for this field, which is not guaranteed by 
the cost() contract,
+   * but in all the implementations so far they are the same.
+   * @return the number of vectors returned by this iterator
+   */
+  public abstract int size();
+
+  /**
+   * Return the score function used to compare these vectors
+   */
+  public abstract ScoreFunction scoreFunction();
+
+  /**
+   * Return the vector value for the current document ID.
+   * It is illegal to call this method after the iterator failed to advance.
+   * @return the vector value
+   */
+  public abstract float[] vectorValue() throws IOException;
+
+  /**
+   * Return the binary encoded vector value for the current document ID.
+   * It is illegal to call this method after the iterator failed to advance.
+   * @return the binary value
+   */
+  public BytesRef binaryValue() throws IOException {
+throw new UnsupportedOperationException();
+  }
+
+  /**
+   * Return a random access interface over this iterator's vectors.
+   */
+  public abstract RandomAccess randomAccess();
+
+  /**
+   * Provides random access to vectors by dense ordinal
+   */
+  public interface RandomAccess {
+
+/**
+ * Return the vector value as a floating point array.
+ * @param targetOrd a valid ordinal, ≥ 0 and < {@link #size()}.
+ */
+float[] vectorValue(int targetOrd) throws IOException;
+
+/**
+ * Return the vector value as a byte array; these are the bytes 
corresponding to the float array
+ * encoded using little-endian byte order.
+ * @param targetOrd a valid ordinal, ≥ 0 and < {@link #size()}.
+ */
+BytesRef binaryValue(int targetOrd) throws IOException;
+
+/**
+ * Return the k nearest neighbor documents as determined by comparison of 
their vector values
+ * for this field, to the given vector, by the field's score function. If 
the score function is
+ * reversed, lower values indicate nearer vectors, otherwise higher scores 
indicate nearer
+ * vectors. Unlike relevance scores, vector scores may be negative.
+ * @param target the vector-valued query
+ * @param k  the number of docs to return
+ * @param fanout control the accuracy/speed tradeoff - larger values give 
better recall at higher cost
+ * @return the k nearest neighbor documents, along with their 
(scoreFunction-specific) scores.
+ */
+TopDocs search(float[] target, int k, int fanout) throws IOException;
+  }
+
+  /**
+   * Score function. This is used during indexing and searching of the vectors 
to determine the nearest neighbors.
+   * Score values may be negative. By default high scores indicate nearer 
documents, unless the function is reversed.
+   */
+  public enum ScoreFunction {
+/** No distance function is used. Note: {@link 
VectorValues.RandomAccess#search(float[], int, int)}

Review comment:
   OK I opened LUCENE-9573. I have to admit I don't fully understand the 
timing constraints/dependencies here. Maybe you could comment on that issue?  
Re: the ids, I opted to move to using the enum ordinal as you suggested later. I 
can't see how that restricts us in any meaningful way. Perhaps we add a 
back-compat test to verify that the enum ordinals don't change.
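
   A back-compat check of that kind could take roughly the following shape. 
This is only a sketch: the enum is a hypothetical stand-in for 
VectorValues.ScoreFunction, its constant names are assumptions, and so is the 
EXPECTED list of names whose ordinals would already have been written to an 
index.

```java
// Hypothetical stand-in for VectorValues.ScoreFunction; the constant names
// are assumptions, only the idea of pinning ordinals comes from the thread.
enum ScoreFunction { NONE, EUCLIDEAN_HNSW, DOT_PRODUCT_HNSW }

final class ScoreFunctionOrdinalCheck {

  // Names in the order whose ordinals are assumed to be persisted already.
  private static final String[] EXPECTED = {"NONE", "EUCLIDEAN_HNSW", "DOT_PRODUCT_HNSW"};

  /** True if every previously persisted ordinal still maps to the same constant. */
  static boolean ordinalsAreStable() {
    ScoreFunction[] values = ScoreFunction.values();
    if (values.length < EXPECTED.length) {
      return false; // a constant was removed
    }
    for (int i = 0; i < EXPECTED.length; i++) {
      if (!values[i].name().equals(EXPECTED[i])) {
        return false; // a constant was renamed or reordered
      }
    }
    return true; // appending new constants at the end is still allowed
  }
}
```

   The check tolerates new constants appended at the end but fails on any 
reordering, renaming or removal, which are the changes that would silently 
reinterpret ordinals stored in older segments.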




--

[jira] [Commented] (SOLR-14870) gradle build does not validate ref-guide -> javadoc links

2020-10-08 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210303#comment-17210303
 ] 

Uwe Schindler commented on SOLR-14870:
--

I don't think we should add tasks or plugins to build-src in the current state. 
I much prefer having the task subclass live in the Gradle file. We do this in 
many places and I like it. The RenderJavadocsTask is a subclass of 
DefaultTask. It is reused for each module and handles the global site 
javadocs as well as the Maven javadocs, and I see no reason to move it to 
build-src. So reusability is no reason to precompile.

> gradle build does not validate ref-guide -> javadoc links
> -
>
> Key: SOLR-14870
> URL: https://issues.apache.org/jira/browse/SOLR-14870
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14870.patch
>
>
> the ant build had (has on 8x) a feature that ensured we didn't have any 
> broken links between the ref guide and the javadocs...
> {code}
>  depends="javadocs,changes-to-html,process-webpages">
>  inheritall="false">
>   
>   
> 
>   
> {code}
> ...by default {{cd solr/solr-ref-guide && ant bare-bones-html-validation}} 
> just did internal validation of the structure of the guide, but this hook 
> meant that {{cd solr && ant documentation}} (or {{ant precommit}}) would first 
> build the javadocs; then build the ref-guide; then validate _all_ links in 
> the ref-guide, even those to (local) javadocs.
> While the "local.javadocs" property logic _inside_ the 
> solr-ref-guide/build.xml was ported to build.gradle, the logic to leverage 
> this functionality from the "solr" project doesn't seem to have been 
> preserved -- so currently, {{gradle check}} doesn't know/care if someone adds 
> a nonsense javadoc link to the ref-guide (or removes a class/method whose 
> javadoc is already linked to from the ref guide).






[GitHub] [lucene-solr] gus-asf opened a new pull request #1965: LUCENE-9572 - TypeAsSynonymFilter gains selective flag transfer and an ignore list.

2020-10-08 Thread GitBox


gus-asf opened a new pull request #1965:
URL: https://github.com/apache/lucene-solr/pull/1965


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org






[jira] [Created] (LUCENE-9574) Add a token filter to drop tokens based on flags.

2020-10-08 Thread Gus Heck (Jira)
Gus Heck created LUCENE-9574:


 Summary: Add a token filter to drop tokens based on flags.
 Key: LUCENE-9574
 URL: https://issues.apache.org/jira/browse/LUCENE-9574
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Gus Heck
Assignee: Gus Heck


A filter that tests flags on tokens vs a bitmask and drops tokens that have all 
specified flags.
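
As a rough illustration of "have all specified flags", the test boils down to 
a single mask comparison. The sketch below is plain Java with made-up names, 
not the filter itself.

```java
// Plain-Java sketch of the drop decision; the names are illustrative only.
final class DropFlagsSketch {

  /** True when the token carries every bit that is set in dropFlags. */
  static boolean shouldDrop(int tokenFlags, int dropFlags) {
    return (tokenFlags & dropFlags) == dropFlags;
  }
}
```

Note that with this definition a dropFlags value of 0 matches every token, so 
a real filter would probably want to reject or special-case an empty mask.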






[jira] [Updated] (LUCENE-9574) Add a token filter to drop tokens based on flags.

2020-10-08 Thread Gus Heck (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gus Heck updated LUCENE-9574:
-
Description: 
(Breaking this off of SOLR-14597 for independent review)

A filter that tests flags on tokens vs a bitmask and drops tokens that have all 
specified flags.

  was:A filter that tests flags on tokens vs a bitmask and drops tokens that 
have all specified flags.


> Add a token filter to drop tokens based on flags.
> -
>
> Key: LUCENE-9574
> URL: https://issues.apache.org/jira/browse/LUCENE-9574
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Gus Heck
>Assignee: Gus Heck
>Priority: Major
>
> (Breaking this off of SOLR-14597 for independent review)
> A filter that tests flags on tokens vs a bitmask and drops tokens that have 
> all specified flags.






[jira] [Commented] (LUCENE-9572) Allow TypeAsSynonymFilter to propagate selected flags and Ignore some types

2020-10-08 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210332#comment-17210332
 ] 

Gus Heck commented on LUCENE-9572:
--

Since this is blocking SIP-9 and SOLR-14597 I'll be presuming silent consensus 
if there are no comments by Monday

> Allow TypeAsSynonymFilter to propagate selected flags and Ignore some types
> ---
>
> Key: LUCENE-9572
> URL: https://issues.apache.org/jira/browse/LUCENE-9572
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis, modules/test-framework
>Reporter: Gus Heck
>Assignee: Gus Heck
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> (Breaking this off of SOLR-14597 for independent review)
> TypeAsSynonymFilter converts type attributes to a synonym. In some cases the 
> original token may already have flags set on it, and it may be useful to 
> propagate some or all of those flags to the synonym we are generating. This 
> ticket provides that ability and lets the user supply a bitmask that 
> specifies which flags are retained.
> Additionally, there may be some set of types that should not be converted to 
> synonyms, and this change allows the user to specify a comma-separated list 
> of types to ignore (I suspect the most common case will be to ignore a common 
> default type of 'word').






[jira] [Commented] (LUCENE-9574) Add a token filter to drop tokens based on flags.

2020-10-08 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210333#comment-17210333
 ] 

Gus Heck commented on LUCENE-9574:
--

One interesting corner case came up when the first token in the stream matched 
the flags but had already had a synonym added. The synonym of course had 
position increment 0, so dropping the token caused complaints about the first 
token not having a position increment > 0. I could think of no way to reach 
forward in the stream and adjust the synonym token to account for the dropping 
of its parent. So the workaround I came up with was to create a random token 
that will effectively never match anything, and thus be invisible, and to use 
it as a replacement instead of dropping when the first token in the stream is 
being dropped. I'm not crazy about it and would like to ask why the restriction 
on position increment is there... it feels like for some reason downstream code 
expects token positions to start with 1 instead of zero or something? Open to 
suggestions for a better solution too.

> Add a token filter to drop tokens based on flags.
> -
>
> Key: LUCENE-9574
> URL: https://issues.apache.org/jira/browse/LUCENE-9574
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Gus Heck
>Assignee: Gus Heck
>Priority: Major
>
> (Breaking this off of SOLR-14597 for independent review)
> A filter that tests flags on tokens vs a bitmask and drops tokens that have 
> all specified flags.






[GitHub] [lucene-solr] gus-asf opened a new pull request #1966: LUCENE-9574 Add DropIfFlaggedFilterFactory

2020-10-08 Thread GitBox


gus-asf opened a new pull request #1966:
URL: https://github.com/apache/lucene-solr/pull/1966


   
   






[jira] [Commented] (LUCENE-9574) Add a token filter to drop tokens based on flags.

2020-10-08 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210335#comment-17210335
 ] 

Gus Heck commented on LUCENE-9574:
--

Since this is blocking SIP-9 and SOLR-14597 I'll be presuming silent consensus 
if there are no comments by Monday

> Add a token filter to drop tokens based on flags.
> -
>
> Key: LUCENE-9574
> URL: https://issues.apache.org/jira/browse/LUCENE-9574
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Gus Heck
>Assignee: Gus Heck
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> (Breaking this off of SOLR-14597 for independent review)
> A filter that tests flags on tokens vs a bitmask and drops tokens that have 
> all specified flags.






[GitHub] [lucene-solr] muse-dev[bot] commented on a change in pull request #1930: LUCENE-9322: add VectorValues to new Lucene90 codec

2020-10-08 Thread GitBox


muse-dev[bot] commented on a change in pull request #1930:
URL: https://github.com/apache/lucene-solr/pull/1930#discussion_r501886824



##
File path: lucene/core/src/java/org/apache/lucene/index/IndexingChain.java
##
@@ -562,6 +614,12 @@ private int processField(int docID, IndexableField field, 
long fieldGen, int fie
   }
   indexPoint(docID, fp, field);
 }
+if (fieldType.vectorDimension() != 0) {
+  if (fp == null) {
+fp = getOrAddField(fieldName, fieldType, false);
+  }
+  indexVector(docID, fp, field);

Review comment:
   *NULL_DEREFERENCE:*  object `fp` last assigned on line 619 could be null 
and is dereferenced by call to `indexVector(...)` at line 621.








[GitHub] [lucene-solr] uschindler commented on pull request #1905: LUCENE-9488 Release with Gradle Part 2

2020-10-08 Thread GitBox


uschindler commented on pull request #1905:
URL: https://github.com/apache/lucene-solr/pull/1905#issuecomment-705717342


   Hi @madrob are you planning to further work on this? I have seen one TODO 
regarding a README. If you don't have time, I can merge this and enable Jenkins 
Artifact builds.






[GitHub] [lucene-solr] uschindler edited a comment on pull request #1905: LUCENE-9488 Release with Gradle Part 2

2020-10-08 Thread GitBox


uschindler edited a comment on pull request #1905:
URL: https://github.com/apache/lucene-solr/pull/1905#issuecomment-705717342


   Hi @madrob are you planning to further work on this? I have seen one TODO 
regarding a README. If you don't have time, I can merge this and enable Jenkins 
Artifact builds.






[jira] [Created] (LUCENE-9575) Add PatternTypingFilter

2020-10-08 Thread Gus Heck (Jira)
Gus Heck created LUCENE-9575:


 Summary: Add PatternTypingFilter
 Key: LUCENE-9575
 URL: https://issues.apache.org/jira/browse/LUCENE-9575
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Gus Heck
Assignee: Gus Heck


One of the key asks when the Library of Congress was asking me to develop the 
Advanced Query Parser was to be able to recognize arbitrary patterns that 
included punctuation such as POW/MIA or 401(k) or C++ etc. Additionally, they 
wanted 401k and 401(k) to match documents with either style of reference, and 
NOT match documents that happen to have isolated 401 or k tokens (i.e. not 
documents about the HTTP status code). And of course we wanted to give up as 
little as possible of the text analysis features they were already using.

This filter, in conjunction with the filters from LUCENE-9572 and LUCENE-9574 
and one Solr-specific filter in SOLR-14597 that re-analyzes tokens with an 
arbitrary analyzer defined for a type in the Solr schema, achieves this.

This filter has the job of spotting the patterns and adding the intended 
synonym as a type to the token (from which minimal punctuation has been 
removed). It also sets flags on the token which are retained through the 
analysis chain, and at the very end the type is converted to a synonym and the 
original token(s) for that type are dropped, avoiding the match on 401 (for 
example).

The pattern matching is specified in a file that looks like: 
{code}
2 (\d+)\(?([a-z])\)? ::: legal2_$1_$2
2 (\d+)\(?([a-z])\)?\(?(\d+)\)? ::: legal3_$1_$2_$3
2 C\+\+ ::: c_plus_plus
{code}

That file would match legal reference patterns such as 401(k), 401k, 
501(c)3 and C++. The format is:

  ::: 

and groups in the pattern are substituted into the replacement so the first 
line above would create synonyms such as:

{code}
401k   --> legal2_401_k
401(k) --> legal2_401_k
503(c) --> legal2_503_c
{code}
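
The substitution itself can be tried out with java.util.regex alone. The 
sketch below is not the filter from the patch: it hard-codes the three example 
rules, leaves out the leading integer column (assumed here to be a flags 
value), and assumes a rule applies only when it matches the whole token, with 
the first matching rule winning.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java sketch of the pattern-to-type substitution; names are illustrative.
final class PatternTypingSketch {

  // {pattern, replacement} pairs copied from the example file above.
  private static final String[][] RULES = {
    {"(\\d+)\\(?([a-z])\\)?", "legal2_$1_$2"},
    {"(\\d+)\\(?([a-z])\\)?\\(?(\\d+)\\)?", "legal3_$1_$2_$3"},
    {"C\\+\\+", "c_plus_plus"},
  };

  /** Returns the type for the first rule matching the whole token, or null if none does. */
  static String typeFor(String token) {
    for (String[] rule : RULES) {
      Matcher m = Pattern.compile(rule[0]).matcher(token);
      if (m.matches()) {
        // $1, $2, ... in the replacement refer to the pattern's capture groups.
        return m.replaceFirst(rule[1]);
      }
    }
    return null;
  }
}
```

Under those assumptions typeFor("401k") and typeFor("401(k)") both give 
legal2_401_k, typeFor("501(c)3") gives legal3_501_c_3, and an isolated 401 or 
k matches no rule.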







[jira] [Commented] (LUCENE-9572) Allow TypeAsSynonymFilter to propagate selected flags and Ignore some types

2020-10-08 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210350#comment-17210350
 ] 

Gus Heck commented on LUCENE-9572:
--

The test framework changes in this ticket are also required by LUCENE-9575

> Allow TypeAsSynonymFilter to propagate selected flags and Ignore some types
> ---
>
> Key: LUCENE-9572
> URL: https://issues.apache.org/jira/browse/LUCENE-9572
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis, modules/test-framework
>Reporter: Gus Heck
>Assignee: Gus Heck
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> (Breaking this off of SOLR-14597 for independent review)
> TypeAsSynonymFilter converts type attributes to a synonym. In some cases the 
> original token may already have flags set on it, and it may be useful to 
> propagate some or all of those flags to the synonym we are generating. This 
> ticket provides that ability and lets the user supply a bitmask that 
> specifies which flags are retained.
> Additionally, there may be some set of types that should not be converted to 
> synonyms, and this change allows the user to specify a comma-separated list 
> of types to ignore (I suspect the most common case will be to ignore a common 
> default type of 'word').






[GitHub] [lucene-solr] rmuir commented on pull request #1966: LUCENE-9574 Add DropIfFlaggedFilterFactory

2020-10-08 Thread GitBox


rmuir commented on pull request #1966:
URL: https://github.com/apache/lucene-solr/pull/1966#issuecomment-705724494


   Please subclass 
https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/analysis/FilteringTokenFilter.java
 and just implement the accept() logic to determine whether a token 
should survive. It handles all the hairy parts; no SecureRandom needed.
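
   For readers wondering what the "hairy parts" are: the usual bookkeeping 
for a filter that removes tokens is to add the position increments of dropped 
tokens onto the next surviving token, which is, as far as I can tell, what 
FilteringTokenFilter takes care of. The sketch below simulates that idea in 
plain Java; Tok and dropFlagged are made-up names, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of carrying skipped position increments forward.
final class SkippedPositionsSketch {

  /** Minimal token: a term, its flags and its position increment. */
  static final class Tok {
    final String term;
    final int flags;
    final int posInc;

    Tok(String term, int flags, int posInc) {
      this.term = term;
      this.flags = flags;
      this.posInc = posInc;
    }
  }

  /** Drops tokens carrying all dropFlags bits and repairs position increments. */
  static List<Tok> dropFlagged(List<Tok> in, int dropFlags) {
    List<Tok> out = new ArrayList<>();
    int skipped = 0; // increments of tokens dropped since the last kept token
    for (Tok t : in) {
      if ((t.flags & dropFlags) == dropFlags) {
        skipped += t.posInc;
      } else {
        out.add(new Tok(t.term, t.flags, t.posInc + skipped));
        skipped = 0;
      }
    }
    return out;
  }
}
```

   In the corner case from the issue, a dropped first token with increment 1 
followed by its synonym with increment 0 leaves the synonym as the new first 
token with increment 1, so no filler token is needed.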






[GitHub] [lucene-solr] uschindler commented on pull request #1966: LUCENE-9574 Add DropIfFlaggedFilterFactory

2020-10-08 Thread GitBox


uschindler commented on pull request #1966:
URL: https://github.com/apache/lucene-solr/pull/1966#issuecomment-705735154


   Yes, @rmuir is right. The TokenFilter should simply subclass 
FilteringTokenFilter and implement accept(). With that, all of this logic 
becomes obsolete and the filter is a one-liner.
   
   I am not sure what SecureRandom or UUID is doing here. Maybe we should 
allow passing the filler token as a parameter and, if none is given, default 
to an empty token?






[GitHub] [lucene-solr] uschindler commented on pull request #1966: LUCENE-9574 Add DropIfFlaggedFilterFactory

2020-10-08 Thread GitBox


uschindler commented on pull request #1966:
URL: https://github.com/apache/lucene-solr/pull/1966#issuecomment-705736515


   The filler token is obsolete. If you implement FilteringTokenFilter's 
accept(), don't try to inject tokens like this. Tests will pass.






[GitHub] [lucene-solr] uschindler commented on pull request #1966: LUCENE-9574 Add DropIfFlaggedFilterFactory

2020-10-08 Thread GitBox


uschindler commented on pull request #1966:
URL: https://github.com/apache/lucene-solr/pull/1966#issuecomment-705738171


   The reason you hit the exception is an incorrect implementation. Don't look 
at tokens before calling incrementToken().
   In addition, you must call addAttribute() before the loop, so that the 
FlagsAttribute instance is created up front. Please drop the whole PR. It's 
plain wrong.
   
   Most of the tests are not needed if you implement FilteringTokenFilter.






[GitHub] [lucene-solr] uschindler commented on a change in pull request #1966: LUCENE-9574 Add DropIfFlaggedFilterFactory

2020-10-08 Thread GitBox


uschindler commented on a change in pull request #1966:
URL: https://github.com/apache/lucene-solr/pull/1966#discussion_r501917428



##
File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/DropIfFlaggedFilter.java
##
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.analysis.miscellaneous;
+
+import java.io.IOException;
+import java.util.UUID;
+
+import org.apache.lucene.analysis.TokenFilter;
+import org.apache.lucene.analysis.TokenStream;
+import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
+import org.apache.lucene.analysis.tokenattributes.FlagsAttribute;
+
+/**
+ * Allows Tokens with a given combination of flags to be dropped.
+ *
+ * @see DropIfFlaggedFilterFactory
+ */
+public class DropIfFlaggedFilter extends TokenFilter {
+
+  private int dropFlags;
+
+  private CharTermAttribute attribute = getAttribute(CharTermAttribute.class);
+  private boolean firstToken = true;
+
+  /**
+   * Construct a token stream filtering the given input.
+   *
+   * @param input the source stream
+   * @param dropFlags a combination of flags that indicates that the token 
should be dropped.
+   */
+  @SuppressWarnings("WeakerAccess")
+  protected DropIfFlaggedFilter(TokenStream input, int dropFlags) {
+super(input);
+this.dropFlags = dropFlags;
+  }
+
+  @Override
+  public final boolean incrementToken() throws IOException {
+boolean result;
+boolean dropToken;
+do {
+  result = input.incrementToken();
+  dropToken = (getAttribute(FlagsAttribute.class).getFlags() & dropFlags) 
== dropFlags;

Review comment:
   You have to move this to a final field and use FlagsAttribute flagsAttr = 
addAttribute(FlagsAttribute.class).








[GitHub] [lucene-solr] uschindler commented on a change in pull request #1966: LUCENE-9574 Add DropIfFlaggedFilterFactory

2020-10-08 Thread GitBox


uschindler commented on a change in pull request #1966:
URL: https://github.com/apache/lucene-solr/pull/1966#discussion_r501917428



##
File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/DropIfFlaggedFilter.java
##
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.analysis.miscellaneous;
+
+import java.io.IOException;
+import java.util.UUID;
+
+import org.apache.lucene.analysis.TokenFilter;
+import org.apache.lucene.analysis.TokenStream;
+import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
+import org.apache.lucene.analysis.tokenattributes.FlagsAttribute;
+
+/**
+ * Allows Tokens with a given combination of flags to be dropped.
+ *
+ * @see DropIfFlaggedFilterFactory
+ */
+public class DropIfFlaggedFilter extends TokenFilter {
+
+  private int dropFlags;
+
+  private CharTermAttribute attribute = getAttribute(CharTermAttribute.class);
+  private boolean firstToken = true;
+
+  /**
+   * Construct a token stream filtering the given input.
+   *
+   * @param input the source stream
+   * @param dropFlags a combination of flags that indicates that the token should be dropped.
+   */
+  @SuppressWarnings("WeakerAccess")
+  protected DropIfFlaggedFilter(TokenStream input, int dropFlags) {
+super(input);
+this.dropFlags = dropFlags;
+  }
+
+  @Override
+  public final boolean incrementToken() throws IOException {
+boolean result;
+boolean dropToken;
+do {
+  result = input.incrementToken();
+  dropToken = (getAttribute(FlagsAttribute.class).getFlags() & dropFlags) == dropFlags;

Review comment:
   You have to move this to a final field: declare `final FlagsAttribute 
flagsAttr = addAttribute(FlagsAttribute.class);` as an instance member.








[GitHub] [lucene-solr] uschindler commented on pull request #1966: LUCENE-9574 Add DropIfFlaggedFilterFactory

2020-10-08 Thread GitBox


uschindler commented on pull request #1966:
URL: https://github.com/apache/lucene-solr/pull/1966#issuecomment-705749147


   FilteringTokenFilter also corrects positions. No filler tokens needed.






[GitHub] [lucene-solr] uschindler commented on pull request #1966: LUCENE-9574 Add DropIfFlaggedFilterFactory

2020-10-08 Thread GitBox


uschindler commented on pull request #1966:
URL: https://github.com/apache/lucene-solr/pull/1966#issuecomment-705753920


   This would be your filter: plain and simple, correct positions, no filler 
tokens needed. All bug-specific tests are obsolete, as this type of filter is 
tested to hell (we have many token-dropping filters):
   
   ```java
   public final class DropIfFlaggedFilter extends FilteringTokenFilter {
     private final FlagsAttribute flagsAtt = addAttribute(FlagsAttribute.class);
   
     private final int dropFlags;
   
     public DropIfFlaggedFilter(TokenStream in, int dropFlags) {
       super(in);
       this.dropFlags = dropFlags;
     }
   
     @Override
     public boolean accept() {
       // Accept the token unless every bit in dropFlags is set on it.
       // TODO: maybe "== 0", depending on whether all flags or any single flag is enough to drop
       return (flagsAtt.getFlags() & dropFlags) != dropFlags;
     }
   }
   ```
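
The TODO in the snippet above turns on whether a token should be dropped only when all of the configured flag bits are set, or as soon as any one of them is set. The following is a minimal, Lucene-free sketch of the two bitmask checks. The class and method names (`DropFlagsDemo`, `acceptUnlessAllSet`, `acceptUnlessAnySet`) and the sample flag values are made up for illustration and are not part of the pull request.

```java
final class DropFlagsDemo {

  /** Keeps the token unless ALL bits of dropFlags are set on it (the "!= dropFlags" check). */
  public static boolean acceptUnlessAllSet(int tokenFlags, int dropFlags) {
    return (tokenFlags & dropFlags) != dropFlags;
  }

  /** Keeps the token only if NONE of the bits of dropFlags is set on it (the "== 0" variant). */
  public static boolean acceptUnlessAnySet(int tokenFlags, int dropFlags) {
    return (tokenFlags & dropFlags) == 0;
  }

  public static void main(String[] args) {
    int dropFlags = 0b110; // hypothetical mask: two flag bits configured for dropping
    int[] samples = {0b000, 0b010, 0b110, 0b111};
    for (int flags : samples) {
      System.out.println(
          Integer.toBinaryString(flags)
              + " -> all-bits check keeps: " + acceptUnlessAllSet(flags, dropFlags)
              + ", any-bit check keeps: " + acceptUnlessAnySet(flags, dropFlags));
    }
  }
}
```

The two checks differ only for tokens that carry some but not all of the configured bits. One edge case may be worth keeping in mind: with `dropFlags == 0`, the all-bits check rejects every token, while the any-bit check keeps every token.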






[GitHub] [lucene-solr] uschindler commented on a change in pull request #1966: LUCENE-9574 Add DropIfFlaggedFilterFactory

2020-10-08 Thread GitBox


uschindler commented on a change in pull request #1966:
URL: https://github.com/apache/lucene-solr/pull/1966#discussion_r501937630



##
File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/DropIfFlaggedFilter.java
##
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.analysis.miscellaneous;
+
+import java.io.IOException;
+import java.util.UUID;
+
+import org.apache.lucene.analysis.TokenFilter;
+import org.apache.lucene.analysis.TokenStream;
+import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
+import org.apache.lucene.analysis.tokenattributes.FlagsAttribute;
+
+/**
+ * Allows Tokens with a given combination of flags to be dropped.
+ *
+ * @see DropIfFlaggedFilterFactory
+ */
+public class DropIfFlaggedFilter extends TokenFilter {
+
+  private int dropFlags;

Review comment:
   must be final








[GitHub] [lucene-solr] gus-asf commented on pull request #1966: LUCENE-9574 Add DropIfFlaggedFilterFactory

2020-10-08 Thread GitBox


gus-asf commented on pull request #1966:
URL: https://github.com/apache/lucene-solr/pull/1966#issuecomment-705761838


   Hi @uschindler, your comments sound like you think I will be unwilling to 
take feedback. I'm not sure why you feel this way and seem to need to say the 
same thing repeatedly. This is broken out from the original contribution ticket 
for the very purpose of getting feedback. That said, I do have a concern with 
the solution that FilteringTokenFilter provides... Consider this case:
   
   Text: "January 401(k) contribution"
   Whitespace tok: "January"(pi:1) "401(k)"(pi:1) "contribution"(pi:1)
   PatternTypingFilter: "January"(pi:1), "401(k)"(pi:1, flag:2, 
type:legal2_401_k), "contribution"(pi:1)
   TokenAnalyzerFilter: "january"(pi:1), "401"(pi:1, flag:2, 
type:legal2_401_k), "k", "contribution"(pi:1)






[GitHub] [lucene-solr] gus-asf edited a comment on pull request #1966: LUCENE-9574 Add DropIfFlaggedFilterFactory

2020-10-08 Thread GitBox


gus-asf edited a comment on pull request #1966:
URL: https://github.com/apache/lucene-solr/pull/1966#issuecomment-705761838


   Hi @uschindler, your comments sound like you think I will be unwilling to 
take feedback. I'm not sure why you feel this way and seem to need to say the 
same thing repeatedly. This is broken out from the original contribution ticket 
for the very purpose of getting feedback. That said, I do have a concern with 
the solution that FilteringTokenFilter provides... Consider this case:
   
   Text: "January 401(k) contribution"
   Whitespace tok: "January"(pi:1) "401(k)"(pi:1) "contribution"(pi:1)
   PatternTypingFilter: "January"(pi:1), "401(k)"(pi:1, flag:2, 
type:legal2_401_k), "contribution"(pi:1)
   TokenAnalyzerFilter: "january"(pi:1), "401"(pi:1, flag:2, 
type:legal2_401_k), "k", "contribution"(pi:1)
   accidental submit... still editing






[GitHub] [lucene-solr] uschindler commented on pull request #1966: LUCENE-9574 Add DropIfFlaggedFilterFactory

2020-10-08 Thread GitBox


uschindler commented on pull request #1966:
URL: https://github.com/apache/lucene-solr/pull/1966#issuecomment-705765924


   Read the code of FilteringTokenFilter. It will fix the positions. What you 
describe is not an issue, as this also affects StopFilter and others.
   
   If the first token is removed, FilteringTokenFilter will fix the position 
increment. Inserting a bullshit token is not needed.
   
   There are tons of tests for this.
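
To make the position fix-up concrete, here is a plain-Java simulation of the idea: when a token is rejected, its position increment is remembered and added to the next token that is kept. This is only a sketch of the behavior described above, not Lucene code. `PositionFixupDemo`, `Tok` and `filter` are hypothetical names, the flag value 2 is taken from the example earlier in the thread, and end-of-stream handling is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class PositionFixupDemo {

  /** Minimal stand-in for a token: term text, position increment and flags. */
  static final class Tok {
    final String term;
    final int posInc;
    final int flags;

    Tok(String term, int posInc, int flags) {
      this.term = term;
      this.posInc = posInc;
      this.flags = flags;
    }

    @Override
    public String toString() {
      return term + "(pi:" + posInc + ")";
    }
  }

  /** Drops rejected tokens and carries their position increments over to the next kept token. */
  public static List<Tok> filter(List<Tok> in, Predicate<Tok> accept) {
    List<Tok> out = new ArrayList<>();
    int skipped = 0; // increments of tokens dropped since the last kept token
    for (Tok t : in) {
      if (accept.test(t)) {
        out.add(new Tok(t.term, t.posInc + skipped, t.flags));
        skipped = 0;
      } else {
        skipped += t.posInc;
      }
    }
    return out;
  }

  public static void main(String[] args) {
    int dropFlags = 2; // assumed flag value, as in the "401(k)" example
    List<Tok> in = new ArrayList<>();
    in.add(new Tok("january", 1, 0));
    in.add(new Tok("401", 1, 2));
    in.add(new Tok("k", 1, 0));
    in.add(new Tok("contribution", 1, 0));

    List<Tok> out = filter(in, t -> (t.flags & dropFlags) != dropFlags);
    System.out.println(out); // prints [january(pi:1), k(pi:2), contribution(pi:1)]
  }
}
```

In this simulation the same carry-over applies when the very first token is dropped: the first kept token simply starts with a larger increment, so no placeholder token has to be inserted.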






[jira] [Commented] (SOLR-14588) Circuit Breakers Infrastructure and Real JVM Based Circuit Breaker

2020-10-08 Thread Cassandra Targett (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210401#comment-17210401
 ] 

Cassandra Targett commented on SOLR-14588:
--

[~atris], this issue is marked as fixed in 8.7, but I don't see the changes 
made in the various commits in branch_8x; it seems they are on master only. Is 
this really intended for 8.7, or is that simply an error?

> Circuit Breakers Infrastructure and Real JVM Based Circuit Breaker
> --
>
> Key: SOLR-14588
> URL: https://issues.apache.org/jira/browse/SOLR-14588
> Project: Solr
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Atri Sharma
>Priority: Blocker
> Fix For: master (9.0), 8.7
>
>  Time Spent: 13h 50m
>  Remaining Estimate: 0h
>
> This Jira tracks addition of circuit breakers in the search path and 
> implements JVM based circuit breaker which rejects incoming search requests 
> if the JVM heap usage exceeds a defined percentage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[GitHub] [lucene-solr] uschindler commented on pull request #1966: LUCENE-9574 Add DropIfFlaggedFilterFactory

2020-10-08 Thread GitBox


uschindler commented on pull request #1966:
URL: https://github.com/apache/lucene-solr/pull/1966#issuecomment-705768959


   Your filter does not handle positions at all, so it will also corrupt the 
graph if later tokens are removed.
   It's a state machine and you have to understand it. Your problem was thought 
about years ago; there's no need for discussion.
   
   Sorry for being harsh, but unfortunately I have to say: your work does not 
follow any documentation about how a TokenFilter should behave. If you read 
the javadocs, it's all explained. The whole thing is plain wrong, and using 
that at the Library of Congress would make me feel really bad...






[GitHub] [lucene-solr] madrob commented on pull request #1905: LUCENE-9488 Release with Gradle Part 2

2020-10-08 Thread GitBox


madrob commented on pull request #1905:
URL: https://github.com/apache/lucene-solr/pull/1905#issuecomment-705772790


   Rebasing, squashing, and pushing later today






[GitHub] [lucene-solr] uschindler commented on pull request #1905: LUCENE-9488 Release with Gradle Part 2

2020-10-08 Thread GitBox


uschindler commented on pull request #1905:
URL: https://github.com/apache/lucene-solr/pull/1905#issuecomment-705773298


   You can just click the button...






[GitHub] [lucene-solr] madrob merged pull request #1905: LUCENE-9488 Release with Gradle Part 2

2020-10-08 Thread GitBox


madrob merged pull request #1905:
URL: https://github.com/apache/lucene-solr/pull/1905


   






[jira] [Commented] (LUCENE-9488) Update release process to work with Gradle.

2020-10-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210411#comment-17210411
 ] 

ASF subversion and git services commented on LUCENE-9488:
-

Commit 08e38d3452d548189edba691c2853b63b1fa55ae in lucene-solr's branch 
refs/heads/master from Mike Drob
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=08e38d3 ]

LUCENE-9488 Create Release Artifacts with Gradle (#1905)

* Build Lucene binary distribution using Gradle
* Generate SHA-512 checksums for all release artifacts
* Update documentation artifacts included in binaries
* Delete some additional Ant relics

Co-authored-by: Dawid Weiss 
Co-authored-by: Uwe Schindler 

> Update release process to work with Gradle.
> ---
>
> Key: LUCENE-9488
> URL: https://issues.apache.org/jira/browse/LUCENE-9488
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build
>Reporter: Erick Erickson
>Assignee: Mike Drob
>Priority: Major
>  Time Spent: 13h
>  Remaining Estimate: 0h
>
> The release process needs to reflect using Gradle rather than Ant. I suspect 
> this will be a significant task, thus it has its own JIRA






[GitHub] [lucene-solr] uschindler commented on pull request #1905: LUCENE-9488 Release with Gradle Part 2

2020-10-08 Thread GitBox


uschindler commented on pull request #1905:
URL: https://github.com/apache/lucene-solr/pull/1905#issuecomment-705777022


   Thanks. I will soon activate jenkins jobs. 👍🏻😉






[jira] [Commented] (SOLR-14759) Separate the Lucene and Solr builds

2020-10-08 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210426#comment-17210426
 ] 

Dawid Weiss commented on SOLR-14759:


We don't need to wait. This only affects master anyway (I can't think of a 
way to split the ant-based build...). What I think I'm saying is that 
code-wise we're ready. We just need to figure out how to switch and separate 
other infrastructural elements - site building, documentation... But as the 
above branches show, it is a very realistic and not too complicated goal to 
split the build into two independent branches (which we can then rearrange in 
any way each corresponding project wishes).

> Separate the Lucene and Solr builds
> ---
>
> Key: SOLR-14759
> URL: https://issues.apache.org/jira/browse/SOLR-14759
> Project: Solr
>  Issue Type: Sub-task
>  Components: Build
>Reporter: Jan Høydahl
>Assignee: Dawid Weiss
>Priority: Major
>
> While still in same git repo, separate the builds, so Lucene and Solr can be 
> built independently.
> This is a preparation step which will make it easier to prune the new git 
> repos post-split.






[GitHub] [lucene-solr] uschindler commented on pull request #1905: LUCENE-9488 Release with Gradle Part 2

2020-10-08 Thread GitBox


uschindler commented on pull request #1905:
URL: https://github.com/apache/lucene-solr/pull/1905#issuecomment-705788992


   Lucene artifact builds work: 
https://ci-builds.apache.org/job/Lucene/job/Lucene-Artifacts-master/
   
   What's missing is the src.tgz files. Are they not yet handled, or am I 
missing a task?






[GitHub] [lucene-solr] madrob commented on pull request #1905: LUCENE-9488 Release with Gradle Part 2

2020-10-08 Thread GitBox


madrob commented on pull request #1905:
URL: https://github.com/apache/lucene-solr/pull/1905#issuecomment-705790424


   They're not handled yet. I suspect we should just be doing `git archive` 
rather than defining a gradle task, but I'll have more time to follow up next 
week.






[jira] [Created] (LUCENE-9576) IndexWriter::commit() hangs when the server has a stale NFS volume.

2020-10-08 Thread Girish Nayak (Jira)
Girish Nayak created LUCENE-9576:


 Summary: IndexWriter::commit() hangs when the server has a stale 
NFS volume.
 Key: LUCENE-9576
 URL: https://issues.apache.org/jira/browse/LUCENE-9576
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 8.5.2
Reporter: Girish Nayak


Noticed IndexWriter::commit() hangs when the server has one or more stale NFS 
mounts.

 






[GitHub] [lucene-solr] dweiss commented on pull request #1905: LUCENE-9488 Release with Gradle Part 2

2020-10-08 Thread GitBox


dweiss commented on pull request #1905:
URL: https://github.com/apache/lucene-solr/pull/1905#issuecomment-705794061


   This isn't so easy, as the current gradle build can't be separated into 
Lucene-only and Solr-only parts (as was the case with ant). We could distribute 
everything (Solr and Lucene sources) or wait until the builds are separated 
(which I showed is not that big of a deal).






[GitHub] [lucene-solr] uschindler commented on pull request #1905: LUCENE-9488 Release with Gradle Part 2

2020-10-08 Thread GitBox


uschindler commented on pull request #1905:
URL: https://github.com/apache/lucene-solr/pull/1905#issuecomment-705795294


   Yeah. At the moment both artifacts in source form would be the same. So I 
agree, just zip the top level.
   
   In the Ant days the Solr source was always a combined zip as well. Lucene's 
was just the Lucene subfolder, but it was somewhat incomplete. It was able to 
build, but not to do real releases.
   
   IMHO we should have always released a combined src.tgz and separate binaries.






[GitHub] [lucene-solr] uschindler edited a comment on pull request #1905: LUCENE-9488 Release with Gradle Part 2

2020-10-08 Thread GitBox


uschindler edited a comment on pull request #1905:
URL: https://github.com/apache/lucene-solr/pull/1905#issuecomment-705795294


   Yeah. At the moment both artifacts in source form would be the same. So I 
agree, just zip the top level.
   
   In the Ant days the Solr source was always a combined zip as well. Lucene's 
was just the Lucene subfolder, but it was somewhat incomplete. It was able to 
build, but not to do real releases.
   
   IMHO we should have always released a combined src.tgz and separate binaries.






[jira] [Commented] (SOLR-14759) Separate the Lucene and Solr builds

2020-10-08 Thread Anshum Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210433#comment-17210433
 ] 

Anshum Gupta commented on SOLR-14759:
-

Thanks for doing this, [~dweiss]! 

This looks promising and a step in the direction we want to move in. 
Considering this doesn't impact anything but master, I think we should be able 
to do this prior to 9.0 unless someone has a reason to wait.

> Separate the Lucene and Solr builds
> ---
>
> Key: SOLR-14759
> URL: https://issues.apache.org/jira/browse/SOLR-14759
> Project: Solr
>  Issue Type: Sub-task
>  Components: Build
>Reporter: Jan Høydahl
>Assignee: Dawid Weiss
>Priority: Major
>
> While still in same git repo, separate the builds, so Lucene and Solr can be 
> built independently.
> This is a preparation step which will make it easier to prune the new git 
> repos post-split.






[jira] [Commented] (SOLR-14759) Separate the Lucene and Solr builds

2020-10-08 Thread Jira


[ 
https://issues.apache.org/jira/browse/SOLR-14759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210437#comment-17210437
 ] 

Jan Høydahl commented on SOLR-14759:


This is encouraging. There's nothing wrong with doing more of the split before 
the actual TLP split.

If we proceed with SOLR-14762 now, then we create a 'lucene' git repo with 
lucene-standalone branch as master, and a 'solr' repo with solr-standalone as 
master. Then all master/9.0 development would happen in these two repos, while 
all 8.x development would happen in the old 'lucene-solr' repo. Back-porting to 
8.x would be a pain, but perhaps most new features could be 9.x only, or 
perhaps someone could cook up a script that converts a 'solr' repo patch into a 
'lucene-solr' patch?

We could also use the repo split as an opportunity to rename master branch as 
'main'.

> Separate the Lucene and Solr builds
> ---
>
> Key: SOLR-14759
> URL: https://issues.apache.org/jira/browse/SOLR-14759
> Project: Solr
>  Issue Type: Sub-task
>  Components: Build
>Reporter: Jan Høydahl
>Assignee: Dawid Weiss
>Priority: Major
>
> While still in same git repo, separate the builds, so Lucene and Solr can be 
> built independently.
> This is a preparation step which will make it easier to prune the new git 
> repos post-split.






[jira] [Updated] (LUCENE-9576) IndexWriter::commit() hangs when the server has a stale NFS volume.

2020-10-08 Thread Girish Nayak (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Girish Nayak updated LUCENE-9576:
-
Attachment: IndexWriter-commit.PNG

> IndexWriter::commit() hangs when the server has a stale NFS volume.
> ---
>
> Key: LUCENE-9576
> URL: https://issues.apache.org/jira/browse/LUCENE-9576
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 8.5.2
>Reporter: Girish Nayak
>Priority: Major
> Attachments: IndexWriter-commit.PNG
>
>
> Noticed IndexWriter::commit() hangs when the server has one or more stale NFS 
> mounts.
>  






[jira] [Commented] (LUCENE-9576) IndexWriter::commit() hangs when the server has a stale NFS volume.

2020-10-08 Thread Girish Nayak (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210439#comment-17210439
 ] 

Girish Nayak commented on LUCENE-9576:
--

See IndexWriter-commit.PNG for call stack.

> IndexWriter::commit() hangs when the server has a stale NFS volume.
> ---
>
> Key: LUCENE-9576
> URL: https://issues.apache.org/jira/browse/LUCENE-9576
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 8.5.2
>Reporter: Girish Nayak
>Priority: Major
> Attachments: IndexWriter-commit.PNG
>
>
> Noticed IndexWriter::commit() hangs when the server has one or more stale NFS 
> mounts.
>  






[jira] [Updated] (LUCENE-9576) IndexWriter::commit() hangs when the server has a stale NFS mount.

2020-10-08 Thread Girish Nayak (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Girish Nayak updated LUCENE-9576:
-
Summary: IndexWriter::commit() hangs when the server has a stale NFS mount. 
 (was: IndexWriter::commit() hangs when the server has a stale NFS volume.)

> IndexWriter::commit() hangs when the server has a stale NFS mount.
> --
>
> Key: LUCENE-9576
> URL: https://issues.apache.org/jira/browse/LUCENE-9576
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 8.5.2
>Reporter: Girish Nayak
>Priority: Major
> Attachments: IndexWriter-commit.PNG
>
>
> Noticed IndexWriter::commit() hangs when the server has one or more stale NFS 
> mounts.
>  






[jira] [Commented] (SOLR-14656) Deprecate current autoscaling framework, remove from master

2020-10-08 Thread Cassandra Targett (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210445#comment-17210445
 ] 

Cassandra Targett commented on SOLR-14656:
--

[~ab], I notice this is not in CHANGES.txt, do you think it should be? I'm 
adding it now to the solr-upgrade-notes.adoc page, which I didn't do earlier 
since the code had not been done yet.

> Deprecate current autoscaling framework, remove from master
> ---
>
> Key: SOLR-14656
> URL: https://issues.apache.org/jira/browse/SOLR-14656
> Project: Solr
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Assignee: Andrzej Bialecki
>Priority: Blocker
> Fix For: 8.7
>
> Attachments: Screenshot from 2020-07-18 07-49-01.png
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> The autoscaling framework is being re-designed in SOLR-14613 (SIP: 
> https://cwiki.apache.org/confluence/display/SOLR/SIP-8+Autoscaling+policy+engine+V2).
> The current autoscaling framework is very inefficient, improperly designed 
> and too bloated and doesn't receive the level of support we aspire to provide 
> for all components that we ship.
> This issue is to deprecate current autoscaling framework in 8x, so we can 
> focus on the new autoscaling framework afresh.






[jira] [Commented] (SOLR-14708) Backward-Compatible Replication

2020-10-08 Thread Cassandra Targett (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210455#comment-17210455
 ] 

Cassandra Targett commented on SOLR-14708:
--

[~tflobbe], [~marcussorealheis] - this change didn't get added to the Upgrade 
Notes for 8.7, so I'm working on that now. If I understand this work correctly, 
there should be no compatibility issues for someone moving from 8.6 to 8.7 if 
they do not change their solrconfig.xml, is that correct? 

I'm wondering, though, whether we should tell them to update their 
solrconfig.xml files manually, because if they don't, they will have issues 
upgrading to 9.0. I mean, at some point they will need to excise the old terms 
we're trying to get rid of, or else they'll keep carrying them along forever. 
The message I'm thinking of would be something like "go ahead and do your 
rolling upgrade, but you should fix your configs afterwards at a convenient 
time".

> Backward-Compatible Replication
> ---
>
> Key: SOLR-14708
> URL: https://issues.apache.org/jira/browse/SOLR-14708
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Reporter: Marcus Eagan
>Priority: Critical
>
> In [SOLR-14702|https://issues.apache.org/jira/browse/SOLR-14702] I proposed 
> that we remove master/slave terminology from the Solr codebase. Now that's 
> complete, we need to ensure it is backward compatible to support rolling 
> upgrades from 8.7.x to 9.x because we really ought not to make it harder to 
> upgrade Solr. 
> Tomas offered a helpful path in a now abandoned PR: 
> {quote}One way to get back compatibility and rolling upgrades could be to 
> make 9.x code be able to read previous formats, but write new format, and 
> make 8.x (since 8.7) read new and old, but write old? Anyone wanting to do a 
> rolling upgrade to 9 would have to be on at least 8.7. Rolling upgrades to 
> 8.7 would still work.
> All the code other than the requests/responses could be changed in 8_x 
> branch, in addition to master.
> {quote}
> The approach that we will take is to add a ternary operator in 9_X to accept 
> parameter values for the legacy verbiage, or leader/follower, but only write 
> leader/follower. We need to then make 8_x work in the inverse way. The burden 
> here is not on that proposal or on the code in my view. Instead, the burden 
> is on the test plan.
> If anyone has any guidance please share but here are my thoughts:
> Case A:
> Test the case where a user is running a standalone cluster in 8 with three 
> nodes but then updates one of the nodes.
> Case B:
> Test the case where a user is running a mixed standalone cluster, and 
> the leader node is forced to fail and then is brought back.
> Case C: 
> A SolrCloud cluster that has a mix of 8 and 9 nodes goes down during a 
> rolling upgrade and a follower needs to become leader. 
> I know I haven't listed all possible scenarios or everything that could happen. 
> Please let me know if you have thoughts or guidance on how best to accomplish 
> this work.
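
The "read old and new, write only new" behaviour proposed for 9.x in the description above could look roughly like the following. This is a minimal sketch, not Solr's actual code: the class, the map-based parameter handling, and the two parameter names are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class LegacyParamSketch {
  // Assumed parameter names, chosen for illustration only.
  static final String NEW_KEY = "leaderUrl";
  static final String LEGACY_KEY = "masterUrl";

  /** 9.x-style read: accept either name, preferring the new one when both are present. */
  static String readLeaderUrl(Map<String, String> params) {
    return params.containsKey(NEW_KEY) ? params.get(NEW_KEY) : params.get(LEGACY_KEY);
  }

  /** 9.x-style write: only the new name is ever emitted. */
  static Map<String, String> writeLeaderUrl(String url) {
    Map<String, String> out = new HashMap<>();
    out.put(NEW_KEY, url);
    return out;
  }

  public static void main(String[] args) {
    // A request coming from a node that still uses the legacy name.
    Map<String, String> fromOldNode = Map.of(LEGACY_KEY, "http://host:8983/solr/core");
    String url = readLeaderUrl(fromOldNode);
    System.out.println(writeLeaderUrl(url));
  }
}
```

Under the same proposal, an 8.x node (8.7 or later) would do the inverse: read both names but write only the legacy one.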






[jira] [Comment Edited] (SOLR-14708) Backward-Compatible Replication

2020-10-08 Thread Cassandra Targett (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210455#comment-17210455
 ] 

Cassandra Targett edited comment on SOLR-14708 at 10/8/20, 9:06 PM:


[~tflobbe], [~marcussorealheis] - this change didn't get added to the Upgrade 
Notes for 8.7, so I'm working on that now. If I understand this work correctly, 
there should be no compatibility issues for someone moving from 8.x to 8.7 if 
they do not change their solrconfig.xml, is that correct? 

I'm wondering, though, whether we should tell them to update their solrconfig.xml 
files manually because if they don't, they will have issues upgrading to 9.0? I 
mean, at some point they will need to excise the old terms we're trying to get 
rid of or else they'll keep carrying it along forever. The message I'm thinking 
would be something like "go ahead and do your rolling upgrade, but you should 
fix your configs after at a convenient time".


was (Author: ctargett):
[~tflobbe], [~marcussorealheis] - this change didn't get added to the Upgrade 
Notes for 8.7, so I'm working on that now. If I understand this work correctly, 
there should be no compatibility issues for someone moving from 8.6 to 8.7 if 
they do not change their solrconfig.xml, is that correct? 

I'm wondering, though, that we should tell them to update their solrconfig.xml 
files manually because if they don't, they will have issues upgrading to 9.0? I 
mean, at some point they will need to excise the old terms we're trying to get 
rid of or else they'll keep carrying it along forever. The message I'm thinking 
would be something like "go ahead and do your rolling upgrade, but you should 
fix your configs after at a convenient time".







[GitHub] [lucene-solr] gus-asf edited a comment on pull request #1966: LUCENE-9574 Add DropIfFlaggedFilterFactory

2020-10-08 Thread GitBox


gus-asf edited a comment on pull request #1966:
URL: https://github.com/apache/lucene-solr/pull/1966#issuecomment-705761838


   Hi @uschindler, your comments sound like you think I will be unwilling to 
take feedback. I'm not sure why you feel this way and seem to need to say the 
same thing repeatedly. This is broken out from the original contribution ticket 
for the very purpose of getting feedback. That said I do have a concern with 
the solution that FilteringTokenFilter provides... Consider this case:
   
   **Text**: "January 401(k) contribution"
   **Whitespace tok**: "January"(pi:1) "401(k)"(pi:1) "contribution"(pi:1)
   **PatternTypingFilter**: "January"(pi:1), "401(k)"(pi:1, flag:2, 
type:legal2_401_k), "contribution"(pi:1)
   **TokenAnalyzerFilter**: "january"(pi:1), "401"(pi:1, flag:2, 
type:legal2_401_k), "k"(pi:1, flag:2, type:legal2_401_k), "contribution"(pi:1) 
   
   _(for the record uwe's next 2 comments were unaware of the remainder of this 
comment because I had made an inadvertent submission of the comment at this 
point and he chose to respond before I finished, possibly because it was late 
evening where he is and he didn't want to stay up to wait for me to finish 
which would be understandable)_
   
   Note that TokenAnalyzerFilter is a new class that is and must always be a 
solr concept, (see the solr ticket for that) because it runs an analyzer 
defined in the SolrSchema against the tokens and emits 1 to N replacement 
tokens, (and yes I have adjusted positions, and mapped flags and types etc 
accordingly :) very happy to have feedback if I missed something there). In the 
case above I've presumed it includes a standard tokenizer and lowercase filter 
factory. Also see the description of the PatternTypingFilter ticket as well for 
more background on the motivation.
   
   **TypeAsSynonymFilter**: "january"(pi:1), "401"(pi:1, flag:2, 
type:legal2_401_k), "legal2_401_k"(pi:0), "k"(pi:1, flag:2, type:legal2_401_k), 
"legal2_401_k"(pi:0), "contribution"(pi:1)
   
   Then:
   **DropIfFlaggedFilteringFilterVersion**:   
"january"(pi:1),"legal2_401_k"(pi:1), "legal2_401_k"(pi:1), "contribution"(pi:1)
   **DropIfFlaggedAsWritten**:   "january"(pi:1),"legal2_401_k"(pi:0), 
"legal2_401_k"(pi:0), "contribution"(pi:1)
   
   Either case causes an inaccuracy. The AQP will be adding a convenient user 
friendly syntax for span queries, and the user feedback generally ran in the 
direction of complaining when such queries missed things that were clearly 
within range and rarely complained if something one too far away came back. I 
agree that the case with the first token is handled by the 
FilteringTokenFilter, but it creates a silent problem in the use case that is 
important to the users. So what I've tried to avoid is a bad interaction with 
other expected configuration.
   
   If I have the synonym tokens gain a flag (further enhancement to 
TypeAsSynonymFilter) and create something to remove sequential tokens with the 
same flag then the FilteringTokenFilter solution probably would get to the 
proper position increments, but this additional functionality was not 
achievable within the original timeline. One possibility is that I go back to 
them and (as I told them might happen) indicate that changes will be necessary 
for community acceptance. They are aware of this possibility.
   
   I'll be the first to admit I've spent more time in the Solr areas of the 
project than building Lucene Token Filters so I am of course happy to have 
feedback from folks like you who have been deep into Lucene for many years. I 
certainly wish to improve how I use all aspects of the project including 
Lucene. Learning is never finished. My large comments in the code are there to 
draw attention to what I already knew was an iffy solution. That said I will 
note that for all test cases at the LOC (the solr submission has over 100 unit 
tests and the LOC's original dev branch has another 50 or so that dealt with 
their specific configurations) what I've built does work to their satisfaction, 
so let's try to be cordial and make things better. Improvements and review by 
others is one of the major benefits of community process.






[GitHub] [lucene-solr] uschindler commented on pull request #1966: LUCENE-9574 Add DropIfFlaggedFilterFactory

2020-10-08 Thread GitBox


uschindler commented on pull request #1966:
URL: https://github.com/apache/lucene-solr/pull/1966#issuecomment-705832388


   Hi maybe let me explain it:
   With the filter how you implemented it originally, you have the problem of 
the first token (that has a synonym) leading to a 0 position increment on the 
second token. If all token filters that drop tokens were implemented like 
yours (just dropping the tokens), then StopFilter removing a stop word with a 
synonym would lead to the same problem. We also have many other token filters like that.
   
   So my suggestion here is the following: Please use the abstract 
implementation for those dropping filters, called `FilteringTokenFilter`. 
Although you think you need to look forward/backwards (which is indeed not 
possible), the correct way to handle this is to change the 
PositionIncrementAttribute on the token following the dropped tokens. If you 
remove a token with no increment (the second synonym), nothing needs to be done 
(increment is 0). If you remove the first token (which has increment 1 or maybe 
2), the increment needs to be moved to the following token. So if the very 
first token of the very first synonym needs to be removed, the position 
increment (1) needs to be added to the second token. If the second token is a 
synonym, then it gets "upgraded" to first token (0 + 1 = 1). If the second 
token is a standard token, then it gets increment (1 + 1 = 2). So the 
tokenstream knows that there's a gap at the first position.
   
   FilteringTokenFilter handles this by taking care of all positions and fixing 
those of following tokens. Read the source code, it just sums up increments of 
dropped tokens and adds them to the next token that is kept. This is 
thoroughly tested.
   
   The proposed TokenFilter here is nothing special, it just behaves like any 
other StopFilter-like filter. The bug you mention was fixed in StopFilter about 
10 years ago (this was one of my first commits with @rmuir back at that time).
   
   So please redo this PR:
   - Copypaste the TokenFilter as posted above (very easy)
   - Fix your tests to not expect a "dummy token".
   
   If you have a problem with your other filter, fix that one.
   
   I read your comment above, the discussion is still not correct. The 
tokenfilter graph must be correct. SpanQueries work correctly with 
FilteringTokenFilter.
   
   If you want a different behaviour, open a separate issue, as a change around 
that needs to be done in FilteringTokenFilter, not a subclass or a 
reimplemented filter.
   
   Sorry for being aggressive, but - sorry - the code you posted had really bad 
problems and was not according to any coding standards.






[GitHub] [lucene-solr] uschindler edited a comment on pull request #1966: LUCENE-9574 Add DropIfFlaggedFilterFactory

2020-10-08 Thread GitBox


uschindler edited a comment on pull request #1966:
URL: https://github.com/apache/lucene-solr/pull/1966#issuecomment-705832388


   Hi maybe let me explain it:
   With the filter how you implemented it originally, you have the problem of 
the first token (that has a synonym) leading to a 0 position increment on the 
second token. If all token filters that drop tokens were implemented like 
yours (just dropping the tokens), then StopFilter removing a stop word with a 
synonym would lead to the same problem. We also have many other token filters like that.
   
   So my suggestion here is the following: Please use the abstract 
implementation for those dropping filters, called `FilteringTokenFilter`. 
Although you think you need to look forward/backwards (which is indeed not 
possible), the correct way to handle this is to change the 
PositionIncrementAttribute on the token following the dropped tokens. If you 
remove a token with no increment (the second synonym), nothing needs to be done 
(increment is 0). If you remove the first token (which has increment 1 or maybe 
2), the increment needs to be moved to the following token. So if the very 
first token of the very first synonym needs to be removed, the position 
increment (1) needs to be added to the second token. If the second token is a 
synonym, then it gets "upgraded" to first token (0 + 1 = 1). If the second 
token is a standard token, then it gets increment (1 + 1 = 2). So the 
tokenstream knows that there's a gap at the first position.
   
   FilteringTokenFilter handles this by taking care of all positions and fixing 
those of following tokens. Read the source code, it just sums up increments of 
dropped tokens and adds them to the next token that is kept. This is 
thoroughly tested, and the whole of Lucene's Graph APIs and span queries rely on 
that.
   
   The proposed TokenFilter here is nothing special, it just behaves like any 
other StopFilter-like filter. The bug you mention was fixed in StopFilter about 
10 years ago (this was one of my first commits with @rmuir back at that time).
   
   So please redo this PR:
   - Copypaste the TokenFilter as posted above (very easy)
   - Fix your tests to not expect a "dummy token".
   
   If you have a problem with your other filter, fix that one, the problem is 
not in this filter if it correctly implements FilteringTokenFilter.
   
   I read your comment above, the workaround you propose is still not correct. 
The tokenfilter graph must be correct by fixing positions, not adding dummy 
tokens. SpanQueries work correctly with FilteringTokenFilter, which not only 
fixes the very first token (like yours), but also gaps coming later. Your token 
filter makes real gaps disappear.
   
   If for some reason, you don't want to have real gaps (increment > 1) in your 
stream, another tokenfilter making gaps sequential would be needed. So, if you 
want a different behaviour regarding gaps, open a separate issue, as a change 
around that needs to be done in FilteringTokenFilter, not a subclass or a 
reimplemented filter.
   
   Sorry for being aggressive, but - sorry - the code you posted had really bad 
problems and was not according to any coding standards.






[GitHub] [lucene-solr] uschindler edited a comment on pull request #1966: LUCENE-9574 Add DropIfFlaggedFilterFactory

2020-10-08 Thread GitBox


uschindler edited a comment on pull request #1966:
URL: https://github.com/apache/lucene-solr/pull/1966#issuecomment-705832388


   Hi maybe let me explain it:
   With the filter how you implemented it originally, you have the problem of 
the first token (that has a synonym) leading to a 0 position increment on the 
second token. If all token filters that drop tokens were implemented like 
yours (just dropping the tokens), then StopFilter removing a stop word with a 
synonym would lead to the same problem. We also have many other token filters like that.
   
   So my suggestion here is the following: Please use the abstract 
implementation for those dropping filters, called `FilteringTokenFilter`. 
Although you think you need to look forward/backwards (which is indeed not 
possible), the correct way to handle this is to change the 
PositionIncrementAttribute on the token following the dropped tokens. If you 
remove a token with no increment (the second synonym), nothing needs to be done 
(increment is 0). If you remove the first token (which has increment 1 or maybe 
2), the increment needs to be moved to the following token. So if the very 
first token of the very first synonym needs to be removed, the position 
increment (1) needs to be added to the second token. If the second token is a 
synonym, then it gets "upgraded" to first token (0 + 1 = 1). If the second 
token is a standard token, then it gets increment (1 + 1 = 2). So the 
tokenstream knows that there's a gap at the first position.
   
   FilteringTokenFilter handles this by taking care of all positions and fixing 
those of following tokens. Read the source code, it just sums up increments of 
dropped tokens and adds them to the next token that is kept. This is 
thoroughly tested, and the whole of Lucene's Graph APIs and span queries rely on 
that.
   
   The proposed TokenFilter here is nothing special, it just behaves like any 
other StopFilter-like filter. The bug you mention was fixed in StopFilter about 
10 years ago (this was one of my first commits with @rmuir back at that time).
   
   So please redo this PR:
   - Copypaste the TokenFilter as posted above (very easy)
   - Fix your tests to not expect a "dummy token".
   
   If you have a problem with your other filter, fix that one, the problem is 
not in this filter if it correctly implements FilteringTokenFilter.
   
   I read your comment above, the workaround you propose is still not correct. 
The tokenfilter graph must be correct by fixing positions, not adding dummy 
tokens. SpanQueries work correctly with FilteringTokenFilter, which not only 
fixes the very first token (like yours), but also gaps coming later. Your token 
filter makes real gaps disappear.
   
   If for some reason, you don't want to have real gaps (increment > 1) in your 
stream, another tokenfilter making gaps sequential would be needed. So, if you 
want a different behaviour regarding gaps, open a separate issue, as a change 
around that needs to be done in FilteringTokenFilter (unlikely), not a subclass 
or a reimplemented filter. IMHO, a filter removing gaps should be a separate 
one.
   
   Sorry for being aggressive, but - sorry - the code you posted had really bad 
problems and was not according to any coding standards.
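
The increment bookkeeping described in the comment above (sum the increments of dropped tokens and add them to the next token that is kept) can be sketched without any Lucene dependency. This is an illustrative model only: `Token`, `dropFlagged`, and the boolean `flagged` field are hypothetical stand-ins, not Lucene's `FilteringTokenFilter` API. The input stream is the TypeAsSynonymFilter output from the "January 401(k) contribution" example earlier in this thread.

```java
import java.util.ArrayList;
import java.util.List;

public class PositionIncrementSketch {
  /** Simplified token: term text, position increment, and whether it is marked for removal. */
  static final class Token {
    final String term;
    final int posInc;
    final boolean flagged;

    Token(String term, int posInc, boolean flagged) {
      this.term = term;
      this.posInc = posInc;
      this.flagged = flagged;
    }
  }

  /** Drops flagged tokens and carries their increments over to the next kept token. */
  static List<Token> dropFlagged(List<Token> input) {
    List<Token> output = new ArrayList<>();
    int skipped = 0; // increments of tokens dropped since the last kept token
    for (Token t : input) {
      if (t.flagged) {
        skipped += t.posInc;
      } else {
        output.add(new Token(t.term, t.posInc + skipped, false));
        skipped = 0;
      }
    }
    return output;
  }

  public static void main(String[] args) {
    List<Token> stream = List.of(
        new Token("january", 1, false),
        new Token("401", 1, true),
        new Token("legal2_401_k", 0, false),
        new Token("k", 1, true),
        new Token("legal2_401_k", 0, false),
        new Token("contribution", 1, false));
    for (Token t : dropFlagged(stream)) {
      System.out.println(t.term + " (pi:" + t.posInc + ")");
    }
  }
}
```

With this input, each synonym inherits the increment of the flagged token dropped just before it (0 + 1 = 1), which corresponds to the "DropIfFlaggedFilteringFilterVersion" stream quoted earlier. A dropped token followed by an ordinary token would instead give that token increment 2, preserving the gap.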






[jira] [Resolved] (LUCENE-9474) Change Jenkins jobs to use Gradle for trunk

2020-10-08 Thread Uwe Schindler (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-9474.
---
Resolution: Fixed

All jobs are now using Gradle. Obsolete Jobs removed.

> Change Jenkins jobs to use Gradle for trunk
> ---
>
> Key: LUCENE-9474
> URL: https://issues.apache.org/jira/browse/LUCENE-9474
> Project: Lucene - Core
>  Issue Type: Test
>  Components: general/build
>Reporter: Erick Erickson
>Assignee: Uwe Schindler
>Priority: Major
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png, screenshot-5.png, screenshot-6.png
>
>
> I rushed the gate and pushed LUCENE-9433 without coordinating, my apologies 
> for the confusion.
> Meanwhile, Uwe has disabled Jenkins jobs for the weekend and we'll fix this 
> up Real Soon Now.






[jira] [Commented] (SOLR-14659) Remove restlet from Solr

2020-10-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210484#comment-17210484
 ] 

ASF subversion and git services commented on SOLR-14659:


Commit 02a4c2848d2d0f6f327aac58ff78bce57181dbe9 in lucene-solr's branch 
refs/heads/reference_impl_dev from Timothy Potter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=02a4c28 ]

SOLR-14659: Remove restlet as dependency for the ManagedResource API (#1938)

Co-authored-by: noblepaul 


> Remove restlet from Solr
> 
>
> Key: SOLR-14659
> URL: https://issues.apache.org/jira/browse/SOLR-14659
> Project: Solr
>  Issue Type: Improvement
>Reporter: Noble Paul
>Assignee: Timothy Potter
>Priority: Major
> Fix For: 8.7
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> restlet is only used by managed resources. We can support that even without a 
> restlet.






[jira] [Created] (LUCENE-9577) Move HTML Documentation to a separate subproject

2020-10-08 Thread Uwe Schindler (Jira)
Uwe Schindler created LUCENE-9577:
-

 Summary: Move HTML Documentation to a separate subproject
 Key: LUCENE-9577
 URL: https://issues.apache.org/jira/browse/LUCENE-9577
 Project: Lucene - Core
  Issue Type: Task
  Components: general/build
Affects Versions: master (9.0)
Reporter: Uwe Schindler
Assignee: Uwe Schindler


Currently Lucene/Solr have some subdirectory "site" containing some fragments 
for CHANGES.txt formatting and the Markdown to produce the site. The global 
documentation for both Lucene and Solr should be built with its own subproject 
{{:lucene:documentation}} and {{:solr:documentation}}.

I will provide a PR that does the following:
- Move Changes.html formatting scripts to the gradle subfolder, so they are 
correctly shared between Lucene and Solr and allow the project to be split 
(currently they live in Lucene only)
- Move the site contents, with correct names as {{src/assets}} and {{src/markdown}}, 
into the new documentation projects
- Do the packaging of documentation in the project's build.gradle. 
{{:lucene:documentation:assemble}} should assemble the documentation (same for 
Solr) and export it as a configuration
- The main packaging will use the artifacts provided to put it into the TGZ. Also, 
the release manager can build documentation using the above gradlew calls.






[GitHub] [lucene-solr] msokolov commented on pull request #1961: LUCENE-9567: JPOSSFF loads built-in stop tags by default

2020-10-08 Thread GitBox


msokolov commented on pull request #1961:
URL: https://github.com/apache/lucene-solr/pull/1961#issuecomment-705849891


   looks good - I will merge soon if nobody objects






[jira] [Commented] (SOLR-14708) Backward-Compatible Replication

2020-10-08 Thread Tomas Eduardo Fernandez Lobbe (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210498#comment-17210498
 ] 

Tomas Eduardo Fernandez Lobbe commented on SOLR-14708:
--

bq. If I understand this work correctly, there should be no compatibility 
issues for someone moving from 8.x to 8.7 if they do not change their 
solrconfig.xml, is that correct? 
Right, that was the intent.

bq. I'm wondering, though, that we should tell them to update their 
solrconfig.xml files manually because if they don't, they will have issues 
upgrading to 9.0? I mean, at some point they will need to excise the old terms 
we're trying to get rid of or else they'll keep carrying it along forever. The 
message I'm thinking would be something like "go ahead and do your rolling 
upgrade, but you should fix your configs after at a convenient time".
The thing is that they can't upgrade their configuration until all the nodes 
are on at least 8.7, because 8.6 won't know how to read those conf changes. 
There are some [upgrade notes in 
9|https://github.com/apache/lucene-solr/pull/1718/files] (which, I'm noticing, 
don't say anything about solrconfig, so they may need some improvement). The 
point when someone HAS to make changes is when upgrading to 9 (I believe only 
in metrics). Note that Solr 9 can still read the old nomenclature in both 
configs and parameters.



> Backward-Compatible Replication
> ---
>
> Key: SOLR-14708
> URL: https://issues.apache.org/jira/browse/SOLR-14708
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Reporter: Marcus Eagan
> Priority: Critical
>
> In [SOLR-14702|https://issues.apache.org/jira/browse/SOLR-14702] I proposed 
> that we remove master/slave terminology from the Solr codebase. Now that's 
> complete, we need to ensure it is backward compatible to support rolling 
> upgrades from 8.7.x to 9.x because we really ought not to make it harder to 
> upgrade Solr. 
> Tomas offered a helpful path in a now abandoned PR: 
> {quote}One way to get back compatibility and rolling upgrades could be to 
> make 9.x code be able to read previous formats, but write new format, and 
> make 8.x (since 8.7) read new and old, but write old? Anyone wanting to do a 
> rolling upgrade to 9 would have to be on at least 8.7. Rolling upgrades to 
> 8.7 would still work.
> All the code other than the requests/responses could be changed in 8_x 
> branch, in addition to master.
> {quote}
> The approach that we will take is to add a ternary operator in 9_X to accept 
> parameter values for the legacy verbiage, or leader/follower, but only write 
> leader/follower. We need to then make 8_x work in the inverse way. The burden 
> here is not on that proposal or on the code in my view. Instead, the burden 
> is on the test plan.
> If anyone has any guidance please share but here are my thoughts:
> Case A:
> Test the case where a user is running a standalone cluster in 8 with three 
> nodes but then updates one of the nodes.
> Case B:
> Test the case where a user is running a mixed standalone cluster, and 
> the leader node is forced to fail and then is brought back.
> Case C: 
> A SolrCloud cluster that has a mix of 8 and 9 nodes goes down during a 
> rolling upgrade and a follower needs to become leader. 
> I know I haven't listed all possible scenarios or everything that could happen. 
> Please let me know if you have thoughts or guidance on how best to accomplish 
> this work.
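
The read-both, write-one approach described in the issue can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Solr change: the class `ReplicationParamCompat`, its methods, and the parameter names `leaderUrl` and `masterUrl` as used here are hypothetical stand-ins for whichever parameters are affected.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not taken from the Solr codebase. */
final class ReplicationParamCompat {
  // Assumed parameter names, standing in for the new and the legacy wording.
  static final String NEW_NAME = "leaderUrl";
  static final String LEGACY_NAME = "masterUrl";

  /** Read side (both branches): prefer the new name, fall back to the legacy one. */
  static String read(Map<String, String> params) {
    String value = params.get(NEW_NAME);
    return value != null ? value : params.get(LEGACY_NAME);
  }

  /**
   * Write side: 9.x would pass writeLegacyName=false and emit only the new name,
   * while 8.x (since 8.7) would pass true and keep emitting the legacy name.
   */
  static Map<String, String> write(String url, boolean writeLegacyName) {
    Map<String, String> out = new LinkedHashMap<>();
    out.put(writeLegacyName ? LEGACY_NAME : NEW_NAME, url);
    return out;
  }
}
```

With this shape, a value written by either side, for example `ReplicationParamCompat.write(url, true)` on an 8.x node, can be read back by the other through `ReplicationParamCompat.read(...)`, which is the property the test cases A to C above would need to exercise in a mixed cluster.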






[GitHub] [lucene-solr] uschindler opened a new pull request #1967: LUCENE-9577: Move Lucene/Solr Documentation assembly to subproject

2020-10-08 Thread GitBox


uschindler opened a new pull request #1967:
URL: https://github.com/apache/lucene-solr/pull/1967


   This PR moves the changes.txt formatting, site Javadocs, and markdown formatting 
to subprojects (`:lucene:documentation`, `:solr:documentation`).
   
   This is still WIP, not everything works.






[jira] [Commented] (LUCENE-9577) Move HTML Documentation to a separate subproject

2020-10-08 Thread Uwe Schindler (Jira)


[ https://issues.apache.org/jira/browse/LUCENE-9577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210520#comment-17210520 ]

Uwe Schindler commented on LUCENE-9577:
---

First draft PR: https://github.com/apache/lucene-solr/pull/1967

> Move HTML Documentation to a separate subproject
> 
>
> Key: LUCENE-9577
> URL: https://issues.apache.org/jira/browse/LUCENE-9577
> Project: Lucene - Core
> Issue Type: Task
> Components: general/build
> Affects Versions: master (9.0)
> Reporter: Uwe Schindler
> Assignee: Uwe Schindler
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently Lucene/Solr have some subdirectory "site" containing some fragments 
> for CHANGES.txt formatting and the Markdown to produce the site. The global 
> documentation for both Lucene and Solr should be built with its own 
> subproject {{:lucene:documentation}} and {{:solr:documentation}}.
> I will provide a PR that does the following:
> - Move the Changes.html formatting scripts to the gradle subfolder, so they are 
> correctly shared between Lucene and Solr, which also allows splitting the project 
> (currently they live in Lucene only)
> - Move the site contents, with correct names such as {{src/assets}} and 
> {{src/markdown}}, into the new documentation projects
> - Do the packaging of documentation in each project's build.gradle. 
> {{:lucene:documentation:assemble}} should assemble the documentation (same 
> for Solr) and export it as a configuration
> - The main packaging will use the provided artifacts to put the documentation into the TGZ. 
> The release manager can also build the documentation using the above gradlew calls.






[GitHub] [lucene-solr] uschindler commented on pull request #1967: LUCENE-9577: Move Lucene/Solr Documentation assembly to subproject

2020-10-08 Thread GitBox


uschindler commented on pull request #1967:
URL: https://github.com/apache/lucene-solr/pull/1967#issuecomment-705884519


   This also removed some unnecessary assets from Solr's documentation. Those 
are leftovers from the time before the refguide was included.






[jira] [Commented] (LUCENE-9576) IndexWriter::commit() hangs when the server has a stale NFS mount.

2020-10-08 Thread Robert Muir (Jira)


[ https://issues.apache.org/jira/browse/LUCENE-9576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210553#comment-17210553 ]

Robert Muir commented on LUCENE-9576:
-

It hangs in JDK code trying to access the FileStore... so I'm not sure there is 
anything we can do about that. If you try to run software against a stale nfs 
mount, shit is gonna hang.

> IndexWriter::commit() hangs when the server has a stale NFS mount.
> --
>
> Key: LUCENE-9576
> URL: https://issues.apache.org/jira/browse/LUCENE-9576
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/index
> Affects Versions: 8.5.2
> Reporter: Girish Nayak
> Priority: Major
> Attachments: IndexWriter-commit.PNG
>
>
> Noticed IndexWriter::commit() hangs when the server has one or more stale NFS 
> mounts.
>  





