[jira] [Comment Edited] (SOLR-14161) System.ArgumentNullException: Value cannot be null. Parameter name: fieldNameTranslator

2020-01-06 Thread Mohammed (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008603#comment-17008603
 ] 

Mohammed edited comment on SOLR-14161 at 1/6/20 7:59 AM:
-

Thanks Erick,

For now I use the following workaround: I stop the Solr service and start it
again; then I can delete an item from Sitecore.



> System.ArgumentNullException: Value cannot be null. Parameter name: 
> fieldNameTranslator
> ---
>
> Key: SOLR-14161
> URL: https://issues.apache.org/jira/browse/SOLR-14161
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: website
>Affects Versions: 6.6.2
> Environment: Versions:
> Solr 6.6.2
> Sitecore: 9.1
>Reporter: Mohammed
>Priority: Major
>
> While deleting an item in Sitecore 9, the following error occurs.
> Could someone please help resolve this issue?
>  
> [ArgumentNullException: Value cannot be null. Parameter name: fieldNameTranslator]
>    Sitecore.ContentSearch.Linq.Solr.SolrIndexParameters..ctor(IIndexValueFormatter valueFormatter, IFieldQueryTranslatorMap`1 fieldQueryTranslators, FieldNameTranslator fieldNameTranslator, IExecutionContext[] executionContexts, IFieldMapReaders fieldMap, Boolean convertQueryDatesToUtc) +328
>    Sitecore.ContentSearch.SolrProvider.LinqToSolrIndex`1..ctor(SolrSearchContext context, IExecutionContext[] executionContexts) +200
>    Sitecore.ContentSearch.SolrProvider.SolrSearchContext.GetQueryable(IExecutionContext[] executionContexts) +271
>    Sitecore.ContentTesting.ContentSearch.TestingSearch.GetRunningTestsInAllLanguages(Item hostItem) +1064
>    Sitecore.ContentTesting.Pipelines.DeleteItems.DeleteTestDefinitionItems.GetConfirmMessage(Item[] contentItems) +53
>    Sitecore.ContentTesting.Pipelines.DeleteItems.DeleteTestDefinitionItems.CheckActiveTests(ClientPipelineArgs args) +140
>
> [TargetInvocationException: Exception has been thrown by the target of an invocation.]
>    System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor) +0
>    System.Reflection.RuntimeMethodInfo.UnsafeInvokeInternal(Object obj, Object[] parameters, Object[] arguments) +132
>    System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture) +146
>    Sitecore.Reflection.ReflectionUtil.InvokeMethod(MethodInfo method, Object[] parameters, Object obj) +89
>    Sitecore.Nexus.Pipelines.NexusPipelineApi.Resume(PipelineArgs args, Pipeline pipeline) +313
>    Sitecore.Web.UI.Sheer.ClientPage.ResumePipeline() +215
>    Sitecore.Web.UI.Sheer.ClientPage.OnPreRender(EventArgs e) +806
>    Sitecore.Shell.Applications.ContentManager.ContentEditorPage.OnPreRender(EventArgs e) +24
>    System.Web.UI.Control.PreRenderRecursiveInternal() +132
>    System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint) +4005



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz commented on issue #1136: LUCENE-9113: Speed up merging doc values' terms dictionaries.

2020-01-06 Thread GitBox
jpountz commented on issue #1136: LUCENE-9113: Speed up merging doc values' 
terms dictionaries.
URL: https://github.com/apache/lucene-solr/pull/1136#issuecomment-571043073
 
 
   Thanks @rmuir !


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services




[GitHub] [lucene-solr] jpountz merged pull request #1136: LUCENE-9113: Speed up merging doc values' terms dictionaries.

2020-01-06 Thread GitBox
jpountz merged pull request #1136: LUCENE-9113: Speed up merging doc values' 
terms dictionaries.
URL: https://github.com/apache/lucene-solr/pull/1136
 
 
   





[jira] [Commented] (LUCENE-9113) Speed up merging doc values terms dictionaries

2020-01-06 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008604#comment-17008604
 ] 

ASF subversion and git services commented on LUCENE-9113:
-

Commit dcc01fdaa6841a94613f68b419799523a157fe4a in lucene-solr's branch 
refs/heads/master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=dcc01fd ]

LUCENE-9113: Speed up merging doc values' terms dictionaries. (#1136)



> Speed up merging doc values terms dictionaries
> --
>
> Key: LUCENE-9113
> URL: https://issues.apache.org/jira/browse/LUCENE-9113
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The default {{DocValuesConsumer#mergeSortedField}} and 
> {{DocValuesConsumer#mergeSortedSetField}} implementations create a merged 
> view of the doc values producers to merge. Unfortunately, it doesn't override 
> {{termsEnum()}}, whose default implementation of {{next()}} increments the 
> ordinal and calls {{lookupOrd()}} to retrieve the term. Currently, 
> {{lookupOrd()}} doesn't take advantage of its current position, and would 
> seek to the block start and then call {{next()}} up to 16 times to go to the 
> desired term. While there are discussions to optimize lookups to take 
> advantage of the current ord (LUCENE-8836), it shouldn't be required for 
> merging to be efficient and we should instead make {{next()}} call {{next()}} 
> on its sub enums.
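The cost asymmetry described above can be illustrated with a toy model (a standalone sketch, not Lucene code; the 16-term block granularity mirrors the figure quoted in the issue):

```java
// Toy cost model contrasting the two iteration strategies described above.
// A blocked terms dictionary can only seek to the start of a 16-term block,
// so lookupOrd(ord) costs up to 16 forward steps, while a sequential cursor
// advances one step at a time.
public class MergeEnumDemo {
    static final int BLOCK = 16;

    // Steps performed when iterating N terms via lookupOrd on each ord.
    static long stepsViaLookupOrd(int numTerms) {
        long steps = 0;
        for (int ord = 0; ord < numTerms; ord++) {
            steps += (ord % BLOCK) + 1;   // seek to block start, then walk forward
        }
        return steps;
    }

    // Steps performed when iterating N terms with a sequential next().
    static long stepsViaNext(int numTerms) {
        return numTerms;                   // one forward step per term
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        System.out.println(stepsViaLookupOrd(n) + " vs " + stepsViaNext(n));
    }
}
```

With one forward step per term versus an average of about 8.5 steps per lookupOrd-based next(), iterating a large merged terms dictionary sequentially is several times cheaper, which is what the change exploits.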






[GitHub] [lucene-solr] jpountz merged pull request #1074: BlockTreeTermsWriter should compute prefix lengths using Arrays#mismatch.

2020-01-06 Thread GitBox
jpountz merged pull request #1074: BlockTreeTermsWriter should compute prefix 
lengths using Arrays#mismatch.
URL: https://github.com/apache/lucene-solr/pull/1074
 
 
   





[GitHub] [lucene-solr] jpountz merged pull request #1047: MINOR: Fix Incorrect Constant Name in Codec Docs

2020-01-06 Thread GitBox
jpountz merged pull request #1047: MINOR: Fix Incorrect Constant Name in Codec 
Docs
URL: https://github.com/apache/lucene-solr/pull/1047
 
 
   





[GitHub] [lucene-site] janhoy commented on issue #8: Simple build script

2020-01-06 Thread GitBox
janhoy commented on issue #8: Simple build script
URL: https://github.com/apache/lucene-site/pull/8#issuecomment-571044089
 
 
   So any more feedback on this script? Should we try to do this "the pelican 
way" or just a simple script? Publishing of the site will happen through 
merging to staging/production branch so no need for fancy scripts there...





[GitHub] [lucene-solr] jpountz commented on a change in pull request #964: LUCENE-9023: GlobalOrdinalsWithScore should not compute occurrences when the provided min is 1

2020-01-06 Thread GitBox
jpountz commented on a change in pull request #964: LUCENE-9023: 
GlobalOrdinalsWithScore should not compute occurrences when the provided min is 
1
URL: https://github.com/apache/lucene-solr/pull/964#discussion_r363191004
 
 

 ##
 File path: lucene/CHANGES.txt
 ##
 @@ -52,6 +52,8 @@ Improvements
 
 * LUCENE-8937: Avoid agressive stemming on numbers in the FrenchMinimalStemmer.
   (Adrien Gallou via Tomoko Uchida)
+
+* LUCENE-9023: GlobalOrdinalsWithScore should not compute occurrences when the 
provided min is 1
 
 Review comment:
   ```suggestion
   * LUCENE-9023: GlobalOrdinalsWithScore should not compute occurrences when the
     provided min is 1. (Jim Ferenczi)
   ```





[GitHub] [lucene-solr] jpountz merged pull request #1125: LUCENE-9096: Implementation of CompressingTermVectorsWriter.flushOffsets can be simpler

2020-01-06 Thread GitBox
jpountz merged pull request #1125: LUCENE-9096: Implementation of 
CompressingTermVectorsWriter.flushOffsets can be simpler
URL: https://github.com/apache/lucene-solr/pull/1125
 
 
   





[jira] [Commented] (LUCENE-9096) Implementation of CompressingTermVectorsWriter.flushOffsets can be simpler

2020-01-06 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008613#comment-17008613
 ] 

ASF subversion and git services commented on LUCENE-9096:
-

Commit 2db4c909ca10c0d7edda0c94622fa1369833 in lucene-solr's branch 
refs/heads/master from kkewwei
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2db4c90 ]

LUCENE-9096:Simplify CompressingTermVectorsWriter#flushOffsets. (#1125)



> Implementation of CompressingTermVectorsWriter.flushOffsets can be simpler
> --
>
> Key: LUCENE-9096
> URL: https://issues.apache.org/jira/browse/LUCENE-9096
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Affects Versions: 8.2
>Reporter: kkewwei
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In CompressingTermVectorsWriter.flushOffsets, we compute sumPos and 
> sumOffsets as follows:
> {code:java}
> for (int i = 0; i < fd.numTerms; ++i) { 
>   int previousPos = 0;
>   int previousOff = 0;
>   for (int j = 0; j < fd.freqs[i]; ++j) { 
> final int position = positionsBuf[fd.posStart + pos];
> final int startOffset = startOffsetsBuf[fd.offStart + pos];
> sumPos[fieldNumOff] += position - previousPos; 
> sumOffsets[fieldNumOff] += startOffset - previousOff; 
> previousPos = position;
> previousOff = startOffset;
> ++pos;
>   }
> }
> {code}
> Since we always accumulate position - previousPos, the sum telescopes:
> {code:java}
> (position5-position4)+(position4-position3)+(position3-position2)+(position2-position1){code}
> so the whole sum simplifies to position5-position1.
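As a standalone sanity check (illustrative code, not part of the patch), the telescoping identity behind the proposed simplification can be verified directly:

```java
public class TelescopingDemo {
    // Sum of per-step deltas, exactly as the inner loop above accumulates
    // them, with the previous position starting at 0 for each term.
    static int sumOfDeltas(int[] positions) {
        int sum = 0, previous = 0;
        for (int p : positions) {
            sum += p - previous;
            previous = p;
        }
        return sum;
    }

    // Telescoped form: all intermediate terms cancel, leaving only the
    // last position (minus the initial previous value of 0).
    static int telescoped(int[] positions) {
        return positions.length == 0 ? 0 : positions[positions.length - 1];
    }

    public static void main(String[] args) {
        int[] positions = {3, 7, 12, 20};
        System.out.println(sumOfDeltas(positions) + " == " + telescoped(positions));
    }
}
```

In the real loop, previousPos is reset to 0 for each term, so each term's contribution collapses to that term's last position.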






[jira] [Commented] (LUCENE-9096) Implementation of CompressingTermVectorsWriter.flushOffsets can be simpler

2020-01-06 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008618#comment-17008618
 ] 

ASF subversion and git services commented on LUCENE-9096:
-

Commit 6bb1f6cbbe8accefbfd30b8ee74924ad43ddc356 in lucene-solr's branch 
refs/heads/master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6bb1f6c ]

LUCENE-9096: CHANGES entry.


> Implementation of CompressingTermVectorsWriter.flushOffsets can be simpler
> --
>
> Key: LUCENE-9096
> URL: https://issues.apache.org/jira/browse/LUCENE-9096
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Affects Versions: 8.2
>Reporter: kkewwei
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In CompressingTermVectorsWriter.flushOffsets, we compute sumPos and 
> sumOffsets as follows:
> {code:java}
> for (int i = 0; i < fd.numTerms; ++i) { 
>   int previousPos = 0;
>   int previousOff = 0;
>   for (int j = 0; j < fd.freqs[i]; ++j) { 
> final int position = positionsBuf[fd.posStart + pos];
> final int startOffset = startOffsetsBuf[fd.offStart + pos];
> sumPos[fieldNumOff] += position - previousPos; 
> sumOffsets[fieldNumOff] += startOffset - previousOff; 
> previousPos = position;
> previousOff = startOffset;
> ++pos;
>   }
> }
> {code}
> Since we always accumulate position - previousPos, the sum telescopes:
> {code:java}
> (position5-position4)+(position4-position3)+(position3-position2)+(position2-position1){code}
> so the whole sum simplifies to position5-position1.






[jira] [Resolved] (LUCENE-9096) Implementation of CompressingTermVectorsWriter.flushOffsets can be simpler

2020-01-06 Thread Adrien Grand (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-9096.
--
Fix Version/s: 8.5
   Resolution: Fixed

Thanks [~kkewwei].

> Implementation of CompressingTermVectorsWriter.flushOffsets can be simpler
> --
>
> Key: LUCENE-9096
> URL: https://issues.apache.org/jira/browse/LUCENE-9096
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Affects Versions: 8.2
>Reporter: kkewwei
>Priority: Major
> Fix For: 8.5
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In CompressingTermVectorsWriter.flushOffsets, we compute sumPos and 
> sumOffsets as follows:
> {code:java}
> for (int i = 0; i < fd.numTerms; ++i) { 
>   int previousPos = 0;
>   int previousOff = 0;
>   for (int j = 0; j < fd.freqs[i]; ++j) { 
> final int position = positionsBuf[fd.posStart + pos];
> final int startOffset = startOffsetsBuf[fd.offStart + pos];
> sumPos[fieldNumOff] += position - previousPos; 
> sumOffsets[fieldNumOff] += startOffset - previousOff; 
> previousPos = position;
> previousOff = startOffset;
> ++pos;
>   }
> }
> {code}
> Since we always accumulate position - previousPos, the sum telescopes:
> {code:java}
> (position5-position4)+(position4-position3)+(position3-position2)+(position2-position1){code}
> so the whole sum simplifies to position5-position1.






[jira] [Resolved] (LUCENE-9113) Speed up merging doc values terms dictionaries

2020-01-06 Thread Adrien Grand (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-9113.
--
Fix Version/s: 8.5
   Resolution: Fixed

> Speed up merging doc values terms dictionaries
> --
>
> Key: LUCENE-9113
> URL: https://issues.apache.org/jira/browse/LUCENE-9113
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 8.5
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The default {{DocValuesConsumer#mergeSortedField}} and 
> {{DocValuesConsumer#mergeSortedSetField}} implementations create a merged 
> view of the doc values producers to merge. Unfortunately, it doesn't override 
> {{termsEnum()}}, whose default implementation of {{next()}} increments the 
> ordinal and calls {{lookupOrd()}} to retrieve the term. Currently, 
> {{lookupOrd()}} doesn't take advantage of its current position, and would 
> seek to the block start and then call {{next()}} up to 16 times to go to the 
> desired term. While there are discussions to optimize lookups to take 
> advantage of the current ord (LUCENE-8836), it shouldn't be required for 
> merging to be efficient and we should instead make {{next()}} call {{next()}} 
> on its sub enums.






[jira] [Commented] (LUCENE-9096) Implementation of CompressingTermVectorsWriter.flushOffsets can be simpler

2020-01-06 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008628#comment-17008628
 ] 

ASF subversion and git services commented on LUCENE-9096:
-

Commit 7d6067000cdfcece70c15ce74a5727e56729fdc4 in lucene-solr's branch 
refs/heads/branch_8x from kkewwei
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7d60670 ]

LUCENE-9096:Simplify CompressingTermVectorsWriter#flushOffsets. (#1125)



> Implementation of CompressingTermVectorsWriter.flushOffsets can be simpler
> --
>
> Key: LUCENE-9096
> URL: https://issues.apache.org/jira/browse/LUCENE-9096
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Affects Versions: 8.2
>Reporter: kkewwei
>Priority: Major
> Fix For: 8.5
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In CompressingTermVectorsWriter.flushOffsets, we compute sumPos and 
> sumOffsets as follows:
> {code:java}
> for (int i = 0; i < fd.numTerms; ++i) { 
>   int previousPos = 0;
>   int previousOff = 0;
>   for (int j = 0; j < fd.freqs[i]; ++j) { 
> final int position = positionsBuf[fd.posStart + pos];
> final int startOffset = startOffsetsBuf[fd.offStart + pos];
> sumPos[fieldNumOff] += position - previousPos; 
> sumOffsets[fieldNumOff] += startOffset - previousOff; 
> previousPos = position;
> previousOff = startOffset;
> ++pos;
>   }
> }
> {code}
> Since we always accumulate position - previousPos, the sum telescopes:
> {code:java}
> (position5-position4)+(position4-position3)+(position3-position2)+(position2-position1){code}
> so the whole sum simplifies to position5-position1.






[jira] [Commented] (LUCENE-9113) Speed up merging doc values terms dictionaries

2020-01-06 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008627#comment-17008627
 ] 

ASF subversion and git services commented on LUCENE-9113:
-

Commit f6c2cb21379044b04f201567d5017ca81624821c in lucene-solr's branch 
refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f6c2cb2 ]

LUCENE-9113: Speed up merging doc values' terms dictionaries. (#1136)



> Speed up merging doc values terms dictionaries
> --
>
> Key: LUCENE-9113
> URL: https://issues.apache.org/jira/browse/LUCENE-9113
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 8.5
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The default {{DocValuesConsumer#mergeSortedField}} and 
> {{DocValuesConsumer#mergeSortedSetField}} implementations create a merged 
> view of the doc values producers to merge. Unfortunately, it doesn't override 
> {{termsEnum()}}, whose default implementation of {{next()}} increments the 
> ordinal and calls {{lookupOrd()}} to retrieve the term. Currently, 
> {{lookupOrd()}} doesn't take advantage of its current position, and would 
> seek to the block start and then call {{next()}} up to 16 times to go to the 
> desired term. While there are discussions to optimize lookups to take 
> advantage of the current ord (LUCENE-8836), it shouldn't be required for 
> merging to be efficient and we should instead make {{next()}} call {{next()}} 
> on its sub enums.






[jira] [Commented] (LUCENE-9096) Implementation of CompressingTermVectorsWriter.flushOffsets can be simpler

2020-01-06 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008629#comment-17008629
 ] 

ASF subversion and git services commented on LUCENE-9096:
-

Commit e2b39bd0ff8241c13296c7388924cb3f4e7ad9b8 in lucene-solr's branch 
refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e2b39bd ]

LUCENE-9096: CHANGES entry.


> Implementation of CompressingTermVectorsWriter.flushOffsets can be simpler
> --
>
> Key: LUCENE-9096
> URL: https://issues.apache.org/jira/browse/LUCENE-9096
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Affects Versions: 8.2
>Reporter: kkewwei
>Priority: Major
> Fix For: 8.5
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In CompressingTermVectorsWriter.flushOffsets, we compute sumPos and 
> sumOffsets as follows:
> {code:java}
> for (int i = 0; i < fd.numTerms; ++i) { 
>   int previousPos = 0;
>   int previousOff = 0;
>   for (int j = 0; j < fd.freqs[i]; ++j) { 
> final int position = positionsBuf[fd.posStart + pos];
> final int startOffset = startOffsetsBuf[fd.offStart + pos];
> sumPos[fieldNumOff] += position - previousPos; 
> sumOffsets[fieldNumOff] += startOffset - previousOff; 
> previousPos = position;
> previousOff = startOffset;
> ++pos;
>   }
> }
> {code}
> Since we always accumulate position - previousPos, the sum telescopes:
> {code:java}
> (position5-position4)+(position4-position3)+(position3-position2)+(position2-position1){code}
> so the whole sum simplifies to position5-position1.






[jira] [Commented] (SOLR-13089) bin/solr's use of lsof has some issues

2020-01-06 Thread Martijn Koster (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008684#comment-17008684
 ] 

Martijn Koster commented on SOLR-13089:
---

LGTM

> bin/solr's use of lsof has some issues
> --
>
> Key: SOLR-13089
> URL: https://issues.apache.org/jira/browse/SOLR-13089
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCLI
>Reporter: Martijn Koster
>Assignee: Jan Høydahl
>Priority: Minor
> Attachments: 0001-SOLR-13089-lsof-fixes.patch, SOLR-13089.patch
>
>
> The {{bin/solr}} script uses this {{lsof}} invocation to check if the Solr 
> port is being listened on:
> {noformat}
> running=`lsof -PniTCP:$SOLR_PORT -sTCP:LISTEN`
> if [ -z "$running" ]; then
> {noformat}
> The code is [here|https://github.com/apache/lucene-solr/blob/master/solr/bin/solr#L2147].
> There are a few issues with this.
> h2. 1. False negatives when port is occupied by different user
> When {{lsof}} runs as non-root, it only shows sockets for processes with your 
> effective uid.
>  For example:
> {noformat}
> $ id -u && nc -l 7788 &
> [1] 26576
> 1000
> # works: nc ran as my user
> $ lsof -PniTCP:7788 -sTCP:LISTEN
> COMMAND   PID USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
> nc  26580  mak3u  IPv4 2818104  0t0  TCP *:7788 (LISTEN)
> # fails: ssh is running as root
> $ lsof -PniTCP:22 -sTCP:LISTEN
> # works if we are root
> $ sudo lsof -PniTCP:22 -sTCP:LISTEN
> COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
> sshd2524 root3u  IPv4  18426  0t0  TCP *:22 (LISTEN)
> sshd2524 root4u  IPv6  18428  0t0  TCP *:22 (LISTEN)
> {noformat}
> Solr runs as non-root, so if some other process owned by a different user 
> occupies that port, you will get a false negative (it will say Solr is not 
> running even though it is).
> I can't think of a good way to fix or work around that (short of not using 
> {{lsof}} in the first place). Perhaps this is an uncommon scenario we need 
> not worry too much about.
> h2. 2. lsof can complain about lack of /etc/password entries
> If {{lsof}} runs without the current effective user having an entry in 
> {{/etc/passwd}},
>  it produces a warning on stderr:
> {noformat}
> $ docker run -d -u 0 solr:7.6.0  bash -c "chown -R  /opt/; gosu  
> solr-foreground"
> 4397c3f51d4a1cfca7e5815e5b047f75fb144265d4582745a584f0dba51480c6
> $ docker exec -it -u  
> 4397c3f51d4a1cfca7e5815e5b047f75fb144265d4582745a584f0dba51480c6 bash
> I have no name!@4397c3f51d4a:/opt/solr$ lsof -PniTCP:8983 -sTCP:LISTEN
> lsof: no pwd entry for UID 
> COMMAND PID USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
> lsof: no pwd entry for UID 
> java  9   115u  IPv4 2813503  0t0  TCP *:8983 (LISTEN)
> I have no name!@4397c3f51d4a:/opt/solr$ lsof -PniTCP:8983 
> -sTCP:LISTEN>/dev/null
> lsof: no pwd entry for UID 
> lsof: no pwd entry for UID 
> {noformat}
> You can avoid this by using the {{-t}} tag, which specifies that lsof should 
> produce terse output with process identifiers only and no header:
> {noformat}
> I have no name!@4397c3f51d4a:/opt/solr$ lsof -t -PniTCP:8983 -sTCP:LISTEN
> 9
> {noformat}
> This is a rare circumstance, but one I encountered and worked around.
> h2. 3. On Alpine, lsof is implemented by busybox, but with incompatible 
> arguments
> On Alpine, {{busybox}} implements {{lsof}} but does not support these 
> arguments, so you get:
> {noformat}
> $ docker run -it alpine sh
> / # lsof -t -PniTCP:8983 -sTCP:LISTEN
> 1 /bin/busybox/dev/pts/0
> 1 /bin/busybox/dev/pts/0
> 1 /bin/busybox/dev/pts/0
> 1 /bin/busybox/dev/tty
> {noformat}
> so if you ran Solr in the background and it failed to start, this code 
> would produce a false positive. For example:
> {noformat}
> docker volume create mysol
> docker run -v mysol:/mysol bash bash -c "chown 8983:8983 /mysol"
> docker run -it -v mysol:/mysol -w /mysol -v 
> $HOME/Downloads/solr-7.6.0.tgz:/solr-7.6.0.tgz openjdk:8-alpine sh
> apk add procps bash
> tar xvzf /solr-7.6.0.tgz
> chown -R 8983:8983 .
> {noformat}
> then in a separate terminal:
> {noformat}
> $ docker exec -it -u 8983 serene_saha  sh
> /mysol $ SOLR_OPTS=--invalid ./solr-7.6.0/bin/solr start
> whoami: unknown uid 8983
> Waiting up to 180 seconds to see Solr running on port 8983 [|]  
> Started Solr server on port 8983 (pid=101). Happy searching!
> /mysol $ 
> {noformat}
> and in another separate terminal:
> {noformat}
> $ docker exec -it thirsty_liskov bash
> bash-4.4$ cat server/logs/solr-8983-console.log 
> Unrecognized option: --invalid
> Error: Could not create the Java Virtual Machine.
> Error: A fatal exception has occurred. Program will exit.
> {noformat}
> so it says Solr is running when it isn't.
> Now, all this can be avoided by j

[jira] [Commented] (LUCENE-8673) Use radix partitioning when merging dimensional points

2020-01-06 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008703#comment-17008703
 ] 

Adrien Grand commented on LUCENE-8673:
--

The test already uses a FSDirectory for large numbers of documents. The seed 
doesn't reproduce for me but I suspect that this is related to the fact that 
the test framework randomly wraps with NRTCachingDirectory. If I add some 
logging I'm seeing about 50MB spent on the RAMDirectory even though it's 
configured with a max size of 500kB.

> Use radix partitioning when merging dimensional points
> --
>
> Key: LUCENE-8673
> URL: https://issues.apache.org/jira/browse/LUCENE-8673
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Assignee: Ignacio Vera
>Priority: Major
> Fix For: 8.x, master (9.0)
>
> Attachments: Geo3D.png, Geo3D.png, Geo3D.png, LatLonPoint.png, 
> LatLonPoint.png, LatLonPoint.png, LatLonShape.png, LatLonShape.png, 
> LatLonShape.png
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Following the advice of [~jpountz] in LUCENE-8623, I have investigated using 
> radix selection when merging segments instead of sorting the data at the 
> beginning. The results are pretty promising when running the Lucene geo 
> benchmarks:
>  
> ||Approach||Index time (sec): Dev||Index Time (sec): Base||Index Time: Diff||Force merge time (sec): Dev||Force Merge time (sec): Base||Force Merge Time: Diff||Index size (GB): Dev||Index size (GB): Base||Index Size: Diff||Reader heap (MB): Dev||Reader heap (MB): Base||Reader heap: Diff||
> |points|241.5s|235.0s| 3%|157.2s|157.9s|-0%|0.55|0.55| 0%|1.57|1.57| 0%|
> |shapes|416.1s|650.1s|-36%|306.1s|603.2s|-49%|1.29|1.29| 0%|1.61|1.61| 0%|
> |geo3d|261.0s|360.1s|-28%|170.2s|279.9s|-39%|0.75|0.75| 0%|1.58|1.58| 0%|
>  
> edited: table formatting to be a jira table
>  
> In 2D the index throughput is more or less equal but for higher dimensions 
> the impact is quite big. In all cases the merging process requires much less 
> disk space, I am attaching plots showing the different behaviour and I am 
> opening a pull request.
>  
>  
>  
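To make the idea concrete, here is a toy radix select over nonnegative ints (an illustrative sketch only, not the patch's implementation, which selects over packed byte[] values and is disk-aware): it finds the value at a given rank by histogramming on one byte at a time and recursing only into the bucket containing the target rank, instead of sorting everything up front.

```java
import java.util.Arrays;

public class RadixSelect {
    // Find the k-th smallest value (0-based) among nonnegative ints.
    // shift is the bit offset of the byte currently being bucketed
    // (start with 24 for 32-bit values).
    static int select(int[] values, int k, int shift) {
        if (values.length == 1 || shift < 0) {
            int[] copy = values.clone();   // tiny fallback for the base case
            Arrays.sort(copy);
            return copy[k];
        }
        // Histogram on the current byte.
        int[] counts = new int[256];
        for (int v : values) counts[(v >>> shift) & 0xFF]++;
        // Walk buckets until we reach the one containing rank k.
        int bucket = 0, seen = 0;
        while (seen + counts[bucket] <= k) seen += counts[bucket++];
        // Partition: keep only that bucket's values and recurse one byte down.
        int[] sub = new int[counts[bucket]];
        int i = 0;
        for (int v : values) if (((v >>> shift) & 0xFF) == bucket) sub[i++] = v;
        return select(sub, k - seen, shift - 8);
    }

    public static void main(String[] args) {
        int[] data = {42, 7, 99, 7, 13, 256, 1024, 3};
        System.out.println(select(data, 3, 24)); // 4th smallest value
    }
}
```

Only the recursion path down to one bucket is ever examined in detail, which is why partitioning needs far less work and temporary space than fully sorting the data before the merge.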






[jira] [Updated] (SOLR-14158) package manager to read keys from packagestore and not ZK

2020-01-06 Thread Noble Paul (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-14158:
--
Priority: Blocker  (was: Major)

> package manager to read keys from packagestore and not ZK 
> --
>
> Key: SOLR-14158
> URL: https://issues.apache.org/jira/browse/SOLR-14158
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: packages
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Blocker
>  Labels: packagemanager
>
> The security of the package system relies on securing ZK. It's much easier 
> for users to secure the file system than to secure ZK.
> We provide an option to read public keys from the file store.
> This will:
> * Have a special directory called {{_trusted_}}. Direct writes to that 
> directory over HTTP are forbidden.
>  * The CLI writes the keys directly to the {{/filestore/_trusted_/keys/}} 
> directory. Other nodes are asked to fetch the public key files from that node.
>  * Package artifacts will continue to be uploaded over HTTP.
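The "direct writes are forbidden" rule above amounts to a path check on incoming HTTP writes. A minimal sketch follows; the function name and the exact prefix handling are illustrative assumptions, not Solr's actual file-store code:

```python
import posixpath

# Assumed trusted-key location, taken from the directory named in the issue.
TRUSTED_DIR = "/filestore/_trusted_"

def allow_http_write(path: str) -> bool:
    """Return False for any over-HTTP write targeting the trusted key area.

    Paths are normalized first so '..' segments cannot slip past the check.
    """
    norm = posixpath.normpath(path)
    return not (norm == TRUSTED_DIR or norm.startswith(TRUSTED_DIR + "/"))
```

Normalizing before comparing matters: a request for {{/filestore/pkg/../_trusted_/keys/k.der}} must be rejected just like a direct one.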






[jira] [Updated] (SOLR-14158) package manager to read keys from packagestore and not ZK

2020-01-06 Thread Noble Paul (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-14158:
--
Fix Version/s: 8.4.1

> package manager to read keys from packagestore and not ZK 
> --
>
> Key: SOLR-14158
> URL: https://issues.apache.org/jira/browse/SOLR-14158
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: packages
>Affects Versions: 8.4
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Blocker
>  Labels: packagemanager
> Fix For: 8.4.1
>
>






[jira] [Updated] (SOLR-14158) package manager to read keys from packagestore and not ZK

2020-01-06 Thread Noble Paul (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-14158:
--
Affects Version/s: 8.4

> package manager to read keys from packagestore and not ZK 
> --
>
> Key: SOLR-14158
> URL: https://issues.apache.org/jira/browse/SOLR-14158
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: packages
>Affects Versions: 8.4
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Blocker
>  Labels: packagemanager
>






[jira] [Commented] (SOLR-14158) package manager to read keys from packagestore and not ZK

2020-01-06 Thread Jira


[ 
https://issues.apache.org/jira/browse/SOLR-14158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008755#comment-17008755
 ] 

Jan Høydahl commented on SOLR-14158:


This should go in 8.5 and not be a blocker. It has ALWAYS been the case that a 
production Solr cluster needs a secure Zookeeper one way or another. Nothing 
has changed here.

> package manager to read keys from packagestore and not ZK 
> --
>
> Key: SOLR-14158
> URL: https://issues.apache.org/jira/browse/SOLR-14158
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: packages
>Affects Versions: 8.4
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Blocker
>  Labels: packagemanager
> Fix For: 8.4.1
>
>






[jira] [Commented] (SOLR-14158) package manager to read keys from packagestore and not ZK

2020-01-06 Thread Noble Paul (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008760#comment-17008760
 ] 

Noble Paul commented on SOLR-14158:
---

The problem is that anyone who uses this new feature will have a 
backward-incompatible system that is insecure by nature.
The threat level is much higher in this case: an attacker can run malicious 
code if ZK is compromised. We should not leave this hole open.

> package manager to read keys from packagestore and not ZK 
> --
>
> Key: SOLR-14158
> URL: https://issues.apache.org/jira/browse/SOLR-14158
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: packages
>Affects Versions: 8.4
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Blocker
>  Labels: packagemanager
> Fix For: 8.4.1
>
>






[jira] [Created] (SOLR-14169) Fix 20 Resource Leak warnings in apache/solr/common

2020-01-06 Thread Andras Salamon (Jira)
Andras Salamon created SOLR-14169:
-

 Summary: Fix 20 Resource Leak warnings in apache/solr/common
 Key: SOLR-14169
 URL: https://issues.apache.org/jira/browse/SOLR-14169
 Project: Solr
  Issue Type: Sub-task
Reporter: Andras Salamon


There are 20 resource leak warnings in {{apache/solr/common}}
{noformat}
 [ecj-lint] 5. WARNING in /Users/andrassalamon/src/lucene-solr-upstream/solr/solrj/src/java/org/apache/solr/common/cloud/ZkNodeProps.java (at line 98)
 [ecj-lint]  props = (Map) new JavaBinCodec().unmarshal(bytes);
 [ecj-lint]  Resource leak: '<unassigned Closeable value>' is never closed
--
 [ecj-lint] 6. WARNING in /Users/andrassalamon/src/lucene-solr-upstream/solr/solrj/src/java/org/apache/solr/common/util/Utils.java (at line 206)
 [ecj-lint]  new SolrJSONWriter(writer)
 [ecj-lint]  Resource leak: '<unassigned Closeable value>' is never closed
--
 [ecj-lint] 2. WARNING in /Users/andrassalamon/src/lucene-solr-upstream/solr/solrj/src/test/org/apache/solr/common/util/ContentStreamTest.java (at line 50)
 [ecj-lint]  try (InputStream is = new SolrResourceLoader().openResource("solrj/README");
 [ecj-lint]  Resource leak: '<unassigned Closeable value>' is never closed
--
 [ecj-lint] 3. WARNING in /Users/andrassalamon/src/lucene-solr-upstream/solr/solrj/src/test/org/apache/solr/common/util/ContentStreamTest.java (at line 73)
 [ecj-lint]  try (InputStream is = new SolrResourceLoader().openResource("solrj/README");
 [ecj-lint]  Resource leak: '<unassigned Closeable value>' is never closed
--
 [ecj-lint] 4. WARNING in /Users/andrassalamon/src/lucene-solr-upstream/solr/solrj/src/test/org/apache/solr/common/util/ContentStreamTest.java (at line 98)
 [ecj-lint]  try (InputStream is = new SolrResourceLoader().openResource("solrj/README");
 [ecj-lint]  Resource leak: '<unassigned Closeable value>' is never closed
--
 [ecj-lint] 5. WARNING in /Users/andrassalamon/src/lucene-solr-upstream/solr/solrj/src/test/org/apache/solr/common/util/ContentStreamTest.java (at line 127)
 [ecj-lint]  try (InputStream is = new SolrResourceLoader().openResource("solrj/README");
 [ecj-lint]  Resource leak: '<unassigned Closeable value>' is never closed
--
 [ecj-lint] 6. WARNING in /Users/andrassalamon/src/lucene-solr-upstream/solr/solrj/src/test/org/apache/solr/common/util/ContentStreamTest.java (at line 152)
 [ecj-lint]  try (InputStream is = new SolrResourceLoader().openResource("solrj/README");
 [ecj-lint]  Resource leak: '<unassigned Closeable value>' is never closed
--
 [ecj-lint] 7. WARNING in /Users/andrassalamon/src/lucene-solr-upstream/solr/solrj/src/test/org/apache/solr/common/util/ContentStreamTest.java (at line 177)
 [ecj-lint]  try (InputStream is = new SolrResourceLoader().openResource("solrj/README");
 [ecj-lint]  Resource leak: '<unassigned Closeable value>' is never closed
--
 [ecj-lint] 8. WARNING in /Users/andrassalamon/src/lucene-solr-upstream/solr/solrj/src/test/org/apache/solr/common/util/TestFastJavabinDecoder.java (at line 48)
 [ecj-lint]  JavaBinCodec codec = new JavaBinCodec(faos, null);
 [ecj-lint]  Resource leak: 'codec' is never closed
--
 [ecj-lint] 9. WARNING in /Users/andrassalamon/src/lucene-solr-upstream/solr/solrj/src/test/org/apache/solr/common/util/TestFastJavabinDecoder.java (at line 58)
 [ecj-lint]  FastJavaBinDecoder.StreamCodec scodec = new FastJavaBinDecoder.StreamCodec(fis);
 [ecj-lint]  Resource leak: 'scodec' is never closed
--
 [ecj-lint] 10. WARNING in /Users/andrassalamon/src/lucene-solr-upstream/solr/solrj/src/test/org/apache/solr/common/util/TestFastJavabinDecoder.java (at line 81)
 [ecj-lint]  new JavaBinCodec().marshal(m, baos);
 [ecj-lint]  Resource leak: '<unassigned Closeable value>' is never closed
--
 [ecj-lint] 11. WARNING in /Users/andrassalamon/src/lucene-solr-upstream/solr/solrj/src/test/org/apache/solr/common/util/TestFastJavabinDecoder.java (at line 83)
 [ecj-lint]  Map m2 = (Map) new JavaBinCodec().unmarshal(new FastInputStream(null, baos.getbuf(), 0, baos.size()));
 [ecj-lint]  Resource leak: '<unassigned Closeable value>' is never closed
--
 [ecj-lint] 12. WARNING in /Users/andrassalamon/src/lucene-solr-upstream/solr/solrj/src/test/org/apache/solr/common/util/TestFastJavabinDecoder.java (at line 124)
 [ecj-lint]  SimpleOrderedMap o = (SimpleOrderedMap) new JavaBinCodec().unmarshal(baos.toByteArray());
 [ecj-lint]  Resource lea
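In Java, warnings like these are typically silenced with try-with-resources so the codec is closed on every path. The same deterministic-close pattern can be sketched in Python with {{contextlib.closing}}; the {{Codec}} class below is a stand-in, not the actual patch:

```python
from contextlib import closing

class Codec:
    """Stand-in for a resource such as JavaBinCodec that must be closed."""
    def __init__(self):
        self.closed = False

    def unmarshal(self, data):
        return {"payload": data}

    def close(self):
        self.closed = True

# closing() guarantees close() runs even if unmarshal() raises,
# which is exactly what the resource-leak warnings are asking for.
with closing(Codec()) as codec:
    props = codec.unmarshal(b"bytes")
```

The fix for each warning is the same shape: scope the resource to a block that closes it, instead of constructing it inline and letting it leak.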

[jira] [Updated] (SOLR-14169) Fix 20 Resource Leak warnings in apache/solr/common

2020-01-06 Thread Andras Salamon (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Salamon updated SOLR-14169:
--
Attachment: SOLR-14169-01.patch
Status: Open  (was: Open)

> Fix 20 Resource Leak warnings in apache/solr/common
> ---
>
> Key: SOLR-14169
> URL: https://issues.apache.org/jira/browse/SOLR-14169
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Andras Salamon
>Priority: Minor
> Attachments: SOLR-14169-01.patch
>
>
> There are 20 resource leak warnings in {{apache/solr/common}}

[jira] [Updated] (SOLR-14169) Fix 20 Resource Leak warnings in apache/solr/common

2020-01-06 Thread Andras Salamon (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Salamon updated SOLR-14169:
--
Status: Patch Available  (was: Open)

> Fix 20 Resource Leak warnings in apache/solr/common
> ---
>
> Key: SOLR-14169
> URL: https://issues.apache.org/jira/browse/SOLR-14169
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Andras Salamon
>Priority: Minor
> Attachments: SOLR-14169-01.patch
>
>
> There are 20 resource leak warnings in {{apache/solr/common}}

[jira] [Commented] (SOLR-13089) bin/solr's use of lsof has some issues

2020-01-06 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008796#comment-17008796
 ] 

ASF subversion and git services commented on SOLR-13089:


Commit ac777a5352224b2c8f46836f0e078809308fc2d8 in lucene-solr's branch 
refs/heads/master from Martijn Koster
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ac777a5 ]

SOLR-13089: Fix lsof edge cases in the solr CLI script
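One way to sidestep all three lsof issues below (uid visibility, missing /etc/passwd entries, busybox argument incompatibility) is to probe the port with a plain TCP connect instead of inspecting process tables. This is an illustrative alternative, not what the committed patch does:

```python
import socket

def port_is_listening(port: int, host: str = "127.0.0.1") -> bool:
    """Probe a TCP port directly; works regardless of which uid owns
    the listening process and needs no external lsof binary."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising
        return s.connect_ex((host, port)) == 0
```

The trade-off is that a connect probe tells you only that *something* is listening, not that it is Solr, so a pid check is still needed alongside it.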


> bin/solr's use of lsof has some issues
> --
>
> Key: SOLR-13089
> URL: https://issues.apache.org/jira/browse/SOLR-13089
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCLI
>Reporter: Martijn Koster
>Assignee: Jan Høydahl
>Priority: Minor
> Attachments: 0001-SOLR-13089-lsof-fixes.patch, SOLR-13089.patch
>
>
> The {{bin/solr}} script uses this {{lsof}} invocation to check if the Solr 
> port is being listened on:
> {noformat}
> running=`lsof -PniTCP:$SOLR_PORT -sTCP:LISTEN`
> if [ -z "$running" ]; then
> {noformat}
> The code is 
> [here|https://github.com/apache/lucene-solr/blob/master/solr/bin/solr#L2147].
> There are a few issues with this.
> h2. 1. False negatives when the port is occupied by a different user
> When {{lsof}} runs as non-root, it only shows sockets for processes with your 
> effective uid.
>  For example:
> {noformat}
> $ id -u && nc -l 7788 &
> [1] 26576
> 1000
>  works: nc ran as my user
> $ lsof -PniTCP:7788 -sTCP:LISTEN
> COMMAND   PID USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
> nc      26580  mak    3u  IPv4 2818104  0t0  TCP *:7788 (LISTEN)
>  fails: ssh is running as root
> $ lsof -PniTCP:22 -sTCP:LISTEN
>  works if we are root
> $ sudo lsof -PniTCP:22 -sTCP:LISTEN
> COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
> sshd    2524 root    3u  IPv4  18426  0t0  TCP *:22 (LISTEN)
> sshd    2524 root    4u  IPv6  18428  0t0  TCP *:22 (LISTEN)
> {noformat}
> Solr runs as non-root.
>  So if some other process owned by a different user occupies that port, you 
> will get a false negative (it will say Solr is not running even though it is).
>  I can't think of a good way to fix or work around that (short of not using 
> {{lsof}} in the first place).
>  Perhaps an uncommon scenario we need not worry too much about.
> h2. 2. lsof can complain about missing /etc/passwd entries
> If {{lsof}} runs without the current effective user having an entry in 
> {{/etc/passwd}},
>  it produces a warning on stderr:
> {noformat}
> $ docker run -d -u 0 solr:7.6.0  bash -c "chown -R  /opt/; gosu  
> solr-foreground"
> 4397c3f51d4a1cfca7e5815e5b047f75fb144265d4582745a584f0dba51480c6
> $ docker exec -it -u  
> 4397c3f51d4a1cfca7e5815e5b047f75fb144265d4582745a584f0dba51480c6 bash
> I have no name!@4397c3f51d4a:/opt/solr$ lsof -PniTCP:8983 -sTCP:LISTEN
> lsof: no pwd entry for UID 
> COMMAND PID USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
> lsof: no pwd entry for UID 
> java  9   115u  IPv4 2813503  0t0  TCP *:8983 (LISTEN)
> I have no name!@4397c3f51d4a:/opt/solr$ lsof -PniTCP:8983 
> -sTCP:LISTEN>/dev/null
> lsof: no pwd entry for UID 
> lsof: no pwd entry for UID 
> {noformat}
> You can avoid this by using the {{-t}} flag, which specifies that lsof should 
> produce terse output with process identifiers only and no header:
> {noformat}
> I have no name!@4397c3f51d4a:/opt/solr$ lsof -t -PniTCP:8983 -sTCP:LISTEN
> 9
> {noformat}
> This is a rare circumstance, but one I encountered and worked around.
> h2. 3. On Alpine, lsof is implemented by busybox, but with incompatible 
> arguments
> On Alpine, {{busybox}} implements {{lsof}}, but does not support these 
> arguments, so you get:
> {noformat}
> $ docker run -it alpine sh
> / # lsof -t -PniTCP:8983 -sTCP:LISTEN
> 1 /bin/busybox/dev/pts/0
> 1 /bin/busybox/dev/pts/0
> 1 /bin/busybox/dev/pts/0
> 1 /bin/busybox/dev/tty
> {noformat}
> So if you ran Solr in the background and it failed to start, this check 
> would produce a false positive.
>  For example:
> {noformat}
> docker volume create mysol
> docker run -v mysol:/mysol bash bash -c "chown 8983:8983 /mysol"
> docker run -it -v mysol:/mysol -w /mysol -v 
> $HOME/Downloads/solr-7.6.0.tgz:/solr-7.6.0.tgz openjdk:8-alpine sh
> apk add procps bash
> tar xvzf /solr-7.6.0.tgz
> chown -R 8983:8983 .
> {noformat}
> then in a separate terminal:
> {noformat}
> $ docker exec -it -u 8983 serene_saha  sh
> /mysol $ SOLR_OPTS=--invalid ./solr-7.6.0/bin/solr start
> whoami: unknown uid 8983
> Waiting up to 180 seconds to see Solr running on port 8983 [|]  
> Started Solr server on port 8983 (pid=101). Happy searching!
> /mysol $ 
> {noformat}
> and in another separate terminal:
> {noformat}
> $ docker exec -it thirsty_liskov bash
> bash-4.4$ cat server/logs/s

[jira] [Commented] (SOLR-13089) bin/solr's use of lsof has some issues

2020-01-06 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008798#comment-17008798
 ] 

ASF subversion and git services commented on SOLR-13089:


Commit 2aa739ae873b8b1c9dac4a42daa9e790ebdf700e in lucene-solr's branch 
refs/heads/branch_8x from Martijn Koster
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2aa739a ]

SOLR-13089: Fix lsof edge cases in the solr CLI script

(cherry picked from commit ac777a5352224b2c8f46836f0e078809308fc2d8)


> bin/solr's use of lsof has some issues
> --
>
> Key: SOLR-13089
> URL: https://issues.apache.org/jira/browse/SOLR-13089
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCLI
>Reporter: Martijn Koster
>Assignee: Jan Høydahl
>Priority: Minor
> Fix For: 8.5
>
> Attachments: 0001-SOLR-13089-lsof-fixes.patch, SOLR-13089.patch
>
>

[jira] [Updated] (SOLR-13089) bin/solr's use of lsof has some issues

2020-01-06 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SOLR-13089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-13089:
---
Fix Version/s: 8.5

> bin/solr's use of lsof has some issues
> --
>
> Key: SOLR-13089
> URL: https://issues.apache.org/jira/browse/SOLR-13089
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCLI
>Reporter: Martijn Koster
>Assignee: Jan Høydahl
>Priority: Minor
> Fix For: 8.5
>
> Attachments: 0001-SOLR-13089-lsof-fixes.patch, SOLR-13089.patch
>
>
> The {{bin/solr}} script uses this {{lsof}} invocation to check if the Solr 
> port is being listened on:
> {noformat}
> running=`lsof -PniTCP:$SOLR_PORT -sTCP:LISTEN`
> if [ -z "$running" ]; then
> {noformat}
> code is at 
> [here|https://github.com/apache/lucene-solr/blob/master/solr/bin/solr#L2147].
> There are a few issues with this.
> h2. 1. False negatives when port is occupied by different user
> When {{lsof}} runs as non-root, it only shows sockets for processes with your 
> effective uid.
>  For example:
> {noformat}
> $ id -u && nc -l 7788 &
> [1] 26576
> 1000
>  works: nc ran as my user
> $ lsof -PniTCP:7788 -sTCP:LISTEN
> COMMAND   PID USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
> nc  26580  mak3u  IPv4 2818104  0t0  TCP *:7788 (LISTEN)
>  fails: sshd is running as root
> $ lsof -PniTCP:22 -sTCP:LISTEN
>  works if we are root
> $ sudo lsof -PniTCP:22 -sTCP:LISTEN
> COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
> sshd2524 root3u  IPv4  18426  0t0  TCP *:22 (LISTEN)
> sshd2524 root4u  IPv6  18428  0t0  TCP *:22 (LISTEN)
> {noformat}
> Solr runs as non-root.
>  So if some other process owned by a different user occupies that port, you 
> will get a false negative (it will say Solr is not running even though it is).
>  I can't think of a good way to fix or work around that (short of not using 
> {{lsof}} in the first place).
>  Perhaps an uncommon scenario we need not worry too much about.
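One way around the uid filtering, sketched below on the editor's own initiative (this is not the SOLR-13089 patch; the function name and fallback order are illustrative): consult a tool that shows listeners of all users, such as `ss`, and fall back to a plain TCP connect, which succeeds for a listener regardless of who owns it.

```shell
#!/usr/bin/env bash
# Sketch of a port-liveness check that does not depend on lsof's
# per-user filtering. Illustrative only.
port_in_use() {
  local port="$1"
  # ss (iproute2) lists listening sockets of all users without root
  if command -v ss >/dev/null 2>&1; then
    # column 4 is "Local Address:Port"; match the exact port suffix
    if ss -ltn 2>/dev/null | awk -v p="$port" '$4 ~ (":" p "$")' | grep -q .; then
      return 0
    fi
  fi
  # last resort: try to connect via bash's /dev/tcp pseudo-device;
  # a refused connection means nothing is listening
  if (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null; then
    return 0
  fi
  return 1
}

if port_in_use 22; then
  echo "port 22 is in use"
else
  echo "port 22 looks free"
fi
```

Note the connect fallback cannot distinguish *which* process listens, only that some process does, which is all the `bin/solr` check needs.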
> h2. 2. lsof can complain about missing /etc/passwd entries
> If {{lsof}} runs without the current effective user having an entry in 
> {{/etc/passwd}},
>  it produces a warning on stderr:
> {noformat}
> $ docker run -d -u 0 solr:7.6.0  bash -c "chown -R  /opt/; gosu  
> solr-foreground"
> 4397c3f51d4a1cfca7e5815e5b047f75fb144265d4582745a584f0dba51480c6
> $ docker exec -it -u  
> 4397c3f51d4a1cfca7e5815e5b047f75fb144265d4582745a584f0dba51480c6 bash
> I have no name!@4397c3f51d4a:/opt/solr$ lsof -PniTCP:8983 -sTCP:LISTEN
> lsof: no pwd entry for UID 
> COMMAND PID USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
> lsof: no pwd entry for UID 
> java  9   115u  IPv4 2813503  0t0  TCP *:8983 (LISTEN)
> I have no name!@4397c3f51d4a:/opt/solr$ lsof -PniTCP:8983 
> -sTCP:LISTEN>/dev/null
> lsof: no pwd entry for UID 
> lsof: no pwd entry for UID 
> {noformat}
> You can avoid this by using the {{-t}} flag, which tells lsof to produce 
> terse output with process identifiers only and no header:
> {noformat}
> I have no name!@4397c3f51d4a:/opt/solr$ lsof -t -PniTCP:8983 -sTCP:LISTEN
> 9
> {noformat}
> This is a rare circumstance, but one I encountered and worked around.
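In script form the terse invocation can be combined with a stderr redirect, so the "no pwd entry" warning can never leak into the captured value. A minimal sketch (the wrapper and the `SOLR_PORT` default are illustrative, not taken verbatim from `bin/solr`):

```shell
#!/usr/bin/env bash
# -t makes lsof print only PIDs (no header line), and 2>/dev/null
# discards the "no pwd entry for UID" warning, so $running is either
# empty or a list of PIDs.
SOLR_PORT="${SOLR_PORT:-8983}"
running=$(lsof -t -PniTCP:"$SOLR_PORT" -sTCP:LISTEN 2>/dev/null)
if [ -z "$running" ]; then
  echo "nothing listening on port $SOLR_PORT"
else
  echo "listening pid(s): $running"
fi
```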
> h2. 3. On Alpine, lsof is implemented by busybox, but with incompatible 
> arguments
> On Alpine, {{busybox}} implements {{lsof}}, but does not support these 
> arguments, so you get:
> {noformat}
> $ docker run -it alpine sh
> / # lsof -t -PniTCP:8983 -sTCP:LISTEN
> 1 /bin/busybox/dev/pts/0
> 1 /bin/busybox/dev/pts/0
> 1 /bin/busybox/dev/pts/0
> 1 /bin/busybox/dev/tty
> {noformat}
> so if you ran Solr, in the background, and it failed to start, this code 
> would produce a false positive.
>  For example:
> {noformat}
> docker volume create mysol
> docker run -v mysol:/mysol bash bash -c "chown 8983:8983 /mysol"
> docker run -it -v mysol:/mysol -w /mysol -v 
> $HOME/Downloads/solr-7.6.0.tgz:/solr-7.6.0.tgz openjdk:8-alpine sh
> apk add procps bash
> tar xvzf /solr-7.6.0.tgz
> chown -R 8983:8983 .
> {noformat}
> then in a separate terminal:
> {noformat}
> $ docker exec -it -u 8983 serene_saha  sh
> /mysol $ SOLR_OPTS=--invalid ./solr-7.6.0/bin/solr start
> whoami: unknown uid 8983
> Waiting up to 180 seconds to see Solr running on port 8983 [|]  
> Started Solr server on port 8983 (pid=101). Happy searching!
> /mysol $ 
> {noformat}
> and in another separate terminal:
> {noformat}
> $ docker exec -it thirsty_liskov bash
> bash-4.4$ cat server/logs/solr-8983-console.log 
> Unrecognized option: --invalid
> Error: Could not create the Java Virtual Machine.
> Error: A fatal exception has occurred. Program will exit.
> {noformat}
> so it is saying Solr is running, when it isn't.
> Now, all this can be avoided by just installing th
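Short of installing the real tool, a defensive script could refuse to trust busybox's applet at all. The sketch below is the editor's illustration of that idea, not the committed fix; `have_real_lsof` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Treat busybox's lsof applet as unusable: it ignores -t/-P/-n/-i/-s
# and lists every open file, turning a failed Solr start into a
# false "running" verdict.
have_real_lsof() {
  command -v lsof >/dev/null 2>&1 || return 1
  # busybox applets are usually symlinks resolving to the busybox binary
  case "$(readlink -f "$(command -v lsof)" 2>/dev/null)" in
    */busybox) return 1 ;;
  esac
}

if have_real_lsof; then
  echo "usable lsof found"
else
  echo "no usable lsof; skip the port check rather than trust bogus output"
fi
```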

[jira] [Resolved] (SOLR-13089) bin/solr's use of lsof has some issues

2020-01-06 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SOLR-13089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl resolved SOLR-13089.

Resolution: Fixed

> bin/solr's use of lsof has some issues
> --
>
> Key: SOLR-13089
> URL: https://issues.apache.org/jira/browse/SOLR-13089
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCLI
>Reporter: Martijn Koster
>Assignee: Jan Høydahl
>Priority: Minor
> Fix For: 8.5
>
> Attachments: 0001-SOLR-13089-lsof-fixes.patch, SOLR-13089.patch
>
>

[jira] [Commented] (SOLR-14169) Fix 20 Resource Leak warnings in apache/solr/common

2020-01-06 Thread Lucene/Solr QA (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008815#comment-17008815
 ] 

Lucene/Solr QA commented on SOLR-14169:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | 
{color:green}  0m 51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | 
{color:green}  0m 51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | 
{color:green}  0m 51s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  5m 
30s{color} | {color:green} solrj in the patch passed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}  9m 10s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-14169 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12990001/SOLR-14169-01.patch |
| Optional Tests |  compile  javac  unit  ratsources  checkforbiddenapis  
validatesourcepatterns  |
| uname | Linux lucene1-us-west 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 
10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh
 |
| git revision | master / ac777a53522 |
| ant | version: Apache Ant(TM) version 1.10.5 compiled on March 28 2019 |
| Default Java | LTS |
|  Test Results | 
https://builds.apache.org/job/PreCommit-SOLR-Build/646/testReport/ |
| modules | C: solr/solrj U: solr/solrj |
| Console output | 
https://builds.apache.org/job/PreCommit-SOLR-Build/646/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> Fix 20 Resource Leak warnings in apache/solr/common
> ---
>
> Key: SOLR-14169
> URL: https://issues.apache.org/jira/browse/SOLR-14169
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Andras Salamon
>Priority: Minor
> Attachments: SOLR-14169-01.patch
>
>
> There are 20 resource leak warnings in {{apache/solr/common}}
> {noformat}
> [ecj-lint] 5. WARNING in /Users/andrassalamon/src/lucene-solr-upstream/solr/solrj/src/java/org/apache/solr/common/cloud/ZkNodeProps.java (at line 98)
> [ecj-lint]  props = (Map) new JavaBinCodec().unmarshal(bytes);
> [ecj-lint]                ^^
> [ecj-lint] Resource leak: '<unassigned Closeable value>' is never closed
> --
> [ecj-lint] 6. WARNING in /Users/andrassalamon/src/lucene-solr-upstream/solr/solrj/src/java/org/apache/solr/common/util/Utils.java (at line 206)
> [ecj-lint]  new SolrJSONWriter(writer)
> [ecj-lint]  ^^
> [ecj-lint] Resource leak: '<unassigned Closeable value>' is never closed
> --
> [ecj-lint] 2. WARNING in /Users/andrassalamon/src/lucene-solr-upstream/solr/solrj/src/test/org/apache/solr/common/util/ContentStreamTest.java (at line 50)
> [ecj-lint]  try (InputStream is = new SolrResourceLoader().openResource("solrj/README");
> [ecj-lint] Resource leak: '<unassigned Closeable value>' is never closed
> --
> [ecj-lint] 3. WARNING in /Users/andrassalamon/src/lucene-solr-upstream/solr/solrj/src/test/org/apache/solr/common/util/ContentStreamTest.java (at line 73)
> [ecj-lint]  try (InputStream is = new SolrResourceLoader().openResource("solrj/README");
> [ecj-lint] Resource leak: '<unassigned Closeable value>' is never closed
> --
> [ecj-lint] 4. WARNING in /Users/andrassalamon/src/lucene-solr-upstream/solr/solrj/src/test/org/apache/solr/common/util/ContentStreamTest.java (at line 98) 

[GitHub] [lucene-solr] kaynewu opened a new pull request #1143: HdfsDirectory support createTempOutput

2020-01-06 Thread GitBox
kaynewu opened a new pull request #1143: HdfsDirectory support createTempOutput
URL: https://github.com/apache/lucene-solr/pull/1143
 
 
   HdfsDirectory support createTempOutput


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1142: SOLR-14166: fq cache=false should use TwoPhaseIterator

2020-01-06 Thread GitBox
dsmiley commented on a change in pull request #1142: SOLR-14166: fq cache=false 
should use TwoPhaseIterator
URL: https://github.com/apache/lucene-solr/pull/1142#discussion_r363290923
 
 

 ##
 File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/feature/SolrFeature.java
 ##
 @@ -237,21 +237,25 @@ public FeatureScorer scorer(LeafReaderContext context) 
throws IOException {
  * @return DocIdSetIterator to traverse documents that matched all filter
  * criteria
  */
+// TODO it's not optimal to call getProcessedFilter per-segment!  Save the 
results into one Query
+// TODO rename to "FromFilterQueries" suffix to at least suggest this uses 
the filter cache
 private DocIdSetIterator getDocIdSetIteratorFromQueries(List 
queries,
 LeafReaderContext context) throws IOException {
   final SolrIndexSearcher.ProcessedFilter pf = ((SolrIndexSearcher) 
searcher)
   .getProcessedFilter(null, queries);
-  final Bits liveDocs = context.reader().getLiveDocs();
-
-  DocIdSetIterator idIter = null;
-  if (pf.filter != null) {
-final DocIdSet idSet = pf.filter.getDocIdSet(context, liveDocs);
-if (idSet != null) {
-  idIter = idSet.iterator();
-}
+  if (pf.postFilter != null) {
+throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
+"PostFilter queries are not supported");
   }
-
-  return idIter;
+  Query q = pf.filter;
+  if (q == null) {
+q = new MatchAllDocsQuery(); // usually never happens?
+  }
+  Scorer scorer = q.createWeight(searcher, ScoreMode.COMPLETE_NO_SCORES, 
1f).scorer(context);
 
 Review comment:
   Ah; good catch!  No it's not.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1142: SOLR-14166: fq cache=false should use TwoPhaseIterator

2020-01-06 Thread GitBox
dsmiley commented on a change in pull request #1142: SOLR-14166: fq cache=false 
should use TwoPhaseIterator
URL: https://github.com/apache/lucene-solr/pull/1142#discussion_r363293567
 
 

 ##
 File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/feature/SolrFeature.java
 ##
 @@ -237,21 +237,25 @@ public FeatureScorer scorer(LeafReaderContext context) 
throws IOException {
  * @return DocIdSetIterator to traverse documents that matched all filter
  * criteria
  */
+// TODO it's not optimal to call getProcessedFilter per-segment!  Save the 
results into one Query
+// TODO rename to "FromFilterQueries" suffix to at least suggest this uses 
the filter cache
 private DocIdSetIterator getDocIdSetIteratorFromQueries(List 
queries,
 LeafReaderContext context) throws IOException {
   final SolrIndexSearcher.ProcessedFilter pf = ((SolrIndexSearcher) 
searcher)
   .getProcessedFilter(null, queries);
-  final Bits liveDocs = context.reader().getLiveDocs();
-
-  DocIdSetIterator idIter = null;
-  if (pf.filter != null) {
-final DocIdSet idSet = pf.filter.getDocIdSet(context, liveDocs);
-if (idSet != null) {
-  idIter = idSet.iterator();
-}
+  if (pf.postFilter != null) {
+throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
+"PostFilter queries are not supported");
   }
-
-  return idIter;
+  Query q = pf.filter;
+  if (q == null) {
+q = new MatchAllDocsQuery(); // usually never happens?
+  }
+  Scorer scorer = q.createWeight(searcher, ScoreMode.COMPLETE_NO_SCORES, 
1f).scorer(context);
+  if (scorer != null) {
+return scorer.iterator();
 
 Review comment:
   This particular method on this class for LTR wants to return a 
DocIdSetIterator.  You are correct that this method will not completely benefit 
from TwoPhaseIterator as-designed.  It will benefit from the cost ordering 
aspect though.  I like your suggestion of returning a Scorer, thus enabling the 
caller to _potentially_ use it better (it does not today).  But I don't want to 
scope creep this PR into the LTR module more than necessary to accomplish the 
primary goal of the PR.  If what you propose is pretty simple then it can be 
done now but I don't see it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14158) package manager to read keys from packagestore and not ZK

2020-01-06 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008846#comment-17008846
 ] 

David Smiley commented on SOLR-14158:
-

This is perhaps a bigger issue that needs discussion on the dev list.  It gets 
at Solr's security posture and what assumptions we have about securing Solr.  
I'm for/against what's happening in the issue but just want more eye-balls on 
it.

> package manager to read keys from packagestore and not ZK 
> --
>
> Key: SOLR-14158
> URL: https://issues.apache.org/jira/browse/SOLR-14158
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: packages
>Affects Versions: 8.4
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Blocker
>  Labels: packagemanager
> Fix For: 8.4.1
>
>
> The security of the package system relies on securing ZK. It's much easier 
> for users to secure the file system than securing ZK.
> We provide an option to read public keys from file store.  
> This will
> * Have a special directory called {{_trusted_}} . Direct writes are forbidden 
> to that directory over http
>  * The CLI directly writes to the keys to 
> {{/filestore/_trusted_/keys/}} directory. Other nodes are asked to 
> fetch the public key files from that node
>  * Package artifacts will continue to be uploaded over http



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-14158) package manager to read keys from packagestore and not ZK

2020-01-06 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008846#comment-17008846
 ] 

David Smiley edited comment on SOLR-14158 at 1/6/20 1:36 PM:
-

This is perhaps a bigger issue that needs discussion on the dev list.  It gets 
at Solr's security posture and what assumptions we have about securing Solr.  
I'm not for or against what's happening in the issue; I just want more 
eye-balls on it.


was (Author: dsmiley):
This is perhaps a bigger issue that needs discussion on the dev list.  It gets 
at Solr's security posture and what assumptions we have about securing Solr.  
I'm for/against what's happening in the issue but just want more eye-balls on 
it.

> package manager to read keys from packagestore and not ZK 
> --
>
> Key: SOLR-14158
> URL: https://issues.apache.org/jira/browse/SOLR-14158
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: packages
>Affects Versions: 8.4
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Blocker
>  Labels: packagemanager
> Fix For: 8.4.1
>
>
> The security of the package system relies on securing ZK. It's much easier 
> for users to secure the file system than securing ZK.
> We provide an option to read public keys from file store.  
> This will
> * Have a special directory called {{_trusted_}} . Direct writes are forbidden 
> to that directory over http
>  * The CLI directly writes to the keys to 
> {{/filestore/_trusted_/keys/}} directory. Other nodes are asked to 
> fetch the public key files from that node
>  * Package artifacts will continue to be uploaded over http



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-9115) NRTCachingDirectory may put large files in the cache

2020-01-06 Thread Adrien Grand (Jira)
Adrien Grand created LUCENE-9115:


 Summary: NRTCachingDirectory may put large files in the cache
 Key: LUCENE-9115
 URL: https://issues.apache.org/jira/browse/LUCENE-9115
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Adrien Grand


NRTCachingDirectory assumes that the length of a file to write is 0 if there is 
no merge info or flush info. This is not correct as there are situations when 
Lucene might write very large files that have neither of them, for instance:
 - Stored fields are written on the fly with IOContext.DEFAULT (which doesn't 
have flush or merge info) and without taking any of the IndexWriter buffer, so 
gigabytes could be written before a flush happens.
 - BKD trees are merged with IOContext.DEFAULT.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] zsgyulavari opened a new pull request #1144: SOLR-13756 updated restlet mvn repository url.

2020-01-06 Thread GitBox
zsgyulavari opened a new pull request #1144: SOLR-13756 updated restlet mvn 
repository url.
URL: https://github.com/apache/lucene-solr/pull/1144
 
 
   
   
   
   # Description
   
   Updated the old repository URL for the restlet framework to the current 
official one stated at:
   https://restlet.talend.com/downloads/current/
   
   # Solution
   
   The old repository URL redirects to the new one, but Ivy fails to follow the 
redirect on some platforms, so this change points directly at the redirect target.
   
   # Tests
   
   It could be compiled even after deleting the local ivy cache using `rm -rf 
~/.ivy2/cache` .
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [x] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `master` branch.
   - [x] I have run `ant precommit` and the appropriate test suite.
   - [n/a] I have added tests for my changes.
   - [n/a] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8673) Use radix partitioning when merging dimensional points

2020-01-06 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008854#comment-17008854
 ] 

ASF subversion and git services commented on LUCENE-8673:
-

Commit b6f31835ad18da0f7a22064481b0d0e167f9f30c in lucene-solr's branch 
refs/heads/master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b6f3183 ]

LUCENE-8673: Avoid OOMEs because of IOContext randomization.


> Use radix partitioning when merging dimensional points
> --
>
> Key: LUCENE-8673
> URL: https://issues.apache.org/jira/browse/LUCENE-8673
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Assignee: Ignacio Vera
>Priority: Major
> Fix For: 8.x, master (9.0)
>
> Attachments: Geo3D.png, Geo3D.png, Geo3D.png, LatLonPoint.png, 
> LatLonPoint.png, LatLonPoint.png, LatLonShape.png, LatLonShape.png, 
> LatLonShape.png
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Following the advice of [~jpountz] in LUCENE-8623, I have investigated using 
> radix selection when merging segments instead of sorting the data at the 
> beginning. The results are pretty promising when running Lucene geo 
> benchmarks:
>  
> ||Approach||Index time (sec): Dev||Index Time (sec): Base||Index Time: 
> Diff||Force merge time (sec): Dev||Force Merge time (sec): Base||Force Merge 
> Time: Diff||Index size (GB): Dev||Index size (GB): Base||Index Size: 
> Diff||Reader heap (MB): Dev||Reader heap (MB): Base||Reader heap: Diff
> |points|241.5s|235.0s| 3%|157.2s|157.9s|-0%|0.55|0.55| 0%|1.57|1.57| 0%|
> |shapes|416.1s|650.1s|-36%|306.1s|603.2s|-49%|1.29|1.29| 0%|1.61|1.61| 0%|
> |geo3d|261.0s|360.1s|-28%|170.2s|279.9s|-39%|0.75|0.75| 0%|1.58|1.58| 0%|
>  
> edited: table formatting to be a jira table
>  
> In 2D the index throughput is more or less equal but for higher dimensions 
> the impact is quite big. In all cases the merging process requires much less 
> disk space, I am attaching plots showing the different behaviour and I am 
> opening a pull request.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8673) Use radix partitioning when merging dimensional points

2020-01-06 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008853#comment-17008853
 ] 

ASF subversion and git services commented on LUCENE-8673:
-

Commit 83999401ae9d3b23d14fe880adeb4fc57358bc2a in lucene-solr's branch 
refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8399940 ]

LUCENE-8673: Avoid OOMEs because of IOContext randomization.


> Use radix partitioning when merging dimensional points
> --
>
> Key: LUCENE-8673
> URL: https://issues.apache.org/jira/browse/LUCENE-8673
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Assignee: Ignacio Vera
>Priority: Major
> Fix For: 8.x, master (9.0)
>
> Attachments: Geo3D.png, Geo3D.png, Geo3D.png, LatLonPoint.png, 
> LatLonPoint.png, LatLonPoint.png, LatLonShape.png, LatLonShape.png, 
> LatLonShape.png
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Following the advice of [~jpountz] in LUCENE-8623, I have investigated using 
> radix selection when merging segments instead of sorting the data at the 
> beginning. The results are pretty promising when running Lucene geo 
> benchmarks:
>  
> ||Approach||Index time (sec): Dev||Index Time (sec): Base||Index Time: 
> Diff||Force merge time (sec): Dev||Force Merge time (sec): Base||Force Merge 
> Time: Diff||Index size (GB): Dev||Index size (GB): Base||Index Size: 
> Diff||Reader heap (MB): Dev||Reader heap (MB): Base||Reader heap: Diff
> |points|241.5s|235.0s| 3%|157.2s|157.9s|-0%|0.55|0.55| 0%|1.57|1.57| 0%|
> |shapes|416.1s|650.1s|-36%|306.1s|603.2s|-49%|1.29|1.29| 0%|1.61|1.61| 0%|
> |geo3d|261.0s|360.1s|-28%|170.2s|279.9s|-39%|0.75|0.75| 0%|1.58|1.58| 0%|
>  
> edited: table formatting to be a jira table
>  
> In 2D the index throughput is more or less equal but for higher dimensions 
> the impact is quite big. In all cases the merging process requires much less 
> disk space, I am attaching plots showing the different behaviour and I am 
> opening a pull request.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13756) ivy cannot download org.restlet.ext.servlet jar

2020-01-06 Thread Zsolt Gyulavari (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008855#comment-17008855
 ] 

Zsolt Gyulavari commented on SOLR-13756:


Created a PR with the change mentioned above, please review:
[https://github.com/apache/lucene-solr/pull/1144]

> ivy cannot download org.restlet.ext.servlet jar
> ---
>
> Key: SOLR-13756
> URL: https://issues.apache.org/jira/browse/SOLR-13756
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chongchen Chen
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I checkout the project and run `ant idea`, it will try to download jars. But  
> https://repo1.maven.org/maven2/org/restlet/jee/org.restlet.ext.servlet/2.3.0/org.restlet.ext.servlet-2.3.0.jar
>  will return 404 now.  
> [ivy:retrieve] public: tried
> [ivy:retrieve]  
> https://repo1.maven.org/maven2/org/restlet/jee/org.restlet.ext.servlet/2.3.0/org.restlet.ext.servlet-2.3.0.jar
> [ivy:retrieve]::
> [ivy:retrieve]::  FAILED DOWNLOADS::
> [ivy:retrieve]:: ^ see resolution messages for details  ^ ::
> [ivy:retrieve]::
> [ivy:retrieve]:: 
> org.restlet.jee#org.restlet;2.3.0!org.restlet.jar
> [ivy:retrieve]:: 
> org.restlet.jee#org.restlet.ext.servlet;2.3.0!org.restlet.ext.servlet.jar
> [ivy:retrieve]::






[jira] [Created] (SOLR-14170) Tag package feature as experimental

2020-01-06 Thread Jira
Jan Høydahl created SOLR-14170:
--

 Summary: Tag package feature as experimental
 Key: SOLR-14170
 URL: https://issues.apache.org/jira/browse/SOLR-14170
 Project: Solr
  Issue Type: Test
  Security Level: Public (Default Security Level. Issues are Public)
  Components: documentation
Reporter: Jan Høydahl


The new package store and package installation feature introduced in 8.4 was 
supposed to be tagged as lucene.experimental, with a clear warning in the 
ref-guide: "Not yet recommended for production use."

Let's add that for 8.5 so there is no doubt that if you use the feature you 
know the risks. Once the APIs have stabilized and there are a number of 
packages available "in the wild", we can decide to release it as a "GA" 
feature, but not yet!
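For reference, lucene.experimental is a javadoc tag rendered as a stability warning in the generated docs. A minimal illustration (the class name below is hypothetical, not Solr's actual package API):

```java
/**
 * Hypothetical example class, only to illustrate the javadoc marker.
 * The tag below is rendered in the generated javadocs as a warning that
 * this API is experimental and may change incompatibly between releases.
 *
 * @lucene.experimental
 */
public class PackageStoreExample {
    public static String stability() {
        return "experimental"; // no backward-compatibility guarantees yet
    }
}
```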






[jira] [Updated] (SOLR-14170) Tag package feature as experimental

2020-01-06 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SOLR-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-14170:
---
Fix Version/s: 8.5

> Tag package feature as experimental
> ---
>
> Key: SOLR-14170
> URL: https://issues.apache.org/jira/browse/SOLR-14170
> Project: Solr
>  Issue Type: Test
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Reporter: Jan Høydahl
>Priority: Major
> Fix For: 8.5
>
>
> The new package store and package installation feature introduced in 8.4 was 
> supposed to be tagged as lucene.experimental, with a clear warning in the 
> ref-guide: "Not yet recommended for production use."
> Let's add that for 8.5 so there is no doubt that if you use the feature you 
> know the risks. Once the APIs have stabilized and there are a number of 
> packages available "in the wild", we can decide to release it as a "GA" 
> feature, but not yet!






[GitHub] [lucene-solr] jpountz opened a new pull request #1145: LUCENE-9115: NRTCachingDirectory shouldn't cache files of unknown size.

2020-01-06 Thread GitBox
jpountz opened a new pull request #1145: LUCENE-9115: NRTCachingDirectory 
shouldn't cache files of unknown size.
URL: https://github.com/apache/lucene-solr/pull/1145
 
 
   See https://issues.apache.org/jira/browse/LUCENE-9115.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8673) Use radix partitioning when merging dimensional points

2020-01-06 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008872#comment-17008872
 ] 

Adrien Grand commented on LUCENE-8673:
--

I dug a bit more. This failure is mostly due to IOContext randomization that 
can make NRTCachingDirectory put large files in the cache. I fixed the test 
framework to not use NRTCachingDirectory when the test requests an FSDirectory 
in order to avoid this issue. Separately, I found an issue with 
NRTCachingDirectory, but it is not the root cause of these failures: 
LUCENE-9115.
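The gist of LUCENE-9115 can be sketched without Lucene dependencies: a write cache that only accepts files whose size could be estimated up front. The class and method names below are illustrative, not Lucene's actual NRTCachingDirectory API:

```java
import java.util.OptionalLong;

/** Sketch of an NRT-style write cache that refuses files of unknown size. */
public class NrtCachePolicy {
    private final long maxCachedBytes;

    public NrtCachePolicy(long maxCachedBytes) {
        this.maxCachedBytes = maxCachedBytes;
    }

    /**
     * Cache a new file in RAM only when the caller could estimate its size
     * and that estimate fits the budget. An unknown size (empty Optional)
     * must go straight to disk; otherwise a large flush or merge could
     * consume far more heap than intended.
     */
    public boolean doCacheWrite(OptionalLong estimatedSize) {
        return estimatedSize.isPresent() && estimatedSize.getAsLong() <= maxCachedBytes;
    }
}
```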

> Use radix partitioning when merging dimensional points
> --
>
> Key: LUCENE-8673
> URL: https://issues.apache.org/jira/browse/LUCENE-8673
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Assignee: Ignacio Vera
>Priority: Major
> Fix For: 8.x, master (9.0)
>
> Attachments: Geo3D.png, Geo3D.png, Geo3D.png, LatLonPoint.png, 
> LatLonPoint.png, LatLonPoint.png, LatLonShape.png, LatLonShape.png, 
> LatLonShape.png
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Following the advice of [~jpountz] in LUCENE-8623, I have investigated using 
> radix selection when merging segments instead of sorting the data at the 
> beginning. The results are pretty promising when running Lucene geo 
> benchmarks:
>  
> ||Approach||Index time (sec): Dev||Index Time (sec): Base||Index Time: 
> Diff||Force merge time (sec): Dev||Force Merge time (sec): Base||Force Merge 
> Time: Diff||Index size (GB): Dev||Index size (GB): Base||Index Size: 
> Diff||Reader heap (MB): Dev||Reader heap (MB): Base||Reader heap: Diff
> |points|241.5s|235.0s| 3%|157.2s|157.9s|-0%|0.55|0.55| 0%|1.57|1.57| 0%|
> |shapes|416.1s|650.1s|-36%|306.1s|603.2s|-49%|1.29|1.29| 0%|1.61|1.61| 0%|
> |geo3d|261.0s|360.1s|-28%|170.2s|279.9s|-39%|0.75|0.75| 0%|1.58|1.58| 0%|
>  
> edited: table formatting to be a jira table
>  
> In 2D the index throughput is more or less equal, but for higher dimensions 
> the impact is quite big. In all cases the merging process requires much less 
> disk space. I am attaching plots showing the different behaviour, and I am 
> opening a pull request.
>  
>  
>  
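The selection primitive the description refers to can be sketched generically: a most-significant-byte radix select that finds the k-th smallest key (the split value) without sorting the whole set. This is a simplified illustration, not Lucene's BKD-tree code:

```java
/** Sketch: select the k-th smallest fixed-length byte[] key by radix partitioning. */
public class RadixSelect {
    /**
     * Finds the k-th smallest key (0-based) by histogramming one byte per pass
     * and recursing only into the bucket that contains position k. Unlike a
     * full sort, the other buckets are never ordered, which is the point of
     * the merge-time optimization described above.
     */
    public static byte[] select(byte[][] keys, int k, int byteAt) {
        if (keys.length == 1 || byteAt >= keys[0].length) {
            return keys[0]; // single candidate, or all remaining keys are equal
        }
        int[] counts = new int[256];
        for (byte[] key : keys) {
            counts[key[byteAt] & 0xFF]++;
        }
        // Walk the histogram to find the bucket holding the k-th key.
        int bucket = 0;
        int before = 0;
        while (before + counts[bucket] <= k) {
            before += counts[bucket];
            bucket++;
        }
        // Gather that bucket and recurse one byte deeper.
        byte[][] sub = new byte[counts[bucket]][];
        int i = 0;
        for (byte[] key : keys) {
            if ((key[byteAt] & 0xFF) == bucket) {
                sub[i++] = key;
            }
        }
        return select(sub, k - before, byteAt + 1);
    }
}
```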






[jira] [Commented] (LUCENE-9112) OpenNLP tokenizer is fooled by text containing spurious punctuation

2020-01-06 Thread Markus Jelsma (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008871#comment-17008871
 ] 

Markus Jelsma commented on LUCENE-9112:
---

SegmentingTokenizerBase works fine on texts smaller than 1024 characters. Any 
term that occupies the 1024th position is split due to this bug. Ideally, the 
class should refill the buffer and move on for each full sentence it consumes; 
there are hardly any sentences over 1024 characters. But judging from the 
printlns I see, it either does not do that or does it incorrectly.

I am going to work around the problem for now by splitting my text into 
paragraphs using newlines. However, paragraphs larger than 1024 characters will 
be a problem. I have checked my text sources for paragraph length and they 
usually do not exceed it, but paragraphs longer than 1024 characters are common 
enough, so I'll attach the simplest patch that 'fixes' that part for my case.
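The newline-based workaround described above amounts to something like this sketch (illustrative only, not the attached patch): feed the tokenizer paragraph-sized chunks instead of one large text.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: split text on newlines so each chunk stays under the tokenizer's buffer. */
public class ParagraphSplitter {
    /**
     * Splits on runs of newlines, dropping empty segments. Each returned
     * paragraph can then be tokenized independently, so no term straddles
     * the 1024-character buffer boundary (as long as paragraphs stay short).
     */
    public static List<String> paragraphs(String text) {
        List<String> out = new ArrayList<>();
        for (String p : text.split("\\n+")) {
            if (!p.trim().isEmpty()) {
                out.add(p.trim());
            }
        }
        return out;
    }
}
```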

> OpenNLP tokenizer is fooled by text containing spurious punctuation
> ---
>
> Key: LUCENE-9112
> URL: https://issues.apache.org/jira/browse/LUCENE-9112
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Affects Versions: master (9.0)
>Reporter: Markus Jelsma
>Priority: Major
>  Labels: opennlp
> Fix For: master (9.0)
>
> Attachments: LUCENE-9112-unittest.patch, LUCENE-9112-unittest.patch, 
> en-sent.bin, en-token.bin
>
>
> The OpenNLP tokenizer shows weird behaviour when text contains spurious 
> punctuation, such as triple dots trailing a sentence...
> # the first dot becomes part of the token, so 'sentence.' becomes the token
> # much further down the text, a seemingly unrelated token is then suddenly 
> split up; in my example (see attached unit test) the name 'Baron' is split 
> into 'Baro' and 'n', and this is the real problem
> The problems never seem to occur with small texts in unit tests, but they 
> certainly do in real-world examples. Depending on how many 'spurious' dots 
> there are, a completely different term can become split, or the same term in 
> a different location.
> I am not too sure whether this is actually a problem in the Lucene code, but 
> it is a problem, and I have a Lucene unit test proving it.






[jira] [Assigned] (SOLR-14170) Tag package feature as experimental

2020-01-06 Thread Ishan Chattopadhyaya (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chattopadhyaya reassigned SOLR-14170:
---

Assignee: Ishan Chattopadhyaya

> Tag package feature as experimental
> ---
>
> Key: SOLR-14170
> URL: https://issues.apache.org/jira/browse/SOLR-14170
> Project: Solr
>  Issue Type: Test
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Reporter: Jan Høydahl
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Fix For: 8.5
>
>
> The new package store and package installation feature introduced in 8.4 was 
> supposed to be tagged as lucene.experimental, with a clear warning in the 
> ref-guide: "Not yet recommended for production use."
> Let's add that for 8.5 so there is no doubt that if you use the feature you 
> know the risks. Once the APIs have stabilized and there are a number of 
> packages available "in the wild", we can decide to release it as a "GA" 
> feature, but not yet!






[GitHub] [lucene-solr] bruno-roustant opened a new pull request #1146: SOLR-6613: TextField.analyzeMultiTerm does not throw an exception…

2020-01-06 Thread GitBox
bruno-roustant opened a new pull request #1146: SOLR-6613: 
TextField.analyzeMultiTerm does not throw an exception…
URL: https://github.com/apache/lucene-solr/pull/1146
 
 
   when Analyzer returns no term.





[jira] [Commented] (SOLR-6613) TextField.analyzeMultiTerm should not throw exception when analyzer returns no term

2020-01-06 Thread Bruno Roustant (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-6613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008876#comment-17008876
 ] 

Bruno Roustant commented on SOLR-6613:
--

PR added

> TextField.analyzeMultiTerm should not throw exception when analyzer returns 
> no term
> ---
>
> Key: SOLR-6613
> URL: https://issues.apache.org/jira/browse/SOLR-6613
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis
>Affects Versions: 4.3.1, 4.10.2, 6.0
>Reporter: Bruno Roustant
>Assignee: Bruno Roustant
>Priority: Major
> Attachments: TestTextField.java
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In TextField.analyzeMultiTerm(), at the line
> try {
>   if (!source.incrementToken())
>     throw new SolrException();
> The method should not throw an exception when there is no token: having no 
> token is legitimate, because all tokens may be filtered out (e.g. by a 
> blocking filter such as StopFilter).
> In this case it should simply return null (as it already does in some cases; 
> see the first line of the method). However, SolrQueryParserBase also needs to 
> be fixed to correctly handle the null returned by 
> TextField.analyzeMultiTerm().
> See the attached TestTextField for the corresponding new test class.
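A dependency-free sketch of the proposed control flow, with the analysis chain modeled as a plain Iterator (Solr's real TokenStream API differs): return null for zero tokens, succeed for exactly one, and fail only for more than one.

```java
import java.util.Iterator;

/** Sketch of analyzeMultiTerm's fixed control flow. */
public class MultiTermSketch {
    /**
     * Returns the single analyzed term, or null when the analysis chain
     * produced no token at all (e.g. a StopFilter removed everything).
     * Throws only on the genuinely broken case of multiple tokens, since
     * a multi-term (wildcard/prefix) component must analyze to one term.
     */
    public static String analyzeMultiTerm(Iterator<String> tokens) {
        if (!tokens.hasNext()) {
            return null; // legitimate: all tokens filtered out, not an error
        }
        String term = tokens.next();
        if (tokens.hasNext()) {
            throw new IllegalArgumentException("analyzer returned more than one term");
        }
        return term;
    }
}
```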






[jira] [Updated] (LUCENE-9112) SegmentingTokenizerBase splits terms that occupy 1024th positions in text

2020-01-06 Thread Markus Jelsma (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma updated LUCENE-9112:
--
Summary: SegmentingTokenizerBase splits terms that occupy 1024th positions 
in text  (was: OpenNLP tokenizer is fooled by text containing spurious 
punctuation)

> SegmentingTokenizerBase splits terms that occupy 1024th positions in text
> -
>
> Key: LUCENE-9112
> URL: https://issues.apache.org/jira/browse/LUCENE-9112
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Affects Versions: master (9.0)
>Reporter: Markus Jelsma
>Priority: Major
>  Labels: opennlp
> Fix For: master (9.0)
>
> Attachments: LUCENE-9112-unittest.patch, LUCENE-9112-unittest.patch, 
> en-sent.bin, en-token.bin
>
>
> The OpenNLP tokenizer shows weird behaviour when text contains spurious 
> punctuation, such as triple dots trailing a sentence...
> # the first dot becomes part of the token, so 'sentence.' becomes the token
> # much further down the text, a seemingly unrelated token is then suddenly 
> split up; in my example (see attached unit test) the name 'Baron' is split 
> into 'Baro' and 'n', and this is the real problem
> The problems never seem to occur with small texts in unit tests, but they 
> certainly do in real-world examples. Depending on how many 'spurious' dots 
> there are, a completely different term can become split, or the same term in 
> a different location.
> I am not too sure whether this is actually a problem in the Lucene code, but 
> it is a problem, and I have a Lucene unit test proving it.






[GitHub] [lucene-solr] murblanc commented on issue #1055: SOLR-13932 Review directory locking and Blob interactions

2020-01-06 Thread GitBox
murblanc commented on issue #1055: SOLR-13932 Review directory locking and Blob 
interactions
URL: https://github.com/apache/lucene-solr/pull/1055#issuecomment-571155842
 
 
   @mbwaheed @yonik I have rebased the changes after Bilal approved the review.





[GitHub] [lucene-solr] risdenk opened a new pull request #1147: SOLR-14163: SOLR_SSL_CLIENT_HOSTNAME_VERIFICATION needs to work with Jetty server/client SSL contexts

2020-01-06 Thread GitBox
risdenk opened a new pull request #1147: SOLR-14163: 
SOLR_SSL_CLIENT_HOSTNAME_VERIFICATION needs to work with Jetty server/client 
SSL contexts
URL: https://github.com/apache/lucene-solr/pull/1147
 
 
   





[jira] [Assigned] (SOLR-14075) Investigate performance degradation of /export from Solr 6 to Solr 7

2020-01-06 Thread Joel Bernstein (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein reassigned SOLR-14075:
-

Assignee: Joel Bernstein

> Investigate performance degradation of /export from Solr 6 to Solr 7
> 
>
> Key: SOLR-14075
> URL: https://issues.apache.org/jira/browse/SOLR-14075
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
>
> There have been customer reports, not on the user or dev list, of performance 
> degradation of the /export handler from Solr 6 to Solr 7. Originally it was 
> thought that SOLR-13013 would resolve this issue, but this has turned out not 
> to be the case. This ticket will determine whether there is a performance 
> degradation in /export between Solr 6 and 7 and pinpoint the reason.






[jira] [Updated] (SOLR-13890) Add postfilter support to {!terms} queries

2020-01-06 Thread Jason Gerlowski (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski updated SOLR-13890:
---
Attachment: SOLR-13890.patch
toplevel-tpi-perf-comparison.png
Status: Open  (was: Open)

Given the recent performance results proving that the main differentiator is 
top-level vs per-segment, I took a stab at a "top-level" DVTQ TPI 
implementation.  It still needs some cleanup, and I could use some feedback on 
whether/how we want to expose this to users: should Solr try to pick 
intelligently between the per-segment and top-level TPI implementations?  
Should users be able to override this if desired?  (Right now I've added a 
switch over to using "top-level" at 500 terms, with a "subMethod" param to let 
users override this if desired.)

So there are some loose ends here, but the performance numbers for the new TPI 
implementation are promising: roughly equivalent to the postfilter 
implementation we've been working from.
 !toplevel-tpi-perf-comparison.png! 

Thoughts?
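The ordinal-bitset idea underlying both implementations can be sketched without Lucene types: resolve the query terms to ordinals once, set them in a bitset, then test each document's ordinals with cheap bit lookups instead of string comparisons. Class and method names here are illustrative, not the patch's actual code:

```java
import java.util.BitSet;
import java.util.Map;
import java.util.Set;

/** Sketch of a doc-values terms postfilter backed by an ordinal bitset. */
public class TermsPostFilterSketch {
    private final BitSet matchingOrds = new BitSet();

    /** Build the bitset once per query from a (term -> ordinal) dictionary. */
    public TermsPostFilterSketch(Map<String, Integer> termToOrd, Set<String> queryTerms) {
        for (String term : queryTerms) {
            Integer ord = termToOrd.get(term);
            if (ord != null) { // terms absent from the index can never match
                matchingOrds.set(ord);
            }
        }
    }

    /** Per-document check: one bitset lookup per ordinal, no string compares. */
    public boolean matches(int[] docOrds) {
        for (int ord : docOrds) {
            if (matchingOrds.get(ord)) {
                return true;
            }
        }
        return false;
    }
}
```

Building the bitset is a one-time cost per query, which is why this approach pays off mainly for hundreds or thousands of terms.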

> Add postfilter support to {!terms} queries
> --
>
> Key: SOLR-13890
> URL: https://issues.apache.org/jira/browse/SOLR-13890
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: master (9.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Major
> Attachments: SOLR-13890.patch, SOLR-13890.patch, SOLR-13890.patch, 
> SOLR-13890.patch, SOLR-13890.patch, Screen Shot 2020-01-02 at 2.25.12 PM.png, 
> post_optimize_performance.png, toplevel-tpi-perf-comparison.png
>
>
> There are some use-cases where it'd be nice if the "terms" qparser created a 
> query that could be run as a postfilter.  Particularly, when users are 
> checking for hundreds or thousands of terms, a postfilter implementation can 
> be more performant than the standard processing.
> With this issue, I'd like to propose a post-filter implementation for the 
> {{docValuesTermsFilter}} "method".  Postfilter creation can use a 
> SortedSetDocValues object to populate a DV bitset with the "terms" being 
> checked for.  Each document run through the post-filter can look at its 
> doc-values for the field in question and check them efficiently against the 
> constructed bitset.






[jira] [Commented] (SOLR-14130) Add postlogs command line tool for indexing Solr logs

2020-01-06 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008973#comment-17008973
 ] 

Erick Erickson commented on SOLR-14130:
---

[~jbernste] I took a quick look at the patch and it looks great. There are two 
things I might suggest:

1> A short note on how to set up the collection you index to, mostly just the 
configset you should use (_default I assume, one shard no replicas?).

2> When I was playing around with this concept I found it useful to have the 
option of a "batch" parameter to allow me to group runs, in this case maybe 
default to the directory specified "/user/foo/logs". I can easily see wanting 
to restrict searches to "baseline", "change1" etc., not to mention deleting all 
the logs indexed for a particular batch when no longer relevant while keeping 
those that are. Or maybe just use the directory specified and encourage users 
to put different runs (or whatever) in different dirs.

There are a number of refinements that I was playing around with for 
experimenting, one in particular (that can wait for later) is being able to 
facet on the first recognizable line in the exceptions. "recognizable" might be 
a line that mentions "org.apache.solr" or "org.apache.lucene". Then when I 
facet on it and see 2,500 exceptions generated by the query parser, I can skip 
them easily... But that's for later and only really useful when there's a UI 
around it.

Now if we just had a UI around this for arbitrary searches ;)

> Add postlogs command line tool for indexing Solr logs
> -
>
> Key: SOLR-14130
> URL: https://issues.apache.org/jira/browse/SOLR-14130
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Attachments: SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, 
> SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, 
> Screen Shot 2019-12-19 at 2.04.41 PM.png, Screen Shot 2019-12-19 at 2.16.01 
> PM.png, Screen Shot 2019-12-19 at 2.35.41 PM.png, Screen Shot 2019-12-21 at 
> 8.46.51 AM.png
>
>
> This ticket adds a simple command line tool for posting Solr logs to a Solr 
> index. The tool works with the out-of-the-box Solr log format. It is still a 
> work in progress, but it currently indexes:
>  * queries
>  * updates
>  * commits
>  * new searchers
>  * errors - including stack traces
> Attached are some sample visualizations using Solr Streaming Expressions and 
> Math Expressions after the data has been loaded. The visualizations show: 
> time series, scatter plots, histograms and quantile plots, but really this is 
> just scratching the surface of the visualizations that can be done with the 
> Solr logs.
>  






[jira] [Assigned] (SOLR-11746) numeric fields need better error handling for prefix/wildcard syntax -- consider uniform support for "foo:* == foo:[* TO *]"

2020-01-06 Thread Houston Putman (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Houston Putman reassigned SOLR-11746:
-

Assignee: Houston Putman

> numeric fields need better error handling for prefix/wildcard syntax -- 
> consider uniform support for "foo:* == foo:[* TO *]"
> 
>
> Key: SOLR-11746
> URL: https://issues.apache.org/jira/browse/SOLR-11746
> Project: Solr
>  Issue Type: Improvement
>Reporter: Chris M. Hostetter
>Assignee: Houston Putman
>Priority: Major
> Attachments: SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch
>
>
> On the solr-user mailing list, Torsten Krah pointed out that with Trie 
> numeric fields, query syntax such as {{foo_d:\*}} has been functionally 
> equivalent to {{foo_d:\[\* TO \*]}}, and asked why this was not also 
> supported for Point-based numeric fields.
> The fact that this type of syntax works (for {{indexed="true"}} Trie fields) 
> appears to have been an (untested, undocumented) fluke of Trie fields, given 
> that they use indexed terms for the (encoded) numeric values and inherit the 
> default implementation of {{FieldType.getPrefixQuery}}, which produces a 
> prefix query against the {{""}} (empty string) term.
> (Note that this syntax has apparently _*never*_ worked for Trie fields with 
> {{indexed="false" docValues="true"}}.)
> In general, we should assess the behavior when users attempt a 
> prefix/wildcard syntax query against numeric fields, as currently the 
> behavior is largely nonsensical: prefix/wildcard syntax frequently matches 
> no docs without any sort of error, and the aforementioned 
> {{numeric_field:*}} behaves inconsistently between points/trie fields and 
> between indexed/docValues trie fields.
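A uniform fix would rewrite the bare wildcard into an existence range query. A sketch of that mapping (the method is illustrative, not Solr's actual FieldType API):

```java
/** Sketch: rewrite wildcard syntax on numeric fields to an existence range. */
public class NumericWildcardRewrite {
    /**
     * "field:*" should mean "field has any value", i.e. field:[* TO *],
     * instead of a prefix query over encoded numeric terms (which silently
     * matches nothing for points-based fields). Any other prefix/wildcard
     * pattern has no sensible meaning for numerics and should be an error.
     */
    public static String rewrite(String field, String queryText) {
        if ("*".equals(queryText)) {
            return field + ":[* TO *]"; // existence query: works for points and trie
        }
        throw new IllegalArgumentException(
            "prefix/wildcard syntax is not meaningful for numeric field " + field);
    }
}
```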






[jira] [Updated] (SOLR-11746) numeric fields need better error handling for prefix/wildcard syntax -- consider uniform support for "foo:* == foo:[* TO *]"

2020-01-06 Thread Houston Putman (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Houston Putman updated SOLR-11746:
--
Attachment: SOLR-11746.patch

> numeric fields need better error handling for prefix/wildcard syntax -- 
> consider uniform support for "foo:* == foo:[* TO *]"
> 
>
> Key: SOLR-11746
> URL: https://issues.apache.org/jira/browse/SOLR-11746
> Project: Solr
>  Issue Type: Improvement
>Reporter: Chris M. Hostetter
>Assignee: Houston Putman
>Priority: Major
> Attachments: SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, 
> SOLR-11746.patch
>
>
> On the solr-user mailing list, Torsten Krah pointed out that with Trie 
> numeric fields, query syntax such as {{foo_d:\*}} has been functionally 
> equivalent to {{foo_d:\[\* TO \*]}}, and asked why this was not also 
> supported for Point-based numeric fields.
> The fact that this type of syntax works (for {{indexed="true"}} Trie fields) 
> appears to have been an (untested, undocumented) fluke of Trie fields, given 
> that they use indexed terms for the (encoded) numeric values and inherit the 
> default implementation of {{FieldType.getPrefixQuery}}, which produces a 
> prefix query against the {{""}} (empty string) term.
> (Note that this syntax has apparently _*never*_ worked for Trie fields with 
> {{indexed="false" docValues="true"}}.)
> In general, we should assess the behavior when users attempt a 
> prefix/wildcard syntax query against numeric fields, as currently the 
> behavior is largely nonsensical: prefix/wildcard syntax frequently matches 
> no docs without any sort of error, and the aforementioned 
> {{numeric_field:*}} behaves inconsistently between points/trie fields and 
> between indexed/docValues trie fields.






[jira] [Commented] (SOLR-11746) numeric fields need better error handling for prefix/wildcard syntax -- consider uniform support for "foo:* == foo:[* TO *]"

2020-01-06 Thread Houston Putman (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008988#comment-17008988
 ] 

Houston Putman commented on SOLR-11746:
---

[~tflobbe] added a test for that.

All tests passed before I added the test, so I'm going to do a final precommit 
check, then commit to 8 and master.

> numeric fields need better error handling for prefix/wildcard syntax -- 
> consider uniform support for "foo:* == foo:[* TO *]"
> 
>
> Key: SOLR-11746
> URL: https://issues.apache.org/jira/browse/SOLR-11746
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 7.0
>Reporter: Chris M. Hostetter
>Assignee: Houston Putman
>Priority: Major
> Attachments: SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, 
> SOLR-11746.patch
>
>
> On the solr-user mailing list, Torsten Krah pointed out that with Trie 
> numeric fields, query syntax such as {{foo_d:\*}} has been functionally 
> equivalent to {{foo_d:\[\* TO \*]}}, and asked why this was not also 
> supported for Point-based numeric fields.
> The fact that this type of syntax works (for {{indexed="true"}} Trie fields) 
> appears to have been an (untested, undocumented) fluke of Trie fields, given 
> that they use indexed terms for the (encoded) numeric values and inherit the 
> default implementation of {{FieldType.getPrefixQuery}}, which produces a 
> prefix query against the {{""}} (empty string) term.
> (Note that this syntax has apparently _*never*_ worked for Trie fields with 
> {{indexed="false" docValues="true"}}.)
> In general, we should assess the behavior when users attempt a 
> prefix/wildcard syntax query against numeric fields, as currently the 
> behavior is largely nonsensical: prefix/wildcard syntax frequently matches 
> no docs without any sort of error, and the aforementioned 
> {{numeric_field:*}} behaves inconsistently between points/trie fields and 
> between indexed/docValues trie fields.






[jira] [Updated] (SOLR-11746) numeric fields need better error handling for prefix/wildcard syntax -- consider uniform support for "foo:* == foo:[* TO *]"

2020-01-06 Thread Houston Putman (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Houston Putman updated SOLR-11746:
--
Affects Version/s: 7.0

> numeric fields need better error handling for prefix/wildcard syntax -- 
> consider uniform support for "foo:* == foo:[* TO *]"
> 
>
> Key: SOLR-11746
> URL: https://issues.apache.org/jira/browse/SOLR-11746
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 7.0
>Reporter: Chris M. Hostetter
>Assignee: Houston Putman
>Priority: Major
> Attachments: SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, 
> SOLR-11746.patch
>
>
> On the solr-user mailing list, Torsten Krah pointed out that with Trie 
> numeric fields, query syntax such as {{foo_d:\*}} has been functionally 
> equivalent to {{foo_d:\[\* TO \*]}}, and asked why this was not also 
> supported for Point-based numeric fields.
> The fact that this type of syntax works (for {{indexed="true"}} Trie fields) 
> appears to have been an (untested, undocumented) fluke of Trie fields, given 
> that they use indexed terms for the (encoded) numeric values and inherit the 
> default implementation of {{FieldType.getPrefixQuery}}, which produces a 
> prefix query against the {{""}} (empty string) term.
> (Note that this syntax has apparently _*never*_ worked for Trie fields with 
> {{indexed="false" docValues="true"}}.)
> In general, we should assess the behavior when users attempt a 
> prefix/wildcard syntax query against numeric fields, as currently the 
> behavior is largely nonsensical: prefix/wildcard syntax frequently matches 
> no docs without any sort of error, and the aforementioned 
> {{numeric_field:*}} behaves inconsistently between points/trie fields and 
> between indexed/docValues trie fields.






[jira] [Updated] (SOLR-11746) numeric fields need better error handling for prefix/wildcard syntax -- consider uniform support for "foo:* == foo:[* TO *]"

2020-01-06 Thread Houston Putman (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Houston Putman updated SOLR-11746:
--
Attachment: SOLR-11746.patch

> numeric fields need better error handling for prefix/wildcard syntax -- 
> consider uniform support for "foo:* == foo:[* TO *]"
> 
>
> Key: SOLR-11746
> URL: https://issues.apache.org/jira/browse/SOLR-11746
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 7.0
>Reporter: Chris M. Hostetter
>Assignee: Houston Putman
>Priority: Major
> Attachments: SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, 
> SOLR-11746.patch, SOLR-11746.patch
>
>






[jira] [Commented] (SOLR-11746) numeric fields need better error handling for prefix/wildcard syntax -- consider uniform support for "foo:* == foo:[* TO *]"

2020-01-06 Thread Houston Putman (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009000#comment-17009000
 ] 

Houston Putman commented on SOLR-11746:
---

Updated the ref-guide to reduce confusion.

> numeric fields need better error handling for prefix/wildcard syntax -- 
> consider uniform support for "foo:* == foo:[* TO *]"
> 
>
> Key: SOLR-11746
> URL: https://issues.apache.org/jira/browse/SOLR-11746
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 7.0
>Reporter: Chris M. Hostetter
>Assignee: Houston Putman
>Priority: Major
> Attachments: SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, 
> SOLR-11746.patch, SOLR-11746.patch
>
>






[GitHub] [lucene-solr] ErickErickson opened a new pull request #1148: LUCENE-9080: Upgrade ICU4j to 62.2 and regenerate

2020-01-06 Thread GitBox
ErickErickson opened a new pull request #1148: LUCENE-9080: Upgrade ICU4j to 
62.2 and regenerate
URL: https://github.com/apache/lucene-solr/pull/1148
 
 
   # Description
   See comments on the JIRA


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services




[jira] [Commented] (SOLR-11746) numeric fields need better error handling for prefix/wildcard syntax -- consider uniform support for "foo:* == foo:[* TO *]"

2020-01-06 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009011#comment-17009011
 ] 

ASF subversion and git services commented on SOLR-11746:


Commit f5ab3ca688b3127bece252ffd87cc8bfa9f285ff in lucene-solr's branch 
refs/heads/master from Houston Putman
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f5ab3ca ]

SOLR-11746: Existence query support for numeric point fields


> numeric fields need better error handling for prefix/wildcard syntax -- 
> consider uniform support for "foo:* == foo:[* TO *]"
> 
>
> Key: SOLR-11746
> URL: https://issues.apache.org/jira/browse/SOLR-11746
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 7.0
>Reporter: Chris M. Hostetter
>Assignee: Houston Putman
>Priority: Major
> Attachments: SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, 
> SOLR-11746.patch, SOLR-11746.patch
>
>






[jira] [Commented] (LUCENE-9080) Upgrade ICU4j to 62.2 and regenerate

2020-01-06 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009012#comment-17009012
 ] 

Erick Erickson commented on LUCENE-9080:


I think I have it working. Current state:
 * regenerate works
 * precommit passes
 * full test suite passes
 * I updated to ICU 64.2, see above.
 * I ran an absolutely minimal test of Solr, just "bin/solr start -e 
techproducts" and did a search.

The fact that some of the binary files got regenerated makes me a little 
nervous, but all tests pass and I can fire up Solr.

[~jpountz] : I'm flying a little blind here, WDYT about the changes to 
util/packed? I pulled out what I hope are the correct bits from the Packed 
directory, and changed the build file: see the changes in 
lucene/core/build.xml. I deleted gen_PackedThreeBlocks.py and gen_Direct.py as 
well. I'm trusting that the tests would barf if they were necessary. As a test 
I deleted everything in util/packed that had the "DO NOT EDIT" tag before I 
regenerated to see if anything broke. Since files like BulkOperation*.java 
regenerated I feel more confident.

The files I manually changed were:

  lucene/core/build.xml
   utils/packed/gen_Packed64SingleBlock.py
   gen_Direct.py (deleted)
   gen_PackedThreeBlocks.py (deleted)
   generateUTR30DataFiles.java
   ivy-versions.properties

The rest of the changes are a result of running the regenerate.

I'll push it to master in the next day or so absent objections.

 [~dawid.weiss] [~jpountz] [~rcmuir] [~mikemccand] [~uschindler] et al., what do 
you think about merging this back to 8x? Nobody's apparently run regenerate in 
ages, and my motivation is to have a working baseline for the Gradle build, 
which won't be on 8x anyway. Maybe raise another Jira that points back here for 
posterity, in case someone else needs to do this?

Oh, and I'm finally trying to get all modern and use PRs, pardon me if I screw 
it up.

> Upgrade ICU4j to 62.2 and regenerate
> 
>
> Key: LUCENE-9080
> URL: https://issues.apache.org/jira/browse/LUCENE-9080
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
> Attachments: after_regen.patch, before_regen.patch, status.res
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The root cause is that RamUsageEstimator.NUM_BYTES_INT has been removed and 
> the python scripts still reference it in the generated scripts. That part's 
> easy to fix.
> Last time I looked, though, the regenerate produces some differences in the 
> generated files that should be looked at to insure they're benign.
> Not really sure whether this should be a Lucene or Solr JIRA. Putting it in 
> Lucene since one of the failed files is: 
> lucene/core/src/java/org/apache/lucene/util/packed/Packed8ThreeBlocks.java
> I do know that one of the Solr jflex-produced file has an unexplained 
> difference so it may bleed over.
> "ant regenerate" needs about 24G on my machine FWIW.






[jira] [Commented] (SOLR-11746) numeric fields need better error handling for prefix/wildcard syntax -- consider uniform support for "foo:* == foo:[* TO *]"

2020-01-06 Thread Tomas Eduardo Fernandez Lobbe (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009048#comment-17009048
 ] 

Tomas Eduardo Fernandez Lobbe commented on SOLR-11746:
--

Thanks for updating the docs. One nit:
{code}
-** `field:[* TO *]` matches all documents with the field
+** `field:*` or `field:[* TO *]` matches all documents where the field exists
 * Pure negative queries (all clauses prohibited) are allowed (only as a 
top-level clause)
 ** `-inStock:false` finds all field values where inStock is not false
 ** `-field:[* TO *]` finds all documents without a value for field
{code}
You updated the positive example but not the negative one, i.e. {{`-field:*` or `-field:[* 
TO *]`...}}

> numeric fields need better error handling for prefix/wildcard syntax -- 
> consider uniform support for "foo:* == foo:[* TO *]"
> 
>
> Key: SOLR-11746
> URL: https://issues.apache.org/jira/browse/SOLR-11746
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 7.0
>Reporter: Chris M. Hostetter
>Assignee: Houston Putman
>Priority: Major
> Attachments: SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, 
> SOLR-11746.patch, SOLR-11746.patch
>
>






[jira] [Commented] (SOLR-11746) numeric fields need better error handling for prefix/wildcard syntax -- consider uniform support for "foo:* == foo:[* TO *]"

2020-01-06 Thread Houston Putman (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009050#comment-17009050
 ] 

Houston Putman commented on SOLR-11746:
---

Good call. I'll add that in when I push to 8x, and commit the small change to 
master.

> numeric fields need better error handling for prefix/wildcard syntax -- 
> consider uniform support for "foo:* == foo:[* TO *]"
> 
>
> Key: SOLR-11746
> URL: https://issues.apache.org/jira/browse/SOLR-11746
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 7.0
>Reporter: Chris M. Hostetter
>Assignee: Houston Putman
>Priority: Major
> Attachments: SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, 
> SOLR-11746.patch, SOLR-11746.patch
>
>






[GitHub] [lucene-solr] andyvuong commented on issue #1131: SOLR-14134: Add lazy and time-based evictiction of shared core concurrency metada…

2020-01-06 Thread GitBox
andyvuong commented on issue #1131: SOLR-14134: Add lazy and time-based 
evictiction of shared core concurrency metada…
URL: https://github.com/apache/lucene-solr/pull/1131#issuecomment-571242869
 
 
   > And for regular eviction we do not need this tracking either, we can 
simply do that in SolrCore#close. The basic assumption is: if core container 
can hold so many SolrCore instances we can very easily hold this simple 
metadata. The problem of many cores is planned to be addressed by transient 
cores or zero replica design. With that I don't think this cache need to worry 
about its own size as long as it ties itself with SolrCore instances.
   
   @mbwaheed we can evict lazily on SolrCore instance creation in addition 
to SolrCore#close. To clarify, you're also saying we can scope this item smaller 
and stick with a simpler cache (simple map, no size/time-based eviction as done 
here) and let that future item handle whatever is needed?





[jira] [Commented] (SOLR-11746) numeric fields need better error handling for prefix/wildcard syntax -- consider uniform support for "foo:* == foo:[* TO *]"

2020-01-06 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009068#comment-17009068
 ] 

ASF subversion and git services commented on SOLR-11746:


Commit 1f1b719478e298b5ada064197a7fa919b608d24c in lucene-solr's branch 
refs/heads/branch_8x from Houston Putman
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=1f1b719 ]

SOLR-11746: Existence query support for numeric point fields


> numeric fields need better error handling for prefix/wildcard syntax -- 
> consider uniform support for "foo:* == foo:[* TO *]"
> 
>
> Key: SOLR-11746
> URL: https://issues.apache.org/jira/browse/SOLR-11746
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 7.0
>Reporter: Chris M. Hostetter
>Assignee: Houston Putman
>Priority: Major
> Attachments: SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, 
> SOLR-11746.patch, SOLR-11746.patch
>
>






[jira] [Commented] (SOLR-11746) numeric fields need better error handling for prefix/wildcard syntax -- consider uniform support for "foo:* == foo:[* TO *]"

2020-01-06 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009069#comment-17009069
 ] 

ASF subversion and git services commented on SOLR-11746:


Commit 9edb143efdc6616906972ae6c629860c91a5a2e7 in lucene-solr's branch 
refs/heads/master from Houston Putman
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=9edb143 ]

SOLR-11746: Adding docs for negative existence queries.


> numeric fields need better error handling for prefix/wildcard syntax -- 
> consider uniform support for "foo:* == foo:[* TO *]"
> 
>
> Key: SOLR-11746
> URL: https://issues.apache.org/jira/browse/SOLR-11746
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 7.0
>Reporter: Chris M. Hostetter
>Assignee: Houston Putman
>Priority: Major
> Attachments: SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, 
> SOLR-11746.patch, SOLR-11746.patch
>
>






[jira] [Created] (LUCENE-9116) Simplify postings API by removing long[] metadata

2020-01-06 Thread Adrien Grand (Jira)
Adrien Grand created LUCENE-9116:


 Summary: Simplify postings API by removing long[] metadata
 Key: LUCENE-9116
 URL: https://issues.apache.org/jira/browse/LUCENE-9116
 Project: Lucene - Core
  Issue Type: Task
Reporter: Adrien Grand


The postings API allows storing metadata about a term either in a long[] or in 
a byte[]. This is unnecessary, as all the information could be encoded in the 
byte[], which is what most codecs do in practice.
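The idea can be sketched in isolation with plain JDK streams: metadata that a codec might previously have split across a long[] (e.g. file pointers) and a byte[] (other per-term data) is serialized into a single byte[] instead. This is a hypothetical, self-contained illustration using java.io only; the names ({{TermMetadataSketch}}, {{docStartFP}}, {{posStartFP}}) are illustrative assumptions, not Lucene's actual PostingsWriterBase API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: all per-term metadata goes into one byte[] blob,
// removing the need for a separate long[] channel in the postings API.
public class TermMetadataSketch {

    public static byte[] encode(long docStartFP, long posStartFP, int docFreq) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeLong(docStartFP); // would previously have filled a long[] slot
            out.writeLong(posStartFP); // would previously have filled a long[] slot
            out.writeInt(docFreq);     // data that was already byte[]-encoded
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot happen on in-memory streams
        }
    }

    public static long[] decode(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            return new long[] { in.readLong(), in.readLong(), in.readInt() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] decoded = decode(encode(42L, 1000L, 7));
        System.out.println(decoded[0] + " " + decoded[1] + " " + decoded[2]); // 42 1000 7
    }
}
```

In real codecs the encoding would typically use variable-length integers and deltas, but the round trip above captures the structural point of the change.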






[GitHub] [lucene-solr] jpountz opened a new pull request #1149: LUCENE-9116: Remove long[] from `PostingsWriterBase#encodeTerm`.

2020-01-06 Thread GitBox
jpountz opened a new pull request #1149: LUCENE-9116: Remove long[] from 
`PostingsWriterBase#encodeTerm`.
URL: https://github.com/apache/lucene-solr/pull/1149
 
 
   All the metadata can be directly encoded in the `DataOutput`.
   





[jira] [Resolved] (SOLR-11746) numeric fields need better error handling for prefix/wildcard syntax -- consider uniform support for "foo:* == foo:[* TO *]"

2020-01-06 Thread Houston Putman (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Houston Putman resolved SOLR-11746.
---
Fix Version/s: 8.5
   master (9.0)
   Resolution: Fixed

> numeric fields need better error handling for prefix/wildcard syntax -- 
> consider uniform support for "foo:* == foo:[* TO *]"
> 
>
> Key: SOLR-11746
> URL: https://issues.apache.org/jira/browse/SOLR-11746
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 7.0
>Reporter: Chris M. Hostetter
>Assignee: Houston Putman
>Priority: Major
> Fix For: master (9.0), 8.5
>
> Attachments: SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, 
> SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch
>
>






[jira] [Updated] (SOLR-11746) numeric fields need better error handling for prefix/wildcard syntax -- consider uniform support for "foo:* == foo:[* TO *]"

2020-01-06 Thread Houston Putman (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Houston Putman updated SOLR-11746:
--
Attachment: SOLR-11746.patch

> numeric fields need better error handling for prefix/wildcard syntax -- 
> consider uniform support for "foo:* == foo:[* TO *]"
> 
>
> Key: SOLR-11746
> URL: https://issues.apache.org/jira/browse/SOLR-11746
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 7.0
>Reporter: Chris M. Hostetter
>Assignee: Houston Putman
>Priority: Major
> Attachments: SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, 
> SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch
>
>






[jira] [Updated] (SOLR-11746) numeric fields need better error handling for prefix/wildcard syntax -- consider uniform support for "foo:* == foo:[* TO *]"

2020-01-06 Thread Houston Putman (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Houston Putman updated SOLR-11746:
--
Issue Type: Bug  (was: Improvement)

> numeric fields need better error handling for prefix/wildcard syntax -- 
> consider uniform support for "foo:* == foo:[* TO *]"
> 
>
> Key: SOLR-11746
> URL: https://issues.apache.org/jira/browse/SOLR-11746
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 7.0
>Reporter: Chris M. Hostetter
>Assignee: Houston Putman
>Priority: Major
> Fix For: master (9.0), 8.5
>
> Attachments: SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, 
> SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch
>
>






[jira] [Commented] (LUCENE-9116) Simplify postings API by removing long[] metadata

2020-01-06 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009073#comment-17009073
 ] 

Adrien Grand commented on LUCENE-9116:
--

I want to draw attention to the fact that the attached pull request removes the 
FST and FSTOrd postings formats, which were harder to migrate, and that it 
breaks compatibility for some postings formats, but not Lucene84 and Lucene50 
which we need to support.

> Simplify postings API by removing long[] metadata
> -
>
> Key: LUCENE-9116
> URL: https://issues.apache.org/jira/browse/LUCENE-9116
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[GitHub] [lucene-solr] mbwaheed commented on issue #1131: SOLR-14134: Add lazy and time-based evictiction of shared core concurrency metada…

2020-01-06 Thread GitBox
mbwaheed commented on issue #1131: SOLR-14134: Add lazy and time-based 
evictiction of shared core concurrency metada…
URL: https://github.com/apache/lucene-solr/pull/1131#issuecomment-571256880
 
 
   > To clarify, you're also saying we can scope this item smaller and stick 
with a simpler cache (simple map, no size/time-based eviction as done here) and 
let that future item handle whatever is needed?
   
   @andyvuong Yes. Having many SolrCore instances is a bigger problem that needs 
to be handled either by transient cores or zero replicas (with autoscaling). 
Having this cache grow and shrink with the number of SolrCore instances is good 
enough.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services




[jira] [Created] (SOLR-14171) allTermsRequired does not work when using context filter query

2020-01-06 Thread Jonathan J Senchyna (Jira)
Jonathan J Senchyna created SOLR-14171:
--

 Summary: allTermsRequired does not work when using context filter 
query
 Key: SOLR-14171
 URL: https://issues.apache.org/jira/browse/SOLR-14171
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: Suggester
Affects Versions: 8.4, 8.1.1
Reporter: Jonathan J Senchyna


When using the suggester context filtering query param 
{{suggest.contextFilterQuery}} introduced in SOLR-7888, the suggester 
configuration {{allTermsRequired}} is ignored and all terms become required.

In my test configuration, I am not specifying {{allTermsRequired}}, so it 
defaults to {{false}}.  If I send a request without {{cfq}} specified in my 
query params, I get back results for partial matches, as expected.  As soon as 
I specify a {{cfq}} in my requests, I only get back results where all terms 
match.
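For illustration only (the core name, suggester name, and context value below are hypothetical), the two kinds of request being compared are:

```
# Without a context filter, partial matches come back
# (allTermsRequired defaults to false):
/solr/mycore/suggest?suggest=true&suggest.dictionary=mySuggester&suggest.q=mobile pho

# With suggest.cfq added, only suggestions matching all terms come back:
/solr/mycore/suggest?suggest=true&suggest.dictionary=mySuggester&suggest.q=mobile pho&suggest.cfq=ctx1
```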






[jira] [Commented] (SOLR-11746) numeric fields need better error handling for prefix/wildcard syntax -- consider uniform support for "foo:* == foo:[* TO *]"

2020-01-06 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009132#comment-17009132
 ] 

David Smiley commented on SOLR-11746:
-

I'm really glad to finally see this get in :) . Thanks everyone!

 

> numeric fields need better error handling for prefix/wildcard syntax -- 
> consider uniform support for "foo:* == foo:[* TO *]"
> 
>
> Key: SOLR-11746
> URL: https://issues.apache.org/jira/browse/SOLR-11746
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 7.0
>Reporter: Chris M. Hostetter
>Assignee: Houston Putman
>Priority: Major
> Fix For: master (9.0), 8.5
>
> Attachments: SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, 
> SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch
>
>
> On the solr-user mailing list, Torsten Krah pointed out that with Trie 
> numeric fields, query syntax such as {{foo_d:\*}} has been functionally 
> equivalent to {{foo_d:\[\* TO \*]}} and asked why this was not also supported 
> for Point based numeric fields.
> The fact that this type of syntax works (for {{indexed="true"}} Trie fields) 
> appears to have been an (untested, undocumented) fluke of Trie fields given 
> that they use indexed terms for the (encoded) numeric terms and inherit the 
> default implementation of {{FieldType.getPrefixQuery}} which produces a 
> prefix query against the {{""}} (empty string) term.  
> (Note that this syntax has apparently _*never*_ worked for Trie fields with 
> {{indexed="false" docValues="true"}} )
> In general, we should assess the behavior when users attempt a prefix/wildcard 
> syntax query against numeric fields, as currently the behavior is largely 
> nonsensical: prefix/wildcard syntax frequently matches no docs w/o any sort 
> of error, and the aforementioned {{numeric_field:*}} behaves inconsistently 
> between points/trie fields and between indexed/docValued trie fields.
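For concreteness, the two syntaxes under discussion look like this (using the hypothetical field name {{foo_d}} from the examples above):

```
foo_d:*           prefix/wildcard syntax; worked only by fluke for indexed Trie fields
foo_d:[* TO *]    explicit open-ended range; matches any doc with a value in foo_d
```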






[jira] [Commented] (SOLR-13890) Add postfilter support to {!terms} queries

2020-01-06 Thread Jason Gerlowski (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009170#comment-17009170
 ] 

Jason Gerlowski commented on SOLR-13890:


Latest patch ties up some of the loose ends I mentioned in my last comment.  
Pending review from you guys, I'm pretty happy pulling the trigger on what 
we've got right now.  We get the good performance I was after without 
introducing another postfilter.

Pending more feedback I'll aim to merge this on Wednesday.

> Add postfilter support to {!terms} queries
> --
>
> Key: SOLR-13890
> URL: https://issues.apache.org/jira/browse/SOLR-13890
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: master (9.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Major
> Attachments: SOLR-13890.patch, SOLR-13890.patch, SOLR-13890.patch, 
> SOLR-13890.patch, SOLR-13890.patch, Screen Shot 2020-01-02 at 2.25.12 PM.png, 
> post_optimize_performance.png, toplevel-tpi-perf-comparison.png
>
>
> There are some use-cases where it'd be nice if the "terms" qparser created a 
> query that could be run as a postfilter.  Particularly, when users are 
> checking for hundreds or thousands of terms, a postfilter implementation can 
> be more performant than the standard processing.
> With this issue, I'd like to propose a post-filter implementation for the 
> {{docValuesTermsFilter}} "method".  Postfilter creation can use a 
> SortedSetDocValues object to populate a DV bitset with the "terms" being 
> checked for.  Each document run through the post-filter can look at its 
> doc-values for the field in question and check them efficiently against the 
> constructed bitset.
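A toy sketch of that idea in plain Java (the class name and array shapes below are hypothetical stand-ins for SortedSetDocValues, not Solr's actual implementation): mark the ordinals of the query's terms in a bitset once, then test each candidate document's ordinals against it.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch of the postfilter idea: build a bitset over term
// ordinals from the query's terms, then check each candidate document's
// doc-values ordinals against it.
public class TermsPostFilterSketch {
    // sortedTerms plays the role of the field's sorted term dictionary.
    static BitSet buildQueryBits(String[] sortedTerms, String[] queryTerms) {
        BitSet bits = new BitSet(sortedTerms.length);
        for (String t : queryTerms) {
            int ord = Arrays.binarySearch(sortedTerms, t);
            if (ord >= 0) {
                bits.set(ord);  // mark ordinals of the terms being filtered on
            }
        }
        return bits;
    }

    // A document matches if any of its ordinals is set in the query bitset.
    static boolean matches(int[] docOrds, BitSet queryBits) {
        for (int ord : docOrds) {
            if (queryBits.get(ord)) {
                return true;
            }
        }
        return false;
    }
}
```

The point of the design is that the per-document work is a few bitset lookups, which is why it can beat standard query processing when there are hundreds or thousands of terms.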






[jira] [Comment Edited] (SOLR-13890) Add postfilter support to {!terms} queries

2020-01-06 Thread Jason Gerlowski (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009170#comment-17009170
 ] 

Jason Gerlowski edited comment on SOLR-13890 at 1/6/20 9:04 PM:


Latest patch ties up some of the loose ends I mentioned in my last comment.  
Pending review from you guys, I'm pretty happy pulling the trigger on what 
we've got right now.  We get the improved performance I was after without 
introducing another postfilter.  Things will be even better with SOLR-14166, 
but that doesn't need to block this effort.

Pending more feedback I'll aim to merge this on Wednesday.


was (Author: gerlowskija):
Latest patch ties up some of the loose ends I mentioned in my last comment.  
Pending review from you guys, I'm pretty happy pulling the trigger on what 
we've got right now.  We get the improved performance I was after without 
introducing another postfilter.

Pending more feedback I'll aim to merge this on Wednesday.







[jira] [Comment Edited] (SOLR-13890) Add postfilter support to {!terms} queries

2020-01-06 Thread Jason Gerlowski (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009170#comment-17009170
 ] 

Jason Gerlowski edited comment on SOLR-13890 at 1/6/20 9:04 PM:


Latest patch ties up some of the loose ends I mentioned in my last comment.  
Pending review from you guys, I'm pretty happy pulling the trigger on what 
we've got right now.  We get the improved performance I was after without 
introducing another postfilter.

Pending more feedback I'll aim to merge this on Wednesday.


was (Author: gerlowskija):
Latest patch ties up some of the loose ends I mentioned in my last comment.  
Pending review from you guys, I'm pretty happy pulling the trigger on what 
we've got right now.  We get the good performance I was after without 
introducing another postfilter.

Pending more feedback I'll aim to merge this on Wednesday.







[jira] [Updated] (SOLR-13890) Add postfilter support to {!terms} queries

2020-01-06 Thread Jason Gerlowski (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski updated SOLR-13890:
---
Attachment: SOLR-13890.patch

> Add postfilter support to {!terms} queries
> --
>
> Key: SOLR-13890
> URL: https://issues.apache.org/jira/browse/SOLR-13890
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: master (9.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Major
> Attachments: SOLR-13890.patch, SOLR-13890.patch, SOLR-13890.patch, 
> SOLR-13890.patch, SOLR-13890.patch, SOLR-13890.patch, Screen Shot 2020-01-02 
> at 2.25.12 PM.png, post_optimize_performance.png, 
> toplevel-tpi-perf-comparison.png
>
>
> There are some use-cases where it'd be nice if the "terms" qparser created a 
> query that could be run as a postfilter.  Particularly, when users are 
> checking for hundreds or thousands of terms, a postfilter implementation can 
> be more performant than the standard processing.
> With this issue, I'd like to propose a post-filter implementation for the 
> {{docValuesTermsFilter}} "method".  Postfilter creation can use a 
> SortedSetDocValues object to populate a DV bitset with the "terms" being 
> checked for.  Each document run through the post-filter can look at its 
> doc-values for the field in question and check them efficiently against the 
> constructed bitset.






[jira] [Updated] (LUCENE-9080) Upgrade ICU4j to 62.2 and regenerate

2020-01-06 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated LUCENE-9080:
---
Status: Patch Available  (was: Open)

> Upgrade ICU4j to 62.2 and regenerate
> 
>
> Key: LUCENE-9080
> URL: https://issues.apache.org/jira/browse/LUCENE-9080
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
> Attachments: after_regen.patch, before_regen.patch, status.res
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The root cause is that RamUsageEstimator.NUM_BYTES_INT has been removed and 
> the Python scripts still reference it in the files they generate. That part's 
> easy to fix.
> Last time I looked, though, regenerating produces some differences in the 
> generated files that should be looked at to ensure they're benign.
> Not really sure whether this should be a Lucene or Solr JIRA. Putting it in 
> Lucene since one of the failed files is: 
> lucene/core/src/java/org/apache/lucene/util/packed/Packed8ThreeBlocks.java
> I do know that one of the Solr jflex-produced files has an unexplained 
> difference, so it may bleed over.
> "ant regenerate" needs about 24G on my machine FWIW.






[jira] [Commented] (SOLR-13890) Add postfilter support to {!terms} queries

2020-01-06 Thread Mikhail Khludnev (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009194#comment-17009194
 ] 

Mikhail Khludnev commented on SOLR-13890:
-

Regarding {{PerSegmentViewDocIdSetIterator}}: I don't follow. Lucene's 
{{DocIdSetIterator}} is strictly per-segment; using it for top-level iteration 
is something that never happens. FWIW, top-level Solr DocSets are usually 
converted to Lucene's DocIdSets via {{DocSet.getTopFilter()}}. 

Adding an argument to the method {{QueryMethod.makeFilter(String fname, 
BytesRef[] bytesRefs, SolrParams localParams)}} is not backward compatible, 
and might frustrate other devs.  

Note: {{TopLevelDocValuesTermsQuery}} uses {{OrdinalMap}} via 
{{getSlowAtomicReader()}}. It might be clearer to iterate per segment, and then 
access global ordinals via {{MultiSortedDocValues.mapping.getGlobalOrds()}}.
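A toy illustration of the per-segment-plus-global-ordinals idea (plain Java with assumed array shapes, not the actual OrdinalMap API): each segment keeps its own dense term ordinals, and a per-segment table maps them to global ordinals, so iteration can stay per-segment while comparisons happen in the global ordinal space.

```java
// Hypothetical sketch: segToGlobal[seg][localOrd] gives the global ordinal
// for a segment-local one, the role played by OrdinalMap in Lucene.
public class GlobalOrdsSketch {
    static long toGlobal(long[][] segToGlobal, int seg, int localOrd) {
        return segToGlobal[seg][localOrd];
    }

    // Count docs whose term maps to the given global ordinal, iterating
    // per segment (docOrds[seg] holds one local ordinal per doc in that segment).
    static int countMatches(int[][] docOrds, long[][] segToGlobal, long wantedGlobalOrd) {
        int count = 0;
        for (int seg = 0; seg < docOrds.length; seg++) {
            for (int localOrd : docOrds[seg]) {
                if (toGlobal(segToGlobal, seg, localOrd) == wantedGlobalOrd) {
                    count++;
                }
            }
        }
        return count;
    }
}
```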







[jira] [Comment Edited] (SOLR-13890) Add postfilter support to {!terms} queries

2020-01-06 Thread Mikhail Khludnev (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009194#comment-17009194
 ] 

Mikhail Khludnev edited comment on SOLR-13890 at 1/6/20 10:08 PM:
--

Regarding {{PerSegmentViewDocIdSetIterator}}: I don't follow. Lucene's 
{{DocIdSetIterator}} is strictly per-segment; using it for top-level iteration 
is something that never happens. FWIW, top-level Solr DocSets are usually 
converted to Lucene's DocIdSets via {{DocSet.getTopFilter()}}. 

Adding an argument to the method {{QueryMethod.makeFilter(String fname, 
BytesRef[] bytesRefs, SolrParams localParams)}} is not backward compatible, 
and might frustrate other devs.  

Note: {{TopLevelDocValuesTermsQuery}} uses {{OrdinalMap}} via 
{{getSlowAtomicReader()}}. It might be clearer to iterate per segment, and then 
access global ordinals via {{MultiSortedDocValues.mapping.getGlobalOrds()}}.

Also, this query relies on SolrIndexSearcher, but IIRC even in Solr, queries 
are sometimes invoked with Lucene's searcher. There are some issues with such 
a cast.  


was (Author: mkhludnev):
Regarding {{PerSegmentViewDocIdSetIterator}}: I don't follow. Lucene's 
{{DocIdSetIterator}} is strictly per-segment; using it for top-level iteration 
is something that never happens. FWIW, top-level Solr DocSets are usually 
converted to Lucene's DocIdSets via {{DocSet.getTopFilter()}}. 

Adding an argument to the method {{QueryMethod.makeFilter(String fname, 
BytesRef[] bytesRefs, SolrParams localParams)}} is not backward compatible, 
and might frustrate other devs.  

Note: {{TopLevelDocValuesTermsQuery}} uses {{OrdinalMap}} via 
{{getSlowAtomicReader()}}. It might be clearer to iterate per segment, and then 
access global ordinals via {{MultiSortedDocValues.mapping.getGlobalOrds()}}.







[jira] [Comment Edited] (SOLR-13890) Add postfilter support to {!terms} queries

2020-01-06 Thread Mikhail Khludnev (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009194#comment-17009194
 ] 

Mikhail Khludnev edited comment on SOLR-13890 at 1/6/20 10:17 PM:
--

Regarding {{PerSegmentViewDocIdSetIterator}}: I don't follow. Lucene's 
{{DocIdSetIterator}} is strictly per-segment; using it for top-level iteration 
is something that never happens. FWIW, top-level Solr DocSets are usually 
converted to Lucene's DocIdSets via {{DocSet.getTopFilter()}}. 

Adding an argument to the method {{QueryMethod.makeFilter(String fname, 
BytesRef[] bytesRefs, SolrParams localParams)}} is not backward compatible, 
and might frustrate other devs.  

Note: {{TopLevelDocValuesTermsQuery}} uses {{OrdinalMap}} via 
{{getSlowAtomicReader()}}. It might be clearer to iterate per segment, and then 
access global ordinals via {{MultiSortedDocValues.mapping.getGlobalOrds()}}.

Also, this query relies on SolrIndexSearcher, but IIRC even in Solr, queries 
are sometimes invoked with Lucene's searcher. There are some issues with such 
a cast (SOLR-6357).  


was (Author: mkhludnev):
Regarding {{PerSegmentViewDocIdSetIterator}}: I don't follow. Lucene's 
{{DocIdSetIterator}} is strictly per-segment; using it for top-level iteration 
is something that never happens. FWIW, top-level Solr DocSets are usually 
converted to Lucene's DocIdSets via {{DocSet.getTopFilter()}}. 

Adding an argument to the method {{QueryMethod.makeFilter(String fname, 
BytesRef[] bytesRefs, SolrParams localParams)}} is not backward compatible, 
and might frustrate other devs.  

Note: {{TopLevelDocValuesTermsQuery}} uses {{OrdinalMap}} via 
{{getSlowAtomicReader()}}. It might be clearer to iterate per segment, and then 
access global ordinals via {{MultiSortedDocValues.mapping.getGlobalOrds()}}.

Also, this query relies on SolrIndexSearcher, but IIRC even in Solr, queries 
are sometimes invoked with Lucene's searcher. There are some issues with such 
a cast.  







[GitHub] [lucene-solr] yonik merged pull request #1055: SOLR-13932 Review directory locking and Blob interactions

2020-01-06 Thread GitBox
yonik merged pull request #1055: SOLR-13932 Review directory locking and Blob 
interactions
URL: https://github.com/apache/lucene-solr/pull/1055
 
 
   





[jira] [Commented] (SOLR-13932) Review directory locking and Blob interactions

2020-01-06 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009202#comment-17009202
 ] 

ASF subversion and git services commented on SOLR-13932:


Commit 7d728d9d3a552dda75272bf339f17cae9d6b3734 in lucene-solr's branch 
refs/heads/jira/SOLR-13101 from murblanc
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7d728d9 ]

SOLR-13932 Review directory locking and Blob interactions (#1055)

* Initial minor changes for SOLR-13932

* Use all files in index directory when doing resolution against Blob to switch 
local index to new dir in case of conflicts

* Do push from the index directory directly without first making a local copy 
of the index files

* misspelling

* update after comments from mbwaheed


> Review directory locking and Blob interactions
> --
>
> Key: SOLR-13932
> URL: https://issues.apache.org/jira/browse/SOLR-13932
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Ilan Ginzburg
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Review resolution of local index directory content vs the Blob copy.
> There has been a wrong understanding of the following line as acquiring a 
> lock on the index directory.
>  {{solrCore.getDirectoryFactory().get(indexDirPath, 
> DirectoryFactory.DirContext.DEFAULT, 
> solrCore.getSolrConfig().indexConfig.lockType);}}
> From Yonik:
> _A couple things about Directory locking: the locks were only ever to 
> prevent more than one IndexWriter from trying to modify the same index. The 
> IndexWriter grabs a write lock once when it is created and does not release 
> it until it is closed._ 
> _Directories are not locked on acquisition of the Directory from the 
> DirectoryFactory. See the IndexWriter constructor, where the lock is 
> explicitly grabbed._
> Review CorePushPull#pullUpdateFromBlob, ServerSideMetadata and other classes 
> as relevant.
>  
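A toy analogy of the locking behavior described above (plain Java, not Lucene's actual Lock API): the lock is taken once at construction and held until close, and nothing is locked merely by obtaining the directory.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical analogy for IndexWriter's write.lock: acquired once in the
// constructor, held for the writer's lifetime, released only at close().
public class SingleWriterLock implements AutoCloseable {
    private static final Set<String> LOCKED = ConcurrentHashMap.newKeySet();
    private final String dir;

    public SingleWriterLock(String dir) {
        // A second "writer" on the same directory fails immediately,
        // mirroring LockObtainFailedException.
        if (!LOCKED.add(dir)) {
            throw new IllegalStateException("write.lock already held for " + dir);
        }
        this.dir = dir;
    }

    @Override
    public void close() {
        LOCKED.remove(dir);  // released only at close, as with IndexWriter
    }
}
```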






[jira] [Commented] (SOLR-13932) Review directory locking and Blob interactions

2020-01-06 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009201#comment-17009201
 ] 

ASF subversion and git services commented on SOLR-13932:


Commit 7d728d9d3a552dda75272bf339f17cae9d6b3734 in lucene-solr's branch 
refs/heads/jira/SOLR-13101 from murblanc
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7d728d9 ]

SOLR-13932 Review directory locking and Blob interactions (#1055)

* Initial minor changes for SOLR-13932

* Use all files in index directory when doing resolution against Blob to switch 
local index to new dir in case of conflicts

* Do push from the index directory directly without first making a local copy 
of the index files

* misspelling

* update after comments from mbwaheed








[jira] [Commented] (SOLR-7964) suggest.highlight=true does not work when using context filter query

2020-01-06 Thread Lucene/Solr QA (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009211#comment-17009211
 ] 

Lucene/Solr QA commented on SOLR-7964:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
59s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  3m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | 
{color:green}  1m  1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | 
{color:green}  0m 58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | 
{color:green}  0m 58s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m  
4s{color} | {color:green} suggest in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 62m 
57s{color} | {color:green} core in the patch passed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 73m 20s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-7964 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12906397/SOLR-7964.patch |
| Optional Tests |  compile  javac  unit  ratsources  checkforbiddenapis  
validatesourcepatterns  |
| uname | Linux lucene2-us-west.apache.org 4.4.0-170-generic #199-Ubuntu SMP 
Thu Nov 14 01:45:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh
 |
| git revision | master / 9edb143 |
| ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 |
| Default Java | LTS |
|  Test Results | 
https://builds.apache.org/job/PreCommit-SOLR-Build/647/testReport/ |
| modules | C: lucene/suggest solr/core U: . |
| Console output | 
https://builds.apache.org/job/PreCommit-SOLR-Build/647/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> suggest.highlight=true does not work when using context filter query
> 
>
> Key: SOLR-7964
> URL: https://issues.apache.org/jira/browse/SOLR-7964
> Project: Solr
>  Issue Type: Improvement
>  Components: Suggester
>Affects Versions: 5.4
>Reporter: Arcadius Ahouansou
>Assignee: David Smiley
>Priority: Minor
>  Labels: suggester
> Attachments: SOLR-7964.patch, SOLR_7964.patch, SOLR_7964.patch
>
>
> When using the new suggester context filtering query param 
> {{suggest.contextFilterQuery}} introduced in SOLR-7888, the param 
> {{suggest.highlight=true}} has no effect.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] andyvuong commented on issue #1131: SOLR-14134: Add lazy and time-based evictiction of shared core concurrency metada…

2020-01-06 Thread GitBox
andyvuong commented on issue #1131: SOLR-14134: Add lazy and time-based 
evictiction of shared core concurrency metada…
URL: https://github.com/apache/lucene-solr/pull/1131#issuecomment-571356887
 
 
   cc @mbwaheed - I moved the eviction on creation into registerCore, a layer 
above the actual ZooKeeper registration it was previously at, after checking 
that all of the new SolrCore instances get created and go through this code 
path. I also switched to a simple cache until the future work item. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] madrob commented on issue #1138: LUCENE-9077 Print repro line for failed tests

2020-01-06 Thread GitBox
madrob commented on issue #1138: LUCENE-9077 Print repro line for failed tests
URL: https://github.com/apache/lucene-solr/pull/1138#issuecomment-571362867
 
 
   I have a rudimentary in-memory version now and can look at the disk-spilling 
version later. I ended up fighting with Gradle more than I expected to get 
even this far, although the disk-spilling version should be straightforward 
from here. LMK what you think.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] ErickErickson commented on issue #1131: SOLR-14134: Add lazy and time-based evictiction of shared core concurrency metada…

2020-01-06 Thread GitBox
ErickErickson commented on issue #1131: SOLR-14134: Add lazy and time-based 
evictiction of shared core concurrency metada…
URL: https://github.com/apache/lucene-solr/pull/1131#issuecomment-571368556
 
 
   I was just skimming to see if this is related to transient cores (it doesn't 
appear to be) and noticed that the tests set up a new cluster in each test. 
That's quite a bit of work; why not set up the cluster in BeforeClass and 
dispose of it in AfterClass? Since each test creates its own collection, there 
shouldn't be any confusion.
   
   AddReplicaTest shows one way to do this.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] MarcusSorealheis commented on a change in pull request #1141: SOLR-14147 change the Security manager to default to true.

2020-01-06 Thread GitBox
MarcusSorealheis commented on a change in pull request #1141: SOLR-14147 change 
the Security manager to default to true.
URL: https://github.com/apache/lucene-solr/pull/1141#discussion_r363549534
 
 

 ##
 File path: solr/bin/solr
 ##
 @@ -2084,14 +2084,14 @@ else
   REMOTE_JMX_OPTS=()
 fi
 
-# Enable java security manager (limiting filesystem access and other things)
-if [ "$SOLR_SECURITY_MANAGER_ENABLED" == "true" ]; then
+# Disable java security manager (allowing filesystem access and other things)
+if [ "$SOLR_SECURITY_MANAGER_ENABLED" == "false" ]; then
 
 Review comment:
   should be resolved?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9096) Implementation of CompressingTermVectorsWriter.flushOffsets can be simpler

2020-01-06 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009440#comment-17009440
 ] 

ASF subversion and git services commented on LUCENE-9096:
-

Commit 6bb1f6cbbe8accefbfd30b8ee74924ad43ddc356 in lucene-solr's branch 
refs/heads/gradle-master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6bb1f6c ]

LUCENE-9096: CHANGES entry.


> Implementation of CompressingTermVectorsWriter.flushOffsets can be simpler
> --
>
> Key: LUCENE-9096
> URL: https://issues.apache.org/jira/browse/LUCENE-9096
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Affects Versions: 8.2
>Reporter: kkewwei
>Priority: Major
> Fix For: 8.5
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In CompressingTermVectorsWriter.flushOffsets, we compute 
> sumPos and sumOffsets this way:
> {code:java}
> for (int i = 0; i < fd.numTerms; ++i) { 
>   int previousPos = 0;
>   int previousOff = 0;
>   for (int j = 0; j < fd.freqs[i]; ++j) { 
> final int position = positionsBuf[fd.posStart + pos];
> final int startOffset = startOffsetsBuf[fd.offStart + pos];
> sumPos[fieldNumOff] += position - previousPos; 
> sumOffsets[fieldNumOff] += startOffset - previousOff; 
> previousPos = position;
> previousOff = startOffset;
> ++pos;
>   }
> }
> {code}
> We always accumulate {{position - previousPos}}, which can be summarized like this: 
> {code:java}
> (position5-position4)+(position4-position3)+(position3-position2)+(position2-position1){code}
> This sum telescopes, so we can simplify it to: position5-position1
>  
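The telescoping argument above can be checked with a small standalone sketch. This uses hypothetical positions and a stripped-down loop, not the actual CompressingTermVectorsWriter code: with {{previousPos}} starting at 0, the accumulated deltas telescope to the last position, so the inner loop collapses to a single subtraction.

```java
// Standalone sketch (hypothetical positions, not the actual
// CompressingTermVectorsWriter code) showing that the per-term delta loop
// telescopes: with previousPos starting at 0, the accumulated deltas equal
// the last position, so the loop can be replaced by one subtraction.
public class TelescopingSumDemo {
    // The original formulation: accumulate position - previousPos per step.
    static int sumOfDeltas(int[] positions) {
        int sum = 0;
        int previousPos = 0;
        for (int position : positions) {
            sum += position - previousPos;
            previousPos = position;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] positions = {3, 7, 12, 20, 31};
        // Simplified formulation: last position minus the initial previousPos (0).
        int simplified = positions[positions.length - 1];
        System.out.println(sumOfDeltas(positions) + " == " + simplified); // prints "31 == 31"
    }
}
```

The same telescoping applies to the startOffset deltas accumulated into sumOffsets.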



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-11746) numeric fields need better error handling for prefix/wildcard syntax -- consider uniform support for "foo:* == foo:[* TO *]"

2020-01-06 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009444#comment-17009444
 ] 

ASF subversion and git services commented on SOLR-11746:


Commit 9edb143efdc6616906972ae6c629860c91a5a2e7 in lucene-solr's branch 
refs/heads/gradle-master from Houston Putman
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=9edb143 ]

SOLR-11746: Adding docs for negative existence queries.


> numeric fields need better error handling for prefix/wildcard syntax -- 
> consider uniform support for "foo:* == foo:[* TO *]"
> 
>
> Key: SOLR-11746
> URL: https://issues.apache.org/jira/browse/SOLR-11746
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 7.0
>Reporter: Chris M. Hostetter
>Assignee: Houston Putman
>Priority: Major
> Fix For: master (9.0), 8.5
>
> Attachments: SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, 
> SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch
>
>
> On the solr-user mailing list, Torsten Krah pointed out that with Trie 
> numeric fields, query syntax such as {{foo_d:\*}} has been functionally 
> equivalent to {{foo_d:\[\* TO \*]}} and asked why this was not also supported 
> for Point based numeric fields.
> The fact that this type of syntax works (for {{indexed="true"}} Trie fields) 
> appears to have been an (untested, undocumented) fluke of Trie fields, given 
> that they use indexed terms for the (encoded) numeric terms and inherit the 
> default implementation of {{FieldType.getPrefixQuery}}, which produces a 
> prefix query against the {{""}} (empty string) term.  
> (Note that this syntax has apparently _*never*_ worked for Trie fields with 
> {{indexed="false" docValues="true"}}.)
> In general, we should assess the behavior when users attempt a prefix/wildcard 
> syntax query against numeric fields, as currently the behavior is largely 
> nonsensical: prefix/wildcard syntax frequently matches no docs without any sort 
> of error, and the aforementioned {{numeric_field:*}} behaves inconsistently 
> between points/trie fields and between indexed/docValued trie fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9096) Implementation of CompressingTermVectorsWriter.flushOffsets can be simpler

2020-01-06 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009439#comment-17009439
 ] 

ASF subversion and git services commented on LUCENE-9096:
-

Commit 2db4c909ca10c0d7edda0c94622fa1369833 in lucene-solr's branch 
refs/heads/gradle-master from kkewwei
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2db4c90 ]

LUCENE-9096:Simplify CompressingTermVectorsWriter#flushOffsets. (#1125)



> Implementation of CompressingTermVectorsWriter.flushOffsets can be simpler
> --
>
> Key: LUCENE-9096
> URL: https://issues.apache.org/jira/browse/LUCENE-9096
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Affects Versions: 8.2
>Reporter: kkewwei
>Priority: Major
> Fix For: 8.5
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In CompressingTermVectorsWriter.flushOffsets, we compute 
> sumPos and sumOffsets this way:
> {code:java}
> for (int i = 0; i < fd.numTerms; ++i) { 
>   int previousPos = 0;
>   int previousOff = 0;
>   for (int j = 0; j < fd.freqs[i]; ++j) { 
> final int position = positionsBuf[fd.posStart + pos];
> final int startOffset = startOffsetsBuf[fd.offStart + pos];
> sumPos[fieldNumOff] += position - previousPos; 
> sumOffsets[fieldNumOff] += startOffset - previousOff; 
> previousPos = position;
> previousOff = startOffset;
> ++pos;
>   }
> }
> {code}
> We always accumulate {{position - previousPos}}, which can be summarized like this: 
> {code:java}
> (position5-position4)+(position4-position3)+(position3-position2)+(position2-position1){code}
> This sum telescopes, so we can simplify it to: position5-position1
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13089) bin/solr's use of lsof has some issues

2020-01-06 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009441#comment-17009441
 ] 

ASF subversion and git services commented on SOLR-13089:


Commit ac777a5352224b2c8f46836f0e078809308fc2d8 in lucene-solr's branch 
refs/heads/gradle-master from Martijn Koster
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ac777a5 ]

SOLR-13089: Fix lsof edge cases in the solr CLI script


> bin/solr's use of lsof has some issues
> --
>
> Key: SOLR-13089
> URL: https://issues.apache.org/jira/browse/SOLR-13089
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCLI
>Reporter: Martijn Koster
>Assignee: Jan Høydahl
>Priority: Minor
> Fix For: 8.5
>
> Attachments: 0001-SOLR-13089-lsof-fixes.patch, SOLR-13089.patch
>
>
> The {{bin/solr}} script uses this {{lsof}} invocation to check if the Solr 
> port is being listened on:
> {noformat}
> running=`lsof -PniTCP:$SOLR_PORT -sTCP:LISTEN`
> if [ -z "$running" ]; then
> {noformat}
> The code is 
> [here|https://github.com/apache/lucene-solr/blob/master/solr/bin/solr#L2147].
> There are a few issues with this.
> h2. 1. False negatives when port is occupied by different user
> When {{lsof}} runs as non-root, it only shows sockets for processes with your 
> effective uid.
>  For example:
> {noformat}
> $ id -u && nc -l 7788 &
> [1] 26576
> 1000
> # works: nc ran as my user
> $ lsof -PniTCP:7788 -sTCP:LISTEN
> COMMAND   PID USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
> nc  26580  mak3u  IPv4 2818104  0t0  TCP *:7788 (LISTEN)
> # fails: ssh is running as root
> $ lsof -PniTCP:22 -sTCP:LISTEN
> # works if we are root
> $ sudo lsof -PniTCP:22 -sTCP:LISTEN
> COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
> sshd2524 root3u  IPv4  18426  0t0  TCP *:22 (LISTEN)
> sshd2524 root4u  IPv6  18428  0t0  TCP *:22 (LISTEN)
> {noformat}
> Solr runs as non-root.
>  So if some other process owned by a different user occupies that port, you 
> will get a false negative (it will say Solr is not running even though it is).
>  I can't think of a good way to fix or work around that (short of not using 
> {{lsof}} in the first place).
>  Perhaps this is an uncommon scenario we need not worry too much about.
> h2. 2. lsof can complain about lack of /etc/password entries
> If {{lsof}} runs without the current effective user having an entry in 
> {{/etc/passwd}},
>  it produces a warning on stderr:
> {noformat}
> $ docker run -d -u 0 solr:7.6.0  bash -c "chown -R  /opt/; gosu  
> solr-foreground"
> 4397c3f51d4a1cfca7e5815e5b047f75fb144265d4582745a584f0dba51480c6
> $ docker exec -it -u  
> 4397c3f51d4a1cfca7e5815e5b047f75fb144265d4582745a584f0dba51480c6 bash
> I have no name!@4397c3f51d4a:/opt/solr$ lsof -PniTCP:8983 -sTCP:LISTEN
> lsof: no pwd entry for UID 
> COMMAND PID USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
> lsof: no pwd entry for UID 
> java  9   115u  IPv4 2813503  0t0  TCP *:8983 (LISTEN)
> I have no name!@4397c3f51d4a:/opt/solr$ lsof -PniTCP:8983 
> -sTCP:LISTEN>/dev/null
> lsof: no pwd entry for UID 
> lsof: no pwd entry for UID 
> {noformat}
> You can avoid this by using the {{-t}} option, which specifies that lsof should 
> produce terse output with process identifiers only and no header:
> {noformat}
> I have no name!@4397c3f51d4a:/opt/solr$ lsof -t -PniTCP:8983 -sTCP:LISTEN
> 9
> {noformat}
> This is a rare circumstance, but one I encountered and worked around.
> h2. 3. On Alpine, lsof is implemented by busybox, but with incompatible 
> arguments
> On Alpine, {{busybox}} implements {{lsof}}, but does not support the 
> arguments, so you get:
> {noformat}
> $ docker run -it alpine sh
> / # lsof -t -PniTCP:8983 -sTCP:LISTEN
> 1 /bin/busybox/dev/pts/0
> 1 /bin/busybox/dev/pts/0
> 1 /bin/busybox/dev/pts/0
> 1 /bin/busybox/dev/tty
> {noformat}
> So if you ran Solr in the background and it failed to start, this code 
> would produce a false positive.
>  For example:
> {noformat}
> docker volume create mysol
> docker run -v mysol:/mysol bash bash -c "chown 8983:8983 /mysol"
> docker run -it -v mysol:/mysol -w /mysol -v 
> $HOME/Downloads/solr-7.6.0.tgz:/solr-7.6.0.tgz openjdk:8-alpine sh
> apk add procps bash
> tar xvzf /solr-7.6.0.tgz
> chown -R 8983:8983 .
> {noformat}
> then in a separate terminal:
> {noformat}
> $ docker exec -it -u 8983 serene_saha  sh
> /mysol $ SOLR_OPTS=--invalid ./solr-7.6.0/bin/solr start
> whoami: unknown uid 8983
> Waiting up to 180 seconds to see Solr running on port 8983 [|]  
> Started Solr server on port 8983 (pid=101). Happy searching!
> /mysol $ 
> {noformat}
> and in another separate terminal:
> {noformat}
> $ docker exec -it thirsty_lisko

[jira] [Commented] (LUCENE-8673) Use radix partitioning when merging dimensional points

2020-01-06 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009442#comment-17009442
 ] 

ASF subversion and git services commented on LUCENE-8673:
-

Commit b6f31835ad18da0f7a22064481b0d0e167f9f30c in lucene-solr's branch 
refs/heads/gradle-master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b6f3183 ]

LUCENE-8673: Avoid OOMEs because of IOContext randomization.


> Use radix partitioning when merging dimensional points
> --
>
> Key: LUCENE-8673
> URL: https://issues.apache.org/jira/browse/LUCENE-8673
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Assignee: Ignacio Vera
>Priority: Major
> Fix For: 8.x, master (9.0)
>
> Attachments: Geo3D.png, Geo3D.png, Geo3D.png, LatLonPoint.png, 
> LatLonPoint.png, LatLonPoint.png, LatLonShape.png, LatLonShape.png, 
> LatLonShape.png
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Following the advice of [~jpountz] in LUCENE-8623, I have investigated using 
> radix selection when merging segments, instead of sorting the data at the 
> beginning. The results are pretty promising when running the Lucene geo 
> benchmarks:
>  
> ||Approach||Index time (sec): Dev||Index Time (sec): Base||Index Time: 
> Diff||Force merge time (sec): Dev||Force Merge time (sec): Base||Force Merge 
> Time: Diff||Index size (GB): Dev||Index size (GB): Base||Index Size: 
> Diff||Reader heap (MB): Dev||Reader heap (MB): Base||Reader heap: Diff
> |points|241.5s|235.0s| 3%|157.2s|157.9s|-0%|0.55|0.55| 0%|1.57|1.57| 0%|
> |shapes|416.1s|650.1s|-36%|306.1s|603.2s|-49%|1.29|1.29| 0%|1.61|1.61| 0%|
> |geo3d|261.0s|360.1s|-28%|170.2s|279.9s|-39%|0.75|0.75| 0%|1.58|1.58| 0%|
>  
> edited: table formatting to be a jira table
>  
> In 2D the index throughput is more or less equal, but for higher dimensions 
> the impact is quite big. In all cases the merging process requires much less 
> disk space. I am attaching plots showing the different behaviour, and I am 
> opening a pull request.
>  
>  
>  
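For illustration, radix selection (finding the k-th smallest element by MSD radix partitioning instead of fully sorting) can be sketched as follows. This is a toy version on non-negative ints under stated assumptions: the real BKDRadixSelector works on packed byte[] values and can spill to disk, and the names here (RadixSelectSketch, radixSelect) are made up for the sketch, not Lucene's API.

```java
import java.util.Arrays;

// Toy sketch of MSD radix selection on non-negative ints: histogram the
// current byte, descend into the bucket containing the k-th element, and
// recurse on the next byte. Not Lucene's BKDRadixSelector, which operates
// on packed byte[] values and spills to disk.
public class RadixSelectSketch {
    // Returns the k-th smallest (0-based) value in a[from, to).
    static int radixSelect(int[] a, int from, int to, int k, int shift) {
        if (to - from == 1 || shift < 0) {
            // One element left, or all bytes consumed: finish trivially.
            Arrays.sort(a, from, to);
            return a[from + k];
        }
        // Histogram of the byte currently being inspected.
        int[] counts = new int[257];
        for (int i = from; i < to; i++) {
            counts[((a[i] >>> shift) & 0xFF) + 1]++;
        }
        for (int b = 1; b < 257; b++) counts[b] += counts[b - 1];
        // counts[b] = number of elements whose byte is < b; find the bucket
        // that contains the k-th element.
        int bucket = 0;
        while (counts[bucket + 1] <= k) bucket++;
        // Gather that bucket's elements and recurse on the next byte.
        int[] next = new int[counts[bucket + 1] - counts[bucket]];
        int n = 0;
        for (int i = from; i < to; i++) {
            if (((a[i] >>> shift) & 0xFF) == bucket) next[n++] = a[i];
        }
        return radixSelect(next, 0, n, k - counts[bucket], shift - 8);
    }

    public static void main(String[] args) {
        int[] data = {42, 7, 99, 7, 3, 57, 12};
        // 3rd smallest (k=2) of {3, 7, 7, 12, 42, 57, 99} is 7.
        System.out.println(radixSelect(data.clone(), 0, data.length, 2, 24)); // prints "7"
    }
}
```

The benefit over sort-then-merge is that only the bucket containing the split point is visited recursively, which is why the higher-dimension cases above see the largest gains.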



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-11746) numeric fields need better error handling for prefix/wildcard syntax -- consider uniform support for "foo:* == foo:[* TO *]"

2020-01-06 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009443#comment-17009443
 ] 

ASF subversion and git services commented on SOLR-11746:


Commit f5ab3ca688b3127bece252ffd87cc8bfa9f285ff in lucene-solr's branch 
refs/heads/gradle-master from Houston Putman
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f5ab3ca ]

SOLR-11746: Existence query support for numeric point fields


> numeric fields need better error handling for prefix/wildcard syntax -- 
> consider uniform support for "foo:* == foo:[* TO *]"
> 
>
> Key: SOLR-11746
> URL: https://issues.apache.org/jira/browse/SOLR-11746
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 7.0
>Reporter: Chris M. Hostetter
>Assignee: Houston Putman
>Priority: Major
> Fix For: master (9.0), 8.5
>
> Attachments: SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, 
> SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch
>
>
> On the solr-user mailing list, Torsten Krah pointed out that with Trie 
> numeric fields, query syntax such as {{foo_d:\*}} has been functionally 
> equivalent to {{foo_d:\[\* TO \*]}} and asked why this was not also supported 
> for Point based numeric fields.
> The fact that this type of syntax works (for {{indexed="true"}} Trie fields) 
> appears to have been an (untested, undocumented) fluke of Trie fields, given 
> that they use indexed terms for the (encoded) numeric terms and inherit the 
> default implementation of {{FieldType.getPrefixQuery}}, which produces a 
> prefix query against the {{""}} (empty string) term.  
> (Note that this syntax has apparently _*never*_ worked for Trie fields with 
> {{indexed="false" docValues="true"}}.)
> In general, we should assess the behavior when users attempt a prefix/wildcard 
> syntax query against numeric fields, as currently the behavior is largely 
> nonsensical: prefix/wildcard syntax frequently matches no docs without any sort 
> of error, and the aforementioned {{numeric_field:*}} behaves inconsistently 
> between points/trie fields and between indexed/docValued trie fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org


