[jira] [Commented] (LUCENE-9670) gradle precommit sometimes fails with "IOException: stream closed" from javadoc in nightly benchmarks

2021-01-18 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267066#comment-17267066
 ] 

Dawid Weiss commented on LUCENE-9670:
-

I would start by running gradlew in non-daemon mode, Mike. When you're using a 
daemon it connects to an existing process... who knows how this is handled when 
you're piping from Python. Add --no-daemon to all commands where you invoke 
gradlew. Maybe it'll help. 
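
As a minimal sketch of that suggestion, assuming the nightly script switches from os.system to subprocess: the helper names gradle_command and run_gradle below are hypothetical and not part of luceneutil. Only the --no-daemon flag itself is Gradle's.

```python
import subprocess


def gradle_command(*tasks):
    """Build a gradlew invocation that never attaches to a running daemon.

    Hypothetical helper: every call site goes through it, so the
    --no-daemon flag cannot be forgotten on one of them.
    """
    return ["./gradlew", "--no-daemon", *tasks]


def run_gradle(*tasks, cwd="."):
    """Run gradlew in the given checkout and return its exit code."""
    # check=False: report the exit code instead of raising, so the
    # nightly job can record a failure and keep going.
    return subprocess.run(gradle_command(*tasks), cwd=cwd, check=False).returncode
```

With this in place the precommit step would be invoked as run_gradle("precommit", cwd=checkout_dir), where checkout_dir is whatever path the script already uses.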

> gradle precommit sometimes fails with "IOException: stream closed" from 
> javadoc in nightly benchmarks
> -
>
> Key: LUCENE-9670
> URL: https://issues.apache.org/jira/browse/LUCENE-9670
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael McCandless
>Priority: Major
>
> I recently added tracking how long {{gradle precommit}} takes each night so 
> we can track slowdowns over time.
> But it sometimes fails with:
> {noformat}
> > Task :lucene:join:renderJavadoc FAILED
> Could not read standard output of command '/opt/jdk-15.0.1/bin/javadoc'.
> java.io.IOException: Stream Closed
>         at java.base/java.io.FileOutputStream.writeBytes(Native Method)
>         at java.base/java.io.FileOutputStream.write(FileOutputStream.java:347)
>         at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
>         at java.base/java.io.BufferedOutputStream.flush(BufferedOutputStream.java:142)
>         at org.gradle.process.internal.streams.ExecOutputHandleRunner.forwardContent(ExecOutputHandleRunner.java:68)
>         at org.gradle.process.internal.streams.ExecOutputHandleRunner.run(ExecOutputHandleRunner.java:53)
>         at org.gradle.internal.operations.CurrentBuildOperationPreservingRunnable.run(CurrentBuildOperationPreservingRunnable.java:42)
>         at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
>         at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:48)
>         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
>         at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:56)
>         at java.base/java.lang.Thread.run(Thread.java:832) {noformat}
> I'm not sure why ... when I run {{./gradlew precommit}} interactively it 
> doesn't seem to do this.
> The nightly tool is quite simple – it just launches a sub-process using 
> {{os.system}} (first to {{git clean}}, then to run {{./gradlew precommit}}): 
> https://github.com/mikemccand/luceneutil/blob/master/src/python/runNightlyGradleTestPrecommit.py



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] s1monw commented on a change in pull request #2212: LUCENE-9669: Add an expert API to allow opening indices created < N-1

2021-01-18 Thread GitBox


s1monw commented on a change in pull request #2212:
URL: https://github.com/apache/lucene-solr/pull/2212#discussion_r559385012



##
File path: lucene/core/src/java/org/apache/lucene/index/DirectoryReader.java
##
@@ -104,6 +104,23 @@ public static DirectoryReader open(final IndexCommit commit) throws IOException
     return StandardDirectoryReader.open(commit.getDirectory(), commit);
   }
 
+  /**
+   * Expert: returns an IndexReader reading the index in the given {@link IndexCommit}. This method
+   * allows opening indices that were created with a Lucene version older than N-1 provided that
+   * all codecs for this index are available in the classpath and the segment file format used was
+   * created with Lucene 7 or older. Users of this API must be aware that Lucene doesn't guarantee

Review comment:
   This is because the segments info format only supports 7.0 and upwards.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org






[jira] [Created] (LUCENE-9674) Faster advance on Vector Values

2021-01-18 Thread Anand Kotriwal (Jira)
Anand Kotriwal created LUCENE-9674:
--

 Summary: Faster advance on Vector Values
 Key: LUCENE-9674
 URL: https://issues.apache.org/jira/browse/LUCENE-9674
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs
Affects Versions: master (9.0)
 Environment:  
Reporter: Anand Kotriwal


The advance() function in the class Lucene90VectorReader does a linear search 
for the target document.
To make it faster we can do a binary search over the "ordToDoc" array, which 
makes the advance operation take logarithmic time. This will make 
retrieving vectors for a sparse set of documents efficient.
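
The idea can be sketched as follows. This is a language-neutral illustration written in Python, not the Lucene90VectorReader code: the class name VectorDocIterator and its fields are hypothetical, and it assumes "ordToDoc" is a sorted array mapping each vector ordinal to its document ID. Only the NO_MORE_DOCS sentinel value (Integer.MAX_VALUE) matches Lucene's DocIdSetIterator.

```python
from bisect import bisect_left

# Same value as Lucene's DocIdSetIterator.NO_MORE_DOCS (Integer.MAX_VALUE).
NO_MORE_DOCS = 2**31 - 1


class VectorDocIterator:
    """Hypothetical stand-in for an iterator over docs that have a vector."""

    def __init__(self, ord_to_doc):
        # ord_to_doc[i] is the doc ID of the i-th vector; must be sorted ascending.
        self.ord_to_doc = ord_to_doc
        self.ord = -1
        self.doc = -1

    def advance(self, target):
        """Move to the first doc >= target, in O(log n) instead of O(n)."""
        # Iterators only move forward, so search strictly after the current ordinal.
        lo = min(self.ord + 1, len(self.ord_to_doc))
        self.ord = bisect_left(self.ord_to_doc, target, lo)
        if self.ord < len(self.ord_to_doc):
            self.doc = self.ord_to_doc[self.ord]
        else:
            self.doc = NO_MORE_DOCS
        return self.doc
```

Restricting the search to ordinals after the current one keeps the usual iterator contract (never moving backwards) while still giving logarithmic cost per call on sparse fields.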






[jira] [Created] (LUCENE-9675) Expose the compression mode of the binary doc values

2021-01-18 Thread Jim Ferenczi (Jira)
Jim Ferenczi created LUCENE-9675:


 Summary: Expose the compression mode of the binary doc values
 Key: LUCENE-9675
 URL: https://issues.apache.org/jira/browse/LUCENE-9675
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Jim Ferenczi


LUCENE-9378 introduced a way to configure the compression mode of the binary 
doc values.
This issue is a proposal to expose this information in the attributes of each 
binary field.
That would expose this information to external readers on a per-field basis.






[jira] [Updated] (LUCENE-9675) Expose the compression mode of the binary doc values

2021-01-18 Thread Jim Ferenczi (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Ferenczi updated LUCENE-9675:
-
Status: Open  (was: Open)

> Expose the compression mode of the binary doc values
> 
>
> Key: LUCENE-9675
> URL: https://issues.apache.org/jira/browse/LUCENE-9675
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jim Ferenczi
>Priority: Minor
>
> LUCENE-9378 introduced a way to configure the compression mode of the binary 
> doc values.
> This issue is a proposal to expose this information in the attributes of each 
> binary field.
> That would expose this information to external readers on a per-field basis.






[jira] [Updated] (LUCENE-9675) Expose the compression mode of the binary doc values

2021-01-18 Thread Jim Ferenczi (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Ferenczi updated LUCENE-9675:
-
Attachment: LUCENE-9675.patch
Status: Open  (was: Open)

Here's a patch that adds the compression mode in the attributes of the 
FieldInfo.

> Expose the compression mode of the binary doc values
> 
>
> Key: LUCENE-9675
> URL: https://issues.apache.org/jira/browse/LUCENE-9675
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jim Ferenczi
>Priority: Minor
> Attachments: LUCENE-9675.patch
>
>
> LUCENE-9378 introduced a way to configure the compression mode of the binary 
> doc values.
> This issue is a proposal to expose this information in the attributes of each 
> binary field.
> That would expose this information to external readers on a per-field basis.






[jira] [Commented] (LUCENE-9675) Expose the compression mode of the binary doc values

2021-01-18 Thread Ignacio Vera (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267140#comment-17267140
 ] 

Ignacio Vera commented on LUCENE-9675:
--

+1

 

Reading through the patch, maybe the key should have extension {{.mode}} 
instead of {{.compression_mode}} to be consistent with the stored fields 
implementation?







[jira] [Assigned] (SOLR-15052) Reducing overseer bottlenecks using per-replica states

2021-01-18 Thread Noble Paul (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul reassigned SOLR-15052:
-

Assignee: Noble Paul

> Reducing overseer bottlenecks using per-replica states
> --
>
> Key: SOLR-15052
> URL: https://issues.apache.org/jira/browse/SOLR-15052
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Assignee: Noble Paul
>Priority: Major
> Attachments: per-replica-states-gcp.pdf
>
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> This work has the same goal as SOLR-13951, that is to reduce overseer 
> bottlenecks by avoiding replica state updates from going to the state.json 
> via the overseer. However, the approach taken here is different from 
> SOLR-13951 and hence this work supersedes that work.
> The design proposed is here: 
> https://docs.google.com/document/d/1xdxpzUNmTZbk0vTMZqfen9R3ArdHokLITdiISBxCFUg/edit
> Briefly,
> # Every replica's state will be in a separate znode nested under the 
> state.json. It has the name that encodes the replica name, state, leadership 
> status.
> # An additional children watcher to be set on state.json for state changes.
> # Upon a state change, a ZK multi-op to delete the previous znode and add a 
> new znode with new state.
> Differences between this and SOLR-13951,
> # In SOLR-13951, we planned to leverage shard terms for per shard states.
> # As a consequence, the code changes required for SOLR-13951 were massive (we 
> needed a shard state provider abstraction and introduce it everywhere in the 
> codebase).
> # This approach is a drastically simpler change and design.
> Credit for this design and the PR is due to [~noble.paul]. 
> [~markrmil...@gmail.com], [~noble.paul] and I have collaborated on this 
> effort. The reference branch takes a conceptually similar (but not identical) 
> approach.
> I shall attach a PR and performance benchmarks shortly.






[jira] [Resolved] (SOLR-15052) Reducing overseer bottlenecks using per-replica states

2021-01-18 Thread Noble Paul (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul resolved SOLR-15052.
---
Fix Version/s: 8.8
   Resolution: Fixed







[GitHub] [lucene-solr] codaitya opened a new pull request #2214: LUCENE-9674:Faster advance on Vector Values

2021-01-18 Thread GitBox


codaitya opened a new pull request #2214:
URL: https://github.com/apache/lucene-solr/pull/2214


   Currently the advance() function in the class Lucene90VectorReader does a
   linear search for the target document.
   A binary search over the "ordToDoc" array will make retrieving vectors for a
   sparse set of documents efficient.
   
   
   
   
   # Description
   
   Currently the advance() function in the class Lucene90VectorReader does a
   linear search for the target document. This can be an expensive operation if 
we are searching for
   sparse documents having vector fields.
   
   # Solution
   
   Implement a binary search over the "ordToDoc" array which will make the 
advance operation
   take logarithmic time to search.
   
   # Tests
   
   Added testAdvance() in class TestVectorValues. It creates an index with gaps 
for vector fields and randomly calls advance.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [x] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `master` branch.
   - [x] I have run `./gradlew check`.
   - [x] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   






[jira] [Commented] (SOLR-12613) Rename "Cloud" tab as "Cluster" in Admin UI

2021-01-18 Thread David Eric Pugh (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-12613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267192#comment-17267192
 ] 

David Eric Pugh commented on SOLR-12613:


Definitely a fair point [~janhoy]!

> Rename "Cloud" tab as "Cluster" in Admin UI
> ---
>
> Key: SOLR-12613
> URL: https://issues.apache.org/jira/browse/SOLR-12613
> Project: Solr
>  Issue Type: Improvement
>  Components: Admin UI
>Reporter: Jan Høydahl
>Priority: Major
>  Labels: newdev
> Fix For: 8.1, master (9.0)
>
>
> Spinoff from SOLR-8207. When adding more cluster-wide functionality to the 
> Admin UI, it feels better to name the "Cloud" UI tab as "Cluster".
> In addition to renaming the "Cloud" tab, we should also change the URL part 
> from {{~cloud}} to {{~cluster}}, update reference guide page names, 
> screenshots and references etc.
> I propose this change is not introduced in 7.x due to the impact, so tagged 
> it as fix-version 8.0.






[GitHub] [lucene-solr] mhitza commented on pull request #1435: SOLR-14410: Switch from SysV init script to systemd service file

2021-01-18 Thread GitBox


mhitza commented on pull request #1435:
URL: https://github.com/apache/lucene-solr/pull/1435#issuecomment-762201796


   @epugh The main difference I see between this service file and the docker 
configuration is that the docker container starts the service in foreground 
mode. This is to be expected, as docker containers run single services, and 
also because running systemd within docker is pretty hairy and 
platform-specific (it can be done only if the host system also has systemd 
available, or at least cgroups, which need to be mounted in read-only mode 
within the container).
   
   When proposing this change we briefly discussed on the mailing list the 
option to start Solr in foreground mode (see [mailing list 
thread](https://mail-archives.apache.org/mod_mbox/lucene-dev/202004.mbox/%3ccafszzzxs+zh1mrscsjftyxn0kod_+6fjobxd9zhxt66fhaz...@mail.gmail.com%3e)).
 
   
   Regarding testing, the only option I can think of is via configuration 
management (e.g. ansible) targeting a VM (because docker is not as 
straightforward, as mentioned before, but doable).
   
   @janhoy are you referring to the service file within this PR, or something 
else? As far as I know, all the common distros are running systemd, unless 
those peers are on old distros that are no longer maintained. And on systemd 
systems, there shouldn't be any extra package required to run this (except for 
the obvious JRE requirement for Solr itself).






[GitHub] [lucene-solr] epugh commented on pull request #1435: SOLR-14410: Switch from SysV init script to systemd service file

2021-01-18 Thread GitBox


epugh commented on pull request #1435:
URL: https://github.com/apache/lucene-solr/pull/1435#issuecomment-762208470


   I'd be open to testing this out. Also, is this more of a 9.0 thing versus 
a branch 8 thing? Changing how you install Solr seems like a pretty big deal. 
I've done the upgrade a few times using the old scripts, but this seems like a 
breaking change since you would lose the old way, right?






[GitHub] [lucene-solr] epugh opened a new pull request #2215: SOLR-14067: v3 Create /contrib/scripting module with ScriptingUpdateProcessor

2021-01-18 Thread GitBox


epugh opened a new pull request #2215:
URL: https://github.com/apache/lucene-solr/pull/2215


   
   
   
   # Description
   
   This PR supersedes the work done in #2016, as it doesn't drag in all the 
commits made to master. I followed the steps recommended by Joel Bernstein in 
another PR to clean up the commit history in creating this PR.
   
   To improve our security posture, this moves the ScriptingUpdateProcessor to 
a new contrib module that isn't installed in Solr by default.   This is also a 
chance to clean up the name of the processor from the old slightly awkward name 
"StatelessScriptingUpdateProcessor" to a simpler name.
   
   # Solution
   
   * Created a new `/contrib/scripting` module, and moved the code and tests 
related to scripting under it.
   * Updated all the references to `StatelessScriptingUpdateProcessor` to 
`ScriptingUpdateProcessor` in code and ref guide.
   
   
   
   # Tests
   
   Please describe the tests you've developed or run to confirm this patch 
implements the feature or solves the problem.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [x] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `master` branch.
   - [x] I have run `./gradlew check`.
   - [x] I have added tests for my changes.
   - [x] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   






[GitHub] [lucene-solr] epugh commented on pull request #2016: SOLR-14067 v2 Move Stateless Scripting Update Process to /contrib/scripting module

2021-01-18 Thread GitBox


epugh commented on pull request #2016:
URL: https://github.com/apache/lucene-solr/pull/2016#issuecomment-762219832


   I followed the steps that Joel recommended in another thread, and created a 
new clean branch, #2215.   I will close this one in favour of that PR which is 
much more legible.






[GitHub] [lucene-solr] epugh closed pull request #2016: SOLR-14067 v2 Move Stateless Scripting Update Process to /contrib/scripting module

2021-01-18 Thread GitBox


epugh closed pull request #2016:
URL: https://github.com/apache/lucene-solr/pull/2016


   






[jira] [Commented] (SOLR-13756) ivy cannot download org.restlet.ext.servlet jar

2021-01-18 Thread David Eric Pugh (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267235#comment-17267235
 ] 

David Eric Pugh commented on SOLR-13756:


I am going to close this, based on one more search around the source. Please 
comment/reopen if I'm wrong on this.

> ivy cannot download org.restlet.ext.servlet jar
> ---
>
> Key: SOLR-13756
> URL: https://issues.apache.org/jira/browse/SOLR-13756
> Project: Solr
>  Issue Type: Bug
>Reporter: Chongchen Chen
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> I checked out the project and ran `ant idea`, which tries to download jars. But 
> https://repo1.maven.org/maven2/org/restlet/jee/org.restlet.ext.servlet/2.3.0/org.restlet.ext.servlet-2.3.0.jar
>  now returns 404.  
> [ivy:retrieve] public: tried
> [ivy:retrieve]  
> https://repo1.maven.org/maven2/org/restlet/jee/org.restlet.ext.servlet/2.3.0/org.restlet.ext.servlet-2.3.0.jar
> [ivy:retrieve]::
> [ivy:retrieve]::  FAILED DOWNLOADS::
> [ivy:retrieve]:: ^ see resolution messages for details  ^ ::
> [ivy:retrieve]::
> [ivy:retrieve]:: 
> org.restlet.jee#org.restlet;2.3.0!org.restlet.jar
> [ivy:retrieve]:: 
> org.restlet.jee#org.restlet.ext.servlet;2.3.0!org.restlet.ext.servlet.jar
> [ivy:retrieve]::






[jira] [Resolved] (SOLR-13756) ivy cannot download org.restlet.ext.servlet jar

2021-01-18 Thread David Eric Pugh (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Eric Pugh resolved SOLR-13756.

Resolution: Not A Problem

I believe this is "Not a Problem" since the underlying issue has been taken 
care of.







[jira] [Closed] (SOLR-13756) ivy cannot download org.restlet.ext.servlet jar

2021-01-18 Thread David Eric Pugh (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Eric Pugh closed SOLR-13756.
--







[jira] [Commented] (SOLR-13105) A visual guide to Solr Math Expressions and Streaming Expressions

2021-01-18 Thread David Eric Pugh (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267239#comment-17267239
 ] 

David Eric Pugh commented on SOLR-13105:


I tried your commands on another branch of mine that was in a similar 
situation, and it worked great.

> A visual guide to Solr Math Expressions and Streaming Expressions
> -
>
> Key: SOLR-13105
> URL: https://issues.apache.org/jira/browse/SOLR-13105
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Attachments: Screen Shot 2019-01-14 at 10.56.32 AM.png, Screen Shot 
> 2019-02-21 at 2.14.43 PM.png, Screen Shot 2019-03-03 at 2.28.35 PM.png, 
> Screen Shot 2019-03-04 at 7.47.57 PM.png, Screen Shot 2019-03-13 at 10.47.47 
> AM.png, Screen Shot 2019-03-30 at 6.17.04 PM.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Visualization is now a fundamental element of Solr Streaming Expressions and 
> Math Expressions. This ticket will create a visual guide to Solr Math 
> Expressions and Solr Streaming Expressions that includes *Apache Zeppelin* 
> visualization examples.
> It will also cover using the JDBC expression to *analyze* and *visualize* 
> results from any JDBC compliant data source.
> Intro from the guide:
> {code:java}
> Streaming Expressions exposes the capabilities of Solr Cloud as composable 
> functions. These functions provide a system for searching, transforming, 
> analyzing and visualizing data stored in Solr Cloud collections.
> At a high level there are four main capabilities that will be explored in the 
> documentation:
> * Searching, sampling and aggregating results from Solr.
> * Transforming result sets after they are retrieved from Solr.
> * Analyzing and modeling result sets using probability and statistics and 
> machine learning libraries.
> * Visualizing result sets, aggregations and statistical models of the data.
> {code}
>  
> A few sample visualizations are attached to the ticket.






[jira] [Created] (SOLR-15085) EmbeddedSolrServer calls shutdown on a provided CoreContainer

2021-01-18 Thread Tim Owen (Jira)
Tim Owen created SOLR-15085:
---

 Summary: EmbeddedSolrServer calls shutdown on a provided 
CoreContainer
 Key: SOLR-15085
 URL: https://issues.apache.org/jira/browse/SOLR-15085
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: Server, SolrJ
Affects Versions: master (9.0)
Reporter: Tim Owen


There are essentially 2 ways to create an EmbeddedSolrServer object, one by 
passing in a CoreContainer object, and the other way creates one internally 
on-the-fly. The current behaviour of the close method calls shutdown on the 
CoreContainer, regardless of where it came from.

I believe this is not good behaviour for a class that doesn't control the 
lifecycle of the passed-in CoreContainer. In fact, there are 4 cases among the 
codebase where a subclass of EmbeddedSolrServer is created just to override 
this behaviour (with a comment saying it's unwanted).

In my use-case I create EmbeddedSolrServer instances for cores as and when I 
need to work with them, but the CoreContainer exists for the duration. I don't 
want the whole container shut down when I'm done with just one of its cores. 
You can work around it by just not calling close on the EmbeddedSolrServer 
object, but that's risky, especially if you use try-with-resources, as close is 
then called automatically.

Fix is to keep track of whether the CoreContainer was created internally or 
not, and only shut it down if internal. I will attach my patch PR.
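
The fix described above can be sketched roughly as follows. This is only an illustration of the ownership-tracking idea, not the actual patch: `Container` and `EmbeddedClient` are hypothetical stand-ins for CoreContainer and EmbeddedSolrServer, and the real constructors take different arguments.

```java
// Hypothetical stand-in for CoreContainer; not the real Solr class.
final class Container {
  private boolean shutDown = false;

  void shutdown() {
    shutDown = true;
  }

  boolean isShutDown() {
    return shutDown;
  }
}

// Hypothetical stand-in for EmbeddedSolrServer.
final class EmbeddedClient implements AutoCloseable {
  private final Container container;
  // True only when this client created the container itself.
  private final boolean containerIsLocal;

  // The caller supplies the container and stays responsible for its lifecycle.
  EmbeddedClient(Container provided) {
    this.container = provided;
    this.containerIsLocal = false;
  }

  // No container supplied: create one internally and own it.
  EmbeddedClient() {
    this.container = new Container();
    this.containerIsLocal = true;
  }

  Container getContainer() {
    return container;
  }

  @Override
  public void close() {
    // Only shut down a container that was created internally.
    if (containerIsLocal) {
      container.shutdown();
    }
  }
}
```

With this shape, a try-with-resources block around a client built on a shared container closes the client but leaves the container running.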






[GitHub] [lucene-solr] timatbw opened a new pull request #2216: SOLR-15085 Prevent EmbeddedSolrServer calling shutdown on a CoreConta…

2021-01-18 Thread GitBox


timatbw opened a new pull request #2216:
URL: https://github.com/apache/lucene-solr/pull/2216


   …iner that was passed to it
   
   
   
   
   # Description
   
   Prevent EmbeddedSolrServer calling shutdown on a CoreContainer that was 
passed to it.
   
   # Solution
   
   Now keeping track of whether the CoreContainer was provided or created 
internally and only calling shutdown for internally-created instances.
   
   # Tests
   
   Modified appropriate test to confirm behaviour, and removed overrides used 
in existing tests to work around this issue.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [x] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `master` branch.
   - [x] I have run `./gradlew check`.
   - [x] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-15085) EmbeddedSolrServer calls shutdown on a provided CoreContainer

2021-01-18 Thread Tim Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-15085:

Labels: pull-request-available  (was: )

> EmbeddedSolrServer calls shutdown on a provided CoreContainer
> -
>
> Key: SOLR-15085
> URL: https://issues.apache.org/jira/browse/SOLR-15085
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Server, SolrJ
>Affects Versions: master (9.0)
>Reporter: Tim Owen
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There are essentially 2 ways to create an EmbeddedSolrServer object, one by 
> passing in a CoreContainer object, and the other way creates one internally 
> on-the-fly. The current behaviour of the close method calls shutdown on the 
> CoreContainer, regardless of where it came from.
> I believe this is not good behaviour for a class that doesn't control the 
> lifecycle of the passed-in CoreContainer. In fact, there are 4 cases among 
> the codebase where a subclass of EmbeddedSolrServer is created just to 
> override this behaviour (with a comment saying it's unwanted).
> In my use-case I create EmbeddedSolrServer instances for cores as and when I 
> need to work with them, but the CoreContainer exists for the duration. I 
> don't want the whole container shut down when I'm done with just one of its 
> cores. You can work around it by just not calling close on the 
> EmbeddedSolrServer object, but that's risky, especially if you use 
> try-with-resources, as close is then called automatically.
> Fix is to keep track of whether the CoreContainer was created internally or 
> not, and only shut it down if internal. I will attach my patch PR.






[GitHub] [lucene-solr] mhitza commented on pull request #1435: SOLR-14410: Switch from SysV init script to systemd service file

2021-01-18 Thread GitBox


mhitza commented on pull request #1435:
URL: https://github.com/apache/lucene-solr/pull/1435#issuecomment-762235551


   The moment when I wrote this systemd service file I was running Solr 8, that 
is correct. I think it should work with Solr 9 as is.
   
   From my memory of working on the installer update, I think you should be 
able to run the updated installer without losing the old way, because on 
systemd systems all SysV scripts are taken over ("wrapped") by systemd. So, 
for example, even if the previous docs stated commands like 
`service solr start`, it should work with `systemctl start solr` as is.
   
   And you could try out a new installation of Solr using the installer `-s 
solr2` flag, for example. And then you could start the new service type with 
`systemctl start solr2` (of course you would need to stop the previous solr 
instance as they are running on the same port number).
   
   What I haven't tested is what happens when you run the installer without any 
flags on a system that already has Solr installed. It should generate a 
solr.service file (thus having the same service name), and *I think* it would 
supplant the SysV init script: if a systemd and a SysV service exist with 
the same name, the physical solr.service would take precedence over the SysV one.






[jira] [Updated] (LUCENE-9675) Expose the compression mode of the binary doc values

2021-01-18 Thread Jim Ferenczi (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Ferenczi updated LUCENE-9675:
-
Attachment: LUCENE-9675.patch
Status: Open  (was: Open)

Thanks [~ivera], I attached a new patch that uses the `.mode` suffix.

> Expose the compression mode of the binary doc values
> 
>
> Key: LUCENE-9675
> URL: https://issues.apache.org/jira/browse/LUCENE-9675
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jim Ferenczi
>Priority: Minor
> Attachments: LUCENE-9675.patch, LUCENE-9675.patch
>
>
> LUCENE-9378 introduced a way to configure the compression mode of the binary 
> doc values.
> This issue is a proposal to expose this information in the attributes of each 
> binary field.
> That would expose this information to external readers on a per-field basis.






[jira] [Commented] (SOLR-14067) Move StatelessScriptUpdateProcessor to a contrib

2021-01-18 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267294#comment-17267294
 ] 

David Smiley commented on SOLR-14067:
-

I like the rename (I suggested it after all) -- mostly glad to see the 
"Stateless" part gone.  Now that I look at it, I think "ScriptUpdateProcessor" 
is better than "ScriptingUpdateProcessor" because I think the noun form makes 
more sense than the verb.  WDYT?

> Move StatelessScriptUpdateProcessor to a contrib
> 
>
> Key: SOLR-14067
> URL: https://issues.apache.org/jira/browse/SOLR-14067
> Project: Solr
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Assignee: David Eric Pugh
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Move server-side scripting out of core and into a new contrib.  This is 
> better for security.
> Former description:
> 
> We should eliminate all scripting capabilities within Solr. Let us start with 
> the StatelessScriptUpdateProcessor deprecation/removal.






[jira] [Commented] (SOLR-14067) Move StatelessScriptUpdateProcessor to a contrib

2021-01-18 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267295#comment-17267295
 ] 

David Smiley commented on SOLR-14067:
-

Ah; actually you chose "ScriptUpdateProcessor" after all, I see.  I thought 
otherwise because the CHANGES.txt in your latest PR is incorrect.  I'll review 
further there but wanted to discuss the name in the issue to ensure wide peer 
review.

> Move StatelessScriptUpdateProcessor to a contrib
> 
>
> Key: SOLR-14067
> URL: https://issues.apache.org/jira/browse/SOLR-14067
> Project: Solr
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Assignee: David Eric Pugh
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Move server-side scripting out of core and into a new contrib.  This is 
> better for security.
> Former description:
> 
> We should eliminate all scripting capabilities within Solr. Let us start with 
> the StatelessScriptUpdateProcessor deprecation/removal.






[GitHub] [lucene-solr] janhoy commented on pull request #1435: SOLR-14410: Switch from SysV init script to systemd service file

2021-01-18 Thread GitBox


janhoy commented on pull request #1435:
URL: https://github.com/apache/lucene-solr/pull/1435#issuecomment-762275006


   > are you referring to the service file within this PR, or something else
   
   I think you got me wrong - I said that systemd (the new script) should work 
out of the box, but the old initd style may require an extra package (for the 
`service` command) on modern Unix systems to work at all.






[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2212: LUCENE-9669: Add an expert API to allow opening indices created < N-1

2021-01-18 Thread GitBox


mikemccand commented on a change in pull request #2212:
URL: https://github.com/apache/lucene-solr/pull/2212#discussion_r559531239



##
File path: lucene/core/src/java/org/apache/lucene/index/CheckIndex.java
##
@@ -3900,6 +3907,12 @@ public static Options parseOptions(String[] args) {
 }
 i++;
 opts.dirImpl = args[i];
+  } else if ("-min_version_created".equals(args[i])) {

Review comment:
   `-min_major_version_created`?

##
File path: lucene/core/src/java/org/apache/lucene/index/CheckIndex.java
##
@@ -462,6 +463,11 @@ public void setInfoStream(PrintStream out, boolean 
verbose) {
 this.verbose = verbose;
   }
 
+  /** Set the minimum index version created for the index to check */
+  public void setMinIndexVersionCreated(int minIndexVersionCreated) {

Review comment:
   Could we consistently rename to `setMinIndexMajorVersionCreated`, and 
`minIndexMajorVersionCreated`?  (I see e.g. in `SIS.readCommit` below that we 
include `major` in the name).

##
File path: lucene/core/src/java/org/apache/lucene/index/CheckIndex.java
##
@@ -3900,6 +3907,12 @@ public static Options parseOptions(String[] args) {
 }
 i++;
 opts.dirImpl = args[i];
+  } else if ("-min_version_created".equals(args[i])) {
+if (i == args.length - 1) {
+  throw new IllegalArgumentException("ERROR: missing value for 
-min_version_created");

Review comment:
   Hmm, we should also update the `Usage: ...` exception (around line 3928 
in this modified version) to document this new option?
   
   If a user tries to `CheckIndex` a too-old index without this option they'll 
see an `IndexFormatTooOldException`, right?  Should we catch that and rethrow w/ 
a better message suggesting to use this option?  Should we maybe by default just 
set this option (always allow `CheckIndex` on a too-old index as long as you 
have the old Codecs around...)?
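
The catch-and-rethrow idea could look roughly like this. It is a self-contained sketch under stated assumptions, not the `CheckIndex` code: `FormatTooOldException` stands in for Lucene's `IndexFormatTooOldException`, `CheckTool`, `open` and `check` are hypothetical names, and the minimum supported major version of 8 is only an assumed value for illustration.

```java
// Stand-in for Lucene's IndexFormatTooOldException so the sketch compiles alone.
final class FormatTooOldException extends RuntimeException {
  FormatTooOldException(String message) {
    super(message);
  }
}

// Hypothetical checker; names and structure are illustrative only.
final class CheckTool {
  // Assumed value for illustration.
  static final int MIN_SUPPORTED_MAJOR = 8;

  // Hypothetical open step: rejects indices created before the allowed major version.
  static void open(int createdMajor, int minAllowedMajor) {
    if (createdMajor < minAllowedMajor) {
      throw new FormatTooOldException(
          "index created with major version " + createdMajor);
    }
  }

  // Runs the check; a null option means "use the default minimum".
  // A too-old failure is rethrown with a hint naming the command-line option.
  static String check(int createdMajor, Integer minVersionOption) {
    int min = (minVersionOption != null) ? minVersionOption : MIN_SUPPORTED_MAJOR;
    try {
      open(createdMajor, min);
      return "OK";
    } catch (FormatTooOldException e) {
      throw new IllegalArgumentException(
          e.getMessage() + "; try passing -min_version_created " + createdMajor, e);
    }
  }
}
```

The alternative raised above, always allowing too-old indices in the checker, would remove the need for both the option and the hint.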

##
File path: lucene/core/src/java/org/apache/lucene/index/DirectoryReader.java
##
@@ -104,6 +104,23 @@ public static DirectoryReader open(final IndexCommit 
commit) throws IOException
 return StandardDirectoryReader.open(commit.getDirectory(), commit);
   }
 
+  /**
+   * Expert: returns an IndexReader reading the index in the given {@link 
IndexCommit}. This method

Review comment:
   s/`in`/`on`

##
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
##
@@ -1009,6 +1009,14 @@ public IndexWriter(Directory d, IndexWriterConfig conf) 
throws IOException {
 changed();
 
   } else if (reader != null) {
+if (reader.segmentInfos.getIndexCreatedVersionMajor() < 
Version.MIN_SUPPORTED_MAJOR) {

Review comment:
   Hmm, does `addIndexes` try to verify that the version of the incoming index is 
not too old?  We will keep doing that, right?  I.e. the only added best effort here 
is that when directly opening an `IndexReader` you can (with this change) now ask 
that older versions be allowed.

##
File path: lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java
##
@@ -499,7 +515,7 @@ private static Codec readCodec(DataInput input) throws 
IOException {
 throw new IllegalArgumentException(
 "Could not load codec '"
 + name
-+ "'.  Did you forget to add lucene-backward-codecs.jar?",
++ "'. Did you forget to add lucene-backward-codecs.jar?",

Review comment:
   Heh

##
File path: lucene/core/src/java/org/apache/lucene/index/DirectoryReader.java
##
@@ -104,6 +104,23 @@ public static DirectoryReader open(final IndexCommit 
commit) throws IOException
 return StandardDirectoryReader.open(commit.getDirectory(), commit);
   }
 
+  /**
+   * Expert: returns an IndexReader reading the index in the given {@link 
IndexCommit}. This method
+   * allows to open indices that were created wih a Lucene version older than 
N-1 provided that all
+   * codecs for this index are available in the classpath and the segment file 
format used was
+   * created with Lucene 7 or older. Users of this API must be aware that 
Lucene doesn't guarantee
+   * semantic compatibility for indices created with versions older than N-1. 
All backwards
+   * compatibility aside of the file format is optional and applied on a best 
effort basis.

Review comment:
   s/`of`/`from`








[GitHub] [lucene-solr] romseygeek commented on a change in pull request #2212: LUCENE-9669: Add an expert API to allow opening indices created < N-1

2021-01-18 Thread GitBox


romseygeek commented on a change in pull request #2212:
URL: https://github.com/apache/lucene-solr/pull/2212#discussion_r559604398



##
File path: lucene/core/src/java/org/apache/lucene/index/DirectoryReader.java
##
@@ -104,6 +104,23 @@ public static DirectoryReader open(final IndexCommit 
commit) throws IOException
 return StandardDirectoryReader.open(commit.getDirectory(), commit);
   }
 
+  /**
+   * Expert: returns an IndexReader reading the index in the given {@link 
IndexCommit}. This method
+   * allows to open indices that were created wih a Lucene version older than 
N-1 provided that all
+   * all codecs for this index are available in the classpath and the segment 
file format used was
+   * created with Lucene 7 or older. Users of this API must be aware that 
Lucene doesn't guarantee

Review comment:
   The javadoc should read 'Lucene 7 or newer' I think?








[GitHub] [lucene-solr] madrob commented on a change in pull request #2216: SOLR-15085 Prevent EmbeddedSolrServer calling shutdown on a CoreConta…

2021-01-18 Thread GitBox


madrob commented on a change in pull request #2216:
URL: https://github.com/apache/lucene-solr/pull/2216#discussion_r559611053



##
File path: 
solr/core/src/java/org/apache/solr/client/solrj/embedded/EmbeddedSolrServer.java
##
@@ -71,6 +71,7 @@
   protected final String coreName;
   private final SolrRequestParsers _parser;
   private final RequestWriterSupplier supplier;
+  private boolean containerIsLocal = false;

Review comment:
   Can this be final?








[GitHub] [lucene-solr] dsmiley commented on a change in pull request #2215: SOLR-14067: v3 Create /contrib/scripting module with ScriptingUpdateProcessor

2021-01-18 Thread GitBox


dsmiley commented on a change in pull request #2215:
URL: https://github.com/apache/lucene-solr/pull/2215#discussion_r559588597



##
File path: solr/contrib/scripting/README.md
##
@@ -0,0 +1,14 @@
+Welcome to Apache Solr Scripting!
+===
+
+# Introduction
+
+The Scripting contrib module pulls together various scripting related 
functions.  
+
+Today, the ScriptUpdateProcessorFactory allows Java scripting engines to be 
used during the Solr document update processing, allowing dramatic flexibility 
in expressing custom document processing before being indexed.  It also allows 
hooks to commit, delete, etc, but add is the most common usage.  It is 
implemented as an UpdateProcessor to be placed in an UpdateChain.

Review comment:
   Let's list a few popular options here -- I'm thinking JavaScript, Ruby, 
Python, Groovy

##
File path: 
solr/contrib/scripting/src/java/org/apache/solr/scripting/update/ScriptUpdateProcessorFactory.java
##
@@ -58,34 +60,34 @@
 
 /**
  * 
- * An update request processor factory that enables the use of update 
- * processors implemented as scripts which can be loaded by the 
- * {@link SolrResourceLoader} (usually via the conf dir for 
- * the SolrCore).
+ * An update request processor factory that enables the use of update
+ * processors implemented as scripts which can be loaded by the
+ * {@link SolrResourceLoader} (usually via the conf dir for
+ * the SolrCore).  Previously known as the StatelessScriptUpdateProcessor.

Review comment:
   ```suggestion
* processors implemented as scripts which can be loaded from the
* configSet.  Previously known as the StatelessScriptUpdateProcessor.
   ```

##
File path: solr/solr-ref-guide/src/scripting-update-processor.adoc
##
@@ -0,0 +1,295 @@
+= Scripting Update Processor

Review comment:
   ```suggestion
   = Script Update Processor
   ```
   
   And can we rename this file to remove the "ing"?
   The PR shows this file as new; did you just write all this?

##
File path: solr/solr-ref-guide/src/scripting-update-processor.adoc
##
@@ -0,0 +1,295 @@
+= Scripting Update Processor
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+The 
{solr-javadocs}/contrib/scripting/org/apache/solr/scripting/update/ScriptUpdateProcessorFactory.html[ScriptUpdateProcessor]
 allows Java scripting engines to be used
+during Solr document update processing, allowing dramatic flexibility in
+expressing custom document processing logic before being indexed.  It has 
hooks to the
+commit, delete, rollback, etc indexing actions, however add is the most common 
usage.
+It is implemented as an UpdateProcessor to be placed in an UpdateChain.
+
+TIP: This used to be known as the _StatelessScriptingUpdateProcessor_ and was 
renamed to clarify the key aspect of this update processor is it enables 
scripting.
+
+The script can be written in any scripting language supported by your JVM (such
+as JavaScript), and executed dynamically so no pre-compilation is necessary.
+
+WARNING: Being able to run a script of your choice as part of the indexing 
pipeline is a really powerful tool, that I sometimes call the
+_Get out of jail free_ card because you can solve some problems this way that 
you can't in any other way.  However, you are introducing some
+potential security vulnerabilities.
+
+== Installing the ScriptingUpdateProcessor and Scripting Engines
+
+The scripting update processor lives in the contrib module 
`/contrib/scripting`, and you need to explicitly add it to your Solr setup.
+
+Java 11 and previous versions come with a JavaScript engine called Nashorn, 
but Java 12 will require you to add your own JavaScript engine.   Other 
supported scripting engines like
+JRuby, Jython, Groovy, all require you to add JAR files.
+
+
+You can either add the `dist/solr-scripting-*.jar` file into Solr’s resource 
loader in a core `lib/` directory, or via `<lib>` directives in 
`solrconfig.xml`:
+
+[source,xml]
+
+
+
+
+Likewise you will need to add some JAR files depending on which scripting 
engines you choose.
+
+
+== Configuration
+
+[source,xml]
+
+
+   
+ update-script.js
+   
+   
+   
+   
+ 
+

[GitHub] [lucene-solr] s1monw commented on a change in pull request #2212: LUCENE-9669: Add an expert API to allow opening indices created < N-1

2021-01-18 Thread GitBox


s1monw commented on a change in pull request #2212:
URL: https://github.com/apache/lucene-solr/pull/2212#discussion_r559614818



##
File path: lucene/core/src/java/org/apache/lucene/index/DirectoryReader.java
##
@@ -104,6 +104,23 @@ public static DirectoryReader open(final IndexCommit 
commit) throws IOException
 return StandardDirectoryReader.open(commit.getDirectory(), commit);
   }
 
+  /**
+   * Expert: returns an IndexReader reading the index in the given {@link 
IndexCommit}. This method
+   * allows to open indices that were created wih a Lucene version older than 
N-1 provided that all
+   * all codecs for this index are available in the classpath and the segment 
file format used was
+   * created with Lucene 7 or older. Users of this API must be aware that 
Lucene doesn't guarantee

Review comment:
   👍 








[GitHub] [lucene-solr] s1monw commented on a change in pull request #2212: LUCENE-9669: Add an expert API to allow opening indices created < N-1

2021-01-18 Thread GitBox


s1monw commented on a change in pull request #2212:
URL: https://github.com/apache/lucene-solr/pull/2212#discussion_r559618608



##
File path: lucene/core/src/java/org/apache/lucene/index/CheckIndex.java
##
@@ -3900,6 +3907,12 @@ public static Options parseOptions(String[] args) {
 }
 i++;
 opts.dirImpl = args[i];
+  } else if ("-min_version_created".equals(args[i])) {
+if (i == args.length - 1) {
+  throw new IllegalArgumentException("ERROR: missing value for 
-min_version_created");

Review comment:
   I am all for always allowing old indices and removing this option. WDYT?








[GitHub] [lucene-solr] s1monw commented on a change in pull request #2212: LUCENE-9669: Add an expert API to allow opening indices created < N-1

2021-01-18 Thread GitBox


s1monw commented on a change in pull request #2212:
URL: https://github.com/apache/lucene-solr/pull/2212#discussion_r559620857



##
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
##
@@ -1009,6 +1009,14 @@ public IndexWriter(Directory d, IndexWriterConfig conf) 
throws IOException {
 changed();
 
   } else if (reader != null) {
+if (reader.segmentInfos.getIndexCreatedVersionMajor() < 
Version.MIN_SUPPORTED_MAJOR) {

Review comment:
   yes we verify that it's the same major as the index we are adding to.








[GitHub] [lucene-solr] s1monw commented on pull request #2212: LUCENE-9669: Add an expert API to allow opening indices created < N-1

2021-01-18 Thread GitBox


s1monw commented on pull request #2212:
URL: https://github.com/apache/lucene-solr/pull/2212#issuecomment-762301076


   @mikemccand pushed changes 
   






[GitHub] [lucene-solr] timatbw commented on a change in pull request #2216: SOLR-15085 Prevent EmbeddedSolrServer calling shutdown on a CoreConta…

2021-01-18 Thread GitBox


timatbw commented on a change in pull request #2216:
URL: https://github.com/apache/lucene-solr/pull/2216#discussion_r559642959



##
File path: 
solr/core/src/java/org/apache/solr/client/solrj/embedded/EmbeddedSolrServer.java
##
@@ -71,6 +71,7 @@
   protected final String coreName;
   private final SolrRequestParsers _parser;
   private final RequestWriterSupplier supplier;
+  private boolean containerIsLocal = false;

Review comment:
   I tried to do that, but it gets awkward because there are 5 constructors 
and one calls another which calls another. I'd have to refactor all of them to 
call a private constructor instead, to avoid changing the external constructor 
parameters. Do you think I should do that?
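
For reference, the private-constructor refactoring mentioned here might look like the sketch below. All names (`CoreRegistry`, `RegistryClient`, `ownsRegistry`) are hypothetical and only two public constructors are shown; the real class has five with different parameters.

```java
// Hypothetical stand-in for the container type.
final class CoreRegistry {}

// Hypothetical stand-in for the client class under discussion.
final class RegistryClient {
  private final CoreRegistry registry;
  private final String coreName;
  // Can be final because it is assigned exactly once, in the private constructor.
  private final boolean registryIsLocal;

  // Existing external signature: no registry supplied, so one is created and owned.
  RegistryClient(String coreName) {
    this(new CoreRegistry(), coreName, true);
  }

  // Existing external signature: the caller supplies and owns the registry.
  RegistryClient(CoreRegistry registry, String coreName) {
    this(registry, coreName, false);
  }

  // The single place where every field is assigned.
  private RegistryClient(CoreRegistry registry, String coreName, boolean registryIsLocal) {
    this.registry = registry;
    this.coreName = coreName;
    this.registryIsLocal = registryIsLocal;
  }

  boolean ownsRegistry() {
    return registryIsLocal;
  }

  String getCoreName() {
    return coreName;
  }

  CoreRegistry getRegistry() {
    return registry;
  }
}
```

The external constructor parameters stay unchanged; only the delegation target moves, at the cost of touching every public constructor.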








[GitHub] [lucene-solr] epugh commented on a change in pull request #2215: SOLR-14067: v3 Create /contrib/scripting module with ScriptingUpdateProcessor

2021-01-18 Thread GitBox


epugh commented on a change in pull request #2215:
URL: https://github.com/apache/lucene-solr/pull/2215#discussion_r559661956



##
File path: solr/solr-ref-guide/src/scripting-update-processor.adoc
##
@@ -0,0 +1,295 @@
+= Scripting Update Processor

Review comment:
   Will update the name and the file.  I added this file to the Ref Guide; 
however, much (most?) of the content was sourced from the old Solr cwiki.  Part 
of my goal in this work is to raise the profile of this powerful feature, so I 
wanted the great content to be visible.  I did manually test all of this stuff 
(jython, groovy etc) when I first started working on it.








[GitHub] [lucene-solr] noblepaul merged pull request #2177: SOLR-15052: Per-replica states for reducing overseer bottlenecks (trunk)

2021-01-18 Thread GitBox


noblepaul merged pull request #2177:
URL: https://github.com/apache/lucene-solr/pull/2177


   






[jira] [Commented] (SOLR-15052) Reducing overseer bottlenecks using per-replica states

2021-01-18 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267383#comment-17267383
 ] 

ASF subversion and git services commented on SOLR-15052:


Commit 8505d4d416fdf707bab55bc4da9a71ddb3374274 in lucene-solr's branch 
refs/heads/master from Noble Paul
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8505d4d ]

SOLR-15052: Per-replica states for reducing overseer bottlenecks (trunk) (#2177)



> Reducing overseer bottlenecks using per-replica states
> --
>
> Key: SOLR-15052
> URL: https://issues.apache.org/jira/browse/SOLR-15052
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Assignee: Noble Paul
>Priority: Major
> Fix For: 8.8
>
> Attachments: per-replica-states-gcp.pdf
>
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> This work has the same goal as SOLR-13951, that is to reduce overseer 
> bottlenecks by avoiding replica state updates from going to the state.json 
> via the overseer. However, the approach taken here is different from 
> SOLR-13951 and hence this work supersedes that work.
> The design proposed is here: 
> https://docs.google.com/document/d/1xdxpzUNmTZbk0vTMZqfen9R3ArdHokLITdiISBxCFUg/edit
> Briefly,
> # Every replica's state will be in a separate znode nested under the 
> state.json. Its name encodes the replica name, state, and leadership 
> status.
> # An additional children watcher to be set on state.json for state changes.
> # Upon a state change, a ZK multi-op to delete the previous znode and add a 
> new znode with new state.
> Differences between this and SOLR-13951,
> # In SOLR-13951, we planned to leverage shard terms for per shard states.
> # As a consequence, the code changes required for SOLR-13951 were massive (we 
> needed a shard state provider abstraction and introduce it everywhere in the 
> codebase).
> # This approach is a drastically simpler change and design.
> Credit for this design and the PR is due to [~noble.paul]. 
> [~markrmil...@gmail.com], [~noble.paul] and I have collaborated on this 
> effort. The reference branch takes a conceptually similar (but not identical) 
> approach.
> I shall attach a PR and performance benchmarks shortly.
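
The three design steps quoted above can be illustrated with a small, self-contained sketch. It does not use the ZooKeeper client and is not the code from the PR: the znode naming scheme (`replica:state[:L]`), the class name, and the method names are all hypothetical, chosen only to show how a state change becomes one delete plus one create under `state.json`.

```java
import java.util.ArrayList;
import java.util.List;

class PerReplicaStateSketch {
  /** Hypothetical naming scheme: replica name, one-letter state, optional ":L" for the leader. */
  static String znodeName(String replica, char state, boolean leader) {
    return replica + ":" + state + (leader ? ":L" : "");
  }

  /** Describes the two operations that would be sent together as one ZooKeeper multi-op. */
  static List<String> stateChangeOps(String stateJsonPath, String replica,
                                     char oldState, boolean wasLeader,
                                     char newState, boolean isLeader) {
    List<String> ops = new ArrayList<>();
    // Remove the child znode that encodes the previous state...
    ops.add("delete " + stateJsonPath + "/" + znodeName(replica, oldState, wasLeader));
    // ...and add a child znode that encodes the new state.
    ops.add("create " + stateJsonPath + "/" + znodeName(replica, newState, isLeader));
    return ops;
  }

  public static void main(String[] args) {
    // Example path and replica name are made up for illustration.
    for (String op : stateChangeOps("/collections/c1/state.json", "core_node1",
                                    'D', false, 'A', true)) {
      System.out.println(op);
    }
  }
}
```

Because both operations touch only children of `state.json`, a children watcher on that node is enough to observe the change, which is the point of step 2 in the description.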



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Commented] (LUCENE-9673) The level of IntBlockPool slice is always 1

2021-01-18 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267389#comment-17267389
 ] 

Michael McCandless commented on LUCENE-9673:


Whoa, this is a horrible and probably ancient bug!

The slices are supposed to increase in size as ints are appended to the logical 
single (chunked) stream, to make the overhead lower the larger the number of 
ints stored.

Does {{ByteBlockPool}} have the same issue?

> The level of IntBlockPool slice is always 1 
> 
>
> Key: LUCENE-9673
> URL: https://issues.apache.org/jira/browse/LUCENE-9673
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/other
>Reporter: mashudong
>Priority: Minor
>
> The first slice is allocated by IntBlockPool.newSlice(), and its level is 1:
>  
> {code:java}
> private int newSlice(final int size) {
>   if (intUpto > INT_BLOCK_SIZE - size) {
>     nextBuffer();
>     assert assertSliceBuffer(buffer);
>   }
>
>   final int upto = intUpto;
>   intUpto += size;
>   buffer[intUpto - 1] = 1;
>   return upto;
> }{code}
>  
>  
> If one slice is not enough, IntBlockPool.allocSlice() is called to allocate 
> more slices.
> As the following code shows, level is 1 and newLevel is NEXT_LEVEL_ARRAY[0], 
> which is also 1.
>  
> The result is that the level of an IntBlockPool slice is always 1: the first 
> slice is 2 ints long, and all subsequent slices are 4 ints long.
>  
> {code:java}
> private static final int[] NEXT_LEVEL_ARRAY = {1, 2, 3, 4, 5, 6, 7, 8, 9, 9};
>
> private int allocSlice(final int[] slice, final int sliceOffset) {
>   final int level = slice[sliceOffset];
>   final int newLevel = NEXT_LEVEL_ARRAY[level - 1];
>   final int newSize = LEVEL_SIZE_ARRAY[newLevel];
>   // Maybe allocate another block
>   if (intUpto > INT_BLOCK_SIZE - newSize) {
>     nextBuffer();
>     assert assertSliceBuffer(buffer);
>   }
>
>   final int newUpto = intUpto;
>   final int offset = newUpto + intOffset;
>   intUpto += newSize;
>   // Write forwarding address at end of last slice:
>   slice[sliceOffset] = offset;
>
>   // Write new level:
>   buffer[intUpto - 1] = newLevel;
>
>   return newUpto;
> }
> {code}
>  
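
The level progression described in the report can be reproduced in isolation. The sketch below is not Lucene code: it only replays the marker arithmetic from `newSlice()` and `allocSlice()`. `NEXT_LEVEL_ARRAY` is copied from the report; the `LEVEL_SIZE_ARRAY` values beyond the first two are an assumption, and the `markerOffset` parameter is a hypothetical way to compare the reported behaviour with one possible correction (storing `newLevel + 1` so that the `level - 1` lookup advances). It is not necessarily the fix that was committed.

```java
import java.util.Arrays;

class SliceLevelSketch {
  // Copied from the report.
  static final int[] NEXT_LEVEL_ARRAY = {1, 2, 3, 4, 5, 6, 7, 8, 9, 9};
  // Assumed values: the report only states the first two sizes (2 and 4).
  static final int[] LEVEL_SIZE_ARRAY = {2, 4, 8, 16, 32, 64, 128, 256, 512, 1024};

  /**
   * Returns the sizes of the first n slices of one logical stream.
   * markerOffset is added to newLevel before it is stored at the end of a slice:
   * 0 replays the reported code, 1 is the hypothetical correction.
   */
  static int[] sliceSizes(int n, int markerOffset) {
    int[] sizes = new int[n];
    if (n == 0) {
      return sizes;
    }
    int marker = 1;                  // newSlice() writes 1 at the end of the first slice
    sizes[0] = LEVEL_SIZE_ARRAY[0];  // size of the first slice
    for (int i = 1; i < n; i++) {
      int newLevel = NEXT_LEVEL_ARRAY[marker - 1];  // same lookup as allocSlice()
      sizes[i] = LEVEL_SIZE_ARRAY[newLevel];
      marker = newLevel + markerOffset;             // value stored at the end of the new slice
    }
    return sizes;
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(sliceSizes(5, 0)));  // prints [2, 4, 4, 4, 4]
    System.out.println(Arrays.toString(sliceSizes(5, 1)));  // prints [2, 4, 8, 16, 32]
  }
}
```

With offset 0 the stored marker never leaves 1, so every slice after the first has the level-1 size, which matches the behaviour described in the report.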






[GitHub] [lucene-solr] epugh commented on a change in pull request #2215: SOLR-14067: v3 Create /contrib/scripting module with ScriptingUpdateProcessor

2021-01-18 Thread GitBox


epugh commented on a change in pull request #2215:
URL: https://github.com/apache/lucene-solr/pull/2215#discussion_r559672654



##
File path: solr/contrib/scripting/README.md
##
@@ -0,0 +1,14 @@
+Welcome to Apache Solr Scripting!
+===
+
+# Introduction
+
+The Scripting contrib module pulls together various scripting related 
functions.  
+
+Today, the ScriptUpdateProcessorFactory allows Java scripting engines to be 
used during the Solr document update processing, allowing dramatic flexibility 
in expressing custom document processing before being indexed.  It also allows 
hooks to commit, delete, etc, but add is the most common usage.  It is 
implemented as an UpdateProcessor to be placed in an UpdateChain.

Review comment:
   Thanks, reworked this.








[jira] [Commented] (LUCENE-9670) gradle precommit sometimes fails with "IOException: stream closed" from javadoc in nightly benchmarks

2021-01-18 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267390#comment-17267390
 ] 

Michael McCandless commented on LUCENE-9670:


Thanks [~dweiss]; I'll try that.

> gradle precommit sometimes fails with "IOException: stream closed" from 
> javadoc in nightly benchmarks
> -
>
> Key: LUCENE-9670
> URL: https://issues.apache.org/jira/browse/LUCENE-9670
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael McCandless
>Priority: Major
>
> I recently added tracking how long {{gradle precommit}} takes each night so 
> we can track slowdowns over time.
> But it sometimes fails with:
> {noformat}
> > Task :lucene:join:renderJavadoc FAILED
> Could not read standard output of command '/opt/jdk-15.0.1/bin/javadoc'.
> java.io.IOException: Stream Closed
>         at java.base/java.io.FileOutputStream.writeBytes(Native Method)
>         at java.base/java.io.FileOutputStream.write(FileOutputStream.java:347)
>         at 
> java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
>         at 
> java.base/java.io.BufferedOutputStream.flush(BufferedOutputStream.java:142)
>         at 
> org.gradle.process.internal.streams.ExecOutputHandleRunner.forwardContent(ExecOutputHandleRunner.java:68)
>         at 
> org.gradle.process.internal.streams.ExecOutputHandleRunner.run(ExecOutputHandleRunner.java:53)
>         at 
> org.gradle.internal.operations.CurrentBuildOperationPreservingRunnable.run(CurrentBuildOperationPreservingRunnable.java:42)
>         at 
> org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
>         at 
> org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:48)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
>         at 
> org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:56)
>         at java.base/java.lang.Thread.run(Thread.java:832) {noformat}
> I'm not sure why ... when I run {{./gradlew precommit}} interactively it 
> doesn't seem to do this.
> The nightly tool is quite simple – it just launches a sub-process using 
> {{os.system}}: (first to {{git clean}} then to run {{./gradlew precommit)}}: 
> https://github.com/mikemccand/luceneutil/blob/master/src/python/runNightlyGradleTestPrecommit.py






[GitHub] [lucene-solr] epugh commented on a change in pull request #2215: SOLR-14067: v3 Create /contrib/scripting module with ScriptingUpdateProcessor

2021-01-18 Thread GitBox


epugh commented on a change in pull request #2215:
URL: https://github.com/apache/lucene-solr/pull/2215#discussion_r559674364



##
File path: solr/CHANGES.txt
##
@@ -186,6 +186,9 @@ Other Changes
 
 * SOLR-14034: Remove deprecated min_rf references (Tim Dillon)
 
+* SOLR-14067: StatelessScriptUpdateProcessor moved to it's own 
/contrib/scripting/ package instead
+ of shipping as part of Solr due to security concerns.  Renamed to 
ScriptingUpdateProcessor. (Eric Pugh)

Review comment:
   I don't love that we have the `Factory` suffix, as that feels like an 
implementation detail of how update processors work.
   
   I wish we could refer to this as the `ScriptUpdateProcessor`, even though 
this specific class is buried in the file `ScriptUpdateProcessorFactory.java` 
as:
   
   ```
   private static class ScriptUpdateProcessor extends UpdateRequestProcessor
   ```
   
   Would it be worth pulling the inner class into its own 
`ScriptUpdateProcessor.java` file?   
   
   Then we could just link to that file everywhere.  Thoughts?








[GitHub] [lucene-solr] epugh commented on a change in pull request #2215: SOLR-14067: v3 Create /contrib/scripting module with ScriptingUpdateProcessor

2021-01-18 Thread GitBox


epugh commented on a change in pull request #2215:
URL: https://github.com/apache/lucene-solr/pull/2215#discussion_r559695973



##
File path: 
solr/server/solr/configsets/sample_techproducts_configs/conf/solrconfig.xml
##
@@ -674,12 +679,12 @@
  *** WARNING ***
  Before enabling remote streaming, you should make sure your
  system has authentication enabled.
-
-
+http://localhost:8983/solr/techproducts/update?commit=true&stream.contentType=text/csv&fieldnames=id,description&stream.body=1,foo&update.chain=script
   ```
   
   If this is too dangerous, I could revert this change and document the need 
to make the change in the directions.  It's just one more barrier to easily 
trying the feature with the tech products example.   








[GitHub] [lucene-solr] epugh commented on a change in pull request #2215: SOLR-14067: v3 Create /contrib/scripting module with ScriptingUpdateProcessor

2021-01-18 Thread GitBox


epugh commented on a change in pull request #2215:
URL: https://github.com/apache/lucene-solr/pull/2215#discussion_r559696636



##
File path: solr/solr-ref-guide/src/scripting-update-processor.adoc
##
@@ -0,0 +1,295 @@
+= Scripting Update Processor

Review comment:
   Changes made!








[GitHub] [lucene-solr] epugh commented on a change in pull request #2215: SOLR-14067: v3 Create /contrib/scripting module with ScriptingUpdateProcessor

2021-01-18 Thread GitBox


epugh commented on a change in pull request #2215:
URL: https://github.com/apache/lucene-solr/pull/2215#discussion_r559696904



##
File path: solr/solr-ref-guide/src/scripting-update-processor.adoc
##
@@ -0,0 +1,295 @@
+= Scripting Update Processor
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+The 
{solr-javadocs}/contrib/scripting/org/apache/solr/scripting/update/ScriptUpdateProcessorFactory.html[ScriptUpdateProcessor]
 allows Java scripting engines to be used
+during Solr document update processing, allowing dramatic flexibility in
+expressing custom document processing logic before being indexed.  It has 
hooks to the
+commit, delete, rollback, etc indexing actions, however add is the most common 
usage.
+It is implemented as an UpdateProcessor to be placed in an UpdateChain.
+
+TIP: This used to be known as the _StatelessScriptingUpdateProcessor_ and was 
renamed to clarify the key aspect of this update processor is it enables 
scripting.
+
+The script can be written in any scripting language supported by your JVM (such
+as JavaScript), and executed dynamically so no pre-compilation is necessary.
+
+WARNING: Being able to run a script of your choice as part of the indexing 
pipeline is a really powerful tool, that I sometimes call the
+_Get out of jail free_ card because you can solve some problems this way that 
you can't in any other way.  However, you are introducing some
+potential security vulnerabilities.
+
+== Installing the ScriptingUpdateProcessor and Scripting Engines
+
+The scripting update processor lives in the contrib module 
`/contrib/scripting`, and you need to explicitly add it to your Solr setup.
+
+Java 11 and previous versions come with a JavaScript engine called Nashorn, 
but Java 12 will require you to add your own JavaScript engine.   Other 
supported scripting engines like
+JRuby, Jython, Groovy, all require you to add JAR files.
+
+
+You can either add the `dist/solr-scripting-*.jar` file into Solr’s resource 
loader in a core `lib/` directory, or via `` directives in 
`solrconfig.xml`:
+
+[source,xml]
+
+
+
+
+Likewise you will need to add some JAR files depending on which scripting 
engines you choose.
+
+
+== Configuration
+
+[source,xml]
+
+
+   
+ update-script.js
+   
+   
+   
+   
+ 
+
+
+NOTE: The processor supports the defaults/appends/invariants concept for its 
config.
+However, it is also possible to skip this level and configure the parameters 
directly underneath the `` tag.
+
+Below follows a list of each configuration parameters and their meaning:
+
+`script`::
+The script file name. The script file must be placed in the `conf/ directory.
+There can be one or more "script" parameters specified; multiple scripts are 
executed in the order specified.
+
+`engine`::
+Optionally specifies the scripting engine to use. This is only needed if the 
extension
+of the script file is not a standard mapping to the scripting engine. For 
example, if your
+script file was coded in JavaScript but the file name was called 
`update-script.foo`,
+use "javascript" as the engine name.
+
+`params`::
+Optional parameters that are passed into the script execution context. This is
+specified as a named list (``) structure with nested typed parameters. If
+specified, the script context will get a "params" object, otherwise there will 
be no "params" object available.
+
+
+== Script execution context
+
+Every script has some variables provided to it.
+
+`logger`::
+Logger (org.slf4j.Logger) instance. This is useful for logging information 
from the script.
+
+`req`::
+{solr-javadocs}/core/org/apache/solr/response/SolrQueryResponse.html[SolrQueryRequest]
 instance.
+
+`rsp`::
+{solr-javadocs}/core/org/apache/solr/response/SolrQueryResponse.html[SolrQueryResponse]
 instance.
+
+`params`::
+The "params" object, if any specified, from the configuration.
+
+== Examples
+
+The `processAdd()` and the other script methods can return false to skip 
further
+processing of the document. All methods must be defined, though generally the
+`processAdd()` method is where the action is.
+
+Here's a URL that works with the techproducts example setup demonstrating 
specifying
+the "script" update chain: 
`ht

[GitHub] [lucene-solr] dsmiley commented on a change in pull request #2215: SOLR-14067: v3 Create /contrib/scripting module with ScriptingUpdateProcessor

2021-01-18 Thread GitBox


dsmiley commented on a change in pull request #2215:
URL: https://github.com/apache/lucene-solr/pull/2215#discussion_r559731569



##
File path: solr/CHANGES.txt
##
@@ -186,6 +186,9 @@ Other Changes
 
 * SOLR-14034: Remove deprecated min_rf references (Tim Dillon)
 
+* SOLR-14067: StatelessScriptUpdateProcessor moved to it's own 
/contrib/scripting/ package instead
+ of shipping as part of Solr due to security concerns.  Renamed to 
ScriptingUpdateProcessor. (Eric Pugh)

Review comment:
   I agree 100% on seeing \*Factory all over the config being poor for the 
reason you gave.  It could also be argued that even the "UpdateProcessor" part 
is quite redundant based on where we declare it.  Have you noticed changes in 
Lucene to how schema analysis components are resolved, affecting the Solr 
schema (master only)?  See 
solr/server/solr/configsets/_default/conf/managed-schema -- `` it's beautiful.  No "FilterFactory" suffix.  Eventually I 
hope we can take the same approach throughout Solr.  Lucene uses an SPI 
approach which means a special file listing each implementation.  Something 
like that could be embraced.  No need to separate the factory from inner class 
over this.








[GitHub] [lucene-solr] dsmiley commented on a change in pull request #2215: SOLR-14067: v3 Create /contrib/scripting module with ScriptingUpdateProcessor

2021-01-18 Thread GitBox


dsmiley commented on a change in pull request #2215:
URL: https://github.com/apache/lucene-solr/pull/2215#discussion_r559733095



##
File path: 
solr/server/solr/configsets/sample_techproducts_configs/conf/solrconfig.xml
##
@@ -674,12 +679,12 @@
  *** WARNING ***
  Before enabling remote streaming, you should make sure your
  system has authentication enabled.
-
-
+

[jira] [Created] (LUCENE-9676) Hunspell: improve stemming of all-caps words

2021-01-18 Thread Peter Gromov (Jira)
Peter Gromov created LUCENE-9676:


 Summary: Hunspell: improve stemming of all-caps words
 Key: LUCENE-9676
 URL: https://issues.apache.org/jira/browse/LUCENE-9676
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Peter Gromov


Currently words like "OPENOFFICE.ORG" result in no stems even if the dictionary 
contains "OpenOffice.org"






[GitHub] [lucene-solr] dsmiley commented on a change in pull request #2215: SOLR-14067: v3 Create /contrib/scripting module with ScriptingUpdateProcessor

2021-01-18 Thread GitBox


dsmiley commented on a change in pull request #2215:
URL: https://github.com/apache/lucene-solr/pull/2215#discussion_r559735328



##
File path: solr/solr-ref-guide/src/script-update-processor.adoc
##
@@ -35,19 +35,19 @@ potential security vulnerabilities.
 
 The scripting update processor lives in the contrib module 
`/contrib/scripting`, and you need to explicitly add it to your Solr setup.
 
-Java 11 and previous versions come with a JavaScript engine called Nashorn, 
but Java 12 will require you to add your own JavaScript engine.   Other 
supported scripting engines like
-JRuby, Jython, Groovy, all require you to add JAR files.
-
-
-You can either add the `dist/solr-scripting-*.jar` file into Solr’s resource 
loader in a core `lib/` directory, or via `` directives in 
`solrconfig.xml`:
+You can either add the `dist/solr-scripting-*.jar` file into Solr’s core 
`lib/` directory, or via `` directives in `solrconfig.xml`:

Review comment:
   It's more probable someone would use SOLR_HOME/lib than a lib directory 
on a core.  And FYI `` directives may be going away or be discouraged.  
Let's link to `<>` instead of 
enumerating how to do this, so we can just maintain this sort of info in one 
place in the ref guide.

##
File path: solr/solr-ref-guide/src/script-update-processor.adoc
##
@@ -267,8 +267,8 @@ def finish() {
 }
 
 
-=== Jython
-
+=== Python
+Python support is implemented via the https://www.jython.org/[Jython] project.
 Put the *standalone* `jython.jar` (the JAR that contains all the dependencies) 
into Solr's resource loader.

Review comment:
   Please remove "resource loader" from this page.








[GitHub] [lucene-solr] donnerpeter opened a new pull request #2217: LUCENE-9676: Hunspell: improve stemming of all-caps words

2021-01-18 Thread GitBox


donnerpeter opened a new pull request #2217:
URL: https://github.com/apache/lucene-solr/pull/2217


   
   
   
   # Description
   
   Currently words like "OPENOFFICE.ORG" result in no stems even if the 
dictionary contains "OpenOffice.org"
   
   # Solution
   
   Repeat Hunspell's logic:
   * when encountering a mixed- or (inflectable) all-case dictionary entry, add 
its title-case analog as a hidden entry
   * use that hidden entry for stemming case variants for title- and uppercase 
words, but don't consider it a valid word itself
   * ...unless there's another explicit dictionary entry of that title case
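
The hidden-entry idea in the list above can be sketched as a toy lookup. This is a simplification under stated assumptions, not the Lucene Hunspell implementation: affixes and inflection are ignored, the dictionary is a plain map, and the class and method names (`AllCapsLookupSketch`, `addEntry`, `stem`, `isValidWord`) are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class AllCapsLookupSketch {
  private final Map<String, String> stems = new HashMap<>();  // lookup key -> stem to report
  private final Set<String> hidden = new HashSet<>();         // keys that are not valid words

  static String titleCase(String w) {
    if (w.isEmpty()) {
      return w;
    }
    return w.substring(0, 1).toUpperCase(Locale.ROOT) + w.substring(1).toLowerCase(Locale.ROOT);
  }

  void addEntry(String word) {
    stems.put(word, word);
    hidden.remove(word);  // an explicit entry always wins over a hidden one
    String title = titleCase(word);
    boolean mixedOrAllCaps = !word.equals(title) && !word.equals(word.toLowerCase(Locale.ROOT));
    if (mixedOrAllCaps && !stems.containsKey(title)) {
      stems.put(title, word);  // hidden title-case analog pointing back at the real entry
      hidden.add(title);
    }
  }

  /** Returns the stem for the word, or null when there is none. */
  String stem(String word) {
    String exact = stems.get(word);
    if (exact != null) {
      return exact;
    }
    if (word.equals(word.toUpperCase(Locale.ROOT))) {
      return stems.get(titleCase(word));  // all-caps input falls back to the title-case key
    }
    return null;
  }

  boolean isValidWord(String word) {
    return stems.containsKey(word) && !hidden.contains(word);
  }

  public static void main(String[] args) {
    AllCapsLookupSketch dict = new AllCapsLookupSketch();
    dict.addEntry("OpenOffice.org");
    System.out.println(dict.stem("OPENOFFICE.ORG"));         // prints OpenOffice.org
    System.out.println(dict.isValidWord("Openoffice.org"));  // prints false
  }
}
```

The separate `hidden` set is what lets the title-case key serve stemming without being accepted as a word, and an explicit dictionary entry of the same spelling clears that flag.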
   
   # Tests
   
   Adapted `allcaps` from Hunspell C++ repository, corrected existing 
`TestEscaped` to match Hunspell's behavior.
   # Checklist
   
   Please review the following and check all that apply:
   
   - [ ] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [ ] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [ ] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [ ] I have developed this patch against the `master` branch.
   - [ ] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   






[jira] [Updated] (LUCENE-9671) Hunspell: shorten Stemmer.applyAffix

2021-01-18 Thread Peter Gromov (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Gromov updated LUCENE-9671:
-
Status: Patch Available  (was: Open)

> Hunspell: shorten Stemmer.applyAffix
> 
>
> Key: LUCENE-9671
> URL: https://issues.apache.org/jira/browse/LUCENE-9671
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Peter Gromov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>







[jira] [Updated] (LUCENE-9676) Hunspell: improve stemming of all-caps words

2021-01-18 Thread Peter Gromov (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Gromov updated LUCENE-9676:
-
Status: Patch Available  (was: Open)

> Hunspell: improve stemming of all-caps words
> 
>
> Key: LUCENE-9676
> URL: https://issues.apache.org/jira/browse/LUCENE-9676
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Peter Gromov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently words like "OPENOFFICE.ORG" result in no stems even if the 
> dictionary contains "OpenOffice.org"






[jira] [Commented] (SOLR-15052) Reducing overseer bottlenecks using per-replica states

2021-01-18 Thread Ilan Ginzburg (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267451#comment-17267451
 ] 

Ilan Ginzburg commented on SOLR-15052:
--

[~ichattopadhyaya], following up on [~mdrob]'s message above about performance 
testing, have you looked at handling of DOWNNODE messages? Under some 
conditions (many replicas on each node for each collection) I believe the per 
replica state can end up being slower than a single state.json update. Moreover 
the current implementation serializes all such updates (this can of course be 
improved later).
I believe that's the most unfavorable case for the per replica state strategy.

> Reducing overseer bottlenecks using per-replica states
> --
>
> Key: SOLR-15052
> URL: https://issues.apache.org/jira/browse/SOLR-15052
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Assignee: Noble Paul
>Priority: Major
> Fix For: 8.8
>
> Attachments: per-replica-states-gcp.pdf
>
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> This work has the same goal as SOLR-13951, that is to reduce overseer 
> bottlenecks by avoiding replica state updates from going to the state.json 
> via the overseer. However, the approach taken here is different from 
> SOLR-13951 and hence this work supersedes that work.
> The design proposed is here: 
> https://docs.google.com/document/d/1xdxpzUNmTZbk0vTMZqfen9R3ArdHokLITdiISBxCFUg/edit
> Briefly,
> # Every replica's state will be in a separate znode nested under the 
> state.json. Its name encodes the replica name, state, and leadership 
> status.
> # An additional children watcher to be set on state.json for state changes.
> # Upon a state change, a ZK multi-op to delete the previous znode and add a 
> new znode with new state.
> Differences between this and SOLR-13951,
> # In SOLR-13951, we planned to leverage shard terms for per shard states.
> # As a consequence, the code changes required for SOLR-13951 were massive (we 
> needed a shard state provider abstraction and introduce it everywhere in the 
> codebase).
> # This approach is a drastically simpler change and design.
> Credit for this design and the PR is due to [~noble.paul]. 
> [~markrmil...@gmail.com], [~noble.paul] and I have collaborated on this 
> effort. The reference branch takes a conceptually similar (but not identical) 
> approach.
> I shall attach a PR and performance benchmarks shortly.






[GitHub] [lucene-solr] murblanc commented on a change in pull request #2199: SOLR-15055 (Take 2) Re-implement 'withCollection'

2021-01-18 Thread GitBox


murblanc commented on a change in pull request #2199:
URL: https://github.com/apache/lucene-solr/pull/2199#discussion_r559740603



##
File path: solr/core/src/java/org/apache/solr/cloud/ExclusiveSliceProperty.java
##
@@ -74,8 +74,8 @@
   ExclusiveSliceProperty(ClusterState clusterState, ZkNodeProps message) {
 this.clusterState = clusterState;
 String tmp = message.getStr(ZkStateReader.PROPERTY_PROP);
-if (StringUtils.startsWith(tmp, 
OverseerCollectionMessageHandler.COLL_PROP_PREFIX) == false) {
-  tmp = OverseerCollectionMessageHandler.COLL_PROP_PREFIX + tmp;
+if (StringUtils.startsWith(tmp, CollectionAdminParams.PROPERTY_PREFIX) == 
false) {

Review comment:
   Minor: use `!` rather than `== false` (I know it's old code but you 
touched it :)








[jira] [Updated] (SOLR-7913) Add stream.body support to MLT QParser

2021-01-18 Thread Isabelle Giguere (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isabelle Giguere updated SOLR-7913:
---
Attachment: SOLR-7913.patch

> Add stream.body support to MLT QParser
> --
>
> Key: SOLR-7913
> URL: https://issues.apache.org/jira/browse/SOLR-7913
> Project: Solr
>  Issue Type: Improvement
>Reporter: Anshum Gupta
>Priority: Major
> Attachments: SOLR-7913.patch, SOLR-7913.patch, SOLR-7913.patch, 
> SOLR-7913.patch, SOLR-7913_fixTests.patch, SOLR-7913_tag_7.5.0.patch
>
>
> Continuing from 
> https://issues.apache.org/jira/browse/SOLR-7639?focusedCommentId=14601011&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14601011.
> It'd be good to have stream.body be supported by the mlt qparser.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[GitHub] [lucene-solr] murblanc commented on a change in pull request #2199: SOLR-15055 (Take 2) Re-implement 'withCollection'

2021-01-18 Thread GitBox


murblanc commented on a change in pull request #2199:
URL: https://github.com/apache/lucene-solr/pull/2199#discussion_r559742269



##
File path: 
solr/core/src/java/org/apache/solr/cloud/api/collections/DeleteCollectionCmd.java
##
@@ -92,6 +98,19 @@ public void call(ClusterState state, ZkNodeProps message, @SuppressWarnings({"ra
   collection = extCollection;
 }
 
+PlacementPlugin placementPlugin = ocmh.overseer.getCoreContainer().getPlacementPluginFactory().createPluginInstance();

Review comment:
   I haven't dug into the details here, but when we delete a collection, 
shouldn't we just check whether another collection defines `withCollection` 
pointing at it, and refuse the delete if so?
   I'm just starting to look at the PR, so maybe there's a reason for doing it 
this way (in which case, maybe add a comment?)
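
   A minimal sketch of the check suggested above, under stated assumptions: 
the `Map` (primary collection name to its `withCollection` target) is a 
hypothetical stand-in for cluster state, and `DeleteVettingSketch`, 
`findDependent` and `vetDelete` are made-up names, not Solr APIs.

```java
import java.util.Map;
import java.util.Optional;

final class DeleteVettingSketch {
  // Returns a collection that is co-located with toDelete via withCollection, if any.
  static Optional<String> findDependent(Map<String, String> withCollectionOf, String toDelete) {
    return withCollectionOf.entrySet().stream()
        .filter(e -> toDelete.equals(e.getValue()))
        .map(Map.Entry::getKey)
        .findFirst();
  }

  // Refuses the delete when another collection still depends on toDelete.
  static void vetDelete(Map<String, String> withCollectionOf, String toDelete) {
    findDependent(withCollectionOf, toDelete).ifPresent(primary -> {
      throw new IllegalStateException("Cannot delete '" + toDelete
          + "': collection '" + primary + "' is co-located with it via withCollection");
    });
  }

  public static void main(String[] args) {
    // Hypothetical state: "orders" was created with withCollection=customers.
    Map<String, String> state = Map.of("orders", "customers");
    vetDelete(state, "orders"); // allowed: nothing depends on "orders"
    try {
      vetDelete(state, "customers"); // refused: "orders" depends on it
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```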








[jira] [Commented] (SOLR-7913) Add stream.body support to MLT QParser

2021-01-18 Thread Isabelle Giguere (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267462#comment-17267462
 ] 

Isabelle Giguere commented on SOLR-7913:


New patch on tag release/lucene-solr/8.5.0.
I forgot to attach it when I upgraded last time.

IMPORTANT: There was a bug in the previous patches.  Their changes to 
SearchHandler and ShardRequest caused the shard request URL to be included in 
the "stream.body" passed to the MLT request.  That is why the results in 
CloudMLTQParserTest differed between the test using an id request 
(testMLTQParser) and the test using stream.body (testMLTQParserStreamBody).

> Add stream.body support to MLT QParser
> --
>
> Key: SOLR-7913
> URL: https://issues.apache.org/jira/browse/SOLR-7913
> Project: Solr
>  Issue Type: Improvement
>Reporter: Anshum Gupta
>Priority: Major
> Attachments: SOLR-7913.patch, SOLR-7913.patch, SOLR-7913.patch, 
> SOLR-7913.patch, SOLR-7913_fixTests.patch, SOLR-7913_tag_7.5.0.patch
>
>
> Continuing from 
> https://issues.apache.org/jira/browse/SOLR-7639?focusedCommentId=14601011&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14601011.
> It'd be good to have stream.body be supported by the mlt qparser.






[jira] [Commented] (LUCENE-9675) Expose the compression mode of the binary doc values

2021-01-18 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267463#comment-17267463
 ] 

David Smiley commented on LUCENE-9675:
--

I noticed the removal of {{meta.writeByte((byte) 0);}} (or 1). Doesn't this 
introduce a backwards-compatibility issue?

> Expose the compression mode of the binary doc values
> 
>
> Key: LUCENE-9675
> URL: https://issues.apache.org/jira/browse/LUCENE-9675
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jim Ferenczi
>Priority: Minor
> Attachments: LUCENE-9675.patch, LUCENE-9675.patch
>
>
> LUCENE-9378 introduced a way to configure the compression mode of the binary 
> doc values.
> This issue is a proposal to expose this information in the attributes of each 
> binary field.
> That would expose this information to external readers on a per-field basis.






[GitHub] [lucene-solr] murblanc commented on a change in pull request #2199: SOLR-15055 (Take 2) Re-implement 'withCollection'

2021-01-18 Thread GitBox


murblanc commented on a change in pull request #2199:
URL: https://github.com/apache/lucene-solr/pull/2199#discussion_r559749942



##
File path: 
solr/core/src/java/org/apache/solr/cloud/api/collections/DeleteReplicaCmd.java
##
@@ -147,14 +170,27 @@ void deleteReplicaBasedOnCount(ClusterState clusterState,
   }
 }
 
+if (placementPlugin != null) {

Review comment:
   By adding `if (placementPlugin != null)` logic to the `*Cmd` classes, we 
break the encapsulation under which placement logic is handled by 
`Assign.AssignStrategy`.
   The only reason `*Cmd` code is currently (before this PR) even aware of 
`PlacementPlugin` is that selecting `PlacementPluginAssignStrategy` (rather 
than `LegacyAssignStrategy`) depends on a plugin being defined...
   I suggest moving all `*Cmd` vetting logic to `Assign.AssignStrategy`, so 
that `*Cmd` only needs to pass the `PlacementPlugin` instance to that code 
(later, when we finally decide how clusters are configured, even this will go 
away, and `*Cmd` stays clean of any strategy-specific behavior).
   
   In `Assign.AssignStrategy` the default vetting logic would be "accept all", 
and in `PlacementPluginAssignStrategy` we can implement whatever checks we like.
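
   Roughly what I have in mind, as a sketch only. The interface shape, the 
`verifyDeleteReplicas` method and the "protected replicas" check are all 
hypothetical illustrations, not the existing `Assign.AssignStrategy` API:

```java
import java.util.Set;

final class VettingSketch {
  /** Hypothetical stand-in for Assign.AssignStrategy. */
  interface AssignStrategy {
    /** Default vetting: accept every request. */
    default void verifyDeleteReplicas(String collection, Set<String> replicaNames) {
      // accept all
    }
  }

  /** Stand-in for LegacyAssignStrategy: inherits the accept-all default. */
  static final class LegacyStrategy implements AssignStrategy {}

  /** Stand-in for PlacementPluginAssignStrategy: applies its own checks. */
  static final class PluginStrategy implements AssignStrategy {
    private final Set<String> protectedReplicas;

    PluginStrategy(Set<String> protectedReplicas) {
      this.protectedReplicas = protectedReplicas;
    }

    @Override
    public void verifyDeleteReplicas(String collection, Set<String> replicaNames) {
      for (String name : replicaNames) {
        if (protectedReplicas.contains(name)) {
          throw new IllegalStateException(
              "Rejected deleting replica " + name + " of collection " + collection);
        }
      }
    }
  }

  /** What a *Cmd class would do: delegate vetting, with no plugin-specific branch. */
  static boolean tryDelete(AssignStrategy strategy, String collection, Set<String> replicaNames) {
    try {
      strategy.verifyDeleteReplicas(collection, replicaNames);
      return true;
    } catch (IllegalStateException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    Set<String> toDelete = Set.of("core_node1");
    System.out.println(tryDelete(new LegacyStrategy(), "products", toDelete));
    System.out.println(tryDelete(new PluginStrategy(Set.of("core_node1")), "products", toDelete));
  }
}
```

   With this shape the command code contains no `if (placementPlugin != null)` 
branch; swapping the strategy changes the vetting behavior.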








[GitHub] [lucene-solr] murblanc commented on a change in pull request #2199: SOLR-15055 (Take 2) Re-implement 'withCollection'

2021-01-18 Thread GitBox


murblanc commented on a change in pull request #2199:
URL: https://github.com/apache/lucene-solr/pull/2199#discussion_r559750891



##
File path: solr/core/src/java/org/apache/solr/cloud/overseer/ReplicaMutator.java
##
@@ -115,8 +116,8 @@ public ZkWriteCommand addReplicaProperty(ClusterState clusterState, ZkNodeProps
 String sliceName = message.getStr(ZkStateReader.SHARD_ID_PROP);
 String replicaName = message.getStr(ZkStateReader.REPLICA_PROP);
 String property = message.getStr(ZkStateReader.PROPERTY_PROP).toLowerCase(Locale.ROOT);
-if (StringUtils.startsWith(property, OverseerCollectionMessageHandler.COLL_PROP_PREFIX) == false) {
-  property = OverseerCollectionMessageHandler.COLL_PROP_PREFIX + property;
+if (StringUtils.startsWith(property, CollectionAdminParams.PROPERTY_PREFIX) == false) {

Review comment:
   `== false` -> `!`








[GitHub] [lucene-solr] murblanc commented on a change in pull request #2199: SOLR-15055 (Take 2) Re-implement 'withCollection'

2021-01-18 Thread GitBox


murblanc commented on a change in pull request #2199:
URL: https://github.com/apache/lucene-solr/pull/2199#discussion_r559751155



##
File path: 
solr/core/src/java/org/apache/solr/cluster/placement/DeleteReplicasRequest.java
##
@@ -0,0 +1,29 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.cluster.placement;
+
+import org.apache.solr.cluster.Replica;
+
+import java.util.Set;
+
+/**
+ *

Review comment:
   Javadoc needed








[GitHub] [lucene-solr] murblanc commented on a change in pull request #2199: SOLR-15055 (Take 2) Re-implement 'withCollection'

2021-01-18 Thread GitBox


murblanc commented on a change in pull request #2199:
URL: https://github.com/apache/lucene-solr/pull/2199#discussion_r559751454



##
File path: 
solr/core/src/java/org/apache/solr/cluster/placement/DeleteShardsRequest.java
##
@@ -0,0 +1,27 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.cluster.placement;
+
+import java.util.Set;
+
+/**
+ *

Review comment:
   Javadoc needed.








[GitHub] [lucene-solr] murblanc commented on a change in pull request #2199: SOLR-15055 (Take 2) Re-implement 'withCollection'

2021-01-18 Thread GitBox


murblanc commented on a change in pull request #2199:
URL: https://github.com/apache/lucene-solr/pull/2199#discussion_r559753902



##
File path: 
solr/core/src/java/org/apache/solr/cluster/placement/DeleteShardsRequest.java
##
@@ -0,0 +1,27 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.cluster.placement;
+
+import java.util.Set;
+
+/**
+ *

Review comment:
   Also, this interface is implemented but the implementation is never used.
   Unless we add a use for it in this PR, I suggest we leave it out until 
we actually need it. I assume we don't need it for `withCollection` because the 
secondary collection has to be single-shard, so that shard will never be deleted.








[jira] [Updated] (SOLR-7913) Add stream.body support to MLT QParser

2021-01-18 Thread Isabelle Giguere (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isabelle Giguere updated SOLR-7913:
---
Attachment: SOLR-7913_fix-unit-test-setup.patch

> Add stream.body support to MLT QParser
> --
>
> Key: SOLR-7913
> URL: https://issues.apache.org/jira/browse/SOLR-7913
> Project: Solr
>  Issue Type: Improvement
>Reporter: Anshum Gupta
>Priority: Major
> Attachments: SOLR-7913.patch, SOLR-7913.patch, SOLR-7913.patch, 
> SOLR-7913.patch, SOLR-7913_fix-unit-test-setup.patch, 
> SOLR-7913_fixTests.patch, SOLR-7913_tag_7.5.0.patch
>
>
> Continuing from 
> https://issues.apache.org/jira/browse/SOLR-7639?focusedCommentId=14601011&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14601011.
> It'd be good to have stream.body be supported by the mlt qparser.






[GitHub] [lucene-solr] murblanc commented on a change in pull request #2199: SOLR-15055 (Take 2) Re-implement 'withCollection'

2021-01-18 Thread GitBox


murblanc commented on a change in pull request #2199:
URL: https://github.com/apache/lucene-solr/pull/2199#discussion_r559755408



##
File path: 
solr/core/src/java/org/apache/solr/cloud/api/collections/DeleteReplicaCmd.java
##
@@ -147,14 +170,27 @@ void deleteReplicaBasedOnCount(ClusterState clusterState,
   }
 }
 
+if (placementPlugin != null) {

Review comment:
   In `Assign.AssignStrategy`, if we want to be exhaustive, we should for 
example reject shard splits of secondary collections that are targets of 
`withCollection` (given that we refuse to let such targets have more than one 
shard).
   I'm not saying we should do it, but the vetting infrastructure we put in 
place should allow logical extension to all these aspects (with minor impact 
on the commands).
   
   Also, pushing all the logic into `Assign.AssignStrategy` and minimizing 
changes to `*Cmd` limits the impact of a regression (switching to 
`LegacyAssignStrategy` should be a workaround for most problems).








[jira] [Comment Edited] (SOLR-7913) Add stream.body support to MLT QParser

2021-01-18 Thread Isabelle Giguere (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267462#comment-17267462
 ] 

Isabelle Giguere edited comment on SOLR-7913 at 1/18/21, 7:10 PM:
--

New patch on tag release/lucene-solr/8.5.0.
I forgot to attach it when I upgraded last time.

IMPORTANT: There was a bug in the previous patches.  Their changes to 
SearchHandler and ShardRequest caused the shard request URL to be included in 
the "stream.body" passed to the MLT request.  That is why the results in 
CloudMLTQParserTest differed between the test using an id request 
(testMLTQParser) and the test using stream.body (testMLTQParserStreamBody).

Apply SOLR-7913_fix-unit-test-setup.patch on top of SOLR-7913.patch.


> Add stream.body support to MLT QParser
> --
>
> Key: SOLR-7913
> URL: https://issues.apache.org/jira/browse/SOLR-7913
> Project: Solr
>  Issue Type: Improvement
>Reporter: Anshum Gupta
>Priority: Major
> Attachments: SOLR-7913.patch, SOLR-7913.patch, SOLR-7913.patch, 
> SOLR-7913.patch, SOLR-7913_fix-unit-test-setup.patch, 
> SOLR-7913_fixTests.patch, SOLR-7913_tag_7.5.0.patch
>
>
> Continuing from 
> https://issues.apache.org/jira/browse/SOLR-7639?focusedCommentId=14601011&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14601011.
> It'd be good to have stream.body be supported by the mlt qparser.






[GitHub] [lucene-solr] s1monw commented on pull request #2212: LUCENE-9669: Add an expert API to allow opening indices created < N-1

2021-01-18 Thread GitBox


s1monw commented on pull request #2212:
URL: https://github.com/apache/lucene-solr/pull/2212#issuecomment-762429660


   I plan to merge this within the next 24 hours. Thanks for the reviews.






[jira] [Commented] (LUCENE-9675) Expose the compression mode of the binary doc values

2021-01-18 Thread Jim Ferenczi (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267473#comment-17267473
 ] 

Jim Ferenczi commented on LUCENE-9675:
--

[~dsmiley] No, because we never released a version that writes or reads this 
byte. It was added in https://issues.apache.org/jira/browse/LUCENE-9378 to make 
the compression configurable in 8.8, so I am just changing how we record the 
information. It would be a different story if we released 8.8 without this 
patch, since in that case we'd need to care about backwards compatibility.

> Expose the compression mode of the binary doc values
> 
>
> Key: LUCENE-9675
> URL: https://issues.apache.org/jira/browse/LUCENE-9675
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jim Ferenczi
>Priority: Minor
> Attachments: LUCENE-9675.patch, LUCENE-9675.patch
>
>
> LUCENE-9378 introduced a way to configure the compression mode of the binary 
> doc values.
> This issue is a proposal to expose this information in the attributes of each 
> binary field.
> That would expose this information to external readers on a per-field basis.






[GitHub] [lucene-solr] murblanc commented on a change in pull request #2199: SOLR-15055 (Take 2) Re-implement 'withCollection'

2021-01-18 Thread GitBox


murblanc commented on a change in pull request #2199:
URL: https://github.com/apache/lucene-solr/pull/2199#discussion_r559759523



##
File path: 
solr/core/src/java/org/apache/solr/cluster/placement/impl/ModificationRequestImpl.java
##
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.cluster.placement.impl;
+
+import org.apache.solr.cluster.Replica;
+import org.apache.solr.cluster.Shard;
+import org.apache.solr.cluster.SolrCollection;
+import org.apache.solr.cluster.placement.DeleteReplicasRequest;
+import org.apache.solr.cluster.placement.DeleteShardsRequest;
+import org.apache.solr.common.cloud.DocCollection;
+import org.apache.solr.common.cloud.Slice;
+
+import java.util.HashSet;
+import java.util.Set;
+
+/**
+ * Helper class to create modification request instances.
+ */
+public class ModificationRequestImpl {
+
+  /**
+   * Create a delete replicas request.
+   * @param collection collection to delete replicas from
+   * @param replicas replicas to delete
+   */
+  public static DeleteReplicasRequest deleteReplicasRequest(SolrCollection collection, Set replicas) {
+return new DeleteReplicasRequest() {
+  @Override
+  public Set getReplicas() {
+return replicas;
+  }
+
+  @Override
+  public SolrCollection getCollection() {
+return collection;
+  }
+
+  @Override
+  public String toString() {
+return "DeleteReplicasRequest{collection=" + collection.getName() +
+",replicas=" + replicas;
+  }
+};
+  }
+
+  /**
+   * Create a delete replicas request using the internal Solr API.
+   * @param docCollection Solr collection
+   * @param shardName shard name
+   * @param replicaNames replica names (aka. core-node names)
+   * @return
+   */
+  public static DeleteReplicasRequest deleteReplicasRequest(DocCollection docCollection, String shardName, Set replicaNames) {
+SolrCollection solrCollection = SimpleClusterAbstractionsImpl.SolrCollectionImpl.fromDocCollection(docCollection);
+Shard shard = solrCollection.getShard(shardName);
+Slice slice = docCollection.getSlice(shardName);
+Set solrReplicas = new HashSet<>();
+replicaNames.forEach(name -> {
+  org.apache.solr.common.cloud.Replica replica = slice.getReplica(name);
+  Replica solrReplica = new SimpleClusterAbstractionsImpl.ReplicaImpl(replica.getName(), shard, replica);

Review comment:
   Why not just do `solrReplicas.add(shard.getReplica(name))` in the 
`forEach`?
   
   Or, easier to read IMO (personal preference; a lambda here is fine too):
   `for (String name : replicaNames) { solrReplicas.add(shard.getReplica(name)); }`
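
   Spelled out as a self-contained sketch: the `Replica` and `Shard` classes 
below are hypothetical minimal stand-ins for the cluster abstractions, kept 
only so the snippet compiles on its own; `collect` is a made-up helper name.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class ReplicaLookupSketch {
  /** Hypothetical stand-in for the placement API's Replica. */
  static final class Replica {
    final String name;

    Replica(String name) {
      this.name = name;
    }
  }

  /** Hypothetical stand-in for Shard, offering lookup of a replica by name. */
  static final class Shard {
    private final Map<String, Replica> replicas = new HashMap<>();

    Shard(String... replicaNames) {
      for (String n : replicaNames) {
        replicas.put(n, new Replica(n));
      }
    }

    Replica getReplica(String name) {
      return replicas.get(name); // null when the name is unknown
    }
  }

  // Reuses the shard's existing replica objects instead of constructing new ones.
  static Set<Replica> collect(Shard shard, Set<String> replicaNames) {
    Set<Replica> solrReplicas = new HashSet<>();
    for (String name : replicaNames) {
      solrReplicas.add(shard.getReplica(name));
    }
    return solrReplicas;
  }

  public static void main(String[] args) {
    Shard shard = new Shard("core_node1", "core_node2", "core_node3");
    Set<Replica> picked = collect(shard, Set.of("core_node1", "core_node3"));
    System.out.println(picked.size());
  }
}
```

   Since the lookup goes through the shard, the `ReplicaImpl` constructor 
would not need to be called from here in this variant.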








[jira] [Commented] (SOLR-7913) Add stream.body support to MLT QParser

2021-01-18 Thread Isabelle Giguere (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267477#comment-17267477
 ] 

Isabelle Giguere commented on SOLR-7913:


MLT Query Parser was originally implemented to allow field queries (e.g. 
myField:some text):
https://issues.apache.org/jira/browse/SOLR-6248
See in particular the discussion between Steve Molloy, Vitaliy Zhovtyuk and 
Anshum Gupta in the first few comments.
By the time the MLT QParser was committed to SVN trunk, the query format had 
changed: 
https://issues.apache.org/jira/browse/SOLR-6248?focusedCommentId=14189235&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14189235
With this format, the input immediately following the closing curly brace is 
assumed to be a docId (the value of whichever field is the unique key in the 
schema).

In CloudMLTQParser, without any of the patches on this ticket, the first step 
is to look up the document by id; if that fails, an exception is thrown.

This whole "stream.body" discussion (or monologue) originally started because 
of a need to identify a document using a query on any field, not just an id.

Maybe it's time to move away from the idea of "stream.body", and re-implement 
support for any field query in MLT QParser.
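
For reference, a small sketch of the committed query format described above, 
where whatever follows the closing curly brace is taken as the document id. 
The helper name `mltById`, the field name and the id are made up for 
illustration only.

```java
final class MltQuerySketch {
  // Builds a query of the form {!mlt qf=<field>}<docId>.
  static String mltById(String qf, String docId) {
    return "{!mlt qf=" + qf + "}" + docId;
  }

  public static void main(String[] args) {
    // The text after '}' is interpreted as a unique-key value, not as a field query.
    System.out.println(mltById("title", "doc-42"));
  }
}
```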

> Add stream.body support to MLT QParser
> --
>
> Key: SOLR-7913
> URL: https://issues.apache.org/jira/browse/SOLR-7913
> Project: Solr
>  Issue Type: Improvement
>Reporter: Anshum Gupta
>Priority: Major
> Attachments: SOLR-7913.patch, SOLR-7913.patch, SOLR-7913.patch, 
> SOLR-7913.patch, SOLR-7913_fix-unit-test-setup.patch, 
> SOLR-7913_fixTests.patch, SOLR-7913_tag_7.5.0.patch
>
>
> Continuing from 
> https://issues.apache.org/jira/browse/SOLR-7639?focusedCommentId=14601011&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14601011.
> It'd be good to have stream.body be supported by the mlt qparser.






[GitHub] [lucene-solr] donnerpeter commented on pull request #2217: LUCENE-9676: Hunspell: improve stemming of all-caps words

2021-01-18 Thread GitBox


donnerpeter commented on pull request #2217:
URL: https://github.com/apache/lucene-solr/pull/2217#issuecomment-762432969


   It might be easier to review the commits separately.






[GitHub] [lucene-solr] murblanc commented on a change in pull request #2199: SOLR-15055 (Take 2) Re-implement 'withCollection'

2021-01-18 Thread GitBox


murblanc commented on a change in pull request #2199:
URL: https://github.com/apache/lucene-solr/pull/2199#discussion_r559761033



##
File path: 
solr/core/src/java/org/apache/solr/cluster/placement/PlacementContext.java
##
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.cluster.placement;
+
+import org.apache.solr.cluster.Cluster;
+
+/**
+ *

Review comment:
   Javadoc

##
File path: 
solr/core/src/java/org/apache/solr/cluster/placement/impl/PlacementContextImpl.java
##
@@ -0,0 +1,39 @@
+package org.apache.solr.cluster.placement.impl;
+
+import org.apache.solr.client.solrj.cloud.SolrCloudManager;
+import org.apache.solr.cluster.Cluster;
+import org.apache.solr.cluster.placement.AttributeFetcher;
+import org.apache.solr.cluster.placement.PlacementContext;
+import org.apache.solr.cluster.placement.PlacementPlanFactory;
+
+import java.io.IOException;
+
+/**
+ *
+ */
+public class PlacementContextImpl implements PlacementContext {

Review comment:
   Maybe have "Simple" somewhere in the name of this class given it's 
instantiating `SimpleClusterAbstractionsImpl`?

##
File path: 
solr/core/src/java/org/apache/solr/cluster/placement/impl/ModificationRequestImpl.java
##
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.cluster.placement.impl;
+
+import org.apache.solr.cluster.Replica;
+import org.apache.solr.cluster.Shard;
+import org.apache.solr.cluster.SolrCollection;
+import org.apache.solr.cluster.placement.DeleteReplicasRequest;
+import org.apache.solr.cluster.placement.DeleteShardsRequest;
+import org.apache.solr.common.cloud.DocCollection;
+import org.apache.solr.common.cloud.Slice;
+
+import java.util.HashSet;
+import java.util.Set;
+
+/**
+ * Helper class to create modification request instances.
+ */
+public class ModificationRequestImpl {
+
+  /**
+   * Create a delete replicas request.
+   * @param collection collection to delete replicas from
+   * @param replicas replicas to delete
+   */
+  public static DeleteReplicasRequest deleteReplicasRequest(SolrCollection collection, Set replicas) {
+return new DeleteReplicasRequest() {
+  @Override
+  public Set getReplicas() {
+return replicas;
+  }
+
+  @Override
+  public SolrCollection getCollection() {
+return collection;
+  }
+
+  @Override
+  public String toString() {
+return "DeleteReplicasRequest{collection=" + collection.getName() +
+",replicas=" + replicas;
+  }
+};
+  }
+
+  /**
+   * Create a delete replicas request using the internal Solr API.
+   * @param docCollection Solr collection
+   * @param shardName shard name
+   * @param replicaNames replica names (aka. core-node names)
+   * @return
+   */
+  public static DeleteReplicasRequest deleteReplicasRequest(DocCollection docCollection, String shardName, Set replicaNames) {
+SolrCollection solrCollection = SimpleClusterAbstractionsImpl.SolrCollectionImpl.fromDocCollection(docCollection);
+Shard shard = solrCollection.getShard(shardName);
+Slice slice = docCollection.getSlice(shardName);
+Set solrReplicas = new HashSet<>();
+replicaNames.forEach(name -> {
+  org.apache.solr.common.cloud.Replica replica = slice.getReplica(name);
+  Replica solrReplica = new SimpleClusterAbstractionsImpl.ReplicaImpl(replica.getName(), shard, rep

[GitHub] [lucene-solr] murblanc commented on a change in pull request #2199: SOLR-15055 (Take 2) Re-implement 'withCollection'

2021-01-18 Thread GitBox


murblanc commented on a change in pull request #2199:
URL: https://github.com/apache/lucene-solr/pull/2199#discussion_r559762633



##
File path: 
solr/core/src/java/org/apache/solr/cluster/placement/impl/SimpleClusterAbstractionsImpl.java
##
@@ -324,7 +324,7 @@ public int hashCode() {
   return new Pair<>(replicas, leader);
 }
 
-private ReplicaImpl(String replicaName, Shard shard, 
org.apache.solr.common.cloud.Replica sliceReplica) {
+ReplicaImpl(String replicaName, Shard shard, 
org.apache.solr.common.cloud.Replica sliceReplica) {

Review comment:
   Why are these no longer `private`?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] murblanc commented on a change in pull request #2199: SOLR-15055 (Take 2) Re-implement 'withCollection'

2021-01-18 Thread GitBox


murblanc commented on a change in pull request #2199:
URL: https://github.com/apache/lucene-solr/pull/2199#discussion_r559763883



##
File path: 
solr/core/src/java/org/apache/solr/cluster/placement/plugins/AffinityPlacementConfig.java
##
@@ -43,14 +46,30 @@
   @JsonProperty
   public long prioritizedFreeDiskGB;
 
+  /**
+   * This property defines an additional constraint that primary collections 
(keys) should be
+   * located on the same nodes as the secondary collections (values). The 
plugin will assume
+   * that the secondary collection replicas are already in place and ignore 
candidate nodes where
+   * they are not already present.
+   */
+  @JsonProperty
+  public Map<String, String> withCollections;
+
   // no-arg public constructor required for deserialization
   public AffinityPlacementConfig() {
 minimalFreeDiskGB = 20L;

Review comment:
   I prefer the no arg constructor here to call the appropriate 
constructor. That way logic is not replicated (might not be applicable here 
unless somebody replaces `withCollections = Map.of();` with `withCollections = 
null;` in a future commit), it looks cleaner and by tracing calls to the most 
complete constructor all callers are inventoried...
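
A minimal sketch of the delegation suggested above, not the actual patch. The class name `AffinityConfigSketch`, the three-argument constructor, and the default of 100 for `prioritizedFreeDiskGB` are assumptions for illustration; only the `20L` default and the `Map.of()` default appear in the diff and comment.

```java
import java.util.Map;

// Hypothetical stand-in for the config class; not the real AffinityPlacementConfig.
class AffinityConfigSketch {
  public long minimalFreeDiskGB;
  public long prioritizedFreeDiskGB;
  public Map<String, String> withCollections;

  // No-arg constructor (needed for deserialization) only delegates,
  // so the defaults live in exactly one place.
  public AffinityConfigSketch() {
    this(20L, 100L, Map.of()); // 100L is a placeholder default, not taken from the diff
  }

  // The most complete constructor: every caller ends up here.
  public AffinityConfigSketch(long minimalFreeDiskGB, long prioritizedFreeDiskGB,
                              Map<String, String> withCollections) {
    this.minimalFreeDiskGB = minimalFreeDiskGB;
    this.prioritizedFreeDiskGB = prioritizedFreeDiskGB;
    this.withCollections = withCollections;
  }
}
```

With this shape, searching for usages of the full constructor lists every place a config instance is built.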








[jira] [Updated] (SOLR-7913) Add stream.body support to MLT QParser

2021-01-18 Thread Isabelle Giguere (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isabelle Giguere updated SOLR-7913:
---
Attachment: SOLR-7913_negative-tests.patch

> Add stream.body support to MLT QParser
> --
>
> Key: SOLR-7913
> URL: https://issues.apache.org/jira/browse/SOLR-7913
> Project: Solr
>  Issue Type: Improvement
>Reporter: Anshum Gupta
>Priority: Major
> Attachments: SOLR-7913.patch, SOLR-7913.patch, SOLR-7913.patch, 
> SOLR-7913.patch, SOLR-7913_fix-unit-test-setup.patch, 
> SOLR-7913_fixTests.patch, SOLR-7913_negative-tests.patch, 
> SOLR-7913_tag_7.5.0.patch
>
>
> Continuing from 
> https://issues.apache.org/jira/browse/SOLR-7639?focusedCommentId=14601011&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14601011.
> It'd be good to have stream.body be supported by the mlt qparser.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[GitHub] [lucene-solr] murblanc commented on a change in pull request #2199: SOLR-15055 (Take 2) Re-implement 'withCollection'

2021-01-18 Thread GitBox


murblanc commented on a change in pull request #2199:
URL: https://github.com/apache/lucene-solr/pull/2199#discussion_r559766377



##
File path: 
solr/core/src/java/org/apache/solr/cluster/placement/plugins/AffinityPlacementFactory.java
##
@@ -171,14 +174,17 @@ public AffinityPlacementConfig getConfig() {
 
 private final long prioritizedFreeDiskGB;
 
+private final Map<String, String> withCollections;

Review comment:
   Q: a given collection can only be `withCollection` for a single 
secondary collection? Doesn't seem necessary...
   
   Suggestion: maintain the inverse mapping as well (a multimap, but possibly 
this one should be a multimap as well) to save looping through map keys 
checking values...








[GitHub] [lucene-solr] murblanc commented on a change in pull request #2199: SOLR-15055 (Take 2) Re-implement 'withCollection'

2021-01-18 Thread GitBox


murblanc commented on a change in pull request #2199:
URL: https://github.com/apache/lucene-solr/pull/2199#discussion_r559766377



##
File path: 
solr/core/src/java/org/apache/solr/cluster/placement/plugins/AffinityPlacementFactory.java
##
@@ -171,14 +174,17 @@ public AffinityPlacementConfig getConfig() {
 
 private final long prioritizedFreeDiskGB;
 
+private final Map<String, String> withCollections;

Review comment:
   Q: a given collection can only be `withCollection` for a single 
secondary collection? Doesn't seem to be a necessary limitation...
   
   Suggestion: maintain the inverse mapping as well (a multimap, but possibly 
this one should be a multimap as well) to save looping through map keys 
checking values...
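
A rough sketch of the inverse-mapping idea, assuming the configured map goes from primary collection name to secondary collection name. The class `WithCollectionsIndex` and the method `primariesOf` are hypothetical names, not part of the PR.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: keeps the configured map plus its inverse.
class WithCollectionsIndex {
  // primary collection -> secondary collection, as configured
  private final Map<String, String> withCollections;
  // secondary collection -> all primaries colocated with it (the inverse "multimap")
  private final Map<String, Set<String>> colocatedWith = new HashMap<>();

  WithCollectionsIndex(Map<String, String> withCollections) {
    this.withCollections = Map.copyOf(withCollections);
    // Build the inverse once, instead of scanning keys and values on every request.
    this.withCollections.forEach((primary, secondary) ->
        colocatedWith.computeIfAbsent(secondary, s -> new HashSet<>()).add(primary));
  }

  // Primaries that must stay colocated with the given secondary; empty if none.
  Set<String> primariesOf(String secondary) {
    return colocatedWith.getOrDefault(secondary, Set.of());
  }
}
```

Making the forward map a multimap as well (one primary, several secondaries) would only change the value type to `Set<String>`.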








[jira] [Comment Edited] (SOLR-7913) Add stream.body support to MLT QParser

2021-01-18 Thread Isabelle Giguere (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267477#comment-17267477
 ] 

Isabelle Giguere edited comment on SOLR-7913 at 1/18/21, 7:42 PM:
--

MLT Query Parser was originally implemented to allow field queries (i.e.: 
myField:some text)
https://issues.apache.org/jira/browse/SOLR-6248
Read specifically the discussion between Steve Molloy, Vitaliy Zhovtyuk and 
Anshum Gupta in the first few comments.
By the time the MLT QParser was committed to SVN trunk, the query format was 
changed: 
https://issues.apache.org/jira/browse/SOLR-6248?focusedCommentId=14189235&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14189235
With this format, the input immediately following the closing curly brace is 
assumed to be a docId (whatever field is the unique id key in the schema)

As of now, without any of the patches on this ticket, the first thing that 
happens is to look for document by id, and if that fails, throw an exception.

This whole "stream.body" discussion (or monologue) originally started because 
of a need to identify a document using a query on any field, not just an id.

Maybe it's time to move away from the idea of "stream.body", and re-implement 
support for any field query in MLT QParser.

If the extra tests added in SOLR-7913_negative-tests.patch could produce 
results instead of an exception, I don't think anyone would need to use 
stream.body with an MLT QParser query.


was (Author: igiguere):
MLT Query Parser was originally implemented to allow field queries (i.e.: 
myField:some text)
https://issues.apache.org/jira/browse/SOLR-6248
Read specifically the discussion between Steve Molloy, Vitaliy Zhovtyuk and 
Anshum Gupta in the first few comments.
By the time the MLT QParser was committed to SVN trunk, the query format was 
changed: 
https://issues.apache.org/jira/browse/SOLR-6248?focusedCommentId=14189235&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14189235
With this format, the input immediately following the closing curly brace is 
assumed to be a docId (whatever field is the unique id key in the schema)

In CloudMLTQParser, without any of the patches on this ticket, the first thing 
that happens is to look for document by id, and if that fails, throw an 
exception.

This whole "stream.body" discussion (or monologue) originally started because 
of a need to identify a document using a query on any field, not just an id.

Maybe it's time to move away from the idea of "stream.body", and re-implement 
support for any field query in MLT QParser.

> Add stream.body support to MLT QParser
> --
>
> Key: SOLR-7913
> URL: https://issues.apache.org/jira/browse/SOLR-7913
> Project: Solr
>  Issue Type: Improvement
>Reporter: Anshum Gupta
>Priority: Major
> Attachments: SOLR-7913.patch, SOLR-7913.patch, SOLR-7913.patch, 
> SOLR-7913.patch, SOLR-7913_fix-unit-test-setup.patch, 
> SOLR-7913_fixTests.patch, SOLR-7913_negative-tests.patch, 
> SOLR-7913_tag_7.5.0.patch
>
>
> Continuing from 
> https://issues.apache.org/jira/browse/SOLR-7639?focusedCommentId=14601011&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14601011.
> It'd be good to have stream.body be supported by the mlt qparser.






[GitHub] [lucene-solr] murblanc commented on a change in pull request #2199: SOLR-15055 (Take 2) Re-implement 'withCollection'

2021-01-18 Thread GitBox


murblanc commented on a change in pull request #2199:
URL: https://github.com/apache/lucene-solr/pull/2199#discussion_r559767267



##
File path: 
solr/core/src/java/org/apache/solr/cluster/placement/plugins/AffinityPlacementFactory.java
##
@@ -238,11 +247,87 @@ public PlacementPlan computePlacement(Cluster cluster, 
PlacementRequest request,
 // failure. Current code does fail if placement is impossible 
(constraint is at most one replica of a shard on any node).
 for (Replica.ReplicaType replicaType : Replica.ReplicaType.values()) {
   makePlacementDecisions(solrCollection, shardName, availabilityZones, 
replicaType, request.getCountReplicasToCreate(replicaType),
-  attrValues, replicaTypeToNodes, nodesWithReplicas, coresOnNodes, 
placementPlanFactory, replicaPlacements);
+  attrValues, replicaTypeToNodes, nodesWithReplicas, coresOnNodes, 
placementContext.getPlacementPlanFactory(), replicaPlacements);
 }
   }
 
-  return placementPlanFactory.createPlacementPlan(request, 
replicaPlacements);
+  return 
placementContext.getPlacementPlanFactory().createPlacementPlan(request, 
replicaPlacements);
+}
+
+@Override
+public void verifyAllowedModification(ModificationRequest 
modificationRequest, PlacementContext placementContext) throws 
PlacementModificationException, InterruptedException {
+  if (modificationRequest instanceof DeleteShardsRequest) {
+throw new UnsupportedOperationException("not implemented yet");
+  } else if (!(modificationRequest instanceof DeleteReplicasRequest)) {
+throw new UnsupportedOperationException("unsupported request type " + 
modificationRequest.getClass().getName());
+  }
+  DeleteReplicasRequest request = (DeleteReplicasRequest) 
modificationRequest;
+  SolrCollection secondaryCollection = request.getCollection();
+  if (!withCollections.values().contains(secondaryCollection.getName())) {
+return;
+  }
+  Map<Node, Map<String, AtomicInteger>> secondaryNodeShardReplicas = new 
HashMap<>();
+  secondaryCollection.shards().forEach(shard ->
+  shard.replicas().forEach(replica -> {
+secondaryNodeShardReplicas.computeIfAbsent(replica.getNode(), n -> 
new HashMap<>())
+.computeIfAbsent(replica.getShard().getShardName(), s -> new 
AtomicInteger())
+.incrementAndGet();
+  }));
+
+  // find the colocated-with collections
+  Cluster cluster = placementContext.getCluster();
+  Set colocatedCollections = new HashSet<>();
+  AtomicReference exc = new AtomicReference<>();

Review comment:
   This variable, and how it's handled in the lambda below (and after the 
lambda), is too complex. If the `forEach` is replaced by a loop (a foreach 
loop...) the code is a lot simpler (and I believe shorter, although I didn't 
try to write it).
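
For illustration only, a sketch of the loop form, assuming the `AtomicReference` exists just to carry a checked exception out of the lambda (the lambda body isn't shown in the quoted diff). `Lookup`, `loadAll` and the class name are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class LoopInsteadOfLambda {
  // Hypothetical lookup that may throw a checked exception,
  // standing in for whatever the lambda calls in the real code.
  interface Lookup {
    String load(String name) throws IOException;
  }

  // With a plain for-each loop the checked exception simply propagates:
  // no AtomicReference, and no "did anything fail?" check after the loop.
  static List<String> loadAll(Set<String> names, Lookup lookup) throws IOException {
    List<String> loaded = new ArrayList<>();
    for (String name : names) {
      loaded.add(lookup.load(name));
    }
    return loaded;
  }
}
```

The loop also stops at the first failure, whereas a `forEach` that records the exception keeps iterating over the remaining elements.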








[jira] [Comment Edited] (SOLR-7913) Add stream.body support to MLT QParser

2021-01-18 Thread Isabelle Giguere (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267477#comment-17267477
 ] 

Isabelle Giguere edited comment on SOLR-7913 at 1/18/21, 7:43 PM:
--

MLT Query Parser was originally implemented to allow field queries (i.e.: 
myField:some text)
https://issues.apache.org/jira/browse/SOLR-6248
Read specifically the discussion between Steve Molloy, Vitaliy Zhovtyuk and 
Anshum Gupta in the first few comments.
By the time the MLT QParser was committed to SVN trunk, the query format was 
changed: 
https://issues.apache.org/jira/browse/SOLR-6248?focusedCommentId=14189235&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14189235
With this format, the input immediately following the closing curly brace is 
assumed to be a docId (whatever field is the unique id key in the schema)

As of now, without any of the patches on this ticket, the first thing that 
happens is to look for document by id, and if that fails, throw an exception.

This whole "stream.body" discussion (or monologue) originally started because 
of a need to identify a document using a query on any field, not just an id.

Maybe it's time to move away from the idea of "stream.body", and re-implement 
support for any field query in MLT QParser.

If the extra tests added in "SOLR-7913_negative-tests.patch" could produce 
results instead of an exception, I don't think anyone would need to use 
stream.body with an MLT QParser query.


was (Author: igiguere):
MLT Query Parser was originally implemented to allow field queries (i.e.: 
myField:some text)
https://issues.apache.org/jira/browse/SOLR-6248
Read specifically the discussion between Steve Molloy, Vitaliy Zhovtyuk and 
Anshum Gupta in the first few comments.
By the time the MLT QParser was committed to SVN trunk, the query format was 
changed: 
https://issues.apache.org/jira/browse/SOLR-6248?focusedCommentId=14189235&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14189235
With this format, the input immediately following the closing curly brace is 
assumed to be a docId (whatever field is the unique id key in the schema)

As of now, without any of the patches on this ticket, the first thing that 
happens is to look for document by id, and if that fails, throw an exception.

This whole "stream.body" discussion (or monologue) originally started because 
of a need to identify a document using a query on any field, not just an id.

Maybe it's time to move away from the idea of "stream.body", and re-implement 
support for any field query in MLT QParser.

If the extra tests added in SOLR-7913_negative-tests.patch could produce 
results instead of an exception, I don't think anyone would need to use 
stream.body with an MLT QParser query.

> Add stream.body support to MLT QParser
> --
>
> Key: SOLR-7913
> URL: https://issues.apache.org/jira/browse/SOLR-7913
> Project: Solr
>  Issue Type: Improvement
>Reporter: Anshum Gupta
>Priority: Major
> Attachments: SOLR-7913.patch, SOLR-7913.patch, SOLR-7913.patch, 
> SOLR-7913.patch, SOLR-7913_fix-unit-test-setup.patch, 
> SOLR-7913_fixTests.patch, SOLR-7913_negative-tests.patch, 
> SOLR-7913_tag_7.5.0.patch
>
>
> Continuing from 
> https://issues.apache.org/jira/browse/SOLR-7639?focusedCommentId=14601011&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14601011.
> It'd be good to have stream.body be supported by the mlt qparser.






[jira] [Comment Edited] (SOLR-7913) Add stream.body support to MLT QParser

2021-01-18 Thread Isabelle Giguere (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267462#comment-17267462
 ] 

Isabelle Giguere edited comment on SOLR-7913 at 1/18/21, 7:43 PM:
--

New patch on tag release/lucene-solr/8.5.0
I had forgotten to attach it when upgrading, last time.

IMPORTANT : There was a bug in previous patches.  Changes in SearchHandler and 
ShardRequest in the previous patches resulted in including the shard request 
URL in the "stream.body" passed to the MLT request.  That's why test results in 
CloudMLTQParserTest were different, when comparing the test with the id request 
(testMLTQParser) and the test with stream.body (testMLTQParserStreamBody).

Apply "SOLR-7913_fix-unit-test-setup.patch" on top of "SOLR-7913.patch" of 
today.


was (Author: igiguere):
New patch on tag release/lucene-solr/8.5.0
I had forgotten to attach it when upgrading, last time.

IMPORTANT : There was a bug in previous patches.  Changes in SearchHandler and 
ShardRequest in the previous patches resulted in including the shard request 
URL in the "stream.body" passed to the MLT request.  That's why test results in 
CloudMLTQParserTest were different, when comparing the test with the id request 
(testMLTQParser) and the test with stream.body (testMLTQParserStreamBody).

Apply SOLR-7913_fix-unit-test-setup.patch on top of SOLR-7913.patch

> Add stream.body support to MLT QParser
> --
>
> Key: SOLR-7913
> URL: https://issues.apache.org/jira/browse/SOLR-7913
> Project: Solr
>  Issue Type: Improvement
>Reporter: Anshum Gupta
>Priority: Major
> Attachments: SOLR-7913.patch, SOLR-7913.patch, SOLR-7913.patch, 
> SOLR-7913.patch, SOLR-7913_fix-unit-test-setup.patch, 
> SOLR-7913_fixTests.patch, SOLR-7913_negative-tests.patch, 
> SOLR-7913_tag_7.5.0.patch
>
>
> Continuing from 
> https://issues.apache.org/jira/browse/SOLR-7639?focusedCommentId=14601011&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14601011.
> It'd be good to have stream.body be supported by the mlt qparser.






[GitHub] [lucene-solr] murblanc commented on a change in pull request #2199: SOLR-15055 (Take 2) Re-implement 'withCollection'

2021-01-18 Thread GitBox


murblanc commented on a change in pull request #2199:
URL: https://github.com/apache/lucene-solr/pull/2199#discussion_r559768110



##
File path: 
solr/core/src/java/org/apache/solr/cluster/placement/plugins/AffinityPlacementFactory.java
##
@@ -238,11 +247,87 @@ public PlacementPlan computePlacement(Cluster cluster, 
PlacementRequest request,
 // failure. Current code does fail if placement is impossible 
(constraint is at most one replica of a shard on any node).
 for (Replica.ReplicaType replicaType : Replica.ReplicaType.values()) {
   makePlacementDecisions(solrCollection, shardName, availabilityZones, 
replicaType, request.getCountReplicasToCreate(replicaType),
-  attrValues, replicaTypeToNodes, nodesWithReplicas, coresOnNodes, 
placementPlanFactory, replicaPlacements);
+  attrValues, replicaTypeToNodes, nodesWithReplicas, coresOnNodes, 
placementContext.getPlacementPlanFactory(), replicaPlacements);
 }
   }
 
-  return placementPlanFactory.createPlacementPlan(request, 
replicaPlacements);
+  return 
placementContext.getPlacementPlanFactory().createPlacementPlan(request, 
replicaPlacements);
+}
+
+@Override
+public void verifyAllowedModification(ModificationRequest 
modificationRequest, PlacementContext placementContext) throws 
PlacementModificationException, InterruptedException {

Review comment:
   This method should be factored out and cut into a few pieces with 
meaningful names to make reading easier.








[GitHub] [lucene-solr] murblanc commented on a change in pull request #2199: SOLR-15055 (Take 2) Re-implement 'withCollection'

2021-01-18 Thread GitBox


murblanc commented on a change in pull request #2199:
URL: https://github.com/apache/lucene-solr/pull/2199#discussion_r559768110



##
File path: 
solr/core/src/java/org/apache/solr/cluster/placement/plugins/AffinityPlacementFactory.java
##
@@ -238,11 +247,87 @@ public PlacementPlan computePlacement(Cluster cluster, 
PlacementRequest request,
 // failure. Current code does fail if placement is impossible 
(constraint is at most one replica of a shard on any node).
 for (Replica.ReplicaType replicaType : Replica.ReplicaType.values()) {
   makePlacementDecisions(solrCollection, shardName, availabilityZones, 
replicaType, request.getCountReplicasToCreate(replicaType),
-  attrValues, replicaTypeToNodes, nodesWithReplicas, coresOnNodes, 
placementPlanFactory, replicaPlacements);
+  attrValues, replicaTypeToNodes, nodesWithReplicas, coresOnNodes, 
placementContext.getPlacementPlanFactory(), replicaPlacements);
 }
   }
 
-  return placementPlanFactory.createPlacementPlan(request, 
replicaPlacements);
+  return 
placementContext.getPlacementPlanFactory().createPlacementPlan(request, 
replicaPlacements);
+}
+
+@Override
+public void verifyAllowedModification(ModificationRequest 
modificationRequest, PlacementContext placementContext) throws 
PlacementModificationException, InterruptedException {

Review comment:
   This method should IMO be factored out and cut into a few pieces with 
meaningful names to make reading easier.








[GitHub] [lucene-solr] dweiss commented on a change in pull request #2217: LUCENE-9676: Hunspell: improve stemming of all-caps words

2021-01-18 Thread GitBox


dweiss commented on a change in pull request #2217:
URL: https://github.com/apache/lucene-solr/pull/2217#discussion_r559806181



##
File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##
@@ -74,6 +74,8 @@
 
   static final char[] NOFLAGS = new char[0];
 
+  private static final char HIDDEN_FLAG = (char) 65511; // called 
'ONLYUPCASEFLAG' in Hunspell

Review comment:
   I think you could use an explicit char here? '\uFFE7'? Not sure though 
because this isn't valid unicode so some validation tools may complain later 
on... Let's leave it.

##
File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/WordCase.java
##
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.analysis.hunspell;
+
+enum WordCase {
+  UPPER,
+  TITLE,
+  LOWER,
+  MIXED;
+
+  static WordCase caseOf(char[] word, int length) {
+boolean capitalized = Character.isUpperCase(word[0]);
+
+boolean seenUpper = false;
+boolean seenLower = false;
+for (int i = 1; i < length; i++) {
+  char ch = word[i];
+  seenUpper = seenUpper || Character.isUpperCase(ch);
+  seenLower = seenLower || Character.isLowerCase(ch);
+}
+
+return get(capitalized, seenUpper, seenLower);
+  }
+
+  static WordCase caseOf(CharSequence word, int length) {
+boolean capitalized = Character.isUpperCase(word.charAt(0));
+
+boolean seenUpper = false;
+boolean seenLower = false;
+for (int i = 1; i < length; i++) {

Review comment:
   Not sure this is worth optimizing, but you could also break out of the 
loop once both seenLower and seenUpper are true, as checking further doesn't 
change the result.
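
A small sketch of the early exit, assuming the break is taken once both flags are set: a single flag does not settle the outcome, since a capitalized word with only lower-case letters so far could still turn out mixed. The `classify` helper is an assumed stand-in for the `get(...)` method, whose body isn't shown in the diff, and `WordCaseSketch` is a hypothetical name.

```java
class WordCaseSketch {
  enum Case { UPPER, TITLE, LOWER, MIXED }

  static Case caseOf(CharSequence word, int length) {
    boolean capitalized = Character.isUpperCase(word.charAt(0));

    boolean seenUpper = false;
    boolean seenLower = false;
    for (int i = 1; i < length; i++) {
      char ch = word.charAt(i);
      seenUpper = seenUpper || Character.isUpperCase(ch);
      seenLower = seenLower || Character.isLowerCase(ch);
      // Both kinds seen after the first char: the result is MIXED either way.
      if (seenUpper && seenLower) break;
    }
    return classify(capitalized, seenUpper, seenLower);
  }

  // Assumed classification rules, for illustration only.
  static Case classify(boolean capitalized, boolean seenUpper, boolean seenLower) {
    if (capitalized) {
      return !seenLower ? Case.UPPER : !seenUpper ? Case.TITLE : Case.MIXED;
    }
    return seenUpper ? Case.MIXED : Case.LOWER;
  }
}
```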








[GitHub] [lucene-solr] dweiss merged pull request #2209: LUCENE-9671: Hunspell: shorten Stemmer.applyAffix

2021-01-18 Thread GitBox


dweiss merged pull request #2209:
URL: https://github.com/apache/lucene-solr/pull/2209


   






[jira] [Commented] (LUCENE-9671) Hunspell: shorten Stemmer.applyAffix

2021-01-18 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267560#comment-17267560
 ] 

ASF subversion and git services commented on LUCENE-9671:
-

Commit ab08fdc6f0c9e5c7e27f053da59c619c6d9e643b in lucene-solr's branch 
refs/heads/master from Peter Gromov
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ab08fdc ]

LUCENE-9671: Hunspell: shorten Stemmer.applyAffix (#2209)

Call stem() recursively just once with different arguments depending on various 
conditions. 

NOTE: committing directly as this is a refactoring, not a functional change 
(no CHANGES.txt entry).

> Hunspell: shorten Stemmer.applyAffix
> 
>
> Key: LUCENE-9671
> URL: https://issues.apache.org/jira/browse/LUCENE-9671
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Peter Gromov
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>







[jira] [Updated] (LUCENE-9671) Hunspell: shorten Stemmer.applyAffix

2021-01-18 Thread Dawid Weiss (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-9671:

Fix Version/s: master (9.0)
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Hunspell: shorten Stemmer.applyAffix
> 
>
> Key: LUCENE-9671
> URL: https://issues.apache.org/jira/browse/LUCENE-9671
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Peter Gromov
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>







[jira] [Updated] (LUCENE-9676) Hunspell: improve stemming of all-caps words

2021-01-18 Thread Dawid Weiss (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-9676:

Fix Version/s: master (9.0)

> Hunspell: improve stemming of all-caps words
> 
>
> Key: LUCENE-9676
> URL: https://issues.apache.org/jira/browse/LUCENE-9676
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Peter Gromov
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently words like "OPENOFFICE.ORG" result in no stems even if the 
> dictionary contains "OpenOffice.org"






[jira] [Assigned] (LUCENE-9676) Hunspell: improve stemming of all-caps words

2021-01-18 Thread Dawid Weiss (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss reassigned LUCENE-9676:
---

Assignee: Dawid Weiss

> Hunspell: improve stemming of all-caps words
> 
>
> Key: LUCENE-9676
> URL: https://issues.apache.org/jira/browse/LUCENE-9676
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Peter Gromov
>Assignee: Dawid Weiss
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently words like "OPENOFFICE.ORG" result in no stems even if the 
> dictionary contains "OpenOffice.org"






[jira] [Commented] (SOLR-15052) Reducing overseer bottlenecks using per-replica states

2021-01-18 Thread Noble Paul (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267582#comment-17267582
 ] 

Noble Paul commented on SOLR-15052:
---

The DOWNNODE is still processed by the overseer as a single multi-op, so I expect 
the performance to be similar, if not better: better because the amount of 
data that is written is much smaller. 

However, I shall write a simple test to confirm it.

> Reducing overseer bottlenecks using per-replica states
> --
>
> Key: SOLR-15052
> URL: https://issues.apache.org/jira/browse/SOLR-15052
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Assignee: Noble Paul
>Priority: Major
> Fix For: 8.8
>
> Attachments: per-replica-states-gcp.pdf
>
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> This work has the same goal as SOLR-13951: to reduce overseer bottlenecks by 
> keeping replica state updates from going to state.json via the overseer. 
> However, the approach taken here is different from SOLR-13951, and hence this 
> work supersedes that work.
> The proposed design is here: 
> https://docs.google.com/document/d/1xdxpzUNmTZbk0vTMZqfen9R3ArdHokLITdiISBxCFUg/edit
> Briefly:
> # Every replica's state will be in a separate znode nested under 
> state.json. The znode's name encodes the replica name, state, and leadership 
> status.
> # An additional children watcher will be set on state.json for state changes.
> # Upon a state change, a ZK multi-op deletes the previous znode and adds a 
> new znode with the new state.
> Differences between this and SOLR-13951:
> # In SOLR-13951, we planned to leverage shard terms for per-shard states.
> # As a consequence, the code changes required for SOLR-13951 were massive (we 
> needed a shard state provider abstraction and had to introduce it everywhere 
> in the codebase).
> # This approach is a drastically simpler change and design.
> Credit for this design and the PR is due to [~noble.paul]. 
> [~markrmil...@gmail.com], [~noble.paul] and I have collaborated on this 
> effort. The reference branch takes a conceptually similar (but not identical) 
> approach.
> I shall attach a PR and performance benchmarks shortly.
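
The three steps in the description can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: the child-name format `name:STATE[:L]` is made up for the example, an in-memory `Set` stands in for the children of state.json, and `PerReplicaStateSketch` is a hypothetical class. In ZooKeeper itself the delete and the create would go into one atomic multi-op.

```java
import java.util.Set;
import java.util.TreeSet;

final class PerReplicaStateSketch {
  /** Encodes replica name, state and leadership into one child-node name (assumed format). */
  static String encode(String replica, String state, boolean leader) {
    return replica + ":" + state + (leader ? ":L" : "");
  }

  /** Stand-in for the multi-op: drop the old child and add the new one together. */
  static void changeState(Set<String> children, String oldName, String newName) {
    if (!children.contains(oldName)) {
      throw new IllegalStateException("missing child: " + oldName);
    }
    children.remove(oldName);
    children.add(newName);
  }

  public static void main(String[] args) {
    Set<String> children = new TreeSet<>(); // children of state.json in this sketch
    children.add(encode("core_node1", "DOWN", false));
    changeState(children, encode("core_node1", "DOWN", false), encode("core_node1", "ACTIVE", true));
    System.out.println(children);
  }
}
```

Because the state lives in the child name rather than in node data, a children watcher on state.json is enough to observe every change, which is what the second step of the design relies on.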






[jira] [Comment Edited] (SOLR-15052) Reducing overseer bottlenecks using per-replica states

2021-01-18 Thread Noble Paul (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267582#comment-17267582
 ] 

Noble Paul edited comment on SOLR-15052 at 1/19/21, 12:35 AM:
--

The DOWNNODE is still processed by overseer as a single multi op. So, I expect 
the performance to be somewhat similar or better. Better because the amount of 
data that is written is much smaller. 

However, I shall write a simple test to confirm it.


was (Author: noble.paul):
The DOWNNODE is still processed by overseer as a single multi op. So, i expect 
the performance to be somewhat similar it better. Better because the amount of 
data that is written is much smaller. 

However I shall write a simple test to confirm it

> Reducing overseer bottlenecks using per-replica states
> --
>
> Key: SOLR-15052
> URL: https://issues.apache.org/jira/browse/SOLR-15052
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Assignee: Noble Paul
>Priority: Major
> Fix For: 8.8
>
> Attachments: per-replica-states-gcp.pdf
>
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> This work has the same goal as SOLR-13951: to reduce overseer bottlenecks by 
> keeping replica state updates from going to state.json via the overseer. 
> However, the approach taken here is different from SOLR-13951, and hence this 
> work supersedes that work.
> The proposed design is here: 
> https://docs.google.com/document/d/1xdxpzUNmTZbk0vTMZqfen9R3ArdHokLITdiISBxCFUg/edit
> Briefly:
> # Every replica's state will be in a separate znode nested under 
> state.json. The znode's name encodes the replica name, state, and leadership 
> status.
> # An additional children watcher will be set on state.json for state changes.
> # Upon a state change, a ZK multi-op deletes the previous znode and adds a 
> new znode with the new state.
> Differences between this and SOLR-13951:
> # In SOLR-13951, we planned to leverage shard terms for per-shard states.
> # As a consequence, the code changes required for SOLR-13951 were massive (we 
> needed a shard state provider abstraction and had to introduce it everywhere 
> in the codebase).
> # This approach is a drastically simpler change and design.
> Credit for this design and the PR is due to [~noble.paul]. 
> [~markrmil...@gmail.com], [~noble.paul] and I have collaborated on this 
> effort. The reference branch takes a conceptually similar (but not identical) 
> approach.
> I shall attach a PR and performance benchmarks shortly.






[jira] [Commented] (LUCENE-9663) Adding compression to terms dict from SortedSet/Sorted DocValues

2021-01-18 Thread Jaison.Bi (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267595#comment-17267595
 ] 

Jaison.Bi commented on LUCENE-9663:
---

[~mikemccand] [~jpountz] [~sokolov]

Please help to review the pull request, thanks :)

> Adding compression to terms dict from SortedSet/Sorted DocValues
> 
>
> Key: LUCENE-9663
> URL: https://issues.apache.org/jira/browse/LUCENE-9663
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Reporter: Jaison.Bi
>Priority: Trivial
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The Elasticsearch keyword field uses SortedSet DocValues. In our applications, 
> “keyword” is the most frequently used field type.
>  LUCENE-7081 added prefix compression for the docvalues terms dict. We can do 
> better by replacing prefix compression with LZ4. In one of our applications, 
> the dvd files were ~41% smaller with this change (from 1.95 GB to 1.15 GB).
>  I've run simple tests based on real application data, comparing the 
> write/merge time cost and the on-disk *.dvd file size (after merging into 1 
> segment).
> || ||Before||After||
> |Write time cost(ms)|591972|618200|
> |Merge time cost(ms)|270661|294663|
> |*.dvd file size(GB)|1.95|1.15|
> This feature is only for high-cardinality fields. 
>  I'm doing the benchmark test based on luceneutil. Will attach the report and 
> patch after the test.
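
For reference, the relative changes implied by the table can be recomputed with a few lines of Java. The numbers are the ones reported above; `DvdStats` and `percentChange` are hypothetical names for this sketch only.

```java
import java.util.Locale;

final class DvdStats {
  /** Percentage change from before to after; a negative value means a reduction. */
  static double percentChange(double before, double after) {
    return (after - before) / before * 100.0;
  }

  public static void main(String[] args) {
    // Values copied from the Before/After table in the issue description.
    System.out.println(String.format(Locale.ROOT, "write time: %+.1f%%", percentChange(591972, 618200)));
    System.out.println(String.format(Locale.ROOT, "merge time: %+.1f%%", percentChange(270661, 294663)));
    System.out.println(String.format(Locale.ROOT, "dvd size:   %+.1f%%", percentChange(1.95, 1.15)));
  }
}
```

On these figures the write and merge steps take roughly 4% and 9% longer, in exchange for the ~41% smaller *.dvd files.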






[jira] [Commented] (LUCENE-9673) The level of IntBlockPool slice is always 1

2021-01-18 Thread mashudong (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267611#comment-17267611
 ] 

mashudong commented on LUCENE-9673:
---

Yes, it's probably an ancient bug.

IMHO, ByteBlockPool does not have the same issue.

> The level of IntBlockPool slice is always 1 
> 
>
> Key: LUCENE-9673
> URL: https://issues.apache.org/jira/browse/LUCENE-9673
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/other
>Reporter: mashudong
>Priority: Minor
>
> The first slice is allocated by IntBlockPool.newSlice(), and its level is 1:
>  
> {code:java}
> private int newSlice(final int size) {
>   if (intUpto > INT_BLOCK_SIZE - size) {
>     nextBuffer();
>     assert assertSliceBuffer(buffer);
>   }
>
>   final int upto = intUpto;
>   intUpto += size;
>   buffer[intUpto - 1] = 1;
>   return upto;
> }
> {code}
>  
>  
> If one slice is not enough, IntBlockPool.allocSlice() is called to allocate 
> more slices.
> As the following code shows, level is 1 and newLevel is NEXT_LEVEL_ARRAY[0], 
> which is also 1.
>  
> As a result, the level of an IntBlockPool slice is always 1: the first slice is 
>  2 ints long, and all subsequent slices are 4 ints long.
>  
> {code:java}
> private static final int[] NEXT_LEVEL_ARRAY = {1, 2, 3, 4, 5, 6, 7, 8, 9, 9};
>
> private int allocSlice(final int[] slice, final int sliceOffset) {
>   final int level = slice[sliceOffset];
>   final int newLevel = NEXT_LEVEL_ARRAY[level - 1];
>   final int newSize = LEVEL_SIZE_ARRAY[newLevel];
>   // Maybe allocate another block
>   if (intUpto > INT_BLOCK_SIZE - newSize) {
>     nextBuffer();
>     assert assertSliceBuffer(buffer);
>   }
>
>   final int newUpto = intUpto;
>   final int offset = newUpto + intOffset;
>   intUpto += newSize;
>   // Write forwarding address at end of last slice:
>   slice[sliceOffset] = offset;
>
>   // Write new level:
>   buffer[intUpto - 1] = newLevel;
>
>   return newUpto;
> }
> {code}
>  
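
The stuck level can be reproduced in isolation. The sketch below copies only `NEXT_LEVEL_ARRAY` from the quoted code and replays the level lookup; `SliceLevelSketch` is a hypothetical class, and `advancingLevels` is just one illustrative way to make the level progress, not necessarily the fix that should be applied to IntBlockPool.

```java
import java.util.ArrayList;
import java.util.List;

final class SliceLevelSketch {
  static final int[] NEXT_LEVEL_ARRAY = {1, 2, 3, 4, 5, 6, 7, 8, 9, 9};

  /** Levels of successive slices when indexing with (level - 1), as in the quoted allocSlice. */
  static List<Integer> reportedLevels(int allocations) {
    List<Integer> levels = new ArrayList<>();
    int level = 1; // newSlice writes 1 as the end-of-slice marker
    for (int i = 0; i < allocations; i++) {
      level = NEXT_LEVEL_ARRAY[level - 1]; // NEXT_LEVEL_ARRAY[0] == 1, so nothing changes
      levels.add(level);
    }
    return levels;
  }

  /** Illustrative alternative: index with the level itself so that it advances. */
  static List<Integer> advancingLevels(int allocations) {
    List<Integer> levels = new ArrayList<>();
    int level = 1;
    for (int i = 0; i < allocations; i++) {
      level = NEXT_LEVEL_ARRAY[level]; // caps at 9 because the last entry maps to itself
      levels.add(level);
    }
    return levels;
  }

  public static void main(String[] args) {
    System.out.println(reportedLevels(4));  // [1, 1, 1, 1]
    System.out.println(advancingLevels(4)); // [2, 3, 4, 5]
  }
}
```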






[jira] [Updated] (LUCENE-9663) Adding compression to terms dict from SortedSet/Sorted DocValues

2021-01-18 Thread Jaison.Bi (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jaison.Bi updated LUCENE-9663:
--
Status: Patch Available  (was: Open)

> Adding compression to terms dict from SortedSet/Sorted DocValues
> 
>
> Key: LUCENE-9663
> URL: https://issues.apache.org/jira/browse/LUCENE-9663
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Reporter: Jaison.Bi
>Priority: Trivial
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The Elasticsearch keyword field uses SortedSet DocValues. In our applications, 
> “keyword” is the most frequently used field type.
>  LUCENE-7081 added prefix compression for the docvalues terms dict. We can do 
> better by replacing prefix compression with LZ4. In one of our applications, 
> the dvd files were ~41% smaller with this change (from 1.95 GB to 1.15 GB).
>  I've run simple tests based on real application data, comparing the 
> write/merge time cost and the on-disk *.dvd file size (after merging into 1 
> segment).
> || ||Before||After||
> |Write time cost(ms)|591972|618200|
> |Merge time cost(ms)|270661|294663|
> |*.dvd file size(GB)|1.95|1.15|
> This feature is only for high-cardinality fields. 
>  I'm doing the benchmark test based on luceneutil. Will attach the report and 
> patch after the test.






[jira] [Commented] (LUCENE-9675) Expose the compression mode of the binary doc values

2021-01-18 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267647#comment-17267647
 ] 

Ishan Chattopadhyaya commented on LUCENE-9675:
--

When do we expect this to be resolved? I (or Noble) will build the RC once this 
wraps up.

> Expose the compression mode of the binary doc values
> 
>
> Key: LUCENE-9675
> URL: https://issues.apache.org/jira/browse/LUCENE-9675
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jim Ferenczi
>Priority: Minor
> Attachments: LUCENE-9675.patch, LUCENE-9675.patch
>
>
> LUCENE-9378 introduced a way to configure the compression mode of the binary 
> doc values.
> This issue is a proposal to expose this information in the attributes of each 
> binary field.
> That would expose this information to external readers on a per-field basis.






[jira] [Closed] (LUCENE-9671) Hunspell: shorten Stemmer.applyAffix

2021-01-18 Thread Peter Gromov (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Gromov closed LUCENE-9671.


> Hunspell: shorten Stemmer.applyAffix
> 
>
> Key: LUCENE-9671
> URL: https://issues.apache.org/jira/browse/LUCENE-9671
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Peter Gromov
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>







[jira] [Updated] (LUCENE-9671) Hunspell: shorten Stemmer.applyAffix

2021-01-18 Thread Peter Gromov (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Gromov updated LUCENE-9671:
-
Issue Type: Improvement  (was: Bug)

> Hunspell: shorten Stemmer.applyAffix
> 
>
> Key: LUCENE-9671
> URL: https://issues.apache.org/jira/browse/LUCENE-9671
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Peter Gromov
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>







[jira] [Updated] (LUCENE-9667) Hunspell: add a spellchecker, support BREAK and FORBIDDENWORD affix rules

2021-01-18 Thread Peter Gromov (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Gromov updated LUCENE-9667:
-
Issue Type: Improvement  (was: Bug)

> Hunspell: add a spellchecker, support BREAK and FORBIDDENWORD affix rules
> -
>
> Key: LUCENE-9667
> URL: https://issues.apache.org/jira/browse/LUCENE-9667
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Peter Gromov
>Priority: Major
> Attachments: LUCENE-9667.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The test data is taken from Hunspell C++; the new code is based on 
> https://github.com/hunspell/hunspell/blob/master/src/hunspell/hunspell.cxx#L675





