[jira] [Commented] (SOLR-14306) Refactor coordination code into separate module and evaluate using Curator
[ https://issues.apache.org/jira/browse/SOLR-14306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051873#comment-17051873 ] Jan Høydahl commented on SOLR-14306: Seems Kafka has had the same discussions (see https://cwiki.apache.org/confluence/display/KAFKA/KIP-273+-+Kafka+to+support+using+ETCD+beside+Zookeeper) but I think they ended up with https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A+Replace+ZooKeeper+with+a+Self-Managed+Metadata+Quorum instead, i.e., handling state and coordination in Kafka instead of an external system. Would be interesting to evaluate Apache Ratis (http://ratis.incubator.apache.org/) as an embedded zk replacement! > Refactor coordination code into separate module and evaluate using Curator > -- > > Key: SOLR-14306 > URL: https://issues.apache.org/jira/browse/SOLR-14306 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Reporter: Tomas Eduardo Fernandez Lobbe >Priority: Major > > This Jira issue is to discuss two changes that unfortunately are difficult to > address separately: > # Separate all ZooKeeper coordination logic into its own module that can > be tested in isolation > # Evaluate using Apache Curator for coordination instead of our own logic. > I drafted a > [SIP|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=148640472], > but this is very much WIP; I'd like to hear opinions before I spend too much > time on something people hate. > From the initial draft of the SIP: > {quote}The main goal of this change is to allow better testing of the > different ZooKeeper interactions related to coordination (leader election, > queues, etc.). There are already some abstractions in place for lower level > operations (set-data, get-data, etc., see DistribStateManager), so the idea is > to have a new, related abstraction named CoordinationManager, where we could > have some higher level coordination-related classes, like LeaderRunner > (Overseer), LeaderLatch (for shard leaders), etc. Curator comes into play > because, in order to refactor the existing code into these new abstractions, > we'd have to rework much of it, so we could instead consider using Curator, a > library that has been mentioned many times in the past. While I don't think this > is required, it would make this transition and our code simpler (from what I > could see; input from people with more Curator experience would be > greatly appreciated). > While it would be out of the scope of this change, if the > abstractions/interfaces are correctly designed, this could, in the future, > allow using something other than ZooKeeper for coordination, > either etcd or maybe even some in-memory replacement for tests. > {quote} > There are still many open questions, and many questions we don't yet know > we'll have, but please let me know if you have any early feedback, especially > if you've worked with Curator in the past.
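As background for the LeaderLatch idea floated in the SIP draft, below is a minimal, hypothetical sketch of Curator's LeaderLatch recipe, the kind of higher-level primitive the proposed CoordinationManager could expose for shard leaders. The connection string, election path, and participant id are placeholders, not anything from Solr's actual ZooKeeper layout.
{code}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class LeaderLatchSketch {
  public static void main(String[] args) throws Exception {
    // Connect to ZooKeeper with a retry policy (host and paths are placeholders).
    CuratorFramework client = CuratorFrameworkFactory.newClient(
        "localhost:2181", new ExponentialBackoffRetry(1000, 3));
    client.start();

    // Every participant creates a latch on the same election path;
    // Curator guarantees exactly one of them holds leadership at a time.
    try (LeaderLatch latch = new LeaderLatch(client, "/election/shard1", "node-1")) {
      latch.start();
      latch.await(); // blocks until this participant is elected
      if (latch.hasLeadership()) {
        // act as leader here; closing the latch releases leadership
      }
    }
    client.close();
  }
}
{code}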
[jira] [Commented] (LUCENE-9077) Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051889#comment-17051889 ] Dawid Weiss commented on LUCENE-9077: - There is no word-for-word equivalent but the functionality is there (it's part of {{gradlew check}} on each project). See {{gradlew :helpDependencies}} if you need details: {code} Updating dependency checksum and licenses - The last step is to make sure the licenses, notice files and checksums are in place for any new dependencies. This command will print what's missing and where: gradlew licenses To update JAR checksums for licenses use: gradlew updateLicenses {code} > Gradle build > > > Key: LUCENE-9077 > URL: https://issues.apache.org/jira/browse/LUCENE-9077 > Project: Lucene - Core > Issue Type: Task >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Major > Fix For: master (9.0) > > Attachments: LUCENE-9077-javadoc-locale-en-US.patch > > Time Spent: 2.5h > Remaining Estimate: 0h > > This task focuses on providing a gradle-based build equivalent for Lucene and > Solr (on the master branch). See notes below on why this respin is needed. > The code lives on the *gradle-master* branch. It is kept in sync with *master*. > Try running the following to see an overview of helper guides concerning > typical workflow, testing and ant-migration helpers: > gradlew :help > A list of items that need to be added or require work. If you'd like to > work on any of these, please add your name to the list. Once you have a > patch/ pull request let me (dweiss) know - I'll try to coordinate the merges. > * (/) Apply forbiddenAPIs > * (/) Generate hardware-aware gradle defaults for parallelism (count of > workers and test JVMs). > * (/) Fail the build if the --tests filter is applied and no tests execute > during the entire build (this allows for an empty set of filtered tests at > the single project level). > * (/) Port other settings and randomizations from common-build.xml > * (/) Configure security policy/ sandboxing for tests. > * (/) tests' console output on -Ptests.verbose=true > * (/) add a :helpDeps explanation of how the dependency system works > (palantir plugin, lockfile) and how to retrieve structured information about > current dependencies of a given module (in a tree-like output). > * (/) jar checksums, jar checksum computation and validation. This should be > done without intermediate folders (directly on dependency sets). > * (/) verify min. JVM version and exact gradle version on build startup to > minimize odd build side-effects > * (/) Repro-line for failed tests/ runs. > * (/) add a top-level README note about building with gradle (and the > required JVM). > * (/) add an equivalent of 'validate-source-patterns' > (check-source-patterns.groovy) to precommit. > * (/) add an equivalent of 'rat-sources' to precommit. > * (/) add an equivalent of 'check-example-lucene-match-version' (solr only) > to precommit. > * (/) javadoc compilation > Hard-to-implement stuff already investigated: > * (/) (done) -*Printing console output of failed tests.* There doesn't seem > to be any way to do this in a reasonably efficient way. There are onOutput > listeners but they're slow to operate and solr tests emit *tons* of output so > it's overkill.- > * (!) (LUCENE-9120) *Tests working with security-debug logs or other > JVM-early log output*. Gradle's test runner works by redirecting Java's > stdout/ syserr so this just won't work. Perhaps we can spin up the ant-based > test runner for such corner-cases.
> Of lesser importance: > * Add an equivalent of 'documentation-lint' to precommit. > * (/) Do not require files to be committed before running precommit (staged > files are fine). > * (/) add rendering of javadocs (gradlew javadoc) > * Attach javadocs to maven publications. > * Add test 'beasting' (rerunning the same suite multiple times). I'm afraid > it'll be difficult to run it sensibly because gradle doesn't offer cwd > separation for the forked test runners. > * if you diff the solr packaged distribution against the ant-created distribution > there are minor differences in library versions and some JARs are excluded/ > moved around. I didn't try to force these as everything seems to work (tests, > etc.) – perhaps these differences should be fixed in the ant build instead. > * (/) identify and port various "regenerate" tasks from ant builds (javacc, > precompiled automata, etc.) > * Fill in POM details in gradle/defaults-maven.gradle so that they reflect > the previous content better (dependencies aside). > * Add any IDE integration layers that should be added (I use IntelliJ and it > imports the project out of the box, without the need for any spe
[jira] [Comment Edited] (LUCENE-9077) Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051889#comment-17051889 ] Dawid Weiss edited comment on LUCENE-9077 at 3/5/20, 8:27 AM: -- There is no word-for-word equivalent but the functionality is there (it's part of {{gradlew check}} on each project). See {{gradlew :helpDeps}} if you need details: {code} Updating dependency checksum and licenses - The last step is to make sure the licenses, notice files and checksums are in place for any new dependencies. This command will print what's missing and where: gradlew licenses To update JAR checksums for licenses use: gradlew updateLicenses {code} was (Author: dweiss): There is no word-for-word equivalent but the functionality is there (it's part of {{gradlew check}} on each project). See {{gradlew :helpDependencies}} if you need details: {code} Updating dependency checksum and licenses - The last step is to make sure the licenses, notice files and checksums are in place for any new dependencies. This command will print what's missing and where: gradlew licenses To update JAR checksums for licenses use: gradlew updateLicenses {code} > Gradle build > > > Key: LUCENE-9077 > URL: https://issues.apache.org/jira/browse/LUCENE-9077 > Project: Lucene - Core > Issue Type: Task >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Major > Fix For: master (9.0) > > Attachments: LUCENE-9077-javadoc-locale-en-US.patch > > Time Spent: 2.5h > Remaining Estimate: 0h
[jira] [Updated] (LUCENE-9258) DocTermsIndexDocValues should not assume it's operating on a SortedDocValues field
[ https://issues.apache.org/jira/browse/LUCENE-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michele Palmia updated LUCENE-9258: --- Affects Version/s: 7.7.2 > DocTermsIndexDocValues should not assume it's operating on a SortedDocValues > field > -- > > Key: LUCENE-9258 > URL: https://issues.apache.org/jira/browse/LUCENE-9258 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 7.7.2, 8.4 >Reporter: Michele Palmia >Priority: Minor > Attachments: LUCENE-9258.patch > > > When requesting a new _ValueSourceScorer_ (with _getRangeScorer_) from > _DocTermsIndexDocValues_, the latter instantiates a new iterator on > _SortedDocValues_ even though the underlying field can actually be of a > different type (e.g., a _SortedSetDocValues_ processed through a > _SortedSetSelector_).
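For readers unfamiliar with the selector indirection described above, here is a self-contained sketch (the field name "category" and the wrapping class are invented for illustration) of how a multi-valued SortedSetDocValues field is commonly presented as a single-valued SortedDocValues view, which is why assuming the raw field is SortedDocValues breaks:
{code}
import java.io.IOException;
import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.SortedDocValues;
import org.apache.lucene.index.SortedSetDocValues;
import org.apache.lucene.search.SortedSetSelector;

final class SelectorViewSketch {
  // Returns a single-valued view over a multi-valued field.
  static SortedDocValues minView(LeafReader reader) throws IOException {
    SortedSetDocValues multi = DocValues.getSortedSet(reader, "category");
    // The underlying field is SortedSetDocValues; it only looks like
    // SortedDocValues through the selector's view, so code that opens a raw
    // SortedDocValues iterator on the field is making the wrong assumption.
    return SortedSetSelector.wrap(multi, SortedSetSelector.Type.MIN);
  }
}
{code}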
[GitHub] [lucene-solr] s1monw opened a new pull request #1319: LUCENE-9164: process all events before closing gracefully
s1monw opened a new pull request #1319: LUCENE-9164: process all events before closing gracefully URL: https://github.com/apache/lucene-solr/pull/1319 This is yet another, simpler approach than https://github.com/apache/lucene-solr/pull/1274 to ensure that all events are processed if we are closing the IW gracefully. It also improves the case where we are closing due to a tragic event: we don't try to be heroic, and just drop all pending events on the floor.
[GitHub] [lucene-solr] s1monw commented on issue #1274: LUCENE-9164: Prevent IW from closing gracefully if threads are still modifying
s1monw commented on issue #1274: LUCENE-9164: Prevent IW from closing gracefully if threads are still modifying URL: https://github.com/apache/lucene-solr/pull/1274#issuecomment-595130965 @mikemccand @dnhatn I explored one more idea that is less intrusive and more contained. I like this one much better: https://github.com/apache/lucene-solr/pull/1319
[GitHub] [lucene-solr] dweiss commented on a change in pull request #1319: LUCENE-9164: process all events before closing gracefully
dweiss commented on a change in pull request #1319: LUCENE-9164: process all events before closing gracefully URL: https://github.com/apache/lucene-solr/pull/1319#discussion_r388181480 ## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java ## @@ -299,7 +300,70 @@ static int getActualMaxDocs() { final FieldNumbers globalFieldNumberMap; final DocumentsWriter docWriter; - private final Queue eventQueue = new ConcurrentLinkedQueue<>(); + private final CloseableQueue eventQueue = new CloseableQueue(); Review comment: Wouldn't it be nicer to make it just Closeable and pass IndexWriter in the constructor (instead of each method)?
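To make the suggestion concrete, here is a rough sketch of the shape being proposed: a queue that is itself Closeable and receives the writer once at construction instead of in every method. All names are invented for illustration; this is not Lucene's actual implementation (Object stands in for IndexWriter to keep the sketch self-contained).
{code}
import java.io.Closeable;
import java.io.IOException;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

final class CloseableQueueSketch implements Closeable {
  interface Event {
    void process(Object writer) throws IOException;
  }

  private final Object writer; // passed once, as the review suggests
  private final Queue<Event> queue = new ConcurrentLinkedQueue<>();
  private volatile boolean closed;

  CloseableQueueSketch(Object writer) {
    this.writer = writer;
  }

  boolean add(Event event) {
    // Reject new events once the queue is closed.
    return closed == false && queue.add(event);
  }

  void processEvents() throws IOException {
    Event event;
    while ((event = queue.poll()) != null) {
      event.process(writer); // no writer parameter needed per call
    }
  }

  @Override
  public void close() throws IOException {
    closed = true;
    processEvents(); // drain pending events on a graceful close
  }
}
{code}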
[GitHub] [lucene-solr] s1monw commented on issue #1319: LUCENE-9164: process all events before closing gracefully
s1monw commented on issue #1319: LUCENE-9164: process all events before closing gracefully URL: https://github.com/apache/lucene-solr/pull/1319#issuecomment-595136991 thanks for looking @dweiss
[GitHub] [lucene-solr] bruno-roustant opened a new pull request #1320: LUCENE-9257: Always keep FST off-heap. Remove FSTLoadMode and Reader attributes.
bruno-roustant opened a new pull request #1320: LUCENE-9257: Always keep FST off-heap. Remove FSTLoadMode and Reader attributes. URL: https://github.com/apache/lucene-solr/pull/1320 This PR modifies many classes because it removes the Reader attributes, which are now unused since the FST is always loaded off-heap.
[jira] [Commented] (LUCENE-9257) FSTLoadMode should not be BlockTree specific as it is used more generally in index package
[ https://issues.apache.org/jira/browse/LUCENE-9257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052040#comment-17052040 ] Bruno Roustant commented on LUCENE-9257: New PR #1320 removes FSTLoadMode and *also* Reader attributes. When removing FSTLoadMode I realized that Reader attributes were introduced for it and now become unused. Since Reader attributes represent a lot of code, I think it is worth removing them. If someone needs to get them back sometime in the future, they are in commit a302be381ea611e57d32d7f277206e726329fa6e. Please tell me if it is ok to remove Reader attributes, or if we should keep them (but in that case, where do we define the attribute key constant?). > FSTLoadMode should not be BlockTree specific as it is used more generally in > index package > -- > > Key: LUCENE-9257 > URL: https://issues.apache.org/jira/browse/LUCENE-9257 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Bruno Roustant >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > FSTLoadMode and its associated attribute key (a static String) are currently > defined in BlockTreeTermsReader, but they are actually used outside of > BlockTree in the general "index" package. > CheckIndex and ReadersAndUpdates are using this enum and attribute key to > drive the FST load mode through the SegmentReader, which is not specific to a > postings format. They have an unnecessary dependency on BlockTreeTermsReader. > We could move FSTLoadMode out of BlockTreeTermsReader to make it a public > enum of the "index" package. That way CheckIndex and ReadersAndUpdates no > longer import BlockTreeTermsReader. > This would also allow other postings formats to use the same enum (e.g., > LUCENE-9254)
[jira] [Created] (LUCENE-9264) Remove SimpleFSDirectory in favor of NIOFsDirectory
Yannick Welsch created LUCENE-9264: -- Summary: Remove SimpleFSDirectory in favor of NIOFsDirectory Key: LUCENE-9264 URL: https://issues.apache.org/jira/browse/LUCENE-9264 Project: Lucene - Core Issue Type: Improvement Reporter: Yannick Welsch {{SimpleFSDirectory}} appears to duplicate what's already offered by {{NIOFsDirectory}}. The only difference is that {{SimpleFSDirectory}} uses non-positional reads on the {{FileChannel}} (i.e., reads that are stateful, changing the current position), and {{SimpleFSDirectory}} therefore has to externally synchronize access to the read method. On Windows, positional reads are not supported, which is why {{FileChannel}} already uses internal synchronization to guarantee access by only one thread at a time for positional reads (see {{read(ByteBuffer dst, long position)}} in {{FileChannelImpl}}, and {{FileDispatcher.needsPositionLock}}, which returns true on Windows), and the JDK implementation for Windows emulates positional reads by using non-positional ones, see [http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/windows/native/sun/nio/ch/FileDispatcherImpl.c#l139]. This means that on Windows, there should be no difference between {{NIOFsDirectory}} and {{SimpleFSDirectory}} in terms of performance (it should be equally poor, as both implementations only allow one thread at a time to read). On Linux/Mac, however, {{NIOFsDirectory}} is superior to {{SimpleFSDirectory}}, as positional reads (pread) can be done concurrently. My proposal is to remove {{SimpleFSDirectory}} and replace its uses with {{NIOFsDirectory}}, given how similar these two directory implementations are ({{SimpleFSDirectory}} isn't really simpler).
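A small, self-contained illustration of the difference described above (the file name and offsets are placeholders): a positional read carries its offset per call and leaves the channel position untouched, while a non-positional read mutates the channel's single shared position and therefore needs external locking.
{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class PositionalReadSketch {
  public static void main(String[] args) throws IOException {
    try (FileChannel ch = FileChannel.open(Paths.get("some-file.bin"), StandardOpenOption.READ)) {
      ByteBuffer buf = ByteBuffer.allocate(1024);

      // Positional read (pread): the offset is an argument and the channel's
      // position is untouched, so threads can read concurrently on Linux/Mac.
      ch.read(buf, 4096L);

      buf.clear();

      // Non-positional read: consumes the channel's shared position, so
      // callers must serialize access (the SimpleFSDirectory situation).
      ch.position(4096L);
      ch.read(buf);
    }
  }
}
{code}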
[jira] [Updated] (LUCENE-8103) QueryValueSource should use TwoPhaseIterator
[ https://issues.apache.org/jira/browse/LUCENE-8103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michele Palmia updated LUCENE-8103: --- Attachment: LUCENE-8103.patch > QueryValueSource should use TwoPhaseIterator > > > Key: LUCENE-8103 > URL: https://issues.apache.org/jira/browse/LUCENE-8103 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/other >Reporter: David Smiley >Priority: Minor > Attachments: LUCENE-8103.patch > > > QueryValueSource (in "queries" module) is a ValueSource representation of a > Query; the score is the value. It ought to try to use a TwoPhaseIterator > from the query if it can be offered. This will prevent possibly expensive > advancing beyond documents that we aren't interested in.
[jira] [Commented] (LUCENE-8103) QueryValueSource should use TwoPhaseIterator
[ https://issues.apache.org/jira/browse/LUCENE-8103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052064#comment-17052064 ] Michele Palmia commented on LUCENE-8103: Why would a Scorer offer a fast TwoPhaseIterator but not serve it (repackaged as a DocIdSetIterator) when asked for a simple old-school iterator()? In my naivety, I would expect that if a fast TPI is implemented, it would also be served when clients call iterator(). If that's not the case, and an explicit repackaging is useful, here's my patch. [^LUCENE-8103.patch] > QueryValueSource should use TwoPhaseIterator > > > Key: LUCENE-8103 > URL: https://issues.apache.org/jira/browse/LUCENE-8103 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/other >Reporter: David Smiley >Priority: Minor > Attachments: LUCENE-8103.patch
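For readers following along, here is a sketch of the pattern the issue asks for, i.e., iterating the cheap approximation and confirming candidates with matches() instead of paying the full per-document cost up front. The wrapping class and method are invented for illustration; only the Lucene API calls are real.
{code}
import java.io.IOException;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.TwoPhaseIterator;
import org.apache.lucene.search.Weight;

final class TwoPhaseSketch {
  static void visitMatches(Weight weight, LeafReaderContext context) throws IOException {
    Scorer scorer = weight.scorer(context);
    if (scorer == null) {
      return; // no matches in this segment
    }
    TwoPhaseIterator tpi = scorer.twoPhaseIterator(); // null if there is no cheap approximation
    if (tpi != null) {
      DocIdSetIterator approx = tpi.approximation();
      for (int doc = approx.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = approx.nextDoc()) {
        if (tpi.matches()) {
          // confirmed match: scorer.score() would be the value-source value
        }
      }
    } else {
      DocIdSetIterator it = scorer.iterator();
      for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
        // every doc returned here is already a confirmed match
      }
    }
  }
}
{code}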
[jira] [Commented] (SOLR-14147) enable security manager by default
[ https://issues.apache.org/jira/browse/SOLR-14147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052069#comment-17052069 ] Cassandra Targett commented on SOLR-14147: -- I just noticed that this commit removed a section of the securing-solr.adoc page in the Ref Guide (added by SOLR-13984) that, for 8.x, explains how to enable the security manager. While I get the point of doing that - if it's enabled by default, users don't need to enable it - there seem to be a couple of reasons why someone might need to disable it, so instead of removing the section entirely I would suggest it should have been edited to describe how to disable it. [~marcussorealheis], any objection to me adding the section back to master, edited in this way? > enable security manager by default > -- > > Key: SOLR-14147 > URL: https://issues.apache.org/jira/browse/SOLR-14147 > Project: Solr > Issue Type: Improvement >Reporter: Robert Muir >Priority: Major > Fix For: master (9.0) > > Time Spent: 6h 20m > Remaining Estimate: 0h > > For 9.0, set SOLR_SECURITY_MANAGER_ENABLED=true by default. Remove the step > from the securing solr page as it will be done by default (defaults become safe). > Users can disable it if they are running hadoop or doing other crazy stuff.
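For anyone landing here looking for the opt-out once the 9.0 default flips: disabling should presumably come down to overriding the variable named in this issue in the include script, along these lines (a sketch only; check the Ref Guide section discussed above for the authoritative steps):
{code}
# In solr.in.sh (solr.in.cmd on Windows): opt out of the 9.0 default,
# e.g., when running with Hadoop or other code the sandbox breaks.
SOLR_SECURITY_MANAGER_ENABLED=false
{code}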
[jira] [Commented] (SOLR-14147) enable security manager by default
[ https://issues.apache.org/jira/browse/SOLR-14147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052073#comment-17052073 ] ASF subversion and git services commented on SOLR-14147: Commit 74b9ba396c670cff7b738563475a92b8051f6690 in lucene-solr's branch refs/heads/master from Cassandra Targett [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=74b9ba3 ] SOLR-14147: comment out for now link to security manager docs in upgrade notes that don't exist on master > enable security manager by default > -- > > Key: SOLR-14147 > URL: https://issues.apache.org/jira/browse/SOLR-14147 > Project: Solr > Issue Type: Improvement >Reporter: Robert Muir >Priority: Major > Fix For: master (9.0) > > Time Spent: 6h 20m > Remaining Estimate: 0h
[jira] [Resolved] (SOLR-13983) remove or replace process execution in SystemInfoHandler
[ https://issues.apache.org/jira/browse/SOLR-13983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved SOLR-13983. Fix Version/s: 8.5 Resolution: Fixed > remove or replace process execution in SystemInfoHandler > > > Key: SOLR-13983 > URL: https://issues.apache.org/jira/browse/SOLR-13983 > Project: Solr > Issue Type: Improvement >Reporter: Robert Muir >Priority: Major > Fix For: 8.5 > > Attachments: SOLR-13983.patch > > > SystemInfoHandler is the only place in solr code executing processes. > Since solr is a server/long running process listening to HTTP, ideally > process execution could be disabled (e.g., with the security manager). But first > this code needs to be removed or replaced, so that there is no legitimate use > of it: > {noformat} > try { > if (!Constants.WINDOWS) { > info.add( "uname", execute( "uname -a" ) ); > info.add( "uptime", execute( "uptime" ) ); > } > } catch( Exception ex ) { > log.warn("Unable to execute command line tools to get operating system > properties.", ex); > } > return info; > {noformat} > It already looks like it's getting data from the OS MXBean here, so maybe this > logic is simply outdated or not needed. It seems to be "best-effort" anyway. > Alternatively, similar information could be read from, e.g., the /proc > filesystem if needed.
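The MXBean route mentioned in the description can cover most of what the uname/uptime calls provided, without forking a process. A minimal sketch (class name invented; the mappings to uname flags are approximate):
{code}
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class OsInfoSketch {
  public static void main(String[] args) {
    OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
    System.out.println("name:    " + os.getName());              // roughly `uname -s`
    System.out.println("version: " + os.getVersion());           // roughly `uname -r`
    System.out.println("arch:    " + os.getArch());              // roughly `uname -m`
    System.out.println("load:    " + os.getSystemLoadAverage()); // uptime-style load; -1.0 if unavailable
  }
}
{code}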
[jira] [Commented] (LUCENE-9264) Remove SimpleFSDirectory in favor of NIOFsDirectory
[ https://issues.apache.org/jira/browse/LUCENE-9264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052077#comment-17052077 ] Robert Muir commented on LUCENE-9264: - +1 > Remove SimpleFSDirectory in favor of NIOFsDirectory > --- > > Key: LUCENE-9264 > URL: https://issues.apache.org/jira/browse/LUCENE-9264 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Yannick Welsch >Priority: Minor
[GitHub] [lucene-solr] msokolov commented on a change in pull request #1313: LUCENE-8962: Split test case
msokolov commented on a change in pull request #1313: LUCENE-8962: Split test case URL: https://github.com/apache/lucene-solr/pull/1313#discussion_r388264716 ## File path: lucene/core/src/test/org/apache/lucene/index/TestIndexWriterMergePolicy.java ## @@ -298,63 +320,44 @@ public void testMergeOnCommit() throws IOException, InterruptedException { DirectoryReader firstReader = DirectoryReader.open(firstWriter); assertEquals(5, firstReader.leaves().size()); firstReader.close(); -firstWriter.close(); - -MergePolicy mergeOnCommitPolicy = new LogDocMergePolicy() { - @Override - public MergeSpecification findFullFlushMerges(MergeTrigger mergeTrigger, SegmentInfos segmentInfos, MergeContext mergeContext) { -// Optimize down to a single segment on commit -if (mergeTrigger == MergeTrigger.COMMIT && segmentInfos.size() > 1) { - List nonMergingSegments = new ArrayList<>(); - for (SegmentCommitInfo sci : segmentInfos) { -if (mergeContext.getMergingSegments().contains(sci) == false) { - nonMergingSegments.add(sci); -} - } - if (nonMergingSegments.size() > 1) { -MergeSpecification mergeSpecification = new MergeSpecification(); -mergeSpecification.add(new OneMerge(nonMergingSegments)); -return mergeSpecification; - } -} -return null; - } -}; +firstWriter.close(); // When this writer closes, it does not merge on commit. -AtomicInteger abandonedMerges = new AtomicInteger(0); IndexWriterConfig iwc = newIndexWriterConfig(new MockAnalyzer(random())) -.setMergePolicy(mergeOnCommitPolicy) -.setIndexWriterEvents(new IndexWriterEvents() { - @Override - public void beginMergeOnCommit() { - - } - - @Override - public void finishMergeOnCommit() { +.setMergePolicy(MERGE_ON_COMMIT_POLICY); - } - - @Override - public void abandonedMergesOnCommit(int abandonedCount) { -abandonedMerges.incrementAndGet(); - } -}); IndexWriter writerWithMergePolicy = new IndexWriter(dir, iwc); - -writerWithMergePolicy.commit(); +writerWithMergePolicy.commit(); // No changes. Commit doesn't trigger a merge. DirectoryReader unmergedReader = DirectoryReader.open(writerWithMergePolicy); -assertEquals(5, unmergedReader.leaves().size()); // Don't merge unless there's a change +assertEquals(5, unmergedReader.leaves().size()); unmergedReader.close(); TestIndexWriter.addDoc(writerWithMergePolicy); -writerWithMergePolicy.commit(); +writerWithMergePolicy.commit(); // Doc added, do merge on commit. +assertEquals(1, writerWithMergePolicy.getSegmentCount()); // DirectoryReader mergedReader = DirectoryReader.open(writerWithMergePolicy); -assertEquals(1, mergedReader.leaves().size()); // Now we merge on commit +assertEquals(1, mergedReader.leaves().size()); mergedReader.close(); +try (IndexReader reader = writerWithMergePolicy.getReader()) { + IndexSearcher searcher = new IndexSearcher(reader); + assertEquals(6, reader.numDocs()); + assertEquals(6, searcher.count(new MatchAllDocsQuery())); +} + +writerWithMergePolicy.close(); +dir.close(); + } + + // Test that when we have multiple indexing threads merging on commit, we never throw an exception. + @Nightly Review comment: Yes, I think given it does not assert anything -- just makes sure no exceptions occur -- we should already be well-covered.
[GitHub] [lucene-solr] juanka588 commented on a change in pull request #1320: LUCENE-9257: Always keep FST off-heap. Remove FSTLoadMode and Reader attributes.
juanka588 commented on a change in pull request #1320: LUCENE-9257: Always keep FST off-heap. Remove FSTLoadMode and Reader attributes. URL: https://github.com/apache/lucene-solr/pull/1320#discussion_r388274441 ## File path: lucene/core/src/java/org/apache/lucene/codecs/blocktree/FieldReader.java ## @@ -82,32 +80,11 @@ // System.out.println("BTTR: seg=" + segment + " field=" + fieldInfo.name + " rootBlockCode=" + rootCode + " divisor=" + indexDivisor); // } rootBlockFP = (new ByteArrayDataInput(rootCode.bytes, rootCode.offset, rootCode.length)).readVLong() >>> BlockTreeTermsReader.OUTPUT_FLAGS_NUM_BITS; -// Initialize FST offheap if index is MMapDirectory and -// docCount != sumDocFreq implying field is not primary key +// Initialize FST always off-heap. if (indexIn != null) { - switch (fstLoadMode) { -case ON_HEAP: - isFSTOffHeap = false; - break; -case OFF_HEAP: - isFSTOffHeap = true; - break; -case OPTIMIZE_UPDATES_OFF_HEAP: - isFSTOffHeap = ((this.docCount != this.sumDocFreq) || openedFromWriter == false); - break; -case AUTO: - isFSTOffHeap = ((this.docCount != this.sumDocFreq) || openedFromWriter == false) && indexIn instanceof ByteBufferIndexInput; - break; -default: - throw new IllegalStateException("unknown enum constant: " + fstLoadMode); - } final IndexInput clone = indexIn.clone(); clone.seek(indexStartFP); - if (isFSTOffHeap) { -index = new FST<>(clone, ByteSequenceOutputs.getSingleton(), new OffHeapFSTStore()); - } else { -index = new FST<>(clone, ByteSequenceOutputs.getSingleton()); - } + index = new FST<>(clone, ByteSequenceOutputs.getSingleton(), new OffHeapFSTStore()); Review comment: nice
[GitHub] [lucene-solr] ctargett commented on a change in pull request #1292: SOLR-14284 add expressible support to list, and add example of removing a component
ctargett commented on a change in pull request #1292: SOLR-14284 add expressible support to list, and add example of removing a component URL: https://github.com/apache/lucene-solr/pull/1292#discussion_r388300341 ## File path: solr/solr-ref-guide/src/stream-api.adoc ## @@ -0,0 +1,210 @@ += Stream Request Handler API +:page-toclevels: 1 +:page-tocclass: right +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + + +These API commands work with the `/stream` request handler. + Review comment: Since there are really only two types of actions possible - one to list all available expressions, and four to manipulate daemon streams - I think it might be helpful for users to state that somewhat limited scope upfront. Also, a link from the daemon expression to this new page would be appropriate IMO, and, less importantly, a link to the Stream handler documentation (unless this is intended to be a child of that page - it's not clear where this is intended to live).
[GitHub] [lucene-solr] ctargett commented on issue #1292: SOLR-14284 add expressible support to list, and add example of removing a component
ctargett commented on issue #1292: SOLR-14284 add expressible support to list, and add example of removing a component URL: https://github.com/apache/lucene-solr/pull/1292#issuecomment-595237017 The changes here are slightly confusing because the descriptions of the Jira issue and the PR refer to documenting `add-expressible` (etc.), but there are also examples added for `delete-requesthandler`, which is tangential and sort of thrown in there. While the title of this PR mentions it, the descriptions don't, so it's hard to know what to expect. It's fine in the end; it's just a barrier to review worth noting for future PRs. I left another specific comment on the new page, and generally that content is good. However, I know it will fail the build because there is no edit to a page to include the new page as a child of another (so it does not currently fit anywhere in the page hierarchy), which means you didn't run the build first and there could be other issues that need to be resolved before this can be committed.
[jira] [Commented] (SOLR-14284) Document that you can add a new stream function via add-expressible
[ https://issues.apache.org/jira/browse/SOLR-14284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052157#comment-17052157 ] Cassandra Targett commented on SOLR-14284: -- I took a pass at reviewing this today. I can't totally vouch for its accuracy, but the content is good IMO. The PR is missing a couple of things before it can be committed, though - the biggest is to put the page into the page hierarchy by adding it as a child of another page. Otherwise the build will fail, since the new page technically doesn't belong anywhere. > Document that you can add a new stream function via add-expressible > --- > > Key: SOLR-14284 > URL: https://issues.apache.org/jira/browse/SOLR-14284 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: documentation >Affects Versions: 8.5 >Reporter: David Eric Pugh >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > I confirmed that in Solr 8.5 you will be able to dynamically add a Stream > function (assuming the Jar is in the path) via the configset api: > curl -X POST -H 'Content-type:application/json' -d '{ > "add-expressible": { > "name": "dog", > "class": "org.apache.solr.handler.CatStream" > } > }' http://localhost:8983/solr/gettingstarted/config
[jira] [Commented] (SOLR-14284) Document that you can add a new stream function via add-expressible
[ https://issues.apache.org/jira/browse/SOLR-14284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052162#comment-17052162 ] Eric Pugh commented on SOLR-14284: -- Thanks! I suspect I goofed the commit. "Works on my laptop" said every developer :-) > Document that you can add a new stream function via add-expressible > --- > > Key: SOLR-14284 > URL: https://issues.apache.org/jira/browse/SOLR-14284 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: documentation >Affects Versions: 8.5 >Reporter: David Eric Pugh >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h
[GitHub] [lucene-solr] ctargett commented on a change in pull request #1292: SOLR-14284 add expressible support to list, and add example of removing a component
ctargett commented on a change in pull request #1292: SOLR-14284 add expressible support to list, and add example of removing a component URL: https://github.com/apache/lucene-solr/pull/1292#discussion_r388311699 ## File path: solr/solr-ref-guide/src/stream-api.adoc ## Review comment: Also, it seems that the v2 API structure is not supported with this API? If that's true, we might want to state that somewhere outright.
[jira] [Updated] (SOLR-13919) RefGuide: Add example for AuditLogger to use log4j to log into separate files
[ https://issues.apache.org/jira/browse/SOLR-13919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cassandra Targett updated SOLR-13919: - Component/s: documentation > RefGuide: Add example for AuditLogger to use log4j to log into separate files > - > > Key: SOLR-13919 > URL: https://issues.apache.org/jira/browse/SOLR-13919 > Project: Solr > Issue Type: Improvement > Components: documentation, SolrCloud >Affects Versions: 8.3 >Reporter: Jörn Franke >Priority: Minor > > At the moment, the Solr reference guide provides an example of how to log > audit events to the standard Solr log (see > [https://lucene.apache.org/solr/guide/8_3/audit-logging.html]). > This enhancement proposes to include a simple explanation in the reference > guide of how to configure log4j to log audit events to a separate file (this > is already possible with log4j today; this issue is just about adding an > example log4j configuration file for the Solr audit logger). > The reasoning behind this is that it can reduce the load on a SIEM system > significantly, as it only needs to process the relevant audit logs. > To be discussed: should the standard log4j configuration installed with Solr > log all audit events into a separate file (maybe even in the same log > directory) by default?
[jira] [Updated] (SOLR-12865) Custom JSON parser's nested documents example does not work
[ https://issues.apache.org/jira/browse/SOLR-12865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cassandra Targett updated SOLR-12865: - Component/s: documentation > Custom JSON parser's nested documents example does not work > --- > > Key: SOLR-12865 > URL: https://issues.apache.org/jira/browse/SOLR-12865 > Project: Solr > Issue Type: Bug > Components: documentation >Affects Versions: 7.5 >Reporter: Alexandre Rafalovitch >Priority: Major > Labels: json > > The only example we have for indexing nested JSON using the JSON parser does > not seem to work: > [https://lucene.apache.org/solr/guide/7_5/transforming-and-indexing-custom-json.html#indexing-nested-documents] > Attempt 1, using default schemaless mode: > # bin/solr create -c json_basic > # Example command in V1 format (with the core name switched to the above) > # Indexing fails with: *"msg":"[doc=null] missing required field: id"*. My > guess is that the URP chain does not apply to inner children records > Attempt 2, using the techproducts schema configuration: > # bin/solr create -c json_tp -d sample_techproducts_configs > # Same example command with the new core > # Indexing fails with: *"msg":"Raw data can be stored only if split=/"* (due > to the presence of srcField in params.json) > Attempt 3, continuing the above example but taking out the srcField > configuration: > # Update params.json to remove srcField > # Same example command > # It indexes (but does not commit) > # curl http://localhost:8983/solr/json_tp/update/json -v -d '{commit:{}}' > # The core now contains only one document, with auto-generated "id" and > "_version_" fields (because we have mapUniqueKeyOnly in params.json) > Attempt 4, removing more keys: > # Update params.json to remove mapUniqueKeyOnly > # Same example command > # Indexing fails with: *"msg":"Document is missing mandatory uniqueKey > field: id"* > There does not seem to be a way to index nested JSON using the transformer > approach.
[jira] [Resolved] (SOLR-13120) Bad Documentation Link
[ https://issues.apache.org/jira/browse/SOLR-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cassandra Targett resolved SOLR-13120. -- Resolution: Won't Fix The page in question hadn't been part of our official docs for a long while, and since the migration to cwiki in Summer 2019 it appears to be entirely gone. > Bad Documentation Link > -- > > Key: SOLR-13120 > URL: https://issues.apache.org/jira/browse/SOLR-13120 > Project: Solr > Issue Type: Task >Reporter: Kyle Cundari >Priority: Major > > In the Solr Docs: [https://wiki.apache.org/solr/CommonQueryParameters] > > There is a bad link ("full cursorMark deep paging example") under the "Deep > paging with cursorMark" header.
[GitHub] [lucene-solr] epugh commented on issue #1292: SOLR-14284 add expressible support to list, and add example of removing a component
epugh commented on issue #1292: SOLR-14284 add expressible support to list, and add example of removing a component URL: https://github.com/apache/lucene-solr/pull/1292#issuecomment-595269969 Thanks @ctargett for the review. Do you want me to pull that `delete-requesthandler` into another Jira issue? Totally happy to do that, and your feedback makes complete sense about it being a barrier. I am looking at the other comments.
[jira] [Commented] (LUCENE-9264) Remove SimpleFSDirectory in favor of NIOFsDirectory
[ https://issues.apache.org/jira/browse/LUCENE-9264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052214#comment-17052214 ] Bruno Roustant commented on LUCENE-9264: +1 > Remove SimpleFSDirectory in favor of NIOFsDirectory > --- > > Key: LUCENE-9264 > URL: https://issues.apache.org/jira/browse/LUCENE-9264 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Yannick Welsch >Priority: Minor
[GitHub] [lucene-solr] ctargett commented on a change in pull request #1291: LUCENE-9016: RefGuide meta doc for how to publish website
ctargett commented on a change in pull request #1291: LUCENE-9016: RefGuide meta doc for how to publish website URL: https://github.com/apache/lucene-solr/pull/1291#discussion_r388348872 ## File path: solr/solr-ref-guide/src/meta-docs/publish.adoc ## @@ -47,61 +47,26 @@ To build the HTML: [source,bash] $ ant clean default + -This will produce pages with a DRAFT watermark across them. While these are fine for initial DRAFT publication, see the section <> for steps to produce final production-ready HTML pages. +This will produce pages with a DRAFT watermark across them. While these are fine for initial DRAFT publication, see the section <> for steps to produce final production-ready HTML pages. . The resulting Guide will be in `solr/build/solr-ref-guide`. The HTML files themselves will be in `solr/build/solr-ref-guide/html-site`. Review comment: Since you changed the heading this refers to back to its original, this reference should be changed too. This is why precommit failed.
[GitHub] [lucene-solr] ctargett commented on issue #1291: LUCENE-9016: RefGuide meta doc for how to publish website
ctargett commented on issue #1291: LUCENE-9016: RefGuide meta doc for how to publish website URL: https://github.com/apache/lucene-solr/pull/1291#issuecomment-595274181 Sorry, I approved this and then looked at why precommit failed, and there is still 1 page reference that is incorrect (since you changed the section title back to what it was originally).
[GitHub] [lucene-solr] ctargett commented on issue #1292: SOLR-14284 add expressible support to list, and add example of removing a component
ctargett commented on issue #1292: SOLR-14284 add expressible support to list, and add example of removing a component URL: https://github.com/apache/lucene-solr/pull/1292#issuecomment-595275870 It's fine to keep it here, it just was a disconnect at first. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] epugh commented on a change in pull request #1292: SOLR-14284 add expressible support to list, and add example of removing a component
epugh commented on a change in pull request #1292: SOLR-14284 add expressible support to list, and add example of removing a component URL: https://github.com/apache/lucene-solr/pull/1292#discussion_r388354635 ## File path: solr/solr-ref-guide/src/stream-api.adoc ## @@ -0,0 +1,210 @@ += Stream Request Handler API +:page-toclevels: 1 +:page-tocclass: right +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + + +These API commands work with the `/stream` request handler. + Review comment: I reworked the intro, and used the `NOTE` to call out the lack of following v2 API structure. Also, there is a link from the Daemon expression detail to this page and vice versa, and I modified the `streaming-expressions.adoc` to have `stream-api.adoc` be a child page. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] epugh commented on issue #1292: SOLR-14284 add expressible support to list, and add example of removing a component
epugh commented on issue #1292: SOLR-14284 add expressible support to list, and add example of removing a component URL: https://github.com/apache/lucene-solr/pull/1292#issuecomment-595290097 I think I have responded and pushed up all changes. Thank you for reviewing, and let me know if any other changes are needed! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-11746) numeric fields need better error handling for prefix/wildcard syntax -- consider uniform support for "foo:* == foo:[* TO *]"
[ https://issues.apache.org/jira/browse/SOLR-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052294#comment-17052294 ] Tomoko Uchida commented on SOLR-11746: -- [~ctargett] I've used asciidoctor 1.5.6.2, and after updating its version {{ant build-site}} started to work for me again. Thank you! And yes, with the Gradle build we shouldn't have such problems. :) (I didn't know about jruby-gradle-plugin, but it seems to work much like [Bundler|https://bundler.io/], the de facto dependency management tool for Ruby.) > numeric fields need better error handling for prefix/wildcard syntax -- > consider uniform support for "foo:* == foo:[* TO *]" > > > Key: SOLR-11746 > URL: https://issues.apache.org/jira/browse/SOLR-11746 > Project: Solr > Issue Type: Bug >Affects Versions: 7.0 >Reporter: Chris M. Hostetter >Assignee: Houston Putman >Priority: Major > Fix For: master (9.0), 8.5 > > Attachments: SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, > SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, > SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch > > > On the solr-user mailing list, Torsten Krah pointed out that with Trie > numeric fields, query syntax such as {{foo_d:\*}} has been functionally > equivalent to {{foo_d:\[\* TO \*]}} and asked why this was not also supported > for Point based numeric fields. > The fact that this type of syntax works (for {{indexed="true"}} Trie fields) > appears to have been an (untested, undocumented) fluke of Trie fields given > that they use indexed terms for the (encoded) numeric terms and inherit the > default implementation of {{FieldType.getPrefixQuery}} which produces a > prefix query against the {{""}} (empty string) term. > (Note that this syntax has apparently _*never*_ worked for Trie fields with > {{indexed="false" docValues="true"}} ) > In general, we should assess the behavior when users attempt a prefix/wildcard > syntax query against numeric fields, as currently the behavior is largely > nonsensical: prefix/wildcard syntax frequently matches no docs w/o any sort > of error, and the aforementioned {{numeric_field:*}} behaves inconsistently > between points/trie fields and between indexed/docValued trie fields. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-12325) introduce uniqueBlockQuery(parent:true) aggregation for JSON Facet
[ https://issues.apache.org/jira/browse/SOLR-12325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052324#comment-17052324 ] Munendra S N commented on SOLR-12325: - +1 to adding an additional test A few nitpicks: * Remove any usage of System.out.* in the patch * matchPart has a bug: it returns on the first iteration whether or not the keys match, so only the first key is ever compared. Also, instead of returning just {{err}}, I think it would be better if we include the key for which the comparison failed
{code:java}
for (String key : keys) {
  if ((((Map) inputObj2).get(key)).equals(((Map) inputObj1).get(key))) {
    return null; // the culprit
  } else {
    return "err";
  }
}
{code}
* Also, do we need logging in {{matchTwoJSONs}} when we are already throwing an exception? If the log serves some purpose we can keep it; otherwise, we can avoid it. The method would be cleaner without the failed flags [~mkhl] should we close this as branch_8_5 is cut from master? > introduce uniqueBlockQuery(parent:true) aggregation for JSON Facet > -- > > Key: SOLR-12325 > URL: https://issues.apache.org/jira/browse/SOLR-12325 > Project: Solr > Issue Type: New Feature > Components: Facet Module >Reporter: Mikhail Khludnev >Assignee: Mikhail Khludnev >Priority: Major > Fix For: 8.5 > > Attachments: SOLR-12325.patch, SOLR-12325.patch, SOLR-12325.patch, > SOLR-12325.patch, SOLR-12325.patch, > SOLR-12325_Random_test_for_uniqueBlockQuery (1).patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > It might be a faster twin for {{uniqueBlock(\_root_)}}. Please utilise the built-in > query parsing method, don't invent your own. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
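A minimal sketch of the corrected comparison loop (the method shape and names follow the snippet above; this is an illustrative fix, not the committed patch):
{code:java}
import java.util.Map;
import java.util.Objects;
import java.util.Set;

class JsonCompareSketch {
  // Compare the given keys across two maps: return null when every key
  // matches, otherwise name the first key whose values differ.
  static String matchPart(Map<String, Object> inputObj1,
                          Map<String, Object> inputObj2,
                          Set<String> keys) {
    for (String key : keys) {
      if (!Objects.equals(inputObj1.get(key), inputObj2.get(key))) {
        return "mismatch for key: " + key; // include the failing key, not just "err"
      }
    }
    return null; // return only after all keys have been checked
  }
}
{code}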
[jira] [Commented] (SOLR-13807) Caching for term facet counts
[ https://issues.apache.org/jira/browse/SOLR-13807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052328#comment-17052328 ] Michael Gibney commented on SOLR-13807: --- Regarding TermFacetCacheRegenerator, my understanding of CacheHelper.getKey() is that the returned keys should work the same way at the segment level as they do at the top level; notably, that the types of modifications you mention (deletes, in-place DV updates, etc.) should result in the creation of a new cache key. Is that not true? {{countCacheDf}} is defined wrt the main domain DocSet.size(), and only affects whether the {{termFacetCache}} is consulted for a given domain-request combination. It should _not_ affect the cached values themselves, if that's your concern. As far as the temporarily tabled concerns about concurrent mutation, this was something I considered, and (I think) addressed [here|https://github.com/apache/lucene-solr/pull/751/files#diff-1b16fc96c8dde547ddde619e54a45c26R1158-R1161]:
{code:java}
if (segmentCache == null) {
  // no cache presence; initialize.
  cacheState = CacheState.NOT_CACHED;
  newSegmentCache = new HashMap<>(fcontext.searcher.getIndexReader().leaves().size() + 1);
} else if (segmentCache.containsKey(topLevelKey)) {
  topLevelEntry = segmentCache.get(topLevelKey);
  CachedCountSlotAcc acc = new CachedCountSlotAcc(fcontext, topLevelEntry.topLevelCounts);
  return new SweepCountAccStruct(qKey, docs, CacheState.CACHED, null, isBase, acc,
      new ReadOnlyCountSlotAccWrapper(fcontext, acc), acc);
} else {
  // defensive copy, since cache entries are shared across threads
  cacheState = CacheState.PARTIALLY_CACHED;
  newSegmentCache = new HashMap<>(fcontext.searcher.getIndexReader().leaves().size() + 1);
  newSegmentCache.putAll(segmentCache);
}
{code}
In that last {{else}} block, each domain-request combination that finds a partial cache entry (with some segments populated) creates and populates an entirely new, request-private top-level cache entry (initially sharing the immutable segment-level entries from the extant top-level entry). On completion of processing, this new top-level entry is placed atomically into the termFacetCache. I believe this should be robust; and if indeed robust, at worst you'd end up with concurrent requests each doing the work of creating equivalent top-level cache entries, the last of which would remain in the cache ... which should be no worse than the status quo, where each request always does all the work of recalculating facet counts. > Caching for term facet counts > - > > Key: SOLR-13807 > URL: https://issues.apache.org/jira/browse/SOLR-13807 > Project: Solr > Issue Type: New Feature > Components: Facet Module >Affects Versions: master (9.0), 8.2 >Reporter: Michael Gibney >Priority: Minor > Attachments: SOLR-13807__SOLR-13132_test_stub.patch > > > Solr does not have a facet count cache; so for _every_ request, term facets > are recalculated for _every_ (facet) field, by iterating over _every_ field > value for _every_ doc in the result domain, and incrementing the associated > count. > As a result, subsequent requests end up redoing a lot of the same work, > including all associated object allocation, GC, etc. This situation could > benefit from integrated caching. > Because of the domain-based, serial/iterative nature of term facet > calculation, latency is proportional to the size of the result domain. 
> Consequently, one common/clear manifestation of this issue is high latency > for faceting over an unrestricted domain (e.g., {{\*:\*}}), as might be > observed on a top-level landing page that exposes facets. This type of > "static" case is often mitigated by external (to Solr) caching, either with a > caching layer between Solr and a front-end application, or within a front-end > application, or even with a caching layer between the end user and a > front-end application. > But in addition to the overhead of handling this caching elsewhere in the > stack (or, for a new user, even being aware of this as a potential issue to > mitigate), any external caching mitigation is really only appropriate for > relatively static cases like the "landing page" example described above. A > Solr-internal facet count cache (analogous to the {{filterCache}}) would > provide the following additional benefits: > # ease of use/out-of-the-box configuration to address a common performance > concern > # compact (specifically caching count arrays, without the extra baggage that > accompanies a naive external caching approach) > # NRT-friendly (could be implemented to be segment-aware) > # modular, capable of reusing the same cached values in conjunction with > variant requests over the same result domain (this would support common use > cases like paging, but also potentially more interesting direct uses of > facets). > # could be used for distributed refinement (i.e., if facet counts over a > given domain are cached, a refinement request could simply look up the > ordinal value for each enumerated term and directly grab the count out of the > count array that was cached during the first phase of facet calculation) > # composable (e.g., in aggregate functions that calculate values based on > facet counts across different domains, like SKG/relatedness – see SOLR-13132) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
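A toy sketch of the defensive-copy-then-atomic-replace pattern described in the comment above (the map-based cache and value types here are simplified stand-ins for the PR's {{termFacetCache}} classes, not Solr code):
{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FacetCacheSketch {
  // top-level cache shared across concurrent requests
  private final ConcurrentHashMap<String, Map<Integer, long[]>> cache = new ConcurrentHashMap<>();

  long[] countsFor(String topLevelKey, int segmentOrd) {
    Map<Integer, long[]> existing = cache.get(topLevelKey);
    if (existing != null && existing.containsKey(segmentOrd)) {
      return existing.get(segmentOrd); // fully cached: reuse the immutable counts
    }
    // Partial or missing entry: build a request-private copy so the shared
    // entry is never mutated in place by concurrent requests.
    Map<Integer, long[]> fresh = new HashMap<>();
    if (existing != null) {
      fresh.putAll(existing); // share the already-computed immutable segment arrays
    }
    long[] counts = computeCounts(segmentOrd);
    fresh.put(segmentOrd, counts);
    cache.put(topLevelKey, fresh); // atomic replacement of the top-level entry
    return counts;
  }

  private long[] computeCounts(int segmentOrd) {
    return new long[16]; // placeholder for real per-segment facet counting
  }
}
{code}
At worst, concurrent requests each build equivalent entries and the last insert wins, matching the behavior described above.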
[jira] [Commented] (SOLR-13807) Caching for term facet counts
[ https://issues.apache.org/jira/browse/SOLR-13807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052350#comment-17052350 ] Chris M. Hostetter commented on SOLR-13807: --- bq. my understanding of CacheHelper.getKey() is that the returned keys ... that the types of modifications you mention (deletes, in-place DV updates, etc.) should result in the creation of a new cache key. Is that not true? I don't know ... it's not something i've looked into in depth, if so then false alarm (but we should double check, and ideally prove it w/a defensive white box test of the regenerator after doing some deletes/in-place updates) bq. countCacheDf is defined wrt the main domain DocSet.size(), and only affects whether the termFacetCache is consulted for a given domain-request combination ... Oh, oh OH ! ... ok that explains so much about what i was seeing in cache stats after various requests. For some reason I thought it controlled whether individual term counts were being cached -- which reminds me: we need ref-guide updates in the PR : ) bq. ...As far as the temporarily tabled concerns about concurrent mutation... Those concerns were largely related to my mistaken impression that different requests w/different {{countCacheDf}} params were causing the original segment level cache values to be mutated in place (w/o doing a new "insert" back into the cache) because that's what i convinced myself was happening to explain the cache stats i was seeing and my vague (misguided) assumptions about how/why {{CacheState.PARTIALLY_CACHED}} existed from skimming the code. Your point about doing a defensive copy of the segment level counts & atomic re-insert of the top level entry after updating the counts for the new segments makes perfect sense. > Caching for term facet counts > - > > Key: SOLR-13807 > URL: https://issues.apache.org/jira/browse/SOLR-13807 > Project: Solr > Issue Type: New Feature > Components: Facet Module >Affects Versions: master (9.0), 8.2 >Reporter: Michael Gibney >Priority: Minor > Attachments: SOLR-13807__SOLR-13132_test_stub.patch > > > Solr does not have a facet count cache; so for _every_ request, term facets > are recalculated for _every_ (facet) field, by iterating over _every_ field > value for _every_ doc in the result domain, and incrementing the associated > count. > As a result, subsequent requests end up redoing a lot of the same work, > including all associated object allocation, GC, etc. This situation could > benefit from integrated caching. > Because of the domain-based, serial/iterative nature of term facet > calculation, latency is proportional to the size of the result domain. > Consequently, one common/clear manifestation of this issue is high latency > for faceting over an unrestricted domain (e.g., {{\*:\*}}), as might be > observed on a top-level landing page that exposes facets. This type of > "static" case is often mitigated by external (to Solr) caching, either with a > caching layer between Solr and a front-end application, or within a front-end > application, or even with a caching layer between the end user and a > front-end application. > But in addition to the overhead of handling this caching elsewhere in the > stack (or, for a new user, even being aware of this as a potential issue to > mitigate), any external caching mitigation is really only appropriate for > relatively static cases like the "landing page" example described above. 
A > Solr-internal facet count cache (analogous to the {{filterCache}}) would > provide the following additional benefits: > # ease of use/out-of-the-box configuration to address a common performance > concern > # compact (specifically caching count arrays, without the extra baggage that > accompanies a naive external caching approach) > # NRT-friendly (could be implemented to be segment-aware) > # modular, capable of reusing the same cached values in conjunction with > variant requests over the same result domain (this would support common use > cases like paging, but also potentially more interesting direct uses of > facets). > # could be used for distributed refinement (i.e., if facet counts over a > given domain are cached, a refinement request could simply look up the > ordinal value for each enumerated term and directly grab the count out of the > count array that was cached during the first phase of facet calculation) > # composable (e.g., in aggregate functions that calculate values based on > facet counts across different domains, like SKG/relatedness – see SOLR-13132) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9016) Document how to update web site
[ https://issues.apache.org/jira/browse/LUCENE-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052352#comment-17052352 ] ASF subversion and git services commented on LUCENE-9016: - Commit ceb90ce0e8e8996a524c314397b7a8e38f4a4796 in lucene-solr's branch refs/heads/master from Jan Høydahl [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ceb90ce ] LUCENE-9016: RefGuide meta doc for how to publish website (#1291) > Document how to update web site > --- > > Key: LUCENE-9016 > URL: https://issues.apache.org/jira/browse/LUCENE-9016 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > Find all documentation across Wiki, RefGuide, scripts and website itself that > talks about how to update or publish the web site, and update accordingly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] janhoy merged pull request #1291: LUCENE-9016: RefGuide meta doc for how to publish website
janhoy merged pull request #1291: LUCENE-9016: RefGuide meta doc for how to publish website URL: https://github.com/apache/lucene-solr/pull/1291 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jtibshirani edited a comment on issue #1314: LUCENE-9136: Coarse quantization that reuses existing formats.
jtibshirani edited a comment on issue #1314: LUCENE-9136: Coarse quantization that reuses existing formats. URL: https://github.com/apache/lucene-solr/pull/1314#issuecomment-594242054 **Benchmarks** sift-128-euclidean: a dataset of 1 million SIFT descriptors with 128 dims.
```
APPROACH                      RECALL    QPS
LuceneExact()                 1.000     6.425
LuceneCluster(n_probes=5)     0.749     574.186
LuceneCluster(n_probes=10)    0.874     308.455
LuceneCluster(n_probes=20)    0.951     116.871
LuceneCluster(n_probes=50)    0.993     67.354
LuceneCluster(n_probes=100)   0.999     34.651
```
glove-100-angular: a dataset of ~1.2 million GloVe word vectors of 100 dims.
```
APPROACH                      RECALL    QPS
LuceneExact()                 1.000     6.722
LuceneCluster(n_probes=5)     0.680     618.438
LuceneCluster(n_probes=10)    0.766     335.956
LuceneCluster(n_probes=20)    0.835     173.782
LuceneCluster(n_probes=50)    0.905     72.747
LuceneCluster(n_probes=100)   0.948     37.339
```
These benchmarks were performed using the [ann-benchmarks repo](https://github.com/erikbern/ann-benchmarks). I hooked up the prototype to the benchmarking framework using py4j (e10d34c73dc391e4a105253f6181dfc0e9cb6705). Unfortunately py4j adds quite a bit of overhead (~3ms per search), so I had to measure that overhead and subtract it from the results. This is really not ideal; I will work on more robust benchmarks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jtibshirani edited a comment on issue #1314: LUCENE-9136: Coarse quantization that reuses existing formats.
jtibshirani edited a comment on issue #1314: LUCENE-9136: Coarse quantization that reuses existing formats. URL: https://github.com/apache/lucene-solr/pull/1314#issuecomment-594242054 **Benchmarks** sift-128-euclidean: a dataset of 1 million SIFT descriptors with 128 dims.
```
APPROACH                      RECALL    QPS
LuceneExact()                 1.000     6.425
LuceneCluster(n_probes=2)     0.536     1138.926
LuceneCluster(n_probes=5)     0.749     574.186
LuceneCluster(n_probes=10)    0.874     308.455
LuceneCluster(n_probes=20)    0.951     116.871
LuceneCluster(n_probes=50)    0.993     67.354
LuceneCluster(n_probes=100)   0.999     34.651
```
glove-100-angular: a dataset of ~1.2 million GloVe word vectors of 100 dims.
```
APPROACH                      RECALL    QPS
LuceneExact()                 1.000     6.722
LuceneCluster(n_probes=5)     0.680     618.438
LuceneCluster(n_probes=10)    0.766     335.956
LuceneCluster(n_probes=20)    0.835     173.782
LuceneCluster(n_probes=50)    0.905     72.747
LuceneCluster(n_probes=100)   0.948     37.339
```
These benchmarks were performed using the [ann-benchmarks repo](https://github.com/erikbern/ann-benchmarks). I hooked up the prototype to the benchmarking framework using py4j (e10d34c73dc391e4a105253f6181dfc0e9cb6705). Unfortunately py4j adds quite a bit of overhead (~3ms per search), so I had to measure that overhead and subtract it from the results. This is really not ideal; I will work on more robust benchmarks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Closed] (LUCENE-9016) Document how to update web site
[ https://issues.apache.org/jira/browse/LUCENE-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl closed LUCENE-9016. --- > Document how to update web site > --- > > Key: LUCENE-9016 > URL: https://issues.apache.org/jira/browse/LUCENE-9016 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > Find all documentation across Wiki, RefGuide, scripts and website itself that > talks about how to update or publish the web site, and update accordingly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-9016) Document how to update web site
[ https://issues.apache.org/jira/browse/LUCENE-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl resolved LUCENE-9016. - Resolution: Fixed > Document how to update web site > --- > > Key: LUCENE-9016 > URL: https://issues.apache.org/jira/browse/LUCENE-9016 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > Find all documentation across Wiki, RefGuide, scripts and website itself that > talks about how to update or publish the web site, and update accordingly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9016) Document how to update web site
[ https://issues.apache.org/jira/browse/LUCENE-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052354#comment-17052354 ] ASF subversion and git services commented on LUCENE-9016: - Commit ebe35df13a12ad912d7edc03020e6273371c1acf in lucene-solr's branch refs/heads/branch_8x from Jan Høydahl [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ebe35df ] LUCENE-9016: RefGuide meta doc for how to publish website (#1291) (cherry picked from commit ceb90ce0e8e8996a524c314397b7a8e38f4a4796) > Document how to update web site > --- > > Key: LUCENE-9016 > URL: https://issues.apache.org/jira/browse/LUCENE-9016 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > Find all documentation across Wiki, RefGuide, scripts and website itself that > talks about how to update or publish the web site, and update accordingly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jtibshirani edited a comment on issue #1314: LUCENE-9136: Coarse quantization that reuses existing formats.
jtibshirani edited a comment on issue #1314: LUCENE-9136: Coarse quantization that reuses existing formats. URL: https://github.com/apache/lucene-solr/pull/1314#issuecomment-594242054 **Benchmarks** In these benchmarks, we find the nearest k=10 vectors and record the recall and queries per second. sift-128-euclidean: a dataset of 1 million SIFT descriptors with 128 dims.
```
APPROACH                      RECALL    QPS
LuceneExact()                 1.000     6.425
LuceneCluster(n_probes=5)     0.749     574.186
LuceneCluster(n_probes=10)    0.874     308.455
LuceneCluster(n_probes=20)    0.951     116.871
LuceneCluster(n_probes=50)    0.993     67.354
LuceneCluster(n_probes=100)   0.999     34.651
```
glove-100-angular: a dataset of ~1.2 million GloVe word vectors of 100 dims.
```
APPROACH                      RECALL    QPS
LuceneExact()                 1.000     6.722
LuceneCluster(n_probes=5)     0.680     618.438
LuceneCluster(n_probes=10)    0.766     335.956
LuceneCluster(n_probes=20)    0.835     173.782
LuceneCluster(n_probes=50)    0.905     72.747
LuceneCluster(n_probes=100)   0.948     37.339
```
These benchmarks were performed using the [ann-benchmarks repo](https://github.com/erikbern/ann-benchmarks). I hooked up the prototype to the benchmarking framework using py4j (e10d34c73dc391e4a105253f6181dfc0e9cb6705). Unfortunately py4j adds quite a bit of overhead (~3ms per search), so I had to measure that overhead and subtract it from the results. This is really not ideal; I will work on a more robust benchmarking set-up. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jtibshirani edited a comment on issue #1314: LUCENE-9136: Coarse quantization that reuses existing formats.
jtibshirani edited a comment on issue #1314: LUCENE-9136: Coarse quantization that reuses existing formats. URL: https://github.com/apache/lucene-solr/pull/1314#issuecomment-594242054 **Benchmarks** In these benchmarks, we find the nearest k=10 vectors and record the recall and queries per second. sift-128-euclidean: a dataset of 1 million SIFT descriptors with 128 dims.
```
APPROACH                      RECALL    QPS
LuceneExact()                 1.000     6.425
LuceneCluster(n_probes=5)     0.749     574.186
LuceneCluster(n_probes=10)    0.874     308.455
LuceneCluster(n_probes=20)    0.951     116.871
LuceneCluster(n_probes=50)    0.993     67.354
LuceneCluster(n_probes=100)   0.999     34.651
```
glove-100-angular: a dataset of ~1.2 million GloVe word vectors of 100 dims.
```
APPROACH                      RECALL    QPS
LuceneExact()                 1.000     6.722
LuceneCluster(n_probes=5)     0.680     618.438
LuceneCluster(n_probes=10)    0.766     335.956
LuceneCluster(n_probes=20)    0.835     173.782
LuceneCluster(n_probes=50)    0.905     72.747
LuceneCluster(n_probes=100)   0.948     37.339
```
These benchmarks were performed using the [ann-benchmarks repo](https://github.com/erikbern/ann-benchmarks). I hooked up the prototype to the benchmarking framework using py4j (e10d34c73dc391e4a105253f6181dfc0e9cb6705). Unfortunately py4j adds quite a bit of overhead (~3ms per search), so I had to measure that overhead and subtract it from the results. This is really not ideal; I will work on more robust benchmarks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
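A small sketch of the overhead correction described above (all numbers are illustrative, not taken from the benchmark results):
```java
public class OverheadCorrection {
  public static void main(String[] args) {
    double measuredMsPerQuery = 6.25; // end-to-end latency through py4j
    double py4jOverheadMs = 3.0;      // bridge overhead, measured separately per search
    double correctedMs = measuredMsPerQuery - py4jOverheadMs;
    // report queries per second based on the corrected per-query latency
    System.out.printf("corrected QPS: %.2f%n", 1000.0 / correctedMs);
  }
}
```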
[jira] [Commented] (SOLR-14306) Refactor coordination code into separate module and evaluate using Curator
[ https://issues.apache.org/jira/browse/SOLR-14306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052383#comment-17052383 ] Tomas Eduardo Fernandez Lobbe commented on SOLR-14306: -- Thanks Jan. I think with the right interfaces, we should be able to replace the underlying implementation we use for coordination (either one of those you suggested or maybe others we haven't thought of). While making them pluggable is out of the scope of this particular SIP, I think it's a step in that direction. Even if we decide not to make it pluggable, and never to replace ZooKeeper, this is still important for improving testing IMO. > Refactor coordination code into separate module and evaluate using Curator > -- > > Key: SOLR-14306 > URL: https://issues.apache.org/jira/browse/SOLR-14306 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Reporter: Tomas Eduardo Fernandez Lobbe >Priority: Major > > This Jira issue is to discuss two changes that unfortunately are difficult to > address separately > # Separate all ZooKeeper coordination logic into it’s own module, that can > be tested in isolation > # Evaluate using Apache Curator for coordination instead of our own logic. > I drafted a > [SIP|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=148640472], > but this is very much WIP, I’d like to hear opinions before I spend too much > time on something people hates. > From the initial draft of the SIP: > {quote}The main goal of this change is to allow better testing of the > different ZooKeeper interactions related to coordination (leader election, > queues, etc). There are already some abstractions in place for lower level > operations (set-data, get-data, etc, see DistribStateManager), so the idea is > to have a new, related abstraction named CoordinationManager, where we could > have some higher level coordination-related classes, like LeaderRunner > (Overseer), LeaderLatch (for shard leaders), etc. Curator comes into place > because, in order to refactor the existing code into these new abstractions, > we’d have to rework much of it, so we could instead consider using Curator, a > library that was mentioned in the past many times. While I don’t think this > is required, It would make this transition and our code simpler (from what I > could see, however, input from people with more Curator experience would be > greatly appreciated). > While it would be out of the scope of this change, If the > abstractions/interfaces are correctly designed, this could lead to, in the > future, be able to use something other than ZooKeeper for coordination, > either etcd or maybe even some in-memory replacement for tests. > {quote} > There are still many open questions, and many questions I still don’t know > we’ll have, but please, let me know if you have any early feedback, specially > if you’ve worked with Curator in the past. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-14308) Multi-threaded facet.query
Gregory Koldirkaev created SOLR-14308: - Summary: Multi-threaded facet.query Key: SOLR-14308 URL: https://issues.apache.org/jira/browse/SOLR-14308 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: faceting, search Reporter: Gregory Koldirkaev Add multi-threading support for facet.query. The facet.threads parameter can be used for this purpose, just as it is for facet.field. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] mikemccand commented on issue #1319: LUCENE-9164: process all events before closing gracefully
mikemccand commented on issue #1319: LUCENE-9164: process all events before closing gracefully URL: https://github.com/apache/lucene-solr/pull/1319#issuecomment-595367820 I'll try to review this one soon! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9264) Remove SimpleFSDirectory in favor of NIOFsDirectory
[ https://issues.apache.org/jira/browse/LUCENE-9264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052390#comment-17052390 ] Michael McCandless commented on LUCENE-9264: +1 > Remove SimpleFSDirectory in favor of NIOFsDirectory > --- > > Key: LUCENE-9264 > URL: https://issues.apache.org/jira/browse/LUCENE-9264 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Yannick Welsch >Priority: Minor > > {{SimpleFSDirectory}} looks to duplicate what's already offered by > {{NIOFsDirectory}}. The only difference is that {{SimpleFSDirectory}} is > using non-positional reads on the {{FileChannel}} (i.e., reads that are > stateful, changing the current position), and {{SimpleFSDirectory}} therefore > has to externally synchronize access to the read method. > On Windows, positional reads are not supported, which is why {{FileChannel}} > is already internally using synchronization to guarantee only access by one > thread at a time for positional reads (see {{read(ByteBuffer dst, long > position)}} in {{FileChannelImpl}}, and {{FileDispatcher.needsPositionLock}}, > which returns true on Windows) and the JDK implementation for Windows is > emulating positional reads by using non-positional ones, see > [http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/windows/native/sun/nio/ch/FileDispatcherImpl.c#l139]. > This means that on Windows, there should be no difference between > {{NIOFsDirectory}} and {{SimpleFSDirectory}} in terms of performance (it > should be equally poor as both implementations only allow one thread at a > time to read). On Linux/Mac, {{NIOFsDirectory}} is superior to > {{SimpleFSDirectory}}, however, as positional reads (pread) can be done > concurrently. > My proposal is to remove {{SimpleFSDirectory}} and replace its uses with > {{NIOFsDirectory}}, given how similar these two directory implementations are > ({{SimpleFSDirectory}} isn't really simpler). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9264) Remove SimpleFSDirectory in favor of NIOFsDirectory
[ https://issues.apache.org/jira/browse/LUCENE-9264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052409#comment-17052409 ] Adrien Grand commented on LUCENE-9264: -- +1 [~ywelsch] would you like to open a pull request? > Remove SimpleFSDirectory in favor of NIOFsDirectory > --- > > Key: LUCENE-9264 > URL: https://issues.apache.org/jira/browse/LUCENE-9264 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Yannick Welsch >Priority: Minor > > {{SimpleFSDirectory}} looks to duplicate what's already offered by > {{NIOFsDirectory}}. The only difference is that {{SimpleFSDirectory}} is > using non-positional reads on the {{FileChannel}} (i.e., reads that are > stateful, changing the current position), and {{SimpleFSDirectory}} therefore > has to externally synchronize access to the read method. > On Windows, positional reads are not supported, which is why {{FileChannel}} > is already internally using synchronization to guarantee only access by one > thread at a time for positional reads (see {{read(ByteBuffer dst, long > position)}} in {{FileChannelImpl}}, and {{FileDispatcher.needsPositionLock}}, > which returns true on Windows) and the JDK implementation for Windows is > emulating positional reads by using non-positional ones, see > [http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/windows/native/sun/nio/ch/FileDispatcherImpl.c#l139]. > This means that on Windows, there should be no difference between > {{NIOFsDirectory}} and {{SimpleFSDirectory}} in terms of performance (it > should be equally poor as both implementations only allow one thread at a > time to read). On Linux/Mac, {{NIOFsDirectory}} is superior to > {{SimpleFSDirectory}}, however, as positional reads (pread) can be done > concurrently. > My proposal is to remove {{SimpleFSDirectory}} and replace its uses with > {{NIOFsDirectory}}, given how similar these two directory implementations are > ({{SimpleFSDirectory}} isn't really simpler). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jpountz commented on issue #1320: LUCENE-9257: Always keep FST off-heap. Remove FSTLoadMode and Reader attributes.
jpountz commented on issue #1320: LUCENE-9257: Always keep FST off-heap. Remove FSTLoadMode and Reader attributes. URL: https://github.com/apache/lucene-solr/pull/1320#issuecomment-595382926 In the spirit of @dsmiley 's recent email, let's add a CHANGES entry? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14306) Refactor coordination code into separate module and evaluate using Curator
[ https://issues.apache.org/jira/browse/SOLR-14306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052420#comment-17052420 ] Mike Drob commented on SOLR-14306: -- I wonder if the double change of coordination module + curator migration is going to cause us to miss something due to too many moving parts, or make it harder to review and understand the changes and prevent regressions. I also am concerned that if we do both changes at the same time we end up with a bad abstraction that looks ok but is actually very Curator specific. Why do you believe that these issues are difficult to address separately? I really like the idea of having higher level abstractions in place - are the overseer and shard leader election code paths using common tools right now, or is each implemented separately? I haven't been in that part of Solr recently, so I don't know what the current state looks like. I know that [~marcussorealheis] has looked at efforts to swap zookeeper for etcd in the past, so he probably has thoughts here too. > Refactor coordination code into separate module and evaluate using Curator > -- > > Key: SOLR-14306 > URL: https://issues.apache.org/jira/browse/SOLR-14306 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Reporter: Tomas Eduardo Fernandez Lobbe >Priority: Major > > This Jira issue is to discuss two changes that unfortunately are difficult to > address separately > # Separate all ZooKeeper coordination logic into it’s own module, that can > be tested in isolation > # Evaluate using Apache Curator for coordination instead of our own logic. > I drafted a > [SIP|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=148640472], > but this is very much WIP, I’d like to hear opinions before I spend too much > time on something people hates. > From the initial draft of the SIP: > {quote}The main goal of this change is to allow better testing of the > different ZooKeeper interactions related to coordination (leader election, > queues, etc). There are already some abstractions in place for lower level > operations (set-data, get-data, etc, see DistribStateManager), so the idea is > to have a new, related abstraction named CoordinationManager, where we could > have some higher level coordination-related classes, like LeaderRunner > (Overseer), LeaderLatch (for shard leaders), etc. Curator comes into place > because, in order to refactor the existing code into these new abstractions, > we’d have to rework much of it, so we could instead consider using Curator, a > library that was mentioned in the past many times. While I don’t think this > is required, It would make this transition and our code simpler (from what I > could see, however, input from people with more Curator experience would be > greatly appreciated). > While it would be out of the scope of this change, If the > abstractions/interfaces are correctly designed, this could lead to, in the > future, be able to use something other than ZooKeeper for coordination, > either etcd or maybe even some in-memory replacement for tests. > {quote} > There are still many open questions, and many questions I still don’t know > we’ll have, but please, let me know if you have any early feedback, specially > if you’ve worked with Curator in the past. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14274) Multiple CoreContainers will register the same JVM Metrics
[ https://issues.apache.org/jira/browse/SOLR-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052427#comment-17052427 ] Mike Drob commented on SOLR-14274: -- [~ab] - gentle ping on this, would be interested to know what you think of this PR. > Multiple CoreContainers will register the same JVM Metrics > -- > > Key: SOLR-14274 > URL: https://issues.apache.org/jira/browse/SOLR-14274 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Mike Drob >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > When running multiple CoreContainer in the same JVM, either because we called > {{SolrCloudTestCase.configureCluster(int n)}} with {{n > 1}} or because we > have multiple tests running in the same JVM in succession, we will have > contention on the shared JVM {{metricsRegistry}} as they each replace the > existing metrics with their own. Further, with multiple nodes at the same > time, some of these metrics will be incorrect anyway, since they will only > reflect a single core container. Others will be fine since I think they are > reading system-level information so it doesn't matter where it comes from. > I think this is a test-only issue, since the circumstances where somebody is > running multiple core containers in a single JVM in production should be > rare, but maybe there are edge cases affected with EmbeddedSolrServer and > MapReduce or Spark, or other unusual deployment patterns. > Removing the metrics registration entirely can speed up > {{configureCluster(100).build()}} on my machine from 2 minutes to 30 seconds, > so I'm optimistic that there can be gains here without sacrificing the > feature entirely. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches
jpountz commented on a change in pull request #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches URL: https://github.com/apache/lucene-solr/pull/1294#discussion_r388498188 ## File path: lucene/core/src/java/org/apache/lucene/search/SliceExecutionControlPlane.java ## @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.lucene.search; + +import java.util.ArrayList; +import java.util.Collection; +import java.util.List; +import java.util.concurrent.CompletableFuture; +import java.util.concurrent.Executor; +import java.util.concurrent.Future; +import java.util.concurrent.FutureTask; +import java.util.concurrent.RejectedExecutionException; + +/** + * Execution control plane which is responsible + * for execution of slices based on the current status + * of the system and current system load + */ +class SliceExecutionControlPlane { Review comment: nit: I'd prefer a simpler name, e.g. `SliceExecutor` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches
jpountz commented on a change in pull request #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches URL: https://github.com/apache/lucene-solr/pull/1294#discussion_r388497511 ## File path: lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java ## @@ -662,34 +676,19 @@ public TopFieldDocs reduce(Collection collectors) throws IOEx } query = rewrite(query); final Weight weight = createWeight(query, scoreMode, 1); - final List> topDocsFutures = new ArrayList<>(leafSlices.length); - for (int i = 0; i < leafSlices.length - 1; ++i) { + final List listTasks = new ArrayList<>(); Review comment: Let's avoid introducing warnings about generics, FutureTask needs to be parameterized? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
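A minimal illustration of the parameterization being requested: giving `FutureTask` its type argument removes the raw-type warning (the `Callable` body is a stand-in, not the searcher code):
```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.FutureTask;

class ParameterizedTasks {
  static List<FutureTask<Integer>> makeTasks(int n) {
    // FutureTask<V> carries its result type, so callers get typed get() results
    List<FutureTask<Integer>> tasks = new ArrayList<>(n);
    for (int i = 0; i < n; i++) {
      final int slice = i;
      tasks.add(new FutureTask<>(() -> slice * 2)); // inferred as Callable<Integer>
    }
    return tasks;
  }
}
```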
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches
jpountz commented on a change in pull request #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches URL: https://github.com/apache/lucene-solr/pull/1294#discussion_r388493819 ## File path: lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java ## @@ -211,6 +213,18 @@ public IndexSearcher(IndexReaderContext context, Executor executor) { assert context.isTopLevel: "IndexSearcher's ReaderContext must be topLevel for reader" + context.reader(); reader = context.reader(); this.executor = executor; +this.sliceExecutionControlPlane = executor == null ? null : getSliceExecutionControlPlane(executor); +this.readerContext = context; +leafContexts = context.leaves(); +this.leafSlices = executor == null ? null : slices(leafContexts); + } + + // Package private for testing + IndexSearcher(IndexReaderContext context, Executor executor, SliceExecutionControlPlane sliceExecutionControlPlane) { +assert context.isTopLevel: "IndexSearcher's ReaderContext must be topLevel for reader" + context.reader(); +reader = context.reader(); +this.executor = executor; +this.sliceExecutionControlPlane = executor == null ? null : sliceExecutionControlPlane; Review comment: it feels wrong to not take the one from the constructor? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches
jpountz commented on a change in pull request #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches URL: https://github.com/apache/lucene-solr/pull/1294#discussion_r388497042 ## File path: lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java ## @@ -211,6 +215,18 @@ public IndexSearcher(IndexReaderContext context, Executor executor) { assert context.isTopLevel: "IndexSearcher's ReaderContext must be topLevel for reader" + context.reader(); reader = context.reader(); this.executor = executor; +this.sliceExecutionControlPlane = executor == null ? null : getSliceExecutionControlPlane(executor); +this.readerContext = context; +leafContexts = context.leaves(); +this.leafSlices = executor == null ? null : slices(leafContexts); + } + + // Package private for testing + IndexSearcher(IndexReaderContext context, Executor executor, SliceExecutionControlPlane sliceExecutionControlPlane) { Review comment: Is there anything we need to do with the executor that we couldn't do with the sliceExecutionControlPlane? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14306) Refactor coordination code into separate module and evaluate using Curator
[ https://issues.apache.org/jira/browse/SOLR-14306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052458#comment-17052458 ] Tomas Eduardo Fernandez Lobbe commented on SOLR-14306: -- Yes, I don't like merging the two, but I felt moving the Solr part alone could have been more difficult and, at the same time, maybe not the best long term if we are talking about moving to Curator eventually. bq. I also am concerned that if we do both changes at the same time we end up with a bad abstraction that looks ok but is actually very Curator specific That's a very good point. I did a POC and it's easy to fall into this. It may or may not be a problem; if we like the interfaces to be curator-oriented, we'd have to make whatever replacement we have look like it later. bq. are the overseer and shard leader election code paths using common tools right now They are in part, yes. One thing I noticed also while looking at Curator is that those two could actually fall into different "recipes". Overseer is essentially "do some work while you are the leader. Stop doing it when you are no longer the leader" (LeaderSelector in Curator), while shard leader is "act differently while you are the leader" (LeaderLatch in Curator). Of course they can both use the same implementation if we want (i.e. we can keep asking in the Overseer "amILeader" and then have listeners to interrupt), but I like that differentiation that Curator makes. > Refactor coordination code into separate module and evaluate using Curator > -- > > Key: SOLR-14306 > URL: https://issues.apache.org/jira/browse/SOLR-14306 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Reporter: Tomas Eduardo Fernandez Lobbe >Priority: Major > > This Jira issue is to discuss two changes that unfortunately are difficult to > address separately > # Separate all ZooKeeper coordination logic into it’s own module, that can > be tested in isolation > # Evaluate using Apache Curator for coordination instead of our own logic. > I drafted a > [SIP|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=148640472], > but this is very much WIP, I’d like to hear opinions before I spend too much > time on something people hates. > From the initial draft of the SIP: > {quote}The main goal of this change is to allow better testing of the > different ZooKeeper interactions related to coordination (leader election, > queues, etc). There are already some abstractions in place for lower level > operations (set-data, get-data, etc, see DistribStateManager), so the idea is > to have a new, related abstraction named CoordinationManager, where we could > have some higher level coordination-related classes, like LeaderRunner > (Overseer), LeaderLatch (for shard leaders), etc. Curator comes into place > because, in order to refactor the existing code into these new abstractions, > we’d have to rework much of it, so we could instead consider using Curator, a > library that was mentioned in the past many times. While I don’t think this > is required, It would make this transition and our code simpler (from what I > could see, however, input from people with more Curator experience would be > greatly appreciated). 
> While it would be out of the scope of this change, If the > abstractions/interfaces are correctly designed, this could lead to, in the > future, be able to use something other than ZooKeeper for coordination, > either etcd or maybe even some in-memory replacement for tests. > {quote} > There are still many open questions, and many questions I still don’t know > we’ll have, but please, let me know if you have any early feedback, specially > if you’ve worked with Curator in the past. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
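For reference, a minimal sketch of the two Curator recipes contrasted above (the connect string and ZooKeeper paths are hypothetical; this illustrates the shape of the recipes, not proposed Solr code):
{code:java}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.framework.recipes.leader.LeaderSelector;
import org.apache.curator.framework.recipes.leader.LeaderSelectorListenerAdapter;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class CuratorRecipesSketch {
  public static void main(String[] args) throws Exception {
    CuratorFramework client = CuratorFrameworkFactory.newClient(
        "localhost:2181", new ExponentialBackoffRetry(1000, 3));
    client.start();

    // LeaderSelector: "do some work while you are the leader" --
    // takeLeadership() runs only on the elected instance, and leadership
    // is relinquished when the method returns.
    LeaderSelector selector = new LeaderSelector(client, "/demo/overseer",
        new LeaderSelectorListenerAdapter() {
          @Override
          public void takeLeadership(CuratorFramework c) throws Exception {
            // e.g., process the work queue until done or interrupted
          }
        });
    selector.autoRequeue(); // re-enter the election after losing leadership
    selector.start();

    // LeaderLatch: "act differently while you are the leader" --
    // any participant can ask hasLeadership() at any point.
    LeaderLatch latch = new LeaderLatch(client, "/demo/shard1/leader");
    latch.start();
    if (latch.hasLeadership()) {
      // e.g., take the leader-only code path
    }
  }
}
{code}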
[GitHub] [lucene-solr] andyvuong commented on issue #1293: SOLR-14044: Delete collection bug fix by changing sharedShardName to use the same blob delimiter
andyvuong commented on issue #1293: SOLR-14044: Delete collection bug fix by changing sharedShardName to use the same blob delimiter URL: https://github.com/apache/lucene-solr/pull/1293#issuecomment-595431605 cc @yonik can you merge? Thanks This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] msokolov commented on issue #1313: LUCENE-8962: Split test case
msokolov commented on issue #1313: LUCENE-8962: Split test case URL: https://github.com/apache/lucene-solr/pull/1313#issuecomment-595436417 I verified this fixes the `TestIndexWriterExceptions2.testBasics` failure reported by @jpountz, and also beasted that test 1000x just in case. I think we need to get ahead of this given all the failure emails from these tests and the upcoming 8.5 release, so I'll push today. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] msokolov merged pull request #1313: LUCENE-8962: Split test case
msokolov merged pull request #1313: LUCENE-8962: Split test case URL: https://github.com/apache/lucene-solr/pull/1313 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052508#comment-17052508 ] ASF subversion and git services commented on LUCENE-8962: - Commit a030207a5e547a70db01d72fe4bd1627814ea94c in lucene-solr's branch refs/heads/master from Michael Sokolov [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a030207 ] LUCENE-8962: Split test case (#1313) * LUCENE-8962: Simplify test case The testMergeOnCommit test case was trying to verify too many things at once: basic semantics of merge on commit and proper behavior when a bunch of indexing threads are writing and committing all at once. Now we just verify basic behavior, with strict assertions on invariants, while leaving it to MockRandomMergePolicy to enable merge on commit in existing test cases to verify that indexing generally works as expected and no new unexpected exceptions are thrown. * LUCENE-8962: Only update toCommit if merge was committed The code was previously assuming that if mergeFinished() was called and isAborted() was false, then the merge must have completed successfully. Instead, we should know for sure if a given merge was committed, and only then update our pending commit SegmentInfos. > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Fix For: 8.5 > > Attachments: LUCENE-8962_demo.png > > Time Spent: 7h > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. > However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate and write many small segments during {{refresh}}, and this then > adds search-time cost as searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... so, could we somehow improve {{IndexWriter}}'s > refresh to optionally kick off the merge policy to merge segments below some > threshold before opening the near-real-time reader? It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... > One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... > I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
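The second commit message above is worth restating as code: finishing without aborting is not the same as having been committed. A simplified illustration of the invariant (not the actual IndexWriter code; the names are stand-ins):
{code:java}
abstract class OneMergeSketch {
  private volatile boolean committed = false;

  // Called by the writer only after the merged segment has actually been
  // applied to the pending-commit SegmentInfos.
  void setCommitted() {
    committed = true;
  }

  void mergeFinished() {
    if (committed) {
      // Safe: the merge result is part of the commit point, so the pending
      // SegmentInfos may be rewritten to reference the merged segment.
      updatePendingCommit();
    }
    // Not committed: do nothing. Before the fix, "finished and not aborted"
    // was wrongly treated as proof the merge made it into the commit.
  }

  abstract void updatePendingCommit();
}
{code}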
[jira] [Commented] (SOLR-14306) Refactor coordination code into separate module and evaluate using Curator
[ https://issues.apache.org/jira/browse/SOLR-14306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052518#comment-17052518 ] Marcus Eagan commented on SOLR-14306: - I have been thinking about this approach and following the Kafka discussion that Jan posted. It seems that refactoring coordination code into a separate module is a great first step for whichever direction we go in the future. > Refactor coordination code into separate module and evaluate using Curator > -- > > Key: SOLR-14306 > URL: https://issues.apache.org/jira/browse/SOLR-14306 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Reporter: Tomas Eduardo Fernandez Lobbe >Priority: Major -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14306) Refactor coordination code into separate module and evaluate using Curator
[ https://issues.apache.org/jira/browse/SOLR-14306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052526#comment-17052526 ] Jan Høydahl commented on SOLR-14306: {quote}It seems that refactoring coordination code into a separate module is a great first step for whichever direction we go in the future. {quote} +1. The single biggest obstacle I sense when helping customers with SolrCloud is ZooKeeper. How do we install it, how many nodes, how to secure it, can ZK run on the same nodes as Solr, can we use embedded ZK in our test environment, etc. And I think ZK will be an even bigger topic when more people start deploying in k8s. So if we manage to isolate coordination and cluster state on a higher level, then offering etcd or Ratis plugins in the future will be within reach. > Refactor coordination code into separate module and evaluate using Curator > -- > > Key: SOLR-14306 > URL: https://issues.apache.org/jira/browse/SOLR-14306 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Reporter: Tomas Eduardo Fernandez Lobbe >Priority: Major -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jtibshirani edited a comment on issue #1314: LUCENE-9136: Coarse quantization that reuses existing formats.
jtibshirani edited a comment on issue #1314: LUCENE-9136: Coarse quantization that reuses existing formats. URL: https://github.com/apache/lucene-solr/pull/1314#issuecomment-594242054 **Benchmarks** In these benchmarks, we find the nearest k=10 vectors and record the recall and queries per second. For the number of centroids, we use the heuristic num centroids = sqrt(dataset size). sift-128-euclidean: a dataset of 1 million SIFT descriptors with 128 dims.
```
APPROACH                       RECALL    QPS
LuceneExact()                  1.000     6.425
LuceneCluster(n_probes=5)      0.749     574.186
LuceneCluster(n_probes=10)     0.874     308.455
LuceneCluster(n_probes=20)     0.951     116.871
LuceneCluster(n_probes=50)     0.993     67.354
LuceneCluster(n_probes=100)    0.999     34.651
```
glove-100-angular: a dataset of ~1.2 million GloVe word vectors of 100 dims.
```
APPROACH                       RECALL    QPS
LuceneExact()                  1.000     6.722
LuceneCluster(n_probes=5)      0.680     618.438
LuceneCluster(n_probes=10)     0.766     335.956
LuceneCluster(n_probes=20)     0.835     173.782
LuceneCluster(n_probes=50)     0.905     72.747
LuceneCluster(n_probes=100)    0.948     37.339
```
These benchmarks were performed using the [ann-benchmarks repo](https://github.com/erikbern/ann-benchmarks). I hooked up the prototype to the benchmarking framework using py4j (e10d34c73dc391e4a105253f6181dfc0e9cb6705). Unfortunately py4j adds quite a bit of overhead (~3ms per search), so I had to measure that overhead and subtract it from the results. This is really not ideal; I will work on a more robust benchmarking set-up. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
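For context, the recall figures above are presumably the standard top-k overlap used by ann-benchmarks: the fraction of the approximate top-k that appears in the exact top-k. A minimal sketch of that computation (illustrative; not taken from the linked harness):
```java
import java.util.HashSet;
import java.util.Set;

class RecallSketch {
  /** recall@k = |approx ∩ exact| / |exact| for two top-k docID lists. */
  static double recallAtK(int[] approxTopK, int[] exactTopK) {
    Set<Integer> truth = new HashSet<>();
    for (int id : exactTopK) {
      truth.add(id);
    }
    int hits = 0;
    for (int id : approxTopK) {
      if (truth.contains(id)) {
        hits++;
      }
    }
    return (double) hits / exactTopK.length;
  }
}
```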
[jira] [Commented] (SOLR-11359) An autoscaling/suggestions endpoint to recommend operations
[ https://issues.apache.org/jira/browse/SOLR-11359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052544#comment-17052544 ] Megan Carey commented on SOLR-11359: Would it be possible to explicitly return the URL to hit for applying the suggestion? i.e. rather than returning an HTTP method, operation type, etc., just return the constructed URL for executing the action? Also, are you considering writing a cron to periodically execute these suggestions? > An autoscaling/suggestions endpoint to recommend operations > --- > > Key: SOLR-11359 > URL: https://issues.apache.org/jira/browse/SOLR-11359 > Project: Solr > Issue Type: New Feature > Components: AutoScaling >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Major > Attachments: SOLR-11359.patch > > > Autoscaling can make suggestions to users on what operations they can perform > to improve the health of the cluster > The suggestions will have the following information > * HTTP endpoint > * HTTP method (POST, DELETE) > * command payload -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
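One wrinkle with returning only a URL: a client applying a suggestion still needs the HTTP method and the payload, so all three parts of a suggestion end up being used together anyway. A hypothetical sketch of a client executing a suggestion from its parts (the parameter names are illustrative, not Solr's actual response schema):
{code:java}
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class SuggestionRunnerSketch {
  // url = constructed endpoint, method = "POST"/"DELETE", jsonPayload = command body
  static String apply(String url, String method, String jsonPayload) throws Exception {
    HttpRequest request = HttpRequest.newBuilder(URI.create(url))
        .header("Content-Type", "application/json")
        .method(method, HttpRequest.BodyPublishers.ofString(jsonPayload))
        .build();
    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());
    return response.body();
  }
}
{code}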
[jira] [Commented] (SOLR-14044) Support shard/collection deletion in shared storage
[ https://issues.apache.org/jira/browse/SOLR-14044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052545#comment-17052545 ] ASF subversion and git services commented on SOLR-14044: Commit c8c216514af29d94d3f269d01f57e1c0f2421b69 in lucene-solr's branch refs/heads/jira/SOLR-13101 from Yonik Seeley [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c8c2165 ] SOLR-14044: Delete collection bug fix by changing sharedShardName to use the same blob delimiter (#1293) * Change sharedShardName to use blob delimiter and fix test * use assign in test > Support shard/collection deletion in shared storage > --- > > Key: SOLR-14044 > URL: https://issues.apache.org/jira/browse/SOLR-14044 > Project: Solr > Issue Type: Sub-task > Components: SolrCloud >Reporter: Andy Vuong >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > The Solr Cloud deletion APIs for collections and shards are not currently > supported by shared storage but are essential functionality required by the > shared storage design. Deletion of objects from shared storage currently > only happens in the indexing path (on pushes) and after the index file > listings between the local Solr process and external store have been resolved. > > This task is to track supporting the delete shard/collection API commands, and > its scope does not include cleaning up so-called "orphaned" index files from > blob (i.e. files that are no longer referenced by any core.metadata file on > the external store). This will be designed/covered in another subtask. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
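The class of bug this commit fixes is easy to illustrate: the shard name built on the delete path must use the same delimiter as the name written on the push path, or the blob keys never match. A minimal sketch (names assumed; not the actual solr-core code):
{code:java}
class SharedShardNameSketch {
  // Assumed single source of truth for the delimiter used in blob store keys.
  static final String BLOB_DELIMITER = "/";

  // Both the push path and the delete path should build names through this
  // one method; mixing "/" with another separator breaks deletes silently.
  static String sharedShardName(String collectionName, String shardName) {
    return collectionName + BLOB_DELIMITER + shardName;
  }
}
{code}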
[GitHub] [lucene-solr] yonik merged pull request #1293: SOLR-14044: Delete collection bug fix by changing sharedShardName to use the same blob delimiter
yonik merged pull request #1293: SOLR-14044: Delete collection bug fix by changing sharedShardName to use the same blob delimiter URL: https://github.com/apache/lucene-solr/pull/1293 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] bruno-roustant commented on issue #1320: LUCENE-9257: Always keep FST off-heap. Remove FSTLoadMode and Reader attributes.
bruno-roustant commented on issue #1320: LUCENE-9257: Always keep FST off-heap. Remove FSTLoadMode and Reader attributes. URL: https://github.com/apache/lucene-solr/pull/1320#issuecomment-595474374 Good point. I added it and classified as 'Other'. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dnhatn commented on issue #1313: LUCENE-8962: Split test case
dnhatn commented on issue #1313: LUCENE-8962: Split test case URL: https://github.com/apache/lucene-solr/pull/1313#issuecomment-595503029 @msfroh @msokolov Thank you for working on the fix. Unfortunately, this is still an issue. Many Elasticsearch tests are [failing](https://github.com/elastic/elasticsearch/issues/53195) even with this change. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052607#comment-17052607 ] ASF subversion and git services commented on LUCENE-8962: - Commit e5be034df2fc22f1b88e4d271b25c8fae1c3093f in lucene-solr's branch refs/heads/master from Michael McCandless [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e5be034 ] LUCENE-8962: woops, remove leftover accidental copyright (darned IDEs) > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Fix For: 8.5 > > Attachments: LUCENE-8962_demo.png > > Time Spent: 7h 10m > Remaining Estimate: 0h -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052608#comment-17052608 ] ASF subversion and git services commented on LUCENE-8962: - Commit 3dbfd102794419551f2ba4b43344cf9e6242a2b8 in lucene-solr's branch refs/heads/branch_8x from Michael McCandless [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=3dbfd10 ] LUCENE-8962: woops, remove leftover accidental copyright (darned IDEs) > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Fix For: 8.5 > > Attachments: LUCENE-8962_demo.png > > Time Spent: 7h 10m > Remaining Estimate: 0h -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14307) "user caches" don't support "enable"
[ https://issues.apache.org/jira/browse/SOLR-14307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris M. Hostetter updated SOLR-14307: -- Attachment: SOLR-14307.patch Status: Open (was: Open) patch with fix and tests > "user caches" don't support "enable" > > > Key: SOLR-14307 > URL: https://issues.apache.org/jira/browse/SOLR-14307 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Attachments: SOLR-14307.patch > > > While trying to help write some test cases for SOLR-13807 I discovered that > the code path used for building the {{List<CacheConfig>}} of _user_ caches > (ie: {{<cache ... />}}) doesn't respect the idea of an "enabled" > attribute ... that is only checked for in the code path used for building > singular CacheConfig options from explicit xpaths (ie: {{<filterCache ... />}} etc...) > We should fix this, if for no other reason than so it's easy for tests to use > system properties to enable/disable all caches. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
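A hedged solrconfig.xml illustration of the inconsistency (values are examples, not taken from the patch): named caches already honor "enabled", including system-property substitution, while {{<cache>}}-declared user caches ignored it before this fix.
{code:xml}
<query>
  <!-- Named cache: the "enabled" attribute was already respected here. -->
  <filterCache size="512" enabled="${solr.filterCache.enabled:true}"/>
  <!-- User cache: "enabled" was silently ignored before SOLR-14307. -->
  <cache name="myUserCache" size="128" enabled="${solr.userCache.enabled:true}"/>
</query>
{code}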
[jira] [Updated] (SOLR-14307) "user caches" don't support "enable"
[ https://issues.apache.org/jira/browse/SOLR-14307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris M. Hostetter updated SOLR-14307: -- Status: Patch Available (was: Open) > "user caches" don't support "enable" > > > Key: SOLR-14307 > URL: https://issues.apache.org/jira/browse/SOLR-14307 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Attachments: SOLR-14307.patch -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14307) "user caches" don't support "enabled" attribute
[ https://issues.apache.org/jira/browse/SOLR-14307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris M. Hostetter updated SOLR-14307: -- Summary: "user caches" don't support "enabled" attribute (was: "user caches" don't support "enable") > "user caches" don't support "enabled" attribute > --- > > Key: SOLR-14307 > URL: https://issues.apache.org/jira/browse/SOLR-14307 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Attachments: SOLR-14307.patch -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052614#comment-17052614 ] Julie Tibshirani commented on LUCENE-9136: -- Hello [~tomoko]! My explanation before was way too brief, I'm still getting used to the joint JIRA/ GitHub set-up :) I'll give more context on the suggested direction. The draft adds a new format VectorsFormat, which simply delegates to DocValuesFormat and PostingsFormat under the hood: * The original vectors are stored as BinaryDocValues. * The vectors are also clustered through k-means clustering, and the cluster information is stored in postings format. In particular, each cluster centroid is encoded to a BytesRef to represent a term. Each document belonging to the centroid is added to the postings list for that term. Given a query vector, we first iterate through all the centroid terms to find a small number of closest centroids. We then take the disjunction of all those postings enums to obtain a DocIdSetIterator of candidate nearest neighbors. To produce the score for each candidate, we load its vector from BinaryDocValues and compute the distance to the query vector. I liked that this approach didn't introduce major new data structures and could re-use the existing formats. To respond to your point, one difference between this approach and HNSW is that it’s able to re-use the formats without modifications to their APIs or implementations. In particular, it doesn’t require random access for doc values, they are only accessed through forward iteration. So to keep the code as simple as possible, I stuck with BinaryDocValues and didn’t create a new way to store the vector values. However, the PR does introduce a new top-level VectorsFormat as I thought this gave nice flexibility while prototyping. There are two main hacks in the draft that would need addressing: * It's fairly fragile to re-use formats explicitly since we write to the same files as normal doc values and postings – I think there would be a conflict if there were both a vector field and a doc values field with the same name. * To write the postings list, we compute the map from centroid to documents in memory. We then expose it through a hacky Fields implementation called ClusterBackedFields and pass it to the postings writer. It would be better to avoid this hack and not to compute cluster information using a map. Even apart from code-level concerns, I don't think the draft PR would be ready to integrate immediately. There are some areas where I think further work is needed to determine if coarse quantization (IVFFlat) is the right approach: * It would be good to run tests to understand how it scales to larger sets of documents, say in the 5M - 100M range. We would probably want to scale the number of centroids with the number of documents – a common heuristic is to set num centroids = sqrt(dataset size). Looking at the [FAISS experiments|https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors], it can be helpful to use an even higher number of centroids. ** Do we still obtain good recall and QPS for these larger dataset sizes? ** Can we still afford to run k-means at index time, given a larger number of centroids? With 10,000 centroids for example, each time we index a document we’ll be computing the distance between the document and 10,000 other vectors. This is a big concern and I think we would need strategies to address it. 
* It’s great that coarse quantization is relatively simple and could be implemented with existing data structures. But would we expect a much bigger speed-up and better scaling with a graph approach like HNSW? I think this still requires more analysis. * More thinking is required as to how to handle deleted documents (as discussed in LUCENE-9004). > Introduce IVFFlat to Lucene for ANN similarity search > - > > Key: LUCENE-9136 > URL: https://issues.apache.org/jira/browse/LUCENE-9136 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Xin-Chun Zhang >Priority: Major > Attachments: 1581409981369-9dea4099-4e41-4431-8f45-a3bb8cac46c0.png, > image-2020-02-16-15-05-02-451.png > > Time Spent: 50m > Remaining Estimate: 0h
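A rough sketch of the scheme Julie describes, expressed with public Lucene APIs rather than a custom format. The field names, the k-means assignment step, and the float encoding are assumptions for illustration, not the PR's actual code:
{code:java}
import java.nio.ByteBuffer;

import org.apache.lucene.document.BinaryDocValuesField;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.util.BytesRef;

class IvfFlatSketch {
  /** Index time: store the raw vector plus a term naming its nearest centroid. */
  static Document asDocument(float[] vector, int nearestCentroidId) {
    Document doc = new Document();
    doc.add(new StringField("centroid", "c" + nearestCentroidId, Field.Store.NO));
    doc.add(new BinaryDocValuesField("vector", new BytesRef(encode(vector))));
    return doc;
  }

  /** Query time: union the postings of the nProbe centroids closest to the query. */
  static Query candidateQuery(int[] closestCentroidIds) {
    BooleanQuery.Builder builder = new BooleanQuery.Builder();
    for (int id : closestCentroidIds) {
      builder.add(new TermQuery(new Term("centroid", "c" + id)), BooleanClause.Occur.SHOULD);
    }
    // Matches are only candidates; exact distances come from loading each
    // candidate's vector from doc values and comparing it to the query vector.
    return builder.build();
  }

  static byte[] encode(float[] v) {
    ByteBuffer buf = ByteBuffer.allocate(v.length * Float.BYTES);
    for (float f : v) {
      buf.putFloat(f);
    }
    return buf.array();
  }
}
{code}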
[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests
[ https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052668#comment-17052668 ] Robert Muir commented on LUCENE-9241: - [~dweiss] I saw a recent URLClassLoader Windows leak thread on the JDK list and it reminded me of this issue. I'll remove the use of getResource (*please keep in mind there are many of these elsewhere in the codebase if you are actually concerned about this*). Instead, if the user screws up here in their test, they'll get a NullPointerException and they can follow the stack trace. Soon the default NPE from the JDK will actually be more helpful than custom messages like this anyway. > fix most memory-hungry tests > > > Key: LUCENE-9241 > URL: https://issues.apache.org/jira/browse/LUCENE-9241 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Robert Muir >Priority: Major > Attachments: LUCENE-9241.patch > > > Currently each test JVM has an Xmx of 512M. With a modern MacBook Pro (one test JVM per core) this adds up to 4GB, which is pretty crazy. > On the other hand, if we fix a few edge cases, tests can work with lower heaps such as 128M. This can save many gigabytes (it also finds interesting memory waste/issues). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
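A small sketch of the failure mode being accepted here, assuming a test that loads a classpath resource: with the explicit null check removed, a missing resource surfaces as a NullPointerException at the first use, and recent JDKs (JEP 358 "helpful NPEs") will name the null expression in the message.
{code:java}
import java.io.IOException;
import java.io.InputStream;

class ResourceLoadSketch {
  byte[] loadTestResource(String name) throws IOException {
    // getResourceAsStream returns null when the resource is absent ...
    InputStream in = getClass().getResourceAsStream(name);
    // ... so a typo in "name" throws NPE right here, with a usable stack trace.
    return in.readAllBytes();
  }
}
{code}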
[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests
[ https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052697#comment-17052697 ] ASF subversion and git services commented on LUCENE-9241: - Commit 9cfdf17b2895866877668002d443277a46cd04e8 in lucene-solr's branch refs/heads/master from Robert Muir [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=9cfdf17 ] LUCENE-9241: fix tests to pass with -Xmx128m > fix most memory-hungry tests > > > Key: LUCENE-9241 > URL: https://issues.apache.org/jira/browse/LUCENE-9241 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Robert Muir >Priority: Major > Attachments: LUCENE-9241.patch -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052727#comment-17052727 ] Xin-Chun Zhang commented on LUCENE-9136: Hi [~jtibshirani], thanks for your excellent work! ??I was thinking we could actually reuse the existing `PostingsFormat` and `DocValuesFormat` implementations.?? Yes, the code could be simpler by reusing these formats. But I agree with [~tomoko] that ANN search is a pretty new feature to Lucene; it's better to use a dedicated format for maintainability reasons. Moreover, if we are going to use a dedicated vector format for HNSW, it should also be applied to IVFFlat, because IVFFlat and HNSW are used for the same purpose of ANN search. It may be strange to users if IVFFlat and HNSW behave completely differently. ??In particular, it doesn’t require random access for doc values, they are only accessed through forward iteration.?? Actually, we need random access to the vector values! For a typical search engine, we are going to retrieve the best-matched documents after obtaining the top-K docIDs. Retrieving vectors via these docIDs requires random access to the vector values. > Introduce IVFFlat to Lucene for ANN similarity search > - > > Key: LUCENE-9136 > URL: https://issues.apache.org/jira/browse/LUCENE-9136 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Xin-Chun Zhang >Priority: Major > Attachments: 1581409981369-9dea4099-4e41-4431-8f45-a3bb8cac46c0.png, > image-2020-02-16-15-05-02-451.png > > Time Spent: 50m > Remaining Estimate: 0h > > Representation learning (RL) has been an established discipline in the > machine learning space for decades but it draws tremendous attention lately > with the emergence of deep learning. The central problem of RL is to > determine an optimal representation of the input data. By embedding the data > into a high dimensional vector, the vector retrieval (VR) method is then > applied to search the relevant items. > With the rapid development of RL over the past few years, the technique has > been used extensively in industry from online advertising to computer vision > and speech recognition. There exist many open source implementations of VR > algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various > choices for potential users. However, the aforementioned implementations are > all written in C++, and no plan for supporting Java interface, making it hard > to be integrated in Java projects or those who are not familiar with C/C++ > [[https://github.com/facebookresearch/faiss/issues/105]]. > The algorithms for vector retrieval can be roughly classified into four > categories, > # Tree-based algorithms, such as KD-tree; > # Hashing methods, such as LSH (Local Sensitive Hashing); > # Product quantization based algorithms, such as IVFFlat; > # Graph-based algorithms, such as HNSW, SSG, NSG; > where IVFFlat and HNSW are the most popular ones among all the VR algorithms. > IVFFlat is better for high-precision applications such as face recognition, > while HNSW performs better in general scenarios including recommendation and > personalized advertisement. *The recall ratio of IVFFlat could be gradually > increased by adjusting the query parameter (nprobe), while it's hard for HNSW > to improve its accuracy*. In theory, IVFFlat could achieve 100% recall ratio. > Recently, the implementation of HNSW (Hierarchical Navigable Small World, > LUCENE-9004) for Lucene, has made great progress.
The issue draws attention > of those who are interested in Lucene or hope to use HNSW with Solr/Lucene. > As an alternative for solving ANN similarity search problems, IVFFlat is also > very popular with many users and supporters. Compared with HNSW, IVFFlat has > smaller index size but requires k-means clustering, while HNSW is faster in > query (no training required) but requires extra storage for saving graphs > [indexing 1M > vectors|[https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]]. > Another advantage is that IVFFlat can be faster and more accurate when > enables GPU parallel computing (current not support in Java). Both algorithms > have their merits and demerits. Since HNSW is now under development, it may > be better to provide both implementations (HNSW && IVFFlat) for potential > users who are faced with very different scenarios and want to more choices. > The latest branch is > [*lucene-9136-ann-ivfflat*]([https://github.com/irvingzhang/lucene-solr/commits/jira/lucene-9136-ann-ivfflat)|https://github.com/irvingzhang/lucene-solr/commits/jira/lucene-9136-ann-ivfflat] -- This message was sent by Atlassian Jira (v8.3.4#803005)
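A minimal sketch of the random-access point above: BinaryDocValues is a forward-only iterator, so fetching vectors for arbitrary top-K docIDs means visiting them in increasing docID order (or re-pulling the iterator per segment). The "vector" field name and float encoding are assumptions carried over from the sketch earlier in this thread.
{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.FloatBuffer;
import java.util.Arrays;

import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.util.BytesRef;

class VectorLookupSketch {
  static float[][] fetchVectors(LeafReader reader, int[] topKDocIds, int dim) throws IOException {
    int[] sorted = topKDocIds.clone();
    Arrays.sort(sorted); // advanceExact() can only move forward
    BinaryDocValues values = DocValues.getBinary(reader, "vector");
    float[][] vectors = new float[sorted.length][];
    for (int i = 0; i < sorted.length; i++) {
      if (values.advanceExact(sorted[i])) {
        BytesRef bytes = values.binaryValue();
        FloatBuffer floats =
            ByteBuffer.wrap(bytes.bytes, bytes.offset, bytes.length).asFloatBuffer();
        float[] v = new float[dim];
        floats.get(v);
        vectors[i] = v;
      }
    }
    return vectors;
  }
}
{code}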
[jira] [Comment Edited] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052727#comment-17052727 ] Xin-Chun Zhang edited comment on LUCENE-9136 at 3/6/20, 3:34 AM: - Hi [~jtibshirani], thanks for your excellent work! ??I was thinking we could actually reuse the existing `PostingsFormat` and `DocValuesFormat` implementations.?? Yes, the code could be simpler by reusing these formats. But I agree with [~tomoko] that ANN search is a pretty new feature to Lucene; it's better to use a dedicated format for maintainability reasons. Moreover, if we are going to use a dedicated vector format for HNSW, this format should also be applied to IVFFlat, because IVFFlat and HNSW are used for the same purpose of ANN search. It may be strange to users if IVFFlat and HNSW behave completely differently. ??In particular, it doesn’t require random access for doc values, they are only accessed through forward iteration.?? Actually, we need random access to the vector values! For a typical search engine, we are going to retrieve the best-matched documents after obtaining the top-K docIDs. Retrieving vectors via these docIDs requires random access to the vector values. was (Author: irvingzhang): Hi, [~jtibshirani], thanks for you excellent work! ??I was thinking we could actually reuse the existing `PostingsFormat` and `DocValuesFormat` implementations.?? Yes, the codes could be simple by reusing these formats. But I agree with [~tomoko] that ANN search is a pretty new feature to Lucene, it's better to use a dedicated format for maintaining reasons. Moreover, If we are going to use a dedicated vector format for HNSW, this could also applied to IVFFlat because IVFFlat and HNSW are used for the same purpose of ANN search. It may be strange to users if IVFFlat and HNSW perform completely different. ??In particular, it doesn’t require random access for doc values, they are only accessed through forward iteration.?? Actually, we need random access to the vector values! For a typical search engine, we are going to retrieving the best matched documents after obtaining the TopK docIDs. Retrieving vectors via these docIDs requires random access to the vector values. > Introduce IVFFlat to Lucene for ANN similarity search > - > > Key: LUCENE-9136 > URL: https://issues.apache.org/jira/browse/LUCENE-9136 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Xin-Chun Zhang >Priority: Major > Attachments: 1581409981369-9dea4099-4e41-4431-8f45-a3bb8cac46c0.png, > image-2020-02-16-15-05-02-451.png > > Time Spent: 50m > Remaining Estimate: 0h
[GitHub] [lucene-solr] dnhatn commented on a change in pull request #1155: LUCENE-8962: Add ability to selectively merge on commit
dnhatn commented on a change in pull request #1155: LUCENE-8962: Add ability to selectively merge on commit URL: https://github.com/apache/lucene-solr/pull/1155#discussion_r388704872
## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
## @@ -3147,6 +3149,42 @@ public final boolean flushNextBuffer() throws IOException {
     }
   }

+  private MergePolicy.OneMerge updateSegmentInfosOnMergeFinish(MergePolicy.OneMerge merge, final SegmentInfos toCommit,
+                                                               AtomicReference<CountDownLatch> mergeLatchRef) {
+    return new MergePolicy.OneMerge(merge.segments) {
+      public void mergeFinished() throws IOException {
+        super.mergeFinished();
+        CountDownLatch mergeAwaitLatch = mergeLatchRef.get();
+        if (mergeAwaitLatch == null) {
+          // Commit thread timed out waiting for this merge and moved on. No need to manipulate toCommit.
+          return;
+        }
+        if (isAborted() == false) {
+          deleter.incRef(this.info.files());
+          // Resolve "live" SegmentInfos segments to their toCommit cloned equivalents, based on segment name.
+          Set<String> mergedSegmentNames = new HashSet<>();
+          for (SegmentCommitInfo sci : this.segments) {
+            deleter.decRef(sci.files());
+            mergedSegmentNames.add(sci.info.name);
+          }
+          List<SegmentCommitInfo> toCommitMergedAwaySegments = new ArrayList<>();
+          for (SegmentCommitInfo sci : toCommit) {
+            if (mergedSegmentNames.contains(sci.info.name)) {
+              toCommitMergedAwaySegments.add(sci);
+            }
+          }
+          // Construct a OneMerge that applies to toCommit
+          MergePolicy.OneMerge applicableMerge = new MergePolicy.OneMerge(toCommitMergedAwaySegments);
+          applicableMerge.info = this.info.clone();
+          long segmentCounter = Long.parseLong(this.info.info.name.substring(1), Character.MAX_RADIX);
+          toCommit.counter = Math.max(toCommit.counter, segmentCounter + 1);
+          toCommit.applyMergeChanges(applicableMerge, false);
Review comment: We should modify `toCommit` under the `IndexWriter.this` lock (or a private synchronization between this method and `commitInternal`). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dnhatn commented on a change in pull request #1155: LUCENE-8962: Add ability to selectively merge on commit
dnhatn commented on a change in pull request #1155: LUCENE-8962: Add ability to selectively merge on commit URL: https://github.com/apache/lucene-solr/pull/1155#discussion_r388705514
## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
## @@ -3147,6 +3149,42 @@ public final boolean flushNextBuffer() throws IOException {
     }
   }

+  private MergePolicy.OneMerge updateSegmentInfosOnMergeFinish(MergePolicy.OneMerge merge, final SegmentInfos toCommit,
+                                                               AtomicReference<CountDownLatch> mergeLatchRef) {
+    return new MergePolicy.OneMerge(merge.segments) {
+      public void mergeFinished() throws IOException {
+        super.mergeFinished();
+        CountDownLatch mergeAwaitLatch = mergeLatchRef.get();
+        if (mergeAwaitLatch == null) {
+          // Commit thread timed out waiting for this merge and moved on. No need to manipulate toCommit.
Review comment: We need a stronger synchronization to make sure that we won't modify `toCommit` if `commitInternal` has stopped waiting for these merges. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dnhatn commented on a change in pull request #1155: LUCENE-8962: Add ability to selectively merge on commit
dnhatn commented on a change in pull request #1155: LUCENE-8962: Add ability to selectively merge on commit URL: https://github.com/apache/lucene-solr/pull/1155#discussion_r388705156
## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
## @@ -3252,6 +3315,53 @@ private long prepareCommitInternal() throws IOException {
     } finally {
       maybeCloseOnTragicEvent();
     }
+
+    if (mergeAwaitLatchRef != null) {
+      CountDownLatch mergeAwaitLatch = mergeAwaitLatchRef.get();
+      // If we found and registered any merges above, within the flushLock, then we want to ensure that they
+      // complete execution. Note that since we released the lock, other merges may have been scheduled. We will
+      // block until the merges that we registered complete. As they complete, they will update toCommit to
+      // replace merged segments with the result of each merge.
+      config.getIndexWriterEvents().beginMergeOnCommit();
+      mergeScheduler.merge(this, MergeTrigger.COMMIT, true);
+      long mergeWaitStart = System.nanoTime();
+      int abandonedCount = 0;
+      long waitTimeMillis = (long) (config.getMaxCommitMergeWaitSeconds() * 1000.0);
+      try {
+        if (mergeAwaitLatch.await(waitTimeMillis, TimeUnit.MILLISECONDS) == false) {
+          synchronized (this) {
+            // Need to do this in a synchronized block, to make sure none of our commit merges are currently
+            // executing mergeFinished (since mergeFinished itself is called from within the IndexWriter lock).
+            // After we clear the value from mergeAwaitLatchRef, the merges we schedule will still execute as
+            // usual, but when they finish, they won't attempt to update toCommit or modify segment reference
+            // counts.
+            mergeAwaitLatchRef.set(null);
Review comment: I think we should set `mergeAwaitLatchRef` in the `else` branch. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
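A self-contained sketch of the stronger synchronization being requested (not the PR's code; the types are simplified stand-ins): the merge thread checks the latch reference and mutates toCommit under the same monitor the commit thread takes when it abandons the wait, so a timed-out commit can never race with a late merge mutation.
```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

class MergeCommitSyncSketch {
  private final Object writerLock = new Object(); // stand-in for IndexWriter.this
  private final AtomicReference<CountDownLatch> mergeAwaitLatchRef = new AtomicReference<>();

  // Merge thread: apply changes to toCommit only while the commit still waits.
  void onMergeFinished(Runnable applyMergeChangesToCommit) {
    synchronized (writerLock) {
      if (mergeAwaitLatchRef.get() != null) {
        applyMergeChangesToCommit.run(); // safe: commit thread cannot proceed concurrently
      }
      // else: commitInternal() already moved on; leave toCommit untouched.
    }
  }

  // Commit thread: abandoning the wait and disabling merge updates is atomic.
  void onCommitWaitTimeout() {
    synchronized (writerLock) {
      mergeAwaitLatchRef.set(null); // merges finishing later become no-ops
    }
  }
}
```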
[jira] [Commented] (SOLR-14307) "user caches" don't support "enabled" attribute
[ https://issues.apache.org/jira/browse/SOLR-14307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052757#comment-17052757 ]

Lucene/Solr QA commented on SOLR-14307:
---------------------------------------

| (/) *{color:green}+1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 20s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 76m 40s{color} | {color:green} core in the patch passed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 84m 14s{color} | {color:black} {color} |
\\ \\
|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-14307 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12995795/SOLR-14307.patch |
| Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns |
| uname | Linux lucene2-us-west.apache.org 4.4.0-170-generic #199-Ubuntu SMP Thu Nov 14 01:45:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
| git revision | master / 9cfdf17 |
| ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 |
| Default Java | LTS |
| Test Results | https://builds.apache.org/job/PreCommit-SOLR-Build/698/testReport/ |
| modules | C: solr/core U: solr/core |
| Console output | https://builds.apache.org/job/PreCommit-SOLR-Build/698/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |

This message was automatically generated.

> "user caches" don't support "enabled" attribute
> ------------------------------------------------
>
>                 Key: SOLR-14307
>                 URL: https://issues.apache.org/jira/browse/SOLR-14307
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Chris M. Hostetter
>            Assignee: Chris M. Hostetter
>            Priority: Major
>         Attachments: SOLR-14307.patch
>
> while trying to help write some test cases for SOLR-13807 i discovered that
> the code path used for building the {{List<CacheConfig>}} of _user_ caches
> (ie: {{<cache ... />}}) doesn't respect the idea of an "enabled" attribute ...
> that is only checked for in the code path used for building singular
> CacheConfig options from explicit xpaths (ie: {{<filterCache ... />}} etc...)
> We should fix this, if for no other reason than so it's easy for tests to use
> system properties to enable/disable all caches.
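To make the desired behavior concrete, here is a hedged sketch in plain w3c DOM (not the actual Solr config-parsing code; the file name and element handling are hypothetical stand-ins) of honoring an "enabled" attribute when collecting user {{<cache>}} nodes, the way the singular-cache path already does:
{code}
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration only: skip user <cache> nodes whose enabled
// attribute resolves to false, instead of unconditionally building configs.
public class EnabledCacheFilter {
  public static void main(String[] args) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse("solrconfig.xml"); // hypothetical local file
    NodeList caches = doc.getElementsByTagName("cache");
    for (int i = 0; i < caches.getLength(); i++) {
      Element cache = (Element) caches.item(i);
      String enabled = cache.getAttribute("enabled"); // "" when absent
      if (!enabled.isEmpty() && !Boolean.parseBoolean(enabled)) {
        continue; // disabled: don't build a CacheConfig for this node
      }
      System.out.println("would build user cache: " + cache.getAttribute("name"));
    }
  }
}
{code}
With that in place, a test could bind the attribute to a system property via Solr's property substitution (e.g. {{enabled="${solr.userCachesEnabled:true}"}}) to toggle all user caches at once, which is the use case described above.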
[GitHub] [lucene-solr] dnhatn commented on issue #1313: LUCENE-8962: Split test case
dnhatn commented on issue #1313: LUCENE-8962: Split test case
URL: https://github.com/apache/lucene-solr/pull/1313#issuecomment-595597021

I've left some comments in https://github.com/apache/lucene-solr/pull/1155.
[GitHub] [lucene-solr] dnhatn commented on issue #1319: LUCENE-9164: process all events before closing gracefully
dnhatn commented on issue #1319: LUCENE-9164: process all events before closing gracefully
URL: https://github.com/apache/lucene-solr/pull/1319#issuecomment-595598568

Thanks, Simon. I will take a look at this tomorrow.
[jira] [Commented] (SOLR-14040) solr.xml shareSchema does not work in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-14040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053036#comment-17053036 ]

David Smiley commented on SOLR-14040:
--------------------------------------

> But it affected a small subset of users. Now, we have implemented it for
> cloud, it can potentially affect a vast majority of users (if they use it).

Because it's opt-in and it has still been something of a secret feature... sorry, I just don't see the severity that you see. Anyway, how exactly would you propose dealing with this in the immediate term -- for 8.5? I don't think you mean to revert the change in this commit, because the feature remains for standalone -- and hence I think we're having the discussion on the wrong issue; it should be the linked SOLR-14232.

> solr.xml shareSchema does not work in SolrCloud
> ------------------------------------------------
>
>                 Key: SOLR-14040
>                 URL: https://issues.apache.org/jira/browse/SOLR-14040
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Blocker
>             Fix For: 8.5
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> solr.xml has a shareSchema boolean option that can be toggled from the default
> of false to true in order to share IndexSchema objects within the Solr node.
> This is silently ignored in SolrCloud mode. The pertinent code is
> {{org.apache.solr.core.ConfigSetService#createConfigSetService}}, which creates
> a CloudConfigSetService that is not related to the SchemaCaching class. This
> may not be a big deal in SolrCloud, which tends not to deal well with many
> cores per node, but I'm working on changing that.
[jira] [Commented] (SOLR-14232) Add shareSchema leak protections
[ https://issues.apache.org/jira/browse/SOLR-14232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053037#comment-17053037 ]

Noble Paul commented on SOLR-14232:
------------------------------------

Let's say we have classes shared between your solrconfig and schema:

Core1 is created with SRL1: solrconfig uses SRL1, schema uses SRL1 (all good).
Core2 is created with SRL2: solrconfig uses SRL2, schema uses SRL1.

If schema/solrconfig share an object of, say, ClassX, this can lead to ClassCastException. It's avoidable if schema & solrconfig have no shared classes, or, even if you do share them, if they don't get passed around. If you use it internally in your org, it can be avoided if you are careful. We cannot have a public feature that can lead to such a bug.

> Add shareSchema leak protections
> ---------------------------------
>
>                 Key: SOLR-14232
>                 URL: https://issues.apache.org/jira/browse/SOLR-14232
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: Schema and Analysis
>            Reporter: David Smiley
>            Priority: Major
>
> The shareSchema option in solr.xml allows cores to share a common IndexSchema,
> assuming the underlying schema is literally the same (from the same configSet).
> However this sharing has no protections to prevent an IndexSchema from
> accidentally referencing the SolrCore and its settings. The effect might be
> nondeterministic behavior depending on which core loaded the schema first, or
> the effect might be a memory leak preventing a closed SolrCore from GC'ing, or
> maybe an error. Example:
> * IndexSchema could theoretically do property expansion using the core's
> props, such as solr.core.name, silly as that may be.
> * IndexSchema uses the same SolrResourceLoader as the core, which in turn
> tracks infoMBeans and other things that can refer to the core. It should
> probably have its own SolrResourceLoader, but that's not trivial; there are
> complications with the life-cycle of ResourceLoaderAware tracking, etc.
> * If anything in IndexSchema is SolrCoreAware, this isn't going to work!
> ** SchemaSimilarityFactory is SolrCoreAware, though I think it could be
> reduced to being SchemaAware and work.
> ** ExternalFileField is currently SchemaAware; it grabs the SolrResourceLoader
> to call getDataDir, which is bad. FYI, in a separate PR I'm removing
> getDataDir from SRL.
> ** Should probably fail if anything is detected.
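To illustrate why two resource loaders make this dangerous, here is a toy, self-contained demo (the jar path and class name are hypothetical stand-ins; SRL1/SRL2 are modeled as plain isolated URLClassLoaders). Because a class's runtime identity is the pair (name, defining loader), the same bytes loaded by two loaders yield incompatible types, which is the ClassCastException described above.
{code}
import java.net.URL;
import java.net.URLClassLoader;

public class TwoLoaderDemo {
  public static void main(String[] args) throws Exception {
    URL jar = new URL("file:///path/to/shared-plugin.jar"); // hypothetical
    // Two loaders (standing in for SRL1 and SRL2) with no common parent
    // that could define ClassX once.
    try (URLClassLoader srl1 = new URLClassLoader(new URL[] {jar}, null);
         URLClassLoader srl2 = new URLClassLoader(new URL[] {jar}, null)) {
      Class<?> viaSrl1 = srl1.loadClass("com.example.ClassX");
      Class<?> viaSrl2 = srl2.loadClass("com.example.ClassX");

      // Same bytecode, but distinct runtime types:
      System.out.println(viaSrl1 == viaSrl2); // false

      // Assumes ClassX has a public no-arg constructor.
      Object fromSrl1 = viaSrl1.getDeclaredConstructor().newInstance();
      // An SRL1-loaded instance is not an instance of the SRL2-loaded type,
      // so handing it across cores fails exactly as described:
      viaSrl2.cast(fromSrl1); // throws ClassCastException
    }
  }
}
{code}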
[jira] [Commented] (SOLR-14040) solr.xml shareSchema does not work in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-14040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053038#comment-17053038 ]

Noble Paul commented on SOLR-14040:
------------------------------------

Please document this in the ref guide and we can unblock this. Our users end up using undocumented features.

> solr.xml shareSchema does not work in SolrCloud
> ------------------------------------------------
>
>                 Key: SOLR-14040
>                 URL: https://issues.apache.org/jira/browse/SOLR-14040
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Blocker
>             Fix For: 8.5
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> solr.xml has a shareSchema boolean option that can be toggled from the default
> of false to true in order to share IndexSchema objects within the Solr node.
> This is silently ignored in SolrCloud mode. The pertinent code is
> {{org.apache.solr.core.ConfigSetService#createConfigSetService}}, which creates
> a CloudConfigSetService that is not related to the SchemaCaching class. This
> may not be a big deal in SolrCloud, which tends not to deal well with many
> cores per node, but I'm working on changing that.
[jira] [Commented] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )
[ https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053039#comment-17053039 ]

David Smiley commented on SOLR-13749:
--------------------------------------

[~romseygeek] (8.5 RM) in this issue I'm proposing we expose the committed feature differently, but I don't have time to do it, so I'm proposing we temporarily un-document it until we expose the feature in a sustainable way, as opposed to having a back-compat concern. If need be I'll do this un-document commit.

> Implement support for joining across collections with multiple shards ( XCJF )
> -------------------------------------------------------------------------------
>
>                 Key: SOLR-13749
>                 URL: https://issues.apache.org/jira/browse/SOLR-13749
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Kevin Watters
>            Assignee: Gus Heck
>            Priority: Major
>             Fix For: 8.5
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> This ticket includes 2 query parsers.
> The first one is the "Cross-collection join filter" (XCJF) query parser. It
> can do a call out to a remote collection to get a set of join keys to be used
> as a filter against the local collection.
> The second one is the Hash Range query parser, with which you can specify a
> field name and a hash range; the result is that only the documents that would
> have hashed to that range will be returned.
> The XCJF query parser will do an intersection based on join keys between 2
> collections. The local collection is the collection that you are searching
> against. The remote collection is the collection that contains the join keys
> that you want to use as a filter.
> Each shard participating in the distributed request will execute a query
> against the remote collection. If the local collection is set up with the
> compositeId router to be routed on the join key field, a hash range query is
> applied to the remote collection query to only match the documents that
> contain a potential match for the documents that are in the local shard/core.
>
> Here's some vocab to help with the descriptions of the various parameters.
> ||Term||Description||
> |Local Collection|This is the main collection that is being queried.|
> |Remote Collection|This is the collection that the XCJFQuery will query to resolve the join keys.|
> |XCJFQuery|The lucene query that executes a search to get back a set of join keys from a remote collection|
> |HashRangeQuery|The lucene query that matches only the documents whose hash code on a field falls within a specified range.|
>
> ||Param||Required||Description||
> |collection|Required|The name of the external Solr collection to be queried to retrieve the set of join key values ( required )|
> |zkHost|Optional|The connection string to be used to connect to Zookeeper. zkHost and solrUrl are both optional parameters, and at most one of them should be specified. If neither of zkHost or solrUrl are specified, the local Zookeeper cluster will be used. ( optional )|
> |solrUrl|Optional|The URL of the external Solr node to be queried ( optional )|
> |from|Required|The join key field name in the external collection ( required )|
> |to|Required|The join key field name in the local collection|
> |v|See Note|The query to be executed against the external Solr collection to retrieve the set of join key values. Note: The original query can be passed at the end of the string or as the "v" parameter.
> It's recommended to use query parameter substitution with the "v" parameter
> to ensure no issues arise with the default query parsers.|
> |routed| |true / false. If true, the XCJF query will use each shard's hash
> range to determine the set of join keys to retrieve for that shard. This
> parameter improves the performance of the cross-collection join, but it
> depends on the local collection being routed by the toField. If this
> parameter is not specified, the XCJF query will try to determine the correct
> value automatically.|
> |ttl| |The length of time that an XCJF query in the cache will be considered
> valid, in seconds. Defaults to 3600 (one hour). The XCJF query will not be
> aware of changes to the remote collection, so if the remote collection is
> updated, cached XCJF queries may give inaccurate results. After the ttl
> period has expired, the XCJF query will re-execute the join against the
> remote collection.|
> |_All others_| |Any normal Solr parameter can also be specified as a local
> param.|
>
> Example solrconfig.xml changes:
> {code}
> <cache name="hash_vin"
>        class="solr.LRUCache"
>        size="128"
>        initialSize="0"
>        regenerator="solr.No
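For a sense of how the parameters above combine in practice, here is a hedged SolrJ sketch (collection names, field names, and the URL are invented for illustration, and it assumes the parser is registered under the name {{xcjf}} as the issue's title suggests; this is not code from the patch):
{code}
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class XcjfExample {
  public static void main(String[] args) throws Exception {
    try (SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
      // Filter the local collection to documents whose join key matches
      // documents in the remote collection that satisfy the inner query.
      SolrQuery query = new SolrQuery("*:*");
      query.addFilterQuery("{!xcjf collection=remoteProducts from=product_id to=product_id v=\"inStock:true\"}");
      QueryResponse rsp = client.query("localOrders", query);
      System.out.println("hits: " + rsp.getResults().getNumFound());
    }
  }
}
{code}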
[GitHub] [lucene-solr] dnhatn commented on a change in pull request #1155: LUCENE-8962: Add ability to selectively merge on commit
dnhatn commented on a change in pull request #1155: LUCENE-8962: Add ability to selectively merge on commit
URL: https://github.com/apache/lucene-solr/pull/1155#discussion_r388705156

## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
## @@ -3252,6 +3315,53 @@ private long prepareCommitInternal() throws IOException {
       } finally {
         maybeCloseOnTragicEvent();
       }
+
+      if (mergeAwaitLatchRef != null) {
+        CountDownLatch mergeAwaitLatch = mergeAwaitLatchRef.get();
+        // If we found and registered any merges above, within the flushLock, then we want to ensure that they
+        // complete execution. Note that since we released the lock, other merges may have been scheduled. We will
+        // block until the merges that we registered complete. As they complete, they will update toCommit to
+        // replace merged segments with the result of each merge.
+        config.getIndexWriterEvents().beginMergeOnCommit();
+        mergeScheduler.merge(this, MergeTrigger.COMMIT, true);
+        long mergeWaitStart = System.nanoTime();
+        int abandonedCount = 0;
+        long waitTimeMillis = (long) (config.getMaxCommitMergeWaitSeconds() * 1000.0);
+        try {
+          if (mergeAwaitLatch.await(waitTimeMillis, TimeUnit.MILLISECONDS) == false) {
+            synchronized (this) {
+              // Need to do this in a synchronized block, to make sure none of our commit merges are currently
+              // executing mergeFinished (since mergeFinished itself is called from within the IndexWriter lock).
+              // After we clear the value from mergeAwaitLatchRef, the merges we schedule will still execute as
+              // usual, but when they finish, they won't attempt to update toCommit or modify segment reference
+              // counts.
+              mergeAwaitLatchRef.set(null);

Review comment: ~I think we should set `mergeAwaitLatchRef` in the `else` branch.~
[GitHub] [lucene-solr] dnhatn commented on a change in pull request #1155: LUCENE-8962: Add ability to selectively merge on commit
dnhatn commented on a change in pull request #1155: LUCENE-8962: Add ability to selectively merge on commit
URL: https://github.com/apache/lucene-solr/pull/1155#discussion_r388717615

## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
## @@ -3252,6 +3315,53 @@ private long prepareCommitInternal() throws IOException {
       } finally {
         maybeCloseOnTragicEvent();
       }
+
+      if (mergeAwaitLatchRef != null) {
+        CountDownLatch mergeAwaitLatch = mergeAwaitLatchRef.get();
+        // If we found and registered any merges above, within the flushLock, then we want to ensure that they
+        // complete execution. Note that since we released the lock, other merges may have been scheduled. We will
+        // block until the merges that we registered complete. As they complete, they will update toCommit to
+        // replace merged segments with the result of each merge.
+        config.getIndexWriterEvents().beginMergeOnCommit();
+        mergeScheduler.merge(this, MergeTrigger.COMMIT, true);
+        long mergeWaitStart = System.nanoTime();
+        int abandonedCount = 0;
+        long waitTimeMillis = (long) (config.getMaxCommitMergeWaitSeconds() * 1000.0);
+        try {
+          if (mergeAwaitLatch.await(waitTimeMillis, TimeUnit.MILLISECONDS) == false) {
+            synchronized (this) {
+              // Need to do this in a synchronized block, to make sure none of our commit merges are currently
+              // executing mergeFinished (since mergeFinished itself is called from within the IndexWriter lock).
+              // After we clear the value from mergeAwaitLatchRef, the merges we schedule will still execute as
+              // usual, but when they finish, they won't attempt to update toCommit or modify segment reference
+              // counts.
+              mergeAwaitLatchRef.set(null);

Review comment: Sorry I misread this.
[GitHub] [lucene-solr] dnhatn commented on issue #1155: LUCENE-8962: Add ability to selectively merge on commit
dnhatn commented on issue #1155: LUCENE-8962: Add ability to selectively merge on commit
URL: https://github.com/apache/lucene-solr/pull/1155#issuecomment-595607002

Hmm, I missed the fact that `mergeFinished` is executed under the IndexWriter lock. I will dig into this again. Please ignore my previous comments.
[GitHub] [lucene-solr] dnhatn commented on a change in pull request #1155: LUCENE-8962: Add ability to selectively merge on commit
dnhatn commented on a change in pull request #1155: LUCENE-8962: Add ability to selectively merge on commit
URL: https://github.com/apache/lucene-solr/pull/1155#discussion_r388705514

## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
## @@ -3147,6 +3149,42 @@ public final boolean flushNextBuffer() throws IOException {
     }
   }

+  private MergePolicy.OneMerge updateSegmentInfosOnMergeFinish(MergePolicy.OneMerge merge, final SegmentInfos toCommit,
+                                                               AtomicReference<CountDownLatch> mergeLatchRef) {
+    return new MergePolicy.OneMerge(merge.segments) {
+      public void mergeFinished() throws IOException {
+        super.mergeFinished();
+        CountDownLatch mergeAwaitLatch = mergeLatchRef.get();
+        if (mergeAwaitLatch == null) {
+          // Commit thread timed out waiting for this merge and moved on. No need to manipulate toCommit.

Review comment: ~We need stronger synchronization to make sure that we won't modify `toCommit` if `commitInternal` has stopped waiting for these merges.~
[GitHub] [lucene-solr] dnhatn commented on a change in pull request #1155: LUCENE-8962: Add ability to selectively merge on commit
dnhatn commented on a change in pull request #1155: LUCENE-8962: Add ability to selectively merge on commit
URL: https://github.com/apache/lucene-solr/pull/1155#discussion_r388704872

## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
## @@ -3147,6 +3149,42 @@ public final boolean flushNextBuffer() throws IOException {
     }
   }

+  private MergePolicy.OneMerge updateSegmentInfosOnMergeFinish(MergePolicy.OneMerge merge, final SegmentInfos toCommit,
+                                                               AtomicReference<CountDownLatch> mergeLatchRef) {
+    return new MergePolicy.OneMerge(merge.segments) {
+      public void mergeFinished() throws IOException {
+        super.mergeFinished();
+        CountDownLatch mergeAwaitLatch = mergeLatchRef.get();
+        if (mergeAwaitLatch == null) {
+          // Commit thread timed out waiting for this merge and moved on. No need to manipulate toCommit.
+          return;
+        }
+        if (isAborted() == false) {
+          deleter.incRef(this.info.files());
+          // Resolve "live" SegmentInfos segments to their toCommit cloned equivalents, based on segment name.
+          Set<String> mergedSegmentNames = new HashSet<>();
+          for (SegmentCommitInfo sci : this.segments) {
+            deleter.decRef(sci.files());
+            mergedSegmentNames.add(sci.info.name);
+          }
+          List<SegmentCommitInfo> toCommitMergedAwaySegments = new ArrayList<>();
+          for (SegmentCommitInfo sci : toCommit) {
+            if (mergedSegmentNames.contains(sci.info.name)) {
+              toCommitMergedAwaySegments.add(sci);
+            }
+          }
+          // Construct a OneMerge that applies to toCommit
+          MergePolicy.OneMerge applicableMerge = new MergePolicy.OneMerge(toCommitMergedAwaySegments);
+          applicableMerge.info = this.info.clone();
+          long segmentCounter = Long.parseLong(this.info.info.name.substring(1), Character.MAX_RADIX);
+          toCommit.counter = Math.max(toCommit.counter, segmentCounter + 1);
+          toCommit.applyMergeChanges(applicableMerge, false);

Review comment: ~We should modify `toCommit` under the `IndexWriter.this` lock (or a private synchronization between this method and `commitInternal`).~
[jira] [Commented] (SOLR-13942) /api/cluster/zk/* to fetch raw ZK data
[ https://issues.apache.org/jira/browse/SOLR-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053047#comment-17053047 ]

David Smiley commented on SOLR-13942:
--------------------------------------

FWIW I really like Shalin's input, and his option #3, which I'll copy-paste here:

bq. Deprecate /admin/zookeeper, introduce a clean API, migrate UI to this new endpoint or a better alternative and remove /admin/zookeeper in 9.0

> /api/cluster/zk/* to fetch raw ZK data
> ---------------------------------------
>
>                 Key: SOLR-13942
>                 URL: https://issues.apache.org/jira/browse/SOLR-13942
>             Project: Solr
>          Issue Type: New Feature
>          Components: v2 API
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>            Priority: Minor
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> example: download the {{state.json}} of:
> {code}
> GET http://localhost:8983/api/cluster/zk/collections/gettingstarted/state.json
> {code}
> get a list of all children under {{/live_nodes}}:
> {code}
> GET http://localhost:8983/api/cluster/zk/live_nodes
> {code}
> If the requested path is a node with children, show the list of child nodes
> and their metadata