[jira] [Updated] (SOLR-12490) Introducing json.queries WAS:Query DSL supports for further referring and exclusion in JSON facets
[ https://issues.apache.org/jira/browse/SOLR-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Khludnev updated SOLR-12490:
Attachment: SOLR-12490-ref-guide.patch

> Introducing json.queries WAS:Query DSL supports for further referring and exclusion in JSON facets
>
> Key: SOLR-12490
> URL: https://issues.apache.org/jira/browse/SOLR-12490
> Project: Solr
> Issue Type: Improvement
> Components: Facet Module, faceting
> Reporter: Mikhail Khludnev
> Assignee: Mikhail Khludnev
> Priority: Major
> Labels: newdev
> Fix For: 8.5
> Attachments: SOLR-12490-ref-guide.patch, SOLR-12490-ref-guide.patch, SOLR-12490.patch, SOLR-12490.patch, SOLR-12490.patch, SOLR-12490.patch
>
> It's a spin-off from the [discussion|https://issues.apache.org/jira/browse/SOLR-9685?focusedCommentId=16508720&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16508720].
>
> h2. Problem
> # after SOLR-9685 we can tag separate clauses in hairy queries like {{parent}}, {{bool}}
> # we can {{domain.excludeTags}}
> # we are looking for child faceting with exclusions, see SOLR-9510, SOLR-8998
> # but we can only refer to whole params in {{domain.filter}}; it's not possible to refer to separate clauses
> see the first comment
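To make the problem concrete, here is a hedged sketch of a request using the pieces from points 1-2: the {{"#tag"}} clause syntax from SOLR-9685 plus {{domain.excludeTags}} in JSON faceting (field names and values are invented):

{code:json}
{
  "query": {
    "bool": {
      "must": [
        { "#color_tag": "color:blue" },
        { "#size_tag":  "size:L" }
      ]
    }
  },
  "facet": {
    "sizes": {
      "type": "terms",
      "field": "size",
      "domain": { "excludeTags": "size_tag" }
    }
  }
}
{code}

Point 4 is the gap: {{domain.filter}} can dereference a whole request parameter, but there is no way to refer to just one tagged clause of the {{bool}} query above for reuse as a filter elsewhere; the proposed {{json.queries}} section is meant to close that gap.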
[jira] [Commented] (SOLR-12490) Introducing json.queries WAS:Query DSL supports for further referring and exclusion in JSON facets
[ https://issues.apache.org/jira/browse/SOLR-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013404#comment-17013404 ]

Mikhail Khludnev commented on SOLR-12490:

Attaching a fixed Ref Guide patch [^SOLR-12490-ref-guide.patch]. It also fixes a few broken refs to the JSON Facet API page. [~ctargett], would you like to review it before I push?
[GitHub] [lucene-solr] dweiss commented on issue #1157: Add RAT check using Gradle
dweiss commented on issue #1157: Add RAT check using Gradle
URL: https://github.com/apache/lucene-solr/pull/1157#issuecomment-573306410

I'll take a look later, Mike. As for applying tasks and anything else: think of the project structure as a graph. You attach things to this graph in two passes (evaluation and configuration), followed by execution of the tasks attached to this graph (in topological order of their dependencies). It is conceptually simple; the devil hides in the details of how Gradle scripts are evaluated, deferred/lazily-evaluated collections, etc. This should be helpful: https://docs.gradle.org/current/userguide/build_lifecycle.html

I'll review the patch and maybe correct it before committing; when you look at the commit vs. your patch you'll see the differences made. I think it'll be easier and faster than explaining (but go ahead and ask if you don't understand something).
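To make the two-phase point concrete, here is a tiny illustrative Gradle snippet (not from the PR; the task name and messages are invented):

{code}
// build.gradle -- configuration pass vs. execution phase
tasks.register('rat') {
    // This body runs during the configuration pass, whenever the task graph is assembled.
    description = 'Runs Apache RAT license checks.'
    doLast {
        // This action runs during the execution phase, only when the 'rat' task is requested.
        println 'checking licenses...'
    }
}
{code}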
[jira] [Updated] (SOLR-12490) Introducing json.queries was:Query DSL supports for further referring and exclusion in JSON facets
[ https://issues.apache.org/jira/browse/SOLR-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Khludnev updated SOLR-12490:
Summary: Introducing json.queries was:Query DSL supports for further referring and exclusion in JSON facets (was: Introducing json.queries WAS:Query DSL supports for further referring and exclusion in JSON facets)
[jira] [Commented] (SOLR-13934) Documentation on SimplePostTool for Windows users is pretty brief
[ https://issues.apache.org/jira/browse/SOLR-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013497#comment-17013497 ]

David Eric Pugh commented on SOLR-13934:

The editorial changes you made look great! Changing up the code should probably be a new JIRA.

> Documentation on SimplePostTool for Windows users is pretty brief
>
> Key: SOLR-13934
> URL: https://issues.apache.org/jira/browse/SOLR-13934
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: SimplePostTool
> Affects Versions: 8.3
> Reporter: David Eric Pugh
> Assignee: Jason Gerlowski
> Priority: Minor
> Fix For: master (9.0)
>
> SimplePostTool on Windows doesn't have enough documentation; you end up googling to get it to work. We need to provide a better example.
> https://lucene.apache.org/solr/guide/8_3/post-tool.html#simpleposttool
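For reference, the kind of Windows example the report asks for would look roughly like this (hedged: it assumes the stock {{gettingstarted}} collection and the {{post.jar}} shipped under {{example\exampledocs}}, since Windows has no {{bin/post}} wrapper):

{code}
java -Dc=gettingstarted -jar example\exampledocs\post.jar example\exampledocs\*.xml
{code}

The {{-Dc}} system property names the target collection, and the remaining arguments are the files to post.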
[jira] [Commented] (LUCENE-9126) Javadoc linting options silently swallow documentation errors
[ https://issues.apache.org/jira/browse/LUCENE-9126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013505#comment-17013505 ]

Dawid Weiss commented on LUCENE-9126:

Jon filed a bug for us. https://bugs.openjdk.java.net/browse/JDK-8236949

> Javadoc linting options silently swallow documentation errors
>
> Key: LUCENE-9126
> URL: https://issues.apache.org/jira/browse/LUCENE-9126
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Major
>
> I tried to compile javadocs in gradle and I couldn't do it... The output was full of errors.
> I eventually narrowed the problem down to lint options: how they are interpreted and parsed just doesn't make any sense to me. Try this:
> {code}
> # Examples below use plain javadoc from Java 11.
> cd lucene/core
> {code}
> This emulates what we have in Ant (these are roughly the options Ant emits):
> {code}
> javadoc -d build\output -encoding "UTF-8" -sourcepath src\java -subpackages org -quiet -Xdoclint:all -Xdoclint:-missing -Xdoclint:-accessibility
> => no errors.
> {code}
> Now rerun it with this syntax:
> {code}
> javadoc -d build\output -encoding "UTF-8" -sourcepath src\java -subpackages org -quiet -Xdoclint:all,-missing,-accessibility
> => 100 errors, 5 warnings
> {code}
> This time javadoc displays errors about undefined tags (unknown tag: lucene.experimental), HTML warnings (warning: empty tag), etc.
> Let's add our custom tags and an overview file:
> {code}
> javadoc -overview "src/java/overview.html" -tag "lucene.experimental:a:xxx" -tag "lucene.internal:a:xxx" -tag "lucene.spi:t:xxx" -d build\output -encoding "UTF-8" -sourcepath src\java -subpackages org -quiet -Xdoclint:all,-missing,-accessibility
> => 100 errors, 5 warnings
> => still HTML warnings
> {code}
> Let's get rid of html linting:
> {code}
> javadoc -overview "src/java/overview.html" -tag "lucene.experimental:a:xxx" -tag "lucene.internal:a:xxx" -tag "lucene.spi:t:xxx" -d build\output -encoding "UTF-8" -sourcepath src\java -subpackages org -quiet -Xdoclint:all,-missing,-accessibility,-html
> => 3 errors
> => malformed HTML syntax in overview.html: src\java\overview.html:150: error: bad use of '>'
> {code}
> Finally, let's get rid of syntax linting:
> {code}
> javadoc -overview "src/java/overview.html" -tag "lucene.experimental:a:xxx" -tag "lucene.internal:a:xxx" -tag "lucene.spi:t:xxx" -d build\output -encoding "UTF-8" -sourcepath src\java -subpackages org -quiet -Xdoclint:all,-missing,-accessibility,-html,-syntax
> => passes
> {code}
> There are definitely bugs in our documentation; look at the extra ">" in the overview file, for example:
> https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/overview.html#L150
> What I can't understand is why the first syntax suppresses pretty much ALL the errors, including missing custom tag definitions. This should work, given what's written in [1]?
> [1] https://docs.oracle.com/en/java/javase/11/tools/javadoc.html
[jira] [Assigned] (LUCENE-9126) Javadoc linting options silently swallow documentation errors
[ https://issues.apache.org/jira/browse/LUCENE-9126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss reassigned LUCENE-9126:
Assignee: Dawid Weiss
[jira] [Commented] (SOLR-13486) race condition between leader's "replay on startup" and non-leader's "recover from leader" can leave replicas out of sync (TestTlogReplayVsRecovery)
[ https://issues.apache.org/jira/browse/SOLR-13486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013597#comment-17013597 ]

Chris M. Hostetter commented on SOLR-13486:

I've been revisiting this aspect of my earlier investigation into this bug...

{quote}{color:#de350b}*Why does the leader _need_ to do tlog replay in the test at all?*{color}

Even if the client doesn't explicitly commit all docs, the "Commit on Close" semantics of Solr's IndexWriter should ensure that a clean shutdown of the leader means all uncommitted docs in the tlog will be automatically committed before the Directory is closed; nothing in the test "kills" the leader before this should happen. So WTF?

I still haven't gotten to the bottom of that, but I did confirm that:
* unlike the "normal" adds for docs 1-3, the code path in TestCloudConsistency that was adding doc #4 (during the network partition) was *NOT* committing doc #4.
* in the test logs where TestCloudConsistency failed, we never see the normal "Committing on IndexWriter close." I would expect from an orderly shutdown of the leader
** This message does appear in the expected location of the logs for a TestCloudConsistency run that passes

At first I thought the problem was some other test class running earlier in the same jenkins JVM mucking with the value of the (public static) {{DirectUpdateHandler2.commitOnClose}} prior to the test running; but even when running a single test class locally, with {{DirectUpdateHandler2.commitOnClose = true;}} I was able to continue to reproduce the problem in my new test.{quote}

I've been trying to get to the bottom of this by modifying {{TestTlogReplayVsRecovery}} to explicitly use {{DirectUpdateHandler2.commitOnClose = true;}} (as mentioned above) along with more detailed logging from org.apache.solr.update (particularly DUH2).

The first thing I realized is that there's a bug in the test where it's expecting to find {{uncommittedDocs + uncommittedDocs}} docs instead of {{committedDocs + uncommittedDocs}}, which is why it so easily/quickly failed for me before.

With that trivial test bug fixed, I have *NOT* been able to reproduce the situation that was observed in {{TestCloudConsistency}} when this jira was filed: that the leader shut down (evidently) w/o doing a commitOnClose, necessitating tlog replay on startup, which then happens after a replica has already recovered. The only way I can seem to trigger this situation is when {{DirectUpdateHandler2.commitOnClose = false;}} (i.e. simulating an unclean shutdown), suggesting that maybe my original guess about some other test in the same JVM borking this setting was correct ... but I still haven't been able to find a test that ran in the same JVM which might be broken in that way.

The only failure type I've been able to trigger is a new one AFAICT:
* (partitioned) leader successfully indexes some docs & commits on shutdown
* leader restarts, and sends {{REQUESTRECOVERY}} to the replica
* leader marks itself as active
* test thread detects "all replicas are active" *before* the replica has a chance to actually go into recovery
* test thread checks the replica for docs that only the leader has, and fails

...ironically, I've only been able to reproduce this using {{TestTlogReplayVsRecovery}}; I've never seen it in {{TestCloudConsistency}}, even though it seems like that test establishes the same preconditions.
(Successful logs of {{TestCloudConsistency}} never show a {{REQUESTRECOVERY}} command sent to the replicas from the leader, like I see in (both success and failure) logs for {{TestTlogReplayVsRecovery}}, so I'm guessing it has to do with how many docs are out of sync and what type of recovery is done? ... not certain)

My next steps are:
* Commit a fix for the {{uncommittedDocs + uncommittedDocs}} bug in {{TestTlogReplayVsRecovery}}
** This will also include some TODOs about making the test more robust with more randomized committed & uncommitted docs before/after the network partition
*** These TODOs aren't really worth pursuing until the underlying bug is fixed
* Open new jiras for:
** Replacing {{DirectUpdateHandler2.commitOnClose}} with something in {{TestInjection}} (per the comment there)
*** so we can be more confident tests aren't leaving it in a bad state
** Considering setting the replica to {{State.RECOVERING}} synchronously when processing the {{REQUESTRECOVERY}} command
*** w/o this, even if we fix the bug tracked in this issue, it's still impossible for tests like {{TestTlogReplayVsRecovery}} (or end users) to set CollectionState watchers to know when a collection is healthy in situations like the one being tracked in this jira

After that, I don't think there's anything else to do until someone smarter than me can chime in about fixing the underlying race condition of (leader) "tlog replay on startup" vs (replica) "recover from leader".
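For anyone trying to follow along, the test pattern under discussion is roughly this (a sketch; {{addDocsWithoutCommit}} and {{leaderJetty}} are hypothetical stand-ins for the test's actual helpers):

{code:java}
// Simulate an unclean shutdown so the leader is forced into tlog replay on restart.
DirectUpdateHandler2.commitOnClose = false; // docs survive only in the tlog
addDocsWithoutCommit(leaderJetty);          // hypothetical helper
leaderJetty.stop();                         // "Committing on IndexWriter close" is skipped
DirectUpdateHandler2.commitOnClose = true;  // restore before restart
leaderJetty.start();                        // startup must now replay the tlog...
// ...and a replica that recovers from the leader *before* replay finishes
// can end up permanently missing the replayed updates.
{code}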
[jira] [Created] (SOLR-14183) replicas do not immediately/synchronously reflect state=RECOVERING when receiving REQUESTRECOVERY commands
Chris M. Hostetter created SOLR-14183:

Summary: replicas do not immediately/synchronously reflect state=RECOVERING when receiving REQUESTRECOVERY commands
Key: SOLR-14183
URL: https://issues.apache.org/jira/browse/SOLR-14183
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Chris M. Hostetter

Spun off of SOLR-13486:

Consider the following situation, which can occur in {{TestTlogReplayVsRecovery}}:
* healthy cluster, healthy shard with multiple replicas
* network partition occurs, leader adds new documents
* network partition is healed, leader is restarted
* leader determines it should be leader again
** sends {{REQUESTRECOVERY}} to replicas
** leader marks itself as {{state=ACTIVE}}
* client checks cluster status and sees all replicas are {{ACTIVE}}
** client assumes all replicas are fair game for searching all documents
** *CLIENT FAILS TO FIND EXPECTED DOCUMENTS IF QUERYING A NON-LEADER REPLICA*
* asynchronously, non-leader replicas get around to {{doRecovery}}
** only now are non-leader replicas marking themselves as {{state=RECOVERING}}

I think we need to reconsider when replicas are marked {{state=RECOVERING}}, either doing it synchronously in {{CoreAdminOperation.REQUESTRECOVERY_OP}}, or letting the leader set it when the leader knows it needs to initiate recovery, so that the status is updated and available to clients (and tests) immediately.

Alternatively: we need a more comprehensive way for clients (and tests) to know if a shard is "healthy" than just checking the state of each replica (since setting {{state=RECOVERING}} isn't updated in real time).
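A hedged sketch of the client-side check that gets fooled (the {{ZkStateReader.waitForState}} shape is from SolrCloud's public API; the collection name and timeout are invented):

{code:java}
import java.util.concurrent.TimeUnit;
import org.apache.solr.common.cloud.Replica;
import org.apache.solr.common.cloud.ZkStateReader;

class HealthCheckSketch {
  // Wait until every replica reports ACTIVE. In the scenario above this returns
  // too early, because replicas that received REQUESTRECOVERY have not yet
  // flipped to state=RECOVERING.
  static void waitUntilAllReplicasLookActive(ZkStateReader zkStateReader) throws Exception {
    zkStateReader.waitForState("collection1", 30, TimeUnit.SECONDS,
        (liveNodes, coll) -> coll != null && coll.getReplicas().stream()
            .allMatch(r -> r.getState() == Replica.State.ACTIVE
                           && liveNodes.contains(r.getNodeName())));
    // A query against a non-leader replica here can still miss leader-only documents.
  }
}
{code}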
[jira] [Created] (SOLR-14184) replace DirectUpdateHandler2.commitOnClose with something in TestInjection
Chris M. Hostetter created SOLR-14184:

Summary: replace DirectUpdateHandler2.commitOnClose with something in TestInjection
Key: SOLR-14184
URL: https://issues.apache.org/jira/browse/SOLR-14184
Project: Solr
Issue Type: Test
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Chris M. Hostetter
Assignee: Chris M. Hostetter

{code:java}
public static volatile boolean commitOnClose = true; // TODO: make this a real config option or move it to TestInjection
{code}

Lots of tests muck with this (to simulate unclean shutdown and force tlog replay on restart), but there's no guarantee that it is reset properly. It should be replaced by logic in {{TestInjection}} that is correctly cleaned up by {{TestInjection.reset()}}.
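One possible shape for the replacement, following the existing {{TestInjection}} conventions (the field name here is hypothetical, not a committed API):

{code:java}
public class TestInjection {
  // Hypothetical replacement for DirectUpdateHandler2.commitOnClose: tests flip this
  // to true to simulate an unclean shutdown (skipping the commit on IndexWriter close).
  public static volatile boolean skipIndexWriterCommitOnClose = false;

  public static void reset() {
    // ... existing resets ...
    skipIndexWriterCommitOnClose = false; // guaranteed cleanup between tests
  }
}

// DirectUpdateHandler2 would then consult the injection point instead of its own flag:
//   if (!TestInjection.skipIndexWriterCommitOnClose) { /* commit on close */ }
{code}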
[jira] [Commented] (SOLR-13486) race condition between leader's "replay on startup" and non-leader's "recover from leader" can leave replicas out of sync (TestTlogReplayVsRecovery)
[ https://issues.apache.org/jira/browse/SOLR-13486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013603#comment-17013603 ]

ASF subversion and git services commented on SOLR-13486:

Commit 9a2497f6377601d396b1b3b8b83ffcab0fd331a3 in lucene-solr's branch refs/heads/master from Chris M. Hostetter [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=9a2497f ]

SOLR-13486: Fix trivial test bug in TestTlogReplayVsRecovery

Add TODOs for future test improvements once the underlying race condition is fixed in core code

> race condition between leader's "replay on startup" and non-leader's "recover from leader" can leave replicas out of sync (TestTlogReplayVsRecovery)
>
> Key: SOLR-13486
> URL: https://issues.apache.org/jira/browse/SOLR-13486
> Project: Solr
> Issue Type: Bug
> Reporter: Chris M. Hostetter
> Priority: Major
> Attachments: SOLR-13486__test.patch, apache_Lucene-Solr-BadApples-NightlyTests-master_61.log.txt.gz, apache_Lucene-Solr-BadApples-Tests-8.x_102.log.txt.gz, org.apache.solr.cloud.TestCloudConsistency.zip
>
> There is a bug in SolrCloud that can result in replicas being out of sync with the leader if:
> * The leader has uncommitted docs (in the tlog) that didn't make it to the replica
> * The leader restarts
> * The replica begins to peer sync from the leader before the leader finishes its own tlog replay on startup
> A "rolling restart" is the situation where this is most likely to affect real-world users.
> This was first discovered via hard-to-reproduce TestCloudConsistency failures in jenkins, but that test has since been modified to work around this bug, and a new test "TestTlogReplayVsRecovery" has been added that more aggressively demonstrates this error.
> Original jira description below...
>
> I've been investigating some jenkins failures from TestCloudConsistency, which at first glance suggest a problem w/ replica(s) recovering after a network partition from the leader, but in digging into the logs the root cause actually seems to be a thread race condition when a replica (the leader) is first registered...
> * The {{ZkContainer.registerInZk(...)}} method (which is called by {{CoreContainer.registerCore(...)}} & {{CoreContainer.load()}}) is typically run in a background thread (via the {{ZkContainer.coreZkRegister}} ExecutorService)
> * {{ZkContainer.registerInZk(...)}} delegates to {{ZKController.register(...)}}, which is ultimately responsible for checking if there are any "old" tlogs on disk, and if so handling the "Replaying tlog for ... during startup" logic
> * Because this happens in a background thread, other logic/requests can be handled by this core/replica in the meantime, before it starts (or while in the middle of) replaying the tlogs
> ** Notably: *leaders that have not yet replayed tlogs on startup will erroneously respond to RTG / Fingerprint / PeerSync requests from other replicas w/ incomplete data*
> ...In general, it seems scary / fishy to me that a replica can (apparently) become *ACTIVE* before it has finished its {{registerInZk}} + "Replaying tlog ... during startup" logic ... particularly since this can happen even for replicas that are/become leaders. It seems like this could potentially cause a whole host of problems, only one of which manifests in this particular test failure:
> * *BEFORE* replicaX's "coreZkRegister" thread reaches the "Replaying tlog ... during startup" check:
> ** replicaX can recognize (via zk terms) that it should be the leader(X)
> ** this leaderX can then instruct some other replicaY to recover from it
> ** replicaY can send RTG / PeerSync / FetchIndex requests to the leaderX (either of its own volition, or because it was instructed to by leaderX) in an attempt to recover
> *** the responses to these recovery requests will not include updates in the tlog files that existed on leaderX prior to startup that have not yet been replayed
> * *AFTER* replicaY has finished its recovery, leaderX's "Replaying tlog ... during startup" can finish
> ** replicaY now thinks it is in sync with leaderX, but leaderX has (replayed) updates the other replicas know nothing about
[jira] [Commented] (SOLR-13486) race condition between leader's "replay on startup" and non-leader's "recover from leader" can leave replicas out of sync (TestTlogReplayVsRecovery)
[ https://issues.apache.org/jira/browse/SOLR-13486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013613#comment-17013613 ]

ASF subversion and git services commented on SOLR-13486:

Commit 23fab1b6ebc08dab54f2937d2886fdc9c270711c in lucene-solr's branch refs/heads/branch_8x from Chris M. Hostetter [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=23fab1b ]

SOLR-13486: Fix trivial test bug in TestTlogReplayVsRecovery

Add TODOs for future test improvements once the underlying race condition is fixed in core code

(cherry picked from commit 9a2497f6377601d396b1b3b8b83ffcab0fd331a3)
[jira] [Commented] (SOLR-13486) race condition between leader's "replay on startup" and non-leader's "recover from leader" can leave replicas out of sync (TestTlogReplayVsRecovery)
[ https://issues.apache.org/jira/browse/SOLR-13486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013614#comment-17013614 ]

Chris M. Hostetter commented on SOLR-13486:

New linked jiras:
* SOLR-14183: replicas do not immediately/synchronously reflect state=RECOVERING when receiving REQUESTRECOVERY commands
* SOLR-14184: replace DirectUpdateHandler2.commitOnClose with something in TestInjection
[jira] [Commented] (LUCENE-9004) Approximate nearest vector search
[ https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013667#comment-17013667 ]

Tomoko Uchida commented on LUCENE-9004:

[~sokolov] thanks, I have also tested it with a real dataset generated from recent snapshot files of Japanese Wikipedia. Yes, it seems "functionally correct", although we should do more formal tests for measuring recall (effectiveness).

{quote}I think it's time to post back to a branch in the Apache git repository so we can enlist contributions from the community here to help this go forward. I'll try to get that done this weekend{quote}

OK, I pushed the branch to the Apache Gitbox so that others who want to get involved in this issue can check it out and have a try.
[https://gitbox.apache.org/repos/asf?p=lucene-solr.git;a=shortlog;h=refs/heads/jira/lucene-9004-aknn-2]
This also includes a patch from Xin-Chun Zhang.

Note: currently the new codec for the vectors and kNN graphs is placed in {{o.a.l.codecs.lucene90}}; I think we can move this to the proper location when this is ready to be released.
[jira] [Comment Edited] (LUCENE-9004) Approximate nearest vector search
[ https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013667#comment-17013667 ]

Tomoko Uchida edited comment on LUCENE-9004 at 1/12/20 7:51 AM:

[~sokolov] thanks, I have also tested it with a real dataset generated from recent snapshot files of Japanese Wikipedia. Yes, it seems "functionally correct", although we should do more formal tests for measuring recall (effectiveness).

{quote}I think it's time to post back to a branch in the Apache git repository so we can enlist contributions from the community here to help this go forward. I'll try to get that done this weekend{quote}

OK, I pushed the branch to the Apache Gitbox so that others who want to get involved in this issue can check it out and have a try. While I feel it's far from complete :), I agree that the code is ready to take in contributions from the community.
[https://gitbox.apache.org/repos/asf?p=lucene-solr.git;a=shortlog;h=refs/heads/jira/lucene-9004-aknn-2]
This also includes a patch from Xin-Chun Zhang.

Note: currently the new codec for the vectors and kNN graphs is placed in {{o.a.l.codecs.lucene90}}; I think we can move this to the proper location when this is ready to be released.
[jira] [Updated] (LUCENE-9004) Approximate nearest vector search
[ https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tomoko Uchida updated LUCENE-9004:
Description:

"Semantic" search based on machine-learned vector "embeddings" representing terms, queries and documents is becoming a must-have feature for a modern search engine. SOLR-12890 is exploring various approaches to this, including providing vector-based scoring functions. This is a spinoff issue from that.

The idea here is to explore approximate nearest-neighbor search. Researchers have found that an approach based on navigating a graph that partially encodes the nearest neighbor relation at multiple scales can provide accuracy > 95% (as compared to exact nearest neighbor calculations) at a reasonable cost. This issue will explore implementing HNSW (hierarchical navigable small-world) graphs for the purpose of approximate nearest vector search (often referred to as KNN or k-nearest-neighbor search).

At a high level the way this algorithm works is this. First assume you have a graph that has a partial encoding of the nearest neighbor relation, with some short and some long-distance links. If this graph is built in the right way (has the hierarchical navigable small world property), then you can efficiently traverse it to find nearest neighbors (approximately) in log N time, where N is the number of nodes in the graph. I believe this idea was pioneered in [1]. The great insight in that paper is that if you use the graph search algorithm to find the K nearest neighbors of a new document while indexing, and then link those neighbors (undirectedly, i.e. both ways) to the new document, then the graph that emerges will have the desired properties.

The implementation I propose for Lucene is as follows. We need two new data structures to encode the vectors and the graph. We can encode vectors using a light wrapper around {{BinaryDocValues}} (we also want to encode the vector dimension and have efficient conversion from bytes to floats). For the graph we can use {{SortedNumericDocValues}}, where the values we encode are the docids of the related documents. Encoding the interdocument relations using docids directly will make it relatively fast to traverse the graph, since we won't need to look up through an id-field indirection. This choice limits us to building a graph-per-segment, since it would be impractical to maintain a global graph for the whole index in the face of segment merges. However, graph-per-segment is very natural at search time: we can traverse each segment's graph independently and merge results as we do today for term-based search.

At index time, however, merging graphs is somewhat challenging. While indexing we build a graph incrementally, performing searches to construct links among neighbors. When merging segments we must construct a new graph containing elements of all the merged segments. Ideally we would somehow preserve the work done when building the initial graphs, but at least as a start I'd propose we construct a new graph from scratch when merging. The process is going to be limited, at least initially, to graphs that can fit in RAM, since we require random access to the entire graph while constructing it: in order to add links bidirectionally we must continually update existing documents.

I think we want to express this API to users as a single joint {{KnnGraphField}} abstraction that joins together the vectors and the graph as a single field type. Mostly it just looks like a vector-valued field, but it has this graph attached to it.

I'll push a branch with my POC and would love to hear comments. It has many nocommits, the basic design is not really set, there is no Query implementation and no integration with IndexSearcher, but it does work by some measure using a standalone test class. I've tested with uniform random vectors and on my laptop indexed 10K documents in around 10 seconds and searched them at 95% recall (compared with an exact nearest-neighbor baseline) at around 250 QPS. I haven't made any attempt to use multithreaded search for this, but it is amenable to per-segment concurrency.

[1] [https://www.semanticscholar.org/paper/Efficient-and-robust-approximate-nearest-neighbor-Malkov-Yashunin/699a2e3b653c69aff5cf7a9923793b974f8ca164]

*UPDATES:*
* (1/12/2020) The up-to-date branch is: [https://gitbox.apache.org/repos/asf?p=lucene-solr.git;a=shortlog;h=refs/heads/jira/lucene-9004-aknn-2]
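As an illustration of the "light wrapper around {{BinaryDocValues}}" idea above, a minimal byte/float conversion might look like this (a sketch only, not the API on the branch; the class name is invented):

{code:java}
import java.nio.ByteBuffer;
import org.apache.lucene.util.BytesRef;

final class VectorCodecSketch {
  // Pack a vector into 4 bytes per dimension, suitable for a BinaryDocValues blob.
  static BytesRef encode(float[] vector) {
    ByteBuffer buf = ByteBuffer.allocate(vector.length * Float.BYTES);
    buf.asFloatBuffer().put(vector);
    return new BytesRef(buf.array());
  }

  // Recover the vector; the dimension must be known (stored once per field).
  static float[] decode(BytesRef bytes, int dimension) {
    float[] vector = new float[dimension];
    ByteBuffer.wrap(bytes.bytes, bytes.offset, bytes.length)
        .asFloatBuffer().get(vector);
    return vector;
  }
}
{code}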
[jira] [Updated] (LUCENE-9004) Approximate nearest vector search
[ https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tomoko Uchida updated LUCENE-9004:
Description: same as above, with the up-to-date branch link changed to:

*UPDATES:*
* (1/12/2020) The up-to-date branch is: [https://github.com/apache/lucene-solr/tree/jira/lucene-9004-aknn-2]