[GitHub] [lucene-solr] mikemccand commented on a change in pull request #1925: Cleanup DWPT state handling

GitBox Tue, 29 Sep 2020 07:00:47 -0700


mikemccand commented on a change in pull request #1925:
URL: https://github.com/apache/lucene-solr/pull/1925#discussion_r496729477




##########
File path: 
lucene/core/src/java/org/apache/lucene/index/DocumentsWriterFlushControl.java
##########
@@ -157,49 +157,42 @@ private boolean updatePeaks(long delta) {
   }
 
   DocumentsWriterPerThread doAfterDocument(DocumentsWriterPerThread perThread, 
boolean isUpdate) {
-    final long delta = perThread.getCommitLastBytesUsedDelta();
+    final long delta = perThread.commitLastBytesUsed();
     synchronized (this) {
-      // we need to commit this under lock but calculate it outside of the 
lock to minimize the time this lock is held
-      // per document. The reason we update this under lock is that we mark 
DWPTs as pending without acquiring it's
-      // lock in #setFlushPending and this also reads the committed bytes and 
modifies the flush/activeBytes.
-      // In the future we can clean this up to be more intuitive.

Review comment:
       ;)

##########
File path: lucene/core/src/java/org/apache/lucene/index/DocumentsWriter.java
##########
@@ -446,7 +446,7 @@ long updateDocuments(final Iterable<? extends Iterable<? 
extends IndexableField>
   private boolean doFlush(DocumentsWriterPerThread flushingDWPT) throws 
IOException {
     boolean hasEvents = false;
     while (flushingDWPT != null) {
-      assert flushingDWPT.hasFlushed() == false;
+      assert flushingDWPT.getState() == 
DocumentsWriterPerThread.State.FLUSHING : "expected FLUSHING but was: " + 
flushingDWPT.getState();

Review comment:
       Maybe add a `boolean assertState(State)` to DWPT that does the above 
`assert` and then returns `true` so we can call it under `assert`?  Only 
downside is then we have double looking `assert`, i.e. `assert 
flushingDWPT.assertState(State.FLUSHING)`.  Looks like we are doing this in at 
least three places here ...

##########
File path: 
lucene/core/src/java/org/apache/lucene/index/DocumentsWriterFlushControl.java
##########
@@ -157,49 +157,42 @@ private boolean updatePeaks(long delta) {
   }
 
   DocumentsWriterPerThread doAfterDocument(DocumentsWriterPerThread perThread, 
boolean isUpdate) {
-    final long delta = perThread.getCommitLastBytesUsedDelta();
+    final long delta = perThread.commitLastBytesUsed();
     synchronized (this) {
-      // we need to commit this under lock but calculate it outside of the 
lock to minimize the time this lock is held
-      // per document. The reason we update this under lock is that we mark 
DWPTs as pending without acquiring it's
-      // lock in #setFlushPending and this also reads the committed bytes and 
modifies the flush/activeBytes.
-      // In the future we can clean this up to be more intuitive.
-      perThread.commitLastBytesUsed(delta);
       try {
         /*
          * We need to differentiate here if we are pending since 
setFlushPending
          * moves the perThread memory to the flushBytes and we could be set to
          * pending during a delete
          */
-        if (perThread.isFlushPending()) {
-          flushBytes += delta;
-          assert updatePeaks(delta);
-        } else {
-          activeBytes += delta;
-          assert updatePeaks(delta);
+        activeBytes += delta;

Review comment:
       Hmm, why are we removing the conditional on "flush pending" here (and 
always updating `activeBytes`, never `flushBytes`)?  I guess `setFlushPending` 
will move all `activeBytes` over to `flushBytes`, so it is fine for us to 
always increment `activeBytes` here?
   
   Edit: hmm it looks like we are also removing `setFlushPending`'s moving of 
`activeBytes` to `flushBytes` ;)  So now I don't quite understand how we are 
changing the RAM accounting here.
   
   Edit again!: OK I see, we moved the RAM shifting into 
`checkoutFlushableWriter`, OK good!
   
   Should we fix (or maybe just remove) the above comment now?

##########
File path: 
lucene/core/src/java/org/apache/lucene/index/DocumentsWriterFlushControl.java
##########
@@ -295,27 +289,32 @@ public synchronized void waitForFlush() {
    * {@link DocumentsWriterPerThread} must have indexed at least on Document 
and must not be
    * already pending.
    */
-  public synchronized void setFlushPending(DocumentsWriterPerThread perThread) 
{
-    assert !perThread.isFlushPending();
+  synchronized void setFlushPending(DocumentsWriterPerThread perThread) {
+    assert !perThread.isFlushPending() : "state: " + perThread.getState();
     if (perThread.getNumDocsInRAM() > 0) {
-      perThread.setFlushPending(); // write access synced
-      final long bytes = perThread.getLastCommittedBytesUsed();
-      flushBytes += bytes;
-      activeBytes -= bytes;

Review comment:
       Hmm we are no longer moving `bytes` from `activeBytes` to `flushedBytes` 
here?  Are we (the caller here?) doing that elsewhere?

##########
File path: 
lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThread.java
##########
@@ -513,48 +515,21 @@ public String toString() {
         + numDocsInRAM + ", deleteQueue=" + deleteQueue + ", " + 
numDeletedDocIds + " deleted docIds" + "]";
   }
 
-
   /**
    * Returns true iff this DWPT is marked as flush pending
    */
   boolean isFlushPending() {
-    return flushPending.get() == Boolean.TRUE;
-  }
-
-  /**
-   * Sets this DWPT as flush pending. This can only be set once.
-   */
-  void setFlushPending() {
-    flushPending.set(Boolean.TRUE);
-  }
-
-
-  /**
-   * Returns the last committed bytes for this DWPT. This method can be called
-   * without acquiring the DWPTs lock.
-   */
-  long getLastCommittedBytesUsed() {
-    return lastCommittedBytesUsed;
-  }
-
-  /**
-   * Commits the current {@link #ramBytesUsed()} and stores it's value for 
later reuse.
-   * The last committed bytes used can be retrieved via {@link 
#getLastCommittedBytesUsed()}
-   */
-  void commitLastBytesUsed(long delta) {
-    assert isHeldByCurrentThread();
-    assert getCommitLastBytesUsedDelta() == delta : "delta has changed";
-    lastCommittedBytesUsed += delta;
+    return state == State.FLUSH_PENDING;

Review comment:
       Maybe we should remove `isFlushPending` method entirely and expect 
callers to check `state == State.FLUSH_PENDING` themselves?

##########
File path: 
lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThread.java
##########
@@ -593,9 +568,61 @@ void unlock() {
   }
 
   /**
-   * Returns <code>true</code> iff this DWPT has been flushed
+   * Returns the DWPTs current state.
    */
-  boolean hasFlushed() {
-    return hasFlushed.get() == Boolean.TRUE;
+  State getState() {
+    return state;
   }
+
+  /**
+   * Transitions the DWPT to the given state of fails if the transition is 
invalid.

Review comment:
       s/`of`/`or`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene-solr] mikemccand commented on a change in pull request #1925: Cleanup DWPT state handling

Reply via email to