[PR] core:Refactor the code of HadoopTableOptions [iceberg]

via GitHub Wed, 03 Jul 2024 00:35:37 -0700


BsoBird opened a new pull request, #10623:
URL: https://github.com/apache/iceberg/pull/10623


   # Refactor the code of HadoopTableOptions
   
   ## 1.Current problems
   
   ### 1.1.Enabling `write.metadata.delete-after-commit.enabled` may result in 
dirty commits.
   
   Since this option is turned on, catalog will clean up the commit history.  
   If the client commits an older version at this point, the commit will 
succeed because the metadata has been deleted. 
   However, the user will not be able to view this commit.  
   From the user's point of view, he will encounter a very strange result: his 
submission is successful, 
   but the content of the submission disappears.  
   In addition, dirty commits generated in this case will not be cleaned up.  
   
   
   ### 1.2.Problems with lockManager's locking policy.
   
   #### 1.2.1.Incorrect behaviour of locking and releasing
   
   But it ignores the fact that the writeVersionHint process may also need to 
be locked.  
   In current implementations, users rely on versionHintFile to speed up 
version lookups.  
   If the information in the versionHintFile is incorrect, then it will lead to 
big trouble.  
   Example:If more than one client executes the commit method concurrently, 
   the version information in the versionHintFile may be written to a very old 
version.(versionHintFile’s version < currentVersion - 
write.metadata.previous-versions-max).  
   At this point, the user cannot call the `refresh` method to get the latest 
version.  
   Eventually, the user's commit will **fail forever**.
   
   #### 1.2.2.Incorrect timing of locking and releasing
   
   he current version of the code is locked only for commits of a certain 
version, e.g., multiple clients are committing to version 3 at the same time, 
and the lock is in effect.  
   However, when multiple clients submit different versions at the same time, 
they are not locked to each other.  
   For object storage filesystems, such locking may cause the information in 
the versionHintFile to be incorrect.  
   For normal filesystems, such locking is meaningless, since the OS provides 
the same functionality when calling fs.rename. And it also causes the 
versionHintFile content to be messed up.  
   
   #### 1.2.3.Incorrect behaviour of handing exception
   
   The current implementation of the commit method is not atomic, assuming that 
the client calls renameMetadata successfully and then encounters an OOM 
exception, 
   which triggers spark to clean up the data files from this commit. 
   But spark doesn't clean up the metadata files generated by this commit, 
which ultimately leads to the problem described in 
[here](https://github.com/apache/iceberg/pull/9546#issue-2095166356).
   
   **So.given the large number of problems with hadoopTableOptions, we need to 
completely refactor hadoopTableOptions.**
   
   ## 2.List of features to be supported
   
   ### 2.1.In the case of a user using an arbitrary storage system with a 
global locking service, we need to support the following features:
    - Supports concurrency control
    - No dirty commits
    - Avoid generating versionHintFile files with misplaced content
    - Guarantees the atomicity of commit operations.
   ### 2.2.In the case where the user is using a normal file system (which 
supports rename operations that do not overwrite the target file) and is not 
using the global locking service, the following features need to be supported:
    - Supports concurrency control
    - Regardless of whether the contents of the versionHintFile are incorrect, 
the client can submit the correct metadata.
    - Support for cleaning up dirty commits
    - Guarantees the atomicity of commit operations.
   
   ## 3.Glossary:
    - Dirty-commit: `commitVersion < currentVersion - 
write.metadata.previous-versions-max` && `commit-success=True`
    - Normal file system: which supports rename operations that do not 
overwrite the target file,Example: `mv -n a.txt 
b.txt`,`HdfsFileSystem.rename(src,dst)`,`WindowsFs.rename(sec,dst)`
    - Atomicity of commit: After the new version of the metadata has been 
renamed successfully, any exceptions will be ignored. Any exceptions thrown 
before the metadata has been renamed will be treated as a commit failure. If 
the rename of the new version of the metadata fails, we will throw a 
`CommitStateUnknown` exception and stop everything. The operation of 
VersionHintFile will be disregarded.
    - Global locking service: All clients operating on the iceberg table must 
acquire the same lock. Successful acquisitions are required before subsequent 
commits can be performed. Example: `Distributed locking based on 
zk/redis/database`, `constrains all operations to the same in-memory lock`.
   
   ## 4.Flow chart:
   ```mermaid
   graph TB
       START[commit-start] -->TRY-LOCK{try-get-lock}
       TRY-LOCK --> |fail| COMMIT-FAIL[commit-fail]
       TRY-LOCK --> |success| CHECK-EXISTS{check-metadata-file-exists}
       CHECK-EXISTS -->|already-exists| COMMIT-FAIL
       CHECK-EXISTS -->|not-exists| GLOBAL-LOCK{useGlobalLockingService?}
       GLOBAL-LOCK -->|yes| FIND-LATEST-VERSION(findLatestVersion)
       GLOBAL-LOCK -->|no| 
FIND-LATEST-VERSION-NO-HINT(findLatestVersionWithOutVersionHint)
       FIND-LATEST-VERSION-NO-HINT --> 
CHECK-COMMIT-VERSION{checkCurrentVersionIsLatest}
       FIND-LATEST-VERSION --> CHECK-COMMIT-VERSION
       CHECK-COMMIT-VERSION --> |no| COMMIT-FAIL(commit-fail)
       CHECK-COMMIT-VERSION --> |yes| DO-COMMIT{do-commit}
       FAST-FAIL --> COMMIT-STATE-UNKNOWN(commit-state-unknow)
       DO-COMMIT --> |not-success| COMMIT-FAIL(commit-fail)
       DO-COMMIT --> |server-side-exception| TRY-CHECK-COMMIT(try-check)
       DO-COMMIT --> |commit-success| NEED-CLEAN-DIRTY{useGlobalLockingService?}
       DO-COMMIT --> |client-side-exception| COMMIT-STATE-UNKNOWN
       TRY-CHECK-COMMIT --> |check-success-and-commit-fail| COMMIT-FAIL
       TRY-CHECK-COMMIT --> |check-success-and-commit-success| COMMIT-SUCCESS
       TRY-CHECK-COMMIT --> |exception| COMMIT-STATE-UNKNOWN
       NEED-CLEAN-DIRTY --> |yes| 
CHECK-DIRTY-COMMIT{this-commit-is-a-dirty-commit?}
       CHECK-DIRTY-COMMIT --> |no| CLEAN-DIRTY(clean-old-dirty-commit)
       CHECK-DIRTY-COMMIT --> |yes| FAST-FAIL(fast-fail)
       CLEAN-DIRTY  --> COMMIT-SUCCESS
       NEED-CLEAN-DIRTY --> |no| COMMIT-SUCCESS
       COMMIT-SUCCESS --> UNLOCK(UNLOCK)
       COMMIT-STATE-UNKNOWN --> UNLOCK
       COMMIT-FAIL --> UNLOCK
       UNLOCK --> END(END)
   ```
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[PR] core:Refactor the code of HadoopTableOptions [iceberg]

Reply via email to