[
https://issues.apache.org/jira/browse/HADOOP-19906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18085046#comment-18085046
]
ASF GitHub Bot commented on HADOOP-19906:
-----------------------------------------
pan3793 opened a new pull request, #8522:
URL: https://github.com/apache/hadoop/pull/8522
### Description of PR
This is an alternative to HADOOP-19668 and HADOOP-19670
(SubjectInheritingThread approach) to restore Subject propagation semantics on
JDK 22+.
Issues with the HADOOP-19668 and HADOOP-19670 approach:
- It requires invasive changes in every downstream project. Projects like
Spark would have to replace every Thread with SubjectInheritingThread, which
is a lot of work and often impossible, since third-party libraries may not
allow setting a custom ThreadFactory.
- Its semantics are not fully aligned with the previous behavior:
  - The Subject should be captured when the Thread is constructed, not when
start() is called. A Thread constructed in context A and started in context B
must observe A's Subject. Capturing at start() breaks a typical thread usage
pattern, the thread pool.
  - There is no cascading Subject propagation, because
`SubjectInheritingThread` must be declared explicitly everywhere to get the
Subject propagation semantics.
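To make the capture-timing point concrete, here is a minimal sketch in plain Java. It is not code from either patch: `CURRENT` is a hypothetical `ThreadLocal<String>` standing in for the current Subject, and `ConstructionCapturingThread` and `StartCapturingThread` are illustrative classes that capture the value at construction and at start() respectively.

```java
import java.util.concurrent.atomic.AtomicReference;

public class CaptureTimingDemo {
    // Hypothetical stand-in for "the current Subject"; not a real Subject.
    static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    // Captures the caller's value when the Thread object is constructed.
    static class ConstructionCapturingThread extends Thread {
        private final String captured = CURRENT.get();
        private final Runnable body;
        ConstructionCapturingThread(Runnable body) { this.body = body; }
        @Override public void run() {
            CURRENT.set(captured);
            body.run();
        }
    }

    // Captures the caller's value when start() is called.
    static class StartCapturingThread extends Thread {
        private volatile String captured;
        private final Runnable body;
        StartCapturingThread(Runnable body) { this.body = body; }
        @Override public void start() {
            captured = CURRENT.get();
            super.start();
        }
        @Override public void run() {
            CURRENT.set(captured);
            body.run();
        }
    }

    // Constructs a thread in context "A", starts it in context "B",
    // and returns the value the thread body observed.
    static String observe(boolean captureAtConstruction) {
        AtomicReference<String> seen = new AtomicReference<>();
        Runnable body = () -> seen.set(CURRENT.get());
        CURRENT.set("A");
        Thread t = captureAtConstruction
            ? new ConstructionCapturingThread(body)
            : new StartCapturingThread(body);
        CURRENT.set("B");
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            CURRENT.remove();
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println("construction capture sees: " + observe(true));  // A
        System.out.println("start() capture sees: " + observe(false));      // B
    }
}
```

In a thread pool, worker threads are typically constructed and started at different moments and from different callers, which is why the two capture points can give different answers.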
This approach addresses the issues above but has limitations of its own:
- It should work on platform threads, but InheritableThreadLocal behaves
differently on virtual threads.
- It works for UGI.doAs but does not apply to Subject.doAs. For example,
users of JAAS-based Kerberos auth still need to maintain Subject propagation
semantics themselves.
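The first limitation suggests the patch builds on InheritableThreadLocal. The sketch below is plain JDK code, not the patch itself, and shows how InheritableThreadLocal behaves on platform threads: the value is copied when the child Thread is constructed, and it cascades to threads created by the child. `CURRENT` and the values "alice" and "bob" are hypothetical placeholders for a Subject.

```java
import java.util.concurrent.atomic.AtomicReference;

public class InheritableDemo {
    // Hypothetical stand-in for a propagated Subject.
    static final InheritableThreadLocal<String> CURRENT =
        new InheritableThreadLocal<>();

    static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Returns {value seen by child, value seen by grandchild}.
    static String[] observe() {
        AtomicReference<String> child = new AtomicReference<>();
        AtomicReference<String> grandchild = new AtomicReference<>();

        CURRENT.set("alice");
        // The child copies the parent's inheritable values here, at construction.
        Thread t = new Thread(() -> {
            child.set(CURRENT.get());
            // The grandchild inherits from the child without any extra wiring.
            Thread g = new Thread(() -> grandchild.set(CURRENT.get()));
            g.start();
            join(g);
        });
        // Changing the parent's value after construction does not affect the child.
        CURRENT.set("bob");
        t.start();
        join(t);
        CURRENT.remove();
        return new String[] { child.get(), grandchild.get() };
    }

    public static void main(String[] args) {
        String[] seen = observe();
        // prints "child=alice grandchild=alice"
        System.out.println("child=" + seen[0] + " grandchild=" + seen[1]);
    }
}
```

Because inheritance happens inside the plain `Thread` constructor, code that creates threads needs no special thread class, which matches the goal of avoiding downstream changes.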
### How was this patch tested?
Unit tests added. The patch was also applied to our internal branch and
integrated with Spark:
- Without HADOOP-19668 and HADOOP-19670:
Spark works correctly on YARN, because YARN prepares the credentials before
launching the container; all threads see a null Subject, fall back to the
login user, and pick up those credentials. Subsequent delegation token (DT)
updates are broken, however. On K8s, everything on the executor side is
broken.
- With HADOOP-19668 and HADOOP-19670 applied to the Hadoop client, and
Spark's threads changed to use SubjectInheritingThread:
A few threads see the same Subject, but most do not, because
SubjectInheritingThread does not work in the thread pool case (explained in
the section above). Nothing changes from the user's perspective.
- With HADOOP-19668 and HADOOP-19670 reverted and this patch applied:
The changes apply only to the Hadoop client, and Spark works as expected on
both YARN and K8s.
### For code changes:
- [x] Does the title of this PR start with the corresponding JIRA issue id
(e.g. 'HADOOP-17799. Your PR title ...')?
- [ ] Object storage: have the integration tests been executed and the
endpoint declared according to the connector-specific documentation?
- [ ] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`,
`NOTICE-binary` files?
### AI Tooling
Contains content generated by Claude Opus 4.7.
> Alternative to SubjectInheritingThread to restore Subject propagation
> ---------------------------------------------------------------------
>
> Key: HADOOP-19906
> URL: https://issues.apache.org/jira/browse/HADOOP-19906
> Project: Hadoop Common
> Issue Type: Bug
> Components: security
> Affects Versions: 3.5.0, 3.4.3
> Reporter: Cheng Pan
> Priority: Major
>