Hari Krishna Dara created HBASE-30222:
-----------------------------------------

             Summary: DefaultCipherProvider re-parses HBase configuration 
(hbase-default.xml/hbase-site.xml) on every instantiation
                 Key: HBASE-30222
                 URL: https://issues.apache.org/jira/browse/HBASE-30222
             Project: HBase
          Issue Type: Bug
          Components: encryption
    Affects Versions: 2.6.6, 2.5.15, 3.0.0-beta-1, 4.0.0-alpha-1
            Reporter: Hari Krishna Dara
            Assignee: Hari Krishna Dara


  h3. Summary

  {{DefaultCipherProvider}} initializes its {{conf}} field inline:

    // DefaultCipherProvider.java
    private Configuration conf = HBaseConfiguration.create();

  Every time the provider is constructed, this parses hbase-default.xml and
  hbase-site.xml off the classpath to build a fresh Configuration. That
  Configuration is then immediately discarded: providers are constructed via
  {{Encryption.getCipherProvider(conf)}} -> 
{{ReflectionUtils.newInstance(class, conf)}},
  which invokes the no-arg constructor (running the field initializer) and 
*then*
  calls {{setConf(conf)}} -- CipherProvider extends Configurable -- overwriting 
the
  field with the caller's conf. So the default-config parse performed in the
  constructor is pure waste on every call.

  {{Encryption.getCipherProvider(conf)}} constructs a new provider instance on 
every
  invocation (it does not reuse the {{getInstance()}} singleton), so the 
wasteful
  parse happens on every cipher resolution.

  h3. Impact

  Each call to {{Encryption.getCipher(conf, name)}} /
  {{Encryption.getCipherProvider(conf)}} parses the full HBase configuration. A 
JMH
  microbenchmark of a single provider construction measured ~2.3 ms latency and
  ~1.1 MB of allocation, essentially all of it attributable to
  {{HBaseConfiguration.create()}} -- confirmed by control benchmarks isolating 
the
  config parse from cipher construction and the AES operations. (Microbenchmark 
on
  Apple M4 Max / JDK 17; the absolute numbers are environment-dependent, but the
  attribution -- ~100% of the cost is the config parse -- is the point.)

  For encryption at rest as it exists today the practical impact is small, 
because
  cipher resolution happens only at infrequent, amortized points:

    - per WAL file: AbstractProtobufLogWriter, SecureProtobufLogReader
    - per HFile / encryption-context creation: 
SecurityUtil.createEncryptionContext
      and related paths

  These are once-per-file events amortized over many cells, so the cost is real 
but
  negligible in normal operation. However, the inefficiency is latent: any code 
path
  that resolves ciphers at higher frequency (e.g. per-key or per-row) would pay 
a
  full config parse each time, turning a harmless cost into a significant
  latency/allocation problem. Fixing it removes a foot-gun for current and 
future
  callers and eliminates needless GC churn.

  h3. Root cause

  {{DefaultCipherProvider}} has no Configuration-accepting constructor, so
  construction *always* calls {{HBaseConfiguration.create()}}, even though every
  production caller overwrites that value via {{setConf()}} before use. There 
is no
  way to suppress the parse.

  The {{getInstance()}} singleton is used only by tests (TestAES, 
TestCommonsAES),
  both of which already call {{setConf(conf)}} immediately afterward; it is
  effectively a test-only accessor but is not annotated as such.

  h3. Proposed fix

  Remove the inline initializer so construction does not parse config:

    private Configuration conf;   // set via setConf() before use

  This is safe for production: {{ReflectionUtils.newInstance()}} always calls
  {{setConf()}} before the provider is used, and {{getConf()}} is first 
dereferenced
  later (in the AES/AES256GCM constructor), by which point conf is set.
  Additionally:

    - Annotate {{getInstance()}} as @InterfaceAudience.LimitedPrivate(UNITTEST) 
(or
      otherwise document it as test-only), since tests are its only callers and 
they
      already set conf explicitly.
    - Optionally add a fail-fast guard in {{getConf()}}
      (Preconditions.checkState(conf != null, ...)) so a caller that forgets
      setConf() fails loudly instead of NPE-ing inside cipher construction --
      preserving the "always non-null" property the field initializer 
accidentally
      provided.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to