chrajeshbabu opened a new issue, #16566:
URL: https://github.com/apache/pinot/issues/16566
**Background**
Apache Pinot supports encrypting segments before storing them in deep stores
such as HDFS or Amazon S3, so that data is protected at rest in those
locations. Pinot’s metadata tracks the exact placement of each column's data
and its associated index structures within a segment file, enabling precise
control over data access and storage.
**Problem Statement**
When segments are downloaded from the deep store for query processing, they are
decrypted and stored on the server’s local disk, where any user with access to
that disk can read the data in plaintext. Encrypting the entire segment on
local disk would close this gap, but it would also carry a performance
penalty: the whole file would have to be decrypted for every read, even when a
query touches only a small subset of columns.
**Proposed Approach – Column-Level Data Encryption**
Rather than encrypting the full segment, apply encryption selectively at
the column level for sensitive fields. Pinot’s metadata provides:
- The byte-range location of each column within the segment file.
- The index structures associated with each column.

By leveraging this metadata, we can:
- Encrypt only the columns that contain sensitive data.
- Decrypt these columns on demand during query execution, leaving other
  columns untouched.
- Keep reads of non-sensitive columns efficient, with no additional
  decryption overhead.
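A minimal Java sketch of the selective approach, assuming AES-256-GCM as the cipher (the issue does not name one). The class `ColumnEncryptionSketch` and its methods are hypothetical and not part of Pinot; an in-memory map from column name to bytes stands in for the byte ranges that segment metadata would locate. The column name is passed to GCM as associated data, so an encrypted block cannot be silently moved to a different column.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/**
 * Hypothetical sketch (not Pinot's API): encrypt only sensitive columns and decrypt
 * them on demand. Column buffers stand in for metadata-located byte ranges.
 */
public class ColumnEncryptionSketch {
  private static final int IV_BYTES = 12;   // 96-bit nonce, the usual size for GCM
  private static final int TAG_BITS = 128;  // 16-byte authentication tag
  private static final SecureRandom RANDOM = new SecureRandom();

  public static SecretKey newKey() throws GeneralSecurityException {
    KeyGenerator gen = KeyGenerator.getInstance("AES");
    gen.init(256);
    return gen.generateKey();
  }

  /** Encrypts one column's bytes; the stored layout is [IV | ciphertext + tag]. */
  public static byte[] encrypt(byte[] plain, SecretKey key, String column)
      throws GeneralSecurityException {
    byte[] iv = new byte[IV_BYTES];
    RANDOM.nextBytes(iv);  // fresh random IV for every encryption
    Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
    cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
    // Bind the ciphertext to its column name as associated data.
    cipher.updateAAD(column.getBytes(StandardCharsets.UTF_8));
    byte[] ct = cipher.doFinal(plain);
    return ByteBuffer.allocate(IV_BYTES + ct.length).put(iv).put(ct).array();
  }

  /** Decrypts a stored block; fails if the data or the column name does not match. */
  public static byte[] decrypt(byte[] stored, SecretKey key, String column)
      throws GeneralSecurityException {
    Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
    cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
    cipher.updateAAD(column.getBytes(StandardCharsets.UTF_8));
    return cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
  }

  /** Encrypts only the columns marked sensitive; all others are kept as plaintext. */
  public static Map<String, byte[]> encryptSensitive(Map<String, byte[]> columns,
      Set<String> sensitive, SecretKey key) throws GeneralSecurityException {
    Map<String, byte[]> out = new HashMap<>();
    for (Map.Entry<String, byte[]> e : columns.entrySet()) {
      String name = e.getKey();
      out.put(name, sensitive.contains(name) ? encrypt(e.getValue(), key, name) : e.getValue());
    }
    return out;
  }

  /** Reads a column, paying the decryption cost only when the column is sensitive. */
  public static byte[] readColumn(Map<String, byte[]> stored, Set<String> sensitive,
      String column, SecretKey key) throws GeneralSecurityException {
    byte[] bytes = stored.get(column);
    return sensitive.contains(column) ? decrypt(bytes, key, column) : bytes;
  }
}
```

One consequence of this choice: each encrypted column grows by 28 bytes (a 12-byte IV plus a 16-byte tag), so the offsets and sizes recorded in segment metadata would need to describe the encrypted ranges rather than the plaintext ones.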

**Benefits**
- Performance Efficiency: Only the required sensitive columns are
  decrypted, reducing CPU and I/O overhead.
- Reduced Security Risk: Sensitive fields remain encrypted even in
  temporary local storage.
- Minimal Architectural Change: Builds on Pinot’s existing metadata and
  indexing capabilities without major changes to the segment format.
- Flexibility: Allows different encryption strategies per column, based on
  sensitivity.