richardstartin commented on issue #7791:
URL: https://github.com/apache/pinot/issues/7791#issuecomment-979221370


   For the sake of considering alternatives: if the required metadata could be 
squeezed into 2KB, it could be stored in the file attributes of the segment 
file. S3 allows 
[retrieval of all attributes](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html) 
without downloading the file, via a `HeadObject` request, and HDFS 
[extended file attributes](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ExtendedAttributes.html) 
support a similarly lightweight access protocol. Naturally, GCS has the same 
[concept](https://cloud.google.com/storage/docs/metadata), as does 
[Azure](https://docs.microsoft.com/en-us/rest/api/storageservices/get-file-metadata). 
I believe this would address the concerns about separation and directory 
structures while still providing good guarantees. The only catch is the 2KB 
limit on user-defined attributes.
   
   The metadata written into `creation.meta` consists of a CRC and a creation 
time. `metadata.properties` is heavier, but it could all be written into a JSON 
object, which could then be compressed, base64-encoded, and saved as a single 
file attribute on the segment file. 
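
   As a rough illustration of the encoding step, here is a minimal sketch that 
gzips a JSON string, base64-encodes it, and rejects values over the 2KB limit. 
The class name, method names, and the choice of gzip are assumptions made for 
illustration, not existing Pinot code. It also only checks the attribute value; 
a store may count the attribute key towards the limit as well.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: packs segment metadata (already serialized as JSON)
// into a single attribute value and unpacks it again.
class SegmentMetadataAttribute {
  // The 2KB limit on user-defined attributes discussed above.
  static final int MAX_ATTRIBUTE_BYTES = 2048;

  // JSON -> gzip -> base64. Throws if the result would not fit in the limit.
  static String encode(String json) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
      gzip.write(json.getBytes(StandardCharsets.UTF_8));
    }
    // Base64 output is ASCII, so its length equals its size in bytes.
    String encoded = Base64.getEncoder().encodeToString(buffer.toByteArray());
    if (encoded.length() > MAX_ATTRIBUTE_BYTES) {
      throw new IllegalArgumentException(
          "Encoded metadata is " + encoded.length() + " bytes, limit is " + MAX_ATTRIBUTE_BYTES);
    }
    return encoded;
  }

  // base64 -> gunzip -> JSON. Inverse of encode.
  static String decode(String attributeValue) throws IOException {
    byte[] compressed = Base64.getDecoder().decode(attributeValue);
    try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
      return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
    }
  }
}
```

   Failing loudly at write time when the value is too large seems preferable to 
truncating it, since a truncated attribute could not be decoded later.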
   
   Optimistic `HeadObject`/`getfattr` requests would be made to the filesystem 
first. Old segment files without the expected attribute would be supported by 
falling back to the mechanism proposed here. Eventually, metadata would always 
be retrieved on the first attempt by reading file attributes, without 
downloading any segment files.
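
   The optimistic lookup with fallback could look roughly like the sketch 
below. Both functions passed in are hypothetical stand-ins: `readAttribute` for 
a `HeadObject`/`getfattr` style call that returns empty when the attribute is 
absent, and `fallback` for the mechanism proposed in this issue.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch: try the cheap attribute read first, and only use the
// slower fallback for segments written before the attribute existed.
class OptimisticMetadataLookup {
  static String loadMetadata(
      String segmentUri,
      Function<String, Optional<String>> readAttribute,
      Function<String, String> fallback) {
    // orElseGet evaluates the fallback lazily, so it does not run when the
    // attribute is present.
    return readAttribute.apply(segmentUri).orElseGet(() -> fallback.apply(segmentUri));
  }
}
```

   Keeping the fallback lazy matters here, because it is the expensive path 
that this approach is trying to avoid for new segments.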


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


