richardstartin commented on issue #7791: URL: https://github.com/apache/pinot/issues/7791#issuecomment-979221370
For the sake of considering alternatives: if the required metadata could be squeezed into 2KB, it could be stored in file attributes on the segment file. S3 allows [retrieval of all attributes](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html) without downloading the file via a `HeadObject` request, and HDFS [extended file attributes](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ExtendedAttributes.html) support a similarly lightweight access protocol. Naturally, GCS has the same [concept](https://cloud.google.com/storage/docs/metadata), as does [Azure](https://docs.microsoft.com/en-us/rest/api/storageservices/get-file-metadata). I believe this would address the concerns about separation and directory structures while also providing good guarantees. The only catch is the 2KB limit on user-defined attributes.

The metadata written into creation.meta consists of a CRC and a creation time. metadata.properties is heavier, but it could all be written into a JSON object, which could be compressed, base64 encoded, and saved as a single file attribute on the segment file.

Optimistic `HeadObject`/`getfattr` requests would be made to the filesystem. Old segment files without the expected metadata would be supported by falling back to the mechanism proposed here, and eventually metadata would always be retrieved on the first attempt by reading file attributes, without downloading any segment files.
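
To make the idea concrete, here is a minimal Python sketch of the encode/decode round trip and the optimistic read with fallback. It is only an illustration under assumptions: the function names, the `read_attribute` and `fallback` callables, and the metadata field names are all hypothetical and not part of Pinot, and the size check is simplified to the encoded value alone (a real store may also count attribute keys against its limit).

```python
import base64
import json
import zlib

# The 2KB budget for user-defined attributes discussed above.
MAX_ATTRIBUTE_BYTES = 2 * 1024


def encode_segment_metadata(metadata: dict) -> str:
    """Serialize to compact JSON, compress, and base64 encode (hypothetical helper)."""
    raw = json.dumps(metadata, separators=(",", ":"), sort_keys=True).encode("utf-8")
    encoded = base64.b64encode(zlib.compress(raw, 9)).decode("ascii")
    # Simplified check: only the encoded value is measured here.
    if len(encoded) > MAX_ATTRIBUTE_BYTES:
        raise ValueError(
            f"encoded metadata is {len(encoded)} bytes, over the {MAX_ATTRIBUTE_BYTES} byte limit"
        )
    return encoded


def decode_segment_metadata(value: str) -> dict:
    """Reverse of encode_segment_metadata."""
    return json.loads(zlib.decompress(base64.b64decode(value)).decode("utf-8"))


def read_segment_metadata(read_attribute, fallback, segment_uri: str) -> dict:
    """Try the file attribute first; fall back for old segments without it.

    read_attribute(segment_uri) stands in for a HeadObject/getfattr style call
    and is assumed to return the attribute string, or None when it is absent.
    fallback(segment_uri) stands in for the slower mechanism.
    """
    value = read_attribute(segment_uri)
    if value is None:
        return fallback(segment_uri)
    return decode_segment_metadata(value)
```

Whether the heavier metadata.properties content fits in the budget after compression would need to be measured on real segments; the size check above fails loudly rather than truncating.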