jhuber6 added a comment.

In D122069#3397867 <https://reviews.llvm.org/D122069#3397867>, @tra wrote:

> In D122069#3397560 <https://reviews.llvm.org/D122069#3397560>, @jhuber6 wrote:
>
>> I was definitely thinking about a lot of alternatives to making yet another 
>> on-disk binary format. I had a few discussions with @JonChesterfield on the 
>> subject. There were two things that put me off of using JSON primarily, 
>> correct me if I have any misconceptions.
>>
>> 1. This is fundamentally a very thin metadata wrapper around a binary image. 
>> AFAIK if you want to stream binary data to a JSON you need to use a base64 
>> or similar encoding, which makes the files bigger and adds some extra work.
>
> I didn't realize that content of the file is part of your packagin scheme. 
> I've interpreted `embed certain metadata along with a binary image ` as 
> literally keeping the binaries as binaries and just adding a small blob with 
> additional metadata. It was that data I meant to encode as JSON. Encoding the 
> offload binaries themselves as JSON would indeed be wasteful.
>
> We could keep the header as a binary (never changes on-disk format) and use 
> JSON representation for the array of the entries (0-terminated string or 
> string + length stored in the header) which also has fixed (as in 
> version-agnostic) on-disk format, though of variable length.
> Versioning still has to be dealt with, but now it would be independent of the 
> on-disk format. The variable length of JSON is both a plus and a minus. On 
> the positive side is that content is open. Some tool may add whatever is 
> relevant for its own use, comments, provenance info, checksum, etc.
> The downside is that it has variable length, so it would have to be written 
> after the image binary. We would also need to deal with potential errors 
> parsing JSON.
>
> I don't have a strong preference either way. I think JSON may have few minor 
> benefits, but the proposed binary format has the advantage of simplicity. We 
> can always switch to json-encoded entries later by bumping the header version.

I like the idea of keeping the header, we could add an additional field to the 
header for the size of the entry and I feel like we'd be pretty future-proof if 
we want to change stuff. I think using this binary format for now is sufficient 
as long as we keep upgrading it to something more complex an open possibility.

I'm also not sure if I should extend this as a binary format inheriting from 
LLVM's Binary class. It would be a minimal amount of work but I'm not sure if 
this use-case warrants extending this to broader LLVM.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122069/new/

https://reviews.llvm.org/D122069

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to