Dear Richard Biener,

On Wed, Dec 4, 2019 at 5:48 AM Richard Biener <richard.guent...@gmail.com> wrote:
>
> On Sun, Dec 1, 2019 at 7:47 PM JeanHeyd Meneide <phdoftheho...@gmail.com> wrote:
> >
> > ...
> > It worked, but this approach required removing some type checks
> > in digest_init just to be able to fake-up a proper initialization from
> > a string literal. It also could not initialize data beyond `unsigned
> > char`, as that is what I had pinned the array representation to upon
> > creation of the STRING_CST.
>
> Using a STRING_CST is an interesting idea and probably works well
> for most data.
>
> ...
>
> Note we also have "special" CONSTRUCTOR fields like
> RANGE_EXPR for repetitive data.
>
> Since the large initializers are usually in static initializers
> tied to variables another option is to replace the DECL_INITIAL
> CONSTRUCTOR tree node with a new BINARY_BLOB
> tree node containing a pointer to target encoded (compressed)
> data.

Thank you so much for your feedback! Your ideas really helped me out here. I'm now using a RANGE_EXPR whose INDEX has 2 operands, the min and max of the array, and whose VALUE is the binary data to pull from. I added special handling to digest_init for the C front end; I'll likely have to add some additional magic for the C++ initialization rules too.

Some preliminary testing with large binary files went like so:

- 50 MB binary file, huge.bin
- xxd-generated include file, huge.bin.h (N.B. took 302 MB)
- compile a file with no library dependencies, using either the #embed directive or the xxd-generated file

The #embed compilation takes 11 seconds to chew through the file. That time includes encoding the data in a special way so it can survive external tools applied between the preprocessor and the real compilation of the file (e.g., a distcc or icecc workflow). The #include-based, xxd-like compilation takes 621 seconds.

I could get #embed even faster if I didn't have to do the encode/decode step that carries the data from the point where it exits the preprocessor to the point where it enters the actual C/C++ front ends. I know of a way to implement that, but because #embed is not standard, I have to respect that other tools won't know how to behave in the presence of such a special secondary implementation. So my encoded implementation is the one that will have to stand for now.

Thank you so much,
JeanHeyd
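
P.S. For anyone who has not looked at what the xxd-style route feeds the compiler, here is a minimal sketch of how such an include file gets produced. The helper name write_hex_initializer is hypothetical (it is not xxd's actual code and not part of GCC); it only imitates the 12-bytes-per-line layout that `xxd -i` emits. Each input byte becomes roughly six characters of source text, which is in line with the 50 MB file growing to about 302 MB above, and the front end then has to lex every one of those literals individually.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, for illustration only: write `len` bytes from `data`
   as comma-separated hex literals, 12 per line, imitating `xxd -i`.
   Returns the number of characters written, or -1 on an output error. */
static long write_hex_initializer(FILE *out, const unsigned char *data, size_t len)
{
    long written = 0;
    assert(out != NULL);
    for (size_t i = 0; i < len; ++i) {
        /* Two spaces of indentation at the start of a line, one space otherwise. */
        const char *prefix = (i % 12 == 0) ? "  " : " ";
        /* No trailing comma after the last byte; break the line after every 12th. */
        const char *suffix = (i + 1 == len) ? "\n"
                           : (i % 12 == 11) ? ",\n"
                                            : ",";
        int n = fprintf(out, "%s0x%02x%s", prefix, data[i], suffix);
        if (n < 0)
            return -1;
        written += n;
    }
    return written;
}
```

A generated header would wrap this output in something like `unsigned char huge_bin[] = { ... };`. A full line of 12 bytes costs 74 characters in this sketch, so the text is a little over six times the size of the binary before the compiler even starts.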