Niels Thykier wrote (05 Jul 2016 20:21:03 GMT):
> Re: the memory usage; it may make sense to do the report as multiple
> "documents" (e.g. one per source package or something).
> It would allow both generator and consumers to process it more
> efficiently by processing a single source at the time.

I'm open to discussing this option, and have just spent some time
thinking about it. I have a few worries about it, as far as this first
iteration is concerned.

First of all, for my use case, retrieving all data in one single HTTP
request is simpler, and I'm ready to take the performance hit since it
also makes my consumer code much simpler to write, review and maintain.

Also, once published on https://lintian.d.o/, these per-package files
will look very much like endpoints for a web API, which consumers might
start using in the wild, and then:

 1. The exact URIs matter a lot, as they become the API endpoints;
    personally, I'm not interested in designing that API at the moment,
    especially in a way that provides any kind of stability (and
    anyway, the API should be versioned, so all URIs should start with
    a "/0.1" component or something).

 2. I wonder if YAML is optimal for consumers that want to use a
    smaller subset of the data. E.g. for web tools it's easier to use
    JSON.

So right now, I'm leaning towards keeping the one-big-YAML-file design
since it matches my current needs very well, and leaving it to those
who want finer-grained machine-readable access to design how the data
could be made available (endpoints, format, etc.).

What do you think?

As a side note: I'd be totally fine with advertising the
one-big-YAML-file format as subject to change, and with adjusting my
consumer code in the future if needed, e.g. if/when a better data
format and endpoint layout are designed.

Cheers,
--
intrigeri