On Fri, Mar 23, 2018 at 02:24:39PM +0100, Enrico Zini wrote:
> On Fri, Mar 23, 2018 at 02:10:07PM +0100, Bill Allombert wrote:
>
> > > Probably. Is the format of that file documented somewhere?
> > This is a list of key/value pairs in RFC822 style.
> > See /usr/share/doc/popularity-contest/examples/bin/README.examples
> > for the format of the Package line.
>
> I have a few questions:
>
> How is the package name separated from the integer fields? It does not
> look like a fixed-width field:
>
> Package: abev-form-obhgepi-fpk-nav 0 0 0 2
> Package: abev-form-obhgepi-fpk-nav-egyeb 0 0 0 2
>
> If it is instead space-separated, currently I didn't see package names
> that contained spaces, but is there a guarantee that the package name
> won't contain spaces?
It is guaranteed that package names will not contain spaces.

> Alternatively, should the parsing instead be done by splitting on \s+
> from the right with a maximum of 4 splits?
>
> Some package names seem to be truncated, like this one:
>
> Package: apache-openoffice-4.1.4-linux-x86-install-rpm-de 0 0 0
> 1

The server should not truncate anything. I will check what happened.

> Is the character set guaranteed to be UTF8

Definitely no.

> or should I parse it as
> binary, and drop all lines that do not decode as UTF8, or even all lines
> that are not strictly 7-bit ascii, like this one?

There is no reason to assume that UTF8 is "better" than non-UTF8 here.

Cheers,
-- 
Bill. <ballo...@debian.org>

Imagine a large red swirl here.
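A minimal Python sketch of a parser that follows the two answers in this thread: package names contain no whitespace, and the character set is not guaranteed to be UTF-8, so each line is handled as raw bytes and never decoded. It assumes each record is a single line of the form "Package: <name>" followed by four integers, as in the quoted examples; the names PackageLine, parse_package_line and iter_packages are hypothetical, and the meaning of the four integers is left to README.examples rather than guessed here. Splitting from the right with maxsplit=4 is the approach Enrico suggested; given the no-spaces guarantee, a plain split would give the same result.

```python
from typing import Iterator, NamedTuple, Optional, Tuple


class PackageLine(NamedTuple):
    """One 'Package:' record; hypothetical type, field meanings not assumed."""
    name: bytes              # kept as bytes: UTF-8 is not guaranteed
    fields: Tuple[int, ...]  # the integer columns that follow the name


PREFIX = b"Package:"


def parse_package_line(line: bytes, nfields: int = 4) -> Optional[PackageLine]:
    """Parse b'Package: <name> <int> <int> <int> <int>' without decoding.

    Splits on runs of whitespace from the right, at most `nfields` times,
    so the name is whatever remains on the left. Returns None for lines
    that are not Package records or do not have the expected shape
    (for example a record whose last column was wrapped onto the next line).
    """
    if not line.startswith(PREFIX):
        return None
    rest = line[len(PREFIX):].strip()  # also drops the trailing newline
    parts = rest.rsplit(None, nfields)
    if len(parts) != nfields + 1:
        return None
    name, *nums = parts
    if not all(n.isdigit() for n in nums):  # ASCII digits only
        return None
    return PackageLine(name, tuple(int(n) for n in nums))


def iter_packages(path: str) -> Iterator[PackageLine]:
    """Yield parsed records from a file opened in binary mode."""
    with open(path, "rb") as f:
        for raw in f:
            rec = parse_package_line(raw)
            if rec is not None:
                yield rec
```

With this sketch the wrapped apache-openoffice line quoted above yields None instead of a bogus record, and a name containing non-UTF-8 bytes is kept as-is. Callers can decode it with errors="replace" for display only, so no lines are dropped because of their encoding.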