On Fri, Mar 23, 2018 at 02:24:39PM +0100, Enrico Zini wrote:
> On Fri, Mar 23, 2018 at 02:10:07PM +0100, Bill Allombert wrote:
> 
> > > Probably. Is the format of that file documented somewhere?
> > This is a list of key/value pairs in RFC822 style.
> > See /usr/share/doc/popularity-contest/examples/bin/README.examples
> > for the format of the Package line.
> 
> I have a few questions:
> 
> How is the package name separated from the integer fields? It does not
> look like a fixed-width field:
> 
> Package: abev-form-obhgepi-fpk-nav          0     0     0     2
> Package: abev-form-obhgepi-fpk-nav-egyeb     0     0     0     2
> 
> If it is instead space-separated: so far I have not seen any package
> names that contain spaces, but is there a guarantee that the package
> name won't contain spaces?

It is guaranteed that package names will not contain spaces.
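
Given that guarantee, a plain whitespace split is enough. A minimal
sketch in Python, assuming every Package line carries exactly four
trailing integer fields as in the samples quoted above (the function
name parse_package_line is hypothetical, and the meaning of each
integer is left to README.examples):

```python
def parse_package_line(line):
    """Split a 'Package:' line into (name, counts).

    Assumes the layout seen in the quoted samples: the key 'Package',
    a colon, the package name, then four whitespace-separated integers.
    """
    key, sep, value = line.partition(":")
    if key != "Package" or not sep:
        raise ValueError(f"not a Package line: {line!r}")
    # Split from the right at most 4 times, so the four integer fields
    # are peeled off regardless of the column padding.
    fields = value.rsplit(None, 4)
    if len(fields) != 5:
        raise ValueError(f"expected a name and 4 integers: {line!r}")
    name = fields[0].strip()
    counts = tuple(int(f) for f in fields[1:])
    return name, counts


sample = "Package: abev-form-obhgepi-fpk-nav-egyeb     0     0     0     2"
name, counts = parse_package_line(sample)
print(name, counts)
```

Splitting from the right works whether or not the columns line up,
which matters because the name field is clearly not fixed-width.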

> Alternatively, should the parsing instead be done by splitting on \s+
> from the right with a maximum of 4 splits?
> 
> Some package names seem to be truncated, like this one:
> 
> Package: apache-openoffice-4.1.4-linux-x86-install-rpm-de     0     0     0     1

The server should not truncate anything. I will check what happened.

> Is the character set guaranteed to be UTF8

Definitely not.

> or should I parse it as
> binary, and drop all lines that do not decode as UTF8, or even all lines
> that are not strictly 7-bit ascii, like this one?

There is no reason to assume that UTF8 is "better" than non-UTF8 here.
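
Since the encoding is not guaranteed, one option is to parse the file
as bytes and never drop a line because of its encoding. A rough
sketch in Python, under the same assumption of four trailing integer
fields (parse_package_line_bytes is a hypothetical name, and escaping
undecodable bytes for display is just one possible choice):

```python
def parse_package_line_bytes(raw):
    """Parse a raw 'Package:' line without assuming any character set.

    Returns (name_bytes, counts), or None if the line does not have
    the expected shape. The name is kept as bytes.
    """
    prefix = b"Package:"
    if not raw.startswith(prefix):
        return None
    # bytes.rsplit with no separator splits on ASCII whitespace only,
    # so non-ASCII bytes inside the name are left untouched.
    fields = raw[len(prefix):].rsplit(None, 4)
    if len(fields) != 5:
        return None
    try:
        counts = tuple(int(f) for f in fields[1:])
    except ValueError:
        return None
    return fields[0].strip(), counts


# A made-up line with a Latin-1 byte in the name, for illustration only.
raw = b"Package: caf\xe9-pkg     1     0     0     2\n"
name, counts = parse_package_line_bytes(raw)
# For display only: escape any byte sequence that is not valid UTF-8.
print(name.decode("utf-8", errors="backslashreplace"), counts)
```

This keeps non-UTF8 names intact in the data and only makes an
encoding decision at the point where a name has to be shown.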

Cheers,
-- 
Bill. <ballo...@debian.org>

Imagine a large red swirl here. 
