(Sorry about the not fully formed thought in item 2 of the numbered list
in my last message. I was typing and thinking too fast.)
Anyway, as an update to this: further research indicates that
upload-history is simply spitting out the data from the email in a
"Message-Date" field. I can't blame the upload-history code for spitting
out the garbage it got as input data.
However, in udd/upload_history_gatherer.py in udd.git, there is machinery
that inserts the value of Message-Date into the date field, rather than
the value of the changes file's Date header.
I suggest the following untested patch:
diff --git a/udd/upload_history_gatherer.py b/udd/upload_history_gatherer.py
index 4091eec..fe485e6 100644
--- a/udd/upload_history_gatherer.py
+++ b/udd/upload_history_gatherer.py
@@ -54,5 +54,5 @@ class upload_history_gatherer(gatherer):
VALUES ($1, $2, $3, $4)" % (self.my_config['table'] + '_closes'))
- query = "EXECUTE uh_insert(%(Source)s, %(Version)s, %(Message-Date)s, \
+ query = "EXECUTE uh_insert(%(Source)s, %(Version)s, %(Date)s, \
%(Changed-By)s, %(Changed-By_name)s, %(Changed-By_email)s, \
%(Maintainer)s, %(Maintainer_name)s, %(Maintainer_email)s, %(NMU)s, \
(sorry about some patch mangling here by email)
The key question for us is: Are we okay with changing the definition of
"date" in the upload-history table to mean the date within the changes
file, rather than the email message's date?
One downside to this is that for sponsored packages, we would see when the
sponsoree did the work, rather than when the package got uploaded. The
current behavior, where the upload_history table stores the date of the
email message, is a best-effort attempt to measure when the upload
actually got processed. So I like the current behavior, and I would now
reject my own patch.
Okay.
At that point, if the goal is, "The date field represents, as best we
can approximate, the time that the upload was successfully processed by
the Debian servers", we have a few different options. I'm going to do some
further research here and get back to the bug with a recommendation, but
here is a surely-incomplete list of options:
1. We could try to implement a very conservative fixup strategy like, "If
the Message-Date field differs from the Date field by more than one year,
go with whichever of (Message-Date, Date) is closest to the date on the
envelope From line".
2. We could create a manually maintained "fixups" list of package uploads
and their actual dates that overrides the data in the mbox files. That is
probably the easiest, since I suspect there are not very many packages
with this problem.
3. We could store both Message-Date and Date in UDD, and then tell users
of UDD that they will have to deal with this problem of bad data. (This is
the option I like the least.)
Of these, I like option 1 the most. I will work on implementing that.
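As a starting point, option 1 might look roughly like the following
untested Python sketch. The names pick_upload_date and MAX_SKEW are
hypothetical and do not exist in udd.git, and the sketch assumes the
envelope From date has already been parsed into a datetime by the caller.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Hypothetical threshold: the "more than one year" rule from option 1.
MAX_SKEW = timedelta(days=365)


def _aware(dt):
    # Treat naive datetimes as UTC so that values can be subtracted safely.
    return dt if dt.tzinfo is not None else dt.replace(tzinfo=timezone.utc)


def pick_upload_date(message_date, changes_date, envelope_date):
    """Return the datetime to store in the date field (hypothetical helper).

    message_date  -- the Message-Date header, as an RFC 2822 date string
    changes_date  -- the changes file's Date header, same format
    envelope_date -- the envelope From date, already parsed (assumption)
    """
    msg = _aware(parsedate_to_datetime(message_date))
    chg = _aware(parsedate_to_datetime(changes_date))

    # Normal case: the two dates roughly agree, so keep the current
    # behavior and use the message date.
    if abs(msg - chg) <= MAX_SKEW:
        return msg

    # Conservative fixup: the dates disagree wildly, so pick whichever
    # one is closest to the envelope From date.
    env = _aware(envelope_date)
    return min((msg, chg), key=lambda d: abs(d - env))
```

Dates that parsedate_to_datetime cannot parse at all would raise an
exception here; how to handle those is left open in this sketch.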
-- Asheesh.