ctl_mboxlist dump and undump

ellie timoney Tue, 01 Feb 2022 21:09:55 -0800

Hi developers,

This is one I said I'd send an email about last week and then forgot to do so...


There's a long-standing problem with 'ctl_mboxlist -d' (dump mailboxes.db to a 
file) and 'ctl_mboxlist -u' (undump such a file back into a mailboxes.db): they 
do not know about all the fields mailboxes.db stores, so you lose the data from 
those fields in the process.  I don't know how long ago the tool fell out of 
sync with the data.

Relatedly, there's a long-standing problem with the "improved_mboxlist_sort" 
option being off by default, meaning there are surely a lot of deployments stuck 
with weird sorting issues around space/hyphen/etc (which in ASCII order sort 
earlier than the hierarchy delimiters).  The documentation for this setting 
says that it must not be changed on a live system, and that you must dump your 
mailboxes.db before changing it and undump it afterwards.  But we can't be 
recommending people do that if it's going to lose data in the process.
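
To illustrate the sorting issue, here is a minimal Python sketch.  It assumes 
'.' as the internal hierarchy delimiter and uses made-up mailbox names; the 
component-wise sort key is only a stand-in for a hierarchy-aware ordering, not 
the comparison Cyrus actually implements.

```python
# Hypothetical mailbox names; '.' is assumed to be the hierarchy delimiter.
names = ["user.foo.sub", "user.foo-bar", "user.foo", "user.foo bar"]

def byte_order(mailboxes):
    """Plain ASCII ordering: ' ' (0x20) and '-' (0x2D) sort before '.' (0x2E)."""
    return sorted(mailboxes)

def hierarchy_order(mailboxes, delim="."):
    """Compare component by component so children stay next to their parent."""
    return sorted(mailboxes, key=lambda name: name.split(delim))

print(byte_order(names))
print(hierarchy_order(names))
```

In plain byte order, "user.foo bar" and "user.foo-bar" land between "user.foo" 
and its child "user.foo.sub"; the component-wise ordering keeps the child 
directly after its parent.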

Dilyan has provided a fix for ctl_mboxlist which correctly dumps and undumps 
all the fields that are present in mailboxes.db in 3.4: 
https://github.com/cyrusimap/cyrus-imapd/pull/3384

The thing is, master and 3.6 have even more new fields, which the fix omits.  
One of the new fields in 3.6/master is name_history, which I think is itself a 
multi-value field, so it won't be trivial to just further extend Dilyan's work 
to include it.

I think I'll accept Dilyan's patch for 3.4, but even so, the new fields in 
3.6/master mean we will then have an immediate regression when 3.6 is released.  
So while 3.6 is in beta I want to get this tool fixed up, and in a way that's 
forward compatible with whatever complicated new stuff we add next.  I was 
expecting to be spending some quality time on the tools this year, and I guess 
this might be the first candidate.

To that end, I want to change the dump format to something that can represent 
complex data (such as name_history), and for which we already have generic 
parsers/deparsers, which probably means dlist or json.

dlist may be cheaper, since I think all those "fields" are already stored as 
dlist format in the database anyway.  It could also retain some pretence at 
backward compatibility, if the existing dump format was kept mostly as is, with 
the new fields added in a dlist at the end of each row.

json, on the other hand, would fit in with our general plan of having some kind 
of json output from every tool.  If I were going to use json I would want the 
dump format to be pure json, not json chunks appended at the end of structured 
lines.  Though this would not be compatible with dumping a db from an old 
version of Cyrus and undumping it onto a new one!
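
As a rough idea of what a pure-json dump could look like, here is a minimal 
Python sketch of a round trip.  It is not a proposed format: every field name 
and value other than name_history is invented for illustration, and the shape 
given to name_history is a guess.

```python
import json

# Hypothetical record: every field name and value except "name_history" is
# invented for illustration, and the shape of name_history is a guess.
mailboxes = {
    "user.foo": {
        "partition": "default",
        "acl": {"foo": "lrswipkxtecda"},
        "name_history": [
            {"name": "user.oldfoo", "mtime": 1643000000},
        ],
    },
}

def dump(records):
    """Serialise the whole mailbox map as one JSON document."""
    return json.dumps(records, indent=2, sort_keys=True)

def undump(text):
    """Parse a dump back into a dict, keeping fields this code knows nothing about."""
    return json.loads(text)

restored = undump(dump(mailboxes))
print(restored == mailboxes)
```

The point of the sketch is that a generic parser carries nested or 
multi-value fields through unchanged, so a newly added field would not need 
its own handling in the dump code.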

I don't know if dump/undump across versions is a thing we expect to work.  I 
don't think I would expect it myself, but there's almost certainly someone out 
there who does and will be put out if we change the format.  Though arguably, 
since it loses data at the moment anyway, replacing the format entirely isn't 
making it any worse.

I'm interested in your input, particularly with regard to:

* dlist or json?

* sort-of-backwards-compatible or not-at-all?

* dragons I should be aware of?

Cheers,

ellie

------------------------------------------
Cyrus: Devel
Permalink: 
https://cyrus.topicbox.com/groups/devel/Tea9b3c8c5728d1e8-M7b68e56d5c41dafa2acbba63
Delivery options: https://cyrus.topicbox.com/groups/devel/subscription
