It's definitely safe to have one squatter running in rolling mode while another
is repacking. I wouldn't run multiple repacks in parallel, as they can wind up
doing duplicate work (though the end result should always be correct and safe).
Here's what we run:

# Any time the disk gets over 50% full, compress temp down to data
# (-o: a single source database is just copied rather than compacted)
13 * * * * [% INCLUDE cronjob c='/home/mod_perl/hm/scripts/xapian_compact.pl -a -o -d 50 temp data' %]

# Copy the temporary search databases down to data during the week
43 1 * * 1,2,3,4,5,6 [% INCLUDE cronjob c='/home/mod_perl/hm/scripts/xapian_compact.pl -a temp,meta data' %]

# Sundays repack the entire data directory
43 1 * * 0 [% INCLUDE cronjob c='/home/mod_perl/hm/scripts/xapian_compact.pl -a temp,meta,data data' %]

# Late on Sundays, pack any oversized data directories down to archive
0 15 * * 0 [% INCLUDE cronjob c='/home/mod_perl/hm/scripts/xapian_archive.pl -a' %]
And here's the interesting logic. In xapian_compact.pl:

    # With -d, skip this run unless the search partition or /run/cyrus
    # has reached the given usage percentage.
    if ($Opts{d}) {
        my $Path = $Slot->SearchPath();
        my $Usage = df($Path);
        my $RunUsage = df("/run/cyrus");
        return Process::Status->new(0)
            if ($Usage->{per} < $Opts{d} and $RunUsage->{per} < $Opts{d});
    }

    # Compact the source tiers (-t) into the destination tier (-z) and
    # pass the remaining options straight through to squatter.
    my @args = (-z => $dest, -t => $src);
    push @args, '-v' if $Opts{v};
    push @args, '-o' if $Opts{o};
    push @args, '-F' if $Opts{F};
    push @args, '-X' if $Opts{X};
    push @args, ('-T' => $Opts{T}) if $Opts{T};
    push @args, ('-u' => $Opts{u}) if $Opts{u};

    my %RunOpts = (
        PrintOutput => 1,
    );
    $RunOpts{Nice} = 1 unless $Opts{N};
    $RunOpts{Daemon} = 1 if $Opts{D};

    $0 = "xapian_compact: $SN";
    $Slot->RunCommand(\%RunOpts, 'squatter', @args);
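The -d check can be read in isolation as "run if any of the watched
filesystems has reached the threshold". A minimal sketch of just that test,
assuming the percentages have already been obtained; over_threshold is a
hypothetical name and the numbers are made up, none of it comes from the
real script:

```perl
use strict;
use warnings;

# Hypothetical helper: true when at least one filesystem has reached
# the threshold, i.e. when the excerpt above would NOT return early.
sub over_threshold {
    my ($threshold, @percent_used) = @_;
    return scalar grep { $_ >= $threshold } @percent_used;
}

# Made-up usage percentages for the search partition and /run/cyrus.
print over_threshold(50, 42, 17) ? "compact\n" : "skip\n";
print over_threshold(50, 42, 63) ? "compact\n" : "skip\n";
```

The comparison is >= here because the original only skips when both values
are strictly below the threshold.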
And in xapian_archive.pl:

    my $Percent = $Opts{P} || 20;
    [...]
    foreach my $user (sort keys %$DataUsage) {
        my $au = $ArchiveUsage->{$user} || 1;
        my $du = $DataUsage->{$user} || 1;
        if ($du < 5000) {
            print "Too small $user ($du)\n";
            next;
        }
        my $This = int($du * 100 / $au);
        if ($This < $Percent) {
            print "Not enough dirty $user: ($du, $au)\n";
            next;
        }
        print "Recompacting $user: ($du, $au)\n";
        my @args = (-z => 'archive', -t => 'data,archive');
    [...]
In summary: repack data down to archive once data is at least 1/5 the size of
the existing archive (20% by default, adjustable with -P), and skip any user
whose data usage figure is below 5000. Both scripts are simply wrappers around
squatter that let it run automatically.
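To make the rule concrete, here is a small standalone sketch of the same
decision, assuming the usage figures are already known. archive_decision is a
hypothetical name and the figures are invented for illustration; only the
5000 floor, the 20 default and the percentage formula come from the excerpt
above.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of xapian_archive.pl: applies the same
# checks as the excerpt to one user's data and archive usage figures.
sub archive_decision {
    my ($du, $au, $percent) = @_;
    $percent ||= 20;    # same default as $Opts{P} || 20
    $du ||= 1;          # the excerpt substitutes 1 for missing/zero usage
    $au ||= 1;
    return 'too small' if $du < 5000;
    my $this = int($du * 100 / $au);    # data as a percentage of archive
    return 'not dirty enough' if $this < $percent;
    return 'repack';
}

# Made-up usage figures, purely for illustration.
for my $case ([4000, 10000], [6000, 100000], [20000, 100000], [30000, 0]) {
    my ($du, $au) = @$case;
    printf "data=%d archive=%d -> %s\n", $du, $au, archive_decision($du, $au);
}
```

Note that a user with no archive at all gets an archive size of 1, so any
data above the floor is always repacked for them.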
Bron.
On Mon, Feb 11, 2019, at 21:55, Egoitz Aurrekoetxea wrote:
> Now I'm noticing that, for instance, to move data between Xapian databases
> you need to launch something like:
>
> sudo -u cyrus /usr/cyrus/bin/squatter -C /usr/local/etc/imapd.conf -v -z archive -t temp,meta,data,archive -u user/[email protected]
>
> Perhaps it would be better to do:
>
> sudo -u cyrus /usr/cyrus/bin/squatter -C /usr/local/etc/imapd.conf -F -v -z archive -t temp,meta,data,archive -u user/[email protected]
>
> But then, having two squatter processes running at the same time, one in
> rolling mode and one moving/repacking data, should not be an issue?
>
>
> Thanks mates!!
>
> ---
>
> sarenet
> *Egoitz Aurrekoetxea*
> Systems Department
> 944 209 470
> Parque Tecnológico. Edificio 103
> 48170 Zamudio (Bizkaia)
> [email protected]
> www.sarenet.es
>
> Before printing this email, please consider whether it is necessary.
>
> On 11-02-2019 11:22, Egoitz Aurrekoetxea wrote:
>> Hi Bron,
>>
>> So, it would be worth running this once a day, for instance in the EVENTS
>> section of cyrus.conf:
>>
>> repack_xapian cmd="squatter -F" at=0200
>>
>> Is it necessary to stop the other, rolling squatter that we run from the
>> same cyrus.conf as:
>>
>> START {
>> # do not delete this entry!
>> recover cmd="ctl_cyrusdb -r"
>>
>> squatter cmd="squatter -R"
>> }
>>
>> Thank you so much for all the clarifications mate :) really :)
>>
>> Cheers!
>>
>> On 11-02-2019 10:23, Bron Gondwana wrote:
>>> Conversations.db is an index over lots of interesting bits of the message,
>>> but the key part that's used by Xapian is the mapping from G key (aka:
>>> GUID, aka: sha1 of the message RFC822 data) to individual email. It's used
>>> for deduplication and for mapping from results to messages.
>>>
>>> The data in conversations.db is added and removed in real time as messages
>>> are appended and updated in the cyrus.index.
>>>
>>> The data in the xapian databases on the other hand is append only - so you
>>> can wind up with hits that no longer map to existing emails. The way to
>>> solve that is with a xapian repack that filters messages - which can be
>>> done using the -F flag to squatter.
>>>
>>> Cheers,
>>>
>>> Bron.
>>>
>>> On Sat, Feb 9, 2019, at 23:04, Egoitz Aurrekoetxea wrote:
>>>> Good morning,
>>>>
>>>> As far as I understand, for Xapian you first create its conversations
>>>> database in order for it to work. Then you create database(s) for each
>>>> mailbox that Xapian can search in. You can move data between them, and new
>>>> mail gets indexed, for instance, by squatter in rolling mode. That's OK and
>>>> understood, I think. I was wondering what happens when mail indexed in the
>>>> archive database is removed and then no longer exists in the database:
>>>> does the rolling squatter handle that too?
>>>>
>>>> By the way, if mail gets indexed in the tier databases (for instance, at
>>>> Fastmail, in temp, meta, data, archive...), what is the role or function
>>>> of the conversations databases you create with ctl_conversationsdb -b -r?
>>>>
>>>> Cheers!
>>>> ----
>>>> Cyrus Home Page: http://www.cyrusimap.org/
>>>> List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
>>>> To Unsubscribe:
>>>> https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus
>>>
>>> --
>>> Bron Gondwana, CEO, FastMail Pty Ltd
>>> [email protected]
>>>
>>>
>>>
--
Bron Gondwana, CEO, FastMail Pty Ltd
[email protected]