It's definitely safe to have one squatter writing in rolling mode while 
another one repacks. I wouldn't run multiple repacks in parallel, as they can 
wind up doing duplicate work (though the end result should always be correct 
and safe).

Here's what we run:

# Any time the disk gets over 50%, compress -o single down to data
13 * * * * [% INCLUDE cronjob c='/home/mod_perl/hm/scripts/xapian_compact.pl -a -o -d 50 temp data' %]
# Copy the temporary search databases down to data during the week
43 1 * * 1,2,3,4,5,6 [% INCLUDE cronjob c='/home/mod_perl/hm/scripts/xapian_compact.pl -a temp,meta data' %]
# Sundays repack the entire data directory
43 1 * * 0 [% INCLUDE cronjob c='/home/mod_perl/hm/scripts/xapian_compact.pl -a temp,meta,data data' %]
# Late on Sundays, pack any oversized data directories down to archive
0 15 * * 0 [% INCLUDE cronjob c='/home/mod_perl/hm/scripts/xapian_archive.pl -a' %]

And here's the interesting logic. In xapian_compact.pl:

    # With -d, skip this run unless the search partition or /run/cyrus
    # has reached the given usage percentage.
    if ($Opts{d}) {
        my $Path = $Slot->SearchPath();
        my $Usage = df($Path);
        my $RunUsage = df("/run/cyrus");
        return Process::Status->new(0)
            if ($Usage->{per} < $Opts{d} and $RunUsage->{per} < $Opts{d});
    }

    # Build the squatter command line: compact the $src tiers into $dest,
    # passing through whichever optional flags were given.
    my @args = (-z => $dest, -t => $src);
    push @args, '-v' if $Opts{v};
    push @args, '-o' if $Opts{o};
    push @args, '-F' if $Opts{F};
    push @args, '-X' if $Opts{X};
    push @args, ('-T' => $Opts{T}) if $Opts{T};
    push @args, ('-u' => $Opts{u}) if $Opts{u};
    my %RunOpts = (
        PrintOutput => 1,
    );
    $RunOpts{Nice} = 1 unless $Opts{N};
    $RunOpts{Daemon} = 1 if $Opts{D};

    # Label the process, then hand off to squatter.
    $0 = "xapian_compact: $SN";
    $Slot->RunCommand(\%RunOpts, 'squatter', @args);
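
For anyone who doesn't read Perl, the -d check at the top of that excerpt can 
be sketched in Python roughly as follows. This is only an illustration, not 
the real script: percent_used and should_compact are made-up names, and 
shutil.disk_usage may report a slightly different percentage than the df() 
call used above.

```python
import shutil


def percent_used(path):
    """Return how full the filesystem holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    return usage.used * 100 / usage.total


def should_compact(threshold, search_pct, run_pct):
    """Mirror the early return: skip only when BOTH values are below threshold."""
    return not (search_pct < threshold and run_pct < threshold)
```

With -d 50, should_compact(50, percent_used(search_path), 
percent_used("/run/cyrus")) is true as soon as either filesystem reaches 50%, 
where search_path stands in for whatever $Slot->SearchPath() returns.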

And in xapian_archive.pl:

    my $Percent = $Opts{P} || 20;
    [...]

    foreach my $user (sort keys %$DataUsage) {
        my $au = $ArchiveUsage->{$user} || 1;
        my $du = $DataUsage->{$user} || 1;
        if ($du < 5000) {
            print "Too small $user ($du)\n";
            next;
        }
        my $This = int($du * 100 / $au);
        if ($This < $Percent) {
            print "Not enough dirty $user: ($du, $au)\n";
            next;
        }
        print "Recompacting $user: ($du, $au)\n";
        my @args = (-z => 'archive', -t => 'data,archive');
    [...]
 
In summary: repack data down to archive once data is at least 1/5 the size of 
the existing archive (20% is the default, adjustable with -P), skipping users 
whose data usage is below 5000. Each of these scripts is just a wrapper around 
squatter to help it run automatically.
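
To make the threshold concrete, here is the same per-user decision as a small 
Python sketch. It is an illustration only: archive_decision and its parameter 
names are invented for this example, and the sizes are in whatever unit the 
usage hashes in xapian_archive.pl hold (the excerpt doesn't say).

```python
def archive_decision(data_size, archive_size, percent=20, min_data=5000):
    """Return what the loop would do for one user.

    Mirrors the Perl excerpt: missing or zero sizes are treated as 1,
    small data tiers are skipped, and a repack happens only when data is
    at least `percent` percent of the archive size.
    """
    du = data_size or 1
    au = archive_size or 1
    if du < min_data:
        return "too small"
    ratio = du * 100 // au  # integer percentage, like int($du * 100 / $au)
    if ratio < percent:
        return "not dirty enough"
    return "repack"
```

So a user with 10000 in data and 100000 in archive is left alone (10%), while 
20000 against 100000 is repacked (20%). A user with no archive yet is repacked 
as soon as their data reaches the 5000 minimum.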

Bron.


On Mon, Feb 11, 2019, at 21:55, Egoitz Aurrekoetxea wrote:
> I've now noticed that, to move data between Xapian databases, you need to 
> launch something like:
> 
> sudo -u cyrus /usr/cyrus/bin/squatter -C /usr/local/etc/imapd.conf -v -z archive -t temp,meta,data,archive -u user/ego...@sarenet.es
> 
> Perhaps it would be better to run:
> 
> sudo -u cyrus /usr/cyrus/bin/squatter -C /usr/local/etc/imapd.conf -F -v -z archive -t temp,meta,data,archive -u user/ego...@sarenet.es
> 
> But then, would having two squatter processes running at the same time, one 
> in rolling mode and one moving/repacking data, be an issue?
> 
> 
> Thanks mates!!
> 
> ---
>  
> sarenet
> *Egoitz Aurrekoetxea*
> Systems Department
> 944 209 470
> Parque Tecnológico. Edificio 103
> 48170 Zamudio (Bizkaia)
> ego...@sarenet.es
> www.sarenet.es
> 
> Before printing this email, consider whether it is necessary to do so.
> 


> On 11-02-2019 11:22, Egoitz Aurrekoetxea wrote:


>> Hi Bron,
>> 
>> So, it would be interesting to run it once a day, for instance in the 
>> events section of cyrus.conf:
>> 
>> repack_xapian cmd="squatter -F" at=0200
>> 
>> Is it necessary to stop the other rolling squatter we run, in the same 
>> cyrus.conf, as:
>> 
>> START {
>>  # do not delete this entry!
>>  recover cmd="ctl_cyrusdb -r"
>> 
>>  squatter cmd="squatter -R"
>> }
>> 
>> Thank you so much for all the clarifications mate :) really :)
>> 
>> Cheers!


>> 


>> On 11-02-2019 10:23, Bron Gondwana wrote:


>>> Conversations.db is an index over lots of interesting bits of the message, 
>>> but the key part that's used by Xapian is the mapping from G key (aka: 
>>> GUID, aka: sha1 of the message RFC822 data) to individual email. It's used 
>>> for deduplication and for mapping from results to messages.
>>>  
>>> The data in conversations.db is added and removed in real time as messages 
>>> are appended and updated in the cyrus.index.
>>>  
>>> The data in the xapian databases on the other hand is append only - so you 
>>> can wind up with hits that no longer map to existing emails. The way to 
>>> solve that is with a xapian repack that filters messages - which can be 
>>> done using the -F flag to squatter.
>>>  
>>> Cheers,
>>>  
>>> Bron.
>>>  
>>> On Sat, Feb 9, 2019, at 23:04, Egoitz Aurrekoetxea wrote:
>>>> Good morning,
>>>> 
>>>> As far as I understand, for Xapian to work you first create its 
>>>> conversations database. Later you create the database(s) for each 
>>>> mailbox that Xapian can search in. You can move data between them, and 
>>>> new mail gets indexed by, for instance, squatter in rolling mode. That's 
>>>> OK and understood, I think. I was wondering what happens when mail 
>>>> indexed in the archive database is removed and so no longer exists. Does 
>>>> squatter in rolling mode manage that too?
>>>> 
>>>> By the way, if mail gets indexed in the tier databases (for instance, at 
>>>> Fastmail, in temp, meta, data, archive...), what is the role or function 
>>>> of the conversations databases you create with ctl_conversationsdb -b -r?
>>>> 
>>>> Cheers!


>>>> ----
>>>> Cyrus Home Page: http://www.cyrusimap.org/
>>>> List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
>>>> To Unsubscribe:
>>>> https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus
>>>  
>>> --
>>>  Bron Gondwana, CEO, FastMail Pty Ltd
>>>  br...@fastmailteam.com
>>>  
>>>  
>>> 

--
 Bron Gondwana, CEO, FastMail Pty Ltd
 br...@fastmailteam.com

