Because there is a lot of data, and for scalability reasons we want all
non-write operations to go to a slave. We don't want to use the master
unless necessary.
On 17/04/10 08:28, Otis Gospodnetic wrote:
Hm, why not just go to the MySQL master then?
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/
----- Original Message ----
From: Michael Tibben <michael.tib...@stomp.com.au>
To: solr-user@lucene.apache.org
Sent: Thu, April 15, 2010 10:15:14 PM
Subject: DIH dataimport.properties with
Hi,
I am using the DIH to import data from a MySQL slave. However, the
slave sometimes runs behind the master. The delay is variable: most of the
time it is in sync, but it can sometimes run behind by a few minutes.
This is a problem because DIH uses dataimport.properties to determine the
last_index_time for delta updates. This last_index_time does not correspond
to the position of the slave, and so documents are being missed.
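
To make the failure concrete, here is a small sketch of the timing gap. It is only an illustration: the last_modified column, the row ids, and the two-minute lag are made-up values, and delta_rows() merely stands in for a deltaQuery of the form "last_modified > last_index_time".

```python
from datetime import datetime, timedelta


def delta_rows(rows, last_index_time):
    """Stand-in for a deltaQuery: ids of rows modified after last_index_time."""
    return [r["id"] for r in rows if r["last_modified"] > last_index_time]


t0 = datetime(2010, 4, 15, 22, 0, 0)  # previous import time
t1 = t0 + timedelta(minutes=10)       # next delta import starts (Solr clock)
lag = timedelta(minutes=2)            # assumed replication lag at t1

# Timestamps are assigned on the master and replicated unchanged.
master_rows = [
    {"id": 1, "last_modified": t0 + timedelta(minutes=1)},
    {"id": 2, "last_modified": t1 - timedelta(seconds=30)},
]

# At t1 the slave has only applied changes made up to t1 - lag.
slave_at_t1 = [r for r in master_rows if r["last_modified"] <= t1 - lag]

first_run = delta_rows(slave_at_t1, t0)         # row 2 is not on the slave yet
second_run = delta_rows(master_rows, t1)        # slave caught up, cutoff is t1
fixed_run = delta_rows(master_rows, t1 - lag)   # cutoff is the slave position

print(first_run, second_run, fixed_run)
```

Row 2 was written before t1 but reached the slave after t1, so a cutoff of t1 never selects it. A cutoff equal to the slave's position at the time of the import does.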
What I need to be able to do is tell DIH what the last_index_time should be.
Alternatively, I would like to be able to specify another property in
dataimport.properties, perhaps called datasource_version or similar.
Is this possible?
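
One possible way to "tell DIH" the time, sketched below, is to rewrite last_index_time in dataimport.properties just before each delta import. This is a sketch under assumptions, not a confirmed DIH feature: it assumes the file stores the value as "yyyy-MM-dd HH\:mm\:ss" (Java properties escaping), that DIH rereads the file at the start of every delta import, and that the lag is taken from the Seconds_Behind_Master column of SHOW SLAVE STATUS. The function names slave_position and set_last_index_time are hypothetical.

```python
import re
from datetime import datetime, timedelta


def slave_position(now, seconds_behind_master):
    """Estimate the point in time up to which the slave has applied changes."""
    if seconds_behind_master is None:
        # MySQL reports NULL when replication is not running.
        raise ValueError("replication lag unknown; refusing to guess")
    return now - timedelta(seconds=seconds_behind_master)


def set_last_index_time(properties_text, when):
    """Return properties_text with last_index_time set to `when`."""
    # Assumed format: colons escaped the way java.util.Properties writes them.
    value = when.strftime("%Y-%m-%d %H:%M:%S").replace(":", "\\:")
    line = "last_index_time=" + value
    pattern = re.compile(r"^last_index_time=.*$", re.MULTILINE)
    if pattern.search(properties_text):
        # A function replacement avoids backslash processing by re.sub.
        return pattern.sub(lambda m: line, properties_text)
    if properties_text and not properties_text.endswith("\n"):
        properties_text += "\n"
    return properties_text + line + "\n"


old = "#Thu Apr 15 22:10:00 2010\nlast_index_time=2010-04-15 22\\:10\\:00\n"
position = slave_position(datetime(2010, 4, 15, 22, 10, 0), 120)
new = set_last_index_time(old, position)
print(new)
```

Since DIH writes its own start time back to the file after an import, a script like this would have to run before every delta import, with the lag read from the slave at that moment.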
I have thought of a sneaky way to hack around the issue. Just before the
delta update is run, I will switch the system time to the MySQL slave's
replication time. The system is used for nothing but the Solr master, so I
think this should work OK. Any thoughts?
Regards,
Michael