Using the autoincrementing document ID to keep track of what has not yet
been indexed in Solr is pretty interesting.  And you assign a new document
ID even when an existing document is modified, so updates get picked up
too.  Very cool.  I suppose the number can even wrap around to 0, as long
as you handle that.
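
To check my understanding, here is a minimal sketch of that scheme as I
read it.  It is not your actual build system: the row layout, the
`doc_id` field name, and the `index_batch` callback are all made up for
illustration, and it ignores wrap-around and the distributed index.

```python
from typing import Callable, Iterable, List


def run_delta_import(
    rows: Iterable[dict],
    last_indexed_id: int,
    index_batch: Callable[[List[dict]], None],
) -> int:
    """Index rows newer than the stored mark and return the new mark.

    `rows` stands in for the database table; each row is a dict with a
    hypothetical integer "doc_id" field. `index_batch` stands in for
    whatever sends documents to Solr.
    """
    # Only rows whose document ID is above the stored mark are pending.
    pending = sorted(
        (row for row in rows if row["doc_id"] > last_indexed_id),
        key=lambda row: row["doc_id"],
    )
    if not pending:
        return last_indexed_id
    try:
        index_batch(pending)
    except Exception:
        # Import failed: keep the old mark so the next run retries
        # the same documents.
        return last_indexed_id
    # Import succeeded: advance the mark to the newest ID just indexed.
    return pending[-1]["doc_id"]
```

The point of returning the old mark on failure is that a failed run
changes nothing, so the next run sees the same pending rows again.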

I am thinking of using a timestamp to achieve something similar.  All
documents that have changed since the last Solr indexing run would need to
be re-added to the Solr index.  In fact, each name-value pair in Cassandra
has a timestamp associated with it, so I wonder whether I could simply use
that.
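
Roughly what I have in mind, as an in-memory sketch only.  It does not
use any real Cassandra client API: each document is just a dict of
column name to (value, timestamp), and the function names are mine.  It
also assumes the write timestamps come from reasonably synchronized
clocks, and it does nothing about deletes.

```python
from typing import Dict, List, Tuple

# (value, write timestamp); the timestamp unit does not matter here as
# long as it matches the unit of the stored "last indexed" time.
Column = Tuple[str, int]


def last_modified(columns: Dict[str, Column]) -> int:
    """A document's modification time is its newest column timestamp."""
    return max(ts for _value, ts in columns.values())


def changed_since(
    docs: Dict[str, Dict[str, Column]], last_index_ts: int
) -> List[str]:
    """Return the keys of documents with any column written after
    last_index_ts. Documents with no columns are skipped."""
    return sorted(
        key
        for key, columns in docs.items()
        if columns and last_modified(columns) > last_index_ts
    )
```

The keys returned by `changed_since` would be the documents to send to
Solr, after which the stored time moves forward only if the run
succeeded.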

I'm curious how you handle the delta-imports. Do you have some routine that
periodically checks for updates to your MySQL database via the document ID?
Which language do you use for that?

Thanks,
Ben

On Tue, Mar 15, 2011 at 9:12 AM, Shawn Heisey <s...@elyograg.org> wrote:

> On 3/14/2011 9:38 PM, onlinespend...@gmail.com wrote:
>
>> But my main question is, how do I guarantee that data between my Cassandra
>> database and Solr index are consistent and up-to-date?
>>
>
> Our MySQL database has two unique indexes.  One is a document ID,
> implemented in MySQL as an autoincrement integer and in Solr as a long.  The
> other is what we call a tag id, implemented in MySQL as a varchar and Solr
> as a single lowercased token and serving as Solr's uniqueKey.  We have an
> update trigger on the database that updates the document ID whenever the
> database document is updated.
>
> We have a homegrown build system for Solr.  In a nutshell, it keeps track
> of the newest document ID in the Solr Index.  If the DIH delta-import fails,
> it doesn't update the stored ID, which means that on the next run, it will
> try and index those documents again.  Changes to the entries in the database
> are automatically picked up because the document ID is newer, but the tag id
> doesn't change, so the document in Solr is overwritten.
>
> Things are actually more complex than I've written, because our index is
> distributed.  Hopefully it can give you some ideas for yours.
>
> Shawn
>
>
