Take a look at DataStax Enterprise, which is basically Cassandra with Solr
tightly integrated as an embedded search engine. Write and update your data
in Cassandra and it will automatically be indexed in Solr, all in one
cluster, so no need to build and maintain a separate SolrCloud cluster just
to search your NoSQL data in Cassandra. The data is stored in Cassandra but
indexed in Solr. Solr runs in the same JVM as Cassandra for efficient
indexing of new and updated data, with none of the synchronization issues
of an ETL or trigger-based approach with a separate search platform.
(Disclosure: I am a contractor for DataStax. I'm their "Domain Expert for
Search/Solr".)
-- Jack Krupansky
-----Original Message-----
From: andrey prokopenko
Sent: Wednesday, November 5, 2014 8:52 AM
To: solr-user@lucene.apache.org
Subject: on regards to Solr and NoSQL storages integration
Greetings Comrades.
There have been numerous requests and discussions about using Solr as both
a search engine and a NoSQL store at the same time.
While it is an excellent search engine, Solr does not look so good when it
comes to storing documents and their stored fields, especially with large
amounts of data: the index quickly grows to unmanageable sizes. Then there
is the ever-recurring PITA of partial document updates: due to the nature
of the Lucene/Solr index, documents can't be updated in place; they always
have to be deleted and re-inserted.
All in all, Solr desperately needs tight integration with some document
storage that takes over the stored fields of the document and is
transactionally coupled with the search index itself, so that stored fields
are at all times in sync with the other parts of the index (terms, doc
values, etc.).
Unfortunately, unlike Lucene, Solr does not offer a full set of distributed
transaction API commands, which seriously complicates the matter. Luckily,
since the advent of Solr 4.0 we have the ability not only to create a custom
Directory, but also to tweak the index structure in any way we like.
Based on this new feature I've created a custom Directory + custom codec
that integrates Solr with the Oracle NoSQL key-value store.
My codec is based on the Solr 4.10.1 API and Oracle NoSQL 1.2.1.8 Community
Edition. Stored fields are persisted in the NoSQL store under a primary key
derived from the document fields. The codec relays stored fields to the
NoSQL store while keeping all other index components in the usual
file-based storage layout. The codec has been written with SolrCloud and
NoSQL's own fault tolerance in mind, so it tries to ignore write commands to
the NoSQL store when the index is being built on a replica node that is not
currently the Solr shard leader. The first stable version of the codec
transparently supports the full index life cycle, including segment
creation, merging, and deletion.
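To make the idea more concrete, here is a heavily simplified sketch of the
relay logic only. It is not the actual codec source and uses no Lucene, Solr,
or Oracle NoSQL classes: the in-memory store, the class names, and the choice
of a single "id" field as the primary key are all assumptions made for
illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: every type and name here is hypothetical and stands in
// for the real codec, the Solr cluster state, and the NoSQL client.
public class StoredFieldRelaySketch {

    /** Hypothetical stand-in for a key-value store client. */
    static class InMemoryKeyValueStore {
        private final Map<String, Map<String, String>> rows = new TreeMap<>();

        void put(String key, Map<String, String> storedFields) {
            // Copy the fields so later changes by the caller do not leak in.
            rows.put(key, new LinkedHashMap<>(storedFields));
        }

        Map<String, String> get(String key) {
            return rows.get(key);
        }

        int size() {
            return rows.size();
        }
    }

    /** Relays stored fields to the store, but only on the shard leader. */
    static class StoredFieldRelay {
        private final InMemoryKeyValueStore store;
        private final String keyField;     // assumed: one field acts as the key
        private final boolean shardLeader; // assumed: supplied by the caller

        StoredFieldRelay(InMemoryKeyValueStore store, String keyField,
                         boolean shardLeader) {
            this.store = store;
            this.keyField = keyField;
            this.shardLeader = shardLeader;
        }

        /** Returns true if the fields were written, false if skipped. */
        boolean writeDocument(Map<String, String> storedFields) {
            String key = storedFields.get(keyField);
            if (key == null) {
                throw new IllegalArgumentException("missing key field: " + keyField);
            }
            if (!shardLeader) {
                // Non-leader replica: skip the write and rely on the store's
                // own replication for fault tolerance.
                return false;
            }
            store.put(key, storedFields);
            return true;
        }
    }

    public static void main(String[] args) {
        InMemoryKeyValueStore store = new InMemoryKeyValueStore();
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "doc-1");
        doc.put("title", "Solr and NoSQL");

        StoredFieldRelay leader = new StoredFieldRelay(store, "id", true);
        StoredFieldRelay replica = new StoredFieldRelay(store, "id", false);

        System.out.println("leader wrote:  " + leader.writeDocument(doc));
        System.out.println("replica wrote: " + replica.writeDocument(doc));
        System.out.println("rows in store: " + store.size());
    }
}
```

In this sketch the leader check is a plain boolean passed in by the caller;
how the real codec detects the current shard leader is described in the
readme rather than here.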
Source code and a readme detailing usage instructions for the codec can be
found on GitHub: https://github.com/andrey42/onsqlcodec
I assume there might be other developers trying to solve similar problems,
so I'd be interested to hear about similar attempts and the issues
encountered while trying to implement such an integration between Solr and
other NoSQL databases.