On 11/23/2016 11:27 AM, Prateek Jain J wrote:
> 1.       Solr is an indexing engine, but it stores both data and indexes in 
> the same directory. We can select which fields to store/persist in Solr via 
> schema.xml, but in a nutshell it's not possible to separate the data from 
> the indexes -- I can't remove all the indexes and still have the persisted 
> data with Solr.

Solr uses Lucene for most of its functionality.  Although the Lucene
file format does keep stored data in different files than the inverted
index, the separation isn't clean enough that you can manually delete
one and leave the other intact.  The files that make up a Lucene index
are NOT meant to be manipulated by anything other than Lucene code.
Changing them in any other way can lead to a corrupt index.
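
To see what "different files" means, you can look (read-only!) at a
core's index directory.  A minimal sketch in Python, assuming a
hypothetical core at /var/solr/data/mycore -- in current Lucene
codecs the .fdt/.fdx files hold stored field data, while files like
.tim/.tip/.doc belong to the inverted index, all sitting side by side
in the same directory:

import os
from collections import Counter

# Hypothetical path to one Solr core's Lucene index directory.
index_dir = "/var/solr/data/mycore/data/index"

# Group files by extension.  In current Lucene codecs, .fdt/.fdx hold
# stored field data; .tim/.tip/.doc (and friends) hold the inverted
# index.  Listing them is harmless, but they should only ever be
# modified through Lucene itself.
counts = Counter(os.path.splitext(name)[1] for name in os.listdir(index_dir))
for ext, count in sorted(counts.items()):
    print(ext or "(no extension)", count)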

> 2.       Solr's indexing capabilities -- things like faceting and weighted 
> search -- are far better than those of other NoSQL DBs like MongoDB.

This is vague.  Solr is good at search and the details that go with it
-- faceting, relevance ranking, and the like (see the sketch below);
databases typically aren't.  I removed your next numbered point --
whether or not MongoDB uses shards doesn't matter.  Exactly how scaling
happens isn't all that important.
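
To make "search and the details that go with it" concrete, here is a
minimal sketch of a faceted query over HTTP using Python's requests
library.  The core name "products" and the field "brand" are made up
for the example; the query parameters themselves (q, rows, facet,
facet.field) are standard Solr syntax:

import requests

params = {
    "q": "laptop",          # full-text query
    "rows": 10,             # only the top-N results
    "facet": "true",
    "facet.field": "brand", # counts per brand value alongside the hits
    "wt": "json",
}
resp = requests.get("http://localhost:8983/solr/products/select", params=params)
data = resp.json()

print("matches:", data["response"]["numFound"])
print("brand facets:", data["facet_counts"]["facet_fields"]["brand"])

Getting the equivalent grouped counts out of a general-purpose database
usually means writing your own aggregation queries; in Solr it's a
couple of extra parameters on the search request.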
 
> 4.       We can have an architecture where the data is stored in a separate 
> DB like MongoDB or MySQL, and Solr connects to the DB and indexes the data 
> (in Solr).
>
> I tried googling the question "solr vs mongodb" and there are various 
> threads on sites like stackoverflow, but I still can't understand why anyone 
> would go for MongoDB and when for Solr (except for features like faceting, 
> and maybe the CAP theorem). Are there any specific use-cases for choosing 
> NoSQL databases like MongoDB over Solr?

Solr and MongoDB are designed for very different uses.  Although Solr
*can* be used as a NoSQL database, that is not what it is *designed*
for.  It is a *search engine*.  There are redundancy and scalability
features, and Solr does try really hard to never lose data, but it has
not been hardened against data loss the way a purpose-built database
has.

Solr is good at combing through a large dataset for arbitrary keywords
plus other filtering and returning the top N results, where N is
typically a small number of two or three digits.  If you ask it for a
million results, it's going to be REALLY slow ... but if you ask a
database for the same thing, it will probably return it fairly quickly.
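
As a rough sketch of that difference (hypothetical core and field
names again), the first request below is the kind of query Solr is
built for; the second forces it to collect and return a huge result
set and will be far slower:

import requests

base = "http://localhost:8983/solr/products/select"  # hypothetical core

# Typical search-engine usage: a small page of the best-ranked matches.
fast = requests.get(base, params={"q": "ssd drive", "rows": 25, "wt": "json"})

# Asking for everything in one shot is exactly what Solr is NOT
# optimized for; expect this to be very slow on a large index.
slow = requests.get(base, params={"q": "ssd drive", "rows": 1000000, "wt": "json"})

If you genuinely need to pull an entire result set out of Solr, the
cursorMark deep-paging feature is the supported way to do it in
manageable pages.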

Those who have experience with Solr *as a search engine* will tell you
this:  "Always be prepared to completely rebuild your Solr indexes from
scratch, because a large percentage of changes will require a reindex." 
This is less problematic if you only use Solr as a data store, not for
searching ... but if that's the plan, why use Solr at all?
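
That advice is easier to follow when the canonical copy of the data
lives somewhere else, as in your point 4.  A minimal rebuild sketch,
assuming a hypothetical relational table (sqlite stands in here for
MySQL, MongoDB, or whatever the real store is) and the same
hypothetical "products" core:

import json
import sqlite3
import requests

# The canonical data lives in the database; Solr only holds a
# rebuildable search index of it.
conn = sqlite3.connect("catalog.db")
rows = conn.execute("SELECT id, title, description FROM products")

docs = [{"id": r[0], "title": r[1], "description": r[2]} for r in rows]

# Solr's JSON update handler accepts a list of documents; commit=true
# makes them searchable immediately, which is fine for a one-off bulk
# rebuild.
requests.post(
    "http://localhost:8983/solr/products/update?commit=true",
    data=json.dumps(docs),
    headers={"Content-Type": "application/json"},
)

Wipe the core (or create a new one), rerun something like this, and the
index is back -- which is why keeping the authoritative data outside
Solr makes the "be ready to reindex" rule much less painful.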

Slipping into the subjective:  This is purely my opinion.  Somewhat
informed, but still MY opinion:

I wouldn't use either Solr or MongoDB as the canonical datastore for
anything where I care about reliability.  Solr is not designed for it,
and I've read from sources that are normally trustworthy that MongoDB
has serious reliability issues.  Here are a couple of things I found
with only minimal poking:

https://aphyr.com/posts/322-jepsen-mongodb-stale-reads
http://hackingdistributed.com/2013/01/29/mongo-ft/

The Jepsen testing concluded that MongoDB had serious problems with its
architecture, not just bugs that lose data.

It's only fair to mention that SolrCloud was also subjected to Jepsen
testing.  Bugs were found, but because of its reliance on ZooKeeper for
cluster management, it actually did fairly well:

https://lucidworks.com/blog/2014/12/10/call-maybe-solrcloud-jepsen-flaky-networks/

Thanks,
Shawn
