On 11/23/2016 11:27 AM, Prateek Jain J wrote:
> 1. Solr is indexing engine but it stores both data and indexes in same
> directory. Although we can select fields to store/persist in solr via
> schema.xml. But in nutshell, it's not possible to distinguish between data
> and indexes like, I can't remove all indexes and still have persisted data
> with SOLR.

Solr uses Lucene for most of its functionality. Although the Lucene
file format does have different files for stored data than it does for
the index, it's not separate enough that you can manually manipulate it
and delete one or the other while leaving the rest intact. The files
that make up the Lucene index are NOT meant to be manipulated by
anything other than Lucene code. Changing them in any other way can
lead to a corrupt index.

> 2. Solr indexing capabilities are far better than any other nosql db
> like mongodb etc. like faceting, weighted search.

This is vague. Solr is good at search and associated details;
databases typically aren't.

I removed your next numbered point -- whether or not mongodb uses
shards doesn't matter. Exactly how scaling happens isn't all that
important.

> 4. We can have architecture where data is stored in separate db like
> mongodb or mysql. SOLR can connect with db and index data (in SOLR).
>
> I tried googling for question "solr vs mongodb" and there are various threads
> on sites like stackoverflow. But I still can't understand why would anyone go
> for mongodb and when for SOLR (except for features like faceting, may be CAP
> theorem). Are there any specific use-cases for choosing NoSQL databases like
> mongoDB over SOLR?

Solr and MongoDB are designed for very different uses. Although Solr
*can* be used as a NoSQL database, that is not what it is *designed*
for. It is a *search engine*. There are redundancy and scalability
features, and Solr does try really hard to never lose data, but it has
not been hardened against those problems.

Solr is good at combing a large dataset for random keywords plus other
filtering and returning the top N results, where N is typically a small
number that's two or three digits. If you ask it for a million results,
it's going to be REALLY slow ... but if you ask a database for the same
thing, it is probably going to return it pretty quickly.

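To make the "top N" pattern concrete, here is a minimal sketch of how
such a request might be put together in Python. The host, collection
name, and field names are assumptions invented for illustration; q, fq,
rows, and wt are standard Solr query parameters.

```python
from urllib.parse import urlencode

# Hypothetical host and collection name, used only for illustration.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycollection/select"


def build_top_n_query(keywords, filters, rows=10):
    """Build a Solr select URL that asks for only the top `rows` matches."""
    params = [("q", keywords)]
    # Each filter becomes its own fq parameter; fq narrows the result set
    # without changing relevance scores.
    params.extend(("fq", f) for f in filters)
    # Keep rows small: this is the "top N" that Solr is good at.
    params.append(("rows", str(rows)))
    params.append(("wt", "json"))
    return SOLR_SELECT_URL + "?" + urlencode(params)


# The field names (category, in_stock) are made up for this example.
url = build_top_n_query(
    "wireless headphones",
    ["category:audio", "in_stock:true"],
    rows=20,
)
print(url)
```

Setting rows to something like 1000000 in the same request is the slow
case described above; a bulk export of that size is better served by
the database.
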
Those who have experience with Solr *as a search engine* will tell you
this: "Always be prepared to completely rebuild your Solr indexes from
scratch, because a large percentage of changes will require a reindex."
This is less problematic if you only use Solr as a data store, not for
searching ... but if that's the plan, why use Solr at all?

Slipping into the subjective: This is purely my opinion. Somewhat
informed, but still MY opinion:

I wouldn't use either Solr or MongoDB as the canonical datastore for
anything where I care about the reliability. Solr is not designed for
it, and I've read from sources that are normally trustworthy that
MongoDB has serious issues with reliability.

Here's a couple of things I found with only minimal poking:

https://aphyr.com/posts/322-jepsen-mongodb-stale-reads
http://hackingdistributed.com/2013/01/29/mongo-ft/

The Jepsen testing concluded that MongoDB had serious problems with its
architecture, not just bugs that lose data.

It's only fair to mention that SolrCloud was also subjected to Jepsen
testing. Bugs were found, but because of its reliance on Zookeeper for
cluster management, it actually did fairly well:

https://lucidworks.com/blog/2014/12/10/call-maybe-solrcloud-jepsen-flaky-networks/

Thanks,
Shawn