On 6/16/2011 4:41 PM, Mari Masuda wrote:
One reservation I have is that eventually we would like to be able to type in "Iraq" and 
find records across all of the collections at once instead of having to search each collection 
separately.  Although I don't know anything about it at this stage, I did Google 
"sharding" after reading someone's recent post on this list and it sounds like that may 
be a potential answer to my question.

So this kind of stuff can be tricky, but with that eventual requirement I would NOT put these in separate cores. Sharding isn't (IMO, if someone disagrees, they will hopefully say so!) a good answer to searching across entirely different 'schemas', or to avoiding frequent-commit issues -- sharding is really just for scaling/performance when your index gets very, very large. (Which it doesn't sound like yours will be, but you can deal with that as a separate issue if it becomes one.)

If you're going to want to search across all the collections, put them all in the same core, either in the exact same indexed fields or at least sharing certain common indexed fields -- those common fields are the ones you'll be able to search across all collections on. It's okay if some collections have unique indexed fields too: documents that don't belong to a given collection simply won't have any terms in a field only that collection uses, no problem. (Then you can distribute this single core into shards later if you need to for performance reasons related to number of documents/size of index.)
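To make that concrete, here's a rough sketch of what the shared schema.xml fields might look like -- the field names (collection, title, text, interview_date) are just made-up examples for illustration, not anything specific to your setup:

    <!-- one field saying which collection a doc belongs to -->
    <field name="collection" type="string" indexed="true" stored="true"/>

    <!-- common fields used by every collection; these are searchable across all of them -->
    <field name="title" type="text" indexed="true" stored="true"/>
    <field name="text"  type="text" indexed="true" stored="true"/>

    <!-- a field only one collection uses; docs from other collections just omit it -->
    <field name="interview_date" type="date" indexed="true" stored="true"/>

Then a query like q=Iraq (against whatever your default search field is) searches everything at once, and adding a filter query like fq=collection:interviews limits that same search to a single collection when you want that.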

You're right to be thinking about the fact that very frequent commits can be a performance issue in Solr. But separating the collections into different cores is going to create more problems for you (if you want to be able to search across all collections) in an attempt to solve that one. (Among other things, not every Solr feature works in a distributed/sharded environment; it's just a more complicated and somewhat less mature setup for Solr.)

The way I deal with the frequent-commit issue is by NOT doing frequent commits to my production Solr. Instead, I use Solr replication to have a 'master' Solr index that I commit to whenever I want, and a 'slave' Solr index that serves the production searches and only replicates from the master periodically -- not so often that the slave itself ends up with too-frequent commits. That seems to be a fairly common solution, if that use pattern works for you.
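For reference, that setup is just the standard ReplicationHandler config in solrconfig.xml; the host name and the 15-minute poll interval below are made-up examples, adjust to whatever staleness you can live with:

    On the master:
      <requestHandler name="/replication" class="solr.ReplicationHandler">
        <lst name="master">
          <str name="replicateAfter">commit</str>
          <str name="confFiles">schema.xml,stopwords.txt</str>
        </lst>
      </requestHandler>

    On the slave:
      <requestHandler name="/replication" class="solr.ReplicationHandler">
        <lst name="slave">
          <str name="masterUrl">http://your-master-host:8983/solr/replication</str>
          <str name="pollInterval">00:15:00</str>
        </lst>
      </requestHandler>

The slave only sees index changes when it polls and pulls a new snapshot, so commits on the master can be as frequent as you like without the cache-warming churn hitting your production searchers.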

There are also some "near real time" features in more recent versions of Solr that I'm not very familiar with. (Not sure if any are included in the current latest release, or if they are all still only in the repo.) My sense is that they too only work for certain use patterns; they aren't magic bullets for "commit whatever you want as often as you want to Solr". In general Solr isn't so great at very frequent major changes to the index. Depending on exactly what sort of use pattern you are predicting/planning for your commits, maybe people can give you advice on how (or whether) to do it.

But I personally don't think your idea of splitting collections (that you'll eventually want to search across in a single search) into separate shards is a good solution to frequent-commit issues. You'd be complicating your setup and causing other problems for yourself, and not even really addressing the too-frequent-commit issue with that setup.
