Re: indexing bigdata

Robert Stewart Fri, 09 Mar 2012 02:59:33 -0800

It very much depends on your data and also on what query features you will use:
how many fields, the size of each field, how many unique values per field, how
many fields are stored vs. only indexed, etc.  I have a system with 3+ billion
docs, and each instance (each index core) has 120 million docs, and it flies.
But the documents are tiny, with only 3 fields each, and the search is a very
simple single-keyword match.  On another system we only have 7 million docs per
instance and it is slower, because the documents are much larger with many more
fields, and we do a lot of faceting and other advanced search features.


Also, the search features you use (faceting, field collapsing, wildcard
queries, etc.) can all increase search time compared with a simple keyword
search.

Unfortunately, it is one of those things you need to try out to really get an
answer, IMO.
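
As a rough illustration of "trying it out", here is a minimal sketch of a
latency check you could run against a test index. It is not from the original
thread: the function name measure_latencies and the run_query callable are
hypothetical, and run_query stands in for whatever actually sends one query to
your Solr instance (for example, an HTTP request to your core's select
handler). The sketch only times the calls and summarises them, so you can
compare a single instance against a sharded set-up using your own data.

```python
import math
import statistics
import time
from typing import Callable, Dict, Iterable


def measure_latencies(run_query: Callable[[str], object],
                      queries: Iterable[str]) -> Dict[str, float]:
    """Run each query once and summarise wall-clock latency in milliseconds.

    run_query is a hypothetical callable supplied by you; it should send one
    query (e.g. an autosuggest prefix) to the search server and wait for the
    response.
    """
    timings_ms = []
    for query in queries:
        start = time.perf_counter()
        run_query(query)
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    if not timings_ms:
        raise ValueError("no queries supplied")

    timings_ms.sort()
    # Nearest-rank 95th percentile over the sorted timings.
    p95_index = max(0, math.ceil(0.95 * len(timings_ms)) - 1)
    return {
        "count": len(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "p95_ms": timings_ms[p95_index],
        "max_ms": timings_ms[-1],
    }


if __name__ == "__main__":
    # Stand-in for a real query function: sleeps instead of calling a server.
    def fake_query(prefix: str) -> None:
        time.sleep(0.001)

    print(measure_latencies(fake_query, ["so", "sol", "solr", "sha", "shard"]))
```

For numbers that mean anything, use queries that resemble real traffic, run
them after the index has warmed up, and look at the 95th percentile rather
than only the median, since an autosuggest box is sensitive to the slow tail.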


On Mar 8, 2012, at 11:39 PM, Sharath Jagannath wrote:

> OK, my bad. I should have put it in a better way.
> Is it a good idea to have all 30M docs on a single instance, or should I
> consider a distributed set-up?
> I have synthesized the data, configured the schema, and made suitable
> changes to the config. I have tested with a smaller data set on my laptop
> and have a good workflow set up.
> 
> I do not have a big machine to test it on.
> I wanted to make sure I have some insight into both options before I decide
> to spin up an Amazon instance.
> 
> Thanks,
> Sharath
> 
> On Thu, Mar 8, 2012 at 6:18 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> 
>> Your question is really unanswerable; there are about a zillion
>> factors that could influence the answer. I can index 5-7K docs/second,
>> so it's "efficient". Others can index only a fraction of that. It all
>> depends...
>> 
>> Try it and see is about the only way to answer.
>> 
>> Best
>> Erick
>> 
>> On Thu, Mar 8, 2012 at 1:35 PM, Sharath Jagannath
>> <shotsonclo...@gmail.com> wrote:
>>> Is indexing around 30 million documents in a single Solr instance
>>> efficient? Has anybody experimented with it? I am planning to use it for
>>> an autosuggest feature I am implementing, so I am expecting responses in
>>> a few milliseconds.
>>> Should I be looking at sharding?
>>> 
>>> Thanks,
>>> Sharath
>>
