A spreadsheet was mentioned earlier in this list. Given that you already have 
documents in an index, you could extract the needed information from your 
index and feed it into the spreadsheet, and it will probably give you a rough 
approximation of the hardware you’ll be needing. Also, if I’m not mistaken, no 
SolrCloud estimate is provided by this “tool”.
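For what it's worth, the core arithmetic such a spreadsheet performs can be sketched in a few lines. A rough Python sketch, assuming index size scales roughly linearly with document count (a simplification), using the 10M-docs / ~8 GB figures reported later in this thread:

```python
def project_index_size_gb(current_docs, current_size_gb, target_docs):
    """Linear extrapolation from an existing index -- a simplification,
    since per-doc overhead and term dictionaries don't scale perfectly."""
    return current_size_gb * (target_docs / current_docs)

# Figures from this thread: 10 million docs -> ~8 GB on disk.
projected = project_index_size_gb(10_000_000, 8.0, 100_000_000)
print(projected)  # ~80 GB to keep in OS file cache across the cluster
```

Real sizing also depends on field types, stored fields, and merge state, so treat the output as a starting point for proof-of-concept testing, not a final answer.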

Greetings!

On Jan 28, 2014, at 11:02 PM, Susheel Kumar <susheel.ku...@thedigitalgroup.net> 
wrote:

> Thanks, Jack. That helps.
> 
> -----Original Message-----
> From: Jack Krupansky [mailto:j...@basetechnology.com] 
> Sent: Tuesday, January 28, 2014 8:01 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr server requirements for 100+ million documents
> 
> Lucene and Solr work best if the full index can be cached in OS memory. 
> Sure, Lucene/Solr does work properly once the index no longer fits, but 
> performance will drop off.
> 
> I would say that you could fit 100 million moderate-size documents on a 
> single Solr server - provided that you give the OS enough RAM for the full 
> Lucene index. That said, if you want to configure a SolrCloud cluster with 
> shards, you can use more modest, commodity servers with less RAM, provided 
> each server still fits its fraction of the total Lucene index in that 
> server's OS memory (file cache).
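To make that concrete, a back-of-the-envelope shard count follows from the total index size and the RAM each node can dedicate to the OS file cache. A minimal Python sketch (the 80 GB and 24 GB figures below are illustrative assumptions, not measurements):

```python
import math

def shards_needed(total_index_gb, cache_ram_per_node_gb):
    """Smallest number of shards such that each node's slice of the
    index fits in the RAM it can spare for the OS file cache."""
    return math.ceil(total_index_gb / cache_ram_per_node_gb)

# e.g. an 80 GB index on commodity nodes with ~24 GB free for the file cache
print(shards_needed(80, 24))  # 4 shards
```

Replicas for query load would multiply the node count on top of this; only proof-of-concept testing tells you how many.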
> 
> You may also need to add replicas for each shard to accommodate query load - 
> proof-of-concept testing is needed to verify that. It is worth noting that 
> sharding can improve total query performance, since each node searches only 
> a fraction of the total data and those searches are done in parallel (since 
> they are on different machines).
> 
> -- Jack Krupansky
> 
> -----Original Message-----
> From: Susheel Kumar
> Sent: Sunday, January 26, 2014 10:54 AM
> To: solr-user@lucene.apache.org
> Subject: RE: Solr server requirements for 100+ million documents
> 
> Thank you, Erick, for your valuable inputs. Yes, we have to re-index data 
> again & again. I'll look into the possibility of tuning db access.
> 
> On SolrJ and automating the indexing (incremental as well as one-time), I 
> want to get your opinion on the two points below. We will be indexing 
> separate sets of tables with similar data structures.
> 
> - Should we use SolrJ and write Java programs that can be scheduled to 
> trigger indexing on demand or on a schedule?
> 
> - Is using SolrJ a better idea even for searching than using SolrNet? Our 
> frontend is in .NET, so we started using SolrNet, but I am afraid that down 
> the road, when we scale and support SolrCloud, SolrJ may be the better 
> choice.
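On the first point, the scheduling piece itself is small in any language. A minimal Python sketch where the actual indexing call (a DataImportHandler trigger, a SolrJ program, or a curl post) is a placeholder passed in as `job`:

```python
import threading

def schedule_indexing(job, interval_seconds, runs):
    """Run `job` every `interval_seconds`, `runs` times -- a stand-in
    for a cron-style trigger around a real indexing client."""
    done = threading.Event()
    state = {"left": runs}
    def tick():
        job()                      # the real reindex call goes here
        state["left"] -= 1
        if state["left"] > 0:
            threading.Timer(interval_seconds, tick).start()
        else:
            done.set()
    threading.Timer(interval_seconds, tick).start()
    return done

calls = []
finished = schedule_indexing(lambda: calls.append("reindex"), 0.01, 3)
finished.wait(timeout=2)
print(len(calls))  # 3
```

In practice most shops put the trigger in cron or a job scheduler and keep the indexing program itself stateless; the language of the trigger matters much less than the language of the client.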
> 
> 
> Thanks
> Susheel
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Sunday, January 26, 2014 8:37 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr server requirements for 100+ million documents
> 
> Dumping the raw data would probably be a good idea. I guarantee you'll be 
> re-indexing the data several times as you change the schema to accommodate 
> different requirements...
> 
> But it may also be worth spending some time figuring out why the DB access is 
> slow. Sometimes one can tune that.
> 
> If you go the SolrJ route, you also have the possibility of setting up N 
> clients to work simultaneously, sometimes that'll help.
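The N-clients idea can be sketched as below; `send_batch` is a stand-in for whatever actually posts to Solr (SolrJ, SolrNet, curl), and the batch size of 1000 is just the arbitrary figure mentioned elsewhere in this thread:

```python
from concurrent.futures import ThreadPoolExecutor
import itertools

def index_in_parallel(docs, batch_size, workers, send_batch):
    """Split `docs` into fixed-size batches and push them from
    `workers` threads; `send_batch` stands in for the real client call."""
    it = iter(docs)
    batches = iter(lambda: list(itertools.islice(it, batch_size)), [])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sent = list(pool.map(send_batch, batches))
    return sum(sent)

# Stub sender that just reports how many docs it was handed.
total = index_in_parallel(range(10_000), 1000, 4, lambda batch: len(batch))
print(total)  # 10000
```

With a slow data source, the win usually comes from parallelizing acquisition as much as from parallelizing the posts themselves.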
> 
> FWIW,
> Erick
> 
> On Sat, Jan 25, 2014 at 11:06 PM, Susheel Kumar 
> <susheel.ku...@thedigitalgroup.net> wrote:
>> Hi Kranti,
>> 
>> Attached are the solrconfig.xml & schema.xml for review. I did run indexing 
>> with just a few fields (5-6 fields) in schema.xml, keeping the same db 
>> config, but indexing is still taking almost the same time (on average 1 
>> million records per hour), which confirms that the bottleneck is in the 
>> data acquisition, which in our case is the Oracle database. I am thinking 
>> of not using DataImportHandler/JDBC to get data from Oracle, but rather 
>> dumping the data from Oracle using SQL*Loader and then indexing it. Any 
>> thoughts?
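If you go the dump-and-index route, the glue code is small. A hypothetical Python sketch that turns a CSV-style dump into Solr-ready documents, dropping empty columns since (as noted later in the thread) the fields are sparsely populated:

```python
import csv, io

def csv_to_docs(csv_text):
    """Turn a CSV dump into dicts suitable for posting to Solr,
    omitting empty columns so sparse fields aren't sent at all."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        yield {k: v for k, v in row.items() if v}

dump = "id,title,author\n1,Solr at scale,\n2,,Kumar\n"
docs = list(csv_to_docs(dump))
print(docs)  # [{'id': '1', 'title': 'Solr at scale'}, {'id': '2', 'author': 'Kumar'}]
```

Decoupling the dump from the indexing also lets you re-index repeatedly from the flat files without hitting Oracle again.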
>> 
>> Thnx
>> 
>> -----Original Message-----
>> From: Kranti Parisa [mailto:kranti.par...@gmail.com]
>> Sent: Saturday, January 25, 2014 12:08 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr server requirements for 100+ million documents
>> 
>> Can you post the complete solrconfig.xml and schema.xml files, so we can 
>> review all of the settings that would impact your indexing performance?
>> 
>> Thanks,
>> Kranti K. Parisa
>> http://www.linkedin.com/in/krantiparisa
>> 
>> 
>> 
>> On Sat, Jan 25, 2014 at 12:56 AM, Susheel Kumar < 
>> susheel.ku...@thedigitalgroup.net> wrote:
>> 
>>> Thanks, Svante. Your indexing speed using the db seems really fast.
>>> Can you please provide some more detail on how you are indexing db 
>>> records? Is it through DataImportHandler? And what database? Is it a 
>>> local db? We are indexing around 70 fields (60 multivalued), but data 
>>> is not always populated in all fields. The average document size is 
>>> 5-10 KB.
>>> 
>>> -----Original Message-----
>>> From: saka.csi...@gmail.com [mailto:saka.csi...@gmail.com] On Behalf 
>>> Of svante karlsson
>>> Sent: Friday, January 24, 2014 5:05 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Solr server requirements for 100+ million documents
>>> 
>>> I just indexed 100 million db docs (records) with 22 fields (4
>>> multivalued) in 9524 sec using libcurl.
>>> 11 million took 763 seconds, so the speed drops somewhat with 
>>> increasing db size.
>>> 
>>> We write 1000 docs (just an arbitrary number) in each request from 
>>> two threads. If you will be using SolrCloud, you will want more 
>>> writer threads.
>>> 
>>> The hardware is a single cheap HP DL320e Gen8 v2 1P E3-1220v3 with 
>>> one SSD and 32GB RAM, and Solr runs on Ubuntu 13.10 inside an ESXi 
>>> virtual machine.
>>> 
>>> /svante
>>> 
>>> 
>>> 
>>> 
>>> 2014/1/24 Susheel Kumar <susheel.ku...@thedigitalgroup.net>
>>> 
>>>> Thanks, Erick for the info.
>>>> 
>>>> For indexing, I agree that most of the time is consumed in data 
>>>> acquisition, which in our case is from the database. Currently we 
>>>> are using the manual process, i.e. the Solr dashboard Data Import, 
>>>> but we are now looking to automate it. How do you suggest we 
>>>> automate the indexing part? Do you recommend using SolrJ, or should 
>>>> we try to automate using curl?
>>>> 
>>>> 
>>>> -----Original Message-----
>>>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>>>> Sent: Friday, January 24, 2014 2:59 PM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: Solr server requirements for 100+ million documents
>>>> 
>>>> Can't be done with the information you provided, and can only be 
>>>> guessed at even with more comprehensive information.
>>>> 
>>>> Here's why:
>>>> 
>>>> 
>>>> http://searchhub.org/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>>>> 
>>>> Also, at a guess, your indexing speed is so slow due to data 
>>>> acquisition; I rather doubt you're being limited by raw Solr indexing.
>>>> If you're using SolrJ, try commenting out the server.add() bit and 
>>>> running again. My guess is that your indexing speed will be almost 
>>>> unchanged, in which case it's the data acquisition process where you 
>>>> should concentrate your efforts. As a comparison, I can index 11M 
>>>> Wikipedia docs on my laptop in 45 minutes without any attempts at 
>>>> parallelization.
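Erick's experiment amounts to a tiny timing harness: measure acquisition and the add/send step separately and compare. A Python sketch with stub functions (real code would fetch from Oracle and post to Solr):

```python
import time

def profile_pipeline(fetch, send, n):
    """Time data acquisition and indexing separately, mirroring the
    'comment out server.add()' experiment from this thread."""
    t0 = time.perf_counter()
    rows = [fetch(i) for i in range(n)]      # acquisition phase
    acquisition = time.perf_counter() - t0
    t0 = time.perf_counter()
    for row in rows:                         # indexing/post phase
        send(row)
    indexing = time.perf_counter() - t0
    return acquisition, indexing

# Stubs stand in for the real DB fetch and Solr add; whichever phase
# dominates is where optimization effort should go.
acq, idx = profile_pipeline(lambda i: {"id": i}, lambda doc: None, 1000)
print(acq >= 0 and idx >= 0)  # True
```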
>>>> 
>>>> 
>>>> Best,
>>>> Erick
>>>> 
>>>> On Fri, Jan 24, 2014 at 12:10 PM, Susheel Kumar < 
>>>> susheel.ku...@thedigitalgroup.net> wrote:
>>>>> Hi,
>>>>> 
>>>>> Currently we are indexing 10 million documents from a database (10 
>>>>> db data entities) & the index size is around 8 GB on a Windows 
>>>>> virtual box. Indexing in one shot takes 12+ hours, while indexing 
>>>>> in parallel in separate cores & merging them together takes 4+ 
>>>>> hours.
>>>>> 
>>>>> We are looking to scale to 100+ million documents and are looking 
>>>>> for recommendations on server requirements for a production 
>>>>> environment, on the parameters below. There can be 200+ users 
>>>>> performing searches at the same time.
>>>>> 
>>>>> - No. of physical servers (considering SolrCloud)
>>>>> - Memory requirement
>>>>> - Processor requirement (# cores)
>>>>> - Linux as OS as opposed to Windows
>>>>> 
>>>>> Thanks in advance.
>>>>> Susheel
>>>>> 
>>>> 
>>> 
> 
