Hi Fergus,

Do the 6.6M docs reside on a single box (node) or on multiple boxes? Do you use distributed search?
Regards,
Sourav

----- Original Message -----
From: Fergus McMenemie <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
Sent: Wed Nov 05 08:21:45 2008
Subject: Re: Large Data Set Suggestions

>Greetings!
>
>I've been asked to do some indexing performance testing on Solr 1.3
>using large XML document data sets (10M-60M docs) with DIH versus SolrJ.
>
>
>Does anyone have any suggestions where I might find a good data set this
>size?
>
>I saw the wikipedia dump reference in the DIH wiki, but that is only in
>the 7M+ doc range.
>
>Any suggestions would be greatly appreciated.
>
>Thanks,
>
>Steve

How large should each document be?

I quite often do testing using the geonames_dd_dms_date_20081028 dataset
from http://earth-info.nga.mil/gns/html/namefiles.htm. It has 6.6M
documents. It is actually a CSV file, but it is trivial to convert to XML.

--
===============================================================
Fergus McMenemie               Email:[EMAIL PROTECTED]
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================
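Fergus mentions that the geonames file is trivial to convert to XML. Here is a minimal sketch of that conversion into Solr's XML update format (`<add><doc><field name="...">`). It assumes the file is delimited text with a header row, uses tab as the default delimiter, and treats a column named `UFI` as the unique key. All of these are assumptions; the column names used here are hypothetical, so check them against the actual file. The function yields one fragment per row, so a multi-million-row file can be streamed to disk instead of being built in memory.

```python
import csv
from xml.sax.saxutils import escape, quoteattr


def solr_add_xml(lines, delimiter="\t", id_column="UFI"):
    """Yield a Solr XML <add> message, one <doc> per input row.

    Assumptions (hypothetical): the first line is a header of column names,
    and id_column holds a unique key that is emitted as Solr's "id" field.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    yield "<add>\n"
    for row in reader:
        # Emit the unique key first, mapped to the "id" field.
        fields = ['<field name="id">%s</field>' % escape(row[id_column])]
        for name, value in row.items():
            # Skip overflow cells (name None), the key column, and empty values.
            if name is None or name == id_column or not value:
                continue
            fields.append("<field name=%s>%s</field>" % (quoteattr(name), escape(value)))
        yield "<doc>" + "".join(fields) + "</doc>\n"
    yield "</add>\n"
```

For a real run you would pass an open file object, for example `out.writelines(solr_add_xml(open(path, encoding="utf-8")))`, and post the output in batches. The sketch does not strip characters that are illegal in XML 1.0, so messy input may need an extra cleaning step.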