>Greetings!
>
>I've been asked to do some indexing performance testing on Solr 1.3
>using large XML document data sets (10M-60M docs) with DIH versus SolrJ.
>
>Does anyone have any suggestions where I might find a good data set this
>size?
>
>I saw the wikipedia dump reference in the DIH wiki, but that is only in
>the 7M+ doc range.
>
>Any suggestions would be greatly appreciated.
>
>Thanks,
>
>Steve
How large should each document be? I quite often do testing using the
geonames_dd_dms_date_20081028 dataset from
http://earth-info.nga.mil/gns/html/namefiles.htm. It has 6.6M documents.
It is actually a CSV-style (character-separated) file, but it is trivial
to convert to XML.
-- 
===============================================================
Fergus McMenemie               Email:[EMAIL PROTECTED]
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================
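For anyone trying the same thing, the conversion Fergus mentions can be sketched in a few lines of Python. This is a minimal illustration, not the poster's actual tooling: the tab delimiter, header row, and field names are assumptions about the file layout, and a real run over 6.6M rows would stream to disk rather than build one string in memory.

```python
import csv
import io
from xml.sax.saxutils import escape

def csv_to_solr_xml(csv_text, delimiter="\t"):
    """Turn delimited rows (header on the first line) into a Solr <add> document.

    Each input row becomes one <doc>; each non-empty column becomes a
    <field name="...">...</field>, with XML special characters escaped.
    """
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    parts = ["<add>"]
    for row in reader:
        parts.append("  <doc>")
        for name, value in row.items():
            if value:  # skip empty columns rather than index empty fields
                parts.append('    <field name="%s">%s</field>'
                             % (escape(name), escape(value)))
        parts.append("  </doc>")
    parts.append("</add>")
    return "\n".join(parts)

# Hypothetical sample rows; the real GNS name files have many more columns.
sample = "id\tname\tlat\n1\tLondon\t51.5\n2\tParis\t48.9\n"
print(csv_to_solr_xml(sample))
```

The resulting `<add>` document can then be POSTed straight to Solr's update handler, which makes it easy to compare against DIH reading the delimited file directly.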