Greetings! I've been asked to do some indexing performance testing on Solr 1.3 using large XML document data sets (10M-60M docs), comparing DIH with SolrJ.
Does anyone have any suggestions on where I might find a good data set of this size? I saw the Wikipedia dump reference in the DIH wiki, but that is only in the 7M+ doc range. Any suggestions would be greatly appreciated.

Thanks,
Steve