Obviously, as the number of documents increases, the index size must
increase to some degree -- I think linearly? But what index size will
result for 7M documents over 50K words, where we're talking about just 2
fields per doc: one id field and one OCR field of ~1.4M? Ballpark?
Regarding single-word queries, do you think, say, 0.5 sec/query to
return 7M score-ranked IDs is possible/reasonable in this scenario?
The only real advice I can add is to give it a try. If you have test
data, try testing it and see what happens. 0.5 sec queries are likely
possible with the right hardware and settings -- but run a few tests
before signing any contracts ;) If the index is really large, SOLR-303
(distributed search) should help make it more manageable.
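Since the advice is to measure rather than guess, here is a minimal Python sketch of a timing harness for single-word queries. It assumes a Solr select handler reachable over HTTP at a base URL you supply (something like `http://localhost:8983/solr/select` is an assumption about your install), and the helper names `build_select_url`, `solr_select` and `time_queries` are hypothetical. It asks only for `id` and `score` through the standard `q`, `fl` and `rows` parameters, and reports the median, the worst case, and the fraction of queries that finish within the 0.5 s target from the question. Repeating the same query may be answered from Solr's caches, so a list of distinct test words gives a more honest number.

```python
import statistics
import time
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterable


def build_select_url(base_url: str, query: str, rows: int = 10) -> str:
    """Build a select URL that returns only the id field and the relevance score."""
    params = urllib.parse.urlencode({"q": query, "fl": "id,score", "rows": rows})
    return f"{base_url}?{params}"


def solr_select(base_url: str, query: str, rows: int = 10) -> bytes:
    """Send one query over HTTP and return the raw response body."""
    with urllib.request.urlopen(build_select_url(base_url, query, rows)) as resp:
        return resp.read()


def time_queries(run_query: Callable[[str], object],
                 queries: Iterable[str],
                 target: float = 0.5) -> Dict[str, float]:
    """Time each query once (wall clock, in seconds) and summarize the results."""
    timings = []
    for q in queries:
        start = time.perf_counter()
        run_query(q)
        timings.append(time.perf_counter() - start)
    if not timings:
        raise ValueError("need at least one query to time")
    return {
        "count": len(timings),
        "median": statistics.median(timings),
        "max": max(timings),
        # Fraction of queries that finished within the target latency.
        "frac_under_target": sum(t <= target for t in timings) / len(timings),
    }
```

For example, `time_queries(lambda q: solr_select(url, q, rows=1000), terms)` with a list of distinct test words. Because the question is about getting back ranked IDs for the whole 7M-document set, it is worth timing a small and a very large `rows` value separately, since the response grows with `rows`.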
Let us know how things go and post your data to:
http://wiki.apache.org/solr/SolrPerformanceData
ryan