Hi.

I currently have an index which is 16GB per machine (8 machines = 128GB); the data itself is stored externally, not in the index. It is growing like crazy (we are indexing blogs, which are crazy by nature), and I have only allocated 2GB per machine to the Solr app since we are running some other stuff there in parallel.

Each doc is roughly the size of a blog post, no more than 20k. We currently have about 90M documents and the count is increasing rapidly.
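To put some rough numbers on it (back-of-envelope from the figures above, assuming index size scales more or less linearly with document count):

    128 GB / 90 M docs      ~= 1.4 KB of index per document
    90 M docs / 8 machines  ~= 11 M docs (16 GB of index) per machine
    1 G docs * 1.4 KB       ~= 1.4 TB of index total (~175 GB per machine if we stay at 8)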
So getting into the 1G+ document range is not too far away. Due to search performance I think I need to move these instances to dedicated index/search machines (or index on some machines and search on others).

Anyway, I would like some feedback on two things:

1. What is the most important hardware aspect when it comes to adding documents to the index and optimizing it?
1.1 Is it disk write throughput? (sequential or random I/O?)
1.2 Is it RAM?
1.3 Is it CPU?
My guess would be disk I/O. Right or wrong?

2. What is the most important hardware aspect when it comes to searching documents in my setup? (The result set is limited to the top 10 matches, with paging. We facet and sort on the publishedDate of the entry, which is memory intensive I presume.)
2.1 Is it disk read throughput? (sequential or random I/O?)
2.2 Is it RAM?
2.3 Is it CPU?
Here I have no clue, since the data might not fit into memory. What is the most important factor then: read performance while scanning the index, or CPU while comparing fields and collecting results?

What I'm trying to find out is what I can do to get the most bang for the buck with a limited (aren't we all limited?) budget.

Kindly
//Marcus

--
Marcus Herou
CTO and co-founder
Tailsweep AB
+46702561312
marcus.he...@tailsweep.com
http://www.tailsweep.com/
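P.S. In case it matters for the answers: the search in question 2 is basically the standard paging + sort + facet pattern, roughly like this (exact parameters and field names may differ a bit from what we actually send):

    /select?q=<user query>&start=0&rows=10&sort=publishedDate+desc&facet=true&facet.field=publishedDate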