On 1/7/2015 2:26 PM, Joseph Obernberger wrote:
> Thank you Toke - yes - the data is indexed throughout the day. We are
> handling very few searches - probably 50 a day; this is an R&D system.
> Our HDFS cache, I believe, is too small at 10GBytes per shard. This
> comes out to 20GBytes of HDFS cache per physical machine plus about
> 10G each for the 2 JVMs running the shards. Each of those machines is
> also running other services which leaves very little RAM available for
> FS cache.
>
> Current parameters for running each shard are:
> JAVA_OPTS="-XX:MaxDirectMemorySize=10g -XX:+UseLargePages
> -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90
> -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC
> -XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=64m
> -XX:CMSFullGCsBeforeCompaction=1 -XX:+UseCMSInitiatingOccupancyOnly
> -XX:CMSInitiatingOccupancyFraction=70 -XX:CMSTriggerPermRatio=80
> -XX:CMSMaxAbortablePrecleanTime=6000 -XX:+CMSParallelRemarkEnabled
> -XX:+ParallelRefProcEnabled -XX:+AggressiveOpts
> -XX:ParallelGCThreads=7 -Xmx10752m"
>
> I'd love to try SSDs, but don't have the budget at present to go that
> route. I'd really like to get the HDFS option to work well as it
> reduces system complexity. It seems to me that if our HDFS cluster
> has lots/enough spindles, performance should be relatively good, as
> long as the OS can actually do some caching. We will be adding more
> HDFS nodes in the future, increasing spindle count and reducing the
> amount of data stored into Solr. When we redo our Solr Cloud, we will
> only run one shard per box, and supply more HDFS cache.
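A rough per-machine tally of the memory described in the quoted message, as a small shell sketch. It assumes that `-XX:MaxDirectMemorySize=10g` is what bounds the off-heap HDFS block cache for each shard, and that both JVMs can actually grow to their limits; the variable names are made up for illustration.

```shell
#!/bin/sh
# Per-shard figures taken from the quoted JAVA_OPTS, in MiB (JVM "g" = 1024 MiB).
heap_mb=10752          # -Xmx10752m
direct_mb=10240        # -XX:MaxDirectMemorySize=10g (assumed to bound the HDFS block cache)
shards_per_host=2      # two Solr JVMs per physical machine

# Worst case for one JVM: full heap plus full direct-memory allowance.
# (Real process size is somewhat higher: thread stacks, class metadata, etc.)
per_jvm_mb=$((heap_mb + direct_mb))

# Worst case for the whole machine, before the OS page cache or other services get anything.
per_host_mb=$((per_jvm_mb * shards_per_host))

echo "per JVM:  ${per_jvm_mb} MiB"
echo "per host: ${per_host_mb} MiB"
```

That works out to roughly 41 GiB committed to the two Solr JVMs on each machine; whatever the OS can use for caching is what remains after that and the other services running on the box.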
I can't say much about HDFS, because I've never used it. What I can say is that you want enough memory on the Solr machine for the index data to be fully cached there. If caching happens on the HDFS servers but the data then has to cross the network to reach Solr, the network becomes your bottleneck. A gigabit LAN (roughly 125 MB/s at best) is far slower than local RAM, and tends to be even slower than modern high-capacity disks, too.

When it comes to GC options, I do have recent and relevant experience. Your GC options look a lot like the CMS options that I have been advising for quite a while, but recently I have been getting better results with G1 and some specific tuning options on the latest Java versions.

http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

Thanks,
Shawn
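For comparison, a hedged sketch of what a G1-based JAVA_OPTS for one of these shards might look like. It is not copied from the wiki page linked above: the pause target, occupancy threshold, and region size are illustrative placeholders to be checked against real GC logs. The CMS-specific flags are dropped, and so are NewRatio/SurvivorRatio, because fixing the young generation size overrides G1's pause-time goal.

```shell
#!/bin/sh
# Hypothetical G1 variant of the quoted JAVA_OPTS; values are starting points, not recommendations.
# Heap and direct-memory sizes are carried over unchanged from the original settings.
JAVA_OPTS="-Xmx10752m -XX:MaxDirectMemorySize=10g -XX:+UseLargePages"

# Switch the collector to G1. The CMS* flags and the explicit young-generation
# sizing flags (NewRatio, SurvivorRatio, etc.) are intentionally left out.
JAVA_OPTS="$JAVA_OPTS -XX:+UseG1GC -XX:+ParallelRefProcEnabled"

# Illustrative tuning knobs (assumed values, not taken from the linked page).
JAVA_OPTS="$JAVA_OPTS -XX:MaxGCPauseMillis=250"
JAVA_OPTS="$JAVA_OPTS -XX:InitiatingHeapOccupancyPercent=75"
JAVA_OPTS="$JAVA_OPTS -XX:G1HeapRegionSize=8m"

export JAVA_OPTS
echo "$JAVA_OPTS"
```

Whether G1 actually beats CMS here depends on the Java version and the workload, so comparing GC logs from both collectors under the same indexing load is the way to decide.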