Auto correct not good. Corrected below.

Bill Bell
Sent from mobile

> On Aug 2, 2014, at 11:11 AM, Bill Bell <billnb...@gmail.com> wrote:
>
> Seems way overkill. Are you using /get at all? If you need the docs
> available right away - why? How about after 30 seconds? How many docs do
> you get added per second during peak? Even Google has a delay when you
> do AdWords.
>
> One idea is to have an empty core that you insert into and then include
> as an extra shard in your queries. So one core would be called newdocs,
> and you would add that core into your query - see the sketch below.
> There are a couple of issues with this around scoring, but it works
> nicely. I would not even use SolrCloud for that core.
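> Roughly, the query side would look like this (host, port, and core
> names are made up for illustration; the shards param does a distributed
> search across plain cores, no SolrCloud needed):
>
>     http://host1:8983/solr/maincore/select?q=*:*
>         &shards=host1:8983/solr/maincore,host1:8983/solr/newdocs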
> Try to reduce the number of Java instances running. Reduce memory per
> instance and run one Java process per machine.
>
> Then, if you need faster availability of docs, you really need to ask
> why. Why not later? Do you need search, or just to show the user the
> info? If just for showing, maybe query an indexed table for the few not
> yet indexed? Or just store the info in a DB to show the user, and index
> later?
>
> Bill Bell
> Sent from mobile
>
>> On Aug 1, 2014, at 4:19 AM, "anand.mahajan" <an...@zerebral.co.in> wrote:
>>
>> Hello all,
>>
>> Struggling to get this going with SolrCloud -
>>
>> Requirement in brief:
>> - Ingest about 4M used-car listings a day and track all unique cars
>> for changes
>> - 4M automated searches a day (during the ingestion phase, to check
>> whether a doc already exists in the index - based on the values of 4-5
>> key fields - or is a new or updated version)
>> - Of the 4M, about 3M are updates to existing docs (for every non-key
>> value change)
>> - About 1M inserts a day (I'm assuming this many new listings come in
>> every day)
>> - Daily bulk CSV exports of the last 24 hours' inserts/updates, from
>> various snapshots of the data, to various clients
>>
>> My current deployment:
>> i) I'm using Solr 4.8 and have set up a SolrCloud on 6 dedicated
>> machines - 24 cores + 96 GB RAM each.
>> ii) There are over 190M docs in the SolrCloud at the moment (all
>> replicas together consume about 2340 GB of disk, which implies each
>> doc is about 5-8 KB in size).
>> iii) The docs are split into 36 shards, with 3 replicas per shard (108
>> Solr Jetty processes in all, spread over 6 servers, leaving 18 Jetty
>> JVMs running on each host).
>> iv) There are 60 fields per doc and all fields are stored at the
>> moment :( (the backend is only Solr at the moment).
>> v) The current shard/routing key is a combination of car year, make,
>> and some other car-level attributes that help classify the cars.
>> vi) We are mostly using the default Solr config as of now - no heavy
>> caching, as the search is pretty random in nature.
>> vii) Autocommit is on, with maxDocs = 1.
>>
>> Current throughput & issues:
>> With the above deployment the daily throughput is only about 1.5M on
>> average (inserts + updates) - falling way short of what is required.
>> Search is slow - some queries take about 15 seconds to return - and
>> since every insert depends on at least one search, that degrades the
>> write throughput too. (This is not a Solr issue - the app demands it.)
>>
>> Questions:
>>
>> 1. Autocommit with maxDocs = 1 - is that a goof-up, and could it be
>> slowing down indexing? It's a requirement that all docs are available
>> as soon as they are indexed.
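>> (For reference, a near-real-time setup in solrconfig.xml, inside
>> <updateHandler>, might look something like the following - the
>> intervals are illustrative, not a recommendation:
>>
>>     <autoCommit>
>>       <maxTime>60000</maxTime>
>>       <openSearcher>false</openSearcher>
>>     </autoCommit>
>>     <autoSoftCommit>
>>       <maxTime>1000</maxTime>
>>     </autoSoftCommit>
>>
>> Docs would become searchable within about a second via cheap soft
>> commits, while hard commits flush to disk only once a minute.)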
>> 2. Would I have been better served by deploying a single Jetty Solr
>> instance per server, with multiple cores running inside? The servers
>> do start to swap after a couple of days of Solr uptime - right now we
>> reboot the entire cluster every 4 days.
>>
>> 3. The routing key is not balancing the docs effectively across the
>> available shards - a few shards have just about 2M docs while others
>> have over 11M. Shall I split the larger shards? I do not have more
>> nodes/hardware to allocate to this deployment - in that case, would
>> splitting the large shards still give better read/write throughput?
>>
>> 4. To remain on the current hardware - would it help if I removed 1
>> replica from each shard? But that would mean that when just 1 node for
>> a shard goes down, only 1 live node would be left, and it would not
>> serve write requests.
>>
>> 5. Also, is there a way to control where the split-shard replicas go?
>> Is there a pattern/rule that Solr follows when it creates replicas for
>> split shards?
>>
>> 6. I read somewhere that creating a core costs the OS one thread and a
>> file handle. Since a core represents an index in its entirety, would
>> it not be allocated the configured number of write threads (the
>> default being 8)?
>>
>> 7. The ZooKeeper ensemble is deployed on the same boxes as the Solr
>> instances - would separating the ZK cluster out help?
>>
>> Sorry for the long thread - I thought of asking these all at once
>> rather than posting separate ones.
>>
>> Thanks,
>> Anand
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/SolrCloud-Scale-Struggle-tp4150592.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
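On questions 3 and 5 - shard splitting goes through the Collections API.
A minimal sketch, with a made-up collection name:

    http://host1:8983/solr/admin/collections?action=SPLITSHARD
        &collection=usedcars&shard=shard1

If I remember right, the sub-shard cores get created on the node hosting
the parent shard's leader, so the split by itself won't move data onto
new hardware.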