You said that the data expires, but you haven't said how many docs you need to host at a time. At 10M/second inserts you'll... need a boatload of shards. All of the conversation about one beefy machine vs. lots of not-so-beefy machines should wait until you answer that question. For instance, indexing some _very_ simple documents on my laptop can hit 10,000 docs/second/shard, so you're talking 1,000 shards here. Indexing more complex docs I might get 1,000 docs/second/shard, so then you'd need 10,000 shards. Don't take these as hard numbers; I'm just trying to emphasize that you'll need to do scaling exercises to see if what you want to do is reasonable given your constraints.
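A minimal sketch of that scaling arithmetic, in case it helps to have it parameterized. Every number here is an illustrative placeholder (the retention window especially is a made-up assumption); the point is to swap in rates measured on your own hardware and schema.

// Back-of-envelope sizing sketch. All inputs are illustrative placeholders;
// measure docs/second/shard on a real test shard with your real documents.
public class SolrSizingSketch {
    public static void main(String[] args) {
        double targetDocsPerSec = 10_000_000;        // stated ingest target
        double measuredDocsPerSecPerShard = 10_000;  // measure this, don't guess
        int replicationFactor = 1;                   // >1 multiplies indexing work
        double retentionSeconds = 3600;              // hypothetical 1-hour expiry

        double leaderShards = Math.ceil(targetDocsPerSec / measuredDocsPerSecPerShard);
        double totalCores = leaderShards * replicationFactor;
        double docsHostedAtOnce = targetDocsPerSec * retentionSeconds;

        System.out.printf("Leader shards: %.0f%n", leaderShards);
        System.out.printf("Total cores (shards x replicas): %.0f%n", totalCores);
        System.out.printf("Docs hosted at steady state: %.3e%n", docsHostedAtOnce);
    }
}

With these placeholder values it prints 1,000 leader shards and 3.6e10 resident docs, which is exactly the kind of answer that shows whether the plan is reasonable before any hardware is bought.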
If those 10M docs/second are bursty and you can stand some latency, that's one set of considerations; if it's steady-state, it's another. In either case you need some _serious_ design work before you go forward. And then you want to facet (fq clauses aren't nearly as expensive) and want 2-second commit intervals. You _really_ need to stand up some test systems and see what performance you can get before launching off on trying to do the whole thing. Fortunately, you can stand up, say, a 4-shard system, tune it, drive it as fast as you possibly can, and extrapolate from there.

But to reiterate: this is a very high indexing rate that very few organizations have attempted. You _really_ need to do a proof of concept, _then_ plan. Here's the long form of this argument:
https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Best,
Erick

On Fri, Dec 16, 2016 at 5:19 AM, GW <thegeofo...@gmail.com> wrote:
> The layer 2 bridge SAN is just for my Apache/apps on Conga so they can be
> spun up on any host with a static IP. This has nothing to do with Solr,
> which is running on plain old hardware.
>
> SolrCloud is on a real cluster, not on a SAN.
>
> The bit about dead with no error: I got this from a post I made asking
> about the best way to deploy apps. I was shown some code on making your
> app ZooKeeper aware. I am just getting to this, so I'm talking from my
> ass. A ZK-aware program will have a list of nodes ready for business
> versus a plain old round robin. If data on a machine is corrupted you can
> get 0 docs found, while a ZK-aware app will know that node is shite.
>
> On 16 December 2016 at 07:20, Dorian Hoxha <dorian.ho...@gmail.com> wrote:
>
>> On Fri, Dec 16, 2016 at 12:39 PM, GW <thegeofo...@gmail.com> wrote:
>>
>> > Dorian,
>> >
>> > From my reading, my belief is that you just need some beefy machines
>> > for your ZooKeeper ensemble so they can think fast.
>>
>> ZooKeeper needs to think fast enough for cluster state changes, so I
>> think it scales with the number of machines/collections/shards and not
>> with the number of documents.
>>
>> > After that your issues are complicated by drive I/O, which I believe is
>> > solved by using shards. If you have a collection running on top of a
>> > single drive array, it should not compare to writing to a dozen drive
>> > arrays. So a whole bunch of light-duty machines that have a decent
>> > amount of memory and are barely able to process faster than their
>> > drive I/O will serve you better.
>> >
>> My dataset will be smaller than total memory, so I expect no query to
>> hit disk.
>>
>> > I think the Apache big data mandate was to be horizontally scalable to
>> > infinity with cheap consumer hardware. In my mind's eye you are not
>> > going to get crazy input rates without a big horizontal drive system.
>> >
>> There is overhead with small machines, and very big machines get pricey,
>> so something in the middle: a small cluster of big machines or a big
>> cluster of small machines.
>>
>> > I'm in the same boat. All the scaling and rollout documentation seems
>> > to reference the Witch Doctor's secret handbook.
>> >
>> > I just started making my applications ZK aware and am really just
>> > starting to understand the architecture. After a whole year I still
>> > feel weak, while at the same time I have traveled far. I still feel
>> > like an amateur.
>> >
>> > My plans are to use bridge tools in Linux so all my machines are
>> > sitting on the switch with layer 2.
>> > Then use Conga to monitor which apps need to be running. If a server
>> > dies, its apps are spun up on one of the other servers using the
>> > original IP and MAC address through a bridge firewall gateway, so
>> > there is no hold-up with MAC phreaking like on layer 3. Layer 3 does
>> > not like to see a route change with a MAC address. My apps will be on
>> > a SAN ~ data on as many shards/machines as financially possible.
>> >
>> By Conga you mean https://sourceware.org/cluster/conga/spec/ ?
>> Also, the SAN may/will suck, like someone answered in your thread.
>>
>> > I was going to put a bunch of Apache web servers in round robin to
>> > talk to Solr but discovered that a Solr node can be dead and not
>> > report errors.
>> >
>> Please explain more about "dead but no error".
>>
>> > It's all rough at the moment, but it makes total sense to send Solr
>> > requests based on what ZK says is available versus a round robin.
>> >
>> Yes, like I and another commenter wrote on your thread.
>>
>> > Will keep you posted on my rollout if you like.
>> >
>> > Best,
>> >
>> > GW
>> >
>> > On 16 December 2016 at 03:31, Dorian Hoxha <dorian.ho...@gmail.com> wrote:
>> >
>> > > Hello searchers,
>> > >
>> > > I'm researching Solr for a project that would require max inserts
>> > > (10M/s) and some heavy facet+fq on top of that, though at low QPS.
>> > >
>> > > And I'm trying to find blogs/slides where people have used some big
>> > > machines instead of hundreds of small ones.
>> > >
>> > > 1. The largest I've found is this
>> > > <https://sbdevel.wordpress.com/2016/11/30/70tb-16b-docs-4-machines-1-solrcloud/>
>> > > with 16 cores + 384GB RAM, but they were using 25 (!) Solr 4
>> > > instances per server, which seems wasteful to me?
>> > >
>> > > I know that one Solr instance can have a max heap of ~29-30GB because
>> > > GC is wasteful/sucks after that, and you should leave the rest of the
>> > > memory to the OS for the file cache.
>> > > 2. But do you think one instance will be able to fully use a
>> > > 256GB/20-core machine?
>> > >
>> > > 3. Care to share your findings/links on big-machine clusters?
>> > >
>> > > Thank you
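Since both this thread and the linked one keep coming back to sending requests based on what ZK says is available versus a plain round robin, here is a minimal SolrJ sketch of that idea and nothing more: CloudSolrClient watches the cluster state in ZooKeeper and stops routing to replicas that ZK no longer lists as live, which covers the "dead but no error" case a dumb round robin misses. The ZooKeeper hosts, collection name, and field names are made-up placeholders, and the Builder API shifts a bit between SolrJ releases, so treat this as a shape rather than a drop-in.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrInputDocument;

public class ZkAwareClientSketch {
    public static void main(String[] args) throws Exception {
        // The client is given the ZK ensemble, not a fixed list of Solr URLs,
        // and keeps its view of live nodes and leaders current from cluster state.
        try (CloudSolrClient client = new CloudSolrClient.Builder()
                .withZkHost("zk1:2181,zk2:2181,zk3:2181")   // placeholder ensemble
                .build()) {
            client.setDefaultCollection("events");          // placeholder collection

            // Index a document; commitWithin keeps commits batched rather than
            // forcing one per add, roughly matching a ~2-second visibility target.
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");
            doc.addField("type_s", "click");
            client.add(doc, 2000);

            // Facet + fq query: the fq restricts the doc set cheaply, the facet
            // does the heavier counting work.
            SolrQuery q = new SolrQuery("*:*");
            q.addFilterQuery("type_s:click");
            q.addFacetField("type_s");
            QueryResponse rsp = client.query(q);
            System.out.println("Hits: " + rsp.getResults().getNumFound());
            System.out.println("Facets: " + rsp.getFacetField("type_s").getValues());
        }
    }
}

The same client can be pointed at a small test collection first (say the 4-shard setup suggested above), driven as hard as possible, and the numbers extrapolated from there.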