I did read to the end—and though I don't have anything useful to contribute, I appreciate the discussion.
One random thought that's not directly related: I wonder if people who aren't search/database aficionados would appreciate adding Elasticsearch, Lucene, CirrusSearch, nodes, shards, primaries and replicas to the Glossary. They are all things that can be looked up, but a quick description in the glossary could be helpful as a refresher. It's especially helpful when you know the concepts but have forgotten which name is which—which one's recall and which one's precision? Which is a shard and which is a node?

Thanks!
—Trey

Trey Jones
Software Engineer, Discovery
Wikimedia Foundation

On Fri, Sep 16, 2016 at 11:51 AM, David Causse <[email protected]> wrote:
> On 16/09/2016 at 16:28, Guillaume Lederrey wrote:
>
>> [...]
>>
>> The enwiki_content example:
>>
>> The enwiki_content index is configured to have 6 shards and 3 replicas,
>> for a total of 24 shards. It also has the additional constraint
>> that there is at most 1 enwiki_content shard per node. This ensures a
>> maximum spread of enwiki_content shards over the cluster. Since
>> enwiki_content is one of the indices with the most traffic, this ensures
>> that the load is well distributed over the cluster.
>>
> side note: in mediawiki config we updated the shard count to 6 for
> enwiki_content and set the replica count to 4 for eqiad. This is still not
> effective since we haven't rebuilt the eqiad enwiki index yet.
> In short:
> - eqiad effective settings for enwiki: 7*(3r+1p) => 28 shards
> - eqiad future settings after a reindex: 6*(4r+1p) => 30 shards
> - codfw for enwiki: 6*(3r+1p) => 24 shards
>
>> Now the bad news: for codfw, which is a 24 node cluster, it means that
>> reaching this perfect equilibrium of 1 shard per node is a serious
>> challenge if you take into account the other constraint in place. Even
>> after relaxing the constraint to 2 enwiki shards per node, we have seen
>> unassigned shards during an elasticsearch upgrade.
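As a sketch only: the configuration Guillaume describes above maps onto standard Elasticsearch index settings roughly like this (the actual values live in the mediawiki/CirrusSearch config, and the host/index names here are placeholders; `total_shards_per_node` is the stock Elasticsearch setting for the "at most 1 shard per node" constraint):

```shell
# Hypothetical sketch of the enwiki_content settings discussed above,
# NOT the real CirrusSearch configuration.
curl -XPUT 'http://localhost:9200/enwiki_content' -d '{
  "settings": {
    "index.number_of_shards": 6,
    "index.number_of_replicas": 3,
    "index.routing.allocation.total_shards_per_node": 1
  }
}'
```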
>>
>> Potential improvements:
>>
>> While ensuring that a large index has a number of shards close to the
>> number of nodes in the cluster allows for optimally spreading load
>> over the cluster, it degrades fast if all the stars are not aligned
>> perfectly. There are 2 opposite solutions:
>>
>> 1) decrease the number of shards to leave some room to move them around
>> 2) increase the number of shards and allow multiple shards of the same
>> index to be allocated on the same node
>>
>> 1) is probably impractical on our large indices; enwiki_content shards
>> are already ~30GB, which makes it impractical to move them around
>> during relocation and recovery.
>>
>
> I'm leaning towards 1; our shards are very big, I agree, and it takes a
> non-negligible time to move them around.
> But we also noticed that the number of indices/shards is a source of
> pain for the master.
> I don't think we should reduce the number of primary shards; I'm more for
> reducing the number of replicas.
> Historically I think enwiki has been set to 3 replicas for performance
> reasons, not really for HA reasons.
> Now that we have moved all the prefix queries to a dedicated index, I'd be
> curious to see if we can serve fulltext queries for enwiki with only 2
> replicas: 7*(2r+1p) => 21 shards total.
> I'd be curious to see what the load would look like if we isolated
> autocomplete queries.
> I think option 1 is more about how to trade HA vs shard count vs perf.
> Another option would be 10*(1r+1p) => 20 smaller shards; we halve the
> total size required to store enwiki. But losing only 2 nodes could then
> cause enwiki to go red (partial results), vs 3 nodes today.
>
>
>> 2) is probably our best bet. More, smaller shards mean that a single
>> query's load will be spread over more nodes, potentially improving
>> response time. Increasing the number of shards for enwiki_content from
>> 6 to 20 (total shards = 80) means we have 80 / 24 = 3.3 shards per node.
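All the shard counts traded back and forth in this thread come from the same arithmetic: primaries × (replicas + 1 primary copy). A quick sketch to sanity-check the numbers quoted above:

```python
def total_shards(primaries, replicas):
    """Total shard copies for an index: each primary plus its replicas."""
    return primaries * (replicas + 1)

# Configurations discussed in the thread:
print(total_shards(7, 3))    # eqiad today, 7*(3r+1p) -> 28
print(total_shards(6, 4))    # eqiad after reindex, 6*(4r+1p) -> 30
print(total_shards(7, 2))    # 2-replica option, 7*(2r+1p) -> 21
print(total_shards(10, 1))   # 10*(1r+1p) -> 20
print(total_shards(20, 3))   # 20 primaries, 3 replicas -> 80
print(round(total_shards(20, 3) / 24, 1))  # per node on 24 nodes -> 3.3
```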
>> Removing the 1 shard per node constraint and letting elasticsearch
>> spread the shards as best as it can means that in case 1 node is
>> missing, or during an upgrade, we still have the ability to move
>> shards around. Increasing this number even more might help keep the
>> load evenly spread across the cluster (the difference between 8 and 9
>> shards per node is smaller than the difference between 3 and 4 shards
>> per node).
>>
>
> We should be cautious here concerning response times; there are steps in
> a lucene query that do not really benefit from having more shards. Only
> collecting docs will really benefit from this; the rewrite (scanning the
> lexicon) and rescoring (sorting the topN and then rescoring) will add
> more work if done on more shards. But we can certainly reduce the rescore
> window with more primaries.
> Could we estimate how many shards per node we will have in the end with
> this strategy?
> Today we have ~370 shards/node on codfw vs ~300 for eqiad.
>
>
>> David is going to do some tests to validate that those smaller shards
>> don't impact the scoring (smaller shards mean worse frequency
>> analysis).
>>
> Yes, I'll try a 20-primaries enwiki index and see how it works.
>
>
>> I probably forgot a few points, but this email is more than long
>> enough already...
>>
>> Thanks to all of you who kept reading until the end!
>>
>
> Thanks for writing it!
>
>
>> MrG
>>
>>
>> [1] https://www.elastic.co/guide/en/elasticsearch/reference/current/_basic_concepts.html#_shards_amp_replicas
>> [2] https://www.elastic.co/guide/en/elasticsearch/guide/current/scoring-theory.html
>> [3] https://wikitech.wikimedia.org/wiki/Search#Estimating_the_number_of_shards_required
>>
>
> _______________________________________________
> discovery mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/discovery
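For what it's worth, the two halves of option 2 behave very differently operationally, which the thread touches on: the replica count and the allocation constraint can be changed on a live index, while the primary count requires a full reindex (hence "we haven't rebuilt the eqiad enwiki index yet"). A hypothetical sketch of the live part, with placeholder host/index names and a value of 4 chosen so that 80 shards fit on 24 nodes:

```shell
# Hypothetical sketch only: relax the allocation constraint so several
# enwiki_content shards may land on the same node. Changing
# number_of_shards (primaries) to 20 would still require a reindex.
curl -XPUT 'http://localhost:9200/enwiki_content/_settings' -d '{
  "index.routing.allocation.total_shards_per_node": 4
}'
```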
