[ 
https://issues.apache.org/jira/browse/SOLR-13867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller resolved SOLR-13867.
--------------------------------
    Resolution: Won't Fix

> Make Solrcloud stable and performant and capable of having passing tests.
> -------------------------------------------------------------------------
>
>                 Key: SOLR-13867
>                 URL: https://issues.apache.org/jira/browse/SOLR-13867
>             Project: Solr
>          Issue Type: Task
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Mark Miller
>            Assignee: Mark Miller
>            Priority: Major
>             Fix For: master (9.0)
>
>
> After spending a bit of time away from SolrCloud after being deeply involved 
> in trying to stabilize it and its tests, I came back in 2018 and went deep 
> into the system with the Starburst upgrade.
> What I found surprised me, though I guess it should not have. The system is 
> slow, often silly, super buggy, not good at connection reuse or thread safety 
> or efficient ZooKeeper communication or efficient startup and shutdown.
> Often, the things we do to make tests pass make things worse because you 
> can't do things reasonably without some major code work and so we fight for 
> test passes, not correctness.
> Twice now, I've seen the system in the shape it was supposed to take. FAST. 
> Not bug free, but 100X more solid at least and much, much, much, much faster.
> The current system is sick and actually getting worse under its weight as 
> more is shoveled on top. Even since 1.5 years ago, the problems are worse, 
> not better. Tests will never pass. Yes, our tests were in pretty bad shape. 
> But you can put them in the best shape possible and it won't matter. The 
> system will still fail tests.
> Sadly, I'm smart enough to know what has to be done, but not smart enough to 
> keep my work around after addressing most of the problems twice.
> Nonetheless, it's time to fix SolrCloud. It's not supposed to be this way. 
> I've twice spent a week or two in a state with super fast SolrCloud. Super 
> fast build system. Development is actually fun. You actually have a chance. 
> I'm talking tests you have never seen take under 45-60 seconds taking 5, 
> consistently. A different world.
> I spent a lot of time after Starburst making tests pass for me. Then a lot of 
> time on a better build system that can help us improve development and good 
> practices around the project. And then a lot of time making tests faster. 
> These are important steps, but little itty bitty baby steps without 
> addressing the core rot that is growing. We don't find a problem and fully 
> understand what is up and craft a careful solution. We find something that we 
> can toss into the grand canyon, listen to it bounce around for a while, and 
> if nobody screams, we move on to the next thing. That's not necessarily 
> anyone's choice; there is little else you can do until the system is fixed. 
> When that happens we can start making smart changes instead of just shoving 
> around the mess.
> Twice I have made the current system fast. What happens first? Nothing works. 
> The system doesn't know how to be fast. It doesn't have the thread safety or 
> proper logic to be fast. And that is not a place I want to be.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
