Dan,

Do you have any idea of the resource usage on the hosts when Solr starts to become unresponsive? It could be that you need more resources or better AWS instances for the hosts.
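One way to put a number on host resource usage is to sample the share of CPU time spent waiting on I/O. The sketch below is only an illustration and assumes Linux hosts, where the first line of `/proc/stat` holds cumulative CPU counters (user, nice, system, idle, iowait, ...). The function names `iowait_percent` and `sample_iowait` are made up for this example and are not part of Solr or any AWS tooling.

```python
import time


def parse_cpu_line(line: str) -> list[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into integer counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu" or len(fields) < 6:
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(value) for value in fields[1:]]


def iowait_percent(before: str, after: str) -> float:
    """Percentage of CPU time spent in iowait between two 'cpu' samples."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    # Counters are: user, nice, system, idle, iowait, irq, softirq, steal, ...
    # Guest time is already included in user time, so only the first 8 are summed.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


def sample_iowait(interval_seconds: float = 5.0) -> float:
    """Take two samples from /proc/stat (Linux only) and return iowait percent."""
    def read_cpu_line() -> str:
        with open("/proc/stat", encoding="ascii") as stat_file:
            return stat_file.readline()

    first = read_cpu_line()
    time.sleep(interval_seconds)
    return iowait_percent(first, read_cpu_line())
```

Calling `sample_iowait()` on a node while the import is running, and again while it is idle, would give a rough comparison. A consistently high value during the import would point toward I/O as the bottleneck, although it would not prove it.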
We had what sounds like a similar scenario when attempting to move one of our SolrCloud instances to a cloud computing platform. During periods of heavy indexing, segment merging, and searches, the cluster would become unresponsive because Solr was waiting on numerous I/O operations that were being throttled. Solr can be very I/O intensive, especially when you can't cache the entire index in memory.

Thanks,
Chris

On Tue, Oct 23, 2018 at 5:40 AM Daniel Carrasco <d.carra...@i2tic.com> wrote:

> Hi,
>
> On Tue, Oct 23, 2018 at 10:18, Charlie Hull (<char...@flax.co.uk>) wrote:
>
> > On 23/10/2018 02:57, Daniel Carrasco wrote:
> > > Hello,
> > >
> > > I have a Solr cluster made up of 7 machines on AWS instances. The
> > > Solr version is 7.2.1 (b2b6438b37073bee1fca40374e85bf91aa457c0b), all
> > > nodes run in NRT mode, and I have one replica per node (7 replicas).
> > > One node is used for imports, and the rest just serve data.
> > >
> > > My problem is that for about two weeks I've been having trouble with
> > > an MsSQL import on my Solr cluster: when the process becomes slow or
> > > takes too long, the entire cluster goes down.
> >
> > How exactly are you importing from MsSQL to Solr? Are you using the Data
> > Import Handler (DIH) and if so, how?
>
> Yeah, we're using the import handler with the JDBC connector:
>
> <dataConfig>
>   <dataSource type="JdbcDataSource"
>               driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
>               url="jdbc:sqlserver://......." user="..." password="..."/>
>   <document>
>     <entity name="products_baja_real" transformer="RegexTransformer"
>             query="A_Long_Query">
>       <field column="id" name="id"/>
>       ... A lot of fields configuration ...
>     </entity>
>     ... some entities similar to above ...
>   </document>
> </dataConfig>
>
> > What evidence do you have that this is slow or takes too long?
>
> Well, the process normally takes less than 20 minutes (usually about 15)
> and doesn't affect the cluster at all. I have a monit system that notices
> when this process takes more than 25 minutes, and shortly after that
> alert, the entire collection goes into recovery mode and we're unable to
> keep serving the requests made by the webpage. We have to stop all
> requests until the collection is OK again. The rest of the time the
> cluster works perfectly without downtime, but lately the problem is
> happening more often (I had to recover the cluster twice in less than an
> hour last night, and it didn't fail again only because we stopped the
> import cron).
>
> That is the mild case, because sometimes the entire cluster becomes
> unstable and other collections are affected. Sometimes even the node that
> is Leader fails and we're unable to release that leadership (even after
> shutting down the Leader server and running the FORCELEADER API command),
> which makes it hard to recover the cluster. If we're lucky, the cluster
> recovers by itself even with the leader in recovery (which takes a long
> time, of course), but sometimes we have no luck and have to reboot all
> the machines to force a full recovery.
>
> > Charlie
> >
> > > I'm confused, because the main reason to have a cluster is HA, and
> > > every time the import node "fails" (it isn't really failing, just
> > > taking more time to finish), the entire cluster fails and I have to
> > > stop the webpage until the nodes are green again.
> > >
> > > I don't know if I have to change something in the configuration to
> > > allow the cluster to keep working even when the import freezes or the
> > > import node dies, but it is very annoying to wake up at 3 AM to fix
> > > the cluster.
> > >
> > > Is there any way to avoid this? Maybe keeping the import node as NRT
> > > and converting the rest to TLOG?
> > >
> > > I'm a bit of a noob in Solr, so I don't know what I should send to
> > > help find the problem. The cluster was created by just creating a
> > > ZooKeeper cluster, connecting the Solr nodes to that ZK cluster,
> > > importing the collections, and adding replicas manually to every
> > > collection. I've also upgraded that cluster from Solr 6 to Solr 7.1
> > > and later to Solr 7.2.1.
> > >
> > > Thanks and greetings!
> >
> > --
> > Charlie Hull
> > Flax - Open Source Enterprise Search
> >
> > tel/fax: +44 (0)8700 118334
> > mobile: +44 (0)7767 825828
> > web: www.flax.co.uk
>
> Thanks, and greetings!!
>
> --
> _________________________________________
>
> Daniel Carrasco Marín
> Ingeniería para la Innovación i2TIC, S.L.
> Tlf: +34 911 12 32 84 Ext: 223
> www.i2tic.com
> _________________________________________
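
For the recovery behaviour described in the thread, it may help to see which replicas are not in the `active` state at the moment the import runs long. The sketch below is a hedged illustration that builds a Collections API `CLUSTERSTATUS` request and walks the JSON response. The response layout it assumes (`cluster` > `collections` > `shards` > `replicas` > `state`) should be checked against the Solr version in use, and the base URL, the collection name, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def cluster_status_url(base_url: str, collection: str) -> str:
    """Build a Collections API CLUSTERSTATUS URL for one collection."""
    query = urlencode(
        {"action": "CLUSTERSTATUS", "collection": collection, "wt": "json"}
    )
    return f"{base_url.rstrip('/')}/admin/collections?{query}"


def replicas_not_active(status: dict) -> list[tuple[str, str, str, str]]:
    """Return (collection, shard, replica, state) for every non-active replica."""
    problems = []
    collections = status.get("cluster", {}).get("collections", {})
    for collection_name, collection in collections.items():
        for shard_name, shard in collection.get("shards", {}).items():
            for replica_name, replica in shard.get("replicas", {}).items():
                state = replica.get("state", "unknown")
                if state != "active":
                    problems.append(
                        (collection_name, shard_name, replica_name, state)
                    )
    return sorted(problems)


def fetch_status(base_url: str, collection: str, timeout: float = 10.0) -> dict:
    """Fetch CLUSTERSTATUS from a running Solr node (needs network access)."""
    with urlopen(cluster_status_url(base_url, collection), timeout=timeout) as resp:
        return json.load(resp)
```

Something like `replicas_not_active(fetch_status("http://localhost:8983/solr", "products"))`, with a real host and collection substituted, could be run from the same monitor that raises the 25-minute alert, so that the replica states are recorded before the cluster degrades further.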