Hi Ashwin,

Thanks for sharing this detail. Do you mind sharing how big each of these indices is? I am almost sure this is related to network capacity and constraints in your AWS setup.
Yes, if you can confirm that the backup is complete, or you just want the system to move on and discard the backup process, removing the backup flag from ZooKeeper will help Solr move on to the next task in the queue. It would also help to ensure your overseer is on a node with a role that exempts it from any Solr index responsibilities.

> On Aug 10, 2020, at 6:43 PM, Ashwin Ramesh <ash...@canva.com.INVALID> wrote:
>
> Hey Aroop, the general process for our backup is:
> - Connect all machines to an EFS drive (AWS's NFS service)
> - Call the collections API to backup into EFS
> - ZIP the directory once the backup is completed
> - Copy the ZIP into an s3 bucket
>
> I'll probably have to see which part of the process is the slowest.
>
> On another note, can you simply remove the task from the ZK path to
> continue the execution of tasks?
>
> Regards,
>
> Ash
>
> On Tue, Aug 11, 2020 at 11:40 AM Aroop Ganguly
> <aroopgang...@icloud.com.invalid> wrote:
>
>> 12 hours is extreme, we take backups of 10TB worth of indexes in 15 mins
>> using the collection backup api.
>> How are you taking the backup?
>>
>> Do you actually see any backup progress, or are you just seeing the task
>> linger in the overseer queue?
>> I have seen restore tasks hanging in the queue forever despite the process
>> completing in Solr 77, so I wouldn't be surprised if this happens with
>> backup as well. I have also observed that unless that task is removed from
>> the overseer-collection-queue, the next ones do not proceed.
>>
>> Also, adding replicas during a backup seems like overkill. Why don't you
>> just have the appropriate replication factor in the first place and have
>> autoAddReplicas=true for indemnity?
>>
>>> On Aug 10, 2020, at 6:32 PM, Ashwin Ramesh <ash...@canva.com.INVALID> wrote:
>>>
>>> Hi everybody,
>>>
>>> We are using solr 7.6 (SolrCloud). We noticed that when the backup is
>>> running, we cannot add any replicas to the collection. By the looks of it,
>>> the job to add the replica is put into the Overseer queue, but it is not
>>> being processed. Is this expected? And are there any workarounds?
>>>
>>> Our backups take about 12 hours. Maybe we should try to optimize that too.
>>>
>>> Regards,
>>>
>>> Ash
>>>
>>> --
>>> <https://www.canva.com/> Empowering the world to design
>>> Share accurate information on COVID-19 and spread messages of support to
>>> your community. Here are some resources
>>> <https://about.canva.com/coronavirus-awareness-collection/?utm_medium=pr&utm_source=news&utm_campaign=covid19_templates>
>>> that can help.
>>> <https://twitter.com/canva> <https://facebook.com/canva>
>>> <https://au.linkedin.com/company/canva> <https://instagram.com/canva>
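
On the question in the thread of whether any backup progress is actually visible: one way to check is to submit the backup asynchronously and then poll its status through the Collections API. The sketch below only builds the two request URLs, using the `BACKUP` action with its `name`, `collection`, `location` and `async` parameters and the `REQUESTSTATUS` action with `requestid`. The host, collection name, backup name, EFS path and request id are all made-up placeholders, not values from this thread, so treat it as a rough illustration rather than the setup anyone here is running.

```python
from urllib.parse import urlencode

# Hypothetical base URL, for illustration only; not taken from the thread.
SOLR_BASE = "http://localhost:8983/solr"


def backup_url(collection, backup_name, location, request_id, base=SOLR_BASE):
    """Build a Collections API BACKUP request that runs asynchronously.

    Passing `async` makes Solr return immediately and track the task
    under the given request id instead of blocking the HTTP call.
    """
    params = {
        "action": "BACKUP",
        "name": backup_name,
        "collection": collection,
        "location": location,  # shared drive visible to all nodes, e.g. the EFS mount
        "async": request_id,
    }
    return f"{base}/admin/collections?{urlencode(params)}"


def status_url(request_id, base=SOLR_BASE):
    """Build a REQUESTSTATUS request for a previously submitted async task."""
    params = {"action": "REQUESTSTATUS", "requestid": request_id}
    return f"{base}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # All argument values below are placeholders.
    print(backup_url("products", "nightly", "/mnt/efs/backups", "backup-001"))
    print(status_url("backup-001"))
```

Fetching the second URL periodically should show whether the task is still running or has completed, which may help tell a slow backup apart from a task that is merely stuck in the overseer queue.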