In the seminar, the graph of sequencing effort for Sanger / the rest of the UK / worldwide is very impressive.
On Thu, 4 Feb 2021 at 10:21, Tim Cutts <t...@sanger.ac.uk> wrote:

> On 3 Feb 2021, at 18:23, Jörg Saßmannshausen <sassy-w...@sassy.formativ.net> wrote:
>
> > Hi John,
> >
> > Interesting stuff and good reading.
> >
> > For those with IT interests on here: these sequencing machines are
> > churning out large amounts of data per day. The project I am involved in
> > can produce 400 GB or so of raw data per day, and that is a small machine.
> > That data then needs to be processed before you can actually analyze it,
> > so there is quite a lot of data movement involved here.
>
> If anyone wants any details, just ask me, since the IT supporting all that
> sequencing is my team's baby.
>
> Actually, the sequencing capacity needed for this volume of COVID samples
> is not great. The virus genome is so small (only 30,000 bases, compared to
> a human's 3 billion base pairs) that you can massively multiplex the
> samples in a single sequencing run.
>
> Currently, we multiplex 384 samples per NovaSeq sequencing lane. There are
> four lanes per flowcell, and two flowcells per sequencer. The sequencing
> run takes about 24 hours, so each instrument can sequence about 3,000
> samples per day.
>
> We have about 20 of these sequencers, so our total capacity is very high;
> in fact we only use three sequencers for COVID at the moment, because
> sample and library preparation is actually the bottleneck: getting those
> 384 samples ready for the sequencer. We are planning to increase it,
> though, both by increasing multiplexing and by using more sequencers.
>
> Sequencing itself takes a bit less than a day, and the computational
> analysis to demultiplex and reconstruct the genomes takes less than a day
> running on our production-oriented OpenStack cluster (we keep critical
> projects like Heron on a physically separate cluster from normal faculty
> research); we can easily keep up with the sequencers.
> We then upload our results to the folks at CLIMB, and that's where the
> comparative genomics tends to take place.
>
> There's a lot of effort at the moment going into speeding up the
> end-to-end process; for this sequencing to be as useful as possible for
> close-to-real-time outbreak and mutation analysis, the turnaround time
> needs to be as short as possible. It turns out you can see statistically
> significant new mutation signatures very early on, before infection rates
> really start to rise (this was visible in the Kent data for B.1.1.7), so
> the sooner we can see this sort of thing, the better we will get at taking
> appropriate measures.
>
> For more details on the actual analysis, we released a public seminar a
> couple of weeks ago:
>
> https://stream.venue-av.com/e/sanger_seminars/Barrett
>
> Tim
>
> --
> The Wellcome Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a
> company registered in England with number 2742969, whose registered
> office is 215 Euston Road, London, NW1 2BE.
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
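As a rough check of the capacity figures in the thread, the arithmetic can be written out as a small Python sketch. The constants are the approximate numbers quoted above (384 samples per lane, four lanes per flowcell, two flowcells per sequencer, runs of about 24 hours) plus the 400 GB/day figure from Jörg's project; the helper names are hypothetical, and the data-rate helper assumes decimal gigabytes and a uniform rate over the whole day.

```python
# Back-of-envelope capacity arithmetic using the approximate figures quoted
# in the thread. Helper names are hypothetical, for illustration only.
SAMPLES_PER_LANE = 384        # multiplexed COVID samples per NovaSeq lane
LANES_PER_FLOWCELL = 4
FLOWCELLS_PER_SEQUENCER = 2
RUN_HOURS = 24                # "about 24 hours" per sequencing run


def samples_per_run() -> int:
    """Samples sequenced in one run of a single instrument."""
    return SAMPLES_PER_LANE * LANES_PER_FLOWCELL * FLOWCELLS_PER_SEQUENCER


def samples_per_day(n_sequencers: int) -> float:
    """Daily throughput for n instruments, assuming back-to-back runs."""
    runs_per_day = 24 / RUN_HOURS
    return samples_per_run() * runs_per_day * n_sequencers


def sustained_rate_mb_per_s(gb_per_day: float) -> float:
    """Average data rate, assuming 1 GB = 1e9 bytes spread evenly over 24 h."""
    return gb_per_day * 1e9 / 86_400 / 1e6


if __name__ == "__main__":
    print(samples_per_run())                        # 3072
    print(samples_per_day(3))                       # 9216.0 (the three COVID sequencers)
    print(samples_per_day(20))                      # 61440.0 (all ~20 sequencers)
    print(round(sustained_rate_mb_per_s(400), 2))   # 4.63
```

The 20-sequencer figure is only a theoretical ceiling; as the thread notes, sample and library preparation, not sequencing, is the actual bottleneck.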