Hi Tim,

Maybe SETI@home wasn't the right project to mention; I just remembered there is 
another project, though not in genomics, on that distributed platform called 
Folding@home. So with genomics, can you not break the work down into smaller 
chunks, where the data is crunched, returned to the sender, and then processed 
once it is back or as it's being received?

Regards,
Jonathan
________________________________
From: Tim Cutts <t...@sanger.ac.uk>
Sent: 04 February 2021 11:35
To: Jonathan Aquilina <jaquil...@eagleeyet.net>
Cc: Beowulf <beowulf@beowulf.org>
Subject: Re: [Beowulf] Project Heron at the Sanger Institute [EXT]

Compute capacity is not generally the issue.  For this pipeline, we only need 
about 200 cores to keep up with each sequencer, so a couple of servers.   
Genomics has not, historically, been a good fit for SETI@home style 
cycle-stealing, because the amount of compute you perform on a given unit of 
data is quite low.  A lot of genomics is already I/O bound even when the 
compute is right next to the data, so you don’t gain much by shipping it off to 
cycle-stealing desktops.

In fact, the direction most sequencing instrument suppliers are going is 
embedding the compute in the sequencer itself, at least for use cases where you 
don’t really need the sequence at all, you just need to know how it varies from 
a reference genome.  In such cases, it’s much more sensible to run the pipeline 
on or right next to the sequencer and just spit out the (very small) diffs.

Scientists are conservative folks, though; they sometimes get a bit nervous at 
the thought of discarding the raw sequence data.

Tim

On 4 Feb 2021, at 10:27, Jonathan Aquilina 
<jaquil...@eagleeyet.net> wrote:

Would love to help you guys out in any way I can in terms of hardware processing.

Have you guys thought of doing something like SETI@home and those projects to 
get idle compute power to help churn through the massive amounts of data?

Regards,
Jonathan
________________________________
From: Tim Cutts <t...@sanger.ac.uk>
Sent: 04 February 2021 11:26
To: Jonathan Aquilina <jaquil...@eagleeyet.net>
Cc: Beowulf <beowulf@beowulf.org>
Subject: Re: [Beowulf] Project Heron at the Sanger Institute [EXT]



On 4 Feb 2021, at 10:14, Jonathan Aquilina via Beowulf 
<beowulf@beowulf.org> wrote:

I am curious, though: to chunk out such large data, are platforms like 
Hadoop/HBase what's being used?


It’s a combination of our home-grown sequencing pipeline which we use across 
the board, and then a specific COG-UK analysis of the genomes themselves.  This 
pipeline is common to all consortium members who are contributing sequence 
data.  It’s a Nextflow pipeline, and the code is here:

https://github.com/connor-lab/ncov2019-artic-nf 

Being Nextflow, you can run it on anything for which Nextflow has a backend 
scheduler.  It supports data from both Illumina and Oxford Nanopore sequencers.

Tim
-- The Wellcome Sanger Institute is operated by Genome Research Limited, a 
charity registered in England with number 1021457 and a company registered in 
England with number 2742969, whose registered office is 215 Euston Road, 
London, NW1 2BE.

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
