Abstract: | Background - The process of generating raw genome sequence data continues to
become cheaper, faster, and more accurate. However, assembly of such data into
high-quality, finished genome sequences remains challenging. Many genome
assembly tools are available, but they differ greatly in terms of their
performance (speed, scalability, hardware requirements, support for newer
read technologies) and in their final output (composition of assembled
sequence). More importantly, it remains largely unclear how best to assess the
quality of assembled genome sequences. The Assemblathon competitions are
intended to assess current state-of-the-art methods in genome assembly. Results
- In Assemblathon 2, we provided a variety of sequence data to be assembled for
three vertebrate species (a bird, a fish, and a snake). This resulted in a total
of 43 submitted assemblies from 21 participating teams. We evaluated these
assemblies using a combination of optical map data, Fosmid sequences, and
several statistical methods. From over 100 different metrics, we chose ten key
measures by which to assess the overall quality of the assemblies. Conclusions
- Many current genome assemblers produced useful assemblies, containing a
substantial representation of each species' genes, regulatory sequences, and overall
genome structure. However, the high degree of variability between the entries
suggests that there is still much room for improvement in the field of genome
assembly and that approaches which work well in assembling the genome of one
species may not necessarily work well for another. |