The Assemblathon


Announcing the Assemblathon ‘publish for free’ contest


The Assemblathon 2 paper won the 2013 BioMed Central Research Award for Open Data. Typically these Research Awards are presented to individual scientists, and winners receive a small cash prize in recognition of their efforts. Given that there are almost 100 authors on the Assemblathon 2 paper, it didn’t seem practical to split up any prize money.

Instead, we have decided to give that money back to the community by way of awarding waivers to cover Article Processing Charges (APCs). Specifically, BioMed Central (BMC) will grant two APC waivers that can be applied to any BMC journal. These waivers will be awarded to any interested parties who are currently writing manuscripts that fall into one of the following three areas:

  1. Manuscripts that describe new methods/tools that will advance the field of ‘omics’ assembly (genome, transcriptome, or metagenome)
  2. Manuscripts that propose other contests similar to the Assemblathon, i.e. contests that aim to determine the relative performance of bioinformatics tools focused on a particular problem area (e.g. read mapping, de novo gene finding, SNP calling)
  3. Manuscripts that help promote, or advance the use of, ‘Open Data’ methodologies in the genomics and bioinformatics community

The first round of this contest will last from today (1st August) through to the end of September. If there are no suitable manuscripts submitted, we will extend the contest by another month and repeat this process until both APC waivers have been awarded. An overview of the contest along with detailed instructions for submitting entries are included below.

We hope that the prize money from our Open Data award can be put to good use and help lead to some stimulating new ideas being published in a BMC journal!

Contest Overview

  1. Entries will be judged by a panel made up of members of the Bioinformatics Core facility at the UC Davis Genome Center; the decision of the judging panel is final.
  2. To avoid potential conflicts of interest, this contest is not open to faculty who are members of the UC Davis Genome Center, nor to any of their lab members.
  3. Draft manuscripts may be submitted from now until the end of September when we will review all entries.
  4. If we have not received any suitable entries by the end of September, we will extend the deadline by another month and continue to repeat this process as appropriate.
  5. If a manuscript is deemed suitable to receive an APC waiver, it still must undergo the standard peer-review process by a BMC journal. If a manuscript is then rejected by the journal, we will allow others to then apply for that APC waiver.
  6. Winners of the APC waivers will be announced on the Assemblathon website and twitter account.

How to submit your entry

  1. To apply for one of the two APC waivers, please email a manuscript to [email protected].
  2. Manuscripts must be submitted as a single PDF document. No other formats will be accepted.
  3. Do not submit supplementary material unless specifically requested by the judging panel.
  4. Manuscripts do not necessarily need to be complete, but must contain enough data/results/ideas so that the judging panel can assess their suitability.
  5. Draft manuscripts will not be shared with anyone outside of the judging panel.

 

    • #contest
    • #publishforfree

The Assemblathon 2 paper is the winner of the 2013 BioMed Central award for Open Data

We are pleased that our efforts to make the process of running the Assemblathon 2 contest as transparent as possible have been recognized by BioMed Central in their annual research awards.

Our paper has won the 2013 award for Open Data (sponsored by Lab Archives), an award that is defined as follows:

This award recognizes institutions which have done most to show leadership in taking steps to expand access to the published results of scholarly research. 

The citation for this award reads:

An offshoot of the Genome 10K project, and primarily organized by the UC Davis Genome Center, Assemblathons are contests to assess state-of-the-art methods in the field of genome assembly. Assemblathon 2 used real data from three vertebrate species and started in June 2011. The manuscript was published in July 2013.

The following teams submitted one or more assemblies to the Assemblathon 2 project: CSHL team: P Baranay, S Emrich, MC Schatz; MLK team: MD MacManes; ABL team: H Chitsaz; Symbiose team: R Chikhi, D Lavenier, G Chapuis, D Naquin, N Maillet; Ray team: S Boisvert, J Corbeil, F Laviolette, E Godzaridis; IOBUGA team: TI Shaw, W Chou; GAM team: S Scalabrin, R Vicedomini, F Vezzi, C Del Fabbro; Meraculous team: JA Chapman, IY Ho, DS Rokhsar; Allpaths team: S Gnerre, G Hall, DB Jaffe, I MacCallum, D Przybylski, FJ Ribeiro, T Sharpe, S Yin; CBCB team: S Koren, AM Phillippy; PRICE team: JG Ruby; SOAPdenovo team: R Luo, B Liu, Z Li, Y Shi, J Yuan, H Zhang, S Yiu, T Lam, Y Li, J Wang; Curtain team: M Haimel, PJ Kersey; CoBiG2 team: Bruno Miguel Vieira, Francisco Pina-Martins, Octávio S. Paulo; BCM-HGSC team: Y Liu, X Song, X Qin, H Jiang, J Qu, S Richards, KC Worley, RA Gibbs; ABySS team: I Birol, TR Docking, SD Jackman; Phusion team: Z Ning; CRACS team: NA Fonseca; SGA team: JT Simpson, R Durbin; Computer Technologies Department (CTD) team: A Alexandrov, P Fedotov, S Melnikov, S Kazakov, A Sergushichev, F Tsarev; Newbler-454 team: JR Knight.

The resulting assemblies were assessed by teams from the Wellcome Trust Sanger Institute (M Hunt, T Otto), UC Santa Cruz (D Earl, B Paten) and UC Davis (K Bradnam, J Fass, I Korf). Resources (sequences, fosmids, and optical maps) were generated and validated by E Jarvis, S Zhou, S Goldstein, M Place, DC Schwartz, M Bechner, J Shendure, J Kitzman, J Hiatt, J Howard, G Ganapathy, and G Zhang. The principal organizers behind Assemblathon 2 were David Haussler, Ian Korf, and Erich Jarvis.

Thanks to everyone involved in generating data, submitting assemblies, and helping write the paper. I’d also like to specifically thank all of the co-authors for supporting our decision to use the arXiv preprint server, and for agreeing to submit the final paper to GigaScience, which also allowed us to publish the data (including all of the submitted assemblies) as citable datasets in GigaDB.

Open science FTW!

2014-04-22 18.28 - Updated to include mention of award sponsor.

    • #assemblathon 2

Thoughts on Assemblathon 3

The Assemblathon 2 paper has finally been published and in the process it generated a lot of discussion. I have previously written some thoughts on the open nature of the project, but would now like to say a little about the prospects of Assemblathon 3, and how the lessons learned from the last Assemblathon might change how we would do things in future.

But first I’ll get straight to the point and say that there are no immediate plans for an Assemblathon 3 contest, for two main reasons:


1. Assemblathon fatigue

I think that some of the organizers as well as the previous participants would like a bit of a break before we even consider doing this all again.

2. Low expectations

Many of the software tools used by teams in Assemblathon 2 have been superseded by newer versions, and there are also several new assemblers out there. However, it is not at all certain that an Assemblathon 3 contest run today would produce a different outcome; it seems likely that we would still see a lack of consistency in the results from different assemblers.


There are other reasons as well (e.g. lack of funding), but I think that the above two points are why there will not be another Assemblathon in the immediate future. That’s not to stop anyone else from organizing a genome assembly assessment exercise, and so the next two sections offer some thoughts on what could be improved.

What did we learn from Assemblathon 2?

Assemblathon 2 potentially suffered from having too many species, with too much sequence data for some species. Not all participants had the time, resources, and/or inclination to assemble all three genomes, meaning that we couldn’t fairly compare each assembler’s performance across multiple species (only two teams assembled all three genomes).

Furthermore, very few teams utilized all of the diverse parrot sequencing data (Illumina, 454, and PacBio), with most teams opting to just use some of the Illumina reads. It could be argued that, while laudable, providing 285x coverage of the parrot genome is hardly a real-world scenario.

The final metrics that were used to judge assemblies were modified slightly throughout the evaluation process. This was in response to feedback from the participants. In hindsight — always a wonderful thing — we should have worked hard to finalize the metrics before the evaluation process started. Any time a single metric was changed, many downstream changes had to be made (e.g. a dozen or so figures had to be redrawn) and this slowed down work on the manuscript.

A criticism by one of the manuscript’s reviewers — available here as part of the pre-publication history at GigaScience — was that Assemblathon 2 lacked a clear goal. To quote Mick Watson:

The first question which occurs to me is this: what was the purpose of this international effort? What were the authors trying to achieve? Was it:
1) To catalogue available assemblers?
2) To compare available assemblers?
3) To develop best practice?
4) To develop a set of guidelines? i.e. which assembler should I use on my data?
5) To compare assembly metrics?
6) To develop better assembly metrics?

This is valid criticism, and I personally think that while we were trying to do a bit of everything on Mick’s list, we ended up spreading ourselves too thinly. Perhaps if there had only been assemblies from one species to focus on, it might have been possible to push a bit harder at answering these questions.

Another weakness of Assemblathon 2 was that we were not rigorous enough in collecting information on how participating teams generated their assemblies. Most teams provided instructions, though the amount of detail in those instructions varied a lot.

Proposed differences for Assemblathon 3

In light of what we learnt from Assemblathon 2, I propose a set of constraints/rules/guidelines for any potential Assemblathon 3 contest. Hopefully, these would help produce a speedier competition, with more of an emphasis on ‘real world’ genome assembly, and which could produce a more informative analysis.

  1. Focus on only one species and make it clear why we have chosen this species (something that was not obvious for Assemblathon 2).
  2. Generate a mixture of sequences based on all commonly used NGS technologies (Illumina, 454, Ion Torrent, PacBio, and possibly Moleculo).
    • Sequence information will (initially) be kept private from participants.
    • Ideally, sequences would all be sourced from a single sequencing facility (for consistency of pre/post sequencing steps), or from the NGS companies themselves.
  3. Teams have to ‘buy’ sequence resources to better reflect real-world usage (where a typical research lab has limited resources):
    • Teams would be allocated a fictional budget, e.g. $20,000.
    • Teams could opt to use a mixture of $10,000 of PacBio sequence & $10,000 of Illumina, or just $20,000 of Illumina etc.
    • Only once a team has ‘placed their order’ would we make sequence data available to that team.
  4. All input sequence read data should be submitted to SRA/ENA/DDBJ as soon as it is generated, to prevent delays later on.
  5. As soon as the submission deadline has passed, the final set of metrics should be agreed upon with participants before analysis starts. This will also save lots of time later on.
  6. Require teams to submit full assembly instructions at the same time as their assembly (potentially using a detailed form to ensure we collect all necessary data).
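To make the ‘fictional budget’ idea in point 3 concrete, here is a minimal sketch in Python. The per-technology prices and the `place_order` helper are entirely invented for illustration; real costs would need to be set by the organizers.

```python
# Hypothetical per-Gbp prices, in dollars, for illustration only.
PRICE_PER_GBP = {
    "Illumina": 50,
    "454": 400,
    "Ion Torrent": 100,
    "PacBio": 300,
}

BUDGET = 20_000  # each team's fictional budget in dollars


def order_cost(order):
    """Total cost of an order mapping technology -> Gbp of reads requested."""
    return sum(PRICE_PER_GBP[tech] * gbp for tech, gbp in order.items())


def place_order(order, budget=BUDGET):
    """Accept the order only if it fits within the team's budget."""
    cost = order_cost(order)
    if cost > budget:
        raise ValueError(f"Order costs ${cost}, exceeding the ${budget} budget")
    return order


# A team might mix long and short reads within the same budget:
mixed = place_order({"PacBio": 33, "Illumina": 200})  # $9,900 + $10,000
```

Only once an order like `mixed` had been validated would the corresponding read data be released to that team.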

Optional requirements and other suggestions

  1. A virtual cost of computing time/resources could also be factored into the budget if desired (e.g. an assembly that used a high-CPU cluster for a week would incur a higher cost than an assembly run on a desktop computer for a day). Assigning such costs might be difficult, though.
  2. Optionally request that all assemblies use the new FASTG format, in order to allow us to reward teams that better capture heterozygosity in their assemblies (this will probably require new analysis tools).
  3. Maximum of two assemblies per participant (with teams encouraged to use experimental ideas for second assembly…in Assemblathon 2, these sometimes turned out to be better assemblies than the competitive entries).
  4. Potentially seek sponsorship for some sort of official prize/trophy for the winner(s) in order to encourage participation.

What species should Assemblathon 3 use?

Every now and then, people have made suggestions as to what species or genomes would be good candidates for an Assemblathon 3 contest. These suggestions often reflect the desire to have the community assemble a genome for someone’s favorite species, or sometimes just the idea that we should be assembling something with a genome that is large/complex/polyploid etc. For example:

For Assemblathon 3 – if it happens – can we throw in a polyploid species? #G10K @assemblathon

— Mario Caccamo (@mcaccamo)
April 26, 2013

Will @assemblathon 3 attack some small but difficult genomes? e.g. Plasmodium?

— Jason Chin (@infoecho)
May 1, 2013

@assemblathon @GigaScience @BioMickWatson @ctitusbrown looking forward to a de novo plant genome assembly contest. The real stuff.

— Steven Robbens (@stevenrobbens)
July 22, 2013

Such feedback leads me to believe that if there is an Assemblathon 3, then it will probably disappoint many people as soon as a species is chosen! Personally, I still like the idea of using synthetic genome data (as in Assemblathon 1). It not only helps to know what the answer is meant to look like, but this approach could be broadened to include multiple (small) genome assemblies that all differ in a controlled fashion. E.g. we could make a series of small genomes that progressively differed in their heterozygosity and/or repeat content. This would allow us to see how well different assemblers fare under a range of test conditions that we get to control precisely.
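As a rough sketch of how such controlled test genomes could be built (this is not the Assemblathon 1 simulator; the function names, the tandem-repeat model, and the parameter values are all illustrative assumptions):

```python
import random


def make_haploid(length, repeat_fraction, repeat_unit="ACGTTGCA", seed=0):
    """Random genome in which ~repeat_fraction of the bases form a tandem repeat."""
    rng = random.Random(seed)
    n_repeat = int(length * repeat_fraction)
    unique = "".join(rng.choice("ACGT") for _ in range(length - n_repeat))
    repeats = (repeat_unit * (n_repeat // len(repeat_unit) + 1))[:n_repeat]
    return unique + repeats


def make_diploid(haploid, heterozygosity, seed=1):
    """Second haplotype with SNPs introduced at the given per-base rate."""
    rng = random.Random(seed)
    other = []
    for base in haploid:
        if rng.random() < heterozygosity:
            other.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            other.append(base)
    return "".join(other)


# A series of small test genomes that differ only in one controlled parameter:
genomes = [make_haploid(10_000, rf) for rf in (0.0, 0.1, 0.2)]
hap2 = make_diploid(genomes[0], heterozygosity=0.01)
```

Because every parameter is set by the organizers, each assembler’s output could be scored against a known truth at each level of repeat content or heterozygosity.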

Conclusion

If there is to be an Assemblathon 3, then you’ll most likely hear about it on this blog or on the Assemblathon’s twitter account. But even if there isn’t another Assemblathon, I will probably continue to use the twitter account to highlight news and developments in the field of genome assembly.

Keith Bradnam

    • #assemblathon 3

Writing the Assemblathon 2 paper, an experiment in openness

Throughout the process of putting the Assemblathon 2 paper together I have tried to be transparent about as much of the process as possible. Some of the discussion about the paper happened via my email contact list, which contained just the names of the principal co-authors. I say ‘just’, but this was still a list of 50 people (just over half of the total number of co-authors). However, in most cases I chose to use the much wider Assemblathon mailing list, a list that is open to anyone interested in genome assembly and which currently has about 250 members.

Faced with the daunting task of getting feedback from all of the co-authors on the first draft of the manuscript, I desperately wanted to avoid having to email copies of Microsoft Word documents back and forth. Instead I chose to do everything using Google Docs. This let everybody comment on the manuscript in one place and let all of the co-authors see all of the comments (with the option of getting email notifications of new comments). By and large, this process seemed to work well and allowed me to address comments in a much more manageable way.

I have used twitter throughout the many months of the writing process to communicate updates about the paper’s progress — or lack of progress as was often the case. I also continue to use this account to tweet about anything to do with genome assembly, and as a result the Assemblathon twitter account now has over 1,300 followers. I also used twitter to canvass opinions as to who might make a suitable reviewer for the paper. I seem to recall that both C. Titus Brown and Mick Watson volunteered to do so (among others), and indeed they ended up with the job.

Once we had decided to publish in GigaScience it was great to find out that they did not object to the use of pre-print servers. This allowed us to put a very early version of the manuscript on the arXiv.org pre-print server. Two subsequent versions of the paper have also been made available on the same site.

This first version of the manuscript attracted interest from the Haldane’s Sieve blog, which asked me to write a piece about the paper (which I duly did). About a month later, C. Titus Brown not only posted an edited version of his review online, but also wrote a wider, more free-ranging, Thoughts on the Assemblathon 2 paper blog post. This attracted a great deal of attention, and the comments on that blog post reflect the very open discussion that people were having about the pre-print, and about the wider issues of genome assembly. C. Titus Brown’s blog post prompted fellow bioinformatician Lex Nederbragt to write a response (which in turn generated many more comments).

The Homolog.us blog has now contributed four blog posts regarding Assemblathon 2 and the associated pre-print and published paper. All of these have attracted comments from the community (e.g. see this post Notes on Assemblathon Paper). The most recent Homolog.us blog post beat me to the punch and talks about some of the issues of openness and transparency that I wanted to address here.

Finally, and perhaps most importantly, GigaScience has published the full pre-publication history of the paper. This contains the full reviews from both reviewers (who both attached their names), my responses to those reviews, as well as all intermediary versions of the paper that were submitted as part of the publication process.

I hope that this paper shows that trying to be open throughout the process of researching, writing, and publishing a paper can be seen as a rewarding experience, and as something that engages with the community and attracts useful feedback and commentary. I don’t feel that the impact of this paper has been lessened in any way by the pre-print first being made available some six months ago.

Thanks to everyone who has emailed, tweeted, blogged, or otherwise commented on the Assemblathon 2 contest. I appreciated all of the feedback (even if it was just to ask for the hundredth time ‘when is the paper going to be finished?’).

    • #assemblathon 2

The Assemblathon 2 paper has been published!


Image from http://flickr.com/photos/jayneandd/

It has taken a lot of work (and time) but I’m happy to announce that the Assemblathon 2 paper has today been published in GigaScience. This final version of the manuscript contains several minor corrections compared to the last pre-print that was submitted to arXiv.org.

Supporting datasets for the paper, including the genome assemblies entered as part of the Assemblathon 2 competition, have been published in GigaDB, and are citable with DOIs for each dataset as follows:

  1. DOI 10.5524/100060 Assemblathon 2 assemblies.
  2. DOI 10.5524/100061 CEGMA gene predictions for Assemblathon 2 entries.
  3. DOI 10.5524/100062 Assembled Fosmid sequences used for assessment of Assemblathon 2 entries

Thanks are due to all of the participants who entered the Assemblathon 2 contest, and to the various groups that kindly provided sequences, resources, and much-needed analyses of results. I should also acknowledge the patience and support of the great team at GigaScience, who were very understanding throughout the (long) process of turning the initial manuscript into the final paper.

I’ll post again later this week with some comments about the paper — not to mention some thoughts on the whole process of writing this paper — as well as some brief notes about the prospects of future Assemblathons.

    • #assemblathon 2

About


An offshoot of the Genome 10K project, and primarily organized by the UC Davis Genome Center, Assemblathons are contests to assess state-of-the-art methods in the field of genome assembly.

Assemblathon 1 occurred at the end of 2010 and the results were published in late 2011. A second Assemblathon, using real data from three vertebrate species, started in June 2011 and the Assemblathon 2 paper was published early in 2013.
