Readme file (CADL)
12 September 2017
Description of HM342 Medicago sativa Cultivated Alfalfa at the Diploid Level* (CADL) v1.0 genome data
Funding: Medicago HapMap project (NSF Project IOS-1237993)
Sequencing: National Center for Genome Resources (NCGR)
Assembly and Analysis: NCGR, Noble Foundation, J. Craig Venter Institute, University of Minnesota
Joann Mudge, Nicholas P. Devitt, Diego A. Fajardo, Thiru Ramaraj, Andrew D. Farmer, Xinbin Dai, Zhaohong Zhuang, Peng Zhou, Joseph Guhlin, Christopher D. Town, Maria J. Monteros, Patrick X. Zhao, Jason R. Miller, Kevin A. T. Silverstein, Nevin D. Young
LIST OF FILES
This assembly is provided as is to the community, with no claims about the quality or completeness of the sequence or gene coverage. Please be aware that Medicago sativa is a highly heterozygous organism with an expected haploid genome size of 800 Mb. Sequence similarity between the two haplotypes in the diploid CADL varies, and the haplotypes often diverge enough that they are assembled separately. This has resulted in an assembly size of ~1200 Mb rather than the expected 800 Mb, suggesting that at least half of the genome is represented in the assembly by two distinct haplotypes. In the most divergent regions, presence/absence differences in gene content can be seen between the haplotypes. This implies that any attempt to remove redundancy from the assembly and retain only one haplotype would result in gene loss. It also implies that the gene content of the current assembly includes a significant proportion of allelic copies of genes, a supposition confirmed both by alignment to the related Medicago truncatula genome and by analysis of genes that are typically found as single copies in plant genomes.
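The "at least half" estimate follows from simple arithmetic on the two sizes quoted above. The sketch below is illustrative only: the function name is hypothetical, and it assumes that every assembled base beyond the haploid size is a second haplotype copy of a region already assembled once.

```python
def duplicated_fraction(assembly_mb: float, haploid_mb: float) -> float:
    """Lower bound on the fraction of the haploid genome assembled twice.

    Assumption (a simplification): any assembled sequence beyond the
    haploid genome size is a second haplotype of a region that is
    already represented once in the assembly.
    """
    if haploid_mb <= 0:
        raise ValueError("haploid genome size must be positive")
    excess_mb = max(assembly_mb - haploid_mb, 0.0)
    return min(excess_mb / haploid_mb, 1.0)


# Approximate sizes quoted in this readme.
fraction = duplicated_fraction(assembly_mb=1200, haploid_mb=800)
print(f"{fraction:.0%} of the haploid genome appears as two haplotypes")
```

The value is a lower bound because any part of the genome missing from the assembly would have to be offset by additional duplicated sequence to reach the same total size.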
RESTRICTIONS ON USE
The CADL assemblies available here, including the previous and current versions, are made available to the research community by the Medicago HapMap consortium under the Toronto Agreement [http://www.nature.com/nature/journal/v461/n7261/full/461168a.html]. As producers of these data, we reserve the right to be the first to publish a genome-wide analysis of the data.
The pre-publication data released here are embargoed for publication, except for analyses of single gene loci or small (< 10 kb) genome regions. Researchers are encouraged to contact us with any queries about referencing or publishing analyses based on the pre-publication data obtained via this website. Researchers are also invited to consider collaborating with the Medicago HapMap consortium on larger studies, or if the limitations described here restrict further work.
CADL SOURCE MATERIAL (Renamed HM342 as part of the HapMap project)
A single plant was clonally propagated at the University of Minnesota. DNA was isolated by Amplicon Express in February 2015.
CADL ASSEMBLY VERSION v1.0
This assembly was generated with ~100X PacBio reads (coverage calculated against the haploid genome size of 800 Mb) and Dovetail HiRise scaffolding. Note that Dovetail does not size gaps; instead, it inserts a string of 100 Ns. The sequencing read statistics are described in the following table:
| HM Number | Name | Chemistry | Mean subread length | Subread N50 | Subread total length | Number of subreads | Max length | Coverage |
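The ~100X coverage figure depends on which genome size is used as the denominator. As a rough sketch, the snippet below computes fold coverage against both the expected haploid size and the total scaffold length reported in this readme. The total number of subread bases used here is hypothetical: it is simply the ~80 Gb implied by ~100X of an 800 Mb genome, not a value taken from the read statistics.

```python
def fold_coverage(total_subread_bases: float, genome_size_bases: float) -> float:
    """Fold coverage = total sequenced bases / genome size."""
    if genome_size_bases <= 0:
        raise ValueError("genome size must be positive")
    return total_subread_bases / genome_size_bases


HAPLOID_GENOME_BASES = 800e6      # expected haploid genome size (this readme)
ASSEMBLY_BASES = 1_251_062_122    # total scaffold length (this readme)

# Hypothetical input: ~80 Gb, implied by ~100X of an 800 Mb genome.
total_bases = 80e9

print(f"haploid basis:  {fold_coverage(total_bases, HAPLOID_GENOME_BASES):.0f}X")
print(f"assembly basis: {fold_coverage(total_bases, ASSEMBLY_BASES):.0f}X")
```

Because the assembly contains both haplotypes for a large part of the genome, the same reads give a noticeably lower fold coverage when measured against the assembly rather than the haploid genome size.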
Falcon v. 0.4 was used for read correction and assembly, followed by Quiver polishing. An additional round of Quiver polishing was performed after integrating the Dovetail HiRise scaffolding. The assembly statistics are in the table below:
| Total Contig Length (bp) | 1,250,961,487 |
| Total Scaffold Length (bp) | 1,251,062,122 |
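The two totals differ by 100,635 bases, which would correspond to the N characters in scaffold gaps if the contig length excludes them (an assumption; the readme does not state how the totals were computed). The following sketch shows one way such totals could be derived from scaffold sequences by treating every run of N as a gap. The function name and the toy sequences are made up for illustration, and reading the sequences from a FASTA file is left out.

```python
import re

# Any run of N (upper or lower case) is treated as a scaffold gap.
GAP_RUN = re.compile(r"[Nn]+")


def scaffold_and_contig_lengths(scaffolds):
    """Return (total scaffold length, total contig length, gap count).

    `scaffolds` is an iterable of sequence strings. Scaffold length counts
    every character; contig length excludes the N runs.
    """
    scaffold_total = contig_total = gap_count = 0
    for seq in scaffolds:
        gaps = GAP_RUN.findall(seq)
        scaffold_total += len(seq)
        gap_count += len(gaps)
        contig_total += len(seq) - sum(len(gap) for gap in gaps)
    return scaffold_total, contig_total, gap_count


# Toy input: two contigs joined by a 100-N gap, plus one ungapped scaffold.
toy_scaffolds = ["ACGT" * 5 + "N" * 100 + "GGCC" * 5, "ATAT" * 10]
print(scaffold_and_contig_lengths(toy_scaffolds))
```

Since the gaps are not sized, the length of an N run says nothing about the true distance between the flanking contigs.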
Copyright © 2017, Noble Research Institute, LLC.