About
This site provides the test data that were used for efficient dynamic construction of a compressed de Bruijn subgraph for pan-genome analysis.
E.coli
The following E.coli sequences (or, where suitable, their reverse complements) were downloaded from www.ncbi.nlm.nih.gov/nuccore/ using the following accession numbers.
FM180568 | 01.FM180568.sequence.fasta | (md5sum: 3939f45d6d97afa76a2fbb603307a237)
FN554766 | 02.FN554766.sequence.fasta | (md5sum: b5726c3bae898831d5240f8897736c12)
CP000247 | 03.CP000247.sequence.fasta | (md5sum: 08643a4078ec97b36ea3da402bef95f6)
CU928145 | 04.CU928145.sequence.fasta | (md5sum: d6e80db065ddf221b5925888ff8edd67)
CP001671 | 05.CP001671.sequence.fasta | (md5sum: 97f209b1693b222e97a828d3c5a9c449)
CP000468 | 06.CP000468.sequence.fasta | (md5sum: 80d233b0ffab129579f55789b136ad2e)
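The checksums listed above can be used to confirm that a download is intact. The following is a minimal sketch of such a check, assuming the six files have been saved under the listed file names in one directory; the function names `md5_of_file` and `verify_downloads` are illustrative and not part of the original tooling.

```python
import hashlib
from pathlib import Path

# Expected checksums, copied from the E.coli listing above.
EXPECTED_MD5 = {
    "01.FM180568.sequence.fasta": "3939f45d6d97afa76a2fbb603307a237",
    "02.FN554766.sequence.fasta": "b5726c3bae898831d5240f8897736c12",
    "03.CP000247.sequence.fasta": "08643a4078ec97b36ea3da402bef95f6",
    "04.CU928145.sequence.fasta": "d6e80db065ddf221b5925888ff8edd67",
    "05.CP001671.sequence.fasta": "97f209b1693b222e97a828d3c5a9c449",
    "06.CP000468.sequence.fasta": "80d233b0ffab129579f55789b136ad2e",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Map each expected file name to 'ok', 'mismatch', or 'missing'."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == checksum:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # Assumption: the files were downloaded into the current directory.
    for name, status in verify_downloads(".").items():
        print(f"{status:8s} {name}")
```

Reading in chunks keeps memory use bounded, so the same helper could also be applied to the much larger human genome files, given their checksums.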
Human Genome
An example plot drawn by the program can be viewed here.
The test file was created by concatenating the following 10 files in this order:
hg16 (NCBI34) from July 2003
Download (md5sum: 9c4567258b47b6dd466225c58da65eb4)
Src: ftp://hgdownload.cse.ucsc.edu/goldenPath/hg16/chromosomes/
Comment: Modified file - converted lowercase to uppercase and removed 3 characters (RR and M) from chromosome 3.
hg17 (NCBI35) from May 2004
Download (md5sum: 57f5af6e6004497f82b284b75a712486)
Src: ftp://hgdownload.cse.ucsc.edu/goldenPath/hg17/chromosomes/
Comment: Modified file - converted lowercase to uppercase and removed 3 characters (RR and M) from chromosome 3.
hg18 (NCBI36) from Mar. 2006
Download (md5sum: f37590f3007ac483488891113f222dc8)
Src: ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/chromosomes/
Comment: Modified file - converted lowercase to uppercase and removed 3 characters (RR and M) from chromosome 3.
hg19 (GRCh37) from Feb. 2009
Download (md5sum: 55c0eb9b019d9f727b0d0ae42b5ca237)
Src: ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/
Comment: Modified file - converted lowercase to uppercase.
hg38 (GRCh38) from Dec. 2013
Download (md5sum: ea47ff706942f5e58b327aac61e528d6)
Src: ftp://hgdownload.cse.ucsc.edu/goldenPath/hg38/chromosomes/
Comment: Modified file - converted lowercase to uppercase.
maternal haplotype of NA12878
The Gerstein Lab at Yale University has created a version of the NA12878 genome based on NCBI build 36 and incorporating SNPs, indels and SVs identified by the 1000 Genomes project. This genome sequence is available at http://sv.gersteinlab.org/NA12878_diploid.
Download (md5sum: 4a5e7ffec07364de66e56022d5864107)
Comment: Users of this assembly are requested to cite: Rozowsky J et al. (2011). AlleleSeq: Analysis of allele-specific expression and binding in a network framework. Molecular Systems Biology, 7, 522.
paternal haplotype of NA12878
The Gerstein Lab at Yale University has created a version of the NA12878 genome based on NCBI build 36 and incorporating SNPs, indels and SVs identified by the 1000 Genomes project. This genome sequence is available at http://sv.gersteinlab.org/NA12878_diploid.
Download (md5sum: 75e170b383de42aeb14732cabeab9a00)
Comment: Users of this assembly are requested to cite: Rozowsky J et al. (2011). AlleleSeq: Analysis of allele-specific expression and binding in a network framework. Molecular Systems Biology, 7, 522.
GRCh38.p12
Download (md5sum: d4f40c80dd774652f18367f62f3421eb)
Src: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.27_GRCh38.p12/
Comment: Modified file - joined chromosomes into one FASTA sequence.
HuRef
Download (md5sum: 4c0bf63c64fcd205d59683cb1554c4c8)
Src: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/002/125/GCA_000002125.2_HuRef/
Comment: Modified file - joined chromosomes into one FASTA sequence.
CHM1_1.1
Download (md5sum: 8eca87e0b52f9b60a059cd09a53ccc29)
Src: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/306/695/GCA_000306695.2_CHM1_1.1/
Comment: Modified file - joined chromosomes into one FASTA sequence.
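The modifications named in the comments above (converting lowercase to uppercase, removing a few characters, and joining chromosomes into one FASTA sequence) can be sketched as follows. This is an illustration under stated assumptions, not the script that produced the test files: all function names are hypothetical, `drop` deletes every occurrence of the listed characters rather than three specific positions, and the chromosomes are joined without any separator, which the page does not specify.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    Sequence text that appears before the first header is ignored.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def clean_sequence(sequence, drop=""):
    """Uppercase a sequence and delete every character listed in `drop`."""
    return sequence.upper().translate(str.maketrans("", "", drop.upper()))


def join_records(records, header="joined"):
    """Concatenate all record sequences, in order, into a single record.

    Assumption: no separator is placed between the joined sequences.
    """
    return header, "".join(seq for _, seq in records)


def format_fasta(header, sequence, width=60):
    """Render one record as FASTA text with fixed-width sequence lines."""
    lines = [">" + header]
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


# Toy input standing in for a multi-chromosome file.
example = [">chr1", "acgtNN", "acg", ">chr3", "ttRRaMg"]
records = [(h, clean_sequence(s, drop="RM")) for h, s in parse_fasta(example)]
header, sequence = join_records(records, header="example")
print(format_fasta(header, sequence), end="")
# prints:
# >example
# ACGTNNACGTTAG
```

The sketch holds whole sequences in memory, which is fine for small inputs; a genome-scale version would more likely stream the data line by line.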