
In-house composite database: genomesDB

genomesDB is a composite database built from the proteome FASTA files obtained from the NCBI Reference Sequence database (RefSeq) for all fully sequenced bacterial and archaeal genomes. Each genome, chromosome, and protein in these files is tagged with a unique internal numerical identifier. In addition, taxonomic and contextual information is parsed from the NCBI Entrez Genome Project database. For every entry, taxonomic information is collected at the kingdom, phylum, class, order, family, genus, and species levels. When available, further contextual data are included on genome size, guanine-cytosine (GC) content, Gram staining, shape, arrangement, endospore formation, motility, salinity, oxygen requirement, habitat, and temperature range.
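
The identifier tagging described above can be illustrated with a minimal Python sketch. It is not the actual genomesDB build code. The input layout (a mapping of genome name to chromosome name to FASTA lines), the names `ProteinRecord`, `read_fasta`, and `build_composite`, and the choice of simple running counters starting at 1 are all assumptions made for illustration; the source states only that each genome, chromosome, and protein receives a unique internal numerical identifier.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Iterable, Iterator, List, Tuple


@dataclass
class ProteinRecord:
    """One protein entry tagged with hypothetical internal identifiers."""
    genome_id: int
    chromosome_id: int
    protein_id: int
    header: str
    sequence: str


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def build_composite(
    genomes: Dict[str, Dict[str, Iterable[str]]]
) -> List[ProteinRecord]:
    """Assign running numeric IDs to genomes, chromosomes, and proteins.

    Assumption: IDs are plain counters starting at 1, unique within each
    level (genome, chromosome, protein) across the whole database.
    """
    genome_ids, chromosome_ids, protein_ids = count(1), count(1), count(1)
    records: List[ProteinRecord] = []
    for _genome_name, chromosomes in genomes.items():
        g = next(genome_ids)
        for _chromosome_name, fasta_lines in chromosomes.items():
            c = next(chromosome_ids)
            for header, sequence in read_fasta(fasta_lines):
                records.append(
                    ProteinRecord(g, c, next(protein_ids), header, sequence)
                )
    return records
```

With this layout, taxonomic and contextual fields could be stored in a separate table keyed by `genome_id`, so that each protein record reaches its metadata through a single lookup.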