
De-novo assembly and finishing of the genome of neuro-toxin (anatoxin-a) producing cyanobacterium, Anabaena sp. strain 37


dc.contributor University of Helsinki, Faculty of Science, Department of Mathematics and Statistics fi
dc.contributor.author Narayanasamy, Shaman fi
dc.date.accessioned 2012-10-04T12:00:36Z
dc.date.available 2012-10-04T12:00:36Z
dc.date.issued 2012-10-04
dc.identifier.uri http://hdl.handle.net/10138/37084
dc.description Abstract only. The paper copy of the whole thesis is available for reading room use at the Helsinki University Library. Search HELKA online catalog (http://www.helsinki.fi/helka/index.htm). en
dc.description.abstract Cyanobacteria are ancient photosynthetic microorganisms found in both fresh and saline water bodies all over the world. Anabaena is a genus of filamentous, heterocystous, diazotrophic cyanobacteria that are common in freshwater lakes and often implicated in the formation of blooms. They are known to play a vital role in the nitrogen cycle and to produce harmful toxins. The reason for this toxin production is still unknown. Anabaena sp. strain 37, isolated from Lake Sääksjärvi in western Finland, was found to produce the neurotoxin anatoxin-a, which affects the nervous systems of humans and animals and can cause paralysis. During the past decade, genome sequencing has aided the understanding of genetic information in many organisms, including cyanobacteria. A whole-genome sequencing project was carried out to understand the mechanism of anatoxin-a production in Anabaena sp. strain 37. 454 pyrosequencing produced 258,430 reads with a coverage of approximately 22X. The data were subjected to de novo assembly, which produced a draft genome made up of 828 contigs above 500 bp, with an N50 contig size of 10,548 bp and a longest contig of 47,660 bp. The draft assembly underwent a finishing procedure that included scaffolding, gap closure and error correction. Two mate pair libraries, 3 Kb and 8 Kb, were constructed and sequenced for scaffolding. Scaffolding with 196,221 reads from the 3 Kb mate pair library yielded 31 major scaffolds with an N50 scaffold size of 344,872 bp. A second round of scaffolding with 34,498 reads from the 8 Kb mate pair library resulted in 16 scaffolds and an N50 scaffold size of 1,085,340 bp. Three automated gap closure rounds were carried out using consed autofinish. Primers were used to amplify the genomic DNA by PCR, and the products were sequenced using Sanger sequencing. A total of 1,406 Sanger reads were used to close more than 800 gaps in the draft assembly.
In addition, the 454-based draft assembly contained many sequencing errors in single nucleotide homopolymer runs of three bases or longer. Moreover, these errors were found in coding regions, namely in the anatoxin-a synthetase gene cluster, and were confirmed with additional PCR and Sanger sequencing. There were 370,648 single nucleotide homopolymer sites of three bases or longer, which accounted for 38.18% of the genome length at a density of 668.1 sites per 10 Kb. A correction procedure was carried out by incorporating Illumina/Solexa data at 100X coverage into the assembly. The high-depth data corrected an estimated 1,888 erroneous homopolymer sites of three bases or longer, which translates to a 454 single nucleotide homopolymer error rate of 0.51%, or 3.37 per 10 Kb. The correction also increased the overall share of phred Q20 consensus bases. The current assembly is made up of 14 scaffolds, six of which are major scaffolds. It has an N50 scaffold size of 1,085,340 bp, 99.7% of the consensus bases are phred Q20 bases, and the overall error rate is 8.21 per 10 Kb. Finally, the genome has a GC content of 38.3%, and four ribosomal RNA operons and the anatoxin-a synthetase gene cluster were confirmed. fi
dc.language.iso en fi
dc.title De-novo assembly and finishing of the genome of neuro-toxin (anatoxin-a) producing cyanobacterium, Anabaena sp. strain 37 fi
dc.type.ontasot Master's thesis fi
dc.subject.discipline Bioinformatics fi
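
The abstract reports assembly statistics such as N50 and the density of single nucleotide homopolymer sites per 10 Kb. The following is a minimal sketch of how such figures are commonly computed, not the method used in the thesis. The function names `n50` and `homopolymer_stats`, the `min_run` parameter, and the toy inputs are hypothetical and chosen only for illustration; N50 is taken here in its usual sense, the length at which the cumulative sum of lengths, sorted from longest to shortest, first reaches half of the total.

```python
from itertools import groupby


def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Lengths are sorted from longest to shortest and accumulated; the N50 is
    the length at which the running total first reaches half of the total.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input


def homopolymer_stats(sequence, min_run=3):
    """Summarise single nucleotide homopolymer runs of at least min_run bases.

    Only runs of A, C, G or T are counted (an assumption of this sketch).
    Returns the number of sites, the fraction of the sequence they cover,
    and the site density per 10 Kb.
    """
    run_lengths = [
        len(list(group))
        for base, group in groupby(sequence.upper())
        if base in "ACGT"
    ]
    long_runs = [run for run in run_lengths if run >= min_run]
    total = len(sequence)
    if total == 0:
        return {"sites": 0, "fraction": 0.0, "per_10kb": 0.0}
    return {
        "sites": len(long_runs),
        "fraction": sum(long_runs) / total,
        "per_10kb": len(long_runs) * 10_000 / total,
    }


if __name__ == "__main__":
    # Toy data, not taken from the thesis.
    print(n50([10, 8, 6, 4, 2]))
    print(homopolymer_stats("AAACGTTTTGC"))
    # Error rate from the figures quoted in the abstract:
    # 1,888 corrected sites out of 370,648 homopolymer sites.
    print(round(100 * 1888 / 370648, 2))  # 0.51
```

The last line reproduces the 0.51% homopolymer error rate from the two site counts given in the abstract; the per 10 Kb figures depend on the genome length, which the abstract does not state, so they are not recomputed here.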

Files in this item


There are no files associated with this item.
