SCIENCE-JOBS-DE
Improvement of ncRNA annotation (Heidelberg)
Master project: Improvement of ncRNA annotation
The Bioinformatics (HUSAR) group at the German Cancer Research Center
has developed a set of pipelines for the analysis of small/non-coding RNAs.
One of these pipelines classifies non-coding RNA sequence reads from
high-throughput sequencing experiments. It maps the reads against a
database of known non-coding RNAs of different types. This database was
developed in our group and contains sequences from publicly available
databases such as miRBase, Ensembl, Rfam, and others. The pipeline
computes the distribution of the reads across the different ncRNA
classes and the coverage of each individual ncRNA.
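
To make the two reported quantities concrete, here is a minimal sketch of how a class distribution and a per-ncRNA coverage could be computed from a list of alignments. It is an illustration only, not the pipeline's actual code: the function name `summarize`, the tuple layout, and the reference IDs `refA` and `refB` are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def summarize(alignments, ref_class, ref_length):
    """Return (class distribution, per-reference coverage).

    alignments: iterable of (read_id, ref_id, start, end), 0-based with
                end exclusive (hypothetical layout for this sketch).
    ref_class:  dict mapping ref_id to its ncRNA class.
    ref_length: dict mapping ref_id to its sequence length.
    """
    class_counts = Counter()
    covered = defaultdict(set)
    for _read_id, ref_id, start, end in alignments:
        class_counts[ref_class[ref_id]] += 1
        covered[ref_id].update(range(start, end))

    total = sum(class_counts.values())
    # Fraction of mapped reads falling into each class.
    distribution = {c: n / total for c, n in class_counts.items()} if total else {}
    # Fraction of each reference covered by at least one read.
    coverage = {ref: len(pos) / ref_length[ref] for ref, pos in covered.items()}
    return distribution, coverage


# Made-up toy input, purely for illustration.
alignments = [
    ("r1", "refA", 0, 20),
    ("r2", "refA", 5, 15),
    ("r3", "refB", 0, 15),
]
ref_class = {"refA": "miRNA", "refB": "piRNA"}
ref_length = {"refA": 20, "refB": 30}

distribution, coverage = summarize(alignments, ref_class, ref_length)
```

Here coverage is taken to mean the fraction of positions hit by at least one read; a per-base read depth would be an equally plausible reading of the term.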
The database is not simply a merge of the available data sources. To
avoid duplicate entries, we cluster identical sequences from the same
organism according to certain rules. New databases are now available for
several classes (e.g. piRBase: a web resource assisting piRNA functional
study. Database 2014; doi: 10.1093/database/bau110), and these should be
integrated into our consensus database. The corresponding rules will
need to be defined and implemented.
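
The clustering of identical sequences from the same organism could look roughly like the sketch below. The actual rules are not spelled out in this announcement, so the grouping key used here (organism plus the sequence in upper case with U written as T) is an assumption, as are the function name `cluster_identical` and the placeholder entry IDs.

```python
from collections import defaultdict


def cluster_identical(records):
    """Group entries that share organism and (normalized) sequence.

    records: iterable of (source_db, entry_id, organism, sequence).
    Normalization (upper case, U -> T) is an assumption of this sketch,
    so that RNA- and DNA-alphabet entries can be compared.
    """
    clusters = defaultdict(list)
    for source_db, entry_id, organism, sequence in records:
        key = (organism, sequence.upper().replace("U", "T"))
        clusters[key].append((source_db, entry_id))
    return dict(clusters)


# Placeholder records; IDs and sequences are invented for illustration.
records = [
    ("miRBase", "id1", "Homo sapiens", "UGAGGUAG"),
    ("Ensembl", "id2", "Homo sapiens", "TGAGGTAG"),
    ("Rfam", "id3", "Mus musculus", "TGAGGTAG"),
]

clusters = cluster_identical(records)
```

With this toy input the two human entries fall into one cluster, while the mouse entry with the same sequence stays separate. Integrating a new source such as piRBase would then mostly mean deciding how its entries map onto such a key.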
Another issue is the choice of mapper used to compare the sequence reads
against the database. Currently we use bowtie, and we would like to
check whether other mappers could improve the mapping quality.
Furthermore, our database contains ncRNAs of different types that differ
considerably in length (e.g. lincRNA in contrast to miRNA). We would
therefore also like to analyse whether using different mappers for
different classes helps to map the reads efficiently.
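
One way to frame the per-class mapper question is to first split the ncRNA classes by typical reference length and then assign a mapper to each group. The sketch below only shows that bookkeeping step. The 200 nt threshold, the labels `short_mapper` and `long_mapper`, and the example lengths are assumptions for illustration and do not refer to any specific tool or to real database statistics.

```python
from statistics import median

# Assumed cut-off between "short" and "long" ncRNA classes (in nt).
LENGTH_THRESHOLD = 200


def assign_mappers(class_lengths, threshold=LENGTH_THRESHOLD):
    """Map each ncRNA class to a placeholder mapper label.

    class_lengths: dict mapping class name to a list of reference lengths.
    A class is treated as "short" if its median length is below threshold.
    """
    plan = {}
    for ncrna_class, lengths in class_lengths.items():
        if median(lengths) < threshold:
            plan[ncrna_class] = "short_mapper"
        else:
            plan[ncrna_class] = "long_mapper"
    return plan


# Invented lengths, only to exercise the function.
class_lengths = {
    "miRNA": [21, 22, 23],
    "lincRNA": [800, 1500, 3000],
}

plan = assign_mappers(class_lengths)
```

Whether such a split actually improves mapping quality or speed is exactly the open question of the project; the sketch only fixes how the classes would be partitioned for a comparison.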
Finally, the improvements need to be implemented in our pipeline.
Karl-Heinz Glatting
glatting@dkfz.de
Deutsches Krebsforschungszentrum
Heidelberg
Contact: genome@dkfz.de