Research Topics at TGIL


Algorithms for genomic sequence analysis

We develop novel algorithms and methods for genomic sequence analysis. Genomic sequences contain far more information than is typically used. The complexity of the human genome calls for more sophisticated algorithms that maximize the information extracted from sequence data.


We focus on a range of current analysis problems, including but not limited to:

  • Somatic mutation detection

  • Structural variation detection (e.g. duplication, translocation and gene fusion)

  • Copy number variation (CNV) detection and verification

  • Sequencing simulation

  • Model-based variant detection (e.g. cancer stem cell models and sample heterogeneity)

Translational disease-mutation analysis

Genomic research is now an essential approach for identifying disease-related genetic aberrations. With the advent of genomic medicine, finding "true" mutations is a shortcut to novel therapeutic development. We collaborate with many clinicians and molecular biologists to uncover the real causes of genetic diseases.


Appropriate use of bioinformatics analysis increases the chance of finding true genetic causes. We implement an efficient sequence analysis pipeline that not only finds point mutations but also extends the search to indels, CNVs, structural variations (SVs) and viral sequence aberrations. We are developing the "C2G" pipeline (Context-dependent and expandable workflows for genomic analysis), which will be deployed at our genome center.

Future sequence analysis

We are preparing for future sequencing and analysis technologies. For example, single-cell sequence analysis is now generally accepted as essential for investigating tumor heterogeneity. However, efficient analysis methods must keep pace with the sequencing technologies.


This ambitious goal includes the following research topics:

  • Preprocessing methods for whole-genome amplification data

  • Analysis methods for third-generation sequencing

  • Variant calling, genome assembly and CNV analysis for single-cell sequencing data