Using the SVM method to predict enhancers from tissues and cell lines

A team of researchers has developed a computational bioinformatics method to predict and accurately locate enhancer regions in cell lines and tissues.


Gene expression is a complex process regulated by a set of factors, including transcriptional regulatory elements.

One type of regulatory element consists of short DNA regions that improve transcription efficiency with the help of several transcription factors. These regions are called enhancers.

The researchers noted that identifying enhancer regions is crucial to further research on gene expression. The difficulty lies in locating them: because enhancers act independently of their distance from, and orientation relative to, their target genes, it is hard to pinpoint these regions accurately.

With the development of high-throughput ChIP-sequencing (chromatin immunoprecipitation sequencing) methods, computational methods have also been developed to predict these enhancer regions.

However, most of these computational methods rely on p300 binding sites, along with (or instead of) DNase I hypersensitive sites (DHSs), to select positive training samples. This reliance can be a hindrance and could lead to unsatisfactory prediction performance.

The researchers, including Dr Jihong Guan, propose a method based on support vector machines (SVMs) to predict enhancers in cell lines and tissues.

The research mainly focused on predicting enhancers at various developmental stages of heart and lung tissues.

The team described the results of this research as ‘quite satisfactory’. Unlike previous procedures, the new method performed well on most cell lines and tissues, and it significantly outperformed most current methods on heart and lung tissues.

Moreover, enhancers were easier to predict in adult-stage tissues than in fetal-stage tissues, a pattern observed in both heart and lung tissues.

The results were published in the journal Current Bioinformatics.