Abstract:
Experimental studies used to investigate the binding of specific factors to DNA along the genome are time consuming and expensive. Additionally, the increasing amount of data being produced on binding sites for different transcriptional regulators calls for modern computational techniques for analysing the binding patterns of several factors. In this paper, flexible statistical modelling techniques in the form of multivariate Hawkes processes are used to model the occurrences of transcriptional regulatory elements (TREs) and their interactions along the DNA sequence, using ENCODE pilot data covering 1% of the human genome. We employed statistical procedures and techniques to model the transcription factor binding sites of ten TREs through favoured or avoided distances. The results generally reveal interaction among transcription factor binding sites. In addition, similar patterns of interaction effects of each TRE on the others are observed. In all cases, the Hawkes log kernel model gives a better fit. The model, which is also expressed in terms of histone modification elements, adequately captures the extreme inter-distances that usually characterise the transform point processes.