MicroRNAs are small non-coding RNAs involved in post-transcriptional regulation of gene expression. Because primary microRNA (pri-microRNA) transcripts are poorly annotated, the precise location of the promoter regions driving expression of many microRNA genes remains unknown. This deficiency hinders our understanding of microRNA-mediated regulatory networks. In this study, we developed a computational approach to identify the promoter region and transcription start site (TSS) of actively transcribed pri-microRNAs using genome-wide RNA Polymerase II (RPol II) binding patterns derived from ChIP-seq data. Based upon the assumption that the distributions of RPol II binding patterns around the TSS of microRNA and protein coding genes are similar, we designed a statistical model to mimic RPol II binding patterns around the TSS of highly expressed, well-annotated promoter regions of protein coding genes. We used this model to systematically scan the regions upstream of all intergenic microRNAs for RPol II binding patterns similar to those at the TSS of protein coding genes. We validated our findings by examining the conservation, CpG content, and activating histone marks in the identified promoter regions. We applied our model to assess changes in microRNA transcription in steroid hormone-treated breast cancer cells. The results demonstrate that many microRNA genes have lost hormone-dependent regulation in tamoxifen-resistant breast cancer cells.
MicroRNA promoter identification based upon RPol II binding patterns provides important temporal and spatial measurements regarding the initiation of transcription, and therefore allows comparison of transcription activities between different conditions, such as normal and disease states.
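
The scanning step described above can be illustrated with a minimal sketch. The abstract does not specify the statistical model, so this example substitutes a simple stand-in: it averages binding profiles around known TSSs into a template, then slides that template along an upstream signal and scores each window by Pearson correlation. The function names, the `min_score` threshold, and the toy numbers are all hypothetical and are not taken from the study.

```python
from math import sqrt


def mean_profile(profiles):
    """Average equal-length binding profiles position by position."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def scan_upstream(signal, template, min_score=0.8):
    """Slide the template along the signal.

    Returns (window_start, score) for the best-scoring window, or None
    when no window reaches min_score (an assumed, arbitrary cutoff).
    """
    width = len(template)
    best = None
    for start in range(len(signal) - width + 1):
        score = pearson(signal[start:start + width], template)
        if best is None or score > best[1]:
            best = (start, score)
    if best is None or best[1] < min_score:
        return None
    return best


# Toy data only: invented read counts per bin, not real ChIP-seq values.
training = [[0, 1, 4, 1, 0], [0, 2, 6, 2, 0]]    # profiles around known TSSs
template = mean_profile(training)
upstream = [0, 0, 0, 0, 1, 5, 1, 0, 0, 0]        # region upstream of a microRNA
hit = scan_upstream(upstream, template)
if hit is not None:
    start, score = hit
    putative_tss_bin = start + len(template) // 2  # centre of the best window
```

A correlation score ignores absolute signal height, so a real analysis would likely also need an enrichment or expression criterion; that choice is left open here.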