description |
Motivation: In many cases of dimeric transcription factor (TF) binding, the specificity of the dimeric motif has been observed to differ notably from what would be expected if the factors bound DNA independently of each other. Current motif discovery methods are unable to learn monomeric and dimeric motifs in a modular fashion such that this deviation from the expected product of independent monomer motifs would become explicit. Results: We propose MODER, an expectation maximization (EM) algorithm and a software tool for discovering monomeric TF binding motifs and their dimeric combinations. The algorithm uses probabilistic mixture modeling in which monomeric motifs are represented as standard position-specific probability matrices (PPMs), and dimeric motifs are represented in modular fashion as pairs of monomeric PPMs, with associated orientation and spacing information. For dimers with overlapping monomers, the model explicitly represents the deviation from the purely modular model in which a dimeric PPM would equal the product of two independent monomer PPMs. Given training data and seeds for monomeric PPMs, MODER learns all model components and their orientation and spacing preferences simultaneously by EM. The tool can analyze training data several megabase pairs (Mbp) in size in reasonable time. Validation results show robust and accurate performance. |
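
The modular dimer representation described above can be illustrated with a small sketch. It is not MODER's implementation or API: the function names, the `[A, C, G, T]` column layout, the convention that a negative spacing means overlap, the uniform background for gap columns, and the renormalized product in the overlap are all assumptions made for illustration. The sketch builds the dimer PPM expected under independent binding and measures how far an observed dimer PPM deviates from it.

```python
# Illustrative sketch only; names and conventions are assumptions, not MODER's API.
# A PPM is a list of columns, each column a list of probabilities in A, C, G, T order.

def reverse_complement(ppm):
    """Reverse the column order and swap A<->T, C<->G within each column."""
    return [col[::-1] for col in reversed(ppm)]


def expected_dimer(ppm1, ppm2, spacing, background=(0.25, 0.25, 0.25, 0.25)):
    """Dimer PPM expected if the two monomers bound independently.

    spacing >= 0: number of gap columns between the monomers (assumed background).
    spacing < 0:  the monomers overlap by -spacing columns; each overlapping
                  column is the renormalized elementwise product of the two
                  monomer columns (assumes strictly positive entries).
    """
    start2 = len(ppm1) + spacing  # column where the second monomer begins
    if start2 < 0:
        raise ValueError("second monomer would start before the first")
    total = max(len(ppm1), start2 + len(ppm2))
    dimer = []
    for pos in range(total):
        in1 = pos < len(ppm1)
        in2 = start2 <= pos < start2 + len(ppm2)
        if in1 and in2:
            prod = [a * b for a, b in zip(ppm1[pos], ppm2[pos - start2])]
            norm = sum(prod)
            dimer.append([p / norm for p in prod])
        elif in1:
            dimer.append(list(ppm1[pos]))
        elif in2:
            dimer.append(list(ppm2[pos - start2]))
        else:
            dimer.append(list(background))
    return dimer


def deviation(observed, expected):
    """Per-cell difference between an observed dimer PPM and the expected one."""
    return [[o - e for o, e in zip(oc, ec)] for oc, ec in zip(observed, expected)]


# Two toy monomer PPMs of width 2 (hypothetical values).
ppm1 = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.7, 0.1]]
ppm2 = [[0.1, 0.1, 0.7, 0.1], [0.1, 0.7, 0.1, 0.1]]

overlapping = expected_dimer(ppm1, ppm2, spacing=-1)  # one shared column
gapped = expected_dimer(ppm1, ppm2, spacing=2)        # two gap columns
flipped = expected_dimer(ppm1, reverse_complement(ppm2), spacing=0)
```

In this sketch, orientation is handled by reverse-complementing one monomer before combining, and a nonzero result from `deviation` in the overlapping columns corresponds to the departure from the purely modular model that the abstract refers to.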