Analysis of biological sequence data demands increasingly sophisticated and fine-grained models, but these in turn introduce hard computational problems. A class of probabilistic-logic models is considered, which increases the expressive power from the regular and context-free languages of HMMs and SCFGs to, in principle, Turing-complete languages. In general, such models are computationally far too complex for direct use, so optimization by pruning and approximation is needed. The first steps are made towards a methodology for optimizing such models by approximations using auxiliary models for preprocessing or splitting them into submodels. An evaluation method for approximating models is suggested based on automatic generation of samples. These models and evaluation processes are illustrated in the PRISM system developed by other authors.
Statistical and Relational Learning in Bioinformatics; Workshop of European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2008