Title: | Are grammatical representations useful for learning from biological sequence data? - a case study. |
Authors: | C.H. Bryant, S.H. Muggleton, A. Srinivasan, A. Whittaker, S. Topp, and C. Rawlings |
Series: | Linköping Electronic Articles in Computer and Information Science, ISSN 1401-9841 |
Issue: | Vol. 6(2001): nr 013 |
URL: | http://www.ep.liu.se/ea/cis/2001/013/ |
Abstract: |
This paper investigates whether Chomsky-like grammar representations
are useful for learning cost-effective, comprehensible predictors of
members of biological sequence families. The Inductive Logic
Programming (ILP) Bayesian approach to learning from positive examples
is used to generate a grammar for recognising a class of proteins
known as human neuropeptide precursors (NPPs). Collectively, five of
the co-authors of this paper have extensive expertise in NPPs and
general bioinformatics methods. Their motivation for generating an NPP
grammar was that none of the existing bioinformatics methods could
provide sufficient cost savings during the search for new NPPs. Prior
to this project, experienced specialists at SmithKline Beecham had
tried for many months to hand-code such a grammar but without
success. Our best predictor makes the search for novel NPPs more
than 100 times more efficient than randomly selecting proteins for
synthesis and testing them for biological activity. As far as these
authors are aware, this is both the first biological grammar learnt
using ILP and the first real-world scientific application of the ILP
Bayesian approach to learning from positive examples.
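The efficiency claim above compares testing the proteins a predictor flags with testing proteins chosen at random. The following is a minimal Python sketch, not the paper's method. It assumes that every synthesised and tested protein costs the same, so the gain reduces to the predictor's hit rate (precision) divided by the base rate of NPPs among the candidates. This is a plain enrichment (lift) ratio, not the paper's Relative Advantage cost function. The function name and all counts are hypothetical.

```python
def enrichment_over_random(true_positives: int, predicted_positives: int,
                           family_members: int, candidates: int) -> float:
    """How many times more often a tested protein is a family member when
    proteins are taken from the predictor's output rather than at random.

    Assumption (illustrative only): equal cost per tested protein, so the gain
    is precision / base_rate. This is NOT the paper's Relative Advantage.
    """
    if predicted_positives <= 0 or family_members <= 0 or candidates <= 0:
        raise ValueError("predicted_positives, family_members and candidates must be positive")
    if not 0 <= true_positives <= min(predicted_positives, family_members):
        raise ValueError("true_positives out of range")
    # precision = true_positives / predicted_positives  (hit rate using the predictor)
    # base_rate = family_members / candidates           (hit rate picking at random)
    # precision / base_rate, computed with one division to avoid extra rounding:
    return (true_positives * candidates) / (predicted_positives * family_members)
```

Consider a hypothetical case with 10 NPPs among 10,000 candidate proteins, where 5 of the 20 proteins a predictor flags are NPPs. Here `enrichment_over_random(5, 20, 10, 10000)` gives 250.0, while a predictor that does no better than chance scores 1.0.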
A group of features is derived from this grammar. Other groups of
features of NPPs are derived using other learning strategies.
Amalgams of these groups are formed. A recognition model is generated
for each amalgam using C4.5 and C4.5rules, and its performance is
measured using both predictive accuracy and a new cost function,
Relative Advantage ( |
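C4.5 grows a decision tree by choosing, at each node, the attribute whose split has the highest gain ratio. The following is a simplified Python sketch of that criterion for discrete attributes only. The real C4.5 also handles continuous thresholds, missing values and pruning, and none of these is shown here. The feature values in the usage note are hypothetical stand-ins for a grammar-derived feature, not features from the paper.

```python
from collections import Counter
from math import log2


def entropy(labels: list) -> float:
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(feature_values: list, labels: list) -> float:
    """Information gain from splitting on a discrete feature, divided by the
    split information (the entropy of the partition sizes themselves)."""
    n = len(labels)
    groups: dict = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the class labels left in each branch after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Penalises features that fragment the data into many small branches.
    split_info = -sum((len(g) / n) * log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0
```

For instance, a hypothetical feature that perfectly separates two NPPs from two non-NPPs, `gain_ratio(["parse", "parse", "no_parse", "no_parse"], [1, 1, 0, 0])`, scores 1.0. A feature that is independent of the class scores 0.0.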
Keywords: |
Intended publication: | 2001-08-30 |
Editor-in-chief: editor@ep.liu.se | Webmaster: webmaster@ep.liu.se |