Mining Residue Contacts in Proteins Using Local Structure Predictions
Mohammed J. Zaki
Shan Jin
Chris Bystroff
Today we are witnessing a paradigm shift in predicting protein structure from its known amino acid sequence. The traditional, or ab initio, folding method employs first principles to derive the 3D structure of proteins. However, even though considerable progress has been made in understanding the chemistry and biology of folding, the success of ab initio folding has been quite limited. Instead of simulation studies, an alternative approach is to learn from examples using a database of known protein structures. For example, the Brookhaven Protein Data Bank (PDB) records the 3D coordinates of the atoms of thousands of protein structures. Most of these proteins cluster into around 700 fold-families based on their similarity, and it is conjectured that there will be on the order of 1000 fold-families for the natural proteins [17]. The PDB thus enables a new paradigm for protein structure prediction based on data mining methods such as clustering, classification, association rules, and hidden Markov models.
Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY
cs-00-05