Protein codification and applications
The ProtDCal suite is a web interface that brings together a set of tools for studying proteins.
The main module is the ProtDCal program, which supports data mining of protein data. ProtDCal generates a machine-learning-friendly numerical vector from the structural information of each protein. These vectors can then be used as input to pattern recognition techniques in order to develop models that link structure to activity or function data.
Additional modules provide tools for predicting the likelihood of protein-protein and protein-peptide interactions, for identifying enzymatic proteins, and for predicting lysine methylation sites. Planned developments include the design of antibacterial peptides and the prediction of post-translational modifications.
We will continue to incorporate new applications based on ProtDCal features into this suite. We also kindly encourage authors who have used ProtDCal's descriptors to develop new methods to contact us about implementing their algorithms in the suite.
ProtDCal uses a divide-and-conquer methodology: properties are extracted from diverse groups of residues, and the values of each group are aggregated into particular descriptors. In this way, a large feature vector is produced for every structure or sequence, and its elements balance local and global characteristics of the protein. Such a vector can be used effectively in machine learning analyses, together with suitable attribute selection and modeling techniques (SVM, Random Forest, ANN, etc.).
ProtDCal is implemented in Java, which makes it well suited to combination with Java-based machine learning packages such as Weka.
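To make the divide-and-conquer idea described above more concrete, the following is a minimal Java sketch and not ProtDCal's actual implementation. It assumes a single per-residue property (the Kyte-Doolittle hydropathy scale), three hypothetical residue groups ("all", "hydrophobic", "charged") and two simple aggregation operators (sum and mean). The class name DescriptorSketch, the group definitions and the feature names are illustrative assumptions only; the real tool offers its own properties, groups and operators.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only: names, groups and operators are assumptions,
// not the ProtDCal API.
final class DescriptorSketch {
    // Kyte-Doolittle hydropathy values, in the same order as RESIDUES.
    private static final String RESIDUES = "ARNDCQEGHILKMFPSTWYV";
    private static final double[] HYDROPATHY = {
        1.8, -4.5, -3.5, -3.5, 2.5, -3.5, -3.5, -0.4, -3.2, 4.5,
        3.8, -3.9, 1.9, 2.8, -1.6, -0.8, -0.7, -0.9, -1.3, 4.2
    };

    // Hypothetical residue groups: group name -> one-letter codes it contains.
    private static final Map<String, String> GROUPS = new LinkedHashMap<>();
    static {
        GROUPS.put("all", RESIDUES);
        GROUPS.put("hydrophobic", "AVILMFWC");
        GROUPS.put("charged", "DEKR");
    }

    // Builds a small feature vector: for each group, the sum and the mean
    // of the property over the residues of the sequence that fall in it.
    static Map<String, Double> describe(String sequence) {
        Map<String, Double> features = new LinkedHashMap<>();
        char[] residues = sequence.toUpperCase(Locale.ROOT).toCharArray();
        for (Map.Entry<String, String> group : GROUPS.entrySet()) {
            double sum = 0.0;
            int count = 0;
            for (char c : residues) {
                int idx = RESIDUES.indexOf(c);
                // Skip unknown symbols and residues outside this group.
                if (idx < 0 || group.getValue().indexOf(c) < 0) {
                    continue;
                }
                sum += HYDROPATHY[idx];
                count++;
            }
            features.put(group.getKey() + "_sum", sum);
            // An empty group contributes 0.0 rather than NaN.
            features.put(group.getKey() + "_mean", count == 0 ? 0.0 : sum / count);
        }
        return features;
    }

    public static void main(String[] args) {
        // Prints the feature names and values for a short example sequence.
        System.out.println(describe("MKVLAD"));
    }
}
```

The "all" group yields global descriptors, while the narrower groups capture more local characteristics. With more properties, groups and operators, the same loop structure produces the large feature vector from which attribute selection then picks the informative elements.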