Machine-learning methods for structure prediction of β-barrel membrane proteins

Savojardo, Castrense (2013) Machine-learning methods for structure prediction of β-barrel membrane proteins, [Dissertation thesis], Alma Mater Studiorum Università di Bologna. Dottorato di ricerca in Informatica, 25 Ciclo. DOI 10.6092/unibo/amsdottorato/5429.
Documenti full-text disponibili:
Documento PDF (English) - Richiede un lettore di PDF come Xpdf o Adobe Acrobat Reader
Download (1MB) | Anteprima


Different types of proteins exist with diverse functions that are essential for living organisms. An important class of proteins is represented by transmembrane proteins which are specifically designed to be inserted into biological membranes and devised to perform very important functions in the cell such as cell communication and active transport across the membrane. Transmembrane β-barrels (TMBBs) are a sub-class of membrane proteins largely under-represented in structure databases because of the extreme difficulty in experimental structure determination. For this reason, computational tools that are able to predict the structure of TMBBs are needed. In this thesis, two computational problems related to TMBBs were addressed: the detection of TMBBs in large datasets of proteins and the prediction of the topology of TMBB proteins. Firstly, a method for TMBB detection was presented based on a novel neural network framework for variable-length sequence classification. The proposed approach was validated on a non-redundant dataset of proteins. Furthermore, we carried-out genome-wide detection using the entire Escherichia coli proteome. In both experiments, the method significantly outperformed other existing state-of-the-art approaches, reaching very high PPV (92%) and MCC (0.82). Secondly, a method was also introduced for TMBB topology prediction. The proposed approach is based on grammatical modelling and probabilistic discriminative models for sequence data labeling. The method was evaluated using a newly generated dataset of 38 TMBB proteins obtained from high-resolution data in the PDB. Results have shown that the model is able to correctly predict topologies of 25 out of 38 protein chains in the dataset. When tested on previously released datasets, the performances of the proposed approach were measured as comparable or superior to the current state-of-the-art of TMBB topology prediction.

Tipologia del documento
Tesi di dottorato
Savojardo, Castrense
Dottorato di ricerca
Scuola di dottorato
Scienze e ingegneria dell'informazione
Settore disciplinare
Settore concorsuale
Data di discussione
8 Aprile 2013

Altri metadati

Statistica sui download

Gestione del documento: Visualizza la tesi