Yale Gerstein Lab

Predicting interactions in protein networks by completing defective cliques

Haiyuan Yu, Alberto Paccanaro, Valery Trifonov and Mark Gerstein

Abstract

Datasets obtained by large-scale, high-throughput methods for detecting protein-protein interactions typically suffer from a relatively high level of noise. We describe a novel method for improving the quality of these datasets by predicting missed protein-protein interactions, using only the topology of the protein interaction network observed by the large-scale experiment. The central idea of the method is to search the protein interaction network for defective cliques (nearly complete complexes of pair-wise interacting proteins), and predict the interactions that complete them. We formulate an algorithm for applying this method to large-scale networks, and show that in practice it is efficient and has good predictive performance.
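
The central idea can be illustrated with a minimal Python sketch. It is not the C implementation distributed on this page, and the names maximal_cliques, predict_missing_edges and min_shared are assumptions made for illustration only. The sketch covers only the simplest case: two maximal cliques that share all but one protein each, so that a single predicted interaction merges them into one larger clique.

```python
from itertools import combinations


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion.

    adj maps each protein to the set of its interaction partners.
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-explored vertices
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def predict_missing_edges(edges, min_shared=2):
    """Predict edges that complete simple defective cliques.

    Assumption (illustrative only): a defective clique is a pair of
    maximal cliques that each contain exactly one protein the other
    lacks and that share at least min_shared proteins.
    """
    adj = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions in this sketch
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    predicted = set()
    for p, q in combinations(maximal_cliques(adj), 2):
        only_p, only_q = p - q, q - p
        if len(only_p) == 1 and len(only_q) == 1 and len(p & q) >= min_shared:
            (u,), (v,) = only_p, only_q
            # u and v cannot already interact, or p and q would not be maximal
            predicted.add(frozenset((u, v)))
    return predicted


if __name__ == "__main__":
    # Four proteins, fully connected except for the pair A-D.
    observed = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D")]
    for pair in predict_missing_edges(observed):
        print(sorted(pair))  # prints ['A', 'D']
```

In this toy network the maximal cliques are {A, B, C} and {B, C, D}; they overlap in B and C, so the sketch predicts the A-D interaction. Comparing every pair of maximal cliques is quadratic in the number of cliques and is meant only to show the idea on small inputs.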


Scripts and binaries
       C scripts
 1. Find maximal cliques
 2. Complete defective cliques
 3. All C scripts
       Binaries
 1. Convert input datasets into the appropriate format
 2. Find maximal cliques
 3. Complete defective cliques
 Please note that you must first convert your input file into a binary file using "convert2mtx" before you can run "maxcliq" or "dcc". In other words, "maxcliq" and "dcc" accept only the output of "convert2mtx" as input.

Datasets
 1. Universal dataset (56x56) for Gold-standards
 2. Universal dataset (56x56) for large-scale experiments
 3. Protein names for the universal dataset
 4. Large-scale experimental dataset
 5. Help file for the datasets

Last modified on June 23rd, 2006