Abstract
Fang-Xiang Wu, Habtom Ressom, Michael J. Dunn

Proteins are vital large molecules in living organisms and perform a vast array of functions in all kinds of biological processes. Studies of the structures and functions of proteins are therefore crucial to understanding biological processes in a living organism. With the development of advanced technologies such as tandem mass spectrometry (MS/MS), yeast two-hybrid analysis (Y2H), protein-fragment complementation assays (PCA), affinity purification/mass spectrometry (AP/MS), and protein microarrays, a large amount of proteomic data has been and continues to be produced. These data allow us to study the structures and functions of proteins on a large scale. Bioinformatic tools play a very important role in addressing the challenges of analyzing, integrating, and utilizing these proteomic data.

The 2011 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) took place on 12–15 November 2011 in Atlanta, GA, USA, and provided a forum for disseminating the latest research in bioinformatics, biomedicine, and health informatics. BIBM 2011 received 299 research paper submissions from 36 countries, many of which were related to proteomics. This Focus Issue consists of ten articles that are substantially extended versions of accepted papers selected from BIBM 2011. These ten articles present recent proteomic research conducted by the bioinformatics community. We hope that the papers will encourage researchers to make more extensive use of bioinformatic techniques for analyzing proteomic data.

With the completion of the Human Genome Project, a large number of protein (primary structure) sequences have been deposited in protein sequence databases and are publicly available. Nevertheless, the functions of proteins are mainly determined by their 3-D structures.
Although techniques such as X-ray crystallography or NMR spectroscopy can be employed to determine the 3-D structures of proteins, they are generally time-consuming and costly. Therefore, bioinformatics has become an important approach to predicting the structures of proteins. De novo protein structure prediction typically generates a large population of candidates (models) from protein sequences, and then selects near-native models through clustering methods. The existing structure model clustering methods are time-consuming. Zhang and Xu in “Fast algorithm for population-based protein structural model analysis” present a novel method for fast model clustering without loss of clustering accuracy, introducing two new measures, Dscore1 and Dscore2, based on distance matrix comparison of the structures. Fawcett et al. in “An artificial neural network approach to improving the correlation between protein energetics and the backbone structure” introduce a new approach to evaluating protein structures based on analysis of energy profiles produced by the SCOPE software package. They demonstrate that, when combined with an artificial neural network (ANN), the energy profile produced by SCOPE can potentially improve the structural quality of an unknown protein.

Another approach to studying the functions of proteins is to identify proteins involved in a specific biological process or tissue. For example, if a protein is significantly expressed in a specific tissue, one may infer that this protein functions within this tissue. Recently, tandem mass spectrometry (MS/MS) has become a very important tool for identifying proteins in biological samples taken from specific tissues or biofluids. Typically, a large amount of MS/MS data is analyzed to identify the peptides and proteins present in a biological sample.
Traditionally, peptide sequences are first identified from tandem mass spectra via de novo sequencing or database search methods, and proteins are then inferred from the identified peptides. In fact, these two steps should not be separated, because peptides and proteins depend on each other. Shi et al. in “Unifying protein inference and peptide identification with feedback to update consistency between peptides” develop a method that unifies protein inference and peptide identification by adding feedback from protein inference to peptide identification. Identification of small-molecule metabolites by liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) poses a significant challenge due to the lack of spectral libraries and well-established computational tools. Zhou et al. in “Prioritization of putative metabolite IDs in LC-MS/MS experiments using a computational pipeline” propose a computational pipeline to assist in metabolite identification on the basis of information derived from LC-MS and MS/MS experimental data as well as databases. They demonstrate that this computational pipeline retrieves appropriate putative identifications, which are prioritized to guide subsequent metabolite verification experiments.

It is now well acknowledged that a protein does not execute its function alone, but through interactions with other proteins. Although a variety of experimental techniques such as Y2H, PCA, AP/MS, and protein microarrays have been developed to determine the interactions between proteins, bioinformatic tools play a major role in predicting protein-protein interactions (PPI). Lin and Chen in “Heterogeneous data integration by tree-augmented naïve Bayes for protein-protein interactions prediction” introduce a tree-augmented naïve Bayes (TAN) classifier to predict PPI by integrating heterogeneous biological data. With PPI data, PPI networks can be constructed.
Recently, much attention has been paid to studying protein functions by detecting protein complexes and/or functional modules in PPI networks. To do so, one protein is typically selected as a seed, and a complex is grown from the seed according to criteria such as density and the connectivity between the complex and the rest of the network. Chen et al. in “Identifying protein complexes in protein-protein interaction networks by using clique seeds and graph entropy” propose a method that uses cliques as seeds and graph entropy as the criterion to detect complexes in PPI networks. Lei et al. in “Clustering and overlapping modules detection in PPI network based on IBFO” propose an improved clustering method, based on a bacteria foraging optimization (BFO) mechanism and intuitionistic fuzzy sets (IBFO for short), to detect overlapping modules. Li et al. in “hF-measure: a new measurement for evaluating clusters in protein-protein interaction networks” propose two new types of measurement to evaluate clusters (protein complexes) more finely and distinctly. One is hF-measureTf, a topology-free measurement, while the other is hF-measureTb, a topology-based measurement. Wang et al. in “Construction and application of dynamic protein interaction network based on time course gene expression data” propose a method to construct dynamic PPI networks by incorporating time-course gene expression data and advocate detecting protein complexes from such dynamic PPI networks. Huang et al. in “Predicting adverse drug reaction profiles by integrating protein interaction networks with drug structures” propose a framework for predicting 18 types of adverse drug reactions (ADRs) by integrating PPI networks and chemical structure information.

Thirteen submissions were originally invited for this Focus Issue. Each submission was reviewed by at least two experts in the field. We wish to thank all the reviewers for their great efforts and expert comments in evaluating the manuscripts.
We accepted ten submissions for this Focus Issue according to their quality and relevance. We also wish to thank the Editor-in-Chief, Professor Michael J. Dunn, and the Managing Editor, Dr. Hans-Joachim Kraus, for their great commitment and help in producing this Focus Issue.