Missing proteins account for about 18% of the human proteome. They are proteins whose expression is predicted by bioinformatics but which have not been identified by antibody-based or mass spectrometry (MS) methods. Many functional proteins therefore remain unidentified, and their physiological significance and function are unknown. In 2010, under the Human Proteome Organization (HUPO), the Human Proteome Project (HPP) brought together the fields of proteomics, bioinformatics, and molecular biology to catalog the human proteome and to study protein function in biology and disease. Many countries have established joint research teams, and Taiwan has been officially designated to identify and functionally analyze the missing proteins on chromosome 4.
The project has two major sub-studies: the Chromosome-based Human Proteome Project (C-HPP) and the Biology/Disease-driven Human Proteome Project (B/D-HPP). The main task of C-HPP is to analyze the correspondence between gene expression and protein expression. According to the human protein database neXtProt, 20,159 proteins can be translated from the human genome. Of these, 17,008 have been confirmed to exist, either by mass spectrometry or by antibody. A further 1,939 or so are supported only by complementary DNA (cDNA) reverse transcribed from messenger RNA (mRNA), that is, by transcript evidence, with no evidence at the protein level. Another 563 proteins are inferred from homology: their sequences are so close to those of other proteins that they may be misidentified as the same protein, which makes the actual and predicted counts differ. Finally, 77 proteins are predicted by bioinformatics software but have no evidence of either gene transcription or protein translation. Adding up these three groups, a total of 2,579 proteins cannot be shown to be expressed by mass spectrometry or antibody; they are collectively called missing proteins. The human and mouse genomes have now been fully sequenced, so the genes on each chromosome can be compared with the proteins they express. This platform will facilitate the study of various disease-related mechanisms and provide targets for basic and clinical research. Nano liquid chromatography coupled to a quadrupole-Orbitrap mass spectrometer (nLC-Q-Orbi) was used to analyze the missing proteins, which were then screened and functionally characterized using the Human Protein Atlas (HPA).
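As a quick check of the arithmetic above, the following Python sketch adds up the three groups quoted from neXtProt. The variable names are illustrative only and are not taken from neXtProt or any other tool.

```python
# Counts quoted in the text above; all names here are illustrative assumptions.
total_predicted = 20159  # proteins that can be translated from the human genome
confirmed = 17008        # confirmed by mass spectrometry or antibody

missing_groups = {
    "transcript_evidence_only": 1939,  # cDNA/mRNA evidence, no protein evidence
    "inferred_from_homology": 563,     # sequences close to other proteins
    "predicted_only": 77,              # no transcript or protein evidence
}

# Sum of the three groups that make up the missing proteins.
missing_total = sum(missing_groups.values())

# Share of the predicted proteome, as a percentage rounded to one decimal.
missing_percent = round(100 * missing_total / total_predicted, 1)

# Entries that are neither confirmed nor in one of the three groups.
unclassified = total_predicted - confirmed - missing_total

print(missing_total)    # 2579
print(missing_percent)  # 12.8
print(unclassified)     # 572
```

With these counts the three groups come to roughly 12.8% of the predicted proteome; percentages quoted elsewhere may rest on a different database release. The remaining 572 entries are not described in the text; treating them as the uncertain (PE5) category is an assumption.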
Missing proteins are those that have not been detected in biological samples or MS experiments. Possible reasons for their absence include: (1) the detection capability of MS technology is limited for several classes of proteins, such as low-abundance proteins, hydrophobic proteins, membrane proteins, and proteins shorter than 50 amino acids; (2) some proteins may be expressed only in unusual tissues or cell types, at particular developmental stages such as the embryo or fetus, in specific disease states, or under specific cellular stress responses. The protein existence (PE) score from neXtProt, PE1 to PE5, can be used to classify proteins: PE1 (protein identified), PE2 (evidence at the transcript level), PE3 (inferred from homology), PE4 (predicted protein), and PE5 (dubious or uncertain protein). When a protein is not found by mass spectrometry, the possible situations are as follows: (1) The gene is present but not expressed. (2) There is a sequencing error in the gene or transcript, resulting in a mispredicted protein sequence. (3) The gene is expressed only under certain experimental or pathophysiological conditions. (4) The gene product is expressed at a low level in the sample. (5) The gene product is difficult to extract from the sample, because it may be integrated into the cell membrane (membrane protein) or confined to a particular compartment of the cell (subcellular localization). (6) The protein expressed by the gene contains no trypsin cleavage site or cannot be cleaved. (7) Trypsin digestion yields only peptides shorter than 9 amino acids, or peptides that cannot be detected by mass spectrometry.
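Situations (6) and (7) can be illustrated with a simplified in silico digestion. The Python sketch below assumes the common textbook rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P); it ignores missed cleavages and modifications. The function names and the example sequences are hypothetical, and the 9-residue threshold is the one mentioned in the text.

```python
import re


def tryptic_peptides(sequence):
    """Split a protein sequence after K or R, except when followed by P.

    Simplified rule; requires Python 3.7+ because it splits on empty matches.
    """
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def count_cleavage_sites(sequence):
    """Number of internal positions where the simplified rule cuts."""
    return len(re.findall(r"[KR](?!P)", sequence[:-1]))


def has_detectable_peptide(sequence, min_len=9):
    """True if at least one tryptic peptide has min_len residues or more."""
    return any(len(p) >= min_len for p in tryptic_peptides(sequence))


# Hypothetical sequences, for illustration only.
print(tryptic_peptides("MKRPAAK"))                # ['MK', 'RPAAK']
print(has_detectable_peptide("MKAAAAAAAAAKGGR"))  # True
print(has_detectable_peptide("MKGGRAAK"))         # False
print(count_cleavage_sites("MAAAAAAAAAAA"))       # 0
```

A protein for which `count_cleavage_sites` returns 0 corresponds to situation (6), and one for which `has_detectable_peptide` returns False corresponds to the length limit in situation (7). Real search engines apply further criteria, so this is only a first-pass screen.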
Over the past ten years, mass spectrometry-based proteomics has matured into a powerful and effective tool for systematically analyzing the entire protein profile of biological samples. Proteins are the biomolecules that carry out most cellular functions. The scope of the proteome analysis platform includes the development of novel analytical techniques and the analysis of clinical samples. It mainly provides proteomic data that can be combined with bioinformatics and other technologies to systematically study biological samples from humans, animals, or cell lines. Disease-related proteins are analyzed using whole-proteome profiles together with clinical pathology, helping researchers use proteomics effectively in clinical research.
At present, we provide whole-proteome analysis and missing protein analysis across all human chromosomes (22 + X + Y). The main core technologies of the proteome analysis platform are liquid chromatography-mass spectrometry, matrix-assisted laser desorption/ionization mass spectrometry, and Mascot-based sequence analysis, which can be used to analyze missing proteins.