Owen Borville Learning: Ideas for a Better World
Unknown Protein Database
by Owen Borville
July 29, 2024
Biology, Biosciences

Scientists have created an “unknome” of proteins encoded by human genes. These proteins are known to exist, but their functions remain mostly mysterious. 

The Unknome database: To create the unknome, researchers started with the approximately 20,000 protein-coding genes identified in humans. They grouped closely related human genes or proteins that appear to share similar functions, resulting in around 7,500 protein clusters.

Scoring proteins: Each protein cluster received a score based on how much information exists about its members. This scoring considers entries in the Gene Ontology Resource, which catalogs gene functions. Even if a human protein hasn’t been directly studied, it scores highly if an equivalent protein has been well-studied in another animal.
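
As a rough illustration of the scoring idea described above, the Python sketch below gives each protein a score from its annotations and gives a cluster the score of its best-characterized member. The evidence weights, the helper names protein_score and cluster_score, the rule that a cluster takes its highest member score, and the example proteins are all assumptions made for illustration, not the Unknome's actual scheme.

```python
# Hypothetical weights: how much each kind of annotation evidence counts.
# These numbers are illustrative assumptions, not the Unknome's real values.
EVIDENCE_WEIGHTS = {
    "experimental": 1.0,   # function shown directly in the lab
    "computational": 0.3,  # function predicted from sequence analysis
    "unreviewed": 0.1,     # automatic annotation without curation
}


def protein_score(annotations):
    """Sum the weights of one protein's annotations (unknown kinds count 0)."""
    return sum(EVIDENCE_WEIGHTS.get(kind, 0.0) for kind in annotations)


def cluster_score(cluster):
    """Score a cluster by its best-characterized member (assumed rule).

    `cluster` maps a protein name to a list of annotation kinds.
    An empty cluster scores 0.
    """
    return max((protein_score(a) for a in cluster.values()), default=0.0)


# Toy cluster: the human protein is unstudied, its fly equivalent is not.
cluster_a = {
    "human_protein": [],
    "fly_protein": ["experimental", "experimental", "computational"],
}
# Toy cluster in which no member has any annotation.
cluster_b = {"human_protein": [], "fly_protein": []}

print("cluster_a:", cluster_score(cluster_a))
print("cluster_b:", cluster_score(cluster_b))
```

Under these assumptions, cluster_a scores well even though its human member has no annotations of its own, while cluster_b scores 0 and would sit at the bottom of the ranking.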

Surprising discoveries: Over 2,200 proteins have scores below 2, and more than 800 score 0. These low-scoring proteins might not have been studied because they were assumed to be unimportant. However, using RNA interference (RNAi) in fruit flies, researchers found that some of these proteins are essential for survival. This revelation challenged assumptions about what constitutes an “important” gene.

In summary, the unknome sheds light on our vast protein landscape, revealing that there’s still much to learn about these mysterious molecules that were part of a special creation by an Intelligent Designer.

When it comes to studying unknown proteins, scientists employ various methods to unravel their functions and properties. Some key approaches are:

Bioinformatics and sequence analysis: Homology search: Researchers compare the amino acid sequence of an unknown protein to known proteins in databases (such as UniProt or NCBI) to find similar sequences. If a known protein has a similar sequence, it might share similar functions.

Domain prediction: Identifying conserved domains within the protein sequence can provide clues about its function. Tools like Pfam or SMART help with domain prediction.
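
The homology search described above can be sketched in a few lines of Python. This is a minimal example, assuming a simple scoring scheme (+1 for a match, -1 for a mismatch or gap) and a made-up two-entry database; real search tools use substitution matrices and fast heuristics, and the names alignment_score and best_hit are hypothetical.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of two sequences (Needleman-Wunsch)."""
    # prev[j] is the best score for aligning the part of `a` seen so far
    # with the first j residues of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def best_hit(query, database):
    """Return (name, score) of the database entry that aligns best."""
    return max(
        ((name, alignment_score(query, seq)) for name, seq in database.items()),
        key=lambda hit: hit[1],
    )


# Toy database: the names and sequences are invented for this example.
database = {
    "kinase_like": "MKVLAAGIVGLK",
    "transporter_like": "MSTPWWQQRRHH",
}
query = "MKVLAGGIVGLK"
print(best_hit(query, database))
```

A high-scoring hit only suggests a shared function. It is a lead for the experimental methods that follow, not a conclusion.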

Expression and purification: Cloning and Expression: Scientists clone the gene encoding the unknown protein into a host (like E. coli or yeast) and express it. This allows them to produce the protein in larger quantities.

Purification: techniques like affinity chromatography or gel filtration help isolate the protein from other cellular components.

Structural biology: X-ray crystallography: determines the 3D structure of proteins by analyzing diffraction patterns from crystallized protein samples.

NMR spectroscopy: provides structural information based on nuclear magnetic resonance interactions.

Cryo-electron microscopy (cryo-EM): visualizes protein structures at near-atomic resolution without crystallization.

Functional assays: Enzyme assays: measure enzymatic activity (e.g., substrate conversion) to infer protein function.

Binding assays: test interactions with other molecules (ligands, DNA, RNA, etc.).

Cell-based assays: observe protein effects in living cells.

Proteomics and mass spectrometry: Shotgun Proteomics: Identifies proteins in complex mixtures by digesting them into peptides and analyzing their mass spectra.
Tandem Mass Spectrometry (MS/MS): Provides sequence information for peptides, aiding protein identification.

Mass spectrometry is a powerful technique used to analyze proteins. In the “bottom-up” approach, full-length proteins are first broken down into smaller fragments called peptides, usually by digesting them with an enzyme such as trypsin. These peptides are then measured using mass spectrometers, which provide information about their mass-to-charge ratios. By matching the resulting spectra against the peptides expected from known protein sequences, researchers can identify specific peptides and infer which proteins were present in the sample.
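
To make the bottom-up idea concrete, here is a minimal Python sketch that digests a protein into peptides and computes the mass-to-charge ratio each peptide would show. It assumes trypsin as the enzyme (cutting after K or R, but not before P) and uses standard monoisotopic residue masses; the function names and the example sequence are invented for illustration, and modifications and missed cleavages are ignored.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # H2O added for the free ends of the peptide
PROTON = 1.00728  # mass carried by each positive charge


def tryptic_digest(protein):
    """Cut after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # trailing piece with no cut site
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of a peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def mass_to_charge(peptide, charge=1):
    """m/z of the peptide carrying `charge` extra protons."""
    return (peptide_mass(peptide) + charge * PROTON) / charge


# Made-up protein sequence, for illustration only.
for pep in tryptic_digest("MAGKTLRPEWKSSR"):
    print(pep, round(mass_to_charge(pep, charge=2), 4))
```

Identification software works in the opposite direction: it compares measured m/z values with lists like this one, computed for every protein in a database.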

Functional genomics: RNA interference (RNAi): silences specific genes to study the effects on cellular processes.

CRISPR/Cas9: allows precise gene editing to investigate protein function.

Collaboration and literature mining: Scientists collaborate across disciplines and explore existing literature to find relevant information about similar proteins or pathways.

The path to understanding unknown proteins often involves a combination of these methods.

The Dark Proteome: Beyond the known proteins, there’s an even more mysterious realm called the “dark proteome.” Inside this dark proteome are proteins that scientists believe should exist but that have not yet been observed experimentally. It also includes proteins whose structures and roles are unknown, waiting to be unraveled by future research.

Edman degradation, developed by Pehr Edman, is a method for sequencing the amino acids in a peptide. In each cycle, the amino-terminal residue is labeled and cleaved from the peptide without disrupting the peptide bonds between the other residues. The released residue is identified, and the cycle is repeated on the shortened peptide, so the sequence is read one residue at a time.
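
The logic of Edman degradation can be pictured as a loop that removes one residue from the amino-terminal end per cycle. The Python sketch below is only a toy model of that logic, with hypothetical function names; it ignores the chemistry entirely, and the max_cycles parameter is an assumption standing in for the fact that real runs cannot continue indefinitely.

```python
def edman_cycles(peptide, max_cycles=None):
    """Yield (cycle, released_residue, remaining_peptide) for each cycle."""
    remaining = peptide
    cycle = 0
    while remaining and (max_cycles is None or cycle < max_cycles):
        cycle += 1
        # Only the first (amino-terminal) residue is released; the rest
        # of the chain stays intact for the next cycle.
        released, remaining = remaining[0], remaining[1:]
        yield cycle, released, remaining


def sequence_by_edman(peptide, max_cycles=None):
    """Read the sequence as the ordered list of released residues."""
    return "".join(res for _, res, _ in edman_cycles(peptide, max_cycles))


# Made-up peptide, for illustration only.
for cycle, released, remaining in edman_cycles("GAVLK"):
    print(cycle, released, remaining or "(nothing left)")
```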

Recent research has also revealed a previously unknown mechanism for degrading short-lived nuclear proteins. Scientists discovered a protein called midnolin that directly grabs these proteins and pulls them into the cellular waste-disposal system (the proteasome), where they are destroyed. This discovery could have implications for controlling the levels of proteins involved in brain function, immune response, and development.

Predicting protein structures is a challenge in computational biology. The homology-based structure prediction approach relies on the fact that proteins with similar sequences often have similar structures. Scientists compare the target protein’s sequence to known structures (templates) in databases. If a close match is found, the predicted structure is based on the template’s structure.

Threading (fold recognition) identifies compatible folds for a given sequence by threading it onto known structures. It assesses how well the sequence fits into different structural templates. Limitation: It assumes that the native fold exists among the templates considered.
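
A toy version of threading can show both the idea and the limitation. In the Python sketch below, each template is reduced to a string of environments (B for buried, E for exposed), and a sequence scores well when its hydrophobic residues land in buried positions. The two-letter environment alphabet, the hydrophobic set, the +1/-1 scoring, and the template names are all simplifying assumptions for illustration; real fold-recognition methods use far richer scoring and allow gaps.

```python
# Residues treated as hydrophobic in this toy model (an assumption).
HYDROPHOBIC = set("AVLIMFWC")


def position_score(residue, environment):
    """+1 when the residue suits the environment, -1 otherwise."""
    prefers_buried = residue in HYDROPHOBIC
    return 1 if prefers_buried == (environment == "B") else -1


def thread_score(sequence, template):
    """Best gap-free fit of the sequence along one template.

    Returns None when the template is shorter than the sequence.
    """
    if len(template) < len(sequence):
        return None
    return max(
        sum(position_score(r, e) for r, e in zip(sequence, template[offset:]))
        for offset in range(len(template) - len(sequence) + 1)
    )


def rank_folds(sequence, templates):
    """Rank templates from best to worst fit for the sequence."""
    scored = [(name, thread_score(sequence, env)) for name, env in templates.items()]
    scored = [(name, s) for name, s in scored if s is not None]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical templates: B = buried position, E = exposed position.
templates = {"fold_alternating": "BEBEBEBE", "fold_core": "EEBBBBEE"}
print(rank_folds("LKVEIDAS", templates))
```

Note that rank_folds always returns a top-ranked template, even when none of the templates is the true fold, which is exactly the limitation mentioned above.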

Advanced tools (e.g., AlphaFold 2): AlphaFold 2, developed by DeepMind, uses neural networks to predict protein structures. It draws on evolutionary information from multiple sequence alignments to model the 3D structure directly, and its accuracy has revolutionized protein structure prediction.

Predicting protein structures is complex, but it contributes to our understanding of biological molecules.

The protein folding problem is the challenge scientists face when attempting to predict the 3D structure of proteins based solely on their amino acid sequence. Proteins are synthesized as linear chains, but they fold into complex, globular shapes. Understanding protein structure is crucial because it directly relates to their function.

However, the DNA sequence alone provides only the primary structure (the order of amino acids), leaving us with the task of determining the secondary structure (local elements such as alpha helices and beta sheets) and the tertiary structure (the overall 3D shape of the folded chain). Experimental methods and computational predictions play a vital role in unraveling this mystery.
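
The step from DNA to primary structure is simple enough to write down directly, which is what makes the contrast with folding so striking. The Python sketch below translates a coding DNA sequence with the standard genetic code; the function name translate and the example sequences are chosen for illustration, and the sketch assumes the sequence starts in the correct reading frame.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate coding DNA into one-letter amino acids.

    Reading starts at the first base and stops at the first stop
    codon ('*'); an incomplete codon at the end is ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGAAATGGTGA"))
```

The output is only a string of letters. Nothing in this lookup says which parts form helices or sheets, or how the chain packs in three dimensions.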

cen.acs.org
Owen Borville Learning: Ideas for a Better World offers an online, innovative, learning platform for students and researchers that are passionate for learning, research, and have a desire to challenge the established consensus of thought and improve the world.
Copyright 2018-2026. Owen Borville Learning: Ideas for a Better World