[No authors listed]
Metalloproteins represent a ubiquitous group of molecules that are crucial to the survival of all living organisms. Although several metal-binding motifs have been defined, it remains challenging to confidently identify metalloproteins from primary protein sequences using computational approaches alone. Here, we describe a comprehensive machine learning strategy for designing and assessing a penalized generalized linear model. We used this strategy to detect members of the iron-sulfur cluster protein family. A new category of descriptors based on profile hidden Markov models, which encode structural information, was combined with publicly available descriptors into a linear model. The model was trained and tested on distinct datasets composed of well-characterized iron-sulfur protein sequences, and it achieved higher sensitivity than a motif-based approach while maintaining good specificity. Analysis of the linear model allowed us to detect and quantify the contribution of each descriptor, giving a better understanding of this complex protein family and providing useful leads for further experimental characterization. Two newly identified proteins, YhcC and YdiJ, were functionally validated as genuine iron-sulfur proteins, confirming the prediction. The computational model was then applied to over 550 prokaryotic genomes to screen for iron-sulfur proteomes; the results are publicly available at: . This study provides a proof of concept for using a penalized linear model to identify metalloprotein superfamilies on a large scale. The application presented here, screening for iron-sulfur proteomes, provides new candidates for further biochemical and structural analysis as well as new resources for an extensive exploration of iron-sulfuromes in the microbial world.
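The abstract does not state which penalty, link function, or solver was used, so the following is only a minimal sketch under explicit assumptions. It fits one common kind of penalized generalized linear model, an L1-penalized logistic regression trained by proximal gradient descent, on a small synthetic descriptor matrix, then reports sensitivity and specificity on a separate toy test set. The function names, the penalty strength `lam`, the step size `lr`, and all data values are hypothetical illustrations, not taken from the study.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    """Numerically stable logistic (inverse-logit) function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w: float, t: float) -> float:
    """Proximal operator of the L1 penalty: shrink w toward zero by t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def fit_l1_logistic(X: Matrix, y: List[int], lam: float = 0.01,
                    lr: float = 0.5, n_iter: int = 2000) -> Tuple[List[float], float]:
    """Hypothetical L1-penalized logistic regression fitted by proximal
    gradient descent (ISTA). The intercept b is left unpenalized."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            # residual of the mean log-loss: predicted probability minus label
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(p):
                grad_w[j] += r * xi[j] / n
            grad_b += r / n
        # gradient step on the smooth loss, then the L1 shrinkage step
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: List[float], b: float, X: Matrix, threshold: float = 0.5) -> List[int]:
    """Label a sequence positive (1) when its predicted probability >= threshold."""
    return [int(sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) >= threshold)
            for xi in X]


def sensitivity_specificity(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)


# Synthetic toy data, NOT from the study. Columns 0 and 1 stand in for
# informative descriptors; column 2 is uninformative (balanced +/-1 per class).
X_train = [[2.0, 1.5, 1.0], [2.0, 1.5, -1.0], [1.5, 2.0, 1.0], [1.5, 2.0, -1.0],
           [-2.0, -1.5, 1.0], [-2.0, -1.5, -1.0], [-1.5, -2.0, 1.0], [-1.5, -2.0, -1.0]]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]
X_test = [[1.8, 1.2, 1.0], [-1.0, -1.7, -1.0], [0.9, 1.1, -1.0], [-1.2, -0.8, 1.0]]
y_test = [1, 0, 1, 0]

w, b = fit_l1_logistic(X_train, y_train)
sens, spec = sensitivity_specificity(y_test, predict(w, b, X_test))
print("weights:", [round(v, 3) for v in w], "sensitivity:", sens, "specificity:", spec)
```

Inspecting `w` gives a per-descriptor contribution, analogous in spirit to the descriptor analysis described above; in this toy data the uninformative third descriptor receives a weight of exactly zero, while the two informative descriptors receive positive weights.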