Protein Secondary Structure Prediction

The predictor is a bidirectional LSTM (Long Short-Term Memory) network, a kind of recurrent neural network. This architecture suits sequential data because it reads the sequence in both directions, so the prediction for each amino acid can draw on the residues before and after it.

The dataset contains more than 394k amino acid sequences and their corresponding secondary structures.

The accuracy of the predictor is 85%.


Enter the amino acid sequence using the standard one-letter codes.

The output describes the secondary structure of each residue, based on the eight-letter DSSP (Define Secondary Structure of Proteins) code:
   G = 3-turn helix. Minimum length 3 residues
   H = 4-turn helix (α helix). Minimum length 4 residues
   I = 5-turn helix (π helix). Minimum length 5 residues
   T = hydrogen-bonded turn (3, 4, or 5 turns)
   E = extended strand in parallel and/or anti-parallel β-sheet conformation. Minimum length 2 residues
   B = residue in isolated β-bridge (single pair β-sheet hydrogen bond formation)
   S = bend (the only non-hydrogen-bond based assignment)
   C = coil (residues which are not in any of the above conformations)

We merge (E, B) into E, (H, G, I) into H, and (C, S, T) into C, so the output contains only these three letters, which represent more general structure classes.
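The merge described above can be sketched as a simple character translation. This is only an illustration: the names DSSP8_TO_3 and reduce_to_three_states are hypothetical and are not taken from the predictor's own code.

```python
# Hypothetical helper (not the predictor's actual code): collapse the eight
# DSSP letters into the three classes E, H, and C described above.
DSSP8_TO_3 = str.maketrans({
    "E": "E", "B": "E",            # strand and isolated bridge -> E
    "H": "H", "G": "H", "I": "H",  # all helix types -> H
    "C": "C", "S": "C", "T": "C",  # coil, bend, and turn -> C
})


def reduce_to_three_states(dssp8: str) -> str:
    """Return the 3-state string for an 8-state DSSP string."""
    unknown = set(dssp8) - set("GHITEBSC")
    if unknown:
        raise ValueError(f"Not DSSP letters: {sorted(unknown)}")
    return dssp8.translate(DSSP8_TO_3)


print(reduce_to_three_states("CSTHHHGGIEEBC"))  # CCCHHHHHHEEEC
```

The output has the same length as the input, so each residue keeps its position after the merge.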


Proteins have four levels of structure: primary, secondary, tertiary, and quaternary.

This program predicts only the secondary structure, which refers to the local folded structures that form within a polypeptide due to interactions between atoms of the backbone.


Enter your amino acid sequence here

The minimum length is 3 and the maximum length is 256. If you enter more than 256 amino acids, only the first 256 will be read.

Nonstandard amino acids (B, O, U, X, and Z) will be represented by *
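
The input rules above (length limits, truncation, and the * placeholder) can be sketched as follows. This is a minimal illustration, not the program's actual code: the function name prepare_sequence is hypothetical, and ignoring whitespace, upper-casing the input, and rejecting any letter outside the 20 standard codes plus B, O, U, X, and Z are assumptions.

```python
# Hypothetical input check (not the program's actual code).
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard one-letter codes
NONSTANDARD = set("BOUXZ")              # shown as * according to the text
MIN_LEN, MAX_LEN = 3, 256


def prepare_sequence(raw: str) -> str:
    """Clean, truncate, and validate a user-entered amino acid sequence."""
    # Assumption: whitespace is ignored and lowercase letters are accepted.
    seq = "".join(raw.split()).upper()
    # Only the first 256 amino acids are read.
    seq = seq[:MAX_LEN]
    if len(seq) < MIN_LEN:
        raise ValueError("Invalid Input: sequence is shorter than 3 residues")
    # Assumption: anything else (digits, symbols, the letter J) is rejected.
    bad = set(seq) - STANDARD - NONSTANDARD
    if bad:
        raise ValueError(f"Invalid Input: unexpected characters {sorted(bad)}")
    return "".join("*" if ch in NONSTANDARD else ch for ch in seq)


print(prepare_sequence("mktuxa"))  # MKT**A
```

Truncation happens before validation here, so characters beyond position 256 never trigger an error; whether the real program behaves the same way is not stated.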

Invalid Input