AllergenAI

A deep learning model predicting allergenicity based on protein sequence

AllergenAI Overview

Innovations in protein engineering can help redesign allergenic proteins to reduce adverse reactions in sensitive individuals. Achieving this requires a better understanding of the molecular properties of allergenic proteins and of the features that make a protein allergenic.

We present a novel AI-based tool, AllergenAI, to quantify the allergenic potential of a given protein. Our approach is solely based on protein sequences, differentiating it from previous tools that use some knowledge of the allergens’ physicochemical and other properties in addition to sequence homology.

We trained a convolutional neural network on the allergen protein sequences archived in three well-established databases, SDAP 2.0, COMPARE, and AlgPred 2, and assessed its prediction performance by cross-validation.
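
As an illustration of the cross-validation idea, the sketch below splits sample indices into k folds so that each sequence is used for validation exactly once. The function name `k_fold_indices`, the choice of k, and the shuffling seed are assumptions made for this example; the number of folds and the splitting strategy actually used for AllergenAI are not stated here.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    Hypothetical helper: not part of the AllergenAI code base.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    # Spread samples as evenly as possible: the first (n_samples % k)
    # folds receive one extra sample.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


# Toy usage: 10 samples, 3 folds
for fold, (train_idx, val_idx) in enumerate(k_fold_indices(10, k=3)):
    print(f"fold {fold}: {len(train_idx)} train, {len(val_idx)} validation")
```

In practice a model would be trained on `train_idx` and scored on `val_idx` for each fold, and the scores averaged.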

We then used AllergenAI to identify potential new allergens of the cupin family in date palm, spinach, maize, and red clover: proteins with a high allergenicity score that might cause allergic reactions in sensitive individuals. By analyzing the feature importance scores (FIS) of vicilins, we identified a proline-alanine-rich (P-A) motif in the top 50% of FIS regions that overlapped with known IgE epitope regions of vicilin allergens.
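
To make the "top 50% of FIS regions" comparison concrete, here is a minimal sketch that takes per-residue importance scores, keeps the positions in the top half, groups them into contiguous regions, and checks which regions overlap given epitope intervals. The scores, the epitope coordinates, and both function names are invented for illustration; how FIS values are computed and how regions are defined in AllergenAI is not described on this page.

```python
def top_fraction_regions(scores, fraction=0.5):
    """Return contiguous (start, end) ranges, end exclusive, whose scores
    fall in the top `fraction` of all positions.

    Ties at the cutoff are kept, so slightly more than `fraction` of the
    positions may be selected.
    """
    if not scores:
        return []
    n_keep = max(1, int(len(scores) * fraction))
    # The n_keep-th highest score serves as the cutoff.
    cutoff = sorted(scores, reverse=True)[n_keep - 1]
    regions, start = [], None
    for i, score in enumerate(scores):
        if score >= cutoff and start is None:
            start = i                      # a region opens
        elif score < cutoff and start is not None:
            regions.append((start, i))     # the region closes
            start = None
    if start is not None:
        regions.append((start, len(scores)))
    return regions


def overlaps(region, epitope):
    """True if two half-open intervals share at least one position."""
    return region[0] < epitope[1] and epitope[0] < region[1]


# Toy data (hypothetical values, not from any real allergen)
scores = [0.1, 0.9, 0.8, 0.2, 0.1, 0.7, 0.6, 0.05]
epitopes = [(2, 4), (7, 8)]

regions = top_fraction_regions(scores)
hits = [r for r in regions if any(overlaps(r, e) for e in epitopes)]
print("high-FIS regions:", regions)
print("regions overlapping an epitope:", hits)
```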

Furthermore, using ~1600 allergen structures in our SDAP database, we showed that 3D information can potentially be incorporated into a CNN model. AllergenAI provides a novel foundation for identifying the critical features that distinguish allergenic proteins.

AllergenAI Prediction
⚠️ AllergenAI is trained on allergenic proteins having fewer than 1000 amino acids. Results for longer sequences may be unreliable.
Protein sequence in FASTA format. Maximum length: 1000 amino acids.
For informational purposes only
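
Before submitting a sequence, it can be useful to confirm that the input is valid FASTA and within the 1000-residue limit stated above. The following is a minimal sketch of such a check; `read_fasta` and `within_limit` are hypothetical helpers written for this example and are not part of the AllergenAI distribution.

```python
MAX_LENGTH = 1000  # maximum sequence length stated on this page


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())  # sequences may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def within_limit(sequence, max_length=MAX_LENGTH):
    """True if the sequence is non-empty and no longer than max_length."""
    return 0 < len(sequence) <= max_length


# Toy usage with a made-up sequence
example = ">example protein\nMKVLAAGIVG\nLLAPAPA\n"
for name, seq in read_fasta(example):
    status = "ok" if within_limit(seq) else "too long or empty"
    print(f"{name}: {len(seq)} residues, {status}")
```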
Training and Validation Data
Available Models and Code
Step 1 — Pre-process input protein sequence
# Example command
python AllergenAI_preprocess.py input.fasta

# Example with provided test file
python AllergenAI_preprocess.py Cupin.fasta
Step 2 — Run AllergenAI prediction
# Example command
python Run_AllergenAI.py input.txt

# Example with cupin test file
python Run_AllergenAI.py Cupin.txt
  • Example one-hot encoded input (output of pre-processing step): Cupin.txt
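
For readers unfamiliar with one-hot encoding, the sketch below shows the general idea: each residue becomes a vector of 20 values with a single 1 marking the amino acid, and short sequences are padded with all-zero rows. This is an illustration only. The alphabet ordering, the padding scheme, the handling of non-standard residues, and the function name `one_hot_encode` are all assumptions, and the actual file format written by AllergenAI_preprocess.py may differ.

```python
# 20 standard amino acids; this ordering is an assumption for illustration.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence, max_length=1000):
    """Encode a protein sequence as max_length rows of 20 zeros/ones.

    Non-standard residues (e.g. X) and padding positions become
    all-zero rows. Hypothetical helper, not the AllergenAI implementation.
    """
    if len(sequence) > max_length:
        raise ValueError(f"sequence longer than {max_length} residues")
    rows = []
    for aa in sequence.upper():
        row = [0] * len(AMINO_ACIDS)
        if aa in INDEX:
            row[INDEX[aa]] = 1
        rows.append(row)
    # Pad with zero rows so every encoded sequence has the same shape.
    rows.extend([0] * len(AMINO_ACIDS) for _ in range(max_length - len(rows)))
    return rows


# Toy usage with a short made-up peptide
encoded = one_hot_encode("PAPA", max_length=6)
for row in encoded:
    print("".join(str(v) for v in row))
```

A fixed-shape matrix like this is what allows sequences of different lengths to be fed to a convolutional network in batches.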
Requirements
Software and packages required to train and run AllergenAI
  • Python 3
  • tensorflow
  • keras 2.11
  • numpy
  • pandas
# Create a dedicated conda environment
conda update conda
conda create -n allergenai python=3
conda activate allergenai

# Install required packages
conda install numpy
conda install pandas
conda install tensorflow
pip install --upgrade tensorflow
conda install keras=2.11
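
After setting up the environment, a quick check can confirm that the listed packages are discoverable before running the scripts. This is a small convenience sketch using only the standard library; `missing_packages` is a hypothetical helper, and it only tests whether each package can be located, not whether its version matches the requirements above.

```python
import importlib.util

# Package names taken from the requirements list on this page.
REQUIRED = ("tensorflow", "keras", "numpy", "pandas")


def missing_packages(names=REQUIRED):
    """Return the package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```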
Mirror Website
Related Citation
  • Liu J, Negi SS, Yang C, Zhou X, Schein CH, Braun W, Kim P. AllergenAI: a deep learning model predicting allergenicity based on protein sequence. BMC Bioinformatics. 2025 Nov 18;26(1):279. [Abstract] [Full Paper]
  • Negi SS, Schein CH, Braun W. The updated Structural Database of Allergenic Proteins (SDAP 2.0) provides 3D models for allergens and incorporated Bioinformatics Tools. J Allergy Clin Immunol Glob. 2023;2(4):100162. [Abstract] [Full Paper]