Visualize HLA protein structures
Kamil Slowikowski
2024-06-05
Introduction
In this vignette, we explore a few different methods for visualizing the molecular structure of HLA proteins. First, we’ll look at an example of how to use the NGLVieweR R package to show HLA protein structures. Next, we’ll use PyMOL to do the same thing.
What are the PDB identifiers for each HLA gene?
Here is a list of PDB identifiers you might consider using to represent each HLA protein:
HLA-A 2xpg
HLA-B 2bvp
HLA-C 4nt6
HLA-DP 3lqz
HLA-DQ 4z7w
HLA-DR 3pdo
Also try searching the PDB website for, e.g., "HLA-DR" to see if there is a more appropriate structure for your analysis.
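If you plan to look up several of these structures from R, it can help to keep the list above in a named vector. The sketch below is only an illustration: `hla_pdb` and `pdb_url()` are hypothetical names that do not come from hlabud or NGLVieweR, and the URL pattern assumes the RCSB file download service.

```r
# PDB identifiers from the list above, keyed by HLA protein.
hla_pdb <- c(
  "HLA-A"  = "2xpg",
  "HLA-B"  = "2bvp",
  "HLA-C"  = "4nt6",
  "HLA-DP" = "3lqz",
  "HLA-DQ" = "4z7w",
  "HLA-DR" = "3pdo"
)

# Hypothetical helper: build a download URL for a PDB entry.
# Assumes the RCSB pattern https://files.rcsb.org/download/<id>.pdb
pdb_url <- function(id) {
  paste0("https://files.rcsb.org/download/", id, ".pdb")
}

hla_pdb[["HLA-B"]]
pdb_url(hla_pdb[["HLA-B"]])
```

The identifier returned by `hla_pdb[["HLA-B"]]` is the same one passed to `NGLVieweR()` in the next section.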
Using NGLVieweR
Let’s try to visualize the amino acid at PDB position 9 in the HLA-B protein structure.
We will visualize the structure of 2bvp from the Protein Data Bank (PDB).
Here is an example of how to do this with the NGLVieweR R package by Niels van der Velden:
# devtools::install_github("nvelden/NGLVieweR") # we need the latest version
library(NGLVieweR)
library(magrittr)
my_sele <- "9:A" # residue 9 in chain A
NGLVieweR("2bvp") %>%
  stageParameters(
    backgroundColor = "white",
    zoomSpeed = 1,
    cameraFov = 80
  ) %>%
  addRepresentation(
    type = "cartoon"
  ) %>%
  addRepresentation(
    type = "ball+stick",
    param = list(
      sele = my_sele
    )
  ) %>%
  addRepresentation(
    type = "label",
    param = list(
      sele = my_sele,
      labelType = "format",
      labelFormat = "[%(resname)s]%(resno)s", # or enter custom text
      labelGrouping = "residue", # or "atom" (eg. sele = "20:A.CB")
      color = "black",
      fontFamily = "sans-serif",
      xOffset = 1,
      yOffset = 0,
      zOffset = 0,
      fixedSize = TRUE,
      radiusType = 1,
      radiusSize = 5.5, # Label size
      showBackground = TRUE
      # backgroundColor="black",
      # backgroundOpacity=0.5
    )
  ) %>%
  zoomMove(
    center = my_sele,
    zoom = my_sele,
    duration = 0, # animation time in ms
    z_offSet = -20
  ) %>%
  setSpin()
In the view above, we see the blue peptide and the red HLA-B protein. The tyrosine at PDB position 9 is highlighted with a ball+stick representation, and it is also marked with a text label. The structure is rotating so we can get a better view.
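The selection string "9:A" combines a residue number with a chain identifier. To highlight more than one residue, the same pattern can be joined with `or`. The helper below is a minimal sketch: `ngl_sele()` is a hypothetical name, not part of NGLVieweR, and it assumes all residues are in the same chain.

```r
# Hypothetical helper: build an NGL selection string such as "9:A" or
# "9:A or 45:A" from residue numbers and a single chain identifier.
ngl_sele <- function(resno, chain) {
  paste0(resno, ":", chain, collapse = " or ")
}

ngl_sele(9, "A")
ngl_sele(c(9, 45), "A")
```

The result can be passed wherever the `sele` parameter is used, in the same way as `my_sele`.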
We can use hlabud to answer some questions about this HLA-B amino acid sequence.
The first question we need to ask is:
- Which IMGT position corresponds to the tyrosine at PDB position 9?
library(hlabud)
a <- hla_alignments("B")
We need to open the PDB Sequence Annotations tool in order to figure out which IMGT number corresponds to the PDB number 9. Here is a screenshot from that tool:
Next, we can view the amino acid sequence numbering from IMGT:
library(stringr)
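# fixed() matches "B*57:03" literally; as a regular expression, `B*` would mean "zero or more B"
a$alleles[which(str_detect(rownames(a$alleles), fixed("B*57:03"))),][1,1:50]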
#>       n30      n29      n28      n27      n26      n25      n24      n23
#>       "M"      "R"      "V"      "T"      "A"      "P"      "R"      "T"
#>       n22  n22_n21      n21      n20      n19      n18      n17      n16
#>       "V" "......"      "L"      "L"      "L"      "L"      "W"      "G"
#>       n15      n14      n13      n12      n11      n10       n9       n8
#>       "A"      "V"      "A"      "L"      "T"      "E"      "T"      "W"
#>        n7       n6       n5       n4       n3       n2       n1        1
#>       "A"      "G"      "S"      "H"      "S"      "M"      "R"      "Y"
#>         2        3        4        5        6        7        8        9
#>       "F"      "Y"      "T"      "A"      "M"      "S"      "R"      "P"
#>        10       11       12       13       14       15       16       17
#>       "G"      "R"      "G"      "E"      "P"      "R"      "F"      "I"
#>        18    18_19
#>       "A"  "....."
By eye, we can match the YFYT motif in the PDB sequence to the YFYT motif at IMGT positions 1 to 4 in the output above. The second tyrosine of that motif is at PDB position 9 and at IMGT position 3. So, we have manually confirmed that PDB position 9 matches IMGT position 3.
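The same check can be done in code instead of by eye. The sketch below copies a few residues from the output above into a small named vector, so it runs without downloading anything. `motif_positions()` is a hypothetical helper, not part of hlabud, and the PDB positions 7 to 10 for the motif are an assumption read off the PDB numbering.

```r
# A few IMGT-numbered residues copied from the output above
# (the leader peptide and the indel columns are left out for brevity).
imgt <- c(n3 = "S", n2 = "M", n1 = "R", "1" = "Y", "2" = "F", "3" = "Y",
          "4" = "T", "5" = "A", "6" = "M", "7" = "S", "8" = "R", "9" = "P")

# Hypothetical helper: return the column names covered by the first
# occurrence of `motif`. Each element of `x` holds one residue, so a
# position in the pasted string is also an index into the vector.
motif_positions <- function(x, motif) {
  residues <- x[!grepl("^\\.+$", x)] # skip indel columns such as "......"
  start <- as.integer(regexpr(motif, paste(residues, collapse = ""), fixed = TRUE))
  if (start == -1L) return(character(0))
  names(residues)[seq(start, length.out = nchar(motif))]
}

imgt_names <- motif_positions(imgt, "YFYT")

# Assumption: the same motif occupies PDB positions 7 to 10 in 2bvp.
pdb_to_imgt <- setNames(imgt_names, 7:10)
pdb_to_imgt[["9"]]
```

With the values above, the motif maps to IMGT columns 1 to 4, and PDB position 9 maps to column 3.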
Next, we might ask: which HLA-B alleles have Y3?
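The call below uses `my_alleles`, which is not defined in the code shown here. One plausible way to build it is sketched next, assuming `a$alleles` is a character matrix with allele names as row names and IMGT positions as column names, as the indexing above suggests. The helper `alleles_with()` is hypothetical, and the toy matrix uses made-up allele names so the example runs without downloading anything.

```r
# Toy alignment matrix with made-up allele names (not real HLA alleles).
toy <- matrix(
  c("Y", "F", "Y",
    "Y", "F", "Y",
    "Y", "F", "H"),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("toy*01", "toy*02", "toy*03"), c("1", "2", "3"))
)

# Hypothetical helper: names of the alleles that carry `residue` at `position`.
alleles_with <- function(mat, position, residue) {
  rownames(mat)[which(mat[, position] == residue)]
}

alleles_with(toy, "3", "Y")

# With the real alignment, the equivalent call might be:
# my_alleles <- alleles_with(a$alleles, "3", "Y")
```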
head(my_alleles, 20)
#> [1] "B*07:02:01:01" "B*07:02:01:02" "B*07:02:01:03" "B*07:02:01:04"
#> [5] "B*07:02:01:05" "B*07:02:01:06" "B*07:02:01:07" "B*07:02:01:08"
#> [9] "B*07:02:01:09" "B*07:02:01:10" "B*07:02:01:11" "B*07:02:01:12"
#> [13] "B*07:02:01:13" "B*07:02:01:14" "B*07:02:01:15" "B*07:02:01:16"
#> [17] "B*07:02:01:17" "B*07:02:01:18" "B*07:02:01:19" "B*07:02:01:20"
What fraction of reported HLA-B alleles have tyrosine at IMGT position 3 (Y3)?
As it turns out, almost all of the HLA-B alleles have Y3.
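A fraction like this is the mean of a logical comparison over one column of the alignment. The sketch below shows the arithmetic on a made-up column so it runs on its own. `residue_fraction()` is a hypothetical helper, and the toy values are not real allele data.

```r
# Made-up residues at one position for four toy alleles (not real data).
position_3 <- c("toy*01" = "Y", "toy*02" = "Y", "toy*03" = "H", "toy*04" = "Y")

# Hypothetical helper: fraction of alleles that carry `residue`.
residue_fraction <- function(x, residue) {
  mean(x == residue, na.rm = TRUE)
}

residue_fraction(position_3, "Y")

# Counts for every residue observed at this position.
table(position_3)

# With the real alignment, the equivalent call might be:
# residue_fraction(a$alleles[, "3"], "Y")
```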
Using PyMOL
PyMOL is one of my favorite methods for visualizing protein structures, because it allows us to change a residue in an existing protein and visualize the new mutated protein.
It only takes a few lines of PyMOL to create a nice figure.
For example, if we want to quickly highlight positions 13 and 45 in HLA-DQB1, this snippet of PyMOL code will produce the figure below.
Here is a Bash script that will:
- Write a PyMOL script
- Run the PyMOL script with the pymol command
#!/usr/bin/env bash
# Write a pymol script
cat << EOF > script.pml
fetch 7kei
show cartoon
remove solvent
remove chain D
remove chain H
color teal, chain A
color orange, chain B
color purple, chain C
color red, chain B & resi 13
color red, chain B & resi 45
label n. CA and chain B & resi 13, "%s %s" % (resi, resn)
label n. CA and chain B & resi 45, "%s %s" % (resi, resn)
png 7kei.png, width=1200, height=800, dpi=300
EOF
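# On Linux, `pymol` is usually already on the PATH.
# On macOS, call the binary inside the app bundle instead. A shell alias would
# not work here, because Bash does not expand aliases in non-interactive scripts.
if [[ "$(uname)" == "Darwin" ]]; then
  pymol=/Applications/PyMOL.app/Contents/MacOS/PyMOL
else
  pymol=pymol
fi
"$pymol" -c script.pml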
Here is what the PyMOL script will do:
- Load a structure from the Protein Data Bank (PDB). 7kei is the identifier for a published protein structure.
- Show the structure as a cartoon, and remove the solvent and chains D and H.
- Color the HLA-DQA1 protein teal.
- Color the HLA-DQB1 protein orange.
- Color the peptide purple.
- Color residues 13 and 45 in HLA-DQB1 red.
- Label those residues with their positions and names.
- Write a PNG file with a view of the structure.
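If you prefer to stay in R, the same PyMOL commands can be generated from R and written to a script file. This is a minimal sketch: `pymol_highlight()` is a hypothetical helper that only assembles text, and it reuses the `color` and `label` commands from the script above.

```r
# Hypothetical helper: build PyMOL commands that color and label
# one or more residues in a single chain.
pymol_highlight <- function(chain, resi, color = "red") {
  sele <- sprintf("chain %s & resi %s", chain, resi)
  c(
    sprintf("color %s, %s", color, sele),
    # "%%" produces a literal "%" for PyMOL's own label formatting.
    sprintf("label n. CA and %s, \"%%s %%s\" %% (resi, resn)", sele)
  )
}

script <- c(
  "fetch 7kei",
  "show cartoon",
  "remove solvent",
  pymol_highlight("B", c(13, 45)),
  "png 7kei.png, width=1200, height=800, dpi=300"
)

# Write the script to a temporary file; run it afterwards with `pymol -c`.
script_file <- file.path(tempdir(), "script.pml")
writeLines(script, script_file)
```

Changing the chain or the residue numbers in the call to `pymol_highlight()` is then enough to produce a script for a different pair of positions.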
In the image above, I manually rotated the structure with my mouse and added more text labels like "PDB: 7kei" after saving the file.
Other software for viewing PDB data
ChimeraX:
Python:
Javascript:
- https://www.rcsb.org/3d-view
- https://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html?mmdbid=7kei&bu=1
- https://github.com/nglviewer/ngl
- https://github.com/biasmv/pv
R: