First, we convert each allele name (e.g. A*01:01
) to an amino acid sequence.
The divergence is the sum of the distances between each pair of amino acids at each position, divided by the total sequence length.
The amino acid distance matrix we use is the one published by Grantham 1974 (doi:10.1126/science.185.4154.862), based on three physical properties of amino acids (composition, polarity, and molecular volume) that are correlated with an estimate of relative substitution frequency.
Usage
hla_divergence(
alleles = c("A*01:01,A*02:01"),
method = "grantham",
release = "latest",
verbose = FALSE
)
Arguments
- alleles
A character vector of comma-delimited alleles for each individual. We usually expect two alleles per individual, but it is possible to have more (or fewer) copies due to copy number alterations. This function still works when each individual has a different number of alleles.
- method
A pairwise amino acid matrix, or a method name:
"grantham"
or"uniform"
to indicate which pairwise amino acid distance matrix to use. If you choose to pass a matrix, then it should be a 20x20 symmetric matrix with zeros on the diagonal, and the rownames and colnames should be the one-letter amino acid codesA R N D C Q E G H I L K M F P S T W Y V
.- release
Default is "latest". Should be a release name like "3.51.0".
- verbose
If TRUE, print messages along the way.
Details
The code in this function is a translation of the original Perl code by Tobias Lenz, which was published in Pierini & Lenz 2018 MolBiolEvol (https://doi.org/10.1093/molbev/msy116).
When comparing two amino acid sequences, only characters that are one of the 20 amino acids are considered in the divergence calculation, so gaps (and any other characters) do not count.
See also
hla_releases()
to get a complete list of all release names.
amino_distance_matrix()
to get a amino acid distance matrix that you can use with hla_divergence()
.
Examples
my_genos <- c("A*23:01:12,A*24:550", "A*25:12N,A*11:27",
"A*24:381,A*33:85", "A*01:01:,A*01:01,A*02:01")
hla_divergence(my_genos, method = "grantham")
#> A*23:01:12,A*24:550 A*25:12N,A*11:27 A*24:381,A*33:85
#> 0.5131579 3.4736842 5.1078947
#> A*01:01:,A*01:01,A*02:01
#> 3.9982456
# This is equivalent
hla_divergence(my_genos, method = amino_distance_matrix("grantham"))
#> A*23:01:12,A*24:550 A*25:12N,A*11:27 A*24:381,A*33:85
#> 0.5131579 3.4736842 5.1078947
#> A*01:01:,A*01:01,A*02:01
#> 3.9982456