Skip to contents

First, we convert each allele name (e.g. A*01:01) to an amino acid sequence. The divergence is the sum of the distances between each pair of amino acids at each position, divided by the total sequence length. The amino acid distance matrix we use is the one published by Grantham 1974 (doi:10.1126/science.185.4154.862), based on three physical properties of amino acids (composition, polarity, and molecular volume) that are correlated with an estimate of relative substitution frequency.

Usage

hla_divergence(
  alleles = c("A*01:01,A*02:01"),
  method = "grantham",
  release = "latest",
  verbose = FALSE
)

Arguments

alleles

A character vector of comma-delimited alleles for each individual. We usually expect two alleles per individual, but it is possible to have more (or fewer) copies due to copy number alterations. This function still works when each individual has a different number of alleles.

method

A pairwise amino acid matrix, or a method name: "grantham" or "uniform" to indicate which pairwise amino acid distance matrix to use. If you choose to pass a matrix, then it should be a 20x20 symmetric matrix with zeros on the diagonal, and the rownames and colnames should be the one-letter amino acid codes A R N D C Q E G H I L K M F P S T W Y V.

release

Default is "latest". Should be a release name like "3.51.0".

verbose

If TRUE, print messages along the way.

Value

A dataframe with divergence for each individual.

Details

The code in this function is a translation of the original Perl code by Tobias Lenz, which was published in Pierini & Lenz 2018 MolBiolEvol (https://doi.org/10.1093/molbev/msy116).

When comparing two amino acid sequences, only characters that are one of the 20 amino acids are considered in the divergence calculation, so gaps (and any other characters) do not count.

See also

hla_releases() to get a complete list of all release names.

amino_distance_matrix() to get a amino acid distance matrix that you can use with hla_divergence().

Examples

my_genos <- c("A*23:01:12,A*24:550", "A*25:12N,A*11:27",
              "A*24:381,A*33:85", "A*01:01:,A*01:01,A*02:01")
hla_divergence(my_genos, method = "grantham")
#>      A*23:01:12,A*24:550         A*25:12N,A*11:27         A*24:381,A*33:85 
#>                0.5131579                3.4736842                5.1078947 
#> A*01:01:,A*01:01,A*02:01 
#>                3.9982456 

# This is equivalent
hla_divergence(my_genos, method = amino_distance_matrix("grantham"))
#>      A*23:01:12,A*24:550         A*25:12N,A*11:27         A*24:381,A*33:85 
#>                0.5131579                3.4736842                5.1078947 
#> A*01:01:,A*01:01,A*02:01 
#>                3.9982456