Calculate HLA divergence for each individual

First, we convert each allele name (e.g. A*01:01) to an amino acid sequence. The divergence is the sum of the distances between each pair of amino acids at each position, divided by the total sequence length. The amino acid distance matrix we use is the one published by Grantham 1974 (doi:10.1126/science.185.4154.862), based on three physical properties of amino acids (composition, polarity, and molecular volume) that are correlated with an estimate of relative substitution frequency.

Usage

hla_divergence(
  alleles = c("A*01:01,A*02:01"),
  method = "grantham",
  release = "latest",
  positions = NULL,
  verbose = FALSE
)

Arguments

alleles: A character vector of comma-delimited alleles for each individual. We usually expect two alleles per individual, but it is possible to have more (or fewer) copies due to copy number alterations. This function still works when each individual has a different number of alleles.
method: A pairwise amino acid matrix, or a method name: "grantham" or "uniform" to indicate which pairwise amino acid distance matrix to use. If you choose to pass a matrix, then it should be a 20x20 symmetric matrix with zeros on the diagonal, and the rownames and colnames should be the one-letter amino acid codes A R N D C Q E G H I L K M F P S T W Y V.
release: Default is "latest". Should be a release name like "3.51.0".
positions: Only use these positions in the calculation. Should be column names from the hla_alignments()$alleles matrix.
verbose: If TRUE, print messages along the way.

Value

A dataframe with divergence for each individual.

Details

The code in this function is a translation of the original Perl code by Tobias Lenz, which was published in Pierini & Lenz 2018 MolBiolEvol (https://doi.org/10.1093/molbev/msy116).

When comparing two amino acid sequences, only characters that are one of the 20 amino acids are considered in the divergence calculation, so gaps (and any other characters) do not count.

Examples

my_genos <- c("A*23:01:12,A*24:550",
              "A*25:12N,A*11:27",
              "A*24:381,A*33:85",
              "A*01:01:,A*01:01,A*02:01") # this individual has 3 alleles
hla_divergence(my_genos, method = "grantham")
#>      A*23:01:12,A*24:550         A*25:12N,A*11:27         A*24:381,A*33:85 
#>                 0.511811                 3.464567                 5.094488 
#> A*01:01:,A*01:01,A*02:01 
#>                 3.987752 

# This is equivalent
hla_divergence(my_genos, method = amino_distance_matrix("grantham"))
#>      A*23:01:12,A*24:550         A*25:12N,A*11:27         A*24:381,A*33:85 
#>                 0.511811                 3.464567                 5.094488 
#> A*01:01:,A*01:01,A*02:01 
#>                 3.987752 

# Compute divergence on a subset of positions:
hla_divergence(my_genos, positions = c("n1", "1", "2", "3", "4"))
#>      A*23:01:12,A*24:550         A*25:12N,A*11:27         A*24:381,A*33:85 
#>                        0                        0                        0 
#> A*01:01:,A*01:01,A*02:01 
#>                        0

Usage

Arguments

Value

Details

See also

Examples