Convert a set of genotype names into a dosage matrix of each residue at each position
Source: R/hlabud.R
dosage.Rd
For each genotype name, return the dosage of each residue (amino acid or nucleotide) at each position, as a matrix.
Arguments
- mat
A one-hot encoded matrix with one row per allele and one column for each residue (amino acid or nucleotide) at each position.
- names
Input character vector with one genotype for each individual. All entries must be present in rownames(mat).
- drop_constants
Filter out constant amino acid positions. TRUE by default.
- drop_duplicates
Filter out duplicate amino acid positions. FALSE by default.
- verbose
If TRUE, print messages along the way.
Value
A matrix with one row for each input genotype, and one column for each residue at each position.
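As a rough illustration of what the returned matrix contains, the sketch below builds a dosage matrix by hand from a toy one-hot matrix using base R only. The matrix `toy_onehot`, its allele and column names, and the helper `toy_dosage()` are hypothetical and are not part of hlabud. The package's own implementation may differ.

```r
# Toy one-hot matrix: one row per allele, one column per residue at each position.
# Allele names and column names are made up for illustration.
toy_onehot <- matrix(
  c(1, 0, 1, 0,
    0, 1, 1, 0,
    0, 1, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("A*01:01", "A*02:01", "A*03:01"),
    c("P1_G", "P1_S", "P2_H", "P2_Q")
  )
)

# Hypothetical helper: sum the one-hot rows of the alleles in each genotype.
toy_dosage <- function(mat, names) {
  # Split each genotype string into its comma-separated allele names.
  alleles <- strsplit(names, ",", fixed = TRUE)
  missing <- setdiff(unlist(alleles), rownames(mat))
  if (length(missing) > 0) {
    stop("Alleles not found in rownames(mat): ", paste(missing, collapse = ", "))
  }
  # For each genotype, add up the one-hot rows of its alleles.
  out <- t(vapply(
    alleles,
    function(x) colSums(mat[x, , drop = FALSE]),
    numeric(ncol(mat))
  ))
  dimnames(out) <- list(names, colnames(mat))
  out
}

toy_genotypes <- c("A*01:01,A*01:01", "A*01:01,A*02:01", "A*02:01,A*03:01")
toy_result <- toy_dosage(toy_onehot, toy_genotypes)
```

In this toy example, a homozygous genotype gets a dosage of 2 for each residue its allele carries, and a heterozygous genotype gets a dosage of 1 for each residue carried by only one of its alleles.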
Details
Each genotype should be a single string of comma-separated allele names, like this: "HLA-A*01:01,HLA-A*01:01".
By default, the returned matrix is filtered to exclude positions where all input genotypes have the same allele.
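The kind of filtering described by drop_constants and drop_duplicates can be sketched with base R as follows. The matrix `d` and its column names are made up, and the sketch assumes that "constant" means every input genotype has the same dosage in a column and that "duplicate" means a column repeats the values of an earlier column. This is not necessarily how hlabud implements these options.

```r
# Made-up dosage matrix: rows are genotypes, columns are residues at positions.
d <- matrix(
  c(2, 0, 2, 1, 1,
    2, 1, 1, 0, 0,
    2, 2, 0, 2, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    paste0("g", 1:3),
    c("P1_G", "P2_H", "P2_Q", "P3_A", "P4_T")
  )
)

# Assumption: a column is constant when all genotypes share one dosage value.
is_constant <- apply(d, 2, function(x) length(unique(x)) == 1)
d_var <- d[, !is_constant, drop = FALSE]

# Assumption: a column is a duplicate when an earlier column has identical values.
is_dup <- duplicated(d_var, MARGIN = 2)
d_uniq <- d_var[, !is_dup, drop = FALSE]
```

Here `d_var` drops the constant column P1_G, and `d_uniq` additionally drops P4_T because it repeats P3_A.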
Examples
# DRB1 protein alignments from a pinned commit of the IMGTHLA repository
DRB1_file <- file.path(
  "https://github.com/ANHIG/IMGTHLA/raw",
  "5f2c562056f8ffa89aeea0631f2a52300ee0de17",
  "alignments/DRB1_prot.txt"
)
a <- read_alignments(DRB1_file)
# One genotype per individual, with allele names separated by commas
genotypes <- c(
  "DRB1*12:02:02:03,DRB1*12:02:02:03,DRB1*14:54:02",
  "DRB1*04:174,DRB1*15:152",
  "DRB1*04:56:02,DRB1*15:01:48",
  "DRB1*14:172,DRB1*04:160",
  "DRB1*04:359,DRB1*04:284:02"
)
dosage <- dosage(a$onehot, genotypes)
dosage[,1:5]
#>                                                 n29unk Mn29 n28unk Vn28 n27unk
#> DRB1*12:02:02:03,DRB1*12:02:02:03,DRB1*14:54:02      1    2      1    2      1
#> DRB1*04:174,DRB1*15:152                              2    0      2    0      2
#> DRB1*04:56:02,DRB1*15:01:48                          2    0      2    0      2
#> DRB1*14:172,DRB1*04:160                              2    0      2    0      2
#> DRB1*04:359,DRB1*04:284:02                           2    0      2    0      2