Reads the alignment txt files provided by the IMGT/HLA database (the ANHIG/IMGTHLA repository on GitHub).
Value
A list with a named character vector called sequences (one aligned sequence per allele) and two matrices, alleles and onehot.
The matrix alleles has one row for each allele and one column for each position; each value is the residue found at that position in that allele.
The matrix onehot is a one-hot encoding of the variants that distinguish the alleles, with one row for each allele and one column for each residue observed at each position.
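To make the relationship between the two matrices concrete, here is a minimal base-R sketch that builds a one-hot matrix from a small, made-up allele-by-position matrix. The toy data, the allele names, and the helper toy_onehot() are hypothetical and not part of the package. The column names only imitate the residue-plus-position pattern seen in the example output (such as Mn29); the real onehot matrix may differ, for instance it also contains columns such as n29unk.

```r
# Toy allele-by-position matrix (made-up data, not from IMGT/HLA).
alleles <- matrix(
  c("M", "V", "C",
    "M", "L", "C",
    "T", "V", "C"),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A*01", "A*02", "A*03"), c("p1", "p2", "p3"))
)

# Hypothetical helper: one 0/1 column per (residue, position) pair observed
# in the data, named residue followed by position (e.g. "Mp1").
toy_onehot <- function(alleles) {
  cols <- lapply(colnames(alleles), function(pos) {
    residues <- sort(unique(alleles[, pos]))
    m <- sapply(residues, function(r) as.integer(alleles[, pos] == r))
    m <- matrix(m, nrow = nrow(alleles))  # keep a matrix even for 1 allele
    colnames(m) <- paste0(residues, pos)
    m
  })
  out <- do.call(cbind, cols)
  rownames(out) <- rownames(alleles)
  out
}

onehot <- toy_onehot(alleles)
onehot
```

Each allele has exactly one residue per position, so every row of the result sums to the number of positions. This sketch encodes every observed residue, including positions where all alleles agree (Cp3 here).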
Details
Consider using hla_alignments() instead of this function. To read an alignment txt file you already have, call read_alignments("myfile.txt").
Each file contains a different kind of sequence:
{gene}_prot.txt has the amino acid sequence of each HLA allele.
{gene}_nuc.txt has the nucleotide sequence of the exons.
{gene}_gen.txt has the genomic sequence of the exons and introns.
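The three file names follow one pattern per gene, so their locations can be generated together. The sketch below reuses the repository URL and commit hash from the Examples; the helper imgt_alignment_urls() is hypothetical, and whether all three files exist for a given gene at that commit is an assumption worth checking before reading them.

```r
# Hypothetical helper: raw-file URLs for the prot, nuc and gen alignments
# of one gene, pinned to the commit used in the Examples (an assumption;
# availability of each file depends on the gene and the release).
imgt_alignment_urls <- function(gene,
                                commit = "5f2c562056f8ffa89aeea0631f2a52300ee0de17") {
  files <- paste0(gene, c("_prot", "_nuc", "_gen"), ".txt")
  urls <- file.path("https://github.com/ANHIG/IMGTHLA/raw", commit, "alignments", files)
  names(urls) <- c("prot", "nuc", "gen")
  urls
}

urls <- imgt_alignment_urls("DRB1")
urls
# Each URL could then be passed to read_alignments(), e.g.
# a_nuc <- read_alignments(urls[["nuc"]])
```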
Examples
# DRB1 protein alignment from a fixed commit of the ANHIG/IMGTHLA repository
my_file <- file.path(
"https://github.com/ANHIG/IMGTHLA/raw",
"5f2c562056f8ffa89aeea0631f2a52300ee0de17",
"alignments/DRB1_prot.txt"
)
a <- read_alignments(my_file)
head(a$sequences)
#> DRB1*01:01:01:01
#> "MVCLKLPGGSCMTALTVTLMVLSSPLALAGDTRPRFLWQLKFECHFFNGTERVR.LLERCIYNQEE.SVRFDSDVGEYRAVTELGRPDAEYWNSQKDLLEQRRAAVDTYCRHNYGVGESFTVQRR.VEPKVTVYPSKTQPLQHHNLLVCSVSGFYPGSIEVRWFRNGQEEKAGVVSTGLIQNGDWTFQTLVMLETVPRSGEVYTCQVEHPSVTSPLTVEWRARSESAQSKMLSGVGGFVLGLLFLGAGLFIYFRNQKGHSGLQPTGFLS"
#> DRB1*01:01:01:02
#> "------------------------------------------------------.-----------.----------------------------------------------------------.-----------------------------------------------------------------------------------------------------------------------------------------------"
#> DRB1*01:01:01:03
#> "------------------------------------------------------.-----------.----------------------------------------------------------.-----------------------------------------------------------------------------------------------------------------------------------------------"
#> DRB1*01:01:01:04
#> "------------------------------------------------------.-----------.----------------------------------------------------------.-----------------------------------------------------------------------------------------------------------------------------------------------"
#> DRB1*01:01:01:05
#> "------------------------------------------------------.-----------.----------------------------------------------------------.-----------------------------------------------------------------------------------------------------------------------------------------------"
#> DRB1*01:01:01:06
#> "------------------------------------------------------.-----------.----------------------------------------------------------.-----------------------------------------------------------------------------------------------------------------------------------------------"
a$alleles[1:5,1:5]
#> n29 n28 n27 n26 n25
#> DRB1*01:01:01:01 "M" "V" "C" "L" "K"
#> DRB1*01:01:01:02 "M" "V" "C" "L" "K"
#> DRB1*01:01:01:03 "M" "V" "C" "L" "K"
#> DRB1*01:01:01:04 "M" "V" "C" "L" "K"
#> DRB1*01:01:01:05 "M" "V" "C" "L" "K"
a$onehot[1:5,1:5]
#> n29unk Mn29 n28unk Vn28 n27unk
#> DRB1*01:01:01:01 0 1 0 1 0
#> DRB1*01:01:01:02 0 1 0 1 0
#> DRB1*01:01:01:03 0 1 0 1 0
#> DRB1*01:01:01:04 0 1 0 1 0
#> DRB1*01:01:01:05 0 1 0 1 0
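In the sequences output, every allele after the first is written relative to the first (reference) allele. In IMGT/HLA alignment files a "-" means the residue is identical to the reference, a "." marks an insertion or deletion, and a "*" means no sequence data is available. The alleles matrix already shows the filled-in residues (for example "M" "V" "C" "L" "K" for DRB1*01:01:01:02). A minimal base-R sketch of that substitution follows; expand_to_reference() and the short sequences are hypothetical, and the sketch leaves "." and "*" untouched.

```r
# Hypothetical helper: replace "-" (identical to reference) with the
# reference residue at the same position; all other characters are kept.
expand_to_reference <- function(seq, reference) {
  s <- strsplit(seq, "")[[1]]
  r <- strsplit(reference, "")[[1]]
  stopifnot(length(s) == length(r))  # aligned strings must have equal width
  same <- s == "-"
  s[same] <- r[same]
  paste(s, collapse = "")
}

# Made-up aligned sequences, not taken from IMGT/HLA.
reference <- "MVCLK.LPG"
variant   <- "---T-.--*"
expand_to_reference(variant, reference)
# "MVCTK.LP*"
```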