
This function reads the alignment txt files provided by IMGTHLA.

Usage

read_alignments(file)

Arguments

file

File name or URL for a txt file from IMGTHLA, such as "DQB1_prot.txt"

Value

A list with three elements: a character vector called sequences, and two matrices called alleles and onehot.

The matrix alleles has one row for each allele, and one column for each position, with the values representing the residues at each position in each allele. The matrix onehot has a one-hot encoding of the variants that distinguish the alleles, with one row for each allele and one column for each amino acid at each position.
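To make the relationship between the two matrices concrete, here is a minimal sketch that builds a one-hot matrix from a toy alleles matrix. The allele names, position names, and column naming scheme are invented for illustration, and the sketch keeps every residue at every position, so it is not necessarily how read_alignments() constructs onehot.

```r
# Toy alleles matrix: 3 hypothetical alleles, 2 positions (not real HLA data)
alleles <- matrix(
  c("M", "V",
    "M", "L",
    "K", "V"),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A1", "A2", "A3"), c("p1", "p2"))
)

# For each position, create one 0/1 indicator column per residue seen there
onehot <- do.call(cbind, lapply(colnames(alleles), function(pos) {
  residues <- sort(unique(alleles[, pos]))
  # outer() compares every allele against every residue at this position
  m <- outer(alleles[, pos], residues, "==") + 0L
  # Hypothetical column naming: position, underscore, residue
  dimnames(m) <- list(rownames(alleles), paste0(pos, "_", residues))
  m
}))

onehot
```

Each row of the toy onehot matrix sums to the number of positions, because every allele has exactly one residue at each position.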

Details

Consider using hla_alignments() instead of this function. If you already have your own txt file, you can read it with read_alignments("myfile.txt").

These are the sequences contained in each file:

  • {gene}_prot.txt has the amino acid sequence for each HLA allele.

  • {gene}_nuc.txt has the nucleotide sequence for the exons.

  • {gene}_gen.txt has the genomic sequence for the exons and introns.
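The naming pattern above can be sketched in a few lines of R. The gene name is chosen only for illustration, and the base URL and commit are copied from the Examples section of this page. Whether every file exists for a given gene and commit is an assumption, so the sketch only builds the names and does not download anything.

```r
# Gene name chosen for illustration
gene <- "DRB1"

# The three file types listed above
kinds <- c("prot", "nuc", "gen")
file_names <- paste0(gene, "_", kinds, ".txt")
names(file_names) <- kinds

# Same base URL and commit as in the Examples section of this page
urls <- file.path(
  "https://github.com/ANHIG/IMGTHLA/raw",
  "5f2c562056f8ffa89aeea0631f2a52300ee0de17",
  "alignments",
  file_names
)
names(urls) <- kinds

urls[["prot"]]
```

Any one of these strings could then be passed as the file argument, for example read_alignments(urls[["prot"]]).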

Examples

my_file <- file.path(
  "https://github.com/ANHIG/IMGTHLA/raw",
  "5f2c562056f8ffa89aeea0631f2a52300ee0de17",
  "alignments/DRB1_prot.txt"
)
a <- read_alignments(my_file)
head(a$sequences)
#>                                                                                                                                                                                                                                                                DRB1*01:01:01:01 
#> "MVCLKLPGGSCMTALTVTLMVLSSPLALAGDTRPRFLWQLKFECHFFNGTERVR.LLERCIYNQEE.SVRFDSDVGEYRAVTELGRPDAEYWNSQKDLLEQRRAAVDTYCRHNYGVGESFTVQRR.VEPKVTVYPSKTQPLQHHNLLVCSVSGFYPGSIEVRWFRNGQEEKAGVVSTGLIQNGDWTFQTLVMLETVPRSGEVYTCQVEHPSVTSPLTVEWRARSESAQSKMLSGVGGFVLGLLFLGAGLFIYFRNQKGHSGLQPTGFLS" 
#>                                                                                                                                                                                                                                                                DRB1*01:01:01:02 
#> "------------------------------------------------------.-----------.----------------------------------------------------------.-----------------------------------------------------------------------------------------------------------------------------------------------" 
#>                                                                                                                                                                                                                                                                DRB1*01:01:01:03 
#> "------------------------------------------------------.-----------.----------------------------------------------------------.-----------------------------------------------------------------------------------------------------------------------------------------------" 
#>                                                                                                                                                                                                                                                                DRB1*01:01:01:04 
#> "------------------------------------------------------.-----------.----------------------------------------------------------.-----------------------------------------------------------------------------------------------------------------------------------------------" 
#>                                                                                                                                                                                                                                                                DRB1*01:01:01:05 
#> "------------------------------------------------------.-----------.----------------------------------------------------------.-----------------------------------------------------------------------------------------------------------------------------------------------" 
#>                                                                                                                                                                                                                                                                DRB1*01:01:01:06 
#> "------------------------------------------------------.-----------.----------------------------------------------------------.-----------------------------------------------------------------------------------------------------------------------------------------------" 
a$alleles[1:5,1:5]
#>                  n29 n28 n27 n26 n25
#> DRB1*01:01:01:01 "M" "V" "C" "L" "K"
#> DRB1*01:01:01:02 "M" "V" "C" "L" "K"
#> DRB1*01:01:01:03 "M" "V" "C" "L" "K"
#> DRB1*01:01:01:04 "M" "V" "C" "L" "K"
#> DRB1*01:01:01:05 "M" "V" "C" "L" "K"
a$onehot[1:5,1:5]
#>                  n29unk Mn29 n28unk Vn28 n27unk
#> DRB1*01:01:01:01      0    1      0    1      0
#> DRB1*01:01:01:02      0    1      0    1      0
#> DRB1*01:01:01:03      0    1      0    1      0
#> DRB1*01:01:01:04      0    1      0    1      0
#> DRB1*01:01:01:05      0    1      0    1      0
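
In the output above, a$sequences shows "-" wherever an allele matches the first (reference) allele, while a$alleles shows the actual residue at each position. The following is a minimal sketch of that expansion on toy strings. The sequences and names are invented, and this is an illustration of the idea rather than the code used inside read_alignments().

```r
# Toy alignment: the first entry is the reference (not real HLA data)
aln <- c(
  ref  = "MVCLK.LPG",
  alt1 = "---------",
  alt2 = "--S--.---"
)

# Split each sequence into single characters: one row per sequence
chars <- do.call(rbind, strsplit(aln, ""))

# Replace "-" with the reference residue at the same position
filled <- chars
for (i in seq_len(nrow(chars))[-1]) {
  same <- chars[i, ] == "-"
  filled[i, same] <- chars[1, same]
}

# Collapse back to strings to inspect the result
apply(filled, 1, paste, collapse = "")
```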