picardmetrics(1) - picardmetrics manual

NAME

picardmetrics - Run Picard tools and collate multiple metrics files

SYNOPSIS

picardmetrics run [-f FILE] [-o DIR] [-r] [-k] <file.bam>

picardmetrics collate PREFIX DIR

picardmetrics refFlat <file.gtf[.gz]>

picardmetrics rRNA <file.gtf[.gz]>


# Example with provided data:

picardmetrics run -r -o out data/project1/sample1/sample1.bam

picardmetrics run -r -o out data/project1/sample2/sample2.bam

picardmetrics collate out/project1 out

DESCRIPTION

Picardmetrics is a Bash script that simplifies calling Picard tools and collates the different output files generated by Picard. It can also generate the two input files required by CollectRnaSeqMetrics: a refFlat file and a ribosomal intervals list.

In order, picardmetrics run will do the following:

  1. Automatically create a sequence dictionary using your reference sequence.
  2. Create a new temporary BAM file that you can keep with option -k.
  3. Reorder the header of the BAM file to match the reference.
  4. Sort the reads in the BAM file by coordinate.
  5. Mark duplicates in the BAM file and report duplicate metrics.
  6. Run up to 8 additional Picard tools.
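
Steps 1, 3, 4, and 5 roughly correspond to the Picard invocations sketched here. This is not the code picardmetrics itself runs: the picard.jar location, the reference FASTA, and the intermediate file names are hypothetical, and the KEY=VALUE argument style follows classic Picard releases, so check it against your Picard version. The function only prints the commands so that you can review them.

```shell
# Print (do not run) Picard commands that roughly mirror steps 1, 3, 4, and 5.
# PICARD_JAR, the reference path, and the intermediate names are assumptions.
print_picard_steps() {
  bam=$1                          # input BAM file
  ref=$2                          # reference FASTA file
  jar=${PICARD_JAR:-picard.jar}   # hypothetical location of the Picard jar
  base=${bam%.bam}                # input name without the .bam extension
  echo "java -jar $jar CreateSequenceDictionary R=$ref O=${ref%.*}.dict"
  echo "java -jar $jar ReorderSam I=$bam O=$base.reordered.bam REFERENCE=$ref"
  echo "java -jar $jar SortSam I=$base.reordered.bam O=$base.sorted.bam SORT_ORDER=coordinate"
  echo "java -jar $jar MarkDuplicates I=$base.sorted.bam O=$base.marked.bam M=$base.duplicate_metrics"
}

print_picard_steps sample1.bam ref.fasta
```

In normal use you do not need any of this: picardmetrics run performs these steps for you.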

After running the tools, use picardmetrics collate to merge the metrics generated for multiple BAM files into tab-delimited files. The command also consolidates these tab-delimited files into a single file that holds all metrics from all BAM files and all Picard tools.

These are the tools called by picardmetrics:

Read about the meaning of each metric: Picard metrics definitions.

COMMANDS AND OPTIONS

run

picardmetrics run [-f FILE] [-o DIR] [-r] [-k] <file.bam>

collate

picardmetrics collate PREFIX DIR

Collate output metrics files in DIR into one file with all metrics from all Picard tools and all BAM files:

PREFIX-all-metrics.tsv

Also write 5 collated histogram files:

PREFIX-base-distribution-by-cycle-histogram.tsv
PREFIX-gc-bias-histogram.tsv
PREFIX-insert-size-histogram.tsv
PREFIX-library-complexity-histogram.tsv
PREFIX-quality-histogram.tsv
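
To confirm that a collate run wrote everything listed above, a small helper along these lines may be handy. The function name collate_outputs is made up for this sketch; the file names come from the list above.

```shell
# List the files that `picardmetrics collate PREFIX DIR` is documented to write.
# collate_outputs is a hypothetical helper, not part of picardmetrics.
collate_outputs() {
  prefix=$1
  echo "$prefix-all-metrics.tsv"
  for name in base-distribution-by-cycle gc-bias insert-size library-complexity quality; do
    echo "$prefix-$name-histogram.tsv"
  done
}

# Report which of the expected files are missing for the prefix out/project1.
collate_outputs out/project1 | while read -r f; do
  [ -e "$f" ] || echo "missing: $f"
done
```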

refFlat

picardmetrics refFlat <file.gtf[.gz]>

Create <file.refFlat> for the REF_FLAT argument of the CollectRnaSeqMetrics tool. Run this command on your optionally gzipped GTF file, and the output file will be written to the same directory as the GTF file.

picardmetrics run will automatically create the .refFlat file for you if you define the GTF variable in the configuration file.

rRNA

picardmetrics rRNA <file.gtf[.gz]>

Create <file.rRNA.list> for the RIBOSOMAL_INTERVALS argument of the CollectRnaSeqMetrics tool. Run this command on your optionally gzipped GTF file, and the output file will be written to the same directory as the GTF file.

picardmetrics run will automatically create the .rRNA.list file for you if you define the GTF variable in the configuration file.
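
As an illustration of where the refFlat and rRNA output files end up, the sketch below derives the output path from a GTF path. It assumes the output name is the GTF name with .gtf or .gtf.gz replaced by the new suffix, as the <file.refFlat> and <file.rRNA.list> notation suggests; gtf_output_path is a hypothetical helper, not part of picardmetrics.

```shell
# Derive the expected output path for a GTF file.
# $1: GTF path (optionally gzipped); $2: suffix, "refFlat" or "rRNA.list".
# Assumes the suffix simply replaces .gtf or .gtf.gz.
gtf_output_path() {
  gtf=$1
  suffix=$2
  stem=${gtf%.gz}      # drop a trailing .gz, if present
  stem=${stem%.gtf}    # drop the .gtf extension
  echo "$stem.$suffix"
}

gtf_output_path /path/to/genes.gtf.gz refFlat
gtf_output_path /path/to/genes.gtf rRNA.list
```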

CONFIGURATION FILE

The picardmetrics.conf file must define the following variables:

EXAMPLES

Here are three examples of how you can run the program:

  1. Run picardmetrics sequentially in a for loop on multiple BAM files.

  2. Run in parallel with GNU parallel, using multiple processors or multiple servers.

  3. Run in parallel with an LSF queue, distributing jobs to multiple servers.

Example 1: Sequential

Run picardmetrics on the provided example BAM files:

for f in data/project1/sample?/sample?.bam; do
  picardmetrics run -r -o out "$f"
done

Collate the generated metrics files:

picardmetrics collate out/project1 out

Next, use the file out/project1-all-metrics.tsv to explore the metrics.
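
A quick first look at a collated file is to list its column names. The sketch below uses only standard tools and runs on a tiny made-up table; the real column names in out/project1-all-metrics.tsv will differ.

```shell
# Print the column names of a tab-delimited file, one per line, numbered.
list_columns() {
  head -n 1 "$1" | tr '\t' '\n' | nl
}

# Demonstrate on a small made-up table (these column names are placeholders).
tmp=$(mktemp)
printf 'SAMPLE\tMETRIC_A\tMETRIC_B\nsample1\t1\t2\n' > "$tmp"
list_columns "$tmp"
rm -f "$tmp"
```

For the real data, run list_columns out/project1-all-metrics.tsv.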

Example 2: GNU parallel

Run 2 jobs in parallel:

parallel -j2 \
  picardmetrics run -o /path/to/out -r {} ::: data/project1/sample?/sample?.bam

If you have many files, or if you want to run jobs on multiple servers, it's a good idea to put the full paths in a text file.

Here, we have ssh access to server1 and server2. We're launching 16 jobs on server1 and 8 jobs on server2. You'll have to make sure that picardmetrics is in your PATH on all servers.

ls /full/path/to/data/project1/sample*/sample*.bam > bams.txt
parallel -S 16/server1,8/server2 \
  picardmetrics run -r -o /path/to/out {} :::: bams.txt
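
With a very large number of BAM files, an ls glob can exceed the shell's argument length limit. One alternative for building bams.txt is find, sketched below; list_bams is a hypothetical helper, not part of picardmetrics.

```shell
# Print the absolute path of every .bam file under a directory, sorted.
list_bams() {
  dir=$(cd "$1" && pwd) || return 1   # resolve the directory to an absolute path
  find "$dir" -type f -name '*.bam' | sort
}

# Usage: list_bams data/project1 > bams.txt
```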

Example 3: LSF

I recommend installing asub (listed under SEE ALSO) to submit jobs easily. The following command submits one job per BAM file to the myqueue LSF queue.

xargs -I {} echo picardmetrics run -r -o /path/to/out {} < bams.txt \
  | asub -j picardmetrics_jobs -q myqueue
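
If asub is not available, the same jobs can probably be submitted with LSF's own bsub command. The sketch below prints one bsub command per BAM path so that you can inspect the commands before running them. The function name, queue name, and output directory are placeholders, and paths containing spaces are not handled.

```shell
# Print one bsub command per BAM path read from standard input.
# $1: LSF queue name; $2: output directory. Both are placeholders.
print_bsub_jobs() {
  queue=$1
  outdir=$2
  while IFS= read -r bam; do
    [ -n "$bam" ] || continue     # skip blank lines
    echo "bsub -q $queue picardmetrics run -r -o $outdir $bam"
  done
}

# Usage: print_bsub_jobs myqueue /path/to/out < bams.txt | sh
```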

SOURCE CODE

Find the source code here:
https://github.com/slowkow/picardmetrics

BUGS

Please report issues here:
https://github.com/slowkow/picardmetrics/issues

AUTHOR

Kamil Slowikowski from Harvard University wrote picardmetrics. Many developers at the Broad Institute wrote Picard. Heng Li from the Sanger Institute wrote samtools. Aaron Quinlan from the University of Utah wrote stats.

SEE ALSO

Picard
samtools
stats
GNU parallel
LSF
asub

picardmetrics-0.2.4, July 2016