Package 'SeqKat'

Title: Detection of Kataegis
Description: Kataegis is a localized hypermutation occurring when a region is enriched in somatic SNVs. Kataegis can result from multiple cytosine deaminations catalyzed by the AID/APOBEC family of proteins. This package contains functions to detect kataegis from SNVs in BED format. This package reports two scores per kataegic event, a hypermutation score and an APOBEC mediated kataegic score. Yousif, F. et al.; The Origins and Consequences of Localized and Global Somatic Hypermutation; Biorxiv 2018 <doi:10.1101/287839>.
Authors: Fouad Yousif, Xihui Lin, Fan Fan, Christopher Lalansingh, John Macdonald
Maintainer: Paul C. Boutros <[email protected]>
License: GPL-2
Version: 0.0.8
Built: 2024-11-07 03:56:43 UTC
Source: https://github.com/cran/SeqKat

Combine Table

Description

Merges overlapping windows to identify the genomic boundaries of kataegic events. This function also assigns hypermutation and kataegic scores to the combined windows.
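
The merging step can be pictured with a small base-R sketch. It is not the package's implementation: `combine.windows` is a hypothetical helper, windows are assumed to be closed intervals supplied as start and end vectors, and the score assignment that combine.table also performs is left out.

```r
# Hypothetical helper, not part of SeqKat: merge overlapping [start, end] windows.
combine.windows <- function(start, end) {
	if (length(start) == 0) {
		return(data.frame(start = numeric(0), end = numeric(0)));
		}
	window.order <- order(start);
	start <- start[window.order];
	end <- end[window.order];
	# A new event begins when a window starts beyond the end of every earlier window.
	furthest.end <- cummax(end);
	new.event <- c(TRUE, start[-1] > furthest.end[-length(furthest.end)]);
	event.id <- cumsum(new.event);
	data.frame(
		start = as.numeric(tapply(start, event.id, min)),
		end = as.numeric(tapply(end, event.id, max))
		);
	}

# Toy windows: the first two overlap, the third stands alone.
merged <- combine.windows(c(1, 5, 20), c(10, 15, 30));
```

Sorting by start and tracking the running maximum end keeps the merge to a single pass, which matters when a chromosome yields many candidate windows.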

Usage

combine.table(test.table, somatic, mutdistance, segnum, output.name)

Arguments

test.table

Data frame of kataegis test scores

somatic

Data frame of somatic variants

mutdistance

The maximum intermutational distance allowed for SNVs to be grouped in the same kataegic event. Recommended value: 3.2

segnum

Minimum mutation count. The minimum number of mutations required within a cluster to be identified as kataegic. Recommended value: 4

output.name

Name of the generated output directory.

Author(s)

Fouad Yousif

Fan Fan

Examples

load(
	paste0(
		path.package("SeqKat"),
		"/extdata/test/somatic.rda"
		)
	);

load(
	paste0(
		path.package("SeqKat"),
		"/extdata/test/final.score.rda"
		)
	);

combine.table(
	final.score,
	somatic,
	3.2,
	4,
	tempdir()
	);

Final Score

Description

Assigns a hypermutation score (hm.score) and a kataegic score (k.score).

Usage

final.score(test.table, cutoff, somatic, output.name)

Arguments

test.table

Data frame of kataegis test scores

cutoff

The minimum hypermutation score used to classify the windows in the sliding binomial test as significant windows. The score is calculated per window as follows: -log10(binomial test p-value). Recommended value: 5

somatic

Data frame of somatic variants

output.name

Name of the generated output directory.
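
The cutoff is expressed on the -log10(p-value) scale, so a value of 5 corresponds to a binomial test p-value of 1e-5. A minimal sketch of that arithmetic for a single window is given here; the counts, the background rate, and the one-sided alternative are illustrative assumptions rather than values or settings taken from the package.

```r
# Hypothetical window: 12 SNVs observed in 1000 bases,
# against an assumed background rate of 1 SNV per 1000 bases.
observed.snvs <- 12;
window.size <- 1000;
background.rate <- 1e-3;

# One-sided exact binomial test (the alternative is an assumption of this sketch).
p.value <- binom.test(
	observed.snvs,
	window.size,
	p = background.rate,
	alternative = "greater"
	)$p.value;

# Score on the scale described for the cutoff argument.
hm.score <- -log10(p.value);
cutoff <- 5;
is.significant <- hm.score >= cutoff;
```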

Author(s)

Fan Fan

Fouad Yousif

Examples

load(
	paste0(
		path.package("SeqKat"),
		"/extdata/test/somatic.rda"
		)
	);

load(
	paste0(
		path.package("SeqKat"),
		"/extdata/test/test.table.rda"
		)
	);

final.score(
	test.table,
	5,
	somatic,
	tempdir()
	);

Get Context

Description

Gets the bases immediately 5' and 3' of the mutated base.

Usage

get.context(file, start)

Arguments

file

Path to the reference sequence file for the chromosome, located in the reference files directory

start

The position(s) of the mutated base

Value

The trinucleotide context.
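
As an illustration of what a trinucleotide context is, the following base-R sketch extracts the base before, at, and after each position of a toy sequence. `trinucleotide.context` is a hypothetical helper that works on an in-memory string with 1-based positions; it does not read reference files and is not how get.context is implemented.

```r
# Hypothetical helper, not part of SeqKat: take the bases at position - 1,
# position, and position + 1 for every requested position.
trinucleotide.context <- function(sequence, positions) {
	substring(sequence, positions - 1, positions + 1);
	}

toy.sequence <- "ACGTTCAGGA";
contexts <- trinucleotide.context(toy.sequence, c(2, 6));
```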

Author(s)

Fouad Yousif

Fan Fan

Examples

example.ref.dir <- paste0(
	path.package("SeqKat"),
	"/extdata/test/ref/"
	);
get.context(file.path(example.ref.dir, 'chr4.fa'), c(1582933, 1611781))

get.exprobntcx

Description

Gets the expected probability for each trinucleotide and total number of tcx

Usage

get.exprobntcx(somatic, ref.dir, trinucleotide.count.file)

Arguments

somatic

Data frame of somatic variants

ref.dir

Path to a directory containing the reference genome.

trinucleotide.count.file

A tab separated file containing a count of all trinucleotides present in the reference genome. This can be generated with the get.trinucleotide.counts() function in this package.
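
One plausible reading of "expected probability" is each trinucleotide's share of the genome-wide trinucleotide count. The sketch below shows that normalisation on invented counts. The counts are hypothetical, and interpreting "tcx" as trinucleotides beginning with TC is an assumption of this sketch, not something stated in this manual.

```r
# Hypothetical genome-wide counts for a handful of trinucleotides.
trinucleotide.counts <- c(ACA = 400, TCA = 250, TCT = 150, TTG = 200);

# Share of each trinucleotide among all counted trinucleotides.
expected.probability <- trinucleotide.counts / sum(trinucleotide.counts);

# Total count of trinucleotides starting with "TC" (assumed meaning of "tcx").
is.tcx <- grepl("^TC", names(trinucleotide.counts));
tcx.total <- sum(trinucleotide.counts[is.tcx]);
```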

Author(s)

Fan Fan

Fouad Yousif

Examples

load(
	paste0(
		path.package("SeqKat"),
		"/extdata/test/somatic.rda"
		)
	);

trinucleotide.count.file <- paste0(
	path.package("SeqKat"),
	"/extdata/tn_count.txt"
	);

example.ref.dir <- paste0(
	path.package("SeqKat"),
	"/extdata/test/ref/"
	);

get.exprobntcx(somatic, example.ref.dir, trinucleotide.count.file)

Get Nucleotide Chunk Counts

Description

Obtain counts for all possible trinucleotides within a specified genomic region

Usage

get.nucleotide.chunk.counts(key, chr, upstream = 1, downstream = 1,
  start = 1, end = -1)

Arguments

key

List of specific trinucleotides to count

chr

Chromosome

upstream

Length upstream to read

downstream

Length downstream to read

start

Starting position

end

Ending position
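
Counting trinucleotides amounts to sliding a three-base window along the sequence and tallying each key. A minimal sketch on an in-memory string is given here; `count.trinucleotides` is a hypothetical helper, it ignores the upstream, downstream, start, and end arguments, and it does not read FASTA files.

```r
# Hypothetical helper, not part of SeqKat: tally overlapping trinucleotides.
count.trinucleotides <- function(sequence, keys) {
	# Start index of every complete three-base window (none if shorter than 3).
	window.starts <- seq_len(max(nchar(sequence) - 2, 0));
	observed <- substring(sequence, window.starts, window.starts + 2);
	# factor() with fixed levels keeps keys that never occur, with a count of 0.
	counts <- table(factor(observed, levels = keys));
	setNames(as.integer(counts), keys);
	}

toy.counts <- count.trinucleotides(
	"ACGACGT",
	c("ACG", "CGA", "GAC", "CGT", "TTT")
	);
```

Fixing the factor levels to the key list means the result always has one entry per key, in key order, which makes counts from different regions directly comparable.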

Author(s)

Fouad Yousif

Examples

example.ref.dir <- paste0(
	path.package("SeqKat"),
	"/extdata/test/ref/"
	);

bases.raw <- c('A','C','G','T','N');
tri.types.raw <- c(
	outer(
		c(outer(bases.raw, bases.raw, function(x, y) paste0(x,y))),
		bases.raw, function(x, y) paste0(x,y))
	);
tri.types.raw <- sort(tri.types.raw);
get.nucleotide.chunk.counts(
	tri.types.raw,
	file.path(example.ref.dir, 'chr4.fa'),
	upstream = 1,
	downstream = 1,
	start = 1,
	end = -1
	);

Get Pair

Description

Generates the reverse complement of a nucleotide sequence

Usage

get.pair(x)

Arguments

x

Nucleotide sequence string to be reverse complemented

Details

Reverses and complements the bases of the input string. Bases must be A, C, G, T, or N.
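
The operation itself can be sketched in base R as follows. `reverse.complement` is a hypothetical stand-in written for illustration, assuming upper-case input; it is not the source of get.pair.

```r
# Hypothetical helper, not part of SeqKat: reverse complement of one string.
reverse.complement <- function(x) {
	# Swap A<->T and C<->G; N stays N.
	complemented <- chartr("ACGTN", "TGCAN", x);
	# Split into single bases, reverse their order, and rejoin.
	paste(rev(strsplit(complemented, "")[[1]]), collapse = "");
	}

result <- reverse.complement("GATTACA");
```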

Author(s)

Fouad Yousif

Examples

get.pair("GATTACA")

Get Trinucleotides

Description

Counts the frequency of each of the 32 trinucleotides in a region
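
The figure of 32 is consistent with the common convention of collapsing the 64 possible trinucleotides onto the strand where the central base is a pyrimidine (C or T). Whether get.tn uses exactly this convention is an assumption here; the sketch only enumerates the 32 classes under that convention.

```r
# Enumerate trinucleotides whose central base is a pyrimidine (assumed convention).
bases <- c("A", "C", "G", "T");
combinations <- expand.grid(
	five.prime = bases,
	centre = c("C", "T"),
	three.prime = bases,
	stringsAsFactors = FALSE
	);
collapsed.trinucleotides <- sort(
	paste0(
		combinations$five.prime,
		combinations$centre,
		combinations$three.prime
		)
	);
```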

Usage

get.tn(chr, start.bp, end.bp, ref.dir)

Arguments

chr

Chromosome

start.bp

Starting position

end.bp

Ending position

ref.dir

Path to a directory containing the reference genome.

Author(s)

Fan Fan

Examples

example.ref.dir <- paste0(
	path.package("SeqKat"),
	"/extdata/test/ref/"
	);
get.tn(chr=4, start.bp=1, end.bp=-1, example.ref.dir)

Get Top Trinucleotides

Description

Generate a tri-nucleotide summary for each sliding window

Usage

get.toptn(somatic.subset, chr, start.bp, end.bp, ref.dir)

Arguments

somatic.subset

Data frame of somatic variants subset for a specific chromosome

chr

Chromosome

start.bp

Starting position

end.bp

Ending position

ref.dir

Path to a directory containing the reference genome.

Author(s)

Fan Fan

Fouad Yousif

Examples

## Not run: 
get.toptn(somatic.subset, chr, start.bp, end.bp, ref.dir)

## End(Not run)

Get Trinucleotide Counts

Description

Aggregates the total counts of each possible trinucleotide.

Usage

get.trinucleotide.counts(ref.dir, ref.name, output.dir)

Arguments

ref.dir

Path to a directory containing the reference genome.

ref.name

Name of the reference genome being used (e.g. hg19, GRCh38)

output.dir

Path to a directory where output will be created.

Author(s)

Fan Fan

Fouad Yousif

Examples

## Not run: 
get.trinucleotide.counts(ref.dir, "hg19", tempdir());

## End(Not run)

SeqKat

Description

Kataegis detection from SNV BED files

Usage

seqkat(sigcutoff = 5, mutdistance = 3.2, segnum = 4, ref.dir = NULL,
  bed.file = "./", output.dir = "./", chromosome = "all",
  chromosome.length.file = NULL, trinucleotide.count.file = NULL)

Arguments

sigcutoff

The minimum hypermutation score used to classify the windows in the sliding binomial test as significant windows. The score is calculated per window as follows: -log10(binomial test p-value). Recommended value: 5

mutdistance

The maximum intermutational distance allowed for SNVs to be grouped in the same kataegic event. Recommended value: 3.2

segnum

Minimum mutation count. The minimum number of mutations required within a cluster to be identified as kataegic. Recommended value: 4

ref.dir

Path to a directory containing the reference genome. Each chromosome should have its own .fa file, and chromosomes X and Y must be named chr23 and chr24. The FASTA files should contain no header.

bed.file

Path to the SNV BED file. The BED file should contain the following information: Chromosome, Position, Reference allele, Alternate allele

output.dir

Path to a directory where output will be created.

chromosome

The chromosome to be analysed. This can be (1, 2, ..., 23, 24) or "all" to run sequentially on all chromosomes.

chromosome.length.file

A tab separated file containing the lengths of all chromosomes in the reference genome.

trinucleotide.count.file

A tab separated file containing a count of all trinucleotides present in the reference genome. This can be generated with the get.trinucleotide.counts() function in this package.
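
Before a long run it can help to confirm that the reference directory follows the naming described for ref.dir. The sketch below is one way to do that; `missing.reference.files` is a hypothetical helper and the toy directory is created only for the demonstration.

```r
# Hypothetical helper, not part of SeqKat: list expected chrN.fa files that are absent.
missing.reference.files <- function(ref.dir, chromosomes = 1:24) {
	expected <- paste0("chr", chromosomes, ".fa");
	expected[!file.exists(file.path(ref.dir, expected))];
	}

# Toy directory holding chromosome 4 and chromosome X (stored as chr23).
toy.ref.dir <- file.path(tempdir(), "toy_ref");
dir.create(toy.ref.dir, showWarnings = FALSE);
invisible(file.create(file.path(toy.ref.dir, c("chr4.fa", "chr23.fa"))));

missing.files <- missing.reference.files(toy.ref.dir, chromosomes = c(4, 23, 24));
```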

Details

The default parameters in SeqKat have been optimized using the dataset from Alexandrov's "Signatures of mutational processes in human cancer". SeqKat accepts a BED file and outputs the results in TXT format. One file per chromosome is generated if a kataegic event is detected; otherwise no file is generated. SeqKat reports two scores per kataegic event: a hypermutation score and an APOBEC mediated kataegic score.
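
A small input file with the four fields listed for bed.file could be written as sketched below. The exact layout is an assumption of this sketch (tab separated, no header line, chromosome written with a "chr" prefix), and the variants are invented; the example BED file bundled with the package is the authoritative reference for the format.

```r
# Hypothetical SNVs. Column order follows the bed.file description:
# chromosome, position, reference allele, alternate allele.
toy.snvs <- data.frame(
	chromosome = c("chr4", "chr4", "chr4"),
	position = c(1582933L, 1582950L, 1611781L),
	reference = c("C", "C", "G"),
	alternate = c("T", "G", "A"),
	stringsAsFactors = FALSE
	);

# Assumed layout: tab separated, no header, no row names, no quoting.
toy.bed.file <- file.path(tempdir(), "toy_snvs.bed");
write.table(
	toy.snvs,
	file = toy.bed.file,
	sep = "\t",
	quote = FALSE,
	row.names = FALSE,
	col.names = FALSE
	);

# Read the file back to confirm what was written.
round.trip <- read.delim(toy.bed.file, header = FALSE, stringsAsFactors = FALSE);
```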

Author(s)

Fouad Yousif

Fan Fan

Christopher Lalansingh

Examples

example.bed.file <- paste0(
	path.package("SeqKat"),
	"/extdata/test/PD4120a-chr4-1-2000000_test_snvs.bed"
	);
example.ref.dir <- paste0(
	path.package("SeqKat"),
	"/extdata/test/ref/"
	);
example.chromosome.length.file <- paste0(
	path.package("SeqKat"),
	"/extdata/test/length_hg19_chr_test.txt"
	);
seqkat(
	5,
	3.2,
	2,
	bed.file = example.bed.file,
	output.dir = tempdir(),
	chromosome = "4",
	ref.dir = example.ref.dir,
	chromosome.length.file = example.chromosome.length.file
	);

Test Kataegis

Description

Performs an exact binomial test of whether the counts of the 32 trinucleotides deviate from their expected values
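
The general idea of testing windows along a chromosome can be sketched as follows. This is a deliberate simplification and not the package's procedure: it tests the total SNV count per non-overlapping window rather than the 32 trinucleotide counts, and the positions, window size, chromosome length, and background rate are all invented for illustration.

```r
# Hypothetical SNV positions on a toy chromosome of 10000 bases.
positions <- c(105, 110, 118, 121, 130, 5000, 9000);
chromosome.length <- 10000;
window.size <- 100;

# Assumed background: SNVs spread evenly along the chromosome.
background.rate <- length(positions) / chromosome.length;

# Count SNVs falling in each non-overlapping window.
window.starts <- seq(1, chromosome.length, by = window.size);
window.counts <- vapply(
	window.starts,
	function(window.start) {
		sum(positions >= window.start & positions < window.start + window.size);
		},
	integer(1)
	);

# One-sided exact binomial test per window.
p.values <- vapply(
	window.counts,
	function(snv.count) {
		binom.test(
			snv.count,
			window.size,
			p = background.rate,
			alternative = "greater"
			)$p.value;
		},
	numeric(1)
	);

# Window with the strongest enrichment of SNVs.
densest.window <- which.min(p.values);
```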

Usage

test.kataegis(chromosome.num, somatic, units, exprobntcx, output.name, ref.dir,
  chromosome.length.file)

Arguments

chromosome.num

Chromosome

somatic

Data frame of somatic variants

units

Base window size

exprobntcx

Expected probability for each trinucleotide and total number of tcx

output.name

Name of the generated output directory.

ref.dir

Path to a directory containing the reference genome.

chromosome.length.file

A tab separated file containing the lengths of all chromosomes in the reference genome.

Author(s)

Fouad Yousif

Examples

load(
	paste0(
		path.package("SeqKat"),
		"/extdata/test/somatic.rda"
		)
	);

load(
	paste0(
		path.package("SeqKat"),
		"/extdata/test/exprobntcx.rda"
		)
	);

example.chromosome.length.file <- paste0(
	path.package("SeqKat"),
	"/extdata/test/length_hg19_chr_test.txt"
	);

example.ref.dir <- paste0(
	path.package("SeqKat"),
	"/extdata/test/ref/"
	);

test.kataegis(
	4,
	somatic,
	2,
	exprobntcx,
	tempdir(),
	example.ref.dir,
	example.chromosome.length.file
	);