Title: | Detection of Kataegis |
---|---|
Description: | Kataegis is a localized hypermutation that occurs when a region is enriched in somatic SNVs. Kataegis can result from multiple cytosine deaminations catalyzed by the AID/APOBEC family of proteins. This package contains functions to detect kataegis from SNVs in BED format. It reports two scores per kataegic event: a hypermutation score and an APOBEC-mediated kataegic score. Yousif, F. et al.; The Origins and Consequences of Localized and Global Somatic Hypermutation; bioRxiv 2018 <doi:10.1101/287839>. |
Authors: | Fouad Yousif, Xihui Lin, Fan Fan, Christopher Lalansingh, John Macdonald |
Maintainer: | Paul C. Boutros |
License: | GPL-2 |
Version: | 0.0.8 |
Built: | 2024-11-07 03:56:43 UTC |
Source: | https://github.com/cran/SeqKat |
Merges overlapping windows to identify the genomic boundaries of kataegic events. This function also assigns a hypermutation score and a kataegic score to the combined windows.
combine.table(test.table, somatic, mutdistance, segnum, output.name)
test.table | Data frame of kataegis test scores |
somatic | Data frame of somatic variants |
mutdistance | The maximum intermutational distance allowed for SNVs to be grouped in the same kataegic event. Recommended value: 3.2 |
segnum | Minimum mutation count. The minimum number of mutations required within a cluster to be identified as kataegic. Recommended value: 4 |
output.name | Name of the generated output directory. |
Fouad Yousif
Fan Fan
load( paste0( path.package("SeqKat"), "/extdata/test/somatic.rda" ) );
load( paste0( path.package("SeqKat"), "/extdata/test/final.score.rda" ) );
combine.table( final.score, somatic, 3.2, 4, tempdir() );
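The merging step that combine.table describes can be pictured with a small base R sketch. It is not the package's implementation: the helper name collapse.windows and the toy start/end columns are hypothetical, and the sketch only shows one way overlapping windows can be collapsed into event boundaries.

```r
# Hypothetical helper: collapse overlapping windows into non-overlapping events.
collapse.windows <- function(windows) {
  windows <- windows[order(windows$start, windows$end), ]
  # A new event starts whenever a window begins after the furthest end seen so far.
  furthest.end <- cummax(windows$end)
  new.event <- c(TRUE, windows$start[-1] > furthest.end[-nrow(windows)])
  event.id <- cumsum(new.event)
  data.frame(
    start = as.vector(tapply(windows$start, event.id, min)),
    end = as.vector(tapply(windows$end, event.id, max))
  )
}

# Toy windows (made-up coordinates): the first two overlap, as do the third and fourth.
windows <- data.frame(
  start = c(100, 150, 400, 420, 900),
  end = c(200, 260, 450, 500, 950)
)
events <- collapse.windows(windows)
print(events)
```

In this toy input the five windows collapse into three events; scoring of the merged events is left out because the source does not spell out how the combined scores are derived.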
Assigns hypermutation score (hm.score) and kataegic score (k.score)
final.score(test.table, cutoff, somatic, output.name)
test.table | Data frame of kataegis test scores |
cutoff | The minimum hypermutation score used to classify the windows in the sliding binomial test as significant windows. The score is calculated per window as follows: -log10(binomial test p-value). Recommended value: 5 |
somatic | Data frame of somatic variants |
output.name | Name of the generated output directory. |
Fan Fan
Fouad Yousif
load( paste0( path.package("SeqKat"), "/extdata/test/somatic.rda" ) );
load( paste0( path.package("SeqKat"), "/extdata/test/test.table.rda" ) );
final.score( test.table, 5, somatic, tempdir() );
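As a rough illustration of the hypermutation score defined for the cutoff argument, -log10(binomial test p-value), the sketch below scores a single window with base R's binom.test(). The mutation count, window size, background rate, the one-sided alternative, and the >= comparison against the cutoff are all assumptions made for the example, not values or choices taken from SeqKat.

```r
# Assumed toy inputs: 12 SNVs observed in a 1000-base window,
# against an assumed background rate of one mutation per 1000 bases.
observed <- 12
window.size <- 1000
background.rate <- 1e-3

# One-sided exact binomial test for enrichment (the choice of alternative is an assumption).
test <- binom.test(observed, window.size, p = background.rate, alternative = "greater")

# Hypermutation score as defined in the cutoff description.
hm.score <- -log10(test$p.value)

# Compare against the recommended cutoff of 5.
cutoff <- 5
is.significant <- hm.score >= cutoff
print(c(hm.score = hm.score, significant = is.significant))
```

A window with a twelvefold excess over the assumed background comfortably clears the recommended cutoff; a window at the background rate would score close to zero.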
Gets the 5' and 3' neighboring bases of the mutated base
get.context(file, start)
file | Path to the reference sequence file for the chromosome (for example, chr4.fa in the reference directory) |
start | Position(s) of the mutated base |
The trinucleotide context.
Fouad Yousif
Fan Fan
example.ref.dir <- paste0( path.package("SeqKat"), "/extdata/test/ref/" );
get.context(file.path(example.ref.dir, 'chr4.fa'), c(1582933, 1611781))
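To make the idea of a trinucleotide context concrete, here is a minimal base R sketch that works on an in-memory string instead of a reference file. The toy sequence and the 1-based position convention are assumptions for illustration; get.context itself reads from the reference file and may index differently.

```r
# Toy reference sequence standing in for a chromosome (assumption: 1-based positions).
reference <- "ACGTTCAGGA"
positions <- c(2, 6)

# Take the base before, the mutated base, and the base after each position.
context <- substring(reference, positions - 1, positions + 1)
print(context)
```

Positions at the very start or end of the sequence have no complete context; substring() silently returns a shorter string there, so a real implementation would need to handle those edges explicitly.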
Gets the expected probability for each trinucleotide and total number of tcx
get.exprobntcx(somatic, ref.dir, trinucleotide.count.file)
somatic | Data frame of somatic variants |
ref.dir | Path to a directory containing the reference genome. |
trinucleotide.count.file | A tab-separated file containing a count of all trinucleotides present in the reference genome. This can be generated with the get.trinucleotide.counts() function in this package. |
Fan Fan
Fouad Yousif
load( paste0( path.package("SeqKat"), "/extdata/test/somatic.rda" ) );
trinucleotide.count.file <- paste0( path.package("SeqKat"), "/extdata/tn_count.txt" );
example.ref.dir <- paste0( path.package("SeqKat"), "/extdata/test/ref/" );
get.exprobntcx(somatic, example.ref.dir, trinucleotide.count.file)
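One plausible reading of "expected probability for each trinucleotide" is the share of each trinucleotide among all counted trinucleotides. The sketch below shows that reading only; the counts are made up, the normalization is an assumption, and treating "tcx" as trinucleotides that begin with TC is also an assumption that the source does not confirm.

```r
# Made-up trinucleotide counts (a real count file would list every trinucleotide).
counts <- c(ACA = 40, TCA = 25, TCT = 20, CCG = 15)

# Assumed definition: expected probability = count / total count.
expected.prob <- counts / sum(counts)

# Assumed definition: "tcx" = trinucleotides starting with TC.
tcx.total <- sum(counts[grepl("^TC", names(counts))])

print(expected.prob)
print(tcx.total)
```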
Obtain counts for all possible trinucleotides within a specified genomic region
get.nucleotide.chunk.counts(key, chr, upstream = 1, downstream = 1, start = 1, end = -1)
key | List of specific trinucleotides to count |
chr | Chromosome |
upstream | Length upstream to read |
downstream | Length downstream to read |
start | Starting position |
end | Ending position |
Fouad Yousif
example.ref.dir <- paste0( path.package("SeqKat"), "/extdata/test/ref/" );
bases.raw <- c('A','C','G','T','N');
tri.types.raw <- c( outer( c(outer(bases.raw, bases.raw, function(x, y) paste0(x,y))), bases.raw, function(x, y) paste0(x,y)) );
tri.types.raw <- sort(tri.types.raw);
get.nucleotide.chunk.counts( tri.types.raw, file.path(example.ref.dir, 'chr4.fa'), upstream = 1, downstream = 1, start = 1, end = -1 );
Generates the reverse complement of a nucleotide sequence
get.pair(x)
x | Nucleotide sequence to reverse-complement, given as a string |
Reverses and complements the bases of the input string. Bases must be A, C, G, T, or N.
Fouad Yousif
get.pair("GATTACA")
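For readers who want to see what reverse-complementing does without loading the package, the following base R sketch reproduces the operation on a single string. The helper name reverse.complement is hypothetical and this is not the code behind get.pair; it assumes uppercase input and maps N to N.

```r
# Hypothetical base R helper: reverse complement of one uppercase nucleotide string.
reverse.complement <- function(x) {
  # Only A, C, G, T and N are accepted, matching the documented restriction.
  stopifnot(is.character(x), length(x) == 1, grepl("^[ACGTN]*$", x))
  # Complement each base, then reverse the order of the characters.
  complemented <- chartr("ACGTN", "TGCAN", x)
  paste(rev(strsplit(complemented, "", fixed = TRUE)[[1]]), collapse = "")
}

result <- reverse.complement("GATTACA")
print(result)  # "TGTAATC"
```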
Counts the frequency of each of the 32 trinucleotides in a region
get.tn(chr, start.bp, end.bp, ref.dir)
chr | Chromosome |
start.bp | Starting position |
end.bp | Ending position |
ref.dir | Path to a directory containing the reference genome. |
Fan Fan
example.ref.dir <- paste0( path.package("SeqKat"), "/extdata/test/ref/" );
get.tn(chr=4, start.bp=1, end.bp=-1, example.ref.dir)
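The figure of 32 trinucleotides is what results when the 64 possible trinucleotides are collapsed by strand so that the central base is always a pyrimidine (C or T). The sketch below counts trinucleotides in a toy string under that convention. Whether get.tn collapses in exactly this way is an assumption, and the helper name count.trinucleotides is hypothetical.

```r
# Hypothetical helper: count strand-collapsed (pyrimidine-centred) trinucleotides.
count.trinucleotides <- function(sequence) {
  stopifnot(nchar(sequence) >= 3)
  starts <- seq_len(nchar(sequence) - 2)
  tri <- substring(sequence, starts, starts + 2)
  # Skip windows containing N or any other non-ACGT character.
  tri <- tri[grepl("^[ACGT]{3}$", tri)]

  reverse.complement <- function(x) {
    vapply(
      strsplit(chartr("ACGT", "TGCA", x), "", fixed = TRUE),
      function(b) paste(rev(b), collapse = ""),
      character(1)
    )
  }

  # Trinucleotides with a purine (A or G) centre are replaced by their reverse complement.
  purine.centre <- substr(tri, 2, 2) %in% c("A", "G")
  tri[purine.centre] <- reverse.complement(tri[purine.centre])

  # Enumerate the 32 pyrimidine-centred types so absent ones are reported as zero.
  bases <- c("A", "C", "G", "T")
  all.types <- sort(as.vector(outer(outer(bases, c("C", "T"), paste0), bases, paste0)))
  table(factor(tri, levels = all.types))
}

counts <- count.trinucleotides("ACGTTCAGGA")
print(counts[counts > 0])
```

In the toy sequence, ACG and its reverse complement CGT are counted together, which is why ACG appears twice among the eight windows.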
Generate a tri-nucleotide summary for each sliding window
get.toptn(somatic.subset, chr, start.bp, end.bp, ref.dir)
somatic.subset | Data frame of somatic variants subset for a specific chromosome |
chr | Chromosome |
start.bp | Starting position |
end.bp | Ending position |
ref.dir | Path to a directory containing the reference genome. |
Fan Fan
Fouad Yousif
## Not run:
get.toptn(somatic.subset, chr, start.bp, end.bp, ref.dir)
## End(Not run)
Aggregates the total counts of each possible trinucleotide.
get.trinucleotide.counts(ref.dir, ref.name, output.dir)
ref.dir | Path to a directory containing the reference genome. |
ref.name | Name of the reference genome being used (e.g. hg19 or GRCh38) |
output.dir | Path to a directory where output will be created. |
Fan Fan
Fouad Yousif
## Not run:
get.trinucleotide.counts(ref.dir, "hg19", tempdir());
## End(Not run)
Kataegis detection from SNV BED files
seqkat(sigcutoff = 5, mutdistance = 3.2, segnum = 4, ref.dir = NULL, bed.file = "./", output.dir = "./", chromosome = "all", chromosome.length.file = NULL, trinucleotide.count.file = NULL)
sigcutoff | The minimum hypermutation score used to classify the windows in the sliding binomial test as significant windows. The score is calculated per window as follows: -log10(binomial test p-value). Recommended value: 5 |
mutdistance | The maximum intermutational distance allowed for SNVs to be grouped in the same kataegic event. Recommended value: 3.2 |
segnum | Minimum mutation count. The minimum number of mutations required within a cluster to be identified as kataegic. Recommended value: 4 |
ref.dir | Path to a directory containing the reference genome. Each chromosome should have its own .fa file, with chromosomes X and Y named chr23 and chr24. The FASTA files should contain no header line. |
bed.file | Path to the SNV BED file. The BED file should contain the following information: Chromosome, Position, Reference allele, Alternate allele |
output.dir | Path to a directory where output will be created. |
chromosome | The chromosome to be analysed. This can be (1, 2, ..., 23, 24) or "all" to run sequentially on all chromosomes. |
chromosome.length.file | A tab-separated file containing the lengths of all chromosomes in the reference genome. |
trinucleotide.count.file | A tab-separated file containing a count of all trinucleotides present in the reference genome. This can be generated with the get.trinucleotide.counts() function in this package. |
The default parameters in SeqKat have been optimized using the dataset from Alexandrov et al., "Signatures of mutational processes in human cancer". SeqKat accepts a BED file and outputs the results in TXT format. One file per chromosome is generated when a kataegic event is detected; otherwise no file is generated. SeqKat reports two scores per kataegic event: a hypermutation score and an APOBEC-mediated kataegic score.
Fouad Yousif
Fan Fan
Christopher Lalansingh
example.bed.file <- paste0( path.package("SeqKat"), "/extdata/test/PD4120a-chr4-1-2000000_test_snvs.bed" );
example.ref.dir <- paste0( path.package("SeqKat"), "/extdata/test/ref/" );
example.chromosome.length.file <- paste0( path.package("SeqKat"), "/extdata/test/length_hg19_chr_test.txt" );
seqkat(
	5,
	3.2,
	2,
	bed.file = example.bed.file,
	output.dir = tempdir(),
	chromosome = "4",
	ref.dir = example.ref.dir,
	chromosome.length.file = example.chromosome.length.file
	);
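The mutdistance and segnum arguments together describe a grouping rule: SNVs that sit close enough to each other form a cluster, and a cluster counts only if it holds enough mutations. The base R sketch below illustrates such a rule on made-up positions. It treats mutdistance as a log10 base-pair distance, which is an assumption, since the argument description gives the recommended value 3.2 without stating units. The helper name cluster.snvs is hypothetical and the package's own grouping logic may differ.

```r
# Hypothetical helper: group sorted SNV positions into clusters.
# Assumption: max.log10.distance is a log10 distance in base pairs.
cluster.snvs <- function(positions, max.log10.distance = 3.2, min.count = 4) {
  positions <- sort(positions)
  gaps <- diff(positions)
  # Start a new cluster whenever the gap to the previous SNV exceeds the threshold.
  cluster.id <- cumsum(c(TRUE, log10(gaps) > max.log10.distance))
  # Keep only clusters with at least min.count mutations.
  sizes <- table(cluster.id)
  keep <- cluster.id %in% names(sizes)[sizes >= min.count]
  split(positions[keep], cluster.id[keep])
}

# Made-up SNV positions: four close together, then two, then one isolated.
positions <- c(1000, 1200, 1500, 2100, 50000, 50300, 90000)
clusters <- cluster.snvs(positions)
print(clusters)
```

With the defaults shown, only the first four positions survive as a cluster; the pair near 50000 is close enough to be grouped but falls short of the minimum count.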
Performs an exact binomial test for the deviation of the 32 trinucleotide counts from their expected values
test.kataegis(chromosome.num, somatic, units, exprobntcx, output.name, ref.dir, chromosome.length.file)
chromosome.num | Chromosome |
somatic | Data frame of somatic variants |
units | Base window size |
exprobntcx | Expected probability for each trinucleotide and total number of tcx |
output.name | Name of the generated output directory. |
ref.dir | Path to a directory containing the reference genome. |
chromosome.length.file | A tab-separated file containing the lengths of all chromosomes in the reference genome. |
Fouad Yousif
load( paste0( path.package("SeqKat"), "/extdata/test/somatic.rda" ) );
load( paste0( path.package("SeqKat"), "/extdata/test/exprobntcx.rda" ) );
example.chromosome.length.file <- paste0( path.package("SeqKat"), "/extdata/test/length_hg19_chr_test.txt" );
example.ref.dir <- paste0( path.package("SeqKat"), "/extdata/test/ref/" );
test.kataegis( 4, somatic, 2, exprobntcx, tempdir(), example.ref.dir, example.chromosome.length.file );
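The idea of testing trinucleotide counts against their expected values can be sketched with base R's binom.test(). The expected probabilities and observed counts below are made up, only four trinucleotides are shown instead of 32, and the one-sided alternative is an assumption; this is not how test.kataegis is necessarily implemented.

```r
# Made-up expected probabilities and observed counts for one window.
expected.prob <- c(ACA = 0.40, TCA = 0.25, TCT = 0.20, CCG = 0.15)
observed <- c(ACA = 3, TCA = 14, TCT = 2, CCG = 1)
total <- sum(observed)

# One exact binomial test per trinucleotide: is it over-represented in this window?
p.values <- vapply(
  names(observed),
  function(tn) {
    binom.test(observed[[tn]], total, p = expected.prob[[tn]], alternative = "greater")$p.value
  },
  numeric(1)
)

most.enriched <- names(which.min(p.values))
print(signif(p.values, 3))
print(most.enriched)
```

Here TCA accounts for 14 of 20 mutations against an expected share of a quarter, so it stands out as the most enriched context in the toy data.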