Package 'bedr' reference manual

Title:	Genomic Region Processing using Tools Such as 'BEDTools', 'BEDOPS' and 'Tabix'
Description:	Genomic regions processing using open-source command line tools such as 'BEDTools', 'BEDOPS' and 'Tabix'. These tools offer scalable and efficient utilities to perform genome arithmetic e.g indexing, formatting and merging. bedr API enhances access to these tools as well as offers additional utilities for genomic regions processing.
Authors:	Syed Haider [aut], Daryl Waggott [aut], Emilie Lalonde [ctb], Clement Fung [ctb], Paul C. Boutros [aut, cre, cph]
Maintainer:	Paul C. Boutros <[email protected]>
License:	GPL-2
Version:	1.0.7
Built:	2025-03-28 03:34:59 UTC
Source:	https://github.com/cran/bedr

A bedtools wrapper for working with genomic ranges in R

Description

A bedtools wrapper that allows using a mix of internal R objects and external R files. A number of convenience functions are provided for simplifying analysis workflows in R.

Details

Package:	bedr
Type:	Package
Version:	1.0.4
Date:	2016-09-19
License:	GPL2

Author(s)

Daryl Waggott
Syed Haider

checks if regions in object a are found in object b

Description

checks if regions in object a are found in object b

Usage

x %in.region% y
x %in.region% y

Arguments

`x`	first region index in the form chr:start-stop. regions in this index will be checked for intersection in the values of the second index.
`y`	second region index.

Details

The function can also be called using syntax similar to the %in% operator, for example "region1 %in.region% region2"

Value

Returns a logical vector the length of x.

Author(s)

Daryl Waggott

Examples

if (check.binary("bedtools")) {

index <- get.example.regions();

a <- index[[1]];
b <- index[[2]];
a <- bedr.sort.region(a);
b <- bedr.sort.region(b);

d <- a %in.region% b

}
if (check.binary("bedtools")) {

index <- get.example.regions();

a <- index[[1]];
b <- index[[2]];
a <- bedr.sort.region(a);
b <- bedr.sort.region(b);

d <- a %in.region% b

}

bed dataframe to index string

Description

convert a dataframe in bed format to an index string

Usage

bed2index(x, sort = TRUE)
bed2index(x, sort = TRUE)

Arguments

`x`	a object region object or index
`sort`	should the index be sorted

Value

Returns a vector of string based genomic regions

Author(s)

Daryl Waggott

Examples


test.regions <- get.random.regions(10);
bed2index(test.regions);
test.regions <- get.random.regions(10);
bed2index(test.regions);

convert bed to vcf

Description

convert bed to vcf

Usage

bed2vcf(x, filename = NULL, zero.based = TRUE, header = NULL, fasta = NULL)
bed2vcf(x, filename = NULL, zero.based = TRUE, header = NULL, fasta = NULL)

Arguments

`x`	the input bed object
`filename`	a filename to write to.
`zero.based`	is the file zero based i.e. bed format. defaults to true.
`header`	a list of things to put in the header. repeated elements such as INFO, FILTER, FORMAT can be put in data.frames.
`fasta`	build of the genome in fasta format

Author(s)

Daryl Waggott

Examples

## Not run: 
  bed2vcf(x)
  
## End(Not run)
## Not run: 
  bed2vcf(x)
  
## End(Not run)

Main bedtools wrapper function.

Description

Main bedtools wrapper function.

Usage

bedr(engine = "bedtools", 
	params = NULL,
	input = list(),
	method = NULL,
	tmpDir = NULL,
	deleteTmpDir = TRUE,
	outputDir = NULL,
	outputFile = NULL,
	check.chr = TRUE,
	check.zero.based = TRUE,
	check.valid = TRUE,
	check.sort = TRUE,
	check.merge = TRUE,
	verbose = TRUE
	);

bedr(engine = "bedtools", 
	params = NULL,
	input = list(),
	method = NULL,
	tmpDir = NULL,
	deleteTmpDir = TRUE,
	outputDir = NULL,
	outputFile = NULL,
	check.chr = TRUE,
	check.zero.based = TRUE,
	check.valid = TRUE,
	check.sort = TRUE,
	check.merge = TRUE,
	verbose = TRUE
	);

Arguments

`engine`	What analytical engine to use i.e. bedtools, bedops, tabix, unix
`params`	A string that includes all the extra parameters and arguments to the bedtools commmand. For example if you wanted to do a left outer join you would specificy method as intersect and use params = c("-loj -header"). If you leave input and method as defaults then this is this string represents the full command.
`input`	A list of input items to be used by bedtools. Each item should be named by its parameter name in bedtools for example input = list(a=xxx, b=yyy, i=zzz). Items can be R objects or external files. R objects need to be in bed format i.e. have chr, start, stop as the first three columns, or, have an position index as the first column or rowname i.e. chr1:100-1000.
`method`	What bedtools method. This can be intersect, sort, merge etc. See bedtools documentation for specifcis.
`tmpDir`	The directory to be used for writing files
`deleteTmpDir`	Should tmp files be deleted. helpful for diagnostics.
`outputDir`	The output directory. Only used if outputFile is specified. It defaults to the current working directory.
`outputFile`	The name of the output file. If this is specified the output will be sent to a file not an R object
`check.chr`	check for chr prefix
`check.zero.based`	check for zero based coordinates
`check.valid`	do all region integrity checks
`check.sort`	check if region is sorted
`check.merge`	check if region is merged
`verbose`	Should messages be printed to screen.

Value

The output of command with some parsing to keep it consistent with the input.

Author(s)

Daryl Waggott

Examples

if (check.binary("bedtools")) {
set.seed(666)

index <- get.example.regions();
a <- index[[1]];
b <- index[[2]];

### check
is.a.valid  <- is.valid.region(a);
is.b.valid  <- is.valid.region(b);
a <- a[is.a.valid];
b <- b[is.b.valid];

### sort
is.sorted <- is.sorted.region(a);
a.sort1 <- bedr(engine = "bedtools", input = list(i = a), method = "sort", params = "");
b.sort1 <- bedr(engine = "bedtools", input = list(i = b), method = "sort", params = "");
a.sort2 <- bedr(engine = "bedops",   input = list(i = a), method = "sort", params = "");

a.sort3 <- bedr.sort.region(a);
a.sort4 <- bedr.sort.region(a, engine = "unix", method = "natural");
a.sort5 <- bedr.sort.region(a, engine = "R", method = "natural");

### merge
is.merged <- is.merged.region(a.sort1);
a.merge1 <- bedr(engine = "bedtools", input = list(i = a.sort1), method = "merge", params = "");
b.merge1 <- bedr(engine = "bedtools", input = list(i = b.sort1), method = "merge", params = "");
a.merge2 <- bedr(engine = "bedops",   input = list(i = a.sort1), method = "merge", params = "");
# a.merge3 <- bedr.merge.region(a); this will throw an error b/c region is not sorted

### subtract
a.sub1 <- bedr(input = list(a = a.merge1, b = b.merge1), method = "subtract", params="");
a.sub2 <- bedr.subtract.region(a.merge1, b.merge1);

### in.region
is.region <- in.region(a.merge1, b.merge1);
#is.region <- a.merge1 %in.region% b.merge1
### intersect 
# note for bedtools its recommended to bedr.sort.regions before intersect for faster processing
# also if regions are not merged this can cause unexpected behaviour
a.int1 <- bedr(input = list(a = a.sort1, b = b.sort1), method = "intersect", params = "-loj");
a.int1 <- bedr(input = list(a = a.sort1, b = b.sort1), method="intersect",params ="-loj -sorted");
a.int2 <- bedr(input = list(a = a.merge1, b = b.merge1), method="intersect",params="-loj -sorted");
a.int3 <- bedr.join.region(a.merge1, b.merge1);

### multiple join
d <- get.random.regions(100, chr="chr1", sort = TRUE);
a.mult <- bedr.join.multiple.region(x = list(a.merge1,b.merge1,bedr.sort.region(d)));

## Not run: 
  ### groupby 
  # note the "g" column number is based on bed format i.e. first three columns chr, start, stop
  # note the use of first, first, last on the region columns i.e. the union of the regions
  # note currently missing values are not dealt with in bedtools.  also the 5th column is 
  # assumed to be "score" and gets a default "-1" not a "."
  cnv.gene <- bedr(
    input = list(i=cnv.gene), method = "groupby", params = paste(
      "-g 16 -c ", 
      paste(1:15, collapse = ","), 
      " -o ", "first,first,last, ", 
      paste(rep("sum",12), collapse = ","), 
      sep = ""
      )
    );

  ### example 1
  ###  workflow adding gene names to exome sequencing target file
  # download refseq genes from ucsc or query biomart for ensemble gene names.  
  # format them in basic bed format.
  # sort, merge target
  # sort, merge -nms target.  Overlapping genes/features get merged.  
  # this may not be ideal if there are some really big features.
  # intersect -loj target, genes.
  # alternatively, do not merge the target and apply the merge after the intersect.  
  # this will provide precision at the level of the exon.
  
## End(Not run)
}
if (check.binary("bedtools")) {
set.seed(666)

index <- get.example.regions();
a <- index[[1]];
b <- index[[2]];

### check
is.a.valid  <- is.valid.region(a);
is.b.valid  <- is.valid.region(b);
a <- a[is.a.valid];
b <- b[is.b.valid];

### sort
is.sorted <- is.sorted.region(a);
a.sort1 <- bedr(engine = "bedtools", input = list(i = a), method = "sort", params = "");
b.sort1 <- bedr(engine = "bedtools", input = list(i = b), method = "sort", params = "");
a.sort2 <- bedr(engine = "bedops",   input = list(i = a), method = "sort", params = "");

a.sort3 <- bedr.sort.region(a);
a.sort4 <- bedr.sort.region(a, engine = "unix", method = "natural");
a.sort5 <- bedr.sort.region(a, engine = "R", method = "natural");

### merge
is.merged <- is.merged.region(a.sort1);
a.merge1 <- bedr(engine = "bedtools", input = list(i = a.sort1), method = "merge", params = "");
b.merge1 <- bedr(engine = "bedtools", input = list(i = b.sort1), method = "merge", params = "");
a.merge2 <- bedr(engine = "bedops",   input = list(i = a.sort1), method = "merge", params = "");
# a.merge3 <- bedr.merge.region(a); this will throw an error b/c region is not sorted

### subtract
a.sub1 <- bedr(input = list(a = a.merge1, b = b.merge1), method = "subtract", params="");
a.sub2 <- bedr.subtract.region(a.merge1, b.merge1);

### in.region
is.region <- in.region(a.merge1, b.merge1);
#is.region <- a.merge1 %in.region% b.merge1
### intersect 
# note for bedtools its recommended to bedr.sort.regions before intersect for faster processing
# also if regions are not merged this can cause unexpected behaviour
a.int1 <- bedr(input = list(a = a.sort1, b = b.sort1), method = "intersect", params = "-loj");
a.int1 <- bedr(input = list(a = a.sort1, b = b.sort1), method="intersect",params ="-loj -sorted");
a.int2 <- bedr(input = list(a = a.merge1, b = b.merge1), method="intersect",params="-loj -sorted");
a.int3 <- bedr.join.region(a.merge1, b.merge1);

### multiple join
d <- get.random.regions(100, chr="chr1", sort = TRUE);
a.mult <- bedr.join.multiple.region(x = list(a.merge1,b.merge1,bedr.sort.region(d)));

## Not run: 
  ### groupby 
  # note the "g" column number is based on bed format i.e. first three columns chr, start, stop
  # note the use of first, first, last on the region columns i.e. the union of the regions
  # note currently missing values are not dealt with in bedtools.  also the 5th column is 
  # assumed to be "score" and gets a default "-1" not a "."
  cnv.gene <- bedr(
    input = list(i=cnv.gene), method = "groupby", params = paste(
      "-g 16 -c ", 
      paste(1:15, collapse = ","), 
      " -o ", "first,first,last, ", 
      paste(rep("sum",12), collapse = ","), 
      sep = ""
      )
    );

  ### example 1
  ###  workflow adding gene names to exome sequencing target file
  # download refseq genes from ucsc or query biomart for ensemble gene names.  
  # format them in basic bed format.
  # sort, merge target
  # sort, merge -nms target.  Overlapping genes/features get merged.  
  # this may not be ideal if there are some really big features.
  # intersect -loj target, genes.
  # alternatively, do not merge the target and apply the merge after the intersect.  
  # this will provide precision at the level of the exon.
  
## End(Not run)
}

join multiple region objects

Description

join multiple objects

Usage

bedr.join.multiple.region(
	x = list(),
	fraction.overlap = 1/1e9,
	empty = FALSE,
	missing.values = ".",
	cluster = FALSE,
	species = "human",
	build = "hg19",
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.sort = TRUE,
	check.merge = TRUE,
	verbose = TRUE
	)
bedr.join.multiple.region(
	x = list(),
	fraction.overlap = 1/1e9,
	empty = FALSE,
	missing.values = ".",
	cluster = FALSE,
	species = "human",
	build = "hg19",
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.sort = TRUE,
	check.merge = TRUE,
	verbose = TRUE
	)

Arguments

`x`	list of region objects
`fraction.overlap`	proportion of bases to be considered an overlap
`empty`	print rows if no match
`missing.values`	missing value character
`cluster`	TRUE/FALSE for clustering
`species`	species i.e. human or mouse
`build`	genome build to use for empty regions
`check.zero.based`	should 0 based coordinates be checked
`check.chr`	should chr prefix be checked
`check.valid`	check if region is valid
`check.sort`	check if region is sorted
`check.merge`	check if overlapping regions are merged
`verbose`	messages and checks

Author(s)

Daryl Waggott

References

http://bedtools.readthedocs.org/en/latest/content/tools/multiinter.html

Examples

if (check.binary("bedtools")) {

index <- get.example.regions();

a <- index[[1]];
b <- index[[2]];

a.sort <- bedr.sort.region(a);
b.sort <- bedr.sort.region(b);
d <- get.random.regions(100, chr="chr1", sort = TRUE);

a.mult <- bedr.join.multiple.region(x = list(a.sort,b.sort,bedr.sort.region(d)));
}
if (check.binary("bedtools")) {

index <- get.example.regions();

a <- index[[1]];
b <- index[[2]];

a.sort <- bedr.sort.region(a);
b.sort <- bedr.sort.region(b);
d <- get.random.regions(100, chr="chr1", sort = TRUE);

a.mult <- bedr.join.multiple.region(x = list(a.sort,b.sort,bedr.sort.region(d)));
}

join two region objects using a left outer join

Description

join two region objects using a left outer join

Usage

bedr.join.region(
	x,
	y,
	fraction.overlap = 1/1e9,
	reciporical = FALSE,
	report.n.overlap = FALSE,
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.sort = TRUE,
	check.merge = TRUE,
	verbose = TRUE
	)
bedr.join.region(
	x,
	y,
	fraction.overlap = 1/1e9,
	reciporical = FALSE,
	report.n.overlap = FALSE,
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.sort = TRUE,
	check.merge = TRUE,
	verbose = TRUE
	)

Arguments

`x`	object a
`y`	object b
`fraction.overlap`	proportion of overlap to be considered a match
`report.n.overlap`	should the number of overlapping bases be reported
`reciporical`	should the fraction overlap be applied to object b as well
`check.zero.based`	should 0 based coordinates be checked
`check.chr`	should chr prefix be checked
`check.valid`	check if region is valid
`check.sort`	check if region is sorted
`check.merge`	check if overlapping regions are merged
`verbose`	messages and checks

Author(s)

Daryl Waggott

References

http://bedtools.readthedocs.org/en/latest/content/tools/intersect.html

Examples

if (check.binary("bedtools")) {

index <- get.example.regions();

a <- index[[1]];
b <- index[[2]];

a.sort <- bedr.sort.region(a);
b.sort <- bedr.sort.region(b);

d <- bedr.join.region(a.sort, b.sort);
}
if (check.binary("bedtools")) {

index <- get.example.regions();

a <- index[[1]];
b <- index[[2]];

a.sort <- bedr.sort.region(a);
b.sort <- bedr.sort.region(b);

d <- bedr.join.region(a.sort, b.sort);
}

merge i.e. collapse overlpaping regions

Description

merge i.e. collapse overlpaping regions

Usage

bedr.merge.region(
	x,
	distance = 0,
	list.names = TRUE,
	number = FALSE,
	stratify.by = NULL,
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.sort = TRUE,
	verbose = TRUE
	);
bedr.merge.region(
	x,
	distance = 0,
	list.names = TRUE,
	number = FALSE,
	stratify.by = NULL,
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.sort = TRUE,
	verbose = TRUE
	);

Arguments

`x`	input
`distance`	maximum distance between regions to be merged. defaults to 0 which means overlapping or bookended features. note that you can use negative distances to enforce a minimum overlap.
`list.names`	output list of names for merged items
`number`	output number of merged items
`stratify.by`	a column name indicating the groups to stratify merging within i.e. gene name. merging will not happen between groups.
`check.zero.based`	should 0 based coordinates be checked
`check.chr`	should chr prefix be checked
`check.valid`	should the region be checkded for integerity
`check.sort`	should the sort order be checked
`verbose`	should log messages and checking take place

Author(s)

Daryl Waggott

References

http://bedtools.readthedocs.org/en/latest/content/tools/merge.html

Examples

if (check.binary("bedtools")) {

index <- get.example.regions();

a <- index[[1]];

a.sort   <- bedr.sort.region(a);
a.merged <- bedr.merge.region(a.sort);

}
if (check.binary("bedtools")) {

index <- get.example.regions();

a <- index[[1]];

a.sort   <- bedr.sort.region(a);
a.merged <- bedr.merge.region(a.sort);

}

Visualize regions or intervals

Description

Visualize regions or intervals. e.g. VennDiagrams of intersections of distinct intervals, base pairs and genes.

Usage

bedr.plot.region(
	input,
	filename = NULL,
	type = "venn",
	feature = "interval",
	fraction.overlap = 0.000000001,
	group = NULL,
	params = list(),
	verbose = TRUE
	)
bedr.plot.region(
	input,
	filename = NULL,
	type = "venn",
	feature = "interval",
	fraction.overlap = 0.000000001,
	group = NULL,
	params = list(),
	verbose = TRUE
	)

Arguments

`input`	A list of input regions or indices
`filename`	The name of the output image file
`type`	The type of plot. only 'venn' is supported for intersections at the moment.
`feature`	How should the regions be intersected. By unique "interval", "gene", "size" or "other" to use the features in the first item in the input list.
`fraction.overlap`	Minimum overlap required as a fraction of A. Default is 1E-9 (i.e. 1bp).
`group`	A grouping parameter for barplots. Possible values include "input", "chr", or a categorical vector of length equal to the sum of the input.
`params`	Additional parameters for plotting or intersecting. See `venn.diagram` function for possible options.
`verbose`	Include text messages.

Details

By default a venn diagram is output. If a single input is used then the plot shows the number of unique and collapsed regions after applying a merge.

Value

Plots!

Author(s)

Daryl Waggott

Examples

if (check.binary("bedtools")) {
	# example data
	a <- get.random.regions(n = 1000, chr = "chr22", size.mean = 10)
	b <- get.random.regions(n = 1000, chr = "chr22", size.mean = 10)
	d <- get.random.regions(n = 1000, chr = "chr22", size.mean = 10)
	e <- get.random.regions(n = 1000, chr = "chr22", size.mean = 10)
	f <- get.random.regions(n = 1000, chr = "chr22", size.mean = 10)

	pdf("bedr.plot.region.ex.pdf")

	# basic venn diagrams
	bedr.plot.region(input = list(a=a,b=b))
	bedr.plot.region(input = list(a=a,b=b,d=d))
	#bedr.plot.region(input = list(a=a,b=b,d=d,e=e))
	#bedr.plot.region(input = list(a=a,b=b,d=d,e=e,f=f))

	### change venn parameters
	bedr.plot.region(
		input = list(a=a,b=b,d=d),
		params = list(lty = 2, label.col = "black", main = "Region Overlap")
		)

	### try with different 
	#bedr.plot.region(input = list(a=a,b=b), feature = "gene")
	#bedr.plot.region(input = list(a=a,b=b), feature = "reference")
	#bedr.plot.region(input = list(a=a,b=b), feature = "interval")
	#bedr.plot.region(input = list(a=a,b=b), feature = "cluster")
	#bedr.plot.region(input = list(a=a,b=b), feature = "bp")

	dev.off()
	}
if (check.binary("bedtools")) {
	# example data
	a <- get.random.regions(n = 1000, chr = "chr22", size.mean = 10)
	b <- get.random.regions(n = 1000, chr = "chr22", size.mean = 10)
	d <- get.random.regions(n = 1000, chr = "chr22", size.mean = 10)
	e <- get.random.regions(n = 1000, chr = "chr22", size.mean = 10)
	f <- get.random.regions(n = 1000, chr = "chr22", size.mean = 10)

	pdf("bedr.plot.region.ex.pdf")

	# basic venn diagrams
	bedr.plot.region(input = list(a=a,b=b))
	bedr.plot.region(input = list(a=a,b=b,d=d))
	#bedr.plot.region(input = list(a=a,b=b,d=d,e=e))
	#bedr.plot.region(input = list(a=a,b=b,d=d,e=e,f=f))

	### change venn parameters
	bedr.plot.region(
		input = list(a=a,b=b,d=d),
		params = list(lty = 2, label.col = "black", main = "Region Overlap")
		)

	### try with different 
	#bedr.plot.region(input = list(a=a,b=b), feature = "gene")
	#bedr.plot.region(input = list(a=a,b=b), feature = "reference")
	#bedr.plot.region(input = list(a=a,b=b), feature = "interval")
	#bedr.plot.region(input = list(a=a,b=b), feature = "cluster")
	#bedr.plot.region(input = list(a=a,b=b), feature = "bp")

	dev.off()
	}

Initialize some config settings for bedr

Description

Initialize some config settings for bedr. This includes downloading useful datasets if requested.

Usage

bedr.setup(datasets = "all", data.dir = paste0(Sys.getenv("HOME"), "/bedr/data"))
bedr.setup(datasets = "all", data.dir = paste0(Sys.getenv("HOME"), "/bedr/data"))

Arguments

`datasets`	A list of datasets to download. Defaults to "all" i.e. c("refgene","hg19","b37","hugo", "cosmic","clinvar")
`data.dir`	A directory to put the data. Defaults to ~/bedr/data

Details

The default config file is at ~/bedr/config.yml. It's in yaml format.

Author(s)

Daryl Waggott

Examples

## Not run: 
  bedr.setup()
  
## End(Not run)
## Not run: 
  bedr.setup()
  
## End(Not run)

sort a region file

Description

Sort and merge regions object in one step

Usage

bedr.snm.region(
	x,
	method = "lexicographical",
	distance = 0,
	list.names = TRUE,
	number = FALSE,
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	verbose = TRUE
	)

bedr.snm.region(
	x,
	method = "lexicographical",
	distance = 0,
	list.names = TRUE,
	number = FALSE,
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	verbose = TRUE
	)

Arguments

`x`	a object region object or index
`method`	natural or lexicographic
`distance`	distance between regions to be merged
`list.names`	output list of names for merged items
`number`	output number of merged items
`check.zero.based`	should 0 based coordinates be checked
`check.chr`	should chr prefix be checked
`check.valid`	should the region be checkded for integerity
`verbose`	should log messages and checking take place

Value

Sorted and merged regions object

Author(s)

Daryl Waggott

Examples

if (check.binary("bedtools")) {

index <- get.example.regions();

a <- index[[1]];

b <- bedr.snm.region(a);

}
if (check.binary("bedtools")) {

index <- get.example.regions();

a <- index[[1]];

b <- bedr.snm.region(a);

}

sort a region file

Description

sort a region file

Usage

bedr.sort.region(
	x,
	method = "lexicographical",
	engine = "R",
	chr.to.num = c("X" = 23, "Y" = 24, "M" = 25),
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.merge = TRUE,
	verbose = TRUE
	)

bedr.sort.region(
	x,
	method = "lexicographical",
	engine = "R",
	chr.to.num = c("X" = 23, "Y" = 24, "M" = 25),
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.merge = TRUE,
	verbose = TRUE
	)

Arguments

`x`	a region object or index
`method`	natural or lexicographic
`engine`	what analytical engine to use for sorting i.e. bedtools, bedops, gnu unix
`chr.to.num`	chromosome letter names to numbers map. Defaults to Homo sapiens i.e c("X" = 23, "Y" = 24, "M" = 25)
`check.zero.based`	should 0 based coordinates be checked
`check.chr`	should chr prefix be checked
`check.valid`	should the region be checked for integerity
`check.merge`	should overlapping regions be checked
`verbose`	should log messages and checking take place

Author(s)

Daryl Waggott

References

http://bedtools.readthedocs.org/en/latest/content/tools/sort.html

Examples

if (check.binary("bedtools")) {

index <- get.example.regions();
a <- index[[1]];
b <- bedr.sort.region(a);

}
if (check.binary("bedtools")) {

index <- get.example.regions();
a <- index[[1]];
b <- bedr.sort.region(a);

}

subtracts features or ranges in object b from object a

Description

subtracts features or ranges in object b from object a

Usage

bedr.subtract.region(
	x,
	y,
	fraction.overlap = 1/1e9,
	remove.whole.feature = TRUE,
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.sort = TRUE,
	check.merge = TRUE,
	verbose = TRUE
	)
bedr.subtract.region(
	x,
	y,
	fraction.overlap = 1/1e9,
	remove.whole.feature = TRUE,
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.sort = TRUE,
	check.merge = TRUE,
	verbose = TRUE
	)

Arguments

`x`	item a
`y`	item b
`fraction.overlap`	what portion of A to be considered an overlap
`remove.whole.feature`	should whole feature be removed
`check.zero.based`	should 0 based coordinates be checked
`check.chr`	should chr prefix be checked
`check.valid`	should the region be checkded for integerity
`check.sort`	check if region is sorted
`check.merge`	check if overlapping regions are merged
`verbose`	messages and checks

Value

Regions exclusive to one object of regions.

Author(s)

Daryl Waggott

References

http://bedtools.readthedocs.org/en/latest/content/tools/subtract.html

Examples

if (check.binary("bedtools")) {
index <- get.example.regions();

a <- index[[1]];
b <- index[[2]];
a <- bedr(engine = "bedtools", input = list(i = a), method = "sort", params = "");
b <- bedr(engine = "bedtools", input = list(i = b), method = "sort", params = "");
d <- bedr.subtract.region(a,b);
}
if (check.binary("bedtools")) {
index <- get.example.regions();

a <- index[[1]];
b <- index[[2]];
a <- bedr(engine = "bedtools", input = list(i = a), method = "sort", params = "");
b <- bedr(engine = "bedtools", input = list(i = b), method = "sort", params = "");
d <- bedr.subtract.region(a,b);
}

outputs text if verbose flag is set

Description

outputs text if verbose flag is set

Usage

catv(x)
catv(x)

Arguments

`x`	some text string

Value

Prints the text string

Author(s)

Daryl Waggott

Examples


verbose <- TRUE;
catv("Hello Universe!");
verbose <- FALSE;
catv("Goodbye Universe!")

verbose <- TRUE;
catv("Hello Universe!");
verbose <- FALSE;
catv("Goodbye Universe!")

checks if binary is in the path

Description

check if a binary is in the path. Specifically used for bedtools, bedops and tabix.

Usage

check.binary(x = "bedtools", verbose = TRUE)
check.binary(x = "bedtools", verbose = TRUE)

Arguments

`x`	a string referring to a binary/executable i.e. bedtools, bedops, tabix
`verbose`	print log

Details

Used internally to determine functionality selection options.

Value

TRUE or FALSE

Author(s)

Daryl Waggott

Examples

check.binary("bedtools")
check.binary("bedtools")

cluster intervals

Description

cluster intervals

Usage

cluster.region(
	x,
	distance = 0,
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.sort = TRUE,
	verbose = TRUE
	)
cluster.region(
	x,
	distance = 0,
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.sort = TRUE,
	verbose = TRUE
	)

Arguments

`x`	The region
`distance`	maximum distance between regions to be merged. defaults to 0 which means overlapping or bookended features. note that you can use negative distances to enforce a minimum overlap.
`check.zero.based`	should 0 based coordinates be checked
`check.chr`	should chr prefix be checked
`check.valid`	should the region be checkded for integerity
`check.sort`	should regions be checked for sort order
`verbose`	should log messages and checking take place

Details

clusters adjacent features of a specified distance.

Value

A data.frame in bed format

Author(s)

Daryl Waggott

References

http://bedtools.readthedocs.org/en/latest/content/tools/cluster.html

Examples


if (check.binary("bedtools")) {

index <- get.example.regions();

a <- index[[1]];

b <- cluster.region(a, distance = 0);
d <- cluster.region(a, distance = 100);

}
if (check.binary("bedtools")) {

index <- get.example.regions();

a <- index[[1]];

b <- cluster.region(a, distance = 0);
d <- cluster.region(a, distance = 100);

}

convert object to bed format

Description

checks if an object can be converted into a bed style data.frame then does the conversion.

Usage

convert2bed(
	x,
	set.type = TRUE,
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.sort = TRUE,
	check.merge = TRUE,
	verbose = TRUE
	)
convert2bed(
	x,
	set.type = TRUE,
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.sort = TRUE,
	check.merge = TRUE,
	verbose = TRUE
	)

Arguments

`x`	A region index (i.e. chr1:10-100, ...) either as a vector or row.names/first column of a data.frame. Or a data.frame with the first three columns "chr", s"start", "end"
`set.type`	should the attribute input.type be set. Sometimes it is desirable to avoid setting it when applying intermediate conversion
`check.zero.based`	should 0 based coordinates be checked
`check.chr`	should chr prefix be checked
`check.valid`	should the region be checked for integerity
`check.sort`	should the region be checked to see if it is sorted
`check.merge`	should the region be checked for overlapping regions
`verbose`	messages and text

Details

Very useful to convert data before using other bedr functions

Value

Returns x converted to bedformat, as a data frame

Author(s)

Daryl Waggott

Examples

## Not run: 
  a.bed <- convert.to.bed(a) 
  
## End(Not run)
## Not run: 
  a.bed <- convert.to.bed(a) 
  
## End(Not run)

output R objects as tmpfiles

Description

output R objects as tmpfiles

Usage

create.tmp.bed.file(x, name = "bedr", tmpDir = NULL)
create.tmp.bed.file(x, name = "bedr", tmpDir = NULL)

Arguments

`x`	region object
`name`	a prefix for the tmp file.
`tmpDir`	where should the temp files be put

Author(s)

Daryl Waggott

Examples

# create tmp file 
# create tmp file

Determine input format

Description

Determine input format whether its tabular or bed

Usage

determine.input(x, check.chr = FALSE, verbose = TRUE)
determine.input(x, check.chr = FALSE, verbose = TRUE)

Arguments

`x`	input vector, matrix or dataframe
`check.chr`	check whether the coordinates are in chromosomal format with chr prefix
`verbose`	messages and checks

Value

integer value. index format (0), bed (1), index in first column (2), rownmames are index (3), unrecognized(4)

Author(s)

Daryl Waggott

Examples

if (check.binary("bedtools")) {

index <- get.example.regions();
a <- index[[1]];
bedr:::determine.input(a);
}
if (check.binary("bedtools")) {

index <- get.example.regions();
a <- index[[1]];
bedr:::determine.input(a);
}

Data frame to list conversion

Description

Take data frame and return a list of rownames where column value is not 0 i.e. missing

Usage

df2list(x, start.col = 1)
df2list(x, start.col = 1)

Arguments

`x`	data frame
`start.col`	offset from first column to ignore certain columns

Value

returns a list following cleanup and change of data structure

Author(s)

Daryl Waggott

Examples

## Not run: 
  df2list(data.frame(a = 1:10, b = 11:20));
  
## End(Not run)
## Not run: 
  df2list(data.frame(a = 1:10, b = 11:20));
  
## End(Not run)

Download some useful datasets

Description

Download some useful datasets. Some functions such as plotting and fasta querying require additional data.

Usage

download.datasets(datasets = "all", data.dir = paste0(Sys.getenv("HOME"), "/bedr/data"))
download.datasets(datasets = "all", data.dir = paste0(Sys.getenv("HOME"), "/bedr/data"))

Arguments

`datasets`	A list of datasets to download. Defaults to "all" i.e. c("refgene","hg19","b37","hugo", "cosmic","clinvar")
`data.dir`	A directory to put the data. Defaults to ~/bedr/data

Details

External datasets are required for some bedr functionality. For example, plotting intersections based on genes, get.fasta, bed2vcf and validate.gene.names. If these datasets already exist you can simply place symlinks in a directory and use bedr.setup() to define the data.dir.

Value

some datasets.

Author(s)

Daryl Waggott

Examples

## Not run: 
  download.datasets("cosmic");
  
## End(Not run)
## Not run: 
  download.datasets("cosmic");
  
## End(Not run)

Get adjacent flanks from regions

Description

Get adjacent flanks from regions

Usage

flank.region(
	x,
	n.add = NULL,
	start.add = NULL,
	end.add = NULL,
	species = "human",
	build = "hg19",
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.sort = TRUE,
	check.merge = TRUE,
	verbose = TRUE
	)

flank.region(
	x,
	n.add = NULL,
	start.add = NULL,
	end.add = NULL,
	species = "human",
	build = "hg19",
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.sort = TRUE,
	check.merge = TRUE,
	verbose = TRUE
	)

Arguments

`x`	a object region object or index
`n.add`	the number of bases to be selected from each side of a region
`start.add`	the number of based to be selected from the start of a region
`end.add`	the number of based to be selected from the end of a region
`species`	the species i.e. human or mouse
`build`	the genome build i.e. hg19 or mm10
`check.zero.based`	should 0 based coordinates be checked
`check.chr`	should chr prefix be checked
`check.valid`	should the region be checkded for integerity
`check.sort`	should regions be checked for sort order
`check.merge`	should overlapping regions be checked
`verbose`	should log messages and checking take place

Author(s)

Daryl Waggott

References

http://bedtools.readthedocs.org/en/latest/content/tools/flank.html

Examples

if (check.binary("bedtools")) {

index <- get.example.regions();

a <- index[[1]];
a <- bedr(engine = "bedtools", input = list(i = a), method = "sort", params = "");
b <- flank.region(a, n.add = 20);

}
if (check.binary("bedtools")) {

index <- get.example.regions();

a <- index[[1]];
a <- bedr(engine = "bedtools", input = list(i = a), method = "sort", params = "");
b <- flank.region(a, n.add = 20);

}

gets the length of each chromosome for a species/build

Description

Gets the length of each chromosome for a species/build. Choices are human (hg18, hg19, hg38), mouse(mm9, mm10)

Usage

get.chr.length(chr = NULL, species = "human", build = "hg19")
get.chr.length(chr = NULL, species = "human", build = "hg19")

Arguments

`chr`	a vector of chromosomes to query. defaults to all.
`species`	species
`build`	build

Value

A dataframe with chromosome annotations

Author(s)

Daryl Waggott

Examples

size <- get.chr.length();
size <- get.chr.length();

return a set of regions for the examples and unit testing

Description

return a set of regions for the examples and unit testing

Usage

get.example.regions()

get.example.regions()

Value

A list with three example regions

Author(s)

Daryl Waggott

Examples

index <- get.example.regions()
index <- get.example.regions()

Query fasta sequence

Description

Query fasta sequence using bedtools get.fasta

Usage

get.fasta(
	x,
	fasta = NULL,
	bed12 = FALSE,
	strand = FALSE,
	output.fasta = FALSE,
	use.name.field = FALSE,
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.sort = TRUE,
	check.merge = TRUE,
	verbose = TRUE
	)
get.fasta(
	x,
	fasta = NULL,
	bed12 = FALSE,
	strand = FALSE,
	output.fasta = FALSE,
	use.name.field = FALSE,
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.sort = TRUE,
	check.merge = TRUE,
	verbose = TRUE
	)

Arguments

`x`	region or index
`fasta`	a fasta file defaults to mini example hg19 human
`bed12`	should bed12 format be used
`strand`	strand specific i.e. reverse complement negative
`output.fasta`	output a fasta defaults to a data.frame for easier parsing
`use.name.field`	should the name field be used for
`check.zero.based`	check for zero based region
`check.chr`	check for "chr" prefix
`check.valid`	check for valid regions i.e. start < end
`check.sort`	check if region is sorted
`check.merge`	check if region is merged
`verbose`	print progress

Details

Uses bedtools getFasta to query a fasta file and load into R as a data.frame for easy parsing.

Note that the hg19 reference genome fasta is large and requires on the order of 4 GB RAM to avoid a segfault happens.

Value

A data.frame or fasta. The data.frame has is two columns corresponding to the region and the sequence.

Author(s)

Daryl Waggott, Syed Haider

References

http://bedtools.readthedocs.org/en/latest/content/tools/getfasta.html

Examples



if (check.binary("bedtools")) {

## Not run: 

  # get the sequence for a set of regions as a data.frame
  index <- get.example.regions();
  a <- index[[1]];
  b <- get.fasta(bedr.sort.region(a));

  # this time output a fasta
  d <- get.fasta(b, output.fasta = TRUE);
  writeLines(d[[1]], con = "test.fasta");

  # get the region adjacent to a set of mutations in a vcf
  clinvar.vcf.example <- system.file(
    "extdata/clinvar_dbSNP138_example.vcf.gz", 
    package = "bedr"
    );
  clinvar <- read.vcf(clinvar.vcf.example);

  # note that clinvar uses ncbi fasta which does not use "chr" and codes chrM as MT
  clinvar.bed <- data.frame(
    chr = paste0("chr", gsub("MT", "M", clinvar$vcf$CHROM)),
    start = clinvar$vcf$POS - 2,
    end = clinvar$vcf$POS + 1,
    stringsAsFactors = FALSE
    );

  # get trinucleotide sequences of variants on chr M only
  mutation.triplet <- get.fasta(
  	clinvar.bed[which(clinvar.bed$chr == "chrM"), ], 
  	fasta = system.file("extdata/ucsc.hg19.chrM.fasta", package = "bedr"),
  	check.chr = FALSE
  	);
  
## End(Not run)
}
if (check.binary("bedtools")) {

## Not run: 

  # get the sequence for a set of regions as a data.frame
  index <- get.example.regions();
  a <- index[[1]];
  b <- get.fasta(bedr.sort.region(a));

  # this time output a fasta
  d <- get.fasta(b, output.fasta = TRUE);
  writeLines(d[[1]], con = "test.fasta");

  # get the region adjacent to a set of mutations in a vcf
  clinvar.vcf.example <- system.file(
    "extdata/clinvar_dbSNP138_example.vcf.gz", 
    package = "bedr"
    );
  clinvar <- read.vcf(clinvar.vcf.example);

  # note that clinvar uses ncbi fasta which does not use "chr" and codes chrM as MT
  clinvar.bed <- data.frame(
    chr = paste0("chr", gsub("MT", "M", clinvar$vcf$CHROM)),
    start = clinvar$vcf$POS - 2,
    end = clinvar$vcf$POS + 1,
    stringsAsFactors = FALSE
    );

  # get trinucleotide sequences of variants on chr M only
  mutation.triplet <- get.fasta(
  	clinvar.bed[which(clinvar.bed$chr == "chrM"), ], 
  	fasta = system.file("extdata/ucsc.hg19.chrM.fasta", package = "bedr"),
  	check.chr = FALSE
  	);
  
## End(Not run)
}

generates a set of random regions

Description

generates a set of random regions for a specific species/build. Choices are human (hg18, hg19), mouse(mm9, mm10). regions are sampled from a log-normal distribution.

Usage

get.random.regions(
	n = 10,
	chr = NULL,
	species = "human",
	build = "hg19",
	size.mean = 10,
	size.sd = 0.25,
	mask.gaps = FALSE,
	mask.repeats = FALSE,
	sort.output = TRUE,
	verbose = TRUE
	)
get.random.regions(
	n = 10,
	chr = NULL,
	species = "human",
	build = "hg19",
	size.mean = 10,
	size.sd = 0.25,
	mask.gaps = FALSE,
	mask.repeats = FALSE,
	sort.output = TRUE,
	verbose = TRUE
	)

Arguments

`n`	number of regions
`chr`	the chr or region
`species`	species
`build`	build
`size.mean`	region mean in log space
`size.sd`	region sd in log space
`mask.gaps`	should the gaps (Ns) in the human reference be ignored as potential start points. This drammatically increases memory and run time but is more appropriate in almost all settings. By default it's off.
`mask.repeats`	should the repeats from repeatMasker be ignored as potential start points. This drammatically increases memory and run time but is more appropriate in almost all settings. By default it's off.
`sort.output`	return a sorted index
`verbose`	words

Author(s)

Daryl Waggott

Examples

test.regions <- get.random.regions(100)
test.regions <- get.random.regions(100)

Get adjacent flanks from regions

Description

Get adjacent flanks from regions

Usage

grow.region(
	x,
	n.add = NULL,
	start.add = NULL,
	end.add = NULL,
	species = "human",
	build = "hg19",
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.sort = TRUE,
	check.merge = TRUE,
	verbose = TRUE
	)

grow.region(
	x,
	n.add = NULL,
	start.add = NULL,
	end.add = NULL,
	species = "human",
	build = "hg19",
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.sort = TRUE,
	check.merge = TRUE,
	verbose = TRUE
	)

Arguments

`x`	a object region object or index
`n.add`	the number of bases to be selected from each side of a region
`start.add`	the number of based to be selected from the start of a region
`end.add`	the number of based to be selected from the end of a region
`species`	the species i.e. human or mouse
`build`	the genome build i.e. hg19 or mm10
`check.zero.based`	should 0 based coordinates be checked
`check.chr`	should chr prefix be checked
`check.valid`	should the region be checkded for integerity
`check.sort`	should regions be checked for sort order
`check.merge`	should overlapping regions be checked
`verbose`	should log messages and checking take place

Author(s)

Daryl Waggott

References

http://bedtools.readthedocs.org/en/latest/content/tools/slop.html

Examples

if (check.binary("bedtools")) {

index <- get.example.regions();

a <- index[[1]];
a <- bedr(engine = "bedtools", input = list(i = a), method = "sort", params = "");
b <- grow.region(a, n.add = 20);

}
if (check.binary("bedtools")) {

index <- get.example.regions();

a <- index[[1]];
a <- bedr(engine = "bedtools", input = list(i = a), method = "sort", params = "");
b <- grow.region(a, n.add = 20);

}

checks if regions in object a are found in object b

Description

checks if regions in object a are found in object b

Usage

in.region(
	x,
	y,
	proportion.overlap = 1e-09,
	reciprocal.overlap = FALSE,
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.sort = TRUE,
	check.merge = TRUE,
	verbose = FALSE
	)
in.region(
	x,
	y,
	proportion.overlap = 1e-09,
	reciprocal.overlap = FALSE,
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.sort = TRUE,
	check.merge = TRUE,
	verbose = FALSE
	)

Arguments

`x`	first region index in the form chr:start-stop. regions in this index will be checked for intersection in the values of the second index.
`y`	second region index.
`proportion.overlap`	Defaults 1e-9 which is 1 bp. See details below for the different interpretation between 0 and 1 based overlap
`reciprocal.overlap`	Should the proportion.overlap be reciprocal
`check.zero.based`	should 0 based coordinates be checked
`check.chr`	should chr prefix be checked
`check.valid`	check if region is valid
`check.sort`	check if region is sorted
`check.merge`	check if overlapping regions are merged
`verbose`	prints some debugging information. currently it just checks if the input regions are overlapping

Details

The function can also be called using syntax similar to the %in% operator, for example "region1 %in.region% region2"

The default is to report TRUE if there is 1bp overlap in zero based bed format. That means that region chr1:10-20 and chr1:20-30 would not overlap. To switch to one based intuitive interpretation set proportion.overlap = 0.

Value

Returns a logical vector the length of x.

Author(s)

Daryl Waggott

References

http://bedtools.readthedocs.org/en/latest/content/tools/intersect.html

Examples

if (check.binary("bedtools")) {

index <- get.example.regions();

a <- index[[1]];
b <- index[[2]];
a <- bedr(engine = "bedtools", input = list(i = a), method = "sort", params = "");
b <- bedr(engine = "bedtools", input = list(i = b), method = "sort", params = "");

d <- in.region(a,b);

# alternative calling
d <- a %in.region% b

}
if (check.binary("bedtools")) {

index <- get.example.regions();

a <- index[[1]];
b <- index[[2]];
a <- bedr(engine = "bedtools", input = list(i = a), method = "sort", params = "");
b <- bedr(engine = "bedtools", input = list(i = b), method = "sort", params = "");

d <- in.region(a,b);

# alternative calling
d <- a %in.region% b

}

convert a region index into a bed file dataframe

Description

convert a region index into a bed file dataframe

Usage

index2bed(x, set.type = TRUE)
index2bed(x, set.type = TRUE)

Arguments

`x`	an index
`set.type`	should the attribute input.type be set. Sometimes it is desirable to avoid setting it when applying intermediate conversion

Author(s)

Daryl Waggott

Examples

if (check.binary("bedtools")) {

index <- get.example.regions();
a <- index[[1]];
a.bed <- index2bed(a);
}
if (check.binary("bedtools")) {

index <- get.example.regions();
a <- index[[1]];
a.bed <- index2bed(a);
}

checks if region file is merged

Description

checks if region file is merged

Usage

is.merged.region(
	x,
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.sort = TRUE,
	verbose = FALSE
	)
is.merged.region(
	x,
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.sort = TRUE,
	verbose = FALSE
	)

Arguments

`x`	region or index
`check.zero.based`	should 0 based coordinates be checked
`check.chr`	should chr prefix be checked
`check.valid`	check if region is valid
`check.sort`	check if region is sorted
`verbose`	more words

Author(s)

Daryl Waggott

Examples

if (check.binary("bedtools")) {

index <- get.example.regions();

a <- index[[1]];

b <- is.merged.region(a);

}
if (check.binary("bedtools")) {

index <- get.example.regions();

a <- index[[1]];

b <- is.merged.region(a);

}

checks if region file is sorted

Description

checks if region file is sorted

Usage

is.sorted.region(
	x,
	method = "lex",
	engine = "unix",
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.merge = TRUE,
	verbose = FALSE
	)
is.sorted.region(
	x,
	method = "lex",
	engine = "unix",
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.merge = TRUE,
	verbose = FALSE
	)

Arguments

`x`	The region index, bed file, or bed formatted object
`method`	lexicgraphical or natural, lex is required for many operations but natural is better for interpretation
`engine`	what analytical engine to use for sorting i.e. bedtools, bedops, gnu unix
`check.zero.based`	should 0 based coordinates be checked
`check.chr`	should chr prefix be checked
`check.valid`	check if region is valid
`check.merge`	check if region is merged
`verbose`	more words

Author(s)

Daryl Waggott

Examples

if (check.binary("bedtools")) {
index <- get.example.regions();

a <- index[[1]];

b <- is.sorted.region(a);
}
if (check.binary("bedtools")) {
index <- get.example.regions();

a <- index[[1]];

b <- is.sorted.region(a);
}

verifies the reference sequence in a vcf

Description

verifies the reference sequence in a vcf

Usage

is.valid.ref(
	x,
	fasta = NULL,
	strand = FALSE,
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.sort = TRUE,
	check.merge = TRUE,
	verbose = TRUE
	)
is.valid.ref(
	x,
	fasta = NULL,
	strand = FALSE,
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.sort = TRUE,
	check.merge = TRUE,
	verbose = TRUE
	)

Arguments

`x`	input bed object
`fasta`	a reference build in fasta format
`strand`	should strand be used. if reverse then the sequence will be reverse complemented
`check.zero.based`	should 0 based coordinates be checked
`check.chr`	should chr prefix be checked
`check.valid`	should the region be checkded for integerity
`check.sort`	should regions be checked for sort order
`check.merge`	should overlapping regions be checked
`verbose`	should log messages and checking take place

Value

a logical vector the length of the input

Author(s)

Daryl Waggott

Examples


vcf.path <- system.file("extdata/callerA.vcf.gz", package = "bedr");
vcf.data <- read.vcf(vcf.path, split.info = TRUE);
vcf.data$vcf <- vcf.data$vcf[, 
	c("CHROM", "POS", "END", setdiff(colnames(vcf.data$vcf), c("CHROM", "POS", "END")))
	];
vcf.data$vcf$CHROM <- paste("chr", vcf.data$vcf$CHROM, sep = "");

## Not run: 
	# need reference sequence FASTA and index file to run this, as 'fasta' parameter
	is.valid.ref(vcf.data);
	
## End(Not run)
vcf.path <- system.file("extdata/callerA.vcf.gz", package = "bedr");
vcf.data <- read.vcf(vcf.path, split.info = TRUE);
vcf.data$vcf <- vcf.data$vcf[, 
	c("CHROM", "POS", "END", setdiff(colnames(vcf.data$vcf), c("CHROM", "POS", "END")))
	];
vcf.data$vcf$CHROM <- paste("chr", vcf.data$vcf$CHROM, sep = "");

## Not run: 
	# need reference sequence FASTA and index file to run this, as 'fasta' parameter
	is.valid.ref(vcf.data);
	
## End(Not run)

checks if region/index is valid

Description

checks if region/index is valid

Usage

is.valid.region(
	x,
	check.zero.based = TRUE,
	check.chr = TRUE,
	throw.error = FALSE,
	verbose = TRUE
	)
is.valid.region(
	x,
	check.zero.based = TRUE,
	check.chr = TRUE,
	throw.error = FALSE,
	verbose = TRUE
	)

Arguments

`x`	The region index, bed file, or bed formatted object
`check.zero.based`	should basic test for zero based coordinates be checked
`check.chr`	should the algorithm check for the "chr" prefix
`throw.error`	should an error be thrown. The default is to report a logical vector of inconsistencies.
`verbose`	should diagnostic messages be printed

Author(s)

Daryl Waggott

Examples

index <- get.example.regions();

a <- index[[1]];
is.valid <- is.valid.region(a);
a.valid <- a[is.valid];

index <- get.example.regions();

a <- index[[1]];
is.valid <- is.valid.region(a);
a.valid <- a[is.valid];

verifies that sequences are correct given coordinates and a reference

Description

verifies that sequences are correct given coordinates and a reference

Usage

is.valid.seq(
	x,
	querySeq,
	fasta = NULL,
	strand = FALSE,
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.sort = TRUE,
	check.merge = TRUE,
	verbose = TRUE
	)
is.valid.seq(
	x,
	querySeq,
	fasta = NULL,
	strand = FALSE,
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.sort = TRUE,
	check.merge = TRUE,
	verbose = TRUE
	)

Arguments

`x`	input bed object
`querySeq`	a vector of sequences the same length as x
`fasta`	a reference build in fasta format
`strand`	should strand be used. if reverse then the sequence will be reverse complemented
`check.zero.based`	should 0 based coordinates be checked
`check.chr`	should chr prefix be checked
`check.valid`	should the region be checkded for integerity
`check.sort`	should regions be checked for sort order
`check.merge`	should overlapping regions be checked
`verbose`	should log messages and checking take place

Value

a logical vector the length of the input querySeq

Author(s)

Daryl Waggott, Syed Haider

Examples

if (check.binary("bedtools")) {
  index <- get.example.regions();
  a <- index[[1]];
  a <- get.fasta(bedr.sort.region(a));
  is.valid.seq(x = a, querySeq = a$sequence);
}
if (check.binary("bedtools")) {
  index <- get.example.regions();
  a <- index[[1]];
  a <- get.fasta(bedr.sort.region(a));
  is.valid.seq(x = a, querySeq = a$sequence);
}

calculate the jaccard distance between sets of intervals

Description

calculate the jaccard distance between sets of intervals

Usage

jaccard(
	x,
	y,
	proportion.overlap = 1e-09,
	reciprocal.overlap = FALSE,
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.sort = TRUE,
	check.merge = TRUE,
	verbose = TRUE
	)
jaccard(
	x,
	y,
	proportion.overlap = 1e-09,
	reciprocal.overlap = FALSE,
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.sort = TRUE,
	check.merge = TRUE,
	verbose = TRUE
	)

Arguments

`x`	first region to be compared
`y`	second region to be compared
`proportion.overlap`	Defaults 1e-9 which is 1 bp. See details below for the different interpretation between 0 and 1 based overlap
`reciprocal.overlap`	Should the proportion.overlap be reciprocal
`check.zero.based`	should 0 based coordinates be checked
`check.chr`	should chr prefix be checked
`check.valid`	should the region be checkded for integerity
`check.sort`	should regions be checked for sort order
`check.merge`	should overlapping regions be checked
`verbose`	should log messages and checking take place

Details

The Jaccard metric is the ratio of intersections to unions. The process can be tweaked by changing the proportion of overlap and even growiwng the regions.

Value

A short vector.

Author(s)

Daryl Waggott

References

http://bedtools.readthedocs.org/en/latest/content/tools/jaccard.html

Examples

if (check.binary("bedtools")) {

index <- get.example.regions();

a <- index[[1]];
b <- index[[2]];
a <- bedr(engine = "bedtools", input = list(i = a), method = "sort", params = "");
b <- bedr(engine = "bedtools", input = list(i = b), method = "sort", params = "");
jaccard(a,b);

}
if (check.binary("bedtools")) {

index <- get.example.regions();

a <- index[[1]];
b <- index[[2]];
a <- bedr(engine = "bedtools", input = list(i = a), method = "sort", params = "");
b <- bedr(engine = "bedtools", input = list(i = b), method = "sort", params = "");
jaccard(a,b);

}

Interface to R's modifyList

Description

Interface to R's modifyList

Usage

modifyList2(x, val)
modifyList2(x, val)

Arguments

`x`	a named list
`val`	a named list with components to be updated using x

Value

modified version of x

Author(s)

Daryl Waggott

Gets the sort order of a region index similar to the order command

Description

Helps if you don't want to use sort region on a huge dataset

Usage

order.region(
	x,
	method = "lex",
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.merge = TRUE
	)
order.region(
	x,
	method = "lex",
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.merge = TRUE
	)

Arguments

`x`	index or bed style data.frame
`method`	natural or lexicographical (lex)
`check.zero.based`	should 0 based coordinates be checked
`check.chr`	should chr prefix be checked
`check.valid`	check if region is valid
`check.merge`	check if region is sorted and merged

Author(s)

Daryl Waggott

References

http://bedtools.readthedocs.org/en/latest/content/tools/intersect.html

Examples

if (check.binary("bedtools")) {

index <- get.example.regions();

a <- index[[1]];
a <- bedr(engine = "bedtools", input = list(i = a), method = "sort", params = "");
a.order <- order.region(a)

b <- a[a.order];

}
if (check.binary("bedtools")) {

index <- get.example.regions();

a <- index[[1]];
a <- bedr(engine = "bedtools", input = list(i = a), method = "sort", params = "");
a.order <- order.region(a)

b <- a[a.order];

}

permute a set of regions

Description

permute a set of regions

Usage

permute.region(
	x,
	stratify.by.chr = FALSE,
	species = "human",
	build = "hg19",
	chr.names = paste0("chr",c(1:22,"X","Y","M")),
	mask.gaps = FALSE,
	gaps.file = NULL,
	mask.repeats = FALSE,
	repeats.file = NULL,
	sort.output = TRUE,
	is.checked = FALSE,
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	verbose = TRUE
	)
permute.region(
	x,
	stratify.by.chr = FALSE,
	species = "human",
	build = "hg19",
	chr.names = paste0("chr",c(1:22,"X","Y","M")),
	mask.gaps = FALSE,
	gaps.file = NULL,
	mask.repeats = FALSE,
	repeats.file = NULL,
	sort.output = TRUE,
	is.checked = FALSE,
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	verbose = TRUE
	)

Arguments

`x`	regions to permute
`stratify.by.chr`	Should the permutation be happen separetely for each chromosome. That is are chromosomes exchangeable.
`species`	species
`build`	the build of the reference
`chr.names`	names of the chromosomes to use
`mask.gaps`	should the gaps (Ns) in the human reference be ignored as potential start points. This drammatically increases memory and run time but is more appropriate in almost all settings. It defaults to off
`gaps.file`	database file of gaps. Defaults to Homo sapiens Hg19 gap.txt.gz file available through UCSC
`mask.repeats`	should the repeats from repeatMasker be ignored as potential start points. This drammatically increases memory and run time but is more appropriate in almost all settings. By default it's off
`repeats.file`	database file of repeats as supplied by UCSC containing RepMasker data e.g rmsk.txt.gz
`sort.output`	should the output be sorted
`is.checked`	Has the input data already be tested for validity. This is often done once before multiple permutations.
`check.zero.based`	should 0 based coordinates be checked
`check.chr`	should chr prefix be checked
`check.valid`	should the region be checkded for integerity
`verbose`	should log messages and checking take place

Details

1. Sampling with replacement on region length. 2. Sampling with replacement on start positions. Positions that contain Ns in the reference are set to 0 weight during sampling.

Regions that overlap the end of a chromosome or gap are trimmed.

Steps 1 and 2 are done within chromosomes if stratify.by.chr is set to true.

Value

A region object with randomized start positions.

Author(s)

Daryl Waggott

Examples

if (check.binary("bedtools")) {

index <- get.example.regions();
a <- index[[1]];
a <- bedr(engine = "bedtools", input = list(i = a), method = "sort", params = "");
a.perm <- permute.region(a);

}
if (check.binary("bedtools")) {

index <- get.example.regions();
a <- index[[1]];
a <- bedr(engine = "bedtools", input = list(i = a), method = "sort", params = "");
a.perm <- permute.region(a);

}

process.input

Description

process.input

Usage

process.input(
	input, 
	tmpDir = NULL,
	include.names = TRUE, 
	check.zero.based = TRUE, 
	check.chr = TRUE, 
	check.valid = TRUE, 
	check.sort = TRUE, 
	check.merge = TRUE, 
	verbose = TRUE
	)
process.input(
	input, 
	tmpDir = NULL,
	include.names = TRUE, 
	check.zero.based = TRUE, 
	check.chr = TRUE, 
	check.valid = TRUE, 
	check.sort = TRUE, 
	check.merge = TRUE, 
	verbose = TRUE
	)

Arguments

`input`	regions input or a file in one of the standard formats. these are bed, vcf, gff, bam, sam, csv, tsv, txt
`tmpDir`	The directory to be used for writing files
`include.names`	should the names of the input files be included in the output
`check.zero.based`	should 0 based coordinates be checked
`check.chr`	should chr prefix be checked
`check.valid`	should the region be checked for integerity
`check.sort`	should the region sorting be checked
`check.merge`	should overlapping regions be checked
`verbose`	messages and checks

Value

list of input files

Author(s)

Daryl Waggott

Examples

if (check.binary("bedtools")) {

index <- get.example.regions();
a <- index[[1]];
a <- bedr(engine = "bedtools", input = list(i = a), method = "sort", params = "");
a.processed <- process.input(a, verbose = FALSE)

}
if (check.binary("bedtools")) {

index <- get.example.regions();
a <- index[[1]];
a <- bedr(engine = "bedtools", input = list(i = a), method = "sort", params = "");
a.processed <- process.input(a, verbose = FALSE)

}

read a ucsc table into R

Description

read a ucsc table into R

Usage

query.ucsc(
	x,
	mirror = "http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database",
	download = TRUE,
	overwrite.local = FALSE,
	columns.keep = NULL,
	verbose = TRUE
	)
query.ucsc(
	x,
	mirror = "http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database",
	download = TRUE,
	overwrite.local = FALSE,
	columns.keep = NULL,
	verbose = TRUE
	)

Arguments

`x`	a ucsc data table. Include the full path including "txt.gz" extenstion to load from a local file. Note that $HOME/bedr/data will be checked first before downloading.
`mirror`	the ucsc mirror
`download`	should the data be downloaded to $HOME/bedr/data/
`overwrite.local`	should the local version be overwritten if it exists
`columns.keep`	what columns to load. this can help with very large tables where you only want 'chr,start,end'. defaults to all. you may have to check the sql for the actual column names.
`verbose`	more words

Details

tables can be found at http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database

Value

A data.frame

Author(s)

Daryl Wagott

Examples

## Not run: 
  query.ucsc("refGene");
  
## End(Not run)
## Not run: 
  query.ucsc("refGene");
  
## End(Not run)

Read a vcf into R

Description

Read a vcf into R and parse it for downstream analysis

Usage

read.vcf(x, split.info = FALSE, split.samples = FALSE, nrows = -1, verbose = TRUE)
read.vcf(x, split.info = FALSE, split.samples = FALSE, nrows = -1, verbose = TRUE)

Arguments

`x`	A vcf
`split.info`	Split the info into columns
`split.samples`	Split the sample into columns. If multiple samples then a list matrices will be created, one matrix for each element in the FORMAT tag.
`nrows`	The the number of rows to be read. Set to 0 to parse the header.
`verbose`	print progress

Details

The function can be slow for splitting the INFO, FORMAT for large VCFs.

Value

VCF representation in R as a list. The first element in the list is the header, the second the body of the VCF. Every repeating tag in the header i.e. INFO, FORMAT is structured as matrix. If reading a multi-sample VCF and the split.sample = TRUE, then a matrix is added to the list for every tag in the FORMAT string.

Note that by default the vcf is returned as a data.table not a data.frame. Therefore there are some quirks i.e. subsetting via named columns a$vcf[,"CHROM", with = FALSE]. If in doubt just caset to data.frame.

Author(s)

Daryl Waggott

Examples


clinVar.vcf.example      <- system.file("extdata/clinvar_dbSNP138_example.vcf.gz", package = "bedr")
singleSample.vcf.example <- system.file("extdata/singleSampleOICR_example.vcf.gz", package = "bedr")
multiSample.vcf.example  <- system.file("extdata/multiSampleOICR_example.vcf.gz", package = "bedr")

# read a subset of NCBI clinVar vcf.  Note this has no samples.
vcf1.a <- read.vcf(clinVar.vcf.example)
vcf1.b <- read.vcf(clinVar.vcf.example, split.info = TRUE)

## Not run: 

# same as above but split multiple samples
vcf1.c <- read.vcf(clinVar.vcf.example, split.info = TRUE, split.sample = TRUE) 

# read a single-sample VCF
system.time(
  vcf2.a <- read.vcf(singleSample.vcf.example, split.info = TRUE, split.sample = TRUE)
  )

# read a multi-sample VCF
vcf3.a <- read.vcf(multiSample.vcf.example, split.info = FALSE, split.sample = TRUE);

# multi core example
options("cores"=9);
vcf2.a <- read.vcf(singleSample.vcf.example, split.info = TRUE, split.sample = TRUE)
options("cores"=1);


## End(Not run)
clinVar.vcf.example      <- system.file("extdata/clinvar_dbSNP138_example.vcf.gz", package = "bedr")
singleSample.vcf.example <- system.file("extdata/singleSampleOICR_example.vcf.gz", package = "bedr")
multiSample.vcf.example  <- system.file("extdata/multiSampleOICR_example.vcf.gz", package = "bedr")

# read a subset of NCBI clinVar vcf.  Note this has no samples.
vcf1.a <- read.vcf(clinVar.vcf.example)
vcf1.b <- read.vcf(clinVar.vcf.example, split.info = TRUE)

## Not run: 

# same as above but split multiple samples
vcf1.c <- read.vcf(clinVar.vcf.example, split.info = TRUE, split.sample = TRUE) 

# read a single-sample VCF
system.time(
  vcf2.a <- read.vcf(singleSample.vcf.example, split.info = TRUE, split.sample = TRUE)
  )

# read a multi-sample VCF
vcf3.a <- read.vcf(multiSample.vcf.example, split.info = FALSE, split.sample = TRUE);

# multi core example
options("cores"=9);
vcf2.a <- read.vcf(singleSample.vcf.example, split.info = TRUE, split.sample = TRUE)
options("cores"=1);


## End(Not run)

Calculate the relative distance between two sets of intervals

Description

Calculate the relative distance between two sets of intervals

Usage

reldist(
	x,
	y,
	detail = FALSE,
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.sort =TRUE,
	check.merge = TRUE,
	verbose = TRUE
	)
reldist(
	x,
	y,
	detail = FALSE,
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.sort =TRUE,
	check.merge = TRUE,
	verbose = TRUE
	)

Arguments

`x`	firt region to be compared
`y`	second region to be compared
`detail`	should the relative distance be printed for every region
`check.zero.based`	should 0 based coordinates be checked
`check.chr`	should chr prefix be checked
`check.valid`	should the region be checkded for integerity
`check.sort`	should regions be checked for sort order
`check.merge`	should overlapping regions be checked
`verbose`	should log messages and checking take place

Details

The frequency of relative distances in bins spanning 0 to 0.5

Author(s)

Daryl Waggott

References

http://bedtools.readthedocs.org/en/latest/content/tools/reldist.html

Examples

if (check.binary("bedtools")) {

index <- get.example.regions();

a <- index[[1]];
b <- index[[2]];
a <- bedr(engine = "bedtools", input = list(i = a), method = "sort", params = "");
b <- bedr(engine = "bedtools", input = list(i = b), method = "sort", params = "");
reldist(a,b);

}
if (check.binary("bedtools")) {

index <- get.example.regions();

a <- index[[1]];
b <- index[[2]];
a <- bedr(engine = "bedtools", input = list(i = a), method = "sort", params = "");
b <- bedr(engine = "bedtools", input = list(i = b), method = "sort", params = "");
reldist(a,b);

}

Get region size

Description

Get region size

Usage

size.region(
	x, 
	zero.based = TRUE, 
	check.zero.based = TRUE, 
	check.chr = TRUE, 
	check.valid = TRUE, 
	verbose = TRUE
	)
size.region(
	x, 
	zero.based = TRUE, 
	check.zero.based = TRUE, 
	check.chr = TRUE, 
	check.valid = TRUE, 
	verbose = TRUE
	)

Arguments

`x`	region in vector, matrix or dataframe format
`zero.based`	whether the coordinates are zero-based or 1
`check.zero.based`	should 0 based coordinates be checked
`check.chr`	should chr prefix be checked
`check.valid`	should the region be checked for integerity
`verbose`	messages and checks

Value

size/length of the region

Author(s)

Daryl Waggott

Examples

if (check.binary("bedtools")) {

index <- get.example.regions();

a <- index[[1]];

a.sizes <- bedr:::size.region(a);
}
if (check.binary("bedtools")) {

index <- get.example.regions();

a <- index[[1]];

a.sizes <- bedr:::size.region(a);
}

split a vector of strings into tabular data

Description

split a vector of strings into tabular data

Usage

strsplit2matrix(x, split, fixed = FALSE, perl = FALSE)
strsplit2matrix(x, split, fixed = FALSE, perl = FALSE)

Arguments

`x`	a character vector
`split`	the character or regex to split on
`fixed`	fixed i.e. no regex
`perl`	per style

Author(s)

Daryl Waggott

Examples

## Not run: 
  a.bed <- strSplitToMatrix(x);
  
## End(Not run)
## Not run: 
  a.bed <- strSplitToMatrix(x);
  
## End(Not run)

Main bedtools wrapper function.

Description

Main bedtools wrapper function.

Usage

tabix(
	region,
	file.name,
	params = NULL,
	tmpDir = NULL,
	deleteTmpDir = TRUE,
	outputDir = NULL,
	outputFile = NULL,
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.sort = TRUE,
	check.merge = TRUE,
	verbose = TRUE
	)

tabix(
	region,
	file.name,
	params = NULL,
	tmpDir = NULL,
	deleteTmpDir = TRUE,
	outputDir = NULL,
	outputFile = NULL,
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.sort = TRUE,
	check.merge = TRUE,
	verbose = TRUE
	)

Arguments

`region`	The regions to query the tabix'd file
`file.name`	The name of the bzipped/indexed tabix file to query
`params`	A string that includes all the extra parameters and arguments to the bedtools commmand. For example if you wanted to do a left outer join you would specificy method as intersect and use params = c("-loj -header"). If you leave input and method as defaults then this is this string represents the full command.
`tmpDir`	The directory to be used for writing files
`deleteTmpDir`	Should tmp files be deleted. helpful for diagnostics.
`outputDir`	The output directory. Only used if outputFile is specified. It defaults to the current working directory.
`outputFile`	The name of the output file. If this is specified the output will be sent to a file not an R object
`check.chr`	check for chr prefix
`check.zero.based`	check for zero based coordinates
`check.valid`	do all region integrity checks
`check.sort`	check if region is sorted
`check.merge`	check if region is merged
`verbose`	Should messages be printed to screen.

Value

The output of command with some parsing to keep it consistent with the input.

Author(s)

Daryl Waggott

Examples

if (check.binary("tabix")) {
query.regions <- c("1:1000-100000", "1:1000000-1100000")
cosmic.vcf.example <- system.file(
	"extdata/CosmicCodingMuts_v66_20130725_ex.vcf.gz",
	package = "bedr"
	)
cosmic.query <- tabix(query.regions, cosmic.vcf.example, check.chr = FALSE)
}
if (check.binary("tabix")) {
query.regions <- c("1:1000-100000", "1:1000000-1100000")
cosmic.vcf.example <- system.file(
	"extdata/CosmicCodingMuts_v66_20130725_ex.vcf.gz",
	package = "bedr"
	)
cosmic.query <- tabix(query.regions, cosmic.vcf.example, check.chr = FALSE)
}

Plot venn diagram

Description

Plot venn diagram of regions intersect

Usage

table2venn(x, var.names)
table2venn(x, var.names)

Arguments

`x`	intersect table of regions
`var.names`	names of the overlapping regions

Value

venn diagram input list

Author(s)

Daryl Waggott

Compare sets of regions via jaccard and relative distance using permutation

Description

Compare sets of regions via jaccard and relative distance using permutation to get an empirical p-value.

Usage

test.region.similarity(
	x,
	y,
	n = 1000,
	stratify.by.chr = FALSE,
	species = "human",
	build = "hg19",
	mask.gaps = FALSE,
	mask.repeats = FALSE,
	gaps.file = NULL,
	repeats.file = NULL,
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	verbose = TRUE
	)
test.region.similarity(
	x,
	y,
	n = 1000,
	stratify.by.chr = FALSE,
	species = "human",
	build = "hg19",
	mask.gaps = FALSE,
	mask.repeats = FALSE,
	gaps.file = NULL,
	repeats.file = NULL,
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	verbose = TRUE
	)

Arguments

`x`	first region to be compared. this is the region that is permuted
`y`	second region to be compared
`n`	the number of iterations to permute
`stratify.by.chr`	Should the permutation be happen separetely for each chromosome. That is are chromosomes exchangeable.
`species`	species
`build`	the build of the reference
`mask.gaps`	should the gaps (Ns) in the human reference be ignored as potential start points. This drammatically increases memory and run time but is more appropriate in almost all settings. By default it's off.
`mask.repeats`	should the repeats from repeatMasker be ignored as potential start points. This drammatically increases memory and run time but is more appropriate in almost all settings. By default it's off.
`gaps.file`	database file of gaps. Defaults to Homo sapiens Hg19 gap.txt.gz file available through UCSC
`repeats.file`	database file of repeats as supplied by UCSC containing RepMasker data e.g rmsk.txt.gz
`check.zero.based`	should 0 based coordinates be checked
`check.chr`	should chr prefix be checked
`check.valid`	should the region be checkded for integerity
`verbose`	should log messages and checking take place

Details

Iteratively permutes intervals and recalculates jaccard and reldist statistics.

Value

A list of results

Author(s)

Daryl Waggott

Examples


if (check.binary("bedtools")) {

index <- get.example.regions();

a <- index[[1]];
b <- index[[2]];
a <- bedr(engine = "bedtools", input = list(i = a), method = "sort", params = "");
b <- bedr(engine = "bedtools", input = list(i = b), method = "sort", params = "");

# a simple example
test.region.similarity(a, b, n = 8)

# note you can set the cores available to parallelize
options(cores = 4);
system.time(test.region.similarity(a, b, n = 8));

# a real example comparing the distribution of mRNA vs ncRNA genes in RefSeq
## Not run: 

# more core
options(cores = 8);

# load refgene
refgene <- query.ucsc("refGene")
refgene <- refgene[,c("chrom","txStart","txEnd","name","name2","strand")]

# only include canonical chr
chr <- paste0("chr", c(1:22,"X","Y")); 
refgene <- refgene[refgene$chrom 

# remove genes with multiple positions
duplicated.gene <- duplicated(refgene$name2) | duplicated(rev(refgene$name2));
refgene <- refgene[!duplicated.gene,];

# only select pr coding genes
refgene.nm <- refgene[grepl("^NM",refgene$name),];
# only select non protein coding rna genes	
refgene.nr <- refgene[grepl("^NR",refgene$name),];

# sort and merge
refgene.nm <- bedr.snm.region(refgene.nm,check.chr = FALSE);
refgene.nr <- bedr.snm.region(refgene.nr,check.chr = FALSE);

test.region.similarity(refgene.nm, refgene.nr, mask.unmapped = TRUE );

option(core = 1)

## End(Not run)

}

if (check.binary("bedtools")) {

index <- get.example.regions();

a <- index[[1]];
b <- index[[2]];
a <- bedr(engine = "bedtools", input = list(i = a), method = "sort", params = "");
b <- bedr(engine = "bedtools", input = list(i = b), method = "sort", params = "");

# a simple example
test.region.similarity(a, b, n = 8)

# note you can set the cores available to parallelize
options(cores = 4);
system.time(test.region.similarity(a, b, n = 8));

# a real example comparing the distribution of mRNA vs ncRNA genes in RefSeq
## Not run: 

# more core
options(cores = 8);

# load refgene
refgene <- query.ucsc("refGene")
refgene <- refgene[,c("chrom","txStart","txEnd","name","name2","strand")]

# only include canonical chr
chr <- paste0("chr", c(1:22,"X","Y")); 
refgene <- refgene[refgene$chrom 

# remove genes with multiple positions
duplicated.gene <- duplicated(refgene$name2) | duplicated(rev(refgene$name2));
refgene <- refgene[!duplicated.gene,];

# only select pr coding genes
refgene.nm <- refgene[grepl("^NM",refgene$name),];
# only select non protein coding rna genes	
refgene.nr <- refgene[grepl("^NR",refgene$name),];

# sort and merge
refgene.nm <- bedr.snm.region(refgene.nm,check.chr = FALSE);
refgene.nr <- bedr.snm.region(refgene.nr,check.chr = FALSE);

test.region.similarity(refgene.nm, refgene.nr, mask.unmapped = TRUE );

option(core = 1)

## End(Not run)

}

convert a vcf to a bed file

Description

Convert a vcf to a bed file. Currently, it needs to read into R via read.vcf

Usage

vcf2bed(x, filename = NULL, header = FALSE, other = NULL, verbose = TRUE)
vcf2bed(x, filename = NULL, header = FALSE, other = NULL, verbose = TRUE)

Arguments

`x`	a vcf object
`filename`	the name of file if you want it exported
`header`	indicate if the bed file has header or not when exported
`other`	fields to include apart from chr, start, end.
`verbose`	more words

Value

A bed styled R object or an external file

Author(s)

Daryl Waggott

Examples

clinVar.vcf.example <- system.file("extdata/clinvar_dbSNP138_example.vcf.gz", package = "bedr")
x <- read.vcf(clinVar.vcf.example)
x.bed <- vcf2bed(x)
clinVar.vcf.example <- system.file("extdata/clinvar_dbSNP138_example.vcf.gz", package = "bedr")
x <- read.vcf(clinVar.vcf.example)
x.bed <- vcf2bed(x)

write a vcf object

Description

write a vcf object

Usage

write.vcf(x, filename = NULL, verbose = TRUE)
write.vcf(x, filename = NULL, verbose = TRUE)

Arguments

`x`	a vcf object
`filename`	a filename
`verbose`	more words

Details

The input needs to be a vcf object. This

Value

A vcf file

Author(s)

Daryl Waggott

References

vcf format specifications

Examples


vcf  <- read.vcf(system.file("extdata/clinvar_dbSNP138_example.vcf.gz", package = "bedr"));
vcf$header <- c(vcf$header, NOTE="vcf processed by bedr")

## Not run: 
  write.vcf(vcf, filename = paste(tempdir(), "/bedr.example.vcf", sep = ""));
  
## End(Not run)
vcf  <- read.vcf(system.file("extdata/clinvar_dbSNP138_example.vcf.gz", package = "bedr"));
vcf$header <- c(vcf$header, NOTE="vcf processed by bedr")

## Not run: 
  write.vcf(vcf, filename = paste(tempdir(), "/bedr.example.vcf", sep = ""));
  
## End(Not run)

Package 'bedr'

Help Index

A bedtools wrapper for working with genomic ranges in R

Description

Details

Author(s)

checks if regions in object a are found in object b

Description

Usage

Arguments

Details

Value

Author(s)

Examples

bed dataframe to index string

Description

Usage

Arguments

Value

Author(s)

Examples

convert bed to vcf

Description

Usage

Arguments

Author(s)

Examples

Main bedtools wrapper function.

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

join multiple region objects

Description

Usage

Arguments

Author(s)

References

Examples

join two region objects using a left outer join

Description

Usage

Arguments

Author(s)

References

Examples

merge i.e. collapse overlpaping regions

Description

Usage

Arguments

Author(s)

References

Examples

Visualize regions or intervals

Description

Usage

Arguments

Details

Value

Author(s)

Examples

Initialize some config settings for bedr

Description

Usage

Arguments

Details

Author(s)

Examples

sort a region file

Description

Usage

Arguments

Value

Author(s)

Examples

sort a region file

Description