snpgdsVCF2GDS_R {SNPRelate} | R Documentation |
Reformat a Variant Call Format (VCF) file
snpgdsVCF2GDS_R(vcf.fn, out.fn, nblock=1024, method = c("biallelic.only", "copy.num.of.ref"), compress.annotation="ZIP_RA.max", snpfirstdim=FALSE, option = NULL, verbose=TRUE)
vcf.fn |
the file name of VCF format, |
out.fn |
the output gds file |
nblock |
the buffer lines |
method |
either "biallelic.only" by default or "copy.num.of.ref", see details |
compress.annotation |
the compression method for the GDS variables,
except "genotype"; optional values are defined in the function
|
snpfirstdim |
if TRUE, genotypes are stored in the individual-major mode, (i.e, list all SNPs for the first individual, and then list all SNPs for the second individual, etc) |
option |
|
verbose |
if TRUE, show information |
GDS – Genomic Data Structures used for storing genetic array-oriented data, and the file format used in the gdsfmt package.
VCF – The Variant Call Format (VCF), which is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations.
If there are more than one file name in vcf.fn
, snpgdsVCF2GDS
will merge all dataset together once they all contain the same samples. It is
useful to combine genetic data if VCF data are divided by chromosomes.
method = "biallelic.only"
: to exact bi-allelic and polymorhpic
SNP data (excluding monomorphic variants);
method = "biallelic.only"
: to exact bi-allelic and polymorhpic SNP
data; method = "copy.num.of.ref"
: to extract and store dosage (0, 1, 2)
of the reference allele for all variant sites, including bi-allelic SNPs,
multi-allelic SNPs, indels and structural variants.
Haploid and triploid calls are allowed in the transfer, the variable
snp.id
stores the original the row index of variants, and the variable
snp.rs.id
stores the rs id.
The user could use option
to specify the range of code for autosomes.
For humans there are 22 autosomes (from 1 to 22), but dogs have 38 autosomes.
Note that the default settings are used for humans. The user could call
option = snpgdsOption(autosome.end=38)
for importing the VCF file of dog.
It also allows defining new chromosome coding, e.g.,
option = snpgdsOption(Z=27)
, then "Z" will be replaced by the number 27.
None.
Xiuwen Zheng
The variant call format and VCFtools. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R; 1000 Genomes Project Analysis Group. Bioinformatics. 2011 Aug 1;27(15):2156-8. Epub 2011 Jun 7.
snpgdsVCF2GDS_R
, snpgdsOption
,
snpgdsBED2GDS
# The VCF file vcf.fn <- system.file("extdata", "sequence.vcf", package="SNPRelate") cat(readLines(vcf.fn), sep="\n") snpgdsVCF2GDS_R(vcf.fn, "test1.gds", method="biallelic.only") snpgdsSummary("test1.gds") snpgdsVCF2GDS_R(vcf.fn, "test2.gds", method="biallelic.only") snpgdsSummary("test2.gds") snpgdsVCF2GDS_R(vcf.fn, "test3.gds", method="copy.num.of.ref") snpgdsSummary("test3.gds") snpgdsVCF2GDS_R(vcf.fn, "test4.gds", method="copy.num.of.ref") snpgdsSummary("test4.gds")