Directory Contents

DCC VerifyBamID results for 1401 exome bams based on 20110421 sequence.index
plus VerifyBamID output files for each bam checked

See
http://genome.sph.umich.edu/wiki/VerifyBamID
for how to interpret results.

========
files

20110614_1401_exome_bam_20110421_verifybam.results
 contains results from the VerifyBamID .selfSM files for all 1401 exome bams, sorted in the
 following manner
	cat OUTPUT_FILES/*/*.selfSM | grep -v ^SEQ | cut -f 1,4,5,6,17,19,21,25,26,27,28 | sort -n -k 2 |cat -n
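
The pipeline above can be wrapped in a small shell function for re-use.
This is only a sketch: the function name summarise_selfsm and its directory
argument are assumptions and were not part of the original run. The pipeline
stages themselves are the ones shown above.

```shell
# Hypothetical wrapper around the pipeline shown above.
# $1: directory holding one sub-directory per sample (e.g. OUTPUT_FILES)
summarise_selfsm() {
    cat "$1"/*/*.selfSM |
        grep -v '^SEQ' |                        # drop the header line of each .selfSM file
        cut -f 1,4,5,6,17,19,21,25,26,27,28 |   # keep the columns used in the results file
        sort -n -k 2 |                          # numeric sort on the 2nd kept column (original column 4)
        cat -n                                  # prefix each row with a running number
}
```

It would be called as, for example, "summarise_selfsm OUTPUT_FILES > some.results".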
 

These results are then split based on center and platform.
20110614_bc_illumina_20110421_verifybamid.results 688
20110614_bi_illumina_20110421_verifybam.results   530
20110614_bcm_solid_20110421_verifybam.results     188

=========
Directory
OUTPUT_FILES
 contains all output files created by VerifyBamID for each bam checked, in
 directories labelled by sample ID
	
========
EXOME_DATA

contains the files required by VerifyBamID when using the --bfile option,
e.g. --bfile exome
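
The value given to --bfile is the prefix of a PLINK binary fileset, i.e. the
exome.bed, exome.bim and exome.fam files written by plink --make-bed. A minimal
sketch for checking that all three are present before a run; the helper name
check_bfile is an assumption and not part of VerifyBamID or PLINK.

```shell
# Hypothetical helper: succeed only if <prefix>.bed, <prefix>.bim and
# <prefix>.fam all exist.
check_bfile() {
    prefix=$1
    for ext in bed bim fam; do
        if [ ! -f "$prefix.$ext" ]; then
            echo "missing $prefix.$ext" >&2
            return 1
        fi
    done
    return 0
}
```

For example: check_bfile EXOME_DATA/exome && echo "fileset complete"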


========================================================================
Below is an outline of the process used to create the files used by
VerifyBamID and an example command line showing how the program was run.

The SNPs used for analysis can be found in
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20110527_bi_omni_1525_v2_genotypes/

Only SNPs that passed all filters were used.

zcat Omni25_genotypes_1525_samples_v2.b37.vcf.gz | egrep "^#|PASS" > passed_1525.vcf

bgzip passed_1525.vcf
tabix -p vcf  passed_1525.vcf.gz
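
Note that the egrep pattern above keeps any line containing the string PASS
anywhere, not only in the FILTER field. A stricter variant, shown here as a
sketch only and not the command that was actually run, tests the FILTER column
(the 7th tab-separated column of a VCF). The function name filter_pass is an
assumption.

```shell
# Hypothetical stricter filter: keep header lines and records whose
# FILTER column (column 7) is exactly PASS. Reads VCF text on stdin.
filter_pass() {
    awk -F '\t' '/^#/ || $7 == "PASS"'
}
```

For example: zcat Omni25_genotypes_1525_samples_v2.b37.vcf.gz | filter_pass > passed_1525.vcf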


I sorted the target list so that chromosomes (CHR) are in numerical order.
sort -k 1n,1 -k 2n,2 20110426_exome_add50bp.consensus.bed > sorted.target_list
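
One thing to be aware of with this sort: a numeric key treats non-numeric
chromosome names (for example X) as 0, so if the target list contains any,
they come out before chromosome 1. The sketch below wraps the same sort in a
function so the behaviour can be tried on a small input; the function name
sort_targets is an assumption.

```shell
# Hypothetical wrapper around the sort used above.
# Reads the named files, or stdin when no file is given.
sort_targets() {
    sort -k 1n,1 -k 2n,2 "$@"
}

# Small demonstration input: chromosome, start, end (tab separated)
printf '10\t5\t6\n2\t100\t200\n2\t50\t60\nX\t1\t2\n' | sort_targets
```

In the demonstration the X line is printed first, followed by the two
chromosome 2 lines in start order and then chromosome 10.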

perl -e 'while(<>){chomp;@aa =split /\s+/; print "tabix ../passed_1525.vcf.gz $aa[0]:$aa[1]-$aa[2] >> exome_targetted.vcf\n";}' sorted.target_list > sort_get_exome_targets.sh

head sort_get_exome_targets.sh
tabix passed_1525.vcf.gz 1:69040-70058 >> exome_targetted.vcf
tabix passed_1525.vcf.gz 1:861271-861443 >> exome_targetted.vcf
.....


./sort_get_exome_targets.sh
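
The Perl one-liner above can also be written in awk. The sketch below prints
one tabix command per target, in the same form as the lines shown by head
above. The function name make_tabix_cmds and its three arguments are
assumptions introduced here. As in the original, the BED start is used
directly as the region start; BED starts are 0-based while tabix regions are
1-based, so each region includes one extra base at its start.

```shell
# Hypothetical awk equivalent of the Perl one-liner above.
# $1: sorted target list (BED)
# $2: bgzipped, tabix-indexed VCF
# $3: VCF file the tabix output is appended to
make_tabix_cmds() {
    awk -v vcf="$2" -v out="$3" \
        '{ printf "tabix %s %s:%s-%s >> %s\n", vcf, $1, $2, $3, out }' "$1"
}
```

For example:
make_tabix_cmds sorted.target_list passed_1525.vcf.gz exome_targetted.vcf > sort_get_exome_targets.sh
sh sort_get_exome_targets.sh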


Create PLINK-formatted files.
vcftools_0.1.5/cpp/vcftools --plink --vcf exome_targetted.vcf

rename out exome out*


plink/plink-1.07-x86_64/plink --file exome --maf 0.01 --geno 0.05 --make-bed

@----------------------------------------------------------@
|        PLINK!       |     v1.07      |   10/Aug/2009     |
|----------------------------------------------------------|
|  (C) 2009 Shaun Purcell, GNU General Public License, v2  |
|----------------------------------------------------------|
|  For documentation, citation & bug-report instructions:  |
|        http://pngu.mgh.harvard.edu/purcell/plink/        |
@----------------------------------------------------------@

Web-based version check ( --noweb to skip )
Connecting to web... Problem connecting to web

Writing this text to log file [ plink.log ]
Analysis started: Fri May 20 12:51:56 2011

Options in effect:
        --file exome
        --maf 0.01
        --geno 0.05
        --make-bed

69074 (of 69074) markers to be included from [ exome.map ]


Warning, found 1525 individuals with ambiguous sex codes
Writing list of these individuals to [ plink.nosex ]
1525 individuals read from [ exome.ped ]
0 individuals with nonmissing phenotypes
Assuming a disease phenotype (1=unaff, 2=aff, 0=miss)
Missing phenotype value is also -9
0 cases, 0 controls and 1525 missing
0 males, 0 females, and 1525 of unspecified sex
Before frequency and genotyping pruning, there are 69074 SNPs
1525 founders and 0 non-founders found
Total genotyping rate in remaining individuals is 0.997257
307 SNPs failed missingness test ( GENO > 0.05 )
16060 SNPs failed frequency test ( MAF < 0.01 )
After frequency and genotyping pruning, there are 52757 SNPs
After filtering, 0 cases, 0 controls and 1525 missing
After filtering, 0 males, 0 females, and 1525 of unspecified sex
Writing pedigree information to [ plink.fam ]
Writing map (extended format) information to [ plink.bim ]
Writing genotype bitfile to [ plink.bed ]
Using (default) SNP-major mode

Analysis finished: Fri May 20 12:53:24 2011
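
A note on the counts in the log above: 69074 - 307 - 16060 gives 52707, but
PLINK reports 52757 SNPs remaining. The difference of 50 is presumably SNPs
that failed both tests and were therefore counted in each. The arithmetic can
be checked as below; the variable names are mine and the numbers are taken
from the log.

```shell
total=69074      # SNPs before pruning
geno_fail=307    # failed missingness test ( GENO > 0.05 )
maf_fail=16060   # failed frequency test ( MAF < 0.01 )
kept=52757       # SNPs after pruning

removed=$((total - kept))                      # SNPs actually removed
overlap=$((geno_fail + maf_fail - removed))    # SNPs counted in both tests
echo "removed=$removed overlap=$overlap"
```

This prints removed=16317 overlap=50.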


rename plink exome plink*

Example command line:
verifyBamID --reference human_g1k_v37.fa --bfile exome --verbose -d 1500 --precise --in HG00181.mapped.ILLUMINA.BWA.FIN.exome.20110228.bam --out HG00181
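
To run the same command over many bams, the command line can be generated per
file. The following is a sketch only: the helper name verifybamid_cmd is an
assumption, and it assumes the sample ID is the part of the bam file name
before the first dot, as in the example above. It uses only the options shown
in the example and prints the command rather than running it.

```shell
# Hypothetical helper: print the verifyBamID command for one bam.
# Assumes the sample ID is the file name up to the first dot.
verifybamid_cmd() {
    bam=$1
    sample=${bam##*/}      # strip any leading directory
    sample=${sample%%.*}   # keep the text before the first dot
    echo "verifyBamID --reference human_g1k_v37.fa --bfile exome --verbose -d 1500 --precise --in $bam --out $sample"
}
```

For example: for bam in *.bam; do verifybamid_cmd "$bam"; done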


Contact:
Richard Smith
DCC
1000 Genomes Project 
smithre@ebi.ac.uk

or 
resequencing-informatics@ebi.ac.uk