Small InDels on NA19240 with SOLiD

OVERVIEW:
The SOLiD System was used to sequence NA19240, and small indels were called for this sample using Corona v2 software. Deletions of up to 11 bp and insertions of up to 3 bp were called.

DETECTION METHOD:
* The SOLiD System was used to sequence NA19240 to about 30x coverage, using a mixture of mate-pair reads and fragment reads.
* Paired-end reads are identified in which one read maps up to z times to the genome and the second read does not map in the correct position(s).
* The second read is remapped to a constrained region of the genome, determined by the expected location given the library and pairing parameters and by the predicted map location of the first read.
* Only unique pairs in which the second read contains an indel (an insertion of up to 3 bp or a deletion of up to 11 bp) are extracted.
* An InDel is called only if (a) at least 3 reads independently support an InDel at the same position, (b) these reads have different start positions, and (c) the InDel does not lie at the extreme end of these reads.

SOFTWARE:
Source code for the mapping and pairing tools (including mapping with InDels) and for the InDel detection tool can be downloaded from www.solidsoftwaretools.com starting in early December 2008.

ORIGINAL DATA:
The data used to call the small InDels are available for download at:
ftp1.solidsoftwaretools.com/disk-b/NA19240/
Login: ftpdl1
Password: ftpdl1

FORMAT:
The small InDel file is in standard GFF v3 format, with optional fields in the final column. The columns are as follows:

Column 1: “seqid”
  The ID of the sequence to which the start and end coordinates refer; here, the human chromosome number.
Column 2: “source”
  Free-text qualifier indicating the algorithm or method that generated the feature. This should be the name of the software that generated the output file. (Prefix with “AB_”?)
Column 3: “type”
  Specifies what kind of SOFA feature it is.
  This file contains the features insertion_site and deletion.
Columns 4 and 5: “start” and “end”
  1-based integer coordinates of the feature (relative to the sequence in column 1). For zero-length features, such as insertion sites, start equals end, and the implied site is to the right of the indicated base in the direction of the landmark.
Column 6: “score”
  Floating-point value representing the quality of the evidence for the feature.
Column 7: “strand”
  “.”, meaning strand is not relevant for this feature.
Column 8: “phase”
  Translation frame; “.” here, since phase is relevant only for CDS features.
Column 9: “attributes”
  ins_len           Insertion length
  del_len           Deletion length
  tight_chrom_pos   Conservative estimate of the chromosome position range of the feature
  loose_chrom_pos   Maximum estimate of the chromosome position range of the feature
  no_nonred_reads   Number of reads with unique start positions (non-redundant reads)
  no_mismatches     Number of mismatches for each read
  read_pos          Position in each non-redundant read at which the InDel occurs
  strands           Strand for each read
  dbSNP             Annotation of any dbSNP InDel entry within 10 bp of the InDel
  uw_hgsv           Annotation of any U. Washington HGSV InDel entry within 10 bp of the InDel

AUTHORS:
Eric Tsung, Zheng Zhang, Heather Peckham, Fiona Hyland, Stephen McLaughlin, Jingwei Ni, Yutao Fu, Francisco De La Vega, Kevin McKernan.

CONTACT:
fiona.hyland@appliedbiosystems.com
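As a minimal sketch of how a file in the FORMAT described above might be read, the Python below parses one data line into its nine columns plus an attribute dictionary. It assumes, following the general GFF3 specification rather than anything stated in this description, that column 9 holds `tag=value` pairs separated by semicolons, with per-read lists (such as `read_pos` and `strands`) comma-separated. All function names are hypothetical. `passes_read_support` re-applies the 3-read threshold from DETECTION METHOD to the `no_nonred_reads` attribute, and `indel_length` returns a signed length (positive for insertions, negative for deletions).

```python
from typing import Iterator
from urllib.parse import unquote

GFF_COLUMNS = ("seqid", "source", "type", "start", "end",
               "score", "strand", "phase", "attributes")


def parse_indel_line(line: str) -> dict:
    """Split one GFF3 data line into named columns plus an attribute dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    rec = dict(zip(GFF_COLUMNS, fields))
    rec["start"] = int(rec["start"])  # 1-based; start == end for insertion_site
    rec["end"] = int(rec["end"])
    rec["score"] = None if rec["score"] == "." else float(rec["score"])
    attrs = {}
    for pair in rec["attributes"].split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        # Assumption (GFF3 convention): multi-valued tags are comma-separated
        # and reserved characters are percent-encoded.
        attrs[key.strip()] = [unquote(v) for v in value.split(",")]
    rec["attributes"] = attrs
    return rec


def passes_read_support(rec: dict, min_reads: int = 3) -> bool:
    """Re-check the 'at least 3 non-redundant reads' rule via no_nonred_reads."""
    return int(rec["attributes"]["no_nonred_reads"][0]) >= min_reads


def indel_length(rec: dict) -> int:
    """Signed length: +ins_len for insertion_site, -del_len for deletion."""
    if rec["type"] == "insertion_site":
        return int(rec["attributes"]["ins_len"][0])
    if rec["type"] == "deletion":
        return -int(rec["attributes"]["del_len"][0])
    raise ValueError(f"unexpected feature type: {rec['type']}")


def iter_indels(path: str) -> Iterator[dict]:
    """Yield parsed records, skipping '#' comment/directive lines and blank lines."""
    with open(path) as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # GFF3: anything after this directive is sequence, not features
            if not line.strip() or line.startswith("#"):
                continue
            yield parse_indel_line(line)
```

For example, filtering `iter_indels(path)` with `passes_read_support` keeps only records that meet the read-support threshold; since every call in the file should already satisfy it, this serves mainly as a consistency check.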