Computational pipeline for the detection and characterization of human and chimpanzee insertions and deletions. Using information from the designated databases, we characterized insertions and deletions (INDELs) and analyzed them using in-house Perl scripts and open-source tools (Multiz, RepeatMasker and Tandem Repeats Finder). The multiple alignment program Multiz was used to classify chimpanzee gaps (CGs) as insertions or deletions. The UCSC Genome Browser pairwise alignment databases were used to classify human gaps (HGs) as insertions or deletions. Human and chimpanzee INDELs were associated with the known human and chimpanzee Ensembl genes obtained from the UCSC Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables), and the presence of INDELs was correlated with microarray gene expression data. INDEL sequences, obtained from their corresponding reference genomes, were searched for repeat elements using RepeatMasker and Tandem Repeats Finder and classified according to the families of repeat sequences (partial or complete) present within each INDEL. The characterized INDELs were then assessed using various statistical methods.
Polavarapu et al. Mobile DNA 2011 2:13 doi:10.1186/1759-8753-2-13
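The gene-association step in the caption above can be illustrated with a minimal Python sketch. It is not the authors' code, which was written as in-house Perl scripts and is not reproduced in the source. The sketch assumes that INDELs and genes are both available as genomic intervals with 0-based, half-open coordinates, and that "associated" means a simple coordinate overlap on the same chromosome. The names Interval, overlaps and associate_indels_with_genes, and all coordinates, are hypothetical.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A genomic interval (hypothetical record layout, not from the paper)."""
    chrom: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # exclusive
    name: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def associate_indels_with_genes(indels, genes):
    """Map each INDEL name to the names of the genes it overlaps.

    A plain nested scan, which is fine for a small illustration; a
    genome-scale run would normally sort the intervals or use an
    interval tree instead.
    """
    return {
        indel.name: [g.name for g in genes if overlaps(indel, g)]
        for indel in indels
    }


# Made-up example data, for illustration only.
genes = [
    Interval("chr1", 1000, 5000, "geneA"),
    Interval("chr1", 8000, 9000, "geneB"),
]
indels = [
    Interval("chr1", 4990, 5010, "indel1"),  # straddles the end of geneA
    Interval("chr1", 5000, 5100, "indel2"),  # starts exactly where geneA ends
    Interval("chr2", 1000, 1100, "indel3"),  # different chromosome
]

result = associate_indels_with_genes(indels, genes)
print(result)
```

Under the half-open convention assumed here, an INDEL that begins exactly at a gene's end coordinate does not count as overlapping that gene. A real analysis might also extend each gene by a flanking window to capture nearby regulatory sequence; the caption does not say whether the authors did so.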