Pipelines Designed and Implemented
Over the past year, I was actively involved in designing and implementing the following pipelines to develop markers for physical and genetic mapping in Mimulus, primarily to facilitate genome assembly and comparative analysis of the Mimulus genome.
[1] SSR Pipeline: Develops SSR (simple sequence repeat) markers, identified by their repeat motifs, on the BAC end sequences that lie in the minimum tiling path. Because these SSR markers sit on the FPC minimum tiling path, their physical positions are known. Once the markers are genetically mapped in a mapping population, a trait of interest can be physically located on the genome.
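A minimal sketch of the motif-search step in Python, for illustration only. It assumes perfect (uninterrupted) repeats and uses repeat-count thresholds chosen for the example (6 for di-, 5 for tri-, 4 for tetranucleotide motifs); the function name and thresholds are hypothetical and are not taken from the pipeline itself.

```python
import re


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT', 'TT')."""
    n = len(motif)
    for k in range(1, n):
        if n % k == 0 and motif[:k] * (n // k) == motif:
            return False
    return True


def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, end, motif, repeat_count) tuples for perfect SSRs.

    min_repeats maps motif length to the minimum number of tandem copies.
    The defaults are illustrative assumptions, not published thresholds.
    """
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5, 4: 4}
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # One copy of the motif followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if not is_primitive(motif):
                continue  # skip homopolymers and motifs built from a shorter unit
            count = (m.end() - m.start()) // unit_len
            hits.append((m.start(), m.end(), motif, count))
    return sorted(hits)
```

Each hit gives 0-based coordinates on the BAC end sequence, which could then be passed to a primer-design step for the flanking regions.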
[2] EPIC Primer Pipeline: EPIC (exon-primed intron-crossing) primers are gene-based markers designed in the exons that flank an intron of a single-copy gene, so that polymorphisms across the non-coding region can be detected. EPIC markers are highly useful for detecting polymorphisms in low-copy-number genes. I designed a pipeline that generates EPIC markers from Mimulus unigene clusters and builds a database of them for further use. The pipeline involves a series of steps: building unigene contigs from ESTs, running BLAST against a known model system such as Arabidopsis to find the codon positions, designing primers in the exons immediately adjacent to the intronic region, and building a queryable database of the designed markers.
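The primer-placement idea can be sketched as follows in Python. This is a simplified illustration, assuming the position of an exon-exon junction in the unigene contig has already been inferred from the alignment to the model system. The function names, the fixed primer length, and the margin from the junction are hypothetical; a real design step would also check melting temperature, GC content, and similar criteria.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def epic_primer_pair(cdna, junction, primer_len=20, margin=5):
    """Pick one primer in each exon flanking an inferred intron position.

    cdna     -- unigene contig sequence (introns already spliced out)
    junction -- 0-based index of the first base of the downstream exon
    margin   -- bases left between each primer and the junction (assumed value)

    Returns (forward, reverse) or None if either flanking exon segment
    is too short to hold a primer.
    """
    fwd_end = junction - margin
    fwd_start = fwd_end - primer_len
    rev_start = junction + margin
    rev_end = rev_start + primer_len
    if fwd_start < 0 or rev_end > len(cdna):
        return None
    forward = cdna[fwd_start:fwd_end]
    # The reverse primer anneals to the sense strand, so it is reverse-complemented.
    reverse = reverse_complement(cdna[rev_start:rev_end])
    return forward, reverse
```

On genomic DNA the amplicon from such a pair would span the intron, which is where the polymorphism is expected.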
[3] Overgo Pipeline for Physical Mapping: Overgos are short oligos used to physically locate the markers developed by the various pipelines, through hybridization experiments on BACs (bacterial artificial chromosomes). The overgo design pipeline involves assembling unigene clusters, finding the codon positions using a known model system, and designing the oligos in the exonic regions.
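A minimal Python sketch of the oligo-design step, assuming the commonly used overgo layout of two 24-mers that overlap by 8 bases at their 3' ends and together cover a 40 bp target. The GC window of 40 to 60% and the function names are assumptions made for this example and are not taken from the pipeline.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def design_overgo(exon_seq, target_len=40, oligo_len=24, gc_min=0.4, gc_max=0.6):
    """Return (start, forward_oligo, reverse_oligo) for the first suitable window.

    The two oligos overlap by (2 * oligo_len - target_len) bases at their 3' ends.
    Returns None if no window in the exon passes the GC filter.
    """
    exon_seq = exon_seq.upper()
    for start in range(len(exon_seq) - target_len + 1):
        target = exon_seq[start:start + target_len]
        if set(target) - set("ACGT"):
            continue  # skip windows containing ambiguous bases
        gc = (target.count("G") + target.count("C")) / target_len
        if gc_min <= gc <= gc_max:
            forward = target[:oligo_len]
            reverse = reverse_complement(target[target_len - oligo_len:])
            return start, forward, reverse
    return None
```

With the default lengths, the last 8 bases of the forward oligo are complementary to the last 8 bases of the reverse oligo, which is what allows the pair to anneal and be filled in during labeling.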
[4] BES-Scaffold Mapping Pipeline: The objective of this pipeline is to map the BAC end sequences onto the Mimulus scaffolds. In principle, the BAC end sequences should match the scaffolds perfectly; in practice, experimental error at various stages can introduce mismatches. BLAT was chosen because it is designed to find similarity between sequences of 95% or greater identity.
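A hedged sketch in Python of how BLAT hits might be filtered after alignment. It reads the standard tab-separated PSL columns and keeps the best hit per BAC end. The identity and coverage cutoffs, the simple score (matches minus mismatches), and the function names are assumptions for illustration, not the pipeline's actual criteria.

```python
def parse_psl_line(line):
    """Parse one PSL alignment line into a dict, or return None for headers."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 17 or not f[0].isdigit():
        return None  # header, separator, or blank line
    matches, mismatches, rep_matches = int(f[0]), int(f[1]), int(f[2])
    aligned = matches + mismatches + rep_matches
    q_size, q_start, q_end = int(f[10]), int(f[11]), int(f[12])
    return {
        "query": f[9],
        "target": f[13],
        "strand": f[8],
        "t_start": int(f[15]),
        "t_end": int(f[16]),
        "score": matches + rep_matches - mismatches,
        "identity": (matches + rep_matches) / aligned if aligned else 0.0,
        "coverage": (q_end - q_start) / q_size if q_size else 0.0,
    }


def best_hits(psl_lines, min_identity=0.95, min_coverage=0.9):
    """Keep the highest-scoring hit per query that passes both cutoffs."""
    best = {}
    for line in psl_lines:
        hit = parse_psl_line(line)
        if hit is None:
            continue
        if hit["identity"] < min_identity or hit["coverage"] < min_coverage:
            continue
        current = best.get(hit["query"])
        if current is None or hit["score"] > current["score"]:
            best[hit["query"]] = hit
    return best
```

A typical use would be `best_hits(open("bes_vs_scaffolds.psl"))`, where the file name is a placeholder for the BLAT output.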
[5] Microsynteny Analysis Pipeline: I was also involved in the microsynteny analysis between Mimulus and Arabidopsis, to characterize synteny and genome-wide variability in Mimulus. We sequenced and annotated 14 large-insert clones comprising 1.5 Mb of the Mimulus guttatus genome. Gene density varies from 0.009 to 0.166 genes per kb. The average gene density of 0.11 per kb leads to an estimate of ~50K genes in the 430 Mb genome. Intergenic spacers, which are rich in tandem repeats and LTR retrotransposons, range in size from 0.5 to 12 kb, and their distribution suggests that genes are organized into gene-rich islands. Relative to Arabidopsis, we found fewer retrotransposons and DNA transposons, and more simple and low-complexity repeats. Analysis of microsynteny with Arabidopsis reveals only sparse synteny blocks between Mimulus and Arabidopsis.
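The gene-count estimate above is a simple extrapolation of the average density to the whole genome. The short Python sketch below reproduces the arithmetic, assuming 1 Mb = 1000 kb and that the density seen in the sequenced clones is representative of the genome; the function name is hypothetical.

```python
def estimate_gene_count(genes_per_kb, genome_size_mb):
    """Extrapolate a per-kb gene density to a whole genome of the given size."""
    genome_size_kb = genome_size_mb * 1000
    return genes_per_kb * genome_size_kb


# Average density from the sequenced clones, applied to the 430 Mb genome.
average_estimate = round(estimate_gene_count(0.11, 430))  # 47300, i.e. roughly 50K
```

Because the observed density ranges from 0.009 to 0.166 genes per kb across clones, the estimate is sensitive to how representative the 14 sequenced clones are.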
The analysis pipeline involved the following series of steps:
[1] Annotation of the different lines of computational evidence for genes:
- Gene prediction (GeneWise, GENSCAN, GeneMark)
- EST alignment: Mimulus ESTs aligned to BACs
- Transposons and repeats
- BLAST against Arabidopsis
[2] BLAST: the annotated genes (cDNA) against the Arabidopsis peptides
[3] Synteny analysis: using FISH (Fast Identification of Segmental Homology)
[4] Visualizing the results using Circos
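As an illustration of the hand-off between the last two steps, the Python sketch below writes syntenic region pairs as Circos link lines, where each line lists both ends of a link as `chr start end chr start end`. The input tuple layout, the function name, and the identifiers in the usage example are hypothetical. The chromosome names must match those defined in the Circos karyotype file.

```python
def to_circos_links(pairs):
    """Format syntenic pairs as Circos link lines.

    pairs -- iterable of (chr_a, start_a, end_a, chr_b, start_b, end_b),
             one tuple per pair of syntenic regions (assumed layout).
    """
    lines = []
    for chr_a, start_a, end_a, chr_b, start_b, end_b in pairs:
        lines.append(f"{chr_a} {start_a} {end_a} {chr_b} {start_b} {end_b}")
    return "\n".join(lines)


# Usage with made-up identifiers: one Mimulus BAC region linked to Arabidopsis chr 1.
example = to_circos_links([("mg_bac1", 1200, 3400, "at1", 501000, 503500)])
```

The resulting text can be saved to a file and referenced from a `<link>` block in the Circos configuration.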
