This page walks the user through a simple example with Vibrio cholerae and explains how to interpret the results. At the bottom of this page are some pre-formed queries for select pathogens.
The Insignia pipeline can be run by clicking the "Run Insignia" link on the left navigation panel. On the query page, the user can select a reference genome, and one or more target genomes. Additionally, the background can be modified to exclude certain genomes because of low quality (draft) sequences. If an assembly from a target species has more than 1000 contigs, it should probably be excluded from both the target list and the background. To exclude these genomes, select them from the "SELECT INDIVIDUAL GENOMES TO EXCLUDE" pick list on the query page.
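The contig-count rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the genome records, the `contigs` field and the `split_by_quality` helper are hypothetical and not part of Insignia, and the two draft strains are made-up entries. Only the N16961 contig count comes from this page.

```python
# Illustrative only: the records and helper below are assumptions, not Insignia's API.
MAX_CONTIGS = 1000  # threshold suggested in the text above

genomes = [
    {"name": "Vibrio cholerae El Tor N16961", "contigs": 2},
    {"name": "hypothetical draft strain A", "contigs": 1543},  # made-up entry
    {"name": "hypothetical draft strain B", "contigs": 87},    # made-up entry
]

def split_by_quality(records, max_contigs=MAX_CONTIGS):
    """Return (keep, exclude) lists of genome names based on contig count."""
    keep = [g["name"] for g in records if g["contigs"] <= max_contigs]
    exclude = [g["name"] for g in records if g["contigs"] > max_contigs]
    return keep, exclude

keep, exclude = split_by_quality(genomes)
print("keep:", keep)
print("exclude from targets and background:", exclude)
```

The names in the `exclude` list are the ones that would be picked in the "SELECT INDIVIDUAL GENOMES TO EXCLUDE" list.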
To compute signatures that are specific to all strains of Vibrio cholerae, begin by selecting a reference Vibrio. A good choice is Vibrio cholerae El Tor N16961. This is a finished strain comprising two contigs, one for each chromosome. The number of assembly contigs can be seen in parentheses next to each genome. Next, select all strains of V. cholerae as the target genomes, either from the pick box or from the taxonomy tree. Leave the background options untouched to include all genomic sequences in the background. Finally, select a signature word length of 20 and click Signify.
Signatures will appear at the bottom of the page once the search is complete. The result of the search will be a list of signature chains and the corresponding DNA sequence from the reference genome. A signature chain is a set of consecutive, overlapping 20-mer signature words. Intervals are given as the start of the first signature word and the end of the last signature word in the chain. Thus, the interval [s,e] spans e-s+1 bases and contains exactly e-s-20+2 signature words, which together completely cover the interval [s,e] in the reference sequence. For the above search, more signatures will be found than can be displayed at once. To reduce this to a more reasonable number, slide the "Signature chain length" slider to ~100 bp. Also, for convenience, check the "Show corresponding gene info" and "Sort by sequence length" boxes.
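As a quick check of the interval arithmetic above, here is a minimal Python sketch. The function names and the example coordinates are hypothetical and chosen only for illustration. Positions are treated as inclusive, as in the [s,e] notation.

```python
WORD_LENGTH = 20  # signature word length chosen on the query page

def words_in_chain(start, end, k=WORD_LENGTH):
    """Number of k-mer words in the inclusive interval [start, end]."""
    length = end - start + 1
    if length < k:
        raise ValueError("a chain must be at least one word long")
    return length - k + 1  # same as end - start - k + 2

def word_intervals(start, end, k=WORD_LENGTH):
    """Inclusive (start, end) interval of every word in the chain."""
    return [(i, i + k - 1) for i in range(start, end - k + 2)]

# Hypothetical 100 bp chain covering reference positions 1001..1100
print(words_in_chain(1001, 1100))
print(word_intervals(1001, 1100)[0], word_intervals(1001, 1100)[-1])
```

Each word starts one base after the previous one, so the first word begins at s and the last word ends at e.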
Signature words are perfectly conserved by all target genomes, and contain at least a single difference from every background sequence. Therefore, a signature chain will contain a difference with the background at least every 20 bases. For some types of detection assays, these signatures can still cross-react with background sequences and return false positive detections. However, we have found that long signature chains (e.g. >100 bp) are often quite dissimilar from the background and make good targets for detection assays. After identifying these candidate target sequences, we recommend performing a more sensitive background screen of the individual signatures using BLAST to ensure they are sufficiently unique. This can be done by selecting the desired signatures and choosing "Run BLAST search" from the pick list above the signature table. Visit the Help Page for more information regarding the output formats.
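The definition of a signature word can be illustrated with a toy Python sketch. This is not Insignia's actual implementation: it is a naive set-based version that assumes exact matching on a single strand (reverse complements are ignored), and the sequences, the word length of 4 and the function names are made up for the example.

```python
def kmers(sequence, k):
    """All distinct k-mers of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def signature_words(reference, targets, background, k):
    """Reference k-mers found in every target and in no background sequence.

    Returns (1-based start position, word) pairs in reference order.
    """
    background_words = set()
    for seq in background:
        background_words |= kmers(seq, k)
    target_word_sets = [kmers(seq, k) for seq in targets]

    found = []
    for i in range(len(reference) - k + 1):
        word = reference[i:i + k]
        if word in background_words:
            continue  # exact match in the background: not a signature
        if all(word in words for words in target_word_sets):
            found.append((i + 1, word))
    return found

# Toy data (hypothetical), using k=4 instead of 20 to keep it readable
reference = "ACGTTGCA"
targets = ["ACGTTGCA", "GGACGTTGAA"]
background = ["TTGCATT", "CCCGTTCC"]
print(signature_words(reference, targets, background, k=4))
```

A word that passes this test differs from every background k-mer by at least one base, but it may differ by only one, which is why the follow-up BLAST screen is recommended.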