The Vmatch large scale sequence analysis software

Stefan Kurtz

June 15, 2017

Download Vmatch!

This is the website for Vmatch, a versatile software tool for efficiently solving large scale sequence matching tasks. Vmatch subsumes the software tool REPuter, but is much more general, with a very flexible user interface and improved space and time requirements. A printable PDF version of this page is also available.

Features of Vmatch

The Vmatch manual gives many examples of how to use Vmatch. Here are the program’s most important features.

Persistent index

Usually, in a large scale matching problem, extensive portions of the sequences under consideration are static, i.e. they do not change much over time. It therefore makes sense to preprocess this static data, extract information from it, and store that information in a structured manner that allows efficient searches. Vmatch does exactly this: it preprocesses a set of sequences into an index structure, which is stored as a collection of several files constituting the persistent index. The index efficiently represents all substrings of the preprocessed sequences and, unlike many other sequence comparison tools, allows matching tasks to be solved in time independent of the size of the index. Different matching tasks require different parts of the index, and only the required parts are accessed during the matching process.
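
To illustrate the idea of preprocessing static sequences once and then answering many queries against the result, here is a minimal Python sketch. It is not Vmatch's index or its file format: it builds a plain suffix array naively and finds all occurrences of a pattern by binary search, so a query costs O(m log n) string comparisons rather than a scan of the whole text. The function names are made up for this example.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in lexicographic order.

    Naive construction by sorting suffixes; fine for a toy example only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return the sorted start positions of all occurrences of pattern in text."""
    m = len(pattern)
    # Leftmost suffix whose first m characters are >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    # Leftmost suffix whose first m characters are > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(suffix_array[first:lo])


text = "acgtacgtac"
index = build_suffix_array(text)          # built once, reused for every query
print(find_occurrences(text, index, "acg"))
```

The suffix array is computed once and could be written to disk; every later query only touches the parts of it that the binary search visits.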

Alphabet independence

Most software tools for sequence analysis are restricted to DNA and/or protein sequences. In contrast, Vmatch can process sequences over any user-defined alphabet of at most 250 symbols. Vmatch fully implements the concept of symbol mappings, which denote alphabet transformations. These allow the user to specify that different characters in the input sequences should be considered identical in the matching process. This feature is used, for example, to group similar amino acids.
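
The effect of a symbol mapping can be sketched in a few lines of Python. This is only an illustration of the concept, not Vmatch's symbol map file format: each group of characters is assigned one integer code, so that sequences are compared on codes instead of raw characters. The amino acid grouping below is a hypothetical choice made for the example.

```python
def build_symbol_map(groups):
    """Map every character of groups[k] to the integer code k."""
    symbol_map = {}
    for code, group in enumerate(groups):
        for char in group:
            if char in symbol_map:
                raise ValueError(f"symbol {char!r} occurs in more than one group")
            symbol_map[char] = code
    return symbol_map


def encode(sequence, symbol_map):
    """Translate a sequence into its list of symbol codes."""
    try:
        return [symbol_map[char] for char in sequence]
    except KeyError as err:
        raise ValueError(f"symbol {err.args[0]!r} is not in the alphabet") from None


# Hypothetical grouping of the 20 amino acids into 10 classes (an assumption).
AMINO_ACID_GROUPS = ["LVIM", "C", "A", "G", "ST", "P", "FYW", "EDNQ", "KR", "H"]
symbol_map = build_symbol_map(AMINO_ACID_GROUPS)

# "LIVE" and "MVLD" differ as strings but are identical after the mapping.
print(encode("LIVE", symbol_map) == encode("MVLD", symbol_map))
```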

Versatility

Vmatch allows a multitude of different matching tasks to be solved using the persistent index. Every matching task is basically characterized by (1) the kind of sequences to be matched, (2) the kind of matches sought, (3) additional constraints on the matches, and (4) the kind of postprocessing to be done with the matches.

In the standard case, Vmatch matches sequences over the same alphabet. Additionally, DNA sequences can be matched against a protein sequence index in all six reading frames. Finally, a DNA sequence can be translated in all six reading frames and compared against itself.
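
For readers unfamiliar with six-frame translation, the following Python sketch shows what it means: three reading frames on the forward strand and three on the reverse complement, each translated with the standard genetic code. It is a stand-alone illustration, not code from Vmatch; the function names are made up, and codons containing characters other than A, C, G, T are translated to X by assumption.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in the order TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna, frame):
    """Translate dna starting at offset frame (0, 1 or 2); '*' marks stop codons."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(dna):
    """Return the three forward and three reverse-strand translations."""
    dna = dna.upper()
    strands = (dna, reverse_complement(dna))
    return [translate(strand, frame) for strand in strands for frame in range(3)]


print(six_frame_translation("ATGGCC"))
```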

Where appropriate, Vmatch can compute the following kinds of matches, using state-of-the-art algorithms:

maximal and supermaximal repeats using the algorithms of M.I. Abouelhoda, S. Kurtz, and E. Ohlebusch. Replacing suffix trees with enhanced suffix arrays. Journal of Discrete Algorithms, 2:53–86, 2004
branching tandem repeats using the algorithm of M.I. Abouelhoda, S. Kurtz, and E. Ohlebusch. The enhanced suffix array and its applications to genome analysis. In Proceedings of the Second Workshop on Algorithms in Bioinformatics, pages 449–463. Lecture Notes in Computer Science 2452, Springer-Verlag, 2002
maximal (unique) substring matches using the algorithms of S. Kurtz. A Time and Space Efficient Algorithm for the Substring Matching Problem, 2002
complete matches using the algorithms of U. Manber and E.W. Myers. Suffix Arrays: A New Method for On-Line String Searches. SIAM Journal on Computing, 22(5):935–948, 1993 and [86]
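
As a small illustration of the first kind of match in the list above, the following Python sketch enumerates maximal repeats by brute force. A pair of equal substrings is reported only if it can be extended neither to the left nor to the right. This is deliberately naive (at least quadratic in the sequence length) and is not the enhanced suffix array algorithm cited above; the function name is made up for the example.

```python
def maximal_repeats(seq, min_length):
    """Return (i, j, length) for all maximal repeated pairs with i < j.

    seq[i:i+length] == seq[j:j+length], and the pair can be extended
    neither to the left nor to the right without introducing a mismatch.
    """
    n = len(seq)
    repeats = []
    for i in range(n):
        for j in range(i + 1, n):
            # Skip pairs that are not left-maximal.
            if i > 0 and seq[i - 1] == seq[j - 1]:
                continue
            # Extend to the right as far as possible (gives right-maximality).
            length = 0
            while j + length < n and seq[i + length] == seq[j + length]:
                length += 1
            if length >= min_length:
                repeats.append((i, j, length))
    return repeats


print(maximal_repeats("xabcyabcz", 3))
```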

To compute degenerate substring matches or degenerate repeats, each kind of match (with the exception of tandem repeats and complete matches) can be taken as an exact seed and extended by either of two different strategies:

the maximum error extension strategy, as described in
S. Kurtz, J.V. Choudhuri, E. Ohlebusch, C. Schleiermacher, J. Stoye, and R. Giegerich. REPuter: The manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res., 29(22):4633–4642, 2001 for repeat detection,
the greedy extension strategy of Z. Zhang, S. Schwartz, L. Wagner, and W. Miller. A Greedy Algorithm for Aligning DNA Sequences. J. Comp. Biol., 7(1/2):203–214, 2000

Matches can be selected according to their length, their E-value, their identity value, or match score.

In the standard case, a match is displayed as an alignment including positional information. Alternatively, a match can directly be postprocessed in different ways:

inverse output, i.e. reporting of substrings not covered by a match.
masking of substrings covered by a match.
clustering of sequences according to the matches found.
chaining of matches, i.e. finding optimal subsets of matches which do not cross, using the algorithms described in
M.I. Abouelhoda and E. Ohlebusch. A Local Chaining Algorithm and its Applications in Comparative Genomics. In Proc. 3rd Worksh. Algorithms in Bioinformatics (WABI 2003), number 2812 in Lecture Notes in Bioinformatics, pages 1–16. Springer-Verlag, 2003
clustering of matches according to pairwise sequence similarities computed by the dynamic programming algorithm of E. Ukkonen. Algorithms for Approximate String Matching. Information and Control, 64:100–118, 1985
clustering of matches according to the positions where they occur, following the approach of
N. Volfovsky, B.J. Haas, and S.L. Salzberg. A Clustering Method for Repeat Analysis in DNA Sequences. Genome Biology, 2(8):research0027.1–0027.11, 2001
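
To make the chaining item in the list above concrete, here is a minimal Python sketch of global chaining: from a set of matches it selects a subset that is consistently ordered in both sequences (no crossing, no overlap) and maximizes the total matched length. It uses a simple quadratic dynamic program and is not the local chaining algorithm of the cited paper. Matches are represented as hypothetical (pos1, pos2, length) triples with positive length.

```python
def best_chain(matches):
    """Return a highest-scoring chain of non-crossing, non-overlapping matches.

    Each match is (pos1, pos2, length) with length > 0; the score of a chain
    is the sum of the lengths of its matches.
    """
    ordered = sorted(matches)
    n = len(ordered)
    if n == 0:
        return []
    score = [match[2] for match in ordered]   # best chain score ending at match
    previous = [-1] * n                       # predecessor in that best chain
    for j in range(n):
        pos1_j, pos2_j, length_j = ordered[j]
        for i in range(j):
            pos1_i, pos2_i, length_i = ordered[i]
            # Match i must end before match j starts, in both sequences.
            fits = pos1_i + length_i <= pos1_j and pos2_i + length_i <= pos2_j
            if fits and score[i] + length_j > score[j]:
                score[j] = score[i] + length_j
                previous[j] = i
    end = max(range(n), key=score.__getitem__)
    chain = []
    while end != -1:
        chain.append(ordered[end])
        end = previous[end]
    return chain[::-1]


matches = [(0, 0, 10), (12, 30, 5), (15, 12, 8), (30, 25, 6)]
print(best_chain(matches))
```

In this example the match (12, 30, 5) crosses the later matches and is therefore left out of the chain.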

Efficient algorithms and data structures

Vmatch is based on enhanced suffix arrays, as described in Abouelhoda, Kurtz & Ohlebusch, 2004. This data structure has been shown to be as powerful as suffix trees, with the advantage of a reduced space requirement and reduced processing time. Careful implementation of the algorithms and data structures incorporated in Vmatch has led to exceedingly fast and robust software, allowing very large sequence sets to be processed quickly. The 32-bit version of Vmatch can process up to 400 million symbols if enough memory is available. For large server class machines (e.g. SUN-Sparc/Solaris, Intel Xeon/Linux, Compaq-Alpha/Tru64), Vmatch is available as a 64-bit version, enabling gigabytes of sequences to be processed.
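
An enhanced suffix array combines the suffix array with additional tables, most prominently the table of longest common prefix (LCP) lengths of neighbouring suffixes. The following Python sketch shows these two tables for a toy string and uses them to find a longest repeated substring. It is an illustration only: the suffix array is built naively, the LCP table is computed with the linear-time method of Kasai et al., and none of this reflects Vmatch's actual implementation or memory layout.

```python
def build_suffix_array(text):
    """Suffix start positions in lexicographic order (naive construction)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_lcp_table(text, suffix_array):
    """lcp[r] = length of the longest common prefix of the suffixes at
    ranks r-1 and r in the suffix array; lcp[0] is 0 by convention."""
    n = len(text)
    rank = [0] * n
    for r, start in enumerate(suffix_array):
        rank[start] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = suffix_array[rank[i] - 1]
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1   # the next suffix shares at least h-1 characters
    return lcp


def longest_repeated_substring(text):
    """Return a longest substring that occurs at least twice in text."""
    if not text:
        return ""
    suffix_array = build_suffix_array(text)
    lcp = build_lcp_table(text, suffix_array)
    r = max(range(len(text)), key=lcp.__getitem__)
    return text[suffix_array[r]:suffix_array[r] + lcp[r]]


print(longest_repeated_substring("banana"))
```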

Flexible input format

The most common formats for input sequences (Fasta, Genbank, EMBL, and SWISSPROT) are accepted. The user does not have to specify the input format, because it is recognized automatically. All input files can contain an arbitrary number of sequences. Gzip-compressed inputs are accepted.
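
How such automatic recognition can work in principle is sketched below in Python. This is not Vmatch's detection logic, which is not described here; it is a plausible heuristic under stated assumptions: gzip data is recognized by its two magic bytes, Fasta by a leading ">", Genbank by a leading "LOCUS" line, and EMBL or SWISSPROT by a leading "ID" line (the two are not distinguished in this sketch).

```python
import gzip


def sniff_format(data: bytes) -> str:
    """Guess the sequence file format from the raw bytes of a file."""
    if data[:2] == b"\x1f\x8b":          # gzip magic number
        data = gzip.decompress(data)
    for line in data.decode("ascii", errors="replace").splitlines():
        if not line.strip():
            continue                      # skip leading blank lines
        if line.startswith(">"):
            return "fasta"
        if line.startswith("LOCUS"):
            return "genbank"
        if line.startswith("ID "):
            return "embl-or-swissprot"
        break                             # first non-blank line decides
    return "unknown"


print(sniff_format(b">seq1 example\nACGTACGT\n"))
print(sniff_format(gzip.compress(b">seq1 example\nACGTACGT\n")))
```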

Customized output and match selection

Vmatch’s output can easily be parsed by other programs. Furthermore, several options allow for its customization. XML output is available, and new output formats can easily be incorporated without changing Vmatch’s program code. Certain matches can easily be selected by user-defined criteria, without intermediate output and subsequent parsing.

The parts of Vmatch

Up until now we have referred to Vmatch as a collection of programs. In the following we use the same name, vmatch (in typewriter font), for the most important program in this collection. Besides vmatch, there are the following programs available:

mkvtree constructs the persistent index and stores it in files.
mkdna6idx constructs an index for a DNA sequence after translating it in all six reading frames.
vseqinfo delivers information about indexed database sequences.
vstree2tex outputs a representation of the index in LaTeX format. It can be used, for example, for educational or debugging purposes.
vseqselect selects indexed sequences satisfying specific criteria.
vsubseqselect selects substrings of a specified length range from an index.
vmigrate.sh converts an index from big endian to little endian architectures, or vice versa.
vmatchselect sorts and selects matches delivered by vmatch.
chain2dim computes optimal chains of matches from files in Vmatch-format.
matchcluster computes clusters of matches from files in Vmatch-format.

Here is an overview of the dataflow in Vmatch.

Related tools

There are several tools which are based on the persistent index of Vmatch:

Genalyzer

is a graphical user interface to visualize the output of Vmatch in the form of a match graph. For details see

J.V. Choudhuri, C. Schleiermacher, S. Kurtz, and R. Giegerich. Genalyzer: Interactive visualization of sequence similarities between entire genomes. Bioinformatics, 20:1964–1965, 2004

Genalyzer is not available any more.

MGA

is a program to compute multiple alignments of complete genomes. For details see

M. Höhl, S. Kurtz, and E. Ohlebusch. Efficient multiple genome alignment. Bioinformatics, 18(Suppl. 1):S312–S320, 2002

Multimat

is a program to compute multiple exact matches between three or more genome size sequences. For details see

E. Ohlebusch and S. Kurtz. Space efficient computation of rare maximal exact matches between multiple sequences. J. Comp. Biol., 15(4):357–377, 2008

Please contact Stefan Kurtz if you are interested in using Multimat.

PossumSearch

is a program to search for position specific scoring matrices. For details see

M. Beckstette, R. Homann, R. Giegerich, and S. Kurtz. Fast index based algorithms and software for matching position specific scoring matrices. BMC Bioinformatics, 7:389, 2006

GenomeThreader

is a software tool to compute gene structure predictions. The gene structure predictions are calculated using a similarity-based approach where additional cDNA/EST and/or protein sequences are used to predict gene structures via spliced alignments. GenomeThreader uses the matching capabilities of Vmatch to efficiently map the reference sequence to a genomic sequence. For details see

G. Gremme, V. Brendel, M.E. Sparks, and S. Kurtz. Engineering a software tool for gene prediction in higher organisms. Information and Software Technology, 47(15):965–978, 2005

Biopieces

is a collection of bioinformatics tools that can be pieced together in a very easy and flexible manner to perform both simple and complex tasks. Some Biopieces depend on Vmatch. For details see http://www.biopieces.org/.

Previous and Current Usages

We provide an annotated bibliography listing papers that applied Vmatch and briefly describe the tasks for which Vmatch was used. We omit our own papers. The references were collected by a search in Google Scholar (which, as of Jan 2, 2016, retrieved 397 results).

Usages in Plant Genome Research

V. Brendel, S. Kurtz, and V. Walbot. Comparative genomics of Arabidopsis and Maize: Prospects and limitations. Genome Biology, 3(3):reviews1005.1–1005.6, 2002
In this work Vmatch was used to compute a non-redundant set from a large collection of protein sequences from Zea-Maize.
Similar applications are described in
Q. Dong, L. Roy, M. Freeling, V. Walbot, and V. Brendel. ZmDB, an integrated Database for Maize Genome Research. Nucleic Acids Res., 31:244–247, 2003.
PLEXdb is a database for gene expression resources for plants and plant pathogens, see
S. Dash, J. Van Hemert, L. Hong, R. P. Wise, and J. A. Dickerson. PLEXdb: gene expression resources for plants and plant pathogens. Nucleic Acids Res., 40(Database issue):D1194–1201, Jan 2012
PLEXdb provides a Vmatch-based web-service to match PLEXdb probes.
The assembly of the Arabidopsis thaliana genome from 2004 (GenBank entries of 2/19/04) contained vector sequence contaminations. For example, region 3 617 880 to 3 625 027 of chromosome II contained a cloning vector. Vmatch was used to detect the vector contamination.
Q. Dong, C.J. Lawrence, S.D. Schlueter, M.D. Wilkerson, S. Kurtz, C. Lushbough, and V. Brendel. Comparative Plant Genomics Resources at PlantGDB. Plant Physiology, Plant Database Focus Issue, 2005
This work describes PlantGDB, which provides a service called PatternSearch@PlantGDB for genome wide pattern searches in plant sequences. The service is based on Vmatch.
M. Lindow and A. Krogh. Computational evidence for hundreds of non-conserved plant micrornas. BMC Genomics, 6(1):119, 2005
In this work Vmatch was used for three different tasks:
- Searching spliced mRNA in the Arabidopsis genome to detect micromatches of length at least 20 with a maximum of 2 mismatches.
- Finding matches of length at least 15 with at most one mismatch between predicted mature miRNA-sequences and a set of ESTs as well as sequences from the Arabidopsis Small RNA Project (ASRP).
- Aligning and performing single linkage clustering of the predicted mature miRNA sequences. Candidate pairs aligning over at least 17 bases, allowing an edit distance of 1 were grouped in the same family.
J.-F. Pombert, C. Lemieux, and M. Turmel. The complete chloroplast DNA sequence of the green alga Oltmannsiellopsis viridis reveals a distinctive quadripartite architecture in the chloroplast genome of early diverging ulvophytes. BMC Biology, 4:3, 2006
M. Turmel, C. Otis, and C. Lemieux. The Chloroplast Genome Sequence of Chara vulgaris Sheds New Light into the Closest Green Algal Relatives of Land Plants. Molecular Biology and Evolution, 23:1324–1338, 2006
In these papers Vmatch was used to search and compare repeated elements in different chloroplast DNA.
M. Spannagl, O. Noubibou, D. Haase, L. Yang, H. Gundlach, T. Hindemitt, K. Klee, G. Haberer, H. Schoof, and K.F.X. Mayer. MIPSPlantsDB–plant database resource for integrative and comparative plant genome research. Nucleic Acids Res, 35(Database issue):D834–40, 2007
In this work about the MIPSPlantsDB database Vmatch was used to cluster large sequence sets.
E.G.W.M. Schijlen, C.H. Ric de Vos, S. Martens, H.H. Jonker, F.M. Rosin, J.W. Molthoff, Y.M. Tikunov, G.C. Angenent, A.J. van Tunen, and A.G. Bovy. RNA interference silencing of chalcone synthase, the first step in the flavonoid biosynthesis pathway, leads to parthenocarpic tomato fruits. Plant Physiol, 144(3):1520–30, 2007
In this work Vmatch was used to compare target genes of the tomato Chs RNAi to a tomato gene index.
M. Lindow, A. Jacobsen, S. Nygaard, Y. Mang, and A. Krogh. Intragenomic matching reveals a huge potential for mirna-mediated regulation in plants. PLOS Comput. Biol, 3(11):e238, 2007
In this work Vmatch was used to search different plant genomes for matches of length at least 20 with a maximum of 2 mismatches. Here the fact that Vmatch is an exhaustive search tool is important.
J.-C. de Cambiaire, C. Otis, M. Turmel, and C. Lemieux. The chloroplast genome sequence of the green alga leptosira terrestris: multiple losses of the inverted repeat and extensive genome rearrangements within the trebouxiophyceae. BMC Genomics, 8(1):213, 2007
In this work Vmatch was used to determine the presence of shared repeated elements of minimum length 30, with up to 10% mismatches, in different sequence sets from the green alga Leptosira terrestris.
S. Ossowski, K. Schneeberger, R.M. Clark, C. Lanz, N. Warthmann, and D. Weigel. Sequencing of natural strains of Arabidopsis thaliana with short reads. Genome Res., 18:2024–2033, 2008
In this work Vmatch was used to map millions of short sequence reads to the A. thaliana genome. Up to four mismatches and up to three indels were allowed in the matching process. The seed size was chosen to be 0. The reads were aligned using the best match strategy by iteratively increasing the allowed number of mismatches and gaps at each round.
F. De Bona, S. Ossowski, K. Schneeberger, and G. Ratsch. Optimal spliced alignments of short sequence reads. Bioinformatics, 24(16):i174–180, 2008
In this work Vmatch was used to map millions of short sequence reads to the A. thaliana genome. Vmatch was part of a multi-step pipeline, combining a fast matching algorithm (Vmatch) for initial read mapping and an optimal alignment algorithm based on dynamic programming (QPALMA) for high quality detection of splice sites.
A. G. L. Assunção, E. Herrero, Y-F. Lin, B. Huettel, S. Talukdar, C. Smaczniak, R. GH Immink, M. Van Eldik, M. Fiers, H. Schat, et al. Arabidopsis thaliana transcription factors bzip19 and bzip23 regulate the adaptation to zinc deficiency. Proceedings of the National Academy of Sciences, 107(22):10296–10301, 2010
In this work Vmatch was used for motif searching in different plant genomes.
Andrea L Eveland, Namiko Satoh-Nagasawa, Alexander Goldshmidt, Sandra Meyer, Mary Beatty, Hajime Sakai, Doreen Ware, and David Jackson. Digital gene expression signatures for maize development. Plant physiology, 154(3):1024–1039, 2010
In this work Vmatch was used to map unique consensus sequence tags to the maize reference genome.
Jean-Simon Brouard, Christian Otis, Claude Lemieux, and Monique Turmel. The exceptionally large chloroplast genome of the green alga floydiella terrestris illuminates the evolutionary history of the chlorophyceae. Genome biology and evolution, 2:240, 2010
In this work Vmatch was used to identify and cluster repeated sequences in the Floydiella chloroplast genome.
Hubert Rehrauer, Catharine Aquino, Wilhelm Gruissem, Stefan R Henz, Pierre Hilson, Sascha Laubinger, Naira Naouar, Andrea Patrignani, Stephane Rombauts, Huan Shu, et al. Agronomics1: a new resource for arabidopsis transcriptome profiling. Plant Physiology, 152(2):487–499, 2010
In this work Vmatch was used to calculate direct and reverse complementary matches of length 17 bp or greater with edit distance 1 or less between five nuclear chromosomes and mitochondrial and chloroplast genome sequences.
R. S. Sekhon, H. Lin, K. L. Childs, C. N. Hansey, C. R. Buell, N. de Leon, and S. M. Kaeppler. Genome-wide atlas of transcription during maize development. Plant J., 66(4):553–563, May 2011
In this work Vmatch was used to search probe sequences against the maize genome and the cDNA sequences of the official maize gene models.
M. Dassanayake, D. H. Oh, J. S. Haas, A. Hernandez, H. Hong, S. Ali, D. J. Yun, R. A. Bressan, J. K. Zhu, H. J. Bohnert, and J. M. Cheeseman. The genome of the extremophile crucifer Thellungiella parvula. Nat. Genet., 43(9):913–918, Sep 2011
In this work Vmatch was used for clustering sequences assembled from 454-reads of Thellungiella parvula, a model for the evolution of plant adaptation to extreme environments.
E. M. Willing, M. Hoffmann, J. D. Klein, D. Weigel, and C. Dreyer. Paired-end RAD-seq for de novo assembly and marker design without available reference. Bioinformatics, 27(16):2187–2193, Aug 2011
In this work Vmatch was used for grouping short reads into pools representing the same RAD tag.
L. Gao, Y. Zhou, Z.-W. Wang, Y.-J. Su, and T. Wang. Evolution of the rpoB-psbZ region in fern plastid genomes: notable structural rearrangements and highly variable intergenic spacers. BMC Plant Biology, 11(1):64, 2011
In this work Vmatch was used for detecting and clustering repetitive sequences in diverse fern plastid genomes.
D. B. Sloan, A. J. Alverson, J. P. Chuckalovcak, M. Wu, D. E. McCauley, J. D. Palmer, and D. R. Taylor. Rapid evolution of enormous, multichromosomal genomes in flowering plant mitochondria with exceptionally high mutation rates. PLoS Biol., 10(1):e1001241, Jan 2012
In this work Vmatch was used to precisely define the boundaries of all repeats with 100% sequence identity.
Anuja Dubey, Andrew Farmer, Jessica Schlueter, Steven B Cannon, Brian Abernathy, Reetu Tuteja, Jimmy Woodward, Trushar Shah, Benjamin Mulasmanovic, Himabindu Kudapa, et al. Defining the transcriptome assembly and its use for genome dynamics and transcriptome profiling studies in pigeonpea (Cajanus cajan l.). DNA research, 18(3):153–164, 2011
In this work Vmatch was used to cluster sequences based on their six-frame translation.
Rachit K Saxena, R Varma Penmetsa, Hari D Upadhyaya, Ashish Kumar, Noelia Carrasquilla-Garcia, Jessica A Schlueter, Andrew Farmer, Adam M Whaley, Birinchi K Sarma, Gregory D May, et al. Large-scale development of cost-effective single-nucleotide polymorphism marker assays for genetic mapping in pigeonpea and comparative mapping in legumes. DNA research, 19(6):449–461, 2012
In this work Vmatch was used to identify reciprocal best matches between the pigeonpea sequences and other legume sequences.
B. Z. Haznedaroglu, D. Reeves, H. Rismani-Yazdi, and J. Peccia. Optimization of de novo transcriptome assembly from high-throughput short read sequencing data improves functional annotation for non-model organisms. BMC Bioinformatics, 13:170, 2012
In this work Vmatch was used for assembly clustering and optimization of contigs for Neochloris oleoabundans (a green microalga of the class Chlorophyceae).
M. M. Martis, S. Klemme, A. M. Banaei-Moghaddam, F. R. Blattner, J. Macas, T. Schmutzer, U. Scholz, H. Gundlach, T. Wicker, H. Šimková, P. Novak, P. Neumann, M. Kubalakova, E. Bauer, G. Haseneyer, J. Fuchs, J. Dolezel, N. Stein, K. F. Mayer, and A. Houben. Selﬁsh supernumerary chromosome reveals its origin as a mosaic of host genome and organellar sequences. Proc. Natl. Acad. Sci. U.S.A., 109(33):13343–13346, Aug 2012
In this work Vmatch was used to match reads against a repeat library to identify the content of the repetitive DNA per sequence read.
K. L. Childs, R. M. Davidson, and C. R. Buell. Gene coexpression network analysis as a source of functional annotation for rice genes. PloS one, 6(7):e22196, 2011
In this work Vmatch was used to align individual probes to representative gene models.
E. I. Severing, A. D. J. van Dijk, and R. C. H. J. van Ham. Assessing the contribution of alternative splicing to proteome diversity in arabidopsis thaliana using proteomics data. BMC Plant Biology, 11(1):82, 2011
In this work Vmatch was used for performing exact searches with peptides against the filtered proteome of A. thaliana.
P. Wolff, I. Weinhofer, J. Seguin, P. Roszak, C. Beisel, M.T. Donoghue, C. Spillane, M. Nordborg, M. Rehmsmeier, and C. Köhler. High-resolution analysis of parent-of-origin allelic expression in the arabidopsis endosperm. PLoS Genet, 7(6):e1002126–e1002126, 2011
In this work Vmatch was used to map RNAseq reads, allowing up to two mismatches (option -h 2) and generating maximal substring matches that are unique in some reference dataset (option -mum cand).
D. J. Fleetwood, A. K. Khan, R. D. Johnson, C. A. Young, S. Mittal, R. E. Wrenn, U. Hesse, S. J. Foster, C. L. Schardl, and B. Scott. Abundant degenerate miniature inverted-repeat transposable elements in genomes of epichloid fungal endophytes of grasses. Genome Biol Evol, 3:1253–1264, 2011
In this work Vmatch was used to identify terminal inverted repeats (TIRs) with a length of 10-65 bp, ≥ 80% identity, and a maximum inter-TIR distance of 650 bp in genomes of epichloid fungal endophytes of grasses.
K. L. Childs, K. Konganti, and C. R. Buell. The Biofuel Feedstock Genomics Resource: a web-based portal and database to enable functional genomics of plant biofuel feedstock species. Database (Oxford), 2012:bar061, 2012
In this work Vmatch was used to match putative unique transcript sequence assemblies.
Y. Chen, B. J. Cassone, X. Bai, M. G. Redinbaugh, and A. P. Michel. Transcriptome of the plant virus vector Graminella nigrifrons, and the molecular interactions of maize fine streak rhabdovirus transmission. PLoS ONE, 7(7):e40613, 2012
In this work Vmatch was used for refining assemblies of Illumina reads in the context of a transcriptome project for plant virus vector Graminella nigrifrons.
N. M. Krishnan, S. Pattnaik, P. Jain, P. Gaur, R. Choudhary, S. Vaidyanathan, S. Deepak, A. K. Hariharan, P. B. Krishna, J. Nair, L. Varghese, N. K. Valivarthi, K. Dhas, K. Ramaswamy, and B. Panda. A draft of the genome and four transcriptomes of a medicinal and pesticidal angiosperm Azadirachta indica. BMC Genomics, 13:464, 2012
In this work Vmatch was used for clustering repeats and for building a consensus repeat library in the context of genome and transcriptome projects for Azadirachta indica, a medicinal and pesticidal angiosperm.
Z. Liu, S. Kumari, L. Zhang, Y. Zheng, and D. Ware. Characterization of mirnas in response to short-term waterlogging in three inbred lines of zea mays. PLoS One, 7(6):e39786, 2012
In this work Vmatch was used to map unique consensus sequence tags to the maize reference genome and to predict targets of novel miRNAs.
A. Bousios, Y. A. I. Kourmpetis, P. Pavlidis, E. Minga, A. Tsaftaris, and N. Darzentas. The turbulent life of sirevirus retrotransposons and the evolution of the maize genome: more than ten thousand elements tell the story. The Plant Journal, 69(3):475–488, 2012
In this work Vmatch was used for masking Long Terminal Repeats in the Maize Genome Sequence.
In the papers
P. Hernandez, M. Martis, G. Dorado, M. Pfeifer, S. Galvez, S. Schaaf, N. Jouve, H. Šimková, M. Valarik, J. Dolezel, and K. F. Mayer. Next-generation sequencing and syntenic integration of flow-sorted arms of wheat chromosome 4A exposes the chromosome structure and gene content. Plant J., 69(3):377–386, Feb 2012
R. Philippe, E. Paux, I. Bertin, P. Sourdille, F. Choulet, C. Laugier, H. Šimková, J. Šafář, A. Bellec, S. Vautrin, et al. A high density physical map of chromosome 1bl supports evolutionary studies, map-based cloning and sequencing in wheat. Genome Biol, 14(6):R64, 2013
Vmatch was used to mask repetitive DNA.
G. T. Howe, J. Yu, B. Knaus, R. Cronn, S. Kolpak, P. Dolan, W. W. Lorenz, and J. F. Dean. A SNP resource for Douglas-fir: de novo transcriptome assembly and SNP detection and validation. BMC Genomics, 14:137, 2013
In this work Vmatch was used to cluster 40 010 assembled isotigs.
R. Karlova, J. C. van Haarst, C. Maliepaard, H. van de Geest, A. G. Bovy, M. Lammers, G. C. Angenent, and R. A. de Maagd. Identification of microRNA targets in tomato fruit development using high-throughput sequencing and degradome analysis. J. Exp. Bot., 64(7):1863–1878, Apr 2013
In this work Vmatch was used to preprocess short reads in the context of identifying microRNA targets in tomato fruit development.
S. M. Gross, J. A. Martin, J. Simpson, M. J. Abraham-Juarez, Z. Wang, and A. Visel. De novo transcriptome assembly of drought tolerant CAM plants, Agave deserti and Agave tequilana. BMC Genomics, 14:563, 2013
In this work Vmatch was used in an all-vs-all comparison to bin contigs into loci based on a minimum of 200 bp sequence overlap in the context of transcriptome assembly for two Agave-species.
U. Kanter, W. Heller, J. Durner, J. B. Winkler, M. Engel, H. Behrendt, A. Holzinger, P. Braun, M. Hauser, F. Ferreira, K. Mayer, M. Pfeifer, and D. Ernst. Molecular and immunological characterization of ragweed (Ambrosia artemisiifolia L.) pollen after exposure of the plants to elevated ozone over a whole growing season. PLoS ONE, 8(4):e61518, 2013
In this work Vmatch was used to align 454-reads to assembled isotigs for Ragweed pollen.
K. G. Kugler, G. Siegwart, T. Nussbaumer, C. Ametz, M. Spannagl, B. Steiner, M. Lemmens, K. F. X. Mayer, H. Buerstmayr, and W. Schweiger. Quantitative trait loci-dependent analysis of a gene co-expression network associated with fusarium head blight resistance in bread wheat (triticum aestivum l.). BMC Genomics, 14(1):728, 2013
In this work Vmatch was used for comparing gene sets.
Mihaela M Martis, Ruonan Zhou, Grit Haseneyer, Thomas Schmutzer, Jan Vrána, Marie Kubaláková, Susanne König, Karl G Kugler, Uwe Scholz, Bernd Hackauf, et al. Reticulate evolution of the rye genome. The Plant Cell, 25(10):3685–3698, 2013
In this work Vmatch was used to detect repetitive DNA content of chromosomal survey sequences from the Rye genome.
In the papers
D. Kopecký, M. Martis, J. Číhalíková, E. Hřibová, J. Vrána, J. Bartoš, J. Kopecká, F. Cattonaro, Š. Stočes, Petr Novák, et al. Flow sorting and sequencing meadow fescue chromosome 4f. Plant Physiology, 163(3):1323–1337, 2013
D. Kopecký, M Martis, J Číhalíková, E Hřibová, J Vrána, J Bartoš, et al. Genomics of meadow fescue chromosome 4f. Plant Physiol, 163:1323–1337, 2013
Vmatch was used for identifying repetitive DNA content in contigs of meadow fescue chromosome 4F assembled from Illumina short reads.
In the papers
F. Jay, Y. Wang, A. Yu, L. Taconnat, S. Pelletier, V. Colot, J.-P. Renou, and O. Voinnet. Misregulation of AUXIN RESPONSE FACTOR 8 underlies the developmental abnormalities caused by three distinct viral silencing suppressors in Arabidopsis. PLoS Pathog, 7(5):e1002035–e1002035, 2011
X. Wang, D. Weigel, and L. M. Smith. Transposon variants and their effects on gene expression in arabidopsis. PLoS Genet, 9(2):e1003255, 2013
Vmatch was used for mapping siRNA sequences to the Arabidopsis thaliana genome.
E. Henaff, C. Vives, B. Desvoyes, A. Chaurasia, J. Payet, C. Gutierrez, and J. M. Casacuberta. Extensive amplification of the E2F transcription factor binding sites by transposons during evolution of Brassica species. Plant J., 77(6):852–862, Mar 2014
In this work Vmatch was used for the identification of binding motifs.
W Wang, G Haberer, H Gundlach, C Gläßer, TCLM Nussbaumer, MC Luo, A Lomsadze, M Borodovsky, RA Kerstetter, J Shanklin, et al. The Spirodela polyrhiza genome reveals insights into its neotenous reduction fast growth and aquatic lifestyle. Nature Communications, 5, 2014
In this work Vmatch was used for masking one sequence set with another and for mapping miRNA sequences of all plant species present in a reference database to the whole-genome assembly of Spirodela polyrhiza.
M. D. Logacheva, M. I. Schelkunov, M. S. Nuraliev, T. H. Samigullin, and A. A. Penin. The plastid genome of mycoheterotrophic monocot petrosavia stellaris exhibits both gene losses and multiple rearrangements. Genome biology and evolution, 6(1):238–246, 2014
In this work Vmatch was used for repeat detection.
X. Wang, W. Shi, and T. Rinehart. Transcriptomes That Confer to Plant Defense against Powdery Mildew Disease in Lagerstroemia indica. Int J Genomics, 2015:528395, 2015
In this work Vmatch was used to eliminate redundancies in assemblies of Illumina reads in the context of studying plant defense mechanisms.
H. Ashrafi, A. M. Hulse-Kemp, F. Wang, S. S. Yang, X. Guan, D. C. Jones, M. Matvienko, K. Mockaitis, Z. J. Chen, D. M. Stelly, et al. A long-read transcriptome assembly of cotton (l.) and intraspecific single nucleotide polymorphism discovery. The Plant Genome, 2015
In this work Vmatch was used for clustering to determine a non-redundant set of assembled contigs.
K. Ustyantsev, O. Novikova, A. Blinov, and G. Smyshlyaev. Convergent evolution of ribonuclease h in ltr retrotransposons and retroviruses. Molecular biology and evolution, 32(5):1197–1207, 2015
In this work Vmatch was used for clustering sequences based on their RT and aRNH domain.
M. Helguera, M. Rivarola, B. Clavijo, M. M. Martis, L. S. Vanzetti, S. González, I. Garbus, P. Leroy, H. Šimková, M. Valárik, et al. New insights into the wheat chromosome 4d structure and virtual gene order, revealed by survey pyrosequencing. Plant Science, 233:200–212, 2015
In this work Vmatch was used for identifying repeats in contigs assembled from 454-reads.
Qi Shen, Jun Yang, Chaolong Lu, Bo Wang, and Chi Song. The complete chloroplast genome sequence of perilla frutescens (l.). Mitochondrial DNA, preprint:1–2, 2015
In this work Vmatch was used for identifying inverted repeats in chloroplast genomes.
Bahman Panahi, Seyed Abolghasem Mohammadi, Reyhaneh Ebrahimi Khaksefidi, Jalil Fallah Mehrabadi, and Esmaeil Ebrahimie. Genome-wide analysis of alternative splicing events in Hordeum vulgare: Highlighting retention of intron-based splicing and its possible function through network analysis. FEBS letters, 589(23):3564–3575, 2015
In this work Vmatch was used to identify contaminations and repetitive elements by comparison of mRNA sequences to vector, bacterial and repeat databases.
SN Wolfenbarger, MC Twomey, DM Gadoury, BJ Knaus, NJ Grünwald, and DH Gent. Identification and distribution of mating-type idiomorphs in populations of podosphaera macularis and development of chasmothecia of the fungus. Plant Pathology, 2015
In this work Vmatch was used to cluster contigs of different assemblies into groups of homologous sequences.
Jun Yang, Chaolong Lu, Qi Shen, Yuying Yan, Changjiang Xu, and Chi Song. The complete chloroplast genome sequence of Fagopyrum cymosum. Mitochondrial DNA, pages 1–2, 2015
In this work Vmatch was used to identify inverted repeats in chloroplast genomes.

Usages in Microbial Genome Research

The KPATH system, developed at the Lawrence Livermore National Laboratory, and described in
J.P. Fitch, S.N. Gardner, T.A. Kuczmarski, S. Kurtz, R. Myers, L.L. Ott, T.R. Slezak, E.A. Vitalis, A.T. Zemla, and P.M. McCready. Rapid development of nucleic acid diagnostics. Proceedings of the IEEE, 90(11):1708–1721, 2002
T. Slezak, T. Kuczmarski, L. Ott, C. Torres, D. Medeiros, J. Smith, B. Truitt, N. Mulakken, M. Lam, E. Vitalis, A. Zemla, C.E. Zhou, and S. Gardner. Comparative Genomics Tools Applied to Bioterrorism Defense. Brieﬁngs in Bioinformatics, 4(2):133–149, 2003
used Vmatch to detect unique substrings in large collections of DNA sequences. These unique substrings serve as signatures allowing for rapid and accurate diagnostics to identify pathogenic bacteria and viruses. A similar application is reported in S.N. Gardner, T.A. Kuczmarski, E.A. Vitalis, and T.R. Slezak. Limitations of TaqMan PCR for Detecting Viral Pathogens Illustrated by Hepatitis A, B, C, and E Viruses and Human Immunodeﬁciency Virus. J. of Clinical Microbiology, 41(6):2417–2427, 2003.
N. Pobigaylo, D. Wetter, S. Szymczak, U. Schiller, S. Kurtz, F. Meyer, T.W. Nattkemper, and A. Becker. Construction of a large signature-tagged mini-Tn5 transposon library and its application to mutagenesis of Sinorhizobium meliloti. Appl Environ Microbiol., 72(6):4329–4337, 2006
In this work Vmatch was used to map signature tags to the genome of S. meliloti.
The CRISPRFinder-program and the CRISPRdatabase, described in
I. Grissa, G. Vergnaud, and C. Pourcel. CRISPRFinder: a web tool to identify clustered regularly interspaced short palindromic repeats. Nucleic Acids Res, 35(Web Server issue):W52–7, 2007
I. Grissa, G. Vergnaud, and C. Pourcel. The CRISPRdb database and tools to display CRISPRs and to generate dictionaries of spacers and repeats. BMC Bioinformatics, 8:172, 2007
used Vmatch to eﬃciently ﬁnd maximal repeats, as a ﬁrst step in localizing clustered regularly interspaced short palindromic repeats (CRISPRs).
B. Voss, J. Georg, V. Schön, S. Ude, and W. R. Hess. Biocomputational prediction of non-coding RNAs in model cyanobacteria. BMC Genomics, 10:123, 2009
In this work Vmatch was used to map predicted sequences to information about Rho-independent terminators provided by a speciﬁc database.
Jeremy Schmutz, Steven B Cannon, Jessica Schlueter, Jianxin Ma, Therese Mitros, William Nelson, David L Hyten, Qijian Song, Jay J Thelen, Jianlin Cheng, et al. Genome sequence of the palaeopolyploid soybean. Nature, 463(7278):178–183, 2010
In this work Vmatch was used to cluster DNA-sequences into families based on their six-frame translation.
Bob Zimmermann, Tanja Gesell, Doris Chen, Christina Lorenz, Renée Schroeder, and J Valcarcel. Monitoring genomic sequences during SELEX using high-throughput sequencing: neutral SELEX. PLoS One, 5(2):e9169, 2010
In this work Vmatch was used to align 454-sequences to the E. coli genome and to cluster the sequences.
Fabrice Touzain, Erick Denamur, Claudine Médigue, Valérie Barbe, Meriem El Karoui, Marie-Agnès Petit, et al. Small variable segments constitute a major type of diversity of bacterial genomes at the species level. Genome Biol, 11(4):R45, 2010
In this work Vmatch was used for detecting repeats in three bacterial species.
Klaus FX Mayer, Mihaela Martis, Pete E Hedley, Hana Šimková, Hui Liu, Jenny A Morris, Burkhard Steuernagel, Stefan Taudien, Stephan Roessner, Heidrun Gundlach, et al. Unlocking the barley genome by chromosomal and comparative genomics. The Plant Cell, 23(4):1249–1263, 2011
In this work Vmatch was used for masking repeats in 454-reads.
Smruti Pushalkar, Shrinivasrao P Mane, Xiaojie Ji, Yihong Li, Clive Evans, Oswald R Crasta, Douglas Morse, Robert Meagher, Anup Singh, and Deepak Saxena. Microbial diversity in saliva of oral squamous cell carcinoma. FEMS Immunology & Medical Microbiology, 61(3):269–277, 2011
In this work Vmatch was used to identify distal primers.
J. E. Breitenbach, K. S. Shelby, and H. JR Popham. Baculovirus induced transcripts in hemocytes from the larvae of Heliothis virescens. Viruses, 3(11):2047–2064, 2011
In this work Vmatch was used for removing redundant transcripts assembled in an RNA-seq study based on Illumina reads for Heliothis virescens (tobacco budworm), infected with a virus.
LR Triplett, JP Hamilton, CR Buell, NA Tisserat, V. Verdier, F Zink, and JE Leach. Genomic analysis of Xanthomonas oryzae isolates from rice grown in the United States reveals substantial divergence from known X. oryzae pathovars. Applied and Environmental Microbiology, 77(12):3930–3937, 2011
In this work Vmatch was used to search unassembled Illumina reads of US and African strains of Xanthomonas oryzae for evidence of transcriptional activator-like eﬀector sequences.
Vmatch is used as an integral part of the PriMUX software package described in
D. A. Hysom, P. Naraghi-Arani, M. Elsheikh, A. C. Carrillo, P. L. Williams, and S. N. Gardner. Skip the alignment: degenerate, multiplex primer and probe design using K-mer matching instead of alignments. PLoS ONE, 7(4):e34560, 2012
In this context Vmatch is used for selecting multiplex compatible, degenerate primers and probes to detect diverse targets such as viruses.
K. S. Shelby and H. JR Popham. RNA-seq study of microbially induced hemocyte transcripts from larval Heliothis virescens (Lepidoptera: Noctuidae). Insects, 3(3):743–762, 2012
In this work Vmatch was used to identify redundant contigs from de novo exome assemblies.
B. L. Hurwitz and M. B. Sullivan. The Paciﬁc Ocean virome (POV): a marine viral metagenomic dataset and associated protein clusters for quantitative viral ecology. PLoS ONE, 8(2):e57355, 2013
In this work Vmatch was used to identify reads which have no common 20-mers with other reads in a context of a marine viral metagenome project.
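The kind of k-mer filter described in this entry can be illustrated with a minimal Python sketch. It is not how Vmatch works internally (Vmatch operates on its persistent index); it is a naive, set-based illustration that assumes reads are plain strings and ignores reverse complements. The function names are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reads_without_shared_kmers(reads, k=20):
    """Return the indices of reads that share no k-mer with any other read."""
    owners = defaultdict(set)  # k-mer -> indices of the reads containing it
    for idx, read in enumerate(reads):
        for kmer in kmers(read, k):
            owners[kmer].add(idx)

    # A read is "shared" if at least one of its k-mers occurs in another read.
    shared = set()
    for idxs in owners.values():
        if len(idxs) > 1:
            shared.update(idxs)

    return [i for i in range(len(reads)) if i not in shared]
```

For example, with `k=5` the reads `"ACGTACGTAC"` and `"CGTACGTTTT"` share the 5-mer `CGTAC`, so only a third, unrelated read would be reported.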
X. Zhuo, M. Rho, and C. Feschotte. Genome-wide characterization of endogenous retroviruses in the bat Myotis lucifugus reveals recent and diverse infections. J. Virol., 87(15):8493–8501, Aug 2013
In this work Vmatch was used for clustering potential complete endogenous retroviruses of the bat Myotis lucifugus into subfamilies.
In the three papers
B. L. Hurwitz, A. H. Westveld, J. R. Brum, and M. B. Sullivan. Modeling ecological drivers in marine viral communities using comparative metagenomics and network analyses. Proc. Natl. Acad. Sci. U.S.A., 111(29):10714–10719, July 2014
B. L. Hurwitz, L. Deng, B. T. Poulos, and M. B. Sullivan. Evaluation of methods to concentrate and purify ocean virus communities through comparative, replicated metagenomics. Environ. Microbiol., 15(5):1428–1440, May 2013
J. R. Brum, B. L. Hurwitz, O. Schoﬁeld, H. W. Ducklow, and M. B. Sullivan. Seasonal time bombs: dominant temperate viruses aﬀect Southern Ocean microbial dynamics. The ISME Journal, 2015
Vmatch was used for k-mer analysis in the context of diﬀerent marine metagenome projects.
C. J. Decker and R. Parker. Analysis of double-stranded RNA from microbial communities identiﬁes double-stranded RNA virus-like elements. Cell Reports, 7(3):898–906, 2014
In this work Vmatch was used for k-mer analysis in the context of microbial communities.
J. Bengtsson-Palme, F. Boulund, J. Fick, E. Kristiansson, and D. G. Larsson. Shotgun metagenomics reveals a wide array of antibiotic resistance genes and mobile elements in a polluted lake in India. Front Microbiol, 5:648, 2014
In this work Vmatch was used in an iterative scheme to construct contigs from reads associated with resistance genes in the context of a shotgun metagenome project.
Nicholas A Be, James B Thissen, Shea N Gardner, Kevin S McLoughlin, Viacheslav Y Fofanov, Heather Koshinsky, Sally R Ellingson, Thomas S Brettin, Paul J Jackson, and Crystal J Jaing. Detection of Bacillus anthracis DNA in complex soil and air samples using next-generation sequencing. PloS one, 8(9), 2013
In this work Vmatch was used to match probe candidate sequences against viral sequences and the human genome sequence.
Birgit Henrich, Madis Rumming, Alexander Sczyrba, Eunike Velleuer, Ralf Dietrich, Wolfgang Gerlach, Michael Gombert, Sebastian Rahn, Jens Stoye, Arndt Borkhardt, et al. Mycoplasma salivarium as a dominant coloniser of Fanconi anaemia associated oral carcinoma. PloS one, 9(3), 2014
In this work Vmatch was used to identify the species of the Streptococcaceae by comparison with the 16S reference sequence database of Silva release 115.

Usages in General Web-Servers or Sequence Analysis Software

Since 2000, the RSA-tools, described in
J. van Helden, A.F. Rios, and J. Collado-Vides. Discovering Regulatory Elements in Non-Coding Sequences by Analysis of Spaced Dyads. Nucleic Acids Res., 28(8):1808–1818, 2000
and developed by Jacques van Helden, use Vmatch to purge sequences before computing sequence statistics. Similar applications are reported in the following papers:
R.J.M. Hulzink, H. Weerdesteyn, A.F. Croes, M.M.A. Gerats, T. van Herpen, and J. van Helden. In Silico Identiﬁcation of Putative Regulatory Sequence Elements in the 5’-Untranslated Region of Genes That Are Expressed during Male Gametogenesis. Plant Physiol., 132:75–83, 2003
N. Simonis, S.J. Wodak, G.N. Cohen, and J van Helden. Combining Pattern Discovery and Discriminant Analysis to Predict Gene Co-regulation. Bioinformatics, 20:2370–2379, 2004
N. Simonis, J. van Helden, G.N. Cohen, and S.J. Wodak. Transcriptional regulation of protein complexes in yeast. Genome Biology, 5:R33, 2004.
The program SpliceNest, described in
E. Coward, S.A. Haas, and M. Vingron. SpliceNest: Visualization of Gene Structure and Alternative Splicing Based on EST Clusters. Trends Genet., 18(1):53–55, 2002
computes gene indices and uses Vmatch to map clustered sequences to large genomes.
e2g is a web-based server which eﬃciently maps large EST and cDNA data sets to genomic DNA. The use of Vmatch signiﬁcantly extends the size of the data sets that can be mapped in reasonable time. e2g is available as a web service and hosts large collections of EST sequences (e.g. 4.1 million mouse ESTs of 1.87 Gbp) in a precomputed persistent index. For details see
J. Krüger, A. Sczyrba, S. Kurtz, and R. Giegerich. e2g: An interactive web-based server for eﬃciently mapping large EST and cDNA sets to genomic sequences. Nucleic Acids Res., 32:W301–W304, 2004.
The Bielefeld Bioinformatics Server provides the REPuter web-service to compute repeats in complete genomes. The service is based on Vmatch.
J. Fernandes, Q. Dong, B. Schneider, D.J. Morrow, G.-L. Nan, V. Brendel, and V. Walbot. Genome-wide mutagenesis of Zea mays L. using RescueMu transposons. Genome Biology, 5(10):R82, 2004
In this work Vmatch was used to (1) match 130 861 vector-trimmed sequences against the maize repeat database, and (2) cluster near-identical sequences.
CrossLink, described in
T. Dezulian, M. Schaefer, R. Wiese, D. Weigel, and D.H. Huson. CrossLink: visualization and exploration of sequence relationships between (micro) RNAs. Nucleic Acids Res., 34(Web Server Issue):W400–W404, 200
is a versatile computational tool which aids in visualizing relationships between RNA sequences (particularly between ncRNAs and their putative target transcripts) in an intuitive and accessible way. Besides BLAST, CrossLink uses Vmatch to reveal the sequence relationships to be visualized.
The early version of the web-service Similarity matrix of Proteins (SIMAP), see
R. Arnold, T. Rattei, P. Tischler, M.-D. Truong, V. Stümpﬂen, and H.W. Mewes. SIMAP - The similarity matrix of proteins. Bioinformatics, 21(Suppl. 2):ii42–ii46, 2005
used Vmatch to locate the sequences in SIMAP which are similar to a given query. This is much faster than running BLAST.
M.W.E.J. Fiers, H. Van de Wetering, T.H.J.M. Peeters, J.J. van Wijk, and J-P. Nap. DNAVis: interactive visualization of comparative genome annotations. Bioinformatics, 22(3):354–355, 2005
In this work Vmatch was used to compute similarities between genomes, which are then visualized by the program DNAVis.
In the paper
P.N. Seibel, J. Krüger, S. Hartmeier, K. Schwarzer, K. Löwenthal, H. Mersch, T. Dandekar, and R. Giegerich. XML schemas for common bioinformatic data types and their application in workﬂow systems. BMC Bioinformatics, 7:490, 2006
Seibel et al. describe methods for creating web-services and give examples which, among other tools, also integrate Vmatch.
The program Gepard
J. Krumsiek, R. Arnold, and T. Rattei. Gepard: a rapid and sensitive tool for creating dotplots on genome scale. Bioinformatics, 23(8):1026–8, 2007
uses mkvtree to compute enhanced suﬃx arrays.
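An enhanced suffix array combines a suffix array with additional tables, such as the table of longest common prefixes (LCP) between neighbouring suffixes. The following Python sketch illustrates these two tables on a small string. It is only an illustration: the naive sorting-based construction is an assumption made for brevity and is not how mkvtree builds its index, and the function names are hypothetical.

```python
def suffix_array(text):
    """Naive construction: sort all suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_table(text, sa):
    """Kasai's algorithm: lcp[r] is the length of the longest common prefix
    of the suffixes at ranks r-1 and r in the suffix array (lcp[0] = 0)."""
    n = len(text)
    rank = [0] * n
    for r, start in enumerate(sa):
        rank[start] = r

    lcp = [0] * n
    h = 0  # number of characters already known to match
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]  # start of the preceding suffix in sorted order
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


sa = suffix_array("banana")
lcp = lcp_table("banana", sa)
```

For `"banana"` the sorted suffixes are `a`, `ana`, `anana`, `banana`, `na`, `nana`. The largest LCP value, 3, belongs to the adjacent suffixes `ana` and `anana`, which corresponds to the repeated substring `ana`.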
Vmatch is used as part of the transcriptome assembler software Rnnotator, described in
J. Martin, V. M. Bruno, Z. Fang, X. Meng, M. Blow, T. Zhang, G. Sherlock, M. Snyder, and Z. Wang. Rnnotator: an automated de novo transcriptome assembly pipeline from stranded RNA-Seq reads. BMC Genomics, 11:663, 2010
The BioExtract-Server described in
C. M. Lushbough, D. M. Jennewein, and V. Brendel. The BioExtract Server: a web-based bioinformatic workﬂow platform. Nucleic Acids Research, 39(suppl 2):W528–W532, 2011
uses Vmatch to remove duplicated sequences.
C. M. Lushbough, E. Z. Gnimpieba, and R. Dooley. Life science data analysis workﬂow development using the BioExtract Server leveraging the iPlant Collaborative cyberinfrastructure. Concurrency and Computation: Practice and Experience, 27(2):408–419, 2015
In this work Vmatch was used for removing duplicates in BlastP results. This use is part of a workﬂow in myExperiment.
Daniel Greuter, Alexander Loy, Matthias Horn, and Thomas Rattei. ProbeBase-an online resource for rRNA-targeted oligonucleotide probes and primers: new features 2016. Nucleic acids research, page gkv1232, 2015
In this work Vmatch was used for probe/primer search functionality in the probeBase database.

Current Usages in Human Genome Research

P.G. Buckley, C. Jarbo, U. Menzel, T. Mathiesen, C. Scott, S.G. Gregory, C.F. Langford, and J.P. Dumanski. Comprehensive DNA Copy Number Proﬁling of Meningioma Using a Chromosome 1 Tiling Path Microarray Identiﬁes Novel Candidate Tumor Suppressor Loci. Cancer Res., 65(7):2653–2661, 2005
In this work Vmatch was used to reveal long repeats inside human chromosome 1 and long similar regions between human chromosome 1 and all other human chromosomes.
C. Liang, G. Wang, L. Liu, G. Ji, Y. Liu, J. Chen, J.S. Webb, G. Reese, and J.F.D. Dean. WebTraceMiner: a web service for processing and mining EST sequence trace ﬁles. Nucleic Acids Res, 35(Web Server issue):W137–42, 2007
In this work Vmatch was used for vector screening.
Sanne Nygaard, Anders Jacobsen, Morten Lindow, Jens Eriksen, Eva Balslev, Henrik Flyger, Niels Tolstrup, Søren Møller, Anders Krogh, and Thomas Litman. Identiﬁcation and analysis of miRNAs in human breast cancer and teratoma samples using deep sequencing. BMC Medical Genomics, 2(1):35, 2009
In this work Vmatch was used for mapping short reads.
Christian Cole, Andrew Sobala, Cheng Lu, Shawn R Thatcher, Andrew Bowman, John WS Brown, Pamela J Green, Geoﬀrey J Barton, and Gyorgy Hutvagner. Filtering of deep sequencing data reveals the existence of abundant Dicer-dependent small RNAs derived from tRNAs. RNA, 15(12):2147–2160, 2009
In this work Vmatch was used for matching reads to sets of RNA sequences and the human genome.
N. Cloonan, S. Wani, Q. Xu, J. Gu, K. Lea, S. Heater, C. Barbacioru, A. L. Steptoe, H. C. Martin, E. Nourbakhsh, et al. MicroRNAs and their isomiRs function cooperatively to target common biological pathways. Genome Biol, 12(12):R126, 2011
In this work Vmatch was used to uniquely map miRNAs against the human genome.
K Takayama, S Tsutsumi, S Katayama, T Okayama, K Horie-Inoue, K Ikeda, T Urano, C Kawazu, A Hasegawa, K Ikeo, et al. Integration of cap analysis of gene expression and chromatin immunoprecipitation analysis on array reveals genome-wide androgen receptor signaling in prostate cancer cells. Oncogene, 30(5):619–630, 2011
In this work Vmatch was used to determine the positions of CAGE tags on the human genome.
Kevin CH Ha, Emilie Lalonde, Lili Li, Luca Cavallone, Rachael Natrajan, Maryou B Lambros, Costas Mitsopoulos, Jarle Hakas, Iwanka Kozarewa, Kerry Fenwick, et al. Identiﬁcation of gene fusion transcripts by transcriptome sequencing in BRCA1-mutated breast cancers and cell lines. BMC Medical Genomics, 4(1):75, 2011
In this work Vmatch was used to align sections of reads against RefSeq mRNA exon sequences.
Marie J Kidd, Zhiliang Chen, Yan Wang, Katherine J Jackson, Lyndon Zhang, Scott D Boyd, Andrew Z Fire, Mark M Tanaka, Bruno A Gaëta, and Andrew M Collins. The inference of phased haplotypes for the immunoglobulin H chain V region gene loci by analysis of VDJ gene rearrangements. The Journal of Immunology, 188(3):1333–1340, 2012
In this work Vmatch was used to align sets of genes.
Ryonosuke Yamaga, Kazuhiro Ikeda, Joost Boele, Kuniko Horie-Inoue, Ken-ichi Takayama, Tomohiko Urano, Kaoru Kaida, Piero Carninci, Jun Kawai, Yoshihide Hayashizaki, et al. Systemic identiﬁcation of estrogen-regulated genes in breast cancer cells through cap analysis of gene expression mapping. Biochemical and biophysical research communications, 447(3):531–536, 2014
In this work Vmatch was used to determine the positions of CAGE tags on the human genome.

Current Usages for diﬀerent Model Organisms

A. Sczyrba, M. Beckstette, A.H. Brivanlou, R. Giegerich, and C.R. Altmann. XenDB: Full length cDNA prediction and cross species mapping in Xenopus laevis. BMC Genomics, 2005
In this work Vmatch was used to cluster 317 242 EST and cDNA sequences from Xenopus laevis. Vmatch was chosen for the following reasons:
- First, there was no clustering tool available which could handle large data sets eﬃciently, and which was documented well enough to allow a detailed replication and evaluation of existing clusters.
- Second, Vmatch identiﬁes similarities between sequences rapidly, and it provides additional options to cluster a set of sequences based on these matches. Furthermore, the Vmatch output provides information about how the clusters were derived. Due to the eﬃciency of Vmatch, it was possible to perform the clustering for a wide variety of parameters on the complete sequence set. This made it possible to study the eﬀect of the parameter choice on the clustering.
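Clustering a set of sequences based on pairwise matches, as mentioned in the second reason above, can be sketched with a union-find structure. This minimal Python example assumes single-linkage clustering (two sequences end up in the same cluster whenever a chain of matches connects them) and assumes matches are given as pairs of sequence numbers. Both are assumptions for illustration, not a description of the Vmatch implementation, and the function name is hypothetical.

```python
def cluster_by_matches(num_sequences, matches):
    """Single-linkage clustering: sequences connected by matches share a cluster.

    matches is an iterable of (a, b) pairs of sequence numbers in
    range(num_sequences), each pair standing for one reported match.
    """
    parent = list(range(num_sequences))

    def find(x):
        # Follow parent links to the cluster representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for s in range(num_sequences):
        clusters.setdefault(find(s), []).append(s)
    return sorted(clusters.values())
```

Running the function repeatedly on match sets obtained with different parameters (for example different minimum match lengths) is one way to study how the parameter choice aﬀects the resulting clusters.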
M. Spitzer, S. Lorkowski, P. Cullen, A. Sczyrba, and G. Fuellen. Distinguishing isoforms and paralogs on the protein level. BMC Bioinformatics, 7:110, 2006
In this work Vmatch was used to cluster EST-sequences of Xenopus laevis.
J.A. Eisen, R.S. Coyne, M. Wu, D. Wu, M. Thiagarajan, J.R. Wortman, J.H. Badger, Q. Ren, P. Amedeo, and K.M. Jones et al. Macronuclear Genome Sequence of the Ciliate Tetrahymena thermophila, a Model Eukaryote. PLoS Biology, 4(9):e286, 2006
In this work Vmatch was used to search exact repeats in the Macronuclear Genome Sequence of the Ciliate Tetrahymena thermophila.
G. J. Faulkner, A. R. Forrest, A. M. Chalk, K. Schroder, Y. Hayashizaki, P. Carninci, D. A. Hume, and S. M. Grimmond. A rescue strategy for multimapping short sequence tags reﬁnes surveys of transcriptional activity by CAGE. Genomics, 91(3):281–288, Mar 2008
In this work Vmatch was used for mapping
- 11 567 973 FANTOM3 mouse CAGE tags to the mouse genome with minimum match length of 18 bp, a single internal mismatch allowed, and multiple mismatches allowed at tag ends.
- Aﬀymetrix GNF probe sequences to transcripts without allowing for mismatches.
Jittima Piriyapongsa and I King Jordan. Dual coding of siRNAs and miRNAs by plant transposable elements. RNA, 14(5):814–821, 2008
In this work Vmatch was used to search small RNA signatures in entire miRNA gene sequences for Arabidopsis and rice.
R. J. Taft, E. A. Glazov, T. Lassmann, Y. Hayashizaki, P. Carninci, and J. S. Mattick. Small RNAs derived from snoRNAs. RNA, 15(7):1233–1240, Jul 2009
In this work Vmatch was used to map small RNA data sets onto the corresponding reference genomes for diﬀerent model organisms.
C. Plessy, G. Pascarella, N. Bertin, A. Akalin, C. Carrieri, A. Vassalli, D. Lazarevic, J. Severin, C. Vlachouli, R. Simone, et al. Promoter architecture of mouse olfactory receptor genes. Genome research, 22(3):486–497, 2012
In this work Vmatch was used for mapping Illumina reads to the mouse genome.
Nathan J Kenny and Sebastian M Shimeld. Additive multiple k-mer transcriptome of the keelworm Pomatoceros lamarckii (Annelida; Serpulidae) reveals annelid trochophore transcription factor cassette. Development Genes and Evolution, 222(6):325–339, 2012
In this work Vmatch was used for redundancy removal in the context of transcriptome assembly of a keelworm species.
Cene Gostinčar, Robin A Ohm, Tina Kogej, Silva Sonjak, Martina Turk, Janja Zajc, Polona Zalar, Martin Grube, Hui Sun, James Han, et al. Genome sequencing of four Aureobasidium pullulans varieties: biotechnological potential, stress tolerance, and description of new species. BMC Genomics, 15(1):549, 2014
In this work Vmatch was used to remove redundant contigs in a genome project of four Aureobasidium pullulans varieties.
M. McMullan, A. Gardiner, K. Bailey, E. Kemen, B. J. Ward, V. Cevik, A. Robert-Seilaniantz, T. Schultz-Larsen, A. Balmuth, E. Holub, et al. Evidence for suppression of immunity as a driver for genomic introgressions and host range expansion in races of Albugo candida, a generalist parasite. eLife, 4:e04550, 2015
In this work Vmatch was used for merging assemblies of Illumina sequenced cDNA.
C Morandin, K Dhaygude, J Paviala, K Trontti, C Wheat, and H Helanterä. Caste-biases in gene expression are speciﬁc to developmental stage in the ant Formica exsecta. Journal of Evolutionary Biology, 28(9):1705–1718, 2015
In this work Vmatch was used to combine and scaﬀold contigs.

Total number of usages: 108

Availability

Vmatch is available for download in executable form for the following platforms:

Linux
Mac OS X
MS Windows

Developer

Vmatch has been developed since May 2000 by Stefan Kurtz, a professor of Computer Science at the Center for Bioinformatics, University of Hamburg, Germany.

Important Documents

The Vmatch-manual