Release of the Medicinal Plant Consortium Transcriptome Resources

The Medicinal Plant Consortium (MPC) is an NIH-supported project (GM092521) consisting of 13 collaborating research units from 7 institutions. It is focused on providing transcriptomic and metabolomic resources for 14 key medicinal plants to the worldwide research community for the advancement of drug production and development.

The Principal Investigators of the MPC are Cornelius Barry (1), C. Robin Buell (8), Joe Chappell (2), Dean DellaPenna (1), Natalia Dudareva (3), Mahmoud A. ElSohly (4), A. Daniel Jones (1), Ikhlas A. Khan (4), Tom McKnight (5), Basil J. Nikolau (6), Sarah E. O’Connor (7), Troy Smillie (4), and Eve Syrkin Wurtele (6).

(1) Michigan State University, East Lansing, MI; (2) University of Kentucky, Lexington, KY; (3) Purdue University, West Lafayette, IN; (4) University of Mississippi, Oxford, MS; (5) Texas A&M University, College Station, TX; (6) Iowa State University, Ames, IA; (7) Massachusetts Institute of Technology, Cambridge, MA; (8) University of Georgia, Athens, GA

This announcement from the MPC pertains to the scheduled release of version 1 transcriptome data on March 21, 2011. The initial release will include the deposit of sequence read datasets in the NCBI Sequence Read Archive (SRA) and web access to a BLAST search portal.

Sequence reads are being deposited in the NCBI SRA for immediate release. However, because of the way deposits are processed at the NCBI SRA, direct access may take 4-6 weeks from March 21, 2011. The accession numbers for the sequence datasets for each species will be posted on the project website as they are provided by the NCBI SRA.

The currently available BLAST server window provides searching capability for the version 1 transcriptome assemblies of the following plant species:

  • Atropa belladonna sequence assembly (MSU v1)
  • Catharanthus roseus sequence assembly (MSU v1)
  • Digitalis purpurea sequence assembly (MSU v1)
  • Dioscorea villosa sequence assembly (MSU v1)
  • Echinacea purpurea sequence assembly (MSU v1)
  • Hoodia gordonii sequence assembly (MSU v1)
  • Hypericum perforatum sequence assembly (MSU v1)
  • Panax quinquefolius sequence assembly (MSU v1)
  • Rauvolfia serpentina sequence assembly (MSU v1)
  • Rosmarinus officinalis sequence assembly (MSU v1)
  • Valeriana officinalis sequence assembly (MSU v1)

Transcriptome resources for Camptotheca acuminata, Cannabis sativa, and Ginkgo biloba will be released as soon as they are completed and qualified, but no later than Aug. 31, 2011.

The version 1 transcriptome resources are derived from paired-end, Illumina-based sequencing of RNAs combined from multiple tissue samples (typically 5 tissues) of a single REFERENCE plant (see Table). The majority of the cDNA libraries were normalized to reduce the abundance of highly expressed transcripts. The cDNA libraries were then sequenced on an Illumina Genome Analyzer II, the derived sequences were cleaned (vector, adapter, low-quality, and short sequences removed), and the cleaned reads were assembled using Velvet (a short-read assembly program) and Oases (a program that uses the Velvet assemblies to identify transcript isoforms).
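As an illustration of the "low-quality" and "short" parts of the cleaning step only, the following is a minimal sketch and not the MPC pipeline. The function names, the length and quality thresholds, and the Phred offset of 33 are all assumptions made for the example; some Illumina pipelines of this period encoded qualities with an offset of 64, and vector and adapter removal are not shown.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Return the mean Phred score of an ASCII-encoded quality string.

    The offset of 33 is an assumption; pass offset=64 for older
    Illumina encodings.
    """
    return sum(ord(char) - offset for char in quality) / len(quality)


def keep_read(
    sequence: str,
    quality: str,
    min_length: int = 30,            # hypothetical threshold
    min_mean_quality: float = 20.0,  # hypothetical threshold
    offset: int = 33,
) -> bool:
    """Decide whether a read survives a simple length and quality filter."""
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    # Reject empty and short reads before computing the mean quality.
    if not sequence or len(sequence) < min_length:
        return False
    return mean_phred(quality, offset) >= min_mean_quality


if __name__ == "__main__":
    # A 40-base read with uniformly high quality passes the filter.
    print(keep_read("ACGT" * 10, "I" * 40))
    # A 4-base read is rejected for length alone.
    print(keep_read("ACGT", "IIII"))
```

A filter of this kind would be applied to each read, or to both mates of a pair, before assembly.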

The version 1 transcriptome resources may be queried using an NCBI-like BLAST search window (e.g. blastn, tblastx). For example, a tblastn search of the Digitalis purpurea transcriptome (v. 1) with the GenBank protein sequence of accession AAB08578 identifies several matches. Among these, “dpa_locus_1325_iso_2_len_1456_ver_1” is a contig or mRNA species (locus 1325) from Digitalis purpurea (dpa); it is isoform 2 of this locus and comprises 1,456 bases from assembly version 1.

Each assembled contig is labeled with a unique identifier that reflects the species, the locus, the specific isoform, the length of the assembly, and the version of the assembly. One may obtain the assembled DNA sequence for a locus by pasting the locus ID (e.g. dpa_locus_1325_iso_2_len_1456_ver_1) into the search window of the separate DNA sequence retrieval function.
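For users who handle many BLAST hits, the identifier can be split into its parts programmatically. The following is a minimal sketch that assumes every identifier follows the same pattern as the example above; the names ContigId and parse_contig_id are hypothetical and are not part of any MPC tool.

```python
import re
from typing import NamedTuple


class ContigId(NamedTuple):
    """Fields encoded in an identifier such as dpa_locus_1325_iso_2_len_1456_ver_1."""
    species: str
    locus: int
    isoform: int
    length: int
    version: int


# Pattern inferred from the single example in this announcement (an assumption).
_CONTIG_PATTERN = re.compile(
    r"^(?P<species>[A-Za-z]+)"
    r"_locus_(?P<locus>\d+)"
    r"_iso_(?P<isoform>\d+)"
    r"_len_(?P<length>\d+)"
    r"_ver_(?P<version>\d+)$"
)


def parse_contig_id(identifier: str) -> ContigId:
    """Split a contig identifier into species, locus, isoform, length, and version."""
    match = _CONTIG_PATTERN.match(identifier.strip())
    if match is None:
        raise ValueError(f"unrecognized contig identifier: {identifier!r}")
    return ContigId(
        species=match["species"],
        locus=int(match["locus"]),
        isoform=int(match["isoform"]),
        length=int(match["length"]),
        version=int(match["version"]),
    )


if __name__ == "__main__":
    contig = parse_contig_id("dpa_locus_1325_iso_2_len_1456_ver_1")
    print(contig.species, contig.locus, contig.isoform, contig.length, contig.version)
```

Grouping parsed identifiers by species and locus gives a quick view of how many isoforms Oases reported for each locus.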

Note that the isoforms represent distinct or unique mRNAs as determined by Oases. These may reflect genuine isoforms but may also represent alleles or errors in the DNA sequencing or assembly process. These are first-pass assemblies, and errors include gaps, insertions, deletions, frameshifts, and incorrect DNA base calls. While all reasonable efforts to capture mature mRNA transcripts have been applied, mRNAs that are not fully processed (i.e. containing introns) and small amounts of genomic or organellar DNA may appear in these transcriptomes as well. Therefore, one should independently confirm the sequence information (e.g. by independent reverse transcription-PCR) of any locus identified in these assembled transcriptomes.

Refinements to the assembled transcriptome resources are anticipated and will be released as version 2 as they become available and pass quality control assessments, but no later than Aug. 31. Along with the version 2 transcriptome assemblies, an expression matrix for each plant species, reflecting quantitative expression levels for each transcript in up to 20 different tissues or tissue treatments, will also be released.

Metabolome datasets for these same 14 medicinal plants are scheduled for release beginning in Summer 2011, with release to be completed by Aug. 31, 2011.

To ensure receipt of future MPC data release notices and announcements, please register using the Mailing List sign-up form on the project website.

Further inquiries may be directed to Joe Chappell.