The Structure and Function of the Expressed Portion of the Wheat Genomes

Distribution of research investigators by objective Obj. 6Genomestructure &evolution Objectives and coordination structure Obj. 1cDNAlibraries Obj. 2cDNAsequencing Obj. 4Functionalgenomics Obj. 5Bioinfor-matics The Structure and Functionof the Expressed Portionof the Wheat Genomes Obj. 3Mapping Investigator OD Anderson X* X* X X* PI & Project Coordinator: Calvin QualsetProject Manager: Patrick McGuire UCDavis/ARS TJ Close X X UCRiverside Objectives 1 and 2.EST ProductionCoordinator:Olin Anderson Objective 5.BioinformaticsCoordinator:Olin Anderson HT Nguyen X X X Texas Tech U BS Gill X* X X EST Arrays Kansas State U DeletionMapping ME Sorrells X X* X SAGE Cornell U ComparativeMapping J Dvořák X X X* SequenceMatching UCDavis Obj. 5. Bioinformatics J Dubcovsky X X Obj. 4.FunctionalGenomics Obj. 3.Mapping UCDavis Objs. 1 & 2.EST Production KS Gill X X U Nebraska JP Gustafson cDNA librariesScreening/normalizationsSequencingData analysisDNA storage/distribution X X U Mo/ARS Objective 3.MappingCoordinator:Bikram Gill SF Kianian X X N Dak State U Objective 4.Functional GenomicsCoordinator:Mark Sorrells JA Anderson X U Minn NLV Lapitan X Colo State U Obj. 6.GenomeStructure &Evolution K Walker-Simmons X Wash State U/ARS * designates coordinator for the corresponding objective. Introduction ProjectDBI-9975989 Objective 6.Genome Structure & EvolutionCoordinator:Jan Dvořák Objectives, approaches, and status Training The project’s goal is to generate and map a large number (target is 10,000) of unique DNA sequen-ces from the genomes of bread wheat. The assumption is that these unique DNA sequences will correspond to individual genes of wheat and their identification is a first step in determining the function of these genes. The ultimate use of this information is the improvement of wheat quality, yield, and adaptability to new and marginal environ-ments, thus increasing production. Wheat is the most widely grown crop and the third in economic significance for the United States. The US is the largest wheat exporter in the world and, to maintain this market, continuous genetic improve-ment of the crop is required. Recent advances in plant genetics and genomics offer unprecedented opportunities for discovering the function of genes and potential for their manip-ulation for crop improvement. Because of the large size of the wheat genomes, it is unlikely that the actual base-pair sequences of the DNA molecules will be learned completely in the near future. This project takes an alternative strategy to realize the benefits of new techniques for discovering genes and learning their function (functional genomics). Following the identification of 10,000 unique wheat DNA sequences (termed ESTs, Expressed Sequence Tags), they will be mapped to their physical location on the chromosomes of wheat using a set of terminal deletion stocks. By the end of the mapping component of this project, a most valuable tool will have been produced: 10,000 unique DNA sequences, likely corresponding to genes, whose physical locations in the chromo-somes of wheat are known. This sets the stage for the next phase, the analysis of this array of mapped ESTs to determine function. The project will focus on characteristics of the wheat reproductive stages, from flowering signals through seed development and dormancy. The information gathered on the sequence, function, and position of these genes in the wheat chromosomes will be publicly available and will be distributed by means of the website created for this project. Because of the close relationship of wheat to other species in the Triticeae tribe and other grass species, especially corn and rice, the results from this project will be immediately applicable to other crops in the Triticeae. Most of the collaborating investigators are already collaborating members of the International Triticeae Mapping Initiative which has produced molecular genetic maps of the chromosomes of wheat and related species. The diversity of experimental techniques and traits pursued in the individual laboratories collaborating on this project will be an ideal training ground for graduate students and postdoctoral scientists. The large pool of well-characterized and mapped unique DNA sequences, available in the public domain will be an exceedingly important resource for future Triticeae research and basic functional genomics research. Among the protocols developed in Albany is the performance of large-scale EST data processing and analysis. It involves using computer programs readily available from other public institutions and scripts written in-house. Briefly, as raw sequence data are generated from the sequencer, phred, a base-calling program from the University of Wash-ington at Seattle, is used to convert the raw data file into fasta, quality scores and histogram files. Any read with a phred-quality score of less than 20 (>1% error rate) is considered ambiguous and is trimmed off from both ends of a sequence. Search of the vector sequence is then carried out and then removed from each sequence followed by manual sequence editing. A set of guidelines has been developed to ensure the consistency of the editing process. The processed 5' EST data are then uploaded into the project website at http://wheat.pw. usda.gov/NSF/data for public access. The raw sequence data and trace files are also available publicly. The number of processed 5' ESTs is normally updated and reported twice a week using a thermometer graphic to indicate current progress at http://wheat.pw.usda.gov/NSF/progress.html. Large-scale EST assembly analysis was done using a computer program called phrap from the University of Washington at Seattle. The number of singleton candidates gathered both from within-library comparisons and from overall among-library comparisons are updated and reported weekly and can be found at http://wheat.pw.usda.gov/NSF/ libraries.html. With respect to bioinformatics personnel, a Data Curator (S. Chao) has been in place since January and a very promising candidate has been selected and offered the position of Bioinformatics Program-mer initially to evaluate available software for EST sequence data distribution, display, and analysis and implementation of the best procedures for the project. 1. To produce cDNA libraries from as many tissue and condition combinations as possible. templates. Final steps will include preparative PCR to ensure that sufficient amounts of each substan-tiated product are available both for distribution and archiving purposes. PCR products distributed to the mapping labs are in a 96-well plate format, which includes four cDNA clones which have already been mapped as con-trols. At the moment, 96-well plates have been dis-tributed to most of the nine mapping labs to carry out the pilot deletion mapping study. Following this release, the system will be geared up to allow high-throughput selection of representative clones from contigs and candidate singletons which match NCBI database entries. (2) Deletion stocks and mapping labs. Each of the nine mapping labs is now staffed and equipped, ready to begin high-throughput deletion mapping. All have a common set of deletion lines with a standardized order including controls for the map-ping blots. A procedure for reporting and presenting the mapping data is in place. The mapping process will be further revised and improved as results emerge from the initial mapping studies. ling. In order to further understand if a certain library contains higher number of sequences that are not present in any other libraries, thus a good source for gene discovery, EST assembly analysis was carried out among libraries. This analysis has indi-cated that among the over-35,000 ESTs generated so far, 18,000 are singleton candidates. More analysis is underway to characterize and identify unique gene sequences among them. Currently over 35,000 processed 5' ESTs are avail-able on the project website for public access (http://wheat.pw.usda.gov/NSF/data). All the ESTs have been submitted to GenBank, a national DNA sequence database, to provide the science com-munity with a resource for sequence search and retrieval. Approach: Produce multiple cDNA libraries from mRNAs isolated in several labs. Our aim is a target of 30 total libraries after quality testing, normalizing and subtraction to reduce redundancy; and screen-ing to maximize relative abundance of different sequences. Status: Thirty cDNA libraries are now available to the project. Twenty were made at T. Close’s lab at the University of California, Riverside, four were from H. Nguyen’s lab at Texas Tech University (including two normalized libraries and two subtrac-ted libraries), and six were from other sources. The tissue source used for cDNA library construction included spikes sampled at various developmental stages, anther, embryo, endosperm, young seed-ling, root, crown, and flag leaf and sheath. The tissues were sampled under various treatments, such as drought stress, cold stress, salt stress, aluminum stress, ABA treatment, and vernalization. (A list of cDNA libraries constructed at UC-Riverside is available as a hand-out.) The two normalized libraries were made using etio-lated shoot and root cDNA libraries as the starting materials. The two subtracted libraries are cold-stressed and drought-stressed libraries subtracted against an equal mix of nonstressed shoot and root cDNAs. 3. To map into wheat deletion stocks a set of 10,000 unique ESTs. Approach: Map EST singletons into bins defined by wheat deletion stocks; target is 10,000 mapped singletons. Status: (1) Singleton identification pilot study was initiated in early June 2000. The first candidate pool in-cluded all NSF-supported hexaploid wheat ESTs, as well as ESTs derived from O. Anderson’s endosperm library. These sequences were assem-bled first. Out of 12,637 reads submitted, 2189 contigs and 5445 singletons emerged. The single-tons were run in a BLAST against the nonredundant EST database at NCBI. While most ESTs matched existing entries (with varying levels of confidence), approximately 800 fell out with no match. From this set of 800 previously uncharacterized ESTs, 96 were selected from each of four different cDNA libraries to act as a pilot group. The plasmid clones were re-arrayed from their original bacterial culture archive plates into a new master plate. They were grown in 96-well deep block plates and plasmids were isolated. The first analysis step consisted of sequencing the plasmids from the 5' end to confirm their identities as present in the database. Sequences of record were assem-bled with newly generated sequences, and the reported contigs reviewed for inconsistencies. All candidates were subsequently sequenced from the 3' end to identify redundant clones; when these 3' sequences were assembled, only 13 formed contigs. The possibility that the candidate singletons might include transposon-related and other multi-copy elements which could be problematic in mapping has been addressed. Radiolabeled total wheat DNA was hybridized to nylon membranes grids of the bacterial clone DNAs. Six out of 384 gave detec-table signal and were removed from consideration. A generic PCR protocol was developed and run on the plasmid templates to obtain inserts. As a quality control measure, all PCR products were sequen-ced; phrap assembly will be used to confirm fidelity of PCR products with their parent plasmid Table 1. Evaluation of cDNA library quality and the extent of library redundancy. 4. To determine the functional activity of the mapped ESTs relevant to reproductive biology of wheat. Approach: Analyze with respect to function the arrays and glass-slide microarrays of the mapped EST singletons in 10 labs focusing on five aspects of wheat reproduction. 2. To determine the base-pair sequence of these cDNAs, yielding ESTs. Approach: Carry out in-house, single-site 5' sequencing of approx. 3000 clones in each of the 30 libraries, with 3' sequencing of putative single-tons and quality control at all stages. Status: Ten of the project’s labs will be involved in this objective. Strategies and technologies for microarray production and analysis are being studied in anticipation of the availability of mapped singleton ESTs. The individual labs are producing relevant cDNA libraries for challenging arrays with respect to the project’s target of the five stages of wheat reproductive biology. Status: Sequencing has been carried out at O. Anderson’s lab, Albany CA. 5'-sequenced ESTs have been generated from twenty-two of the libraries (Table 1). Initially, about 1,000 ESTs from each library were sequenced in order to evaluate library quality and complexity. Libraries with good quality and high complexity were then targeted for deep sequencing. Library quality was evaluated based on factors including (1) number of empty clones or clones con-taining vector sequence or short adaptor sequence only, (2) number of clones containing ribosomal RNA sequence contamination, (3) number of clones with reversed orientation (most of the libraries were made with cDNAs cloned in the fixed direction). Library complexity was evaluated based on the level of clone redundancy using the method of comparing all the 5' ESTs within each library. ESTs are considered redundant if they show a degree of similarity and overlapping with other ESTs. These ESTs can be grouped and assembled together into a contig. Representatives from each contig and those ESTs not forming contigs are singleton can-didates. Those libraries exhibiting the highest pro-portions of singleton candidates are considered to be of higher complexity, thus worth extensive samp- 6. To analyze gene density and distribution of mapped ESTs and thus genes in the wheat genomes (genome structure and evolution). Approach: Analyze densities and distribution of related genes determined from deletion map locations combined with functionality. 5. To process, analyze, and display data accumulated in this project (bioinformatics). Status:No activity for this objective is expected until sufficient data accumulate in the third and fourth years. Approach: Develop and enhance means to ana-lyze, interpret, and visualize project data (data pro-cessing, database modifications, and web page maintenance). Status:A web site dedicated to this project was established at http://wheat.pw.usda.gov/NSF. Its purpose is to disseminate to the science community such information as a detailed project description, lab protocols, cDNA library information, progress, and a roster of all the personnel involved in the work. The site also provides the general public with links to various education sites. In addition to this site, a password-protected site has been created which provides project members with a forum for discussion of strategies, techniques, and data as well as detailed information about material sharing and transfer. In addition to the work experience each postdoc and graduate student hired by the various investiga-tors in the project will get in their individual labs, lab rotations will be scheduled to bring these workers to other labs where more specialized activ-ity may be occurring. For example, in year one, the T. Close lab (UC-Riverside) was responsible for the bulk of the cDNA library production from tissue and mRNAs contributed by other investigators. For the period July 19 to August 3, 2000, Dr. Close held a training session in cDNA library production that was attended by postdocs and graduate students from four investigator labs. A similar opportunity in se-quencing will be offered by the O. Anderson lab in the coming year.

The Structure and Function of the Expressed Portion of the Wheat Genomes

The Structure and Function of the Expressed Portion of the Wheat Genomes

Presentation Transcript

Structure and Function of the Kidney

Structure and Function of the Eye

The Structure and Function of Macromolecules

The Structure and Function of Macromolecules

The Structure and Function of Synapses

The Structure and Function of Macromolecules

The Structure and Function of the CELL!

The Structure and Function of Macromolecules

The Structure and Function of the Expressed Portion of the Wheat Genomes

Structure and Function of the Heart

Structure and Function of the Cell

The Structure and Function of Blood

Structure and Function of the Cell

The Structure and Function of DNA

Structure and Function of the Eye

The Structure and Function of Macromolecules

Structure and Function of the Flower

Structure and Function of the Cell

THE STRUCTURE AND FUNCTION OF MACROMOLECULES

Structure and Function of the Heart

Structure and Function of the Cell

The Structure and Function of Macromolecules