Aspergillus oryzae RIB40 (= NBRC 100959), known in Japanese as 'Koji-kin', is one of the filamentous fungi most widely used in the Japanese fermentation industry. It is used to produce sake, 'miso' (soybean paste), 'shoyu' (soy sauce) and other foods, and has been used safely for more than 1,000 years; hence it is often called the 'national fungus'. It can be used for large-scale production of enzymes and other proteins, and it is regarded as an ideal host for producing active proteins of eukaryotic origin that cannot be obtained with E. coli.
The 37-megabase (Mb) genome of A. oryzae contains 12,074 genes and is 7 to 9 Mb larger than the genomes of Aspergillus nidulans and Aspergillus fumigatus. A comparison of the three Aspergillus species revealed that syntenic blocks and A. oryzae-specific blocks (lacking synteny with A. nidulans and A. fumigatus) are arranged in a mosaic pattern throughout the A. oryzae genome. The A. oryzae-specific blocks are enriched for genes involved in metabolism, particularly in the synthesis of secondary metabolites. The specific expansion of genes for secreted hydrolytic enzymes, amino acid metabolism, and amino acid/sugar uptake transporters supports the view that A. oryzae is an ideal microorganism for fermentation.
2014-01-07
Important notice and apology: BLAST search against the Aspergillus oryzae RIB40 genome
An error was found in the BLAST search service. Users who ran BLAST searches against the Aspergillus oryzae RIB40 genome between May 2012 and January 2014 may have obtained incorrect results. The error has been fixed, and we sincerely apologize to all DOGAN users.
The genome analysis of Aspergillus oryzae RIB40 was carried out in collaboration with the National Institute of Advanced Industrial Science and Technology (AIST) and other members of the A. oryzae Genome Analysis Consortium, as described below. A complete list of the consortium members is given at the bottom of this page.
Construction of a whole genome shotgun library
The A. oryzae RIB40 nuclear DNA was randomly sheared into fragments of approximately 2 kb using a HydroShear (GeneMachines), blunt-ended with BAL31 nuclease (Takara) and Klenow fragment (Takara), and ligated to adaptors prepared by annealing 5'-CGAGAGCGGCCGCTAC-3' and 3'-CTCGCCGGCGATG-5'. The vector pUC19 was digested with the SalI restriction endonuclease, its 3' termini were filled in with dT, and it was then treated with calf intestine alkaline phosphatase (Takara) and ligated with the adaptor-linked fragments at 16 °C overnight. The ligation mixture was used to transform DH10B cells (Invitrogen) by electroporation, yielding a whole genome shotgun (WGS) library.
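The adaptor design can be checked with a short Python sketch. It assumes the first oligo is written 5' to 3' and the second 3' to 5', as the prime marks suggest, and it models the vector end as a SalI overhang (5'-TCGA) partially filled in with dTTP only. Reading the protocol as a half-site fill-in strategy is an interpretation, not something the text states, and the helper names are hypothetical.

```python
# Sketch: check the shotgun-library adaptor design described above.
# Assumption: the second oligo is written 3'->5' and the vector end is a
# SalI overhang partially filled in with dTTP only (half-site fill-in).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a 5'->3' DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def five_prime_overhang(top: str, bottom: str) -> str:
    """5' overhang of `top` when `bottom` (5'->3') pairs flush with top's 3' end."""
    paired = revcomp(bottom)  # top-strand sequence that `bottom` base-pairs with
    if not top.endswith(paired):
        raise ValueError("oligos do not anneal flush at top's 3' end")
    return top[: len(top) - len(paired)]


def fill_in(overhang: str, dntps: set) -> str:
    """Remaining 5' overhang after fill-in with only the given dNTPs.

    The recessed 3' end is extended opposite the overhang bases nearest the
    duplex (the overhang's 3' end) until a required dNTP is missing.
    """
    remaining = overhang
    while remaining and remaining[-1].translate(COMPLEMENT) in dntps:
        remaining = remaining[:-1]
    return remaining


top = "CGAGAGCGGCCGCTAC"          # written 5'->3' in the text
bottom = "CTCGCCGGCGATG"[::-1]    # written 3'->5' in the text; reversed to 5'->3'

insert_end = five_prime_overhang(top, bottom)  # overhang on adaptor-linked fragments
vector_end = fill_in("TCGA", {"T"})            # SalI (G^TCGAC) overhang after dT fill-in

print("insert overhang 5'-" + insert_end)      # CGA
print("vector overhang 5'-" + vector_end)      # TCG
print("ends compatible:", revcomp(insert_end) == vector_end)
print("NotI site in adaptor:", "GCGGCCGC" in top)
```

Under these assumptions the annealed adaptor leaves a 5'-CGA overhang that pairs with the 5'-TCG overhang left on the vector, and neither overhang is self-complementary, which would discourage insert-insert and vector-vector ligation. The adaptor also carries a NotI site (GCGGCCGC).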
Templates for sequencing were prepared from the WGS library clones by PCR amplification of the inserts directly from bacterial colonies. Both ends of each insert were sequenced on ABI PRISM 3700 or 3730xl capillary DNA sequencers (Applied Biosystems) to accumulate high-quality sequence reads.
Assembly of sequence data
Raw sequence reads were assembled with the Phred/Phrap software. Only the high-quality segment of each read, with Phred quality scores above 20, was used for assembly.
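A Phred score Q encodes an estimated base-call error probability P through Q = -10 log10(P), so a cutoff of Q > 20 keeps bases with an estimated error rate below 1%. The sketch below shows that conversion and a simplified trimming rule that keeps the longest run of bases above the cutoff. phred itself trims reads with its own algorithm, so this is an illustration rather than a reproduction of the pipeline, and the function names are hypothetical.

```python
import math


def phred_to_error(q: float) -> float:
    """Estimated base-call error probability for Phred quality q: P = 10^(-q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred quality for error probability p: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


def longest_high_quality_segment(quals, threshold=20):
    """(start, end) of the longest run of bases with quality > threshold.

    `end` is exclusive; (0, 0) means no base passed. A simplified stand-in
    for read trimming, not phred's own trimming algorithm.
    """
    best = (0, 0)
    start = None
    for i, q in enumerate(list(quals) + [float("-inf")]):  # sentinel ends the last run
        if q > threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


quals = [10, 25, 30, 30, 15, 40, 40, 40, 40, 5]
print(phred_to_error(20))                   # about 0.01, i.e. 1% error
print(longest_high_quality_segment(quals))  # (5, 9)
```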
Construction of genomic scaffolds
Cosmid and bacterial artificial chromosome (BAC) clones were sequenced from both ends of their inserts to validate the sequence assembly and to link WGS contigs. A cosmid library was prepared using SuperCos I (Stratagene) according to the manufacturer's instructions. A BAC library with an average insert length of 81 kb was prepared by Macrogen (Seoul, Korea).
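End sequences from cosmid and BAC clones link contigs when the two ends of one clone fall in different contigs. A minimal Python sketch of that idea, assuming end reads have already been placed on contigs, counts how many clones support each contig pair and merges contigs joined by well-supported links. The support threshold of 2 is an arbitrary assumption, and real scaffolders also use orientation and insert-size constraints that are omitted here.

```python
from collections import Counter


def link_contigs(end_pairs, min_support=2):
    """Group contigs into scaffolds using clone end-pair placements.

    end_pairs: (contig_a, contig_b) for each clone whose two end reads were
    placed on contigs. A contig pair is accepted as a link when at least
    `min_support` clones connect it; linked contigs are merged with
    union-find. Orientation and gap size are ignored in this sketch.
    """
    support = Counter(tuple(sorted(p)) for p in end_pairs if p[0] != p[1])
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), n in support.items():
        root_a, root_b = find(a), find(b)  # also registers both contigs
        if n >= min_support:
            parent[root_a] = root_b
    groups = {}
    for contig in list(parent):
        groups.setdefault(find(contig), set()).add(contig)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


pairs = [("c1", "c2"), ("c2", "c1"), ("c2", "c3"), ("c3", "c2"), ("c4", "c5")]
print(link_contigs(pairs))  # [['c1', 'c2', 'c3'], ['c4'], ['c5']]
```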
Long sequence gaps between contigs were closed by sequencing the corresponding cosmid clones, or DNA fragments amplified by PCR using A. oryzae genomic DNA as the template and primers placed at the ends of the contigs flanking each gap. Short sequence gaps were closed by sequencing appropriate bridging WGS clones, where available.
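Primers for PCR-based gap closure point into the gap from both flanking contigs. The sketch assumes both contigs are given in the same orientation; it takes the forward primer from near the 3' end of the left contig and the reverse primer as the reverse complement of a segment near the 5' end of the right contig. The primer length, the distance from the contig end and the Wallace-rule melting temperature estimate (a rough guide for short oligos only) are illustrative choices, not the consortium's design parameters, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a 5'->3' DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C) for short oligos: 2(A+T) + 4(G+C)."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def gap_primers(left: str, right: str, length: int = 20, offset: int = 50):
    """Primers pointing into the gap between two contigs in the same orientation.

    Forward primer: `length` nt ending `offset` nt before the left contig's 3' end.
    Reverse primer: reverse complement of `length` nt starting `offset` nt
    into the right contig.
    """
    if length + offset > min(len(left), len(right)):
        raise ValueError("contig too short for the requested primer placement")
    fwd = left[len(left) - offset - length : len(left) - offset]
    rev = revcomp(right[offset : offset + length])
    return fwd, rev


fwd, rev = gap_primers("GGGGCCATTT", "AAGACCTTT", length=4, offset=2)
print(fwd, rev, wallace_tm(fwd))  # CCAT GGTC 12
```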
The A. oryzae genome contains various types of long repeat sequences dispersed throughout the genome. To determine the nucleotide sequence of each copy of these repeats, cosmid or BAC clones covering individual repeat units were selected, and a shotgun library was constructed from each clone and sequenced. The genome also contains many homopolymeric sequences, short tandem repeats, and sequences with strongly biased base composition; sequencing reaction conditions were optimized for each type of such hard-to-read sequence.
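The hard-to-read sequence classes named above can be flagged with simple scans. This Python sketch uses regular expressions for homopolymers and short tandem repeats and non-overlapping windows for GC bias; all thresholds (run length, copy number, window size, GC limits) are assumptions chosen for illustration, and the function names are hypothetical.

```python
import re


def homopolymer_runs(seq: str, min_len: int = 8):
    """(start, base, length) for single-base runs of at least min_len."""
    pattern = r"([ACGT])\1{%d,}" % (min_len - 1)
    return [(m.start(), m.group(1), len(m.group(0))) for m in re.finditer(pattern, seq)]


def tandem_repeats(seq: str, min_copies: int = 5):
    """(start, motif, copies) for 2-6 nt motifs repeated back to back.

    The lazy {2,6}? tries the shortest motif first at each position; long
    homopolymers will also be reported here as dinucleotide repeats.
    """
    pattern = r"([ACGT]{2,6}?)\1{%d,}" % (min_copies - 1)
    return [(m.start(), m.group(1), len(m.group(0)) // len(m.group(1)))
            for m in re.finditer(pattern, seq)]


def biased_windows(seq: str, window: int = 100, low: float = 0.25, high: float = 0.75):
    """(start, GC fraction) for non-overlapping windows outside [low, high]."""
    hits = []
    for start in range(0, len(seq) - window + 1, window):
        w = seq[start:start + window]
        gc = (w.count("G") + w.count("C")) / window
        if gc < low or gc > high:
            hits.append((start, gc))
    return hits
```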
Mapping scaffolds to chromosomes
Intact A. oryzae chromosomal DNA was prepared from protoplasts and separated by electrophoresis using a CHEF-Mapper (Bio-Rad) apparatus. Scaffolds were assigned to chromosomes by hybridization to the separated chromosomes, using probes prepared from regions near each end and the middle of each scaffold. The relative positions of the scaffolds were determined by fingerprinting: probes prepared near the ends of each scaffold were used for Southern hybridization analysis of A. oryzae genomic DNA digested with NotI, PmeI, PacI, BamHI, EcoRI, KpnI, SacI, SalI, SphI or XbaI.
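Fingerprinting compares the size of the band detected by an end probe with the fragment sizes expected from the sequence. The sketch below performs an in silico digest of a linear sequence with the ten enzymes listed above, using their standard recognition sites and top-strand cut positions, and reports the fragment that would carry a probe at a given position. It ignores methylation, partial digestion and gel resolution, and the function names are hypothetical.

```python
import re

# Recognition sequence and top-strand cut position (bases after the site start)
# for the enzymes named above; all ten sites are palindromic.
ENZYMES = {
    "NotI": ("GCGGCCGC", 2), "PmeI": ("GTTTAAAC", 4), "PacI": ("TTAATTAA", 5),
    "BamHI": ("GGATCC", 1), "EcoRI": ("GAATTC", 1), "KpnI": ("GGTACC", 5),
    "SacI": ("GAGCTC", 5), "SalI": ("GTCGAC", 1), "SphI": ("GCATGC", 5),
    "XbaI": ("TCTAGA", 1),
}


def digest(seq: str, enzyme: str):
    """Fragment lengths from a complete digest of a linear sequence.

    Palindromic sites read the same on both strands, so scanning the top
    strand finds every site. The lookahead lets overlapping sites match.
    """
    site, cut = ENZYMES[enzyme]
    cuts = [m.start() + cut for m in re.finditer("(?=%s)" % site, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def probe_fragment(seq: str, enzyme: str, probe_pos: int) -> int:
    """Length of the digest fragment that contains position probe_pos."""
    start = 0
    for length in digest(seq, enzyme):
        if probe_pos < start + length:
            return length
        start += length
    raise ValueError("probe position lies outside the sequence")


seq = "AAAA" + "GAATTC" + "T" * 8 + "GAATTC" + "CC"
print(digest(seq, "EcoRI"))              # [5, 14, 7]
print(probe_fragment(seq, "EcoRI", 10))  # 14
```

For a scaffold, the first and last predicted fragments end at the scaffold boundary rather than at a real cut site, so an end probe that detects a band longer than the predicted terminal fragment suggests that the fragment continues into a neighbouring scaffold.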
The sequence assembly and genomic scaffolds were further validated by Optical Mapping (OpGen) with the AflII restriction endonuclease. Within the resolution of the method, no discrepancy was found between the optical maps and the simulated AflII restriction patterns of the A. oryzae chromosomes.
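A comparison of this kind can be sketched as follows: simulate the AflII digest (C^TTAAG) of a sequence and compare ordered fragment sizes from an optical map with the simulated ones. The 10% relative tolerance and the 2 kb lower size limit stand in for the method's resolution and are assumptions; OpGen's own map-alignment software works differently, and the function names are hypothetical.

```python
import re

AFLII_SITE, AFLII_CUT = "CTTAAG", 1  # AflII cuts C^TTAAG


def aflii_fragments_kb(seq: str):
    """Simulated AflII fragment sizes (kb) for a linear sequence."""
    cuts = [m.start() + AFLII_CUT for m in re.finditer("(?=%s)" % AFLII_SITE, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [(b - a) / 1000 for a, b in zip(bounds, bounds[1:])]


def maps_agree(observed_kb, simulated_kb, rel_tol=0.10, min_kb=2.0):
    """True if ordered fragment sizes agree within rel_tol.

    Fragments shorter than min_kb are dropped from both lists as a crude
    stand-in for the method's resolution limit (an assumption); this can
    misalign the lists when a small fragment is seen in only one of them.
    """
    obs = [x for x in observed_kb if x >= min_kb]
    sim = [x for x in simulated_kb if x >= min_kb]
    return len(obs) == len(sim) and all(abs(o - s) <= rel_tol * s for o, s in zip(obs, sim))
```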
Gene assignment and annotation
Genes in the A. oryzae genome were predicted on the basis of homology to known genes in public databases, ESTs from A. oryzae and A. flavus, and the statistical features of genes, using a combination of three gene-finding tools: ALN, GlimmerM and GeneDecoder. Candidate homologues of known fungal proteins were evaluated with ALN, which predicts precise gene structures by aligning the protein sequences of BLAST hits with the genomic sequence. ALN takes into account frameshift errors, coding potential, and signals for translation initiation, termination and splicing. Of the 6,586 genes predicted by ALN, 489 highly reliable genes were selected as a training set for GeneDecoder and GlimmerM, which predict genes from their statistical features. GeneDecoder also integrates splice-site information from the ESTs, which were aligned to the genome sequence with SIM4.
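EST alignments reveal splice sites because gaps between aligned EST blocks on the genome correspond to introns. The Python sketch below assumes the alignment (for example from SIM4) has already been converted to sorted genomic block coordinates, derives candidate introns, and checks them against the canonical GT...AG rule on the forward strand. The minimum intron length used to ignore small alignment gaps is an assumption, GeneDecoder's own use of this information is not shown, and the function names are hypothetical.

```python
def introns_from_blocks(blocks, min_intron=20):
    """Candidate introns from sorted EST-to-genome aligned blocks.

    blocks: [(start, end), ...] in genome coordinates, 0-based, end-exclusive.
    Gaps shorter than min_intron are treated as alignment gaps, not introns.
    """
    introns = []
    for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:]):
        if next_start - prev_end >= min_intron:
            introns.append((prev_end, next_start))
    return introns


def is_canonical(genome: str, intron) -> bool:
    """True if the intron begins with GT and ends with AG (forward strand)."""
    start, end = intron
    return genome[start:start + 2] == "GT" and genome[end - 2:end] == "AG"


genome = "ATGAAA" + "GTAAGT" + "T" * 12 + "AG" + "CCCTAA"
introns = introns_from_blocks([(0, 6), (26, 32)])
print(introns, [is_canonical(genome, i) for i in introns])  # [(6, 26)] [True]
```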
All predicted protein-coding genes were annotated by BLASTP searches against the KOG database, followed by manual curation.
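A BLASTP search produces many hits per query, so an annotation pipeline typically reduces them to a best hit per gene before manual curation. The sketch assumes BLAST tabular output with the 12 standard columns (BLAST+ `-outfmt 6`, or legacy `-m 8`) and an E-value cutoff of 1e-5, which is an illustrative choice. Mapping subject sequences to KOG groups is a separate step not shown, and the identifiers in the example data are hypothetical.

```python
import csv
import io

# The 12 default columns of BLAST tabular output (BLAST+ -outfmt 6, legacy -m 8).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text: str, max_evalue: float = 1e-5):
    """Best-scoring subject per query, keeping only hits with E-value <= max_evalue."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        rec = dict(zip(FIELDS, row))
        evalue, bits = float(rec["evalue"]), float(rec["bitscore"])
        if evalue > max_evalue:
            continue
        query = rec["qseqid"]
        if query not in best or bits > best[query][1]:
            best[query] = (rec["sseqid"], bits, evalue)
    return best


# Hypothetical gene and subject identifiers.
example = (
    "gene1\tsubjA\t45.0\t200\t100\t2\t1\t200\t5\t204\t1e-40\t150.0\n"
    "gene1\tsubjB\t30.0\t150\t100\t3\t1\t150\t1\t150\t1e-10\t60.0\n"
    "gene2\tsubjC\t25.0\t80\t60\t1\t1\t80\t1\t80\t0.5\t30.0\n"
)
print(best_hits(example))  # {'gene1': ('subjA', 150.0, 1e-40)}
```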
Transfer RNA genes were identified using tRNAscan-SE.
Repeat sequences were detected using RepeatMasker.
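When RepeatMasker is run with the `-xsmall` option, repeats are reported as lowercase (soft-masked) bases rather than replaced with N. Assuming output of that kind, the fraction of the sequence annotated as repeat can be estimated as below; the function name is hypothetical, and lowercase N bases are counted as masked.

```python
def masked_fraction(fasta_text: str) -> float:
    """Fraction of sequence letters that are lowercase (soft-masked) in FASTA text."""
    total = masked = 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            continue  # header line
        letters = [c for c in line.strip() if c.isalpha()]
        total += len(letters)
        masked += sum(c.islower() for c in letters)
    return masked / total if total else 0.0


print(masked_fraction(">chr1\nACGTacgt\nNNnn\n"))  # 0.5
```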
A list of the consortium members:
Brewing Society of Japan, Axiohelix Pvt. Ltd, Amano Enzyme Inc., INTEC Web and Genome Informatics Corporation, Ozeki Corporation, Kikkoman Corporation, Kyowa Hakko Kogyo Co. Ltd., Gekkeikan Sake Company Ltd., Higeta Shoyu Co. Ltd., Tohoku University, The University of Tokyo, Tokyo University of Agriculture and Technology, Nagoya University, National Research Institute of Brewing (NRIB), National Food Research Institute (NFRI), and National Institute of Advanced Industrial Science and Technology (AIST).