
Rhodococcus erythropolis PR4 (= NBRC 100887)


About this Microorganism

Photo by Dr. Otoguro, Dr. Tamura (NBRC, NITE)

The genus Rhodococcus is a diverse group of bacteria found in many environments, from soil to seawater. They are Gram-positive, high-G+C, coryneform bacteria belonging to the order Actinomycetales. Many Rhodococcus strains show remarkable metabolic versatility, including the ability to degrade a variety of xenobiotic compounds such as polychlorinated biphenyls (PCBs). Some strains produce biosurfactants, and others are sources of useful enzymes such as phenylalanine dehydrogenase and endoglycosidases. Because of these characteristics, Rhodococcus bacteria are considered industrially important.

Rhodococcus erythropolis PR4 (= NBRC 100887) was isolated from the deep sea at a depth of 1,000 m south of Okinawa Island, Japan (Pacific Ocean). This strain can utilize C8 to C20 n-alkanes, alkylbenzenes, and pristane (2,6,10,14-tetramethylpentadecane) as sources of carbon and energy. It also produces large quantities of extracellular polysaccharides (EPSs), which are thought to play a crucial role in its tolerance to a variety of organic solvents.

The complete genome consists of one circular chromosome (6,516,310 bp; G+C content 62.31%), one linear plasmid (pREL1: 271,577 bp), and two circular plasmids (pREC1: 104,014 bp; pREC2: 3,637 bp). A total of 6,437 ORFs were predicted on the chromosome and the three plasmids. The chromosome and the linear plasmid encode many genes involved in alkane degradation. In addition, genes for the degradation of protocatechuic acid and catechol, intermediates in the catabolism of aromatic compounds, are clustered on the chromosome. The genome also contains a number of genes for secondary metabolism and EPS biosynthesis.
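As a quick consistency check, the replicon lengths listed above can be summed; the total should match the genome size given in the summary of the genomic data (6,895,538 bp). This is a minimal Python sketch; the dictionary and function names are illustrative only.

```python
# Replicon lengths (bp) for R. erythropolis PR4, as stated in the text.
REPLICONS = {
    "chromosome (circular)": 6_516_310,
    "pREL1 (linear plasmid)": 271_577,
    "pREC1 (circular plasmid)": 104_014,
    "pREC2 (circular plasmid)": 3_637,
}


def total_genome_size(replicons: dict[str, int]) -> int:
    """Sum the lengths of all replicons in bp."""
    return sum(replicons.values())


if __name__ == "__main__":
    total = total_genome_size(REPLICONS)
    print(f"Total: {total:,} bp")
    # Share of the genome carried by each replicon.
    for name, length in REPLICONS.items():
        print(f"{name:<26} {length:>10,} bp  ({100 * length / total:.2f} %)")
```

Running it prints "Total: 6,895,538 bp", followed by each replicon's share of the genome.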

This work was supported by the New Energy and Industrial Technology Development Organization (NEDO).

Project history

2009-05-09: The Rhodococcus erythropolis PR4 database was updated (EC numbers of several ORFs were changed).
List of ORFs updated in annotation
RER_36500 -

Summary of the genomic data

Genome size                           6,895,538 bp
G+C content                           62.31 %
Number of ORFs assigned               6,437
Percentage of the coding regions      91.41 %
Percentage of the intronic regions    0.00 %
Number of rRNA genes                  15
Number of tRNA genes                  54
Number of other features
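The coding percentage is, in general terms, the fraction of genome positions covered by at least one predicted gene. The sketch below shows one way to compute it from gene coordinates. It assumes (start, end) pairs in 1-based inclusive coordinates with start <= end on either strand, merges overlapping genes so that shared bases are counted only once, and ignores genes that wrap around the origin of a circular replicon. The source does not say whether RNA genes were included or how overlaps were handled, so the function names and toy coordinates are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent (start, end) intervals, 1-based inclusive."""
    merged: list[list[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def coding_percentage(genes, genome_size):
    """Percentage of genome positions covered by at least one gene."""
    covered = sum(end - start + 1 for start, end in merge_intervals(genes))
    return 100.0 * covered / genome_size


if __name__ == "__main__":
    toy_genes = [(1, 300), (250, 600), (801, 900)]  # hypothetical coordinates
    print(f"{coding_percentage(toy_genes, 1000):.2f} %")
```

In the toy example, the first two genes overlap and merge into positions 1 to 600, so 700 of 1,000 positions are coding and the script prints "70.00 %".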

General Procedure

The nucleotide sequence of the R. erythropolis PR4 genome was determined by the whole-genome shotgun sequencing method, as for other organisms analyzed at NITE-DOB.
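Shotgun sequencing depth is commonly planned with the Lander-Waterman model. For N reads of length L over a genome of size G, the coverage is c = N x L / G, and under a Poisson assumption a given base is missed by every read with probability e^(-c). The sketch below applies this idealized model to the genome size in the summary table and the approximately 10-fold coverage reported in the procedure. The 600-bp read length is an assumed value, not one from the source. Real assemblies have more gaps than the model predicts because of repeats and cloning bias, which is why the cosmid-based gap closing described below is still needed.

```python
import math

GENOME_SIZE = 6_895_538  # bp, from the summary table
TARGET_COVERAGE = 10.0   # approximate fold coverage reported for the assembly
READ_LENGTH = 600        # bp; hypothetical average read length (assumption)


def reads_needed(genome_size: int, coverage: float, read_length: int) -> int:
    """Smallest N such that N * L / G >= coverage."""
    return math.ceil(coverage * genome_size / read_length)


def uncovered_fraction(coverage: float) -> float:
    """Poisson probability that a given base is covered by no read: e^(-c)."""
    return math.exp(-coverage)


def expected_contigs(n_reads: int, coverage: float) -> float:
    """Idealized Lander-Waterman contig count, N * e^(-c), with no minimum
    overlap required; each contig end is followed by a gap."""
    return n_reads * math.exp(-coverage)


if __name__ == "__main__":
    n = reads_needed(GENOME_SIZE, TARGET_COVERAGE, READ_LENGTH)
    print(f"reads needed: {n:,}")
    print(f"expected uncovered bases: {uncovered_fraction(TARGET_COVERAGE) * GENOME_SIZE:.0f}")
    print(f"expected contigs (gaps): {expected_contigs(n, TARGET_COVERAGE):.1f}")
```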

General Procedure
  • DNA shotgun library
    A shotgun library with 2-5 kb inserts in the pUC118 vector (TAKARA) was constructed.

  • Cosmid library
    A cosmid library with 40 kb inserts in the SuperCos 1 vector was constructed using the SuperCos 1 Cosmid Vector Kit (STRATAGENE).

  • Nucleotide sequencing
    Plasmid clones were end-sequenced using dye-terminator chemistry on an ABI PRISM 3700 sequencer (ABI).
    Cosmid DNA was extracted from E. coli transformants using the Montage BAC96 MiniPrep Kit (Millipore) and end-sequenced using dye-terminator chemistry on the ABI PRISM 3700.
    Raw sequence data corresponding to approximately 10-fold coverage were assembled using the PHRED/PHRAP/CONSED software.

  • Gap closing
    Cosmid end sequences were mapped onto the assembled sequence.
    Cosmid clones that link two contigs were selected and sequenced by primer walking to close gaps.
    Difficult templates were sequenced using the CUGA Sequencing Kit (Nippon Genetech).

  • Validation of the assembled sequence data
    PCR primers were designed at appropriate intervals throughout the final genome sequence and used to amplify the corresponding genomic regions. The restriction enzyme digestion pattern of each PCR product was then compared with the pattern deduced from the sequence of that region to validate the assembled sequence.
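The validation step compares the restriction patterns of PCR products with the patterns predicted from the assembled sequence. A minimal sketch of the prediction and comparison follows. It assumes a linear amplicon, a palindromic recognition site (so scanning the top strand is sufficient), and a 5% sizing tolerance for gel-estimated fragment lengths. The enzyme (EcoRI in the example), the tolerance, and the function names are hypothetical; the source does not state which enzymes were used.

```python
import re


def digest(seq: str, site: str, cut_offset: int) -> list[int]:
    """Predict fragment lengths after cutting a linear sequence at every site.

    cut_offset is the cut position within the site on the top strand
    (1 for EcoRI, G^AATTC). Assumes a palindromic recognition site.
    """
    seq, site = seq.upper(), site.upper()
    # Zero-width lookahead so that overlapping occurrences are also found.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={re.escape(site)})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def patterns_agree(observed, predicted, rel_tol=0.05):
    """True if both patterns have the same number of fragments and each sorted
    observed size lies within rel_tol of the corresponding predicted size."""
    if len(observed) != len(predicted):
        return False
    return all(abs(o - p) <= rel_tol * p
               for o, p in zip(sorted(observed), sorted(predicted)))


if __name__ == "__main__":
    amplicon = "AAAGAATTCTTTTTGAATTCAA"  # toy sequence with two EcoRI sites
    predicted = digest(amplicon, "GAATTC", 1)
    print(predicted)
    print(patterns_agree([7.2, 4, 11], predicted))
```

Comparing sorted sizes rather than fragment order mirrors how a gel is read: band positions give sizes but not the order of fragments along the amplicon.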

Genome analysis and annotation
  • Putative non-coding RNA genes were identified using the Rfam and tRNAscan-SE programs, and rRNA genes were identified using the BLASTN program.

  • For the identification of protein-coding genes, the genome sequence was translated in six frames to generate potential protein products of open reading frames (ORFs) longer than 90 bp, with ATG, GTG, and TTG considered as potential start codons.

  • The potential protein sequences were compared with the UniProt databases using the BLASTP program.

  • Potential protein sequences that showed significant similarities to known protein sequences in the database were selected.

  • Start sites were inspected manually and adjusted by comparison with the predictions obtained by GLIMMER.

  • These predicted ORFs were further evaluated using the Frameplot program.

  • The translated sequences of the predicted protein-coding genes were searched against the non-redundant UniProt database (version 14.0) and the protein signature database InterPro (version 18.0).

  • The KEGG database was used for pathway reconstruction.

  • Signal peptides in proteins were predicted using SignalP, whereas transmembrane helices were predicted using TMHMM.
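The six-frame ORF search described above can be sketched as follows. The sketch makes several simplifying assumptions:
  • It uses the standard bacterial stop codons (TAA, TAG, TGA).
  • It takes the most upstream ATG, GTG, or TTG in each stop-to-stop stretch, which gives the longest candidate before any GLIMMER-guided or manual start-site correction.
  • It counts the stop codon in the ORF length.
  • It treats the input as a linear sequence, so ORFs that span the origin of a circular replicon or run off either end are not reported.
Function names are illustrative and are not those of the pipeline actually used.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq: str, frame: int, min_len: int):
    """Yield (start, end) of ORFs in one reading frame of seq.

    Coordinates are 0-based, half-open, and include the stop codon.
    The first start codon after each stop opens an ORF; later starts are ignored.
    """
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None:
            if codon in START_CODONS:
                start = i
        elif codon in STOP_CODONS:
            end = i + 3
            if end - start > min_len:
                yield start, end
            start = None


def find_orfs(seq: str, min_len: int = 90):
    """Six-frame ORF search returning sorted (start, end, strand, frame) tuples
    in forward-strand, 0-based half-open coordinates."""
    seq = seq.upper()
    n = len(seq)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for start, end in orfs_in_frame(s, frame, min_len):
                if strand == "-":
                    # Convert reverse-complement coordinates back to the forward strand.
                    start, end = n - end, n - start
                hits.append((start, end, strand, frame))
    return sorted(hits)


if __name__ == "__main__":
    demo = "ATG" + "AAA" * 30 + "TAA"  # toy 96-bp ORF
    print(find_orfs(demo))
```

In this sketch, the frame reported for a minus-strand hit is counted on the reverse-complement sequence. Candidates from such a scan would then go through the similarity search, start-site inspection, and Frameplot evaluation steps listed above.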

Related links to external databases