Photo by Dr. Masai at Nagaoka University of Technology
Lignin is the most abundant aromatic compound in nature. Biodegradation of lignin in lignocellulosic biomass is thought to be initiated by white rot fungi, and the resultant low-molecular-weight products are further degraded and utilized by soil bacteria. Sphingobium sp. SYK-6 is a bacterium isolated from the wastewater of a kraft pulp mill, and is one of the best-characterized degraders of lignin-derived aromatic compounds, being able to utilize various lignin-derived biaryls, including beta-aryl ether, biphenyl, and diarylpropane, as sole sources of carbon and energy.
Genome analysis of Sphingobium sp. SYK-6 revealed one circular chromosome (4,199,332 bp, 65.57% G+C, 3,913 ORFs, 50 tRNAs, 2 rRNA operons) and one circular plasmid, pSLGP (148,801 bp, 64.40% G+C, 150 ORFs). Genes known to be involved in the degradation of lignin-derived aromatic compounds were located in at least 10 different loci scattered across the chromosome. In addition, many transporter genes possibly involved in the uptake of a variety of aromatic compounds were predicted. Comparison with Sphingobium japonicum UT26S, whose genome sequence has previously been determined, showed that about 56% of the total ORFs in Sphingobium sp. SYK-6 were orthologous to those of Sphingobium japonicum UT26S. Despite the poor conservation of synteny between these two strains, about 120 ORFs on the plasmid pSLGP were found to be almost identical to those in a chromosomal region of Sphingobium japonicum UT26S, suggesting plasmid-mediated gene transfer between these sphingomonads.
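The per-replicon figures above can be combined into whole-genome totals. The following is a minimal sketch of that arithmetic in Python; the function name `genome_totals` and the dictionary layout are illustrative assumptions, not part of any NITE tool. The G+C content is weighted by replicon length, and because the input percentages are already rounded, the combined value is only approximate.

```python
# Replicon statistics as reported in the text above.
replicons = {
    "chromosome": {"length_bp": 4_199_332, "gc_percent": 65.57, "orfs": 3_913},
    "pSLGP": {"length_bp": 148_801, "gc_percent": 64.40, "orfs": 150},
}


def genome_totals(reps):
    """Combine per-replicon statistics into whole-genome totals."""
    total_bp = sum(r["length_bp"] for r in reps.values())
    total_orfs = sum(r["orfs"] for r in reps.values())
    # G+C content must be weighted by replicon length, not simply averaged.
    gc = sum(r["length_bp"] * r["gc_percent"] for r in reps.values()) / total_bp
    return total_bp, total_orfs, round(gc, 2)


total_bp, total_orfs, total_gc = genome_totals(replicons)
print(total_bp, total_orfs, total_gc)
```

The resulting size and ORF count can be compared with the genomic size and the number of ORFs assigned that are listed in the summary of the genomic data.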
Project history
Last sequence update:
Last annotation update: 2011-06-27
2011-09-01
Release of the Sphingobium sp. SYK-6 genomic data
We published the genomic data of Sphingobium sp. SYK-6 (= NBRC 103272).
Summary of the genomic data
Genomic size: 4,348,133 bp
G+C content: 65.53 %
Number of ORFs assigned: 4,063
Percentage of the coding regions: 89.29 %
Percentage of the intronic regions: 0.00 %
Number of rRNA genes: 6
5S: 2, 16S: 2, 23S: 2
Number of tRNA genes: 50
Ala: 4, Arg: 4, Asn: 1, Asp: 1, Cys: 1, Gln: 2
Glu: 2, Gly: 3, His: 1, Ile: 2, Leu: 5, Lys: 2
Met: 4, Phe: 1, Pro: 4, Ser: 5, Thr: 3, Trp: 1
Tyr: 1, Val: 3
Number of other features (misc_RNA,misc_feature,repeat)
The nucleotide sequence of the Sphingobium sp. SYK-6 genome was determined by whole-genome shotgun sequencing, as for other organisms analyzed at the NITE Biotechnology Center.
General Procedure
DNA shotgun libraries
DNA shotgun libraries with inserts of 1.5 and 5 kb in the pUC118 vector (TAKARA) were constructed.
Fosmid library
A Fosmid library with inserts of 40 kb in the pCC1FOS fosmid vector was constructed using the CopyControl Fosmid Library Production Kit (Epicentre).
Nucleotide sequencing
Plasmid and Fosmid clones were end-sequenced using dye-terminator chemistry on an ABI Prism 3730 sequencer (ABI).
Sequence reads were trimmed at a threshold quality value of 20 by Phred and assembled using PHRAP/CONSED software (http://www.phrap.org).
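To illustrate what trimming at a quality threshold of 20 can mean in practice, here is a minimal Python sketch. It assumes a maximum-scoring-segment approach in which each base contributes (Q - 20) and the contiguous stretch with the highest total is kept. This is a common way to trim by quality, but it is not necessarily the exact procedure Phred applied in this project, and the function name `trim_read` is hypothetical.

```python
def trim_read(quals, threshold=20):
    """Return (start, end) of the best contiguous segment, end exclusive.

    Each base scores (Q - threshold); the segment with the highest total
    score is kept. Returns (0, 0) when no base exceeds the threshold.
    """
    best_sum, best_start, best_end = 0, 0, 0
    run_sum, run_start = 0, 0
    for i, q in enumerate(quals):
        run_sum += q - threshold
        if run_sum <= 0:
            # The segment so far is not worth keeping; restart after this base.
            run_sum, run_start = 0, i + 1
        elif run_sum > best_sum:
            best_sum, best_start, best_end = run_sum, run_start, i + 1
    return best_start, best_end


# Made-up quality values for a short read, for illustration only.
quals = [8, 12, 25, 30, 35, 40, 38, 15, 33, 10, 6]
start, end = trim_read(quals)
print(start, end, quals[start:end])
```

In this sketch a single low-quality base inside an otherwise good stretch is retained, because removing it would cost more high-quality sequence than it saves.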
Gap closing
Fosmid end sequences were mapped onto the assembled sequence.
Fosmid clones that link two contigs were selected and sequenced by primer walking to close any gaps.
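The selection of linking clones described above can be sketched as follows. This Python example assumes that the mapping step has already produced, for each fosmid clone, the contig hit by each of its two end reads. The function name `linking_clones`, the input layout, and the clone and contig names are all hypothetical.

```python
from collections import defaultdict


def linking_clones(end_hits):
    """Group fosmid clones by the pair of contigs their two ends map to.

    end_hits maps clone name -> (contig of forward end, contig of reverse end);
    None marks an end that could not be mapped.
    """
    links = defaultdict(list)
    for clone, (fwd, rev) in sorted(end_hits.items()):
        if fwd is None or rev is None or fwd == rev:
            continue  # unmapped end, or both ends inside one contig
        links[tuple(sorted((fwd, rev)))].append(clone)
    return dict(links)


# Hypothetical clone and contig names, for illustration only.
hits = {
    "F001": ("contig1", "contig1"),
    "F002": ("contig1", "contig2"),
    "F003": ("contig2", "contig1"),
    "F004": ("contig3", None),
}
print(linking_clones(hits))
```

Clones grouped under the same contig pair are candidates for primer walking across the gap between those two contigs.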
In some cases, Fosmid clones were subcloned by insertion of Entranceposon using the Template Generation System II Kit (Finnzymes) and then sequenced.
Validation of the assembled sequence data
In constructing the final nucleotide sequence, low-quality regions with a Phrap quality score of less than 40 were re-sequenced and verified. As a result, every base of the genome is supported by a Phrap quality value of at least 40.
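As a rough illustration of this validation step, the Python sketch below scans a list of per-base quality values and reports the intervals that fall below 40, which would be the candidates for re-sequencing. It also converts a quality value to an error probability, assuming the usual Phred scaling, under which a quality of 40 corresponds to about one expected error in 10,000 bases. The function names and the example values are hypothetical.

```python
def low_quality_regions(quals, min_q=40):
    """Return (start, end) intervals, end exclusive, where quality < min_q."""
    regions, start = [], None
    for i, q in enumerate(quals):
        if q < min_q and start is None:
            start = i  # a low-quality run begins here
        elif q >= min_q and start is not None:
            regions.append((start, i))  # the run ended at the previous base
            start = None
    if start is not None:
        regions.append((start, len(quals)))  # run extends to the last base
    return regions


def error_probability(q):
    """Phred-scaled quality Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


# Made-up consensus quality values, for illustration only.
consensus_quals = [50, 39, 30, 45, 60, 20]
print(low_quality_regions(consensus_quals))
print(error_probability(40))
```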
Gene identification and annotation
Putative non-translated genes were identified using the Rfam, tRNAscan-SE and ARAGORN programs.
The prediction of open reading frames (ORFs) was performed using Glimmer3.
The initial set of ORFs was manually selected from the prediction result in combination with BLASTP results.
For functional annotation, the non-redundant UniProt database and the protein signature database InterPro were searched, and functions were assigned to the predicted proteins on the basis of sequence similarity.
The KEGG database was used for pathway reconstruction.
Signal peptides in proteins were predicted using SignalP and transmembrane helices were predicted using TMHMM.