How one plant bacterium reshaped our views on DNA organization

Mar 8

A bacterium’s genome is all the DNA within its cell, organized into chromosomes or plasmids.

“To know me is to fly with me,” says Ryan Bingham, George Clooney’s character in the 2009 film Up in the Air. After flying 270 days a year for work, accumulating 10 million frequent flier miles by the movie’s end, Bingham has turned getting through airport security into an art. Every turn of his black roller bag is precise and practiced. He skips the crowded lines by taking the priority access lane; his ticket is held out before TSA asks for it; and he, of course, never checks a bag. Bingham is efficient, orderly, streamlined, without a gram of excess baggage or wasted space. One carry-on is all he needs to hold the essentials of his life.

Bacteria pack their genomes in much the same way Bingham packs for his flights: all the essentials in one bag. For most species, the genome is one circular chromosome that encodes all the essential genes needed to survive. Though a measly 2% of the human genome actually codes for proteins (the remaining 98% being cluttered with so-called “junk DNA”), bacterial chromosomes are packed to the brim with protein-coding genes, with an average protein-coding density of 87%.

In addition to the chromosome, many species also carry small supplementary DNA molecules called plasmids. Plasmids encode accessory genes that aren’t necessary for survival but can provide beneficial functions in special circumstances, such as antibiotic resistance. Further, plasmids can be transferred “horizontally” between neighboring, often unrelated bacteria. Like the mind-jack in The Matrix, where a needle goes in the back of Neo’s head and he immediately knows kung-fu, plasmids can bestow new traits onto bacteria without the fuss of millions of years of evolution. (This is in contrast to “vertical” transmission, in which a parent cell passes a copy to its offspring.) Plasmids give bacteria unrivaled flexibility with their genomes, allowing them to acquire new genes when they’re needed and eject them when they’re no longer useful, all while keeping the organized chromosome untouched.

Together, the organization of essential genes on streamlined chromosomes and accessory genes on versatile plasmids suggests that natural selection has favored organized and trim bacterial genomes. However, the haphazard way one newly discovered species, Aureimonas ureilytica, packs its genome frustrates this narrative and challenges the usual thinking about how organisms organize their DNA.

In most regards A. ureilytica is an unassuming species. Isolated in 2011 from the stem of a soybean plant, it belongs to the class Alphaproteobacteria, one of the most abundant and diverse groups of bacteria, and the order Rhizobiales, well-known plant symbionts which benefit their host by providing growth-promoting nutrients and phytohormones. A. ureilytica was initially studied because of the way its numbers shrink and grow in response to nutritional changes in the host plant, suggesting some kind of symbiotic relationship. In order to investigate the genes that might underpin these plant-microbe interactions, Mizue Anda and colleagues set out to sequence its genome.

Whole-genome sequencing revealed A. ureilytica encodes 5.2 million base pairs of DNA organized into nine circular replicons. Genome lengths are measured in base pairs, the number of complementary nucleotide bases (A to T, and G to C) that form each rung of the DNA ladder. In the jargon of bacterial genomics any circular DNA molecule is called a replicon, and replicons are further classified as either chromosomes or plasmids. The largest replicon in A. ureilytica’s genome was 3.7 million base pairs, with the remaining DNA split amongst three replicons between 300,000-500,000 base pairs, and five under 100,000 base pairs.

Biologists, always excited to create new naming systems, have devised a scheme to differentiate between chromosomes and plasmids. At first the task seemed relatively simple: chromosomes are large and vertically transmitted, while plasmids are small and horizontally spread. But evolution has a knack for producing oddballs and outliers that blur seemingly clear lines. Though the average bacterial chromosome is 50x larger than the average plasmid, this masks a nearly 100-fold range in the sizes of chromosomes as well as a sub-class of plasmids, dubbed “megaplasmids,” that are 10x larger than the average. When looking within a single species’ genome, the chromosome will be the largest replicon. But when trying to make a general rule that accounts for all sizes in all species, there’s an uncomfortable overlap between the smallest chromosomes and largest plasmids. Well, what about horizontal transfer? Unfortunately, this is similarly mired in exceptions. Plasmids can sometimes lose the ability to be horizontally transmitted, leaving vertical transmission as their only way to spread, and that shouldn’t suddenly change their label.

Instead, the characteristic which has served as a practical and robust distinction between chromosomes and plasmids has been the physical separation between essential and accessory genes. The replicon that encodes all essential genes needed to survive — DNA replication, cell membrane production and maintenance, central metabolism — is the chromosome, and any other replicon — which, necessarily, only encodes accessory genes — is a plasmid. True, additional copies of an essential gene may be found on a plasmid but the “main” chromosome will never lack it.
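The rule described above can be written down as a small check. This is only a minimal sketch: the replicon names, the gene names, and the three-gene `ESSENTIAL_GENES` set are made up for illustration, and a real genome carries hundreds of essential genes identified by careful annotation.

```python
# Toy illustration of the essential/accessory rule.
# All gene and replicon names here are hypothetical placeholders.
ESSENTIAL_GENES = {"dnaA", "rpoB", "ftsZ"}


def classify_replicons(replicons):
    """Label each replicon 'chromosome' or 'plasmid'.

    `replicons` maps a replicon name to the set of genes it encodes.
    A replicon is called a chromosome only if it carries every
    essential gene; anything else is called a plasmid.
    """
    return {
        name: "chromosome" if ESSENTIAL_GENES <= genes else "plasmid"
        for name, genes in replicons.items()
    }


def essential_genes_missing_from_chromosomes(replicons):
    """Return essential genes absent from every replicon labeled a chromosome."""
    labels = classify_replicons(replicons)
    on_chromosomes = set()
    for name, genes in replicons.items():
        if labels[name] == "chromosome":
            on_chromosomes |= genes
    return ESSENTIAL_GENES - on_chromosomes


textbook_genome = {
    "replicon_1": {"dnaA", "rpoB", "ftsZ", "lacZ"},
    # An extra copy of an essential gene on a plasmid is allowed by the rule.
    "replicon_2": {"bla", "rpoB"},
}

print(classify_replicons(textbook_genome))
print(essential_genes_missing_from_chromosomes(textbook_genome))
```

In a genome that follows the textbook rule, one replicon is labeled the chromosome and the second function returns an empty set. A genome in which no single replicon holds every essential gene would leave that set non-empty, which is the situation the rule cannot handle.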

Size and horizontal transfer are fine shorthand criteria, but the real rule is the essential-accessory gene split. At least, that’s what’s been long thought. Based on the shorthand rule, it was intuitive to designate the largest replicon, encoding 70% of the genome, as the chromosome. However, it would be the smallest and most unlikely of the remaining replicons, a circular ring of DNA just 9,000 base pairs long, that would upset easy classification, and in the process upend the textbook distinction between chromosomes and plasmids. That is because this “plasmid,” dubbed pAU20rrn, encodes the sole copy of A. ureilytica’s rRNA operon (rrn operon), including the essential 16S, 23S, and 5S rRNA genes.

Discovering that the sole copy of one of this obscure species’ essential genes is on a small plasmid would have been a surprise in and of itself. But the fact that the gene was the 16S rRNA gene must have amplified the researchers’ shock ten-fold. To fully appreciate why, it’s necessary to take a brief detour to describe the history of the 16S rRNA gene.

Functionally, it encodes the RNA component of the small subunit of the bacterial ribosome, the molecular machine that translates mRNA transcripts of genes into proteins. However, its importance in microbiology goes beyond just what it does for bacteria. Rather than being just any common housekeeping gene, the 16S rRNA gene is the gene for studying bacterial phylogeny and taxonomy.

With the advent of DNA sequencing, it became possible to determine organisms’ evolutionary relatedness not by comparing second- or third-order phenotypic traits, like the shape of teeth or number of toes, but by directly comparing the gene sequences that encode these traits. No field was more impacted by this molecular revolution than microbiology. Unlike animals and plants, which are rich in complex morphological traits that can be compared, the morphologies of bacteria are so simple that comparisons based on appearance alone are of little to no use. The best that can often be done is noting whether they’re shaped like a sphere or a rod — not exactly high-resolution data. Physiological traits, such as the nutrients required to grow or the presence of a peptidoglycan cell wall, can correctly group related bacterial species, but they can also miss relatives lacking the feature in question.

Because of these limitations, early attempts at phylogenetic classification of bacteria more often than not created flawed schemes that confused rather than clarified. As a result, microbiologists of the 1800s and early 1900s wisely avoided the subject. It’s not that they were unaware of how insightful determining the evolutionary relationships between species could be. They could certainly see the successes their colleagues down the hall in the botany and zoology departments were having investigating these. They just didn’t have the technology to experimentally determine those relationships. The field continued to characterize species, particularly disease-causing bacteria like Escherichia coli and Mycobacterium tuberculosis, but it didn’t actually have any phylogenetic framework that could connect these species by descent. The Linnaean names given to these taxa suggested knowledge about evolutionary relationships that just wasn’t there.

That’s where the 16S rRNA gene comes in. With DNA sequencing, genotypes could be used in lieu of phenotypes to compare and classify organisms. For microbiology, which up until that point had no comparable phenotypes, this was a massive boon. But what was first needed was a gene that could be used as a “chronometer,” a biological stopwatch that can be used to measure time. Just as radiometric dating calculates the age of rocks by measuring the nuclear decay of radioactive isotopes, chronometers calculate the time since two species diverged by measuring the number of mutations.

A good chronometer meets three criteria. First, it must be found in all species. Naturally, you can’t compare the sequence of a gene in multiple species if some don’t have that gene. Second, its sequence must change randomly over time and at a rate commensurate with the degree of evolutionary separation being measured. Much as radioactive nuclei decay with a fixed half-life, genes acquire random mutations at a more or less constant rate, a phenomenon known as the “molecular clock.” Counting the number of mutations between two sequences, therefore, is equivalent to counting the number of years. One might think the best chronometers are sequences whose rate of change is equal to the average mutation rate. But these sequences would change too rapidly, evolutionarily speaking, to be of much use in comparing distantly related species. Instead, the best chronometers are essential genes whose functions are so well maintained by natural selection that they mutate very slowly relative to other genes. Lastly, the chronometer must be long enough to capture all that change data. Long genes give you better statistics and make it less likely that mutations overwrite each other. But it’s perhaps more important that the gene has independent functional regions, such that a large nonrandom mutation in one region doesn’t affect the others, and the clock can still run smoothly.
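The counting step can be sketched in a few lines. This is a simplified illustration, assuming the two sequences are already aligned, have equal length, and contain no gaps; the 20-base strings are invented stand-ins for 16S rRNA genes, which are roughly 1,500 bases long, and real analyses also correct for repeated mutations at the same site.

```python
def count_differences(seq_a, seq_b):
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def proportion_different(seq_a, seq_b):
    """Fraction of aligned positions that differ (often called the p-distance)."""
    return count_differences(seq_a, seq_b) / len(seq_a)


# Made-up fragments, not real 16S rRNA sequences.
species_1 = "ACGTACGTACGTACGTACGT"
species_2 = "ACGTACGAACGTACGTACCT"
species_3 = "ACGAACGAACTTACGTTCCT"

for name, other in [("species_2", species_2), ("species_3", species_3)]:
    n = count_differences(species_1, other)
    p = proportion_different(species_1, other)
    print(f"species_1 vs {name}: {n} differences, p-distance {p:.2f}")
```

Here species_2 differs from species_1 at 2 of 20 positions and species_3 at 5 of 20, so under a clock-like model species_3 would be read as the more distant relative.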

The 16S rRNA gene meets all these criteria. It’s found in all species, mutates very slowly, and is composed of different independent regions. The power of this approach was exemplified by Carl Woese and George Fox from the University of Illinois, in their foundational 1977 paper which used this information to discover that the microbes previously lumped together under the label Prokaryotes actually belong to two distinct domains: Bacteria and Archaea. Even with advancing technologies and the ability to sequence whole genomes, 16S rRNA sequencing remains a powerful tool, particularly for microbiologists studying bacteria which cannot be isolated. Many such species are known only by their 16S rRNA sequences.

The importance of the rRNA operon magnified the shock at the unexpected discovery that A. ureilytica’s sole copy was located on a plasmid instead of the chromosome. That a species organized all its essential genes into the chromosome except one would be a notable discovery in and of itself. But it’s as if the rebellious A. ureilytica didn’t want any scientific hemming and hawing over its exceptionalism, and so deliberately chose the most iconic, recognizable essential gene it had to stick in a plasmid. And not just any plasmid either. At just 9,000 base pairs, or 0.2% the size of the chromosome, this plasmid is hardly anything more than the rRNA operon (6,000 base pairs) circularized.
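For a sense of scale, the proportions can be recomputed from the rounded sizes quoted in this article. The snippet below is just that arithmetic; the constant names are my own, and the inputs are the article’s rounded figures rather than exact sequence lengths.

```python
# Sizes in base pairs, using the rounded figures quoted in the article.
GENOME_BP = 5_200_000
LARGEST_REPLICON_BP = 3_700_000
RRN_PLASMID_BP = 9_000
RRN_OPERON_BP = 6_000


def percent(part, whole):
    """Express `part` as a percentage of `whole`."""
    return 100 * part / whole


print(f"largest replicon as share of genome: {percent(LARGEST_REPLICON_BP, GENOME_BP):.0f}%")
print(f"rRNA plasmid relative to largest replicon: {percent(RRN_PLASMID_BP, LARGEST_REPLICON_BP):.2f}%")
print(f"rRNA operon as share of its plasmid: {percent(RRN_OPERON_BP, RRN_PLASMID_BP):.0f}%")
```

These come out to roughly 71%, 0.24%, and 67%, in line with the rounded 70% and 0.2% used in the text: about two-thirds of the plasmid is the operon itself.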

This discovery has numerous implications for bacterial genomics and evolution. It’s a clear exception to the textbook rule that the chromosome encodes all essential genes while plasmids are for accessory genes. No longer could that be used as a defining distinction between these two replicons because it wasn’t the case in A. ureilytica. But how much an exception matters depends on how common it is. If A. ureilytica is a unique, oddball species with an interesting but unstable genome doomed to extinction by the pruning shears of natural selection, then it can be dismissed as a minor exception to an otherwise robust rule. On the other hand, if other species share this unusual genome arrangement, and if it can be stably maintained over long evolutionary periods, then that would have major implications for our understanding of bacterial genomics.

To investigate these questions, in 2023 Anda and colleagues devised a follow-up study. Their goal was to find more species like A. ureilytica, without chromosomal rRNA operons. They searched >80,000 bacterial genomes for those whose rRNA operons were located on replicons that, by all other metrics, appeared to be plasmids.

Anda found three genomes which matched these criteria: two from the genus Persicobacter, and one from Treponema saccharophilum. Both Persicobacter genomes had three rRNA operons in tandem on a ~30,000-base-pair sequence, while the T. saccharophilum genome’s rRNA operon was located on an 8,400-base-pair plasmid. They then sequenced the genomes of the type strains for these species (P. diffluens and T. saccharophilum JCM 32279) and determined these strains also have the same arrangement of rRNA operons as the genomes from their search. Therefore, they concluded that P. diffluens and T. saccharophilum are two additional species which lost rRNA operons from their chromosomes. Together, this meant that species from at least three phyla (first Pseudomonadota with A. ureilytica, and now Bacteroidota and Spirochaetota) independently lost chromosomal rRNA operons.

Having established that A. ureilytica isn’t an oddball but in fact one of many species which have lost chromosomal rRNA operons, Anda then investigated how stable this genome organization is. Are these species just random evolutionary one-offs, or is this an arrangement that can be maintained for an evolutionarily long time? Looking at the genomes of other Treponema species, they found that these have their rRNA operons in the chromosome. This suggests T. saccharophilum lost its chromosomal rRNA operon fairly recently.

On the other hand, related Persicobacter species had not yet had their genomes sequenced, raising the exciting possibility that the whole Persicobacteraceae family lacks chromosomal rRNA operons. To investigate this, Anda sequenced the genomes of three species in the Persicobacteraceae family (P. psychrovividus, Aureibacter tunicatorum, and Fulvitalea axinellae) and found all three lack chromosomal rRNA operons. Their rRNA operons instead are located in the same arrangement on the same plasmid. This strongly suggests that loss of the chromosomal rRNA operon occurred in the common ancestor of these Persicobacteraceae species. That ancestor was then subject to the classic forces of selection, divergence, and speciation, all the while passing along its special genome organization to all descendant species.

The answer to the question of how long this unusual genome arrangement can be maintained is the same as the answer to the question of how long ago this common ancestor lived. Unfortunately, because most bacterial species leave little to no trace in the fossil record, it’s not possible to determine this directly. But by using the same principles used to calculate evolutionary distances with the 16S gene, counting the number of differences between two genomes and dividing that by the average mutation rate, it’s possible to estimate approximately how long ago this common ancestor lived. Using this, Anda estimated that the common ancestor of these Persicobacteraceae species lived approximately 500 million years ago, a time so long ago there were no organisms with bones, and no grasses or trees. Rather than being short-lived oddballs doomed to extinction, it seems bacteria can maintain rRNA operons solely on plasmids for hundreds of millions of years.
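The arithmetic behind this kind of estimate can be sketched as follows. This is a deliberately simplified strict-clock calculation, not the dating method used in the study: it assumes a single constant substitution rate, and it divides by twice that rate because both lineages keep accumulating changes after they split. The function name and the example numbers are illustrative only.

```python
def divergence_time_years(distance_per_site, rate_per_site_per_year):
    """Estimate time since two lineages split under a strict molecular clock.

    `distance_per_site` is the number of substitutions per site separating
    the two sequences. Both lineages accumulate substitutions independently
    after the split, so the distance grows at twice the per-lineage rate.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return distance_per_site / (2 * rate_per_site_per_year)


# Illustrative numbers only; they are not taken from the study.
example_distance = 0.02  # substitutions per site between two sequences
example_rate = 1e-8      # substitutions per site per year, per lineage

estimate = divergence_time_years(example_distance, example_rate)
print(f"estimated divergence: {estimate:,.0f} years ago")
```

With these made-up inputs the estimate is about one million years. Halving the assumed rate doubles the estimate, which is why such dates are approximate and depend heavily on how well the rate is calibrated.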

Living organisms are not — can never be — perfectly optimized. We’re a product of history, a jury-rigged amalgamation of imperfect parts assembled over billions of years for different purposes. Darwin understood this to be the strongest proof of evolution. Organisms could only be perfectly designed in a world without history, and a world without history might as well have been created as we find it. Though history precludes perfection, fortunately, natural selection doesn’t demand it. The only requirement is that organisms work well enough to survive and reproduce.

The sloppy genome organization of A. ureilytica and others is another in a long list of features that highlight this principle. There’s some suggestion that having rRNA operons exclusively on plasmids could provide some evolutionary advantages. Since multiple copies of a plasmid often exist in cells, putting the rRNA operon on a plasmid could increase its copy number, thereby increasing the rate of rRNA synthesis, and theoretically enabling rapid adaptation to changes in the environment. However, these species lacking chromosomal rRNA operons come from very different environments (marine, soil, rumen, plant stems, air), making it unlikely they’re all converging on the same unusual adaptation to the same unique environmental stressor. And even if it was initially selected for in the common ancestor due to some selective advantage, that doesn’t mean all species today continue to utilize that function. Regardless of the potential or hypothetical benefits, it remains that this unusual arrangement arose due to some chance event. Because it doesn’t have any negative consequences which would compel selection against it, it was left as-is for half a billion years.

This story also highlights a second principle of evolution. Though we might seek structure and order in the universe, Nature isn’t under any obligation to provide us with it. There are, to be sure, different kinds of things: different species and different DNA molecules. But the boundaries between kinds are rarely clear. They’re fuzzy, with one kind grading into the other. Objects at these boundaries confuse and frustrate us only so long as we insist that everything fits unambiguously into the neat categories we’ve devised for ourselves. The wall dividing chromosomes and plasmids, built upon genuine facts known at the time, has been toppled. These categories have had, and certainly will continue to have, practical importance in the science of bacterial genomics. But it’s a mistake to confuse our labels with reality. When it comes to the study of living organisms, sometimes we, like A. ureilytica, have to accept that a little disorganization isn’t always a bad thing.

Sources:

Anda M, Ohtsubo Y, Okubo T, Sugawara M, Nagata Y, Tsuda M, Minamisawa K, Mitsui H. Bacterial clade with the ribosomal RNA operon on a small plasmid rather than the chromosome. Proc Natl Acad Sci USA. 2015 Nov 17;112(46):14343–7. doi: 10.1073/pnas.1514326112. Epub 2015 Nov 3. PMID: 26534993; PMCID: PMC4655564.
Anda M, Yamanouchi S, Cosentino S, Sakamoto M, Ohkuma M, Takashima M, Toyoda A, Iwasaki W. Bacteria can maintain rRNA operons solely on plasmids for hundreds of millions of years. Nat Commun. 2023 Nov 14;14(1):7232. doi: 10.1038/s41467-023-42681-w. PMID: 37963895; PMCID: PMC10645730.

Kevin Blake
