What’s holding back microbiome research translation?

20 Oct

Front cover of LYVA Labs' report into Microbiome Discovery to Commercialisation. We supported a LYVA Labs investigation into microbiome research translation barriers.

While LYVA looked across the wider ecosystem, we were specifically tasked with identifying Informatics barriers.

At Cofactory, we work where Life Sciences, Chemistry, Informatics and Engineering meet. We’ve built data systems, analysis pipelines and software, and launched AI products for research and industry.

A microbiome is the community of microbes (bacteria, fungi, viruses and others) that lives in or on a particular environment. Microbiomes are everywhere: in soil, water, the human gut and skin. They affect infection, immunity, nutrition and more.

There’s an enormous body of microbiome research now, with the Microbiome Innovation Centre in Liverpool leading the way. So why aren’t we seeing more translation to clinic and industry?

When LYVA Labs won iiCON funding to explore why, they asked us as domain experts to look into Informatics barriers. We set out to find what’s really slowing things down and what can be fixed today.

What We Found

Across design-led workshops and dozens of interviews, we identified both organisational and technical Informatics barriers. This post focuses on the technical.

Study design: Many projects are underpowered because bioinformatics and statistics are brought in too late. While that’s an organisational issue, the technical impact is clear — once the data’s been generated, it’s often too late to fix. Studies are left with too much noise and not enough statistical power to yield meaningful insights.
AI and Digital Twins: We heard a lot of hope that large AI models or simulations might emerge to make sense of large and complicated microbiome datasets. Right now, these ideas are early-stage, inherently complex, and highly data-hungry: they’re not ready to solve today’s problems. Instead, we focused on flushing out technical issues we can tackle now to make the next steps in microbiome data analysis actually work.
Metadata: This is the biggest and most fixable issue. Metadata — the details about each sample and how it was collected — is often patchy or wrong. One partner found an “Ethnicity” field containing Austrian — a nationality, not an ethnicity. Another found a “Location” field that was supposed to describe where on the body a sample was taken (“behind left knee”) but contained entries like Laboratory and USA.
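
To make the study design point concrete, here is a minimal sketch of the kind of sample-size estimate a statistician would want to run before any data is generated. It assumes a simple two-group comparison and a normal approximation, which real microbiome data rarely satisfies, so treat the numbers as a rough guide rather than a design recommendation. The function name and default values are illustrative choices, not taken from the report.

```python
import math
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate samples per group for a two-sided, two-group comparison.

    Uses the normal approximation n = 2 * ((z_alpha + z_power) / d) ** 2,
    where d is the standardised effect size (difference in means / SD).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Smaller expected effects need far more samples to detect.
for d in (0.8, 0.5, 0.2):
    print(f"effect size {d}: about {samples_per_group(d)} samples per group")
```

The required sample size grows with the inverse square of the effect size, which is why subtle responses are so easily missed when the numbers are fixed before anyone has checked them.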

Getting Metadata Right

These mistakes might sound minor — even funny — but they make data comparison and detailed analysis impossible. Metadata should use controlled vocabularies and ontologies so that terms mean the same thing across datasets.
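
As a rough illustration of what checking against a controlled vocabulary can look like, the sketch below flags any value that is not an agreed term for its field. The field names and the vocabularies themselves are hypothetical and hand-written for this example; a real project would take its terms from an established ontology.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabularies, for illustration only. A real project
# would draw these from an agreed ontology rather than a hand-written set.
CONTROLLED_VOCABULARIES = {
    "body_site": {"behind left knee", "behind right knee", "forearm", "gut"},
    "sample_type": {"skin swab", "stool", "saliva"},
}


@dataclass
class Issue:
    row: int
    field: str
    value: str


def validate(records):
    """Return an Issue for every value that is not in its field's vocabulary."""
    issues = []
    for row, record in enumerate(records):
        for field, allowed in CONTROLLED_VOCABULARIES.items():
            value = record.get(field, "")
            # Compare case-insensitively and ignore surrounding whitespace.
            if value.strip().lower() not in allowed:
                issues.append(Issue(row, field, value))
    return issues


records = [
    {"body_site": "behind left knee", "sample_type": "skin swab"},
    {"body_site": "Laboratory", "sample_type": "skin swab"},
    {"body_site": "USA", "sample_type": ""},
]

for issue in validate(records):
    print(f"row {issue.row}: {issue.field}={issue.value!r} is not a controlled term")
```

Running a check like this at the point of data entry, rather than at analysis time, is what stops entries such as Laboratory or USA reaching a body-site field in the first place.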

Good metadata unlocks insights that are currently hidden from view. With it, we can detect subtle responses to interventions that are currently drowned out by metadata noise, and compare datasets across different studies to uncover shared trends and differences. Right now, those insights — and their value — are lost.

The FAIR metadata principles — Findable, Accessible, Interoperable, Reusable — are well known but rarely stick. They describe what to do, but not how. In practice, FAIR tools are too hard for end users. Industry projects, especially, can’t spend half their time and budget annotating metadata. FAIR only works if the tools make it easy by design.

We don’t need more white papers advocating for FAIR. We need practical software that fits into real workflows and helps people capture metadata correctly. It should support users as they work, resolving terms to the right vocabularies in the background.
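
One way to picture "resolving terms in the background" is a lookup that maps whatever the user types to a preferred label and identifier, and asks for help only when it cannot. This is a minimal sketch under stated assumptions: the synonym table is hand-written, and the "EX:" identifiers are placeholders invented for the example, not real ontology terms.

```python
# Hypothetical synonym table mapping free-text entries to a preferred label and
# a placeholder identifier. The "EX:" identifiers are invented for this sketch
# and do not belong to any real ontology.
SYNONYMS = {
    "behind left knee": ("left popliteal fossa", "EX:0001"),
    "back of left knee": ("left popliteal fossa", "EX:0001"),
    "left popliteal fossa": ("left popliteal fossa", "EX:0001"),
    "stool": ("faeces", "EX:0002"),
    "feces": ("faeces", "EX:0002"),
    "faeces": ("faeces", "EX:0002"),
}


def normalise(text):
    """Lower-case and collapse whitespace so trivial variants match."""
    return " ".join(text.lower().split())


def resolve(text):
    """Return (label, identifier) for a known entry, or None to prompt the user."""
    return SYNONYMS.get(normalise(text))


for entry in ["Behind  Left Knee", "Feces", "Laboratory"]:
    match = resolve(entry)
    if match is None:
        print(f"{entry!r}: no controlled term found, ask the user to choose one")
    else:
        label, identifier = match
        print(f"{entry!r} -> {label} ({identifier})")
```

The design choice here is that the user keeps typing in their own words, and the tool carries the burden of mapping those words to a shared term, interrupting only when no match exists.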

Three Things to Take Away

Experimental design first. You can’t analyse your way out of a weak study. Talk to informaticians from the start.
Metadata is the backbone. It’s what allows dataset stratification, comparison and reuse. It’s a priority, even if it often gets overlooked.
Build tools, not frameworks. Metadata capture is still too hard. From a human-led design standpoint, FAIR only works if it’s easy to do — and right now, it isn’t.

For both fundamental microbiome research and translational projects, we need solid metadata mapped to clear terminology. That means building better tools that help people get FAIR right the first time.

If we fix metadata, we fix half the problem. The rest follows once our data can talk to each other.

You can read the wider LYVA Labs investigation here.

Paul Dobson
