Scaling System with GPT-3

Adam Bly


We are excited to announce a major milestone on our journey to relate everything.

Starting today, System now continuously auto-extracts statistical relationships from peer-reviewed scientific papers. To the best of our knowledge, this is the first time that statistical relationships have been auto-extracted at scale from scientific literature to improve search and our understanding of complex topics.

This breakthrough builds on recent advances in large language models, specifically GPT-3. It massively scales up the volume of accurate statistical evidence freely and openly available on System, deepening and enriching our shared understanding of how anything in the world is related to everything else.

Once extracted from sources like PubMed, statistical relationships (including causal relationships) are semantically grounded in known scientific terms, normalized, and linked to other statistical relationships in SystemDB, the large-scale graph powering System.

You can search System for a topic and see all the factors that impact it and are impacted by it. For example, System now contains hundreds of pieces of statistical evidence relating endometriosis to its risk factors and effects, like tobacco smoke pollution and ovarian cancer. You can also search System for a relationship between two topics and see all the statistical evidence of that relationship that System has gathered and synthesized. The depth and breadth of evidence on System increase daily.
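To make the two kinds of search concrete, here is a minimal sketch of a directed graph of relationships in Python. It is only an illustration: the names `RelationshipGraph`, `normalize`, `factors_of`, `effects_of`, and `evidence_between` are hypothetical, the evidence strings are placeholders rather than real findings, and the lowercase-and-trim `normalize` is a crude stand-in for semantic grounding. None of this reflects SystemDB's actual schema or API.

```python
from collections import defaultdict
from dataclasses import dataclass


def normalize(term: str) -> str:
    """Crude stand-in for semantic grounding: lowercase and collapse whitespace."""
    return " ".join(term.lower().split())


@dataclass(frozen=True)
class Relationship:
    source: str    # topic that impacts
    target: str    # topic that is impacted
    evidence: str  # placeholder for an extracted statistical finding


class RelationshipGraph:
    """Toy directed graph: topics are nodes, relationships are edges."""

    def __init__(self) -> None:
        self._outgoing = defaultdict(list)  # source topic -> relationships
        self._incoming = defaultdict(list)  # target topic -> relationships

    def add(self, source: str, target: str, evidence: str) -> Relationship:
        rel = Relationship(normalize(source), normalize(target), evidence)
        self._outgoing[rel.source].append(rel)
        self._incoming[rel.target].append(rel)
        return rel

    def factors_of(self, topic: str) -> list[str]:
        """Topics with an edge pointing at `topic` (what impacts it)."""
        rels = self._incoming.get(normalize(topic), [])
        return sorted({r.source for r in rels})

    def effects_of(self, topic: str) -> list[str]:
        """Topics that `topic` points at (what it impacts)."""
        rels = self._outgoing.get(normalize(topic), [])
        return sorted({r.target for r in rels})

    def evidence_between(self, source: str, target: str) -> list[Relationship]:
        """All stored evidence for one source -> target relationship."""
        wanted = normalize(target)
        rels = self._outgoing.get(normalize(source), [])
        return [r for r in rels if r.target == wanted]


# The topics come from the example above; the evidence text is a placeholder.
graph = RelationshipGraph()
graph.add("Tobacco Smoke Pollution", "Endometriosis", "placeholder evidence record")
graph.add("Endometriosis", "Ovarian Cancer", "placeholder evidence record")

print(graph.factors_of("endometriosis"))  # ['tobacco smoke pollution']
print(graph.effects_of("endometriosis"))  # ['ovarian cancer']
```

Keeping both an incoming and an outgoing index means a topic search can answer "what impacts this?" and "what does this impact?" without scanning every edge, and a relationship search is a filter over one topic's outgoing edges.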

As a Public Benefit Corporation, we’ve committed to “release features to the general public only after they have undergone an internal review of their potential unintended consequences.” Earlier this year, we introduced the practice of publishing what we call “Release Risks” with major releases to openly share potential unintended consequences and how we are mitigating them. You can read more about how we are mitigating the risks of this release in our accompanying Release Risks.

For example, alongside today’s release we have introduced content review across System. Logged-in users can now flag any errors they identify in extraction, naming, or matching. Content that is flagged will be marked as in review and promptly reviewed by our team. This data will also be used to improve the performance of our models. 

In the coming weeks, we’ll be introducing a ground-breaking solution that leverages today’s announcement to improve research and healthcare.

Sign up and join our community today.