
Release Risks: October 2022

Caz Gottlieb & Mehdi Jamei


At System, we believe it is our responsibility to consider and be open about the potential risks and unintended consequences we see as we ship new releases. We made a commitment under our charter as a PBC to publish what we call “Release Risks,” which document the risks of major releases and the efforts we are taking to mitigate them. This practice started with our beta launch earlier this year, and we are continuing the conversation with this latest release.

Here is a summary of the risks we identified and our work to mitigate them.

What biases could be present when writing content to System with AI?

  1. Selection of corpus: The raw content we use for auto-extraction is sourced today from PubMed, which consists of peer-reviewed biomedical and life sciences literature. System currently processes English text only, and extracts only the evidence that appears in an article’s abstract.

    Mitigation: After processing the full abstract corpus at our standard for accuracy, we will extend ingestion to additional corpora that span other fields of research. We are also investigating increasing our coverage to include evidence that appears in the full text of an article.
  2. AI extraction process: The System Platform utilizes OpenAI’s GPT-3 model, so it inherits any biases present in that technology. Additionally, because we train it on specific statistic types, the relationships it writes are constrained to those types.

    Mitigation: We are expanding our coverage to all statistic types. We also stay on top of pertinent scholarship surrounding GPT-3 and do not intend to remain dependent on a single LLM over time.
  3. Topics on the graph: Evidence written to SystemDB is ultimately matched to a pair of topics in Wikidata, grounding the information on System to a well-accepted ontology — but occasionally limiting what appears on the System Graph. Additionally, for a relationship to appear on the System Graph, it must be backed by at least one piece of evidence that is statistically significant (at the 95% confidence level).

    Mitigation: We are working to augment our use of Wikidata with other well-accepted ontologies. We are also evaluating changes to the graph experience that would include all contributing relationships, regardless of a fixed significance level.
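The significance gate described in point 3 can be sketched in a few lines. This is a minimal illustration, not System’s actual code; the `Evidence` class, the `relationship_is_visible` function, and the use of p-values are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    topic_a: str     # first Wikidata-matched topic
    topic_b: str     # second Wikidata-matched topic
    p_value: float   # p-value of the extracted statistic (hypothetical field)

def relationship_is_visible(evidence: list[Evidence],
                            alpha: float = 0.05) -> bool:
    """A relationship appears on the graph only if at least one piece of
    supporting evidence is significant at the 95% confidence level
    (i.e., p < 0.05)."""
    return any(e.p_value < alpha for e in evidence)

# One significant study out of two is enough for the relationship to appear.
evidence = [Evidence("smoking", "lung cancer", 0.20),
            Evidence("smoking", "lung cancer", 0.01)]
print(relationship_is_visible(evidence))  # True
```

Relaxing this fixed `alpha` cutoff, as the mitigation above describes, would mean showing all contributing relationships while still conveying the strength of each piece of evidence.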

How can I trust that System’s extractions are accurate?

  1. AI extraction accuracy: System’s models are only deployed to production when strict quality thresholds are met. But no model is ever perfect. As such, we actively work to identify cases where the extracted findings are not supported by the source.

    When a pattern of inaccuracies is observed with a specific type of statistical relationship, we suspend publication of that evidence until our models meet quality standards. This is guided by a continuous process of human sampling and monitoring of model performance. Learn more about our relationship methodology here.

    Alongside this release, we have introduced content review across System. Logged-in users can now flag any errors in extraction, naming, or matching. Content that is flagged will be marked as in review and immediately reviewed by our team and community. This data will also be used to improve the performance of our models.
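The suspension process above amounts to a per-type quality gate. The sketch below is an assumption-laden illustration: the function name, the accuracy dictionary, and the 0.95 threshold are all hypothetical, not System’s internal pipeline.

```python
# Illustrative quality gate: a statistic type is publishable only while its
# human-sampled accuracy meets the (assumed) quality threshold.
QUALITY_THRESHOLD = 0.95

def publishable_types(sampled_accuracy: dict[str, float],
                      threshold: float = QUALITY_THRESHOLD) -> set[str]:
    """Return the statistic types whose sampled accuracy currently meets
    the quality bar; the rest are suspended from publication."""
    return {t for t, acc in sampled_accuracy.items() if acc >= threshold}

# Hazard-ratio extractions fall below the bar, so they are suspended
# until the model is retrained and re-validated.
accuracy = {"odds_ratio": 0.97, "hazard_ratio": 0.91}
print(publishable_types(accuracy))  # {'odds_ratio'}
```

User flags from the new content-review feature would feed back into `sampled_accuracy`, tightening the loop between community review and model monitoring.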

How does System consider prioritizing content presented to users?

  1. Content visualization: Due to the nature of digital platforms, there is a finite amount of content we can show on any given page before truncation and pagination, and this influences the content users see and digest (the first page of content will always be the most read).

    Within content panels, System’s “sort by” defaults to surfacing the strongest relationships first, based on statistical evidence. Our intent is to keep ranking as neutral as possible by allowing the science to establish priority. Additionally, the System Graph does not weight any relationship above others.
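The default ranking can be pictured as a plain sort over evidence strength. The data shape and the `strength` field below are assumptions for illustration, not System’s actual schema.

```python
# Hypothetical relationships with an assumed numeric evidence-strength score.
relationships = [
    {"pair": ("exercise", "depression"), "strength": 0.42},
    {"pair": ("diet", "depression"), "strength": 0.18},
    {"pair": ("sleep", "depression"), "strength": 0.65},
]

# Strongest statistical evidence first, so the most-read first page carries
# the relationships the science ranks highest.
ranked = sorted(relationships, key=lambda r: r["strength"], reverse=True)
print([r["pair"][0] for r in ranked])  # ['sleep', 'exercise', 'diet']
```

Because the key is a single evidence-derived score with no editorial weighting, the ordering stays neutral in the sense described above: the data, not the platform, decides what appears first.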
