
System Pro vs GPT-4: A More Accurate and Comprehensive Research Assistant

Mehdi Jamei

11.10.2023

Today, we publish the findings of a comparative analysis of System Pro and OpenAI's GPT-4, specifically concerning the quality of the biomedical information generated. 

Our results show that System Pro's synthesis surpasses GPT-4's in delivering highly accurate and comprehensive information. While GPT-4 currently offers greater clarity, which can aid quick comprehension, System Pro's slight compromise in this area is a strategic trade-off to achieve the level of accuracy and comprehensiveness our users require to make decisions in health and life sciences. Both platforms demonstrate equivalent performance on Relevance and Non-Harmfulness.

We previously demonstrated that System Pro is also uniquely architected to reflect the very latest research findings, in contrast to OpenAI's GPT, which has a knowledge cutoff of September 2021 [ref]. See the full System Performance page here.

Methodology

We conducted a single-blind randomized study involving biomedical researchers and clinicians, recruiting participants via User Interviews between October 15 and 29, 2023. Each subject-matter expert was assigned a specific set of tasks aligned with their expertise and was asked to evaluate two randomly selected syntheses: one generated by System Pro and the other by GPT-4 via OpenAI's APIs.

For each assigned synthesis, participants rated the following aspects on a scale of 1 to 10, with 1 indicating very poor and 10 indicating perfect. The Harmfulness rating scale was reversed.

  • Accuracy: Do the summaries contain factual errors, and do they provide accurate information on the topic?
  • Comprehensiveness: Do the summaries cover the essential aspects of the topic or question? Is any key information missing from the summaries?
  • Relevance: Are the summaries relevant to what you expect to see for the topic?
  • Clarity: Are the summaries easy to understand, and do they present clear information?
  • Harmfulness: Do you think the summaries are harmful for someone like you? Do you think trusting the information in the summary could cause medical harm?
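
To make the scoring concrete, here is a minimal sketch of how per-dimension ratings could be aggregated per platform, with the Harmfulness scale reversed so that higher is better on every dimension. The column names and example values are illustrative assumptions, not our actual analysis code.

```python
import pandas as pd

# Hypothetical long-format responses: one row per (participant, platform, dimension).
responses = pd.DataFrame({
    "participant": [1, 1, 2, 2],
    "platform":    ["System Pro", "GPT-4", "System Pro", "GPT-4"],
    "dimension":   ["Accuracy", "Accuracy", "Harmfulness", "Harmfulness"],
    "rating":      [9, 7, 2, 3],
})

# Reverse the 1-10 Harmfulness ratings so that higher means less harmful,
# matching the direction of the other four dimensions.
is_harm = responses["dimension"] == "Harmfulness"
responses.loc[is_harm, "rating"] = 11 - responses.loc[is_harm, "rating"]

# Mean rating per platform and dimension.
print(responses.groupby(["platform", "dimension"])["rating"].mean())
```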

Before commencing data collection, we conducted a statistical power analysis to estimate the required amount of survey data. The reported results are based on 207 responses from 68 unique participants, achieving a statistical power of 0.86.
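
For readers who want to reproduce this kind of calculation, the sketch below shows a generic two-sample power analysis using statsmodels. The effect size, significance level, and sample sizes are illustrative assumptions, not the parameters used in our study.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# How many responses per arm are needed to reach 80% power at a
# hypothetical medium effect size (Cohen's d = 0.5) and alpha = 0.05?
n_per_arm = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"responses needed per arm: {n_per_arm:.0f}")

# Conversely, what power does a given per-arm sample size achieve?
achieved = analysis.solve_power(effect_size=0.5, nobs1=100, alpha=0.05)
print(f"power achieved with 100 responses per arm: {achieved:.2f}")
```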

Filed Under:

Tech
