Our engineers conducted an experiment to see whether AI can pass a non-technical online course without any human involvement. They chose Coursera, one of the biggest e-learning platforms in the world, which offers courses on everything from programming to philosophy.

The goal of the experiment is not only to satisfy our curiosity but also to explore how well cutting-edge AI models handle intellectual tasks (in our case, stressful exams).


What is the technology behind the experiment?

As the base model for understanding questions and generating answers, we chose Generative Pre-trained Transformer 3 (GPT-3), an autoregressive language model from OpenAI that is currently the state of the art for generating human-like text. It performs well on a wide variety of NLP (natural language processing) tasks, such as question answering (QA), summarization, text generation, and conversation. It can also handle multiple-choice questions. Simply put, the model takes a text prompt as input and continues it.

Additionally, we used RoBERTa, fine-tuned on the SQuAD 2.0 dataset (available on the Hugging Face Model Hub), as a context builder based on extractive QA. This model takes a question and a context passage as input and returns the start and end positions of the span where the answer is most likely located. Although the model was built for extracting answers, it isn’t advanced enough to answer questions from educational courses on its own. Therefore, we used it not as the main “brain” but as a context builder for the much more powerful GPT-3.
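To illustrate what “returns the start and end positions” means, here is a minimal sketch of how an extractive QA head typically turns per-token start and end scores into an answer span. It is a simplified stand-in, not the code of the RoBERTa model we used: the tokens and logits are made-up numbers, `best_answer_span` and `max_answer_len` are hypothetical names, and the sketch ignores the “no answer” option that SQuAD 2.0 models also score.

```python
from typing import List, Tuple


def best_answer_span(
    start_logits: List[float],
    end_logits: List[float],
    max_answer_len: int = 30,
) -> Tuple[int, int]:
    """Return the (start, end) token indices with the highest start + end score,
    requiring start <= end and a span of at most max_answer_len tokens."""
    best_score = float("-inf")
    best_span = (0, 0)
    for s, s_score in enumerate(start_logits):
        # Only end positions at or after the start, within the length cap.
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best_score:
                best_score = score
                best_span = (s, e)
    return best_span


# Hypothetical tokens and scores, for illustration only.
tokens = ["Freud", "argued", "that", "dreams", "express", "unconscious", "wishes"]
start_logits = [0.1, 0.0, 0.2, 0.3, 0.5, 2.0, 0.4]
end_logits = [0.0, 0.1, 0.0, 0.2, 0.3, 1.0, 2.5]

s, e = best_answer_span(start_logits, end_logits)
print(" ".join(tokens[s:e + 1]))  # unconscious wishes
```

Even when the extracted words themselves are wrong, the position `s` still points to a region of the text, which is what the context builder relies on.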

How does it work?

Approach 1

Our team decided to try two different approaches for every course. The first and simpler one requires only GPT-3. Since this model was trained on a huge amount of data, we expected it to already have enough knowledge to solve educational tests. In this approach, we give GPT-3 the questions and answer options from the weekly test together with an instruction: “Choose one of the answers” if the question allows only one answer, or “Choose answers that apply” if it may have more than one. With these settings, GPT-3 should return text containing answers from among the available options.
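Approach 1 essentially comes down to prompt construction and answer parsing. The sketch below assumes a simple prompt layout (question, lettered options, instruction) and a naive way of mapping GPT-3’s free-text completion back to the options; the function names, the layout, and the sample question are hypothetical, and the actual call to GPT-3 is left out.

```python
from typing import List


def build_prompt(question: str, options: List[str], multiple_answers: bool) -> str:
    """Assemble a hypothetical Approach 1 prompt: question, lettered options, instruction."""
    instruction = ("Choose answers that apply." if multiple_answers
                   else "Choose one of the answers.")
    lines = [question, ""]
    # Label options A, B, C, ... so the model sees a familiar multiple-choice layout.
    lines += [f"{chr(ord('A') + i)}. {option}" for i, option in enumerate(options)]
    lines += ["", instruction, "Answer:"]
    return "\n".join(lines)


def parse_answer(completion: str, options: List[str]) -> List[str]:
    """Naively keep every option whose text appears in the model's completion."""
    text = completion.lower()
    return [option for option in options if option.lower() in text]


# Hypothetical example; the completion string stands in for GPT-3's output.
options = ["Thought and behavior", "Rocks and minerals", "Stars and planets"]
prompt = build_prompt("What does psychology study?", options, multiple_answers=False)
print(prompt)
completion = " A. Thought and behavior"
print(parse_answer(completion, options))  # ['Thought and behavior']
```

Substring matching keeps the sketch short, but it can misfire when one option’s text is contained in another’s; a stricter parser would match option letters or exact lines.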

Approach 2

One way to improve GPT-3’s performance is to pass relevant information as context along with the question and options. The problem is that the input is limited to about 4,000 tokens, which is roughly 3,000 words, so it’s impossible to fit a full week of lectures from any course into it. That is why we created a context builder using a fine-tuned RoBERTa model. Although the answers it extracted were usually incorrect, their position in the text could be useful for narrowing the search down to a specific part of the lecture. This let us limit the context to 1,000 words. So now GPT-3 processes the question, the answer options, and a 1,000-word context with potentially useful information.
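A minimal sketch of the context-building step, assuming the extractor reports a character offset for the start of its answer (the Hugging Face `question-answering` pipeline returns `start` and `end` character offsets into the context). The names `context_window` and `estimate_tokens`, and the centring strategy, are illustrative assumptions rather than our production code; the words-per-token ratio is simply the rough 4,000 tokens ≈ 3,000 words figure above.

```python
import math

WORDS_PER_TOKEN = 3 / 4  # rough ratio from "4,000 tokens ≈ 3,000 words"


def estimate_tokens(text: str) -> int:
    """Rough token estimate based on the words-per-token ratio above."""
    return math.ceil(len(text.split()) / WORDS_PER_TOKEN)


def context_window(lecture: str, answer_start: int, max_words: int = 1000) -> str:
    """Return up to max_words words of the lecture, centred on the word that
    contains the character offset answer_start."""
    words = lecture.split()
    # Index of the word containing answer_start: count words up to and
    # including that character, minus one.
    center = max(len(lecture[:answer_start + 1].split()) - 1, 0)
    half = max_words // 2
    lo = max(0, center - half)
    hi = min(len(words), lo + max_words)
    lo = max(0, hi - max_words)  # shift left if we hit the end of the lecture
    return " ".join(words[lo:hi])


# Toy "lecture" of ten words; the offset stands in for the extractor's `start`.
lecture = " ".join(f"w{i}" for i in range(10))
print(context_window(lecture, lecture.index("w9"), max_words=4))  # w6 w7 w8 w9
print(estimate_tokens(" ".join(["word"] * 1000)))  # 1334
```

With this ratio, a 1,000-word window costs roughly 1,300 tokens, which leaves room in the prompt for the question and the answer options.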

After developing our approaches, we selected three random Beginner-level courses in different domains. We decided to focus on non-technical courses that required processing a large amount of information but did not involve working with numbers, formulas, or code. Here are the courses our AI had to take:

Introduction to Psychology // Offered by Yale University

About this Course: What are people most afraid of? What do our dreams mean? Are we natural-born racists? What makes us happy? What are the causes and cures of mental illness? This course tries to answer these questions and many others, providing a comprehensive overview of the scientific study of thought and behavior. It explores topics such as perception, communication, learning, memory, decision-making, persuasion, emotions, and social behavior.

Foundations of Project Management // Offered by Google

About this Course: This course covers foundational project management terminology and offers a deeper understanding of the role and responsibilities of a project manager. Throughout the program, students learn from current Google project managers, who provide a multi-dimensional educational experience.

Introduction to Philosophy // Offered by the University of Edinburgh

About this Course: This course introduces students to some of the main areas of research in contemporary philosophy. In each module, a different philosopher explores some of the most important questions and issues in their area of expertise. The course begins with the fundamentals of philosophy: what are its characteristic aims and methods, and how does it differ from other subjects? The rest of the course gives an introductory overview of several different areas of philosophy.

Now that you understand the mechanism, let’s dive into the most exciting part of the research – the results. Here’s how our AI performed in the three courses we chose:

1. Introduction to Psychology

The passing grade for this course was 80% or more on every weekly test, and GPT-3 alone wasn’t able to achieve it. The second approach gave the AI a slightly higher average grade, but it still failed 2 out of 6 weekly tests. One reason for the failure was the large volume of course material, which made it difficult for the extractive QA model to select the relevant information. In addition, the multiple-choice questions were harder for the AI and pulled the score down further.

Despite not passing the course, the AI did a decent job while being around 90 times faster than an average student: a human would need approximately 15 hours to complete this course, but the AI finished it in just over 10 minutes.
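The speed comparison is simple arithmetic: convert the estimated human time to minutes and divide by the AI’s time. A tiny helper (the name `speedup` is hypothetical) makes the calculation explicit, using the figures reported for this course.

```python
def speedup(human_hours: float, ai_minutes: float) -> float:
    """How many times faster the AI was, given human hours and AI minutes."""
    return human_hours * 60 / ai_minutes


# Introduction to Psychology: ~15 hours for a human vs. ~10 minutes for the AI.
print(speedup(15, 10))  # 90.0
```

Because the AI took slightly more than 10 minutes, the true ratio is a little below 90, hence “around 90 times”.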

2. Foundations of Project Management

The second course, from Google, was no sweat for the AI at all. In fact, it took only about 7 minutes to pass, even without context from the lectures. A human would have to struggle for approximately 18 hours to earn the same certificate, which makes the result even more impressive.

3. Introduction to Philosophy

With this course, GPT-3 had to work a little harder. The condition for passing was a grade of at least 80% on at least one of the two tests at the end of each week. As you can see, the first approach resulted in rather mediocre scores. However, the context builder turned out to be really helpful here: with the second approach, the grades were higher, and the course was passed. The AI spent less than 15 minutes on this course, which is about 75 times faster than a human (approximately 19 hours).
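The two passing conditions differ in a way that is easy to miss, so here is a sketch of both rules as described above: every weekly test must reach 80% in Introduction to Psychology, while in Introduction to Philosophy one of each week’s two tests is enough. The function names and the sample scores are hypothetical, chosen only to illustrate the rules; they are not our actual grades.

```python
from typing import Sequence, Tuple

PASS_MARK = 0.8  # 80%, the threshold used by both courses


def passes_every_week(weekly_scores: Sequence[float]) -> bool:
    """Introduction to Psychology rule: every weekly test must reach 80%."""
    return all(score >= PASS_MARK for score in weekly_scores)


def passes_best_of_two(weekly_pairs: Sequence[Tuple[float, float]]) -> bool:
    """Introduction to Philosophy rule: each week, at least one of two tests must reach 80%."""
    return all(max(pair) >= PASS_MARK for pair in weekly_pairs)


# Hypothetical scores, for illustration only.
psychology = [0.9, 0.7, 0.85, 0.8, 0.6, 0.9]  # two weeks below 80%
philosophy = [(0.6, 0.8), (0.9, 0.5)]         # one passing test per week
print(passes_every_week(psychology))   # False
print(passes_best_of_two(philosophy))  # True
```

The “best of two” rule is more forgiving: a weak first attempt in a week does not sink the course as long as the second test clears the bar.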

Conclusion

Our experiment shows that modern AI is advanced enough to earn a Coursera certificate. While some of the 4,000 courses offered by the platform require human interaction, such as those that involve programming or essay answers, many do not. This means that artificial intelligence can complete many of these courses on its own, and in a shockingly short time. GPT-3 was able to answer the majority of questions, including those with more than one correct answer, which suggests that modern AI is already quite proficient at solving intellectual tasks. Our experiment also points to the potential of AI as a tool for evaluating educational content or providing personalized recommendations for students.

However, just like us humans, AI has its limitations. The main reason for the failure in the first course, Introduction to Psychology, may be the large volume of lectures, which lowered the chances of the context builder extracting the relevant information. And some courses might still be just too difficult for artificial intelligence… for now. That doesn’t mean we won’t try. Microeconomics? Intro to Python? Let us know in the comments which courses we should try next.