Prompt Engineering

Description

Education is increasingly being disrupted by AI-powered chatbots like ChatGPT. Though limited in various ways, such chatbots are formidable tools to use alongside traditional pedagogical methods. To derive maximum benefit from them, we must overcome the prompt engineering problem: how to formulate human user input (i.e. prompts) so as to maximise the chances of getting meaningful output from chatbots. Our British Academy/Leverhulme SRG-funded project, titled ‘Reverse Engineering Prompts’ (PI: Ioannis Votsis; co-PIs: Brian Ball and David Freeborn; RAs: Maria Federica Norelli and Riccardo Tosselini), combines expertise from philosophy with empirical and data science methods to address this problem through case-based research.

We are pleased to present some preliminary results here. Our research examined whether the way a question is written affects the quality of the answer it receives, focusing on simple Python coding questions and answers. We first considered human-to-human interactions on online forums. One interesting result was that politely worded, easy-to-read questions tended to produce more readable answers. We then considered human-to-chatbot interactions, in which humans ask questions and chatbots answer them. Chatbots were noticeably more responsive than humans to how a question is written. For example, a highly grammatical question was more likely to result in a highly grammatical answer, and a highly readable question in a highly readable answer. We also found trade-offs in answer quality: more complex answers, for instance, tended to be less grammatical. Finally, one of our most significant findings is that chatbot answers are much more uniform in quality than human answers.
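To make the kind of comparison described above more concrete, here is a minimal illustrative sketch in Python of how the readability of a question might be related to the readability of its answer. It is not the project's actual pipeline: the choice of the Flesch Reading Ease formula, the crude syllable heuristic, the Pearson correlation, and the function names are all assumptions made for illustration, and the three question/answer pairs are invented toy examples rather than project data.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Crude heuristic (an assumption): count runs of vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease score; higher values indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Hypothetical toy pairs (question, answer); NOT data from the project.
    pairs = [
        ("How do I sort a list? Please help.",
         "Use the sorted function. It returns a new list."),
        ("What does a for loop do in Python?",
         "It runs a block of code once for each item in a sequence."),
        ("Regarding dictionary comprehension semantics, could somebody "
         "elaborate on evaluation ordering considerations?",
         "Evaluation ordering in dictionary comprehensions is implementation "
         "defined historically, although contemporary interpreters "
         "standardise key evaluation preceding value evaluation."),
    ]
    question_scores = [flesch_reading_ease(q) for q, _ in pairs]
    answer_scores = [flesch_reading_ease(a) for _, a in pairs]
    print("question readability:", question_scores)
    print("answer readability:  ", answer_scores)
    print("correlation:", pearson(question_scores, answer_scores))
```

With real data, the same two steps (score each text, then correlate question scores with answer scores) could be repeated for other features such as grammaticality or politeness, each of which would need its own scoring function.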

We are currently extending our analysis by examining a much larger dataset and testing prompts on more advanced chatbots. We hope that our research will help shed light on how chatbot technologies can be used fruitfully in pedagogical and other settings. Our findings should be relevant to educators, researchers, developers, policymakers, and the general public.