Harnessing Chatbots as Chemistry Research Assistants

WRITTEN BY NEIL LIU

ILLUSTRATED BY MANAL VISHNOI

February 22, 2024 | 9 min read

Grappling with large amounts of experimental data can be burdensome for researchers. For decades, researchers have been developing new algorithms and training artificial intelligence (AI) models to shoulder such laborious tasks. With the breakthrough in large language models such as ChatGPT, a new revolution in chemistry research might have arrived. In a recent study in the Journal of the American Chemical Society, a team from UC Berkeley adapted ChatGPT for a time-consuming task: searching academic literature. To further demonstrate the potential of ChatGPT, the team used this large language model to help them code two more tools: a machine-learning model that predicts experimental outcomes and a customized chemistry chatbot.

The team's approach was prompt engineering: carefully designing the prompts given to ChatGPT to steer it toward generating precise and pertinent information. One problem the team faced was hallucination, a phenomenon where ChatGPT and other language models fabricate unreliable and misleading responses when the prompt is unclear or asks for information beyond what the model knows. To address this, the team wrote detailed instructions that make ChatGPT less likely to provide incorrect information. For example, the team followed their inquiry with an additional prompt, “If you are uncertain, reply with ‘I do not know’,” which pushed the AI to answer only from what it knows instead of guessing. The team also prompted ChatGPT to generate outputs in a table with fixed headers, making the data easier for computer programs to process. Together, these two measures were important to the development of the ChatGPT Chemistry Assistant.
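The two measures described above can be sketched in a few lines of Python. This is only an illustration, not the team's actual code: the column headers, the wording of the request, and the sample reply are assumptions made up for the example, and only the guardrail sentence is taken from the article. No chatbot is called here; the sketch shows how a prompt might be assembled and how a fixed-header table reply becomes data a program can use.

```python
# Hypothetical sketch: the headers and the request wording are illustrative
# assumptions. Only the guardrail sentence comes from the article.
HEADERS = ["Compound", "Metal source", "Temperature", "Reaction time"]
GUARDRAIL = "If you are uncertain, reply with 'I do not know'."


def build_prompt(paragraph: str) -> str:
    """Combine the inquiry, the fixed-header request, and the guardrail."""
    header_row = " | ".join(HEADERS)
    return (
        "Summarize the synthesis conditions in the text below as a table "
        f"with exactly these headers: {header_row}\n"
        f"{GUARDRAIL}\n\n"
        f"Text: {paragraph}"
    )


def parse_table(reply: str) -> list[dict[str, str]]:
    """Turn a pipe-delimited table reply into one dict per data row."""
    rows = []
    for line in reply.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Skip separator rows such as |---|---| and the header row itself.
        if all(set(cell) <= set("-: ") for cell in cells) or cells == HEADERS:
            continue
        # Keep only rows that match the fixed headers; anything else,
        # including a bare "I do not know", is ignored.
        if len(cells) == len(HEADERS):
            rows.append(dict(zip(HEADERS, cells)))
    return rows
```

Because the headers are fixed in advance, a reply that does not fit them (including a plain "I do not know") yields no rows instead of malformed data, which is one reason a rigid output format is convenient for downstream programs.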

Having used data-mining techniques to obtain metal-organic framework (MOF) synthesis data, the team wanted to leverage this resource to build a MOF synthesis Q&A chatbot, making the dataset more accessible. With the help of ChatGPT, the Yaghi team compiled a large dataset that included the bibliographic information of each paper and MOF synthesis factors such as reaction time, temperature, and type of metal ion. To be manageable by the language model, this information was converted into text embeddings, numeric vectors that represent semantic meaning: the closer two vectors are, the more similar the sentences or words they represent. As with the text-mining process, the programming for the text embedding was done by ChatGPT. The resulting MOF chatbot could construct its answers around the given synthesis information. For newcomers to MOF research, this chatbot could provide comprehensible data, reliable sources, and detailed explanations, making the process of learning MOF synthesis more efficient.
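To make the idea of "closer vectors mean more similar text" concrete, here is a minimal Python sketch. It rests on assumptions: the article does not say which closeness measure the team used, so cosine similarity is chosen here as one common option, and the three-dimensional vectors and entry names are toy values invented for illustration. Real embeddings come from an embedding model and have far more dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors, made up for this example; they stand in for embedded dataset entries.
ENTRIES = {
    "reaction temperature entry": [0.9, 0.1, 0.0],
    "metal ion entry": [0.1, 0.9, 0.1],
    "bibliographic entry": [0.0, 0.2, 0.9],
}


def most_similar(query_vector: list[float], entries: dict[str, list[float]]) -> str:
    """Return the name of the entry whose vector is closest to the query vector."""
    return max(
        entries,
        key=lambda name: cosine_similarity(query_vector, entries[name]),
    )
```

In a Q&A chatbot of this kind, the user's question would be embedded the same way, and the entries closest to it would be handed to the language model as the basis for its answer.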

This study demonstrated the great potential of language models in the realm of chemistry, and the impact of AI in chemistry reaches beyond MOF research. Chemists, even those not familiar with coding, can set up specialized AI research assistants, potentially reducing the time consumed by routine work. Chatbot assistants can also make education more efficient if used correctly and fairly. For college students, chatbots specialized in chemistry will not only offer learning options that strengthen their understanding outside of lectures but also open the door to numerous fields of research.