Do Large Language Models Speak Scientific Workflows?
Yildiz, Peterka
With the advent of large language models (LLMs), there is growing interest in applying LLMs to scientific tasks. In this work, we conduct an experimental study to explore the applicability of LLMs for configuring, annotating, translating, explaining, and generating scientific workflows. We use five different workflow-specific experiments and evaluate several open- and closed-source language models using state-of-the-art workflow systems. Our studies reveal that LLMs often struggle with workflow-related tasks due to their lack of knowledge of scientific workflows. We further observe that the performance of LLMs varies across experiments and workflow systems. Our findings can help workflow developers and users understand LLMs' capabilities in scientific workflows, and motivate further research applying LLMs to workflows.
With the emergence of large language models (LLMs), there is growing interest in applying them to scientific tasks. This study experimentally explores the applicability of LLMs to configuring, annotating, translating, explaining, and generating scientific workflows. Using five workflow-specific experiments, it evaluates multiple open-source and closed-source language models on state-of-the-art workflow systems. The study finds that LLMs frequently struggle because they lack knowledge of scientific workflows, and that their performance varies across experiments and workflow systems.
Scientific workflows play an important role in high-performance computing (HPC) environments. A workflow consists of a set of cooperating tasks whose scheduling and communication must be coordinated. However, many scientists find workflow systems difficult to use and often choose to run tasks manually or to develop their own workflow solutions.
In the workflow configuration experiment, the user provides a natural-language description of the workflow and the LLM generates the corresponding workflow configuration file. For example:
User Prompt: I want a 3-node workflow with one producer and two consumer tasks.
The producer generates mesh and particle datasets, consumer1 reads mesh data,
consumer2 reads particle data. The producer requires 3 processes, each consumer
runs on a single process. Please provide a workflow configuration file for the
Wilkins workflow system.
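To make the requirements in the prompt above concrete, the following is a minimal Python sketch that records the requested workflow as plain data and checks it for internal consistency. The `Task` class, its field names, and `check_workflow` are hypothetical and introduced only for illustration; this is not the Wilkins configuration format, and it is not how the study scored model outputs.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    # Hypothetical representation of one workflow task (not the Wilkins schema).
    name: str
    nprocs: int
    writes: list[str] = field(default_factory=list)  # datasets this task produces
    reads: list[str] = field(default_factory=list)   # datasets this task consumes


def check_workflow(tasks: list[Task]) -> list[str]:
    """Return a list of problems; an empty list means the description is consistent."""
    problems = []
    # Collect every dataset that some task in the workflow produces.
    produced = {d for t in tasks for d in t.writes}
    for t in tasks:
        if t.nprocs < 1:
            problems.append(f"{t.name}: nprocs must be at least 1")
        for d in t.reads:
            if d not in produced:
                problems.append(f"{t.name}: reads '{d}', which no task writes")
    return problems


# The three-task workflow described in the user prompt.
requested = [
    Task("producer", nprocs=3, writes=["mesh", "particles"]),
    Task("consumer1", nprocs=1, reads=["mesh"]),
    Task("consumer2", nprocs=1, reads=["particles"]),
]

print(check_workflow(requested))  # prints []
```

A configuration file generated by an LLM could be mapped into the same structure and compared field by field against `requested`, which is one simple way to see whether the task count, process counts, and dataset routing in the prompt were all respected.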
The paper cites 33 references covering scientific workflows, large language models, HPC, and other related fields.
Summary: This paper systematically evaluates the capabilities of large language models in the scientific workflow domain. It reveals significant limitations of LLMs while also showing that appropriate techniques, such as few-shot prompting, can improve their performance, laying groundwork for future research in this area.