A classic, central Service-Oriented Computing (SOC) challenge is the service composition problem. It concerns solving a user-defined task by selecting a suitable set of services, possibly discovered at runtime, determining an invocation order, and handling request and response parameters. The solutions proposed over the past two decades mostly resort to additional formal modeling of the services, leading to extra effort, scalability issues, and overall brittleness. With the rise of Large Language Models (LLMs), it has become feasible to process semi-structured information such as state-of-practice OpenAPI documentation, which contains formal parts like endpoints alongside free-form elements like descriptions. We propose Compositio Prompto to generate service compositions based on these semi-structured documents. Compositio Prompto encapsulates prompt creation and model invocation such that the user only has to provide the service specifications, the task, and the expected input and output format, eliminating laborious manual annotation or modeling by relying on already existing documentation. To validate our approach, we implement a fully operational prototype, which takes a set of OpenAPI documents, a plain-text task, and an input and output JSON schema, and returns the generated service composition as executable Python code. We measure the effectiveness of our approach on a parking spot booking case study. Our experiments show that models, especially those above 70B parameters, can solve several of the tasks, but none fulfills all of them. Furthermore, compared with manually created sample solutions, the compositions generated by LLMs appear to be close approximations.
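For illustration, a minimal sketch of this interface in Python (hypothetical function and parameter names, not the prototype's actual API; see "code/evaluation/prompt_generation.py" for the real prompt construction):

import json

def call_llm(prompt: str) -> str:
    # Placeholder for the model invocation (e.g., a local or hosted LLM endpoint).
    raise NotImplementedError

def compose(openapi_specs: list[str], task: str,
            input_schema: dict, output_schema: dict) -> str:
    # Encapsulate prompt creation: service specifications, task, expected I/O formats.
    prompt = "\n\n".join([
        "Service specifications (OpenAPI):",
        *openapi_specs,
        "Task: " + task,
        "Input JSON schema: " + json.dumps(input_schema),
        "Output JSON schema: " + json.dumps(output_schema),
        "Return the service composition as executable Python code.",
    ])
    return call_llm(prompt)  # the returned string is the generated composition code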
Methodology (summarized): For the study, we perform automated service composition for parking spot booking using LLMs. There are six parking services and two payment services; the six parking services are duplicated with different distances and prices to create the distinct sets 1 and 2. We define eight prompts and perform the composition using 14 different LLMs. We use a best-of-three approach (three runs per configuration, keeping the best) to reduce the influence of randomness. Finally, we assess functionality manually and compare the generated code to a manually crafted sample solution using code similarity metrics. All experiments are described in detail in the full paper.
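As a simple illustration of the similarity step (the metrics actually used are computed in "code/evaluation/sample_solution/"; the measure below is only an assumed stand-in for demonstration):

from difflib import SequenceMatcher
from pathlib import Path

def similarity(generated: Path, sample: Path) -> float:
    # Textual similarity ratio in [0, 1]; 1.0 means the files are identical.
    return SequenceMatcher(None, generated.read_text(), sample.read_text()).ratio()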
Content:
code/: Code to perform the experiments. For details, see "code/README.md".
code/evaluation/prompt_generation.py: Services and prompt generation.
code/evaluation/sample_solution/: Sample solution and similarity evaluation.
results/*: Results of the runs with the LLMs. Filename structure: "results/{model_name}-{run}/prompt_{prompt_number}set{set_number}_{artifact}". The artifact is "code_0.py" for the generated code, "code_1.py" if the model was tasked to improve the code, or "prompt.txt" for the prompt used. For the best run, the code metrics are in "comparison.json". A small sketch for walking the results follows this listing.
prompt_template.txt: Pseudo code for the prompt template. Implemented in "code/evaluation/prompt_generation.py".
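A minimal sketch for iterating over the generated code files, assuming the filename structure stated above (adapt the glob pattern if your local copy differs):

from pathlib import Path

# Collect every initially generated composition across all models and runs.
for code_file in sorted(Path("results").glob("*/prompt_*code_0.py")):
    model_and_run = code_file.parent.name   # "{model_name}-{run}"
    print(model_and_run, code_file.name)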
Note: Please use the tree view to access the files.
Licenses: MIT for "code/"; CC BY 4.0 for "results/".