Leveraging Generative Vision Models for Extracting Process Models from Documents

Ulm University

Presentation at BPM 2024; Marius Breitmayer, Krakow, Poland, 02 September 2024

This paper investigates the vision capabilities of multimodal Generative Pre-trained Transformers (GPTs) for automatically generating structured process models from diagram- and text-based documents. We introduce a dataset of 123 process models and corresponding documentation, emphasizing real-world element distributions. Combined with evaluation metrics for process model similarity, this dataset enables ground truth-based assessment of process model generation. We evaluate commercial GPT capabilities with zero-, one-, and few-shot prompting strategies. Our results indicate that generative vision models can be useful tools for semi-automated process modeling based on multimodal documents.
More importantly, the dataset and evaluation metrics as well as the open-source evaluation code provide a structured framework for continued systematic evaluations moving forward.
