Topics for Theses and Projects

Our topics for bachelor and master theses as well as projects are from the areas of software engineering and programming languages. The concrete topics for theses are based on our research interests and allow students to make their own contribution to a field of research. Our main target group are students of Computer Science, Software Engineering, Media Informatics, Artificial Intelligence, and Cognitive Systems.

This page offers a selection of topics and subject areas. For more information, please do not hesitate to contact the respective supervisor. In addition, we are open to your own suggestions for topics.

^{(Legend - B: Bachelor Thesis, M: Master Thesis, P: Project)}

Outline of a Bachelor or Master Thesis

Topic and Proposal

After the initial contact, the topic and the contents of the Bachelor or Master thesis are agreed upon and recorded in a proposal. The proposal has proven to be a valueable tool for risk minimization and planning of the thesis and includes:

the context of the thesis
the research question
the state of research
the solution idea
the methodology and the evaluation plan

Much of the proposal can be reused in the final thesis.

Interim Presentation

In the course of the interim presentation, students learn to present and communicate results. In addition, the interim presentation can be used to reflect on the status of the thesis so far and to respond to feedback.

Submission and Final Presentation

The submission of the final thesis and the final presentation formally conclude the bachelor or master thesis.

(Legend - B: Bachelor thesis, M: Master thesis, P: Project)

Jump to...

Human-centered Software Engineering

Algorithm Detection

[RESERVED] B/M: Detecting Non-Trivial Algorithms in Code: How Far Can LLMs Go? (Neumüller, Tichy)

Context
We are convinced that automatically detecting algorithm implementations in a code base can be helpful to gain knowledge about, which concerns are present in the code base, how they are solved and which components are involved. This knowledge can then support the tasks of software comprehension, software architecture recovery and software maintenance.
Examples of algorithms that could be interesting to detect and some of the insights their detection provides:

Quicksort -> The application sorts data structure x.
A* -> The application does path search.
Raft (Consensus Algorithm) -> The application is a distributed, fault-tolerant system.

Research Question
Large Language Models (LLMs) achieve impressive results on code-related tasks such as code clone detection, code summarization and code generation.
Our first experiments using LLMs for algorithm detection indicate promising performance, with F1 scores of about 77%.
However, our evaluation has so far focused only on smaller algorithms such as Binary Search, Bubble Sort, Matrix Transposition etc.
We are now interested in evaluating LLMs on more complex algorithms implemented in real code bases.
Therefore, the main research question of this thesis is:

How do LLMs perform in recognizing complex algorithms in source code bases?

Tasks

Create a dataset of more complex algorithms (e.g. Levenshtein distance, Raft), preferably from real open-source projects, using code from GitHub, Maven, or other web resources.
Adapt the current LLM evaluation code written in Python to support different evaluation strategies, such as file-based and API-based evaluations.
Evaluate different LLMs (e.g., Deepseek, Mixtral, LLaMA3, ChatGPT) using the dataset and the implemented evaluation strategies.
Furthermore, we are interested in explaining the predictions of LLMs, (especially for failure cases).

No prior knowledge of machine learning (ML) is required for this thesis. However, you should be open to familiarizing yourself with a new subject area.

Related Work

Publications regarding the (automated) curation of datasets:
Publications regarding the application of LLMs to (other) code related tasks:
- https://ieeexplore.ieee.org/document/10576984
- https://arxiv.org/abs/2308.01191
Examples of LLM API interaction:
- https://dl.acm.org/doi/pdf/10.1145/3650212.3680384
- https://arxiv.org/abs/2408.02232
Publications regarding explanations:

Contact
If you are interested and/or have any questions, feel free to contact me any time.
We can discuss the topic further and try to adapt it to your personal preferences.

Denis Neumüller

Dynamic and Static Program Analysis

M: Using Program Slicing to Improve Conventional Coverage (Sihler, Tichy)

Context

Classical code coverage techniques are widely used to give an indication of the quality of a test suite (with respect to its capability in detecting faults within the system under test).
However, they are not always effective and can be outright misleading [1, 2]. Consider a scenario in which you simply execute code within a unit test but never check whether the resulting state is as expected.
While (at least for most languages) this still ensures that the code does not raise a runtime exception, this is way too weak for most use-cases.

As one possible solution, Schuler and Zeller introduced the concept of checked coverage [3], which uses dynamic slicing to only consider the executed code that contributed to something which is checked with an assertion. Later work by Zhang and Mesbah [4] showed that such checked coverage is indeed a strong indicator of the effectiveness of a test suite.

Research Problem

In this work, we want to look at an alternative to dynamic slicing - static slicing, which does not execute the code and hence over-approximates all potential executions. In contrast to a dynamic slice on an assertion, the static slice contains all the code that could potentially affect the assertion, even if it is not executed in a particular test run. This way, we obtain additional information about the test suite and the system under test, yet we do not know whether this information is useful to gauge the effectiveness of the test suite.

Tasks

More specifically the intended steps of this work are as follows:

Consider a well-supported language such as Java alongside an appropriate 1) static slicer, 2) dynamic slicer, and 3) coverage tool.
Use these tools to compute the static and dynamic slices for test-suites of real-world Java projects.
Analyze, compare, and discuss the results of the static and dynamic slices with respect to the code coverage and the effectiveness of the test suite (e.g., using mutation testing).

Related Work and Further Reading

Contact and More

If you are interested and/or have any questions, feel free to contact me any time.
We can discuss the topic further and try to adapt it to your personal preferences.

Florian Sihler (Institute Homepage)

B/M: Dead/Unused Code Detection for R (Sihler, Tichy)

Context
Dead or unused code describes all parts of a program that are either never executed at runtime or never used in any way.
Such parts hinder not only the readability of the program — requiring users first to realize that such snippets do not actually contribute to the program's functionality — but also the maintainability of the code, as users may be afraid that the code could be used in some way [1, 2].
Additionally, dead code can lead to performance issues, as it may cause the definitions of many unused functions, the loading of unnecessary datasets, or the calculation of values that are never used.

Research Problem
Given an R program and information on its control- and dataflow, identify the dead and unused code within the program adapting well known algorithms to the dynamic nature of the R programming language [3,4,5]. For this, we use a static dataflow analyzer called flowR [6] which provides us with a control flow and a dataflow graph.

Tasks
More specifically, the task are:

Adapt the dead code elimination problem to the R programming language and discuss which code can be removed safely
Use flowR's control flow and dataflow graph to identify the dead code using your adapted algorithm
Evaluate and discuss the effectiveness of your algorithm on a set of real-world R programs

Related Work and Further Reading

Contact and More
If you are interested and/or have any questions, feel free to contact me any time.
We can discuss the topic further and try to adapt it to your personal preferences.

Florian Sihler (Institute Homepage)

B/M: Incremental and Performant Dataflow Analysis (Sihler, Tichy)

Context

Static program analysis provides us with information regarding the runtime behavior of a program without actually executing it. This is useful for various purposes, ranging from verifying the correctness of a program, over checking it for security issues to fixing issues that prevent the program from compiling or executing.
However, precise static analysis is often very expensive for larger programs making it impractical.
As a simple example, consider a case in which a method invocation may refer to a collection of implementations - based on a non-constant value that is only known at runtime. To be precise, a static analyzer has to consider every possible implementation for each method invocation, including their side-effects and transitive dependencies.

Luckily, a lot of existing research provides a plethora of possible solutions for this problem, proposing
the use of special operators [1], dynamic adjustments of the precision [2], or incremental analysis [3].

Research Problem

The goal of this thesis is to apply existing performance improvements to a static program analyzer for the R programming language and evaluating their effectiveness.
More specifically, the analyzer in question is flowR [4], which is developed here at the University of Ulm with the goal of statically handling R's powerful reflective capabilities and its deeply integrated dynamic nature.

Tasks

In the context of flowR, the main steps of the thesis are:

Familiarize yourself with flowR's architecture and analysis algorithm
Adapt and implement a selection of performance improvements to flowR's structure
Evaluate the performance characteristics of the adapted versions to discuss their impact on the analysis time and precision

Related Work and Further Reading

If you want to, you can have a first look at flowR for yourself [4]!

Contact and More

If you are interested and/or have any questions, feel free to contact me any time.
We can discuss the topic further and try to adapt it to your personal preferences.

Florian Sihler (Institute Homepage)

B/M: Static Analysis for Reflective or Self Modifying Code (Sihler, Tichy)

Context
Most static analyzers rely on static dataflow analysis to detect problems like possible null pointer exceptions in code [5].
However, analyzers are usually unable to handle reflective or self-modifying code (e.g., Java Agents, Java Reflection, R's meta-functions [6]). While this is fine for languages in which such constructs are rare or discouraged, they are 1) used quite often in the R programming language, 2) are in-part essential to track program semantics, 3) pose an interesting problem to solve.

Problem
As a basis, we use the static program analysis framework flowR which is designed for the R programming language [3]. flowR is currently unable to deal with reflective and code-modifying constructs like body, quote, and parse in its static dataflow graph.
While handling such constructs statically may be infeasible in the general case, we first want to focus on a set of common cases that appear frequently.

Tasks

Develop a concept to represent code-modifications and lazy evaluation (within flowR's dataflow graph). For example, to represent a function that has the default values of its arguments or the contents of its body modified.
Create a proof of concept implementation for this concept in flowR.

Related Work and Further Reading

If you want to, you can have a first look at flowR for yourself: https://github.com/flowr-analysis/flowr.

Contact and More
If you are interested and/or have any questions, feel free to contact me any time.
We can discuss the topic further and try to adapt it to your personal preferences.

Florian Sihler (Institute Homepage)

[RESERVED] B/M: Automatically Infer Code-Constraints (Sihler, Tichy)

Context

Let's suppose you are a data scientist tasked with the analysis of a dataset. As an expert of the domain you have a quick look at the dataset and remember an older script by a colleague which already loads, prepares, and transforms the dataset as you want! Reusing it just leaves you with the task of visualizing the data (as is the common workflow) so you quickly write up and run the script... and luckily realize that even though the script runs and produces figures eerily close to what you would expect, something is not right. The dataset of your colleague never contained a zero so the script contains the implicit assumption of being just able to divide cells.

Within this work we want to make such implicit assumptions explicit in the code, alerting users whenever they no longer hold!

Problem

You have an R script together with the statically inferred dataflow graph that informs you about the control and data dependencies of function calls, variables, and definitions in the program.

The challenges are to

identify points of interest at which the behavior of the program is defined,
infer contracts that represent the potential implicit assumptions at the given position (e.g., that the value of a variable has to be non-zero, smaller than the length of another vector, ...), and
instrument the code to automatically verify these constraints from now on.

Of course, the specific scope of these challenges as well as the focus depends on whether you want to do this as a bachelor's or master's thesis as well as your personal preference.

Tasks

Enrich flowR [4], a dataflow analysis framework for the R programming language, with the capability to infer predefined constraints
Create an initial collection of sensible constraints to infer (e.g., non-zero values, ...)
Infer these constraints and instrument the program to reflect them [5]

One way to infer such constraints would be the definition of abstract domains [1] although classical optimization techniques such as constant folding and constant propagation help as well [2, 3].

Related Work and Further Reading

P. Cousot. Principles of Abstract Interpretation. (ISBN: 978-0-26-204490-5)
K. Cooper and L Torczon. Engineering a Compiler. (ISBN: 978-0-12-818926-9)
U. Khedker, A. Sanyal, and B. Sathe. Data Flow Analysis: Theory and Practice. (ISBN: 978-0-8493-3251-7)
F. Sihler. Constructing a Static Program Slicer for R Programs.
B. Meyer, Applying "Design by Contract"

If you want to, you can have a first look at flowR for yourself: https://github.com/flowr-analysis/flowr.

Contact and More

If you are interested and/or have any questions, feel free to contact me any time.
We can discuss the topic further and try to adapt it to your personal preferences.

Florian Sihler (Institute Homepage)

P: Static Program-Analysis for Data Analysis Projects (Sihler, Tichy)

^{[2–6 Students] [AP SE] [PSE1] [PSE2] [German version below]}

Static analysis refers to the examination of programs for runtime properties without actually executing them.
It is an integral part of modern software development and aids in identifying bugs, security vulnerabilities, or improving readability.
Compilers use static analysis to, for example, avoid type errors or generate the most optimal code possible.
Development environments or language servers leverage static analysis to enable features such as refactoring or auto-completion.

In this project, we focus on flowR, a framework for the static analysis of R, a statistical programming language widely used in data analysis and visualization.
A detailed analysis of data and control flow enables flowR to, for instance, reduce a program to only the parts relevant for generating a graphic or computing a statistical model (a technique known as Program Slicing).

About flowR

Currently, flowR can be used and tested as a Visual Studio Code extension, RStudio add-in, or directly as a Docker image.

flowR is developed primarily in TypeScript under the GPLv3 license and is hosted on GitHub.
Documentation is provided through a dedicated Wiki and directly in the code.

Goal

In this project, we aim to extend flowR with support for R projects.
While the analysis of individual and multiple scripts is already supported, this includes in particular:

Integration with build systems like renv
Consideration of metadata such as DESCRIPTION or NAMESPACE files
Support for incremental updates (e.g., when individual files change)
Resolving package relationships and dependencies

This would enable flowR to be used in larger projects and, for instance, analyze the data flow while accounting for correct package dependencies.

German Version

Statische Programm-Analyse für Projekte

Statische Analyse bezeichnet die Untersuchung von Programmen auf Laufzeiteigenschaften ohne diese tatsächlich auszuführen. Sie ist ein integraler Bestandteil moderner Softwareentwicklung und hilft beim Identifizieren von Fehlern, Sicherheitslücken oder dem Verbessern der Lesbarkeit. Compiler verwenden statische Analyse beispielsweise, um Typfehler zu vermeiden oder möglichst optimalen Code zu generieren. Entwicklungsumgebungen oder Language Server verwenden statische Analyse, um Ihre Funktionalität wie Refactorings oder Autovervollständigung zu realisieren.

In diesem Projekt widmen wir uns flowR, einem Framework für die statische Analyse von R, einer statistischen Programmiersprache die häufig in der Datenanalyse und -visualisierung eingesetzt wird. Eine ausgiebige Analyse des Daten- und Kontrollflusses ermöglicht es flowR beispielsweise ein Programm nur auf die Teile zu reduzieren, die für die Generierung einer Grafik oder die Berechnung eines statistischen Modells relevant sind (das sogenannte Program Slicing).

Über flowR

Aktuell kann flowR als Erweiterung für Visual Studio Code und RStudio, sowie direkt als Docker Image verwendet und ausprobiert werden.

flowR wird unter der GPLv3 Lizenz auf Github hauptsächlich in der Programmiersprache TypeScript entwickelt. Die Dokumentation erfolgt über ein dediziertes Wiki und direkt im Code.

Ziel

Im Anwendungsprojekt wollen wir flowR um eine Unterstützung für R Projekte zu erweitern.
Während die Analyse von einzelnen und auch mehreren Skripten bereits unterstützt wird, zählt hierzu insbesondere die:

Integration von Build-Systemen wie renv
Berücksichtigung von Metadaten wie DESCRIPTION oder NAMESPACE Dateien
Unterstützung von inkrementellen Aktualisierungen (z.B. wenn sich einzelne Dateien ändern)
Auflösung von Paketbeziehungen und Abhängigkeiten

Auf diese Weise kann flowR auch in größeren Projekten eingesetzt werden und beispielsweise den Datenfluss unter der Berücksichtigung der richtigen Paketabhängigkeiten analysieren.

Self-adaptive Systems

M: Generating PCM Models of Self-Adaptive Microservices (Straub, Tichy)

Context

Self-Adaptive systems are systems that adjust themselves to maintain or improve their performance in response to changes in their environment and operational conditions, thereby ensuring continued effectiveness, reliability, and efficiency.
Self-adaptive systems are diverse and multifaceted, with applications extending across numerous fields. In our project, we concentrate on the cloud-native domain, with a special emphasis on the explainability aspect of self-adaptation. This involves delving into how these systems can not only adjust autonomously to changing conditions but also provide transparent and understandable explanations for their adaptations, ensuring clarity and trust in their operations.

In the MENTOR project, we use Palladio Component Models (PCM) of Self-Adaptive Microservices to simulate the system. These simulations are used to analyze and improve the performance of the systems reconfiguration behavior. To test the performance of our appraoch, we need PCM models of different systems to ensure generalizability and detect potential issues with our approach.

Problem

In this Master thesis, a prototype implementation of a PCM model generator has to be created. This prototype should get a set of constraints (e.g., N_min≤∣C∣≤N_max; ∀s∈S:λ_min≤ArrivalRate(s)≤λ_max;∀(c_i,c_j)∈PotentialConnections:L_min≤Latency(c_i,c_j)≤L_max;∀c∈C_pattern∖{c_center},c→c_center) and generate one or multiple pcm models satisfying the constraints. The generated models then have to be verified to ensure that they are well formed.

To generate the models, we imagine a combination of a constraint solver and a LLM. However, we are also open to other appraoches, for example using Answer Set Programming.

Tasks/Goals

Familiarization with the PCM meta model
Familiarization with constraint solver, LLMs, etc.
Implement a Prototype
Evaluate the Implementation
Implement a Prototype
Evaluate the Implementation

Context

Problem

Our state graphs get too large to display. We need to visualize all states in a comprehensive and understandable way. Ensure the visualization is intuitive and easy to understand

Tasks/Goals

Develop visualization concepts for Large State Graphs
Implement the delevoped visualization concepts
Develop the solution in React/Typescript

Configurable Software Systems

Identification of Feature Interactions in Configurable Software

Context

In software product lines, variability is determined by features that can be included or excluded to create a specific configuration. T-wise sampling algorithms aim to generate samples that cover every possible combination of t features in at least one configuration within the sample set.

Problem

Some products/configurations may contain bugs resulting from errors that arise when multiple features are combined. These faulty feature interactions can potentially mask other errors. Currently, error masking/shadowing in configurable software is understudied that is why we want to investigate this topic.

Tasks

Research on occurencies of complex feature interaction bugs and error masking bugs in configurable software systems
Detect, compare, and discuss the complex feature interactions

Contact

Sabrina Böhm

Context

In the paper "A classification of product sampling for software product lines" (SPLC'18)", Mahsa et al. propose a classification for product sampling techniques by classifying the existing literature.

Problem

Since 2018, more recent publications have emerged addressing product line sampling, alongside new research topics of growing interest.

Tasks

Literature research
Investigate and classify new research topics and research gaps of growing interest

Contact

Sabrina Böhm

Constraint-Programmierung und Constraint Handling Rules

FreeCHR

The declarative programming language Constraint Handling Rules (CHR) is designed as a language extension to other, not necessarily declarative programming languages. There are existing implementations for Prolog, C, Java, JavaScript, and others. We want to conduct a Structured Literature Research (SLR) on existing implementations, to get an exhaustive overview over implementation techniques and patterns.

Goals

Conduct an SLR on papers on existing CHR implementations
Find CHR implementations without a scientific publication on public repositories on, e.g. GitHub, GitLab, ...
Identify and document common architectures, implementation techniques and patterns
Get an exhaustive overview over existing and historic CHR implementations

References

Prerequesites

Interest in programming languages and (to some extend) compiler construction.
Good knowledge of multiple programming languages and paradigms.

Contact

Sascha Rechenberger

FreeCHR aims to be a sound and complete embedding framework for CHR. Hence, we want to extend the operational semantics by possibly failing computation, as they are necessary for the development of constraint solvers and other software.

Goals

Extend the very abstract operational semantics of FreeCHR, such that they can model possibly failing computations
Prove soundness and completeness w.r.t. the very abstract operational semantics of CHR
Optional: Develop an execution algorithm and prove correctness w.r.t. the new operational semantics

References

S. Rechenberger, T. Frühwirth: FreeCHR - An Algebraic Framework for CHR Embeddings
T. Frühwirth: Constraint Handling Rules (ISBN: 978-0-521-87776-3)

Prerequesites

Interest in formal aspects of programming languages
Interest/knowledge in category theory and/or type systems is recommended
Knowledge in functional programming, especially monads

Contact

Sascha Rechenberger

FreeCHR aims to be a sound and complete embedding framework for CHR. Hence, we want to extend the operational semantics by stateful computation, as they are common in many programming languages.

Goals

Extend the very abstract operational semantics of FreeCHR, such that they can model stateful computations
Prove soundness and completeness w.r.t. the very abstract operational semantics of CHR
Optional: Develop an execution algorithm and prove correctness w.r.t. the new operational semantics

References

S. Rechenberger, T. Frühwirth: FreeCHR - An Algebraic Framework for CHR Embeddings
T. Frühwirth: Constraint Handling Rules (ISBN: 978-0-521-87776-3)

Prerequesites

Interest in formal aspects of programming languages
Interest/knowledge in category theory and/or type systems is recommended
Knowledge in functional programming, especially monads

Contact

Sascha Rechenberger

FreeCHR aims to be a sound and complete embedding framework for CHR. The abstract operational semantics are a next step in the direction of modelling the necessities of real-life programming languages. Hence, we want to re-formalize them in the context of FreeCHR and establish soundness and completeness.

Goals

Formalize the abstract operational semantics ω_tof CHR in the terms of (side-effect free) FreeCHR
Prove soundness and completeness of the new definition

References

S. Rechenberger, T. Frühwirth: FreeCHR - An Algebraic Framework for CHR Embeddings
T. Frühwirth: Constraint Handling Rules (ISBN: 978-0-521-87776-3)

Prerequesites

Interest in formal aspects of programming languages
Interest/knowledge in category theory and/or type systems is recommended

Contact

Sascha Rechenberger

Constraint Handling Rules

We are developing a rule-based implementation of a tool to analyse and generate graphs. It is used in the domain of mason’s marks. For thousands of years, stonemasons have been inscribing these symbolic signs on dressed stone. Geometrically, mason’s marks are line drawings. They consist of a pattern of straight lines, sometimes circles and arcs. We represent mason’s marks by connected planar graphs. Our prototype tool for analysis and generation of graphs is written in the rule-based declarative language Constraint Handling Rules. One or several of following features could be improved in this proposed work:

Goals/Tasks

improve the vertex-centric logical graph representation, i.e. adding arcs, colors, labels,
encode existing mason marks either by hand, from images or import from existing databases,
recognize (sub)graphs and patterns in a graph, in particular (self-)similarities and correlations between graphs,
perform classical algorithms on graphs like shortest paths,
derive properties and statistics from graphs,
generate (randomly or exhaustively) and draw graphs from given constrained subgraphs based on properties and statistics.

References

Thom Frühwirth: Rule-Based Drawing, Analysis and Generation of Graphs for Mason's Mark Design, 2018

Prerequesites

Good knowledge of Prolog and CHR
Lecture Rule-based Programming

Contact

Thom Frühwirth, Sascha Rechenberger

P/B/M: Graph Tool for Mason Marks (Frühwirth)

Extend the analysis techniques and/or the associated tool from the following two research papers:

A dynamic program analysis of the non-termination problem for recursion in the Constraint Handling Rules (CHR) language: A simple program transformation for recursive rules in CHR was introduced that produces one or more adversary rules. When the rules are executed together, a non-terminating computation may arise. It was shown that any non-terminating computation of the original rule contains this witness computation.

References

Thom Fruehwirth: A Devil's Advocate against Termination of Direct Recursion, PPDP 2015.
Transformation Tool available (use "Devil" options).

A static program analysis of the non-termination problem for recursion in the Constraint Handling Rules (CHR) language: Theorems with so-called misbehavior conditions for potential non-termination and failure (as well as definite termination) of linear direct recursive simplification rules are given. Logical relationships between the constraints in a recursive rule play a crucial role in this kind of program analysis.

References

Thom Fruehwirth: Why Can’t You Behave? Non-Termination Analysis of Direct Recursive Rules with Constraints, RuleML 2016

Prerequesites

Lecture "Rule-Based Programming"

Contact

Thom Frühwirth, Sascha Rechenberger

B/M: Non-Termination Analysis of Recursive Rules (Frühwirth)

Invariants (or assertions, properties, conditions) annotate program text and express static and dynamic properties of a program's execution. Invariants can be expressed as logical relations (predicates) over the program's variables. In the context of constraint-programming and Constraint Handling Rules (CHR), they amount to constraints. These can be readily added to the program to enforce the invariants. By comparing the program with and without invariants expressed as constraints using established program analysis techniques for CHR, namely confluence and program equivalence, we can check if the invariants hold in the program.

Furthermore, invariants can be strenghened and even be generated by adapting the so-called completion method (that is normally used to generate additional rules to make a CHR program confluent).

References

Johannes Langbein, Frank Raiser, Thom Frühwirth: A state equivalence and confluence checker for CHR. In P. Van Weert and L. De Koninck, editors, CHR '10: Proc. 7th Workshop on Constraint Handling Rules. K.U.Leuven, Department of Computer Science, Technical report CW 588, July 2010.

Prerequisites

Lecture "Rule-based Programming"

Contact

Thom Frühwirth, Sascha Rechenberger

Program slicing is a program anaylsis technique whereby one extracts properties and relationships of variables in a program by removing from the program all statements that do not effect the assignments of the variables. In the context of constraint programming and Constraint Handling Rules that deal with logical relations (predicates) this amounts to the logical operation of variable projection. This means that we remove unwanted variables that are not of interest from the program by program transformation. This transformation can be accomplished by adapting the technique of "completion". It is usually used to make a non-confluent program confluent.

References

Johannes Langbein, Frank Raiser, Thom Frühwirth. A state equivalence and confluence checker for CHR. In P. Van Weert and L. De Koninck, editors, CHR '10: Proc. 7th Workshop on Constraint Handling Rules. K.U.Leuven, Department of Computer Science, Technical report CW 588, July 2010.

Prerequisites

Lecture "Rule-based Programming

Contact

Thom Frühwirth, Sascha Rechenberger

B/M: Program Slicing by Confluence and Completion (Frühwirth)

Repeated recursion unfolding is a new approach that repeatedly unfolds a recursion with itself and simplifies it while keeping all unfolded rules. Each unfolding doubles the number of recursive steps covered. This reduces the number of recursive rule applications to its logarithm at the expense of introducing a logarithmic number of unfolded rules to the program. Efficiency crucially depends on the amount of simplification inside the unfolded rules. A super-linear speedup is provably possible in the best case, i.e. speedup by more than a constant factor. The optimization can lower the time complexity class of a program.

Goals

The goal is to implement this optimization scheme as a program transformation in the programming language of choice. If necessary, the scheme should be transferred from recursion to iteration constructs such as loops.

References

Thom Frühwirth: Runtime Repeated Recursion Unfolding in CHR: A Just-In-Time Online Program Optimization Strategy That Can Achieve Super-Linear Speedup, 2023 (DOI: 10.48550/arXiv.2307.02180)

Prerequisite

Excellent knowledge and programming skills in the choosen programming language.

Contact

Thom Frühwirth, Sascha Rechenberger

Prolog (WAM) and then Java (JVM) popularized the concept of an abstract (or virtual) machine to implement programming languages in a systematic, portable yet efficient way. Such a machine shall be developed for CHR.

Goals

Define the abstract code instructions for CHR and to implement them.

Prerequesites

Good knowledge of Prolog and CHR
Lecture Rule-based Programming
Lecture Compiler Construction (Compilerbau) (optional but very helpful)

Contact

Thom Frühwirth, Sascha Rechenberger

Quality-driven System Evolution

Architecture-based Modelling and Analysis of Security

Motivation
Moderne Softwaresysteme sind zunehmend komplex und vernetzt, was eine umfassende Analyse zur Sicherstellung von Qualit ¨atsmerkmalen wie Vertraulichkeit unerlässlich macht. Eine architekturbasierte Vertraulichkeitsanalyse ermöglicht die frühzeitige Erkennung potenzieller Verstöße gegen die Vertraulichkeit und trägt somit zur effektiven Vermeidung von Datenlecks bei. Allerdings erschweren Unsicherheiten im System selbst sowie in seiner Umgebung eine präzise und vollständige Analyse der Softwarearchitektur. Die hohe Komplexität architektonischer Modelle und die Vielzahl möglicher Unsicherheiten führen darüber hinaus zu einer kombinatorischen Explosion, die eine automatisierte Modellreparatur erheblich erschwert. In der Praxis müssen Softwarearchitekt:innen daher alle Verstöße manuell beheben – ein aufwändiger und fehleranfälliger Prozess. Zwar existieren bereits Ansätze zur Identifikation von Vertraulichkeitsverletzungen unter Unsicherheit, jedoch fehlt es bislang an wirksamen Methoden zur Minderung ihrer Auswirkungen.

Aufgabenstellung
Im Rahmen dieser Arbeit entwickeln Sie einen innovativen Ansatz zur automatisierten Behebung von Vertraulichkeitsverletzungen in Datenflussdiagrammen unter Nutzung von Sprachmodellen wie GPT. Dazu wird das Modell in einem strukturierten Format (z.B. JSON) an das
Sprachmodell übergeben. Das Sprachmodell soll daraufhin einen Lösungsvorschlag generieren. Die Qualität der vorgeschlagenen Lösung wird anschließend mithilfe eines von uns bereitgestellten Analyseverfahrens überprüft. Basierend auf dieser Bewertung wird ein iterativer Feedback-Loop mit dem Sprachmodell aufgebaut, um die Ergebnisse sukzessive zu verbessern.

Kontakt
Bei Interesse und/oder Fragen können Sie mich jederzeit kontaktieren. Wir können das Thema gerne weiter besprechen und versuchen, es an Ihre persönlichen Wünsche anzupassen.

Robert Heinrich (Institute Homepage)

Motivation
Moderne Softwaresysteme sind zunehmend komplex und vernetzt, was eine umfassende Analyse zur Sicherstellung von Qualitätsmerkmalen wie Vertraulichkeit unerlässlich macht. Eine architekturbasierte Vertraulichkeitsanalyse ermöglicht die frühzeitige Erkennung potenzieller Verstöße gegen die Vertraulichkeit und trägt somit zur effektiven Vermeidung von Datenlecks bei. Allerdings erschweren Unsicherheiten im System selbst sowie in seiner Umgebung eine präzise und vollständige Analyse der Softwarearchitektur. Die hohe Komplexität architektonischer Modelle und die Vielzahl möglicher Unsicherheiten führen darüber hinaus zu einer kombinatorischen Explosion, die eine automatisierte Modellreparatur erheblich erschwert. In der Praxis müssen Softwarearchitekt:innen daher alle Verstöße manuell beheben – ein aufwändiger und fehleranfälliger Prozess. Zwar existieren bereits Ansätze zur Identifikation von Vertraulichkeitsverletzungen unter Unsicherheit, jedoch fehlt es bislang an wirksamen Methoden zur Minderung ihrer Auswirkungen.

Aufgabenstellung
Ziel dieser Arbeit ist die Entwicklung eines Verfahrens zur automatisierten Behebung von Vertraulichkeitsverletzungen in Softwarearchitekturen mithilfe diskreter Optimierungstechniken. Dabei sollen gezielte Modelländerungen vorgenommen werden, um alle identifizierten Verstöße zu beseitigen – und gleichzeitig die Änderungskosten so gering wie möglich zu halten. Hierfür ist zunächst eine geeignete Kostenfunktion für Modelländerungen zu definieren. Anschließend soll ein Optimierungsansatz (z.B. basierend auf Integer Linear Programming oder Constraint Solving) entwickelt und implementiert werden, der eine optimale Reparaturlösung ermittelt. Die Qualität der Lösung wird mithilfe einer bestehenden Analyse validiert und kann mit anderen Reparaturansätzen verglichen werden.

Kontakt
Bei Interesse und/oder Fragen können Sie mich jederzeit kontaktieren. Wir können das Thema gerne weiter besprechen und versuchen, es an Ihre persönlichen Wünsche anzupassen.

Robert Heinrich (Institute Homepage)

Motivation
Choosing inappropriate configurations of blockchain systems for decentralized applications (DApps) can entail serious issues, including exposure to attacks, too slow performance, and high costs. Such inappropriate blockchain system configurations can even require migrating to a new system, which is extremely difficult once a blockchain system is deployed. Every node would need to set up a completely new blockchain system, and crucial data could get lost. To configure blockchain systems that meet DApp requirements on dependability, decentralization, and performance, software architects need tooling to simulate and analyze the influence of blockchain system configurations on the behaviors and quality of blockchain systems before deployment. In particular, novel modeling, simulation, and prediction approaches are needed to estimate and explain the behavior of blockchain systems to ensure they meet DApp requirements on dependability before setup.

Topic Overview
Within the context of developing suitable blockchain systems for DApps, several topics for Bachelor/Master thesis are available as listed in the following. In addition to these topics, we will be happy to discuss your own ideas for thesis topics on blockchain technology.

DApp Archetypes: Develop archetypal DApps that can be used to bootstrap productive DApps.
Mitigation of Security Risks by Design: Develop new security analyses for blockchain systems to help improve their robustness against common attacks (e.g., selfish mining attacks) by design.
Modeling and Simulation: Design a modeling language to represent blockchain systems or DApps using the Palladio modelling and analysis approach. Implement an extension for the Palladio analysis tool to assess blockchain systems for security vulnerabilities or performance bottlenecks. Evaluate your extension of the Palladio tool.
Trade-offs in Blockchain Systems: Develop new analysis techniques to assess the quality attributes and behaviors of blockchain systems with a focus on the blockchain trilemma.

Contact and More
If you are interested and/or have any questions, feel free to contact me any time. We can discuss the topic further and try to adapt it to your personal preferences.

Robert Heinrich (Institute Homepage)

Topics for Theses and Projects

Outline of a Bachelor or Master Thesis

Topic and Proposal

Interim Presentation

Submission and Final Presentation

Jump to...

Human-centered Software Engineering

Algorithm Detection

[RESERVED] B/M: Detecting Non-Trivial Algorithms in Code: How Far Can LLMs Go? (Neumüller, Tichy)

Dynamic and Static Program Analysis

M: Using Program Slicing to Improve Conventional Coverage (Sihler, Tichy)

B/M: Dead/Unused Code Detection for R (Sihler, Tichy)

B/M: Incremental and Performant Dataflow Analysis (Sihler, Tichy)

B/M: Static Analysis for Reflective or Self Modifying Code (Sihler, Tichy)

[RESERVED] B/M: Automatically Infer Code-Constraints (Sihler, Tichy)

P: Static Program-Analysis for Data Analysis Projects (Sihler, Tichy)

About flowR

Goal

German Version

Statische Programm-Analyse für Projekte

Über flowR

Ziel

Self-adaptive Systems

M: Generating PCM Models of Self-Adaptive Microservices (Straub, Tichy)

P: Scalable Visualization for State-Graphs (Straub, Tichy)

Configurable Software Systems

Identification of Feature Interactions in Configurable Software

P/B/M: Exploring Complex Feature Interactions in Software Product Lines (Böhm, Tichy, Thüm)

B/M: Survey on the Analysis of Sampling in Software Product Lines (Böhm, Tichy, Thüm)

Constraint-Programmierung und Constraint Handling Rules

FreeCHR

M: Structured Literature Research on CHR Implementations (Rechenberger, Frühwirth)

B/M: Failure in FreeCHR (Rechenberger, Frühwirth)

B/M: Stateful Computations in FreeCHR (Rechenberger, Frühwirth)

M: Abstract Operational Semantics for FreeCHR (Rechenberger, Frühwirth)

Constraint Handling Rules

P/B/M: Graph Tool for Mason Marks (Frühwirth)

B/M: Non-Termination Analysis of Recursive Rules (Frühwirth)

P/B/M: Invariant Checking and Generation by Confluence and Completion (Frühwirth)

B/M: Program Slicing by Confluence and Completion (Frühwirth)

M: Repeated Recursion Unfolding for Super-Linear Speedup (Frühwirth)

M: CHR Abstract Machine (Frühwirth)

Quality-driven System Evolution

Architecture-based Modelling and Analysis of Security

B/M: Beheben von Vertaulichkeitsverletzungen mittels Sprachmodellen (Heinrich)

B/M: Beheben von Vertaulichkeitsverletzungen mittels diskreter Optimierung (Heinrich)

B/M: Designing Blockchain Systems Suitable for Decentralized Software Applications (Heinrich)