Alexander Heß

Alexander Heß received his master's degree in computer science from Ulm University in 2020. He is currently employed as a research assistant at the Institute of Distributed Systems.

Research Interests

  • Fault-Tolerant Distributed Systems
    • State-Machine Replication in Cloud Environments
    • Client-Interaction in SMR-Services
    • Replica Reconfiguration
  • Trusted Computing
    • Intel SGX
    • Trusted Platform Modules

Projects

  • SORRIR (2020/09 - 2022/12, completed) - A Self-Organizing and Resilient Execution Environment for IoT Services

Publications

2024

Heß, A., Hauck, F.J. and Meißner, E. 2024. Consensus-agnostic state-machine replication. 25th ACM/IFIP Int. Middleware Conf. (Hong Kong, China, Dec. 2024).
State-machine replication (SMR) is a popular fault-tolerance technique for building highly-available services. Usually, consensus protocols are used to enforce a deterministic service-request ordering among replicas, in order to prevent their state from diverging. Over the last decades, a multitude of consensus protocols have been developed which come with different characteristics but also with different communication and programming models. Our Consensus-Agnostic Replication Toolkit (CART) is a wrapper for consensus protocols that relieves clients from most consensus configuration and support. Besides, it implements a generic client and application interface to support different consensus protocols and configurations, e.g. in cloud deployments. CART has built-in authentication of services based on BLS threshold signatures. It can further prove malicious behaviour of replicas, thus speeding up recovery in case of Byzantine faults. We evaluate the performance overhead of our approach in a real-world WAN deployment for two different consensus protocol implementations using the YCSB benchmark. Our results show that CART is able to reach up to 90% of the throughput achieved by the native consensus protocol with an additional latency overhead of only 10%.
Hauck, F.J. and Heß, A. 2024. Linearizability and state-machine replication: Is it a match? ArXiv.org.
Hauck, F.J. and Heß, A. 2024. Linearizability and state-machine replication. Workshop on Resilient Oper. - Byz. Fault Tol. and State-Machine Repl. – ROBUST (Mar. 2024).
Heß, A. and Hauck, F.J. 2024. A framework for consensus-agnostic state-machine replication based on threshold signatures. Workshop on Resilient Oper. - Byz. Fault Tol. and State-Machine Repl. – ROBUST (Mar. 2024).

2023

Heß, A. and Hauck, F.J. 2023. Towards a Cloud Service for State-Machine Replication. Tagungsband des FG-BS Frühjahrstreffens 2023 (Bonn - Germany, 2023).
State-machine replication (SMR) is a well-known technique to achieve fault tolerance for services that require high availability and fast recovery times. While the concept of SMR has been extensively investigated, there are still missing building blocks to provide a generic offer, which automatically serves applications with SMR technology in the cloud. In this work, we introduce a cloud service architecture that enables automatic deployment of service applications based on customer-friendly service parameters, which are mapped onto an internal configuration that comprises the number of replicas, tolerable failures, and the consensus algorithm, amongst other aspects. The deployed service configuration is masked to a large extent with the use of threshold signatures. As a consequence, a reconfiguration in the cloud deployment does not affect the client-side code. We conclude the paper by discussing open engineering questions that need to be addressed in order to provide a productive cloud offer.

2021

Mödinger, D., Heß, A. and Hauck, F.J. 2021. Arbitrary Length k-Anonymous Dining-Cryptographers Communication. CoRR. abs/2103.17091, (Mar. 2021).
Dining-cryptographers networks (DCN) can achieve information-theoretical privacy. Unfortunately, they are not well suited for peer-to-peer networks as they are used in blockchain applications to disseminate transactions and blocks among participants. In previous but preliminary work, we proposed a three-phase approach with an initial phase based on a DCN with a group size of k while later phases take care of the actual broadcast within a peer-to-peer network. This paper describes our DCN protocol in detail and adds a performance evaluation powered by our proof-of-concept implementation. Our contributions are (i) an extension of the DCN protocol by von Ahn for fair delivery of arbitrarily long messages sent by potentially multiple senders, (ii) a privacy and security analysis of this extension, (iii) various performance optimisations, especially for best-case operation, and (iv) a performance evaluation. The latter uses a latency of 100 ms and a bandwidth limit of 50 Mbit/s between participants. The interquartile range of the largest test of the highly secured version took 35 s ± 1.25 s for a full run. All tests of the optimized common-case mode show the dissemination of a message within 0.5 s ± 0.1 s. These results compare favourably to previously established protocols for k-anonymous transmission of fixed size messages, outperforming the original protocol for messages as small as 2 KiB.
Heß, A., Hauck, F.J., Mödinger, D., Pietron, J., Tichy, M. and Domaschka, J. 2021. Morpheus: A Degradation Framework for Resilient IoT Systems. STAF Workshops (Virtual Event, Bergen - Norway, 2021), 105–114.
Graceful degradation is an established concept to improve the resilience of systems, especially when other resilience mechanisms have failed. Its implementation is often heavily tied to the application code and, thus, cumbersome and error-prone. As IoT systems get not only ubiquitous but also critical, reliable graceful degradation would be ideal. In this paper, we present the Morpheus framework that provides a TypeScript-internal DSL to enable a systematic development of degradable IoT systems. The design of the framework is based on the concept of separation of concerns by providing distinct yet linked languages to specify hierarchical components and their connections; the components’ operating modes and transfer functions between them; as well as state machines for the specification of the components’ behaviour in each operating mode. The operating modes for each component serve as degradation levels. Automatic degradation of a component is triggered in case of failures of connected components. With recovery from underlying failures, the component is automatically upgraded back to a higher level. We illustrate our framework using a simplified prototype of an entrance barrier of a parking garage.