Maximilian Beck

Pioneering innovative RNN-inspired Large Language Models with sub-quadratic complexity.

I am a third-year PhD student at the Institute for Machine Learning at the Johannes Kepler University (JKU) Linz, advised by Sepp Hochreiter (a.k.a. Mr. LSTM). I work on efficient, RNN-inspired architectures for Large Language Models with sub-quadratic complexity.

I obtained my Bachelor's and Master's degrees in Mechatronics and Information Technology with a focus on Control Theory from the Karlsruhe Institute of Technology (KIT) in 2017 and 2021, respectively. From 2018 to 2019, I spent two amazing semesters abroad studying Computer Engineering at San José State University (SJSU) in the heart of Silicon Valley.

During my Bachelor's studies, I worked at the Institute for Production Science (wbk) at KIT, focusing on Automation Technology. After my time in San José, I joined the autonomous driving division at the FZI Research Center for Information Technology. There, I contributed a visibility computation package, written in C++, to their driving simulator. For my Master's thesis, I developed a Monte Carlo Tree Search motion planning algorithm that explicitly considers a vehicle's uncertainty about its environment.

In 2021, I was accepted as an ELLIS PhD student at JKU Linz. During my first 1.5 years, I focused on Few-Shot Learning, Meta-Learning, and Domain Adaptation. I also became very excited about studying the Loss Landscapes of Deep Learning and their properties, such as Mode Connectivity.

With the rise of ChatGPT in 2022, I pivoted towards Large Language Models (LLMs). While the impressive performance of LLMs is the main driver of today's hype around generative AI, they have a major drawback: their compute costs grow quadratically with the input length. The reason is that today's LLMs are based on the Transformer architecture, with the quadratic Attention mechanism at its core.
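
To make the quadratic cost concrete, here is a minimal NumPy sketch of the attention score computation; the shapes and values are purely illustrative and not taken from any particular model:

```python
# Illustrative sketch: why self-attention is quadratic in the sequence length T.
# All dimensions and tensors here are made up for the example.
import numpy as np

T, d = 1024, 64                     # sequence length, head dimension
Q = np.random.randn(T, d)           # queries
K = np.random.randn(T, d)           # keys
V = np.random.randn(T, d)           # values

scores = Q @ K.T / np.sqrt(d)       # (T, T) matrix -> O(T^2) time and memory
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
out = weights @ V                   # every token attends to all T tokens

print(scores.shape)                 # (1024, 1024): grows quadratically with T
```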

Before the introduction of Transformers, LSTMs, which scale only linearly with input length during inference, were state-of-the-art in Natural Language Processing. In our current project, we extend the LSTM with the most recent tricks of the trade from modern LLMs, aiming to challenge the dominance of Transformer models. This work is funded by the newly founded company NXAI.
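
For contrast, here is a minimal sketch of a recurrent update (a plain RNN cell standing in for the LSTM; names and shapes are made up for illustration, this is not NXAI/xLSTM code). It shows why inference cost grows only linearly with sequence length while the state stays constant in size:

```python
# Illustrative sketch: a recurrent cell processes one token at a time with a
# fixed-size hidden state, so total inference cost is O(T) in sequence length.
import numpy as np

T, d_in, d_h = 1024, 64, 128
x = np.random.randn(T, d_in)                 # input sequence
W_xh = np.random.randn(d_in, d_h) * 0.01     # input-to-hidden weights
W_hh = np.random.randn(d_h, d_h) * 0.01      # hidden-to-hidden weights
h = np.zeros(d_h)                            # constant-size state

for t in range(T):                           # T steps, O(1) work per token
    h = np.tanh(x[t] @ W_xh + h @ W_hh)      # simple RNN update (an LSTM adds gates)

print(h.shape)                               # (128,): state size independent of T
```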

news

Oct 30, 2023 My third PhD Talk presenting the details of the xLSTM for the first time.
Aug 28, 2023 ELLIS Doctoral Symposium 2023 in Helsinki. I presented our ICLR23 paper Addressing Parameter Choice Issues in Unsupervised Domain Adaptation by Aggregation.
Nov 28, 2022 My first big Machine Learning conference: NeurIPS 2022 in New Orleans.
Nov 21, 2022 My second PhD Talk in our Seminar about “Loss Landscapes under Distribution Shift”. Slides.
Oct 16, 2022 My first invited talk together with Martin Gauch about SubGD: Few-Shot Learning by Dimensionality Reduction in Gradient Space at Ruhr-University Bochum.

selected publications

  1. Addressing Parameter Choice Issues in Unsupervised Domain Adaptation by Aggregation
    Marius-Constantin Dinu, Markus Holzleitner, Maximilian Beck, and 7 more authors
    In The Eleventh International Conference on Learning Representations, 2023
  2. Few-Shot Learning by Dimensionality Reduction in Gradient Space
    Martin Gauch, Maximilian Beck, Thomas Adler, and 10 more authors
    In The Conference on Lifelong Learning Agents, 2022