Maximilian Beck

Working on efficient Large Language Models with sub-quadratic complexity.

I am a fourth-year PhD student at the Institute for Machine Learning at Johannes Kepler University (JKU) Linz, advised by "Mr. LSTM" Sepp Hochreiter. I work on efficient, RNN-inspired architectures for Large Language Models with sub-quadratic complexity.

I obtained my bachelor's (2017) and master's (2021) degrees in Mechatronics and Information Technology, with a focus on Control Theory, from the Karlsruhe Institute of Technology (KIT). From 2018 to 2019 I spent two amazing semesters abroad studying Computer Engineering at San José State University (SJSU) in the heart of Silicon Valley.

During my bachelor's I worked at the Institute of Production Science (wbk) at KIT, focusing on Automation Technology. After my time in San José I joined the autonomous driving division at the FZI Research Center for Information Technology. There, I contributed the visibility computation package to their driving simulator, written in C++. For my master's thesis I developed a Monte-Carlo Tree Search motion-planning algorithm that explicitly accounts for a vehicle's uncertainty about its environment.

In 2021, I was accepted as an ELLIS PhD student at JKU Linz. During my first 1.5 years I focused on Few-Shot Learning, Meta-Learning, and Domain Adaptation. I also became very interested in the loss landscapes of deep learning and their properties, such as mode connectivity.

With the rise of ChatGPT in 2022, I pivoted towards Large Language Models (LLMs). While the impressive performance of LLMs is the main driver of today's hype around generative AI, they have a major drawback: their compute cost scales quadratically with input length. The reason is that today's LLMs are based on the Transformer architecture, with the quadratic attention mechanism at its core.
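To make the scaling concrete, here is a minimal, illustrative sketch of scaled dot-product attention (single head, no masking or batching; not taken from any production codebase). The T×T score matrix is what makes both compute and memory grow quadratically with the sequence length T:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal single-head attention sketch.

    Q, K, V: (T, d) arrays for a sequence of length T.
    The (T, T) score matrix is the source of the quadratic cost.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                     # (T, T): O(T^2) compute and memory
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # (T, d)

# Illustrative usage: doubling T quadruples the size of the score matrix.
T, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)           # shape (T, d)
```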

Before the introduction of Transformers, LSTMs, which scale only linearly with input length during inference, were the state of the art in Natural Language Processing. In our current project we extend the LSTM with the most recent tricks of the trade of modern LLMs, aiming to challenge the dominance of Transformer models. This work is funded by the newly founded company NXAI.
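For contrast with the attention sketch above, here is a deliberately simplified recurrent cell (a plain Elman-style RNN step, not the xLSTM itself): each token is processed with a fixed-size state, so inference cost grows only linearly with sequence length.

```python
import numpy as np

def rnn_step(h, x, W_h, W_x, b):
    """One recurrent update: constant work per token, fixed-size state h."""
    return np.tanh(W_h @ h + W_x @ x + b)

def run_rnn(X, W_h, W_x, b):
    """Process a (T, d_in) sequence token by token -> O(T) total cost."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x in X:                      # T iterations, each with constant cost
        h = rnn_step(h, x, W_h, W_x, b)
        states.append(h)
    return np.stack(states)          # (T, d_h)

# Illustrative usage with random weights.
T, d_in, d_h = 8, 4, 6
rng = np.random.default_rng(0)
X = rng.standard_normal((T, d_in))
W_h = 0.1 * rng.standard_normal((d_h, d_h))
W_x = 0.1 * rng.standard_normal((d_h, d_in))
b = np.zeros(d_h)
H = run_rnn(X, W_h, W_x, b)          # shape (T, d_h)
```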

news

Sep 25, 2024 🚨 The xLSTM paper has been accepted as a spotlight at NeurIPS 2024! 🎉
I am looking forward to presenting and discussing the xLSTM in Vancouver.
Jul 26, 2024 We presented the xLSTM at three workshops at ICML 2024, including an oral at ES-FOMO@ICML2024.
Jul 01, 2024 I presented the xLSTM on 1littlecoder's YouTube Channel.
Jun 27, 2024 ELISE Wrap-up Conference 2024 in Helsinki: I presented the xLSTM as an ELLIS PhD Spotlight presentation.
May 08, 2024 It's out! We published the xLSTM on arXiv.

selected publications

  1. Vision-LSTM: xLSTM as Generic Vision Backbone
     Benedikt Alkin, Maximilian Beck, Korbinian Pöppel, and 2 more authors
     In arXiv, 2024
  2. xLSTM: Extended Long Short-Term Memory
     Maximilian Beck*, Korbinian Pöppel*, Markus Spanring, and 6 more authors
     In Advances in Neural Information Processing Systems (NeurIPS), 2024
  3. Addressing Parameter Choice Issues in Unsupervised Domain Adaptation by Aggregation
     Marius-Constantin Dinu, Markus Holzleitner, Maximilian Beck, and 7 more authors
     In The International Conference on Learning Representations, 2023
  4. Few-Shot Learning by Dimensionality Reduction in Gradient Space
     Martin Gauch, Maximilian Beck, Thomas Adler, and 10 more authors
     In The Conference on Lifelong Learning Agents, 2022