Maximilian Beck
Working on efficient Large Language Models with sub-quadratic complexity.
ELLIS PhD Student at Johannes Kepler University Linz, Institute for Machine Learning
I am a fourth-year PhD student at the Institute for Machine Learning at the Johannes Kepler University (JKU) Linz, advised by Mr. LSTM himself, Sepp Hochreiter. I work on efficient, RNN-inspired architectures for Large Language Models with sub-quadratic complexity.
I obtained my bachelor's and master's degrees in Mechatronics and Information Technology with a focus on Control Theory from the Karlsruhe Institute of Technology (KIT) in 2017 and 2021, respectively. From 2018 to 2019, I spent two amazing semesters abroad studying Computer Engineering at San José State University (SJSU) in the heart of Silicon Valley.
During my bachelor's, I worked at the Institute for Production Science (wbk) at KIT, focusing on Automation Technology. After my time in San José, I joined the autonomous driving division at the FZI Research Center for Information Technology, where I contributed the visibility computation package, written in C++, to their driving simulator. For my master's thesis, I developed a Monte-Carlo Tree Search motion planning algorithm that explicitly accounts for a vehicle's uncertainty about its environment.
In 2021, I was accepted as an ELLIS PhD student at JKU Linz. During my first 1.5 years, I focused on Few-Shot Learning, Meta-Learning, and Domain Adaptation. I also became very interested in studying the Loss Landscapes of Deep Learning and their properties, such as Mode Connectivity.
With the rise of ChatGPT in 2022, I pivoted towards Large Language Models (LLMs). While the impressive performance of LLMs is the main driver of today's hype around generative AI, they have a major drawback: their compute costs scale quadratically with growing input length. The reason is that virtually all of today's LLMs are based on the Transformer architecture, with the quadratic Attention mechanism at its core.
Before the introduction of Transformers, LSTMs, which scale only linearly with input length during inference, were the state of the art in Natural Language Processing. In our current project, we extend the LSTM with the most recent tricks of the trade of modern LLMs, aiming to challenge the dominance of Transformer models. This work is funded by the newly founded company NXAI.
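To make the contrast concrete, here is a minimal NumPy sketch (a toy illustration of mine, not the xLSTM or any production code): full self-attention materializes a T×T score matrix, so compute and memory grow quadratically with sequence length T, whereas a simple recurrent model carries a fixed-size state forward one token at a time, i.e. linear in T.

```python
import numpy as np

def attention(Q, K, V):
    # Full self-attention: the T x T score matrix makes compute and
    # memory grow quadratically with sequence length T.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])          # (T, T)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                # (T, d)

def recurrent(X, W_h, W_x):
    # A generic RNN-style recurrence (stand-in for an LSTM cell):
    # one fixed-size state update per token, so O(T) overall.
    h = np.zeros(W_h.shape[0])
    outputs = []
    for x_t in X:                                     # T sequential steps
        h = np.tanh(W_h @ h + W_x @ x_t)
        outputs.append(h)
    return np.stack(outputs)

T, d = 8, 4
rng = np.random.default_rng(0)
X = rng.normal(size=(T, d))
print(attention(X, X, X).shape)                       # (8, 4), via a (8, 8) score matrix
print(recurrent(X, rng.normal(size=(d, d)), rng.normal(size=(d, d))).shape)  # (8, 4)
```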
news
| Date | News |
|---|---|
| Jul 26, 2024 | We presented the xLSTM at 3 workshops at ICML 2024, including an Oral at ES-FOMO@ICML2024. |
| Jul 01, 2024 | I presented the xLSTM on 1littlecoder's YouTube channel. |
| Jun 27, 2024 | ELISE wrap-up conference 2024 in Helsinki. I presented the xLSTM as an ELLIS PhD Spotlight presentation. |
| May 08, 2024 | It's out! We published the xLSTM on arXiv. |
| Apr 08, 2024 | My fourth PhD talk, presenting first results of the xLSTM. |