MemGPT — Unlimited Context (Memory) for LLMs

How can you overcome the context window size limit of LLMs?

Venelin Valkov



One of the largest (no pun intended) limitations of Large Language Models (LLMs) is their context window size: the maximum number of tokens the model can process in a single prompt.

How can you overcome the limited token context window? MemGPT offers a solution inspired by traditional Operating Systems (OS) — hierarchical memory. Let’s take a look at how it works.


What is MemGPT?

MemGPT, introduced in the paper “MemGPT: Towards LLMs as Operating Systems”, helps large language models (LLMs) handle longer conversations by cleverly managing different memory tiers. It knows when to store important information and retrieve it later during a chat. This makes it possible for AI models to have extended conversations, greatly improving their usefulness.

The authors focus on two use cases:

  • chat with (very) large documents that don’t fit the context window
  • multi-session chats that remember previous conversations and information about the user
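The core idea can be illustrated with a small sketch. This is not the actual MemGPT API; the class and method names below are hypothetical, and the eviction and search logic is deliberately naive. It shows the hierarchy in miniature: a small, fast "main context" (the RAM analogue) and an unbounded "archival storage" (the disk analogue) that older messages are paged out to and retrieved from on demand.

```python
from collections import deque

class HierarchicalMemory:
    """Toy two-tier memory, loosely inspired by MemGPT's design."""

    def __init__(self, context_limit=4):
        self.context_limit = context_limit  # max messages kept in main context
        self.main_context = deque()         # fast, size-limited (like RAM)
        self.archival_storage = []          # unbounded, searched on demand (like HDD)

    def add(self, message):
        self.main_context.append(message)
        # When the context overflows, evict the oldest messages to archival storage.
        while len(self.main_context) > self.context_limit:
            self.archival_storage.append(self.main_context.popleft())

    def search_archive(self, keyword):
        # Naive keyword match; a real system would use embeddings or a vector store.
        return [m for m in self.archival_storage if keyword in m]

memory = HierarchicalMemory(context_limit=2)
for msg in ["my name is Ada", "I like Python", "what's the weather?"]:
    memory.add(msg)

print(list(memory.main_context))       # the two most recent messages
print(memory.search_archive("name"))   # retrieves the evicted fact
```

The point of the sketch is the separation of concerns: the model only ever "sees" the small main context, while older information survives in the archive and can be pulled back in when relevant.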

MemGPT works much like the memory hierarchy in modern Operating Systems (OS), which pair speedy but limited memory (RAM) with slower but far larger storage (HDD). What makes MemGPT cool…