Lumon is a personal AI assistant. I put emphasis on the word personal because most AI solutions on the market are just ChatGPT / Claude wrappers: similar, repetitive, and limited to answering questions.
I keep thinking: what if there were a model that could act like a personal AI assistant? One that remembers my tasks, integrates with my calendars, knows me, and improves its knowledge base from my information. It's always been hard for me to remember stuff. Think back to the movie Iron Man: what if there were a model like JARVIS?
Why would I want to build this myself? This is the very first time I've worked with LangChain and LangGraph, which are frameworks for building applications with large language models (LLMs), and I want to try them out. The concept is really simple, and if you have a bit of knowledge of LangChain, you can probably do it yourself. But to me, the most important thing is the journey: the journey through which I can learn something new, and through which I can build something useful.
What do I want?
- An AI Assistant that can persist long-term memory. It should be able to recall, update, and remove these memories on the fly.
- Being able to save structured task data and recall it accurately in chronological order, and, beyond that, to sync with my calendar.
- Being able to browse the web and perform actions for me.
In order to achieve these goals, I need a model that can reason about its own tasks and solve problems on its own.
Of all the AI providers on the market, I chose OpenAI for now.
I immediately thought of LangChain, a framework that helps developers build applications powered by language models through composable components. I could implement memory, thinking, problem solving, web research, and task storage as tools, and chain them together.
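To give a sense of scale, a LangChain tool can be as small as a decorated function. Here is a minimal sketch of a memory-saving tool; the in-memory list and function body are my own illustration, not Lumon's actual code:

```python
from langchain_core.tools import tool

# Hypothetical in-memory store standing in for real persistence.
_memories: list[str] = []

@tool
def save_memory(memory: str) -> str:
    """Save a piece of information about the user for later recall."""
    _memories.append(memory)
    return f"Saved memory #{len(_memories)}."
```

Tools like this can then be bound to a chat model (e.g. via `bind_tools`) and invoked as part of a chain.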
This worked, until it didn't.
I was able to make it perform actions, such as saving memories. However, as the tasks increased, a single chain-of-thought thread was not enough. It was like using a single-core CPU: the AI was not consistent and reliable enough.
Then came the "Agentic Framework."
I stumbled upon a library called orchestra by mainframecomputers.
Orchestra is a lightweight open-source agentic framework for creating LLM-driven task pipelines and multi-agent teams, centered around the concept of Tasks rather than conversation patterns.
It is still in the alpha stage, but its core functionality is exactly what I needed for this project. An agentic framework allows multiple AI "agents" to work together, each with specific roles and capabilities.
I could have a conductor who manages agents, each dedicated to a single field of expertise.
For example:
- Agent 1: Memory Agent
- Agent 2: Task Management Agent
- Agent 3: Web Search Agent
- ......
```python
lumon_agent = Agent(
    agent_id="lumon_agent",
    role="Conductor",
    goal="To chat with and help the human user by coordinating your team of agents to carry out tasks.",
    attributes="""You know that you must use the conduct_tool to delegate tasks to your team of agents...""",
    llm=OpenaiModels.gpt_4o_mini,
    tools=[
        Conduct.conduct_tool(web_research_agent, memory_management_agent, task_management_agent)
    ]
)
```
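A specialist agent follows the same pattern. As a sketch, the memory agent's definition might look like this (the goal text and tool list here are illustrative, not Lumon's exact code):

```python
memory_management_agent = Agent(
    agent_id="memory_management_agent",
    role="Memory Manager",
    goal="To save, search, update, and delete long-term memories about the user.",
    llm=OpenaiModels.gpt_4o_mini,
    # Hypothetical tool callables; Lumon's actual memory tools are shown later.
    tools=[save_memory, search_memories]
)
```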
The conductor will be the one that orchestrates the whole process. It will be able to see the big picture and make decisions based on the overall goal.
When an error occurs, the conductor will be able to adjust and try to fix the problem itself.
This system is extremely extensible. I can add more agents, and each agent can be customized to do specific tasks.
The system does not rely on a single thread; instead, each agent has its own identity. The conductor can call multiple agents simultaneously. It's multi-threaded.
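Using the conduct_tool task format shown in the prompting section below, a single delegation can fan out to several agents at once. For example (task IDs and instructions are illustrative):

```json
{"tasks": [
  {"task_id": "recall_prefs", "agent_id": "memory_management_agent", "instruction": "Recall the user's dietary preferences"},
  {"task_id": "find_recipes", "agent_id": "web_research_agent", "instruction": "Search for easy weeknight dinner recipes"}
]}
```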
There are a couple of aspects I need to focus on:
- Prompting
- Cost Efficiency
- Memory Management & Storage System
- Task Management
Prompting
Prompting is easy, but making it effective is hard.
In order for LUMON to function well and have knowledge about the tasks it can perform, I have to separate the instructions into parts:
- Base
- Memory Guidelines
- Task Guidelines
- ......
Take the `base` section as an example:
```
You are L.U.M.O.N., an AI assistant focused on being helpful and efficient in your responses. Keep your responses clear and concise while maintaining a professional tone.

⚠️ CRITICAL WARNING ⚠️: You must ALWAYS use the conduct_tool to interact with your team of agents. NEVER try to call agents directly.

The conduct_tool accepts tasks in this format:
{"tasks": [
  {
    "task_id": "unique_id",
    "agent_id": "agent_name",
    "instruction": "what you want the agent to do"
  }
]}

INCORRECT (DO NOT DO THIS):
{"tool_calls":[{"tool":"task_management_agent","params":{"task_id":"search_upcoming_tests","instruction":"List all upcoming tests"}}]}

CORRECT (ALWAYS DO THIS):
{"tool_calls":[{"tool":"conduct_tool","params":{"tasks":[{"task_id":"search_upcoming_tests","agent_id":"task_management_agent","instruction":"List all upcoming tests"}]}}]}

Example for web research:
{"tasks": [{"task_id": "web_search_1", "agent_id": "web_research_agent", "instruction": "Search for information about evannotfound"}]}
```
As you can see, it is not the most efficient. I am new to the AI field, and LUMON is still in its early stages, so it will stay as it is for now; I expect it to improve over time.
We can see that we need to instruct the model to use the `conduct_tool` to interact with the agents.
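Since the instructions are separated into parts, assembling the final system prompt can be a simple concatenation. Here is a minimal sketch of that idea; the variable names and section contents are my own illustration, not Lumon's actual layout:

```python
# Hypothetical layout: each instruction section lives in its own string (or file).
BASE = "You are L.U.M.O.N., an AI assistant..."
MEMORY_GUIDELINES = "When the user shares personal information, delegate to the memory agent..."
TASK_GUIDELINES = "When the user mentions deadlines or to-dos, delegate to the task agent..."

def build_system_prompt() -> str:
    """Join the instruction sections into a single system prompt."""
    return "\n\n".join([BASE, MEMORY_GUIDELINES, TASK_GUIDELINES])
```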
In real-world testing with `gpt-4o-mini`, without these instructions and relying only on the prompts provided by the `orchestra` library, the model was not able to function well.
Therefore, prompting is a current issue: as the system expands, the prompt length grows substantially.
This leads me to the next issue.
Cost Efficiency
The cost is a big issue.
I am using the `gpt-4o-mini` model, which offers the best cost-to-performance ratio on the market. GPT-4o-mini is a smaller, more affordable version of OpenAI's GPT-4o model that still offers good capabilities.
However, the cost is still too high.
In each run, on average, every prompt you send to the model, together with its conduct_tool delegations to the agents, costs around $0.005 - $0.01.
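To put that in perspective: at $0.005 - $0.01 per exchange, 50 exchanges a day works out to roughly $0.25 to $0.50 per day, or about $7.50 to $15 per month, and that is before adding more agents or longer prompts.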
So cost optimization is something I need to focus on in the future.
Memory Management & Storage System
In order to achieve long-term memory storage, I need to store the memories in a database.
But it should not be just a simple database.
I need it to be able to store the memories in a way that is easy to query and easy to update. I could choose to use a simple key-value store, but retrieving using text queries is not efficient.
Therefore, I chose to use a FAISS vector database. FAISS (Facebook AI Similarity Search) is a library that allows for efficient similarity search and clustering of dense vectors.
The benefit of using a vector database is that it allows for efficient similarity searches and can handle high-dimensional data, making it ideal for querying and updating memory storage. The language model can directly use similarity search to retrieve the most relevant memories.
We first store the memories in the vector database in the form of embeddings (numerical representations of text that capture semantic meaning), then after every interaction, we save the vector database locally.
Vector Database Implementation
I implemented a FAISS-based vector database system that converts text memories into high-dimensional embeddings:
```python
@classmethod
def save_memory(cls, memory: str) -> Union[str, Dict]:
    """Save a new memory to the vector store."""
    if cls.vector_store is None:
        cls.vector_store = cls._initialize_store()

    # Generate a unique ID for the memory
    memory_id = str(uuid.uuid4())

    # Add timestamp metadata
    timestamp = datetime.now().isoformat()

    # Create the document with metadata
    doc = Document(
        page_content=memory,
        metadata={
            "memory_id": memory_id,
            "timestamp": timestamp,
            "type": "user_memory"
        }
    )

    # Add to vector store
    cls.vector_store.add_documents([doc])

    # Save the updated store
    cls.vector_store.save_local(cls.persist_directory)

    return {
        "status": "success",
        "memory_id": memory_id,
        "timestamp": timestamp
    }
```
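The `_initialize_store` helper referenced above isn't shown. A minimal sketch using LangChain's FAISS wrapper might look like this, assuming OpenAI embeddings; Lumon's exact implementation may differ:

```python
import os
import faiss
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

@classmethod
def _initialize_store(cls) -> FAISS:
    """Load the persisted FAISS index if it exists, otherwise create an empty one."""
    embeddings = OpenAIEmbeddings()
    if os.path.exists(cls.persist_directory):
        return FAISS.load_local(
            cls.persist_directory,
            embeddings,
            allow_dangerous_deserialization=True,  # we trust our own saved index
        )
    # Create an empty index sized to the embedding dimension
    index = faiss.IndexFlatL2(len(embeddings.embed_query("init")))
    return FAISS(
        embedding_function=embeddings,
        index=index,
        docstore=InMemoryDocstore(),
        index_to_docstore_id={},
    )
```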
The search functionality uses semantic similarity to find relevant memories:
```python
@classmethod
def search_memories(cls, query: str, limit: int = 5) -> str:
    """Search for memories similar to the query."""
    if cls.vector_store is None:
        cls.vector_store = cls._initialize_store()

    results = cls.vector_store.similarity_search(query, k=limit)
    if not results:
        return "No relevant memories found."

    formatted_results = []
    for doc in results:
        memory_content = doc.page_content
        metadata = doc.metadata
        memory_id = metadata.get("memory_id", "unknown")
        timestamp = metadata.get("timestamp", "unknown")

        # Format the timestamp for readability if it exists
        if timestamp != "unknown":
            try:
                dt = datetime.fromisoformat(timestamp)
                formatted_time = dt.strftime("%Y-%m-%d %H:%M:%S")
            except ValueError:
                formatted_time = timestamp
        else:
            formatted_time = "unknown time"

        formatted_results.append(
            f"Memory [{memory_id[:8]}...] from {formatted_time}:\n{memory_content}\n"
        )

    return "\n".join(formatted_results)
```
This allows Lumon to retrieve memories based on semantic relevance rather than exact keyword matching.
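As a quick illustration, a query does not need to share any keywords with a stored memory to find it. Assuming the class holding these methods is called MemoryTools (the actual class name isn't shown above):

```python
# Stored earlier: "User mentioned their physics final is on June 12."
print(MemoryTools.search_memories("when are my exams?", limit=3))
# The physics-final memory ranks highly despite sharing no exact keywords.
```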
The memory aspect is done.
Task Management
Continuing from the memory aspect, we could use a vector database to store tasks as well. But what if the model retrieves the wrong or irrelevant entries?
With a vector database, we have to accept that the data is unstructured.
When it comes to tasks, we need more precision. Tasks have specific attributes like due dates, priorities, and statuses that need to be accurately stored and retrieved. A vector database might return semantically similar tasks, but that's not always what we need; sometimes we need exactly the task due tomorrow, not one that merely sounds similar.
Currently, I am still using a vector database to store the tasks, with a `TypedDict` (a Python typing construct that defines dictionaries with specific key-value types) to ensure the data is somewhat structured.
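The exact schema isn't shown in the post; a task `TypedDict` along these lines would match the attributes mentioned above (the field names are my guess):

```python
from typing import TypedDict, Literal

class Task(TypedDict):
    """Structured task record stored alongside its embedding."""
    task_id: str
    title: str
    due_date: str          # ISO 8601, e.g. "2025-06-12"
    priority: Literal["low", "medium", "high"]
    status: Literal["pending", "in_progress", "done"]
```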
However, during real-world testing, when I try to filter the tasks based on the due date, the model is not consistent enough. It often gives me an incomplete list of tasks.
Therefore, I am thinking of using a more structured database like PostgreSQL, which is a powerful, open-source relational database system.
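If I migrate, the task table could be quite simple. Here is a sketch using psycopg2, with hypothetical column choices mirroring the TypedDict above:

```python
import psycopg2

# Hypothetical connection string and schema.
conn = psycopg2.connect("dbname=lumon user=lumon")
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS tasks (
            task_id   UUID PRIMARY KEY,
            title     TEXT NOT NULL,
            due_date  TIMESTAMPTZ,
            priority  TEXT CHECK (priority IN ('low', 'medium', 'high')),
            status    TEXT CHECK (status IN ('pending', 'in_progress', 'done'))
        )
    """)
    # Exact filtering becomes a plain SQL query instead of a similarity search.
    cur.execute("SELECT title FROM tasks WHERE due_date::date = CURRENT_DATE + 1")
```

The appeal is that "tasks due tomorrow" becomes a deterministic query rather than a hopeful similarity match.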
Finish it off
The system is not perfect, but it is a good start.
The GitHub repository for LUMON is available here.
The project is still in its early stages, and I will continue to work on it.