Building a Second Brain: Semantic Memory Assistant, v0

8 min read · By Ethan Ham

A lightweight, local-first approach to never losing track of your stuff again.


TL;DR

I built a simple proof-of-concept semantic search tool to remember where I put things during a cross-country move. It uses embeddings and vector search to let you query in natural language (e.g. "where's my passport?") and get instant answers. 100% local, sub-second query responses, ~500MB footprint (including model), "built" in a few hours.

The Result: A working RAG-lite system that actually solved a real problem.


The Tracking Problem

In summer 2025, I moved across the country. It was chaotic for a number of reasons: boxes got relabeled as contents shifted, items moved between cars and storage units, and the "bathroom box" ended up housing all my kitchen supplies.

Traditional solutions didn't quite work. Physical labels often didn't exist (my passport was in my backpack, but now it's in my glove box) or became outdated quickly. Spreadsheets and notes apps added too much friction - finding and updating existing line items is tedious.

I needed something that understood what I was looking for, not just keyword matching. Enter semantic search...


My Approach: RAG Without the G

Architecture Overview

The system has three core components:

User Query: "where's my passport?"
    |
    v
Embedding Engine: Convert to 384-dim vector
    |
    v
Vector Store: Find top similar memories (ChromaDB)
    |
    v
Hybrid Scoring: Combine similarity + recency
    |
    v
Response: "passport in blue suitcase (2 days ago)"

Core Concept: Hybrid Scoring

The secret sauce that made this work is combining two signals: Similarity Score and Recency Score.

Similarity Score measures how similar the user query is to each individual memory log. In practice the result lands between 0 and 1, with 1 representing identical semantic meaning (perfectly aligned vectors) and 0 representing no similarity (completely orthogonal vectors).

Recency Score is a measure of how recently a memory was logged. I chose to calculate the recency score with an exponential decay function, trying to mimic the way our brains might work. Memories decay quickly early on, but are slow to completely disappear. An event 30 minutes ago is much clearer in your memory than a memory 1 or 2 days ago. But there is likely not much difference between that same memory 3 weeks ago vs. 4 weeks ago.

# For each stored memory:
similarity_score = cosine_similarity(query_vector, memory_vector)  # 0-1
recency_score = exp(-days_old * decay_rate)  # Newer = higher

# The final score weighting here is easily tunable
final_score = (similarity_score * 0.7) + (recency_score * 0.3)

Why this works: When I move my passport from my blue suitcase to my backpack, I don't need to update anything. I just add a new memory. The recency boost ensures the latest location wins.
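
To make that concrete, here's a tiny worked example. The similarity values (0.80 vs. 0.78) and the 0.1 decay rate are made-up illustrative numbers, not measured outputs:

from math import exp

DECAY_RATE = 0.1                      # assumed for illustration; easily tunable
SIM_WEIGHT, REC_WEIGHT = 0.7, 0.3

def final_score(similarity, days_old):
    recency = exp(-days_old * DECAY_RATE)
    return SIM_WEIGHT * similarity + REC_WEIGHT * recency

# Two memories that both match "where's my passport?"
old = final_score(similarity=0.80, days_old=14)  # "passport in blue suitcase"
new = final_score(similarity=0.78, days_old=1)   # "passport now in backpack"

print(f"old: {old:.3f}, new: {new:.3f}")  # old: 0.634, new: 0.817 -> the newer location wins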

Design Principles for the v0

  1. Solve the immediate problem: Simple tracking, nothing more
  2. No feature creep: Resist the urge to add LLMs, web UI, categories, etc. at least for the proof of concept v0
  3. Minimize friction: The CLI should be faster than opening a notes app
  4. Local-first: Works on planes, in storage units, anywhere - plus added privacy for personal data

Nice-to-Haves (Add in v1+)

  • Web interface
  • Multi-user support
  • Categories/tags
  • LLM-powered conversational responses
  • Image attachments
  • Voice input

The Tech Stack: Pythonic and Local

I wanted to keep this as straightforward as possible with minimal external dependencies and live integrations. The goal here was roughly 70% learning, 30% utility.

  1. Local embeddings: No OpenAI API calls
  2. Lightweight: Must run on a laptop without eating RAM
  3. Persistent: Can't lose data between runs
  4. Fast: Sub-second responses or it's not useful

Technical Decisions (thanks, ChatGPT)

Embeddings: sentence-transformers (all-MiniLM-L6-v2)

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
embedding = model.encode("passport in blue suitcase")  # Returns 384-dim vector

This model is small (80MB download, 384 dimensions), fast (less than 100ms inference), and accurate enough for item/location matching.
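
As a quick sanity check on "accurate enough", you can compare a query against a memory directly with the library's cosine-similarity helper (the example sentences here are just illustrative):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

query = model.encode("where's my travel document?", convert_to_tensor=True)
memory = model.encode("passport in blue suitcase", convert_to_tensor=True)

# Cosine similarity between query and memory text - higher means closer in meaning
print(util.cos_sim(query, memory).item())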

Alternatives considered:

  • OpenAI embeddings: Requires internet, costs money
  • Larger models (BGE, E5): Overkill for this use case
  • Custom training: Way too much work for v0

Vector Store: ChromaDB

import chromadb

# Initialize persistent client - data saves to disk at ./data/chroma
# This means your memories survive between program runs
client = chromadb.PersistentClient(path="./data/chroma")

# Get or create a collection named "memories"
# Collections are like tables in a database - they store related vectors
collection = client.get_or_create_collection("memories")

# Add a memory to the collection
collection.add(
    ids=["a3f2"],                             # Unique identifier for this memory (like a primary key)
    embeddings=[embedding],                   # The 384-dim vector representation of the text
    documents=["passport in blue suitcase"],  # Original text (stored for retrieval)
    metadatas=[{"timestamp": "2024-11-10T14:32:00"}]  # Extra info for filtering/sorting
)

# Search for similar memories using vector similarity
results = collection.query(
    query_embeddings=[query_embedding],  # Vector of your search query
    n_results=10                         # Return top 10 most similar memories
)

ChromaDB offers an embedded mode (no server needed), persistent storage out of the box, and a simple Python API, and it's fast enough for thousands of memories - perfect for this lightweight v0 app. A sketch of how the recall path ties it all together follows the alternatives below.

Alternatives considered:

  • FAISS: Faster but requires manual persistence layer
  • Qdrant/Pinecone: Server-based, overkill for local use
  • Weaviate: Great but too heavy for a weekend project
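
Putting the pieces together, recall is roughly: embed the query, pull candidates from ChromaDB, then re-rank them with the hybrid score. Here's a sketch of that flow - the names are illustrative rather than the project's actual code, and it assumes the collection was created with cosine distance (metadata={"hnsw:space": "cosine"}), so that distance = 1 - similarity:

from datetime import datetime
from math import exp

def recall(query_text, model, collection, decay_rate=0.1, top_k=3):
    """Sketch of the recall path: embed -> vector search -> hybrid re-rank."""
    query_embedding = model.encode(query_text).tolist()

    # Pull more candidates than needed, then re-rank with the hybrid score
    results = collection.query(query_embeddings=[query_embedding], n_results=10)

    scored = []
    for doc, meta, distance in zip(results["documents"][0],
                                   results["metadatas"][0],
                                   results["distances"][0]):
        similarity = 1 - distance  # valid when the collection uses cosine distance
        days_old = (datetime.now() - datetime.fromisoformat(meta["timestamp"])).days
        recency = exp(-days_old * decay_rate)
        scored.append((0.7 * similarity + 0.3 * recency, doc, days_old))

    scored.sort(reverse=True)
    return scored[:top_k]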

CLI: Click

# Add a memory
$ ./brain add "passport in blue suitcase"
✓ Memory saved! [ID: a3f2]

# Recall it later
$ ./brain recall "where's my travel document?"
Found 1 matching memory:
1. passport in blue suitcase [a3f2] · 2 days ago · Score: 0.89

Click provides built-in input validation and automatic help text generation, and it's easy to extend.
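
The real commands call into the embedding and ChromaDB code above; a minimal sketch of the Click wiring looks something like this, with the storage and search logic stubbed out as comments:

import click

@click.group()
def brain():
    """Semantic memory assistant."""

@brain.command()
@click.argument("text")
def add(text):
    """Store a new memory."""
    # embed `text` and add it to the ChromaDB collection here
    click.echo("✓ Memory saved! [ID: a3f2]")

@brain.command()
@click.argument("query")
def recall(query):
    """Search stored memories in natural language."""
    # embed `query`, hybrid-score the candidates, and print the top matches here
    click.echo("1. passport in blue suitcase [a3f2] · 2 days ago · Score: 0.89")

if __name__ == "__main__":
    brain()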

The "No LLM" Decision

"Why not use GPT to make responses natural?" As I mentioned earlier, the v0 here is largely about learning and getting a proof of concept off the ground ASAP, which requires minimal complexity and as few dependencies as possible. Introducing an API or huge local model not only adds complexity and dependencies, but creates another layer to test and validate along the way.

Raw memory text is fine when you just need to find your passport.

Performance Characteristics

  • First run (model download): ~30 sec (one-time, cached)
  • Add memory: ~50ms (embedding generation)
  • Search query: ~100ms (embedding + vector search + scoring)
  • Storage footprint: ~500MB (model + data)
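
If you want to sanity-check those numbers on your own machine, a quick time.perf_counter measurement around the encode call (and likewise around collection.query) is enough - a rough check, not a rigorous benchmark:

import time
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

start = time.perf_counter()
model.encode("where's my passport?")
print(f"embedding took {(time.perf_counter() - start) * 1000:.0f} ms")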

How It Went

Example Usage

# Day 1: Packing
$ ./brain add "winter coats in box labeled 'bedroom closet'"
$ ./brain add "passport in blue suitcase top pocket"
$ ./brain add "laptop charger in black backpack"

# Day 15: Unpacking at new place
$ ./brain recall "where are my winter jackets?"
> winter coats in box labeled 'bedroom closet' (14 days ago)

# Day 30: Moving items around
$ ./brain add "winter coats now in hallway closet"
$ ./brain recall "where are my coats?"
> winter coats now in hallway closet (just now)  # Recency wins!

What I Learned

Technical Takeaways

  • RAG doesn't need generation: For many use cases, retrieval alone is probably sufficient, though you don't get to mention "AI" as many times.
  • Recency is a feature: Exponential decay naturally handles "updates".
  • Local ML is viable: Modern embedding models are shockingly lightweight.
  • Hybrid scoring > pure similarity: Combining signals beats any single metric. Playing with the function weights made a big difference in getting this to work well for me.

Product Lessons

Building a first-draft app is now surprisingly easy regardless of your background. However, scope creep is also easier than ever; I had to constantly remind myself to build the bare minimum, get it working, then iterate. I also predict that product taste will become more important than ever. When the barrier to entry for a new product is next to nothing and the "out of the box" AI-generated content looks like everyone else's "average of the internet" content (...like this website, actually), taste becomes more valuable because it's difficult to hack. You need to know what you want the experience to feel like and who you're building it for, or you risk your super quirky, fun new app looking like every other corporate creation that has been optimized into oblivion.

What's Next

v0.5 (Quick Wins):

  • Export/import memories (backup)
  • Search filtering by date range
  • Better error messages

v1.0 (Blow Up the Scope):

  • FastAPI REST endpoints
  • Simple web UI
  • LLM integration for natural responses
  • Multi-device sync

v2.0 (Ambitious):

  • Voice input/output
  • Image attachments (find things by photos)
  • Programmed / automatic reminders ("You haven't accessed your passport in 6 months")
  • Mobile app

Try It Yourself

The entire project is open source and documented for learning: