
ctx0 Supabase Data Architecture

This document defines the database schema, storage architecture, and setup workflow for ctx0's Supabase backend.


Overview

ctx0 uses Supabase (PostgreSQL + pgvector + Storage) as its backend. This architecture supports:

  • Self-hosted mode: User provides their own Supabase project
  • Bytespace hosted mode: Managed by Bytespace (shared infrastructure)

Both modes use the same schema, with Row Level Security (RLS) isolating user data.


Database Tables

Core Tables

| Table | Purpose |
|---|---|
| ctx0_entries | Main vault storage with tree structure (files, folders, agents, skills) |
| ctx0_entry_versions | Version history for entries |
| ctx0_sessions | Session/conversation metadata from all sources |
| ctx0_session_messages | Individual messages, tool calls, compaction points |
| ctx0_learned_queries | SQL templates that improve with use |
| ctx0_embedding_queue | Async embedding generation |
| ctx0_working_memory | Temporary storage before archival |
| ctx0_user_config | Per-user configuration |
| ctx0_schema_versions | Migration version tracking |

Table Schemas

ctx0_user_config

Per-user configuration and settings.

```sql
CREATE TABLE ctx0_user_config (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID UNIQUE NOT NULL,

  -- Storage mode
  storage_mode TEXT DEFAULT 'self-hosted',  -- 'self-hosted' or 'hosted'

  -- Settings
  auto_remember BOOLEAN DEFAULT true,
  retention_days INT DEFAULT 365,

  -- Preferences
  preferences JSONB DEFAULT '{}',

  -- Stats (computed)
  vault_stats JSONB DEFAULT '{}',

  -- Timestamps
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);
```

ctx0_entries

Main vault storage with hierarchical tree structure. Represents files, folders, agents, skills, tools, and MCP configs.

```sql
CREATE TABLE ctx0_entries (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL,

  -- Tree structure
  path TEXT NOT NULL,                       -- Full path: "/contacts/sarah-chen"
  parent_path TEXT,                         -- Parent folder: "/contacts"
  entry_kind TEXT NOT NULL DEFAULT 'file',  -- 'file' or 'folder'

  -- Type classification
  entry_type TEXT NOT NULL DEFAULT 'note',
  -- 'note', 'contact', 'project', 'decision', 'knowledge',
  -- 'preference', 'agent', 'skill', 'tool', 'mcp_config'

  -- Content type (file format)
  content_type TEXT,        -- '.md', '.json', '.yaml', 'sql_table'

  -- For skills that reference database tables (DB-as-skill pattern)
  table_reference TEXT,     -- 'sync_gmail', 'sync_slack', etc.

  -- System protection
  is_system BOOLEAN DEFAULT false,  -- System-created (default agents, skills, tools)
  is_locked BOOLEAN DEFAULT false,  -- Cannot be modified or deleted

  -- Content
  title TEXT,
  content_preview TEXT,     -- First 500 chars for quick display
  content_ref TEXT,         -- Reference to Storage bucket file

  -- Metadata
  frontmatter JSONB DEFAULT '{}',  -- Parsed YAML frontmatter from AGENT.md, SKILL.md, etc.
  tags TEXT[] DEFAULT '{}',
  links UUID[] DEFAULT '{}',       -- Related entries

  -- External sync
  sync_sources JSONB DEFAULT '{}', -- {"gmail": "msg_id", "slack": "channel/ts"}

  -- Vector embedding
  embedding VECTOR(1536),

  -- Versioning
  version INT DEFAULT 1,

  -- Timestamps
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW(),

  -- Constraints
  UNIQUE(user_id, path)
);
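The path/parent_path convention can be illustrated with a small sketch. This is a hypothetical helper, not part of the actual client code; it only shows how `parent_path` relates to `path` in the tree structure above.

```typescript
// Derive parent_path from a full entry path, per the convention above:
// "/contacts/sarah-chen" -> "/contacts"; top-level entries have no parent.
function parentPath(path: string): string | null {
  const idx = path.lastIndexOf('/');
  if (idx <= 0) return null; // "/contacts" sits at the root
  return path.slice(0, idx);
}
```

Keeping `parent_path` denormalized like this makes "list folder contents" a simple indexed equality query instead of a path-prefix scan.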

Entry Types:

| entry_type | Description | content_type | Example path |
|---|---|---|---|
| agent | Agent configuration | .md + .json | /agents/ctx0/AGENT.md |
| skill | DB-as-skill definition | .md | /skills/gmail/SKILL.md |
| tool | Tool definition | .md | /tools/ctx0_remember/TOOL.md |
| mcp_config | MCP server config | .json | /mcp/mcp.json |
| contact | Person/organization | .md | /contacts/sarah-chen.md |
| project | Project context | .md | /projects/bot0/overview.md |
| note | General note | .md | /knowledge/supabase-rls.md |
| decision | Decision record | .md | /decisions/use-supabase.md |

ctx0_entry_versions

Version history for tracking changes.

```sql
CREATE TABLE ctx0_entry_versions (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  entry_id UUID NOT NULL REFERENCES ctx0_entries(id) ON DELETE CASCADE,
  user_id UUID NOT NULL,

  -- Version info
  version INT NOT NULL,

  -- Snapshot
  content_ref TEXT,   -- Reference to versioned content
  frontmatter JSONB,

  -- Change tracking
  changed_by TEXT,    -- 'user', 'agent:bot0', 'agent:curator', 'sync:gmail'
  change_summary TEXT,

  -- Timestamp
  created_at TIMESTAMPTZ DEFAULT NOW()
);
```

ctx0_sessions

Session/conversation metadata from all sources (bot0, Claude Code, Cursor, etc.).

```sql
CREATE TABLE ctx0_sessions (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL,

  -- Identity
  title TEXT,

  -- Source information
  agent TEXT NOT NULL DEFAULT 'bot0',  -- 'bot0', 'claude_code', 'cursor', 'gemini', 'chatgpt'
  agent_version TEXT,                  -- Version of the agent/tool
  source_path TEXT,                    -- Original file path (e.g., ~/.claude/projects/.../session.jsonl)

  -- Context
  working_directory TEXT,
  project_name TEXT,

  -- Status
  status TEXT DEFAULT 'active',        -- 'active', 'paused', 'completed', 'archived'

  -- Stats (denormalized for quick access)
  message_count INT DEFAULT 0,
  tool_call_count INT DEFAULT 0,
  total_input_tokens INT DEFAULT 0,
  total_output_tokens INT DEFAULT 0,
  compaction_count INT DEFAULT 0,      -- How many times context was compacted

  -- Files touched during session
  files_modified TEXT[] DEFAULT '{}',
  files_created TEXT[] DEFAULT '{}',
  files_deleted TEXT[] DEFAULT '{}',

  -- Summary
  summary TEXT,                        -- AI-generated summary of the session

  -- Metadata
  metadata JSONB DEFAULT '{}',         -- Flexible metadata

  -- Timestamps
  started_at TIMESTAMPTZ DEFAULT NOW(),
  last_activity_at TIMESTAMPTZ DEFAULT NOW(),
  ended_at TIMESTAMPTZ,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);
```

ctx0_session_messages

Individual messages and turns within a session. Stores tool calls, results, and compaction points.

```sql
CREATE TABLE ctx0_session_messages (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  session_id UUID NOT NULL REFERENCES ctx0_sessions(id) ON DELETE CASCADE,
  user_id UUID NOT NULL,

  -- Sequencing
  message_sequence INT NOT NULL,  -- Order in conversation (1, 2, 3...)
  stream_step INT DEFAULT 0,      -- For streaming (multiple steps per message)

  -- Content
  role TEXT NOT NULL,             -- 'user', 'assistant', 'system', 'tool_call', 'tool_result'
  content TEXT,

  -- Tool call details (when role = 'tool_call' or 'tool_result')
  tool_name TEXT,
  tool_params JSONB,
  tool_result JSONB,
  tool_error TEXT,

  -- Subagent/skill info (when delegated)
  subagent_name TEXT,             -- Which subagent handled this (e.g., 'ctx0', 'explorer')
  skill_name TEXT,                -- Which skill was invoked (e.g., 'gmail')

  -- Compaction markers
  is_compaction_point BOOLEAN DEFAULT false,  -- Context was compacted at this message
  compaction_summary TEXT,                    -- The summary used for compaction (stored here!)
  context_tokens_before INT,                  -- Token count before compaction

  -- Token usage
  input_tokens INT DEFAULT 0,
  output_tokens INT DEFAULT 0,
  model_used TEXT,                -- 'claude-3-opus', 'claude-3-sonnet', 'gpt-4', etc.

  -- Timestamps
  created_at TIMESTAMPTZ DEFAULT NOW(),

  -- Constraints
  UNIQUE(session_id, message_sequence, stream_step)
);
```

Message Roles:

| role | Description |
|---|---|
| user | User input message |
| assistant | Agent response |
| system | System message (context, instructions) |
| tool_call | Agent requested a tool (tool_name, tool_params populated) |
| tool_result | Tool returned a result (tool_result populated) |
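The UNIQUE constraint on (session_id, message_sequence, stream_step) implies the full transcript can be reconstructed by sorting on those two columns. A minimal sketch (the row shape is trimmed down for illustration):

```typescript
// Reconstruct conversation order from session message rows,
// sorted by (message_sequence, stream_step) as the UNIQUE constraint implies.
interface MessageRow {
  message_sequence: number;
  stream_step: number;
  role: string;
  content: string | null;
}

function orderMessages(rows: MessageRow[]): MessageRow[] {
  return [...rows].sort(
    (a, b) =>
      a.message_sequence - b.message_sequence ||
      a.stream_step - b.stream_step
  );
}
```

In practice the same ordering would usually be pushed into the query itself (`ORDER BY message_sequence, stream_step`).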

ctx0_learned_queries

SQL templates that improve with usage (for DB-as-skill pattern).

```sql
CREATE TABLE ctx0_learned_queries (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL,

  -- Query identity
  name TEXT NOT NULL,
  natural_language TEXT[] DEFAULT '{}',  -- Variations that match this query

  -- SQL template
  sql_template TEXT NOT NULL,            -- SQL with :param placeholders

  -- Performance tracking
  use_count INT DEFAULT 0,
  avg_latency_ms INT,
  last_used_at TIMESTAMPTZ,

  -- Effectiveness
  was_helpful BOOLEAN,
  context_type TEXT,                     -- What type of task this helps with

  -- Vector for matching
  query_embedding VECTOR(1536),

  -- Timestamps
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);
```

ctx0_embedding_queue

Async embedding generation queue.

```sql
CREATE TABLE ctx0_embedding_queue (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL,

  -- Target
  entry_id UUID REFERENCES ctx0_entries(id) ON DELETE CASCADE,
  content_hash TEXT NOT NULL,     -- To detect changes

  -- Status
  status TEXT DEFAULT 'pending',  -- 'pending', 'processing', 'done', 'failed'
  error TEXT,

  -- Timestamps
  created_at TIMESTAMPTZ DEFAULT NOW(),
  processed_at TIMESTAMPTZ
);
```
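One plausible way to compute `content_hash` for change detection is a SHA-256 over the entry content; the actual daemon may hash differently, so treat this as an illustrative sketch:

```typescript
import { createHash } from 'node:crypto';

// Hypothetical content_hash implementation: stable hex digest of the
// entry content, so re-queuing can be skipped when nothing changed.
function contentHash(content: string): string {
  return createHash('sha256').update(content, 'utf8').digest('hex');
}
```

A worker would compare the queued hash against the current entry content before spending an embedding call.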

ctx0_working_memory

Temporary storage for ctx0_remember calls before the curator agent archives them.

```sql
CREATE TABLE ctx0_working_memory (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL,
  session_id UUID REFERENCES ctx0_sessions(id),

  -- Memory content
  memory_type TEXT NOT NULL,  -- 'fact', 'decision', 'contact_update', 'project_update', 'preference', 'insight'
  content TEXT NOT NULL,
  suggested_path TEXT,        -- Where curator might file this
  metadata JSONB DEFAULT '{}',

  -- Processing status
  processed BOOLEAN DEFAULT false,
  processed_at TIMESTAMPTZ,
  processed_by TEXT,          -- 'curator', 'user', 'auto'

  -- Timestamps
  created_at TIMESTAMPTZ DEFAULT NOW()
);
```

ctx0_schema_versions

Migration version tracking for self-hosted updates.

```sql
CREATE TABLE ctx0_schema_versions (
  id SERIAL PRIMARY KEY,
  version INT NOT NULL UNIQUE,
  migration_name TEXT NOT NULL,
  description TEXT,
  applied_at TIMESTAMPTZ DEFAULT NOW(),
  applied_by TEXT DEFAULT 'bot0-daemon'
);
```
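The version-tracking table makes the daemon's upgrade decision a simple set difference: compare the versions recorded in ctx0_schema_versions against the migrations it ships with. A sketch of that comparison (the function name is illustrative):

```typescript
// Decide which migration versions still need to run, given the
// versions already recorded in ctx0_schema_versions.
function pendingMigrations(applied: number[], available: number[]): number[] {
  const done = new Set(applied);
  return available.filter((v) => !done.has(v)).sort((a, b) => a - b);
}
```

Sorting ascending matters: migrations must apply in order, and each successful one inserts its row so a crash mid-upgrade resumes cleanly.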

Indexes

```sql
-- ctx0_entries indexes
CREATE INDEX idx_entries_user ON ctx0_entries(user_id);
CREATE INDEX idx_entries_path ON ctx0_entries(user_id, path);
CREATE INDEX idx_entries_parent ON ctx0_entries(user_id, parent_path);
CREATE INDEX idx_entries_type ON ctx0_entries(user_id, entry_type);
CREATE INDEX idx_entries_kind ON ctx0_entries(user_id, entry_kind);
CREATE INDEX idx_entries_content_type ON ctx0_entries(user_id, content_type);
CREATE INDEX idx_entries_system ON ctx0_entries(user_id, is_system);
CREATE INDEX idx_entries_tags ON ctx0_entries USING GIN(tags);
CREATE INDEX idx_entries_updated ON ctx0_entries(updated_at DESC);

-- Vector index (IVFFlat for <100k entries; consider HNSW for larger)
CREATE INDEX idx_entries_embedding ON ctx0_entries
  USING ivfflat(embedding vector_cosine_ops) WITH (lists = 100);

-- ctx0_sessions indexes
CREATE INDEX idx_sessions_user ON ctx0_sessions(user_id);
CREATE INDEX idx_sessions_agent ON ctx0_sessions(user_id, agent);
CREATE INDEX idx_sessions_status ON ctx0_sessions(user_id, status);
CREATE INDEX idx_sessions_started ON ctx0_sessions(started_at DESC);
CREATE INDEX idx_sessions_last_activity ON ctx0_sessions(last_activity_at DESC);

-- ctx0_session_messages indexes
CREATE INDEX idx_messages_session ON ctx0_session_messages(session_id);
CREATE INDEX idx_messages_user ON ctx0_session_messages(user_id);
CREATE INDEX idx_messages_sequence ON ctx0_session_messages(session_id, message_sequence);
CREATE INDEX idx_messages_role ON ctx0_session_messages(session_id, role);
CREATE INDEX idx_messages_compaction ON ctx0_session_messages(session_id, is_compaction_point)
  WHERE is_compaction_point = true;
CREATE INDEX idx_messages_tool ON ctx0_session_messages(session_id, tool_name)
  WHERE tool_name IS NOT NULL;

-- ctx0_entry_versions indexes
CREATE INDEX idx_versions_entry ON ctx0_entry_versions(entry_id);
CREATE INDEX idx_versions_user ON ctx0_entry_versions(user_id);

-- ctx0_learned_queries indexes
CREATE INDEX idx_queries_user ON ctx0_learned_queries(user_id);
CREATE INDEX idx_queries_embedding ON ctx0_learned_queries
  USING ivfflat(query_embedding vector_cosine_ops) WITH (lists = 50);

-- ctx0_embedding_queue indexes
CREATE INDEX idx_queue_status ON ctx0_embedding_queue(status, created_at);
CREATE INDEX idx_queue_user ON ctx0_embedding_queue(user_id);

-- ctx0_working_memory indexes
CREATE INDEX idx_memory_user ON ctx0_working_memory(user_id);
CREATE INDEX idx_memory_session ON ctx0_working_memory(session_id);
CREATE INDEX idx_memory_processed ON ctx0_working_memory(processed, created_at);
```

Functions

Auto-update timestamps

```sql
CREATE OR REPLACE FUNCTION ctx0_update_updated_at()
RETURNS TRIGGER AS $$
BEGIN
  NEW.updated_at = NOW();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Apply to tables with updated_at
CREATE TRIGGER trigger_entries_updated_at
  BEFORE UPDATE ON ctx0_entries
  FOR EACH ROW EXECUTE FUNCTION ctx0_update_updated_at();
CREATE TRIGGER trigger_config_updated_at
  BEFORE UPDATE ON ctx0_user_config
  FOR EACH ROW EXECUTE FUNCTION ctx0_update_updated_at();
CREATE TRIGGER trigger_queries_updated_at
  BEFORE UPDATE ON ctx0_learned_queries
  FOR EACH ROW EXECUTE FUNCTION ctx0_update_updated_at();
CREATE TRIGGER trigger_sessions_updated_at
  BEFORE UPDATE ON ctx0_sessions
  FOR EACH ROW EXECUTE FUNCTION ctx0_update_updated_at();
```
Semantic search

```sql
CREATE OR REPLACE FUNCTION ctx0_semantic_search(
  p_user_id UUID,
  p_query_embedding VECTOR(1536),
  p_limit INT DEFAULT 10,
  p_entry_type TEXT DEFAULT NULL,
  p_min_similarity FLOAT DEFAULT 0.5
)
RETURNS TABLE (
  id UUID,
  path TEXT,
  title TEXT,
  content_preview TEXT,
  entry_type TEXT,
  similarity FLOAT
) AS $$
BEGIN
  RETURN QUERY
  SELECT
    e.id, e.path, e.title, e.content_preview, e.entry_type,
    1 - (e.embedding <=> p_query_embedding) AS similarity
  FROM ctx0_entries e
  WHERE e.user_id = p_user_id
    AND e.embedding IS NOT NULL
    AND (p_entry_type IS NULL OR e.entry_type = p_entry_type)
    AND 1 - (e.embedding <=> p_query_embedding) >= p_min_similarity
  ORDER BY e.embedding <=> p_query_embedding
  LIMIT p_limit;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
```
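In pgvector, `<=>` is the cosine distance operator, so `1 - (a <=> b)` is cosine similarity. For reference, the same score computed directly:

```typescript
// Cosine similarity between two vectors — equivalent to the
// 1 - (a <=> b) expression used in ctx0_semantic_search.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Identical vectors score 1, orthogonal vectors score 0, which is why the function defaults `p_min_similarity` to 0.5 as a loose relevance floor.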

Read-only query function (for DB-as-skill)

```sql
CREATE OR REPLACE FUNCTION ctx0_query(query_text TEXT)
RETURNS JSONB
LANGUAGE plpgsql
SECURITY DEFINER
SET search_path = public
AS $$
DECLARE
  result JSONB;
BEGIN
  -- Validate: only SELECT allowed
  IF NOT (query_text ~* '^\s*SELECT\b') THEN
    RAISE EXCEPTION 'Only SELECT queries allowed';
  END IF;

  -- Block dangerous patterns
  IF query_text ~* '\b(DROP|DELETE|TRUNCATE|INSERT|UPDATE|ALTER|CREATE|GRANT|REVOKE|EXEC)\b' THEN
    RAISE EXCEPTION 'Forbidden keyword in query';
  END IF;

  -- Block multiple statements
  IF query_text ~ ';\s*\S' THEN
    RAISE EXCEPTION 'Multiple statements not allowed';
  END IF;

  -- Execute with row limit
  EXECUTE format(
    'SELECT jsonb_agg(row_to_json(t)) FROM (SELECT * FROM (%s) sub LIMIT 1000) t',
    query_text
  ) INTO result;

  RETURN COALESCE(result, '[]'::jsonb);
END;
$$;

-- Only authenticated users can execute
GRANT EXECUTE ON FUNCTION ctx0_query TO authenticated;
```
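The same three guards can be mirrored client-side to reject bad queries before a round trip. This is purely illustrative; the database function remains the actual enforcement point:

```typescript
// Client-side mirror of ctx0_query's validation rules (illustrative only —
// the SECURITY DEFINER function is what actually enforces them).
function isAllowedQuery(sql: string): boolean {
  // Rule 1: must start with SELECT
  if (!/^\s*SELECT\b/i.test(sql)) return false;
  // Rule 2: no dangerous keywords anywhere
  if (/\b(DROP|DELETE|TRUNCATE|INSERT|UPDATE|ALTER|CREATE|GRANT|REVOKE|EXEC)\b/i.test(sql)) {
    return false;
  }
  // Rule 3: no second statement after a semicolon
  if (/;\s*\S/.test(sql)) return false;
  return true;
}
```

Note the keyword blocklist is deliberately coarse: a SELECT whose text merely mentions a blocked word (say, in a string literal) is also rejected, trading some false positives for simplicity.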

Row Level Security (RLS)

Enable RLS on all tables

```sql
ALTER TABLE ctx0_user_config ENABLE ROW LEVEL SECURITY;
ALTER TABLE ctx0_entries ENABLE ROW LEVEL SECURITY;
ALTER TABLE ctx0_entry_versions ENABLE ROW LEVEL SECURITY;
ALTER TABLE ctx0_sessions ENABLE ROW LEVEL SECURITY;
ALTER TABLE ctx0_session_messages ENABLE ROW LEVEL SECURITY;
ALTER TABLE ctx0_learned_queries ENABLE ROW LEVEL SECURITY;
ALTER TABLE ctx0_embedding_queue ENABLE ROW LEVEL SECURITY;
ALTER TABLE ctx0_working_memory ENABLE ROW LEVEL SECURITY;
```

User isolation policies

```sql
-- Users can only access their own data
CREATE POLICY "user_isolation" ON ctx0_user_config
  FOR ALL USING (auth.uid() = user_id);
CREATE POLICY "user_isolation" ON ctx0_entries
  FOR ALL USING (auth.uid() = user_id);
CREATE POLICY "user_isolation" ON ctx0_entry_versions
  FOR ALL USING (auth.uid() = user_id);
CREATE POLICY "user_isolation" ON ctx0_sessions
  FOR ALL USING (auth.uid() = user_id);
CREATE POLICY "user_isolation" ON ctx0_session_messages
  FOR ALL USING (auth.uid() = user_id);
CREATE POLICY "user_isolation" ON ctx0_learned_queries
  FOR ALL USING (auth.uid() = user_id);
CREATE POLICY "user_isolation" ON ctx0_embedding_queue
  FOR ALL USING (auth.uid() = user_id);
CREATE POLICY "user_isolation" ON ctx0_working_memory
  FOR ALL USING (auth.uid() = user_id);
```

Service role bypass

For daemon operations using the service_role key:

```sql
-- Service role can access all data (for daemon operations)
CREATE POLICY "service_role_bypass" ON ctx0_user_config
  FOR ALL USING (auth.jwt()->>'role' = 'service_role');
CREATE POLICY "service_role_bypass" ON ctx0_entries
  FOR ALL USING (auth.jwt()->>'role' = 'service_role');
CREATE POLICY "service_role_bypass" ON ctx0_entry_versions
  FOR ALL USING (auth.jwt()->>'role' = 'service_role');
CREATE POLICY "service_role_bypass" ON ctx0_sessions
  FOR ALL USING (auth.jwt()->>'role' = 'service_role');
CREATE POLICY "service_role_bypass" ON ctx0_session_messages
  FOR ALL USING (auth.jwt()->>'role' = 'service_role');
CREATE POLICY "service_role_bypass" ON ctx0_learned_queries
  FOR ALL USING (auth.jwt()->>'role' = 'service_role');
CREATE POLICY "service_role_bypass" ON ctx0_embedding_queue
  FOR ALL USING (auth.jwt()->>'role' = 'service_role');
CREATE POLICY "service_role_bypass" ON ctx0_working_memory
  FOR ALL USING (auth.jwt()->>'role' = 'service_role');
```

Storage Bucket

Bucket: ctx0-vault

Stores file content (Markdown, JSON) referenced by content_ref fields.

Structure:

ctx0-vault/
├── {userId}/
│   ├── vault/                     # Entry file contents
│   │   ├── .ctx0/                 # System config
│   │   ├── agents/                # Agent definitions
│   │   │   ├── ctx0/
│   │   │   │   ├── AGENT.md
│   │   │   │   └── config.json
│   │   │   └── curator/
│   │   ├── skills/                # Skill definitions
│   │   │   ├── gmail/
│   │   │   │   └── SKILL.md
│   │   │   └── slack/
│   │   ├── tools/                 # Tool definitions
│   │   ├── mcp/                   # MCP configs
│   │   ├── contacts/
│   │   ├── projects/
│   │   └── knowledge/
│   │
│   └── sessions/                  # Session archives
│       └── {sessionId}/
│           ├── messages.json      # Full message history
│           └── summary.md

Bucket settings:

  • Public: No (private)
  • File size limit: 50MB
  • Allowed MIME types: text/plain, text/markdown, application/json

Storage RLS policy:

```sql
-- Users can only access their own files
CREATE POLICY "user_files" ON storage.objects
  FOR ALL USING (
    bucket_id = 'ctx0-vault'
    AND auth.uid()::text = (storage.foldername(name))[1]
  );
```
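In that policy, `storage.foldername(name)` returns the folder components of the object key, and because Postgres arrays are 1-indexed, `[1]` is the top-level folder — which the bucket layout above makes the user's ID. The equivalent ownership check, expressed directly:

```typescript
// Mirrors the storage RLS policy: the first path segment of an object key
// in ctx0-vault must equal the requesting user's ID.
function ownsObject(objectKey: string, userId: string): boolean {
  return objectKey.split('/')[0] === userId;
}
```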

User Authentication Strategy

Self-hosted mode

  1. User provides Supabase URL + service_role key
  2. User authenticates via Bytespace OAuth
  3. user_id from Bytespace is used in their database
  4. Service role key bypasses RLS for daemon operations

Bytespace hosted mode

  1. User authenticates via Bytespace OAuth
  2. user_id from Bytespace auth
  3. Data stored in shared Bytespace Supabase
  4. RLS isolates user data

Why Bytespace auth for self-hosted?

  • Single identity across both modes
  • Easy migration from self-hosted to hosted
  • Bytespace manages identity, user manages data
  • Future features: sharing, collaboration

Setup Workflow

Self-Hosted Onboarding Steps

  1. Create Supabase Project

    • Create free project at supabase.com
    • Note project URL
  2. Get API Keys

    • Go to Project Settings → API Keys → Legacy tab
    • Copy anon key and service_role key
  3. Create Storage Bucket

    • Go to Storage → New Bucket
    • Name: ctx0-vault
    • Public: No
  4. Run Schema Migration

    • Go to SQL Editor
    • Paste migration SQL
    • Click Run
  5. Verify Setup

    • App checks tables exist
    • App checks bucket exists
  6. Connect Bytespace Account

    • OAuth flow to get user_id
    • Links identity for future features

Developer Tools

cleanup.sql (Dev Only)

This is for bot0/ctx0 developers only - not exposed to end users.

Use this to completely reset the schema when iterating on the architecture during development.

```sql
-- ============================================================================
-- ctx0 cleanup.sql
-- FOR DEVELOPERS ONLY - Drops all ctx0 tables for fresh iteration
-- ============================================================================

-- Drop tables in reverse dependency order
DROP TABLE IF EXISTS ctx0_working_memory CASCADE;
DROP TABLE IF EXISTS ctx0_embedding_queue CASCADE;
DROP TABLE IF EXISTS ctx0_learned_queries CASCADE;
DROP TABLE IF EXISTS ctx0_session_messages CASCADE;
DROP TABLE IF EXISTS ctx0_sessions CASCADE;
DROP TABLE IF EXISTS ctx0_entry_versions CASCADE;
DROP TABLE IF EXISTS ctx0_entries CASCADE;
DROP TABLE IF EXISTS ctx0_user_config CASCADE;
DROP TABLE IF EXISTS ctx0_schema_versions CASCADE;

-- Drop functions
DROP FUNCTION IF EXISTS ctx0_update_updated_at CASCADE;
DROP FUNCTION IF EXISTS ctx0_semantic_search CASCADE;
DROP FUNCTION IF EXISTS ctx0_query CASCADE;

-- Drop storage bucket contents (run separately or via API)
-- DELETE FROM storage.objects WHERE bucket_id = 'ctx0-vault';

-- ============================================================================
-- After running this, run the latest migration SQL to recreate tables
-- ============================================================================
```

Clear storage bucket (Dev)

```typescript
// Delete files in the ctx0-vault bucket (dev only)
const { data: files, error } = await supabase.storage
  .from('ctx0-vault')
  .list('', { limit: 1000 });

if (error) throw error;

// Batch the removal into one call instead of one request per file.
// Note: list() is not recursive — files nested in per-user folders
// must be listed and removed folder by folder.
if (files && files.length > 0) {
  await supabase.storage
    .from('ctx0-vault')
    .remove(files.map((f) => f.name));
}
```

Reset local config (Dev)

```bash
rm -rf ~/.ctx0
# Also remove ctx0-related fields from ~/.bot0/config.json
```

Schema Migrations (Future - For Users)

Note: This section describes future functionality for self-hosted users to upgrade their schema.

Self-hosted users will be able to upgrade their ctx0 schema via the desktop app:

  1. App detects new schema version available (v1 → v2)
  2. Shows notification: "ctx0 schema update available"
  3. User clicks "Apply Update"
  4. App runs migration SQL to upgrade tables

This is NOT the same as cleanup.sql - migrations preserve user data while adding/modifying schema.

See Schema Versioning for the full migration system design.