ctx0-skills.md

ctx0 Skills (DB-as-Skill)

This document defines the skill system for ctx0, which enables agents to query sync tables using the DB-as-Skill pattern.


Overview

External data syncs (Gmail, Slack, Calendar, etc.) are stored in Supabase tables. Rather than exporting thousands of rows as files, we treat database tables as skills - the agent learns the schema and writes SQL queries to extract what it needs.

┌─────────────────────────────────────────────────────────────────────────────┐
│                         DB AS SKILL PATTERN                                  │
│                                                                              │
│   Traditional approach:            DB as Skill:                              │
│   ────────────────────             ────────────                              │
│                                                                              │
│   50,000 emails                    SKILL.md (schema + examples)              │
│       ↓                                ↓                                     │
│   Export to 50,000 .md files       Agent reads schema                        │
│       ↓                                ↓                                     │
│   Agent browses files              Agent writes SQL query                    │
│       ↓                                ↓                                     │
│   Slow, stale, overwhelming        Fast, fresh, precise                      │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Benefits:

  • Agent understands data shape (schema)
  • Agent sees examples (knows what data looks like)
  • Agent writes precise SQL queries
  • No need to export thousands of rows
  • Always fresh data (queries live DB)
  • Agent decides what's relevant to extract/store

Skill File Structure

Each skill lives in its own folder under /skills/:

skills/
├── _index.md                 # Overview of connected syncs
├── gmail/
│   ├── SKILL.md              # Schema + examples + query patterns
│   └── _results/             # Agent-created query result cache
│       ├── investor-threads.md
│       └── term-sheet-emails.md
├── slack/
│   ├── SKILL.md
│   └── _results/
├── calendar/
│   ├── SKILL.md
│   └── _results/
└── {user-skills}/

SKILL.md Format

Each skill has a SKILL.md file that teaches the agent how to query that data source.

yaml
--- name: gmail description: Query email data from Gmail sync. Use when user asks about emails, contacts from email, or communication history. table: sync_gmail disable-model-invocation: false --- # Gmail Skill Query email data using `ctx0_sql`. Save useful results to `_results/`. ## Connection Info - **Service:** Gmail - **Last sync:** 2026-01-29T10:30:00Z - **Total records:** 47,832 - **Sync frequency:** Every 15 minutes ## Schema \`\`\`sql CREATE TABLE sync_gmail ( id UUID PRIMARY KEY, user_id UUID NOT NULL, -- Gmail identifiers message_id TEXT UNIQUE, thread_id TEXT, -- Participants from_email TEXT, from_name TEXT, to_emails TEXT[], cc_emails TEXT[], -- Content subject TEXT, body_preview TEXT, -- First 500 chars body_full TEXT, -- Full body (use sparingly, large) -- Metadata labels TEXT[], -- ['INBOX', 'IMPORTANT', 'STARRED', 'SENT'] is_read BOOLEAN, is_starred BOOLEAN, has_attachments BOOLEAN, attachment_names TEXT[], -- Timestamps received_at TIMESTAMPTZ, created_at TIMESTAMPTZ, -- Vector (for semantic search) embedding VECTOR(1536) ); \`\`\` ## Example Data | from_email | from_name | subject | labels | received_at | |------------|-----------|---------|--------|-------------| | sarah@sequoia.com | Sarah Chen | RE: Term sheet follow-up | ['INBOX', 'IMPORTANT'] | 2026-01-29 08:30 | | mike@ycombinator.com | Mike Seibel | Demo day prep | ['INBOX'] | 2026-01-28 14:22 | | notifications@github.com | GitHub | [bot0] PR #42 merged | ['INBOX'] | 2026-01-28 09:15 | ## Query Patterns ### Recent emails from a contact \`\`\`sql SELECT subject, body_preview, received_at FROM sync_gmail WHERE from_email ILIKE '%sequoia%' ORDER BY received_at DESC LIMIT 20; \`\`\` ### Unread important emails \`\`\`sql SELECT from_name, subject, received_at FROM sync_gmail WHERE 'IMPORTANT' = ANY(labels) AND NOT is_read ORDER BY received_at DESC; \`\`\` ### Emails mentioning a topic \`\`\`sql SELECT from_name, subject, body_preview, received_at FROM sync_gmail WHERE body_full ILIKE '%term sheet%' OR subject ILIKE '%term sheet%' ORDER BY received_at DESC LIMIT 20; \`\`\` ## Notes - `body_full` is large - use `body_preview` when possible - RLS automatically filters to current user's emails - Only SELECT queries are allowed (read-only) - Results are limited to 1000 rows max

Frontmatter Fields

FieldRequiredDescription
nameYesSkill identifier
descriptionYesWhen to use this skill
tableYesThe sync table this skill queries
disable-model-invocationNoIf true, only manual invocation

Agent Workflow

When an agent needs data from a sync source:

┌─────────────────────────────────────────────────────────────────────────────┐
│                         AGENT QUERY WORKFLOW                                 │
│                                                                              │
│   1. Agent needs email context                                               │
│      ↓                                                                       │
│   2. Reads /skills/gmail/SKILL.md                                            │
│      "I see the schema: from_email, subject, body_preview, labels..."        │
│      ↓                                                                       │
│   3. Writes appropriate SQL query                                            │
│      SELECT from_name, subject, body_preview                                 │
│      FROM sync_gmail                                                         │
│      WHERE from_email ILIKE '%investor%'                                     │
│      ORDER BY received_at DESC LIMIT 20                                      │
│      ↓                                                                       │
│   4. Executes via ctx0_sql                                                   │
│      ↓                                                                       │
│   5. Analyzes results                                                        │
│      ↓                                                                       │
│   6. Decision: Save to _results/ or just use in context?                     │
│      - Important/reusable → Save to _results/investor-emails.md              │
│      - One-time use → Just use in current response                           │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Example Agent Interaction

User: "What have investors said about our valuation?"

Agent thinking:
1. Need to search emails from investors about valuation
2. Check /skills/gmail/SKILL.md for schema
3. Also check /contacts/ to identify investor emails

Agent actions:
1. ctx0_retrieve("list of investor contacts")
   → Returns: sarah@sequoia.com, mike@a16z.com, etc.

2. ctx0_sql(`
     SELECT from_name, subject, body_preview, received_at
     FROM sync_gmail
     WHERE from_email IN ('sarah@sequoia.com', 'mike@a16z.com')
       AND (body_full ILIKE '%valuation%' OR subject ILIKE '%valuation%')
     ORDER BY received_at DESC
     LIMIT 20
   `)
   → Returns: 8 relevant emails

3. Synthesizes answer from results

4. ctx0_remember({
     content: "Summary of investor valuation feedback...",
     path: "/skills/gmail/_results/investor-valuation-feedback.md"
   })

Security Model

Agent queries must be secure. Four layers of protection:

┌─────────────────────────────────────────────────────────────────────────────┐
│                         SECURITY LAYERS                                      │
│                                                                              │
│   Layer 1: Application Validation                                            │
│   └── Block dangerous keywords before query reaches DB                       │
│                                                                              │
│   Layer 2: PostgreSQL Function                                               │
│   └── Only execute SELECT, reject all mutations                              │
│                                                                              │
│   Layer 3: Row-Level Security (RLS)                                          │
│   └── user_id filter automatically applied                                   │
│                                                                              │
│   Layer 4: Service Role Context                                              │
│   └── Daemon uses service_role, agent queries scoped to user                 │
│                                                                              │
│   Result: Agent can ONLY read their own user's data                          │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Layer 1: Application Validation

typescript
class Ctx0SqlTool { private forbiddenPatterns = [ /\bDROP\b/i, /\bDELETE\b/i, /\bTRUNCATE\b/i, /\bINSERT\b/i, /\bUPDATE\b/i, /\bALTER\b/i, /\bCREATE\b/i, /\bGRANT\b/i, /\bREVOKE\b/i, ]; validateQuery(sql: string): void { if (!/^\s*SELECT\b/i.test(sql)) { throw new Error('Only SELECT queries are allowed'); } for (const pattern of this.forbiddenPatterns) { if (pattern.test(sql)) { throw new Error(`Forbidden keyword in query`); } } } }

Layer 2: PostgreSQL Function

The ctx0_query function wraps all agent queries:

sql
CREATE OR REPLACE FUNCTION ctx0_query(query_text TEXT) RETURNS JSONB LANGUAGE plpgsql SECURITY DEFINER SET search_path = public AS $$ DECLARE result JSONB; BEGIN -- Validate: only SELECT allowed IF NOT (query_text ~* '^\s*SELECT\b') THEN RAISE EXCEPTION 'Only SELECT queries allowed'; END IF; -- Block dangerous patterns IF query_text ~* '\b(DROP|DELETE|TRUNCATE|INSERT|UPDATE|ALTER|CREATE|GRANT|REVOKE|EXEC)\b' THEN RAISE EXCEPTION 'Forbidden keyword in query'; END IF; -- Block multiple statements IF query_text ~ ';\s*\S' THEN RAISE EXCEPTION 'Multiple statements not allowed'; END IF; -- Execute with row limit EXECUTE format( 'SELECT jsonb_agg(row_to_json(t)) FROM (SELECT * FROM (%s) sub LIMIT 1000) t', query_text ) INTO result; RETURN COALESCE(result, '[]'::jsonb); END; $$;

Layer 3: RLS Auto-Filter

Row-Level Security automatically filters queries:

sql
-- Agent writes this query: SELECT * FROM sync_gmail WHERE subject ILIKE '%term sheet%' -- RLS automatically transforms it to: SELECT * FROM sync_gmail WHERE subject ILIKE '%term sheet%' AND user_id = 'current-user-uuid' -- ← Injected by RLS

Sync Table Schemas

sync_gmail

sql
CREATE TABLE sync_gmail ( id UUID PRIMARY KEY, user_id UUID NOT NULL, message_id TEXT UNIQUE, thread_id TEXT, from_email TEXT, from_name TEXT, to_emails TEXT[], cc_emails TEXT[], subject TEXT, body_preview TEXT, body_full TEXT, labels TEXT[], is_read BOOLEAN, is_starred BOOLEAN, has_attachments BOOLEAN, attachment_names TEXT[], received_at TIMESTAMPTZ, created_at TIMESTAMPTZ, embedding VECTOR(1536) );

sync_slack

sql
CREATE TABLE sync_slack ( id UUID PRIMARY KEY, user_id UUID NOT NULL, message_id TEXT UNIQUE, channel_id TEXT, channel_name TEXT, thread_ts TEXT, sender_id TEXT, sender_name TEXT, sender_email TEXT, text TEXT, message_type TEXT, is_thread_reply BOOLEAN, has_attachments BOOLEAN, reactions JSONB, sent_at TIMESTAMPTZ, created_at TIMESTAMPTZ, embedding VECTOR(1536) );

sync_calendar

sql
CREATE TABLE sync_calendar ( id UUID PRIMARY KEY, user_id UUID NOT NULL, event_id TEXT UNIQUE, calendar_id TEXT, title TEXT, description TEXT, location TEXT, organizer_email TEXT, attendees JSONB, start_at TIMESTAMPTZ, end_at TIMESTAMPTZ, is_all_day BOOLEAN, timezone TEXT, status TEXT, recurrence TEXT, meeting_link TEXT, created_at TIMESTAMPTZ, updated_at TIMESTAMPTZ, embedding VECTOR(1536) );

sync_notion

sql
CREATE TABLE sync_notion ( id UUID PRIMARY KEY, user_id UUID NOT NULL, page_id TEXT UNIQUE, parent_id TEXT, title TEXT, content_preview TEXT, content_full TEXT, page_type TEXT, properties JSONB, notion_created TIMESTAMPTZ, notion_updated TIMESTAMPTZ, created_at TIMESTAMPTZ, embedding VECTOR(1536) );

Creating Custom Skills

Step 1: Set Up Sync

First, create the sync table and populate it with data from your integration.

Step 2: Create SKILL.md

yaml
--- name: my-crm description: Query CRM data for leads, deals, and customer information. table: sync_crm --- # CRM Skill Query CRM data using `ctx0_sql`. ## Schema \`\`\`sql CREATE TABLE sync_crm ( id UUID PRIMARY KEY, user_id UUID NOT NULL, contact_id TEXT UNIQUE, name TEXT, email TEXT, company TEXT, deal_stage TEXT, deal_value NUMERIC, last_contact TIMESTAMPTZ, notes TEXT, created_at TIMESTAMPTZ, embedding VECTOR(1536) ); \`\`\` ## Query Patterns ### Find high-value deals \`\`\`sql SELECT name, company, deal_stage, deal_value FROM sync_crm WHERE deal_value > 10000 ORDER BY deal_value DESC; \`\`\` ### Contacts needing follow-up \`\`\`sql SELECT name, email, last_contact FROM sync_crm WHERE last_contact < NOW() - INTERVAL '7 days' ORDER BY last_contact ASC; \`\`\`

Step 3: Add to Agent Skills

In your agent's AGENT.md:

yaml
--- skills: - gmail - my-crm ---

_results/ Folder

Each skill has a _results/ folder where agents save useful query results.

Purpose:

  • Cache frequently-accessed data
  • Store synthesized insights
  • Avoid re-running expensive queries

Example structure:

skills/gmail/_results/
├── investor-threads.md         # Summary of investor email threads
├── term-sheet-emails.md        # Emails about term sheets
└── meeting-requests-jan.md     # Meeting requests from January

Example result file:

markdown
--- query: "investor emails about valuation" created_at: 2026-01-29T10:30:00Z source_table: sync_gmail row_count: 8 --- # Investor Emails About Valuation ## Summary 8 emails from investors mentioning valuation: - Sarah Chen (Sequoia): Comfortable with $50M - Mike at A16Z: Wants to discuss further - ... ## Raw Results | from_name | subject | received_at | |-----------|---------|-------------| | Sarah Chen | RE: Valuation discussion | 2026-01-28 | | ... | ... | ... |

Best Practices

Writing Good Skill Documentation

  1. Complete schema - Include all columns with types
  2. Example data - Show what real data looks like
  3. Query patterns - Provide templates for common queries
  4. Notes - Document gotchas (large columns, performance tips)

Query Optimization

  1. Use body_preview instead of body_full when possible
  2. Always include LIMIT clause
  3. Use indexes (from_email, received_at, labels)
  4. Filter early with WHERE clause

Caching Results

  • Save reusable insights to _results/
  • Include metadata (query, date, row count)
  • Update periodically if data changes