ctx0 Skills (DB-as-Skill)

This document defines the skill system for ctx0, which enables agents to query sync tables using the DB-as-Skill pattern.

Overview

External data syncs (Gmail, Slack, Calendar, etc.) are stored in Supabase tables. Rather than exporting thousands of rows as files, we treat database tables as skills - the agent learns the schema and writes SQL queries to extract what it needs.

┌─────────────────────────────────────────────────────────────────────────────┐
│                         DB AS SKILL PATTERN                                  │
│                                                                              │
│   Traditional approach:            DB as Skill:                              │
│   ────────────────────             ────────────                              │
│                                                                              │
│   50,000 emails                    SKILL.md (schema + examples)              │
│       ↓                                ↓                                     │
│   Export to 50,000 .md files       Agent reads schema                        │
│       ↓                                ↓                                     │
│   Agent browses files              Agent writes SQL query                    │
│       ↓                                ↓                                     │
│   Slow, stale, overwhelming        Fast, fresh, precise                      │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Benefits:

Agent understands data shape (schema)
Agent sees examples (knows what data looks like)
Agent writes precise SQL queries
No need to export thousands of rows
Always fresh data (queries live DB)
Agent decides what's relevant to extract/store

Skill File Structure

Each skill lives in its own folder under /skills/:

skills/
├── _index.md                 # Overview of connected syncs
├── gmail/
│   ├── SKILL.md              # Schema + examples + query patterns
│   └── _results/             # Agent-created query result cache
│       ├── investor-threads.md
│       └── term-sheet-emails.md
├── slack/
│   ├── SKILL.md
│   └── _results/
├── calendar/
│   ├── SKILL.md
│   └── _results/
└── {user-skills}/

SKILL.md Format

Each skill has a SKILL.md file that teaches the agent how to query that data source.

yaml

---
name: gmail
description: Query email data from Gmail sync. Use when user asks about emails, contacts from email, or communication history.
table: sync_gmail
disable-model-invocation: false
---

# Gmail Skill

Query email data using `ctx0_sql`. Save useful results to `_results/`.

## Connection Info

- **Service:** Gmail
- **Last sync:** 2026-01-29T10:30:00Z
- **Total records:** 47,832
- **Sync frequency:** Every 15 minutes

## Schema

\`\`\`sql
CREATE TABLE sync_gmail (
  id UUID PRIMARY KEY,
  user_id UUID NOT NULL,

  -- Gmail identifiers
  message_id TEXT UNIQUE,
  thread_id TEXT,

  -- Participants
  from_email TEXT,
  from_name TEXT,
  to_emails TEXT[],
  cc_emails TEXT[],

  -- Content
  subject TEXT,
  body_preview TEXT,         -- First 500 chars
  body_full TEXT,            -- Full body (use sparingly, large)

  -- Metadata
  labels TEXT[],             -- ['INBOX', 'IMPORTANT', 'STARRED', 'SENT']
  is_read BOOLEAN,
  is_starred BOOLEAN,
  has_attachments BOOLEAN,
  attachment_names TEXT[],

  -- Timestamps
  received_at TIMESTAMPTZ,
  created_at TIMESTAMPTZ,

  -- Vector (for semantic search)
  embedding VECTOR(1536)
);
\`\`\`

## Example Data

| from_email | from_name | subject | labels | received_at |
|------------|-----------|---------|--------|-------------|
| sarah@sequoia.com | Sarah Chen | RE: Term sheet follow-up | ['INBOX', 'IMPORTANT'] | 2026-01-29 08:30 |
| mike@ycombinator.com | Mike Seibel | Demo day prep | ['INBOX'] | 2026-01-28 14:22 |
| notifications@github.com | GitHub | [bot0] PR #42 merged | ['INBOX'] | 2026-01-28 09:15 |

## Query Patterns

### Recent emails from a contact
\`\`\`sql
SELECT subject, body_preview, received_at
FROM sync_gmail
WHERE from_email ILIKE '%sequoia%'
ORDER BY received_at DESC
LIMIT 20;
\`\`\`

### Unread important emails
\`\`\`sql
SELECT from_name, subject, received_at
FROM sync_gmail
WHERE 'IMPORTANT' = ANY(labels)
  AND NOT is_read
ORDER BY received_at DESC;
\`\`\`

### Emails mentioning a topic
\`\`\`sql
SELECT from_name, subject, body_preview, received_at
FROM sync_gmail
WHERE body_full ILIKE '%term sheet%'
   OR subject ILIKE '%term sheet%'
ORDER BY received_at DESC
LIMIT 20;
\`\`\`

## Notes

- `body_full` is large - use `body_preview` when possible
- RLS automatically filters to current user's emails
- Only SELECT queries are allowed (read-only)
- Results are limited to 1000 rows max

Frontmatter Fields

Field	Required	Description
`name`	Yes	Skill identifier
`description`	Yes	When to use this skill
`table`	Yes	The sync table this skill queries
`disable-model-invocation`	No	If true, only manual invocation

Agent Workflow

When an agent needs data from a sync source:

┌─────────────────────────────────────────────────────────────────────────────┐
│                         AGENT QUERY WORKFLOW                                 │
│                                                                              │
│   1. Agent needs email context                                               │
│      ↓                                                                       │
│   2. Reads /skills/gmail/SKILL.md                                            │
│      "I see the schema: from_email, subject, body_preview, labels..."        │
│      ↓                                                                       │
│   3. Writes appropriate SQL query                                            │
│      SELECT from_name, subject, body_preview                                 │
│      FROM sync_gmail                                                         │
│      WHERE from_email ILIKE '%investor%'                                     │
│      ORDER BY received_at DESC LIMIT 20                                      │
│      ↓                                                                       │
│   4. Executes via ctx0_sql                                                   │
│      ↓                                                                       │
│   5. Analyzes results                                                        │
│      ↓                                                                       │
│   6. Decision: Save to _results/ or just use in context?                     │
│      - Important/reusable → Save to _results/investor-emails.md              │
│      - One-time use → Just use in current response                           │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Example Agent Interaction

User: "What have investors said about our valuation?"

Agent thinking:
1. Need to search emails from investors about valuation
2. Check /skills/gmail/SKILL.md for schema
3. Also check /contacts/ to identify investor emails

Agent actions:
1. ctx0_retrieve("list of investor contacts")
   → Returns: sarah@sequoia.com, mike@a16z.com, etc.

2. ctx0_sql(`
     SELECT from_name, subject, body_preview, received_at
     FROM sync_gmail
     WHERE from_email IN ('sarah@sequoia.com', 'mike@a16z.com')
       AND (body_full ILIKE '%valuation%' OR subject ILIKE '%valuation%')
     ORDER BY received_at DESC
     LIMIT 20
   `)
   → Returns: 8 relevant emails

3. Synthesizes answer from results

4. ctx0_remember({
     content: "Summary of investor valuation feedback...",
     path: "/skills/gmail/_results/investor-valuation-feedback.md"
   })

Security Model

Agent queries must be secure. Four layers of protection:

┌─────────────────────────────────────────────────────────────────────────────┐
│                         SECURITY LAYERS                                      │
│                                                                              │
│   Layer 1: Application Validation                                            │
│   └── Block dangerous keywords before query reaches DB                       │
│                                                                              │
│   Layer 2: PostgreSQL Function                                               │
│   └── Only execute SELECT, reject all mutations                              │
│                                                                              │
│   Layer 3: Row-Level Security (RLS)                                          │
│   └── user_id filter automatically applied                                   │
│                                                                              │
│   Layer 4: Service Role Context                                              │
│   └── Daemon uses service_role, agent queries scoped to user                 │
│                                                                              │
│   Result: Agent can ONLY read their own user's data                          │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Layer 1: Application Validation

typescript

class Ctx0SqlTool {
  private forbiddenPatterns = [
    /\bDROP\b/i,
    /\bDELETE\b/i,
    /\bTRUNCATE\b/i,
    /\bINSERT\b/i,
    /\bUPDATE\b/i,
    /\bALTER\b/i,
    /\bCREATE\b/i,
    /\bGRANT\b/i,
    /\bREVOKE\b/i,
  ];

  validateQuery(sql: string): void {
    if (!/^\s*SELECT\b/i.test(sql)) {
      throw new Error('Only SELECT queries are allowed');
    }

    for (const pattern of this.forbiddenPatterns) {
      if (pattern.test(sql)) {
        throw new Error(`Forbidden keyword in query`);
      }
    }
  }
}

Layer 2: PostgreSQL Function

The ctx0_query function wraps all agent queries:

sql

CREATE OR REPLACE FUNCTION ctx0_query(query_text TEXT)
RETURNS JSONB
LANGUAGE plpgsql
SECURITY DEFINER
SET search_path = public
AS $$
DECLARE
  result JSONB;
BEGIN
  -- Validate: only SELECT allowed
  IF NOT (query_text ~* '^\s*SELECT\b') THEN
    RAISE EXCEPTION 'Only SELECT queries allowed';
  END IF;

  -- Block dangerous patterns
  IF query_text ~* '\b(DROP|DELETE|TRUNCATE|INSERT|UPDATE|ALTER|CREATE|GRANT|REVOKE|EXEC)\b' THEN
    RAISE EXCEPTION 'Forbidden keyword in query';
  END IF;

  -- Block multiple statements
  IF query_text ~ ';\s*\S' THEN
    RAISE EXCEPTION 'Multiple statements not allowed';
  END IF;

  -- Execute with row limit
  EXECUTE format(
    'SELECT jsonb_agg(row_to_json(t)) FROM (SELECT * FROM (%s) sub LIMIT 1000) t',
    query_text
  ) INTO result;

  RETURN COALESCE(result, '[]'::jsonb);
END;
$$;

Layer 3: RLS Auto-Filter

Row-Level Security automatically filters queries:

sql

-- Agent writes this query:
SELECT * FROM sync_gmail WHERE subject ILIKE '%term sheet%'

-- RLS automatically transforms it to:
SELECT * FROM sync_gmail
WHERE subject ILIKE '%term sheet%'
  AND user_id = 'current-user-uuid'  -- ← Injected by RLS

Sync Table Schemas

sync_gmail

sql

CREATE TABLE sync_gmail (
  id UUID PRIMARY KEY,
  user_id UUID NOT NULL,
  message_id TEXT UNIQUE,
  thread_id TEXT,
  from_email TEXT,
  from_name TEXT,
  to_emails TEXT[],
  cc_emails TEXT[],
  subject TEXT,
  body_preview TEXT,
  body_full TEXT,
  labels TEXT[],
  is_read BOOLEAN,
  is_starred BOOLEAN,
  has_attachments BOOLEAN,
  attachment_names TEXT[],
  received_at TIMESTAMPTZ,
  created_at TIMESTAMPTZ,
  embedding VECTOR(1536)
);

sync_slack

sql

CREATE TABLE sync_slack (
  id UUID PRIMARY KEY,
  user_id UUID NOT NULL,
  message_id TEXT UNIQUE,
  channel_id TEXT,
  channel_name TEXT,
  thread_ts TEXT,
  sender_id TEXT,
  sender_name TEXT,
  sender_email TEXT,
  text TEXT,
  message_type TEXT,
  is_thread_reply BOOLEAN,
  has_attachments BOOLEAN,
  reactions JSONB,
  sent_at TIMESTAMPTZ,
  created_at TIMESTAMPTZ,
  embedding VECTOR(1536)
);

sync_calendar

sql

CREATE TABLE sync_calendar (
  id UUID PRIMARY KEY,
  user_id UUID NOT NULL,
  event_id TEXT UNIQUE,
  calendar_id TEXT,
  title TEXT,
  description TEXT,
  location TEXT,
  organizer_email TEXT,
  attendees JSONB,
  start_at TIMESTAMPTZ,
  end_at TIMESTAMPTZ,
  is_all_day BOOLEAN,
  timezone TEXT,
  status TEXT,
  recurrence TEXT,
  meeting_link TEXT,
  created_at TIMESTAMPTZ,
  updated_at TIMESTAMPTZ,
  embedding VECTOR(1536)
);

sync_notion

sql

CREATE TABLE sync_notion (
  id UUID PRIMARY KEY,
  user_id UUID NOT NULL,
  page_id TEXT UNIQUE,
  parent_id TEXT,
  title TEXT,
  content_preview TEXT,
  content_full TEXT,
  page_type TEXT,
  properties JSONB,
  notion_created TIMESTAMPTZ,
  notion_updated TIMESTAMPTZ,
  created_at TIMESTAMPTZ,
  embedding VECTOR(1536)
);

Creating Custom Skills

Step 1: Set Up Sync

First, create the sync table and populate it with data from your integration.

Step 2: Create SKILL.md

yaml

---
name: my-crm
description: Query CRM data for leads, deals, and customer information.
table: sync_crm
---

# CRM Skill

Query CRM data using `ctx0_sql`.

## Schema

\`\`\`sql
CREATE TABLE sync_crm (
  id UUID PRIMARY KEY,
  user_id UUID NOT NULL,
  contact_id TEXT UNIQUE,
  name TEXT,
  email TEXT,
  company TEXT,
  deal_stage TEXT,
  deal_value NUMERIC,
  last_contact TIMESTAMPTZ,
  notes TEXT,
  created_at TIMESTAMPTZ,
  embedding VECTOR(1536)
);
\`\`\`

## Query Patterns

### Find high-value deals
\`\`\`sql
SELECT name, company, deal_stage, deal_value
FROM sync_crm
WHERE deal_value > 10000
ORDER BY deal_value DESC;
\`\`\`

### Contacts needing follow-up
\`\`\`sql
SELECT name, email, last_contact
FROM sync_crm
WHERE last_contact < NOW() - INTERVAL '7 days'
ORDER BY last_contact ASC;
\`\`\`

Step 3: Add to Agent Skills

In your agent's AGENT.md:

yaml

---
skills:
  - gmail
  - my-crm
---

_results/ Folder

Each skill has a _results/ folder where agents save useful query results.

Purpose:

Cache frequently-accessed data
Store synthesized insights
Avoid re-running expensive queries

Example structure:

skills/gmail/_results/
├── investor-threads.md         # Summary of investor email threads
├── term-sheet-emails.md        # Emails about term sheets
└── meeting-requests-jan.md     # Meeting requests from January

Example result file:

markdown

---
query: "investor emails about valuation"
created_at: 2026-01-29T10:30:00Z
source_table: sync_gmail
row_count: 8
---

# Investor Emails About Valuation

## Summary

8 emails from investors mentioning valuation:
- Sarah Chen (Sequoia): Comfortable with $50M
- Mike at A16Z: Wants to discuss further
- ...

## Raw Results

| from_name | subject | received_at |
|-----------|---------|-------------|
| Sarah Chen | RE: Valuation discussion | 2026-01-28 |
| ... | ... | ... |

Best Practices

Writing Good Skill Documentation

Complete schema - Include all columns with types
Example data - Show what real data looks like
Query patterns - Provide templates for common queries
Notes - Document gotchas (large columns, performance tips)

Query Optimization

Use body_preview instead of body_full when possible
Always include LIMIT clause
Use indexes (from_email, received_at, labels)
Filter early with WHERE clause

Caching Results

Save reusable insights to _results/
Include metadata (query, date, row count)
Update periodically if data changes

ctx0 Agents - Agent system design
ctx0 Vault Structure - Folder organization
ctx0 Supabase Data Architecture - Database schema

ctx0 Skills (DB-as-Skill)

Overview

Skill File Structure

SKILL.md Format

Frontmatter Fields

Agent Workflow

Example Agent Interaction

Security Model

Layer 1: Application Validation

Layer 2: PostgreSQL Function

Layer 3: RLS Auto-Filter

Sync Table Schemas

sync_gmail

sync_slack

sync_calendar

sync_notion

Creating Custom Skills

Step 1: Set Up Sync

Step 2: Create SKILL.md

Step 3: Add to Agent Skills

_results/ Folder

Best Practices

Writing Good Skill Documentation

Query Optimization

Caching Results

Related Documentation