OLAP Analytics Schema Guide

Strategic Alignment

Strategic Alignment: The OLAP analytics schema exists exclusively to power durable, business-critical analytics for PenguinMails. It is not used for live product state, notifications, or infrastructure logging.

Scope:

  • Aggregated metrics for campaigns, mailboxes, leads, warmups.

  • Billing and usage analytics.

  • Compliance-relevant audit summaries.

  • Long-term trend and BI queries.

Out of Scope:

  • User-facing notifications.

  • Live system events/incidents.

  • Raw logs, infra metrics, rate limits, or queue/job internals.

  • Heavy content (email bodies, attachments).

For those concerns:


OLTP vs OLAP Separation of Concerns

What Lives in OLTP (Operational Billing)

  • βœ… Subscription records (subscriptions, plans, payments)
  • βœ… Stripe object references (stripe_subscription_id, stripe_product_id)
  • βœ… Current subscription state (status, current_period_end)
  • βœ… Billing lifecycle (cancel_reason, cancel_date)

What Lives in OLAP (Analytics)

  • βœ… Email campaign metrics (campaign_analytics.sent, delivered, opened)
  • βœ… Mailbox performance (mailbox_analytics.health_score)
  • βœ… Lead engagement (lead_analytics.replied)
  • βœ… Aggregated usage summaries (billing_analytics.emails_sent)

How Billing Uses Analytics

When billing needs analytics data (e.g., usage-based billing checks):

// βœ… CORRECT: Query OLAP from billing controller
const usage = await olapClient.query(`
  SELECT SUM(sent) as total_sent
  FROM campaign_analytics
  WHERE tenant_id = $1 
    AND updated >= $2 
    AND updated < $3
`, [tenantId, periodStart, periodEnd]);

if (usage.total_sent > currentPlan.max_emails_per_month) {
  // Handle overage
}

Anti-pattern:

// ❌ WRONG: Don't duplicate analytics in billing tables
// This violates separation of concerns and creates data redundancy

billing_analytics Table Purpose

Clarification: This table is usage aggregation for all purposes, not just billing.

Use cases:

  • Billing usage summaries
  • Dashboard widgets (resource counts)
  • Executive reports (tenant growth)

NOT for:

  • Detailed campaign metrics β†’ Use campaign_analytics
  • Email deliverability insights β†’ Use mailbox_analytics
  • Lead engagement scoring β†’ Use lead_analytics

1. Core Analytics Tables

The OLAP warehouse contains the following canonical tables:

  1. billing_analytics

  2. campaign_analytics

  3. mailbox_analytics

  4. lead_analytics

  5. warmup_analytics

  6. sequence_step_analytics

  7. admin_audit_log (compliance-focused only)

All other entities (notifications, system events, infra telemetry, etc.) are deliberately excluded or handled in other tiers.

1.1 Update Process & Data Freshness

OLAP tables are updated via queued batch jobs, NOT real-time:

Aggregation Schedule:

  • Daily batch jobs (typically 2 AM UTC)
  • Weekly rollup jobs (Sundays for week-over-week analytics)

Dependency Chain:

graph LR
    A[Email Events] --> B[mailbox_analytics]
    A --> C[lead_analytics]
    B --> D[campaign_analytics]
    C --> D
    D --> E[billing_analytics]
    E --> F[sequence_step_analytics]

Order of Operations:

  1. Email-level events aggregated into mailbox_analytics and lead_analytics
  2. Campaign metrics calculated from mailbox/lead data β†’ campaign_analytics
  3. Billing period summaries rolled up β†’ billing_analytics
  4. Sequence step analytics linked to campaigns β†’ sequence_step_analytics

Implications:

  • Usage numbers may be 1-24 hours behind real-time activity
  • Always display updated timestamp on analytics dashboards
  • For real-time subscription status, query OLTP subscriptions table

1.2 Billing Analytics

CREATE TABLE billing_analytics (
    id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    tenant_id TEXT NOT NULL,
    subscription_id TEXT,
    emails_sent INTEGER DEFAULT 0,
    mailboxes_used INTEGER DEFAULT 0,
    domains_used INTEGER DEFAULT 0,
    campaigns_used INTEGER DEFAULT 0,
    leads_used INTEGER DEFAULT 0,
    warmups_active INTEGER DEFAULT 0,
    period_start TIMESTAMPTZ NOT NULL,
    period_end TIMESTAMPTZ NOT NULL,
    updated TIMESTAMPTZ DEFAULT NOW()
);

CREATE UNIQUE INDEX idx_billing_analytics_tenant_period
    ON billing_analytics(tenant_id, period_start, period_end);

Purpose:

  • Aggregated usage per tenant per period.

  • Drives billing, quotas, and revenue analytics.

1.3 Campaign Analytics

CREATE TABLE campaign_analytics (
    id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    campaign_id TEXT NOT NULL,
    company_id TEXT NOT NULL,
    sent INTEGER DEFAULT 0,
    delivered INTEGER DEFAULT 0,
    opened_tracked INTEGER DEFAULT 0,
    clicked_tracked INTEGER DEFAULT 0,
    replied INTEGER DEFAULT 0,
    bounced INTEGER DEFAULT 0,
    unsubscribed INTEGER DEFAULT 0,
    spam_complaints INTEGER DEFAULT 0,
    status TEXT,
    completed_leads INTEGER DEFAULT 0,
    billing_id BIGINT REFERENCES billing_analytics(id),
    updated TIMESTAMPTZ DEFAULT NOW()
);

Purpose:

  • Aggregated per-campaign performance for reporting and optimization.

1.4 Mailbox Analytics

CREATE TABLE mailbox_analytics (
    id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    mailbox_id TEXT NOT NULL,
    company_id TEXT NOT NULL,
    sent INTEGER DEFAULT 0,
    delivered INTEGER DEFAULT 0,
    opened_tracked INTEGER DEFAULT 0,
    clicked_tracked INTEGER DEFAULT 0,
    replied INTEGER DEFAULT 0,
    bounced INTEGER DEFAULT 0,
    unsubscribed INTEGER DEFAULT 0,
    spam_complaints INTEGER DEFAULT 0,
    warmup_status TEXT,
    health_score INTEGER DEFAULT 0,
    current_volume INTEGER DEFAULT 0,
    billing_id BIGINT REFERENCES billing_analytics(id),
    campaign_status TEXT,
    updated TIMESTAMPTZ DEFAULT NOW()
);

Purpose:

  • Mailbox-level deliverability, health, and warmup analytics.

1.5 Lead Analytics

CREATE TABLE lead_analytics (
    id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    lead_id TEXT NOT NULL,
    campaign_id TEXT NOT NULL,
    sent INTEGER DEFAULT 0,
    delivered INTEGER DEFAULT 0,
    opened_tracked INTEGER DEFAULT 0,
    clicked_tracked INTEGER DEFAULT 0,
    replied INTEGER DEFAULT 0,
    bounced INTEGER DEFAULT 0,
    unsubscribed INTEGER DEFAULT 0,
    spam_complaints INTEGER DEFAULT 0,
    status TEXT,
    billing_id BIGINT REFERENCES billing_analytics(id),
    updated TIMESTAMPTZ DEFAULT NOW()
);

Purpose:

  • Per-lead engagement summaries to support scoring and segmentation.

1.6 Warmup Analytics

CREATE TABLE warmup_analytics (
    id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    mailbox_id TEXT NOT NULL,
    company_id TEXT NOT NULL,
    sent INTEGER DEFAULT 0,
    delivered INTEGER DEFAULT 0,
    opened_tracked INTEGER DEFAULT 0,
    clicked_tracked INTEGER DEFAULT 0,
    replied INTEGER DEFAULT 0,
    bounced INTEGER DEFAULT 0,
    unsubscribed INTEGER DEFAULT 0,
    spam_complaints INTEGER DEFAULT 0,
    health_score INTEGER DEFAULT 0,
    progress_percentage INTEGER DEFAULT 0,
    billing_id BIGINT REFERENCES billing_analytics(id),
    updated TIMESTAMPTZ DEFAULT NOW()
);

Purpose:

  • Warmup performance and reputation metrics.

1.7 Sequence Step Analytics

CREATE TABLE sequence_step_analytics (
    id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    step_id TEXT NOT NULL,
    campaign_id TEXT NOT NULL,
    company_id TEXT NOT NULL,
    sent INTEGER DEFAULT 0,
    delivered INTEGER DEFAULT 0,
    opened_tracked INTEGER DEFAULT 0,
    clicked_tracked INTEGER DEFAULT 0,
    replied INTEGER DEFAULT 0,
    bounced INTEGER DEFAULT 0,
    unsubscribed INTEGER DEFAULT 0,
    spam_complaints INTEGER DEFAULT 0,
    billing_id BIGINT REFERENCES billing_analytics(id),
    updated TIMESTAMPTZ DEFAULT NOW()
);

Purpose:

  • Step-level performance to support optimization and A/B testing.

2. Administrative Audit (Compliance-Focused)

2.1 admin_audit_log

CREATE TABLE admin_audit_log (
    id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    creation_time TIMESTAMPTZ DEFAULT NOW(),
    admin_user_id TEXT NOT NULL,
    action TEXT NOT NULL,
    resource_type TEXT NOT NULL,
    resource_id TEXT NOT NULL,
    tenant_id TEXT NOT NULL,
    old_values JSONB,
    new_values JSONB,
    ip_address TEXT,
    user_agent TEXT,
    notes TEXT,
    metadata JSONB,
    data_classification VARCHAR(20),
    retention_category VARCHAR(20)
);

Purpose:

  • OLAP-resident, compliance-scope audit log for high-risk actions:

    • Permission/role changes.

    • Billing/subscription changes.

    • Tenant-wide configuration changes.

    • Sensitive export approvals.

  • Not a full technical log stream.

Key constraints:

  • Do NOT store:

    • Raw performance metrics.

    • Low-risk UI events.

    • Full request/response payloads.

  • Those go to external logging (see external-analytics-logging).


3. Removed / Out-of-Scope for OLAP

The following are intentionally NOT present in OLAP (and must not be reintroduced):

  • admin_system_events:

    • Live/operational system events are owned by the Notifications DB:

    • OLAP may later define aggregates, but no admin_system_events base table exists here.

  • notifications:

    • Owned by Notifications DB, not OLAP.
  • analytics_connection_pools / analytics_pool_metrics / analytics_rate_limits / analytics_access_audit / analytics_export_controls:

    • Infra, config, or security telemetry concerns; moved to:

      • External logging/monitoring.

      • Appropriate operational stores.

  • Transactional email histories:

    • No generic transactional_emails fact table.

    • Operational behavior is via jobs + external logging; analytics only if/when explicitly required.

This keeps the OLAP schema lean, focused, and maintainable.


4. Relationships (High-Level)

See ER diagram:

Key relationships:

  • billing_analytics:

    • Hub for per-tenant period metrics.

    • Linked to:

      • campaign_analytics

      • mailbox_analytics

      • lead_analytics

      • warmup_analytics

      • sequence_step_analytics

  • campaign_analytics Ò†” sequence_step_analytics:

    • Per-campaign breakdown.
  • mailbox_analytics Ò†” warmup_analytics:

    • Per-mailbox warmup tracking.

Logical notes:

  • IDs like tenant_id, campaign_id, company_id, mailbox_id, lead_id are logical references to OLTP, denormalized for warehouse flexibility.

  • No foreign keys to operational schemas.


5. Security, RLS, and Access

Apply RLS and access controls to all OLAP tables:

  • Enforce tenant isolation where OLAP is exposed to tenants.

  • Restrict admin_audit_log to authorized roles and necessary scopes.

Detailed security and logging strategy:


6. Roadmap (Admin)

We explicitly defer any OLAP-specific modeling of admin/system events.

If future needs arise (not implemented now), consider:

  • Aggregated views, such as:

    • incidents_by_tenant_by_month

    • mean_time_to_resolve_by_severity

    • quota_breach_counts_over_time

These would be:

  • Derived from the Notifications DB (admin_system_events) and/or external logs.

  • Implemented as clearly named aggregate tables/views.

  • Still respecting OLAP’s β€œaggregated and lean” constraints.

Currently:

  • No such OLAP tables are defined.

  • All admin/system event analytics are future/roadmap only.


7. Summary

OLAP is now clearly constrained to:

  • Business-critical aggregates.

  • Compliance-focused summaries (admin_audit_log).

  • No live notifications or system events.

  • No infra/config/telemetry storage.

Notifications/system events, logs, and jobs:

  • Live in their respective tiers and feed OLAP only via intentional, aggregate pipelines when justified.

Use this guide as the authoritative reference for what belongs in the OLAP warehouse.