
White Paper 3: The see7 Sovereign Architecture & Tech Stack

Version 1.3.0 · Date: February 3, 2026

Subject: Cloud-Native Intelligence, Regional Sovereignty, and Vector Persistence

Standards Focus: #4 (High-Stability Providers), #5 (Zero Hardcoding)

1. Regional Anchoring & Global Elasticity

see7 treats geography as a configuration, not a constraint.

The Current Anchor (us-east4): We currently anchor the platform in Northern Virginia (us-east4), which provides low-latency access for our core enterprise customers. This region serves as the initial sovereign perimeter for companies with 50+ users that require responsive sales enablement capabilities to drive complex revenue cycles.

Architecture for Expansion: see7's "Zero Hardcoding" (Standard #5) means our environment variables and SystemConfig are region-agnostic. Our stack—comprised of Vercel Edge functions and Google Cloud's global backbone—is engineered to be cloned into new regions (e.g., europe-west1, australia-southeast1) as our customers' global data residency needs evolve.
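To make the "Zero Hardcoding" claim concrete, the sketch below shows a region-agnostic configuration loader. All names here (SEE7_REGION, SEE7_GCS_BUCKET, loadRegionConfig) are illustrative assumptions, not see7's actual identifiers; the point is the pattern: region settings come from the environment, and a missing value fails fast rather than falling back to a hardcoded default.

```typescript
// Hypothetical sketch of Standard #5: region settings are data read
// from the environment, never constants baked into the code.
interface RegionConfig {
  region: string;    // e.g. "us-east4", "europe-west1"
  gcsBucket: string; // per-region document storage bucket
}

function loadRegionConfig(env: Record<string, string | undefined>): RegionConfig {
  const region = env.SEE7_REGION;
  const gcsBucket = env.SEE7_GCS_BUCKET;
  if (!region || !gcsBucket) {
    // Fail fast instead of silently falling back to a hardcoded default.
    throw new Error("Missing required region configuration");
  }
  return { region, gcsBucket };
}
```

Cloning the stack into europe-west1 then becomes an environment change, not a code change.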

2. The Persistence Layer: PostgreSQL + pgvector

We reject proprietary, external vector databases that create data silos. see7 utilizes PostgreSQL with the pgvector extension as its unified source of truth.

Unified Intelligence & Atomic Consistency: By storing relational metadata (Customers, Repositories) alongside high-dimensional vector embeddings in a single database, we eliminate the "Sync Gap." In traditional RAG architectures, if a document is deleted in the SQL database but the vector store fails to sync, the AI may still "hallucinate" based on stale vectors. In see7, a deletion is a single ACID-compliant transaction across both text and embeddings.
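The "Sync Gap" invariant can be illustrated with a toy in-memory model: because text and embedding live in the same store, one delete removes both, and a stale vector can never survive. This is a deliberately simplified sketch of the invariant, not the actual Prisma/pgvector code.

```typescript
// Toy model of the "no sync gap" guarantee: document text and its
// vector embedding share one store, so deletion is a single operation.
class UnifiedStore {
  private docs = new Map<string, string>();      // docId -> text
  private vectors = new Map<string, number[]>(); // docId -> embedding

  insert(docId: string, text: string, embedding: number[]): void {
    this.docs.set(docId, text);
    this.vectors.set(docId, embedding);
  }

  // One logical operation removes both representations, so RAG can
  // never retrieve an embedding whose source document is gone.
  delete(docId: string): void {
    this.docs.delete(docId);
    this.vectors.delete(docId);
  }

  hasStaleVector(docId: string): boolean {
    return !this.docs.has(docId) && this.vectors.has(docId);
  }
}
```

In the real system, PostgreSQL's transaction semantics provide this guarantee across the Snippet row and its pgvector column.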

Prisma Orchestration: We use Prisma ORM to maintain a type-safe schema. Every query for an "Atomic Fact" joins the Snippet table with the UserRepositoryAccess table in a single atomic operation: at the moment a vector is retrieved, the database is simultaneously checking the user's permission to see that exact repository. If the user lacks access, the data does not exist as far as the query is concerned.
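The permission join above can be expressed as a pure function for illustration. In production this is a single SQL join executed by the database; the interfaces and names below (Snippet, RepoAccess, visibleSnippets) are hypothetical stand-ins for the real schema.

```typescript
// Illustrative pure-function version of the Snippet x UserRepositoryAccess
// join: rows the user cannot access are simply absent from the result.
interface Snippet { id: string; repositoryId: string; text: string }
interface RepoAccess { userId: string; repositoryId: string }

function visibleSnippets(
  snippets: Snippet[],
  access: RepoAccess[],
  userId: string
): Snippet[] {
  const allowed = new Set(
    access.filter((a) => a.userId === userId).map((a) => a.repositoryId)
  );
  // Snippets in repositories the user lacks access to "do not exist"
  // from the query's point of view.
  return snippets.filter((s) => allowed.has(s.repositoryId));
}
```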

3. Compute Sovereignty: Edge-to-Serverless Handoff

see7 leverages a multi-tier compute strategy to balance lightning-fast response times with heavy-duty AI processing.

The Edge Guard (Vercel Edge): All authentication, session verification, and initial routing occur at the Vercel Edge. By executing this logic at the point of presence nearest to the user, we eliminate the "Cold Start" latency of traditional serverless functions for the most frequent user interactions. This allows us to enforce Identity Sovereignty globally without a performance penalty.

Asynchronous Serverless Intelligence: Heavy operations—such as the Omnivore Ingestion Engine (parsing 100MB+ files) or the Reflective RAG synthesis—are handed off to Vercel Serverless Functions. These environments provide the dedicated memory and CPU cycles required for intensive AI orchestration. This tiered approach ensures that while the "brains" are working hard in the background, the user interface remains responsive, fluid, and "Visual Zero" (Standard #7).

4. The AI Core: Model-Agnostic Intelligence

see7 is built to be "future-proof." While we leverage current state-of-the-art models via Google Vertex AI, our orchestration layer is designed to flex as newer, more capable models emerge.

Gemini-2.0-flash: Our current "Workhorse" for fast reasoning. It powers the real-time search synthesis and the high-speed "Turbo-Pulse" ingestion heartbeat.

Gemini-2.0-pro: Our "Strategic Architect" model. It is deployed for high-complexity tasks such as deep RFP drafting and multi-document gap analysis.

Dynamic Routing Layer: In accordance with Standard #5, our model IDs are not hardcoded. We can shift the entire platform to a newer model version via a single entry in SystemConfig.
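A minimal sketch of that routing layer is shown below. The config keys (model.workhorse, model.architect) and the resolveModel helper are assumptions for illustration; the pattern is what matters: the model ID is data in SystemConfig, so repointing the platform at a newer model is a single config write, not a deploy.

```typescript
// Sketch of a SystemConfig-driven model router (Standard #5): model IDs
// are configuration entries, never literals in application code.
type SystemConfig = Map<string, string>;

function resolveModel(config: SystemConfig, task: "fast" | "strategic"): string {
  const key = task === "fast" ? "model.workhorse" : "model.architect";
  const modelId = config.get(key);
  if (!modelId) throw new Error(`No model configured for ${key}`);
  return modelId;
}
```

Swapping the workhorse model platform-wide is then `config.set("model.workhorse", "<new-model-id>")` and nothing else.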

5. Supporting Infrastructure: The Reliability Grid

The see7 "Grid" is designed for durability and forensic auditability:

  • Upstash (Redis): Acts as our "High-Speed Memory," managing low-latency rate limiting and session caching. It prevents database fatigue during high-concurrency sales cycles.
  • Google Cloud Storage (GCS): The primary vault for raw document binary data. All files are versioned and stored with SHA-256 hashes (Standard #3), so a document's prior versions are never silently overwritten or lost.
  • QStash: Our asynchronous task orchestrator. It manages the lifecycle of long-running ingestion jobs, ensuring that every background worker is idempotent and capable of self-healing.
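To illustrate the rate-limiting role Upstash plays, here is an in-memory fixed-window limiter. The real system delegates this to Redis so limits hold across serverless instances; the class, limits, and window sizes below are illustrative assumptions, not see7's production values.

```typescript
// In-memory sketch of fixed-window rate limiting. In production the
// counters live in Upstash Redis so every serverless instance shares them.
class FixedWindowLimiter {
  private counts = new Map<string, { windowStart: number; count: number }>();

  constructor(private limit: number, private windowMs: number) {}

  // Returns true if the request identified by `key` is within its quota.
  allow(key: string, now: number): boolean {
    const entry = this.counts.get(key);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      // New window: reset the counter.
      this.counts.set(key, { windowStart: now, count: 1 });
      return true;
    }
    entry.count += 1;
    return entry.count <= this.limit;
  }
}
```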

6. Mathematical Integrity: Normalized Similarity & Health

We convert raw infrastructure outputs into decision-grade signals for sales and engineering leaders.

The Normalization Logic: Raw cosine similarity scores are opaque to end users. see7 normalizes them into a 0–100% "Match Badge," allowing a sales rep to see at a glance the "Truth Confidence" of a generated answer.
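One plausible normalization is sketched below: clamp cosine similarity to [0, 1] and scale to a percentage. This is an assumed mapping for illustration, not necessarily see7's exact formula (a production system might instead calibrate against observed score distributions).

```typescript
// Assumed normalization: negative similarities (dissimilar vectors)
// floor at 0%, a perfect match reads as 100%.
function matchBadge(cosineSimilarity: number): number {
  const clamped = Math.min(1, Math.max(0, cosineSimilarity));
  return Math.round(clamped * 100);
}
```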

Vector Health Diagnostic: We maintain a dedicated API (/api/admin/repair/vector-health) that performs periodic audits of our embedding integrity. It checks for "Orphaned Vectors" and stale embeddings, ensuring that the mathematical representation of your company's knowledge stays accurate.
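The core of such an audit can be sketched as a set comparison: embeddings whose source snippet is gone are orphans, and snippets with no embedding are invisible to retrieval. The function name and return shape below are hypothetical; the real endpoint would run this as SQL against the unified store.

```typescript
// Sketch of a vector-health audit: compare the set of snippet IDs
// against the set of embedding IDs and report both failure modes.
function auditVectorHealth(
  snippetIds: Set<string>,
  embeddingIds: Set<string>
): { orphanedVectors: string[]; missingEmbeddings: string[] } {
  return {
    // Embeddings pointing at deleted snippets: hallucination risk.
    orphanedVectors: Array.from(embeddingIds).filter((id) => !snippetIds.has(id)),
    // Snippets never embedded: knowledge invisible to retrieval.
    missingEmbeddings: Array.from(snippetIds).filter((id) => !embeddingIds.has(id)),
  };
}
```

Note that in see7's single-database design, true orphans should be rare by construction; the audit exists as a forensic backstop.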

Related White Papers

For identity sovereignty and MFA, see Identity & Sovereignty. For the engineering standards that govern this stack, see Development Philosophy.