docs: fix blockers and warnings in SaaS design spec

Fixes from spec review:
- BLOCKER: JWT payload migration (schemaName → databaseName)
- BLOCKER: FIEL encryption key separation from JWT_SECRET
- BLOCKER: PM2 cluster pool count (max:3 × 2 workers = 6/tenant)
- BLOCKER: Pending subscription grace period for new clients
- WARNING: Add indexes on subscriptions/payments tables
- WARNING: Fix Nginx rate limit zone definitions
- WARNING: Fix backup auth (.pgpass), retention, and schedule
- WARNING: Preserve admin X-View-Tenant impersonation
- WARNING: Encrypt metadata.json for NDA compliance
- SUGGESTION: Add health check, reduce upload limit, add rollback

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Consultoria AS
2026-03-15 22:50:38 +00:00
parent c44e7cea34
commit 3c9268ea30


@@ -38,13 +38,29 @@ PostgreSQL Server (max_connections: 300)
Existing tables (modified):
- `tenants` — add `database_name` column, remove `schema_name`
- `users` — no changes
- - `refresh_tokens` — no changes
+ - `refresh_tokens` — flush all existing tokens at migration cutover (invalidate all sessions)
- `fiel_credentials` — no changes
New tables:
- `subscriptions` — MercadoPago subscription tracking
- `payments` — payment history
### Prisma schema migration
The Prisma schema (`apps/api/prisma/schema.prisma`) must be updated:
- Replace `schema_name String @unique @map("schema_name")` with `database_name String @unique @map("database_name")` on the `Tenant` model
- Add `Subscription` and `Payment` models
- Run `prisma migrate dev` to generate and apply migration
- Update `Tenant` type in `packages/shared/src/types/tenant.ts`: replace `schemaName` with `databaseName`
### JWT payload migration
The current JWT payload embeds `schemaName`. This must change:
- Update `JWTPayload` in `packages/shared/src/types/auth.ts`: replace `schemaName` with `databaseName`
- Update token generation in `auth.service.ts`: read `tenant.databaseName` instead of `tenant.schemaName`
- Update `refreshTokens` function to embed `databaseName`
- At migration cutover: flush `refresh_tokens` table to invalidate all existing sessions (forces re-login)
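The payload change above can be sketched as follows. Field names other than `databaseName` (`userId`, `tenantId`, `role`) are assumptions about the existing `JWTPayload` shape, not confirmed by the spec:

```typescript
// Sketch of the updated payload; only the schemaName → databaseName
// rename is from the spec, the other fields are illustrative.
interface JWTPayload {
  userId: string;
  tenantId: string;
  databaseName: string; // was: schemaName
  role: 'admin' | 'client';
}

// Token generation now reads tenant.databaseName instead of tenant.schemaName:
function buildPayload(
  user: { id: string; role: 'admin' | 'client' },
  tenant: { id: string; databaseName: string }
): JWTPayload {
  return {
    userId: user.id,
    tenantId: tenant.id,
    databaseName: tenant.databaseName,
    role: user.role,
  };
}
```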
### Client DB naming
Formula: `horux_<rfc_normalized>`
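A minimal sketch of the formula, assuming normalization means lowercasing and stripping anything outside `[a-z0-9]` so the RFC yields a valid PostgreSQL database identifier (the spec gives only the formula, not the normalization rule):

```typescript
// Hypothetical normalization: lowercase + strip non-alphanumerics.
function tenantDatabaseName(rfc: string): string {
  const normalized = rfc.toLowerCase().replace(/[^a-z0-9]/g, '');
  return `horux_${normalized}`;
}

// tenantDatabaseName('CAS2408138W2') → 'horux_cas2408138w2'
```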
@@ -85,10 +101,12 @@ class TenantConnectionManager {
```
Pool configuration per tenant:
- - `max`: 5 connections
+ - `max`: 3 connections (with 2 PM2 cluster instances, this means 6 connections/tenant max; at 50 tenants = 300, matching `max_connections`)
- `idleTimeoutMillis`: 300000 (5 min)
- `connectionTimeoutMillis`: 10000 (10 sec)
**Note on PM2 cluster mode:** Each PM2 worker is a separate Node.js process with its own `TenantConnectionManager` instance. With `instances: 2` and `max: 3` per pool, worst case is 50 tenants × 3 connections × 2 workers = 300 connections, which matches `max_connections = 300`. If scaling beyond 50 tenants, either increase `max_connections` or reduce pool `max` to 2.
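The budget arithmetic above can be kept next to the pool configuration as an explicit check; all constants come from the spec, and the check makes the scaling trade-off (raise `max_connections` or lower pool `max`) fail loudly instead of silently exhausting PostgreSQL:

```typescript
// Connection budget: pool max × PM2 workers × tenants ≤ max_connections.
const POOL_MAX = 3;           // pg Pool `max` per tenant, per worker
const PM2_WORKERS = 2;        // PM2 cluster `instances`
const MAX_TENANTS = 50;
const PG_MAX_CONNECTIONS = 300;

const worstCase = MAX_TENANTS * POOL_MAX * PM2_WORKERS; // 50 × 3 × 2 = 300

if (worstCase > PG_MAX_CONNECTIONS) {
  throw new Error(
    `Connection budget exceeded: ${worstCase} > ${PG_MAX_CONNECTIONS}`
  );
}
```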
### Tenant middleware change
Current: Sets `search_path` on a shared connection.
@@ -105,6 +123,20 @@ req.tenantPool = tenantConnectionManager.getPool(tenant.id, tenant.databaseName)
All tenant service functions change from using a shared pool with schema prefix to using `req.tenantPool` with direct table names.
### Admin impersonation (X-View-Tenant)
The current `X-View-Tenant` header support for admin "view-as" functionality is preserved. The new middleware resolves the `databaseName` for the viewed tenant:
```typescript
// If admin is viewing another tenant
if (req.headers['x-view-tenant'] && req.user.role === 'admin') {
const viewedTenant = await getTenantByRfc(req.headers['x-view-tenant']);
req.tenantPool = tenantConnectionManager.getPool(viewedTenant.id, viewedTenant.databaseName);
} else {
req.tenantPool = tenantConnectionManager.getPool(tenant.id, tenant.databaseName);
}
```
### Provisioning flow (new client)
1. Admin creates tenant via UI → POST `/api/tenants/`
@@ -115,6 +147,13 @@ All tenant service functions change from using a shared pool with schema prefix
6. Send welcome email with temporary credentials
7. Generate MercadoPago subscription link
**Rollback on partial failure:** If any step 3-7 fails:
- Drop the created database if it exists (`DROP DATABASE IF EXISTS horux_<rfc>`)
- Delete the `tenants` row
- Delete the `users` row if created
- Return error to admin with the specific step that failed
- The entire provisioning is wrapped in a try/catch with explicit cleanup
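The rollback flow above can be sketched as a try/catch with explicit cleanup. The helper functions (`createTenantRow`, `createDatabase`, etc.) are hypothetical stand-ins for the real service calls, injected here so the sketch stays self-contained:

```typescript
// Sketch: provision steps run in order; on failure, clean up in reverse.
type Deps = {
  createTenantRow: (rfc: string) => Promise<string>; // returns tenant id
  createDatabase: (dbName: string) => Promise<void>;
  createUserRow: (tenantId: string) => Promise<string>; // returns user id
  dropDatabase: (dbName: string) => Promise<void>;      // DROP DATABASE IF EXISTS
  deleteTenantRow: (tenantId: string) => Promise<void>;
  deleteUserRow: (userId: string) => Promise<void>;
};

async function provisionTenant(rfc: string, deps: Deps): Promise<void> {
  const dbName = `horux_${rfc.toLowerCase()}`;
  let tenantId: string | undefined;
  let userId: string | undefined;
  let step = 'tenant row';
  try {
    tenantId = await deps.createTenantRow(rfc);
    step = 'database';
    await deps.createDatabase(dbName);
    step = 'user row';
    userId = await deps.createUserRow(tenantId);
  } catch (err) {
    // DROP DATABASE IF EXISTS is idempotent, so dropping is safe even if
    // the create never ran; rows are only deleted if they were created.
    await deps.dropDatabase(dbName);
    if (userId) await deps.deleteUserRow(userId);
    if (tenantId) await deps.deleteTenantRow(tenantId);
    throw new Error(`Provisioning failed at step "${step}": ${(err as Error).message}`);
  }
}
```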
### PostgreSQL tuning
```
@@ -162,7 +201,9 @@ When a client uploads their FIEL (.cer + .key + password):
- Algorithm: AES-256-GCM
- Key: `FIEL_ENCRYPTION_KEY` environment variable (separate from other secrets)
- Each file gets its own IV (initialization vector)
- **Code change required:** `sat-crypto.service.ts` currently derives the key from `JWT_SECRET` via `createHash('sha256').update(env.JWT_SECRET).digest()`. This must be changed to read `FIEL_ENCRYPTION_KEY` from the env schema. The `env.ts` Zod schema must be updated to declare `FIEL_ENCRYPTION_KEY` as required.
- Each component (certificate, private key, password) is encrypted separately with its own IV and auth tag. The `fiel_credentials` table stores separate `encryption_iv` and `encryption_tag` per row. The filesystem also stores each file independently encrypted.
- **Code change required:** The current `sat-crypto.service.ts` shares a single IV/tag across all three components. Refactor to encrypt each component independently with its own IV/tag. Store per-component IV/tags in the DB (add columns: `cer_iv`, `cer_tag`, `key_iv`, `key_tag`, `password_iv`, `password_tag` — or use a JSON column).
- Password is encrypted, never stored in plaintext
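The per-component scheme above can be sketched with Node's built-in `crypto` module. This is a sketch, not the actual `sat-crypto.service.ts`; it assumes `FIEL_ENCRYPTION_KEY` decodes to a 32-byte key:

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto';

// Each component (.cer, .key, password) gets its own random IV and auth tag.
interface EncryptedComponent { ciphertext: Buffer; iv: Buffer; tag: Buffer }

function encryptComponent(key: Buffer, plaintext: Buffer): EncryptedComponent {
  const iv = randomBytes(12); // fresh 96-bit IV per component, never reused
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { ciphertext, iv, tag: cipher.getAuthTag() };
}

function decryptComponent(key: Buffer, c: EncryptedComponent): Buffer {
  const decipher = createDecipheriv('aes-256-gcm', key, c.iv);
  decipher.setAuthTag(c.tag); // GCM authenticates; tampering throws on final()
  return Buffer.concat([decipher.update(c.ciphertext), decipher.final()]);
}
```

The per-component IV/tag pairs map directly onto the `cer_iv`/`cer_tag` etc. columns described above.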
### Manual decryption CLI
@@ -179,7 +220,7 @@ node scripts/decrypt-fiel.js --rfc CAS2408138W2
- `/var/horux/fiel/` permissions: `700` (root only)
- Encrypted files are useless without `FIEL_ENCRYPTION_KEY`
- - `metadata.json` is NOT encrypted (contains only non-sensitive info: serial number, validity dates)
+ - `metadata.json` is also encrypted (contains serial number + RFC, which could be used to query SAT's certificate validation service, violating NDA confidentiality requirements)
### Upload flow
@@ -216,6 +257,9 @@ CREATE TABLE subscriptions (
updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_subscriptions_tenant_id ON subscriptions(tenant_id);
CREATE INDEX idx_subscriptions_status ON subscriptions(status);
CREATE TABLE payments (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
@@ -228,6 +272,9 @@ CREATE TABLE payments (
paid_at TIMESTAMP,
created_at TIMESTAMP NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_payments_tenant_id ON payments(tenant_id);
CREATE INDEX idx_payments_subscription_id ON payments(subscription_id);
```
### Plans and pricing
@@ -398,6 +445,10 @@ Auto-restart on crash. Log rotation via `pm2-logrotate`.
### Nginx reverse proxy
```nginx
# Rate limiting zone definitions (in http block of nginx.conf)
limit_req_zone $binary_remote_addr zone=auth:10m rate=5r/m;
limit_req_zone $binary_remote_addr zone=webhooks:10m rate=30r/m;
server {
listen 80;
server_name horux360.consultoria-as.com;
@@ -420,6 +471,11 @@ server {
gzip on;
gzip_types text/plain application/json application/javascript text/css;
# Health check (for monitoring)
location /api/health {
proxy_pass http://127.0.0.1:4000;
}
# Rate limiting for public endpoints
location /api/auth/ {
limit_req zone=auth burst=5 nodelay;
@@ -438,7 +494,7 @@ server {
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
-client_max_body_size 1G; # For bulk XML uploads
+client_max_body_size 200M; # Bulk XML uploads (200MB is enough for ~50k XML files)
}
# Next.js
@@ -452,6 +508,10 @@ server {
}
```
### Health check endpoint
The existing `GET /health` endpoint returns `{ status: 'ok', timestamp }`. PM2 uses this for liveness checks. Nginx can optionally use it for upstream health monitoring.
### SSL
Let's Encrypt with certbot. Auto-renewal via cron.
@@ -473,29 +533,53 @@ PostgreSQL only on localhost (no external access).
### Backups
- Cron job at 2:00 AM daily:
+ Cron job at **1:00 AM** daily (runs before the SAT cron at 3:00 AM, with enough gap to complete):
**Authentication:** Create a `.pgpass` file at `/root/.pgpass` with `localhost:5432:*:postgres:<password>` and `chmod 600`. This allows `pg_dump` to authenticate without inline passwords.
```bash
#!/bin/bash
# /var/horux/scripts/backup.sh
set -euo pipefail
BACKUP_DIR=/var/horux/backups
DATE=$(date +%Y-%m-%d)
DOW=$(date +%u) # Day of week: 1=Monday, 7=Sunday
DAILY_DIR=$BACKUP_DIR/daily
WEEKLY_DIR=$BACKUP_DIR/weekly
mkdir -p $DAILY_DIR $WEEKLY_DIR
# Backup central DB
-pg_dump -h localhost -U postgres horux360 | gzip > $BACKUP_DIR/horux360_$DATE.sql.gz
+pg_dump -h localhost -U postgres horux360 | gzip > $DAILY_DIR/horux360_$DATE.sql.gz
# Backup each tenant DB
-for db in $(psql -h localhost -U postgres -t -c "SELECT database_name FROM tenants WHERE active = true"); do
-  pg_dump -h localhost -U postgres $db | gzip > $BACKUP_DIR/${db}_${DATE}.sql.gz
+for db in $(psql -h localhost -U postgres -t -c "SELECT database_name FROM tenants WHERE active = true" horux360); do
+  db_trimmed=$(echo $db | xargs) # trim whitespace
+  pg_dump -h localhost -U postgres "$db_trimmed" | gzip > $DAILY_DIR/${db_trimmed}_${DATE}.sql.gz
done
-# Remove backups older than 7 days
-find $BACKUP_DIR -name "*.sql.gz" -mtime +7 -delete
# On Sundays, copy to weekly directory
if [ "$DOW" -eq 7 ]; then
cp $DAILY_DIR/*_${DATE}.sql.gz $WEEKLY_DIR/
fi
# Retention: daily backups for 7 days; weekly (Sunday) copies for 28 days.
# The daily cleanup below only touches $DAILY_DIR, so Sunday copies survive.
# Remove daily backups older than 7 days
find $DAILY_DIR -name "*.sql.gz" -mtime +7 -delete
# Remove weekly backups older than 28 days
find $WEEKLY_DIR -name "*.sql.gz" -mtime +28 -delete
# Verify backup files are not empty (catch silent pg_dump failures)
for f in $DAILY_DIR/*_${DATE}.sql.gz; do
if [ ! -s "$f" ]; then
echo "WARNING: Empty backup file: $f" >&2
fi
done
```
**Schedule separation:** Backups run at 1:00 AM, SAT cron runs at 3:00 AM. With 50 clients, backup should complete in ~15-30 minutes, leaving ample gap before SAT sync starts.
### Environment variables (production)
```
@@ -534,9 +618,12 @@ async function checkPlanLimits(req, res, next) {
const tenant = await getTenantWithCache(req.user.tenantId); // cached 5 min
const subscription = await getActiveSubscription(tenant.id);
-  // Check subscription is active
-  if (!subscription || subscription.status !== 'authorized') {
-    // Allow read-only access
-    if (req.method !== 'GET') {
+  // Allowed statuses: 'authorized' (paid) or 'pending' (grace period for new clients)
+  const allowedStatuses = ['authorized', 'pending'];
+  // Admin-impersonated requests are exempt from the status check; this must
+  // be evaluated before the 403 below, or admin writes would be rejected
+  const adminView = req.headers['x-view-tenant'] && req.user.role === 'admin';
+  // Check subscription status
+  if (!subscription || !allowedStatuses.includes(subscription.status)) {
+    // Allow read-only access for cancelled/paused subscriptions
+    if (req.method !== 'GET' && !adminView) {
      return res.status(403).json({
        message: 'Suscripción inactiva. Contacta soporte para reactivar.'
@@ -544,10 +631,18 @@ async function checkPlanLimits(req, res, next) {
}
}
// Admin-impersonated requests bypass subscription check
// (admin needs to complete client setup regardless of payment status)
if (req.headers['x-view-tenant'] && req.user.role === 'admin') {
return next();
}
next();
}
```
**Grace period:** New clients start with `status: 'pending'` and have full write access (can upload FIEL, upload CFDIs, etc.). Once the subscription moves to `'cancelled'` or `'paused'` (e.g., failed payment), write access is revoked. Admin can also manually set status to `'authorized'` for clients who pay by bank transfer.
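The access rule above reduces to a small decision function, shown here as an illustrative sketch (the function name is not from the codebase):

```typescript
// status: subscription status, or null when no subscription row exists.
function canWrite(status: string | null, method: string): boolean {
  const allowedStatuses = ['authorized', 'pending'];
  if (status !== null && allowedStatuses.includes(status)) {
    return true; // paid, or new client in grace period: full access
  }
  return method === 'GET'; // cancelled/paused/missing: read-only
}
```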
### CFDI limit check
Applied on `POST /api/cfdi/` and `POST /api/cfdi/bulk`: