docs: fix blockers and warnings in SaaS design spec

Fixes from spec review:
- BLOCKER: JWT payload migration (schemaName → databaseName)
- BLOCKER: FIEL encryption key separation from JWT_SECRET
- BLOCKER: PM2 cluster pool count (max:3 × 2 workers = 6/tenant)
- BLOCKER: Pending subscription grace period for new clients
- WARNING: Add indexes on subscriptions/payments tables
- WARNING: Fix Nginx rate limit zone definitions
- WARNING: Fix backup auth (.pgpass), retention, and schedule
- WARNING: Preserve admin X-View-Tenant impersonation
- WARNING: Encrypt metadata.json for NDA compliance
- SUGGESTION: Add health check, reduce upload limit, add rollback

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Consultoria AS
2026-03-15 22:50:38 +00:00
parent c44e7cea34
commit 3c9268ea30


@@ -38,13 +38,29 @@ PostgreSQL Server (max_connections: 300)
Existing tables (modified):
- `tenants` — add `database_name` column, remove `schema_name`
- `users` — no changes
- `refresh_tokens` — flush all existing tokens at migration cutover (invalidate all sessions)
- `fiel_credentials` — no changes
New tables:
- `subscriptions` — MercadoPago subscription tracking
- `payments` — payment history
### Prisma schema migration
The Prisma schema (`apps/api/prisma/schema.prisma`) must be updated:
- Replace `schema_name String @unique @map("schema_name")` with `database_name String @unique @map("database_name")` on the `Tenant` model
- Add `Subscription` and `Payment` models
- Run `prisma migrate dev` to generate and apply migration
- Update `Tenant` type in `packages/shared/src/types/tenant.ts`: replace `schemaName` with `databaseName`
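The shared type update might look like this (a sketch; fields other than the renamed `databaseName` are assumptions, not taken from the spec):

```typescript
// packages/shared/src/types/tenant.ts (sketch). Only the
// schemaName -> databaseName rename is specified; the other fields
// shown here are illustrative assumptions.
export interface Tenant {
  id: string;
  rfc: string;
  databaseName: string; // was: schemaName
  active: boolean;
}
```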
### JWT payload migration
The current JWT payload embeds `schemaName`. This must change:
- Update `JWTPayload` in `packages/shared/src/types/auth.ts`: replace `schemaName` with `databaseName`
- Update token generation in `auth.service.ts`: read `tenant.databaseName` instead of `tenant.schemaName`
- Update `refreshTokens` function to embed `databaseName`
- At migration cutover: flush `refresh_tokens` table to invalidate all existing sessions (forces re-login)
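Sketched in TypeScript (the claim set beyond the renamed `databaseName` is an assumption, as is the `buildPayload` helper name):

```typescript
// packages/shared/src/types/auth.ts (sketch). The spec only mandates
// replacing schemaName with databaseName; the other claims are assumed.
export interface JWTPayload {
  userId: string;
  tenantId: string;
  databaseName: string; // was: schemaName
  role: string;
}

// Token generation side (auth.service.ts): embed the new claim.
// buildPayload is a hypothetical helper name.
export function buildPayload(
  user: { id: string; role: string },
  tenant: { id: string; databaseName: string }
): JWTPayload {
  return {
    userId: user.id,
    tenantId: tenant.id,
    databaseName: tenant.databaseName, // previously tenant.schemaName
    role: user.role,
  };
}
```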
### Client DB naming
Formula: `horux_<rfc_normalized>`
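One way to implement the formula (a sketch; the exact normalization rule, lowercasing and stripping non-alphanumerics, is an assumption since the spec only gives `horux_<rfc_normalized>`):

```typescript
// Sketch: derive a tenant database name from an RFC.
// Assumed normalization: lowercase, drop any non-alphanumeric characters.
function rfcToDatabaseName(rfc: string): string {
  const normalized = rfc.toLowerCase().replace(/[^a-z0-9]/g, "");
  return `horux_${normalized}`;
}
```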
@@ -85,10 +101,12 @@ class TenantConnectionManager {
```
Pool configuration per tenant:
- `max`: 3 connections (with 2 PM2 cluster instances, this means 6 connections/tenant max; at 50 tenants = 300, matching `max_connections`)
- `idleTimeoutMillis`: 300000 (5 min)
- `connectionTimeoutMillis`: 10000 (10 sec)
**Note on PM2 cluster mode:** Each PM2 worker is a separate Node.js process with its own `TenantConnectionManager` instance. With `instances: 2` and `max: 3` per pool, worst case is 50 tenants × 3 connections × 2 workers = 300 connections, which matches `max_connections = 300`. If scaling beyond 50 tenants, either increase `max_connections` or reduce pool `max` to 2.
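The capacity arithmetic from the note, spelled out as a quick sanity check:

```typescript
// Worst-case PostgreSQL connection count under PM2 cluster mode:
// each PM2 worker process holds its own pool per tenant.
const tenants = 50;
const poolMax = 3;    // pg Pool `max` per tenant, per worker
const pm2Workers = 2; // PM2 cluster `instances`
const perTenant = poolMax * pm2Workers; // 6 connections per tenant
const worstCase = tenants * perTenant;  // 300, matching max_connections
console.log(perTenant, worstCase);
```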
### Tenant middleware change
Current: Sets `search_path` on a shared connection.
@@ -105,6 +123,20 @@ req.tenantPool = tenantConnectionManager.getPool(tenant.id, tenant.databaseName)
All tenant service functions change from using a shared pool with schema prefix to using `req.tenantPool` with direct table names.
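As an illustration of the before/after query style (the `invoices` table and the function names here are hypothetical):

```typescript
// Before (schema-per-tenant): queries on a shared pool prefix the schema.
function buildQueryBefore(schemaName: string): string {
  return `SELECT * FROM ${schemaName}.invoices WHERE id = $1`;
}

// After (database-per-tenant): req.tenantPool already targets the tenant's
// database, so queries use direct table names.
function buildQueryAfter(): string {
  return "SELECT * FROM invoices WHERE id = $1";
}
```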
### Admin impersonation (X-View-Tenant)
The current `X-View-Tenant` header support for admin "view-as" functionality is preserved. The new middleware resolves the `databaseName` for the viewed tenant:
```typescript
// If admin is viewing another tenant
if (req.headers['x-view-tenant'] && req.user.role === 'admin') {
const viewedTenant = await getTenantByRfc(req.headers['x-view-tenant']);
req.tenantPool = tenantConnectionManager.getPool(viewedTenant.id, viewedTenant.databaseName);
} else {
req.tenantPool = tenantConnectionManager.getPool(tenant.id, tenant.databaseName);
}
```
### Provisioning flow (new client)
1. Admin creates tenant via UI → POST `/api/tenants/`
@@ -115,6 +147,13 @@ All tenant service functions change from using a shared pool with schema prefix
6. Send welcome email with temporary credentials
7. Generate MercadoPago subscription link
**Rollback on partial failure:** If any step 3-7 fails:
- Drop the created database if it exists (`DROP DATABASE IF EXISTS horux_<rfc>`)
- Delete the `tenants` row
- Delete the `users` row if created
- Return error to admin with the specific step that failed
- The entire provisioning is wrapped in a try/catch with explicit cleanup
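The try/catch wrapper can be sketched as follows. All the step and cleanup functions here are hypothetical stubs standing in for the real service calls; this sketch simulates a failure at the migration step to show the cleanup path:

```typescript
// Sketch of provisioning with explicit rollback. Every function below is a
// hypothetical stand-in; `actions` records what ran, for illustration only.
const actions: string[] = [];

async function createTenantRow(): Promise<void> { actions.push("tenants row"); }
async function createTenantDatabase(): Promise<void> { actions.push("CREATE DATABASE"); }
async function runTenantMigrations(): Promise<void> { throw new Error("migration failed"); } // simulated failure
async function dropTenantDatabase(): Promise<void> { actions.push("DROP DATABASE IF EXISTS"); }
async function deleteTenantRow(): Promise<void> { actions.push("delete tenants row"); }

async function provisionTenant(): Promise<void> {
  try {
    await createTenantRow();
    await createTenantDatabase();
    await runTenantMigrations(); // fails here in this sketch
  } catch (err) {
    // Rollback: drop the DB if it exists, then remove the central rows,
    // and surface the failing step to the admin.
    await dropTenantDatabase();
    await deleteTenantRow();
    throw err;
  }
}
```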
### PostgreSQL tuning
```
@@ -162,7 +201,9 @@ When a client uploads their FIEL (.cer + .key + password):
- Algorithm: AES-256-GCM
- Key: `FIEL_ENCRYPTION_KEY` environment variable (separate from other secrets)
- **Code change required:** `sat-crypto.service.ts` currently derives the key from `JWT_SECRET` via `createHash('sha256').update(env.JWT_SECRET).digest()`. This must be changed to read `FIEL_ENCRYPTION_KEY` from the env schema. The `env.ts` Zod schema must be updated to declare `FIEL_ENCRYPTION_KEY` as required.
- Each component (certificate, private key, password) is encrypted separately with its own IV and auth tag. The `fiel_credentials` table stores separate `encryption_iv` and `encryption_tag` per row. The filesystem also stores each file independently encrypted.
- **Code change required:** The current `sat-crypto.service.ts` shares a single IV/tag across all three components. Refactor to encrypt each component independently with its own IV/tag. Store per-component IV/tags in the DB (add columns: `cer_iv`, `cer_tag`, `key_iv`, `key_tag`, `password_iv`, `password_tag` — or use a JSON column).
- Password is encrypted, never stored in plaintext
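A sketch of per-component AES-256-GCM encryption with independent IV and auth tag, using Node's built-in `crypto` module. Key handling is simplified here; the real service reads `FIEL_ENCRYPTION_KEY` from the validated env:

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Each FIEL component (certificate, private key, password) gets its own
// random IV and its own auth tag; nothing is shared between components.
interface EncryptedComponent {
  ciphertext: Buffer;
  iv: Buffer;
  tag: Buffer;
}

function encryptComponent(plaintext: Buffer, key: Buffer): EncryptedComponent {
  const iv = randomBytes(12); // fresh 96-bit IV per component, never reused
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { ciphertext, iv, tag: cipher.getAuthTag() };
}

function decryptComponent(enc: EncryptedComponent, key: Buffer): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", key, enc.iv);
  decipher.setAuthTag(enc.tag); // GCM authenticates before releasing plaintext
  return Buffer.concat([decipher.update(enc.ciphertext), decipher.final()]);
}
```

The per-component IVs and tags map directly onto the `cer_iv`/`cer_tag`, `key_iv`/`key_tag`, `password_iv`/`password_tag` columns described above.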
### Manual decryption CLI
@@ -179,7 +220,7 @@ node scripts/decrypt-fiel.js --rfc CAS2408138W2
- `/var/horux/fiel/` permissions: `700` (root only)
- Encrypted files are useless without `FIEL_ENCRYPTION_KEY`
- `metadata.json` is also encrypted (contains serial number + RFC which could be used to query SAT's certificate validation service, violating NDA confidentiality requirements)

### Upload flow
@@ -216,6 +257,9 @@ CREATE TABLE subscriptions (
updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_subscriptions_tenant_id ON subscriptions(tenant_id);
CREATE INDEX idx_subscriptions_status ON subscriptions(status);
CREATE TABLE payments (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
@@ -228,6 +272,9 @@ CREATE TABLE payments (
paid_at TIMESTAMP,
created_at TIMESTAMP NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_payments_tenant_id ON payments(tenant_id);
CREATE INDEX idx_payments_subscription_id ON payments(subscription_id);
```
### Plans and pricing
@@ -398,6 +445,10 @@ Auto-restart on crash. Log rotation via `pm2-logrotate`.
### Nginx reverse proxy
```nginx
# Rate limiting zone definitions (in http block of nginx.conf)
limit_req_zone $binary_remote_addr zone=auth:10m rate=5r/m;
limit_req_zone $binary_remote_addr zone=webhooks:10m rate=30r/m;
server {
listen 80;
server_name horux360.consultoria-as.com;
@@ -420,6 +471,11 @@ server {
gzip on;
gzip_types text/plain application/json application/javascript text/css;
# Health check (for monitoring)
location /api/health {
proxy_pass http://127.0.0.1:4000;
}
# Rate limiting for public endpoints
location /api/auth/ {
limit_req zone=auth burst=5 nodelay;
@@ -438,7 +494,7 @@ server {
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
client_max_body_size 200M; # Bulk XML uploads (200MB is enough for ~50k XML files)
}

# Next.js
@@ -452,6 +508,10 @@ server {
}
```
### Health check endpoint
The existing `GET /health` endpoint returns `{ status: 'ok', timestamp }`. PM2 uses this for liveness checks. Nginx can optionally use it for upstream health monitoring.
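The response shape can be sketched as follows (illustrative only, using `node:http`; the real route lives in the existing Express app):

```typescript
import { createServer } from "node:http";

// Builds the health payload the spec describes: { status: 'ok', timestamp }.
function healthBody(): string {
  return JSON.stringify({ status: "ok", timestamp: new Date().toISOString() });
}

// Minimal standalone server for illustration; the production handler is an
// Express route proxied by Nginx at /api/health.
const server = createServer((req, res) => {
  if (req.url === "/health") {
    res.writeHead(200, { "content-type": "application/json" });
    res.end(healthBody());
  } else {
    res.writeHead(404).end();
  }
});
// server.listen(4000);
```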
### SSL
Let's Encrypt with certbot. Auto-renewal via cron.
@@ -473,29 +533,53 @@ PostgreSQL only on localhost (no external access).
### Backups
Cron job at **1:00 AM** daily (runs before SAT cron at 3:00 AM, with enough gap to complete):
**Authentication:** Create a `.pgpass` file at `/root/.pgpass` with `localhost:5432:*:postgres:<password>` and `chmod 600`. This allows `pg_dump` to authenticate without inline passwords.
```bash
#!/bin/bash
# /var/horux/scripts/backup.sh
set -euo pipefail

BACKUP_DIR=/var/horux/backups
DATE=$(date +%Y-%m-%d)
DOW=$(date +%u) # Day of week: 1=Monday, 7=Sunday
DAILY_DIR=$BACKUP_DIR/daily
WEEKLY_DIR=$BACKUP_DIR/weekly
mkdir -p $DAILY_DIR $WEEKLY_DIR

# Backup central DB
pg_dump -h localhost -U postgres horux360 | gzip > $DAILY_DIR/horux360_$DATE.sql.gz

# Backup each tenant DB
for db in $(psql -h localhost -U postgres -t -c "SELECT database_name FROM tenants WHERE active = true" horux360); do
  db_trimmed=$(echo $db | xargs) # trim whitespace
  pg_dump -h localhost -U postgres "$db_trimmed" | gzip > $DAILY_DIR/${db_trimmed}_${DATE}.sql.gz
done

# On Sundays, copy to weekly directory
if [ "$DOW" -eq 7 ]; then
  cp $DAILY_DIR/*_${DATE}.sql.gz $WEEKLY_DIR/
fi

# Remove daily backups older than 7 days
find $DAILY_DIR -name "*.sql.gz" -mtime +7 -delete

# Remove weekly backups older than 28 days
find $WEEKLY_DIR -name "*.sql.gz" -mtime +28 -delete

# Verify backup files are not empty (catch silent pg_dump failures)
for f in $DAILY_DIR/*_${DATE}.sql.gz; do
  if [ ! -s "$f" ]; then
    echo "WARNING: Empty backup file: $f" >&2
  fi
done
```
**Schedule separation:** Backups run at 1:00 AM, SAT cron runs at 3:00 AM. With 50 clients, backup should complete in ~15-30 minutes, leaving ample gap before SAT sync starts.
### Environment variables (production)
```
@@ -534,9 +618,12 @@ async function checkPlanLimits(req, res, next) {
const tenant = await getTenantWithCache(req.user.tenantId); // cached 5 min
const subscription = await getActiveSubscription(tenant.id);
// Allowed statuses: 'authorized' (paid) or 'pending' (grace period for new clients)
const allowedStatuses = ['authorized', 'pending'];
// Check subscription status
if (!subscription || !allowedStatuses.includes(subscription.status)) {
// Allow read-only access for cancelled/paused subscriptions
if (req.method !== 'GET') {
return res.status(403).json({
message: 'Suscripción inactiva. Contacta soporte para reactivar.'
@@ -544,10 +631,18 @@ async function checkPlanLimits(req, res, next) {
}
}
// Admin-impersonated requests bypass subscription check
// (admin needs to complete client setup regardless of payment status)
if (req.headers['x-view-tenant'] && req.user.role === 'admin') {
return next();
}
next();
}
```
**Grace period:** New clients start with `status: 'pending'` and have full write access (can upload FIEL, upload CFDIs, etc.). Once the subscription moves to `'cancelled'` or `'paused'` (e.g., failed payment), write access is revoked. Admin can also manually set status to `'authorized'` for clients who pay by bank transfer.
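The status gate described above reduces to a small predicate (a sketch; the status union is taken from the statuses named in this section):

```typescript
// Write-access decision from the checkPlanLimits middleware:
// 'pending' gets full access (grace period for new clients);
// 'cancelled'/'paused' (or no subscription at all) fall back to read-only.
type SubscriptionStatus = "authorized" | "pending" | "cancelled" | "paused";

function writeAllowed(status: SubscriptionStatus | null): boolean {
  return status === "authorized" || status === "pending";
}
```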
### CFDI limit check
Applied on `POST /api/cfdi/` and `POST /api/cfdi/bulk`: