Enterprise · Technical Reference
Deployment reference for the EthiCompass Enterprise platform. On-premise, private cloud, and hybrid configurations for regulated enterprises. Structured sizing for every scenario, from minimum viable deployment to high-concurrency production.
This page is a technical reference. For commercial discussion, implementation timelines, and engagement scope, contact our team.
Deployment Modes
Enterprise supports three deployment modes. Each preserves the immutable audit trail, the 7-dimension evaluation framework, and the EU AI Act Articles 9–15 mapping. The difference is where the inference workload runs and what infrastructure footprint you accept.
On-Premise
Deployed inside your data center. All inference runs locally on dedicated GPUs. No outbound calls. Suitable for the strictest data residency and regulatory requirements, including sectors with absolute on-premise mandates.
Private Cloud
Deployed in your AWS, Azure, or GCP tenancy. Inference runs on managed GPU instances within your network boundary. Data remains in your cloud account. Suitable for enterprises with approved cloud regions and residency controls.
Local Embeddings + Cloud Inference
Embeddings run locally on a small GPU. The two larger models are served through a regulated inference provider over private endpoints. Lower hardware footprint, faster time to deployment. Suitable for pilot deployments or environments with limited local GPU capacity.
Deployment mode is a decision made during the Architecture Assessment phase of implementation. All modes produce the same compliance evidence and the same immutable audit trail.
Reference Architecture
The Enterprise platform is composed of infrastructure services and evaluation models. The infrastructure services manage data storage, versioning, messaging, and the application layer. The evaluation models produce the quantitative scores behind the 7-dimension framework.
| Component | Role |
|---|---|
| MinIO | S3-compatible object storage for datasets and evidence artifacts |
| LakeFS | Versioned dataset management with full lineage |
| Apache Kafka + Zookeeper | Event streaming for evaluation pipelines and audit events |
| API Orchestrator (FastAPI) | Coordinates evaluation requests across dimensions |
| Metrics Service (FastAPI) | Aggregates scores and produces the compliance scorecards |
| Model | Role in the 7-Dimension Framework | Served Via |
|---|---|---|
| Qwen3-Embedding (600M parameters) | Embeddings for Toxicity and Regulatory Compliance dimensions | vLLM (embedding task) |
| Gemma 4 E4B (4.7B parameters) | Judge model for the Context dimension (LLM-as-Judge) | vLLM |
| Llama Guard 4 (12B parameters) | Guardian model for the Bias dimension | vLLM |
All models are served through vLLM with paged attention, continuous batching, and efficient KV cache management. The embedding model is shared across two dimensions and runs as a single instance.
Configuration Profiles
Three reference configurations. Each is validated for a specific deployment posture. Select the profile that matches your evaluation concurrency, residency requirements, and regulatory commitments.
| Resource | Minimum (INT4, sequential) | Recommended (INT4, parallel) | Production (bf16, high concurrency) |
|---|---|---|---|
| GPU | 1 × 16–24 GB VRAM (e.g., RTX 3090, RTX 4080, RTX 4090) | 1 × 24 GB VRAM (e.g., RTX 4090, A5000) | 1 × 80 GB VRAM (A100) or 2 × 48 GB (A6000) |
| System RAM | 32 GB DDR4/DDR5 | 64 GB DDR5 | 128 GB DDR5 ECC |
| CPU | 14 cores | 20 cores | 32 cores |
| Storage | 256 GB NVMe SSD | 512 GB NVMe SSD | 1 TB NVMe SSD |
| Network | 1 Gbps | 1 Gbps | 10 Gbps |
| Use case | Proof of concept, low-volume evaluation | Standard enterprise deployment | Multi-tenant, high-concurrency, real-time monitoring |
All profiles assume a single-node deployment. Multi-node clustering is available for production environments with failover requirements. Discuss during Architecture Assessment.
GPU & Inference Sizing
The three evaluation models have materially different memory profiles. The deployment can run in full precision, INT8, or INT4. Each scenario is a tradeoff between VRAM footprint, throughput, and evaluation latency. All three scenarios produce scores inside the operational tolerance of the 7-dimension framework.
| Scenario | Precision | Qwen3 (0.6B) | Gemma 4 (4.7B) | Llama Guard 4 (12B) | Total VRAM |
|---|---|---|---|---|---|
| A — Full precision | bf16 | ~2.5 GB | ~12.0 GB | ~27.7 GB | ~42.2 GB |
| B — INT4 quantized | embeddings bf16, models INT4 | ~2.5 GB | ~5.0 GB | ~9.7 GB | ~17.2 GB |
| C — INT8 quantized | embeddings bf16, models INT8 | ~2.5 GB | ~7.0 GB | ~15.7 GB | ~25.2 GB |
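The per-model figures above can be roughly reproduced from parameter count and precision: weights occupy roughly one byte per parameter per byte of precision, plus a runtime allowance for KV cache, activations, and buffers. The sketch below is an estimator, not the platform's sizing tool; the overhead constant is an assumption calibrated against the Llama Guard 4 rows and varies per model.

```python
# Rough VRAM estimator for the sizing table above.
# bytes_per_param: bf16 = 2.0, INT8 = 1.0, INT4 = 0.5.
# overhead_gb is an ASSUMED allowance for KV cache, activations, and
# runtime buffers -- it differs per model; calibrate against measurements.

def estimate_vram_gb(params_billion: float, bytes_per_param: float,
                     overhead_gb: float = 3.5) -> float:
    # 1e9 parameters at N bytes each is approximately N GB of weights.
    weights_gb = params_billion * bytes_per_param
    return round(weights_gb + overhead_gb, 1)

# Llama Guard 4 (12B) with a ~3.7 GB overhead allowance:
print(estimate_vram_gb(12, 2.0, overhead_gb=3.7))  # ~27.7 GB (Scenario A row)
print(estimate_vram_gb(12, 0.5, overhead_gb=3.7))  # ~9.7 GB (Scenario B row)
```

Concurrency grows the KV cache portion well beyond this single-request baseline, which is why the Production profile budgets an 80 GB card for the same models.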
| Scenario | Single-GPU Option | Alternative Option |
|---|---|---|
| A (bf16) | 1 × NVIDIA A100 80 GB | 2 × NVIDIA A6000 48 GB |
| B (INT4) | 1 × NVIDIA RTX 4090 24 GB | 1 × NVIDIA A5000 24 GB |
| C (INT8) | 1 × NVIDIA L40 48 GB | 2 × NVIDIA RTX 4080 16 GB |
All sizing above assumes a single-request baseline per model. vLLM serves multiple concurrent requests through continuous batching, which grows the KV cache pool proportionally.
If evaluations run sequentially (one model active at a time), the inactive model can be offloaded from GPU memory. The platform then only requires the largest model plus the embeddings: in Scenario B, ~9.7 GB for Llama Guard 4 plus ~2.5 GB for Qwen3-Embedding, roughly 12.2 GB.
Sequential execution reduces VRAM but increases end-to-end evaluation latency. Suitable for audit-style evaluation or low-volume Enterprise pilots. Not recommended for real-time monitoring of production AI systems.
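The sequential VRAM floor is simply the largest single model plus the always-resident embedding model. A minimal check using the Scenario B (INT4) figures from the sizing table:

```python
# Sequential execution: only one large model resident at a time,
# plus the always-loaded embedding model.
# Figures are the Scenario B (INT4) values from the sizing table (GB).

scenario_b = {
    "qwen3_embedding": 2.5,   # bf16, always resident
    "gemma_4_e4b": 5.0,       # INT4
    "llama_guard_4": 9.7,     # INT4
}

embeddings = scenario_b["qwen3_embedding"]
largest_model = max(scenario_b["gemma_4_e4b"], scenario_b["llama_guard_4"])

sequential_vram = embeddings + largest_model
print(f"Sequential VRAM floor: {sequential_vram:.1f} GB")  # 12.2 GB vs 17.2 GB parallel
```

The ~5 GB saved over the parallel Scenario B footprint is paid for in model load/offload time on every evaluation cycle.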
Infrastructure Services
The infrastructure services run on CPU and system RAM. They are independent of the GPU workload and can be co-located on the same node as the evaluation models or distributed across dedicated hosts.
| Component | RAM (min) | RAM (rec) | CPU (min) | CPU (rec) | Storage |
|---|---|---|---|---|---|
| MinIO | 2 GB | 4 GB | 2 cores | 4 cores | 50 GB NVMe SSD min, variable |
| LakeFS | 1 GB | 2 GB | 1 core | 2 cores | 10 GB for metadata |
| LakeFS PostgreSQL | 1 GB | — | — | — | Included with LakeFS |
| Apache Kafka | 2 GB | 4 GB | 2 cores | 4 cores | 20 GB for message logs |
| Zookeeper | 512 MB | 1 GB | — | — | Included with Kafka |
| API Orchestrator (FastAPI) | 256 MB | 512 MB | 1 core | 1 core | — |
| Metrics Service (FastAPI) | 512 MB | 1 GB | 2 cores | 2 cores | — |
MinIO is I/O-intensive. NVMe SSD is required for any deployment handling datasets larger than 10 GB. Direct-attached storage is preferred over network-attached storage for the MinIO volume.
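Assuming all services are co-located on a single node, the minimums in the table above sum to a modest CPU-side footprint, well inside even the Minimum profile's 32 GB of system RAM. A quick budget check (values copied from the table):

```python
# Minimum RAM budget for the CPU-side infrastructure services (GB),
# using the minimums from the table above.
services_min_ram_gb = {
    "minio": 2.0,
    "lakefs": 1.0,
    "lakefs_postgresql": 1.0,
    "kafka": 2.0,
    "zookeeper": 0.5,
    "api_orchestrator": 0.25,
    "metrics_service": 0.5,
}

total = sum(services_min_ram_gb.values())
print(f"Infrastructure services minimum RAM: {total:.2f} GB")  # 7.25 GB
# The remainder of system RAM is left for the OS, Docker,
# and vLLM host-side buffers.
```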
Storage, Network & Operating System
| Item | Allocation |
|---|---|
| Operating system and Docker runtime | 30 GB |
| Docker images (services) | 15 GB |
| Evaluation model weights (from Hugging Face) | ~35 GB |
| MinIO dataset storage | 50 GB minimum, variable with use |
| Kafka logs | 20 GB |
| LakeFS metadata | 10 GB |
| Total storage | 160 GB minimum, NVMe SSD |
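The line items above sum to the 160 GB floor; on the Minimum profile's 256 GB disk, the remainder is the growth room for MinIO datasets and Kafka logs. A quick check (the 256 GB figure is the Minimum profile from the configuration table):

```python
# Storage budget check against the Minimum profile (256 GB NVMe SSD),
# using the allocations from the table above (GB).
allocations_gb = {
    "os_and_docker_runtime": 30,
    "docker_images": 15,
    "model_weights": 35,   # ~35 GB of weights from Hugging Face
    "minio_datasets": 50,  # minimum; grows with use
    "kafka_logs": 20,
    "lakefs_metadata": 10,
}

total_min = sum(allocations_gb.values())  # 160 GB, matching the table
headroom = 256 - total_min                # growth room on the Minimum profile
print(f"Fixed allocations: {total_min} GB, growth headroom: {headroom} GB")
```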
Hybrid Deployment
For pilot deployments or environments without local GPU capacity, Enterprise supports a hybrid configuration. The embedding model runs locally on a small GPU. The two larger evaluation models (Gemma 4 and Llama Guard 4) are served through a regulated inference provider over private endpoints.
This configuration preserves the 7-dimension framework and the immutable audit trail. The tradeoff is an operational dependency on the inference provider and per-request inference costs instead of a one-time hardware investment.
| Resource | Requirement |
|---|---|
| GPU | Any GPU with 4+ GB VRAM (e.g., NVIDIA T4, RTX 3060) |
| System RAM | 16 GB |
| CPU | 8 cores |
| Storage | 160 GB NVMe SSD |
| Inference provider | Regulated provider with private endpoints and EU-region routing |
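In the hybrid mode, each evaluation model resolves to a different serving location: embeddings to the local GPU, the two larger models to the provider's private endpoints. A minimal routing sketch — the endpoint URLs and model identifiers below are hypothetical placeholders, not the actual platform configuration:

```python
# Hybrid routing sketch: embeddings stay local, the judge and guardian
# models go to a regulated inference provider over private endpoints.
# Both URLs are HYPOTHETICAL placeholders for illustration only.

LOCAL_ENDPOINT = "http://localhost:8000/v1"            # local GPU (embeddings)
PROVIDER_ENDPOINT = "https://inference.example.eu/v1"  # EU-region private endpoint

ROUTES = {
    "qwen3-embedding": LOCAL_ENDPOINT,  # Toxicity + Regulatory Compliance
    "gemma-4-e4b": PROVIDER_ENDPOINT,   # Context (LLM-as-Judge)
    "llama-guard-4": PROVIDER_ENDPOINT, # Bias (guardian)
}

def endpoint_for(model: str) -> str:
    """Resolve the serving endpoint for an evaluation model."""
    try:
        return ROUTES[model]
    except KeyError:
        raise ValueError(f"Unknown evaluation model: {model}")

print(endpoint_for("qwen3-embedding"))  # local endpoint
```

The routing table is the whole architectural difference from the on-premise mode: the audit trail and scoring pipeline are unchanged, only the inference transport moves.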
Next Steps
This page is a reference, not a procurement checklist. Every deployment is calibrated to the organization's regulatory posture, AI system inventory, and existing infrastructure. The Architecture Assessment — part of the standard Enterprise implementation — turns this reference into a concrete deployment plan.
A focused technical session with our platform team. We review your environment, your residency constraints, and your expected evaluation volume. You leave with a sized deployment plan.
If you are earlier in evaluation, our team can walk through the Enterprise product, the 7-dimension framework, and deployment options in a 30-minute conversation.
For the full product overview, differentiators, and implementation timeline, return to the Enterprise product page.
Last updated: April 2026