From: Vsevolod Stakhov Date: Tue, 20 Jan 2026 12:16:36 +0000 (+0000) Subject: Add GPU and vast.ai support for neural embedding service X-Git-Tag: 4.0.0~179^2~13 X-Git-Url: http://git.ipfire.org/gitweb.cgi?a=commitdiff_plain;h=f4cfde49ec22fa4ce4bc9724b9391700e5ec80aa;p=thirdparty%2Frspamd.git Add GPU and vast.ai support for neural embedding service - Add Dockerfile.gpu for GPU-accelerated inference with PyTorch CUDA - Add requirements-gpu.txt with pinned versions for CUDA compatibility - Add vastai-launch.sh script for deploying on vast.ai cloud GPUs - Update README with GPU deployment instructions and model recommendations Default GPU model: intfloat/multilingual-e5-large (100+ languages including Russian) Tested on RTX 4090 with ~20-50ms latency per embedding. --- diff --git a/contrib/neural-embedding-service/Dockerfile.gpu b/contrib/neural-embedding-service/Dockerfile.gpu new file mode 100644 index 0000000000..2468f955e1 --- /dev/null +++ b/contrib/neural-embedding-service/Dockerfile.gpu @@ -0,0 +1,50 @@ +# Rspamd Neural Embedding Service - GPU Version +# +# GPU-optimized embedding service using sentence-transformers + CUDA +# +# Build: +# docker build -f Dockerfile.gpu -t rspamd-embedding-service:gpu . +# +# Run: +# docker run --gpus all -p 8080:8080 rspamd-embedding-service:gpu + +FROM pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime + +# Build arguments - multilingual-e5-large recommended for GPU (100+ languages) +ARG EMBEDDING_MODEL="intfloat/multilingual-e5-large" + +# Environment +ENV PYTHONUNBUFFERED=1 \ + PYTHONDONTWRITEBYTECODE=1 \ + PIP_NO_CACHE_DIR=1 \ + PIP_DISABLE_PIP_VERSION_CHECK=1 \ + EMBEDDING_MODEL=${EMBEDDING_MODEL} \ + EMBEDDING_PORT=8080 \ + EMBEDDING_HOST=0.0.0.0 \ + EMBEDDING_DEVICE=cuda + +WORKDIR /app + +# Install Python dependencies for GPU +COPY requirements-gpu.txt . 
+RUN pip install --no-cache-dir -r requirements-gpu.txt
+
+# Keep the HF cache under /app so the model pre-downloaded at build time (as root)
+# is readable by the non-root runtime user; the default ~/.cache would be root-only
+ENV HF_HOME=/app/.cache/huggingface
+
+# Pre-download model during build (recommended for vast.ai to avoid download on each run)
+RUN python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('${EMBEDDING_MODEL}')"
+
+# Copy application
+COPY embedding_service.py .
+
+# Non-root user
+RUN useradd -m -u 1000 embedding && chown -R embedding:embedding /app
+USER embedding
+
+# Expose port
+EXPOSE 8080
+
+# Health check
+HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
+    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8080/health')"
+
+# Run the service (embedding_service.py starts uvicorn itself)
+CMD ["python", "embedding_service.py"]
diff --git a/contrib/neural-embedding-service/README.md b/contrib/neural-embedding-service/README.md
index dacc513a95..fc566ec93b 100644
--- a/contrib/neural-embedding-service/README.md
+++ b/contrib/neural-embedding-service/README.md
@@ -180,6 +180,145 @@ EMBEDDING_MODEL="BAAI/bge-small-en-v1.5" python embedding_service.py
 
 - Consider increasing workers for parallel processing
 - Use batching for bulk operations
 
+## GPU Deployment
+
+For higher throughput, you can run this service on a GPU. GPU inference is typically 10-50x faster than CPU.
+
+### Local GPU (Docker)
+
+```bash
+# Build GPU image
+docker build -f Dockerfile.gpu -t rspamd-embedding-service:gpu .
+
+# Run with GPU access
+docker run --gpus all -p 8080:8080 rspamd-embedding-service:gpu
+
+# With larger model (GPU has more memory)
+docker run --gpus all -p 8080:8080 \
+  -e EMBEDDING_MODEL="BAAI/bge-large-en-v1.5" \
+  rspamd-embedding-service:gpu
+```
+
+### Vast.ai Cloud GPU
+
+[Vast.ai](https://vast.ai) provides affordable GPU rentals ($0.10-0.50/hr).
This is useful for:
+- Testing GPU performance before buying hardware
+- Burst capacity during high-volume periods
+- Running larger models that need more VRAM
+
+#### Quick Start
+
+```bash
+# Install vast.ai CLI
+pip install vastai
+
+# Set your API key (get from https://vast.ai/console/account/)
+vastai set api-key YOUR_API_KEY
+
+# Search for available GPUs
+./vastai-launch.sh --search-only
+
+# Launch an instance
+./vastai-launch.sh --model "BAAI/bge-small-en-v1.5" --gpu RTX_3090
+```
+
+#### Launch Script Options
+
+```bash
+./vastai-launch.sh [options]
+
+Options:
+  --model MODEL      Embedding model (default: intfloat/multilingual-e5-large)
+  --gpu GPU_TYPE     GPU type filter (default: RTX_3090)
+  --max-price MAX    Maximum $/hr (default: 0.30)
+  --disk DISK_GB     Disk space in GB (default: 20)
+  --search-only      Only search for instances, don't launch
+  --show-url ID      Show service URL for a running instance
+```
+
+#### Getting the Service URL
+
+After launching, get your service URL:
+
+```bash
+# Option 1: Use the helper
+./vastai-launch.sh --show-url <INSTANCE_ID>
+
+# Option 2: Manual lookup
+vastai show instance <INSTANCE_ID>
+# Look for: 8080/tcp -> 0.0.0.0:XXXXX
+# Your URL is: http://<PUBLIC_IP>:XXXXX
+```
+
+**Important:** The SSH port (22) is NOT your service port. Look for port 8080's mapping.
+
+#### Manual Vast.ai Setup
+
+1. Go to [vast.ai/console/create](https://vast.ai/console/create/)
+2. Select a GPU instance (RTX 3090 or better recommended)
+3. Choose `pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime` as the image
+4. In the on-start script, add:
+
+```bash
+pip install uv
+uv pip install --system "numpy<2" "transformers==4.40.0" "sentence-transformers==2.7.0" fastapi uvicorn pydantic
+# Copy embedding_service.py to /root/
+EMBEDDING_MODEL="intfloat/multilingual-e5-large" python /root/embedding_service.py
+```
+
+5.
After the instance starts, find your service URL:
+   ```bash
+   # List your instances
+   vastai show instances
+
+   # Get instance details (replace <ID> with your instance ID)
+   vastai show instance <ID>
+
+   # Look for port mapping like: 8080/tcp -> 0.0.0.0:41234
+   # Your service URL is: http://<PUBLIC_IP>:41234
+   ```
+
+6. Configure Rspamd to use `http://<PUBLIC_IP>:<PORT>/api/embeddings`
+
+**Note:** Vast.ai maps container ports to random high ports. The SSH port (usually 22) is different from your service port (8080 mapped to something like 41234).
+
+#### Recommended GPU Instances
+
+| GPU | VRAM | Price | Use Case |
+|-----|------|-------|----------|
+| RTX 3090 | 24GB | $0.15-0.30/hr | Best value, handles all models |
+| RTX 4090 | 24GB | $0.40-0.60/hr | Faster inference |
+| A100 | 40-80GB | $1.00-2.00/hr | Very large models, batch processing |
+
+#### Cost Estimation
+
+| Volume | GPU Cost | Notes |
+|--------|----------|-------|
+| 10K emails/day | ~$3-7/month | RTX 3090, shared instance |
+| 100K emails/day | ~$20-50/month | Dedicated RTX 3090 |
+| 1M emails/day | ~$150-300/month | Multiple GPUs or A100 |
+
+### GPU Requirements
+
+| Model | VRAM | Dims | Notes |
+|-------|------|------|-------|
+| `intfloat/multilingual-e5-large` | 2GB | 1024 | **Recommended** - 100+ languages, excellent Russian |
+| `sentence-transformers/paraphrase-multilingual-mpnet-base-v2` | 1GB | 768 | Good multilingual, smaller |
+| `BAAI/bge-base-en-v1.5` | 1GB | 768 | English only, fast |
+| `BAAI/bge-large-en-v1.5` | 2GB | 1024 | English only, high quality |
+
+### Multilingual Models (Recommended for GPU)
+
+For multilingual support including Russian, use `intfloat/multilingual-e5-large`:
+- 1024-dim embeddings
+- Supports 100+ languages with excellent Russian performance
+- State-of-the-art on multilingual benchmarks
+
+```bash
+# Use multilingual-e5-large (default for vast.ai script)
+./vastai-launch.sh --model "intfloat/multilingual-e5-large"
+```
+
 ## License
 
 Apache License 2.0
diff --git
a/contrib/neural-embedding-service/docker-compose.yml b/contrib/neural-embedding-service/docker-compose.yml new file mode 100644 index 0000000000..0d3a335306 --- /dev/null +++ b/contrib/neural-embedding-service/docker-compose.yml @@ -0,0 +1,47 @@ +# Docker Compose for Rspamd Neural Embedding Service Testing +# +# Usage: +# cd contrib/neural-embedding-service +# docker compose up -d +# +# Test: +# curl http://localhost:8080/health +# curl http://localhost:8080/api/embeddings -d '{"model":"bge-small-en-v1.5","prompt":"test spam"}' + +services: + redis: + image: redis:7-alpine + ports: + - "6379:6379" + volumes: + - redis_data:/data + command: redis-server --appendonly yes + healthcheck: + test: ["CMD", "redis-cli", "ping"] + interval: 5s + timeout: 3s + retries: 3 + + embedding: + build: + context: . + dockerfile: Dockerfile + args: + EMBEDDING_MODEL: BAAI/bge-small-en-v1.5 + ports: + - "8080:8080" + environment: + - EMBEDDING_MODEL=BAAI/bge-small-en-v1.5 + - EMBEDDING_PORT=8080 + healthcheck: + test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8080/health')"] + interval: 30s + timeout: 10s + start_period: 60s + retries: 3 + depends_on: + redis: + condition: service_healthy + +volumes: + redis_data: diff --git a/contrib/neural-embedding-service/requirements-gpu.txt b/contrib/neural-embedding-service/requirements-gpu.txt new file mode 100644 index 0000000000..5d866974d1 --- /dev/null +++ b/contrib/neural-embedding-service/requirements-gpu.txt @@ -0,0 +1,21 @@ +# Rspamd Neural Embedding Service Dependencies (GPU) +# +# Install: pip install -r requirements-gpu.txt +# +# Uses sentence-transformers with PyTorch CUDA for GPU inference. +# Versions pinned for compatibility with PyTorch 2.1 / CUDA 12.1. 
+
+# FastAPI web framework
+fastapi>=0.100.0
+
+# ASGI server
+uvicorn[standard]>=0.23.0
+
+# sentence-transformers for GPU inference (uses PyTorch + CUDA)
+# Pin versions for compatibility
+numpy<2
+transformers==4.40.0
+sentence-transformers==2.7.0
+
+# Data validation
+pydantic>=2.0.0
diff --git a/contrib/neural-embedding-service/vastai-launch.sh b/contrib/neural-embedding-service/vastai-launch.sh
new file mode 100755
index 0000000000..fde947b3d8
--- /dev/null
+++ b/contrib/neural-embedding-service/vastai-launch.sh
@@ -0,0 +1,285 @@
+#!/bin/bash
+# Rspamd Neural Embedding Service - Vast.ai Launch Script
+#
+# This script helps launch the embedding service on vast.ai GPU instances.
+#
+# Prerequisites:
+#   1. Install vastai CLI: pip install vastai
+#   2. Set API key: vastai set api-key YOUR_API_KEY
+#
+# Usage:
+#   ./vastai-launch.sh [options]
+#
+# Options:
+#   --model MODEL      Embedding model (default: intfloat/multilingual-e5-large)
+#   --gpu GPU_TYPE     GPU type filter (default: RTX_3090)
+#   --max-price MAX    Maximum $/hr (default: 0.30)
+#   --disk DISK_GB     Disk space in GB (default: 20)
+#   --search-only      Only search for instances, don't launch
+#   --show-url ID      Show service URL for a running instance
+#   --help             Show this help message
+
+set -e
+
+# Defaults - use multilingual-e5-large for GPU (supports Russian and 100+ languages)
+MODEL="${EMBEDDING_MODEL:-intfloat/multilingual-e5-large}"
+GPU_TYPE="RTX_3090"
+MAX_PRICE="0.30"
+DISK_GB="20"
+SEARCH_ONLY=false
+
+# Parse arguments
+while [[ $# -gt 0 ]]; do
+  case $1 in
+    --model)
+      MODEL="$2"
+      shift 2
+      ;;
+    --gpu)
+      GPU_TYPE="$2"
+      shift 2
+      ;;
+    --max-price)
+      MAX_PRICE="$2"
+      shift 2
+      ;;
+    --disk)
+      DISK_GB="$2"
+      shift 2
+      ;;
+    --search-only)
+      SEARCH_ONLY=true
+      shift
+      ;;
+    --show-url)
+      # Show service URL for a running instance
+      if [ -z "$2" ]; then
+        echo "Usage: $0 --show-url <INSTANCE_ID>"
+        exit 1
+      fi
+      echo "Getting connection info for instance $2..."
+      # With 'set -e' a separate $? check after the assignment would be dead
+      # code (the script exits first), so test the command substitution directly
+      if ! INFO=$(vastai show instance "$2" --raw 2>/dev/null); then
+        echo "Error: Could not get instance info"
+        exit 1
+      fi
+
+      SSH_HOST=$(echo "$INFO" | grep -oE '"ssh_host": "[^"]+"' | cut -d'"' -f4)
+      SSH_PORT=$(echo "$INFO" | grep -oE '"ssh_port": [0-9]+' | grep -oE '[0-9]+')
+      PUBLIC_IP=$(echo "$INFO" | grep -oE '"public_ipaddr": "[^"]+"' | cut -d'"' -f4)
+      STATUS=$(echo "$INFO" | grep -oE '"actual_status": "[^"]+"' | cut -d'"' -f4)
+
+      echo ""
+      echo "Instance Status: $STATUS"
+      echo "Public IP: $PUBLIC_IP"
+      echo "SSH: ssh -p $SSH_PORT root@$SSH_HOST"
+      echo ""
+      echo "=== Access Methods ==="
+      echo ""
+      echo "Option 1: SSH Tunnel (recommended for testing)"
+      echo "  Run this in a separate terminal:"
+      echo "  ssh -L 8080:localhost:8080 -p $SSH_PORT root@$SSH_HOST"
+      echo "  Then access: http://localhost:8080/health"
+      echo ""
+      echo "Option 2: Direct access via public IP"
+      echo "  First, SSH in and check if the service is running:"
+      echo "  ssh -p $SSH_PORT root@$SSH_HOST"
+      echo "  curl localhost:8080/health"
+      echo ""
+      echo "  If running, access via: http://$PUBLIC_IP:8080"
+      echo "  (Note: May require firewall/port config on vast.ai)"
+      echo ""
+      exit 0
+      ;;
+    --help)
+      head -25 "$0" | tail -20
+      exit 0
+      ;;
+    *)
+      echo "Unknown option: $1"
+      exit 1
+      ;;
+  esac
+done
+
+# Check vastai CLI
+if ! command -v vastai &> /dev/null; then
+  echo "Error: vastai CLI not found.
Install with: pip install vastai" + exit 1 +fi + +# Startup script that runs inside the vast.ai instance +ONSTART_SCRIPT=$(cat << 'SCRIPT' +#!/bin/bash +set -e + +# Install dependencies using uv (10-100x faster than pip) +pip install uv +# Use sentence-transformers with PyTorch CUDA (pinned versions for compatibility) +uv pip install --system "numpy<2" "transformers==4.40.0" "sentence-transformers==2.7.0" fastapi uvicorn[standard] pydantic + +# Create embedding service +cat > /root/embedding_service.py << 'EOF' +import os +import logging +from typing import List, Union, Optional +from fastapi import FastAPI, HTTPException +from pydantic import BaseModel +import uvicorn + +# Use sentence-transformers with PyTorch CUDA +from sentence_transformers import SentenceTransformer + +logging.basicConfig(level=logging.INFO) +logger = logging.getLogger(__name__) + +app = FastAPI(title="Rspamd Embedding Service (GPU)") + +# Configuration +MODEL_NAME = os.environ.get("EMBEDDING_MODEL", "intfloat/multilingual-e5-large") + +# Initialize model on CUDA +logger.info(f"Loading model {MODEL_NAME} on CUDA...") +model = SentenceTransformer(MODEL_NAME, device="cuda") +logger.info(f"Loaded {MODEL_NAME} on {model.device}") + +class OllamaRequest(BaseModel): + model: str + prompt: str + +class OllamaResponse(BaseModel): + embedding: List[float] + +class OpenAIRequest(BaseModel): + model: str + input: Union[str, List[str]] + +class EmbeddingData(BaseModel): + embedding: List[float] + index: int + object: str = "embedding" + +class OpenAIResponse(BaseModel): + object: str = "list" + data: List[EmbeddingData] + model: str + usage: dict + +def get_embeddings(texts: List[str]) -> List[List[float]]: + embs = model.encode(texts, convert_to_numpy=True) + return [emb.tolist() for emb in embs] + +@app.get("/health") +async def health(): + return {"status": "ok", "model": MODEL_NAME, "device": str(model.device)} + +@app.post("/api/embeddings", response_model=OllamaResponse) +async def 
ollama_embeddings(request: OllamaRequest): + try: + embeddings = get_embeddings([request.prompt]) + return OllamaResponse(embedding=embeddings[0]) + except Exception as e: + logger.error(f"Embedding error: {e}") + raise HTTPException(status_code=500, detail=str(e)) + +@app.post("/v1/embeddings", response_model=OpenAIResponse) +async def openai_embeddings(request: OpenAIRequest): + try: + texts = [request.input] if isinstance(request.input, str) else request.input + embeddings = get_embeddings(texts) + data = [EmbeddingData(embedding=emb, index=i) for i, emb in enumerate(embeddings)] + return OpenAIResponse( + data=data, + model=request.model, + usage={"prompt_tokens": len(texts), "total_tokens": len(texts)} + ) + except Exception as e: + logger.error(f"Embedding error: {e}") + raise HTTPException(status_code=500, detail=str(e)) + +if __name__ == "__main__": + port = int(os.environ.get("EMBEDDING_PORT", "8080")) + host = os.environ.get("EMBEDDING_HOST", "0.0.0.0") + uvicorn.run(app, host=host, port=port) +EOF + +# Start service +cd /root +EMBEDDING_MODEL="${EMBEDDING_MODEL}" python embedding_service.py & + +echo "Embedding service started on port 8080" +SCRIPT +) + +echo "=== Rspamd Embedding Service - Vast.ai Launcher ===" +echo "Model: $MODEL" +echo "GPU: $GPU_TYPE" +echo "Max price: \$$MAX_PRICE/hr" +echo "" + +# Search for available instances +echo "Searching for available instances..." +QUERY="gpu_name=$GPU_TYPE rentable=true dph<$MAX_PRICE disk_space>=$DISK_GB cuda_vers>=12.0" + +vastai search offers "$QUERY" --order 'dph' | head -20 + +if [ "$SEARCH_ONLY" = true ]; then + echo "" + echo "Search only mode. To launch, run without --search-only" + exit 0 +fi + +echo "" +read -p "Enter instance ID to rent (or 'q' to quit): " INSTANCE_ID + +if [ "$INSTANCE_ID" = "q" ]; then + echo "Aborted." + exit 0 +fi + +# Create the instance +echo "Creating instance $INSTANCE_ID..." 
+vastai create instance "$INSTANCE_ID" \
+  --image pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime \
+  --disk "$DISK_GB" \
+  --env "EMBEDDING_MODEL=$MODEL" \
+  --onstart-cmd "$ONSTART_SCRIPT"
+
+echo ""
+echo "Instance created! Monitor with: vastai show instances"
+echo ""
+echo "=== Finding your service URL ==="
+echo ""
+echo "1. Wait for instance to be 'running': vastai show instances"
+echo ""
+echo "2. Get the public URL (port 8080 is mapped to a random port):"
+echo "   vastai show instance <INSTANCE_ID>"
+echo ""
+echo "   Look for 'ports' section, e.g.:"
+echo "   8080/tcp -> 0.0.0.0:41234"
+echo "   This means your service is at: http://<PUBLIC_IP>:41234"
+echo ""
+echo "3. Or use SSH tunnel for testing:"
+echo "   vastai ssh-url <INSTANCE_ID>"
+echo "   ssh -L 8080:localhost:8080 <SSH_URL>"
+echo "   Then use: http://localhost:8080"
+echo ""
+echo "4. Configure Rspamd with the public URL:"
+echo ""
+echo "   neural {"
+echo "     rules {"
+echo "       default {"
+echo "         providers = ["
+echo "           {"
+echo "             type = \"llm\";"
+echo "             llm_type = \"ollama\";"
+echo "             model = \"$MODEL\";"
+echo "             url = \"http://<PUBLIC_IP>:<PORT>/api/embeddings\";"
+echo "           }"
+echo "         ];"
+echo "       }"
+echo "     }"
+echo "   }"
+echo ""
+echo "5. Test the endpoint:"
+echo "   curl http://<PUBLIC_IP>:<PORT>/health"
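+echo ""
+# The step below is an illustrative sketch added to the printed instructions:
+# the JSON body matches the Ollama-compatible /api/embeddings handler defined
+# in ONSTART_SCRIPT above (fields "model" and "prompt"); substitute the real
+# public IP and mapped port for the placeholders
+echo "6. Example embedding request (Ollama-compatible API):"
+echo "   curl http://<PUBLIC_IP>:<PORT>/api/embeddings \\"
+echo "     -H 'Content-Type: application/json' \\"
+echo "     -d '{\"model\": \"$MODEL\", \"prompt\": \"test message\"}'"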