Add GPU and vast.ai support for neural embedding service
- Add Dockerfile.gpu for GPU-accelerated inference with PyTorch CUDA
- Add requirements-gpu.txt with pinned versions for CUDA compatibility
- Add vastai-launch.sh script for deploying on vast.ai cloud GPUs
- Update README with GPU deployment instructions and model recommendations
Default GPU model: intfloat/multilingual-e5-large (100+ languages, including Russian).
Tested on an RTX 4090 with ~20-50 ms latency per embedding.
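For reviewers, a minimal sketch of what a `Dockerfile.gpu` along these lines typically looks like. This is illustrative only, not the committed file: the base image tag, `server.py` entrypoint, and `EMBED_MODEL` variable are assumptions.

```dockerfile
# Illustrative sketch -- the actual Dockerfile.gpu in this commit may differ.
# CUDA-enabled PyTorch runtime base image (tag is an example).
FROM pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime

WORKDIR /app

# Install pinned CUDA-compatible dependencies (see requirements-gpu.txt).
COPY requirements-gpu.txt .
RUN pip install --no-cache-dir -r requirements-gpu.txt

COPY . .

# Default embedding model; hypothetical env var, override at run time
# with e.g. `docker run -e EMBED_MODEL=... --gpus all <image>`.
ENV EMBED_MODEL=intfloat/multilingual-e5-large

# Hypothetical service entrypoint.
CMD ["python", "server.py"]
```

Running the container requires `--gpus all` (or an equivalent device flag) so the CUDA runtime is visible inside the container.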