Deployment Strategy
This project deploys a small, containerized stack: backend (FastAPI), frontend (Streamlit), and an Ollama service for LLM helpers. You can run locally with Docker Compose or push images to GHCR for remote environments.
Prerequisites
- Docker and Docker Compose installed
- Node.js ≥ 20 (for `make api-docs` / Redoc build)
- Bruno CLI (`bru`) on PATH for API tests (optional)
- Python 3.10+ and `uv` for local tooling
System Architecture & API
- Framework: FastAPI app (`src/backend/app.py`) with CORS and Prometheus metrics.
- Routers under `src/backend/routers/`:
    - `predict.py` (prefix `/predict`):
        - `POST /predict`: Classify uploaded image (multipart `image`).
        - `POST /predict/explain`: Return attention heatmap + prediction metadata.
        - Model + processor lazy-loaded via `@lru_cache`; device auto-selected (MPS → CUDA → CPU).
    - `dashboard.py`: `GET /dashboard` HTML metrics dashboard fed by Prometheus `/metrics`.
    - `llm.py` (prefix `/llm`):
        - `POST /llm/generate`: Generate text from a prompt (JSON: `prompt`, `temperature`, `max_tokens`).
        - `GET /llm/health`: Check Ollama availability; returns available models and the configured model.
        - Config: `OLLAMA_HOST` (default `http://localhost:11434`), model `gemma3:270m`.
- OpenAPI: Exportable via `src/backend/export_schema.py` to `src/backend/openapi.json` and published to docs.
- Observability: `prometheus_fastapi_instrumentator` exposes metrics; CORS configured for local dev and `tikkamasalai.tech`.
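The lazy-loading pattern mentioned above can be sketched as follows. This is a minimal illustration, not the real `predict.py`: the availability flags stand in for `torch` device queries, and the loaded objects are placeholders.

```python
from functools import lru_cache


def pick_device(mps_available: bool, cuda_available: bool) -> str:
    """Preference order from the docs: MPS -> CUDA -> CPU."""
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"


@lru_cache(maxsize=1)
def get_model_bundle():
    """Load model + processor once; later calls reuse the cached result.

    Hypothetical loader -- the real code lives in src/backend/routers/predict.py
    and loads the actual model weights and image processor.
    """
    device = pick_device(False, False)  # stand-in flags; real code queries torch
    model = {"name": "image-classifier", "device": device}  # placeholder object
    processor = {"name": "image-processor"}  # placeholder object
    return model, processor
```

The first request pays the load cost; every subsequent call to `get_model_bundle()` returns the same cached objects, which keeps per-request latency predictable.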
Images and Tagging
For per-service Dockerfiles, Compose service definitions, registry images, and CI/CD details, see Containers → Compose & CI/CD.
Model Artifacts (GCS)
- Bucket: `tikkamasalai-models` (location: `eu`, multi-region).
- Artifacts are downloaded during backend image build via `src/backend/download_model.py`.
- Build secrets provide GCP credentials (`GOOGLE_APPLICATION_CREDENTIALS`); `GOOGLE_CLOUD_PROJECT` is passed at build time.
- Files are synced to `./models` in the image. This avoids runtime stalls and ensures portable builds.
Environments
Environment-specific Compose usage and image sources are documented in Containers → Compose & CI/CD. In brief:
- Development: docker-compose-local.yml builds backend/frontend locally and mounts frontend secrets.
- Staging/Production: docker-compose.yml uses prebuilt GHCR images.
Secrets and Configuration
- Frontend secrets: mount `./src/frontend/.streamlit/secrets.toml` into `/app/.streamlit/secrets.toml` (read-only) for local; use platform secret stores for remote.
- Backend runtime variables:
    - `APP_DEBUG=false` for production-like runs.
    - `OLLAMA_HOST` must point to the internal Ollama service (e.g., `http://ollama:11434` under Compose).
Cloud Deployment (VM on GCP)
- Provider: GCP VM hosting backend and frontend; Ollama runs as a service.
- VM spec:
    - Location: `europe-west1-c` (region `europe-west1`).
    - Machine type: `c4a-standard-2` (2 vCPUs, 8 GB RAM).
    - CPU platform/architecture: Google Axion, Arm64 (chosen to align with Apple Silicon dev machines and early non-multi-arch builds).
    - OS image: `debian-12-bookworm-arm64-v20251014`.
    - Boot disk: 20 GB Hyperdisk Balanced with a daily snapshot schedule; additional disk attached to store both the image classification model and the LLM.
- Networking & TLS: Nginx reverse proxy fronts `:443`; Certbot manages TLS; routes `tikkamasalai.tech` to the frontend and `api.tikkamasalai.tech` to the backend.
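The routing above might look roughly like this in Nginx. This is a hedged sketch, not the server's actual config: upstream ports are assumed to be 8000 (FastAPI) and 8501 (Streamlit), matching the defaults used elsewhere in this doc, and certificate directives are whatever Certbot provisioned.

```nginx
# api.tikkamasalai.tech -> FastAPI backend
server {
    listen 443 ssl;
    server_name api.tikkamasalai.tech;
    # ssl_certificate / ssl_certificate_key lines managed by Certbot

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto https;
    }
}

# tikkamasalai.tech -> Streamlit frontend
server {
    listen 443 ssl;
    server_name tikkamasalai.tech;

    location / {
        proxy_pass http://127.0.0.1:8501;
        proxy_set_header Host $host;
        # Streamlit requires WebSocket upgrades for its live connection
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```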
Rollout with Docker Compose (Server)
On a remote host with Docker and Compose installed:
1) Copy docker-compose.yml (and optionally docker-compose.override.yml).
2) Configure environment (.env) and secrets.
3) Start: `docker compose up -d`
4) Verify health:
- Backend: `curl http://<host>:8000/health`
- Frontend: open `http://<host>:8501`
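Containers can take a few seconds to become healthy after `docker compose up -d`, so a small retry loop is handy for scripted rollouts. A sketch with an injectable probe (the function and its defaults are illustrative):

```python
import time
from typing import Callable


def wait_for_health(probe: Callable[[], bool],
                    attempts: int = 10,
                    delay: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `probe` until it returns True or attempts run out."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:  # don't sleep after the final attempt
            sleep(delay)
    return False
```

In practice `probe` would wrap an HTTP check of `http://<host>:8000/health` (e.g. via `urllib.request`) in a try/except that returns `False` on connection errors.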
Health Checks and Monitoring
Compose healthchecks and startup ordering are detailed in Containers. For quick verification after rollout, use `make compose-logs` (or `docker compose logs -f`).
Testing After Deployment
- Quick checks:
    - Backend health: `curl http://<host>:8000/health` → 200
    - Frontend: available at `http://<host>:8501`
- Full guidance: Development → Testing
    - Unit tests: `make test`
    - API tests (Bruno): `make test-local-api`, `make test-deployed-api`
Security & Compliance
- Data handling: Images uploaded via multipart form; processed transiently in-memory; no persistence by default.
- Access control: Public API at `https://api.tikkamasalai.tech` (demo scope).
- CORS: Origins allowed for local and production domains.
- Secrets: Follow `CONTRIBUTING.md`; never commit credentials. Use env vars or secret stores.
- Future hardening: Consider rate limiting and auth if usage expands (HTTPS is already enforced via Nginx).
Adoption & Developer Experience
- Ease of use: Simple multipart upload; predictable JSON responses; OpenAPI published via `make api-docs`.
- Versioning: Routes are currently unversioned (e.g., `/predict`). Versioning the API is a potential future enhancement.
See Also
- Containers & Compose details: Containers
- CI/CD, multi-arch builds, registry, and VM rollout: Component Delivery