Vision Models
Hosted computer-vision foundation models, callable through a single API. Covers detection, segmentation, OCR, and multimodal chat, with no GPU setup required.
Springtech Flash
Fast multimodal LLM optimized for chat and vision Q&A, served on an A6000 GPU.
Springtech Cruise
Value-tier multimodal LLM, served on an RTX 3060 GPU. Best for high-volume batch workloads.
SAM 2
Segment Anything Model v2 — zero-shot instance segmentation from box or point prompts.
YOLO-World
Open-vocabulary object detection — detect any class with a text prompt, no fine-tuning.
CLIP
Zero-shot image classification + image/text embeddings for semantic search.
Florence-2
Unified vision foundation model — captioning, detection, segmentation, OCR.
DocTR
Document text recognition — fast OCR for invoices, receipts, and forms.
EasyOCR
Multi-language OCR supporting 80+ languages including East Asian scripts.
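For readers wondering how the CLIP entry's zero-shot classification works with embeddings, here is a minimal sketch in plain Python. It does not call the Springtech API or any CLIP library. The three-dimensional vectors are made-up stand-ins for real image and text embeddings, and the function names and the temperature value of 100 are assumptions chosen for illustration only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_classify(image_vec, label_vecs, temperature=100.0):
    """Score an image embedding against one text embedding per label.

    Returns a dict of label -> probability (softmax over scaled similarities).
    The temperature is an assumed scaling factor, not a documented setting.
    """
    sims = {label: cosine(image_vec, vec) for label, vec in label_vecs.items()}
    top = max(sims.values())  # subtract the max for numerical stability
    exps = {label: math.exp(temperature * (s - top)) for label, s in sims.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Hypothetical toy embeddings; real ones would come from the model.
image_vec = [0.9, 0.1, 0.2]
label_vecs = {
    "cat": [1.0, 0.0, 0.1],
    "dog": [0.1, 1.0, 0.0],
    "car": [0.0, 0.2, 1.0],
}

probs = zero_shot_classify(image_vec, label_vecs)
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
```

The same similarity score can rank stored image embeddings against a text query, which is the basis of the semantic search use case mentioned in the CLIP entry.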
Need a model that's not listed? Ask the Springtech Agent — we host most popular open-source CV models on request.