
Vision Models

Hosted computer-vision foundation models, callable through a single API. The catalog covers detection, segmentation, OCR, and multimodal tasks, with no GPU setup required.


8 models

Available

Springtech Flash

S·multimodal

Fast multimodal LLM optimized for chat and vision Q&A, served on an A6000 GPU.

Tags: Chat, Vision Q&A, Fast

Included on Pro plan

Available

Springtech Cruise

S·multimodal

Value-tier multimodal LLM, served on an RTX 3060 GPU. Best suited to high-volume batch workloads.

Tags: Chat, Batch, Value

Included on Pro plan

Available

SAM 2

M·segmentation

Segment Anything Model v2: zero-shot instance segmentation from box or point prompts.

Tags: Segmentation, Zero-shot, Box prompt

RM 0.002 / image

Preview

YOLO-World

T·detection

Open-vocabulary object detection: detect any class from a text prompt, with no fine-tuning.

Tags: Detection, Open-vocab, Text prompt

RM 0.001 / image

Preview

CLIP

S·embedding

Zero-shot image classification, plus image and text embeddings for semantic search.

Tags: Classification, Embeddings, Zero-shot

RM 0.0005 / image

Coming soon

Florence-2

M·multimodal

Unified vision foundation model for captioning, detection, segmentation, and OCR.

Tags: Captioning, Detection, OCR

—

Available

DocTR

M·ocr

Document text recognition: fast OCR for invoices, receipts, and forms.

Tags: OCR, Documents, Fast

RM 0.001 / page

Coming soon

EasyOCR

J·ocr

Multilingual OCR supporting 80+ languages, including East Asian scripts.

Tags: OCR, Multi-language

—
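Because the per-image and per-page rates in this catalog are fractions of a ringgit, it can help to estimate what a batch job will cost before running it. The Python sketch below hard-codes the prices listed on this page (a snapshot, not a live price list) and multiplies them by a hypothetical workload. The function names `estimate_cost_rm` and `estimate_pipeline_cost_rm` are illustrative only and are not part of any Springtech API. Springtech Flash and Springtech Cruise are left out because they are included on the Pro plan rather than billed per unit, and Florence-2 and EasyOCR have no published price yet.

```python
from decimal import Decimal

# Per-unit prices in RM, copied from the catalog on this page.
# Treat these as a snapshot: the live prices may differ.
PRICE_PER_UNIT_RM = {
    "SAM 2": Decimal("0.002"),       # per image
    "YOLO-World": Decimal("0.001"),  # per image
    "CLIP": Decimal("0.0005"),       # per image
    "DocTR": Decimal("0.001"),       # per page
}


def estimate_cost_rm(model: str, units: int) -> Decimal:
    """Estimated charge in RM for `units` images (or pages, for DocTR)."""
    if units < 0:
        raise ValueError("units must be non-negative")
    try:
        price = PRICE_PER_UNIT_RM[model]
    except KeyError:
        # Plan-included or not-yet-priced models have no per-unit rate here.
        raise ValueError(f"no per-unit price listed for {model!r}") from None
    return price * units


def estimate_pipeline_cost_rm(steps: dict[str, int]) -> Decimal:
    """Sum the estimated charges for a multi-model pipeline (model -> units)."""
    return sum((estimate_cost_rm(m, n) for m, n in steps.items()), Decimal("0"))


if __name__ == "__main__":
    # Hypothetical workload: detect objects in 10,000 images with YOLO-World,
    # then segment 2,500 of them with SAM 2.
    total = estimate_pipeline_cost_rm({"YOLO-World": 10_000, "SAM 2": 2_500})
    print(f"Estimated cost: RM {total:.2f}")
```

With the sample workload above (RM 10.00 for detection plus RM 5.00 for segmentation), the script prints `Estimated cost: RM 15.00`. `Decimal` is used instead of `float` so that sub-sen rates such as RM 0.0005 per image add up exactly.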

Need a model that isn't listed? Ask the Springtech Agent; we host most popular open-source CV models on request.
