Groq LPU Inference Engine

TSP (Tensor Streaming Processor) Architecture

Status: Active

Launched February 2024

Core Specifications

Vendor: Groq
Architecture: TSP (Tensor Streaming Processor)
Form Factor: (not listed)
VRAM: 230 GB
Memory Bandwidth: (not listed)
TDP: 300 W

Compute Performance

Precision | TFLOPs

Performance Benchmarks

LLM inference

Configuration | Precision | Performance
Mixtral 8x7B, concurrent queries | - | 450 queries/s
LLaMA 70B, first token latency (fastest in industry) | - | 1.5 ms
LLaMA 70B, FP16, batch_size=1 | - | 18,000 tokens/s
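As a rough illustration of how the latency and throughput figures relate, the sketch below estimates the time to produce a full response as first-token latency plus steady-state decode time. The function name and the simple additive model are assumptions made for illustration, not a method published by the vendor. Combining the two LLaMA 70B rows also assumes they were measured under comparable conditions, and that the last row reads as 18,000 tokens per second at batch size 1.

```python
def estimated_response_time_ms(num_tokens, first_token_latency_ms, tokens_per_second):
    """Estimate end-to-end generation time in milliseconds.

    Hypothetical model: the first token arrives after first_token_latency_ms,
    and each remaining token arrives at the steady-state decode rate.
    """
    if num_tokens < 1:
        raise ValueError("num_tokens must be at least 1")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    # Time spent on every token after the first, converted to milliseconds.
    decode_ms = (num_tokens - 1) / tokens_per_second * 1000.0
    return first_token_latency_ms + decode_ms


if __name__ == "__main__":
    # Inputs taken from the benchmark rows above (assumed comparable).
    total = estimated_response_time_ms(
        num_tokens=500,
        first_token_latency_ms=1.5,
        tokens_per_second=18_000,
    )
    print(f"Estimated time for 500 tokens: {total:.2f} ms")
```

Real deployments add network, queueing, and prompt-processing time on top of this, so the estimate is best treated as a lower bound.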
