AWS Inferentia2

Inferentia Gen2 Architecture

Active

Launched November 2022

Core Specifications

Vendor: AWS
Architecture: Inferentia Gen2
Form Factor:
VRAM: 32 GB
Memory Bandwidth:
TDP: 150 W

Compute Performance

Precision | TFLOPs
BF16 | 190

Performance Benchmarks

Image Generation

Configuration | Precision | Performance
Stable Diffusion 2.1, 512x512 | not listed | 1.2 images per second

LLM Inference

Configuration | Precision | Performance
EC2 inf2.xlarge pricing | not listed | 0.12 cost per million tokens
LLaMA 13B, optimized for cost-efficiency | not listed | 5,500 tokens per second
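
To put the two LLM figures in practical terms, the sketch below scales them to a workload size. It is a rough illustration only: the page does not state a currency for the cost figure, and the two rows may come from different configurations, so the function names and the example workload sizes are assumptions made for this example.

```python
# Figures taken from the LLM Inference rows above.
COST_PER_MILLION_TOKENS = 0.12   # currency unit not stated on the page
TOKENS_PER_SECOND = 5_500.0      # LLaMA 13B, cost-optimized configuration


def cost_for_tokens(num_tokens: int) -> float:
    """Cost of processing num_tokens at the listed per-million-token rate."""
    return num_tokens / 1_000_000 * COST_PER_MILLION_TOKENS


def seconds_for_tokens(num_tokens: int) -> float:
    """Time to process num_tokens at the listed sustained throughput."""
    return num_tokens / TOKENS_PER_SECOND


# Hypothetical workload sizes, chosen only for illustration.
billion_token_cost = cost_for_tokens(1_000_000_000)
million_token_seconds = seconds_for_tokens(1_000_000)

print(f"1B tokens cost: {billion_token_cost:.2f} (currency unspecified)")
print(f"1M tokens take: {million_token_seconds:.1f} s")
```

Both results assume the listed rate and throughput hold constant across the whole workload, which real deployments may not achieve.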

Quick Stats

Peak Performance: 190 TFLOPs (BF16)
Efficiency: 1.27 TFLOPs per Watt
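
The efficiency figure appears to be the peak BF16 throughput divided by the TDP listed under Core Specifications. A minimal sketch of that arithmetic follows; the function name is a hypothetical helper, and the assumption that the page derives efficiency this way is inferred from the numbers rather than stated by the source.

```python
def tflops_per_watt(peak_tflops: float, tdp_watts: float) -> float:
    """Return compute efficiency as peak TFLOPs divided by TDP in watts."""
    if tdp_watts <= 0:
        raise ValueError("TDP must be a positive number of watts")
    return peak_tflops / tdp_watts


# Values from the spec sheet above.
PEAK_BF16_TFLOPS = 190.0
TDP_WATTS = 150.0

efficiency = tflops_per_watt(PEAK_BF16_TFLOPS, TDP_WATTS)
print(f"{efficiency:.2f} TFLOPs per Watt")  # prints "1.27 TFLOPs per Watt"
```

Because both inputs are peak or rated values, the result is an upper bound on efficiency rather than a measured figure.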
