AWS Inferentia2

Inferentia Gen2 Architecture

Status: Active

Launched November 2022

Core Specifications

Vendor: AWS

Architecture: Inferentia Gen2

Form Factor: not listed

Memory: 32 GB HBM

Memory Bandwidth: not listed

TDP: 150 W

Precision	TFLOPs
BF16	190

Configuration	Precision	Performance
Stable Diffusion 2.1, 512x512	not listed	1.2 images/s

Configuration	Precision	Result
EC2 inf2.xlarge pricing	not listed	0.12 per million tokens (cost)
LLaMA 13B, optimized for cost-efficiency	not listed	5,500 tokens/s
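The cost row does not state the hourly price or the throughput behind it, and the two rows above come from separate configurations, so they should not be combined directly. Assuming the common definition, in which cost per million tokens is the hourly instance price divided by the tokens produced per hour and then scaled to one million, a minimal Python sketch looks like this. The helper `cost_per_million_tokens` and the placeholder price are hypothetical and are not AWS figures.

```python
# Hypothetical helper: the spec card does not say how its cost figure was derived.
SECONDS_PER_HOUR = 3600
TOKENS_PER_MILLION = 1_000_000


def cost_per_million_tokens(hourly_price: float, tokens_per_second: float) -> float:
    """Cost of generating one million tokens at a steady throughput.

    hourly_price: instance price per hour (any currency; supplied by the caller).
    tokens_per_second: sustained generation throughput.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * SECONDS_PER_HOUR
    return hourly_price * TOKENS_PER_MILLION / tokens_per_hour


if __name__ == "__main__":
    # Placeholder price, NOT a quoted AWS rate; substitute the current on-demand price.
    example_hourly_price = 1.00
    cost = cost_per_million_tokens(example_hourly_price, 5_500)
    print(f"{cost:.4f} per 1M tokens")
```

Because throughput enters the denominator, doubling tokens per second halves the cost per million tokens at a fixed hourly price.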

Peak Performance

190 TFLOPs (BF16)

Efficiency

1.27 TFLOPs per Watt (190 TFLOPs BF16 peak ÷ 150 W TDP)
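The efficiency figure is the peak BF16 throughput divided by the TDP. The Python snippet below reproduces that arithmetic. `tflops_per_watt` is an illustrative helper name. Note that the ratio uses rated TDP, not measured power draw, so the actual per-watt efficiency under a real workload may differ.

```python
def tflops_per_watt(peak_tflops: float, tdp_watts: float) -> float:
    """Peak compute throughput divided by rated power (TDP)."""
    if tdp_watts <= 0:
        raise ValueError("tdp_watts must be positive")
    return peak_tflops / tdp_watts


# Figures from the spec card: 190 TFLOPs (BF16) at a 150 W TDP.
efficiency = tflops_per_watt(190, 150)
print(f"{efficiency:.2f} TFLOPs per Watt")  # prints "1.27 TFLOPs per Watt"
```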
