Nonfiction
Book
Details
PUBLISHED
EDITION
DESCRIPTION
xxvi, 1031 pages : color illustrations ; 24 cm
ISBN/ISSN
LANGUAGE
NOTES
Includes index
Introduction and AI system overview -- AI system hardware overview -- OS, Docker, and Kubernetes tuning for GPU-based environments -- Tuning distributed networking communication -- GPU-based storage I/O optimizations -- GPU architecture, CUDA programming, and maximizing occupancy -- Profiling and tuning GPU memory access patterns -- Occupancy tuning, warp efficiency, and instruction-level parallelism -- Increasing CUDA kernel efficiency and arithmetic intensity -- Intra-kernel pipelining, warp specialization, and cooperative thread block clusters -- Inter-kernel pipelining, synchronization, and CUDA stream-ordered memory allocations -- Dynamic scheduling, CUDA graphs, and device-initiated kernel orchestration -- Profiling, tuning, and scaling PyTorch -- PyTorch compiler, OpenAI Triton, and XLA backends -- Multinode inference, parallelism, decoding, and routing optimizations -- Profiling, debugging, and tuning inference at scale -- Scaling disaggregated prefill and decode for inference -- Advanced prefill-decode and KV cache tuning -- Dynamic and adaptive inference engine optimizations -- AI-assisted performance optimizations and scaling toward multimillion GPU clusters -- Appendix: AI systems performance checklist (175+ items)