AI systems performance engineering : optimizing model training and inference workloads with GPUs, CUDA, and PyTorch
(2025)

Nonfiction

Book

Call Numbers:
006.3/FREGLY,C

0 Holds on 1 Copy

Availability

Locations Call Number Status
Adult Nonfiction 006.3/FREGLY,C Due: 2/10/2026

Details

PUBLISHED
Santa Rosa, CA : O'Reilly Media, Inc., 2025
EDITION
First edition
DESCRIPTION

xxvi, 1031 pages : color illustrations ; 24 cm

ISBN/ISSN
9798341627789, 9788341627780, 8341627787
LANGUAGE
English
NOTES

Includes index

Introduction and AI system overview -- AI system hardware overview -- OS, Docker, and Kubernetes tuning for GPU-based environments -- Tuning distributed networking communication -- GPU-based storage I/O optimizations -- GPU architecture, CUDA programming, and maximizing occupancy -- Profiling and tuning GPU memory access patterns -- Occupancy tuning, warp efficiency, and instruction-level parallelism -- Increasing CUDA kernel efficiency and arithmetic intensity -- Intra-kernel pipelining, warp specialization, and cooperative thread block clusters -- Inter-kernel pipelining, synchronization, and CUDA stream-ordered memory allocations -- Dynamic scheduling, CUDA graphs, and device-initiated kernel orchestration -- Profiling, tuning, and scaling PyTorch -- PyTorch compiler, OpenAI Triton, and XLA backends -- Multinode inference, parallelism, decoding, and routing optimizations -- Profiling, debugging, and tuning inference at scale -- Scaling disaggregated prefill and decode for inference -- Advanced prefill-decode and KV cache tuning -- Dynamic and adaptive inference engine optimizations -- AI-assisted performance optimizations and scaling toward multimillion GPU clusters -- Appendix: AI systems performance checklist (175+ items)
