ROCm for Windows Training
ROCm is an open-source platform for GPU programming that supports AMD GPUs and also provides compatibility with CUDA and OpenCL. ROCm exposes hardware details to the programmer and allows full control over the parallelization process. However, this also requires a good understanding of the device architecture, memory model, execution model, and optimization techniques.
ROCm for Windows is a recent development that allows users to install and use ROCm on the Windows operating system, which is widely used for both personal and professional purposes. ROCm for Windows enables users to harness the power of AMD GPUs for a variety of applications, such as AI, gaming, graphics, and scientific computing.
This instructor-led, live training (online or onsite) is aimed at beginner-level to intermediate-level developers who wish to install and use ROCm on Windows to program AMD GPUs and exploit their parallelism.
By the end of this training, participants will be able to:
- Set up a development environment on Windows that includes the ROCm platform, an AMD GPU, and Visual Studio Code.
- Create a basic ROCm program that performs vector addition on the GPU and retrieves the results from GPU memory.
- Use the ROCm API to query device information, allocate and deallocate device memory, copy data between host and device, launch kernels, and synchronize threads.
- Use the HIP language to write kernels that execute on the GPU and manipulate data.
- Use HIP built-in functions, variables, and libraries to perform common tasks and operations.
- Use ROCm and HIP memory spaces, such as global, shared, constant, and local, to optimize data transfers and memory accesses.
- Use the ROCm and HIP execution models to control the threads, blocks, and grids that define parallelism.
- Debug and test ROCm and HIP programs using tools such as the ROCm Debugger and ROCm Profiler.
- Optimize ROCm and HIP programs using techniques such as coalescing, caching, prefetching, and profiling.
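The first few objectives can be illustrated with a minimal HIP vector-addition sketch. This is an illustrative example, not course material; it assumes a working ROCm/HIP toolchain (compiled with hipcc) and an AMD GPU, so it cannot run on a plain host:

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

// Kernel: each thread adds one pair of elements.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

    float *dA, *dB, *dC;
    hipMalloc(&dA, n * sizeof(float));          // allocate device memory
    hipMalloc(&dB, n * sizeof(float));
    hipMalloc(&dC, n * sizeof(float));
    hipMemcpy(dA, a.data(), n * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(dB, b.data(), n * sizeof(float), hipMemcpyHostToDevice);

    const int block = 256;
    const int grid = (n + block - 1) / block;   // enough blocks to cover n
    hipLaunchKernelGGL(vecAdd, dim3(grid), dim3(block), 0, 0, dA, dB, dC, n);
    hipDeviceSynchronize();                     // wait for the kernel to finish

    hipMemcpy(c.data(), dC, n * sizeof(float), hipMemcpyDeviceToHost);
    printf("c[0] = %f\n", c[0]);

    hipFree(dA); hipFree(dB); hipFree(dC);
    return 0;
}
```

The host allocates, copies in, launches, synchronizes, and copies back — the exact sequence the ROCm API section of the outline covers.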
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
Course Outline
Introduction
- What is ROCm?
- What is HIP?
- ROCm vs. CUDA vs. OpenCL
- Overview of ROCm and HIP features and architecture
- ROCm for Windows vs. ROCm for Linux
Installation
- Installing ROCm on Windows
- Verifying the installation and checking device compatibility
- Updating or uninstalling ROCm on Windows
- Troubleshooting common installation issues
Getting Started
- Creating a new ROCm project with Visual Studio Code on Windows
- Exploring the project structure and files
- Compiling and running the program
- Displaying output with printf and fprintf
ROCm API
- Using the ROCm API in the host program
- Querying device information and capabilities
- Allocating and deallocating device memory
- Copying data between host and device
- Launching kernels and synchronizing threads
- Handling errors and exceptions
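A common idiom that ties the device-query and error-handling topics together is a checking macro around every runtime call. A minimal sketch, assuming a ROCm/HIP toolchain:

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>
#include <cstdlib>

// Check every HIP runtime call and abort with a readable message on failure.
#define HIP_CHECK(expr)                                              \
    do {                                                             \
        hipError_t err = (expr);                                     \
        if (err != hipSuccess) {                                     \
            fprintf(stderr, "HIP error %s at %s:%d\n",               \
                    hipGetErrorString(err), __FILE__, __LINE__);     \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)

int main() {
    int count = 0;
    HIP_CHECK(hipGetDeviceCount(&count));
    for (int i = 0; i < count; ++i) {
        hipDeviceProp_t prop;
        HIP_CHECK(hipGetDeviceProperties(&prop, i));
        printf("Device %d: %s, %zu MB global memory, %d compute units\n",
               i, prop.name, prop.totalGlobalMem >> 20,
               prop.multiProcessorCount);
    }
    return 0;
}
```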
HIP Language
- Using the HIP language in the device program
- Writing kernels that execute on the GPU and manipulate data
- Using data types, qualifiers, operators, and expressions
- Using built-in functions, variables, and libraries
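For instance, a SAXPY kernel (an illustrative sketch, not course code) combines the `__global__` and `__device__` qualifiers, the HIP thread-index variables, and the `fmaf` built-in:

```cpp
#include <hip/hip_runtime.h>

// A __device__ helper is callable only from GPU code.
__device__ float saxpy_elem(float a, float x, float y) {
    return fmaf(a, x, y);   // fused multiply-add built-in
}

// Each thread computes one element of y = a*x + y.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = hipBlockIdx_x * hipBlockDim_x + hipThreadIdx_x;
    if (i < n) y[i] = saxpy_elem(a, x[i], y[i]);
}
```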
ROCm and HIP Memory Model
- Using the different memory spaces, such as global, shared, constant, and local
- Using the different memory objects, such as pointers, arrays, textures, and surfaces
- Using the different memory access modes, such as read-only, write-only, and read-write
- Using the memory consistency model and synchronization mechanisms
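The interplay of global memory, shared memory, and block-level synchronization can be sketched with a per-block sum reduction (illustrative only; assumes a 256-thread block and a power-of-two block size):

```cpp
#include <hip/hip_runtime.h>

// Block-level sum using __shared__ memory; writes one partial sum per block.
__global__ void blockSum(const float* in, float* out, int n) {
    __shared__ float tile[256];                 // shared (LDS) memory, per block
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f; // global -> shared
    __syncthreads();                            // make all loads visible block-wide

    // Tree reduction inside the block, halving the active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();                        // synchronize after each step
    }
    if (threadIdx.x == 0) out[blockIdx.x] = tile[0]; // shared -> global
}
```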
ROCm and HIP Execution Model
- Using the different execution models, such as threads, blocks, and grids
- Using thread functions, such as hipThreadIdx_x, hipBlockIdx_x, and hipBlockDim_x
- Using block functions, such as __syncthreads and __threadfence_block
- Using grid functions, such as hipGridDim_x, hipGridSync, and cooperative groups
Debugging
- Debugging ROCm and HIP programs on Windows
- Using the Visual Studio Code debugger to inspect variables, breakpoints, the call stack, and more
- Using the ROCm Debugger to debug ROCm and HIP programs on AMD devices
- Using the ROCm Profiler to profile ROCm and HIP programs on AMD devices
Optimization
- Optimizing ROCm and HIP programs on Windows
- Using coalescing techniques to improve memory throughput
- Using caching and prefetching techniques to reduce memory latency
- Using shared memory and local memory techniques to optimize memory accesses and bandwidth
- Using profiling and analysis tools to measure and improve execution time and resource utilization
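Coalescing, the first technique listed, is easiest to see by contrasting two copy kernels (an illustrative sketch; the profiler tools above are what would quantify the difference on real hardware):

```cpp
#include <hip/hip_runtime.h>

// Coalesced: consecutive threads read consecutive addresses, so each
// wavefront's loads combine into a few wide memory transactions.
__global__ void copyCoalesced(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Strided: consecutive threads touch addresses `stride` elements apart,
// splitting each wavefront's access into many narrow transactions and
// wasting most of the bandwidth of every cache line fetched.
__global__ void copyStrided(const float* in, float* out, int n, int stride) {
    int i = (blockIdx.x * blockDim.x + threadIdx.x) * stride;
    if (i < n) out[i] = in[i];
}
```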
Summary and Next Steps
Requirements
- An understanding of the C/C++ language and parallel programming concepts
- Basic knowledge of computer architecture and the memory hierarchy
- Experience with command-line tools and code editors
- Familiarity with the Windows operating system and PowerShell
Audience
- Developers who wish to learn how to install and use ROCm on Windows to program AMD GPUs and exploit their parallelism
- Developers who wish to write high-performance and scalable code that can run on different AMD devices
- Programmers who wish to explore the low-level aspects of GPU programming and optimize the performance of their code
Related Courses
Developing AI Applications with Huawei Ascend and CANN
21 hours: Huawei Ascend is a family of AI processors designed for high-performance inference and training.
This instructor-led, live training (online or onsite) is aimed at intermediate-level AI engineers and data scientists who wish to develop and optimize neural network models using Huawei’s Ascend platform and the CANN toolkit.
By the end of this training, participants will be able to:
- Set up and configure the CANN development environment.
- Develop AI applications using MindSpore and CloudMatrix workflows.
- Optimize performance on Ascend NPUs using custom operators and tiling.
- Deploy models to edge or cloud environments.
Format of the Course
- Interactive lecture and discussion.
- Hands-on use of Huawei Ascend and CANN toolkit in sample applications.
- Guided exercises focused on model building, training, and deployment.
Course Customization Options
- To request a customized training for this course based on your infrastructure or datasets, please contact us to arrange.
Deploying AI Models with CANN and Ascend AI Processors
14 hours: CANN (Compute Architecture for Neural Networks) is Huawei’s AI compute stack for deploying and optimizing AI models on Ascend AI processors.
This instructor-led, live training (online or onsite) is aimed at intermediate-level AI developers and engineers who wish to deploy trained AI models efficiently to Huawei Ascend hardware using the CANN toolkit and tools such as MindSpore, TensorFlow, or PyTorch.
By the end of this training, participants will be able to:
- Understand the CANN architecture and its role in the AI deployment pipeline.
- Convert and adapt models from popular frameworks to Ascend-compatible formats.
- Use tools like ATC, OM model conversion, and MindSpore for edge and cloud inference.
- Diagnose deployment issues and optimize performance on Ascend hardware.
Format of the Course
- Interactive lecture and demonstration.
- Hands-on lab work using CANN tools and Ascend simulators or devices.
- Practical deployment scenarios based on real-world AI models.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
GPU Programming on Biren AI Accelerators
21 hours: Biren AI Accelerators are high-performance GPUs designed for AI and HPC workloads with support for large-scale training and inference.
This instructor-led, live training (online or onsite) is aimed at intermediate-level to advanced-level developers who wish to program and optimize applications using Biren’s proprietary GPU stack, with practical comparisons to CUDA-based environments.
By the end of this training, participants will be able to:
- Understand Biren GPU architecture and memory hierarchy.
- Set up the development environment and use Biren’s programming model.
- Translate and optimize CUDA-style code for Biren platforms.
- Apply performance tuning and debugging techniques.
Format of the Course
- Interactive lecture and discussion.
- Hands-on use of Biren SDK in sample GPU workloads.
- Guided exercises focused on porting and performance tuning.
Course Customization Options
- To request a customized training for this course based on your application stack or integration needs, please contact us to arrange.
Cambricon MLU Development with BANGPy and Neuware
21 hours: Cambricon MLUs (Machine Learning Units) are AI chips optimized for inference and training in edge and data center scenarios.
This instructor-led, live training (online or onsite) is aimed at intermediate-level developers who wish to build and deploy AI models on Cambricon MLU hardware using the BANGPy framework and the Neuware SDK.
By the end of this training, participants will be able to:
- Set up and configure the BANGPy and Neuware development environments.
- Develop and optimize Python- and C++-based models for Cambricon MLUs.
- Deploy models to edge and data center devices running the Neuware runtime.
- Integrate ML workflows with MLU-specific acceleration features.
Format of the Course
- Interactive lecture and discussion.
- Hands-on use of BANGPy and Neuware for development and deployment.
- Guided exercises focused on optimization, integration, and testing.
Course Customization Options
- To request a customized training for this course based on your Cambricon device model or use case, please contact us to arrange.
Introduction to CANN for AI Framework Developers
7 hours: CANN (Compute Architecture for Neural Networks) is Huawei’s AI computing toolkit used to compile, optimize, and deploy AI models on Ascend AI processors.
This instructor-led, live training (online or onsite) is aimed at beginner-level AI developers who wish to understand how CANN fits into the model lifecycle from training to deployment, and how it works with frameworks like MindSpore, TensorFlow, and PyTorch.
By the end of this training, participants will be able to:
- Understand the purpose and architecture of the CANN toolkit.
- Set up a development environment with CANN and MindSpore.
- Convert and deploy a simple AI model to Ascend hardware.
- Gain foundational knowledge for future CANN optimization or integration projects.
Format of the Course
- Interactive lecture and discussion.
- Hands-on labs with simple model deployment.
- Step-by-step walkthrough of the CANN toolchain and integration points.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
CANN for Edge AI Deployment
14 hours: Huawei's Ascend CANN toolkit enables powerful AI inference on edge devices such as the Ascend 310. CANN provides essential tools for compiling, optimizing, and deploying models where compute and memory are constrained.
This instructor-led, live training (online or onsite) is aimed at intermediate-level AI developers and integrators who wish to deploy and optimize models on Ascend edge devices using the CANN toolchain.
By the end of this training, participants will be able to:
- Prepare and convert AI models for Ascend 310 using CANN tools.
- Build lightweight inference pipelines using MindSpore Lite and AscendCL.
- Optimize model performance for limited compute and memory environments.
- Deploy and monitor AI applications in real-world edge use cases.
Format of the Course
- Interactive lecture and demonstration.
- Hands-on lab work with edge-specific models and scenarios.
- Live deployment examples on virtual or physical edge hardware.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
Understanding Huawei’s AI Compute Stack: From CANN to MindSpore
14 hours: Huawei’s AI stack — from the low-level CANN SDK to the high-level MindSpore framework — offers a tightly integrated AI development and deployment environment optimized for Ascend hardware.
This instructor-led, live training (online or onsite) is aimed at beginner-level to intermediate-level technical professionals who wish to understand how the CANN and MindSpore components work together to support AI lifecycle management and infrastructure decisions.
By the end of this training, participants will be able to:
- Understand the layered architecture of Huawei’s AI compute stack.
- Identify how CANN supports model optimization and hardware-level deployment.
- Evaluate the MindSpore framework and toolchain in relation to industry alternatives.
- Position Huawei's AI stack within enterprise or cloud/on-prem environments.
Format of the Course
- Interactive lecture and discussion.
- Live system demos and case-based walkthroughs.
- Optional guided labs on model flow from MindSpore to CANN.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
Optimizing Neural Network Performance with CANN SDK
14 hours: CANN SDK (Compute Architecture for Neural Networks) is Huawei’s AI compute foundation that allows developers to fine-tune and optimize the performance of deployed neural networks on Ascend AI processors.
This instructor-led, live training (online or onsite) is aimed at advanced-level AI developers and system engineers who wish to optimize inference performance using CANN’s advanced toolset, including the Graph Engine, TIK, and custom operator development.
By the end of this training, participants will be able to:
- Understand CANN's runtime architecture and performance lifecycle.
- Use profiling tools and Graph Engine for performance analysis and optimization.
- Create and optimize custom operators using TIK and TVM.
- Resolve memory bottlenecks and improve model throughput.
Format of the Course
- Interactive lecture and discussion.
- Hands-on labs with real-time profiling and operator tuning.
- Optimization exercises using edge-case deployment examples.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
CANN SDK for Computer Vision and NLP Pipelines
14 hours: The CANN SDK (Compute Architecture for Neural Networks) provides powerful deployment and optimization tools for real-time AI applications in computer vision and NLP, especially on Huawei Ascend hardware.
This instructor-led, live training (online or onsite) is aimed at intermediate-level AI practitioners who wish to build, deploy, and optimize vision and language models using the CANN SDK for production use cases.
By the end of this training, participants will be able to:
- Deploy and optimize CV and NLP models using CANN and AscendCL.
- Use CANN tools to convert models and integrate them into live pipelines.
- Optimize inference performance for tasks like detection, classification, and sentiment analysis.
- Build real-time CV/NLP pipelines for edge or cloud-based deployment scenarios.
Format of the Course
- Interactive lecture and demonstration.
- Hands-on lab with model deployment and performance profiling.
- Live pipeline design using real CV and NLP use cases.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
Building Custom AI Operators with CANN TIK and TVM
14 hours: CANN TIK (Tensor Instruction Kernel) and Apache TVM enable advanced optimization and customization of AI model operators for Huawei Ascend hardware.
This instructor-led, live training (online or onsite) is aimed at advanced-level system developers who wish to build, deploy, and tune custom operators for AI models using CANN’s TIK programming model and TVM compiler integration.
By the end of this training, participants will be able to:
- Write and test custom AI operators using the TIK DSL for Ascend processors.
- Integrate custom ops into the CANN runtime and execution graph.
- Use TVM for operator scheduling, auto-tuning, and benchmarking.
- Debug and optimize instruction-level performance for custom computation patterns.
Format of the Course
- Interactive lecture and demonstration.
- Hands-on coding of operators using TIK and TVM pipelines.
- Testing and tuning on Ascend hardware or simulators.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
Migrating CUDA Applications to Chinese GPU Architectures
21 hours: Chinese GPU architectures such as Huawei Ascend, Biren, and Cambricon MLU offer CUDA alternatives tailored for local AI and HPC markets.
This instructor-led, live training (online or onsite) is aimed at advanced-level GPU programmers and infrastructure specialists who wish to migrate and optimize existing CUDA applications for deployment on Chinese hardware platforms.
By the end of this training, participants will be able to:
- Evaluate the compatibility of existing CUDA workloads with Chinese chip alternatives.
- Port CUDA codebases to Huawei CANN, Biren SDK, and Cambricon BANGPy environments.
- Compare performance and identify optimization points across platforms.
- Address practical challenges in cross-architecture support and deployment.
Format of the Course
- Interactive lecture and discussion.
- Hands-on code translation and performance comparison labs.
- Guided exercises focused on multi-GPU adaptation strategies.
Course Customization Options
- To request a customized training based on your platform or CUDA project, please contact us to arrange.
Performance Optimization on Ascend, Biren, and Cambricon
21 hours: Ascend, Biren, and Cambricon are leading AI hardware platforms in China, each offering unique acceleration and profiling tools for production-scale AI workloads.
This instructor-led, live training (online or onsite) is aimed at advanced-level AI infrastructure and performance engineers who wish to optimize model inference and training workflows across multiple Chinese AI chip platforms.
By the end of this training, participants will be able to:
- Benchmark models on Ascend, Biren, and Cambricon platforms.
- Identify system bottlenecks and memory/compute inefficiencies.
- Apply graph-level, kernel-level, and operator-level optimizations.
- Tune deployment pipelines for higher throughput and lower latency.
Format of the Course
- Interactive lecture and discussion.
- Hands-on use of profiling and optimization tools on each platform.
- Guided exercises focused on practical tuning scenarios.
Course Customization Options
- To request a customized training for this course based on your performance environment or model type, please contact us to arrange.