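Introduction to GPU Programming Training
GPU programming is a technique that uses the parallel processing power of GPUs to accelerate applications that need high-performance computing, such as artificial intelligence, gaming, graphics, and scientific computing. Several frameworks and tools enable GPU programming, each with its own strengths and weaknesses. Among the most popular are OpenCL, CUDA, ROCm, and HIP.
This instructor-led, live training (online or onsite) is aimed at beginner-level to intermediate-level developers who wish to learn the basics of GPU programming and the main frameworks and tools for developing GPU applications.
By the end of this training, participants will be able to:
- Understand the difference between CPU and GPU computing and the benefits and challenges of GPU programming.
- Choose the right framework and tools for their GPU application.
- Create a basic GPU program that performs vector addition using one or more of the frameworks and tools.
- Use the respective APIs, languages, and libraries to query device information, allocate and deallocate device memory, copy data between host and device, launch kernels, and synchronize threads.
- Use the respective memory spaces, such as global, local, constant, and private, to optimize data transfers and memory accesses.
- Use the respective execution models, such as work-items, work-groups, threads, blocks, and grids, to control parallelism.
- Debug and test GPU programs using tools such as CodeXL, CUDA-GDB, CUDA-MEMCHECK, and NVIDIA Nsight.
- Optimize GPU programs using techniques such as coalescing, caching, prefetching, and profiling.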
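As a rough illustration of the vector-addition objective, the sketch below emulates a kernel launch on the CPU in plain Python. It is not GPU code and is not taken from the course material: the names `vector_add_kernel`, `launch`, and `THREADS_PER_BLOCK` are hypothetical, and the block size of 4 is an assumption chosen only to keep the example small. It shows how each thread derives a global index from its block and thread indices, and why a bounds check is needed when the element count is not a multiple of the block size.

```python
# CPU-only emulation of a GPU vector-addition launch (illustrative; no GPU is used).
THREADS_PER_BLOCK = 4  # assumed block size, kept tiny for readability


def vector_add_kernel(block_idx, thread_idx, block_dim, a, b, out):
    """Body executed by one emulated thread."""
    i = block_idx * block_dim + thread_idx  # global index of this thread
    if i < len(out):  # guard: the last block may contain surplus threads
        out[i] = a[i] + b[i]


def launch(a, b):
    """Emulate launching enough blocks to cover every element."""
    n = len(a)
    out = [0.0] * n
    # Ceiling division so that a partial final block is still launched.
    num_blocks = (n + THREADS_PER_BLOCK - 1) // THREADS_PER_BLOCK
    for block_idx in range(num_blocks):  # on a GPU, blocks run in parallel
        for thread_idx in range(THREADS_PER_BLOCK):  # as do threads in a block
            vector_add_kernel(block_idx, thread_idx, THREADS_PER_BLOCK, a, b, out)
    return out


if __name__ == "__main__":
    print(launch([1.0, 2.0, 3.0, 4.0, 5.0], [10.0, 20.0, 30.0, 40.0, 50.0]))
```

With five elements and a block size of 4, two blocks are launched and three threads of the second block do nothing because of the bounds check.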
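Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.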
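Course Outline
Introduction
- What is GPU programming?
- Why use GPU programming?
- What are the challenges and trade-offs of GPU programming?
- What are the frameworks and tools for GPU programming?
- Choosing the right framework and tools for your application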
OpenCL
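- What is OpenCL?
- What are the advantages and disadvantages of OpenCL?
- Setting up the development environment for OpenCL
- Creating a basic OpenCL program that performs vector addition
- Using the OpenCL API to query device information, allocate and deallocate device memory, copy data between host and device, launch kernels, and synchronize threads
- Using the OpenCL C language to write kernels that execute on the device and manipulate data
- Using OpenCL built-in functions, variables, and libraries to perform common tasks and operations
- Using OpenCL memory spaces, such as global, local, constant, and private, to optimize data transfers and memory accesses
- Using the OpenCL execution model to control the work-items, work-groups, and ND-ranges that define the parallelism
- Debugging and testing OpenCL programs using tools such as CodeXL
- Optimizing OpenCL programs using techniques such as coalescing, caching, prefetching, and profiling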
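The relationship between work-items, work-groups, and the ND-range can be sketched on the CPU. The Python snippet below is only an illustration, not OpenCL code: the helper `global_ids` is hypothetical, and it assumes a one-dimensional range, a zero global offset, and a global size that is an exact multiple of the work-group size. Under those assumptions it reproduces the identity that `get_global_id(0)` equals `get_group_id(0) * get_local_size(0) + get_local_id(0)`.

```python
def global_ids(global_size, local_size):
    """Return (group_id, local_id, global_id) triples for a 1-D ND-range.

    Assumes a zero global offset and uniform work-groups (illustrative only).
    """
    if global_size % local_size != 0:
        raise ValueError("this sketch requires global_size to be a multiple of local_size")
    ids = []
    for group_id in range(global_size // local_size):  # one pass per work-group
        for local_id in range(local_size):  # one pass per work-item in the group
            ids.append((group_id, local_id, group_id * local_size + local_id))
    return ids


if __name__ == "__main__":
    for triple in global_ids(8, 4):
        print(triple)
```

Every global ID from 0 to `global_size - 1` appears exactly once, which is what lets a kernel use the global ID directly as an array index.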
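CUDA
- What is CUDA?
- What are the advantages and disadvantages of CUDA?
- Setting up the development environment for CUDA
- Creating a basic CUDA program that performs vector addition
- Using the CUDA API to query device information, allocate and deallocate device memory, copy data between host and device, launch kernels, and synchronize threads
- Using the CUDA C/C++ language to write kernels that execute on the device and manipulate data
- Using CUDA built-in functions, variables, and libraries to perform common tasks and operations
- Using CUDA memory spaces, such as global, shared, constant, and local, to optimize data transfers and memory accesses
- Using the CUDA execution model to control the threads, blocks, and grids that define the parallelism
- Debugging and testing CUDA programs using tools such as CUDA-GDB, CUDA-MEMCHECK, and NVIDIA Nsight
- Optimizing CUDA programs using techniques such as coalescing, caching, prefetching, and profiling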
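To make the idea of coalescing more concrete, the following Python sketch counts how many memory segments a single warp would touch for a given access stride. It is a deliberately simplified model rather than a description of any specific GPU: the warp size of 32, the 128-byte segment size, the 4-byte element size, and the function name `segments_touched` are all assumptions made for illustration, and real transaction granularity varies by architecture.

```python
WARP_SIZE = 32       # assumed number of threads that access memory together
SEGMENT_BYTES = 128  # assumed size of one memory transaction
ELEMENT_BYTES = 4    # e.g. one 32-bit float per thread


def segments_touched(stride, base=0):
    """Count distinct segments hit when thread t reads element base + t * stride."""
    addresses = [(base + t * stride) * ELEMENT_BYTES for t in range(WARP_SIZE)]
    return len({address // SEGMENT_BYTES for address in addresses})


if __name__ == "__main__":
    for stride in (1, 2, 32):
        print(stride, segments_touched(stride))
```

In this model, consecutive accesses (stride 1) from an aligned base fall into a single segment, while a stride of 32 elements spreads the warp over 32 separate segments. This is the kind of access pattern that coalescing-oriented optimization tries to avoid.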
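ROCm
- What is ROCm?
- What are the advantages and disadvantages of ROCm?
- Setting up the development environment for ROCm
- Creating a basic ROCm program that performs vector addition
- Using the ROCm API to query device information, allocate and deallocate device memory, copy data between host and device, launch kernels, and synchronize threads
- Using the C/C++ language with ROCm to write kernels that execute on the device and manipulate data
- Using ROCm built-in functions, variables, and libraries to perform common tasks and operations
- Using ROCm memory spaces, such as global, local, constant, and private, to optimize data transfers and memory accesses
- Using the ROCm execution model to control the threads, blocks, and grids that define the parallelism
- Debugging and testing ROCm programs using tools such as ROCm Debugger and ROCm Profiler
- Optimizing ROCm programs using techniques such as coalescing, caching, prefetching, and profiling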
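HIP
- What is HIP?
- What are the advantages and disadvantages of HIP?
- Setting up the development environment for HIP
- Creating a basic HIP program that performs vector addition
- Using the HIP language to write kernels that execute on the device and manipulate data
- Using HIP built-in functions, variables, and libraries to perform common tasks and operations
- Using HIP memory spaces, such as global, shared, constant, and local, to optimize data transfers and memory accesses
- Using the HIP execution model to control the threads, blocks, and grids that define the parallelism
- Debugging and testing HIP programs using tools such as ROCm Debugger and ROCm Profiler
- Optimizing HIP programs using techniques such as coalescing, caching, prefetching, and profiling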
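Comparison
- Comparing the features, performance, and compatibility of OpenCL, CUDA, ROCm, and HIP
- Evaluating GPU programs using benchmarks and metrics
- Learning best practices and tips for GPU programming
- Exploring current and future trends and challenges in GPU programming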
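One metric commonly used when evaluating memory-bound kernels is effective bandwidth, the number of bytes read and written divided by the elapsed time. The Python sketch below computes it for a vector addition, which reads two input arrays and writes one output array. The function name `effective_bandwidth_gbps` is hypothetical, and the timing in the usage line is a made-up placeholder rather than a measurement.

```python
def effective_bandwidth_gbps(n_elements, element_bytes, seconds):
    """Effective bandwidth in GB/s for c[i] = a[i] + b[i] over n_elements."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    bytes_moved = 3 * n_elements * element_bytes  # two reads plus one write per element
    return bytes_moved / seconds / 1e9


if __name__ == "__main__":
    # Placeholder numbers: 1,000,000 floats of 4 bytes each, timed at 0.5 seconds.
    print(effective_bandwidth_gbps(1_000_000, 4, 0.5))
```

Comparing this figure against the peak memory bandwidth quoted for a device gives a rough sense of how close a kernel is to the hardware limit.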
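Summary and Next Steps
Requirements
- An understanding of the C/C++ language and parallel programming concepts
- Basic knowledge of computer architecture and the memory hierarchy
- Experience with command-line tools and code editors
Audience
- Developers who wish to learn the basics of GPU programming and the main frameworks and tools for developing GPU applications
- Developers who wish to write portable and scalable code that can run on different platforms and devices
- Programmers who wish to explore the benefits and challenges of GPU programming and optimization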
Open Training Courses require 5+ participants.
Related Courses
Developing AI Applications with Huawei Ascend and CANN
21 hours: Huawei Ascend is a family of AI processors designed for high-performance inference and training.
This instructor-led, live training (online or onsite) is aimed at intermediate-level AI engineers and data scientists who wish to develop and optimize neural network models using Huawei’s Ascend platform and the CANN toolkit.
By the end of this training, participants will be able to:
- Set up and configure the CANN development environment.
- Develop AI applications using MindSpore and CloudMatrix workflows.
- Optimize performance on Ascend NPUs using custom operators and tiling.
- Deploy models to edge or cloud environments.
Format of the Course
- Interactive lecture and discussion.
- Hands-on use of Huawei Ascend and CANN toolkit in sample applications.
- Guided exercises focused on model building, training, and deployment.
Course Customization Options
- To request a customized training for this course based on your infrastructure or datasets, please contact us to arrange.
Deploying AI Models with CANN and Ascend AI Processors
14 hours: CANN (Compute Architecture for Neural Networks) is Huawei’s AI compute stack for deploying and optimizing AI models on Ascend AI processors.
This instructor-led, live training (online or onsite) is aimed at intermediate-level AI developers and engineers who wish to deploy trained AI models efficiently to Huawei Ascend hardware using the CANN toolkit and tools such as MindSpore, TensorFlow, or PyTorch.
By the end of this training, participants will be able to:
- Understand the CANN architecture and its role in the AI deployment pipeline.
- Convert and adapt models from popular frameworks to Ascend-compatible formats.
- Use tools like ATC, OM model conversion, and MindSpore for edge and cloud inference.
- Diagnose deployment issues and optimize performance on Ascend hardware.
Format of the Course
- Interactive lecture and demonstration.
- Hands-on lab work using CANN tools and Ascend simulators or devices.
- Practical deployment scenarios based on real-world AI models.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
GPU Programming on Biren AI Accelerators
21 hours: Biren AI Accelerators are high-performance GPUs designed for AI and HPC workloads with support for large-scale training and inference.
This instructor-led, live training (online or onsite) is aimed at intermediate-level to advanced-level developers who wish to program and optimize applications using Biren’s proprietary GPU stack, with practical comparisons to CUDA-based environments.
By the end of this training, participants will be able to:
- Understand Biren GPU architecture and memory hierarchy.
- Set up the development environment and use Biren’s programming model.
- Translate and optimize CUDA-style code for Biren platforms.
- Apply performance tuning and debugging techniques.
Format of the Course
- Interactive lecture and discussion.
- Hands-on use of Biren SDK in sample GPU workloads.
- Guided exercises focused on porting and performance tuning.
Course Customization Options
- To request a customized training for this course based on your application stack or integration needs, please contact us to arrange.
Cambricon MLU Development with BANGPy and Neuware
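21 hours: Cambricon MLUs (Machine Learning Units) are AI chips optimized for inference and training in edge and data center scenarios.
This instructor-led, live training (online or onsite) is aimed at intermediate-level developers who wish to build and deploy AI models on Cambricon MLU hardware using the BANGPy framework and the Neuware SDK.
By the end of this training, participants will be able to:
- Set up and configure the BANGPy and Neuware development environments.
- Develop and optimize Python- and C++-based models for Cambricon MLUs.
- Deploy models to edge and data center devices running the Neuware runtime.
- Integrate ML workflows with MLU-specific acceleration features.
Format of the Course
- Interactive lecture and discussion.
- Hands-on use of BANGPy and Neuware for development and deployment.
- Guided exercises focused on optimization, integration, and testing.
Course Customization Options
- To request a customized training for this course based on your Cambricon device model or use case, please contact us to arrange.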
Introduction to CANN for AI Framework Developers
7 hours: CANN (Compute Architecture for Neural Networks) is Huawei’s AI computing toolkit used to compile, optimize, and deploy AI models on Ascend AI processors.
This instructor-led, live training (online or onsite) is aimed at beginner-level AI developers who wish to understand how CANN fits into the model lifecycle from training to deployment, and how it works with frameworks like MindSpore, TensorFlow, and PyTorch.
By the end of this training, participants will be able to:
- Understand the purpose and architecture of the CANN toolkit.
- Set up a development environment with CANN and MindSpore.
- Convert and deploy a simple AI model to Ascend hardware.
- Gain foundational knowledge for future CANN optimization or integration projects.
Format of the Course
- Interactive lecture and discussion.
- Hands-on labs with simple model deployment.
- Step-by-step walkthrough of the CANN toolchain and integration points.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
CANN for Edge AI Deployment
14 hours: Huawei's Ascend CANN toolkit enables powerful AI inference on edge devices such as the Ascend 310. CANN provides essential tools for compiling, optimizing, and deploying models where compute and memory are constrained.
This instructor-led, live training (online or onsite) is aimed at intermediate-level AI developers and integrators who wish to deploy and optimize models on Ascend edge devices using the CANN toolchain.
By the end of this training, participants will be able to:
- Prepare and convert AI models for Ascend 310 using CANN tools.
- Build lightweight inference pipelines using MindSpore Lite and AscendCL.
- Optimize model performance for limited compute and memory environments.
- Deploy and monitor AI applications in real-world edge use cases.
Format of the Course
- Interactive lecture and demonstration.
- Hands-on lab work with edge-specific models and scenarios.
- Live deployment examples on virtual or physical edge hardware.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
Understanding Huawei’s AI Compute Stack: From CANN to MindSpore
14 hours: Huawei’s AI stack, from the low-level CANN SDK to the high-level MindSpore framework, offers a tightly integrated AI development and deployment environment optimized for Ascend hardware.
This instructor-led, live training (online or onsite) is aimed at beginner-level to intermediate-level technical professionals who wish to understand how the CANN and MindSpore components work together to support AI lifecycle management and infrastructure decisions.
By the end of this training, participants will be able to:
- Understand the layered architecture of Huawei’s AI compute stack.
- Identify how CANN supports model optimization and hardware-level deployment.
- Evaluate the MindSpore framework and toolchain in relation to industry alternatives.
- Position Huawei's AI stack within enterprise or cloud/on-prem environments.
Format of the Course
- Interactive lecture and discussion.
- Live system demos and case-based walkthroughs.
- Optional guided labs on model flow from MindSpore to CANN.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
Optimizing Neural Network Performance with CANN SDK
14 hours: CANN SDK (Compute Architecture for Neural Networks) is Huawei’s AI compute foundation that allows developers to fine-tune and optimize the performance of deployed neural networks on Ascend AI processors.
This instructor-led, live training (online or onsite) is aimed at advanced-level AI developers and system engineers who wish to optimize inference performance using CANN’s advanced toolset, including the Graph Engine, TIK, and custom operator development.
By the end of this training, participants will be able to:
- Understand CANN's runtime architecture and performance lifecycle.
- Use profiling tools and Graph Engine for performance analysis and optimization.
- Create and optimize custom operators using TIK and TVM.
- Resolve memory bottlenecks and improve model throughput.
Format of the Course
- Interactive lecture and discussion.
- Hands-on labs with real-time profiling and operator tuning.
- Optimization exercises using edge-case deployment examples.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
CANN SDK for Computer Vision and NLP Pipelines
14 hours: The CANN SDK (Compute Architecture for Neural Networks) provides powerful deployment and optimization tools for real-time AI applications in computer vision and NLP, especially on Huawei Ascend hardware.
This instructor-led, live training (online or onsite) is aimed at intermediate-level AI practitioners who wish to build, deploy, and optimize vision and language models using the CANN SDK for production use cases.
By the end of this training, participants will be able to:
- Deploy and optimize CV and NLP models using CANN and AscendCL.
- Use CANN tools to convert models and integrate them into live pipelines.
- Optimize inference performance for tasks like detection, classification, and sentiment analysis.
- Build real-time CV/NLP pipelines for edge or cloud-based deployment scenarios.
Format of the Course
- Interactive lecture and demonstration.
- Hands-on lab with model deployment and performance profiling.
- Live pipeline design using real CV and NLP use cases.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
Building Custom AI Operators with CANN TIK and TVM
14 hours: CANN TIK (Tensor Instruction Kernel) and Apache TVM enable advanced optimization and customization of AI model operators for Huawei Ascend hardware.
This instructor-led, live training (online or onsite) is aimed at advanced-level system developers who wish to build, deploy, and tune custom operators for AI models using CANN’s TIK programming model and TVM compiler integration.
By the end of this training, participants will be able to:
- Write and test custom AI operators using the TIK DSL for Ascend processors.
- Integrate custom ops into the CANN runtime and execution graph.
- Use TVM for operator scheduling, auto-tuning, and benchmarking.
- Debug and optimize instruction-level performance for custom computation patterns.
Format of the Course
- Interactive lecture and demonstration.
- Hands-on coding of operators using TIK and TVM pipelines.
- Testing and tuning on Ascend hardware or simulators.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
Migrating CUDA Applications to Chinese GPU Architectures
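21 hours: Chinese GPU architectures such as Huawei Ascend, Biren, and Cambricon MLU offer CUDA alternatives tailored to the local AI and HPC markets.
This instructor-led, live training (online or onsite) is aimed at advanced-level GPU programmers and infrastructure specialists who wish to migrate and optimize existing CUDA applications for deployment on Chinese hardware platforms.
By the end of this training, participants will be able to:
- Evaluate the compatibility of existing CUDA workloads with Chinese chip alternatives.
- Port CUDA codebases to Huawei CANN, Biren SDK, and Cambricon BANGPy environments.
- Compare performance and identify optimization points across platforms.
- Address practical challenges in cross-architecture support and deployment.
Format of the Course
- Interactive lecture and discussion.
- Hands-on code translation and performance comparison labs.
- Guided exercises focused on multi-GPU adaptation strategies.
Course Customization Options
- To request a customized training for this course based on your platform or CUDA project, please contact us to arrange.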
Performance Optimization on Ascend, Biren, and Cambricon
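21 hours: Ascend, Biren, and Cambricon are leading AI hardware platforms in China, each offering its own acceleration and profiling tools for production-scale AI workloads.
This instructor-led, live training (online or onsite) is aimed at advanced-level AI infrastructure and performance engineers who wish to optimize model inference and training workflows across multiple Chinese AI chip platforms.
By the end of this training, participants will be able to:
- Benchmark models on Ascend, Biren, and Cambricon platforms.
- Identify system bottlenecks and memory/compute inefficiencies.
- Apply graph-level, kernel-level, and operator-level optimizations.
- Tune deployment pipelines to improve throughput and reduce latency.
Format of the Course
- Interactive lecture and discussion.
- Hands-on use of profiling and optimization tools on each platform.
- Guided exercises focused on practical tuning scenarios.
Course Customization Options
- To request a customized training for this course based on your performance environment or model type, please contact us to arrange.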