GPU Programming with OpenACC Training
OpenACC is an open standard for heterogeneous programming that enables code to run on different platforms and devices, such as multicore CPUs, GPUs, and FPGAs. OpenACC provides a high-level abstraction that lets programmers annotate code with directives and clauses without modifying the original code structure or syntax. OpenACC handles the details of parallelization, data movement, and optimization for the target device while preserving the portability and readability of the code.
This instructor-led, live training (online or onsite) is aimed at beginner-level to intermediate-level developers who wish to use OpenACC to program heterogeneous devices and exploit their parallelism.
By the end of this training, participants will be able to:
- Set up a development environment that includes the OpenACC SDK, a device that supports OpenACC, and Visual Studio Code.
- Create a basic OpenACC program that performs vector addition on the device and retrieves the results from device memory.
- Annotate code with OpenACC directives and clauses to specify parallel regions, data movement, and optimization options.
- Use the OpenACC API to query device information, set the device number, handle errors, and synchronize events.
- Integrate OpenACC with other programming models, such as CUDA, OpenMP, and MPI, using OpenACC libraries and interoperability features.
- Profile and debug OpenACC programs with OpenACC tools and identify performance bottlenecks and opportunities.
- Optimize OpenACC programs with techniques such as data locality, loop fusion, kernel fusion, and auto-tuning.
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
Course Outline
Introduction
- What is OpenACC?
- OpenACC vs. OpenCL vs. CUDA vs. SYCL
- Overview of OpenACC features and architecture
- Setting up the development environment
Getting Started
- Creating a new OpenACC project with Visual Studio Code
- Exploring the project structure and files
- Compiling and running the program
- Displaying output with printf and fprintf
OpenACC Directives and Clauses
- Understanding the role of OpenACC directives and clauses in host and device code
- Using the OpenACC parallel directive and clauses to create parallel regions and specify the number of gangs, workers, and vectors
- Using the OpenACC kernels directive and clauses to create kernel regions and let the compiler decide the parallelism
- Using the OpenACC loop directive and clauses to parallelize loops and specify loop scheduling, collapsing, reduction, and tiling
- Using the OpenACC data directive and clauses to manage data movement and data regions
- Using the OpenACC update directive and clauses to synchronize data between the host and the device
- Using the OpenACC cache directive and clauses to improve data reuse and locality
- Using the OpenACC routine directive and clauses to create device functions and specify the function type and vector length
- Using the OpenACC wait directive and clauses to synchronize events and dependencies
OpenACC API
- Understanding the role of the OpenACC API in host programs
- Using the OpenACC API to query device information and capabilities
- Using the OpenACC API to set the device number and device type
- Using the OpenACC API to handle errors and exceptions
- Using the OpenACC API to create and synchronize events
OpenACC Libraries and Interoperability
- Understanding the role of OpenACC libraries and interoperability features in device programs
- Using OpenACC libraries, such as the math, random, and complex libraries, to perform common tasks and operations
- Using OpenACC interoperability features, such as deviceptr, use_device, and acc_memcpy, to integrate OpenACC with other programming models, such as CUDA, OpenMP, and MPI
- Using OpenACC interoperability features, such as host_data and declare, to integrate OpenACC with GPU libraries, such as cuBLAS and cuFFT
OpenACC Tools
- Understanding the role of OpenACC tools in the development process
- Using OpenACC tools to profile and debug OpenACC programs and identify performance bottlenecks and opportunities
- Using OpenACC tools, such as the PGI compilers, NVIDIA Nsight Systems, and Allinea Forge, to measure and improve execution time and resource utilization
Optimization
- Understanding the factors that affect the performance of OpenACC programs
- Using OpenACC directives and clauses to optimize data locality and reduce data transfers
- Using OpenACC directives and clauses to optimize loop parallelism and fusion
- Using OpenACC directives and clauses to optimize kernel parallelism and fusion
- Using OpenACC directives and clauses to optimize vectorization and auto-tuning
Summary and Next Steps
Requirements
- An understanding of C/C++ or Fortran and of parallel programming concepts
- Basic knowledge of computer architecture and memory hierarchy
- Experience with command-line tools and code editors
Audience
- Developers who wish to learn how to use OpenACC to program heterogeneous devices and exploit their parallelism
- Developers who wish to write portable and scalable code that can run on different platforms and devices
- Programmers who wish to explore advanced aspects of heterogeneous programming and optimize the productivity of their code
Open Training Courses require 5+ participants.
Related Courses
Developing AI Applications with Huawei Ascend and CANN
21 hours: Huawei Ascend is a family of AI processors designed for high-performance inference and training.
This instructor-led, live training (online or onsite) is aimed at intermediate-level AI engineers and data scientists who wish to develop and optimize neural network models using Huawei’s Ascend platform and the CANN toolkit.
By the end of this training, participants will be able to:
- Set up and configure the CANN development environment.
- Develop AI applications using MindSpore and CloudMatrix workflows.
- Optimize performance on Ascend NPUs using custom operators and tiling.
- Deploy models to edge or cloud environments.
Format of the Course
- Interactive lecture and discussion.
- Hands-on use of Huawei Ascend and CANN toolkit in sample applications.
- Guided exercises focused on model building, training, and deployment.
Course Customization Options
- To request a customized training for this course based on your infrastructure or datasets, please contact us to arrange.
Deploying AI Models with CANN and Ascend AI Processors
14 hours: CANN (Compute Architecture for Neural Networks) is Huawei’s AI compute stack for deploying and optimizing AI models on Ascend AI processors.
This instructor-led, live training (online or onsite) is aimed at intermediate-level AI developers and engineers who wish to deploy trained AI models efficiently to Huawei Ascend hardware using the CANN toolkit and tools such as MindSpore, TensorFlow, or PyTorch.
By the end of this training, participants will be able to:
- Understand the CANN architecture and its role in the AI deployment pipeline.
- Convert and adapt models from popular frameworks to Ascend-compatible formats.
- Use tools like ATC, OM model conversion, and MindSpore for edge and cloud inference.
- Diagnose deployment issues and optimize performance on Ascend hardware.
Format of the Course
- Interactive lecture and demonstration.
- Hands-on lab work using CANN tools and Ascend simulators or devices.
- Practical deployment scenarios based on real-world AI models.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
GPU Programming on Biren AI Accelerators
21 hours: Biren AI Accelerators are high-performance GPUs designed for AI and HPC workloads with support for large-scale training and inference.
This instructor-led, live training (online or onsite) is aimed at intermediate-level to advanced-level developers who wish to program and optimize applications using Biren’s proprietary GPU stack, with practical comparisons to CUDA-based environments.
By the end of this training, participants will be able to:
- Understand Biren GPU architecture and memory hierarchy.
- Set up the development environment and use Biren’s programming model.
- Translate and optimize CUDA-style code for Biren platforms.
- Apply performance tuning and debugging techniques.
Format of the Course
- Interactive lecture and discussion.
- Hands-on use of Biren SDK in sample GPU workloads.
- Guided exercises focused on porting and performance tuning.
Course Customization Options
- To request a customized training for this course based on your application stack or integration needs, please contact us to arrange.
Cambricon MLU Development with BANGPy and Neuware
21 hours: Cambricon MLUs (Machine Learning Units) are AI chips optimized for inference and training in edge and data center scenarios.
This instructor-led, live training (online or onsite) is aimed at intermediate-level developers who wish to build and deploy AI models on Cambricon MLU hardware using the BANGPy framework and the Neuware SDK.
By the end of this training, participants will be able to:
- Set up and configure the BANGPy and Neuware development environments.
- Develop and optimize Python- and C++-based models for Cambricon MLUs.
- Deploy models to edge and data center devices running the Neuware runtime.
- Integrate ML workflows with MLU-specific acceleration features.
Format of the Course
- Interactive lecture and discussion.
- Hands-on use of BANGPy and Neuware for development and deployment.
- Guided exercises focused on optimization, integration, and testing.
Course Customization Options
- To request a customized training for this course based on your Cambricon device model or use case, please contact us to arrange.
Introduction to CANN for AI Framework Developers
7 hours: CANN (Compute Architecture for Neural Networks) is Huawei’s AI computing toolkit used to compile, optimize, and deploy AI models on Ascend AI processors.
This instructor-led, live training (online or onsite) is aimed at beginner-level AI developers who wish to understand how CANN fits into the model lifecycle from training to deployment, and how it works with frameworks like MindSpore, TensorFlow, and PyTorch.
By the end of this training, participants will be able to:
- Understand the purpose and architecture of the CANN toolkit.
- Set up a development environment with CANN and MindSpore.
- Convert and deploy a simple AI model to Ascend hardware.
- Gain foundational knowledge for future CANN optimization or integration projects.
Format of the Course
- Interactive lecture and discussion.
- Hands-on labs with simple model deployment.
- Step-by-step walkthrough of the CANN toolchain and integration points.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
CANN for Edge AI Deployment
14 hours: Huawei's Ascend CANN toolkit enables powerful AI inference on edge devices such as the Ascend 310. CANN provides essential tools for compiling, optimizing, and deploying models where compute and memory are constrained.
This instructor-led, live training (online or onsite) is aimed at intermediate-level AI developers and integrators who wish to deploy and optimize models on Ascend edge devices using the CANN toolchain.
By the end of this training, participants will be able to:
- Prepare and convert AI models for Ascend 310 using CANN tools.
- Build lightweight inference pipelines using MindSpore Lite and AscendCL.
- Optimize model performance for limited compute and memory environments.
- Deploy and monitor AI applications in real-world edge use cases.
Format of the Course
- Interactive lecture and demonstration.
- Hands-on lab work with edge-specific models and scenarios.
- Live deployment examples on virtual or physical edge hardware.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
Understanding Huawei’s AI Compute Stack: From CANN to MindSpore
14 hours: Huawei’s AI stack — from the low-level CANN SDK to the high-level MindSpore framework — offers a tightly integrated AI development and deployment environment optimized for Ascend hardware.
This instructor-led, live training (online or onsite) is aimed at beginner-level to intermediate-level technical professionals who wish to understand how the CANN and MindSpore components work together to support AI lifecycle management and infrastructure decisions.
By the end of this training, participants will be able to:
- Understand the layered architecture of Huawei’s AI compute stack.
- Identify how CANN supports model optimization and hardware-level deployment.
- Evaluate the MindSpore framework and toolchain in relation to industry alternatives.
- Position Huawei's AI stack within enterprise or cloud/on-prem environments.
Format of the Course
- Interactive lecture and discussion.
- Live system demos and case-based walkthroughs.
- Optional guided labs on model flow from MindSpore to CANN.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
Optimizing Neural Network Performance with CANN SDK
14 hours: CANN SDK (Compute Architecture for Neural Networks) is Huawei’s AI compute foundation that allows developers to fine-tune and optimize the performance of deployed neural networks on Ascend AI processors.
This instructor-led, live training (online or onsite) is aimed at advanced-level AI developers and system engineers who wish to optimize inference performance using CANN’s advanced toolset, including the Graph Engine, TIK, and custom operator development.
By the end of this training, participants will be able to:
- Understand CANN's runtime architecture and performance lifecycle.
- Use profiling tools and Graph Engine for performance analysis and optimization.
- Create and optimize custom operators using TIK and TVM.
- Resolve memory bottlenecks and improve model throughput.
Format of the Course
- Interactive lecture and discussion.
- Hands-on labs with real-time profiling and operator tuning.
- Optimization exercises using edge-case deployment examples.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
CANN SDK for Computer Vision and NLP Pipelines
14 hours: The CANN SDK (Compute Architecture for Neural Networks) provides powerful deployment and optimization tools for real-time AI applications in computer vision and NLP, especially on Huawei Ascend hardware.
This instructor-led, live training (online or onsite) is aimed at intermediate-level AI practitioners who wish to build, deploy, and optimize vision and language models using the CANN SDK for production use cases.
By the end of this training, participants will be able to:
- Deploy and optimize CV and NLP models using CANN and AscendCL.
- Use CANN tools to convert models and integrate them into live pipelines.
- Optimize inference performance for tasks like detection, classification, and sentiment analysis.
- Build real-time CV/NLP pipelines for edge or cloud-based deployment scenarios.
Format of the Course
- Interactive lecture and demonstration.
- Hands-on lab with model deployment and performance profiling.
- Live pipeline design using real CV and NLP use cases.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
Building Custom AI Operators with CANN TIK and TVM
14 hours: CANN TIK (Tensor Instruction Kernel) and Apache TVM enable advanced optimization and customization of AI model operators for Huawei Ascend hardware.
This instructor-led, live training (online or onsite) is aimed at advanced-level system developers who wish to build, deploy, and tune custom operators for AI models using CANN’s TIK programming model and TVM compiler integration.
By the end of this training, participants will be able to:
- Write and test custom AI operators using the TIK DSL for Ascend processors.
- Integrate custom ops into the CANN runtime and execution graph.
- Use TVM for operator scheduling, auto-tuning, and benchmarking.
- Debug and optimize instruction-level performance for custom computation patterns.
Format of the Course
- Interactive lecture and demonstration.
- Hands-on coding of operators using TIK and TVM pipelines.
- Testing and tuning on Ascend hardware or simulators.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
Migrating CUDA Applications to Chinese GPU Architectures
21 hours: Chinese GPU architectures such as Huawei Ascend, Biren, and Cambricon MLU offer alternatives to CUDA tailored for the local AI and HPC markets.
This instructor-led, live training (online or onsite) is aimed at advanced-level GPU programmers and infrastructure specialists who wish to migrate and optimize existing CUDA applications for deployment on Chinese hardware platforms.
By the end of this training, participants will be able to:
- Evaluate the compatibility of existing CUDA workloads with Chinese chip alternatives.
- Port CUDA codebases to Huawei CANN, Biren SDK, and Cambricon BANGPy environments.
- Compare performance and identify optimization points across platforms.
- Address practical challenges in cross-architecture support and deployment.
Format of the Course
- Interactive lecture and discussion.
- Hands-on code translation and performance comparison labs.
- Guided exercises focused on multi-GPU adaptation strategies.
Course Customization Options
- To request a customized training for this course based on your platform or CUDA project, please contact us to arrange.
Performance Optimization on Ascend, Biren, and Cambricon
21 hours: Ascend, Biren, and Cambricon are leading AI hardware platforms in China, each offering unique acceleration and profiling tools for production-scale AI workloads.
This instructor-led, live training (online or onsite) is aimed at advanced-level AI infrastructure and performance engineers who wish to optimize model inference and training workflows across multiple Chinese AI chip platforms.
By the end of this training, participants will be able to:
- Benchmark models on Ascend, Biren, and Cambricon platforms.
- Identify system bottlenecks and memory/compute inefficiencies.
- Apply graph-level, kernel-level, and operator-level optimizations.
- Tune deployment pipelines to improve throughput and reduce latency.
Format of the Course
- Interactive lecture and discussion.
- Hands-on use of profiling and optimization tools on each platform.
- Guided exercises focused on practical tuning scenarios.
Course Customization Options
- To request a customized training for this course based on your performance environment or model type, please contact us to arrange.