Course Outline
Foundations of Audio Classification
- Types of sound events: environmental, mechanical, and human-generated.
- Overview of use cases: surveillance, monitoring, and automation.
- Differences between audio classification, detection, and segmentation.
Audio Data and Feature Extraction
- Types of audio files and formats.
- Considerations for sampling rate, windowing, and frame size.
- Extracting MFCCs, chroma features, and mel-spectrograms.
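The sampling rate, windowing, and frame size considerations listed above come down to simple arithmetic. The following is a minimal sketch in plain Python, assuming a 16 kHz sampling rate, 25 ms frames, and a 10 ms hop. These values and the helper names `frame_params` and `frame_signal` are illustrative assumptions, not part of the course material.

```python
def frame_params(sample_rate, frame_ms, hop_ms):
    """Convert frame and hop durations in milliseconds to sample counts."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    return frame_len, hop_len


def frame_signal(samples, frame_len, hop_len):
    """Split a sequence of samples into overlapping frames (no padding)."""
    if len(samples) < frame_len:
        return []
    n_frames = 1 + (len(samples) - frame_len) // hop_len
    return [samples[i * hop_len : i * hop_len + frame_len] for i in range(n_frames)]


# Assumed example: one second of silence sampled at 16 kHz.
sample_rate = 16000
frame_len, hop_len = frame_params(sample_rate, frame_ms=25, hop_ms=10)
frames = frame_signal([0.0] * sample_rate, frame_len, hop_len)
```

With these assumed values each frame holds 400 samples, consecutive frames start 160 samples apart, and trailing samples that do not fill a whole frame are dropped. Features such as MFCCs or mel-spectrograms are typically computed per frame on top of a split like this.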
Data Preparation and Annotation
- Working with UrbanSound8K, ESC-50, and custom datasets.
- Labeling sound events and defining temporal boundaries.
- Dataset balancing and audio augmentation techniques.
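One common way to approach the dataset balancing mentioned above is to weight each class inversely to its frequency. The sketch below shows that idea under stated assumptions: the label names and counts are made up for illustration, and `inverse_frequency_weights` is a hypothetical helper, not a function from any particular library.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return a weight per class so that rarer classes get larger weights.

    Uses n_samples / (n_classes * class_count), so a perfectly balanced
    dataset yields a weight of 1.0 for every class.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Assumed toy label list: an imbalanced set of 12 annotated clips.
labels = ["siren"] * 2 + ["dog_bark"] * 6 + ["drilling"] * 4
weights = inverse_frequency_weights(labels)
```

Weights like these can be passed to a weighted loss or used as sampling probabilities. Resampling or augmenting the rare classes are alternative options with different trade-offs.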
Building Audio Classification Models
- Applying convolutional neural networks (CNNs) for audio analysis.
- Model inputs: raw waveforms versus extracted features.
- Loss functions, evaluation metrics, and managing overfitting.
Event Detection and Temporal Localization
- Detection strategies: frame-based and segment-based approaches.
- Post-processing detections using thresholds and smoothing.
- Visualizing predictions on audio timelines.
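The thresholding and smoothing step listed above can be sketched as follows. This is a minimal illustration, assuming frame-level scores between 0 and 1, a centered moving average as the smoother, and a fixed threshold. The scores, the 0.5 s hop, and the function names are assumptions chosen for the example; median filtering or hysteresis thresholds are equally valid choices.

```python
def moving_average(scores, width):
    """Smooth frame scores with a centered moving average.

    Frames near the edges are averaged over a shorter window.
    """
    half = width // 2
    smoothed = []
    for i in range(len(scores)):
        window = scores[max(0, i - half) : i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def scores_to_events(scores, threshold, hop_seconds):
    """Turn frame scores into (onset, offset) pairs in seconds."""
    events = []
    start = None
    for i, score in enumerate(scores):
        active = score >= threshold
        if active and start is None:
            start = i  # event begins at this frame
        elif not active and start is not None:
            events.append((start * hop_seconds, i * hop_seconds))
            start = None
    if start is not None:  # event still active at the end of the clip
        events.append((start * hop_seconds, len(scores) * hop_seconds))
    return events


# Assumed frame scores with a one-frame dip inside a single sound event.
frame_scores = [0.1, 0.2, 0.9, 0.8, 0.1, 0.9, 0.9, 0.2, 0.1, 0.1]
raw_events = scores_to_events(frame_scores, threshold=0.5, hop_seconds=0.5)
smoothed = moving_average(frame_scores, width=3)
events = scores_to_events(smoothed, threshold=0.5, hop_seconds=0.5)
```

On this toy input, thresholding the raw scores yields two short events, while thresholding the smoothed scores merges them into one event from 1.0 s to 3.5 s. The resulting onset and offset pairs are also what would be drawn on an audio timeline.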
Advanced Topics and Real-Time Processing
- Transfer learning for scenarios with limited data.
- Deploying models using TensorFlow Lite or ONNX.
- Streaming audio processing and addressing latency concerns.
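For the streaming and latency topic above, a rolling buffer is one simple way to feed a classifier a fixed-length window while audio arrives in small chunks. The sketch below assumes a 16 kHz stream, 100 ms chunks, and a 1 s analysis window. `RollingAudioBuffer` and `chunk_latency_ms` are hypothetical names introduced for this example.

```python
from collections import deque


class RollingAudioBuffer:
    """Keep only the most recent `window_size` samples of a stream."""

    def __init__(self, window_size):
        self._buffer = deque(maxlen=window_size)

    def push(self, chunk):
        """Append a chunk; the oldest samples fall off once the buffer is full."""
        self._buffer.extend(chunk)

    def is_full(self):
        return len(self._buffer) == self._buffer.maxlen

    def window(self):
        """Return the current analysis window as a list of samples."""
        return list(self._buffer)


def chunk_latency_ms(chunk_size, sample_rate):
    """Minimum buffering delay, in milliseconds, added by waiting for one chunk."""
    return 1000.0 * chunk_size / sample_rate


# Assumed stream settings: 16 kHz audio, 100 ms chunks, 1 s window.
sample_rate = 16000
chunk_size = 1600
buffer = RollingAudioBuffer(window_size=sample_rate)
for chunk_index in range(11):
    # Each chunk is filled with its own index so its origin stays visible.
    buffer.push([chunk_index] * chunk_size)
latency = chunk_latency_ms(chunk_size, sample_rate)
```

Smaller chunks reduce the buffering delay but increase how often the model has to run, so the chunk size is usually tuned against the inference time of the deployed model.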
Project Development and Application Scenarios
- Designing a complete pipeline from ingestion to classification.
- Developing a proof-of-concept for surveillance, quality control, or monitoring.
- Implementing logging, alerting, and integration with dashboards or APIs.
Summary and Next Steps
Requirements
- A solid understanding of machine learning concepts and model training.
- Proficiency in Python programming and data preprocessing.
- Familiarity with the fundamentals of digital audio.
Target Audience
- Data scientists.
- Machine learning engineers.
- Researchers and developers specializing in audio signal processing.
Duration: 21 hours