Audio Processing

Extend deep learning workflows with audio and speech processing applications

Apply deep learning to audio and speech processing applications by using Deep Learning Toolbox™ together with Audio Toolbox™. For signal processing applications, see Signal Processing. For applications in wireless communications, see Wireless Communications.

App

Signal Labeler

Label signal attributes, regions, and points of interest, and extract features

Functions

Data Management and Augmentation

`audioDatastore`	Datastore for collection of audio files
`audioDataAugmenter`	Augment audio data (Since R2019b)
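
As a rough illustration of how these two functions can fit together, the sketch below writes two short tones to a temporary folder, indexes them with `audioDatastore`, and augments each file with `audioDataAugmenter`. The folder layout, file names, sample rate, and augmentation settings are all made up for the example, and the name=value syntax assumes R2021a or later.

```matlab
% Sketch only: the folder, file names, and augmentation settings are hypothetical.
fs = 16000;                          % sample rate in Hz (assumed)
t = (0:fs-1)'/fs;                    % one second of sample times
folder = tempname;                   % temporary dataset folder
mkdir(fullfile(folder,"tone"));
audiowrite(fullfile(folder,"tone","a440.wav"),0.5*sin(2*pi*440*t),fs);
audiowrite(fullfile(folder,"tone","a880.wav"),0.5*sin(2*pi*880*t),fs);

% Index the files and derive labels from the subfolder names.
ads = audioDatastore(folder,IncludeSubfolders=true,LabelSource="foldernames");

% Keep only volume control and added noise; parameters are drawn at random.
augmenter = audioDataAugmenter( ...
    NumAugmentations=2, ...
    TimeStretchProbability=0, ...
    PitchShiftProbability=0, ...
    TimeShiftProbability=0, ...
    VolumeControlProbability=1, ...
    AddNoiseProbability=1);

while hasdata(ads)
    [audioIn,info] = read(ads);
    % augment returns a table with one row per augmented copy.
    augmented = augment(augmenter,audioIn,info.SampleRate);
    fprintf("%s: %d augmented copies\n",string(info.Label),height(augmented));
end
```

In a real workflow the datastore would point at an existing dataset folder rather than generated tones.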

Feature Extraction

`audioFeatureExtractor`	Streamline audio feature extraction (Since R2019b)
`openl3Embeddings`	Extract OpenL3 feature embeddings (Since R2022a)
`pitchnn`	Estimate pitch with deep learning neural network (Since R2021a)
`vggishEmbeddings`	Extract VGGish feature embeddings (Since R2022a)
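
A minimal sketch of `audioFeatureExtractor`, assuming a 16 kHz placeholder noise signal and an arbitrary choice of window, overlap, and features (MFCCs plus spectral centroid). None of these settings come from this page; they are here only to show the configure-then-extract pattern.

```matlab
% Sketch only: the signal and all extractor settings are assumptions.
fs = 16000;                          % sample rate in Hz (assumed)
audioIn = 0.1*randn(fs,1);           % placeholder one-second signal

% Configure the extractor once, then reuse it for every signal.
afe = audioFeatureExtractor( ...
    SampleRate=fs, ...
    Window=hamming(512,"periodic"), ...
    OverlapLength=256, ...
    mfcc=true, ...
    spectralCentroid=true);

features = extract(afe,audioIn);     % rows are analysis frames, columns are features
idx = info(afe);                     % struct mapping feature names to column indices

mfccs = features(:,idx.mfcc);
centroid = features(:,idx.spectralCentroid);
```

Indexing through `info` avoids hard-coding column positions, which shift whenever features are enabled or disabled.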

Pretrained Networks

`yamnet`	(Not recommended) YAMNet neural network (Since R2020b)
`classifySound`	Classify sounds in audio signal (Since R2020b)
`crepe`	(Not recommended) CREPE neural network (Since R2021a)
`pitchnn`	Estimate pitch with deep learning neural network (Since R2021a)
`vggish`	(Not recommended) VGGish neural network (Since R2020b)
`vggishEmbeddings`	Extract VGGish feature embeddings (Since R2022a)
`openl3`	(Not recommended) OpenL3 neural network (Since R2021a)
`openl3Embeddings`	Extract OpenL3 feature embeddings (Since R2022a)
`vadnet`	(Not recommended) Voice activity detection (VAD) neural network (Since R2023a)
`detectspeechnn`	Detect boundaries of speech in audio signal using AI (Since R2023a)
`separateSpeakers`	Separate signal by speakers (Since R2023b)
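
The recommended functions in this list wrap a pretrained network behind a single call. The sketch below shows that calling pattern for `classifySound` and `pitchnn` on placeholder signals. It assumes the pretrained models are already available on the machine (they may require a separate download), and the signals and sample rate are invented for illustration.

```matlab
% Sketch only: placeholder inputs; pretrained models are assumed to be installed.
fs = 16000;                          % sample rate in Hz (assumed)
t = (0:3*fs-1)'/fs;                  % three seconds of sample times

% Classify sounds in a noise signal; the detected classes may be empty.
noise = 0.1*randn(3*fs,1);
[sounds,timestamps] = classifySound(noise,fs);

% Estimate the pitch track of a 220 Hz tone.
tone = 0.5*sin(2*pi*220*t);
[f0,loc] = pitchnn(tone,fs);         % f0 in Hz at the times given in loc
```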

Blocks

VGGish

VGGish	VGGish embeddings extraction network (Since R2022a)
VGGish Embeddings	Extract VGGish embeddings (Since R2022a)

YAMNet

YAMNet	YAMNet sound classification network (Since R2021b)
Sound Classifier	Classify sounds in audio signal (Since R2021b)

OpenL3

OpenL3	OpenL3 embeddings extraction network (Since R2022b)
OpenL3 Embeddings	Extract OpenL3 embeddings (Since R2022b)

CREPE

CREPE	CREPE deep pitch estimation neural network (Since R2023a)
Deep Pitch Estimator	Estimate pitch with CREPE deep learning neural network (Since R2023a)

Topics

Deep Learning for Audio Applications (Audio Toolbox)
Learn common tools and workflows to apply deep learning to audio applications.
Classify Sound Using Deep Learning (Audio Toolbox)
Train, validate, and test a simple long short-term memory (LSTM) network to classify sounds.
Adapt Pretrained Audio Network for New Data Using Deep Network Designer
This example shows how to interactively adapt a pretrained network to classify new audio signals using Deep Network Designer.
Audio Transfer Learning Using Experiment Manager
Configure an experiment that compares the performance of multiple pretrained networks applied to a speech command recognition task using transfer learning.
Compare Speaker Separation Models
Compare the performance, size, and speed of multiple deep learning speaker separation models.
Speaker Identification Using Custom SincNet Layer and Deep Learning
Perform speaker identification using a custom deep learning layer that implements a mel-scale filter bank.
Dereverberate Speech Using Deep Learning Networks
Train a deep learning model that removes reverberation from speech.
Speech Command Recognition in Simulink
Detect the presence of speech commands in audio using a Simulink® model.
Sequential Feature Selection for Audio Features
This example shows a typical workflow for feature selection applied to the task of spoken digit recognition.
Train Spoken Digit Recognition Network Using Out-of-Memory Audio Data
This example trains a spoken digit recognition network on out-of-memory audio data using a transformed datastore.
Train Spoken Digit Recognition Network Using Out-of-Memory Features
This example trains a spoken digit recognition network on out-of-memory auditory spectrograms using a transformed datastore.
Investigate Audio Classifications Using Deep Learning Interpretability Techniques
This example shows how to use interpretability techniques to investigate the predictions of a deep neural network trained to classify audio data.
Accelerate Audio Deep Learning Using GPU-Based Feature Extraction
Leverage GPUs for feature extraction to decrease the time required to train an audio deep learning model.

Related Information

Featured Examples

Compress Machine Fault Recognition Neural Network Using Projection

Compress a pretrained acoustics-based machine fault recognition neural network using projection and principal component analysis.

Audio-Based Anomaly Detection for Machine Health Monitoring

3-D Speech Enhancement Using Trained Filter and Sum Network

3-D Sound Event Localization and Detection Using Trained Recurrent Convolutional Neural Network

Speaker Recognition Using x-vectors

Speaker Diarization Using x-vectors

Train Speech Command Recognition Model Using Deep Learning

Keyword Spotting in Noise Using MFCC and LSTM Networks

Denoise Speech Using Deep Learning Networks

Train Generative Adversarial Network (GAN) for Sound Synthesis

Voice Activity Detection in Noise Using Deep Learning

Speech Emotion Recognition

Acoustic Scene Recognition Using Late Fusion

End-to-End Deep Speaker Separation

Acoustics-Based Machine Fault Recognition

Audio Event Classification Using TensorFlow Lite on Raspberry Pi

Keyword Spotting in Noise Code Generation on Raspberry Pi

Speech Command Recognition Code Generation with Intel MKL-DNN

Acoustics-Based Machine Fault Recognition Code Generation

Speech Command Recognition on Raspberry Pi Using Simulink