Classifying Urban Sounds Using Deep Learning
This package includes three main files: SC1_preprocessing.mlx, SC2_extract_feature.mlx, and SC3_train_network.mlx. Two additional files, SoundClassify.m and SoundClassifySample.m, are used when compiling the library.
For this project we will use a dataset called UrbanSound8K. The dataset contains 8732 sound excerpts (<= 4 s) of urban sounds from 10 classes, which are:
- Air Conditioner
- Car Horn
- Children Playing
- Dog Bark
- Drilling
- Engine Idling
- Gun Shot
- Jackhammer
- Siren
- Street Music
The accompanying metadata contains a unique ID for each sound excerpt along with its class label.
A sample of this dataset is included with the accompanying git repo, and the full dataset can be downloaded from https://urbansounddataset.weebly.com/urbansound8k.html.
Audio sample file data overview
These sound excerpts are digital audio files in .wav format.
Sound waves are digitised by sampling them at discrete intervals. The number of samples taken per second is the sampling rate (typically 44.1 kHz for CD-quality audio, meaning 44,100 samples per second).
Each sample is the amplitude of the wave at a particular instant. The bit depth determines how precisely each sample is stored and sets the dynamic range of the signal (typically 16-bit, which means each sample can take one of 65,536 amplitude values).
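As a quick illustration of the numbers above (not part of the package), the count of representable amplitude values follows directly from the bit depth:

```python
def amplitude_levels(bit_depth):
    """Number of distinct amplitude values a sample of the given bit depth can take."""
    return 2 ** bit_depth

levels = amplitude_levels(16)
print(levels)                          # 65536
print(-levels // 2, levels // 2 - 1)   # -32768 32767 (signed 16-bit range)
```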
Deep Learning Workflow
Access Data -> Pre-processing -> Extract signal feature (spectrogram) -> Train neural network -> Deployment (optional).
Step 1: Data preparation with SC1_preprocessing.mlx:
Create a new folder for each class ID and move the files into their class folders.
Step 2: Feature extraction with SC2_extract_feature.mlx:
Pre-process the audio data and extract spectrogram features.
Convert each audio signal to a spectrogram using its sampling frequency fs, and save the spectrogram alongside the original audio file.
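The extraction itself is done in SC2_extract_feature.mlx; to show the idea behind a spectrogram, here is a minimal NumPy sketch of a Hann-windowed short-time FFT (the window length and hop size are arbitrary illustrative choices, not taken from the .mlx file):

```python
import numpy as np

def spectrogram(signal, win_len=1024, hop=512):
    """Magnitude spectrogram in dB via a Hann-windowed short-time FFT."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack([signal[i * hop : i * hop + win_len] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))   # shape: (frames, freq bins)
    return 20 * np.log10(mag + 1e-10)           # small offset avoids log(0)

# Example: 1 second of a 440 Hz tone sampled at fs = 8 kHz
fs = 8000
t = np.arange(fs) / fs
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)   # (14, 513)
```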
Step 3: Create and train the neural network with SC3_train_network.mlx
From the extracted spectrogram data, we create a simple neural network for training and classification. The images are stored in the Spectrograms folder. The data for each class is separated into subfolders, labelled by the folder name.
Split the data so that 80% of the images are used for training, 10% for validation, and the remaining 10% for testing. With my limited time, I used only 25% of the whole dataset for training.
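The split is performed in MATLAB; as an illustration only, an 80/10/10 split of a file list can be sketched in Python like this (the file names are placeholders):

```python
import random

def split_files(files, train=0.8, val=0.1, seed=0):
    """Shuffle a file list and split it into train/val/test subsets."""
    files = files[:]                     # copy so the caller's list is untouched
    random.Random(seed).shuffle(files)   # fixed seed for a reproducible split
    n = len(files)
    n_train, n_val = int(n * train), int(n * val)
    return (files[:n_train],
            files[n_train:n_train + n_val],
            files[n_train + n_val:])

files = [f'spec_{i}.png' for i in range(100)]
tr, va, te = split_files(files)
print(len(tr), len(va), len(te))   # 80 10 10
```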
The training accuracy is 92%, as shown in the picture below:
The test accuracy is 91%, with the confusion matrix shown below:
Step 4: Deployment (Optional)
In this step, I used MATLAB Compiler SDK to create a Python library.
SoundClassify.m is the main function for creating the library.
SoundClassifySample.m is the sample script for generating the Python sample driver file. You can change the script in this file to use another sample image.
You can see the result after running sample.
*Note: To run the library, the working directory must contain "trainednet.mat" (the trained network) and a test image.
I hope this is useful for everyone.