Audio in machine learning

The first and most vital step in using audio for machine learning is to understand how humans perceive raw audio, and then to represent this information electronically in a way that resembles that perception. To do this, we extract features from the audio signal.

Only the most basic aspects of how the human ear perceives sound are covered here, without an extensive treatment of audio theory.

Recording a piece of music is the first step in representing audio on computers and other devices.
Put simply, sound consists of vibrations, that is, changes in air pressure \cite{WesternElectricCo}.\newline


In digital recording, the variations in air pressure at a particular point in space are captured at regular instants in time, and the sound is represented as a sequence of discrete numbers (a waveform). By contrast, the values of an analog signal are continuous \cite{Proakis1992DigitalSP}.
Each of these numbers is called a sample, and the number of samples per second is called the sample rate. Conversely, when the audio is played back on a device, these numbers are converted back into sound by the reverse procedure.\newline
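To make sampling and sample rate concrete, the following is a minimal sketch that digitizes a pure tone. It assumes a synthetic sine wave instead of a real recording, and it assumes 16-bit signed integer samples (a common choice, used here only for illustration). The function name `sample_sine` and its parameters are hypothetical. The tone is evaluated only at the instants $t = n / f_s$, where $f_s$ is the sample rate, and each continuous amplitude is rounded to an integer, giving the sequence of discrete numbers described above.

```python
import math


def sample_sine(freq_hz, duration_s, sample_rate):
    """Hypothetical helper: digitize a sine tone of frequency freq_hz.

    The tone is evaluated at the discrete instants t = n / sample_rate,
    and each amplitude is quantized to a signed 16-bit integer
    (an assumed bit depth, chosen only for illustration).
    """
    # Number of samples = samples per second * duration in seconds.
    n_samples = round(duration_s * sample_rate)
    max_int = 2 ** 15 - 1  # largest value of a signed 16-bit sample
    samples = []
    for n in range(n_samples):
        t = n / sample_rate  # time of the n-th sample, in seconds
        x = math.sin(2 * math.pi * freq_hz * t)  # continuous amplitude in [-1, 1]
        samples.append(round(x * max_int))  # quantize to a discrete integer
    return samples


if __name__ == "__main__":
    # Half a second of a 1000 Hz tone at an assumed sample rate of 8000 Hz.
    waveform = sample_sine(1000.0, 0.5, 8000)
    print(len(waveform))  # 4000 samples
    print(waveform[:4])   # first few samples of the waveform
```

With 8000 samples per second, half a second of audio yields 4000 samples. Doubling the sample rate doubles the number of samples needed to store the same duration of sound.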