Deep Learning: Difference between revisions

Revision as of 22:44, 29 August 2021

Deep learning is a subset of machine learning, which also includes reinforcement learning.

Applications

  • Speech Recognition
  • Voice Based emotion classification
  • Noise recognition
  • Musical genre, instrument, and mood classification
  • Music Tagging
  • Music Generation

Python

Librosa is used to analyse and manipulate audio.

TensorFlow is used to train models.

Keras is a high-level library for TensorFlow.

Evaluation metrics:

  • Accuracy
  • False positive rate
  • Precision
  • Confusion matrix
  • False negative rate
  • Recall
  • F1 score
  • Log loss
  • ROC curve
  • Negative predictive value
  • Specificity
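
Most of the metrics listed above can be derived from the four counts of a binary confusion matrix. The following is a minimal sketch in plain Python, assuming binary labels where 1 is the positive class; the function names and the sample labels are hypothetical and only serve as illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def binary_metrics(y_true, y_pred):
    """Derive common metrics from the confusion-matrix counts.

    Assumes every denominator is non-zero; real code should guard against
    empty classes before dividing.
    """
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
        "negative_predictive_value": tn / (tn + fn),
        "f1": 2 * precision * recall / (precision + recall),
    }


# Illustrative labels: 4 positives and 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Log loss and the ROC curve need predicted probabilities rather than hard labels, so they are left out of this sketch.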

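Tutorials

The Sound of AI

  • Deep Learning (for Audio) with Python - 19 videos
  • Audio Signal Processing for Machine Learning - 23 videos
  • Generating Sound with Neural Networks - 14 videos

Valerio Velardo - The Sound of AI: https://www.youtube.com/c/ValerioVelardoTheSoundofAI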

Deep learning architectures used for music generation

The main types of deep learning architectures used for music generation (as well as for other purposes) are:

  • feedforward,
  • autoencoder,
  • restricted Boltzmann machine (RBM),
  • recurrent (RNN).

Architectural Patterns

Architectural patterns that can be applied to the architectures above:

  • convolutional,
  • conditioning,
  • adversarial.

Links

Audio Handling Basics: Process Audio Files In Command-Line or Python

https://benhayes.net/projects/nws/#audio-examples

Do Androids Dream of Electric Beats?
https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
Intro to Audio Analysis: Recognizing Sounds Using Machine Learning
https://magenta.tensorflow.org/music-vae
https://musicalmetacreation.org/mume2018/proceedings/Sturm.pdf
https://ccrma.stanford.edu/~blackrse/algorithm.html
https://mirg.city.ac.uk/codeapps/the-magnatagatune-dataset
https://docs.microsoft.com/en-us/cognitive-toolkit/
https://scikit-learn.org/stable/
https://pythonrepo.com/repo/nerdyrodent-VQGAN-CLIP-python-deep-learning

Terminology

Deep learning is a class of machine learning algorithms that uses multiple layers to progressively extract higher-level features from the raw input.

Learning can be supervised, semi-supervised or unsupervised.

Bag of words: A technique used to extract features from text. It counts how many times each word appears in a document, and then transforms those counts into a dataset.
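
The counting step described above can be sketched in a few lines of plain Python. This is an illustration only: it assumes whitespace tokenization and lowercasing, and the function name `bag_of_words` and the sample sentences are hypothetical.

```python
from collections import Counter


def bag_of_words(documents):
    """Return a sorted vocabulary and one row of word counts per document."""
    # Assumption: words are separated by whitespace and case is ignored.
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Counter returns 0 for words that do not occur in this document.
        rows.append([counts[word] for word in vocabulary])
    return vocabulary, rows


docs = ["the cat sat", "the cat saw the dog"]
vocabulary, rows = bag_of_words(docs)
print(vocabulary)  # ['cat', 'dog', 'sat', 'saw', 'the']
print(rows)        # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```

Each row has one column per vocabulary word, which is the numeric dataset a model can train on.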

A categorical label has a discrete set of possible values, such as "is a cat" and "is not a cat."

Clustering: An unsupervised learning task that helps to determine whether there are any naturally occurring groupings in the data.

CNN: Convolutional Neural Networks (CNN) represent nested filters over grid-organized data. They are by far the most commonly used type of model when processing images.

A continuous (regression) label does not have a discrete set of possible values, which means it can take a potentially unlimited number of values.

Data vectorization: A process that converts non-numeric data into a numerical format so that it can be used by a machine learning model.
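
One common way to vectorize categorical text values is one-hot encoding. The sketch below assumes that technique (the entry above does not name a specific method), and the function name and genre labels are hypothetical.

```python
def one_hot_encode(values):
    """Map each distinct category to a 0/1 vector with a single 1."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    vectors = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the column for this value's category
        vectors.append(row)
    return categories, vectors


genres = ["rock", "jazz", "rock", "metal"]
categories, vectors = one_hot_encode(genres)
print(categories)  # ['jazz', 'metal', 'rock']
print(vectors)     # [[0, 0, 1], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
```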

Discrete: A term taken from statistics referring to an outcome taking on only a finite number of values (such as days of the week).

FFNN: The most straightforward way of structuring a neural network, the Feed Forward Neural Network (FFNN) structures neurons in a series of layers, with each neuron in a layer containing weights to all neurons in the previous layer.
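
The layer structure described above can be illustrated with a forward pass written in plain Python. This is a sketch under stated assumptions: the sigmoid activation, the weights, and the function names are chosen for illustration and are not taken from the page.

```python
import math


def dense_layer(inputs, weights, biases):
    """Compute one fully connected layer.

    weights holds one list per neuron, with one weight for every value
    coming from the previous layer.
    """
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid (assumed activation)
    return outputs


def feed_forward(inputs, layers):
    """Pass the inputs through each layer in order."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations


# Hypothetical network: 2 inputs -> 3 hidden neurons -> 1 output neuron.
layers = [
    ([[0.5, -0.5], [1.0, 1.0], [0.0, 0.0]], [0.0, 0.0, 0.0]),
    ([[1.0, 1.0, 1.0]], [0.0]),
]
output = feed_forward([1.0, 1.0], layers)
print(output)
```

Training would adjust the weights and biases; this sketch only shows how data flows forward through the layers.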

Hyperparameters are settings on the model which are not changed during training but can affect how quickly or how reliably the model trains, such as the number of clusters the model should identify.

Log loss is used to calculate how uncertain your model is about the predictions it is generating.
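
For binary classification, log loss is the average negative log-probability the model assigned to the correct class. A minimal sketch, assuming labels of 0 or 1 and predicted probabilities for class 1; the `eps` clipping value is an illustrative choice.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Average binary log loss; lower means more confident correct predictions."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


confident = log_loss([1, 0], [0.9, 0.1])  # correct and confident
unsure = log_loss([1, 0], [0.6, 0.4])     # correct but uncertain
print(confident, unsure)
```

Both sets of predictions pick the right class, but the uncertain model receives the higher loss.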

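Hyperplane: A mathematical term for a flat subspace with one dimension fewer than the space that contains it (for example, a line in a plane, or a plane in three-dimensional space).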

Impute is a common term referring to different statistical tools which can be used to calculate missing values from your dataset.
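
Mean imputation is one of the simplest of these tools. The sketch below assumes missing values are represented as `None`; the function name and the sample tempo values are hypothetical.

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)  # assumes at least one observed value
    return [mean if v is None else v for v in values]


tempos = [120.0, None, 90.0, None, 150.0]
filled = impute_mean(tempos)
print(filled)  # [120.0, 120.0, 90.0, 120.0, 150.0]
```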

Label: Refers to data that already contains the solution.

Loss function: Used to codify the model's distance from its training goal.

Machine learning, or ML, is a modern software development technique that enables computers to solve problems by using examples of real-world data.

Model accuracy is the fraction of predictions a model gets right.

Continuous: Floating-point values with an infinite range of possible values. The opposite of categorical or discrete values, which take on a limited number of possible values.

Model inference is when the trained model is used to generate predictions.

A model is an extremely generic program, made specific by the data used to train it.

Model parameters are settings or configurations the training algorithm can update to change how the model behaves.

Model training algorithms work through an iterative process where the current model iteration is analyzed to determine what changes can be made to get closer to the goal. Those changes are made and the iteration continues until the model is evaluated to meet the goals.

Neural networks: a collection of very simple models connected together. These simple models are called neurons. The connections between these models are trainable model parameters called weights.

Outliers are data points that are significantly different from others in the same sample.
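
One simple way to flag such points is a z-score test. This sketch assumes that approach and a threshold of 2 standard deviations, neither of which comes from the page; the sample data is hypothetical.

```python
import statistics


def find_outliers(values, threshold=2.0):
    """Return values whose distance from the mean exceeds threshold std devs."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [v for v in values if abs(v - mean) > threshold * std]


durations = [10, 11, 9, 10, 12, 10, 11, 50]
outliers = find_outliers(durations)
print(outliers)  # [50]
```

Because the mean and standard deviation are themselves pulled by extreme values, this test can miss outliers in very small samples.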

Plane: A mathematical term for a flat surface (like a piece of paper) on which two points can be joined by a straight line.

Regression: A common task in supervised machine learning, in which the model predicts a continuous value.

In reinforcement learning, the algorithm figures out which actions to take in a situation to maximize a reward (in the form of a number) on the way to reaching a specific goal.

RNN/LSTM: Recurrent Neural Networks (RNN) and the related Long Short-Term Memory (LSTM) model types are structured to effectively represent for loops in traditional computing, collecting state while iterating over some object. They can be used for processing sequences of data.
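
The "for loop" analogy above can be made concrete with a single recurrent unit in plain Python. This is a sketch, not a trainable model: the scalar weights, the tanh activation, and the function names are assumptions made for illustration.

```python
import math


def rnn_step(state, x, w_state, w_input, bias):
    """Combine the previous state with the current input into a new state."""
    return math.tanh(w_state * state + w_input * x + bias)


def run_rnn(sequence, w_state=0.5, w_input=1.0, bias=0.0):
    """Iterate over the sequence, carrying state from one step to the next."""
    state = 0.0
    states = []
    for x in sequence:
        state = rnn_step(state, x, w_state, w_input, bias)
        states.append(state)
    return states


# A single impulse followed by silence: the state fades over later steps.
states = run_rnn([1.0, 0.0, 0.0])
print(states)
```

The same weights are reused at every step, which is what lets the loop handle sequences of any length.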

Silhouette coefficient: A score from -1 to 1 describing the clusters found during modeling. A score near zero indicates overlapping clusters, and scores less than zero indicate data points assigned to incorrect clusters.
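
For each point, the silhouette value compares the mean distance to its own cluster (a) with the mean distance to the nearest other cluster (b), as (b - a) / max(a, b). The sketch below assumes one-dimensional points, absolute difference as the distance, and at least two clusters; the function name and data are hypothetical.

```python
def silhouette_score(points, labels):
    """Mean silhouette value over all points (1-D data, absolute distance)."""

    def mean_distance(p, others):
        return sum(abs(p - q) for q in others) / len(others)

    scores = []
    for i, (p, label) in enumerate(zip(points, labels)):
        same = [q for j, (q, l) in enumerate(zip(points, labels))
                if l == label and j != i]
        if not same:
            scores.append(0.0)  # convention for a cluster with one point
            continue
        a = mean_distance(p, same)
        b = min(
            mean_distance(p, [q for q, l in zip(points, labels) if l == other])
            for other in set(labels) if other != label
        )
        # Assumes a and b are not both zero (no fully duplicated points).
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [1.0, 2.0, 10.0, 11.0]
good = silhouette_score(points, [0, 0, 1, 1])  # nearby points grouped together
bad = silhouette_score(points, [0, 1, 0, 1])   # distant points grouped together
print(good, bad)
```

The sensible grouping scores close to 1, while the mismatched grouping scores below zero.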

Stop words: A list of words removed by natural language processing tools when building your dataset. There is no single universal list of stop words used by all natural language processing tools.

In supervised learning, every training sample from the dataset has a corresponding label or output value associated with it. As a result, the algorithm learns to predict labels or output values.

Test dataset: The data withheld from the model during training, which is used to test how well your model will generalize to new data.

Training dataset: The data on which the model will be trained. Most of your data will be here.
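
The split between training and test data can be sketched with the standard library. The 80/20 ratio, the fixed seed, and the function name are illustrative assumptions rather than values from the page.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle the samples, then withhold a fraction of them as test data."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatable splits
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = train_test_split(range(10))
print(len(train), len(test))  # 8 2
```

Shuffling before splitting keeps any ordering in the original data from leaking into one side of the split.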

Transformer: A more modern replacement for RNN/LSTMs, the transformer architecture enables training over larger datasets involving sequences of data.

With unlabeled data, you don't need to provide the model with any kind of label or solution while the model is being trained.

In unsupervised learning, there are no labels for the training data. A machine learning algorithm tries to learn the underlying patterns or distributions that govern the data.

Machine learning is synthesizing death metal. It might make your death metal radio DJ nervous – but it could also mean music software works with timbre and time in new ways. That news – plus some comical abuse of neural networks for writing genre-specific lyrics in genres like country – next. Peter Kirn http://cdm.link/2019/04/now-ai-takes-on-writing-death-metal-country-music-hits-more/