Cheat sheet


Algorithms that use the backpropagation training approach

Training algorithms

Class name Name
GradientDescent Classic Gradient Descent
MinibatchGradientDescent Mini-batch Gradient Descent
ConjugateGradient Conjugate Gradient
QuasiNewton Quasi-Newton
LevenbergMarquardt Levenberg-Marquardt
Hessian Hessian
HessianDiagonal Hessian diagonal
Momentum Momentum
Quickprop Quickprop
Adadelta Adadelta
Adagrad Adagrad
Adam Adam
Adamax AdaMax
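
To illustrate how two of the update rules listed above differ, the sketch below minimizes a one-dimensional quadratic with plain gradient descent and with classical momentum. It is plain Python and does not use the library's classes; the function names, step size, and momentum value are illustrative assumptions.

```python
def grad(x):
    # Gradient of the toy objective f(x) = (x - 3) ** 2
    return 2.0 * (x - 3.0)


def gradient_descent(x0, step=0.1, epochs=100):
    # Classic rule: move against the gradient with a fixed step
    x = x0
    for _ in range(epochs):
        x -= step * grad(x)
    return x


def momentum_descent(x0, step=0.1, momentum=0.9, epochs=500):
    # Momentum keeps a running velocity that accumulates past gradients
    x, velocity = x0, 0.0
    for _ in range(epochs):
        velocity = momentum * velocity - step * grad(x)
        x += velocity
    return x


x_gd = gradient_descent(0.0)
x_momentum = momentum_descent(0.0)
```

Both runs should end close to the minimum at x = 3; the momentum variant oscillates around it before settling.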

Regularization methods

Class name Name
WeightDecay Weight decay
WeightElimination Weight elimination
MaxNormRegularization Max-norm regularization
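
As a rough sketch of what two of these regularizers do to a weight vector, the example below applies an L2 weight-decay step and a max-norm rescaling in plain Python. The helper names and default values are hypothetical, and the library's classes may formulate the penalties differently.

```python
import math


def weight_decay_step(weights, grads, step=0.1, decay_rate=0.01):
    # An L2 penalty adds decay_rate * w to the gradient of every weight
    return [w - step * (g + decay_rate * w) for w, g in zip(weights, grads)]


def max_norm_clip(weights, max_norm=1.0):
    # Rescale the vector only when its L2 norm exceeds max_norm
    norm = math.sqrt(sum(w * w for w in weights))
    if norm <= max_norm:
        return list(weights)
    scale = max_norm / norm
    return [w * scale for w in weights]


decayed = weight_decay_step([1.0, -2.0], grads=[0.0, 0.0])
clipped = max_norm_clip([3.0, 4.0], max_norm=1.0)
```

With zero gradients, the decay step only shrinks each weight slightly toward zero, while the max-norm step projects the vector back onto the unit ball.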

Learning rate update rules

Class name Name
LeakStepAdaptation Leak Step Adaptation
ErrDiffStepUpdate Error difference Update
LinearSearch Linear search using golden-section search or Brent's method
SearchThenConverge Search then converge
StepDecay Decrease step monotonically after each epoch
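
A minimal sketch of one schedule that decreases the step monotonically after each epoch is shown below. The hyperbolic form and the parameter name reduction_freq are assumptions made for illustration; the StepDecay class may use a different formula.

```python
def step_decay(initial_step, epoch, reduction_freq=100):
    # Hyperbolic decay: the step is halved once epoch == reduction_freq
    return initial_step / (1.0 + epoch / reduction_freq)


steps = [step_decay(0.1, epoch) for epoch in (0, 100, 300)]
```

Under these assumptions the step starts at 0.1, drops to half of that after 100 epochs, and to a quarter after 300 epochs.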


Class name Name
mixture_of_experts Mixture of Experts
DynamicallyAveragedNetwork Dynamically Averaged Network (DAN)

Neural Networks with Radial Basis Functions (RBFN)

Class name Name
GRNN Generalized Regression Neural Network (GRNN)
PNN Probabilistic Neural Network (PNN)
RBFKMeans Radial basis function K-means

Autoassociative Memory

Class name Name
DiscreteBAM Discrete BAM Network
DiscreteHopfieldNetwork Discrete Hopfield Network
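
The idea behind a discrete Hopfield memory can be sketched in a few lines of plain Python: patterns are stored with a Hebbian outer-product rule, and a noisy input is pulled back toward a stored pattern. The example uses bipolar values (-1 and 1) and a single synchronous update, which are simplifying assumptions; it does not reproduce the library's input format or API.

```python
def train_hopfield(patterns):
    # Hebbian storage: sum of outer products with a zero diagonal
    size = len(patterns[0])
    weights = [[0] * size for _ in range(size)]
    for pattern in patterns:
        for i in range(size):
            for j in range(size):
                if i != j:
                    weights[i][j] += pattern[i] * pattern[j]
    return weights


def recall(weights, state):
    # One synchronous update; a zero net input keeps the current value
    updated = []
    for i, row in enumerate(weights):
        net = sum(w * s for w, s in zip(row, state))
        updated.append(state[i] if net == 0 else (1 if net > 0 else -1))
    return updated


stored = [1, -1, 1, -1]
weights = train_hopfield([stored])
restored = recall(weights, [1, 1, 1, -1])  # second element was flipped
```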

Competitive Networks

Class name Name
ART1 Adaptive Resonance Theory (ART1) Network
SOFM Self-Organizing Feature Map (SOFM or SOM)
LVQ Learning Vector Quantization (LVQ)
LVQ2 Learning Vector Quantization 2 (LVQ2)
LVQ21 Learning Vector Quantization 2.1 (LVQ2.1)
LVQ3 Learning Vector Quantization 3 (LVQ3)

Linear networks

Class name Name
Perceptron Perceptron
LMS LMS Network
ModifiedRelaxation Modified Relaxation Network


Class name Name
Kohonen Kohonen
Instar Instar
HebbRule Hebbian Neural Network

Boltzmann Machine

Class name Name
RBM Boolean/Bernoulli Restricted Boltzmann Machine


Layers with activation function

Class name Description
Linear Layer with linear activation function.
Sigmoid Layer with sigmoid activation function.
HardSigmoid Layer with hard sigmoid activation function.
Step Layer with step activation function.
Tanh Layer with tanh activation function.
Relu Layer with ReLU activation function.
LeakyRelu Layer with Leaky ReLU activation function.
Elu Layer with ELU activation function.
PRelu Layer with Parametric ReLU activation function.
Softplus Layer with softplus activation function.
Softmax Layer with softmax activation function.
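
For reference, the activation functions behind several of these layers can be written in plain Python as follows. This is a sketch of the common textbook definitions, not the library's code; in particular, the hard sigmoid slope of 0.2 and the default alpha values are assumptions, since conventions vary between libraries.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hard_sigmoid(x):
    # Piecewise-linear approximation of the sigmoid (assumed slope 0.2)
    return max(0.0, min(1.0, 0.2 * x + 0.5))


def relu(x):
    return max(0.0, x)


def leaky_relu(x, alpha=0.01):
    return x if x > 0 else alpha * x


def elu(x, alpha=1.0):
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def softplus(x):
    return math.log1p(math.exp(x))


def softmax(values):
    # Subtract the maximum first for numerical stability
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([1.0, 2.0, 3.0])
```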

Convolutional layers

Class name Description
Convolution Convolutional layer

Recurrent layers

Class name Description
LSTM Long Short-Term Memory (LSTM) layer
GRU Gated Recurrent Unit (GRU) layer

Pooling layers

Class name Description
MaxPooling Maximum pooling layer
AveragePooling Average pooling layer
Upscale Upscale layer
GlobalPooling Global pooling layer
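
The difference between maximum and average pooling can be shown on a small 2D input. The sketch below uses non-overlapping square windows on a nested list; the pool2d helper is hypothetical and ignores the channel and batch dimensions that real layers handle.

```python
def pool2d(image, size, reduce_fn):
    # Slide a non-overlapping size x size window and reduce each window
    rows, cols = len(image), len(image[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            window = [
                image[r + i][c + j]
                for i in range(size)
                for j in range(size)
            ]
            pooled_row.append(reduce_fn(window))
        pooled.append(pooled_row)
    return pooled


def mean(values):
    return sum(values) / len(values)


image = [
    [1, 2, 5, 6],
    [3, 4, 7, 8],
    [9, 10, 13, 14],
    [11, 12, 15, 16],
]
max_pooled = pool2d(image, 2, max)
avg_pooled = pool2d(image, 2, mean)
```

Global pooling corresponds to using a single window that covers the whole input, for example pool2d(image, 4, mean) here.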

Normalization layers

Class name Description
BatchNorm Batch normalization layer
LocalResponseNorm Local Response Normalization layer

Stochastic layers

Class name Description
Dropout Dropout layer
GaussianNoise Add Gaussian noise to the input
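
Both stochastic layers perturb their input at training time. The following sketch shows the idea in plain Python; it assumes the "inverted dropout" convention, where kept values are rescaled so that the expected output matches the input, which may differ from how the library scales values.

```python
import random


def dropout(values, proba, rng):
    # Zero each value with probability proba; rescale the kept ones
    keep = 1.0 - proba
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def gaussian_noise(values, std, rng):
    # Add zero-mean Gaussian noise to every value
    return [v + rng.gauss(0.0, std) for v in values]


rng = random.Random(42)
dropped = dropout([1.0] * 1000, proba=0.5, rng=rng)
noisy = gaussian_noise([0.0] * 1000, std=0.1, rng=rng)
```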

Merge layers

Class name Description
Elementwise Merge multiple input layers into one with an elementwise function.
Concatenate Concatenate multiple input layers into one along the specified axes.
GatedAverage Average multiple layers based on the output from the gate layer.

Other layers

Class name Description
Input Layer that defines the feature shape of the input value
Reshape Reshape tensor input
Embedding Embedding layer accepts indices as an input and returns rows from the weight matrix associated with these indices.


Class name Description
vgg16 VGG16 network
vgg19 VGG19 network
squeezenet SqueezeNet network
alexnet AlexNet network
mixture_of_experts Mixture of Experts

Parameter Initialization Methods

from neupy import algorithms, layers, init

gdnet = algorithms.GradientDescent(
    [
        # Input size is an assumption; it is not given in the original snippet
        layers.Input(784),
        layers.Relu(100, weight=init.HeNormal(), bias=init.HeNormal()),
        layers.Softmax(10, weight=init.Uniform(-0.01, 0.01)),
    ]
)

Class name Description
Constant Initialize weights with constant values
Normal Sample weights from the Normal distribution
Uniform Sample weights from the Uniform distribution
Orthogonal Initialize matrix with orthogonal basis
HeNormal Kaiming He parameter initialization method based on the Normal distribution.
HeUniform Kaiming He parameter initialization method based on the Uniform distribution.
XavierNormal Xavier Glorot parameter initialization method based on the Normal distribution.
XavierUniform Xavier Glorot parameter initialization method based on the Uniform distribution.
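
The He and Xavier methods differ mainly in how they scale the sampling distribution by the layer's size. The sketch below uses the commonly cited scaling rules (standard deviation sqrt(2 / fan_in) for He normal, limit sqrt(6 / (fan_in + fan_out)) for Xavier uniform); the helper functions are hypothetical, and the library's classes may apply additional gain factors.

```python
import math
import random


def he_normal(fan_in, fan_out, rng):
    # Normal distribution with std = sqrt(2 / fan_in)
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]


def xavier_uniform(fan_in, fan_out, rng):
    # Uniform distribution on [-limit, limit]
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


rng = random.Random(0)
w_he = he_normal(100, 50, rng)
w_xavier = xavier_uniform(100, 50, rng)
```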

Error functions

Function name Description
mae Mean absolute error
mse Mean squared error
rmse Root mean squared error
msle Mean squared logarithmic error
rmsle Root mean squared logarithmic error
binary_crossentropy Cross entropy error function for the binary classification
categorical_crossentropy Cross entropy error function for the multi-class classification
binary_hinge Hinge error function for the binary classification
categorical_hinge Hinge error function for the multi-class classification
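
Several of these error functions are short enough to write out directly. The sketch below gives plain-Python versions that operate on flat lists; it assumes the common log(1 + x) convention for the logarithmic errors and clips predictions in the cross entropy, which may not match the library's implementation exactly.

```python
import math


def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    return math.sqrt(mse(actual, predicted))


def msle(actual, predicted):
    # Squared error between log(1 + x) transformed values
    return sum(
        (math.log1p(a) - math.log1p(p)) ** 2
        for a, p in zip(actual, predicted)
    ) / len(actual)


def rmsle(actual, predicted):
    return math.sqrt(msle(actual, predicted))


def binary_crossentropy(actual, predicted, epsilon=1e-7):
    # Clip predictions away from 0 and 1 to avoid log(0)
    total = 0.0
    for a, p in zip(actual, predicted):
        p = min(max(p, epsilon), 1.0 - epsilon)
        total -= a * math.log(p) + (1.0 - a) * math.log(1.0 - p)
    return total / len(actual)


actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
errors = {
    "mae": mae(actual, predicted),
    "mse": mse(actual, predicted),
    "rmse": rmse(actual, predicted),
    "rmsle": rmsle(actual, predicted),
}
```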


Datasets

Dataset name Description
load_digits Load 10 discrete digit images with shape (6, 4)
make_digits Load discrete digits that have additional noise.
make_reber Generate a list of words that are valid under the Reber grammar rules.
make_reber_classification Generate random dataset for Reber grammar classification.