neupy.algorithms.GrowingNeuralGas

class neupy.algorithms.GrowingNeuralGas[source]

Growing Neural Gas (GNG) algorithm.

This implementation includes two modifications that are not mentioned in the original paper, but they help to speed up training.

  • The n_start_nodes parameter makes it possible to increase the number of nodes created during the initialization step. It is useful when the algorithm would otherwise spend a lot of time building up a large number of neurons.
  • The min_distance_for_update parameter speeds up training when some data samples already have neurons very close to them. It sets the minimum-distance threshold below which a weight update is skipped.
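The skip logic behind min_distance_for_update can be illustrated with a minimal pure-Python sketch. This is not the neupy implementation, and the helper name `should_update` is hypothetical:

```python
import math

def should_update(sample, winner_weight, min_distance_for_update):
    """Hypothetical helper: return True when the winner neuron is far
    enough from the data sample that a weight update is worthwhile."""
    distance = math.dist(sample, winner_weight)  # euclidean distance
    return distance >= min_distance_for_update

# A neuron sitting almost on top of the sample is skipped...
print(should_update([0.0, 0.0], [0.005, 0.0], min_distance_for_update=0.01))  # False
# ...while a distant neuron still receives an update.
print(should_update([0.0, 0.0], [0.5, 0.0], min_distance_for_update=0.01))  # True
```

With min_distance_for_update=0 every sample passes the check, which matches the default behaviour.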
Parameters:
n_inputs : int

Number of features in each sample.

n_start_nodes : int

Number of nodes that algorithm generates from the data during the initialization step. Defaults to 2.

step : float

Step (learning rate) for the neuron winner. Defaults to 0.2.

neighbour_step : float

Step (learning rate) for the neurons that are connected via edges to the winner neuron. This value should typically be smaller than step. Defaults to 0.05.

max_edge_age : int

If an edge is not updated for max_edge_age iterations, it is removed. The larger the value, the more updates are allowed before an edge is removed. Defaults to 100.

n_iter_before_neuron_added : int

The algorithm adds a new neuron after every n_iter_before_neuron_added weight updates. The smaller the value, the more frequently new neurons are added to the network. Defaults to 1000.

error_decay_rate : float

This decay rate is applied to the error of every neuron in the graph after each training iteration. It ensures that old errors are reduced over time. Defaults to 0.995.

after_split_error_decay_rate : float

This decay rate reduces the error of the neurons with the largest errors after a new neuron has been added. This value is typically lower than error_decay_rate. Defaults to 0.5.

max_nodes : int

Maximum number of nodes generated during training. This parameter does not stop the training itself once the maximum number of nodes has been reached. Defaults to 1000.

min_distance_for_update : float

Controls which neurons receive updates. If the euclidean distance between a data sample and its closest neurons is less than min_distance_for_update, the update is skipped for that data sample. Setting the value to zero disables this behaviour. Defaults to 0.

show_epoch : int

This property controls how often the network displays information about training. It has to be a positive integer. For instance, the value 100 means that the network shows a summary at the 1st, 100th, 200th, 300th, … and last epochs.

Defaults to 1.

shuffle_data : bool

If True, the training data is shuffled before training. Defaults to True.

signals : dict, list or function

Function that will be triggered after certain events during the training.

verbose : bool

Controls verbose output in the terminal. True enables informative output and False disables it. Defaults to False.
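Several of the parameters above interact during a single update: the winner moves by step, its graph neighbours move by the smaller neighbour_step, edges age out after max_edge_age, and accumulated errors decay by error_decay_rate. The following is a minimal pure-Python sketch of these interactions, not the neupy implementation:

```python
def move_towards(weight, sample, learning_rate):
    # Move a neuron's weight vector a fraction of the way to the sample.
    return [w + learning_rate * (s - w) for w, s in zip(weight, sample)]

sample = [1.0, 1.0]
winner = [0.0, 0.0]
neighbour = [0.0, 2.0]

step = 0.2             # learning rate for the winner
neighbour_step = 0.05  # smaller rate for the winner's graph neighbours

winner = move_towards(winner, sample, step)                   # moves a lot
neighbour = move_towards(neighbour, sample, neighbour_step)   # moves a little

# Edge aging: edges touching the winner grow older; edges whose age
# exceeds max_edge_age are dropped.
max_edge_age = 100
edge_ages = {('a', 'b'): 99, ('a', 'c'): 100}
edge_ages = {edge: age + 1 for edge, age in edge_ages.items()}
edge_ages = {edge: age for edge, age in edge_ages.items() if age <= max_edge_age}
print(edge_ages)  # {('a', 'b'): 100}

# Error decay: every neuron's accumulated error shrinks after each iteration.
error_decay_rate = 0.995
errors = {'a': 10.0, 'b': 4.0}
errors = {node: err * error_decay_rate for node, err in errors.items()}
```

The asymmetry between step and neighbour_step is what lets the winner track the data closely while its neighbours are only gently dragged along.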

Notes

  • Unlike other algorithms, this network does not make predictions. Instead, it learns the topological structure of the data in the form of a graph. After training, the structure of the network can be extracted from the graph attribute.
  • In order to speed up training, it might be useful to increase the n_start_nodes parameter.
  • During training it can happen that nodes learn the topological structure of one part of the data better than another, mostly because of differences in data-sample density between regions. Increasing min_distance_for_update can speed up training by skipping updates for neurons that are already very close to a data sample (closer than the specified min_distance_for_update value). Training can be stopped once none of the neurons has been updated during a training epoch.
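The stopping criterion from the last note can be sketched as a toy training loop: run epochs until no neuron received an update. The helper `train_until_stable` is hypothetical and uses plain lists instead of neupy's graph, under the assumption that only the closest (winner) neuron moves:

```python
import math

def train_until_stable(samples, neurons, step, min_distance_for_update):
    """Toy loop (not part of neupy): run epochs until a full epoch
    passes without a single weight update, then report the epoch count."""
    epochs = 0
    while True:
        n_updates = 0
        for sample in samples:
            # Find the winner: the neuron closest to the sample.
            idx, winner = min(enumerate(neurons),
                              key=lambda p: math.dist(p[1], sample))
            if math.dist(winner, sample) < min_distance_for_update:
                continue  # winner is already close enough: skip this sample
            neurons[idx] = [w + step * (s - w)
                            for w, s in zip(winner, sample)]
            n_updates += 1
        epochs += 1
        if n_updates == 0:  # nothing moved during the whole epoch
            return epochs

samples = [[0.0, 0.0], [1.0, 1.0]]
neurons = [[0.1, 0.1], [0.9, 0.9]]
print(train_until_stable(samples, neurons, step=0.5,
                         min_distance_for_update=0.05))
```

Each epoch halves the remaining distance here, so the loop terminates as soon as every winner sits inside the min_distance_for_update radius of its sample.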

References

[1] A Growing Neural Gas Network Learns Topologies, Bernd Fritzke

Examples

>>> from neupy import algorithms
>>> from sklearn.datasets import make_blobs
>>>
>>> data, _ = make_blobs(
...     n_samples=1000,
...     n_features=2,
...     centers=2,
...     cluster_std=0.4,
... )
>>>
>>> neural_gas = algorithms.GrowingNeuralGas(
...     n_inputs=2,
...     shuffle_data=True,
...     verbose=True,
...     max_edge_age=10,
...     n_iter_before_neuron_added=50,
...     max_nodes=100,
... )
>>> neural_gas.train(data, epochs=10)
>>>
>>> neural_gas.graph.n_nodes
100
>>> len(neural_gas.graph.edges)
175
>>> edges = list(neural_gas.graph.edges.keys())
>>> neuron_1, neuron_2 = edges[0]
>>>
>>> neuron_1.weight
array([[-6.77166299,  2.4121606 ]])
>>> neuron_2.weight
array([[-6.829309  ,  2.27839633]])
Attributes:
graph : NeuralGasGraph instance

This attribute stores all neurons and the connections between them in the form of an undirected graph.

errors : list

Information about errors. It has two main attributes, namely train and valid. These attributes provide access to the training and validation errors respectively.

last_epoch : int

Value equal to the last trained epoch. After initialization it is equal to 0.

n_updates_made : int

Number of training updates applied to the network.

Methods

train(X_train, epochs=100) The network learns the topological structure of the data. The learned structure is stored in the graph attribute.
fit(*args, **kwargs) Alias to the train method.
initialize_nodes(data) The network initializes nodes by randomly sampling n_start_nodes samples from the data. It is applied automatically before training when the graph is empty. Note: node re-initialization resets the network.
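The initialization described for initialize_nodes amounts to drawing n_start_nodes distinct samples from the data. A minimal stand-in sketch, not the neupy implementation (the helper name `sample_start_nodes` and the fixed seed are assumptions for reproducibility):

```python
import random

def sample_start_nodes(data, n_start_nodes=2, seed=0):
    # Hypothetical stand-in for initialize_nodes: pick n_start_nodes
    # distinct data samples to seed the graph with.
    rng = random.Random(seed)
    return rng.sample(list(data), n_start_nodes)

data = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
nodes = sample_start_nodes(data, n_start_nodes=2)
print(len(nodes))  # 2
```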
after_split_error_decay_rate = None[source]
error_decay_rate = None[source]
format_input_data(X)[source]
initialize_nodes(data)[source]
max_edge_age = None[source]
max_nodes = None[source]
min_distance_for_update = None[source]
n_inputs = None[source]
n_iter_before_neuron_added = None[source]
n_start_nodes = None[source]
neighbour_step = None[source]
one_training_update(X_train, y_train=None)[source]

Function that is triggered before running all training procedures related to the current epoch.

Parameters:
epoch : int

Current epoch number.

options = {'after_split_error_decay_rate': Option(class_name='GrowingNeuralGas', value=ProperFractionProperty(name="after_split_error_decay_rate")), 'error_decay_rate': Option(class_name='GrowingNeuralGas', value=ProperFractionProperty(name="error_decay_rate")), 'max_edge_age': Option(class_name='GrowingNeuralGas', value=IntProperty(name="max_edge_age")), 'max_nodes': Option(class_name='GrowingNeuralGas', value=IntProperty(name="max_nodes")), 'min_distance_for_update': Option(class_name='GrowingNeuralGas', value=NumberProperty(name="min_distance_for_update")), 'n_inputs': Option(class_name='GrowingNeuralGas', value=IntProperty(name="n_inputs")), 'n_iter_before_neuron_added': Option(class_name='GrowingNeuralGas', value=IntProperty(name="n_iter_before_neuron_added")), 'n_start_nodes': Option(class_name='GrowingNeuralGas', value=IntProperty(name="n_start_nodes")), 'neighbour_step': Option(class_name='GrowingNeuralGas', value=NumberProperty(name="neighbour_step")), 'show_epoch': Option(class_name='BaseNetwork', value=IntProperty(name="show_epoch")), 'shuffle_data': Option(class_name='BaseNetwork', value=Property(name="shuffle_data")), 'signals': Option(class_name='BaseNetwork', value=Property(name="signals")), 'step': Option(class_name='GrowingNeuralGas', value=NumberProperty(name="step")), 'verbose': Option(class_name='Verbose', value=VerboseProperty(name="verbose"))}[source]
predict(*args, **kwargs)[source]
step = None[source]
train(X_train, epochs=100)[source]

Method that trains the neural network.

Parameters:
X_train : array-like
y_train : array-like or None
X_test : array-like or None
y_test : array-like or None
epochs : int

Defaults to 100.

epsilon : float or None

Defaults to None.