Quasi-Newton algorithm. At every iteration, the quasi-Newton method approximates the inverse Hessian matrix with iterative updates. It doesn’t have a step parameter. Instead, the algorithm runs a line search for a step size that satisfies the strong Wolfe conditions. Parameters that control the Wolfe line search start with the wolfe_ prefix.

update_function : bfgs, dfp, sr1

Update function for the iterative approximation of the inverse Hessian matrix. Defaults to bfgs.

  • bfgs - Broyden-Fletcher-Goldfarb-Shanno (BFGS). A rank-2 update formula. It can suffer from round-off error and inaccurate line searches.
  • dfp - Davidon-Fletcher-Powell (DFP). A method very similar to BFGS. It is also a rank-2 update formula and can suffer from round-off error and inaccurate line searches.
  • sr1 - Symmetric rank 1 (SR1). Updates the inverse Hessian matrix by adding a symmetric rank-1 matrix. A valid rank-1 update may not exist; in that case the update is not applied and the original inverse Hessian is returned.
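The three updates listed above can be sketched with NumPy using the textbook formulas for the inverse Hessian approximation. This is an illustrative sketch, not NeuPy's implementation: the function names, the `eps` safeguard and the exact skipping rules are assumptions. Here `s` is the change in parameters and `y` is the change in gradient between two iterations.

```python
import numpy as np


def bfgs_update(H, s, y, eps=1e-7):
    """Rank-2 BFGS update of the inverse Hessian approximation H."""
    sy = s @ y
    if abs(sy) <= eps:  # assumed safeguard: skip an ill-conditioned update
        return H
    rho = 1.0 / sy
    V = np.eye(len(s)) - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)


def dfp_update(H, s, y, eps=1e-7):
    """Rank-2 DFP update of the inverse Hessian approximation H."""
    Hy = H @ y
    sy = s @ y
    yHy = y @ Hy
    if abs(sy) <= eps or abs(yHy) <= eps:  # assumed safeguard
        return H
    return H + np.outer(s, s) / sy - np.outer(Hy, Hy) / yHy


def sr1_update(H, s, y, eps=1e-7):
    """Symmetric rank-1 update; returns H unchanged when no update exists."""
    v = s - H @ y
    denom = v @ y
    # Skip when the denominator is (nearly) zero relative to the vectors.
    if abs(denom) <= eps * np.linalg.norm(v) * np.linalg.norm(y):
        return H
    return H + np.outer(v, v) / denom
```

Each formula is built so that the updated matrix maps `y` back to `s` (the secant equation), which is a convenient sanity check when experimenting with them.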
h0_scale : float

By default, the initial Hessian matrix is an identity matrix. The h0_scale parameter scales this identity matrix. Defaults to 1.
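As a rough illustration, the effect of h0_scale can be sketched as follows, assuming the scaled identity is used as the starting inverse Hessian approximation. The helper name `initial_inverse_hessian` and the sample gradient are hypothetical.

```python
import numpy as np


def initial_inverse_hessian(n_parameters, h0_scale=1.0):
    # Hypothetical helper: a scaled identity as the starting approximation.
    return h0_scale * np.eye(n_parameters)


H0 = initial_inverse_hessian(3, h0_scale=0.5)
gradient = np.array([2.0, -4.0, 6.0])

# With a scaled identity, the first search direction is simply the
# negative gradient multiplied by h0_scale.
first_direction = -H0 @ gradient
```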

epsilon : float

Small constant that controls the numerical stability of the update selected with the update_function parameter. Defaults to 1e-7.

wolfe_maxiter : int

Controls the maximum number of iterations of the line search that identifies the optimal step size during the weight update stage. Defaults to 20.

wolfe_c1 : float

Parameter for the Armijo (sufficient decrease) condition. It’s used during the line search that identifies the optimal step size during the weight update stage. Defaults to 1e-4.

wolfe_c2 : float

Parameter for the curvature condition. It’s used during the line search that identifies the optimal step size during the weight update stage. Defaults to 0.9.
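The role of wolfe_c1 and wolfe_c2 can be illustrated with a small check of the strong Wolfe conditions for a candidate step size. This is a sketch under stated assumptions, not the library's line search: the function `satisfies_strong_wolfe` and the quadratic test problem are made up for illustration.

```python
import numpy as np


def satisfies_strong_wolfe(f, grad, x, direction, step, c1=1e-4, c2=0.9):
    """Check both strong Wolfe conditions for a given step size."""
    slope_at_start = grad(x) @ direction
    x_new = x + step * direction

    # Armijo rule: the loss must decrease sufficiently (controlled by c1).
    armijo = f(x_new) <= f(x) + c1 * step * slope_at_start
    # Curvature rule: the slope along the direction must shrink enough
    # in absolute value (controlled by c2).
    curvature = abs(grad(x_new) @ direction) <= c2 * abs(slope_at_start)
    return bool(armijo and curvature)


def f(x):
    return 0.5 * x @ x


def grad(x):
    return x


x = np.array([1.0, 1.0])
direction = -grad(x)

good_step = satisfies_strong_wolfe(f, grad, x, direction, step=1.0)
too_long = satisfies_strong_wolfe(f, grad, x, direction, step=2.0)
too_short = satisfies_strong_wolfe(f, grad, x, direction, step=1e-3)
```

In this toy problem the overly long step fails the Armijo rule and the overly short step fails the curvature rule, which is why the line search needs both conditions.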

network : list, tuple or LayerConnection instance

Network’s architecture. There are a few ways to define it.

  • List of layers. For instance, [Input(2), Tanh(4), Relu(1)].
  • Constructed layers. For instance, Input(2) >> Tanh(4) >> Relu(1).
loss : str or function

Error/loss function. Defaults to mse.

  • mae - Mean Absolute Error.
  • mse - Mean Squared Error.
  • rmse - Root Mean Squared Error.
  • msle - Mean Squared Logarithmic Error.
  • rmsle - Root Mean Squared Logarithmic Error.
  • categorical_crossentropy - Categorical cross entropy.
  • binary_crossentropy - Binary cross entropy.
  • binary_hinge - Binary hinge loss.
  • categorical_hinge - Categorical hinge loss.
  • Custom function which accepts two mandatory arguments. The first one is expected value and the second one is predicted value. Example:
def custom_func(expected, predicted):
    return expected - predicted
show_epoch : int

This property controls how often the network displays information about training. It has to be a positive integer. For instance, the value 100 means that the network shows a summary at the 1st, 100th, 200th, 300th … and last epochs.

Defaults to 1.
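The schedule described above can be sketched in plain Python. The helper `epochs_with_summary` is hypothetical and only mirrors the rule stated in the text (first epoch, every multiple of show_epoch, and the last epoch); it is not part of the library.

```python
def epochs_with_summary(n_epochs, show_epoch=1):
    """Return the epoch numbers at which a summary would be shown."""
    if not isinstance(show_epoch, int) or show_epoch < 1:
        raise ValueError("show_epoch has to be a positive integer")

    shown = {1, n_epochs}  # first and last epochs are always shown
    shown.update(range(show_epoch, n_epochs + 1, show_epoch))
    return sorted(shown)


schedule = epochs_with_summary(350, show_epoch=100)
```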

shuffle_data : bool

If it’s True, then the training data will be shuffled before training. Defaults to True.

signals : dict, list or function

Function that will be triggered after certain events during the training.

verbose : bool

Controls verbose output in the terminal. True enables informative output in the terminal and False disables it. Defaults to False.

regularizer : function or None

Network’s regularizer.


  • The method requires all of the training data during propagation, which means that mini-batches cannot be used.


[1] Yang Ding, Enkeleida Lushi, Qingguo Li,
Investigation of quasi-Newton methods for unconstrained optimization.
[2] Jorge Nocedal, Stephen J. Wright, Numerical Optimization.
Chapter 6, Quasi-Newton Methods, p. 135-163


>>> import numpy as np
>>> from neupy import algorithms
>>> from neupy.layers import *
>>> x_train = np.array([[1, 2], [3, 4]])
>>> y_train = np.array([[1], [0]])
>>> optimizer = algorithms.QuasiNewton(
...     Input(2) >> Sigmoid(3) >> Sigmoid(1),
...     update_function='bfgs'
... )
>>> optimizer.train(x_train, y_train, epochs=10)
errors : list

Information about errors. It has two main attributes, namely train and valid. These attributes provide access to the training and validation errors respectively.

last_epoch : int

Equal to the number of the last trained epoch. After initialization it is equal to 0.

n_updates_made : int

Number of training updates applied to the network.


predict(X) Predicts output for the specified input.
train(X_train, y_train, X_test=None, y_test=None, epochs=100) Trains the network. The epochs parameter controls the number of training epochs. X_test and y_test must both be provided if the network should be validated after each training epoch.
fit(*args, **kwargs) Alias to the train method.