Predicting house prices in the Boston area

In this article we are going to predict house prices using the Conjugate Gradient algorithm.

To begin, we need to load the data.

from sklearn import datasets

# load_boston is available in scikit-learn versions prior to 1.2
dataset = datasets.load_boston()
data, target = dataset.data, dataset.target

Let’s take a closer look at the data.

CRIM ZN INDUS CHAS NOX RM AGE DIS RAD
0.00632 18 2.31 0 0.538 6.575 65.2 4.0900 1
0.02731 0 7.07 0 0.469 6.421 78.9 4.9671 2
0.02729 0 7.07 0 0.469 7.185 61.1 4.9671 2
0.03237 0 2.18 0 0.458 6.998 45.8 6.0622 3
0.06905 0 2.18 0 0.458 7.147 54.2 6.0622 3
TAX PTRATIO B LSTAT MEDV
296 15.3 396.90 4.98 24.0
242 17.8 396.90 9.14 21.6
242 17.8 392.83 4.03 34.7
222 18.7 394.63 2.94 33.4
222 18.7 396.90 5.33 36.2
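
A preview like the one above can be reproduced with a few lines of pandas (a minimal sketch; the pandas dependency is not part of the original code):

import pandas as pd

# Combine features and target into one table for a quick preview;
# feature_names comes with the scikit-learn dataset object loaded above
df = pd.DataFrame(data, columns=dataset.feature_names)
df['MEDV'] = target
print(df.head())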

The data contains 14 columns. The last column, MEDV, is the median value of owner-occupied homes in $1000’s. The goal is to predict these prices. The other columns can be used as inputs for neural network training. Descriptions of all columns are given below.

  • CRIM per capita crime rate by town
  • ZN proportion of residential land zoned for lots over 25,000 sq.ft.
  • INDUS proportion of non-retail business acres per town
  • CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
  • NOX nitric oxides concentration (parts per 10 million)
  • RM average number of rooms per dwelling
  • AGE proportion of owner-occupied units built prior to 1940
  • DIS weighted distances to five Boston employment centres
  • RAD index of accessibility to radial highways
  • TAX full-value property-tax rate per $10,000
  • PTRATIO pupil-teacher ratio by town
  • B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
  • LSTAT % lower status of the population

From the data set description we can see that there are 13 continuous attributes (including the “class” attribute MEDV) and 1 binary-valued attribute. There are no multi-valued categorical features, so we don’t need to expand the feature dimension. But there is already one problem: if you look closer, you will see that every column has its own range of values. This is bad for neural network training, because inputs with larger values contribute more to the computed outputs, so the network treats them as more important, which is an invalid assumption about the data. For example, in the first row of the table above the column B contains the value 396.90 while the column CRIM contains 0.00632. To fix this issue we should transform all columns so that they share a similar range.

from sklearn import preprocessing

data_scaler = preprocessing.MinMaxScaler()
target_scaler = preprocessing.MinMaxScaler()

data = data_scaler.fit_transform(data)
# MinMaxScaler expects a 2D array, so reshape the 1D target column first
target = target_scaler.fit_transform(target.reshape(-1, 1))

After the transformation the data looks like this.

CRIM ZN INDUS CHAS NOX ...
0.000000 0.18 0.067815 0 0.314815 ...
0.000236 0.00 0.242302 0 0.172840 ...
0.000236 0.00 0.242302 0 0.172840 ...
0.000293 0.00 0.063050 0 0.150206 ...
0.000705 0.00 0.063050 0 0.150206 ...

All the data is now in the range between 0 and 1.
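
A quick sanity check (an optional sketch, not part of the original code) confirms the new ranges:

# Both features and target should now lie in the [0, 1] range
print(data.min(), data.max())      # expected: 0.0 1.0
print(target.min(), target.max())  # expected: 0.0 1.0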

Next we split the data set into training and validation sets. We use 85% of the data for training.

from sklearn.model_selection import train_test_split
from neupy import environment

environment.reproducible()

x_train, x_test, y_train, y_test = train_test_split(
    data, target, train_size=0.85
)

Now we are ready to build a neural network that will predict house prices.

from neupy import algorithms, layers

cgnet = algorithms.ConjugateGradient(
    connection=[
        layers.Input(13),
        layers.Sigmoid(50),
        layers.Sigmoid(1),
    ],
    search_method='golden',
    show_epoch=25,
    verbose=True,
    addons=[algorithms.LinearSearch],
)

We define a network with one hidden layer that contains 50 units. This value is just a guess; for a better and more accurate result we should select it with other methods (a simple comparison is sketched below), but for now this value will do. As the main algorithm we take Conjugate Gradient. This implementation is a little bit different from the classical Conjugate Gradient method: in a gradient-descent setting we cannot guarantee that the local minimum is reached in n steps (where n is the dimension of the problem). To improve it we add a line search, which finds better step sizes for the Conjugate Gradient updates.
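
One simple way to pick the hidden layer size is to compare validation errors for a few candidate sizes. The sketch below reuses only the calls already shown in this article (rmsle is introduced in more detail later); the candidate sizes are arbitrary examples:

from neupy import algorithms, layers
from neupy.estimators import rmsle

# Train one network per candidate hidden layer size and compare
# validation RMSLE on the original price scale
for n_hidden in (10, 30, 50, 100):
    network = algorithms.ConjugateGradient(
        connection=[
            layers.Input(13),
            layers.Sigmoid(n_hidden),
            layers.Sigmoid(1),
        ],
        search_method='golden',
        addons=[algorithms.LinearSearch],
        verbose=False,
    )
    network.train(x_train, y_train, x_test, y_test, epochs=100)
    y_predicted = network.predict(x_test)
    error = rmsle(target_scaler.inverse_transform(y_test),
                  target_scaler.inverse_transform(y_predicted))
    print(n_hidden, error)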

Now we are going to train the network. We train it for 100 epochs. We also pass the test data into the training function so that the validation error is checked on every epoch.

cgnet.train(x_train, y_train, x_test, y_test, epochs=100)

To make sure that the training process goes the right way we can check how the error changes while the training is in progress.

from neupy import plots
plots.error_plot(cgnet)

The error minimization procedure looks fine. The problem is that the final error does not show us the full picture of prediction accuracy: our output is always between zero and one, and the training error is a Mean Squared Error measured on these scaled values. To fix this, we invert the transformation for the predicted and the actual values, and for the accuracy measurement we use the Root Mean Squared Logarithmic Error (RMSLE).

from neupy.estimators import rmsle

y_predict = cgnet.predict(x_test).round(1)
error = rmsle(target_scaler.inverse_transform(y_test),
              target_scaler.inverse_transform(y_predict))
print(error)
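
For reference, RMSLE compares predictions and actual values on a logarithmic scale, so expensive and cheap houses contribute more evenly to the error. A minimal NumPy sketch of the formula (not NeuPy’s exact implementation) looks like this:

import numpy as np

def rmsle_sketch(actual, predicted):
    # Root Mean Squared Logarithmic Error:
    # sqrt(mean((log(predicted + 1) - log(actual + 1)) ** 2))
    log_diff = np.log(predicted + 1) - np.log(actual + 1)
    return np.sqrt(np.mean(log_diff ** 2))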

We can see that our error is approximately 0.22, which is fairly small. In the table below you can find 10 randomly chosen predictions next to their actual values.

Actual Predicted
31.2 27.5
18.7 18.5
20.1 18.5
17.2 9.5
8.3 9.5
50.0 41.0
42.8 32.0
20.5 18.5
16.8 23.0
11.8 9.5
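
A comparison table like the one above can be produced with a short sketch like this (the random sampling of 10 rows is just for illustration):

import numpy as np

# Bring both actual and predicted values back to the original price scale
actual = target_scaler.inverse_transform(y_test).flatten()
predicted = target_scaler.inverse_transform(y_predict).flatten()

# Print 10 randomly chosen validation samples side by side
indices = np.random.choice(len(actual), size=10, replace=False)
for i in indices:
    print("{:6.1f} {:10.1f}".format(actual[i], predicted[i]))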

The results are good for a first network implementation. There are a lot of things we could do to improve the network’s results, but we will discuss them in another article.