Introduction
There are many Neural Network libraries available on the internet today. For example, you're familiar with Python, or you can take advantage of TensorFlow, sci-kit learn. Also in Java, you can use DeepLearning4J. From my mentor's recommendation, I'd switch to Torch, a good and easy to know framework for working with Neural Network models. In this tutorial, I will focus on how-to use Torch for beginners, by going through my Hello World program. In my opinion, learning from Hello World program is a good approach for getting started with a new programming language. Since this tutorial goes online, this has been the first to introduce working with Torch on Microsoft's Windows environment in an easy way so far.
Content
Prerequisites
In order to work with my example program, and also to know how Torch works, you need to have the following prerequisites:
- Microsoft Windows 10 (Build 1607), aka Windows 10 Anniversary Update or higher versions
- Bash on Ubuntu on Windows must be enabled [Readme]
- Torch in virtual Ubuntu on Windows must be installed [Readme]
- Basic understanding of Lua programming language [Readme]
- Basic understanding of machine learning, artificial neural network [ML | ANN]
Neural Network with Torch
In Torch, the nn is the main package to build and train from a simple to complex neural network model. To begin with Torch, you need to define the following contents:
- Model
- Training
- Data
- Evaluation
Model
The first thing in working with Torch is that you need to define a model. Which means whether you want a feed-forward network, a convolutional network, or a recurrent neural network. You also need to define the number of hidden layers, the number of hidden
units for each layer, and the activation function which you want to use on each layer.
The nn package defines the containers as follows:
nn.Sequential: plugs layers in a feed-forward fully connected mannernn.Parallel: plugs each element of input Tensor to different layersnn.Concat: concatenates in one layer several modules along dimensiondimnn.Bottle: allows any dimensionality input be forwarded through a module
nn.Sequential since it's the most easy approach to feed data to the network, thus the program will look like:
require "nn"
mlp = nn.Sequential()
Next, to tranform data between layers, we can simply use the linear transfer function (
nn.Linear) or the non-linear representation of input data (nn.Sigmoid, nn.Tanh,
nn.ReLU, ...). Let's assume that we want a network with two hidden layers, a 4-dimensional input, 4 units in each layer, and the transfer function is nn.Tanh.
input = 4
output = 2
hiddenLayer1 = 4
hiddenLayer2 = 4
mlp:add(nn.Linear(input, hiddenLayer1))
mlp:add(nn.Tanh())
mlp:add(nn.Linear(hiddenLayer1, hiddenLayer2))
mlp:add(nn.Tanh())
mlp:add(nn.Linear(hiddenLayer2, output))
mlp:add(nn.LogSoftMax())
The last two lines in the program above declare the output layer. In this case, we define an output in 2 dimensions, which that means, our task is binary classificaton, and thus the nn.LogSoftMax() is the most
common choice.We should print our model to check whether it fits our need, by insert the following line:
print(mlp)
At this point of the program, you can compile and run it. To execute this program, first, you have to save this as the .lua extension, for example, save as sample.lua. Then, open
Bash on Ubuntu on Windows, and type the command as follows (you can use cd command to change to the directory of your saved file).
th sample.lua
Now the program is executed, the following lines will be printed on the screen:
nn.Sequential {
[input -> (1) -> (2) -> (3) -> (4) -> output]
(1): nn.Linear(4 -> 4)
(2): nn.Tanh
(3): nn.Linear(4 -> 2)
(4): nn.LogSoftMax
}
To check if your model works properly , you can take the advantage of Module:forward.
preTest = mlp:forward(torch.randn(1,10))
print(preTest)
When execute the program, it will return a Tensor data of size 1 x 2, in which the preTest[1][i] is log probability belongs to class i of the input, respectively. Note that, in Lua,
the index of Tensor starts at 1. Below is the graphical representation of our NN model.
Training
In previous section, we had our NN model. Now, we have to define a training algorithm, as nn package provides us two approaches: (1) implement your own training algorithm, that means, you define from taking input, feeding to layers,
computing gradients, and update model's paremeters; or (2) you can take advantage of pre-implemented nn.StochasticGradient method. The nn.StochasticGradient takes the our defined model, and a loss function (Criterion).
There are many Criterion in nn package (e.g., CrossEntropyCriterion, MSECriterion, CosineEmbeddingCriterion), and we use the negative log-likelihood criterion, since
it usually goes with nn.LogSoftMax. Besides that, another important parameter for nn.StochasticGradient is learningRate.
criterion = nn.ClassNLLCriterion()
trainer = nn.StochasticGradient(mlp, criterion)
trainer.learningRate = 0.1
Data
Until now, the last important thing is training dataset. Let's assume our dataset is stored in CSV format. Since Torch doesn't support splitting method, we
need to define such method as follows. Notice that Torch can't represent zero number (0) in dataset, thus if you want to represent binary class (positive/negative), you should use pair {1,2}.
-- Split input string at comma symbols
function string:splitAtCommas()
local sep, output = ",", {}
local pattern = string.format("([^%s]+)", sep)
self:gsub(pattern, function(c) output[#output+1] = c end)
return output
end
-- Read the dataFile and return a variable in Tensor data structure
function loadData(dataFile)
local dataset = {}
local i = 1
for line in io.lines(dataFile) do
local values = line:splitAtCommas()
local y = torch.Tensor(1)
y[1] = values[#values] -- the last number in line is class
values[#values] = nil
local x = torch.Tensor(values) -- all other numbers are input
dataset[i] = {x, y}
i = i + 1
end
function dataset:size() return (i - 1) end -- the requirement mentioned
return dataset
end
Then, the method nn.StochasticGradient:train is called to begin training process.
---- Load the dataset
dataset = loadData("train.csv")
---- Training model with given dataset
trainer:train(dataset)
Evaluation
After successfully training our NN model, we now need to evaluate and compute the accuracy of our model. This step can be done by using the two functions below. The first one, named argmax(v); and the second is evaluation(filePath),
which takes the test dataset (in CSV format) and returns the accuracy of the current model (in percentage).
---- argmax
function argmax(v)
local max = torch.max(v)
for i = 1, v:size(1) do
if v[i] == max then
return i
end
end
end
---- Evaluate and compute the accuracy
function evaluation(filePath)
local total = 0
local positive = 0
for line in io.lines(filePath) do
local values = line:splitAtCommas()
local y = torch.Tensor(1)
y[1] = values[#values]
values[#values] = nil
local x = torch.Tensor(values)
local prediction = argmax(mlp:forward(x))
if math.floor(prediction) == math.floor(y[1]) then
positive = positive + 1
end
total = total + 1
end
return (positive / total) * 100
end
---- Read the testset and compute the accuracy
accuracy = evaluation("test.csv")
print("Accuracy(%) is " .. accuracy)
If we want to view the weight matrices of our model, we can use the command below.
---- Print the weight matrix
print("Weights of saved model: ")
print(mlp:get(1)) -- Get the first module of our model, i.e. nn.Linear(4 -> 4)
print(mlp:get(1).weight) -- Get the weight matrix of that layer
In case we want to save the model and load it later, we can use the following lines of code.
---- Save the model to file
torch.save("file.th", mlp)
---- Load the saved model
mlp2 = torch.load("file.th")
print(mlp2:get(1).weight)
Contributors
- Phuc Duong - [email protected]
- Duy Nguyen (Student) - [email protected]
- Huy Nguyen (Student) - [email protected]
Contact
This tutorial may contain mistakes, please feel free to send us your feedback via email!