Batch Normalized Deep Boltzmann Machines

Hung Vu, Tu Dinh Nguyen, Trung Le, Wei Luo, Dinh Phung

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

Abstract

Training Deep Boltzmann Machines (DBMs) is a challenging task in deep generative model studies. Careless training usually leads to divergence or a useless model. We discover that this phenomenon is due to the change in DBM layers’ input signals during model parameter updates, similar to other deterministic deep networks such as Convolutional Neural Networks (CNNs) or Recurrent Neural Networks (RNNs). The change in layers’ input distributions not only complicates the learning process but also causes redundant neurons that simply imitate the behavior of other neurons. Although this phenomenon can be addressed using batch normalization in deep learning, integrating this technique into the probabilistic network of DBMs is a challenging problem since it has to satisfy two conditions on the energy function and the conditional probabilities. In this paper, we introduce Batch Normalized Deep Boltzmann Machines (BNDBMs) that meet both aforementioned conditions and successfully combine batch normalization and DBMs within the same framework. However, due to the probabilistic nature of DBMs, training DBMs with batch normalization differs from training CNNs in several ways: (i) the shift parameters β are fixed while the scale parameters γ are learned; (ii) the first hidden layer is not normalized; and (iii) multiple pairs of population means and variances are maintained per neuron rather than the single pair used in CNNs. We observe that our proposed BNDBMs stabilize the input signals of network layers, facilitate the training process and improve model quality. More interestingly, BNDBMs can be trained successfully without pretraining, which is usually a mandatory step in most existing DBMs. Experimental results on the MNIST, Fashion-MNIST and Caltech 101 Silhouette datasets show that our BNDBMs outperform DBMs and centered DBMs in terms of feature representation and classification accuracy (3.98% and 5.84% average improvement with and without pretraining, respectively).
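
The normalization scheme summarized in the abstract can be sketched as follows. This is a hedged illustration under assumed conventions, not the authors' exact formulation: it only shows how a hidden layer's pre-activation signal could be batch-normalized with a fixed shift β and a learned scale γ, how stored population statistics could replace mini-batch statistics at evaluation time (the paper keeps several such pairs per neuron; a single pair is shown for brevity), and that the first hidden layer is left un-normalized. All names (bn_hidden_activation, pop_mean, pop_var, etc.) are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bn_hidden_activation(pre_act, gamma, beta, eps=1e-5,
                         pop_mean=None, pop_var=None):
    """Batch-normalize one DBM hidden layer's pre-activation signal.

    pre_act  : (batch_size, n_hidden) pre-activations of the layer
    gamma    : (n_hidden,) scale parameters, learned during training
    beta     : (n_hidden,) shift parameters, kept fixed during training
    pop_mean, pop_var : optional stored population statistics used at
        evaluation time instead of mini-batch statistics
    Returns the conditional activation probabilities of the layer's units.
    """
    if pop_mean is None:          # training: use mini-batch statistics
        mu = pre_act.mean(axis=0)
        var = pre_act.var(axis=0)
    else:                         # evaluation: use population statistics
        mu, var = pop_mean, pop_var
    normed = (pre_act - mu) / np.sqrt(var + eps)
    return sigmoid(gamma * normed + beta)

# Toy usage: activation of the second hidden layer h^(2) of a DBM,
# given samples of the first hidden layer h^(1) (which itself is
# left un-normalized, as stated in the abstract).
rng = np.random.default_rng(0)
batch, n_h1, n_h2 = 64, 100, 50
h1 = rng.binomial(1, 0.5, size=(batch, n_h1)).astype(float)  # samples of h^(1)
W2 = 0.01 * rng.standard_normal((n_h1, n_h2))                # weights h^(1) -> h^(2)
b2 = np.zeros(n_h2)
gamma = np.ones(n_h2)   # learnable scale
beta = np.zeros(n_h2)   # fixed shift
p_h2 = bn_hidden_activation(h1 @ W2 + b2, gamma, beta)
```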
Original language: English
Title of host publication: Proceedings of Asian Conference on Machine Learning 2018
Editors: Jun Zhu, Ichiro Takeuchi
Place of Publication: USA
Publisher: Proceedings of Machine Learning Research (PMLR)
Pages: 359-374
Number of pages: 16
Publication status: Published - 2018
Event: Asian Conference on Machine Learning 2018 - Beijing, China
Duration: 14 Nov 2018 - 16 Nov 2018
Conference number: 10th
http://www.acml-conf.org/2018/

Publication series

Name: Proceedings of Machine Learning Research
Publisher: Proceedings of Machine Learning Research (PMLR)
Volume: 95
ISSN (Print): 1938-7228

Conference

Conference: Asian Conference on Machine Learning 2018
Abbreviated title: ACML 2018
Country: China
City: Beijing
Period: 14/11/18 - 16/11/18
Internet address: http://www.acml-conf.org/2018/

Cite this

Vu, H., Nguyen, T. D., Le, T., Luo, W., & Phung, D. (2018). Batch Normalized Deep Boltzmann Machines. In J. Zhu, & I. Takeuchi (Eds.), Proceedings of Asian Conference on Machine Learning 2018 (pp. 359-374). (Proceedings of Machine Learning Research; Vol. 95). USA: Proceedings of Machine Learning Research (PMLR).