DiSAN: directional self-attention network for RNN/CNN-free language understanding

Tao Shen, Jing Jiang, Tianyi Zhou, Shirui Pan, Guodong Long, Chengqi Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

Abstract

Recurrent neural nets (RNN) and convolutional neural nets (CNN) are widely used in NLP tasks to capture long-term and local dependencies, respectively. Attention mechanisms have recently attracted enormous interest due to their highly parallelizable computation, significantly reduced training time, and flexibility in modeling dependencies. We propose a novel attention mechanism in which the attention between elements from input sequence(s) is directional and multi-dimensional (i.e., feature-wise). A light-weight neural net, "Directional Self-Attention Network (DiSAN)", is then proposed to learn sentence embeddings, based solely on the proposed attention and without any RNN/CNN structure. DiSAN is composed only of a directional self-attention with temporal order encoded, followed by a multi-dimensional attention that compresses the sequence into a vector representation. Despite its simple form, DiSAN outperforms complicated RNN models in both prediction quality and time efficiency. It achieves the best test accuracy among all sentence-encoding methods, improving the most recent best result by 1.02% on the Stanford Natural Language Inference (SNLI) dataset, and shows state-of-the-art test accuracy on the Stanford Sentiment Treebank (SST), Multi-Genre Natural Language Inference (MultiNLI), Sentences Involving Compositional Knowledge (SICK), Customer Review, MPQA, TREC question-type classification, and Subjectivity (SUBJ) datasets.
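The abstract describes two attention stages: a directional self-attention, in which a mask over token pairs encodes temporal order, and a multi-dimensional (feature-wise) attention that compresses the sequence into a single vector. The sketch below is a minimal NumPy illustration of those two stages, not the authors' implementation: the tanh scoring function, the parameter shapes, and the self-inclusive forward mask are simplifying assumptions, and the full model also runs a backward-masked branch and fuses the two before compression.

import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def directional_self_attention(X, W1, W2, mask):
    # X: (n, d) token features. mask: (n, n), 0 where token i may attend
    # to token j and -inf elsewhere; the mask is what makes the attention
    # "directional". Scores are computed per feature, so the logits are
    # (n, n, d) rather than the usual scalar (n, n).
    logits = np.tanh(X @ W1)[:, None, :] + np.tanh(X @ W2)[None, :, :]
    logits = logits + mask[:, :, None]          # broadcast mask over features
    attn = softmax(logits, axis=1)              # normalize over source tokens j
    return np.einsum('ijd,jd->id', attn, X)     # (n, d) context features

def source2token(H, Ws, bs):
    # Compress the sequence (n, d) into one sentence vector (d,) with a
    # feature-wise softmax over token positions.
    attn = softmax(np.tanh(H @ Ws + bs), axis=0)
    return (attn * H).sum(axis=0)

rng = np.random.default_rng(0)
n, d = 6, 8                                     # toy sequence length and width
X = rng.normal(size=(n, d))
W1, W2, Ws = (0.1 * rng.normal(size=(d, d)) for _ in range(3))
bs = np.zeros(d)

# Forward mask: token i attends only to itself and earlier tokens j <= i
# (self-inclusion is a simplification here so every softmax row stays valid).
fwd = np.where(np.tril(np.ones((n, n))) > 0, 0.0, -np.inf)
H = directional_self_attention(X, W1, W2, fwd)
print(source2token(H, Ws, bs).shape)            # (8,)

The feature-wise softmax is what makes the attention "multi-dimensional": each of the d features gets its own distribution over tokens instead of one scalar weight per token pair, and since nothing here is recurrent, the whole computation parallelizes across positions.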

Original language: English
Title of host publication: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18)
Subtitle of host publication: New Orleans, Louisiana, USA, February 2–7, 2018
Editors: Sheila McIlraith, Kilian Weinberger
Place of publication: Palo Alto, California, USA
Publisher: Association for the Advancement of Artificial Intelligence (AAAI)
Pages: 5446-5455
Number of pages: 10
ISBN (electronic): 9781577358008
Publication status: Published - 2018
Externally published: Yes
Event: AAAI Conference on Artificial Intelligence 2018, New Orleans, United States of America
Duration: 2 Feb 2018 – 7 Feb 2018
Conference number: 32nd
https://aaai.org/Conferences/AAAI-18/

Conference

Conference: AAAI Conference on Artificial Intelligence 2018
Abbreviated title: AAAI 2018
Country: United States of America
City: New Orleans
Period: 2/02/18 – 7/02/18
Internet address: https://aaai.org/Conferences/AAAI-18/

Cite this

Shen, T., Jiang, J., Zhou, T., Pan, S., Long, G., & Zhang, C. (2018). DiSAN: directional self-attention network for RNN/CNN-free language understanding. In S. McIlraith & K. Weinberger (Eds.), Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18): New Orleans, Louisiana, USA, February 2–7, 2018 (pp. 5446-5455). Palo Alto, CA: Association for the Advancement of Artificial Intelligence (AAAI).
@inproceedings{6f6754318b4e4231bf33915bb0ec8b01,
  title = "DiSAN: directional self-attention network for RNN/CNN-free language understanding",
  author = "Tao Shen and Jing Jiang and Tianyi Zhou and Shirui Pan and Guodong Long and Chengqi Zhang",
  editor = "Sheila McIlraith and Kilian Weinberger",
  booktitle = "Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18)",
  publisher = "Association for the Advancement of Artificial Intelligence (AAAI)",
  address = "Palo Alto, California, USA",
  pages = "5446--5455",
  year = "2018",
  language = "English",
}

