
Presentation Transcript


  1. Attention (almost) from Scratch Paraphrase Identification using Attention Amir Hadifar Polytechnic University of Tehran

  2. Overview • What problem does attention solve? • What is Attention? • Some Applications

  3. [Recurrent Models of Visual Attention by Mnih et al. 2014] [Photo from videoblocks.com]

  4. [Diagram: an RNN unrolled over the sentence "I am a Sentence", one cell ("A") per word] [Cho et al. 2014; Sutskever et al. 2014; Goldberg 2017; Olah 2015] [RNN & attention diagrams derived from distill.pub]

  5. [Diagram: an RNN over "This is Good" produces the scores 1, 3, 1.2; a softmax turns them into the weights 0.10, 0.77, 0.13]
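The softmax step on this slide can be checked numerically; a minimal NumPy sketch (the `softmax` helper is our own illustration, not part of any library):

```python
import numpy as np

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    exp = np.exp(scores - np.max(scores))
    return exp / exp.sum()

# The alignment scores from the slide: 1, 3, 1.2
weights = softmax(np.array([1.0, 3.0, 1.2]))
# weights ≈ 0.10, 0.77, 0.13, matching the slide; they sum to 1,
# so they can be used to take a weighted average of the RNN states.
```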

  6. [Diagram: a sequence-to-sequence encoder-decoder, with the encoder's final state as the only link to the decoder] [Cho et al. 2014; Sutskever et al. 2014; Goldberg 2017; Olah 2015]

  7. [Diagram: an encoder ("A" cells) reads "Hello World" and a decoder ("B" cells) emits the Persian translation سلام دنیا ("Hello World")]

  8. [Photo from youtube.com]

  9. [Diagram: the same "Hello World" → سلام دنیا encoder-decoder, now with attention] [Bahdanau et al. 2015]

  10. [Diagram: at each decoder step ("B" cells), attention computes a weighted combination of all encoder states ("A" cells)] [Bahdanau et al. 2015]
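A minimal NumPy sketch of this additive (Bahdanau-style) attention step. The scoring function e_t = v_a^T tanh(W_a s + U_a h_t) follows the paper's notation; the dimensions and random values are illustrative stand-ins for a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 4, 3                      # hidden size, number of encoder steps
H = rng.normal(size=(T, d))      # encoder hidden states h_1..h_T
s = rng.normal(size=(d,))        # current decoder state

# Attention parameters (trained jointly with the rest of the model).
W_a = rng.normal(size=(d, d))
U_a = rng.normal(size=(d, d))
v_a = rng.normal(size=(d,))

# Additive score per encoder step: e_t = v_a^T tanh(W_a s + U_a h_t)
e = np.tanh(s @ W_a + H @ U_a) @ v_a
alpha = np.exp(e - e.max()); alpha /= alpha.sum()  # softmax weights
context = alpha @ H              # weighted sum of encoder states
```

The `context` vector is then fed into the decoder alongside its previous state, so every output step can look back at all inputs instead of squeezing the whole sentence through one vector.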

  11. [Diagram: decoder ("B" cells) attending over encoder states ("A" cells)] [Luong et al. 2015]

  12. [Diagram: decoder attending over encoder states] [Luong et al. 2015]
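Luong et al.'s simplest scoring variant drops the extra additive network and scores each encoder state by a plain dot product with the decoder state; a sketch under the same illustrative random setup as before:

```python
import numpy as np

rng = np.random.default_rng(1)
H = rng.normal(size=(3, 4))      # encoder hidden states
s = rng.normal(size=(4,))        # decoder state

# Luong "dot" score: e_t = s^T h_t, no attention parameters at all.
e = H @ s
alpha = np.exp(e - e.max()); alpha /= alpha.sum()
context = alpha @ H
```

The paper also describes "general" (a learned bilinear matrix between s and h_t) and "concat" variants; the dot product is the cheapest and often works well when encoder and decoder share a hidden size.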

  13. [Diagram: decoder attending over encoder states]

  14. [Bahdanau et al. 2015]

  15. [Bahdanau et al. 2015]

  16. Premise: دیروز باران آمد ("It rained yesterday") Hypothesis: هوا ابری بود ("The weather was cloudy") → Entails / Contradicts / Neither [Images are from blog.fastforwardlabs.com; KDnuggets.com]

  17. [Diagram: word encoder ("A" cells) with word-level attention] [Li et al. 2015; Yang et al. 2016]

  18. [Diagram: word encoder with word-level attention producing a sentence vector] [Li et al. 2015; Yang et al. 2016]

  19. [Diagram: sentence encoder ("B" cells) with sentence-level attention, followed by a softmax classifier] [Li et al. 2015; Yang et al. 2016]
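The word-level attention in Yang et al. scores each word against a learned "informativeness" context vector u_w; the same recipe is then repeated over sentence vectors. A NumPy sketch of one level (parameter names follow the paper, values are random stand-ins for trained weights):

```python
import numpy as np

rng = np.random.default_rng(2)
T, d = 5, 8
H = rng.normal(size=(T, d))        # word-encoder outputs for one sentence

# Attention parameters: a one-layer projection plus context vector u_w.
W_w = rng.normal(size=(d, d))
b_w = rng.normal(size=(d,))
u_w = rng.normal(size=(d,))

u = np.tanh(H @ W_w + b_w)         # hidden representation of each word
e = u @ u_w                        # similarity to the context vector
alpha = np.exp(e - e.max()); alpha /= alpha.sum()
sentence_vec = alpha @ H           # sentence vector = weighted word states
```

Stacking the same block over sentence vectors yields the document vector that the final softmax classifies.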

  20. [Hierarchical Attention Networks for Document Classification by Yang et al. 2016]

  21.-24. [Diagram, built up over four slides: RNN encoders ("A" cells) reading the Persian question pair کم‌کردن چربی‌های بدن ("reducing body fat") and کاهش مقدار BMI ("lowering BMI")]

  25. [Diagram: a second RNN layer ("B" cells) over the padded question pair کم‌کردن چربی‌های بدن / کاهش مقدار BMI, followed by pooling each question into a fixed-size vector; 0 marks padding positions]

  26. [Diagram: an attention layer over both encoded questions ("B" cells), then a feed-forward network outputting Duplicate: Yes/No]
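Putting the pieces together, a toy sketch of the pipeline these slides describe: shared encoding, cross-attention between the two questions, pooling, and a feed-forward decision. The shapes, the max-pooling choice, and the single sigmoid output unit are simplifying assumptions for illustration, not the talk's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(3)
d_in, d_h = 6, 8
W_enc = rng.normal(size=(d_in, d_h))  # stand-in for a shared trained encoder

Q1 = rng.normal(size=(4, d_in))       # word vectors, question 1
Q2 = rng.normal(size=(5, d_in))       # word vectors, question 2
H1, H2 = np.tanh(Q1 @ W_enc), np.tanh(Q2 @ W_enc)

# Cross-attention: every word of question 1 attends over question 2.
E = H1 @ H2.T
A = np.exp(E - E.max(axis=1, keepdims=True))
A /= A.sum(axis=1, keepdims=True)     # one softmax per Q1 word
H1_aligned = A @ H2                   # Q2 summarized from Q1's viewpoint

# Pooling collapses the variable-length questions to fixed vectors.
v1 = np.concatenate([H1.max(axis=0), H1_aligned.max(axis=0)])
v2 = H2.max(axis=0)

# A tiny feed-forward "Duplicate: Yes/No" head (one sigmoid unit here).
feats = np.concatenate([v1, v2])
w = rng.normal(size=feats.shape)
p_duplicate = 1.0 / (1.0 + np.exp(-feats @ w))
```

A real model would attend in both directions, learn all weights end to end, and use a deeper classification head, but the data flow is the one on the slide.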

  27. Performance for paraphrase identification on the Quora dataset [Rows 2 to 8 are taken from Tomar et al. 2017]

  28. According to Quora: the ground-truth labels contain some amount of noise. Error analysis for paraphrase identification on the Quora dataset [www.data.quora.com/First-Quora-Dataset-Release-Question-Pairs]

  29. Last words • Still much to explore • Skip connections • Other family members • Neural Turing Machines • Adaptive Computation Time [distill.pub/2016/augmented-rnns/]

  30. References • Y. Goldberg. (2017). Neural Network Methods for Natural Language Processing • D. Bahdanau, K. Cho, and Y. Bengio. (2015). Neural Machine Translation by Jointly Learning to Align and Translate • Z. Yang, D. Yang, C. Dyer, X. He, A. Smola, and E. Hovy. (2016). Hierarchical Attention Networks for Document Classification • I. Sutskever, O. Vinyals, and Q. Le. (2014). Sequence to Sequence Learning with Neural Networks • K. Cho, B. van Merrienboer, C. Gulcehre, F. Bougares, H. Schwenk, and Y. Bengio. (2014). Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation • Chris Olah. (2015). colah.github.io/posts/2015-08-Understanding-LSTMs • M. Luong, H. Pham, and C. Manning. (2015). Effective Approaches to Attention-based Neural Machine Translation • J. Li, M. Luong, and D. Jurafsky. (2015). A Hierarchical Neural Autoencoder for Paragraphs and Documents • Z. Wang, W. Hamza, and R. Florian. (2017). Bilateral Multi-Perspective Matching for Natural Language Sentences • G. Tomar, T. Duque, O. Täckström, J. Uszkoreit, and D. Das. (2017). Neural Paraphrase Identification of Questions with Noisy Pretraining

  31. Any Questions? Thanks for your Attention [Diagram: RNN "A" cells]
