Chapter 1 Analyzing Neural Network Optimization with Gradient Tracing

semanticscholar(2018)

Abstract
Neural networks have led to state-of-the-art results in fields such as natural language processing [3] and computer vision. However, their lack of interpretability remains a persistent and serious shortcoming. Even though neural networks often outperform other machine learning techniques, it is typically impossible to come up with an intuitive explanation for the solution that the network learns from its inscrutable web of connections. Research has increasingly focused on developing specialized neural network architectures that include components which impose an inductive bias particularly suited to the task at hand, and which may additionally lead to more interpretable solutions. For instance, recent neural machine translation models have made heavy use of “attention” mechanisms, whereby a model learns to translate a sentence word by word by focusing attention on select source words at each step [1]. Adding specialized components to a network typically serves one or more of the following purposes: (a) to increase the network’s modeling power, (b) to make the network more feasible to train, and (c) to make some aspect of it inherently interpretable. A recurrent neural network (RNN) is an extension of neural networks that can operate on sequences of variable length. Stack RNNs, Queue RNNs, and Neural Turing Machines are all examples of adding stack, queue, or tape data structures to an RNN in order to increase its modeling power, allowing it to learn algorithmic solutions that generalize to sequences much longer than those encountered during training [4, 7, 5]. Long short-term memory networks (LSTMs) and gated recurrent units (GRUs) are specifically designed to address trainability issues of RNNs [6, 2]. Finally, the aforementioned attention mechanism produces readily interpretable alignments between words in a source and target sentence. In this chapter, we explore the effect of neural network components on trainability with a graph-traversal technique dubbed “gradient tracing.” The standard technique for training neural networks, called backpropagation, involves the flow of “error,” or gradient, backward through the components of the network. Gradient tracing analyzes the flow of gradient through those components; its goal is to quantify the extent to which certain components influence parameter updates and facilitate learning.
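To make the general idea concrete, the following is a minimal sketch of inspecting gradient flow after a backward pass, assuming a PyTorch model with an LSTM component; the model, input shapes, and objective here are hypothetical, and the chapter's actual graph-traversal procedure for gradient tracing may differ from this simplified inspection.

```python
import torch
import torch.nn as nn

# Hypothetical component to inspect: a single LSTM layer.
model = nn.LSTM(input_size=8, hidden_size=16)

# Dummy input of shape (seq_len, batch, input_size) and a placeholder objective.
x = torch.randn(5, 1, 8)
out, _ = model(x)
loss = out.sum()
loss.backward()

# Record how much gradient each named parameter (component) received.
# Larger norms suggest the component contributes more to parameter updates.
for name, param in model.named_parameters():
    if param.grad is not None:
        print(f"{name:20s} grad L2 norm = {param.grad.norm().item():.4f}")
```

Comparing such per-component gradient magnitudes across training steps is one simple way to see which parts of a network dominate learning, in the spirit of the analysis described above.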