Learning Hierarchical Syntactic Transformations with Encoder-Decoder Networks

Published in Bridging AI and Cognitive Science Workshop Papers at ICLR, 2020

Max Nelson

It is hypothesized that in first language acquisition the space of possible grammars considered by children is restricted to only those defined over hierarchical positions (Chomsky, 1980). This work builds on previous studies probing the extent to which encoder-decoder networks are able to learn and generalize over hierarchical structure when trained on syntactic transformations (Frank & Mathis, 2007; McCoy et al., 2018). The primary contribution is the training of networks on multiple artificial languages that differ in the extent to which the target transformation can be expressed as a function of the linear or hierarchical positions of words. Results suggest that GRU encoder-decoders reliably behave in a manner consistent with hierarchical generalization, while SRNNs and LSTMs do not. Contrary to earlier claims, no network behaves as if it had learned a linear generalization, even in a language in which all training data are consistent with one.
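For concreteness, the sketch below illustrates the general architecture class being probed: a GRU encoder compresses a source sentence into a hidden state, from which a GRU decoder emits the transformed sentence. This is a minimal illustration under assumed settings (vocabulary size, dimensions, teacher forcing), not the paper's implementation.

```python
# Minimal sketch of a GRU encoder-decoder for sequence-to-sequence
# syntactic transformations. Hyperparameters are illustrative only.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, src, tgt):
        # Encode the source sentence into a final hidden state.
        _, hidden = self.encoder(self.embed(src))
        # Decode conditioned on that state (teacher forcing: the gold
        # target sequence is fed as decoder input during training).
        dec_out, _ = self.decoder(self.embed(tgt), hidden)
        return self.out(dec_out)  # logits over the output vocabulary

# Toy usage with random integer-encoded sentences of length 10.
model = Seq2Seq(vocab_size=50)
src = torch.randint(0, 50, (8, 10))   # batch of 8 source sentences
tgt = torch.randint(0, 50, (8, 10))   # corresponding target sequences
logits = model(src, tgt)              # shape: (8, 10, 50)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 50), tgt.reshape(-1))
```

Swapping `nn.GRU` for `nn.RNN` or `nn.LSTM` yields the SRNN and LSTM variants compared in the paper.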

Download paper here