Featuritis in NLP: [...]The primary issue here doesn't seem to be the representation that's making things so much slower to train, but the fact that it seems (from experimental results) that you really have to do the multitask learning (with tons of auxiliary problems) to make this work. This suggests that maybe what should be done is just to fix an input representation (e.g., the word identities) and then have someone train some giant multitask network on this (perhaps a few of varying sizes) and then just share them in a common format. [...] At the end of the day, you're going to still have to futz with something. You'll either stick with your friendly linear model and futz with features, or you'll switch over to the neural networks side and futz with network structure and/or auxiliary problem representation. (Via natural language processing blog.)
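The "fix an input representation, share one giant multitask network" idea in the quote can be sketched roughly as follows. This is just an illustrative toy in numpy, not anyone's actual system: all names, sizes, and task labels here are made up, and real versions would of course be trained end-to-end rather than initialized randomly.

```python
import numpy as np

rng = np.random.default_rng(0)

# The fixed input representation: one embedding table over word identities,
# shared by every task (this is the part that would be trained once and
# distributed "in a common format").
vocab_size, embed_dim = 1000, 50
embeddings = rng.normal(size=(vocab_size, embed_dim))

# One small task-specific head per (auxiliary) problem; label counts are
# arbitrary placeholders.
task_heads = {
    "pos": rng.normal(size=(embed_dim, 45)),    # e.g. POS tagging
    "chunk": rng.normal(size=(embed_dim, 23)),  # e.g. chunking
}

def predict(word_ids, task):
    """Look up the shared representation, then apply the task's own head."""
    x = embeddings[word_ids]        # (n, embed_dim): shared parameters
    logits = x @ task_heads[task]   # (n, n_labels): task-specific parameters
    return logits.argmax(axis=-1)

words = np.array([3, 17, 256])
print(predict(words, "pos").shape)  # (3,)
```

The point of the sketch is only the parameter sharing: every task sees the same `embeddings`, so the expensive multitask training happens once, and downstream users only futz with their own head.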
Well, duh. Why do you think NN approaches in NLP never reach a given level of performance first, but only come along afterwards with the claim that they can achieve similar performance "without feature engineering"? Could it be that the feature engineering done for linear models is what taught the NN practitioners how to choose their non-linear function class?
The situation seems very different in image processing/vision, where NN methods have achieved superior results first for at least some tasks. I don't think it's just a matter of there being more or smarter NN practitioners in those areas (although that might be argued), but also that images have a natural neighborhood structure (hence the success of convolutional nets, for example), unlike the discrete, heavy-tailed language domain.
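To make the "natural neighborhood structure" point concrete: a convolutional filter applies the same small set of weights at every location of the pixel grid, so it directly exploits the fact that nearby pixels are related. A naive sketch (really cross-correlation, as in most "convolutional" layers; the image and filter here are illustrative):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'valid'-mode 2-D convolution: slide the kernel over every
    local neighborhood and take a weighted sum of that patch."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 "image"
edge_filter = np.array([[-1., 0., 1.]] * 3)        # responds to horizontal gradients
print(conv2d_valid(image, edge_filter).shape)      # (4, 4)
```

There is no comparably obvious analogue for words: a discrete, heavy-tailed vocabulary has no grid on which one small filter could be reused everywhere.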