Recently, word embedding representations have been investigated for slot filling in Spoken Language Understanding (SLU), along with the use of Neural Networks as classifiers. Neural Networks, especially Recurrent Neural Networks, which are well suited to sequence labeling problems, have been applied successfully to the popular ATIS database. In this work, we compare these models with the previous state-of-the-art classifier, Conditional Random Fields (CRF), on a more challenging SLU database. We show that, despite the efficient word representations used within these Neural Networks, their ability to process sequences remains significantly below that of CRF, while they also incur higher computational costs, and that the ability of CRF to model output label dependencies is crucial for SLU.
In this work, we analyze whether RNNs are a good candidate for slot tagging in spoken language understanding and whether they really outperform the previous state-of-the-art method, CRF. We analyze two things:

1. whether continuous (numeric) word representations by themselves bring an improvement over symbolic ones, independently of the classifier;
2. whether classifiers able to exploit such representations, namely RNNs, actually outperform CRF.
To gain insight into the first question, we use a classifier that can take both symbolic and numeric (continuous) representations as input: boosting over decision trees (Bonzaiboost). We then evaluate how the two different inputs affect the same classifier (a small sketch of the two feature encodings follows the table below):
| Dataset | Representation | Precision (%) | Recall (%) | F-measure (%) |
|---|---|---|---|---|
| ATIS | symbolic | 93.00 | 93.43 | 93.21 |
| ATIS | numeric | 93.50 | 94.54 | 94.02 |
| MEDIA | symbolic | 71.09 | 75.48 | 73.22 |
| MEDIA | numeric | 73.61 | 78.85 | 76.12 |
We see that numerical (continuous) representations bring a significant improvement on both datasets, and that this is a significant factor in favor of RNNs, which rely on such representations.
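Below is a minimal sketch of the two input types fed to the same boosted-decision-tree classifier. It uses scikit-learn's `GradientBoostingClassifier` as a stand-in for Bonzaiboost; the toy sentence, labels, and random vectors standing in for Word2Vec embeddings are illustrative assumptions, not the paper's setup.

```python
# Same boosted-decision-tree classifier, fed either symbolic (one-hot word
# identities) or numeric (dense word embedding) features.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.preprocessing import OneHotEncoder

# Toy slot-tagging data: one training example per word (a context window
# around each word could be added the same way).
words  = np.array([["flights"], ["to"], ["boston"], ["on"], ["monday"]])
labels = np.array(["O", "O", "B-toloc", "O", "B-date"])

# 1) Symbolic input: each word is a categorical feature, one-hot encoded.
sym_encoder = OneHotEncoder(handle_unknown="ignore")
X_symbolic = sym_encoder.fit_transform(words).toarray()

# 2) Numeric input: each word is replaced by a dense embedding vector
#    (random 50-dim vectors stand in for Word2Vec embeddings here).
rng = np.random.default_rng(0)
embeddings = {w[0]: rng.normal(size=50) for w in words}
X_numeric = np.stack([embeddings[w[0]] for w in words])

# The same type of classifier is trained on both representations.
clf_sym = GradientBoostingClassifier().fit(X_symbolic, labels)
clf_num = GradientBoostingClassifier().fit(X_numeric, labels)
```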
To try to answer the second question, we evaluate both CRF and RNNs on two datasets:

- ATIS, the popular English air travel information corpus;
- MEDIA, a more challenging French SLU corpus.
For the RNNs, we also evaluate both pretrained word representations (Word2Vec) and word representations trained jointly with the RNN (a lookup table at the beginning of the network, updated by backpropagation); a minimal sketch of these two setups is given after the table. The results are as follows:
| Dataset | Algorithm | Representation | F-measure (%) |
|---|---|---|---|
| ATIS | Bonzaiboost | numeric (Word2Vec) | 94.02 |
| ATIS | Bonzaiboost | symbolic | 92.97 |
| ATIS | CRF | symbolic | 95.23 |
| ATIS | Elman RNN | numeric (joint) | 96.16 |
| MEDIA | Bonzaiboost | numeric (Word2Vec) | 76.14 |
| MEDIA | Bonzaiboost | symbolic | 73.22 |
| MEDIA | CRF | symbolic | 86.00 |
| MEDIA | Elman RNN | numeric (joint) | 81.76 |
| MEDIA | Elman RNN | numeric (Word2Vec) | 81.94 |
| MEDIA | Jordan RNN | numeric (joint) | 83.25 |
| MEDIA | Jordan RNN | numeric (Word2Vec) | 83.15 |
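The sketch below illustrates the two embedding setups for the Elman RNN tagger: a lookup table trained jointly by backpropagation versus one initialised from pretrained Word2Vec vectors and then fine-tuned. PyTorch is used purely for illustration; the vocabulary size, dimensions, label count and the `w2v_vectors` matrix are assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class ElmanSlotTagger(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim, n_labels,
                 pretrained=None):
        super().__init__()
        # Lookup table at the beginning of the network; it is updated by
        # backpropagation together with the rest of the parameters.
        self.embed = nn.Embedding(vocab_size, emb_dim)
        if pretrained is not None:
            # "numeric (Word2Vec)" setup: start from pretrained vectors.
            self.embed.weight.data.copy_(pretrained)
        # nn.RNN with the default tanh nonlinearity is a plain Elman layer.
        self.rnn = nn.RNN(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_labels)

    def forward(self, word_ids):            # word_ids: (batch, seq_len)
        h, _ = self.rnn(self.embed(word_ids))
        return self.out(h)                  # per-token label scores

# "numeric (joint)": embeddings trained from scratch with the tagger.
joint_model = ElmanSlotTagger(vocab_size=5000, emb_dim=100,
                              hidden_dim=100, n_labels=100)

# "numeric (Word2Vec)": embeddings initialised from Word2Vec, then fine-tuned.
w2v_vectors = torch.randn(5000, 100)        # stand-in for loaded Word2Vec
w2v_model = ElmanSlotTagger(vocab_size=5000, emb_dim=100,
                            hidden_dim=100, n_labels=100,
                            pretrained=w2v_vectors)
```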
We conclude that continuous representation spaces allow for better generalization (better accuracy) and make the classification algorithm converge faster. Moreover, continuous representations reduce the risk that a classifier learns decision rules fitted to noise, and are thus more robust than symbolic ones. Despite this, algorithms able to exploit them, such as RNNs, cannot compete with CRF on the more challenging MEDIA corpus. Although CRF is trained solely on symbolic features, its ability to model output label dependencies appears crucial for the task. CRF with symbolic features thus remains the best classification algorithm for SLU in terms of prediction performance.
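To make "modeling output label dependencies" concrete, here is a minimal NumPy sketch of how a linear-chain CRF scores a whole label sequence: emission scores (from the symbolic features) plus transition scores between consecutive labels. All numbers are toy values chosen for illustration only; they show where the label-dependency term enters, not the paper's trained model.

```python
import numpy as np

labels = ["O", "B-toloc", "I-toloc"]

# Emission scores: how well each word matches each label,
# shape (sequence_length, n_labels). Toy values.
emissions = np.array([[2.0, 0.1, 0.1],    # "to"
                      [0.2, 1.5, 0.3],    # "new"
                      [0.2, 0.4, 1.8]])   # "york"

# Transition scores: the part that models output label dependencies,
# e.g. I-toloc is only plausible right after B-toloc or I-toloc.
transitions = np.array([[ 1.0,  0.5, -3.0],   # from O
                        [ 0.5,  0.1,  2.0],   # from B-toloc
                        [ 0.5,  0.1,  1.5]])  # from I-toloc

def sequence_score(label_ids):
    """CRF score of one label sequence: emissions plus transitions."""
    score = emissions[0, label_ids[0]]
    for t in range(1, len(label_ids)):
        score += transitions[label_ids[t - 1], label_ids[t]]
        score += emissions[t, label_ids[t]]
    return score

# "O B-toloc I-toloc" outscores "O O I-toloc" because of the transition term.
print(sequence_score([0, 1, 2]), sequence_score([0, 0, 2]))   # 7.8 vs 2.0
```

A per-token classifier (Bonzaiboost, or an RNN without a sequence-level output layer) only has the emission part; the transition matrix is what lets the CRF rule out inconsistent label sequences as a whole.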