The problem of automatic text classification is an essential part of text analysis. Text classification can be improved at different levels, such as the preprocessing step or the network implementation. In this paper, we focus on how combining different methods of text encoding may affect classification accuracy. To do this, we implemented a multi-input neural network that can encode input text using several text encoding techniques, such as BERT, a neural embedding layer, GloVe, skip-thoughts and ParagraphVector. The text can be represented at different levels of tokenisation, such as the sentence level, word level, byte pair encoding level and character level. Experiments were conducted on seven datasets from different language families: English, German, Swedish and Czech. Some of these languages feature agglutination and grammatical cases. Two of the seven datasets originated from real commercial scenarios: (1) classifying ingredients into their corresponding classes by means of a corpus provided by Northfork, and (2) classifying texts according to the English level of their writers by means of a corpus provided by ProvenWord. The developed architecture achieves an improvement with different combinations of text encoding techniques, depending on the characteristics of each dataset. Once the best combination of embeddings at the different levels was determined, different architectures of multi-input neural networks were compared. The results obtained with the best embedding combination and the best neural network architecture were compared with state-of-the-art approaches, and were better than the state-of-the-art baselines on the datasets used in the experiments.
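The multi-input idea described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: each branch encodes the same text at a different level (word, character, sentence), the branch outputs are concatenated, and a shared softmax layer produces class probabilities. The encoder functions here are random stand-ins for real pretrained encoders such as GloVe, character n-gram features or skip-thoughts, and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Stand-ins for pretrained encoders, one per tokenisation level. In the real
# setting these would be e.g. averaged GloVe vectors (word level), character
# n-gram features (character level), and skip-thoughts or ParagraphVector
# (sentence level); here each just returns a random fixed-size vector.
def word_level_encoder(text):
    return rng.standard_normal(50)

def char_level_encoder(text):
    return rng.standard_normal(30)

def sentence_level_encoder(text):
    return rng.standard_normal(20)

def classify(text, W, b):
    # Multi-input fusion by concatenating the branch outputs,
    # followed by a single dense softmax layer.
    x = np.concatenate([
        word_level_encoder(text),
        char_level_encoder(text),
        sentence_level_encoder(text),
    ])                                  # shape: (100,)
    return softmax(W @ x + b)

n_classes = 4                           # e.g. four ingredient classes
W = rng.standard_normal((n_classes, 100)) * 0.01
b = np.zeros(n_classes)

probs = classify("tomato, garlic, olive oil", W, b)
print(probs.shape)                      # (4,) — one probability per class
```

In a trained version, `W` and `b` (and possibly the encoders themselves) would be learned jointly, and the concatenation point is where combinations of encodings can be swapped in and out to compare their effect on accuracy.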