A novel two-stage fine-tuning algorithm for a BERT-based language model is proposed for more effective named entity recognition. In the first stage, BERT is trained as a Siamese network with a special contrastive loss function; in the second stage, the NER model is fine-tuned as a "traditional" sequence tagger. Including the contrastive first stage makes it possible to construct a high-level feature space at the output of BERT with more compact representations of the different named-entity classes. Experiments have shown that this fine-tuning scheme improves the generalization ability of named entity recognition models fine-tuned from various pre-trained BERT models.
DOI: 10.28995/2075-7182-2022-21-70-80
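The abstract does not give the exact form of the contrastive objective; a common pairwise formulation used for Siamese training (shown here as an illustrative assumption, with toy two-dimensional vectors standing in for BERT token embeddings) is:

```python
import math

def contrastive_loss(x1, x2, same_class, margin=1.0):
    """Pairwise contrastive loss over two embedding vectors.

    Same-class pairs are pulled together (loss = d^2); different-class
    pairs are pushed apart until they are at least `margin` away.
    """
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2

# Identical same-class embeddings incur no loss; distant same-class
# embeddings are penalized, and far-apart different-class pairs are not.
```

Minimizing such a loss over pairs of token embeddings is what compacts each named-entity class in the output feature space before the second, sequence-tagging stage.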
This paper analyzes the effectiveness of deep learning for tabular data processing. Decision trees and their ensembles are commonly believed to be the leading methods in this domain, leaving deep neural networks to computer vision, audio, and similar modalities. However, a deep neural network is a framework for building gradient-based hierarchical representations, and this key property should enable effective processing of generic structured (tabular) data, not just image matrices and audio spectrograms. The problem is examined through the prism of the Weather Prediction track of the Yandex Shifts challenge (in other words, the Yandex Shifts Weather task). This task is a variant of the classical tabular regression problem and is also connected with another important problem: generalization and uncertainty in machine learning. The paper proposes an end-to-end algorithm for regression with uncertainty on tabular data, based on a combination of four ideas: 1) a deep ensemble of self-normalizing neural networks, 2) regression as parameter estimation of a Gaussian target error distribution, 3) hierarchical multitask learning, and 4) simple data preprocessing. Three modifications of the proposed algorithm occupy the top three places on the Yandex Shifts Weather leaderboard. The paper argues that this success stems from fundamental properties of the deep learning algorithm and attempts to substantiate this claim.
DOI: https://doi.org/10.48550/arXiv.2112.03566
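Ideas 1 and 2 above can be sketched with a Gaussian negative log-likelihood loss and the standard deep-ensemble mixture formula (a simplified pure-Python illustration, not the authors' code; the actual ensemble members are self-normalizing networks over many features):

```python
import math

def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of target y under N(mu, sigma^2) -- the
    training loss when a network predicts both the mean and the spread
    of the target error distribution."""
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)

def ensemble_predict(members):
    """Combine per-member (mu, sigma) predictions into one Gaussian.

    The ensemble mean averages the member means; the ensemble variance
    adds the spread *between* member means to the average predicted
    variance, so member disagreement raises the reported uncertainty.
    """
    mus = [m for m, _ in members]
    mu = sum(mus) / len(mus)
    var = sum(s ** 2 + m ** 2 for m, s in members) / len(members) - mu ** 2
    return mu, math.sqrt(var)
```

Disagreement between members inflating the reported sigma is what makes such an ensemble useful under the distributional shift the challenge targets.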
For many Automatic Speech Recognition (ASR) tasks, spectrogram audio features show better results than Mel-Frequency Cepstral Coefficients (MFCC), but in practice they are hard to use because of the high dimensionality of the feature space. This paper presents an alternative approach to generating a compressed spectrogram representation, based on Convolutional Variational Autoencoders (VAE). A Convolutional VAE model was trained on a subsample of the LibriSpeech dataset to reconstruct short fragments of audio spectrograms (25 ms) from a 13-dimensional embedding. A model trained for a 40-dimensional (300 ms) embedding was then used to generate features for a corpus of spoken commands, the GoogleSpeechCommands dataset. An ASR system built on the generated features was compared to a model with MFCC features.
DOI: 10.1007/978-3-030-63000-3_5
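The abstract does not include the model itself; the two standard VAE ingredients it relies on — the reparameterization trick and the KL regularizer that shapes the low-dimensional embedding space — can be sketched as follows (the diagonal-Gaussian posterior is an assumption of this sketch):

```python
import math
import random

def reparameterize(mu, log_var, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1): the trick that
    keeps the sampling step differentiable during VAE training."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)) per dimension, summed: the
    regularizer added to the spectrogram reconstruction loss, which
    shapes the compressed embedding space."""
    return 0.5 * sum(math.exp(lv) + m ** 2 - 1.0 - lv
                     for m, lv in zip(mu, log_var))
```

A perfectly standard-normal posterior (mu = 0, log_var = 0) gives zero KL; any deviation is penalized, pushing the 13- or 40-dimensional embeddings toward a well-behaved latent space.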
This paper presents new methods for entity recognition and relation extraction on partially labeled and unlabeled datasets. The proposed methods are based on semi-supervised, unsupervised, and transfer learning techniques. We use few-shot learning to construct task-specific algorithms for new data sources without manual retraining. To compare the results with other studies, we conducted experiments on two benchmark datasets for the Russian language. The named entity recognition results demonstrate significant improvement and outperform the state of the art. Our relation extraction results are comparable to those of other studies. We assume that longer BERT fine-tuning will help improve them, and we also plan to experiment with other few-shot learning methods in the near future.
DOI: 10.1109/S.A.I.ence50533.2020.9303192
The problem of spelling correction is crucial for search engines, as misspellings degrade their performance. It gets even harder when search queries concern a specific area not well covered by standard spell checkers, such as geographic information systems (GIS). Moreover, standard spell checkers are interactive, i.e., they can notice a misspelled word and suggest candidate corrections, but picking one of them is up to the user. This is why we decided to develop a spelling correction unit for 2GIS, a cartographic search company. To do this, we extracted and manually annotated a corpus of GIS lookup queries, trained a language model, performed various experiments to find the best feature extractor, fitted a logistic regression using an approach suggested in SpellRuEval, and then applied it iteratively to improve the result. We measured the resulting performance by means of cross-validation, compared it against a baseline, and observed a substantial improvement. We also interpret the achieved result by calculating and discussing the importance of specific features and analyzing the output of the model.
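The paper's actual reranker is a logistic regression over language-model and other features; as a minimal stand-in, a Norvig-style generate-and-rank loop conveys the iterative structure (the unigram-frequency scorer here is an assumption of the sketch, not the paper's model):

```python
def edits1(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """All strings one edit away from `word`: deletes, adjacent swaps,
    replacements, and insertions."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    swaps = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in alphabet]
    inserts = [l + c + r for l, r in splits for c in alphabet]
    return set(deletes + swaps + replaces + inserts)

def correct(word, freq, max_iters=2):
    """Iteratively move to the best-scoring candidate within one edit.
    The frequency lookup stands in for the fitted reranker; iterating
    lets the corrector recover from multi-edit misspellings."""
    for _ in range(max_iters):
        candidates = edits1(word) | {word}
        # Tie-break toward keeping the word unchanged when scores match.
        best = max(candidates, key=lambda w: (freq.get(w, 0), w == word))
        if best == word:
            break
        word = best
    return word
```

Applying the ranker iteratively, as in the last experiment of the paper, corresponds to the outer loop in `correct`.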
This paper presents an overview of a rule-based system for automatic accentuation and phonemic transcription of Russian texts for speech-related tasks, such as Automatic Speech Recognition (ASR). The two parts of the developed system, accentuation and transcription, use different approaches to achieve correct phonemic representations of input phrases. Accentuation is based on the “Grammatical Dictionary of the Russian Language” by A.A. Zaliznyak and a Wiktionary corpus. To distinguish homographs, the accentuation system also utilises morphological information about the sentences obtained with Recurrent Neural Networks (RNN). The transcription algorithms apply the rules presented in the monograph by B.M. Lobanov and L.I. Tsirulnik, “Computer Synthesis and Voice Cloning”. The rules described in the present paper are implemented in an open-source module, which can be of use in any scientific study connected with ASR or Speech-To-Text (STT) tasks. Automatically marked-up text annotations of the Russian Voxforge database were used as training data for an acoustic model in CMU Sphinx. The resulting acoustic model was evaluated with cross-validation, the mean Word Accuracy being 71.2%. The developed toolkit is written in Python and is accessible on GitHub for any interested researcher.
DOI: 10.1007/978-3-319-99579-3_78
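The dictionary-plus-disambiguation scheme can be illustrated with a toy lookup (the entries and sense tags below are invented for the example; the real system uses Zaliznyak's dictionary and an RNN morphological tagger to pick the sense):

```python
# Toy stress dictionary: a non-homograph maps straight to its stressed
# form; a homograph maps a sense tag -> stressed form, since stress
# placement depends on the word's reading.
STRESS = {
    "молоко": "молоко́",
    "замок": {"NOUN_building": "за́мок", "NOUN_lock": "замо́к"},
}

def accentuate(word, sense=None):
    """Return the stressed form of `word`, using `sense` to resolve
    homographs; out-of-vocabulary words are returned unchanged."""
    entry = STRESS.get(word)
    if entry is None:
        return word                    # not in the dictionary
    if isinstance(entry, dict):
        return entry.get(sense, word)  # homograph: needs disambiguation
    return entry
```

Without a sense tag, a homograph is left unaccented rather than guessed — mirroring why the system needs the morphological tagger at all.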
This study considers the problem of training neural networks on small text datasets. We explore the augmentation approach used for image and sound datasets, which enlarges the data in order to increase the performance of models trained on it, and propose a text augmentation method based on synonymy.
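A minimal version of such synonymy-based augmentation might look like the following sketch (the synonym dictionary and the replacement probability are illustrative assumptions, not the study's setup):

```python
import random

def synonym_augment(tokens, synonyms, p=0.3, rng=None):
    """Return a copy of `tokens` in which each word that has synonyms
    is replaced by a randomly chosen one with probability p."""
    rng = rng or random.Random(0)
    out = []
    for tok in tokens:
        alts = synonyms.get(tok)
        if alts and rng.random() < p:
            out.append(rng.choice(alts))
        else:
            out.append(tok)
    return out
```

Running this several times per sentence yields multiple paraphrased training examples from each original one, which is the point of augmenting a small dataset.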