About
Hi! I am Tatiana Shavrina, an enthusiastic linguist who has moved into data science.
In this blog we will touch upon various interesting questions that link artificial intelligence and theoretical linguistics, discuss the interpretation of machine learning results, and consider frequent NLP development errors that would be obvious to linguists (if they read the code).
So… let’s profit from both fields and dive in!
About me:
MY PROJECTS
- 2022 - mGPT project team lead, GPT-3-like text generation model in 61 languages, open-source link
- 2021 - Team Leader of ruDALL-E project, text-to-image generation in Russian, open-source link
- 2021 - Team Leader of ruGPT-3, text generation model in Russian, open-source link
- 2020 - Team Leader of Russian SuperGLUE Benchmark, a Russian leaderboard of universal language models and transformers aiming to solve Natural Language Understanding tasks; see russiansuperglue.com
- 2019 - Team Leader of Automatic Exam Passing: passing the final Russian language exam with BERT at the level of an average pupil. See Demo and Code and LREC 2020 paper
- 2018 - Creator of Omnia Russica Corpus (33 billion words), combining all available sources for the needs of machine learning. See omnia-russica
- 2017 - Sole creator of Taiga Corpus, an open-source corpus for machine learning tasks for Russian
WHERE I DO IT
- from 2021 - Research Project Director in NLP and Multimodality, Artificial Intelligence Research Institute
- from 2018 - Research Lead in NLP and Multimodality Research at Sber
- 2017 - 2018 - Data Scientist at 1C company
- from 2013 - Linguist and Project Coordinator at General Internet-Corpus of Russian
- 2011 - 2013 - Linguist at Russian National Corpus
EDUCATION
- 2018 – 2021 - Ph.D. at Computational Linguistics Department, National Research University “Higher School of Economics”
- 2016 – 2018 - Master's student at National Research University “Higher School of Economics”, Faculty of Humanities, School of Linguistics, program “Computational Linguistics”
- 2012 – 2016 - Bachelor's degree at Lomonosov Moscow State University, Faculty of Philology, Department of Theoretical and Applied Linguistics
ADDITIONAL EDUCATION
- 2018 – LXMLS summer school on Deep Learning in NLP, Instituto Superior Técnico, Lisbon
- 2016 – 2017 – 1-year course “Practical Data Analysis and Machine Learning” at National Research University “Higher School of Economics”, Faculty of Computer Science
- 2017 – UCREL Summer School in Computational linguistics and other digital methods, University of Lancaster, Lancaster, UK
- 2017 – The New York - St. Petersburg Institute of Linguistics, Cognition and Culture (NYI), Stony Brook University; St. Petersburg State University.
- 2017 – The actual problems of competitive intelligence at National Research University “Higher School of Economics”, Institute of security problems
- 2016 - The New York - St. Petersburg Institute of Linguistics, Cognition and Culture (NYI), Stony Brook University; St. Petersburg State University.
- 2013 – CIMO Summer School in the University of Turku, Finland