About me

I am a computational corpus linguist at the Hungarian Research Centre for Linguistics in Budapest. I specialize in data-driven research on Hungarian preverb constructions (see my Ph.D. dissertation). I am interested in Natural Language Processing, especially in fine-tuning existing tools and developing new ones for corpus building and dictionary editing. I was among the creators of the Old Hungarian Corpus, the Parallel Bible Reader and the UraLUID database. Apart from my academic work, I am a freelance language technologist and a language enthusiast.

Interests
  • Corpus linguistics
  • Usage-based models of language
  • Natural Language Processing
  • Creation of language resources
Education
  • Ph.D. summa cum laude in linguistics, 2021

    Pázmány Péter Catholic University

  • M.A. in digital humanities, 2016

    Pázmány Péter Catholic University

  • B.A. in Slavonic (Russian) studies, 2014

    Eötvös Loránd University

  • B.A. in German (Scandinavian) studies, 2014

    Eötvös Loránd University

Publications

(2022). PrevDistro: An open-access dataset of Hungarian preverb constructions. Acta Linguistica Academica 69/4: 549–563. https://doi.org/10.1556/2062.2022.00578.

(2022). Igekötő-kapcsolás [Connecting preverbs and their associated verbs]. In: Berend, Gábor – Gosztolya, Gábor – Vincze, Veronika (eds.): XVIII. Magyar Számítógépes Nyelvészeti Konferencia. Szegedi Tudományegyetem, Informatikai Intézet. Szeged. 77–91.

(2021). Igekötős szerkezetek a magyarban [Preverb constructions in Hungarian]. Ph.D. dissertation. Pázmány Péter Catholic University, Faculty of Humanities and Social Sciences, Doctoral School of Linguistics. Budapest.

Presentations

(2022). A magyar igekötős szerkezetek korpuszvezérelt vizsgálata [Corpus-driven studies of Hungarian preverb constructions]. A Magyar Nyelvtudományi Társaság felolvasó ülése [Lecture session of the Society of Hungarian Linguistics]. Budapest, 15 November, 2022.

(2022). Building a dependency treebank from the Hungarian Gigaword Corpus. 15th International American Association for Corpus Linguistics Conference (AACL 2022). Flagstaff, 9–11 September, 2022.

(2022). Always far from perfect, yet always good enough. IMM20 Workshop: The imperfectability of morphology: From analogy to anomaly (and back again). Budapest, 1–4 September, 2022.

Datasets

(2021). PrevDistro: Preverb Distributions.

(2020). PrevCons: Preverb Constructions.

(2019). PrevLex: Preverb Lexicon.

Corpora

(2017). Uralic Languages under the Influence database.

(2017). Párhuzamos Bibliaolvasó [Parallel Bible Reader].

(2013). Ómagyar Korpusz [The Old Hungarian Corpus].
