About me

I am a computational corpus linguist at the Hungarian Research Centre for Linguistics in Budapest. I specialize in data-driven research on Hungarian preverb constructions (see my Ph.D. dissertation). I am interested in Natural Language Processing, especially in fine-tuning existing tools and developing new ones for corpus building and dictionary editing. I was among the creators of the Old Hungarian Corpus, the Parallel Bible Reader and the UraLUID database. Apart from my academic work, I am a freelance language technologist and a language enthusiast.

Download my CV

Interests
  • Corpus linguistics
  • Usage-based models of language
  • Natural Language Processing
  • Creation of language resources
Education
  • Ph.D. summa cum laude in linguistics, 2021

    Pázmány Péter Catholic University

  • M.A. in digital humanities, 2016

    Pázmány Péter Catholic University

  • B.A. in Slavonic (Russian) studies, 2014

    Eötvös Loránd University

  • B.A. in German (Scandinavian) studies, 2014

    Eötvös Loránd University

Publications

(2025). Magyar szerkezettár demó [Hungarian Constructicon demo]. In: Berend, Gábor – Gosztolya, Gábor – Vincze, Veronika (eds.): XXI. Magyar Számítógépes Nyelvészeti Konferencia. Szegedi Tudományegyetem TTIK, Informatikai Intézet. Online edition. 247–256.

(2024). Hungarian auxiliaries revisited. Acta Linguistica Academica 71/1–2: 202–218. https://doi.org/10.1556/2062.2023.00701.

PDF Dataset

(2024). „A fatens felelt pedig…” – A Történeti Magánéleti Korpusz igei szerkezeteinek mozaik n-gram alapú feldolgozása [“And the witness answered…” – Mosaic n-gram-based processing of the verb constructions found in the Old and Middle Hungarian corpus of informal language use]. In: Berend, Gábor – Gosztolya, Gábor – Vincze, Veronika (eds.): XX. Magyar Számítógépes Nyelvészeti Konferencia. Szegedi Tudományegyetem, online edition. 43–58.

PDF

See all publications »

Presentations

(2025). Magyar szerkezettár demó [Hungarian Constructicon demo]. XXI. Magyar Számítógépes Nyelvészeti Konferencia. Szeged, 6 February, 2025.

(2025). The unusual diachronic development of a typologically uncommon desiderative construction. Linguistic Society of America (LSA) Annual Meeting 2025. Philadelphia, 9–12 January, 2025.

Slides

(2024). „A fatens felelt pedig…” – A Történeti Magánéleti Korpusz igei szerkezeteinek mozaik n-gram alapú feldolgozása [“And the witness answered…” – Mosaic n-gram-based processing of the verb constructions found in the Old and Middle Hungarian corpus of informal language use]. XX. Magyar Számítógépes Nyelvészeti Konferencia. Szeged, 25–26 January, 2024.

See all presentations »

Datasets

(2021). PrevDistro: Preverb Distributions.

Dataset

(2020). PrevCons: Preverb Constructions.

Dataset

(2019). PrevLex: Preverb Lexicon.

Dataset

Corpora

(2023). Moldvai magyar korpusz – részletek Tánczos Vilmos gyűjtéséből [Corpus of Moldavian Hungarian dialects – recordings from Vilmos Tánczos' collection].

URL

(2017). Uralic Languages under the Influence database.

URL

(2017). Párhuzamos Bibliaolvasó [Parallel Bible Reader].

URL

(2013). Ómagyar Korpusz [The Old Hungarian Corpus].

URL