The Quantitative Linguistics group is currently working on several projects, each described briefly below. For a more detailed view of the research ideas we are investigating, see Harald Baayen's website. Projects that have finished or whose funding has ended are listed under Previous Projects.
DFG-Cwic
Complex words in context
Project Cwic: Complex words in context
Recent years have seen impressive advances in natural language processing (NLP) and artificial intelligence (AI). State-of-the-art language technologies have been made possible by advances in machine learning that use many-layered 'deep' artificial neural networks. However, what deep learning networks detect in language use, and what probabilistic information they exploit to generate predictions for computational language tasks, often remains unclear (but see Linzen & Baroni, 2021, for recent advances). For engineering purposes this is not a problem, but for understanding language and the cognition of language processing, this state of affairs is highly unsatisfactory. The discriminative lexicon model (DLM; Baayen et al., 2019; Chuang & Baayen, 2021) is an attempt to combine the strengths of the mathematics of error-driven learning with the new possibilities offered by word embeddings for the computational modeling of the mental lexicon and lexical processing. Word embeddings, which we will also refer to as 'semantic vectors', represent word meanings as points in a high-dimensional space calculated from word usage in large text corpora.
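To make the idea of semantic vectors concrete, the following is a minimal sketch rather than the method used in the DLM or in this project. It assumes a tiny invented corpus and builds count-based vectors from co-occurrence within a two-word window, then compares them with cosine similarity. The corpus, the window size, and the function names `cooccurrence_vectors` and `cosine` are all illustrative assumptions; embeddings used in practice are estimated from large text corpora with far more elaborate methods.

```python
import math
from collections import Counter, defaultdict

# Toy corpus, invented for illustration only (not project data).
CORPUS = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "the cat ate the fish",
    "the dog ate the bone",
    "the student read the book",
    "the teacher read the paper",
]


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words found within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vectors = cooccurrence_vectors(CORPUS)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_book = cosine(vectors["cat"], vectors["book"])
print(f"cat ~ dog:  {sim_cat_dog:.3f}")
print(f"cat ~ book: {sim_cat_book:.3f}")
```

In this toy setting "dog" ends up closer to "cat" than "book" does, because the two animal words share more of their contexts. The same geometric intuition carries over to the high-dimensional vectors described above, although the numbers here say nothing about real language use.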
Members
- R. Harald Baayen (Principal Investigator)
- Konstantin Sering (Postdoctoral researcher)
ERC-SUBLIMINAL
Subliminal learning in the Mandarin lexicon
Project aims
Central to this research project is the observation that there are regularities and systematicities in the spoken language that escape our awareness, that are shielded from us by linguistic traditions and cultural conventions embodied in writing systems, but that nevertheless are detected by our brains, albeit subliminally, and used to optimize lexical processing.
Philosophers such as Immanuel Kant, Edmund Husserl, and Maurice Merleau-Ponty, and more recently the cognitive scientist Donald Hoffman, have called attention to how our perception of reality is shaped by and filtered through our minds and bodies. According to Hoffman, mathematically, fitness beats truth: our perceptions of the world are tuned to our survival. Writing systems are culturally evolved technologies that also hide from our eyes and ears the truth about what we really hear and say. Obviously, in order to work, writing systems must abstract away from the full richness of the spoken word. However, many features of our speech that are masked by writing systems are nevertheless exploited by our cognitive system when we listen or speak. For native speakers, mismatches between speech and writing are relatively unproblematic. For second language acquisition, however, mismatches can render learning unnecessarily difficult.
The research programme addresses this issue for Mandarin Chinese. Two kinds of mismatches will be investigated, using state-of-the-art methods in computational modeling, distributional semantics, and statistical analysis: subliminal mismatches between what written words are supposed to sound like, and how they are actually spoken, and subliminal mismatches between how the writing system is supposed to work, and how it actually functions and, as a semiotic system of its own, influences thought. These investigations will inform the applied goal of this project: developing ways to enhance vocabulary learning of Mandarin Chinese as a second language.
Presentations
Baayen, R. H., Modeling Mandarin tones on two-word compounds, Colloquium English Language and Linguistics, Düsseldorf, Germany, January 19, 2024.
Baayen, R. H., Frequency-Informed Learning, colloquium Out of Our Minds, Birmingham, United Kingdom, October 11, 2023.
Baayen, R. H., Computational modeling of lexical processing, 2nd Joint Workshop on Chinese Lexical Semantic Change, Tübingen, Germany, September 7, 2023.
Yang, Y., Measure words in Mandarin, 2nd Joint Workshop on Chinese Lexical Semantic Change, Tübingen, Germany, September 6, 2023.
Jin, X., Retroflex realization in the ShangHai dialect, 2nd Joint Workshop on Chinese Lexical Semantic Change, Tübingen, Germany, September 6, 2023.
Tseng, Y.-H., Lian, D.-C., and Watty, D., Modeling diachronic semantic change of (Pre-Modern) Mandarin Chinese with contextualized embeddings & Word2Vec, 2nd Joint Workshop on Chinese Lexical Semantic Change, Tübingen, Germany, September 6, 2023.
Chuang, Y.-Y., Baayen, R. H., and Bell, M., Do words sing their own tunes? Word-specific pitch realizations in Mandarin and English, 20th International Congress of Phonetic Sciences (ICPhS), Prague, Czech Republic, August 7, 2023 (poster presentation).
Baayen, R. H., Chuang, Y.-Y., and Heitmeier, M., Discriminative learning and the lexicon: NDL and LDL, STEP2023 – CCP Spring Training in Experimental Psycholinguistics, Edmonton, Canada, June 14, 16, 2023 (virtual).
Members
- R. Harald Baayen (Professor, Principal Investigator)
- Yu-Ying Chuang (Postdoctoral researcher)
- Xiaoyun Jun (Doctoral researcher)
- Yuxin Lu (Doctoral researcher)
- Kun Sun (Postdoctoral researcher)
- Yu Hsiang Tseng (Postdoctoral researcher)
- Weiting Wang (Research assistant)
- Yi Yang (Postdoctoral researcher)
- Runzhi Zhang (Research assistant)
DFG-EML
Machine Learning for Science
Cluster of Excellence - Machine Learning for Science (Cluster speakers: Philipp Berens and Ulrike von Luxburg)
Innovation Fund Project 1 in research area A - Beyond Prediction, Towards Understanding
In research area A, we will design algorithms that reveal complex structure and causal relationships from data in order to integrate machine learning into the scientific discovery process. Project 1 investigates "Enhancing Machine Learning of Lexical Semantics with Image Mining".
Members
- Hendrik Lensch (Principal investigator)
- R. Harald Baayen (Principal investigator)
- Zohreh Ghaderi (PhD student)
- Hassan Shahmohammadi (PhD student)