Seminar für Sprachwissenschaft

With a background in physics and psychology, my main focus is the production of natural speech by the human speech system. I am fascinated by the fact that the vast majority of people, within the first few years of life, learn to use spoken language efficiently and purposefully as a means of communication in a wide variety of environments and situations. The human speech system displays fascinating coordination and dynamics. To me, the discriminating, information-carrying character of language is more salient than its meaning-carrying character; as far as I understand the language system so far, the "transfer of information" seems to be its essential characteristic. In concrete terms, this means that language can, for example, signal whether I find a situation pleasant or unpleasant, or whether I have or have not yet understood something. This signaling, however, always requires the right context, as the same signal has different meanings in different contexts.

Methodologically, I approach spoken language in humans from the side of computer-based modeling and simulation. I am trying to control a computer model of the human speech apparatus, developed by Peter Birkholz in Dresden (VocalTractLab), so that it can produce spontaneous human speech. To find the correct trajectories of the control parameters that drive the vocal tract model, I combine simple artificial neural networks with a "predictive forward" approach that minimizes the error in different target spaces through situational planning. These target spaces include at least a meaning target space and an acoustic target space. A substantial part of my work resulted in the PAULE (Predictive Articulatory speech synthesis Utilizing Lexical Embeddings) model (PhD thesis, Python code). Here I benefit from the resources and expertise of the cognitive modeling group around Martin V. Butz and the linguistic and statistical expertise of our quantitative linguistics group around Harald Baayen.
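The "predictive forward" planning idea can be sketched in a few lines. This is an illustrative toy, not the actual PAULE code: the linear forward model, the parameter names, and the learning rate are my own assumptions, standing in for the learned neural forward model and the vocal tract control trajectories.

```python
import numpy as np

# Toy stand-in for a learned forward model: it maps 3 "control
# parameters" to 4 "acoustic" features. In PAULE-style planning the
# forward model would be a neural network approximating the
# articulatory synthesizer; a fixed linear map suffices to show the idea.
W = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, -0.5],
              [0.5, 0.5, 1.0],
              [0.0, 0.0, 0.5]])

def forward(controls):
    """Predict acoustic features from control parameters."""
    return W @ controls

# Acoustic target the planner should reach.
target = np.array([1.0, -0.5, 0.25, 0.0])

# Planning: start from neutral controls and descend the gradient of the
# squared acoustic error *through* the forward model.
controls = np.zeros(3)
lr = 0.2
for _ in range(1000):
    error = forward(controls) - target
    controls -= lr * (W.T @ error)  # gradient of 0.5 * ||W c - t||^2

print("remaining acoustic error:", np.linalg.norm(forward(controls) - target))
```

In the full model, the same planning-by-gradient principle is applied jointly in an acoustic and a semantic target space, with recurrent networks instead of a linear map.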

Publications

Konstantin Sering. Speech/non-speech classification slightly improves synthesis quality in PAULE. In Elektronische Sprachsignalverarbeitung 2024, pages 173–180, 2024.

Konstantin Sering. Predictive articulatory speech synthesis utilizing lexical embeddings (PAULE). PhD thesis, Universität Tübingen, 2023.

Konstantin Sering and Paul Schmidt-Barbo. Somatosensory feedback in PAULE. Studientexte zur Sprachkommunikation: Elektronische Sprachsignalverarbeitung 2023, pages 119–126, 2023.

Karen V. Beaman and Konstantin Sering. Measuring change in lectal coherence across real- and apparent-time. In The Coherence of Linguistic Communities, pages 87–105. Routledge, 2022.

Paul Schmidt-Barbo, Sebastian Otte, Martin V. Butz, R. Harald Baayen, and Konstantin Sering. Using semantic embeddings for initiating and planning articulatory speech synthesis. Studientexte zur Sprachkommunikation: Elektronische Sprachsignalverarbeitung 2022, pages 32–42, 2022.

Konstantin Sering and Paul Schmidt-Barbo. Articubench - an articulatory speech synthesis benchmark. Studientexte zur Sprachkommunikation: Elektronische Sprachsignalverarbeitung 2022, pages 43–50, 2022.

Karen V. Beaman, Fabian Tomaschek, and Konstantin Sering. The cognitive coherence of sociolects across the lifespan: A case study of Swabian German. 2021.

Jakob Fink-Lamotte, Andreas Widmann, Konstantin Sering, Erich Schröger, and Cornelia Exner. Attentional processing of disgust and fear and its relationship with contamination-based obsessive–compulsive symptoms: Stronger response urgency to disgusting stimuli in disgust-prone individuals. Frontiers in Psychiatry, 12, 2021.

Paul Schmidt-Barbo, Elnaz Shafaei-Bajestan, and Konstantin Sering. Predictive articulatory speech synthesis with semantic discrimination. Studientexte zur Sprachkommunikation: Elektronische Sprachsignalverarbeitung 2021, pages 177–184, 2021.

Konstantin Sering, Fabian Tomaschek, and Motoki Saito. Anticipatory coarticulation in predictive articulatory speech modeling. Studientexte zur Sprachkommunikation: Elektronische Sprachsignalverarbeitung 2021, pages 208–215, 2021.

Fabian Tomaschek, Denis Arnold, Konstantin Sering, and Friedolin Strauss. A corpus of schlieren photography of speech production: potential methodology to study aerodynamics of labial, nasal and vocalic processes. Language Resources and Evaluation, 55(4):1127–1140, 2021.

Konstantin Sering, Paul Schmidt-Barbo, Sebastian Otte, Martin V. Butz, and Harald Baayen. Recurrent gradient-based motor inference for speech resynthesis with a vocal tract simulator. In 12th International Seminar on Speech Production, 2020.

Konstantin Sering and Fabian Tomaschek. Comparing KEC recordings with re-synthesized EMA data. Studientexte zur Sprachkommunikation: Elektronische Sprachsignalverarbeitung 2020, pages 77–84, 2020.

Fabian Tomaschek, Denis Arnold, Konstantin Sering, Benjamin V. Tucker, Jacoline van Rij, and Michael Ramscar. Articulatory variability is reduced by repetition and predictability. Language and Speech, 2020.

Konstantin Sering, Niels Stehwien, Yingming Gao, Martin V. Butz, and Harald Baayen. Resynthesizing the GECO speech corpus with VocalTractLab. Studientexte zur Sprachkommunikation: Elektronische Sprachsignalverarbeitung 2019, pages 95–102, 2019.

Konstantin Sering, Petar Milin, and Harald Baayen. Language comprehension as a multi-label classification problem. Statistica Neerlandica, 2018.

Denis Arnold, Fabian Tomaschek, Konstantin Sering, Florence Lopez, and Harald Baayen. Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit. PLoS ONE, 2017.

Konstantin Sering. Dispersion Forces – Numerical Methods for Casimir-Polder Potentials in Complex Geometries. Diploma thesis, 2014.

Konstantin Sering. Gaze Coherence – Improving and evaluating a spatio-temporal normalised scan path saliency approach. Diploma thesis, 2013.

Florian Wickelmaier, Nora Umbach, Konstantin Sering, and Sylvain Choisel. Comparing three methods for sound quality evaluation with respect to speed and accuracy. Convention Paper 7783 of the Audio Engineering Society, 2009.

Gudrun Tisch, Hedwig Seelentag, Leon Sering, Thomas Hinke, Konstantin Sering, Katharina Mölter, Phillip Urbanik, and Ole Schmidt. Fümo - Das Buch. Tenea, 2007.

Software

articubench (Author and Maintainer), Python package: an articulatory speech synthesis benchmark publishing publicly available data and own measurements from electromagnetic articulography and ultrasound tongue imaging to compare different articulatory speech synthesis control models, https://github.com/quantling/articubench, since 2022.

paule (Author and Maintainer), Python package: Predictive Articulatory speech synthesis Utilizing Lexical Embeddings (PAULE), a control model for the VocalTractLab speech synthesizer, https://github.com/quantling/paule, since 2021.

create_vtl_corpus (Co-Author and Maintainer), Python scripts to create and synthesize a speech corpus with VocalTractLab. https://github.com/quantling/create_vtl_corpus, since 2019.

pyndl (Co-Author and Maintainer), Python package that re-implements learning and classification models
based on the Rescorla-Wagner equations. https://github.com/quantling/pyndl, since 2016.
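The core of the Rescorla-Wagner learning rule that pyndl re-implements can be sketched in a few lines. This is an illustrative toy, not pyndl's API; the function and variable names and the learning rate are my own.

```python
import numpy as np

def rescorla_wagner_update(weights, cues, outcomes, eta=0.01):
    """One Rescorla-Wagner step.

    weights: (n_cues, n_outcomes) association matrix;
    cues, outcomes: 0/1 indicator vectors for one learning event.
    """
    prediction = cues @ weights    # summed activation per outcome
    error = outcomes - prediction  # prediction error
    # Only weights from cues present in this event change,
    # proportional to the prediction error.
    weights += eta * np.outer(cues, error)
    return weights

# Two cues, one outcome: cue 0 always co-occurs with the outcome,
# cue 1 never appears.
W = np.zeros((2, 1))
for _ in range(1000):
    W = rescorla_wagner_update(W, np.array([1.0, 0.0]), np.array([1.0]))
print(W)  # weight from cue 0 approaches 1, cue 1 stays at 0
```

pyndl's actual implementation processes large event files in parallel and adds the equilibrium solutions of these equations.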

ndl2 (Maintainer), R package that implements learning and classification models based on the Rescorla-Wagner equations and their equilibrium equations. Mail me for a copy, since 2016.

ndl (Maintainer), R package that implements learning and classification models based on the Rescorla-Wagner equations and their equilibrium equations, https://cran.r-project.org/web/packages/ndl/index.html, since 2016.

synchronicity (Author), Python package to calculate gaze synchronicity and coherence values for a group of viewers of a dynamic scene from raw eye-tracking data, https://github.com/derNarr/synchronicity, 2015.

segmag (Contributor), R package to determine event boundaries in event segmentation experiments, http://cran.r-project.org/web/packages/segmag/index.html, 2016.

achrolab (Co-Author), Python package to control and calibrate hardware in an achromatic color laboratory, https://github.com/derNarr/achrolab, 2012–2013.

Presentations

Konstantin Sering. Predictive articulatory speech synthesis utilizing lexical embeddings (PAULE), 2021. Spoken Morphology Colloquium, Düsseldorf, Germany.

Konstantin Sering. Learning vocal tract control parameters to synthesize speech, 2019. MoProc Workshop, Tübingen, Germany.

Konstantin Sering. Learning vocal tract control parameters to synthesize speech, 2019. Neural Information Processing Group, Tübingen, Germany.

Ingmar Steiner, Fabian Tomaschek, Timo Bolkart, Alexander Hewer, Stefanie Wuhrer, and Konstantin Sering. Head and tongue model: Simultaneous dynamic 3d face scanning and articulography, 2018. Simphon.net Meeting, Stuttgart, Germany.

Konstantin Sering. Mimicking to speak with a vocal tract model – first ideas. Poster, MLSS, 2017.

Denis Arnold, Florence Lopez, Konstantin Sering, Fabian Tomaschek, and Harald Baayen. Acoustic speech learning without phonemes: Identifying words isolated from spontaneous speech as a validation for a discriminative learning model for acoustic speech learning. Talk, TeaP, 2016.

Jakob Fink, Andreas Widmann, Konstantin Sering, and Cornelia Exner. Attentional bias triggers disgust-specific habituation problems in subclinical contamination based obsessive-compulsive disorder. Poster, TeaP, 2016.

Konstantin Sering, Nora Umbach, and Dominik Wabersich. Achrolab – using non-python supported device for a vision lab. Poster, Euro SciPy in Paris, 2011.

Teaching

Please refer to the German version of this site for an overview of my teaching (since WS 2015). The site language can be changed at the top right of the page.