LST Summer School

The 1st Summer School in Language Science and Technology will take place on September 16-20 at the University of Warsaw, Poland and online.

The school will include specialised courses on various aspects of language science and on natural language processing tools and techniques. We will host experts from several universities and from industry, who will share their knowledge and skills with attendees during lectures and hands-on workshop sessions.

The school is aimed at undergraduate and graduate students of linguistics and related fields. Students will be able to sign up for particular courses or the whole summer school en bloc.

Below, you will find our teachers together with the descriptions of the courses.

The course schedule is available here.

 

Volker Dellwo, University of Zurich

Professor Volker Dellwo is a leading international scholar in speech science and in its forensic applications in particular. He has held government and other research grants for work in this field and has published widely in peer-reviewed academic journals. He has held posts and visiting positions at seven universities and is currently Group Leader of the Phonetics & Speech Sciences Group at the University of Zurich.

Course offered:

Module 1: What do linguists need to know about voice? (1.5 hrs)

The human voice carries not only linguistic information but also a rich array of speaker-specific information, such as the speaker's gender, age, accent, and physical and mental state. Although these aspects are typically studied separately in linguistic research, they are (1) cued by a common set of multidimensional acoustic cues (e.g. F0, formant frequencies and other spectral characteristics, or voice quality) and (2) deeply intertwined in speech communication (e.g. familiarity with the speaker enhances speech processing and vice versa). This module aims to introduce students to (1) the fundamental mechanisms of speech and voice production, (2) instrumental methods for studying language- and speaker-specific information in speech and voice (e.g. EMA, EGG), (3) the spectral and temporal information in acoustic signals that conveys linguistic and speaker-specific information, and (4) the circumstances in which linguistic information processing relies on voice information.

Module 2: Voice and speaker identity and forensic voice analysis applications (4.5 hrs)

The perception of speaker identity through voice is multidimensional, and listeners can utilize any relevant acoustic cue to decode speaker-specific details. The first part of the “Voice and Speaker Identity” module aims to show students a diverse spectrum of ‘rational dimensions’ in the human voice (e.g. F0, formants, harmonicity, jitter, shimmer, rate), which are measurable from acoustic and electroglottographic (EGG) signals. The theoretical part will be followed by hands-on sessions focused on extracting and interpreting relevant features from both acoustic and EGG signals.

The second part of the “Voice and Speaker Identity” module aims to familiarize students with various ‘abstract dimensions’ in the human voice, such as MFCCs, i-vectors, and x-vectors. It will cover dimensionality reduction techniques for data visualization and acoustic modelling, including PCA, t-SNE, and UMAP. These techniques will be demonstrated and applied in practical sessions to analyze speaker-specific information in large speech datasets.

The third part of the “Voice and Speaker Identity” module will familiarize students with the forensic applications of the aforementioned techniques.

 

 

Eleanor Chodroff, University of Zurich

Eleanor Chodroff is an SNF Assistant Professor in the Department of Computational Linguistics at the University of Zurich. Her research focuses on the phonetics–phonology interface, cross-talker and cross-linguistic phonetic variation, speech prosody, and speech perception. A recurring theme in her research is the use of large spoken corpora to advance linguistic theory. Prior to her current position, she was a Lecturer in Phonetics and Phonology at the University of York and a postdoctoral researcher in the Department of Linguistics at Northwestern University. She received her PhD in Cognitive Science from Johns Hopkins University.

Courses offered: 

Introduction to Forced Alignment with the Montreal Forced Aligner (3 hrs)

Segmentation of the speech stream into meaningful subintervals such as words or phones (speech segments) greatly facilitates acoustic-phonetic analysis and basic navigation in a sound file containing speech. Forced alignment is an automatic process that aligns a speech transcript (text) with the corresponding audio (wav or mp3 file). In this tutorial, you will learn how to use forced alignment, and specifically the Montreal Forced Aligner, to align a transcript to the audio at the phone level. We will work through a series of English and non-English examples. The tutorial has no prerequisites beyond the willingness to use a computer!

Speech analysis in Praat (1.5 hrs)

In this workshop, you will learn the basics of Praat, an acoustic analysis program. We will go over how to read in and view audio files, create text annotations of audio files with TextGrids, and extract basic acoustic-phonetic measurements from a sound file.

Introduction to Praat Scripting (3 hrs)

In this workshop, you will learn the basics of Praat scripting. We will begin with a high-level overview of the grammar of Praat and then proceed to script writing using a modifiable template. You’ll learn how to perform tasks like modifying audio files, creating TextGrids for a series of audio files, and taking temporal and spectral measurements. If you have not used Praat extensively before, it is recommended that you first attend the session on Speech Analysis in Praat. The focus of the workshop will be on Praat scripting itself.

 

 

Łukasz Stolarski, Jan Kochanowski University of Kielce

Dr Łukasz Stolarski is a linguist specializing in phonetics and natural language processing. He has published numerous papers on the acoustic analysis of speech, corpus linguistics and the processing of text data. He is also the author of several software applications designed for language research, including the “Phonetic Corpus of Audiobooks”, which is currently the largest language corpus offering audio recordings in English. He has over 15 years of experience teaching numerous courses to students enrolled in linguistics programs.

Course offered:

Speech Analysis in Python (3 hrs)

An advanced course tailored to students and researchers who want to expand their skill set in the acoustic analysis of spoken language. Participants are expected to have basic knowledge of Python and acoustic phonetics, and some experience with Praat scripting. The course will focus on two Python libraries: “librosa” and “Parselmouth”. The former is a general package for audio analysis, while the latter provides a direct interface to the internal Praat code. These packages unlock research possibilities that extend far beyond what is possible in traditional desktop tools.

 

 

Dorota Klimek-Jankowska, University of Wrocław

Dr Klimek-Jankowska is a faculty member at the Department of English and Comparative Linguistics, Institute of English Studies, University of Wrocław, and a member of the Center for Corpus and Experimental Research on Slavic and Baltic Languages “Slavicus”. She is also the director of the Center for Experimental Research on Natural Language, which includes an EEG lab and an eye-tracking lab. She is the Principal Investigator of the SONATA BIS-11 grant from the National Science Center in Poland (2021/42/E/HS2/00143) for the project “From a multilingual parallel corpus to the micro-typology of the PERFECT in Baltic and Slavic”. Her main interests are the syntax-semantics interface, formal semantics, the semantic micro-typology of Slavic languages, and psycholinguistics, with a main focus on the organization of human linguistic knowledge and its interaction with other cognitive systems in the brain.

Course offered:

Psycholinguistics of natural language semantics (4.5 hrs)

In this course, we will look at how issues in theoretical linguistics are investigated with psycholinguistic and neurolinguistic methods. We will look at some of the most recent experimental research in semantics and discuss the relevance of empirical evidence from the domain of (neuro-)psychological behavior for formal models of syntactic and semantic computation. We will also discuss the motivation for the choice of various experimental methods, such as acceptability rating tasks, scenario-based questionnaires, self-paced reading, eye-tracking (during reading and in the visual world paradigm), and event-related brain potentials (ERPs), in studies on lexical and sentential semantics. The course will consist of three meetings focusing on the following topics:

Day 1: Processing lexical, structural and scopal ambiguity 

Day 2: Processing nouns, verbs, and in-between categories

Day 3: Processing selected functional categories: tense, aspect, number, gender, negation

 

 

Adam Dąbrowski, Robotec.ai

Adam is the CTO at Robotec.ai, where he leads the development of products for developing and validating robots, including simulations and generative AI tooling. He also chairs the simulation group for Open 3D Engine, serves as a member of the Technical Steering Committee of the Open Source Robotics Alliance, and acts as an independent expert in robotics and AI for the European Commission.

Course offered:

The Why and How of Large Language Models (1.5 hrs)

Large Language Models (LLMs) such as the one behind ChatGPT are now commonplace. But what exactly is an LLM? Why do they work at all, when all previous attempts failed to produce human-like conversations?

In this lecture, you will learn about key concepts such as transformers and self-attention, and understand the strengths and limitations of generative AI. I will explain why hallucinations happen, how some models can improve through a process called fine-tuning, and what tools and methods are commonly used to enhance LLM performance.

Finally, you will learn about the current best models, how to compare their scores, and which are best suited for specific applications.

 

 

Patryk Hubar, University of Warsaw

Patryk Hubar, M.A., is a specialist in digital humanities at the Digital Humanities Centre of the Institute of Literary Research of the Polish Academy of Sciences. As part of the Digital Humanities PhD programme run by IBL PAN and the Polish-Japanese Academy of Computer Science, he is preparing his dissertation on bibliographic data processing using machine learning algorithms. He is mainly interested in data analysis, natural language processing, artificial intelligence and Semantic Web technologies.

Course offered:

Using LLMs in Academia (3 hrs)

A summer course for MSc students interested in using Large Language Models (LLMs) in their research. Participants will learn effective prompting techniques and the use of LLM-based applications to enhance their research process. The course covers information retrieval (including literature review), working with academic texts in PDF format, and the use of chatbots for article reading and reference searching. Through hands-on sessions, students will acquire the practical knowledge essential for academic work and for conducting research for their master’s theses.

 

 

Jesael Jorge Sosa, marketing expert at SpanishClassesLive

Jesael is the chief marketing officer and co-founder of SpanishClassesLive, a successful startup offering online Spanish courses and classes. He specialises in digital marketing and in optimising everyday operations in marketing, sales and customer service via digital applications and AI solutions.

Course offered:

Using LLMs in Business (1.5 hrs)

This course will consist of developing an education-related project using generative AI, focused on optimising data analysis and prompt generation for business purposes.

 

 

Joanna Dolińska, University of Warsaw

Joanna Dolińska, PhD, is an Assistant Professor at the University of Warsaw, Faculty of “Artes Liberales”, Center for Research and Practice in Cultural Continuity. Joanna carries out research in the field of multilingualism and currently leads the project “Interdependence of multilingualism and biodiversity in the Chiang Mai and Satun provinces in Thailand”. She gained her current academic expertise during research stays at Mahidol University (Thailand), the University of Groningen (The Netherlands), the Smithsonian Institution (USA), the University of Cambridge (UK) and the University of Strasbourg (France). Furthermore, Joanna has five years of technological experience gained in the corporate environment of a leading hi-tech research & development company. In her research on minority languages, Joanna Dolińska combines fieldwork, computational linguistics and voice technology development.

Courses offered:

Automatic Speech Recognition (2 hrs)

The initial part of this module will be devoted to the history of automatic speech recognition (ASR), the most important components of ASR technologies, and the essential problems in current ASR development. In the second part of the meeting, we will conduct a workshop on ASR solutions for several languages and genres. The transcription exercises will be carried out with the help of free versions of currently available transcription software (the name of the software, as well as the preparation tasks preceding the workshop, will be shared with registered participants before the workshop). The goal of this workshop is to present the opportunities and challenges related to various genres and varieties of a language, as well as the disparity between dominant and lesser-resourced languages from the perspective of the current development of ASR technologies.

Language documentation and revitalization with digital tools (3 hrs)

The aim of this module is to discuss methodologies applied by researchers when working with endangered languages and/or languages that are at risk of being endangered. In our meeting, we will concentrate on two aspects of such work: language documentation and language revitalization. In the second part of the meeting, we will discuss currently available digital tools which support language documentation and revitalization, in terms of their potential benefits and challenges. We will discuss legal and ethical aspects of data collection carried out in language documentation and revitalization. Furthermore, we will practice drafting our own data management plan and ethical statement. Last but not least, we will discuss the most important components of a project application devoted to language documentation and revitalization.

 

 

Agnieszka Kałdonek-Crnjaković, University of Warsaw

Agnieszka Kałdonek-Crnjaković, PhD, is an assistant professor at the Institute of English Studies (Faculty of Modern Languages, University of Warsaw), where she heads the Department of Applied Linguistics and Translation Studies and teaches courses related to her research interests, including teaching languages to students with special educational needs. She is the founder and coordinator of the faculty research group Neurodiversity in Language Education. Before joining academia in 2018, she worked as a foreign/second language teacher and special needs teacher in Croatia, Poland, and the UK.

Courses offered:

Dyslexia and foreign/second language acquisition (1.5 hrs)

In this workshop, students will learn what dyslexia is, how it manifests, how it is diagnosed in the first language, how it differs from other language disorders, and what potential effect it may have on foreign/second language acquisition. We will also discuss eye-tracking studies that focus on individuals with dyslexia to initiate a discussion on future research studies in the field of applied linguistics.

Attention-Deficit/Hyperactivity Disorder and foreign/second language acquisition (1.5 hrs)

In this workshop, students will learn what ADHD is, how it manifests, how it is diagnosed, how it differs from other specific learning difficulties, and what potential effect it may have on foreign/second language acquisition. We will also discuss eye-tracking studies that focus on individuals with ADHD to initiate a discussion on future research studies in the field of applied linguistics. During the workshop, students will also have a chance to try on eye-tracking glasses that were used by the lecturer in one of her studies involving adults with ADHD.

 

Małgorzata Szupica-Pyrzanowska, University of Warsaw

Małgorzata Szupica-Pyrzanowska, PhD, is an Assistant Professor at the Institute of Applied Linguistics, University of Warsaw, where she oversees a neurolinguistics laboratory. Her research focuses on non-native language learning and teaching, eye tracking/pupillometry studies, neurolinguistics, neurodidactics, and aphasia. Previously, she was a member of the Neurolinguistics Laboratory at the Speech-Language-Hearing Sciences Department, Graduate School and University Center of the City University of New York. Additionally, she collaborated with the Speech Pathology Department at Lehman College (CUNY), the Washington Square Institute in New York, and the NY Aphasia Support Group at St. Vincent’s Hospital, where she worked with patients with Broca’s aphasia.

Course offered:

Topics in Clinical Linguistics

This module, Topics in Clinical Linguistics, will concentrate on aphasia, an acquired neurogenic language disorder. The aim of the course is twofold: (1) to provide a theoretical background for understanding how the brain processes language, and (2) to present morphological, syntactic, phonological, and semantic deficits in aphasia within the context of representation, access or integration problems.

Seminar 1

Morphological and syntactic deficits in aphasia

Seminar 2

Phonological and semantic deficits in aphasia