10.30 - 12.00: Panel 2 - Transforming Localisation: Can Smart Machines Allow For Greater Human Creativity?

Caroline Baines, MESA

As the pace of technological innovation continues to accelerate across the media and entertainment industry, the localisation sector finds itself in the midst of rapid change. AI is becoming more prevalent and is being embedded in more and more workflows, so what does this mean for localisation as we know it? How can humans and technology work together to enable greater efficiency whilst also maintaining the essence of localisation, creativity, so that content remains compelling for global audiences?

Caroline Baines, Senior Director of Client Services, MESA. Caroline joined MESA Europe as Director of Operations in 2018, following a 20-year career in research and consulting within the media, entertainment and technology sector. Caroline was previously with Futuresource Consulting where she led the group’s consumer research division, directing custom projects across entertainment content, consumer electronics, professional IT and storage media.  In her role as Senior Director of Client Services, Caroline oversees the European community, ensuring the special interest groups continue to work together to address key industry challenges, to raise the profile of members via MESA’s events and to offer varied channels of communication to share news, thought leadership articles and more. She also runs the Content Localisation Council and is committed to providing a platform for shared learning, networking and collaboration. Caroline holds a BSc (Hons) in Psychology from Goldsmiths College, University of London.

12.00 - 13.30: Lunch


13.30 - 15.00: Session 8 - Live subtitling quality

The quality of live captioning in the USA: caption quality task force

Łukasz Dutka, Kimberly Shea, Jennifer Schuck

After decades of advocacy in America, the Federal Communications Commission (FCC) ordered the top 25 U.S. markets to provide viewers with equal and effective access to communications in the form of live captioning. While this was a successful step toward equal access, the order failed to require captioning providers to meet certification requirements that would ensure a benchmark quality standard. In fact, there is anecdotal evidence that captioning quality is decreasing in various TV markets. The Global Alliance of Speech-to-Text Captioning, a non-profit based in Washington, DC, created a Caption Quality Task Force in August 2022. This task force is gathering samples of closed captioning on TV channels across the United States. These data will be evaluated for accuracy and consistency of caption quality based on the NER model created by Pablo Romero-Fresco. The intention is to gather live captioning samples produced via all methods of captioning: human-generated, via voice or steno, and automatic speech recognition. By gathering two-to-five-minute video samples of live captioning, the task force has the potential to produce an enormous amount of data for review by organizations, commissions, legislators, regulators and researchers, as well as by captioning users themselves. Metrics results will be shared to aid national as well as global advocacy efforts.
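For readers unfamiliar with the NER model, its accuracy rate is (N - E - R) / N x 100, where N is the number of words in the captions and E and R are edition and recognition errors weighted by severity (0.25 minor, 0.5 standard, 1 serious), with 98% commonly cited as the threshold for acceptable quality. The following minimal sketch illustrates that calculation; the function and example figures are ours, not the task force's tooling.

```python
# Minimal sketch of NER-model scoring. N = number of words in the captions,
# E = edition errors, R = recognition errors; each error is weighted as
# minor (0.25), standard (0.5) or serious (1). 98% is the usual threshold
# for acceptable quality.
MINOR, STANDARD, SERIOUS = 0.25, 0.5, 1.0

def ner_accuracy(n_words: int, edition_errors: list[float],
                 recognition_errors: list[float]) -> float:
    """Return the NER accuracy rate as a percentage."""
    e = sum(edition_errors)
    r = sum(recognition_errors)
    return (n_words - e - r) / n_words * 100

# Example: a 300-word sample with two edition and three recognition errors.
print(f"{ner_accuracy(300, [MINOR, STANDARD], [MINOR, MINOR, SERIOUS]):.2f}%")  # 99.25%
```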

The goals of the Task Force are to: 

  • Determine what accuracy percentage is produced by each method; 
  • Determine which methods produce the highest accuracy rate; 
  • Build a record of tangible proof to show caption quality - good and bad; 
  • Develop unbiased metrics on which policymakers can rely when implementing standards. 

We will report on the work done to date, discuss the challenges involved and share the results as well as recommendations. The Global Alliance of Speech-to-Text Captioning is a volunteer advocacy group whose members include captioners, consumers and advocates. Its mission is simple: universal accessibility to the spoken word via all forms of captioning.

Łukasz Dutka (presenter), PhD, is a member of the Global Alliance of Speech-to-Text Captioning, one of the founders of AVT Masterclass, and a member of the Management Board of Dostepni.eu, an accessibility services provider. As an expert on accessibility and audiovisual translation, Łukasz specializes in live subtitling and SDH. Throughout his career, he has held various roles, including an in-house subtitler for a public broadcaster, a freelance subtitler and quality control specialist for high-profile titles, and a live subtitler at significant social events. Łukasz has also been a trainer at numerous European universities and organizations, as well as an investigator in multiple accessibility-related projects. As a trailblazer on the Dostepni.eu team, Łukasz introduced live subtitling through respeaking in Poland and was instrumental in establishing a live subtitling unit for a leading broadcaster. As a member of the ILSA project team, he contributed to the development of an open course on live subtitling. With over a decade of teaching experience, Łukasz has led training programs for businesses and taught university-level courses on audiovisual translation, interlingual subtitling, live subtitling, subtitling for the Deaf and Hard of Hearing, and interpreting. He is a member of the University of Warsaw Audiovisual Translation Lab (AVT Lab), the European Society for Translation Studies (EST), and the European Association for Studies in Screen Translation (ESIST).   

Kimberly Shea (presenter), NCSP, CRC,  is a Founding Member of the Global Alliance of Speech-to-Text Captioning. She believes that captioners have needed an organization of their own for a very long time, one that represents them separate and apart from other industries. She believes that consumer representation is vital when it comes to setting standards for high-quality, accurate captions. Kimberly is passionate about advocating for her profession and for the communities she serves. She believes that we still have much to do regarding education and awareness related to equal and effective access to communications. She is excited to do that alongside those she serves.   

Jennifer Schuck (presenter), NCSP, RDR, CRC, is a Founding Member of the Global Alliance of Speech-to-Text Captioning and has been a stenographic captioner for 20 years. From the very beginning of her career, she has advocated for consumers of captions to receive the highest quality possible. By serving on various committees in different organizations over the years, she has encouraged captioners to enhance their skills and remain up to date on technology advances. As the current chair of the Global Alliance, Jen advocates for captioning consumers by educating not only industry professionals but also the general public about captioning and living with hearing loss.


AI and live captioning: comparing the quality of automatic and human live captions in English

Pablo Romero-Fresco, Dr. Fresno-Cañada

Closed captions play a vital role in making live broadcasts and events accessible to many viewers. Traditionally, stenographers and respeakers have been in charge of their production, but this scenario is changing due to the steady improvements that automatic speech recognition (ASR) has experienced in recent years. Broadcasters and service providers are beginning to roll out this technology to produce intralingual live captions in different contexts. Human and automatic captions now co-exist in different settings and, while some research has focused on the accuracy of human live captions, comprehensive assessments of the accuracy and quality of automatic captions are still needed. This presentation will tackle this issue by introducing the main findings of the largest study comparing the accuracy of automatic and human live captions conducted to date. Through five case studies including approximately 17,000 live captions analysed with the NER model from 2018 to 2023 in the UK, the U.S. and Canada (Romero-Fresco and Fresno-Cañada, forthcoming), this presentation will track the recent developments of automatic captions, including the very latest generation of AI tools, to compare their accuracy to that achieved by humans. Beyond this, and within the framework of the Spanish-government-funded Qualisub project, the presentation will end by addressing the potential full automation of the NER model (given the issues caused by the use of the WER model) and by reflecting on what the future of live captioning looks like for both human and automatic captions.

Pablo Romero-Fresco (presenter) is a Senior Lecturer at Universidade de Vigo (Spain) and Honorary Professor of Translation and Filmmaking at the University of Roehampton (London, UK). He is the author of the books Subtitling through Speech Recognition: Respeaking (Routledge), Accessible Filmmaking (Routledge) and Transformative Media Accessibility (Routledge, forthcoming). He is on the editorial board of JAT and is the leader of the international research group GALMA, for which he is currently coordinating several international projects on media accessibility and accessible filmmaking and through which he works as a consultant for institutions and companies such as the European Parliament and Netflix. Pablo is also a filmmaker. His first short documentary, Joining the Dots (2012), was used by Netflix as well as film schools around Europe to raise awareness about audio description. He has just released his first feature-length documentary, Where Memory Ends (2022), which has been selected for the London Spanish Film Festival and the Seminci (Spain), and whose accessible version has been screened at special events in New York and Montreal.

Dr. Fresno-Cañada (presenter), Assistant Professor, is the Executive Consultant of the Translation & Interpreting Office in the areas of Translation Technologies and Audiovisual Translation. She holds a BA in Translation and Interpreting, an MA in Comparative Literature and Literary Translation, and a PhD focusing on Audiovisual Translation. Before joining UTRGV, she taught Audiovisual Translation and Accessibility to the Media at several universities in Spain, where she also worked as a freelance translator and audio describer. At UTRGV, she is the Director of the Translation and Interpreting programs, in which she teaches Translation Theory, Translation Technologies, Literary Translation, Medical Terminology, Interpreting, and Audiovisual Translation. Her research interests include Audiovisual Translation and Accessibility to the Media, mainly Subtitling, Closed Captioning, and Audio Description (accessibility for the Blind and Visually Impaired). She has presented papers at national and international conferences and her research has been published in some of the most relevant T&I journals and books.


Human-machine performances in interlingual live subtitling with different 'Englishes'

Alice Pagano

Live subtitling (LS) finds its foundations in subtitling for the d/Deaf and Hard of Hearing, i.e. the need to produce subtitles from and to the same language while conveying specific audio features (intralingual), and to turn oral content into written form (intersemiotic). The interlingual variant of LS, referred to as Interlingual Live Subtitling (ILS), combines all this with the urgency of guaranteeing multilingual accessibility (interlingual), to provide full accessibility for all. 

Situated at the crossroads between Audiovisual Translation (AVT) and Simultaneous Interpreting (SI), ILS draws on both human-mediated translation and automatic language processing systems. It is currently achieved using different approaches and techniques, each requiring a different degree of human-machine interaction (HMI). This research focuses on the more human-oriented ILS modes: interlingual respeaking via Automatic Speech Recognition (ASR), and the combination of intralingual and interlingual respeaking with SI, as workflows currently used to create real-time subtitles at live events. 

This proposal presents a new case study developed following the results of doctoral research at the University of Genoa, Italy (Pagano, 2022) and two other parallel studies (Dawson, 2021; Romero-Fresco & Alonso-Bacigalupe, 2022) on five different ILS workflows, pointing out their strengths and weaknesses. 

This follow-up case study tests and compares the same five ILS methods from English to Italian on a scale of decreasing human-machine interaction: interlingual respeaking, simultaneous interpreting + intralingual respeaking, and simultaneous interpreting + ASR are more subject to human agency in terms of input, editing and review of the transcription, while in the last two – intralingual respeaking + Machine Translation (MT), and fully automatic ASR + MT – the final say is given to machines, without any human monitoring. Four SI students were trained in intralingual and interlingual respeaking and performed different roles on one source video of a real-life speech in English given by a non-native speaker. The final output of each of the five methods is assessed for linguistic accuracy and delay. Analyses draw upon the word-based NTR model calculation (Romero-Fresco & Pöchhacker, 2017), together with a more conceptual, semantics-oriented level of analysis, and measure the synchronicity of the subtitles – how many seconds it takes for them to be broadcast. 
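As a rough illustration of the NTR calculation (Romero-Fresco & Pöchhacker, 2017), which mirrors the NER formula but distinguishes translation (T) from recognition (R) errors, consider the sketch below; the function name and figures are illustrative only, not the study's own tooling.

```python
# NTR model sketch: accuracy = (N - T - R) / N * 100, where N is the number
# of words in the live subtitles, T the severity-weighted translation errors
# and R the severity-weighted recognition errors (weights assumed here to be
# 0.25 minor, 0.5 major, 1 critical, as in the published model).
def ntr_accuracy(n_words: int, translation_errors: list[float],
                 recognition_errors: list[float]) -> float:
    return (n_words - sum(translation_errors) - sum(recognition_errors)) / n_words * 100

# A 500-word sample with three translation errors and two recognition errors:
print(f"{ntr_accuracy(500, [0.5, 0.25, 1.0], [0.25, 0.25]):.2f}%")  # 99.55%
```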

In addition, the speeches used in the study are delivered in English by EFL speakers using English as a lingua franca ('Globish'), which is not always as well pronounced and articulated as native speech. Given this particular yet increasingly common scenario in public speaking, it is hoped that this contribution will prompt further reflection on the importance of human interaction with machine systems in providing high-quality media accessibility for live events when EFL speakers are involved.

Alice Pagano is a Lecturer in Spanish Language and Translation and in Intercultural Communication at the University of Genoa and the University of Modena and Reggio Emilia, Italy. She holds a PhD in Digital Humanities (Digital Languages, Cultures and Technologies) and a degree in conference interpreting from the University of Strasbourg, France. She has worked as an interpreter, translator and post-editor, and her research areas are interpreting studies, AVT, media accessibility, live subtitling, respeaking and machine translation.


Live subtitling quality on Polish news TV across six quality metrics

Łukasz Dutka, Monika Szczygielska, Karolina Szeląg

The past few years have seen growth in the availability of live subtitling in Poland across television, live events and online streaming. As the 108 Polish TV channels regulated by the National Broadcasting Council increase their provision of subtitles, with the overall mandatory quota for access services set to rise to 50% by 2024, many channels will have to start providing live subtitles and continue doing so at scale, sustaining and improving quality to make sure that the service addresses the needs of viewers. Now that the quantity of live subtitling is considerable, the focus should turn to ensuring fit-for-purpose quality. 

The objective of this research was to investigate the quality of live subtitling in Polish on news TV channels. The following research questions were posed: (1) Do Deaf and hard-of-hearing viewers have effective access to news television thanks to live subtitling, i.e. do the subtitles reflect the content accurately and intelligibly? (2) Does the quality of live subtitling differ between broadcasters? (3) Does the quality differ between live subtitling and semi-live subtitling? (4) Does the quality increase or decrease over time? It was hypothesised that semi-live subtitling delivers better quality than live subtitling and that the quality of live subtitling increases over time as broadcasters gain experience in providing live subtitles. The research was carried out by the Media Accessibility Observatory at the University of Warsaw in partnership with Dostepni.eu and the Polish National Broadcasting Council. 

The quality of live and semi-live subtitles was analysed based on samples collected from 96 TV shows from three Polish news TV channels (TVP Info, Polsat News and TVN24) across three quarters (Q2 2021, Q3 2021 and Q1 2022). The total duration of all samples was 970 minutes and they included 13,620 subtitles. The study used established metrics of live subtitling quality: NER score (as a measure of accuracy), latency, subtitle speed and reduction rate. Two new metrics were proposed as well: segmentation score and gaps between consecutive subtitles. The speech rate in the subtitled TV shows was also measured. The results were analysed with various statistical methods including analysis of variance, analysis of covariance, correlation and linear mixed models. 
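To make these metrics concrete, here is a minimal sketch of how three of them can be computed from timestamped subtitles; the data structure and figures are illustrative assumptions, not the study's actual pipeline.

```python
# Illustrative computation of three live-subtitling metrics from timed
# subtitles: subtitle speed (characters per second), gaps between consecutive
# subtitles, and reduction rate (share of spoken words not rendered in text).
from dataclasses import dataclass

@dataclass
class Subtitle:
    start: float  # seconds
    end: float
    text: str

def speeds_cps(subs: list[Subtitle]) -> list[float]:
    return [len(s.text) / (s.end - s.start) for s in subs]

def gaps(subs: list[Subtitle]) -> list[float]:
    return [b.start - a.end for a, b in zip(subs, subs[1:])]

def reduction_rate(words_in_audio: int, words_in_subs: int) -> float:
    return (words_in_audio - words_in_subs) / words_in_audio * 100

subs = [Subtitle(0.0, 2.0, "Witamy w programie."),
        Subtitle(2.5, 5.0, "Dzisiaj rozmawiamy o dostepnosci.")]
print(speeds_cps(subs), gaps(subs), reduction_rate(180, 150))
```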

The study found that the quality of live subtitling is quite diverse and varies between broadcasters. While the results show that respeaking with parallel correction is an effective method of creating live subtitles in Polish and can be used in broadcast settings to obtain acceptable quality, live subtitles for some TV shows were of substandard quality: they were presented at excessive subtitle speeds and with high latency, and were at times inaccurate and even unintelligible. The study confirmed that semi-live subtitling produces better quality than live subtitling. However, the quality of both live and semi-live subtitling is decreasing over time on all three news TV channels. On a 10-point scale, across the three quarters analysed in the study, the accuracy of live subtitling decreased from an acceptable 5/10 to a substandard 2/10 for both TVP Info and Polsat News, while it stayed at 0/10 in the case of TVN24. Latency increased, as did subtitle speeds and the number of text segmentation errors. As a comprehensive review of multiple dimensions of live subtitling quality on Polish news TV channels, the study may be the first step towards regular monitoring of the quality of live subtitling on Polish TV. It can also serve as an inspiration for how multiple dimensions of live subtitling quality can be measured.

Łukasz Dutka (presenter), PhD – see his biography under the first presentation in this session. 

Monika Szczygielska, PhD, is a specialist in the legal and practical aspects of accessibility. She is also a partner of the Culture without Barriers Foundation, organizer of the Week of Culture without Barriers, and founder of Dostepni.eu, a leading accessibility services provider in Poland. Monika has experience in managing the organization of access services as well as whole events for people with sensory disabilities. She has been instrumental in implementing live subtitling in Poland. 

Karolina Szeląg is an experienced subtitler, respeaker and accessibility manager. She has worked as a live subtitler both in broadcast settings and at live events.

15.00 - 15.30: Break


15.30 - 17.00: Session 9 - Industry presentations AVT

Examining the impact of different workflows on quality: automatic speech recognition and post-editing in intralingual subtitling

Kaisa Vitikainen

In Finland, the law requires that the public broadcaster provides intralingual subtitles for all its programming in Finnish, except for live sports and music events. While automatic speech recognition is not yet sufficiently accurate to produce fully automatic subtitles in Finnish, it could prove to be a valuable tool for both broadcasters and subtitlers. The productivity of post-editing automatically produced subtitles has been studied, but little research exists on the impact the post-editing process has on subtitle quality. This paper examines the quality of post-edited intralingual subtitles produced in experiments conducted as part of the MeMAD project, comparing them to subtitles produced from scratch by the same participants, using an adapted version of the FAR model. Results suggest that the quality of post-edited subtitles suffers compared to subtitles prepared from scratch.
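Since the paper uses an adapted version of the FAR model, the sketch below shows only the general shape of FAR-style scoring (Pedersen's model assesses Functional equivalence, Acceptability and Readability, with errors weighted by severity and normalised by the number of subtitles); the weights and names here are generic assumptions, not the adapted model itself.

```python
# Illustrative FAR-style scoring: penalty points for errors in each of the
# three areas, weighted minor/standard/serious (0.25/0.5/1) and normalised
# by the number of subtitles in the sample. A generic sketch only; the paper
# uses an *adapted* FAR model whose exact parameters may differ.
PENALTY = {"minor": 0.25, "standard": 0.5, "serious": 1.0}

def far_scores(errors: dict[str, list[str]], n_subtitles: int) -> dict[str, float]:
    return {area: 1 - sum(PENALTY[sev] for sev in sevs) / n_subtitles
            for area, sevs in errors.items()}

sample = {"functional": ["serious", "minor"],
          "acceptability": ["minor"],
          "readability": ["standard", "minor"]}
print(far_scores(sample, n_subtitles=50))
```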

Kaisa Vitikainen is a doctoral candidate at the University of Helsinki. Their research focuses on the use of automatic speech recognition and machine translation in subtitling, particularly live subtitling and respeaking. They are also a professional live subtitler at the Finnish Broadcasting Company (Yle).


How to use AI for subtitling short-form video

Maarten Verwaest

To stand out in the creative economy, content creators are obliged to create videos with complementary captions or subtitles: for accessibility, to maximise reach, and to improve search engine optimisation (SEO). Because the cost and turnaround time of conventional subtitling may be prohibitive for short-form producers, Limecraft developed a solution that automates the process of creating subtitles. Developed in collaboration with VRT and supported by the STADIEM programme of the European Commission, the solution uses Automatic Speech Recognition (ASR) and Natural Language Processing (NLP) to create properly styled, broadcast-quality subtitles in less than a minute. Because it is available as a plugin in Adobe, Avid or Ooona, users don't need to manage exports and file transfers, nor do they waste time copying and pasting data between apps. The presentation will include a live demo and a benchmark using the Subtitle Edit Rate (SubER).
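For orientation, SubER is a TER-like edit rate computed over subtitle files in which line and subtitle breaks count as tokens, so segmentation errors are penalised alongside wording errors. The sketch below is an unofficial simplification of that idea (the real metric also uses subtitle timings to constrain the alignment) and is not Limecraft's benchmark code.

```python
# Unofficial sketch of the idea behind the Subtitle Edit Rate (SubER):
# a Levenshtein-style edit rate over subtitle text where line breaks
# ("<eol>") and subtitle breaks ("<eob>") are tokens too, so bad
# segmentation costs edits just like bad wording.
def break_tokens(subtitles: list[list[str]]) -> list[str]:
    # Each subtitle is a list of lines, e.g. [["Hello there,", "how are you?"]].
    out: list[str] = []
    for sub in subtitles:
        for line in sub:
            out += line.split() + ["<eol>"]
        out[-1] = "<eob>"  # the final break of a subtitle is an end-of-block
    return out

def edit_rate(ref: list[str], hyp: list[str]) -> float:
    # Single-row Levenshtein distance, normalised by reference length.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(prev + (r != h), d[j] + 1, d[j - 1] + 1)
    return d[len(hyp)] / len(ref)

ref = break_tokens([["Hello there,", "how are you?"]])
hyp = break_tokens([["Hello there, how", "are you?"]])
print(f"{edit_rate(ref, hyp):.2f}")  # same words, different segmentation
```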

Maarten Verwaest is founder and CEO of Limecraft. Limecraft helps media professionals improve collaboration, eradicate manual work and spend more time on creative storytelling. Prior to incorporating Limecraft, in his capacity as a programme manager for the R&D department of VRT (the Belgian public service broadcaster), he was involved in the introduction of several innovative technologies, including the digitisation of production operations, computer-assisted editing, and automatic indexing of audiovisual media. The author of several distinguished publications, Maarten is an acknowledged subject-matter expert on a range of topics, including the practical use of Artificial Intelligence.


At Home Signing: increasing accessibility for audiovisual content by enabling the recording of signing tracks in a non-studio environment

Simon Hailes, Cristian Pacurar

Stellar provides a full range of features for Subtitling, Audio Description and Dubbing creation. Just as dubbing replaces the original audio with a translated audio version in the target language, and subtitling transposes the audio into on-screen text, signing provides an extra layer of accessibility by enhancing the original video with a sign language overlay.

Stellar is based on a pluggable architecture which easily allows for the addition of video recording capabilities. As a result, signers are able to record signing tracks from the comfort of their own home, without the need for a highly specialized studio environment. Once the original video signing snippets have been recorded, they can be adapted to the broadcaster's requirements.

Signapse provides a new technology based on Generative Adversarial Networks (GANs) which takes the original home-made signing videos and transforms them into videos based on professionally recorded signers (a form of sign-language-to-sign-language 'translation').

Once the signing videos have been converted, Stellar allows users to adjust and preview them, and to mix them into a full studio-quality output image.

Together, we believe that this approach would increase the accessibility of audiovisual media for the deaf community, both by allowing more quality signed content within current budgets and by allowing almost any signer, almost anywhere in the world, to produce a studio-quality signed video.

17.00 - 18.30: Session 10 - Live subtitling varia

Automatic bilingual subtitles in Spanish TV broadcasts: the case of l’Informatiu – Comunitat Valenciana

Irene de Higes Andino

The improvements in automatic speech recognition (ASR) software have led to its integration in live subtitling environments. As part of QuaLiSub (The Quality of Live Subtitling: A regional, national and international study, funded by the Spanish Ministry of Science and led by Universidade de Vigo), this paper presents results on the quality of subtitles broadcast by Spanish public television (CRTVE) in the regional news for the Comunitat Valenciana. These subtitles are live, intralingual, bilingual, and not edited by humans. 

The analysis measured accuracy by two methods. Under the framework of a research contract between Universitat Jaume I and CRTVE, the percentage of transcription errors (substitutions, deletions and insertions) was calculated following the Word Error Rate (WER, US National Institute of Standards and Technology). In the context of QuaLiSub, accuracy was estimated using the NER model (Romero Fresco & Martínez, 2015). We considered, on the one hand, the extent to which errors affect the coherence of the subtitled text and, on the other, how many correct editions (recognitions, in the case of automatic subtitles) were included. 
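As a reminder of what the WER figure represents, the rate is simply (substitutions + deletions + insertions) divided by the number of words in the reference transcript; a toy calculation with made-up counts:

```python
# WER as used at this stage: substitutions, deletions and insertions counted
# against the reference transcript, divided by the reference word count.
def word_error_rate(subs: int, dels: int, ins: int, n_ref_words: int) -> float:
    return (subs + dels + ins) / n_ref_words * 100

# e.g. a 200-word news segment with 6 substitutions, 3 deletions, 1 insertion:
print(f"WER = {word_error_rate(6, 3, 1, 200):.1f}%")  # 5.0%
```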

Preliminary results show that, from the perspective of user comprehension, these automatic bilingual subtitles are not yet acceptable. Although some subtitles have excellent linguistic accuracy (better in Spanish than in Catalan), the coherence of the subtitled text is generally unsatisfactory. Code-switching is so frequent in the corpus that the software needs about 5 seconds to start transcribing in the correct language, which increases subtitle delay. Moreover, subtitle speed exceeds the 15 CPS recommended by the Spanish standard UNE 153010 (AENOR, 2012), changes of speaker within a subtitle are not indicated, and text segmentation is poor. We therefore conclude that users' comprehension may be compromised. 

AENOR (Agencia Española de Normalización). (2012). Subtitulado para personas sordas y personas con discapacidad auditiva (UNE 153010:2012). 

Romero Fresco, Pablo & Juan Martínez. (2015). Accuracy Rate in Live Subtitling: the NER Model. In Jorge Díaz Cintas & Rocío Baños Piñero (Eds.), Mapping an Ever-changing Landscape (pp. 28-50). Palgrave Macmillan.

Irene de Higes Andino holds a Bachelor's degree in Translation and Interpreting from Universitat Jaume I (Castelló de la Plana, Spain) and a PhD in Translation and Interpreting from the same university, with a thesis on dubbing and subtitling multilingual films into Spanish (http://hdl.handle.net/10803/144753). She has worked as a production assistant in a dubbing studio and as a freelance translator specialised in articles about cinema, dubbing and voice-over for TV, subtitling and audio description. She has also taught at the Valencian International University (VIU), ISTRAD and the European University of Valencia. She is now a full-time lecturer and researcher in the Translation and Communication Department at Universitat Jaume I and a member of the research group TRAMA (Translation and Communication in Audiovisual Media). She mainly teaches audiovisual translation (voice-over, dubbing and subtitling) and audiovisual accessibility (audio description for the blind and visually impaired and subtitling for the deaf and hard-of-hearing). Her research interests focus on multilingualism, identity, audiovisual translation and accessibility.


Designing an upskilling course in interlingual respeaking for language professionals. Insights and recommendations from the SMART project.

Elena Davitti and Annalisa Sandrelli

The ESRC-funded SMART project at the University of Surrey (Shaping Multilingual Access through Respeaking Technology, ES/T002530/1, 2020-2023) explored the human (cognitive, procedural and interpersonal) competences involved in interlingual respeaking, a human-centric workflow for live speech-to-text communication across languages via speech recognition (SR) software (Davitti & Sandrelli 2020).

A core part of our research objectives was the design and refinement of an upskilling course in interlingual respeaking for language professionals from a variety of backgrounds (interpreting, translation and/or (live) subtitling). As part of our experiment, we provided language professionals (n=51) with an opportunity to add interlingual respeaking to their skill set through an advanced introduction to this practice. This enabled us to test our innovative approach and measure its impact on participants' skills acquisition, cognitive abilities, performance and satisfaction. In this presentation, we will characterise our upskilling design and share key findings on the progress and final output achieved by our participants, along with their feedback on the course. This will inform recommendations for future interlingual respeaking upskilling.

Run during the Covid pandemic, the four-week course was fully remote and totalled 25 hours of learning, followed by a week of testing in both intralingual and interlingual respeaking. A rigorous recruitment process ensured that all selected participants had a solid baseline of skills and experience on which our upskilling course was designed to build.

Unlike other approaches to training, where intralingual respeaking is taught in full before interlingual respeaking is introduced, we adopted a blended approach, where participants learned both techniques as they proceeded through different modules, alongside a third strand where they developed their knowledge and use of SR as the course progressed. Using scaffolding, we broke down the respeaking process into core procedural components that could be taught progressively. Each exercise introduced a new component for the participants to develop, allowing them to gradually and incrementally progress from listening and speaking/translating simultaneously to the full technique of respeaking.

Davitti, E. and A. Sandrelli (2020) ‘Embracing the complexity: a pilot study on interlingual respeaking’, Journal of Audiovisual Translation 3(2): 103-139, European Association for Studies in Screen Translation.

Romero-Fresco, P., and Pöchhacker, F. (2017). Quality assessment in interlingual live subtitling: The NTR Model. Linguistica Antverpiensia,16,149–167.

Elena Davitti is Associate Professor at the Centre for Translation Studies, University of Surrey (UK). Her research interests include hybrid modalities of spoken language transfer, methods for real-time interlingual speech-to-text and how increasing automation of these processes would modify human-led workflows. Elena is leading the ‘SMART’ project (Shaping Multilingual Access with Respeaking Technology, 2020-2023, ESRC UK, ES/T002530/1) on interlingual respeaking with an international consortium of collaborators and advisors from academia (UNINT Rome, University of Vigo, University of Roehampton, University of Vienna, University of Antwerp, Macquarie University) and from the industry (Ai-Media, SUB-TI, Sky). Elena has also published on interactional and multimodal dynamics of interpreter-mediated interaction, and she has been co-investigator on several EU-funded projects on video-mediated interpreting (AVIDICUS 3, SHIFT in Orality) and innovations in interpreter education (EVIVA, WEB-PSI). Elena has served on the boards of projects and organisations in her fields of research (e.g. ILSA Advisory Board, GALMA, IATIS).

Annalisa Sandrelli teaches Dialogue Interpreting and Interlingual Respeaking (English>Italian) in the Faculty of Interpreting and Translation of UNINT; before UNINT, she taught at the universities of Hull (UK), Trieste and Bologna/Forlì. She has published widely on Audiovisual Translation, Interpreting Studies, Computer Assisted Interpreter Training (CAIT), Legal Interpreting/Translation and Legal English. She has participated in national and international projects on AVT (DubTalk, TVTalk, ¡Sub!: Localisation Workflows th(at) Work) and on respeaking (LTA - Live Text Access, as consultant Quality Manager on behalf of Sub-Ti Ltd; ILSA - Interlingual Live Subtitling for Access, member of the Advisory Board). Her current projects are ¡Sub!: Localisation Workflows that Work! 2 (Lead Investigator) and SMART - Shaping Multilingual Access Through Respeaking Technology (International Co-investigator). A professional interpreter and subtitler, she is a member of EST (European Society for Translation Studies), ESIST (European Association for Studies in Screen Translation), GALMA (Galician Observatory for Media Accessibility) and AIA (Associazione Italiana di Anglistica).


Shortcomings in automatic closed captions and subtitles in academic video presentations

Mª Azahara Veroz-González, Pilar Castillo Bernal

The application of machine translation to subtitling has been put to the test in recent years with the controversy around Netflix's series Squid Game, while ongoing projects on technology and audiovisual translation (European Commission, 2022; Díaz-Cintas and Massidda, 2020) show that the topic is being taken more and more seriously by researchers, professional translators and European institutions. In their most recent conference, the European Society for Translation Studies (EST, 2022) advised virtual presenters to record their papers and use automatic captioning to ensure accessibility. As for academic translation, post-editing is now a reality both for professional translators and for academics in general (Parra and Goulet, 2021). In view of these developments, and of the increasing number of academic events being recorded or held online since the onset of the COVID-19 pandemic, the present work combines automation processes in audiovisual translation (speech recognition software, automatic and machine-translated subtitles and captions) and academic texts, more specifically video presentations.

The research questions are whether the automatic generation of captions and subtitles is functional enough to ensure accessibility in academic events such as the EST22 conference, and how much post-editing effort such content would require if a translation of the subtitles were to be produced.

The research method comprises several phases. Firstly, in a corpus of video presentations of specialised content in English, subtitles were generated automatically using YouTube Studio in order to ascertain their general quality and the type of errors generated in the automatic transcription. These subtitles were corrected and annotated considering the following parameters: a) post-editing time; b) type of error; and c) severity of the error. In this way, we were able to determine whether the quality of the subtitles originating in the source language was adequate. Secondly, the corrected subtitles generated by YouTube Studio were machine-translated into Spanish, and the errors detected in the machine translation of the subtitles (English-Spanish) were analysed following Multidimensional Quality Metrics (MQM), a translation quality assessment framework that allows researchers to adapt their own parameters for their quality assessment purposes. Finally, we studied the reception of the subtitles by a potential audience, as evaluated by academics from the same field of expertise. For this, a mixed-method approach was used: a) the tool SDL Trados Studio 2011, and specifically its Qualitivity plugin, to manage the data related to productivity and quality and to measure post-editing effort; and b) human evaluation of the subtitles in the form of comments, following the recommendations of Läubli et al. (2020).

The results of this study have multiple applications: as evidence of the shortcomings of automation processes in the accessibility of academic video presentations, and as an indication to post-editors (both professional and non-professional translators) of the type of errors the use of such processes may generate. 

* The research presented in this study has been (partially) carried out in the framework of the research project “Training app for post-editing neural machine translation using gamification in professional settings” (reference number TED2021-129789B-I00).​
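To illustrate what an MQM-style annotation can look like, here is a hedged sketch using commonly cited default severity weights (minor 1, major 5, critical 10); MQM is deliberately customisable, so the dimensions and weights below are assumptions rather than the parameters adopted in this study.

```python
# Hedged sketch of an MQM-style error annotation for a machine-translated
# subtitle. Dimensions and severity weights are illustrative defaults;
# MQM lets evaluators customise both, and the study adapted its own.
from dataclasses import dataclass

SEVERITY_WEIGHT = {"minor": 1, "major": 5, "critical": 10}

@dataclass
class MQMError:
    subtitle_id: int
    dimension: str   # e.g. "accuracy/mistranslation", "fluency/grammar"
    severity: str    # "minor" | "major" | "critical"
    note: str = ""

def mqm_score(errors: list[MQMError], n_words: int) -> float:
    # Total penalty per word, expressed as a quality score out of 100.
    penalty = sum(SEVERITY_WEIGHT[e.severity] for e in errors)
    return max(0.0, 100 - penalty / n_words * 100)

errors = [MQMError(12, "accuracy/mistranslation", "major", "term rendered literally"),
          MQMError(15, "fluency/spelling", "minor")]
print(f"{mqm_score(errors, n_words=300):.1f}")  # 98.0
```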

Mª Azahara Veroz-González (presenter) holds a PhD in Languages and Cultures (2014), specialising in Translation, from the University of Córdoba (Spain). She is currently a lecturer in the Department of Language Sciences at the University of Córdoba, where she teaches French as a foreign language. She also teaches Translation and ICTs in the Online Master's Degree in English Studies (OMIES). Her research interests focus on new technologies applied to translation, specialised translation and foreign language teaching. Her work has been published in prestigious journals (Meta, Panace@, Delta, etc.) and publishing houses (Peter Lang, Comares, Dykinson, etc.). She has participated in numerous projects, including UCOTerm, a website funded by the University of Córdoba dedicated to resources for scientific and technical translation, of which she is the coordinator. She is a member of the Oriens HUM-940 research group and co-director of the journal Hikma: Estudios de Traducción. 

Pilar Castillo Bernal (presenter) is an Associate Professor of Translation at the Universidad de Córdoba (Spain). Her research focuses on specialised translation, machine translation and didactic audiovisual translation. She is the chief editor of the journal Panace@. She has been a visiting scholar at the University of Wolverhampton (UK), the University of Hildesheim (Germany), Lingnan University (Hong Kong) and Smith College (USA).


Subtitling on Spanish TV: a quality assessment

María Rico-Vázquez

Over the years, live subtitling has been produced with multiple techniques, and respeaking has become one of the most popular methods worldwide. This has been driven primarily by considerable advancements in speech recognition software (SRS) technologies, which constitute the basis of respeaking, as they enable the real-time conversion of speech to text on screen.

This evolution of new technologies, along with the prevailing immediacy of our everyday lives, has contributed to the full automation of live subtitling. Despite the complexities inherent in real-time subtitle production, artificial intelligence has enabled significant progress in the field of automatic speech-to-text software, and this has revolutionized the industry. Although there is still no completely automatic SRS that can generate flawless subtitles, the quality of some systems is outstanding, and this raises great expectations. Automatic technologies have gradually become a widespread tool in subtitling and, as a manifestation of this new reality, fully automatic subtitles are being broadcast on Spanish TV.

Just as there are several techniques for generating real-time subtitles, numerous accuracy models may be used to evaluate their quality. The problem, however, is that they are not all founded on the same principles, and not all companies or countries operate according to the same standards. Consequently, non-comparable outcomes end up being compared and, while figures are provided, subtitling quality may take a back seat.

Traditionally, quality analyses have been conducted using the WER Model, which is based on the precept 'difference from the original equals error'. Therefore, the assessment is built upon the literalness of the final text in comparison to the original, neglecting error severity and users’ comprehension. In contrast, the NER Model was developed with a viewer-centered perspective, focusing on the extent to which errors impair the subtitles’ coherence or the original meaning. Owing to such disparities in perspective, the results obtained with these models are indeed different, and this must be considered when interpreting the data.
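A toy example, with invented numbers, of how far the two models can diverge on the same respoken sample:

```python
# Toy illustration (our own numbers) of how WER and NER can diverge on the
# same respoken subtitle sample: WER counts every divergence from the verbatim
# transcript as an error, while NER weights errors by severity and accepts
# correct editions (harmless reductions/paraphrases) as correct.
n_words = 100          # words in the reference sample
divergences = 12       # word-level differences from the verbatim transcript
correct_editions = 8   # of those, harmless reductions accepted by NER
minor, serious = 3, 1  # remaining divergences, weighted 0.25 and 1 by NER

wer = divergences / n_words * 100
ner = (n_words - (minor * 0.25 + serious * 1.0)) / n_words * 100
print(f"WER error rate: {wer:.1f}%  ->  'accuracy' {100 - wer:.1f}%")  # 88.0%
print(f"NER accuracy:   {ner:.2f}%")                                   # 98.25%
```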

This presentation is framed within QuaLiSpain, the first large-scale study about live subtitling quality on Spanish TV, and QuaLiSub, a project funded by the Spanish Ministry for Science and Innovation that aims to analyze live subtitling quality in the US (English) and Spain (Spanish and some co-official languages, namely Galician, Catalan and Basque). More than 1,000 minutes of audiovisual material in Spanish and Galician with respoken or automatic subtitles have been analyzed so far. For the majority of samples, the NER Model was used, whereas about 300 minutes were analyzed with the WER Model.

The aim of this research is to offer an overview of the quality of current intralingual live subtitles on Spanish TV, specifically in Spanish and Galician. In addition, a comparison between the results obtained with the WER and NER model for some samples will be offered. This comparison will reveal potential discrepancies in the accuracy calculation between both models for the samples under consideration, and it will expose the importance of selecting the most appropriate quality assessment tool to improve media access.

María Rico-Vázquez is a PhD student in Communication at the Universidade de Vigo; her research focuses on intralingual live subtitling for television in Spain, including both respoken and automatic subtitling. She holds a Bachelor's degree in Translation and Interpreting (2013-2017) and a Master's degree in Multimedia Translation (2017-2018), both from the same university. She has worked with the Spanish Centre of Subtitling and Audio Description (CESyA) and the Spanish public TV broadcaster (TVE), assessing the quality of respoken and automatic live subtitling, respectively. She is part of two Spanish-government-funded projects dealing with the quality of real-time subtitling on television: QuaLiSpain and QuaLiSub. Her studies have also allowed her to work as part of the UVigo team for ILSA, an EU-funded project focused on interlingual respeaking. Currently, she is a member of the international research group GALMA. She has received predoctoral funding from the Universidade de Vigo and the Xunta de Galicia, as well as two Academic Excellence awards granted by these two institutions.