Hi there 😊, we are planning to organize this special session at ASRU 2025, in beautiful Honolulu, Hawaii! 😊
The rapid progress in speech and audio generative AI [1,2] brings exciting new opportunities but also raises important ethical concerns, making questions of responsible use more critical than ever. Recent years have witnessed great advances in synthetic speech, voice cloning, singing voice synthesis, music generation, and sound effect generation, profoundly transforming domains [3] such as media production, accessibility, and human-computer interaction. However, this progress raises critical concerns about responsibility [4], including bias, misuse, accountability, and the lack of transparency in how generative models operate. This session aims to bring together the speech and audio research community to address these emerging challenges in responsible generative technologies.
We welcome submissions that tackle critical issues in neural watermarking [5-7], controllability [8], fairness [9], explainability [10,11], and measurement [12,13]. Precise controllability is vital for nuanced speech generation, while fairness is essential to prevent harmful biases. Enhancing explainability builds trust, and robust measurement methodologies are indispensable for evaluating system quality. Each of these areas demands rigorous attention to ensure the ethical and beneficial deployment of speech, music, and audio generative technologies. The session will highlight technical innovations, evaluation methodologies, and frameworks that contribute to safer, fairer, and more trustworthy generative AI systems. It also aims to foster broader discussion of standards, best practices, and accountability in real-world deployments of speech and audio generative technologies. Ultimately, this session seeks to establish collaborative pathways and guidelines that will shape the future of responsible speech and audio generative AI.
This session seeks to advance state-of-the-art research on responsible speech and audio generation, focusing on accountability, controllability, fairness, and interpretability. It aims to bridge communities working on generative modeling and responsible AI while promoting collaboration across machine learning, ethics, human-computer interaction, and speech technology. In doing so, the session will support the development of tools, benchmarks, and evaluation protocols, fostering transparency, robustness, and inclusiveness in generative systems for speech, singing voice, music, and general audio. By encouraging dialogue around risks, governance, and the societal impact of speech and audio generation technologies, this session hopes to lay the groundwork for a more responsible future in this fast-evolving field.
Please follow the official ASRU 2025 Author Instructions for paper formatting and submission. When submitting, be sure to select our special session “SS1. Responsible Speech and Audio Generative AI” as the primary subject area to ensure your paper is considered for inclusion.
Tip: This session will focus on Responsible Speech and Audio Generative AI, including accountability, controllability, fairness, robustness of generative models, and corresponding evaluation methods. Please note that another ASRU 2025 special session, “Frontiers in Deepfake Voice Detection and Beyond”, focuses more specifically on deepfake voice detection and related techniques.
All deadlines are Anywhere on Earth (AoE), unless otherwise specified.
This special session invites work on responsible speech and audio generative AI, including text-to-speech (TTS), voice conversion, singing voice synthesis, music and sound effect generation, and emerging large language models for speech and audio, to advance safety, fairness, and transparency.
We welcome submissions on a broad range of topics, including but not limited to the following:
We invite researchers and practitioners to contribute papers that explore the intersection of generative models, responsible AI, and speech/audio technologies. Let’s build a trustworthy and inclusive generative speech future—together!
If you have any questions, feel free to contact us at: respsa-genai@googlegroups.com
(The above organizers are sorted in alphabetical order by last name.)
Don’t miss it — see you in beautiful Honolulu, Hawaii! 😊
We would like to thank Dr. Lin Zhang, Prof. Xin Wang and Prof. Junichi Yamagishi for their valuable comments on neural watermarking.