Biography
Professor Gales' research aims to make speech systems simple and intuitive to use while achieving high levels of accuracy and naturalness. His research interests include automatic speech recognition (converting an audio waveform into text) and speech synthesis (converting text into an audio waveform). In addition, he investigates various downstream tasks that these technologies enable, such as spoken language learning and assessment.
Though speech recognition systems are increasingly widely deployed, the domains in which they operate remain quite limited, for example recognising spoken search terms and transcribing broadcast news. To broaden the range of applications, it is necessary to develop techniques that handle the diversity of spoken communication and the broad range of environments in which these systems must operate.
Speech synthesis systems have been deployed for many years. Systems can now deliver clear, understandable speech, but they lack the ability to convey the full range of expression found in human speech. To achieve human levels of information transfer by speech, Professor Gales is investigating expression-rich, controllable synthesis.
A fundamental aspect of both tasks is the need to add and exploit structure in the modeling of speech. For example, by explicitly factoring a synthesis model into speaker characteristics, sentence pronunciation, and sentence expression, it is possible to control exactly how a sentence is uttered: both its expression (for example, happy or angry) and the speaker's voice.
Papers (524 in total)
arXiv (2024)
arXiv (Cornell University) (2024)
Exploring AI in Applied Linguistics, pp. 96-117 (2024)
Guanfeng Wu, Abbas Haider, Xing Tian, Erfan Loweimi, Chi Ho Chan, Mengjie Qian, Awan Muhammad, Ivor Spence, Rob Cooper, Wing W. Y. Ng, Josef Kittler, Mark Gales, Hui Wang
IET Computer Vision, no. 7 (2024): 1017-1033
Interspeech 2024, pp. 3774-3778 (2024)
Interspeech 2024, pp. 3375-3379 (2024)
arXiv (Cornell University) (2024)
CoRR (2024)
Author statistics
#Papers: 523
#Citation: 23816
H-Index: 62
G-Index: 140
Sociability: 6
Diversity: 2
Activity: 72