Publications
PhD Thesis
Deep Cross-Modal Alignment in Audio-Visual Speech Recognition
http://www.tara.tcd.ie/handle/2262/96649
Journals
George Sterpu, Christian Saam, Naomi Harte. How to Teach DNNs to Pay Attention to the Visual Modality in Speech Recognition.
IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020.
PDF Accepted version,
IEEE version,
bib file
George Sterpu, Naomi Harte. Taris: An online speech recognition framework with sequence to sequence neural networks for both audio-only and audio-visual speech.
Computer Speech & Language, 2022.
Open Access
Conferences
George Sterpu, Christian Saam, Naomi Harte. Learning to Count Words in Fluent Speech enables Online Speech Recognition
IEEE Spoken Language Technology Workshop (SLT 2021).
[[https://arxiv.org/pdf/2006.04928.pdf preprint]]
[[https://github.com/georgesterpu/Taris code]]
George Sterpu, Christian Saam, Naomi Harte. Should we hard-code the recurrence concept or learn it instead? Exploring the Transformer architecture for Audio-Visual Speech Recognition
Interspeech 2020.
[[https://arxiv.org/pdf/2005.09297.pdf pdf]]
[[https://raw.githubusercontent.com/georgesterpu/georgesterpu.github.io/master/bibs/interspeech2020.bib bibtex]]
[[https://github.com/georgesterpu/Taris code]]
George Sterpu, Christian Saam, Naomi Harte. Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition
2018 International Conference on Multimodal Interaction (ICMI 2018).
Boulder, CO, USA, October 2018.
[[https://arxiv.org/pdf/1809.01728.pdf pdf]]
[[https://raw.githubusercontent.com/georgesterpu/georgesterpu.github.io/master/bibs/icmi2018.bib bibtex]]
[[https://github.com/georgesterpu/Sigmedia-AVSR code]]
[[./pics/full_av_fusion_diagram_icmidc.pdf diagram]]
George Sterpu, Christian Saam, Naomi Harte. Can DNNs Learn to Lipread Full Sentences?
The 2018 IEEE International Conference on Image Processing (ICIP 2018).
Athens, Greece, October 2018.
[[https://arxiv.org/pdf/1805.11685.pdf pdf]]
[[https://raw.githubusercontent.com/georgesterpu/georgesterpu.github.io/master/bibs/icip2018.bib bibtex]]
[[https://github.com/georgesterpu/avsr-tf1 code]]
[[./pics/seq2seq2.png diagram]]
George Sterpu and Naomi Harte. Towards Lipreading Sentences using Active Appearance Models
International Conference on Auditory-Visual Speech Processing (AVSP 2017).
Stockholm, Sweden, August 2017.
[[https://arxiv.org/pdf/1805.11688.pdf pdf]]
[[https://raw.githubusercontent.com/georgesterpu/georgesterpu.github.io/master/bibs/avsp2017.bib bibtex]]
[[https://github.com/georgesterpu/pyVSR code]]