Publications

PhD Thesis

Deep Cross-Modal Alignment in Audio-Visual Speech Recognition

http://www.tara.tcd.ie/handle/2262/96649

Journals

  1. George Sterpu, Christian Saam, Naomi Harte. How to Teach DNNs to Pay Attention to the Visual Modality in Speech Recognition.
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020.
    PDF Accepted version,
    IEEE version,
    bib file

  2. George Sterpu, Naomi Harte. Taris: An online speech recognition framework with sequence to sequence neural networks for both audio-only and audio-visual speech.
    Computer Speech & Language, 2022.
    Open Access

Conferences

  1. George Sterpu, Christian Saam, Naomi Harte. Learning to Count Words in Fluent Speech enables Online Speech Recognition.
    IEEE Spoken Language Technology Workshop (SLT 2021).
    [[https://arxiv.org/pdf/2006.04928.pdf preprint]]
    [[https://github.com/georgesterpu/Taris code]]

  2. George Sterpu, Christian Saam, Naomi Harte. Should we hard-code the recurrence concept or learn it instead? Exploring the Transformer architecture for Audio-Visual Speech Recognition.
    Interspeech 2020.
    [[https://arxiv.org/pdf/2005.09297.pdf pdf]]
    [[https://raw.githubusercontent.com/georgesterpu/georgesterpu.github.io/master/bibs/interspeech2020.bib bibtex]]
    [[https://github.com/georgesterpu/Taris code]]

  3. George Sterpu, Christian Saam, Naomi Harte. Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition.
    2018 International Conference on Multimodal Interaction (ICMI 2018).
    Boulder, CO, USA, October 2018.
    [[https://arxiv.org/pdf/1809.01728.pdf pdf]]
    [[https://raw.githubusercontent.com/georgesterpu/georgesterpu.github.io/master/bibs/icmi2018.bib bibtex]]
    [[https://github.com/georgesterpu/Sigmedia-AVSR code]]
    [[./pics/full_av_fusion_diagram_icmidc.pdf diagram]]

  4. George Sterpu, Christian Saam, Naomi Harte. Can DNNs Learn to Lipread Full Sentences?
    The 2018 IEEE International Conference on Image Processing (ICIP 2018).
    Athens, Greece, October 2018.
    [[https://arxiv.org/pdf/1805.11685.pdf pdf]]
    [[https://raw.githubusercontent.com/georgesterpu/georgesterpu.github.io/master/bibs/icip2018.bib bibtex]]
    [[https://github.com/georgesterpu/avsr-tf1 code]]
    [[./pics/seq2seq2.png diagram]]

  5. George Sterpu, Naomi Harte. Towards Lipreading Sentences using Active Appearance Models.
    International Conference on Auditory-Visual Speech Processing (AVSP 2017).
    Stockholm, Sweden, August 2017.
    [[https://arxiv.org/pdf/1805.11688.pdf pdf]]
    [[https://raw.githubusercontent.com/georgesterpu/georgesterpu.github.io/master/bibs/avsp2017.bib bibtex]]
    [[https://github.com/georgesterpu/pyVSR code]]
