VITHEA - INESC-ID
VITHEA: An online system for distance treatment of aphasia

Annamaria Pompili, Alberto Abad, Isabel Trancoso, Jose Fonseca, Isabel P. Martins, Gabriela Leal, Luisa Farrajota

Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
L2F - Spoken Language Systems Laboratory
http://vithea.l2f.inesc-id.pt

Outline
➔ Introduction
    ➔ Aphasia language disorder
    ➔ Classic therapeutic approaches
    ➔ Motivations and goals
➔ The Vithea System
    ➔ Architectural overview
    ➔ Client side
    ➔ Server side
➔ Evaluations and future work

Aphasia language disorder
[Figure labels: Broca's area, Wernicke's area]
Types of aphasia:
➔ Non-fluent (a.k.a. Broca's aphasia)
    Example: "Walk dog", meaning "I will take the dog for a walk"
➔ Fluent (a.k.a. Wernicke's aphasia)
    Example: "You know that smoodle pinkered and that I want to get him round like you want before", meaning "The dog needs to go out so I will take him for a walk"

Aphasia language disorder
➔ Major causes: cerebrovascular accident (CVA), brain tumors, brain infections, car or work accidents
➔ Increasingly frequent: an estimated 200,000 new cases in the EU each year
➔ Economic impact: communication disorders cost the US from $154 to $186 billion per year, i.e. 2.5% to 3% of the GNP
➔ Social impact: altered interpersonal relationships, loss of autonomy, social restrictions

Classical therapeutic approaches
➔ Common disorder in all aphasia syndromes: word-retrieval problem
➔ Word–picture matching exercises
Figure 1: Some images from the original Snodgrass & Vanderwart set
Figure 2: Example of the object–colour decision task

Motivations and goals
➔ Frequency of therapy is essential, but:
    ➔ therapy costs are high
    ➔ reaching therapy centers can be uncomfortable and/or time-consuming
➔ Development of a Virtual Therapist for Aphasia Treatment, focused on the word-retrieval problem:
    ➔ improve patients' quality of life
    ➔ lessen the cost of care
➔ Main challenges:
    ➔ people with physical impairments: the user interface must be simple and intuitive
    ➔ the complexity of ASR is exacerbated with aphasic speech: hesitations, repetitions

The Vithea System: architectural overview
[Diagram components: Web browser / Flash application (client computer); Internet; Tomcat server: web application server (JSP/Servlet); AUDIMUS engine: automatic speech recognition system; MySQL: database management system]

The Vithea System: Patient side

The Vithea System: Clinician side

AUDIMUS, the speech recognition module
Structure of the Audimus recognizer:
➔ hybrid recognizer: combines Hidden Markov Models (HMM) with Multilayer Perceptrons
    (MLP)
➔ trained on 3 distinct feature sets (PLP, RASTA, MSG)
➔ acoustic models trained with:
    ➔ 57 hours of Broadcast News downsampled to 8 kHz
    ➔ 58 hours of mixed mobile and fixed telephone data

Keyword spotting approaches
➔ Acoustic match of the audio data with keyword models in contrast to a background (BG) model
➔ Large vocabulary continuous speech recognition (LVCSR)

Keyword models in contrast to a BG model
➔ The BG model must provide:
    ➔ low recognition likelihoods for keywords
    ➔ high likelihoods for out-of-vocabulary words
➔ 2 possible acoustic matches:
    ➔ phoneme loop network
    ➔ a-posteriori probability: phoneme classification network, posterior probability of other phones

Large vocabulary continuous speech recognition
➔ Search for the target keyword in the recognition result
➔ It is possible to search several hypotheses in parallel (n-best lists, lattices, confusion networks), which improves performance compared to searching only the raw recognition output
➔ The training process requires large amounts of data
➔ Uses a fixed large vocabulary: a keyword that is not in the dictionary is never detected

Preliminary evaluations
➔ 2 subsets of the Portuguese SpeechDat II corpus:
    ➔ Development set: 3334 utterances
    ➔ Evaluation set: 481 utterances
➔ Number of keywords: 27
➔ Promising performance indicators achieved by 1 approach in terms of Equal Error Rate (EER), False Alarm (FA), False Rejection (FR)
[Figure: experimental results as Detection Error Trade-off (DET) curves; axes: False Alarm probability (in %) and False Rejection probability (in %)]

Evaluations - corpus
➔ Evaluation data collected from therapy sessions
➔ 8 patients
➔ each session consists of naming exercises with 103 objects per patient
➔ 2 inexpensive microphones: built-in headset and table-top microphone
➔ only the sessions recorded with the headset were considered
➔ segmentation and word-level transcriptions were produced manually, totaling 996 segments
➔ the complete evaluation corpus has a duration of approximately 1 hour and 20 minutes

Evaluations - criteria
➔ Correctness: a word naming exercise is considered completed correctly whenever the target word is spoken, regardless of its position or of the amount of silence before the valid answer
➔ Extended word list in addition to the canonical valid answer:
    ➔ contains the most frequent synonyms and diminutives
    ➔ total KWS vocabulary of 252 words

Evaluations - results
Preliminary evaluations
➔ Global evaluation: Pearson's coefficient between human and automatic evaluation: 0.9043
[Figure: average word naming score per patient (1 to 8), human vs. automatic evaluation]
➔ Individual evaluation: remarkable variability in FA and FR performance depending on the specific patient
➔ most common cause of FA: presence of many nonexistent words phonetically
    close to the target ones, with the stressed syllable often pronounced correctly
[Figure: false alarm and false rejection rates per patient (1 to 8)]

Evaluations - results
Customized approach
➔ based on the user profile
➔ word detector calibrated following a 5-fold cross-validation strategy
➔ Global evaluation: Pearson's coefficient between human and automatic evaluation: 0.9652
[Figure: average word naming score per patient (1 to 8), human vs. automatic evaluation]
➔ Individual evaluation: more balanced performance (in terms of FA and FR ratios) is observed for most patients
[Figure: false alarm and false rejection rates per patient (1 to 8)]

Conclusions
➔ Speech recognition technology contributed to building a system designed to support recovery from a particular communication disorder.
➔ The virtual therapist has been designed following relevant accessibility principles tailored to the particular category of users targeted by the system.
➔ Early experiments conducted to evaluate ASR performance with speech from aphasic patients yielded quite promising results.
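
The evaluation criteria and metrics named in the evaluation slides (correct naming whenever an accepted word is spoken, false alarm and false rejection rates, Pearson's coefficient between human and automatic scores) can be illustrated with a minimal sketch. This is not the VITHEA implementation: all function names are hypothetical, the toy words and labels are made up, and the normalization of FA and FR (by the number of human-judged incorrect and correct answers, respectively) is an assumption, since the slides do not define it.

```python
from __future__ import annotations

from math import sqrt
from typing import Iterable, Sequence


def named_correctly(spoken_words: Iterable[str], accepted: set[str]) -> bool:
    """True if any accepted answer (target, synonym, diminutive) occurs
    anywhere in the utterance, regardless of its position."""
    return any(word.lower() in accepted for word in spoken_words)


def fa_fr_rates(human: Sequence[bool], auto: Sequence[bool]) -> tuple[float, float]:
    """False alarm and false rejection rates of the automatic decisions.

    Assumed definitions (not stated in the slides):
      FA = automatic "correct" among answers the human judged incorrect
      FR = automatic "incorrect" among answers the human judged correct
    """
    negatives = sum(1 for h in human if not h)
    positives = sum(1 for h in human if h)
    false_alarms = sum(1 for h, a in zip(human, auto) if a and not h)
    false_rejections = sum(1 for h, a in zip(human, auto) if h and not a)
    fa = false_alarms / negatives if negatives else 0.0
    fr = false_rejections / positives if positives else 0.0
    return fa, fr


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


if __name__ == "__main__":
    # Toy data, purely illustrative (not taken from the evaluation corpus).
    accepted = {"dog", "doggy"}
    print(named_correctly(["um", "the", "dog"], accepted))
    print(fa_fr_rates([True, True, False, False], [True, False, True, False]))
    print(pearson([0.2, 0.5, 0.9], [0.25, 0.45, 0.85]))
```

In such a setup, the per-patient average of the human labels and of the automatic decisions would give the two word naming scores whose correlation is reported in the slides.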

Future work
➔ Implement new exercises and incorporate tools such as goodness of pronunciation
➔ Provide help to the patient, both semantic and phonological
➔ Integrate text-to-speech synthesis
➔ Incorporate an intelligent animated agent
➔ Extend the system to the treatment of other forms of speech disorders

Thanks for the attention
http://vithea.l2f.inesc-id.pt