Spoken Language Systems Lab (L2F)
"technology from seed"
Isabel Trancoso
Research Unit: Interactive Intelligent Systems

L2F
• Spoken/multimodal dialog systems
• E-Health
• E-Learning
• Rich transcription of multimedia documents
• Speech-to-speech machine translation

Multimodal dialog systems
• Entrainment in Bus Information Systems

Rich transcription
Sample output (European Portuguese):
[anchor 150] Boa tarde o governo considera que as medidas de austeridade aprovadas e em vigor. Só para já adequadas às necessidades financeiras de Portugal. O ministro das Finanças mostra-se confiante com as metas traçadas no programa de Estabilidade e Crescimento. Apesar de não fechar as portas à hipótese de medidas adicionais de controlo orçamental, em dois mil e doze. É desta forma que Teixeira dos Santos responde a pressão dos países da moeda única, querem que Portugal e Espanha avança com mais medidas de austeridade, dentro de ano e meio.
[spk 2000] Ainda em mês passou diz que o Governo decidiu apertar o cinto aos portugueses e já Europa vem pedir mais para depois de dois mil e onze. O ministro das Finanças não fecha a porta, mas defende cada ano, a seu tempo.
[spk 1000] Acho que estamos de em condições de alimentar, digamos confessa estar confiantes, de que o objectivo para dois mil e dez, vai ser conseguido com as medidas adicionais que foram entretanto já decididas.
Topics: Politics; Economy; National
Language: Portuguese (European)
The same passage without punctuation or capitalization:
boa tarde o governo considera que as medidas de austeridade aprovadas e em vigor só para já adequadas às necessidades financeiras de portugal o ministro das finanças mostra-se confiante com as metas traçadas no programa de estabilidade e crescimento apesar de não fechar as portas à hipótese de medidas adicionais de controlo orçamental em dois mil e doze é desta forma que teixeira dos santos responde a pressão dos países da moeda única querem que portugal e espanha avança com mais medidas de austeridade dentro de ano e meio ainda em mês passou diz que o governo decidiu apertar o cinto aos portugueses e já europa vem pedir mais para depois de dois mil e onze o ministro das finanças não fecha a porta, mas defende cada ano a seu tempo acho que estamos de em condições de alimentar digamos confessa estar confiantes de que o objectivo para dois mil e dez vai ser conseguido com as medidas adicionais que foram entretanto já decididas
• Online captioning at RTP since March 2008
• WER = 12% for displayed subtitles
• Latency: 3.5s + 3s
• Meeting browser, Lecture browser, Courtroom transcriptions
• Other languages: English, Spanish
• Other varieties: Brazilian and African Portuguese

Rich transcription: European projects
[Screenshot: User environment (All Feeds, Topic, User Collection)]

Speech-to-speech machine translation
[Figure: How to Use Multili (caption truncated)]
• Cooperation with Carnegie Mellon University

REAP.PT
• Cooperation with Carnegie Mellon Univ.

Serious games

E-Health
• AVOZ: Elderly Speech Recognition
• IC4U: Decision support system for preventing Intensive Care Unit readmissions

VITHEA
• Virtual Therapist for Aphasia Treatment

Other projects
• Voice coaching for reduced stress
• Enhancing the European Linguistic Infrastructure
• MISNIS - Intelligent Mining of Public Social Networks' Influence in Society (NEW)
• Music Information retrieval
  – FADO identification: 95.8%

Activities related to COST 1206
• Master's thesis (Joana Correia)
  – Anti-spoofing: speaker verification vs. voice conversion
  – Joint work with Alberto Abad & Gopala Anumanchipalli
  – Ack: Haizhou Li & ZhiZheng Wu, Infocomm Research, Singapore
• PhD thesis (José Portêlo)
  – Privacy preserving speech processing
  – Co-supervision: Bhiksha Raj
• Suspect: Secure Speech Technologies
  – Funded by the Portuguese national science foundation (FCT)
  – 2012-2014
• Joint Carnegie Mellon/INESC-ID activities in privacy-preserving speech processing

Motivation

Your Voice Recordings are Forever!
• Can you imagine the following happening 20 years from now?
  – Finding recordings of yourself saying things you never spoke?
  – Your (authentic) voice saying incriminating things you never really said
  – Your voiceprints being used to impersonate you
  – Or even questions you posed to remote systems returning to embarrass you decades later
• All of this is possible
  – Each time you use a voice-based service, the service stores your voice recordings
  – There is no time limit on when your recordings can be abused
    • Tomorrow, or 20 years from now...

Privacy-preserving voice processing
• The system never sees a clear-text version of your voice
  – All prior risks eliminated
• While still performing voice-processing tasks
  – Mining
  – Recognition
  – Authentication...
• How?
• Work so far: Privacy-preserving speaker authentication

Assumptions
• Speaker possesses a smartphone or computation-capable device
• Communication channel between system and user is secure
  – Eavesdroppers not a concern
  – Goal is to protect the user from the system

Privacy Preserving Speaker Authentication
• To protect the user we require the following:
  – The system should not access the user's audio, or features derived from it.
  – The system should not possess a model of the user's speech.
• These almost-paradoxical sounding requirements may be assumed for other forms of secure biometrics as well.

Privacy Preserving Speaker Authentication
Proposed Solutions:
• Secure Multiparty Computation (SMC)
  – Homomorphic encryption based protocols
  – Garbled circuits
• Locality Sensitive Hashing (LSH)
• Secure Binary Embeddings (SBE)
Bold items = on-going work

SMC with homomorphic encryption
• Employ conventional speaker authentication algorithms
  – Bayesian classifier with Gaussian mixture distributions for speaker and imposters
  – Classifier trained from enrollment recordings
• "Secure" algorithm through SMC protocol
  – User and system repeatedly exchange partial results through elaborate protocols
  – Partial results are obscured from one another
    • Via partially homomorphic encryption, additive masking, oblivious transfer, etc.
• Problem: Highly inefficient
  – 10,000x slower than clear-text operation

Locality Sensitive Hashing
• Convert authentication to a nearest neighbor search
  – Compare test recordings to previously stored enrollment recordings
• Perform nearest-neighbor search using LSH
  – All data obscured by user through a combination of LSH and symmetric-key encryption prior to sending them to the system
• Benefits:
  – Very efficient
    • Less than 10x slowdown
  – Computationally inexpensive
• Problem: Inaccurate
  – Nearest neighbor solutions not accurate enough for robust authentication

Secure Binary Embeddings
• Scheme for converting vectors to bit sequences (or hashes) using band-quantized random projections
• Produces an LSH-like method with interesting properties:
  – If $d_E(x, x') \le f(\Delta)$, then $d_H(q(x), q(x')) \propto d_E(x, x')$
  – If $d_E(x, x') > f(\Delta)$, then $q(x)$ and $q(x')$ provide no information regarding $d_E(x, x')$
• Based on the concept of Universal Quantization

Secure Binary Embeddings
[Figure: SBE behavior (L = vector dimension, M = number of bits)]
• SBE are uninformative about vectors that are far apart
• But can compute distance between close vectors
  – The Hamming distance between SBEs of vectors approximates Euclidean distance between vectors

Authentication using SBE
• Convert features derived from audio recordings to SBEs
  – SBEs are uninformative
• User only transmits SBEs to system
  – Parameters A and w used to compute SBEs are user's private keys
• Binary classifier trained from enrollment recordings
  – SVM classifier
  – Replace the conventional RBF kernel with modified kernel
    • $k(x, x') = e^{-\gamma \, d_H^2(q(x), q(x'))}$
  – Employs Hamming distance between SBEs
• Authentication phase: system works on SBEs from test data

Experiments using SBE
• Small corpus (YOHO, 138 speakers)
• Features: Gaussian mean "supervectors" based on MFCCs (39 coeffs)
  – A supervector is a concatenation of means from a GMM
  – SBEs are computed from supervectors (on user's client device)

Speaker authentication with SBE
• Insignificant degradation w.r.t. conventional ("public") authentication
• But user's privacy is retained
  – System can only engage with user using SBEs generated with user's own keys
  – Security == security of storage of user's keys (A, w)

Continuing the Work: Garbled Circuits
• SBEs are efficient, but do not generalize
  – All classifier training data (positive and negative enrollment data) provided by user
  – Not appropriate for other speech processing tasks
    • E.g. keyword spotting or recognition, where the system trains models
• Garbled circuits
  – Enable computation of conventional models privately
  – Cast all computation as Boolean circuits, "privatize" circuit through "garbling"
  – Challenges: Efficient design of circuit
  – Current work: GCs for authentication

Conclusions and current work
• With increasing use of voice services comes the need for protecting user privacy
  – Protecting user's voice data from abuse
• Can be achieved through privacy-preserving voice processing
  – For a marginal reduction in performance
  – The reduced performance is a small price to pay for keeping a user's identity secret.
• Continuing work: addressing challenges
  – Design of appropriate mechanisms for different tasks
  – Efficiency, efficiency, efficiency
    • Most tasks feasible, but computationally challenged
    • Some tasks such as full-scale recognition may remain impossible
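
Sketch: Secure Binary Embeddings and the Hamming-distance kernel

The SBE slides name the private keys (A, w), the quantization step ∆, and the modified kernel, but do not spell out the quantizer itself. The sketch below assumes the usual universal-quantization form, q(x) = floor((Ax + w) / ∆) mod 2, with A drawn from a standard Gaussian and w uniform on [0, ∆]. That form, the use of a normalized Hamming distance, and the function names make_keys, sbe, hamming and sbe_kernel are assumptions made for illustration, not details taken from the talk.

```python
import math
import random


def make_keys(L, M, delta, seed):
    """Draw hypothetical private keys: an M x L Gaussian matrix A and a dither w."""
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 1.0) for _ in range(L)] for _ in range(M)]
    w = [rng.uniform(0.0, delta) for _ in range(M)]
    return A, w


def sbe(x, A, w, delta):
    """Map a real vector x to M bits: floor((A x + w) / delta) mod 2 (assumed form)."""
    bits = []
    for row, dither in zip(A, w):
        projection = sum(a * xi for a, xi in zip(row, x)) + dither
        # Alternating bands of width delta are labelled 0, 1, 0, 1, ...
        bits.append(math.floor(projection / delta) % 2)
    return bits


def hamming(b1, b2):
    """Normalized Hamming distance: fraction of positions where the bits differ."""
    return sum(u != v for u, v in zip(b1, b2)) / len(b1)


def sbe_kernel(b1, b2, gamma):
    """Modified RBF kernel from the slides: exp(-gamma * d_H^2) on embeddings."""
    return math.exp(-gamma * hamming(b1, b2) ** 2)
```

Under these assumptions, two vectors that are close relative to ∆ differ in only a small fraction of bits, while two distant vectors differ in roughly half of them, which is what makes the embedding uninformative beyond a certain distance. A kernel matrix built with sbe_kernel could then be handed to any SVM implementation that accepts precomputed kernels.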