Automatic Speech Recognition in Telephone Relay Services and in Captioning Services
WFD and IFHOH Joint Statement:
The World Federation of the Deaf (WFD) and the International Federation of Hard of Hearing (IFHOH) have issued a joint statement on the use of Automatic Speech Recognition (ASR) in telecommunications relay services (TRS) and captioning services. Artificial Intelligence (AI) software is becoming more common as a speech-to-text tool in the home (e.g. Alexa and Google Home), at work and for searching for information (e.g. Siri on the iPhone). ASR shows promise for improving communication and accessibility for deaf and hard of hearing people; however, there are strong concerns about ASR being used without human support in the provision of TRS and captioning services.
TRS and captioning allow deaf and hard of hearing people who cannot understand voices on the telephone to access telephone services, which are part of the important social infrastructure that supports people’s daily lives. TRS is important for improving the employment prospects and social integration of deaf and hard of hearing people. The same applies to access to information through captioning.
Although WFD and IFHOH welcome and encourage research and development on ASR to support deaf and hard of hearing people, we believe that, at the current stage, ASR technologies are not yet ready to replace human operators and that more research and development is needed before they become truly usable. We therefore advise that TRS and captioning provided by humans be given priority.
WFD and IFHOH advise the governments of countries where telephone relay services are being considered not to introduce ASR without human relay operator support at this time. Although ASR may in future improve to the point that it can replace human relay operators and captioners, this would perhaps take many years and would require verifiable evidence that ASR attains a high success rate in accuracy and service quality, as well as significant support from the user community.
Released 29 April 2019
T-Meeting’s response to the WFD and IFHOH joint statement
T-Meeting is driven by the philosophy that people who are Deaf or DeafBlind, or who have speech disabilities or hearing impairments, should have the same communication independence as everyone else and be integrated into the global telephone network. It provides multimedia total communications (audio, video, real-time text) software, built in accordance with international standards, for use on mass-market devices, together with all of the supporting cloud-based network infrastructure. It also delivers advanced Artificial Intelligence (AI) based text-to-speech and speech-to-text services in multiple languages. Tera is an Automatic Speech Recognition (ASR) system that supplements total communications to deliver a complete communications solution for all user groups.
Today the world’s largest market for captioned telephone service (CTS) is the USA, where in 2018 there were five certified CTS providers:
- CaptionCall LLC (a wholly owned subsidiary of Sorenson Communications Inc);
- Clear Captions LLC;
- Hamilton Relay Inc. (CapTel™ reseller);
- Mezmo Corporation, d/b/a InnoCaption; and
- Sprint Corporation (CapTel™ reseller).
CaptionCall, Clear Captions, Hamilton Relay and Sprint use human revoicers; InnoCaption uses humans and computer-assisted real-time captioning.
Internet Protocol Captioned Telephone Service (IPCTS) in the USA is funded through the Federal Communications Commission with an uncapped budget. The budget was increased from approximately USD620m to USD999m in the 2018-19 year.
The FCC proposed reducing the compensation rate for human-assisted IPCTS in the USA to USD1.58/minute for the 2019-20 year, and proposed a rate of USD0.49/minute for ASR-based IPCTS. T-Meeting considers the rate for ASR-based IPCTS to be fair and reasonable. The financial impact of ASR on the U.S. IPCTS market and the incumbent providers is clear: the proposed ASR rate is less than a third of the human-assisted rate. Because ASR threatens existing IPCTS providers that rely on legacy systems, arguments are being put forward, apparently to generate fear, uncertainty and doubt about the speed and accuracy of ASR technology, even though human-assisted speech recognition systems are themselves less than perfect.
T-Meeting agrees with the WFD and IFHOH statement that the speech recognition ability of products like Alexa, Google Home and Siri shows promise. However, our own examination of these and many other systems found that they are not suitable for communication equivalent to a telephone call. For that reason, T-Meeting developed its own Artificial Intelligence (AI) system, Tera, from the ground up for both speech-to-text and text-to-speech. Tera can provide both services at the same time, if required, for people who are both hard of hearing and have a speech disability.
Two other countries that have used CTI Corporation’s CapTel™ IPCTS technology are Australia and New Zealand.
The use of CapTel™ by the Australian National Relay Service appears to have been discontinued and replaced by a web-browser-based service. This service requires a cumbersome user login, password and CAPTCHA response that hearing people who communicate orally would not tolerate when making a phone call. CAPTCHA is inappropriate technology for a website that should be accessible to all users with communication disabilities, including those with low vision. A “kill Captcha” movement was active in the Australian disability community as early as the M-Enabling Access and Inclusion conference hosted by the Australian Communications Consumer Action Network (ACCAN) in Sydney on 14-15 August 2013.
The Ministry of Business, Innovation & Employment, which oversees the Telecommunications Relay Service including the Video Interpreting Service New Zealand, proposes to phase out CapTel™ from 30 June 2020 in favour of digital text-based relay services on everyday devices such as mobile phones, tablets and computers.
T-Meeting provides ASR IPCTS services in Sweden (Swedish) and Norway (Norwegian Bokmål). These two languages are more difficult to transcribe than English because each has three additional vowels (å, ä and ö in Swedish; æ, ø and å in Norwegian). The service is also currently available in U.S. English, and other accents and languages, including those that require diacritics, can be added on demand.
The average monthly number of minutes of use per CapTel™ user in New Zealand has steadily declined, according to the latest publicly available data. By contrast, the average monthly usage of Tera per user in Sweden is now 4.4 times that of the average New Zealand CapTel™ user, and the ratio is growing. This is an indicator of the superiority of T-Meeting’s ASR IPCTS over human-assisted IPCTS.
Compared to legacy systems with a human intermediary, T-Meeting’s ASR system delivers:
- Virtually instantaneous transcription (human-assisted systems introduce a lag of at least two seconds between the audio and the corresponding captions, and that lag grows when the human intermediary corrects the output of the voice recognition engine; when the lag approaches four seconds, users find the service unusable);
- Contextual spelling correction in a fraction of a second;
- Privacy: no human third party listens to or revoices call content, and no call content is stored anywhere in the T-Meeting cloud infrastructure. However, captions of each call are stored on the user’s device in the Call History (Dialled, Missed, Received) until the user deletes them.
T-Meeting will be exhibiting at the WFD Congress in Paris, 23-27 July 2019, and welcomes delegates to visit its booth to see for themselves the quality of Tera ASR already enjoyed by users in Sweden and Norway, as well as demonstrations in English.
We Make it Possible!