Draft:Near-instantaneous Voice Translation

From Wikipedia, the free encyclopedia

Real-time translation, more aptly called "near-instantaneous" translation, is a process that would be carried out by specialised software but is not yet operational. It would require the simultaneous, high-quality execution of three main steps for each sub-second group of spoken words. These are:

  1. speech recognition and discourse analysis,
  2. translation (using translation memory and rules), and
  3. speech synthesis, potentially drawing on natural language generation techniques, because certain metaphors, comparisons or expressions have a clear meaning in one language but cannot be translated literally into another.
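
The three steps above can be sketched as a chain of stages applied to each short audio chunk. The following is only an illustrative outline, not a working translator: the names `translate_stream`, `TranslatedChunk`, `toy_recognise`, `toy_translate`, `toy_synthesise` and `TOY_MEMORY` are hypothetical, and the toy stages simply treat bytes as text so that the flow can be run end to end.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class TranslatedChunk:
    source_text: str   # output of step 1 (speech recognition)
    target_text: str   # output of step 2 (translation)
    audio: bytes       # output of step 3 (speech synthesis)


def translate_stream(
    chunks: Iterable[bytes],
    recognise: Callable[[bytes], str],
    translate: Callable[[str], str],
    synthesise: Callable[[str], bytes],
) -> Iterator[TranslatedChunk]:
    """Run the three steps on each chunk as soon as it arrives."""
    for chunk in chunks:
        source_text = recognise(chunk)
        target_text = translate(source_text)
        yield TranslatedChunk(source_text, target_text, synthesise(target_text))


# Toy stand-ins (assumptions for illustration only).
TOY_MEMORY = {"bonjour": "hello", "merci": "thank you"}


def toy_recognise(audio: bytes) -> str:
    # Pretend the audio bytes are already a transcript.
    return audio.decode("utf-8")


def toy_translate(text: str) -> str:
    # Word-by-word lookup in a tiny "translation memory";
    # unknown words are passed through unchanged.
    return " ".join(TOY_MEMORY.get(word, word) for word in text.lower().split())


def toy_synthesise(text: str) -> bytes:
    # Pretend that encoding the text produces audio.
    return text.encode("utf-8")


results = list(
    translate_stream([b"Bonjour", b"merci"], toy_recognise, toy_translate, toy_synthesise)
)
```

Writing the stages as a generator reflects the requirement that each small group of words be handled as it is spoken, rather than after the speaker has finished.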

Ideally, the system should also provide, in real time or near real time, a score for the estimated (i.e. calculated) quality of its output. One candidate measure is the word error rate (WER), a standard metric for the performance of a speech recognition system.
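
As a rough illustration of the metric, WER is commonly computed as the number of word substitutions, deletions and insertions needed to turn the system's output into a reference transcript, divided by the number of words in the reference. The sketch below uses a word-level edit distance; the function name `word_error_rate` is hypothetical. Because a reference transcript is required, a live system could only estimate this score rather than compute it directly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` counts one deleted word out of six reference words. Note that WER can exceed 1 when the hypothesis contains many inserted words.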

In cases of doubt, listeners could also be notified that several translations are possible, for example by an alternative word being vocally superimposed. An audio or visual signal could also indicate the probability that the translation is good.
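
One way such a notification might work, purely as a sketch: given candidate translations with estimated probabilities, pick the most likely one, map its probability to a coloured signal, and list close alternatives that could be superimposed. The function name `confidence_signal`, the thresholds (0.8 and 0.5) and the `ambiguity_margin` parameter are all assumptions chosen for illustration, not values from any real system.

```python
def confidence_signal(candidates, ambiguity_margin=0.15):
    """Return (best translation, signal colour, close alternatives).

    `candidates` maps each candidate translation to an estimated
    probability. All thresholds here are illustrative assumptions.
    """
    if not candidates:
        raise ValueError("at least one candidate translation is required")

    ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
    best_text, best_prob = ranked[0]

    # Alternatives whose probability is close to the best one are the
    # "cases of doubt" that the listener could be told about.
    alternatives = [
        text for text, prob in ranked[1:] if best_prob - prob <= ambiguity_margin
    ]

    if best_prob >= 0.8:
        colour = "green"
    elif best_prob >= 0.5:
        colour = "amber"
    else:
        colour = "red"
    return best_text, colour, alternatives
```

With `{"bank": 0.55, "shore": 0.45}` the sketch flags "shore" as a plausible alternative, whereas with `{"hello": 0.95, "hi": 0.05}` it reports no alternatives.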

For a long time, translation in real-time was considered technically impossible, given hardware and software limitations.

From fiction

Near-instantaneous or instantaneous (and even telepathic) voice translation has often been imagined by science fiction authors, sometimes in the form of a "universal translator" that enables a foreign or even extraterrestrial language to be translated. Some have even imagined systems for communicating with animals.

The Babel Fish is an imaginary species of fish invented by Douglas Adams, the author of The Hitchhiker's Guide to the Galaxy. Once the fish is placed in a person's ear, that person can understand any language, which gave rise to war within the story.

Challenges

Current Projects

  • In early 2009, a project was underway in Japan to equip a mobile phone with an automatic multilingual translator. The initial aim of the project was to display on the phone's screen, within a matter of seconds and autonomously (i.e. without depending on a server), translations of simple words and phrases spoken in Japanese or in other languages.
  • On February 7, 2010, Google announced a speech-to-speech translation application[1]. According to an article in The Times, Google was preparing to embed an integrated system of speech recognition and machine translation into a mobile phone. However, the system was expected to work correctly only after a few years, according to Franz Och, head of translation at Google. Och believed that mobile phones are well suited to voice translation, because a phone is more likely, a priori, to be able to recognise and potentially "learn" the voice and language of its owner or frequent user (as long as the speaker is not sick, inebriated or impaired, and the voice is not muffled by ambient noise). Google benefits from the experience of its online translator, which at the beginning of 2010 translated 52 languages in text form with varying quality. Google can also record and keep track of the voices of mobile phone users when they make voice queries on its search engine, which would make it easier for the translation system to understand the speaker's voice. It would even be theoretically possible to imitate the timbre of the voice, or the feelings expressed (e.g. anger), when rendering the translation by voice synthesis. Google is also well placed to make use of its enormous database of translated websites and documents.

Prospects

  • A perfect universal translator will remain science fiction for a long time to come, if not forever. However, various direct or derived uses for voice translators seem plausible in the coming years and decades, particularly with peer production techniques that could facilitate their integration into productivity software.
  • Live subtitling (for the deaf and hard-of-hearing on TV or cinema screens, or through special glasses, for example).
  • Subtitling translated from the sound track of a video recording or from a recording made by a Dictaphone.
  • In a single room, or during a guided tour, different listeners could each hear the same lecturer or commentator in their own language, via an earpiece or headset.
  • Speech assistance (via a mobile phone or direct translator for people with a speech impediment).
  • Possible misuse. For example, in the long term there is a risk that a person's timbre, tone and voice could be reconstructed well enough to impersonate that person, potentially with ill intent.
  • Such a translation tool, depending on how it is used, could either slow down or facilitate language learning, and potentially promote the persistence of rare or ancient native languages (if they can be taken into account by the machine translator, as some of these languages have been relatively well studied by ethnologists and linguists). Initially, positive effects could include better diction and sentence construction on the part of users who want their translation software to make as few mistakes as possible.
  • It would be conceivable to listen, in one's own language and in near real-time, to a text written in an extinct language (Latin and Ancient Greek in particular), as long as the text can be "read" by optical character recognition software.

See also

Connected articles

External Links

Bibliography

References

  1. ^ Gourlay, Chris (2024-01-25). "Google leaps language barrier with translator phone". ISSN 0140-0460. Retrieved 2024-01-25.