Note: The videos were totally unscripted, and edited with all the sophistication of a caveman trying to use a fountain pen on his cave wall. You have been warned!
As some of you may know, I’ve been interested in using voice recognition software for translation for the past few months. This all started with Kevin Lossner’s invitation to try Mac OS X Yosemite’s built-in dictation tool, which, on my first attempt, yielded pathetic results.
However, as the saying goes in Portugal, “the hardware is always right”, and the fault was obviously my own: Joana Bernardo, a student of Professor David Hardisty, had achieved remarkable success with it.
So, I stuck with it for a while longer and improved both my dictation technique and my command of the software’s features, to the point where it has become my main ancillary tool for translation when paired with my CAT tool of choice. In fact, the productivity boost it provided was matched only by my adoption of CAT tools shortly after I began translating.
At about the time I had this figured out, I wrote a post for Kevin Lossner’s Translation Tribulations blog detailing how to spec a virtual machine for using the OS X dictation feature with Windows as the guest operating system. This combination of OS X and virtual machine software lets you use whatever CAT tool you want (or none at all) together with the built-in speech recognition.
OS X’s built-in speech recognition has one marked advantage over all the other solutions explored so far: by downloading the improved audio dictionaries, you can dictate without sending any data to a remote server, thus preserving confidentiality – an important feature if you work with materials requiring such handling.
However, it also has some limitations, namely the apparent inability to add vocabulary, minor capitalization issues that arise whenever you stop dictating mid-sentence, and the lack of application-specific commands beyond the built-in selection OS X provides for its bundled applications.
Well, these can all be easily circumvented. Vocabulary can be added both manually and automatically through a very simple procedure that doesn’t even require you to speak the word you want to add – this makes automated feeding of vocabulary a real possibility. The capitalization problems and the application-specific commands for, say, your CAT tool share the same solution: for capitalization, you simply create a verbal command that changes the capitalization of the highlighted text (such commands exist in word processors and in CAT tools); for application-specific commands, you do exactly the same, chaining a verbal command to a mapped keystroke.
The videos below show how to use these features in OS X.
For the mobile professionals…
People on the move, particularly those who do not require a high level of confidentiality for the documents they translate, have a whole new set of possibilities, because the most advanced speech recognition technology currently available is based on mobile platforms. While the world-famous Dragon NaturallySpeaking (DNS) product line by Nuance is often mentioned, it supports a mere seven languages; Nuance’s mobile API supports roughly 40, a number similar to that of the OS X built-in speech recognition feature.
The major difference between the OS X implementation and the mobile-based solutions lies in the flow of data: unlike the computer-based solution offered by the improved dictation audio dictionaries, iOS built-in dictation and Nuance’s API call upon remote servers over an Internet connection to process the audio recorded by the device into text, according to a previously chosen language. This is where confidentiality issues potentially arise, and they will certainly become a hot topic for discussion and improvement over the coming months. (ERRATUM: Kevin Lossner kindly pointed out that Nuance does not retain any details, hence this may be a non-issue; see this video of a presentation at memoQfest 2015 for further details.)
So, when you’re working from a mobile platform, you basically have two options:
Option A – do as much as you can on your mobile platform and then pass it on to your CAT tool
Option B – use your mobile platform as an interface for your primary machine which is running your CAT tool, word processor, whatever.
Option A basically requires you to use the export features in CAT tools in order to open the document and dictate into it on your mobile device. There are some subtleties to this, but it can allow for a rather good level of productivity, and would definitely unchain you from your desk.
Option B is far more powerful. You can be somewhat unchained from your desk, as you no longer need to sit in front of the keyboard. In fact, depending on your choice of software, you may not even need to be in the same country as your computer. However, from the current selection of available software, the applications that present the best results also require you to manipulate the cursor to position the output of the speech recognition in the correct place.
Option A – working with CAT tools and bilingual file formats
The main disadvantage of these procedures lies in the way these software packages use tags (Trados), or in the way certain mobile platforms have dropped support for a universal file format (Apple dropped support for RTF files on iOS devices).
Of course, these can all be circumvented with a minimum of fuss. The videos below show a basic workflow allowing for the use of these features in said software packages.
Option B – using software to map speech recognition output as keystrokes
This is pretty much like going to the zoo – you’ll find plenty of similar animals of varying proficiency at their game, and you’ll certainly confuse yourself with some of them. I will discuss two that work and one that has pretty good potential once some minor difficulties are overcome.
MyEcho is an application that lets you use the iOS built-in recognition feature to dictate at your leisure. It captures the text output returning from the server and inserts it on your PC at the cursor’s current position. It does this by pairing an iOS application (currently costing €1.99) with a free Windows-based program.
These are paired through a very simple procedure involving QR codes, and you can pair multiple machines to a single mobile device. It also works inside a virtual machine. As you dictate, the text appears in the Windows-based program and is automatically inserted at the cursor’s position.
MyEcho suffers from the same confidentiality limitations as previously described, and uses Apple’s speech recognition servers for the audio processing.
See the video below for an example of how it works:
You can find several virtual keyboard applications in the iOS App Store that will allow you to use your iOS device as a trackpad or keyboard for your Mac or PC. Just like MyEcho, they require a program to interface with the computer, with the advantage that the pairing is done over the local network – that is, only the audio data is sent to a remote server, in exchange for the text output.
The basics are fairly simple – you activate the virtual keyboard on your iOS device and entirely forego the use of the virtual keys for the very conveniently placed microphone icon that will allow you to bring up the dictation feature.
Of the several applications I’ve tried so far, Remote Keyboard+ has some of the most user-friendly installation and startup procedures imaginable. You simply run a helper program on your computer, and it broadcasts a signal through your Wi-Fi network, which is then picked up by the mobile device’s application.
Remote Control & Dictation – The PC based lifesaver
One final contender for the PC crowd (for any platform, really) is the combination of a remote control tool and Swype, with both applications running on a mobile device.
The example I’m using in the video below includes Teamviewer Remote Control for Android (free) and Swype (€0.75) to do just that, running off my mobile phone (a OnePlus One).
In fact, the whole process can be executed straight from a simple mobile phone, because the only interaction you’ll have with it is the pressing of the microphone button to start dictating your text.
Teamviewer Remote Control lets you control a machine identified by a unique number, after you enter that number and a randomly generated password. Once the computer and mobile device are paired, you can set the phone down in front of you and focus on the computer.
Teamviewer Remote Control does not prevent your use of the computer being controlled remotely, so you simply use your machine as you would. Whenever you wish to dictate, simply place the cursor where you wish the dictated text to appear, press the Dictate button on the mobile device and start dictating.
The mobile device will then receive the text output of the speech recognition and map it out as keystrokes, which are subsequently sent to the computer via Teamviewer, thus inserting the dictated text straight onto your screen, at the cursor position.
The same can be achieved using Google’s Chrome Remote Desktop, as seen below:
As you might have noticed by now, with the exception of dictation using the built-in features of OS X, all of these solutions require two separate applications: one that enables the use of high-quality speech recognition, and a second that functions as an interface between the speech recognition platform and the machine running the software where the speech recognition output is to be used.
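None of these tools publish their internals, but the two-application pattern they all share – a mobile app forwarding recognized text to a small helper running on the computer – can be sketched as a minimal local relay in Python. Everything here (the socket handling, function names, and the loopback test) is an illustrative assumption, not MyEcho’s or Teamviewer’s actual mechanism; a real helper would inject the received text as keystrokes at the cursor position instead of just collecting it.

```python
import queue
import socket
import threading

def relay_server(port_q, received):
    # Computer-side helper: accept one connection and collect the
    # dictated text. A real helper would type this text out as
    # keystrokes at the current cursor position.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("127.0.0.1", 0))        # pick any free local port
        srv.listen(1)
        port_q.put(srv.getsockname()[1])  # tell the "mobile" side where to connect
        conn, _ = srv.accept()
        with conn:
            while chunk := conn.recv(1024):
                received.append(chunk.decode("utf-8"))

def send_dictation(port, text):
    # Mobile-side stand-in: forward the speech recognition output
    # that came back from the remote server.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect(("127.0.0.1", port))
        cli.sendall(text.encode("utf-8"))

port_q, received = queue.Queue(), []
t = threading.Thread(target=relay_server, args=(port_q, received))
t.start()
send_dictation(port_q.get(), "Dictated text arrives here.")
t.join()
print("".join(received))
```

The same loop is all an interface program really needs; the differences between the commercial tools lie in how the pairing is established (QR codes, ID and password, Wi-Fi broadcast) and how the text is injected on the receiving end.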
There is an unmet need for a single application, or at least a unified approach, that lets you get things done without platform hopping.
Current, highly desirable solutions include:
- A way to use Nuance’s technology, available through their API or Swype, natively on a PC platform – in essence, a hugely extended, online version of Dragon Dictation.
- A direct, mobile-based solution, not requiring a third party application to serve as an interface to the host machine for the CAT tool, word processor, etc…
- The ability to have these very same features with a degree of data confidentiality that allows their approval by clients for whom that is an essential requisite
- An improved interface that allows for simpler control of the dictation vocabulary and settings in OS X
The first two items basically depend solely on developers creating such applications, as the tools and resources already exist and are readily available.
Item number three will doubtless rank very high among future changes to this technology. The technology is already in use in this age of the “Internet of Things”, and, just like machine translation with its confidentiality issues, it will soon be “regulated”, or at least adapted in a way that makes it acceptable from a confidentiality standpoint.
Item number four basically depends on improving the interface in OS X. If there is a talented Cocoa developer out there who wants to help me address that situation, my e-mail is a click away, and help with the coding would be greatly appreciated.