voice memos speech recognition / transcription

Started by jimspoon on 1/4/2015
jimspoon 1/4/2015 6:10 am
I would like to be able to grab a voice recorder, and without even looking at it, push a button to start recording, start talking, push another button to end the recording. Then I'd like the recording to be automatically uploaded to Google's or Dragon's servers, and have the recognized text appended to a text file with an appropriate time stamp.

I've looked for an Android app that would enable me to do this, but have not found anything very close to what I'm looking for. I am able to enter text in Android notetaking apps by dictation with real-time recognition and transcription. But this involves steps I want to be able to avoid - (1) wake up phone; (2) find the notetaking app; (3) start a new note; (4) activate the speech recognizer; (5) monitor the speech recognition results while talking; (6) push a button to stop and save the recording; (7) name the note, etc.

In other words - the process currently involves too many steps, which discourages the making of such notes in the first place.

Has anyone found a device or app (on any platform) which does something like what I've described in the first paragraph? I haven't found it on Android, but maybe it's different on other platforms.

thanks in advance -
jim




jimspoon 1/4/2015 12:00 pm
Just to clarify - I'm looking for an app or device that does NOT require that upload, recognition, and text delivery take place as I speak - but rather is capable of uploading previously recorded sound files for recognition and text delivery - preferably automatically and unattended. That's the main thing that I'm looking for - that I could make a voice recording quickly, and not have to attend to the recognition and transcription.
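
For illustration only, the unattended flow described above could be sketched roughly like this: a script scans a folder of finished recordings, hands each file to a recognizer, and appends the text to a notes file with a time stamp. The `transcribe` callable, the `.done` marker files, and the folder layout are all hypothetical placeholders; no particular Google or Nuance API is assumed.

```python
from datetime import datetime
from pathlib import Path
from typing import Callable


def append_transcripts(
    inbox: Path,
    notes_file: Path,
    transcribe: Callable[[Path], str],  # hypothetical: returns text for one audio file
    done_suffix: str = ".done",
) -> int:
    """Transcribe each unprocessed .wav in `inbox` and append it to `notes_file`.

    Returns the number of recordings processed on this run.
    """
    count = 0
    for audio in sorted(inbox.glob("*.wav")):
        marker = audio.with_name(audio.name + done_suffix)
        if marker.exists():
            continue  # already transcribed on an earlier run
        text = transcribe(audio).strip()
        # Use the recording's modification time as the note's time stamp.
        stamp = datetime.fromtimestamp(audio.stat().st_mtime)
        with notes_file.open("a", encoding="utf-8") as out:
            out.write(f"[{stamp:%Y-%m-%d %H:%M}] {text}\n")
        marker.touch()  # remember that this file has been handled
        count += 1
    return count
```

Run on a schedule against a synced folder, something like this would cover the "append to a text file" half of the request; the recognition step itself is the part that still needs a real service behind `transcribe`.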

Hugh 1/4/2015 6:37 pm
As far as I know, a solution to meet your exact requirements doesn't exist (yet). You could probably get near to them using Siri on an iDevice, or its equivalent on Android (I'm sorry, I'm unfamiliar with the Android platform). I gather from your post that it's the surrounding procedures rather more than the dictation technology itself that you're - understandably - finding frustrating. To reduce the button-pushing on an iDevice, personally I'd use Launch Center Pro, into which keyboard shortcuts can be programmed under iOS 8, plus Nuance's Dragon Dictation app (which I've found to be more accurate than Siri - although see below), plus email, again pre-programmed. As you're probably aware, Nuance is the big dog in this market, with - probably - the most advanced technology and, in my experience, improving results. To the best of my knowledge, others, including Apple and Microsoft, license the technology behind their own products, including Siri, from Nuance - Google I don't know.

(There is reason for optimism. According to this weekend's newspapers, AI research is starting to make some progress after a period of years when results did not match expectations. And one area in which strides are being made by several Silicon Valley enterprises is pattern recognition, including voice recognition. Possibly it isn't too much to hope that dictation/voice-to-text and voice control of software will become ubiquitous, easy-to-use and still more precise before very long?)
Hugh 1/4/2015 6:51 pm
Re-reading your posts, I conclude that Siri or its Android equivalent isn't a solution. Nuance also distribute a recording app for i-devices: that, plus some form of automation for holding, uploading, transcribing and filing the resultant file, sound like the answer you need. That would probably require different kinds of automation at the levels of the mobile device, and the server; I don't know what those would be.
Paul Korm 1/4/2015 7:04 pm
I use Dictate + Connect on my iPhone -- there's also an Android version.

http://www.dictate-connect.com

Over here, I dictate a note and it is automatically uploaded to Dropbox as a .wav. Later, on my Mac, I open Dragon Dictate, launch a transcription profile, and then open the .wav files created with Dictate. They are transcribed, and Dragon then asks where I want to save the transcriptions.

Not as automated a flow as Jim's requirements call for -- but Dictate has a lot of config options (including automatically sending an email of the dictation) that might be tweaked.
Paul Korm 1/4/2015 7:12 pm
And there's also Dragon Dictate for iOS
Dr Andus 1/5/2015 1:37 am
jimspoon wrote:
I would like to be able to grab a voice recorder, and without even
looking at it, push a button to start recording, start talking, push
another button to end the recording. Then I'd like the recording to be
automatically uploaded to Google's or Dragon's servers, and have the
recognized text appended to a text file with an appropriate time stamp.

I agree that this would be great, but I also doubt whether this is technically possible yet, or rather, whether the results are sufficiently reliable.

The main issue I've run into (as a long-term Dragon user) is that even if recognition is 99% accurate, there will be mistakes, and if you don't spot and correct them right away, days, weeks or years down the line the text may become unintelligible, as you won't remember what you meant to say (although one could check the recording again, but it's extra work).

I use Dragon on the PC to dictate book passages, and even after I check, months later I find sentences that don't make sense, and I have to go back to the original to figure out what's going on.

As for personal notes with a digital recorder, I find that they often contain background noise (if you're recording them in a public space) or are not sufficiently succinct or coherent, so what I tend to do these days is 1) record the note, 2) listen to it back at the office, 3) transcribe it in a more coherent and succinct form (which could be dictated with Dragon), and 4) delete the audio file. Anything else just ended up being too much hassle for it to be worth it (such as trying to upload the files to the PC and have them automatically transcribed by Dragon).
jimspoon 1/6/2015 1:29 pm
Thanks all for your input - I will investigate your suggestions.

What surprises me is that the Google and Nuance recognizers produce such good results when entering text using voice in real time on my Android phone. So it seems like it should be possible to get equally good results by submitting previously recorded sound files to the same servers/software. I mean quick voice recordings that would be made without even looking at the device, simply by holding down a button or something like that.

There ought to be a dedicated voice recorder device out there that works like that - grab it, hold down button, wait for beep, talk, release button. Device automatically uploads recorded audio to servers via 3G/4G or wifi if available. Text together with recorded sound is available via smartphone app or web app - instantly searchable. For correcting recognition errors, the interface would need to have something like the desktop Dragon's "play this back" function - highlight text, click "Play this Back", and hear the portion of the audio recording which the recognizer transcribed into the selected text. The data plan for such a device shouldn't be that expensive, after all - it wouldn't take that many megabytes to transmit voice recordings to the servers. I think such a device would be a killer product. I've looked at the websites for various voice recorder manufacturers (Olympus, Sony), and haven't found anything like it. Some high end recorders have some wifi capability.

I did find an interesting Android app called Speech to Text Notepad - for me the interesting feature was the ability to delete words by saying, for example, "delete 4" to delete the four words preceding the cursor. This makes it easy to delete mis-recognized words. Just tap at the appropriate location before issuing the command.

https://play.google.com/store/apps/details?id=com.heterioun.HandsFreeNotes
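
As a rough illustration of what a "delete 4" style command has to do (this is a guess at the behaviour, not how Speech to Text Notepad is actually implemented), the function below removes up to N words ending at the cursor position. The function name and signature are made up for the example.

```python
import re
from typing import Tuple


def delete_words_before(text: str, cursor: int, count: int) -> Tuple[str, int]:
    """Remove up to `count` words that sit before `cursor`.

    Returns the new text and the new cursor position.
    """
    head, tail = text[:cursor], text[cursor:]
    # Each match is one word plus any whitespace that follows it.
    words = list(re.finditer(r"\S+\s*", head))
    if count <= 0 or not words:
        return text, cursor
    # Start of the earliest word to delete (clamped to the first word).
    start = words[max(len(words) - count, 0)].start()
    return head[:start] + tail, start
```

With the cursor at the end of "the quick brown fox jumps", deleting two words leaves "the quick brown " and moves the cursor to the end of that text. The sketch does not tidy up spacing when the cut falls in the middle of a sentence.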

The speech recognition built into the Google Keyboard and Swype keyboard doesn't recognize many commands - while they properly interpret "new line", "comma", "period", "exclamation point", and "question mark" - "backspace" and "delete" do not work as they should. They should also recognize and respond appropriately to "insert date" and "insert time" (in configurable formats).
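
To make the wish concrete, here is a minimal sketch of post-processing recognized text so that spoken commands turn into punctuation, line breaks, and date/time stamps in configurable formats. It is only an assumption about how such a feature could work, not a description of the Google or Swype keyboards; the function name and the default formats are invented for the example.

```python
import re
from datetime import datetime

# Spoken phrases that become punctuation attached to the previous word.
PUNCTUATION = {
    "new line": "\n",
    "comma": ",",
    "period": ".",
    "exclamation point": "!",
    "question mark": "?",
}


def apply_spoken_commands(
    recognized: str,
    now: datetime,
    date_format: str = "%Y-%m-%d",  # assumed default, configurable
    time_format: str = "%H:%M",     # assumed default, configurable
) -> str:
    """Replace spoken command phrases in `recognized` with their output."""
    inserts = {
        "insert date": now.strftime(date_format),
        "insert time": now.strftime(time_format),
    }
    phrases = sorted(list(PUNCTUATION) + list(inserts), key=len, reverse=True)
    pattern = re.compile(
        r"(\s*)\b(" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )

    def replace(match):
        space, phrase = match.group(1), match.group(2).lower()
        if phrase in PUNCTUATION:
            return PUNCTUATION[phrase]  # drop the space before punctuation
        return space + inserts[phrase]

    text = pattern.sub(replace, recognized)
    # Do not leave a stray space at the start of a new line.
    return re.sub(r"\n[ \t]+", "\n", text)
```

A real keyboard would need some way to tell a command from ordinary speech (this sketch would also convert the word "period" in "a period of years"), which may be part of why so few commands are supported.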

Seen anything like this, for Android or iOS?
MadaboutDana 1/6/2015 5:48 pm
Evernote haters among us will be amused by the fact that, yes, Evernote for Android does indeed convert spoken input to text, and preserves both the audio and text files, using Google's speech recognition engine.
MadaboutDana 1/6/2015 5:54 pm
There's also a voice recognition function built into iOS 8. There's a nice article on it here:

http://www.makeuseof.com/tag/type-superfast-real-time-voice-dictation-ios-8/

I have a couple of iOS 8 devices, but haven't experimented with this. I could have a go, I suppose, and see what happens to the audio files...
jimspoon 1/6/2015 6:27 pm
Bill, Evernote for Android used to preserve both the audio recording and the recognized text, but it seems this is no longer the case. I have just made a speech-to-text note in Evernote on a Galaxy Note 3, and it does not contain an audio attachment, or at least I can't find how to access it ... no audio attachment shows up on the web client or the desktop client when I sync the notes from the Android app.

Even when the audio recording (for speech to text notes) was preserved and was accessible - it would often contain gaps, and whole portions of the speech that was dictated would be missing. Sometimes when dictating an Evernote speech-to-text note, the spoken words simply don't show up.

It seems I'd be better off just using the speech recognition in the Google or Swype keyboards, rather than the speech-to-text button in the Evernote app.

Apart from the speech to text function in Evernote for Android - one can make an "audio note" and it is preserved as an .AMR file attachment. And the attachment is synced to the desktop client. But I can't find any way in the app to convert this .AMR audio clip to text.

Bummer!
jimspoon 1/6/2015 6:37 pm
Bill, here's another link for the dictation commands available for OS X - supposedly the same commands are available in iOS.

http://support.apple.com/en-us/HT203084

Some of these I hadn't thought about - I should try them on my Android device (Google / Swype).

I will be interested to hear whether iOS preserves the audio when you use the dictation function.
MadaboutDana 1/6/2015 7:17 pm
Shame about Evernote, kind of underlines the rather vacillating roadmap that appears to be undermining what was once a confident, forward-looking team.

And thanks for the iOS commands: I must try the dictation function out and see how well it works. Reviews suggest it's pretty good!

Cheers,
Bill
MadaboutDana 1/6/2015 7:25 pm
I've just dictated some stuff into Byword on my iPad, and it works fairly well. It needs a good connection to Apple's servers, and has a peculiar way of dropping the connection quite suddenly. But it's impressively accurate!
MadaboutDana 1/6/2015 7:27 pm
However, there's no obvious way to save the audio file, let alone synchronise it with the text output.

Various apps will keep track of audio alongside e.g. hand-typed/written notes (OneNote, Notebooks, Notability, Pear Notes etc.), but I don't know of anything that will store audio alongside the text output from that audio.

Well, just have to keep looking!