
Speech Recognition

The first step for Woice SDK is to capture the user's audio input device and input audio stream, and to convert their voice commands into text.

How this is accomplished depends on the user's browser (see the detection sketch below the list):

1. For Chrome or Opera users, Woice uses the speech recognition methods defined by the Web Speech API, per the W3C consortium specification. This provides a free transcription service, making it the most cost-effective configuration for end users.

2. Unfortunately, Microsoft Edge, Firefox, and Safari don't support that standard yet, so no native in-browser speech recognition is available. For these browsers, Woice taps into the input audio device and captures the input audio stream, processing it with Woice's Speech-To-Text Server, which in turn uses Google's best-in-class Speech Recognition API. This adds an additional cost to your Woice service.

3. Internet Explorer and older versions of other browsers* don't support accessing the audio device at all, which makes delivering the voice experience even harder.

For these users, we provide a Windows desktop application that works in combination with Woice SDK. The desktop application runs in the background, providing the voice input and output functions, while application events are routed by Woice into your application exactly as the standard Woice SDK does in the other configurations. This also adds an additional cost to your Woice service.
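
Putting the three cases together, the capability check involved looks roughly like the sketch below. This is illustrative only: the function name and return values are our own, not part of the Woice SDK API.

    // Minimal sketch of browser capability detection (illustrative only):
    function detectVoiceCapability() {
      // Chrome and Opera expose the Web Speech API (usually vendor-prefixed).
      const SpeechRecognition =
        window.SpeechRecognition || window.webkitSpeechRecognition;
      if (SpeechRecognition) {
        return 'native';       // in-browser recognition, free transcription
      }
      // Edge, Firefox and Safari: capture raw audio and stream it to a
      // server-side Speech-To-Text service instead.
      if (navigator.mediaDevices && navigator.mediaDevices.getUserMedia) {
        return 'server-stt';   // processed by Woice's Speech-To-Text Server
      }
      // Internet Explorer and older browsers: no in-browser audio capture,
      // so the desktop companion application takes over.
      return 'desktop-app';
    }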

Woice SDK automatically detects the user's browser capabilities and adjusts its behaviour accordingly, dispatching application events consistently to your implementation code. This means your user experience is preserved, with a single implementation, across all browsers.
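
In practice, this means your code subscribes to the same application events whatever recognition path is active. The snippet below is a hypothetical usage example: the Woice.on method name and the event shape are assumptions for illustration, not documented API.

    // Hypothetical subscription to Woice application events:
    Woice.on('command', (event) => {
      console.log('Matched intent:', event.intent);
      console.log('Extracted entities:', event.entities);
      // Your application logic runs identically on every browser.
    });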

* Check the documentation in the private area to find detailed information about supported browsers.

Processing

The second step for Woice SDK is to process users' requests with the Woice Server. The technical implementation of this processing varies depending on the user's browser, but has no impact on your Woice service cost or on your code as a developer.

The processing includes going through your DialogFlow agent (if configured*), and may have the following outcomes (a sketch of this decision flow follows the list):

1. A hit on your DialogFlow model. In this case, Woice SDK dispatches an application event for your implementation code to process.

2. A transcription into a text input field, if no hit on your DialogFlow model has been found (or no DialogFlow agent is configured) but a text field is active in the application at the time the user's request is processed.

3. A voice message of your choice (usually prompting your user to repeat the message), if none of the above conditions are met.
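
In code terms, the decision flow reduces to something like the sketch below. This logic lives inside Woice SDK and is not exposed to your code; the function and helper names here are placeholders, not SDK API.

    // Illustrative sketch of Woice SDK's internal decision flow:
    function handleRecognizedText(text, dialogFlowResult) {
      if (dialogFlowResult && dialogFlowResult.intentMatched) {
        // 1. DialogFlow hit: dispatch an application event to your code.
        dispatchApplicationEvent(dialogFlowResult);
      } else if (document.activeElement &&
                 document.activeElement.matches('input[type="text"], textarea')) {
        // 2. No hit, but a text field is active: transcribe into it.
        document.activeElement.value += text;
      } else {
        // 3. Neither condition met: play a configurable voice message.
        speak('Sorry, I did not catch that. Could you repeat it?');
      }
    }

    // Placeholder helpers for the sketch:
    function dispatchApplicationEvent(result) { /* your event handling */ }
    function speak(message) {
      window.speechSynthesis.speak(new SpeechSynthesisUtterance(message));
    }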

This process is completely internal to Woice SDK, and you don't need to interact with it. It's explained here so you can understand Woice's decision process when transcribing and triggering application events.

* Woice can be used for transcribing voice commands into text input fields with no coupling to DialogFlow.

Providing feedback

If your DialogFlow intent definition includes a "Text Response" along with the intent and entity analysis, Woice SDK will provide it as spoken feedback to the user, in addition to routing the application events to your code.

This is done using the browser's Speech Synthesis functions in all cases except Microsoft Internet Explorer, which lacks them. For Internet Explorer users, the desktop application provides this function seamlessly.
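
For reference, the browser-side mechanism is the standard SpeechSynthesis interface, which Woice SDK drives internally. A minimal example of what that amounts to (the response text is just a placeholder):

    // Speaking a DialogFlow "Text Response" with the browser's
    // built-in Speech Synthesis:
    if ('speechSynthesis' in window) {
      const utterance = new SpeechSynthesisUtterance('Your order has been placed.');
      utterance.lang = 'en-US';  // match your agent's language
      window.speechSynthesis.speak(utterance);
    }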

In addition to providing automatic feedback in connection with Natural Language Analysis hits, Woice SDK provides a global "talk" JS method. This allows you to add verbal feedback and communication with the user at any point in your application, not just in connection with hits on the NLU model.
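
For example, you could speak a confirmation after an ordinary UI action. The single-string call below is an assumption about the method's signature; check the SDK reference for the exact form.

    // Hypothetical use of the global talk method (signature assumed):
    function saveDocument() { /* ...your save logic... */ }

    document.querySelector('#save-button').addEventListener('click', () => {
      saveDocument();                          // your application logic
      talk('Your changes have been saved.');   // spoken confirmation
    });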