Building a Speech Recognition App with Microsoft Cognitive Services and React
Introduction:
Speech recognition technology has revolutionized the way we interact with applications and devices. In this tutorial, we’ll explore how to create a speech recognition app using Microsoft Cognitive Services and React.
Microsoft Cognitive Services provides powerful speech recognition capabilities that can be easily integrated into your React applications. By the end of this tutorial, you’ll have a working speech recognition app with options to start, stop, and pause the microphone.
Prerequisites:
Before we dive into the implementation, make sure you have the following prerequisites in place:
- An Azure account for accessing Microsoft Cognitive Services.
- The necessary subscription key and region for using the Speech Services API (if you still need to create a Speech resource, see the CLI sketch after this list).
- Node.js and npm (Node Package Manager) installed on your machine.
- Basic knowledge of React and JavaScript.
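If you don't yet have a Speech resource, you can create one in the Azure portal or with the Azure CLI. Here is a rough sketch using the CLI; the resource and group names are placeholders, and F0 is the free tier:

# Create a resource group and a Speech resource (F0 is the free tier)
az group create --name my-speech-rg --location eastus
az cognitiveservices account create --name my-speech-resource --resource-group my-speech-rg --kind SpeechServices --sku F0 --location eastus --yes

# Retrieve the subscription key (the region is the one you chose above)
az cognitiveservices account keys list --name my-speech-resource --resource-group my-speech-rg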
Setting Up the Project:
Create a new React project by running the following command in your terminal:
npx create-react-app speech-recognition-app
cd speech-recognition-app
Install the Azure Cognitive Services Speech SDK for JavaScript. Note that it should be a regular dependency rather than a dev dependency, since the app uses it at runtime:
npm install microsoft-cognitiveservices-speech-sdk
Implementation:
Now let’s implement the speech recognition functionality in our React app.
Create a file named SpeechToTextComponent.js in the src directory and paste the following code:
import React, { useState, useEffect, useRef } from 'react';
import * as sdk from 'microsoft-cognitiveservices-speech-sdk';

const SPEECH_KEY = '';
const SPEECH_REGION = '';

export function SpeechToTextComponent() {
  const [isListening, setIsListening] = useState(false);
  const speechConfig = useRef(null);
  const audioConfig = useRef(null);
  const recognizer = useRef(null);

  // Final (recognized) and interim (recognizing) transcripts
  const [myTranscript, setMyTranscript] = useState('');
  const [recognizingTranscript, setRecTranscript] = useState('');

  useEffect(() => {
    // Configure the service with your subscription key and region
    speechConfig.current = sdk.SpeechConfig.fromSubscription(
      SPEECH_KEY,
      SPEECH_REGION
    );
    speechConfig.current.speechRecognitionLanguage = 'en-US';

    // Capture audio from the default microphone
    audioConfig.current = sdk.AudioConfig.fromDefaultMicrophoneInput();
    recognizer.current = new sdk.SpeechRecognizer(
      speechConfig.current,
      audioConfig.current
    );

    // Fired when a complete phrase has been recognized (final result)
    const processRecognizedTranscript = (event) => {
      const result = event.result;
      console.log('Recognition result:', result);

      if (result.reason === sdk.ResultReason.RecognizedSpeech) {
        const transcript = result.text;
        console.log('Transcript: -->', transcript);
        // Call a function to process the transcript as needed
        setMyTranscript(transcript);
      }
    };

    // Fired repeatedly while a phrase is still being recognized (interim result)
    const processRecognizingTranscript = (event) => {
      const result = event.result;
      console.log('Recognition result:', result);

      if (result.reason === sdk.ResultReason.RecognizingSpeech) {
        const transcript = result.text;
        console.log('Transcript: -->', transcript);
        // Call a function to process the transcript as needed
        setRecTranscript(transcript);
      }
    };

    recognizer.current.recognized = (s, e) => processRecognizedTranscript(e);
    recognizer.current.recognizing = (s, e) => processRecognizingTranscript(e);

    recognizer.current.startContinuousRecognitionAsync(() => {
      console.log('Speech recognition started.');
      setIsListening(true);
    });

    // Stop recognition when the component unmounts
    return () => {
      recognizer.current.stopContinuousRecognitionAsync(() => {
        setIsListening(false);
      });
    };
  }, []);

  const pauseListening = () => {
    setIsListening(false);
    recognizer.current.stopContinuousRecognitionAsync();
    console.log('Paused listening.');
  };

  const resumeListening = () => {
    if (!isListening) {
      setIsListening(true);
      recognizer.current.startContinuousRecognitionAsync(() => {
        console.log('Resumed listening...');
      });
    }
  };

  const stopListening = () => {
    setIsListening(false);
    recognizer.current.stopContinuousRecognitionAsync(() => {
      console.log('Speech recognition stopped.');
    });
  };

  return (
    <div>
      <button onClick={pauseListening}>Pause Listening</button>
      <button onClick={resumeListening}>Resume Listening</button>
      <button onClick={stopListening}>Stop Listening</button>
      <div>
        <div>
          Recognizing Transcript : {recognizingTranscript}
        </div>
        <div>
          Recognized Transcript : {myTranscript}
        </div>
      </div>
    </div>
  );
}
Update SPEECH_KEY and SPEECH_REGION with the key and region of your Azure Speech resource.
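Hardcoding credentials is fine for a quick demo, but it's better to keep them out of source control. A minimal sketch, assuming a Create React App setup (which exposes environment variables prefixed with REACT_APP_ at build time):

# .env (add this file to .gitignore; restart npm start after editing)
REACT_APP_SPEECH_KEY=your-subscription-key
REACT_APP_SPEECH_REGION=your-region

Then read them in SpeechToTextComponent.js:

const SPEECH_KEY = process.env.REACT_APP_SPEECH_KEY;
const SPEECH_REGION = process.env.REACT_APP_SPEECH_REGION;

Keep in mind that anything bundled into a client-side app is still visible to users, so for production you'd typically proxy requests through a backend or use short-lived authorization tokens.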
Now open App.js, remove the existing code, and paste the following:
import './App.css';
import { SpeechToTextComponent } from './SpeechToTextComponent';
function App() {
return (
<div className="App">
<SpeechToTextComponent />
</div>
);
}
export default App;
Starting the Application:
Now we are ready to start the development server. Go to the terminal and execute the following command:
npm start
This will bring up your React application at http://localhost:3000.
Using the Application:
Your browser will ask for microphone access, and once you grant it, a recording indicator appears in the browser tab. If it does not show up, click Stop Listening and then Resume Listening.
Now try saying something, e.g. "Hello World" 😏
You will see the recognized transcript as well as the recognizing transcript (the dynamic, in-progress transcript). This is all coming from Azure 😄
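One detail worth noting: the component overwrites myTranscript each time a phrase is finalized, so only the most recent phrase is displayed. If you'd prefer a running transcript, a small tweak (a sketch, not the original behavior) is to append inside the recognized handler:

// Append each finalized phrase instead of replacing the previous one
setMyTranscript((prev) => (prev ? prev + ' ' + transcript : transcript));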
Now let's try to understand the code.
Code Explanation:
Let’s walk through the code to understand the changes and how the start, stop, and pause microphone options work.
- We added three functions, pauseListening, resumeListening, and stopListening, for pausing, resuming, and stopping the speech recognition process.
- The pauseListening function sets the isListening state to false, stops the recognition process with stopContinuousRecognitionAsync, and logs a message.
- The resumeListening function checks that we aren't already listening, sets isListening back to true, and restarts continuous recognition with startContinuousRecognitionAsync.
- The stopListening function does the same job as pauseListening; as written, both are effectively identical 😄 (one way to differentiate them is sketched after this list).
- We added buttons for pausing, resuming, and stopping the listening process, each invoking the corresponding function.
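Since pauseListening and stopListening currently behave the same, a minimal sketch of a more final stopListening (an assumption on my part, not part of the original code) could also dispose of the recognizer via the SDK's close() method:

const stopListening = () => {
  setIsListening(false);
  recognizer.current.stopContinuousRecognitionAsync(() => {
    console.log('Speech recognition stopped.');
    // Release the recognizer and its microphone handle; after close(),
    // resumeListening would need to recreate the recognizer first.
    recognizer.current.close();
  });
};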
Conclusion:
Congratulations! 👏 You have successfully created a speech recognition app using Microsoft Cognitive Services and React. You’ve learned how to integrate the Speech Services SDK, start, stop, and pause the microphone, and process the speech recognition results.
Feel free to customize the app further, for example by adding error handling (see the sketch below) or a richer display of the recognized speech.
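For error handling, one option is the SDK's canceled event, which fires when recognition is aborted (for example, due to invalid credentials or a network failure). A minimal sketch to wire up inside the same useEffect:

// Fired when recognition is canceled, e.g. bad key/region or network error
recognizer.current.canceled = (s, e) => {
  if (e.reason === sdk.CancellationReason.Error) {
    console.error('Recognition error:', e.errorDetails);
  }
  setIsListening(false);
};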
Speech recognition opens up endless possibilities for improving user experiences, enabling hands-free interactions, and enhancing accessibility. Explore the capabilities of Microsoft Cognitive Services and unleash the power of speech recognition in your applications.
But our journey doesn’t end here! If you’re interested in exploring the world of speech synthesis, stay tuned for my upcoming tutorial where we’ll delve into creating a speech synthesis app using the Azure SDK. With speech synthesis, you’ll be able to convert text into lifelike speech, opening up new opportunities for interactive and engaging user experiences.
Thank you for joining me on this journey, and I look forward to having you explore speech synthesis in the upcoming tutorial. Stay tuned and happy coding!