Creating Voice-Controlled Applications with Flutter

In the realm of mobile application development, voice control is emerging as a critical feature for accessibility, convenience, and innovation. Flutter, Google’s UI toolkit, empowers developers to create natively compiled applications for mobile, web, and desktop from a single codebase. Implementing voice control in Flutter can significantly enhance user experience and accessibility. This blog post explores the methods and techniques for creating voice-controlled applications with Flutter.

Why Implement Voice Control in Flutter?

Voice control can revolutionize how users interact with mobile applications. The key advantages of integrating voice control into Flutter applications include:

  • Accessibility: Makes apps more accessible for users with disabilities.
  • Hands-Free Operation: Enables usage while the user’s hands are occupied (e.g., cooking, driving).
  • Enhanced Convenience: Simplifies interactions by enabling spoken commands.
  • Innovative User Experience: Opens up possibilities for new, interactive app designs.

Methods for Implementing Voice Control in Flutter

Several methods and packages are available in Flutter to facilitate voice control. The primary approaches covered here are dedicated Flutter packages such as speech_to_text, platform-specific speech APIs accessed through Flutter’s Platform Channels, and natural language services such as Dialogflow layered on top of speech recognition.

Method 1: Using the speech_to_text Package

The speech_to_text package is a popular choice for integrating voice recognition capabilities in Flutter applications. This package provides a high-level API that simplifies voice input and speech recognition.

Step 1: Add the speech_to_text Dependency

Include the speech_to_text package in your pubspec.yaml file:

dependencies:
  flutter:
    sdk: flutter
  speech_to_text: ^6.3.0 # Use the latest version

Run flutter pub get to install the package.
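
Speech recognition also needs platform-level permissions. As a minimal sketch (check the speech_to_text README for the full, version-specific setup), declare microphone access in android/app/src/main/AndroidManifest.xml and the usage descriptions in ios/Runner/Info.plist:

<!-- android/app/src/main/AndroidManifest.xml -->
<uses-permission android:name="android.permission.RECORD_AUDIO" />

<!-- ios/Runner/Info.plist -->
<key>NSMicrophoneUsageDescription</key>
<string>This app uses the microphone to capture voice commands.</string>
<key>NSSpeechRecognitionUsageDescription</key>
<string>This app converts your speech to text.</string>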

Step 2: Implement Voice Recognition

Create a simple Flutter app to capture and display speech:


import 'package:flutter/material.dart';
import 'package:speech_to_text/speech_to_text.dart' as stt;

void main() => runApp(MyApp());

class MyApp extends StatefulWidget {
  @override
  _MyAppState createState() => _MyAppState();
}

class _MyAppState extends State<MyApp> {
  stt.SpeechToText speech = stt.SpeechToText();
  bool _isListening = false;
  String _text = 'Press the button and start speaking';

  @override
  void initState() {
    super.initState();
    _initSpeech();
  }

  void _initSpeech() async {
    bool available = await speech.initialize(
      onStatus: (status) => print('Status: $status'),
      onError: (errorNotification) => print('Error: $errorNotification'),
    );
    if (available) {
      print('Speech recognition is available');
    } else {
      print('Speech recognition not available');
    }
  }

  void _startListening() async {
    setState(() {
      _isListening = true;
      _text = 'Listening...';
    });

    await speech.listen(
      onResult: (result) => setState(() {
        _text = result.recognizedWords;
        if (result.finalResult) {
          _isListening = false;
        }
      }),
      listenFor: const Duration(seconds: 5),
      pauseFor: const Duration(seconds: 3),
    );
  }

  void _stopListening() async {
    await speech.stop();
    setState(() {
      _isListening = false;
    });
  }

  @override
  Widget build(BuildContext context) {
    return MaterialApp(
      home: Scaffold(
        appBar: AppBar(
          title: const Text('Speech to Text Example'),
        ),
        body: Center(
          child: Text(
            _text,
            style: const TextStyle(fontSize: 24),
            textAlign: TextAlign.center,
          ),
        ),
        floatingActionButton: FloatingActionButton(
          onPressed: _isListening ? _stopListening : _startListening,
          tooltip: 'Listen',
          child: Icon(_isListening ? Icons.mic_off : Icons.mic),
        ),
      ),
    );
  }
}

In this code:

  • _initSpeech initializes speech recognition when the widget is created (called from initState).
  • speech.listen starts listening for voice input and updates the displayed text with the recognized words.
  • speech.stop ends the recognition session.
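
Transcription by itself is not yet voice control: the recognized words still need to be mapped to actions. A minimal sketch of such a mapping (the _handleCommand helper and the command phrases are illustrative, not part of speech_to_text) could be called from onResult once result.finalResult is true:

// Illustrative only: turn the final recognized words into app actions.
void _handleCommand(String recognizedWords) {
  final command = recognizedWords.toLowerCase().trim();
  if (command.contains('open settings')) {
    // Navigate to a settings screen.
  } else if (command.contains('dark mode')) {
    // Toggle the app theme.
  } else {
    // Unknown command: just show the transcription.
    setState(() => _text = recognizedWords);
  }
}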

Method 2: Using Platform Channels for Native Voice Recognition

For platform-specific voice recognition features, Flutter’s Platform Channels allow access to native APIs on Android (Java/Kotlin) and iOS (Swift/Objective-C).

Step 1: Set up the Flutter Project

Create a Flutter project and pick a channel name that the Dart and native sides will share; the examples below use voice_channel.
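
The only Dart-side wiring needed at this point is a channel whose name matches the one registered in the native code; a minimal sketch (the constant name is illustrative, and Step 3 shows the full Dart side):

import 'package:flutter/services.dart';

// Must match the channel name used in MainActivity (Android) and AppDelegate (iOS).
const MethodChannel voiceChannel = MethodChannel('voice_channel');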

Step 2: Implement Voice Recognition in Native Code
  • Android (Kotlin):
    
    import android.content.Intent
    import android.speech.RecognizerIntent
    import io.flutter.embedding.android.FlutterActivity
    import io.flutter.embedding.engine.FlutterEngine
    import io.flutter.plugin.common.MethodChannel
    
    class MainActivity : FlutterActivity() {
        private val CHANNEL = "voice_channel"
        private val VOICE_REQUEST_CODE = 123
        // Holds the Flutter call so it can be completed in onActivityResult.
        private var pendingResult: MethodChannel.Result? = null
    
        override fun configureFlutterEngine(flutterEngine: FlutterEngine) {
            super.configureFlutterEngine(flutterEngine)
    
            MethodChannel(flutterEngine.dartExecutor.binaryMessenger, CHANNEL).setMethodCallHandler {
                call, result ->
                if (call.method == "startVoiceRecognition") {
                    startVoiceRecognition(result)
                } else {
                    result.notImplemented()
                }
            }
        }
    
        private fun startVoiceRecognition(result: MethodChannel.Result) {
            pendingResult = result
            val intent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH).apply {
                putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM)
                putExtra(RecognizerIntent.EXTRA_PROMPT, "Speak now!")
            }
            startActivityForResult(intent, VOICE_REQUEST_CODE)
        }
    
        override fun onActivityResult(requestCode: Int, resultCode: Int, data: Intent?) {
            super.onActivityResult(requestCode, resultCode, data)
    
            if (requestCode == VOICE_REQUEST_CODE && resultCode == RESULT_OK) {
                val speechResult = data?.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS)
                val recognizedText = speechResult?.firstOrNull() ?: ""
                // Complete the pending Flutter call with the recognized text.
                pendingResult?.success(recognizedText)
            } else if (requestCode == VOICE_REQUEST_CODE && resultCode == RESULT_CANCELED) {
                // The user cancelled the recognition dialog.
                pendingResult?.error("CANCELLED", "Voice recognition was cancelled.", null)
            }
            pendingResult = null
        }
    }
    
  • iOS (Swift):
    
    import Flutter
    import UIKit
    import Speech
    import AVFoundation
    
    @UIApplicationMain
    @objc class AppDelegate: FlutterAppDelegate {
      private let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))!
      private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
      private var recognitionTask: SFSpeechRecognitionTask?
      private let audioEngine = AVAudioEngine()
    
      override func application(
        _ application: UIApplication,
        didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?
      ) -> Bool {
        GeneratedPluginRegistrant.register(with: self)
    
        let controller: FlutterViewController = window?.rootViewController as! FlutterViewController
        let voiceChannel = FlutterMethodChannel(name: "voice_channel",
                                                binaryMessenger: controller.binaryMessenger)
        voiceChannel.setMethodCallHandler({ (call: FlutterMethodCall, result: @escaping FlutterResult) -> Void in
          guard call.method == "startVoiceRecognition" else {
            result(FlutterMethodNotImplemented)
            return
          }
          self.startVoiceRecognition(result: result)
        })
        return super.application(application, didFinishLaunchingWithOptions: launchOptions)
      }
    
      func startVoiceRecognition(result: @escaping FlutterResult) {
        SFSpeechRecognizer.requestAuthorization { authStatus in
          OperationQueue.main.addOperation {
            switch authStatus {
            case .authorized:
              do {
                try self.startRecording(result: result)
              } catch {
                result(FlutterError(code: "RECORDING_ERROR", message: "Audio engine failed to start.", details: nil))
              }
            case .denied, .restricted, .notDetermined:
              result(FlutterError(code: "AUTHORIZATION_ERROR", message: "User not authorized to use the speech recognition service.", details: nil))
            @unknown default:
              result(FlutterError(code: "UNEXPECTED_AUTHORIZATION_ERROR", message: "Unexpected authorization status.", details: nil))
            }
          }
        }
      }
    
      private func startRecording(result: @escaping FlutterResult) throws {
        // Cancel the previous task if it's still running.
        if let recognitionTask = self.recognitionTask {
          recognitionTask.cancel()
          self.recognitionTask = nil
        }
    
        // Configure the audio session for recording.
        let audioSession = AVAudioSession.sharedInstance()
        try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
        try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
        let inputNode = audioEngine.inputNode
    
        // Create and configure the speech recognition request.
        recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
        guard let recognitionRequest = recognitionRequest else {
          fatalError("Unable to create an SFSpeechAudioBufferRecognitionRequest object")
        }
        recognitionRequest.shouldReportPartialResults = true
    
        // Allow server-based recognition when on-device recognition is unavailable.
        if #available(iOS 13, *) {
          recognitionRequest.requiresOnDeviceRecognition = false
        }
    
        // Create and configure a recognition task. A FlutterResult may only be
        // completed once, so reply with the final transcription only.
        recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest) { speechResult, error in
          var isFinal = false
    
          if let speechResult = speechResult {
            isFinal = speechResult.isFinal
            if isFinal {
              result(speechResult.bestTranscription.formattedString)
            }
          }
    
          if error != nil || isFinal {
            self.audioEngine.stop()
            inputNode.removeTap(onBus: 0)
            self.recognitionRequest = nil
            self.recognitionTask = nil
          }
        }
    
        // Configure the microphone input.
        let recordingFormat = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
          self.recognitionRequest?.append(buffer)
        }
    
        audioEngine.prepare()
        try audioEngine.start()
      }
    }
    
Step 3: Invoke the Native Method from Flutter

In your Flutter application:


import 'package:flutter/material.dart';
import 'package:flutter/services.dart';

class VoiceRecognitionScreen extends StatefulWidget {
  @override
  _VoiceRecognitionScreenState createState() => _VoiceRecognitionScreenState();
}

class _VoiceRecognitionScreenState extends State<VoiceRecognitionScreen> {
  static const platform = MethodChannel('voice_channel');
  String _recognizedText = 'Press the button and start speaking';

  Future<void> _startVoiceRecognition() async {
    String text;
    try {
      final String result = await platform.invokeMethod('startVoiceRecognition');
      text = 'Recognized text: $result';
    } on PlatformException catch (e) {
      text = "Failed to get voice input: '${e.message}'.";
    }

    setState(() {
      _recognizedText = text;
    });
  }

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(
        title: Text('Voice Recognition Example'),
      ),
      body: Center(
        child: Text(_recognizedText),
      ),
      floatingActionButton: FloatingActionButton(
        onPressed: _startVoiceRecognition,
        tooltip: 'Start Voice Input',
        child: Icon(Icons.mic),
      ),
    );
  }
}
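
The screen above has no entry point of its own; to try it out, it can be mounted as the home of a MaterialApp, for example (placed in the same file as the widget above):

void main() => runApp(MaterialApp(home: VoiceRecognitionScreen()));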

Method 3: Leveraging APIs for Natural Language Processing

In addition to speech-to-text capabilities, natural language processing (NLP) lets an app act on what the user means rather than matching exact phrases. One way to add this intelligence is to integrate Dialogflow with Flutter, as in the following example based on the dialogflow_grpc package.


Add the dialogflow_grpc package to your pubspec.yaml:

dependencies:
  flutter:
    sdk: flutter
  dialogflow_grpc: ^0.1.0 # Use the latest version

Then import the necessary packages and wire Dialogflow into a simple chat screen:

import 'package:flutter/material.dart';
import 'package:flutter/services.dart' show rootBundle;
import 'package:dialogflow_grpc/dialogflow_grpc.dart';
import 'package:dialogflow_grpc/generated/google/cloud/dialogflow/v2beta1/dialogflow_v2beta1.pb.dart';

// Simple model for a single chat entry.
class ChatMessage {
  final String text;
  final bool isUserMessage;
  ChatMessage({required this.text, required this.isUserMessage});
}

class ChatScreen extends StatefulWidget {
  @override
  ChatScreenState createState() => ChatScreenState();
}

class ChatScreenState extends State<ChatScreen> {
  final TextEditingController textController = TextEditingController();
  final List<ChatMessage> messages = [];
  DialogflowGrpcV2Beta1? dialogflow;

  @override
  void initState() {
    super.initState();
    initPlugin();
  }

  // Dialogflow initialization: load the service-account credentials exported
  // from your Dialogflow project (bundled as an asset in pubspec.yaml) and
  // build the client. API names follow dialogflow_grpc 0.x.
  Future<void> initPlugin() async {
    final serviceAccount = ServiceAccount.fromString(
        await rootBundle.loadString('assets/dialogflow_credentials.json'));
    dialogflow = DialogflowGrpcV2Beta1.viaServiceAccount(serviceAccount);
  }

  // Send a message to Dialogflow and add both the user's text and the
  // agent's reply to the chat list.
  Future<void> sendMessage(String text) async {
    textController.clear();

    // Add the user's message to the chat list.
    setState(() {
      messages.add(ChatMessage(text: text, isUserMessage: true));
    });

    // Ask Dialogflow to detect the intent behind the text.
    DetectIntentResponse data = await dialogflow!.detectIntent(text, 'en-US');

    // Handle Dialogflow's response, falling back to a reprompt when the
    // agent returns no fulfillment text.
    String fulfillmentText = data.queryResult.fulfillmentText;
    setState(() {
      messages.add(ChatMessage(
        text: fulfillmentText.isNotEmpty
            ? fulfillmentText
            : "I didn't understand. Can you please rephrase?",
        isUserMessage: false,
      ));
    });
  }

  @override
  Widget build(BuildContext context) {
    // Build the chat UI from `messages` here.
    return const Placeholder();
  }
}
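
To make the chat truly voice-driven, the recognized words from Method 1 can be forwarded to sendMessage once a final result arrives. A minimal sketch, assuming ChatScreenState also holds a speech_to_text instance named speech, initialized as in Method 1:

// Inside ChatScreenState: capture speech and hand the final transcript to Dialogflow.
void listenAndSend() {
  speech.listen(
    onResult: (result) {
      if (result.finalResult) {
        sendMessage(result.recognizedWords);
      }
    },
  );
}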

Considerations for Building Voice-Controlled Apps

When building voice-controlled apps, it’s crucial to keep the following aspects in mind:

  • Accuracy and Reliability: Speech recognition accuracy varies with ambient noise, accents, and pronunciation, so provide clear feedback and error handling (see the sketch after this list).
  • User Permissions: Always request and handle microphone permissions gracefully to avoid disrupting the user experience.
  • Contextual Awareness: Design the app to understand voice commands in context to improve accuracy and reduce misunderstandings.
  • Usability and Intuitive Commands: Simplify command structures and provide voice command guidelines to improve usability.
  • Privacy: Be transparent about data collection and processing to adhere to privacy policies and user expectations.
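
Building on the Method 1 example, a minimal sketch of such feedback and error handling with speech_to_text (callback and field names follow recent versions of the package; verify against the version you use):

// Defensive initialization: surface failures to the user instead of failing silently.
Future<void> _initSpeechSafely() async {
  final available = await speech.initialize(
    onError: (error) =>
        setState(() => _text = 'Recognition error: ${error.errorMsg}'),
    onStatus: (status) {
      // 'notListening' and 'done' indicate the current session has ended.
      if (status == 'notListening' || status == 'done') {
        setState(() => _isListening = false);
      }
    },
  );
  if (!available) {
    // Usually means the microphone permission was denied or no recognition
    // service exists on the device.
    setState(() => _text = 'Speech recognition is unavailable on this device.');
  }
}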

Conclusion

Incorporating voice control in Flutter apps presents an excellent opportunity to enhance accessibility, usability, and innovation. Packages like speech_to_text offer a straightforward starting point, while Platform Channels enable deeper integration with platform-specific APIs. Layering a service such as Dialogflow on top adds natural language understanding, so apps can interpret voice commands in context rather than relying on exact phrases. By focusing on accuracy, privacy, and intuitive command structures, developers can craft voice-controlled Flutter applications that transform the way users interact with their devices.