Building a Document Scanner App with SwiftUI

In today’s mobile-first world, document scanner apps are indispensable tools for quickly digitizing paper documents. While there are many document scanning apps available, creating one from scratch allows for complete customization and control over features. This article provides a step-by-step guide on how to build a document scanner app using SwiftUI, Apple’s declarative UI framework.

Introduction

SwiftUI provides a straightforward way to build native iOS applications. By combining SwiftUI with VisionKit, Apple’s framework for document scanning, developers can create powerful document scanner apps with ease. This article outlines the process, offering code samples and detailed explanations.

Prerequisites

Before diving into the development, make sure you have the following:

  • Xcode 13 or later
  • iOS 15 or later
  • Basic understanding of Swift and SwiftUI

Step 1: Setting Up the Project

Open Xcode and create a new project:

  1. Launch Xcode.
  2. Select “Create a new Xcode project.”
  3. Choose “App” under the iOS tab.
  4. Enter a project name (e.g., “DocumentScanner”) and ensure SwiftUI is selected for the interface.

Step 2: Implementing Document Scanning with VisionKit

VisionKit simplifies the document scanning process. To use VisionKit, you’ll need to create a UIViewControllerRepresentable wrapper to integrate VNDocumentCameraViewController with SwiftUI.

Creating DocumentScannerView

import SwiftUI
import VisionKit

struct DocumentScannerView: UIViewControllerRepresentable {
    @Binding var recognizedText: String
    @Environment(\.presentationMode) var presentationMode

    func makeUIViewController(context: Context) -> VNDocumentCameraViewController {
        let documentCameraController = VNDocumentCameraViewController()
        documentCameraController.delegate = context.coordinator
        return documentCameraController
    }

    func updateUIViewController(_ uiViewController: VNDocumentCameraViewController, context: Context) {
        // Nothing to update
    }

    func makeCoordinator() -> Coordinator {
        Coordinator(recognizedText: $recognizedText, presentationMode: presentationMode)
    }

    class Coordinator: NSObject, VNDocumentCameraViewControllerDelegate {
        @Binding var recognizedText: String
        var presentationMode: Binding<PresentationMode>

        init(recognizedText: Binding<String>, presentationMode: Binding<PresentationMode>) {
            _recognizedText = recognizedText
            self.presentationMode = presentationMode
        }

        func documentCameraViewController(_ controller: VNDocumentCameraViewController, didFinishWith scan: VNDocumentCameraScan) {
            print("Document scanned!")

            for pageIndex in 0 ..< scan.pageCount {
                let scannedImage = scan.imageOfPage(at: pageIndex)
                // Perform OCR or image processing here (implementation details omitted for brevity)
                // Recognized text from the scanned image should be stored in the recognizedText variable.
                recognizedText += "Scanned Page \(pageIndex + 1)\n" // Placeholder
            }

            presentationMode.wrappedValue.dismiss()
        }

        func documentCameraViewControllerDidCancel(_ controller: VNDocumentCameraViewController) {
            presentationMode.wrappedValue.dismiss()
        }

        func documentCameraViewController(_ controller: VNDocumentCameraViewController, didFailWithError error: Error) {
            print("Document scan failed with error: \(error)")
            presentationMode.wrappedValue.dismiss()
        }
    }
}

Key aspects of the code:

  • DocumentScannerView: Conforms to UIViewControllerRepresentable, allowing the integration of a UIViewController (VNDocumentCameraViewController) into a SwiftUI view.
  • recognizedText: A binding variable to hold the recognized text from the scanned document.
  • makeUIViewController: Creates and configures the VNDocumentCameraViewController.
  • Coordinator: Acts as a delegate for the VNDocumentCameraViewController, handling the scan results.
  • documentCameraViewController(_:didFinishWith:): Called when a scan is completed. It iterates through the scanned pages, extracts the images, and performs (or could perform) OCR to recognize the text. In this simplified example, it updates the recognizedText with a placeholder message for each page.
  • documentCameraViewControllerDidCancel(_:) and documentCameraViewController(_:didFailWithError:): Handle cancellation and error cases, respectively.
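Not every device supports the document camera (the Simulator, for instance, has no usable camera). Before offering the scan action, you can guard on VisionKit's support flag; a minimal sketch:

```swift
import SwiftUI
import VisionKit

// Disable the scan button on devices where the document camera
// is unavailable, rather than presenting a non-functional sheet.
Button("Scan Document") {
    isScanning = true
}
.disabled(!VNDocumentCameraViewController.isSupported)
```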

Step 3: Integrating the Scanner into the SwiftUI View

Now, integrate the DocumentScannerView into your main SwiftUI view.

import SwiftUI

struct ContentView: View {
    @State private var isScanning: Bool = false
    @State private var scannedText: String = ""

    var body: some View {
        NavigationView {
            VStack {
                Text(scannedText)
                    .padding()

                Button("Scan Document") {
                    self.isScanning = true
                }
                .padding()
                .sheet(isPresented: $isScanning) {
                    DocumentScannerView(recognizedText: $scannedText)
                }
            }
            .navigationTitle("Document Scanner")
        }
    }
}

Explanation:

  • isScanning: A state variable to control the presentation of the document scanner view as a sheet.
  • scannedText: A state variable that displays the recognized text from the scanned document.
  • The Button triggers the presentation of the DocumentScannerView using the .sheet modifier.
  • The scanned text is displayed using the Text view, which binds to the scannedText variable.

Step 4: Add Privacy - Camera Usage Description

You need to add a Privacy - Camera Usage Description in Info.plist to let users know why the app needs access to the camera.

  1. Open Info.plist.
  2. Add a new entry.
  3. Choose Privacy - Camera Usage Description from the dropdown.
  4. Enter a description, such as "The app needs camera access to scan documents."
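If you prefer to edit Info.plist as source code rather than through the property-list editor, the same entry corresponds to the NSCameraUsageDescription key:

```xml
<key>NSCameraUsageDescription</key>
<string>The app needs camera access to scan documents.</string>
```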

Step 5: Testing the App

  1. Build and run the app on a physical iOS device.
  2. Tap the "Scan Document" button.
  3. The document scanner view will be presented.
  4. Scan a document, and upon completion, the scanned text (or placeholder text) will be displayed in the main view.

Enhancements and Further Steps

The basic app can be enhanced in the following ways:

  • Optical Character Recognition (OCR): Implement OCR to extract text from the scanned images. Apple's Vision framework provides text recognition out of the box, and third-party libraries (e.g., Tesseract) can be integrated as well.
  • Image Processing: Apply filters to enhance image quality, adjust perspective, and correct contrast.
  • Document Storage: Allow users to save the scanned documents locally or to cloud services.
  • UI/UX Improvements: Enhance the user interface with additional controls and feedback mechanisms.
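As one example of the document-storage enhancement, scanned pages could be written to the app's Documents directory. The helper name and JPEG quality below are illustrative assumptions, not part of the original app:

```swift
import UIKit

// Illustrative helper: save a scanned page as a JPEG in the app's
// Documents directory and return the file URL on success.
func saveScannedPage(_ image: UIImage, name: String) -> URL? {
    guard let data = image.jpegData(compressionQuality: 0.8) else { return nil }
    let documents = FileManager.default.urls(for: .documentDirectory,
                                             in: .userDomainMask)[0]
    let fileURL = documents.appendingPathComponent("\(name).jpg")
    do {
        try data.write(to: fileURL, options: .atomic)
        return fileURL
    } catch {
        print("Failed to save scanned page: \(error)")
        return nil
    }
}
```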

Integrating OCR with the Vision framework could look like the following (a conceptual sketch that recognizes text in a single image):


import Vision
import UIKit

func performOCR(on image: UIImage, completion: @escaping (String?) -> Void) {
    guard let cgImage = image.cgImage else {
        completion(nil)
        return
    }
    
    let requestHandler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    let request = VNRecognizeTextRequest { (request, error) in
        guard let observations = request.results as? [VNRecognizedTextObservation], error == nil else {
            completion(nil)
            return
        }
        
        let recognizedText = observations.compactMap { observation in
            return observation.topCandidates(1).first?.string
        }.joined(separator: "\n")
        
        completion(recognizedText)
    }
    
    do {
        try requestHandler.perform([request])
    } catch {
        print("Error performing OCR: \(error)")
        completion(nil)
    }
}

Here is how performOCR can be used in the Coordinator, replacing the placeholder:


func documentCameraViewController(_ controller: VNDocumentCameraViewController, didFinishWith scan: VNDocumentCameraScan) {
    print("Document scanned!")

    // performOCR completes asynchronously, so collect per-page results and
    // dismiss the camera only after every page has been processed.
    let ocrGroup = DispatchGroup()
    var pageResults = [String](repeating: "", count: scan.pageCount)

    for pageIndex in 0 ..< scan.pageCount {
        let scannedImage = scan.imageOfPage(at: pageIndex)

        ocrGroup.enter()
        performOCR(on: scannedImage) { recognizedTextResult in
            pageResults[pageIndex] = "Scanned Page \(pageIndex + 1):\n\(recognizedTextResult ?? "(Text not recognized)")\n"
            ocrGroup.leave()
        }
    }

    // Runs once all completion handlers have fired, keeping pages in order.
    ocrGroup.notify(queue: .main) { [self] in
        recognizedText += pageResults.joined()
        presentationMode.wrappedValue.dismiss()
    }
}

Conclusion

With SwiftUI and VisionKit, building a document scanner app on iOS becomes a manageable task. This guide provides a fundamental framework to get started: the described solution focuses on quickly capturing a document and can be extended to turn the scanned pages into usable text. Integrating features such as OCR, image enhancement, and document management would substantially increase the app's practical value and improve the overall user experience. As mobile technology evolves, scanning apps can keep pace, meeting growing user expectations and delivering intuitive digital document workflows.