ML Kit Document Scanner in action

Julien Salvi
Google Developer Experts
5 min read · Apr 15, 2024

Recently, ML Kit released a new library to digitize physical documents directly from your Android application. ML Kit Document Scanner lets users scan documents easily, with maximum control and minimal integration code. During my time at Aircall, I took some time to explore and integrate the library to build an Android-exclusive feature for the messaging scope.

Discovering the library

ML Kit Document Scanner is the newest member of the ML Kit family. It lets users convert physical documents into digital formats. The library brings this feature to any app, along with capabilities such as applying filters, cleaning the image, removing shadows and more. The entire scan flow is delivered by Play Services, so no camera permission is required and the library has a low impact on your app size (~300KB download size increase), since the ML models are also managed by Play Services. So let’s dig into what the library offers.

The ML Kit Document Scanner offers key capabilities like a high-quality user interface that ensures consistency across Android applications. With automatic capture and document detection, users can effortlessly scan documents, while accurate edge detection ensures optimal cropping results. Additionally, automatic rotation detection ensures that scanned documents are presented upright.

For developers seeking customization options, the Document Scanner API offers flexibility. You can set a limit on the number of pages scanned and enable or disable the capability to import from the photo gallery. Moreover, there are 3 editing modes available: basic editing capabilities, editing with image filters or editing with ML-enabled image cleaning capabilities (erase stains, fingers…).

You will find a complete overview of the library in the official documentation.

Build with Document Scanner

Crafting the feature

It’s now time to craft our new feature with ML Kit Document Scanner. Here, we want to let the user scan a document that is converted into a PDF file and then sent as an MMS to a recipient. Let’s see how we did it!

First, let’s add the dependency to your libs.versions.toml file:

[versions]
# ML Kit Document Scanner
mlkit-doc-scanner = "16.0.0-beta1"

[libraries]
mlkit-doc-scanner = { module = "com.google.android.gms:play-services-mlkit-document-scanner", version.ref = "mlkit-doc-scanner" }

The library can then be added to the build.gradle file of the module of your choice in your project.

dependencies {
    // ML Kit Document Scanner
    implementation(libs.mlkit.doc.scanner)
}

In your Activity or Fragment, you are going to set up the Document Scanner client with a handful of options. Here, for example, we don’t allow the user to import photos from the gallery when building the document, we add a 10-page limit, we enable the scanner mode with full capabilities and we request the scan result as a PDF file (we could use RESULT_FORMAT_JPEG if we wanted the result as JPEG images). We are now almost good to go!

private var documentScannerClient: GmsDocumentScanner? = null
documentScannerClient = GmsDocumentScanning.getClient(buildScannerOptions())

// Set the options that suit your use case
private fun buildScannerOptions() = GmsDocumentScannerOptions.Builder()
    .setGalleryImportAllowed(false)
    .setPageLimit(10)
    .setResultFormats(RESULT_FORMAT_PDF)
    .setScannerMode(SCANNER_MODE_FULL)
    .build()
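If your use case is closer to sharing individual images than a single PDF, the same builder can be configured differently. The snippet below is only a sketch of such a variant: the function name buildImageScannerOptions and the values chosen (gallery import enabled, 3 pages, filters without ML cleaning) are assumptions for illustration, not what we shipped.

```kotlin
import com.google.mlkit.vision.documentscanner.GmsDocumentScannerOptions

// Hypothetical variant of buildScannerOptions(): values are illustrative only.
fun buildImageScannerOptions(): GmsDocumentScannerOptions =
    GmsDocumentScannerOptions.Builder()
        // Let the user pick existing photos in addition to camera captures
        .setGalleryImportAllowed(true)
        // Keep the scan short
        .setPageLimit(3)
        // Ask for both JPEG pages and a PDF in the same result
        .setResultFormats(
            GmsDocumentScannerOptions.RESULT_FORMAT_JPEG,
            GmsDocumentScannerOptions.RESULT_FORMAT_PDF,
        )
        // Basic editing plus image filters, without the ML-enabled cleaning
        .setScannerMode(GmsDocumentScannerOptions.SCANNER_MODE_BASE_WITH_FILTER)
        .build()
```

The three scanner modes (SCANNER_MODE_BASE, SCANNER_MODE_BASE_WITH_FILTER and SCANNER_MODE_FULL) map to the three editing modes described earlier, so picking one is mostly a product decision.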

Last step! Here we need to register an ActivityResultLauncher to get the results from the Intent triggered by the Document Scanner client. Once registered, you will be able to process the result in order to retrieve the paths of the images or PDF files that have been produced by the library. Here, we only want the scan as a PDF file in order to send it as an MMS.

private lateinit var scannerLauncher: ActivityResultLauncher<IntentSenderRequest>

private var documentScannerClient: GmsDocumentScanner? = null

// Register the launcher early in the lifecycle (e.g. in onCreate), before the Activity or Fragment is started
scannerLauncher = registerForActivityResult(ActivityResultContracts.StartIntentSenderForResult()) { activityResult ->
    val resultCode = activityResult.resultCode
    val result = GmsDocumentScanningResult.fromActivityResultIntent(activityResult.data)
    if (resultCode == Activity.RESULT_OK && result != null) {
        // Use result.pages to access the image URIs
        result.pdf?.uri?.path?.let { path ->
            viewModel.onDocumentScanned(path)
        }
    } else if (resultCode == Activity.RESULT_CANCELED) {
        // The user left the scanner without validating: nothing to do
    } else {
        // Notify the user that something went wrong
    }
}
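We only needed the PDF, but the result can also carry one image per page when RESULT_FORMAT_JPEG is requested. As a rough sketch, the whole result could be flattened into a small holder like the one below. ScannedDocument and toScannedDocument are hypothetical names introduced for this example, they are not part of ML Kit.

```kotlin
import android.net.Uri
import com.google.mlkit.vision.documentscanner.GmsDocumentScanningResult

// Hypothetical holder for what the scanner produced (not an ML Kit type).
data class ScannedDocument(
    val pageImageUris: List<Uri>, // empty when JPEG output was not requested
    val pdfUri: Uri?,             // null when PDF output was not requested
    val pdfPageCount: Int,
)

// Hypothetical extension that reads both the pages and the PDF from the result.
fun GmsDocumentScanningResult.toScannedDocument(): ScannedDocument =
    ScannedDocument(
        pageImageUris = pages.orEmpty().map { page -> page.imageUri },
        pdfUri = pdf?.uri,
        pdfPageCount = pdf?.pageCount ?: 0,
    )
```

Inside the launcher callback, this would be called on the non-null result in the RESULT_OK branch, for example to show a page count before sending.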

What ML Kit returns is the cache path of the URI. In order to process the file and send the PDF to our servers through the Aircall API, we need to get the external (content://) URI of that file with androidx.core.content.FileProvider.

val fileProviderAuthority = "${BuildConfig.APPLICATION_ID}.fileprovider"
val externalUri = FileProvider.getUriForFile(context, fileProviderAuthority, File(path))
// Process the external file URI
controller.onFilesSelected(externalUri.toString())

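One last thing to make it work: FileProvider has to be declared in your AndroidManifest.xml with the same authority as the one used above, and it needs access to the cache directory. Do not forget to grant that access by adding the following configuration file in your res/xml folder: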

<paths>
    <cache-path name="cache_files" path="."/>
</paths>

Finally, to trigger the scanning flow, you need to get the scan Intent from the Client and then send the request thanks to the ActivityResultLauncher you just defined above. This flow can be triggered by a click on a button for example. Now you are all set! 🚀

documentScannerClient?.getStartScanIntent(requireActivity())
    ?.addOnSuccessListener { intentSender ->
        scannerLauncher.launch(IntentSenderRequest.Builder(intentSender).build())
    }
    ?.addOnFailureListener { e: Exception ->
        // Oops an error occurred
    }
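If the scan can be started from several places, the call above can be wrapped in a small helper. This is only a sketch under assumptions: startDocumentScan and its onError callback are names made up for this example, and the client and launcher are the ones created in the previous steps.

```kotlin
import android.app.Activity
import androidx.activity.result.ActivityResultLauncher
import androidx.activity.result.IntentSenderRequest
import com.google.mlkit.vision.documentscanner.GmsDocumentScanner

// Hypothetical helper: asks the client for the scan IntentSender and launches it.
fun startDocumentScan(
    activity: Activity,
    scanner: GmsDocumentScanner,
    launcher: ActivityResultLauncher<IntentSenderRequest>,
    onError: (Exception) -> Unit,
) {
    scanner.getStartScanIntent(activity)
        .addOnSuccessListener { intentSender ->
            // The scanner UI is hosted by Play Services and started through the launcher
            launcher.launch(IntentSenderRequest.Builder(intentSender).build())
        }
        .addOnFailureListener { e ->
            // Let the caller decide how to surface the error
            onError(e)
        }
}
```

A click listener on a scan button (any button of your screen) could then call startDocumentScan with requireActivity(), the non-null client and scannerLauncher, and show an error message in onError.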

With a few lines of code, you can now take advantage of the ML Kit Document Scanner capabilities to enhance existing features or create new ones in your Android applications.

The result

We now have our new document scanner capabilities integrated within our messaging feature 🥳

Scan and clean your document with ML Kit
Validate and send the scanned document as a PDF file

We were able to build a POC within a day in order to showcase the great capabilities of the Document Scanner library. A few weeks later, this new feature was released to production and our customers can now benefit from this nice enhancement when sending MMS.

Although the library is still in its beta phase and could see some nice-to-have improvements in the future, such as control over PDF generation aspects like image compression, it has enabled us to introduce an Android-exclusive feature into the Aircall Android app.

Do not hesitate to ping me on Twitter if you have any questions 🤓

Google Developer Expert for Android — Lead Android Engineer @ Aircall (Paris) — Startup way of life, beer lover and world traveler.