Overview

Introduction

OpenCV.js is a JavaScript binding for a selected subset of OpenCV functions for the web platform. It allows emerging web applications that do multimedia processing to benefit from the wide variety of vision functions available in OpenCV. OpenCV.js leverages Emscripten to compile OpenCV functions into asm.js or WebAssembly targets, and provides JavaScript APIs through which web applications can access them.

However, the performance of OpenCV.js still lags far behind native code, and it cannot handle real-time tasks such as face detection and face recognition very well. The main reason is that the current version of OpenCV.js runs on a single thread without SIMD, which leaves most of the CPU's parallel computing power unused.

WebAssembly, however, can narrow the performance gap between the web and native code. WebAssembly now supports multi-threading through Web Workers and SharedArrayBuffer, and is adding a new v128 value type for SIMD. Both features improve parallel computing capability on the web.

Therefore, the main goal of this project is to speed up OpenCV.js with multi-threading and SIMD.

Work structure

Create the base of OpenCV.js performance test

Benchmark.js is a benchmarking library that supports high-resolution timers and returns statistically significant results, and the OpenCV.js performance test tool is built on it. We added three kernels of the imgproc module to these performance tests: cvtColor, Resize and Threshold. All of the performance tests are based on the corresponding native OpenCV performance tests.
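
To make concrete what the Threshold case computes, here is a minimal plain-JavaScript sketch of the THRESH_BINARY rule on a CV_8UC1 image (dst = src > thresh ? maxval : 0), timed with performance.now(). The helpers thresholdBinary and meanTimeMs are hypothetical illustrations, not part of OpenCV.js or its perf test tool, and a hand-rolled mean is much cruder than the statistics Benchmark.js reports.

```javascript
// Hypothetical scalar reference for THRESH_BINARY on a single-channel
// 8-bit image (CV_8UC1) stored row-major in a Uint8Array.
// Illustrative only; this is not the OpenCV.js implementation.
function thresholdBinary(src, thresh, maxval) {
  const dst = new Uint8Array(src.length);
  for (let i = 0; i < src.length; i++) {
    // THRESH_BINARY: pixels above the threshold become maxval, others 0.
    dst[i] = src[i] > thresh ? maxval : 0;
  }
  return dst;
}

// Hypothetical helper: average wall-clock time of fn over `runs` calls,
// measured with the high-resolution performance.now() timer.
// Benchmark.js adds repeated sampling and statistical analysis on top.
function meanTimeMs(fn, runs) {
  fn(); // one untimed call so JIT compilation is not included
  const start = performance.now();
  for (let r = 0; r < runs; r++) fn();
  return (performance.now() - start) / runs;
}

// Synthetic 1920x1080 gradient image, matching the test's image size.
const width = 1920;
const height = 1080;
const src = new Uint8Array(width * height);
for (let i = 0; i < src.length; i++) src[i] = i % 256;

const ms = meanTimeMs(() => thresholdBinary(src, 127, 255), 20);
console.log(`scalar JS threshold, 1920x1080: ${ms.toFixed(3)} ms/run`);
```

The printed time is only a rough, machine-dependent scalar baseline and is not comparable with the Result tables.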

To run the performance tests, launch a local web server in the <build_dir>/bin folder, for example Node's http-server, which serves on localhost:8080. To test threshold, navigate the web browser to http://localhost:8080/perf/perf_imgproc/perf_threshold.html, enter a test parameter such as (1920x1080, CV_8UC1, THRESH_BINARY), and click the Run button to run that case. If you don't enter a parameter, all the cases of this kernel are run.

You can also run the tests with Node.js. For example, to run threshold with the parameter (1920x1080, CV_8UC1, THRESH_BINARY):

cd bin/perf
npm install
node perf_threshold.js --test_param_filter="(1920x1080, CV_8UC1, THRESH_BINARY)"

Optimize the OpenCV.js performance by WebAssembly threads

WebAssembly now supports multi-threading through Web Workers and SharedArrayBuffer (i.e., WebAssembly threads). Developers can use Emscripten to translate pthreads-based native code into WebAssembly code that runs on Web Workers and SharedArrayBuffer. We leverage this capability to translate OpenCV's pthreads-based parallel implementation into equivalent WebAssembly code. The multithreaded version of OpenCV.js keeps a pool of Web Workers and schedules a worker whenever a new thread is spawned. This optimization can only be used in the browser, because Node.js does not provide the Web Worker API.
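
Because the threads build depends on both Web Workers and SharedArrayBuffer, a page may want to check for them before loading it. The following is a minimal sketch under that assumption; hasSharedWasmMemory and supportsWasmThreads are hypothetical helpers, not OpenCV.js APIs. Current browsers may additionally expose SharedArrayBuffer only on cross-origin-isolated pages, which shows up here as a false result.

```javascript
// Hypothetical check: can this engine create the shared WebAssembly memory
// that Emscripten's pthreads support relies on?
function hasSharedWasmMemory() {
  if (typeof SharedArrayBuffer !== "function") return false;
  try {
    // Shared memories must declare a maximum size (in 64 KiB pages).
    const mem = new WebAssembly.Memory({ initial: 1, maximum: 1, shared: true });
    // The backing store is a SharedArrayBuffer visible to every worker.
    return mem.buffer instanceof SharedArrayBuffer;
  } catch (e) {
    return false; // the engine does not support shared memories
  }
}

// Hypothetical check for the whole threads build: Web Workers act as the
// threads, so a global Worker constructor is required as well.
function supportsWasmThreads() {
  return typeof Worker === "function" && hasSharedWasmMemory();
}

console.log("WebAssembly threads usable:", supportsWasmThreads());
```

Under Node.js, which has no global Worker constructor, supportsWasmThreads() reports false even though shared memory itself may be available.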

We expose two new APIs, cv.parallel_pthreads_set_threads_num(number) and cv.parallel_pthreads_get_threads_num(): the former sets the number of threads dynamically, and the latter returns the current number of threads. By default, the number of threads equals the number of logical cores on the device.
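
A short usage sketch of the two thread-count APIs follows. The wrapper configureThreads and the stand-in fakeCv object are hypothetical: passing cv in as a parameter lets the snippet run without loading OpenCV.js, and navigator.hardwareConcurrency (the browser's logical-core count) is used as a fallback that mirrors the documented default.

```javascript
// Hypothetical wrapper around the two thread-count APIs described above.
// `cv` is the loaded OpenCV.js module (or any object with the same methods).
function configureThreads(cv, requested) {
  // Logical-core count where the runtime exposes it, otherwise 1.
  const logical =
    (typeof navigator !== "undefined" && navigator.hardwareConcurrency) || 1;
  // Keep the value a positive integer; default to the logical-core count.
  const n = Math.max(1, Math.floor(requested ?? logical));
  cv.parallel_pthreads_set_threads_num(n);
  // Read the value back to confirm what the module is actually using.
  return cv.parallel_pthreads_get_threads_num();
}

// Stand-in for the real module so the sketch runs without OpenCV.js.
const fakeCv = {
  threads: 1,
  parallel_pthreads_set_threads_num(n) { this.threads = n; },
  parallel_pthreads_get_threads_num() { return this.threads; },
};

console.log(configureThreads(fakeCv, 4)); // prints 4
```

In a real page you would pass the loaded cv module instead, for example from the cv['onRuntimeInitialized'] callback once the runtime is ready.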

Optimize the OpenCV.js performance by WebAssembly SIMD

WebAssembly is adding support for SIMD128 instructions (i.e., WebAssembly SIMD). This feature has landed in V8/Chromium behind a developer flag. On the tooling side, WebAssembly SIMD builtins have been added to the LLVM compiler, and Emscripten has released the first version of its WebAssembly intrinsics. We can therefore use the Emscripten LLVM upstream backend to translate native vectorized implementations into WebAssembly SIMD128 instructions and deploy them to browsers.
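
Since SIMD support varies between engines, one common approach is to ask the engine to validate a tiny module that uses v128 instructions. The sketch below does that with WebAssembly.validate, which checks the bytes without running anything. The byte sequence probes for the finalized SIMD proposal (the same idea feature-detection libraries use), so it may not match the early, flag-gated implementation described here; SIMD_PROBE and supportsWasmSimd are illustrative names.

```javascript
// A minimal module with one function () -> v128 whose body is
//   i32.const 0; i8x16.splat; i8x16.popcnt; end
// Engines without (final) SIMD support reject it during validation.
const SIMD_PROBE = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm", version 1
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7b,       // type section: () -> v128
  0x03, 0x02, 0x01, 0x00,                         // function section: 1 func, type 0
  0x0a, 0x0a, 0x01, 0x08, 0x00,                   // code section: 1 body of 8 bytes, no locals
  0x41, 0x00,                                     // i32.const 0
  0xfd, 0x0f,                                     // i8x16.splat
  0xfd, 0x62,                                     // i8x16.popcnt
  0x0b,                                           // end
]);

function supportsWasmSimd() {
  return WebAssembly.validate(SIMD_PROBE);
}

console.log("WebAssembly SIMD usable:", supportsWasmSimd());
```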

Today's OpenCV Universal Intrinsics implementation has multiple backends for different architectures, such as SSE, NEON, AVX and VSX. We therefore added a new WebAssembly SIMD backend using the LLVM WebAssembly builtins and the WebAssembly intrinsics.

We also enabled the Universal Intrinsics tests for WebAssembly by compiling the native intrinsics tests to WebAssembly. With these tests, we can easily check whether our WebAssembly backend implementation of Universal Intrinsics is correct, and it now passes all of them.

The SIMD optimization is experimental, because WebAssembly SIMD is still under development. The SIMD version of OpenCV.js built with the latest LLVM upstream may therefore not work in stable browsers or older versions of Node.js. To get the new features, please use the latest unstable browser release, such as Chrome Dev, or the latest Node.js.
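
Given that the SIMD build may not run everywhere, a page could select one of the four builds compared in the Result section at runtime. This hypothetical pickOpenCvBuild helper only maps detected features to a build label; the flags would come from feature checks (for example WebAssembly.validate for SIMD), and mapping labels to actual file names is an assumption left to the deployment.

```javascript
// Hypothetical helper: map feature-detection results to one of the four
// OpenCV.js builds (scalar, threads, simd, threads + simd).
function pickOpenCvBuild({ threads, simd }) {
  if (threads && simd) return "threads + simd"; // fastest when both work
  if (simd) return "simd";
  if (threads) return "threads";
  return "scalar"; // plain single-threaded build as the safe fallback
}

console.log(pickOpenCvBuild({ threads: false, simd: true })); // prints simd
```

Ordering the checks from most to least capable means a page always falls back to the scalar build rather than failing to load.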

Result

For OpenCV kernels, take the Threshold kernel with the parameter (1920x1080, CV_8UC1, THRESH_BINARY) as an example:

OS: Ubuntu 16.04.5

Emscripten: 1.38.42, LLVM upstream backend

Browser: Chrome, Version 78.0.3880.4 (Official Build) dev (64-bit)

Hardware: Intel® Core™ i7-8700 CPU @ 3.20GHz with 12 logical cores

OpenCV.js build    Mean time (ms)    Speedup (vs. scalar)
scalar             1.164             1
threads            0.261             4.45
simd               0.123             9.46
threads + simd     0.039             29.84

For a real-world case, take the OpenCV.js face recognition sample as an example:

OS: Ubuntu Linux 16.04.5

Emscripten: 1.38.42, LLVM upstream backend

Browser: Chrome, Version 78.0.3880.4 (Official Build) dev (64-bit)

Hardware: Intel® Core™ i7-8700 CPU @ 3.20GHz with 12 logical cores

OpenCV.js build    FPS    Speedup (vs. scalar)
scalar             3      1
threads            10     3.33
simd               12     4
threads + simd     26     8.6

Future Work

  1. Add more modules and kernels to the performance tests, such as core, features2d and video.

  2. Optimize the Universal Intrinsics WebAssembly backend as WebAssembly SIMD evolves.

OpenCV.js Demos

OpenCV.js Demos (may require the latest version of Chrome Dev)

My video report for GSoC on YouTube

Commits List

The PR

The list of my commits