Introduction
Intel releases pre-trained models through the Open Model Zoo, which fall into two main categories: Intel models and public models.
Intel provides quantized versions of some of its own models, but unfortunately not of the public models. This article introduces the procedure for generating quantized models using the OpenVINO™ DL Workbench.
This article targets Ubuntu 18.04 and the OpenVINO™ DL Workbench Docker image tag 2021.4.
The main steps are:
- Prepare the COCO dataset for the OpenVINO™ DL Workbench
- Start the Docker-based OpenVINO™ DL Workbench
- Convert the public model ssd_mobilenet_v2_coco to FP32 IR
- Generate INT8 IR from the FP32 IR of the public model ssd_mobilenet_v2_coco
1. Prepare the COCO dataset for OpenVINO™ DL Workbench
The OpenVINO™ DL Workbench expects datasets to be stored in a specific layout. Following the Validation Dataset tip displayed when selecting a dataset, the dataset must be pre-processed for the DL Workbench. The steps below show how to do this.
1.1 We will use the COCO 2017 dataset, so download the validation images and the annotation data.
1.2 Create a directory named coco2017, and within it create directories named annotations and val.
1.3 Unzip the downloaded image data and copy the images into the val directory.
1.4 Unzip the downloaded annotation data and copy the instances_val2017.json file it contains into the annotations directory.
1.5 Compress the coco2017 directory to create coco2017.zip. (The whole sequence is sketched as shell commands below.)
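As a minimal sketch, the preparation above can be scripted from a terminal as follows (the URLs are the standard COCO 2017 download links; adjust paths to your environment):
# download the COCO 2017 validation images and annotations
$ wget http://images.cocodataset.org/zips/val2017.zip
$ wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
# create the directory layout expected by DL Workbench
$ mkdir -p coco2017/annotations coco2017/val
# unpack the images into val and the validation annotations into annotations
$ unzip -q val2017.zip && cp val2017/* coco2017/val/
$ unzip -q annotations_trainval2017.zip
$ cp annotations/instances_val2017.json coco2017/annotations/
# compress the coco2017 directory into coco2017.zip
$ zip -rq coco2017.zip coco2017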
2. Launch the Docker-based OpenVINO™ DL Workbench
This section describes how to pull the DL Workbench Docker image and start DL Workbench.
2.1 Installing Docker
If you have not installed Docker, please refer to the following URL to install it.
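As one minimal sketch, Docker can be installed on Ubuntu from the distribution package (the official documentation describes other methods as well):
# install Docker from the Ubuntu repositories and verify the version
$ sudo apt-get update
$ sudo apt-get install -y docker.io
$ sudo docker --version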
2.2 Selecting an OpenVINO™ DL Workbench image
You will find a number of tags on Docker Hub. We will use 2021.4.
2.3 Pulling the Docker Image
First, pull the Docker image by copying the command for the 2021.4 tag.
In a terminal, run the copied command with sudo privileges:
$ sudo docker image pull openvino/workbench:2021.4
2.4 Starting DL Workbench
To get the command that starts DL Workbench, select your options at the following URL and use the command shown under Execute / Results.
Here, "Linux" is selected for "My OS", "CPU" for "Accelerators on my machine", "Docker command" for "Start DL Workbench", and "No" for "Proxy".
2.5 Executing DL Workbench startup commands
Execute the following command from the terminal:
$ sudo docker run -p 0.0.0.0:5665:5665 --name workbench -it openvino/workbench:2021.4
Wait a few minutes for DL Workbench to set up. The setup is complete when a message similar to the following appears.
2.6 Accessing DL Workbench
Open the link from the console log (http://127.0.0.1:5665 in this example) in a web browser. If the following screen is displayed, DL Workbench has started successfully.
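Incidentally, the container started above can be managed with the standard Docker lifecycle commands (the container name workbench comes from the --name option in the run command):
# follow the container log to check setup progress again
$ sudo docker logs -f workbench
# stop the container, and restart it later with its saved state
$ sudo docker stop workbench
$ sudo docker start -a workbench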
3. Generating FP32 IR from the public model ssd_mobilenet_v2_coco
In this section we use the public model ssd_mobilenet_v2_coco, together with the dataset prepared at the beginning of this article.
3.1 Generating a DL Workbench Project
3.1.1 Click the "Create" button on the DL Workbench screen to create a project and move to the Create Project screen.
The following screen will be displayed.
3.1.2 Importing a Model
Click the "Import button" next to Model.
Click "ssd_mobilenet_v2_coco" to highlight it and click "Download and import button".
3.1.3 Testing IR file generation
Check if the imported model can be used to generate an IR file.
After "1. Import" and "2. Prepare Environment" are completed, select FP32 in "Precision" in "3.
Return to the Create Project screen and confirm that the IR for FP32 has been successfully generated.
3.1.4 Importing a Dataset
Scroll down the Create Project screen and click the "Import" button under "Validation Dataset".
Once on the Import Dataset screen, click the "Upload Dataset" tab.
Click the "Select" button under "Select File" to choose the prepared dataset coco2017.zip, then click the "Import" button.
The upload then completes successfully.
3.1.5 Generating a DL Workbench Project (continued)
In the Create Project screen, select and highlight "ssd_mobilenet_v2_coco" in "Model".
In "Environment", select and highlight "CPU".
Under "Validation Dataset" select and highlight "coco2017".
Click on the "Create button" at the bottom of the screen as it becomes active to create the project.
The following screen will appear during project creation.
The project was successfully created, and the FP32 IR shows a Throughput of 147.3 FPS and a Latency of 6.39 ms.
3.2 Checking Accuracy
Next, check the Accuracy of the generated project.
3.2.1 Since Accuracy shows N/A, click the screw icon next to the network model to calculate the mAP.
3.2.2 Click the "Run Accuracy Check" button.
The Accuracy (mAP) calculation is now complete. The result is 24.95.
Next, an INT8 quantized model is generated from the FP32 IR.
4. Generating INT8 IR from the FP32 IR of the public model ssd_mobilenet_v2_coco
4.1 Scroll down the screen and click the "Perform" tab.
4.2 In the "Optimize Performance tab", leave the "Optimize Performance tab" as it is.
Under "Optimization Method," select "INT8" and click the "Optimize button.
You are now on the Optimize INT8 screen.
4.3 In "Subset Size" you can set the ratio of the selected dataset that is used for calibration. In this case, we leave it at the default value of 100%.
4.4 "Optimize Methods" and "Calibration Schemes" are first set to the defaults of
Leave "Default Method" and "Performance Preset" as they are, and click the "Optimize Button".
(If the generated INT8 mAP is bad, try a non-default setting.)
The generated INT8 IR shows a Throughput of 310.53 FPS and a Latency of 2.76 ms, a significant performance improvement over the FP32 IR (Throughput 147.3 FPS, Latency 6.39 ms).
4.5 Since Accuracy again shows N/A, click the screw icon next to the network model to calculate its mAP.
4.6 Click the "Run Accuracy Check" button.
The Accuracy (mAP) calculation is now complete. The result is 24.7, confirming that accuracy has hardly decreased at all. (The Accuracy result for FP32 is 24.95.)
4.7 Next, export the generated INT8 IR.
Click the "Export Project" tab within the "Perform" tab.
4.8 Set "Include Model" to "Yes" and click the "Export Button".
The model is now exported to the download directory.
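As a quick sanity check, the exported IR can be benchmarked with the benchmark_app tool bundled with the OpenVINO™ toolkit (a sketch; the archive and file names below are placeholders for what DL Workbench actually produces):
# unpack the exported project archive (file name is a placeholder)
$ unzip project.zip
# benchmark the INT8 IR on the CPU
$ benchmark_app -m ssd_mobilenet_v2_coco.xml -d CPU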
YOLO-based network models can also be quantized in the same way, so please try it out.
5. Summary
As you can see, OpenVINO™ DL Workbench makes it easy to generate quantized IRs from public models.
The 11th generation Intel® Core™ processors, such as Tiger Lake, support quantized IR not only on the CPU but also on the integrated GPU, which improves performance even without an external accelerator. These processors also offer long-term supply, making them well suited to mass production. Please give it a try.
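For example, on such a processor the same exported IR can be benchmarked on the integrated GPU simply by switching the target device (again a sketch with a placeholder file name):
# benchmark the INT8 IR on the integrated GPU
$ benchmark_app -m ssd_mobilenet_v2_coco.xml -d GPU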
Reference links
- Performance Evaluation of Tiger Lake Quantization in IR Format (coming soon)
- OpenVINO™ Toolkit Video (Renewal)
- Convert YOLOv4 to IR format for use with the OpenVINO™ toolkit
- Summary of demos included with OpenVINO™ (based on 2020.3)