
An NNIE quantization-aware training tool for PyTorch.

Project description

nnieqat-pytorch

This is a quantization-aware training (QAT) package for the Neural Network Inference Engine (NNIE) on PyTorch. It uses the HiSilicon quantization library to quantize each module's weights and input data into a fake-quantized fp32 format. To train a model that is friendlier to NNIE, import nnieqat and replace the default torch.nn modules with the corresponding nnieqat ones.
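
To make "fake fp32 format" concrete, here is a minimal sketch of fake quantization: values are rounded to a small set of levels but are still stored as ordinary floats. It assumes a generic symmetric, uniform 8-bit scheme purely for illustration; it is not the HiSilicon library's algorithm, and `fake_quantize` is a hypothetical helper, not part of nnieqat.

```python
def fake_quantize(values, num_bits=8):
    """Round floats to a symmetric uniform grid and return them as floats.

    Illustrative only: a generic scheme, not the HiSilicon quantizer.
    """
    qmax = 2 ** (num_bits - 1) - 1           # largest integer code, 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)                  # all zeros, nothing to scale
    scale = max_abs / qmax                   # float distance between grid points
    out = []
    for v in values:
        q = round(v / scale)                 # nearest integer code
        q = max(-qmax, min(qmax, q))         # clamp to the representable range
        out.append(q * scale)                # map back to a float ("fake" fp32)
    return out


weights = [0.5, -1.27, 0.003, 1.0]
print(fake_quantize(weights))                # 0.003 collapses to 0.0 on this grid
```

The output is still a list of floats, so the rest of the training code runs unchanged, but the values now carry the rounding error the deployed model would see.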

Table of Contents

  1. Installation
  2. Usage
  3. Code Examples
  4. Results
  5. Todo
  6. Reference

Installation

  • Supported Platforms: Linux
  • Accelerators and GPUs: NVIDIA GPUs via CUDA driver 10.
  • Dependencies:
    • python >= 3.5, < 4
    • llvmlite >= 0.31.0
    • pytorch >= 1.0
    • numba >= 0.42.0
    • numpy >= 1.18.1
  • Install nnieqat from PyPI: $ pip install nnieqat

Usage

  • Replace the default modules with their NNIE quantization-optimized counterparts. These include:

    • torch.nn.modules.conv -> nnieqat.modules.conv
    • torch.nn.modules.linear -> nnieqat.modules.linear
    • torch.nn.modules.pooling -> nnieqat.modules.pooling
    from nnieqat.modules import convert_layers
    ...
    ...
    model = convert_layers(model)
    print(model)  # Quantized layers have a "Quantized" prefix.
    ...
    
  • Freeze the batch normalization (BN) layers after a few epochs of training:

    from nnieqat.gpu.quantize import freeze_bn
    ...
    ...
        if epoch > 2:
            net.apply(freeze_bn)
    ...
    
  • Restore the unquantized weights before the optimizer updates them:

    from nnieqat.gpu.quantize import unquant_weight
    ...
    ...
        net.apply(unquant_weight)
        optimizer.step()
    ...
    
  • Save a checkpoint with quantized weights, then restore the unquantized weights:

    from nnieqat.gpu.quantize import quant_weight, unquant_weight
    ...
    ...
        net.apply(quant_weight)
        save_checkpoint(...)
        net.apply(unquant_weight)
    ...
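
The quantize, restore, and update ordering in the steps above can be sketched without PyTorch. The toy below assumes the layer keeps a full-precision copy of its weight while the quantized value is in use; that is an assumption about the general technique, not a description of nnieqat internals. `ToyLayer`, `toy_quant_weight`, `toy_unquant_weight`, `train_step`, and `snapshot_quantized` are hypothetical names, and the 0.25 grid step is arbitrary.

```python
class ToyLayer:
    """Hypothetical one-weight layer; not an nnieqat class."""

    def __init__(self, weight):
        self.weight = weight   # value used by the forward pass
        self.saved = None      # full-precision copy while quantized


def toy_quant_weight(layer, step=0.25):
    """Save the full-precision weight, then snap the live weight to a grid."""
    if layer.saved is None:
        layer.saved = layer.weight
        layer.weight = round(layer.weight / step) * step


def toy_unquant_weight(layer):
    """Put the saved full-precision weight back."""
    if layer.saved is not None:
        layer.weight = layer.saved
        layer.saved = None


def train_step(layer, x, target, lr=0.1):
    """One gradient step on the loss (w * x - target) ** 2."""
    toy_quant_weight(layer)                 # forward pass sees the quantized weight
    pred = layer.weight * x
    grad = 2.0 * (pred - target) * x        # derivative of the loss w.r.t. w
    toy_unquant_weight(layer)               # restore full precision before updating
    layer.weight -= lr * grad               # update the full-precision weight
    return grad


def snapshot_quantized(layer):
    """Mimic the save step: quantize, copy the value out, then restore."""
    toy_quant_weight(layer)
    snapshot = layer.weight
    toy_unquant_weight(layer)
    return snapshot


layer = ToyLayer(0.6)
train_step(layer, x=1.0, target=1.0)
print(layer.weight, snapshot_quantized(layer))
```

In this toy, applying the update to the quantized value instead would let rounding erase small steps, which is one plausible reason to restore the weights before calling optimizer.step().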
    

Code Examples

Results

  • ImageNet

    python test/test_imagenet.py /data/imgnet/ --arch squeezenet1_1  --lr 0.001 --pretrained --epoch 10   # nnie_lr_e-3_ft
    python pytorh_imagenet_main.py /data/imgnet/ --arch squeezenet1_1  --lr 0.0001 --pretrained --epoch 10  # lr_e-4_ft
    python test/test_imagenet.py /data/imgnet/ --arch squeezenet1_1  --lr 0.0001 --pretrained --epoch 10  # nnie_lr_e-4_ft
    

    Fine-tuning results:

    model            trt_fp32   trt_int8   nnie
    torchvision      0.56992    0.56424    0.56026
    nnie_lr_e-3_ft   0.56600    0.56328    0.56612
    lr_e-4_ft        0.57884    0.57502    0.57542
    nnie_lr_e-4_ft   0.57834    0.57524    0.57730
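
As a reading aid, the short script below computes two differences from the scores above: how much each model loses going from trt_fp32 to nnie, and how much its nnie score improves on the torchvision baseline. The numbers are copied from the results; the variable names are illustrative, and the source does not state which metric the scores represent.

```python
# Scores copied from the fine-tuning results above.
results = {
    "torchvision":    {"trt_fp32": 0.56992, "trt_int8": 0.56424, "nnie": 0.56026},
    "nnie_lr_e-3_ft": {"trt_fp32": 0.56600, "trt_int8": 0.56328, "nnie": 0.56612},
    "lr_e-4_ft":      {"trt_fp32": 0.57884, "trt_int8": 0.57502, "nnie": 0.57542},
    "nnie_lr_e-4_ft": {"trt_fp32": 0.57834, "trt_int8": 0.57524, "nnie": 0.57730},
}

baseline_nnie = results["torchvision"]["nnie"]

for name, row in results.items():
    fp32_to_nnie_drop = row["trt_fp32"] - row["nnie"]   # loss caused by NNIE quantization
    gain_over_baseline = row["nnie"] - baseline_nnie    # improvement in the nnie score
    print(f"{name:<16} drop={fp32_to_nnie_drop:+.5f} gain={gain_over_baseline:+.5f}")
```

For example, the trt_fp32 to nnie drop is 0.00966 for the torchvision baseline and 0.00104 for nnie_lr_e-4_ft.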

Todo

  • Multi-GPU training support.
  • Support for other platforms and accelerators.
  • Generate the quantized model directly.

Reference

HiSVP Quantization Library User Guide (HiSVP 量化库使用指南)

Quantizing deep convolutional networks for efficient inference: A whitepaper

8-bit Inference with TensorRT

Distilling the Knowledge in a Neural Network
