Easy-to-use UI for automatically sparsifying neural networks and creating sparsification recipes for better inference performance and a smaller footprint

Sparsify [Alpha]

ML model optimization product to accelerate inference

🚨 February 2024: Important Sparsify Update

The Neural Magic team is pausing the Sparsify Alpha at this time. We are refocusing efforts around a new exciting project to be announced in the coming months. Thank you for your continued support and stay tuned!

🚨 October 2023: Important Sparsify Announcement

Given our new focus on enabling sparse large language models (LLMs) to run competitively on CPUs, Sparsify Alpha is undergoing upgrades to focus on fine-tuning and optimizing LLMs. This means that we will no longer be providing bug fixes, prioritizing support, or building new features and integrations for non-LLM flows including the CV and NLP Sparsify Pathways.

Neural Magic is super excited about these new efforts in building Sparsify into the best LLM fine-tuning and optimization tool on the market over the coming months and we cannot wait to share more soon. Thanks for your continued support!

🚨 July 2023: Sparsify's next generation is now in alpha as of version 1.6.0!

Sparsify enables you to accelerate inference without sacrificing accuracy by applying state-of-the-art pruning, quantization, and distillation algorithms to neural networks with a simple web application and one-command API calls.

Sparsify empowers you to compress models through two components:

Sparsify Cloud - a web application that allows you to create and manage Sparsify Experiments, explore hyperparameters, predict performance, and compare results across both Experiments and deployment scenarios.
Sparsify CLI/API - a Python package and GitHub repository that allows you to run Sparsify Experiments locally, sync with the Sparsify Cloud, and integrate them into your workflows.

Quickstart Guide
Companion Guides
Resources

Quickstart Guide

Interested in test-driving our alpha? Get a sneak peek and influence the product's development process. Thank you in advance for your feedback and interest!

This quickstart details several pathways you can work through. We encourage you to explore one for Sparsify's full benefits. When you finish the quickstart, sparsifying your models is as easy as:

sparsify.run sparse-transfer --use-case image_classification --data imagenette --optim-level 0.5

1. Install and Setup

1.1 Verify Prerequisites

First, verify that you have the correct software and hardware to run the Sparsify Alpha.

Software

Sparsify is tested on Python 3.8 and 3.10, ONNX 1.5.0-1.12.0, ONNX opset version 11+, and manylinux-compliant systems. Sparsify is not supported natively on Windows or macOS.

Additionally, installing from PyPI requires pip 20.3+.
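
The software requirements above can be checked before installing. The following is a minimal sketch and is not part of Sparsify: the function names are hypothetical, and treating every minor version from 3.8 through 3.10 as acceptable is an assumption, since only 3.8 and 3.10 are listed as tested.

```python
import platform
import re
import sys
from importlib import metadata  # available in Python 3.8+


def version_tuple(text):
    """Turn a version string such as '20.3.4' into (20, 3)."""
    numbers = re.findall(r"\d+", text)
    return tuple(int(n) for n in numbers[:2])


def check_software():
    """Return a dict mapping each prerequisite to True or False."""
    results = {}
    # Assumption: any minor version from 3.8 through 3.10 is acceptable.
    results["python"] = (3, 8) <= sys.version_info[:2] <= (3, 10)
    # Sparsify targets manylinux systems, so require Linux.
    results["linux"] = platform.system() == "Linux"
    try:
        results["pip"] = version_tuple(metadata.version("pip")) >= (20, 3)
    except metadata.PackageNotFoundError:
        results["pip"] = False
    return results


if __name__ == "__main__":
    for name, ok in check_software().items():
        print(f"{name}: {'ok' if ok else 'check failed'}")
```

The ONNX version and opset are not checked here because they depend on the model you plan to sparsify.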

Hardware

Sparsify requires a GPU with CUDA and cuDNN in order to sparsify neural networks. We recommend a Linux system with a CUDA-enabled GPU that has at least 16GB of GPU memory, 128GB of RAM, and 4 CPU cores. If you are sparsifying a very large model, you may need more RAM than the recommended 128GB. If you encounter issues setting up your training environment, file a GitHub issue.
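
To compare a machine against the GPU memory recommendation above, one option is to read the totals reported by nvidia-smi. This is a sketch under assumptions rather than a Sparsify utility: the helper names are hypothetical, and it presumes the NVIDIA driver tools are installed.

```python
import shutil
import subprocess

# nvidia-smi query that prints one integer (MiB) per GPU, without header or units.
QUERY = ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"]


def parse_gpu_memory(output):
    """Parse nvidia-smi output into a list of per-GPU memory sizes in MiB."""
    lines = (line.strip() for line in output.splitlines())
    return [int(line) for line in lines if line.isdigit()]


def gpus_meeting_minimum(memory_mib, minimum_gib=16):
    """Return the indices of GPUs with at least `minimum_gib` GiB of memory."""
    return [i for i, mib in enumerate(memory_mib) if mib >= minimum_gib * 1024]


def query_gpu_memory():
    """Run nvidia-smi and return per-GPU memory in MiB ([] if unavailable)."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True)
    return parse_gpu_memory(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    memory = query_gpu_memory()
    print("GPU memory (MiB):", memory)
    print("GPUs with at least 16 GiB:", gpus_meeting_minimum(memory))
```

Cards sold as 16GB may report slightly less than 16384 MiB, so treat a near miss as a prompt to check the hardware rather than as a hard failure.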

1.2 Create an Account

Creating an account is a simple, free, one-time step.
An account is required to manage your Experiments and API keys.
Visit Neural Magic's Web App Platform and create an account by entering your email, name, and a unique password. If you already have a Neural Magic Account, sign in with your email.

1.3 Install Sparsify

pip is the preferred method for installing Sparsify. We advise creating a fresh virtual environment to avoid dependency issues.

Install with pip using:

pip install sparsify-nightly
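
The fresh-environment advice above can also be scripted with Python's standard venv module. The sketch below is illustrative only: the function names and the directory name in the usage note are hypothetical, and the bin/python path assumes the Linux layout that Sparsify targets.

```python
import subprocess
import venv
from pathlib import Path


def pip_install_command(env_dir, package="sparsify-nightly"):
    """Build the command that installs `package` with the environment's own pip."""
    python = Path(env_dir) / "bin" / "python"  # Linux virtual environment layout
    return [str(python), "-m", "pip", "install", package]


def create_env_and_install(env_dir):
    """Create a fresh virtual environment and install Sparsify into it."""
    venv.create(env_dir, with_pip=True)  # with_pip=True bootstraps pip into the env
    subprocess.run(pip_install_command(env_dir), check=True)
```

Calling create_env_and_install("sparsify-env") would create the environment in ./sparsify-env and run the same pip install shown above inside it.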

1.4 Log in via CLI

Next, with Sparsify installed on your training hardware:

Authorize the local CLI to access your account by running the sparsify.login command and providing your API key.
Locate your API key on the homepage of the Sparsify Cloud under the 'Get set up' modal, and copy the command or the API key itself.
Run the following command:

sparsify.login API_KEY
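
If you script the login step, you may prefer to read the API key from the environment instead of writing it into a script. The following sketch assumes an environment variable named SPARSIFY_API_KEY, which is a hypothetical name chosen for this example and not something Sparsify defines; only the sparsify.login command itself comes from this guide.

```python
import os
import subprocess


def login_command(environ=os.environ, variable="SPARSIFY_API_KEY"):
    """Build the `sparsify.login` command from an environment variable.

    SPARSIFY_API_KEY is a hypothetical variable name used only in this sketch.
    """
    api_key = environ.get(variable, "").strip()
    if not api_key:
        raise RuntimeError(f"set {variable} to the API key from Sparsify Cloud")
    return ["sparsify.login", api_key]


def login():
    """Run `sparsify.login API_KEY` with the key taken from the environment."""
    subprocess.run(login_command(), check=True)
```

Calling login() fails early with a clear message when the variable is missing, before the CLI is invoked.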

2. Run an Experiment

Experiments are the core of sparsifying a model. They allow you to apply sparsification algorithms to a dataset and model through the three Experiment types detailed below:

One-Shot
Training-Aware
Sparse-Transfer

All Experiments are run locally on your training hardware and can be synced with the cloud for further analysis and comparison, using Sparsify's two components:

Sparsify Cloud - explore hyperparameters, predict performance, and generate the desired CLI/API command.
Sparsify CLI/API - run an experiment.

2.1 One-Shot

Sparsity	Sparsification Speed	Accuracy
++	+++++	+++

One-Shot Experiments quickly sparsify your model post-training, providing a 3-5x speedup with minimal accuracy loss, ideal for quick model optimization without retraining your model.

To run a One-Shot Experiment for your model, dataset, and use case, use the following command:

sparsify.run one-shot --use-case USE_CASE --model MODEL --data DATASET --optim-level OPTIM_LEVEL

For example, to sparsify a ResNet-50 model on the ImageNet dataset for image classification, run the following commands:

wget https://public.neuralmagic.com/datasets/cv/classification/imagenet_calibration.tar.gz
mkdir -p ./imagenet_calibration
tar -xzf imagenet_calibration.tar.gz -C ./imagenet_calibration
sparsify.run one-shot --use-case image_classification --model "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/base-none" --data ./imagenet_calibration --optim-level 0.5

Or, to sparsify a BERT model on the SST2 dataset for sentiment analysis, run the following commands:

wget https://public.neuralmagic.com/datasets/nlp/text_classification/sst2_bert_calibration.tar.gz
tar -xzf sst2_bert_calibration.tar.gz
sparsify.run one-shot --use-case text_classification --model "zoo:nlp/sentiment_analysis/bert-base/pytorch/huggingface/sst2/base-none" --data ./sst2_bert_calibration --optim-level 0.5

To dive deeper into One-Shot Experiments, read through the One-Shot Experiment Guide.

Note, One-Shot Experiments currently require the model to be in an ONNX format and the dataset to be in a NumPy format. More details are provided in the One-Shot Experiment Guide.

2.2 Sparse-Transfer

Sparsity	Sparsification Speed	Accuracy
++++	++++	+++++

Sparse-Transfer Experiments quickly create a smaller and faster model for your dataset by transferring from a SparseZoo pre-sparsified foundational model, providing a 5-10x speedup with minimal accuracy loss, ideal for quick model optimization without retraining your model.

To run a Sparse-Transfer Experiment for your model (optional), dataset, and use case, run the following command:

sparsify.run sparse-transfer --use-case USE_CASE --model OPTIONAL_MODEL --data DATASET --optim-level OPTIM_LEVEL

For example, to sparse transfer a SparseZoo model to the Imagenette dataset for image classification, run the following command:

sparsify.run sparse-transfer --use-case image_classification --data imagenette --optim-level 0.5

Or, to sparse transfer a SparseZoo model to the SST2 dataset for sentiment analysis, run the following command:

sparsify.run sparse-transfer --use-case text_classification --data sst2 --optim-level 0.5

To dive deeper into Sparse-Transfer Experiments, read through the Sparse-Transfer Experiment Guide.

Note, Sparse-Transfer Experiments require the model to be saved in a PyTorch format corresponding to the underlying integration such as Ultralytics YOLOv5 or Hugging Face Transformers. Datasets must additionally match the expected format of the underlying integration. More details and exact formats are provided in the Sparse-Transfer Experiment Guide.

2.3 Training-Aware

Sparsity	Sparsification Speed	Accuracy
+++++	++	+++++

Training-Aware Experiments sparsify your model during training, providing a 6-12x speedup with minimal accuracy loss, ideal for thorough model optimization when the best performance and accuracy are required.

To run a Training-Aware Experiment for your model, dataset, and use case, run the following command:

sparsify.run training-aware --use-case USE_CASE --model OPTIONAL_MODEL --data DATASET --optim-level OPTIM_LEVEL

For example, to sparsify a ResNet-50 model on the Imagenette dataset for image classification, run the following command:

sparsify.run training-aware --use-case image_classification --model "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenette/base-none" --data imagenette --optim-level 0.5

Or, to sparsify a BERT model on the SST2 dataset for sentiment analysis, run the following command:

sparsify.run training-aware --use-case text_classification --model "zoo:nlp/sentiment_analysis/bert-base/pytorch/huggingface/sst2/base-none" --data sst2 --optim-level 0.5

To dive deeper into Training-Aware Experiments, read through the Training-Aware Experiment Guide.

Note that Training-Aware Experiments require the model to be saved in a PyTorch format corresponding to the underlying integration such as Ultralytics YOLOv5 or Hugging Face Transformers. Datasets must additionally match the expected format of the underlying integration. More details and exact formats are provided in the Training-Aware Experiment Guide.
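
The three command templates above share one shape, so a workflow script can assemble them from a single helper. This is a sketch, not part of the Sparsify API: build_run_command is a hypothetical name, and it only mirrors the flags shown in this guide (it does not validate use cases or the range of the optimization level).

```python
import shlex

EXPERIMENT_TYPES = ("one-shot", "sparse-transfer", "training-aware")


def build_run_command(experiment, use_case, data, optim_level, model=None):
    """Build a `sparsify.run` argument list from the templates in this guide."""
    if experiment not in EXPERIMENT_TYPES:
        raise ValueError(f"unknown experiment type: {experiment!r}")
    # The One-Shot template takes --model MODEL; the other two mark it optional.
    if experiment == "one-shot" and model is None:
        raise ValueError("one-shot experiments need a model")
    command = ["sparsify.run", experiment, "--use-case", use_case]
    if model is not None:
        command += ["--model", model]
    command += ["--data", str(data), "--optim-level", str(optim_level)]
    return command


if __name__ == "__main__":
    cmd = build_run_command("sparse-transfer", "image_classification", "imagenette", 0.5)
    # Print the command; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

Returning a list rather than a string keeps zoo stubs and paths with special characters intact when the command is handed to subprocess.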

3. Compare Results

Once you have run your Experiment, the results, logs, and deployment files will be saved under the current working directory in the following format:

[EXPERIMENT_TYPE]_[USE_CASE]_[DATE_TIME]
├── deployment
│   ├── model.onnx
│   └── *supporting files*
├── logs
│   └── *logs*
└── training_artifacts
    ├── *training artifacts*
    └── *metrics and results*
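
Given the layout above, a script can locate the newest Experiment directory and its deployment model. This is a minimal sketch with hypothetical helper names. It assumes the directory name begins with the experiment type and use case exactly as you pass them, which this guide does not spell out, so adjust the pattern to match the names you see on disk.

```python
from pathlib import Path


def latest_experiment_dir(root, experiment_type, use_case):
    """Return the most recently modified matching experiment directory, or None."""
    pattern = f"{experiment_type}_{use_case}_*"
    candidates = [p for p in Path(root).glob(pattern) if p.is_dir()]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


def deployment_model(experiment_dir):
    """Return the path to deployment/model.onnx, raising if it is missing."""
    model = Path(experiment_dir) / "deployment" / "model.onnx"
    if not model.is_file():
        raise FileNotFoundError(model)
    return model
```

For example, latest_experiment_dir(".", "one-shot", "image_classification") searches the current working directory, and deployment_model() on the result gives a path to use for benchmarking or deployment.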

You can compare the accuracy by looking through the metrics printed out to the console and the metrics saved in the experiment directory. Additionally, you can use DeepSparse to compare the inference performance on your CPU deployment hardware.

Note: In the near future, you will be able to visualize the results in Sparsify Cloud, simulate other scenarios and hyperparameters, compare the results to other Experiments, and package for your deployment scenario.

To run a benchmark on your deployment hardware, use the deepsparse.benchmark command with your original model and the new optimized model. This will run a number of inferences to simulate a real-world scenario and print out the results.

It's as simple as running the following command:

deepsparse.benchmark --model_path MODEL --scenario SCENARIO

For example, to benchmark a dense ResNet-50 model, run the following command:

deepsparse.benchmark --model_path "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenette/base-none" --scenario sync

This can then be compared to the sparsified ResNet-50 model with the following command:

deepsparse.benchmark --model_path "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95_quant-none" --scenario sync

The output will look similar to the following:

DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.6.0.20230629 COMMUNITY | (fc8b788a) (release) (optimized) (system=avx512, binary=avx512)
deepsparse.benchmark.benchmark_model INFO     deepsparse.engine.Engine:
	onnx_file_path: ./model.onnx
	batch_size: 1
	num_cores: 1
	num_streams: 1
	scheduler: Scheduler.default
	fraction_of_supported_ops: 0.9981
	cpu_avx_type: avx512
	cpu_vnni: False
Original Model Path: ./model.onnx
Batch Size: 1
Scenario: sync
Throughput (items/sec): 134.5611
Latency Mean (ms/batch): 7.4217
Latency Median (ms/batch): 7.4245
Latency Std (ms/batch): 0.0264
Iterations: 1346
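
To turn two benchmark runs into a single speedup figure, you can capture each run's output and compare the throughput lines. The sketch below assumes the output keeps the field labels shown in the sample above; the function names are hypothetical and not part of DeepSparse.

```python
import re

# Matches the throughput and mean-latency lines of the sample output above.
FIELD = re.compile(
    r"^(Throughput \(items/sec\)|Latency Mean \(ms/batch\)):\s*([\d.]+)\s*$",
    re.MULTILINE,
)


def parse_benchmark(output):
    """Extract throughput and mean latency from deepsparse.benchmark output text."""
    values = {name: float(number) for name, number in FIELD.findall(output)}
    return {
        "throughput": values["Throughput (items/sec)"],
        "latency_mean_ms": values["Latency Mean (ms/batch)"],
    }


def speedup(dense_output, sparse_output):
    """Throughput of the sparsified model divided by that of the dense model."""
    dense = parse_benchmark(dense_output)["throughput"]
    sparse = parse_benchmark(sparse_output)["throughput"]
    return sparse / dense
```

Run both benchmarks with the same scenario and batch size on the same hardware so that the ratio reflects the model change and not the setup.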

See the DeepSparse Benchmarking User Guide for more information on benchmarking.

4. Deploy a Model

As an optional step to this quickstart, now that you have your optimized model, you are ready for inferencing. To get the most inference performance out of your optimized model, we recommend you deploy on Neural Magic's DeepSparse. DeepSparse is built to get the best performance out of optimized models on CPUs.

DeepSparse Server takes a task and a model path and lets you serve models and Pipelines over HTTP.

You can deploy any ONNX model using DeepSparse Server with the following command:

deepsparse.server --task USE_CASE --model_path MODEL_PATH

Where USE_CASE is the use case of your Experiment and MODEL_PATH is the path to the deployment folder from the Experiment.

For example, to deploy a sparsified ResNet-50 model, run the following command:

deepsparse.server --task image_classification --model_path "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95_quant-none"

If you are not ready to deploy yet, congratulations on completing the quickstart!

Companion Guides

Resources

Now that you have explored Sparsify [Alpha], here are other related resources.

Feedback and Support

Report UI issues and CLI errors, submit bug reports, and provide general feedback about the product to the Sparsify team via the nm-sparsify Slack Channel, or via GitHub Issues. Alpha support is provided through those channels.

Terms and Conditions

Sparsify Alpha is a pre-release version of Sparsify that is still in active development. The product is not yet ready for production use; APIs and UIs are subject to change. There may be bugs in the Alpha version, which we hope to have fixed before Beta and then a general Q3 2023 release. The feedback you provide on quality and usability helps us identify issues, fix them, and make Sparsify even better. This information is used internally by Neural Magic solely for that purpose. It is not shared or used in any other way.

That being said, we are excited to share this release and hear what you think. Thank you in advance for your feedback and interest!

Learning More

Documentation: SparseML, SparseZoo, Sparsify, DeepSparse
Neural Magic: Blog, Resources

Release History

Official builds are hosted on PyPI

stable: sparsify
nightly (dev): sparsify-nightly

Additionally, more information can be found via GitHub Releases.

License

The project is licensed under the Apache License Version 2.0.

Community

Contribute

We appreciate contributions to the code, examples, integrations, and documentation as well as bug reports and feature requests! Learn how here.

Join

For user help or questions about Sparsify, sign up or log in to our Neural Magic Community Slack. We are growing the community member by member and happy to see you there. Bugs, feature requests, or additional questions can also be posted to our GitHub Issue Queue.

You can get the latest news, webinar and event invites, research papers, and other ML Performance tidbits by subscribing to the Neural Magic community.

For more general questions about Neural Magic, please fill out this form.

Cite

Find this project useful in your research or other communications? Please consider citing:

@InProceedings{
    pmlr-v119-kurtz20a, 
    title = {Inducing and Exploiting Activation Sparsity for Fast Inference on Deep Neural Networks}, 
    author = {Kurtz, Mark and Kopinsky, Justin and Gelashvili, Rati and Matveev, Alexander and Carr, John and Goin, Michael and Leiserson, William and Moore, Sage and Nell, Bill and Shavit, Nir and Alistarh, Dan}, 
    booktitle = {Proceedings of the 37th International Conference on Machine Learning}, 
    pages = {5533--5543}, 
    year = {2020}, 
    editor = {Hal Daumé III and Aarti Singh}, 
    volume = {119}, 
    series = {Proceedings of Machine Learning Research}, 
    address = {Virtual}, 
    month = {13--18 Jul}, 
    publisher = {PMLR}, 
    pdf = {http://proceedings.mlr.press/v119/kurtz20a/kurtz20a.pdf},
    url = {http://proceedings.mlr.press/v119/kurtz20a.html}, 
    abstract = {Optimizing convolutional neural networks for fast inference has recently become an extremely active area of research. One of the go-to solutions in this context is weight pruning, which aims to reduce computational and memory footprint by removing large subsets of the connections in a neural network. Surprisingly, much less attention has been given to exploiting sparsity in the activation maps, which tend to be naturally sparse in many settings thanks to the structure of rectified linear (ReLU) activation functions. In this paper, we present an in-depth analysis of methods for maximizing the sparsity of the activations in a trained neural network, and show that, when coupled with an efficient sparse-input convolution algorithm, we can leverage this sparsity for significant performance gains. To induce highly sparse activation maps without accuracy loss, we introduce a new regularization technique, coupled with a new threshold-based sparsification method based on a parameterized activation function called Forced-Activation-Threshold Rectified Linear Unit (FATReLU). We examine the impact of our methods on popular image classification models, showing that most architectures can adapt to significantly sparser activation maps without any accuracy loss. Our second contribution is showing that these compression gains can be translated into inference speedups: we provide a new algorithm to enable fast convolution operations over networks with sparse activations, and show that it can enable significant speedups for end-to-end inference on a range of popular models on the large-scale ImageNet image classification task on modern Intel CPUs, with little or no retraining cost.}
}

@misc{
    singh2020woodfisher,
    title={WoodFisher: Efficient Second-Order Approximation for Neural Network Compression}, 
    author={Sidak Pal Singh and Dan Alistarh},
    year={2020},
    eprint={2004.14340},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
