A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind.

torchsnapshot

This library is currently in Alpha and does not yet have a stable release. The API may change and may not be backward compatible. If you have suggestions for improvements, please open a GitHub issue. We'd love to hear your feedback.

Install

Requires Python >= 3.7 and PyTorch >= 1.12

From pip:

pip install --pre torchsnapshot-nightly

From source:

git clone https://github.com/pytorch/torchsnapshot
cd torchsnapshot
pip install -r requirements.txt
python setup.py install

Why TorchSnapshot

Performance

TorchSnapshot provides a fast checkpointing implementation that employs several optimizations, including zero-copy serialization for most tensor types, overlapped device-to-host copy and storage I/O, and parallelized storage I/O.
TorchSnapshot greatly speeds up checkpointing for DistributedDataParallel workloads by distributing the write load across all ranks (benchmark).
When host memory is abundant, TorchSnapshot allows training to resume before all storage I/O completes, reducing the time blocked by checkpoint saving.
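The last point above can be illustrated without the library. The sketch below is not TorchSnapshot's implementation; it only shows the general pattern of staging state in host memory and letting a background thread finish the storage write while the caller continues. The helper name `save_in_background` and the JSON encoding are assumptions made for the example.

```python
import json
import tempfile
import threading
from pathlib import Path


def save_in_background(state: dict, path: Path) -> threading.Thread:
    """Stage `state` in host memory, then write it to `path` on a worker thread."""
    # Staging step: serialize immediately so that later changes to `state`
    # cannot leak into the checkpoint. This costs extra host memory.
    staged = json.dumps(state).encode("utf-8")

    writer = threading.Thread(target=path.write_bytes, args=(staged,))
    writer.start()
    return writer


state = {"step": 100, "weights": [0.1, 0.2, 0.3]}
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "checkpoint.json"
    pending = save_in_background(state, target)
    state["step"] += 1  # "training" resumes while the write is in flight
    pending.join()  # block only when the checkpoint must be complete
    restored = json.loads(target.read_bytes())
print(restored["step"])  # prints 100: the staged value, not the mutated one
```

The trade-off is the one the text describes: the staged copy occupies host memory until the write finishes, so this pattern pays off only when that memory is available.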

Memory Usage

TorchSnapshot's memory usage adapts to the host's available resources, greatly reducing the chance of out-of-memory issues when saving and loading checkpoints.
TorchSnapshot supports efficient random access to individual objects within a snapshot, even when the snapshot is stored in cloud object storage.
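Random access of this kind usually relies on a manifest that records where each object lives, so that a single object can be fetched with one ranged read. The following is a minimal sketch of that idea under a hypothetical layout (JSON payloads concatenated into one blob); it does not describe TorchSnapshot's actual storage format, and `pack` and `read_one` are names invented for the example.

```python
import io
import json
from typing import BinaryIO, Dict, Tuple

Manifest = Dict[str, Tuple[int, int]]  # logical path -> (offset, length)


def pack(objects: Dict[str, object]) -> Tuple[bytes, Manifest]:
    """Concatenate JSON-encoded objects and record where each one lives."""
    blob = io.BytesIO()
    manifest: Manifest = {}
    for logical_path, obj in objects.items():
        payload = json.dumps(obj).encode("utf-8")
        manifest[logical_path] = (blob.tell(), len(payload))
        blob.write(payload)
    return blob.getvalue(), manifest


def read_one(storage: BinaryIO, manifest: Manifest, logical_path: str) -> object:
    """Fetch one object with a single ranged read; other bytes are never read."""
    offset, length = manifest[logical_path]
    storage.seek(offset)
    return json.loads(storage.read(length).decode("utf-8"))


blob, manifest = pack(
    {"model/weight": [1.0, 2.0], "model/bias": [0.5], "progress/step": 42}
)
step = read_one(io.BytesIO(blob), manifest, "progress/step")
print(step)  # prints 42
```

Against an object store, the `seek` and `read` pair would typically become a ranged GET request, which is why the approach remains cheap for remote snapshots.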

Usability

Simple APIs that are consistent between distributed and non-distributed workloads.
Out-of-the-box integration with commonly used cloud object storage systems.
Automatic resharding (elasticity) on world size change for supported workloads (more details).

Security

Secure tensor serialization without pickle dependency [WIP].

Getting Started

from torchsnapshot import Snapshot

# Taking a snapshot (model and optimizer are assumed to be defined already)
app_state = {"model": model, "optimizer": optimizer}
snapshot = Snapshot.take(path="/path/to/snapshot", app_state=app_state)

# Restoring from a snapshot
snapshot.restore(app_state=app_state)

See the documentation for more details.
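For readers who want something self-contained, here is a hedged sketch of a full save and restore round trip. The tiny `Linear` model, the SGD optimizer, and the function name `checkpoint_roundtrip` are illustrative assumptions. Because the API is in Alpha, the `Snapshot.take(path=..., app_state=...)` and `Snapshot(path=...)` signatures should be checked against the documentation for your installed version.

```python
def checkpoint_roundtrip(snapshot_dir: str) -> bool:
    """Save a small model, corrupt it in memory, then restore it from disk."""
    # Imported inside the function so that this file can be loaded even
    # where torch and torchsnapshot are not installed.
    import torch
    from torchsnapshot import Snapshot

    model = torch.nn.Linear(4, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    app_state = {"model": model, "optimizer": optimizer}

    # Persist the current state of every entry in app_state.
    Snapshot.take(path=snapshot_dir, app_state=app_state)
    saved_weight = model.weight.detach().clone()

    # Simulate losing the in-memory state.
    with torch.no_grad():
        model.weight.zero_()

    # Reopen the snapshot by path, as a fresh process would, and restore.
    Snapshot(path=snapshot_dir).restore(app_state=app_state)
    return bool(torch.equal(model.weight, saved_weight))
```

Calling `checkpoint_roundtrip` with a fresh directory path (for example, a subdirectory of a temporary directory) should return `True` if the restore succeeded.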

License

torchsnapshot is BSD licensed, as found in the LICENSE file.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

torchsnapshot-nightly-2022.10.19.tar.gz (50.5 kB)

Uploaded Oct 19, 2022 Source

Built Distribution

torchsnapshot_nightly-2022.10.19-py3-none-any.whl (61.5 kB)

Uploaded Oct 19, 2022 Python 3

Hashes for torchsnapshot-nightly-2022.10.19.tar.gz

Algorithm	Hash digest
SHA256	`0eace96fe16b9c2eccc53324544da1381364b419fa8066738aefc28ea2cfa477`
MD5	`0951cdcea7970bc7779c9be81a3c3742`
BLAKE2b-256	`50f189c474181fd3c5d20f44f340a3c7c39a13edeffea534121ea696f9478ba8`

Hashes for torchsnapshot_nightly-2022.10.19-py3-none-any.whl

Algorithm	Hash digest
SHA256	`7b5a89b9cb942bc4feb3732b81b5afb5a9520cb074b35a5456f53a5606cbd25a`
MD5	`df50a37ad73cc253e4f01279a277d528`
BLAKE2b-256	`b9a76ffca4db8ae4ab47207418515edaaa5962f53955733c84ecc4ebc02ddcff`
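The SHA256 digests listed above can be checked against a downloaded file before installing it. The sketch below uses only the Python standard library; the helper names `sha256_of` and `matches` are made up for this example.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with the published one."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

For example, `matches(Path("torchsnapshot-nightly-2022.10.19.tar.gz"), "0eace96fe16b9c2eccc53324544da1381364b419fa8066738aefc28ea2cfa477")` should return `True` for an intact copy of the source distribution.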