Python bindings for CityHash and FarmHash

CityHash/FarmHash

Python wrapper for FarmHash and CityHash, a family of fast non-cryptographic hash functions.

Getting Started

To install this package, run:

pip install cityhash

This package exposes Python APIs for CityHash and FarmHash under the cityhash and farmhash namespaces, respectively. Each namespace provides 32-, 64-, and 128-bit implementations.

Usage Examples

Stateless hashing

Usage example for FarmHash:

>>> from farmhash import FarmHash32, FarmHash64, FarmHash128
>>> FarmHash32("abc")
1961358185
>>> FarmHash64("abc")
2640714258260161385
>>> FarmHash128("abc")
76434233956484675513733017140465933893

Hardware-independent fingerprints

Fingerprints are seedless hashes that are guaranteed to be hardware- and platform-independent. This is useful for networked applications that need to persist hashed values or share them across machines.

>>> from farmhash import Fingerprint128
>>> Fingerprint128("abc")
76434233956484675513733017140465933893
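
When a fingerprint has to be written to disk or sent over the wire, one option is to serialize the integer into a fixed-width byte string. The following is a minimal sketch and not part of this package: the helper names pack_fingerprint128 and unpack_fingerprint128, and the choice of big-endian byte order, are assumptions made for illustration.

```python
FINGERPRINT128_BYTES = 16  # 128 bits


def pack_fingerprint128(value: int) -> bytes:
    """Serialize an unsigned 128-bit integer as 16 bytes.

    Big-endian order is an arbitrary choice for this sketch; readers and
    writers only need to agree on it.
    """
    return value.to_bytes(FINGERPRINT128_BYTES, byteorder="big")


def unpack_fingerprint128(data: bytes) -> int:
    """Inverse of pack_fingerprint128()."""
    if len(data) != FINGERPRINT128_BYTES:
        raise ValueError("expected exactly 16 bytes")
    return int.from_bytes(data, byteorder="big")


# Value taken from the Fingerprint128("abc") example above.
fp = 76434233956484675513733017140465933893
packed = pack_fingerprint128(fp)
assert unpack_fingerprint128(packed) == fp
```

A fixed width keeps records the same size regardless of the hash value, which simplifies binary file layouts and network framing.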

Incremental hashing

CityHash and FarmHash do not support incremental hashing and are therefore not ideal for hashing streams. If you require incremental hashing, use MetroHash or xxHash instead, both of which support it.
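
In practice, a stateless hash function needs the complete input in a single call, so a stream has to be buffered in memory first. The sketch below illustrates that constraint. The helper hash_stream and its chunk_size parameter are hypothetical, and zlib.crc32 is used only as a stand-in so the snippet runs without this package; a function such as FarmHash64 could presumably be passed in its place.

```python
import io
import zlib


def hash_stream(stream, hash_fn, chunk_size=65536):
    """Read a binary stream to the end, then hash the whole payload at once.

    Memory use grows with the size of the stream, which is the cost of
    not having an incremental (update/digest) interface.
    """
    buf = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf += chunk
    return hash_fn(bytes(buf))


# zlib.crc32 is a stand-in hash function for this demonstration only.
stream = io.BytesIO(b"abc")
result = hash_stream(stream, zlib.crc32, chunk_size=2)
assert result == zlib.crc32(b"abc")
```

For inputs too large to buffer, a hash function with a native incremental interface remains the better choice.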

Fast hashing of NumPy arrays

The Python Buffer Protocol allows Python objects to expose their data as raw byte arrays to other objects, for fast access without copying to a separate location in memory. Among others, NumPy is a major framework that supports this protocol.

All hashing functions in this package will read byte arrays from objects that expose them via the buffer protocol. Here is an example showing hashing of a 3D NumPy array:

>>> import numpy as np
>>> from farmhash import FarmHash64
>>> arr = np.zeros((256, 256, 4))
>>> FarmHash64(arr)
1550282412043536862

The arrays need to be contiguous for this to work. To convert a non-contiguous array, use NumPy's ascontiguousarray() function.
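
The contiguity requirement can be observed with the standard library alone, since memoryview reports whether a buffer is contiguous. The following is a minimal sketch: the helper as_contiguous is hypothetical and not part of this package. For NumPy arrays, ascontiguousarray() is the more natural tool.

```python
def as_contiguous(obj):
    """Return a contiguous buffer for an object exposing the buffer protocol.

    Contiguous inputs are returned as a zero-copy memoryview. Non-contiguous
    inputs are copied into a new bytes object in logical (C) order.
    """
    view = memoryview(obj)
    if view.contiguous:
        return view
    return view.tobytes()


data = memoryview(b"abcdef")
strided = data[::2]  # every second byte: a non-contiguous view

assert data.contiguous
assert not strided.contiguous
assert as_contiguous(strided) == b"ace"
```

The copy is made only when it is needed, so contiguous inputs keep the zero-copy benefit of the buffer protocol.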

SSE4.2 support

The 32-bit FarmHash variants benefit tremendously from SSE4.2 optimization, resulting in arguably the fastest non-cryptographic hash functions in the 32-bit category. The 64-bit FarmHash version also benefits from SSE4.2, though not as much as the 32-bit version does. It is still among the fastest 64-bit hash functions.

The vanilla CityHash functions (under the cityhash module) do not take advantage of SSE4.2. Instead, the cityhashcrc module (provided with this package for x86-64 Mac and Linux platforms) exposes 128- and 256-bit CRC functions that were built specifically to take advantage of processor-specific instructions and that do harness SSE4.2. These functions are very fast, and in fact beat FarmHash128 on speed (though please verify for yourself whether they provide sufficient randomness).
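
Because the cityhashcrc module is only provided on some platforms, code that wants to use it may need to check for it at run time. The sketch below shows one way to do that, along with a Linux-only check for the sse4_2 CPU flag in /proc/cpuinfo. The helper names are hypothetical, and the snippet uses only the standard library; it does not reflect how this package itself detects SSE4.2.

```python
import importlib.util
from pathlib import Path


def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line, if any."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_sse42(cpuinfo_path="/proc/cpuinfo"):
    """Return True/False on Linux, or None if the file does not exist."""
    path = Path(cpuinfo_path)
    if not path.is_file():
        return None
    return "sse4_2" in parse_cpu_flags(path.read_text())


def cityhashcrc_available():
    """Check whether the cityhashcrc module can be imported here."""
    return importlib.util.find_spec("cityhashcrc") is not None


print("SSE4.2 flag present:", has_sse42())
print("cityhashcrc importable:", cityhashcrc_available())
```

When cityhashcrc is not importable, falling back to the farmhash functions keeps the calling code portable.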

For most use cases, I would recommend FarmHash over CityHash, as it handles SSE4.2 optimizations more transparently and includes a number of other improvements.

Development

Local workflow

For those who want to contribute, here is a quick start using some makefile commands:

git clone https://github.com/escherba/python-cityhash.git
cd python-cityhash
make env           # create a Python virtualenv
make test          # run Python tests
make cpp-test      # run C++ tests
make shell         # enter IPython shell

The Makefiles provided have self-documenting targets. To find out which targets are available, type:

make help

Distribution

The wheels are built using cibuildwheel and are distributed to PyPI via a GitHub Actions workflow. The wheels contain compiled binaries and are available for the following platforms: windows-amd64, ubuntu-x86, linux-x86_64, linux-aarch64, and macosx-x86_64.

See Also

For other fast non-cryptographic hash functions available as Python extensions, see MetroHash, MurmurHash, and xxHash.

Authors

The original CityHash Python bindings are due to Alexander [Amper] Marshalov. These were rewritten in Cython by Eugene Scherba, who also added the FarmHash bindings. The CityHash and FarmHash algorithms and their C++ implementation are by Google.

License

This software is licensed under the MIT License. See the included LICENSE file for details.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cityhash-0.3.5.post3.tar.gz (209.8 kB view hashes)

Uploaded Source

Built Distributions

cityhash-0.3.5.post3-cp39-cp39-win_amd64.whl (46.8 kB view hashes)

Uploaded CPython 3.9 Windows x86-64

cityhash-0.3.5.post3-cp39-cp39-win32.whl (47.4 kB view hashes)

Uploaded CPython 3.9 Windows x86

cityhash-0.3.5.post3-cp39-cp39-manylinux2010_x86_64.whl (528.8 kB view hashes)

Uploaded CPython 3.9 manylinux: glibc 2.12+ x86-64

cityhash-0.3.5.post3-cp39-cp39-manylinux1_x86_64.whl (528.8 kB view hashes)

Uploaded CPython 3.9

cityhash-0.3.5.post3-cp39-cp39-macosx_11_0_arm64.whl (61.6 kB view hashes)

Uploaded CPython 3.9 macOS 11.0+ ARM64

cityhash-0.3.5.post3-cp39-cp39-macosx_10_9_x86_64.whl (69.3 kB view hashes)

Uploaded CPython 3.9 macOS 10.9+ x86-64

cityhash-0.3.5.post3-cp38-cp38-win_amd64.whl (46.9 kB view hashes)

Uploaded CPython 3.8 Windows x86-64

cityhash-0.3.5.post3-cp38-cp38-win32.whl (47.5 kB view hashes)

Uploaded CPython 3.8 Windows x86

cityhash-0.3.5.post3-cp38-cp38-manylinux2010_x86_64.whl (535.0 kB view hashes)

Uploaded CPython 3.8 manylinux: glibc 2.12+ x86-64

cityhash-0.3.5.post3-cp38-cp38-manylinux1_x86_64.whl (534.9 kB view hashes)

Uploaded CPython 3.8

cityhash-0.3.5.post3-cp38-cp38-macosx_10_9_x86_64.whl (69.4 kB view hashes)

Uploaded CPython 3.8 macOS 10.9+ x86-64

cityhash-0.3.5.post3-cp37-cp37m-win_amd64.whl (46.5 kB view hashes)

Uploaded CPython 3.7m Windows x86-64

cityhash-0.3.5.post3-cp37-cp37m-win32.whl (47.2 kB view hashes)

Uploaded CPython 3.7m Windows x86

cityhash-0.3.5.post3-cp37-cp37m-manylinux2010_x86_64.whl (512.4 kB view hashes)

Uploaded CPython 3.7m manylinux: glibc 2.12+ x86-64

cityhash-0.3.5.post3-cp37-cp37m-manylinux1_x86_64.whl (512.4 kB view hashes)

Uploaded CPython 3.7m

cityhash-0.3.5.post3-cp37-cp37m-macosx_10_9_x86_64.whl (68.7 kB view hashes)

Uploaded CPython 3.7m macOS 10.9+ x86-64

cityhash-0.3.5.post3-cp36-cp36m-win_amd64.whl (46.5 kB view hashes)

Uploaded CPython 3.6m Windows x86-64

cityhash-0.3.5.post3-cp36-cp36m-win32.whl (47.1 kB view hashes)

Uploaded CPython 3.6m Windows x86

cityhash-0.3.5.post3-cp36-cp36m-manylinux2010_x86_64.whl (508.1 kB view hashes)

Uploaded CPython 3.6m manylinux: glibc 2.12+ x86-64

cityhash-0.3.5.post3-cp36-cp36m-manylinux1_x86_64.whl (508.1 kB view hashes)

Uploaded CPython 3.6m

cityhash-0.3.5.post3-cp36-cp36m-macosx_10_9_x86_64.whl (68.5 kB view hashes)

Uploaded CPython 3.6m macOS 10.9+ x86-64
