130 million TDEs per second, Python + CUDA TDEs from Nikkhoo and Walter 2015
Project description
Python + CUDA TDEs from Nikkhoo and Walter 2015
CUDA and OpenCL-enabled fullspace triangle dislocation elements. Benchmarked at 130 million TDEs per second. Based on the original MATLAB code from Nikhoo and Walter 2015.. In addition to the basic pair-wise TDE operations for displacement and strain, cutde
also has:
- all pairs matrix construction functions.
- matrix free functions for low memory usage settings.
- block-wise functions that are especially helpful in an FMM or hierarchical matrix setting.
- a CUDA adaptive cross approximation implementation for building hierarchical matrices.
See below for basic usage and installation instructions. For more realistic usage examples, please check out the TDE sequence in the BIE book. You'll find examples of using all the above variants.
import matplotlib.pyplot as plt
import numpy as np
import cutde
xs = np.linspace(-2, 2, 200)
ys = np.linspace(-2, 2, 200)
obsx, obsy = np.meshgrid(xs, ys)
pts = np.array([obsx, obsy, 0 * obsy]).reshape((3, -1)).T.copy()
fault_pts = np.array([[-1, 0, 0], [1, 0, 0], [1, 0, -1], [-1, 0, -1]])
fault_tris = np.array([[0, 1, 2], [0, 2, 3]], dtype=np.int64)
disp_mat = cutde.disp_matrix(obs_pts=pts, tris=fault_pts[fault_tris], nu=0.25)
slip = np.array([[1, 0, 0], [1, 0, 0]])
disp = disp_mat.reshape((-1, 6)).dot(slip.flatten())
disp_grid = disp.reshape((*obsx.shape, 3))
plt.figure(figsize=(5, 5), dpi=300)
cntf = plt.contourf(obsx, obsy, disp_grid[:, :, 0], levels=21)
plt.contour(
obsx,
obsy,
disp_grid[:, :, 0],
colors="k",
linestyles="-",
linewidths=0.5,
levels=21,
)
plt.colorbar(cntf)
plt.title("$u_x$")
plt.tight_layout()
plt.savefig("docs/example.png", bbox_inches="tight")
Usage documentation
Simple pair-wise TDEs
Computing TDEs for observation point/source element pairs is really simple:
import cutde
disp = cutde.disp(obs_pts, src_tris, slips, nu)
strain = cutde.strain(obs_pts, src_tris, slips, nu)
obs_pts
is anp.array
with shape(N, 3)
src_tris
is anp.array
with shape(N, 3, 3)
where the second dimension corresponds to each vertex and the third dimension corresponds to the cooordinates of those vertices.- slips is a
np.array
with shape(N, 3)
whereslips[:,0]
is the strike slip component, while component 1 is the dip slip and component 2 is the tensile/opening component. - the last parameter, nu, is the Poisson ratio.
IMPORTANT: N should be the same for all these arrays. There is exactly one triangle and slip value used for each observation point.
- The output
disp
is a(N, 3)
array with displacement components in the x, y, z directions. - The output
strain
is a(N, 6)
array representing a symmetric tensor.strain[:,0]
is the xx component of strain, 1 is yy, 2 is zz, 3 is xy, 4 is xz, and 5 is yz.
I want stress.
Use:
stress = cutde.strain_to_stress(strain, sm, nu)
to convert from stress to strain assuming isotropic linear elasticity. sm
is the shear modulus and nu
is the Poisson ratio.
All pairs interactions matrix
If, instead, you want to create a matrix representing the interaction between every observation point and every source triangle, there is a different interface:
import cutde
disp_mat = cutde.disp_matrix(obs_pts, src_tris, nu)
strain_mat = cutde.strain_matrix(obs_pts, src_tris, nu)
obs_pts
is anp.array
with shape(N_OBS_PTS, 3)
src_tris
is anp.array
with shape(N_SRC_TRIS, 3, 3)
where the second dimension corresponds to each vertex and the third dimension corresponds to the cooordinates of those vertices.- the last parameter, nu, is the Poisson ratio.
- The output
disp_mat
is a(N_OBS_PTS, 3, N_SRC_TRIS, 3)
array. The second dimension corresponds to the components of the observed displacement while the fourth dimension corresponds to the component of the source slip vector. The slip vector components are ordered the same way as incutde.disp
andcutde.strain
. - The output
strain_mat
is a(N_OBS_PTS, 6, N_SRC_TRIS, 3)
array. Like above, the dimension corresponds to the components of the observation strain with the ordering identical tocutde.strain
.
Note that to use the strain_to_stress
function, you'll need to re-order the axes of the strain
array. You can do this with np.transpose(...)
.
Matrix-free all pairs interactions
A common use of the matrices produced above by cutde.disp_matrix
would be to perform matrix-vector products with a input vector with (N_SRC_TRIS * 3)
entries and an output vector with (N_OBS_PTS * 6)
entries. But, building the entire matrix can require a very large amount of memory. In some situations, it's useful to compute matrix-vector products without ever computing the matrix itself, a so-called "matrix-free" operation. In order to do this, the matrix entries are recomputed whenever they are needed. As a result, performing a matrix-vector product is much slower -- on my machine, about 20x slower. But, the trade-off may be worthwhile if you are memory-constrained.
disp = cutde.disp_free(obs_pts, src_tris, slips, nu)
strain = cutde.strain_free(obs_pts, src_tris, slips, nu)
The parameters are the same as for cutde.disp_matrix
with the addition of slips
. The slips
array is a (N_SRC_TRIS, 3)
array containing the source slip vectors.
Block-wise interaction matrices
In some settings, it is useful to compute many sub-blocks of a matrix without computing the full matrix. For example, this is useful for the nearfield component of a hierarchical matrix or fast multipole approximation (LINKS AND CITATIONS).
disp_matrices, block_idxs = cutde.disp_block(
obs_pts, src_tris, obs_start, obs_end, src_start, src_end, nu
)
strain_matrices, strain_block_idxs = cutde.strain_block(
obs_pts, src_tris, obs_start, obs_end, src_start, src_end, nu
)
obs_pts
,src_tris
andnu
are the same as fordisp_matrix
.obs_starts
andobs_end
are arrays withN_BLOCKS
elements representing the first and last observation point indices in each block.src_starts
andsrc_end
are arrays withN_BLOCKS
elements representing the first and last source triangle indices in each block.
The output disp_matrices
and strain_matrices
will be a densely packed representation with each block's boundaries demarcated by block_idxs
. As an example of extracting a single block:
disp_matrices, block_idxs = cutde.disp_block(obs_pts, src_tris, [0, 5], [5, 10], [0, 2], [2, 4], nu)
block1 = disp_matrices[block_idxs[0]:block_idxs[1]].reshape((5, 3, 2, 3))
Adaptive cross approximation (ACA)
Sometimes the matrix blocks we want to compute represent far-field interactions where the observation points are all sufficiently far away and separated as a group from the source triangles. In this situation, the matrix blocks are approximately low rank. An approximate matrix will require much less storage space and allow for more efficient matrix-vector products. Adaptive cross approximation is an algorithm for computing such a low rank representation. (LINKS AND CITATIONS)
disp_appxs = cutde.disp_aca(
obs_pts, tris, obs_start, obs_end, src_start, src_end, nu, tol, max_iter
)
The parameters are the same as cutde.disp_block
with the addition of tol
and max_iter
. The tolerance, tol
, is specified as an array of length N_BLOCKS
in terms of the Frobenius norm of the error matrix between the true matrix and the approximation. The algorithm is not guaranteed to reach the specified tolerance but should come very close. The maximum number of iterations (equal to the maximum rank of the approximation) is also specified as an array of length N_BLOCKS
.
The output disp_appxs
will be a list of (U, V)
pairs representing the left and right vectors of the low rank approximation. To approximate a matrix vector product:
U, V = disp_appxs[0]
y = U.dot(V.dot(x))
Installation
To install cutde
itself run:
pip install cutde
Then, install either PyCUDA or PyOpenCL following the directions below.
PyCUDA
If you have an NVIDIA GPU, install PyCUDA with:
conda config --prepend channels conda-forge
conda install -c conda-forge pycuda
Mac OS X
Install PyOpenCL and the PoCL OpenCL driver with:
conda config --prepend channels conda-forge
conda install pocl pyopencl
Ubuntu + PyOpenCL/PoCL
Just like on a Mac:
conda config --prepend channels conda-forge
conda install pocl pyopencl
Ubuntu + PyOpenCL with system drivers**
conda install pyopencl ocl-icd ocl-icd-system
You will need to install the system OpenCL drivers yourself depending on the hardware you have. See the "Something else" section below.
Windows
I'm not aware of anyone testing cutde on Windows yet. It should not be difficult to install. I would expect that you install pyopencl via conda and then install the OpenCL libraries and drivers that are provided by your hardware vendor. See the "Something else" section below.
Something else
I'd suggest starting by trying the instructions for the system most similar to yours above. If that doesn't work, never fear! OpenCL should be installable on almost all recent hardware and typical operating systems. These directions can be helpful.. I am happy to try to help if you have OpenCL installation issues, but I can't promise to be useful.
Why can't I use Apple CPU OpenCL?
You might have gotten the message: cutde does not support the Apple CPU OpenCL implementation and no other platform or device was found. Please consult the cutde README.
The Apple OpenCL implementation for Intel CPUs has very poor support for the OpenCL standard and causes lots of difficult-to-resolve errors. Instead, please use the PoCL implementation. You can install it with conda install -c conda-forge pocl
.
Development
For developing cutde
, clone the repo and set up your conda environment based on the environment.yml
with:
conda env create
Next, install either pycuda
or pyopencl
as instructed in the Installation section above.
Then, you should re-generate the baseline test data derived from the MATLAB code from Mehdi Nikhoo. To do this, first install octave
. On Ubuntu, this is just:
sudo apt-get install octave
And run
./tests/setup_test_env
which will run the tests/matlab/gen_test_data.m
script.
Finally, to check that cutde
is working properly, run pytest
!
The library is extremely simple:
cutde.fullspace
- the main entrypoint.fullspace.cu
- a direct translation of the original MATLAB into CUDA/OpenCL. This probably should not be modified.cutde.gpu
- a layer that abstracts between CUDA and OpenCLcutde.cuda
- the PyCUDA interface.cutde.opencl
- the PyOpenCL interface.
The tests/tde_profile.py
script is useful for assessing performance.
Some tests are marked as slow. To run these, run pytest --runslow
.
If you have both CUDA and OpenCL installed, cutde
will default to using CUDA. To use OpenCL instead, run export CUTDE_USE_OPENCL=1
to set the environment flag before launching the Python session that will use cutde
.
The README.md
is auto-generated from a template in docs/
. To run this process, run docs/build_docs
.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.