
Tool for calculating costs and needs between cloud and HPC.


Cloud Select


This is a tool that helps a user select a cloud. It will make it easy for an HPC user to say:

I need 4 nodes with these criteria, to run in the cloud.

And then be given a set of options and prices for different clouds to choose from. There are some supporting packages that already exist (in Go for AWS), so we will start there.
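For example, an illustrative query (the values here are hypothetical; the flags are described under Commands below):

$ cloud-select instance --cpus-min 8 --cpus-max 32 --memory 16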

🚧️ Under development 🚧️

This tool is under development and is not ready for production use.

Usage

Installation

You can typically create an environment

$ python -m venv env
$ source env/bin/activate

and then pip install. You can install with no clouds (assuming you have a cache), with support for all clouds, or with selected clouds:

# No clouds (assuming using cache)
$ pip install cloud-select-tool

# All clouds
$ pip install cloud-select-tool[all]

# Google Cloud
$ pip install cloud-select-tool[google]

# Amazon Web Services
$ pip install cloud-select-tool[aws]

or install from the repository:

$ git clone https://github.com/converged-computing/cloud-select
$ cd cloud-select
$ pip install .

To do a development install (from your local tree):

$ pip install -e .

This should place an executable, cloud-select, on your PATH.
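To sanity check the install, ask for the top-level help:

$ cloud-select --help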

Clouds Supported

We currently support Amazon Web Services and Google Cloud. If you have cached data for either cloud, it can be used without credentials; otherwise, credentials are required for an initial retrieval of data.
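For AWS, the client discovers credentials in the standard way for Python clients (e.g., a shared credentials file or environment variables); a minimal sketch using environment variables, with placeholder values:

$ export AWS_ACCESS_KEY_ID=AKIAXXXXXXXXXXXXXXXX
$ export AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx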

Google Cloud

For Google Cloud, you can generally provide your default credentials

$ gcloud auth application-default login

to be discovered by the client. You will need to enable the billing and compute APIs.
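If you use gcloud, a sketch for enabling them (assuming the standard service names for the Compute Engine and Cloud Billing APIs):

$ gcloud services enable compute.googleapis.com cloudbilling.googleapis.com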

Commands

Instance

Find an instance based on availability. The client will connect to the clouds you have cached data for, as well as the clouds you have credentials for. If you have neither credentials nor data, you'll get an error. There are many attributes to select on; see:

$ cloud-select instance --help

Let's ask for an exact amount of memory (as opposed to a min and/or max). This will not print instance atoms to the screen.

$ cloud-select instance --memory 4

If you want to see the atoms:

$ cloud-select --verbose instance --memory 4

Or write the atoms to file:

$ cloud-select instance --memory 4 --out atoms.lp

Ask for a specific cloud on the command line (note that you can also set a more permanent default in your settings.yml configuration file):

$ cloud-select --cloud google instance --cpus-min 200 --cpus-max 400
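A sketch of such a settings.yml default, assuming the key is named clouds (check your settings file for the exact name):

clouds: ["google"]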

Sorting

By default, we return results in the order the solver produces them. However, you can ask to sort your results by an attribute, e.g., here by memory:

$ cloud-select --sort-by memory instance

By default, when sorting on an attribute we sort descending, so the largest values are at the top. You can reverse that with --asc for ascending, meaning we sort from least to greatest:

$ cloud-select --asc --sort-by memory instance

Max Results

You can always change the max results (which defaults to 25):

$ cloud-select --max-results 100 instance

We currently sort from greatest to least. Set max results to 0 for no limit.

$ cloud-select --max-results 0 instance

Note that this argument comes before the instance command.

Regions

For regions, note that you have a default set in your settings.yml, e.g.:

google:
  regions: ["us-east1", "us-west1", "us-central1"]

aws:
  regions: ["us-east-1"]

These regions are used for API calls to retrieve data, not to filter results afterward. You should generally be more inclusive in this set, as it is the superset that queries further filter. When appropriate, "global" is also added to find resources across regions. For a one-off region for a query:

$ cloud-select instance --region east

Since region names are not consistent across clouds, the above is treated as a regular expression (see the example after this list). This means that to change region, you can:

  • edit settings.yml to change the global set you use, or
  • add --region to a particular query to filter (within the set above).
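Because the pattern is matched as a regular expression against region names (anywhere in the name, as the east example above suggests), a single pattern can cover both clouds:

$ cloud-select instance --region us-east

This would match Google's "us-east1" as well as AWS's "us-east-1".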

If you have a cache with older data (and different regions) you will want to clear it. If we eventually store the cache by region, this will be easier to manage; it isn't done yet in order to keep the design simple.

Note: We use regions and zones a bit generously - on a high level, a region encompasses many zones, so a specification of regions (as shown above) typically indicates regions, but under the hood we might be filtering the specific zones. A result might be labeled with "region" but include a zone name.

Cache Only

To set a global setting to only use the cache (and skip trying to authenticate), you can set cache_only in your settings.yml to true:

cache_only: true

This will be the default once we are able to provide a remote cache, as then you won't be required to have your own credentials to use the tool out of the box!

TODO and Questions

See our current design document for background on the design.

  • [ ] create a cache of instance types and maybe prices in GitHub (e.g., automated updates)
  • [ ] add tests and a testing workflow
    • [ ] property testing for handling min/max/numbers
    • ensure that the required set of attributes for each instance is returned (e.g., name, cpu, memory)
  • how to handle instances that don't have an attribute of interest? Should we unselect them?
  • pretty branded documentation and spell checking
  • add GPU memory - available in AWS, and I cannot find it for GCP
  • should the cache be organized by region to allow easier filtering? (data for AWS doesn't have that attribute)
  • need to do something with costs
  • can we just scrape prices from https://cloud.google.com/compute/all-pricing?
  • we don't currently account for region as a unique property in results (and we need to)

Future desires

These are either "nice to have" or small details we can improve upon. Aka, not top priority.

  • should we allow currency other than USD? Probably not for now.
  • could eventually support different resource types (beyond compute) or types of prices (e.g., preemptible vs. on demand)
  • aws instance listing (based on regions) should validate regions - an invalid region simply returns no results
  • for the AWS description, when appropriate convert to TB (like Google does)

Planning for minimizing cost:

% generate a bunch of candidate_instance() predicates for each instance type that matches the user request
candidate_instance(Cloud, Instance) :-
  cloud_instance_type(Cloud, Instance),
  instance_attr(Cloud, Instance, Name, Value) : requested_attr(Name, Value).

% tell clingo to select exactly one (at least one and at most one) of them
1 { select(Cloud, Instance) : candidate_instance(Cloud, Instance) } 1.

% associate the cost from your input facts with every candidate instance
selected_instance_cost(Cloud, Instance, Cost) :-
  select(Cloud, Instance),
  instance_cost(Cloud, Instance, Cost).

% tell clingo to find the solution (the one select() it gets to choose) with minimal cost
#minimize { Cost,Cloud,Instance : selected_instance_cost(Cloud, Instance, Cost) }.
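For illustration, here is a hypothetical set of input facts the program above could consume (the costs are made up):

% hypothetical facts: one matching and one non-matching instance
cloud_instance_type(google, "n1-standard-4").
instance_attr(google, "n1-standard-4", memory, 15).
instance_cost(google, "n1-standard-4", 130).
cloud_instance_type(aws, "t2.large").
instance_attr(aws, "t2.large", memory, 8).
instance_cost(aws, "t2.large", 70).
requested_attr(memory, 15).

Only the Google instance satisfies every requested attribute, so it becomes the sole candidate and the #minimize statement selects it.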

😁️ Contributors 😁️

We use the all-contributors tool to generate a contributors graphic below.

Vanessasaurus (💻)

License

HPCIC DevTools is distributed under the terms of the MIT license. All new contributions must be made under this license.

See LICENSE, COPYRIGHT, and NOTICE for details.

SPDX-License-Identifier: MIT

LLNL-CODE-842614
