A package for formatting floats into scientific formatted strings.

sciform

This package is used to format floats into scientific presentation formats. Features include fixed point, and decimal and binary scientific and engineering notations. Where possible, formatting follows documented standards such as those published by BIPM or IEC.

Floats are formatted into strings of the form

mantissa [exp_str]

where exp_str can be of the form exp_symbol exp, in which exp_symbol is e, E, b, or B and exp is an integer like +03. exp_str can also be a single SI or IEC "prefix" such as k, M, or Mi.

Example formatted floats are

'103.0400'
'1.03e+02'
'1.03E+03'
'1.03 k'
'1023b+10'
'1.00b+20'
'3.4 Mi'

In this document, the "precision" of a string representing a float is the number of digits that appear past the decimal point, e.g. 1.030400 has a precision of 6. The number of "significant figures" or "sig figs" for a string representing a float is the number of digits counted from the left-most non-zero digit onward, including that digit. So 1.030400 has 7 sig figs. The string 1030 has a precision of 0 and may have 3 or 4 sig figs, because the trailing zero may or may not be significant. When presenting a number to a certain number of sig figs, we first round the number to the digit place corresponding to that number of sig figs (based on the digit place of the most significant digit), then truncate the number to that digit. If that digit is the ones place or larger, then trailing zeros are added until the ones place.
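The rounding rule described above can be sketched with the standard library alone. This is a minimal illustration, not sciform's implementation: the function name `sig_fig_string` is hypothetical, and the choice of round-half-even is an assumption.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def sig_fig_string(value, sig_figs):
    """Round value to sig_figs significant figures and return a fixed-point string.

    Hypothetical helper for illustration; the rounding mode is an assumption.
    """
    number = Decimal(str(value))
    if number == 0:
        return '0'
    # Digit place (power of ten) of the most significant digit.
    top_digit_place = number.adjusted()
    # Digit place of the last digit that is kept.
    round_digit_place = top_digit_place - sig_figs + 1
    quantum = Decimal(1).scaleb(round_digit_place)
    rounded = number.quantize(quantum, rounding=ROUND_HALF_EVEN)
    # The 'f' format pads with trailing zeros up to the ones place when needed.
    return f'{rounded:f}'


print(sig_fig_string(123.456, 2))  # 120
print(sig_fig_string(1.0304, 7))   # 1.030400
```

Working on `Decimal(str(value))` sidesteps most binary floating point surprises when locating the most significant digit.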

Credits

sciform was heavily motivated by the float formatting provided in the prefixed package and by the value +/- uncertainty formatting in the uncertainties package.

Built-in Format Specification Mini Language

In Python, float objects can already be converted to string representations using built-in formatting. For example, f'{0.00438:#.4g}' yields the string '0.004380'. Here the float 0.00438 has been formatted according to the format specification string '#.4g'. The rules for constructing format specification strings are specified in the format specification mini language (FSML) documentation.

The built-in FSML has a few shortcomings that make it non-ideal for some scientific formatting tasks:

The built-in FSML lacks certain features around rounding and presenting floats based on significant figures (as opposed to precision), which makes it challenging to apply certain formatting strategies.
- It is possible to specify sig figs for formatting using the e and g built-in formatting modes, but it is impossible to format numbers according to a specific number of sig figs while also presenting the numbers in fixed point format. Specifically, it is impossible to coerce the string formatting to perform rounding "above the decimal point". There is no way to format 123 to yield '120'. You can use f'{123:#.4g}' to get '123.0' (4 sig figs), but if you do f'{123:#.2g}' you get '1.2e+02'.
- The # option is necessary to format to a specified number of sig figs in g mode (which must be used if you want any possibility of fixed-point sig fig formatting), but this option means that a mantissa with no fractional part will include a trailing decimal point, e.g. f'{123:#.3g}' gives '123.', which may be undesirable.
While built-in formatting does provide a means to fill a string to a certain overall width (including all non-numeric symbols), it does not provide a means to fill a string up to a certain digit place, e.g. to add zeros up to the hundreds place, which may sometimes be desirable.
In the sciences it is very common to display numbers in engineering notation, in which the exponent is chosen so that it is an integer multiple of 3 and the mantissa typically satisfies 1 <= m < 1000. Built-in formatting has no functionality for this feature. See NIST Guide to the SI 7.9 for more details.
The built-in formatting has limited functionality for customizing the separator and decimal characters used to display numbers.
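The sig fig limitations listed above can be reproduced with built-in formatting only. The snippet below is a small demonstration; the variable names are illustrative.

```python
# Built-in formatting only; no third-party packages are needed.
value = 123.0

four_sig_figs = f'{value:#.4g}'   # fixed point because 4 sig figs reach past the ones place
two_sig_figs = f'{value:#.2g}'    # switches to scientific notation
three_sig_figs = f'{value:#.3g}'  # fixed point, but with a hanging decimal point

# Rounding "above the decimal point" needs a separate rounding step.
rounded_fixed = f'{round(value, -1):.0f}'

print(four_sig_figs)   # 123.0
print(two_sig_figs)    # 1.2e+02
print(three_sig_figs)  # 123.
print(rounded_fixed)   # 120
```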

While these shortcomings are all minor, and can be worked around with simple helper functions, it would be most convenient if scientific float formatting could be easily accessed in-line during string formatting operations. This motivated the development of the sciform FSML.
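As an example of the kind of helper function mentioned above, engineering notation can be approximated with the standard library. This is a sketch under stated assumptions: the name `to_engineering` and its `precision` parameter are hypothetical and are not part of sciform or of the built-in FSML.

```python
import math


def to_engineering(value: float, precision: int = 2) -> str:
    """Format value with an exponent that is an integer multiple of 3.

    Hypothetical workaround helper; finite inputs only.
    """
    if value == 0:
        return f'{0.0:.{precision}f}e+00'
    # Decimal exponent of the most significant digit.
    exponent = math.floor(math.log10(abs(value)))
    # Round the exponent down to the nearest multiple of 3.
    eng_exponent = 3 * (exponent // 3)
    mantissa = value / 10 ** eng_exponent
    return f'{mantissa:.{precision}f}e{eng_exponent:+03d}'


print(to_engineering(12400.0))  # 12.40e+03
print(to_engineering(0.00438))  # 4.38e-03
```

Note that this sketch does not re-normalize when rounding pushes the mantissa up to 1000 (for example 999.999 at two decimals), which is one reason in-line support for such formats is more convenient than ad hoc helpers.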

`sciform` Format Specification Mini Language

sciform introduces a new FSML based on the built-in FSML but which avoids the shortcomings described above and includes a few additional features. These features include:

Flexible significant figure formatting
Engineering notation
Binary (base-2) exponent formatting
Flexible separator selection
Explicit exponent value specification

The sciform FSML is based on the built-in FSML, but it is not fully backwards compatible with it. For the sake of simplicity, some format specifications that are valid for the built-in FSML are invalid for the sciform FSML. Also, a valid built-in format specification may give different results when used as part of the built-in FSML compared to when used as part of the sciform FSML. These incompatibilities are captured in a section below.

The sciform format specification mini language is given by:

[fill "="][sign]["#"][fill_top_digit]
[thousands_separator][decimal_separator][thousandths_separator]
[prec_mode precision][format_mode][exp][prefix_mode]

Where the terms are described in the table below.

The sciform FSML is accessed via the sfloat object. Regular built-in floats are cast to sfloat objects which can be formatted using the sciform FSML.

from sciform import sfloat

num = sfloat(123456)
print(f'{num:_!2f}')
# 120_000

sciform Format Specification Mini Language Terms:

Format Specifier	Description
fill (`' ='`, `'0='`)	Fill characters will be padded between the most significant digit and the sign symbol until the digit corresponding to the `fill_top_digit` is filled.
sign (`'-'`, `'+'`, `' '`)	`'-'` will include a sign symbol only for negative numbers. `'+'` will include a sign symbol for all numbers. `' '` will include a minus symbol for negative numbers and a space for positive numbers. Zero is always considered to be positive.
alternate mode (`'#'`)	Alternate mode is enabled (disabled by default) if the `'#'` flag is included in the format specification. In engineering notation (`r` or `R`), the alternate mode coerces the mantissa to be `0.1 <= m < 100` rather than `1 <= m < 1000`. In binary mode (`b` or `B`), the alternate flag coerces the mantissa to be between `1 <= m < 1024` rather than `1 <= m < 2`.
fill_top_digit (`\d+`)	Any non-negative integer, default (0). Indicates the decimal or binary place to which the formatted string should be padded. e.g. `f'{sfloat(123):0=4}'` will give `00123`, i.e. padding to the `10^4` place.
thousands_separator (`'n'`, `'.'`, `','`, `'s'`, `'_'`)	Indicates the character to use as a thousands separator. `'n'` is no separator, `'s'` is a single-whitespace separator and `'.'`, `','`, and `'_'` are period, comma, and underscore separators. Note that NIST discourages the use of `','` or `'.'` as thousands separators because they can be confused with the decimal separator depending on the locality. See NIST Guide to the SI 10.5.3.
decimal_separator (`'.'`, `','`)	Symbol to use as the decimal separator.
thousandths_separator (`'n'`, `'s'`, `'_'`)	Indicates the character to use as a thousandths separator. `'n'` is no separator, `'s'` is a single-whitespace separator and `'_'` is an underscore separator.
prec_mode (`'!'`, `'.'`)	Indicates whether the float will be rounded and displayed according to precision (digits past the decimal point) or significant figure. `'.'` indicates precision mode and `'!'` indicates significant figure mode. E.g. `f'{sfloat(123.456):.2f}'` gives `'123.46'` while `f'{sfloat(123.456):!2f}'` gives `'120'`.
prec (`-?\d+`)	Integer indicating the precision or number of significant figures to which the float shall be rounded and displayed. Can be negative for precision formatting mode. Must be greater than zero for significant figure mode. If no precision is supplied then an algorithm will be used to attempt to infer the least significant digit for the float and the precision will be chosen to match this least significant digit. This algorithm may have surprising behavior for floats with a large number (e.g. 15) of significant digits or due to the underlying binary nature of floats, e.g. `0.1+0.2 = 0.30000000000000004`.
format_mode (`'f'`, `'F'`, `'%'`, `'e'`, `'E'`, `'r'`, `'R'`, `'b'`, `'B'`)	Indicates which formatting mode should be used. In all cases the capitalization of the exponent symbol matches the capitalization of the format mode flag. `'f'` and `'F'` indicate fixed point mode in which no exponent is used to display the number. `'%'` mode is like fixed mode but the number is first multiplied by 100 and presented followed by a `'%'` character. `'e'` and `'E'` indicate scientific notation in which the exponent is chosen so that the mantissa satisfies `1 <= m < 10`. `'r'` and `'R'` indicate engineering notation in which the exponent is chosen so that the mantissa satisfies `1 <= m < 1000`. If the alternate mode is enabled then the mantissa satisfies `0.1 <= m < 100`. In both cases the exponent is always an integer multiple of 3. `'b'` and `'B'` indicate binary mode in which the number is presented as a mantissa and exponent in base 2. The mantissa satisfies `1 <= m < 2`. If alternate mode is enabled the mantissa satisfies `1 <= m < 1024 = 2^10`. In this case the exponent is always an integer multiple of 10.
exp (`[+-]\d+`)	Positive or negative integer that can be used to force the exponent to take a particular value. This flag is ignored in fixed format mode. If an explicit exponent is used in engineering mode or alternate binary mode which is incompatible with those modes (e.g. an exponent that is not a multiple of 3 for engineering notation), the exponent will be rounded down to the nearest compatible value.
prefix_mode (`'p'`)	Flag (default off) indicating whether exponent strings should be replaced with SI or IEC prefix characters. E.g. `'123e+03'` becomes `'123 k'` and `'857.2B+20'` becomes `'857.2 Mi'`.
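To make the ordering of the terms concrete, the grammar can be transcribed into a regular expression. This is only a sketch built from the patterns listed in the table; it is not sciform's parser, the names `FSML_PATTERN` and `parse_format_spec` are hypothetical, and ambiguous specifications (for example a lone `'n'`) are resolved by ordinary regex backtracking rather than by any rule documented here.

```python
import re

# One optional group per term, in the order given by the grammar above.
FSML_PATTERN = re.compile(r'''
    ^
    (?:(?P<fill>[ 0])=)?                     # fill character followed by "="
    (?P<sign>[-+ ])?
    (?P<alternate>\#)?
    (?P<fill_top_digit>\d+)?
    (?P<thousands_separator>[n.,s_])?
    (?P<decimal_separator>[.,])?
    (?P<thousandths_separator>[ns_])?
    (?:(?P<prec_mode>[.!])(?P<prec>-?\d+))?  # precision mode and its integer
    (?P<format_mode>[fF%eErRbB])?
    (?P<exp>[+-]\d+)?
    (?P<prefix_mode>p)?
    $
''', re.VERBOSE)


def parse_format_spec(spec: str) -> dict:
    """Split a format specification into named terms (None when a term is absent)."""
    match = FSML_PATTERN.match(spec)
    if match is None:
        raise ValueError(f'Invalid format specification: {spec!r}')
    return match.groupdict()


print(parse_format_spec('_!2f'))
```

For `'_!2f'` the sketch reports an underscore thousands separator, significant figure mode with a value of 2, and fixed point format mode; every other term is `None`.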

Prefix Mode

Prefix mode offers a simple translation between exponent strings and one- or two-letter prefixes. For scientific and engineering formats, the prefixes are matched to exponents that are integer multiples of 3 according to the SI prefixes. For binary formats, the prefixes are matched to exponents that are integer multiples of 10 according to the IEC prefixes. Supported translations:

SI Prefixes:

Exponent Value	Prefix Name	Prefix
10⁺³⁰	quetta	Q
10⁺²⁷	ronna	R
10⁺²⁴	yotta	Y
10⁺²¹	zetta	Z
10⁺¹⁸	exa	E
10⁺¹⁵	peta	P
10⁺¹²	tera	T
10⁺⁹	giga	G
10⁺⁶	mega	M
10⁺³	kilo	k
10⁻³	milli	m
10⁻⁶	micro	µ
10⁻⁹	nano	n
10⁻¹²	pico	p
10⁻¹⁵	femto	f
10⁻¹⁸	atto	a
10⁻²¹	zepto	z
10⁻²⁴	yocto	y
10⁻²⁷	ronto	r
10⁻³⁰	quecto	q

IEC Prefixes:

Exponent Value	Prefix Name	Prefix
2⁺⁸⁰	yobi	Yi
2⁺⁷⁰	zebi	Zi
2⁺⁶⁰	exbi	Ei
2⁺⁵⁰	pebi	Pi
2⁺⁴⁰	tebi	Ti
2⁺³⁰	gibi	Gi
2⁺²⁰	mebi	Mi
2⁺¹⁰	kibi	Ki

Examples of prefix mode are:

f'{sfloat(12.4e+06):rp}' gives '12 M'
f'{sfloat(1024*2**10):bp}' gives '1 Mi'
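The translation from exponent to prefix amounts to a table lookup. The sketch below illustrates the idea with plain dictionaries; the function `format_with_prefix`, its fallback to an explicit exponent string for unmatched exponents, and its handling of a zero exponent are assumptions made for illustration, not documented sciform behavior.

```python
# Exponent -> prefix tables transcribed from the lists above.
SI_PREFIXES = {
    30: 'Q', 27: 'R', 24: 'Y', 21: 'Z', 18: 'E', 15: 'P', 12: 'T',
    9: 'G', 6: 'M', 3: 'k', -3: 'm', -6: 'µ', -9: 'n', -12: 'p',
    -15: 'f', -18: 'a', -21: 'z', -24: 'y', -27: 'r', -30: 'q',
}
IEC_PREFIXES = {
    80: 'Yi', 70: 'Zi', 60: 'Ei', 50: 'Pi',
    40: 'Ti', 30: 'Gi', 20: 'Mi', 10: 'Ki',
}


def format_with_prefix(mantissa: str, exponent: int, base: int = 10) -> str:
    """Join a mantissa string with the prefix matching exponent (hypothetical helper)."""
    table = SI_PREFIXES if base == 10 else IEC_PREFIXES
    if exponent == 0:
        # Assumption: a zero exponent needs no prefix.
        return mantissa
    prefix = table.get(exponent)
    if prefix is None:
        # Assumption: fall back to an explicit exponent string.
        symbol = 'e' if base == 10 else 'b'
        return f'{mantissa}{symbol}{exponent:+03d}'
    return f'{mantissa} {prefix}'


print(format_with_prefix('123', 3))             # 123 k
print(format_with_prefix('857.2', 20, base=2))  # 857.2 Mi
```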

Configuration options (forthcoming)

Forthcoming features to improve ease of configuration:

Function-based (as opposed to string formatting/__format__ based) formatting.
Ability to set module or class level defaults for each FSML term to avoid repetitive, verbose format specifications.
Class-based API for storing default configurations?
Optional registration of new prefixes, notably c, d, da, and h, which are the recognized SI prefixes for 10⁻², 10⁻¹, 10⁺¹, and 10⁺² respectively.

Value + uncertainty formatting (forthcoming)

One of (if not the) most important use cases for scientific formatting is formatting a value together with its specified uncertainty, e.g. 84.3 +/- 0.2. The ability to format pairs of floats as value/uncertainty pairs will be supported by the forthcoming ufloat class.

Value/uncertainty formatting is not yet fully implemented or tested, but it will support

Selection of the exponent based on the value
Selection of the least significant digit based on a user-requested number of sig figs to display for the uncertainty.
Optional padding so that the value and uncertainty have the same width
Short form "parentheses" uncertainty display, e.g. 84.3 +/- 0.2 = 84.3(2).
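Since the ufloat class is not yet available, the listed behavior can only be illustrated independently. The following is a standard library sketch of choosing the least significant digit from the uncertainty and of the short parentheses form; the function name `format_value_uncertainty` and its parameters are hypothetical and do not describe the forthcoming API.

```python
from decimal import Decimal


def format_value_uncertainty(value, uncertainty, unc_sig_figs=1, short=False):
    """Format value +/- uncertainty, rounding both to the uncertainty's digit place.

    Hypothetical illustration; expects a positive, finite uncertainty.
    """
    val = Decimal(str(value))
    unc = Decimal(str(uncertainty))
    # Least significant digit place, set by the uncertainty's sig figs.
    round_digit_place = unc.adjusted() - unc_sig_figs + 1
    quantum = Decimal(1).scaleb(round_digit_place)
    val_rounded = val.quantize(quantum)
    unc_rounded = unc.quantize(quantum)
    if short:
        if round_digit_place < 0:
            # Keep only the uncertainty digits, e.g. 0.2 -> 2.
            unc_digits = int(unc_rounded.scaleb(-round_digit_place))
        else:
            unc_digits = int(unc_rounded)
        return f'{val_rounded:f}({unc_digits})'
    return f'{val_rounded:f} +/- {unc_rounded:f}'


print(format_value_uncertainty(84.3, 0.2))              # 84.3 +/- 0.2
print(format_value_uncertainty(84.3, 0.2, short=True))  # 84.3(2)
```

The sketch ignores the case where rounding the uncertainty changes its leading digit place (for example 0.96 rounded to one sig fig), which a complete implementation would need to handle.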

Incompatibilities With Built-in Format Specification Mini Language

The sciform FSML extends the functionality of the built-in FSML. However, the sciform FSML is not entirely backwards compatible with the built-in FSML. Certain allowed built-in format specifications are illegal in the sciform FSML, and certain allowed built-in format specifications give different results when used with sfloat rather than float. These incompatibilities were intentionally introduced to simplify the sciform FSML by cutting out features less likely to be required for scientific formatting.

The built-in FSML accepts the g, G, and n presentation types. These types are not supported by sciform. They offer automated formatting decisions which are not compatible with the explicit formatting options preferred by sciform. These decisions include
- Automated selection of fixed-point or scientific notation. For sciform, the user must explicitly indicate fixed point, scientific, or engineering notation by selecting one of the f, F, e, E, r or R flags.
- Truncation of trailing zeros without the # option. For sciform, trailing zeros are never truncated if they fall within the user-selected precision or sig figs.
- Inclusion of a hanging decimal point, e.g. 123.. sciform never includes a hanging decimal point.
Python float formatting uses a pre-selected, hard-coded precision of 6 for f, F, %, e, and E modes. When no precision or sig fig specification is provided, sciform instead infers the precision or sig fig specification from the float by determining the least significant decimal digit required to represent it. Note that there may be surprising results for floats that require more decimals to represent than sys.float_info.dig, such as 0.1 * 3.
- f'{float(0.3):f}' yields '0.300000' while f'{sfloat(0.3):f}' yields '0.3'.
The built-in FSML supports left-aligned, right-aligned, center-aligned, and sign-aware string padding by any character. In the built-in FSML, the width field indicates the length to which the resulting string (including all punctuation such as +, -, ., e, etc.) should be filled to. sciform takes the perspective that these padding features are mostly tasks for string formatters, not number formatters. sciform only supports padding by a space ' ' or zero. For sciform, the user specifies the digits place to which the number should be padded. For sciform, the fill character may only be ' ' or '0' and must always be followed by the sign aware = flag. There is no 0 flag that may be placed before the width field to indicate sign-aware zero padding.
- f'{float(12): =4}' yields '12.0' (the string is already four characters wide, so no fill is added) while f'{sfloat(12): =4}' yields '   12', i.e. fill characters are padded up to the 10^4 digits place.
The built-in FSML supports displaying negative zero, but also supports an option to coerce negative zero to be positive by including a 'z' flag. sciform always coerces negative zero to be positive and therefore has no corresponding option to coerce negative zero to be positive.
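The built-in behaviors contrasted above can be checked directly. The snippet uses only the standard library; `decimals_needed` is a hypothetical helper that shows one way to find the least significant decimal digit of a float from its shortest repr, and it is not necessarily the algorithm sciform uses.

```python
import sys
from decimal import Decimal

default_precision = f'{0.3:f}'  # hard-coded precision of 6
width_padded = f'{12.0:0=8}'    # width counts every character, including '.'
negative_zero = f'{-0.0:.1f}'   # the built-in keeps the sign of negative zero


def decimals_needed(value: float) -> int:
    """Digits past the decimal point in the shortest repr of a finite float."""
    exponent = Decimal(repr(value)).normalize().as_tuple().exponent
    return max(0, -exponent)


print(default_precision)         # 0.300000
print(width_padded)              # 000012.0
print(negative_zero)             # -0.0
print(decimals_needed(0.3))      # 1
print(decimals_needed(0.1 * 3))  # 17, more than sys.float_info.dig
print(sys.float_info.dig)        # 15
```

The last two lines show why inferred precision can be surprising: 0.1 * 3 is stored as 0.30000000000000004, so its least significant digit lies far beyond the digits a float can reliably carry.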

This version: 0.4.0 (May 27, 2023)
