unicode_ids

Enable Sphinx to generate non-ASCII identifiers

Development Status
- 5 - Production/Stable
Environment
- Console
- Web Environment
Framework
- Sphinx
- Sphinx :: Extension
Intended Audience
- Developers
- Education
License
- OSI Approved :: BSD License
Operating System
- OS Independent
Programming Language
Topic

Project description

Introduction

As of June 3, 2015, Sphinx 1.3.1 generates HTML ‘id’ attributes that contain no non-ASCII characters. This behavior follows a requirement of HTML 4.01.

Today, however, these characters are usable with the latest web browsers and the HTML5 specification.

Sphinx also converts non-ASCII URLs into ASCII-only ones in a way that is incompatible with current web standards.

This extension fixes both problems each time Sphinx runs. The patches are applied to both docutils and Sphinx at runtime to produce the desired behavior.

License

2-clause BSD, same as the Sphinx project.

Installation

You can install or uninstall this package like any other Python package. You can also use it without installing it into your Python environment, because the Sphinx configuration file (conf.py) lets you load it from any folder.

System requirements

Tested with the 32-bit version of Python 2.7.10 and the 64-bit version of Python 3.4.3, both on Microsoft Windows 8.1 Pro 64-bit edition. Other versions and other operating systems are expected to work as well.

Python 3 is required if you need full Unicode support. With Python 2, the usable character set is limited to what the local encoding supports.

One important thing to know: this extension depends on both docutils 0.12 and Sphinx 1.3.1. The patching works only as long as the patched functions stay the same as in these versions.
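
Because the patches target specific functions, it may help to compare the installed versions with the tested ones before enabling the extension. The following is only a minimal sketch and is not part of unicode_ids: the helper name versions_match and the strict equality check are assumptions made for illustration.

```python
# Hypothetical helper, not provided by unicode_ids.
# The two constants are the versions named in this document.
TESTED_DOCUTILS = "0.12"
TESTED_SPHINX = "1.3.1"


def versions_match(docutils_version, sphinx_version):
    """Return True when both version strings equal the tested versions."""
    return (docutils_version == TESTED_DOCUTILS
            and sphinx_version == TESTED_SPHINX)
```

In conf.py, you could import docutils and sphinx, call versions_match(docutils.__version__, sphinx.__version__), and print a warning when the result is False.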

How to install

You can install this package the same way as any other.

Open a console and run pip install unicode_ids.

On MS-Windows, run <python_installed_path>\Scripts\pip.exe install unicode_ids.
Or, when you have a zip archive such as unicode_ids-2.0.5.zip, where ‘2.0.5’ is the version number, change the current directory to the folder that contains the zip file and run pip install unicode_ids-2.0.5.zip.

On MS-Windows, run <python_installed_path>\Scripts\pip.exe install unicode_ids-2.0.5.zip.
Or, in a Sphinx-specific way, you can use this package by just extracting it into any folder you want. The conf.py file lets you use themes and extensions from there.

How to use

As with other extensions, you enable this extension by editing conf.py.

First, add:

# add the lines below (skip the first two if conf.py already imports os and sys)
import os
import sys
import distutils.sysconfig
site_package_path = distutils.sysconfig.get_python_lib()
sys.path.insert(0, os.path.join(site_package_path, 'sphinxcontrib/unicode_ids'))

Or, if you did not install with pip or a similar tool,

# add just 1 line below
sys.path.insert(0, '<path_to_the_folder_contains_unicode_ids_py>')

Next, add the unicode_ids extension to the extensions list:

extensions = ['unicode_ids', ] # Of course, you can add other extensions.

How to know Unicode is acceptable with identifiers

This section was written on 2015-06-03 (JST, UTC+9).

URI general with HTML

HTML 4.01 [HTML401] restricts the usable characters to A-Za-z0-9_:. and the hyphen (-). However, it also recommends how to handle other characters in B.2.1 Non-ASCII characters in URI attribute values.

After the transformation described in that section, URIs consist of 7-bit ASCII characters only. In HTML 4.01, we should always use UTF-8 to encode and decode URIs, but note that some old documents may expect a different local encoding.

The HTML5 [HTML5] [HTML51] specification has section 2.5 URLs, which shows a more complex way to determine the encoding. When the URL is given in a local encoding, or the source document is encoded in a local encoding, we should use that encoding instead of UTF-8.

Considering both specifications, we should always produce UTF-8 encoded HTML files, to make it clear that percent-encoded hexadecimal sequences represent Unicode strings encoded as UTF-8.

There are other standards as well: W3C URL [W3CURL] and the WHATWG URL Living Standard [WHATWGURL]. They also define URL code units, URL code points, and percent-encoded bytes, and they say that percent-encoded bytes should represent UTF-8 sequences.
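
To make the percent-encoding concrete, here is a minimal Python 3 sketch that uses only the standard library. It is not the code this extension applies to Sphinx. The helper name percent_encode_utf8, the sample URL, and the set of characters left unescaped are assumptions chosen for illustration.

```python
from urllib.parse import quote, unquote


def percent_encode_utf8(url):
    """Percent-encode the non-ASCII characters of url as UTF-8 bytes."""
    # quote() uses UTF-8 by default; `safe` keeps common URL punctuation as-is.
    return quote(url, safe="/#:?=&")


encoded = percent_encode_utf8("index.html#用語")
# encoded == "index.html#%E7%94%A8%E8%AA%9E"
decoded = unquote(encoded)  # decodes the percent-encoded bytes as UTF-8 again
```

Each of the two characters becomes three percent-encoded bytes, because each takes three bytes in UTF-8.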

Identifiers (anchors) on HTML

HTML5 defines the ‘id’ attribute (see 3.2.5.1 The id attribute) as the element's unique identifier.

In the explanation of the term ‘DOM’ in section 2.2.2 Dependencies, you can see that ‘The concept of an element's unique identifier (ID)’ is one of the features that ‘are defined in the DOM specification’.

In DOM4 [DOM4], 5.8 Interface Element defines the ‘id’ attribute as a DOMString, and the specification says that elements can have an associated unique identifier (ID).

As described in 9 Historical / 9.2 DOM Core, DOMString is now ‘defined in Web IDL’.

In W3C WebIDL [WebIDL], section 3.10.15 DOMString defines DOMString as a sequence of code units. WebIDL also defines a code unit as a 16-bit unsigned integer, which corresponds to the UTF-16 encoding.

As shown, the IDs of HTML elements can be written with Unicode characters, which can be considered to be UTF-16 encoded internally. Note that current CSS3 does not allow identifiers to start with a digit, two hyphens, or a hyphen followed by a digit (see the next section).

Note that DOM3 defines DOMString in DOM Level 3 Core [DOM3CORE]; see section 1.2.1 The DOMString Type.
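
As an illustration of what ‘a sequence of 16-bit code units’ means, the following Python 3 sketch lists the UTF-16 code units of a string. The helper name utf16_code_units is an assumption for this example and does not come from any of the specifications.

```python
def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of 16-bit integers."""
    data = text.encode("utf-16-le")  # little-endian, without a byte order mark
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


bmp_units = utf16_code_units("語")             # [0x8A9E], one code unit
astral_units = utf16_code_units("\U00020BB7")  # [0xD842, 0xDFB7], a surrogate pair
```

A character outside the Basic Multilingual Plane takes two code units, so the length of a DOMString can differ from the number of characters it shows.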

Identifiers on CSS

Cascading Style Sheets (CSS) is now at level 3. Starting from CSS3, stability is defined module by module, and the modules build on CSS 2.1 (see section 1.1 Introduction of CSS Snapshot 2010 [CSSSnapshot]).

In CSS 2.1 [CSS21] [CSS22], section 4.1.3 Characters and case shows the set of characters we can use in identifiers. The second paragraph says:

In CSS, identifiers (including element names, classes, and IDs in selectors) can contain only the characters [a-zA-Z0-9] and ISO 10646 characters U+00A0 and higher, plus the hyphen (-) and the underscore (_); they cannot start with a digit, two hyphens, or a hyphen followed by a digit …(snip)

As shown above, we can use non-ASCII characters in identifiers. ISO 10646 is almost the same as Unicode. Currently, CSS3 seems to use the same definition for identifiers.
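
The quoted rule can be turned into a small check. This is only a sketch under stated assumptions: the function name is_css_identifier is invented for this example, and the check ignores the escaped characters mentioned in the part of the paragraph that is snipped above.

```python
import re

# Allowed characters: [a-zA-Z0-9], U+00A0 and higher, hyphen and underscore.
_CSS_IDENT = re.compile(r"[a-zA-Z0-9_\-\u00A0-\U0010FFFF]+")
# Forbidden starts: a digit, two hyphens, or a hyphen followed by a digit.
_BAD_START = re.compile(r"[0-9]|--|-[0-9]")


def is_css_identifier(name):
    """Return True if name satisfies the quoted CSS 2.1 rule (escapes ignored)."""
    return bool(_CSS_IDENT.fullmatch(name)) and not _BAD_START.match(name)
```

For example, is_css_identifier("用語") is True, while is_css_identifier("1term") and is_css_identifier("--term") are False.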

Identifiers on JavaScript/ECMAScript

ECMAScript [ECMAScript] is, roughly speaking, the name of the international standard for JavaScript :)

In the ECMAScript specification, section 7.6 Identifier Names and Identifiers shows the characters usable in identifiers.

The section clearly allows Unicode characters. Some character groups cannot be used, and although the rule includes the ‘Unicode escape sequence’, the same section states that an escape sequence cannot put a character into an identifier that would otherwise be illegal. In practice, letters of most scripts can be used.

Author

Suzumizaki-Kimitaka, 2011-2015

History

2.0.5(2015-07-04):

Extracted from the Yogosyu extension as a standalone package.

First uploaded to PyPI.

2013-12-07:

Added Python 3 support.

2013-12-06:

Updated to work with Sphinx 1.2.

2011-05-24:

First release. Included in Yogosyu extension.

References

[HTML401]

HTML 4.01, 1999-12-24REC

[HTML5]

HTML 5, 2014-10-28REC

[HTML51]

HTML 5.1, 2015-05-06WD

[W3CURL]

W3C URL, 2015-12-09WD

[WHATWGURL]

WHATWG URL Living Standard

[DOM4]

W3C DOM 4, 2015-04-28LC

[WebIDL]

(W3C) WebIDL, 2012-04-19CR

[DOM3CORE]

DOM Level 3 Core, 2004-04-07REC

[CSSSnapshot]

CSS Snapshot 2010, 2011-05-12NOTE

[CSS21]

CSS 2.1, 2011-06-07REC

[CSS22]

CSS 2.2, 2015-05-28WD(only permalink is broken)

[ECMAScript]

ECMAScript 5.1

Release history

2.0.5 (Jul 4, 2015)

Download files

Source Distribution

unicode_ids-2.0.5.zip (313.0 kB), uploaded Jul 4, 2015

Built Distribution

unicode_ids-2.0.5-py2.py3-none-any.whl (246.0 kB), uploaded Jul 4, 2015, for Python 2 and Python 3

Hashes for unicode_ids-2.0.5.zip
Algorithm	Hash digest
SHA256	`45f7392baeaf55e448603b91ceb940509745e9d892422e5ec10b9f48fa155b6b`
MD5	`0082c60418a84d7326d81711fcce5bde`
BLAKE2b-256	`7cd5385a85a3d4976b4d978365c3fdf56d4c590858b829fbf135d8ed0bbef818`

Hashes for unicode_ids-2.0.5-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`385dec8bd853d4c45f168a7f07249fe70fa36e835cce66478ec71c9dc1277797`
MD5	`264182fb22cfefcd18f34e62e033bba4`
BLAKE2b-256	`74aabd89f9e15b3077a4aeaa414bdbb5a6012b53609eef3cb7a8bc447c07c1bf`
