osaca

Open Source Architecture Code Analyzer

This tool provides automatic instruction fetching from assembly code, automatic generation of test cases that create latency and throughput benchmarks for a specific instruction form, and throughput analysis and prediction for an innermost loop kernel.

Getting started

Installation

On most systems with pip and setuptools installed, just run:

pip install --user osaca

for the latest release.

To build OSACA from source, clone the repository with git clone https://github.com/RRZE-HPC/OSACA and run the following in its root directory:

python ./setup.py install

After installation, OSACA can be started with the command osaca in the CLI.

Dependencies:

Additional requirements are:

Python3
Graphviz for dependency graph creation (minimal dependency is libgraphviz-dev on Ubuntu)
Kerncraft for marker insertion
ibench for throughput/latency measurements

Design

A schematic design of OSACA’s workflow is shown below:

[Figure: schematic design of OSACA’s workflow]

Usage

The usage of OSACA can be listed as:

osaca [-h] [-V] [--arch ARCH] [--insert-marker] [--db-check] [--export-graph EXPORT_PATH] FILEPATH

-h, --help: prints out the help message.
-V, --version: shows the program’s version number.
--arch ARCH: the abbreviation of the target architecture. This flag is required for the throughput analysis (the default function) and for including ibench output (-i). Possible options are SNB, IVB, HSW, BDW, SKX and CSX for the latest Intel microarchitectures starting from Intel Sandy Bridge, and ZEN1 for the AMD Zen (family 17h) architecture. Furthermore, VULCAN is available for Marvell's ARM-based ThunderX2 architecture.
--insert-marker: OSACA calls the Kerncraft module for the interactive insertion of IACA markers into suggested assembly blocks.
--db-check: Runs a sanity check on the database specified by --arch. The output depends on the verbosity level. Keep in mind that you have to provide a (dummy) FILEPATH anyway.
--export-graph EXPORT_PATH: Output path for the .dot file export. If “.” is given, the file is stored as “./osaca_dg.dot”. Once the file has been created, you can convert it to a PDF file using dot: dot -Tpdf osaca_dg.dot -o osaca_dependency_graph.pdf

FILEPATH is the path to the file to be analyzed and is always required.

The following sections describe OSACA’s functionality.

Throughput & Latency analysis

This analysis is OSACA’s main function and runs by default. The core architecture must always be specified with the flag --arch ARCH, where ARCH is one of SNB, IVB, HSW, BDW, SKX, CSX, ZEN1 or VULCAN.

To extract the right kernel, it has to be marked beforehand. Currently, OSACA only supports the detection of markers in assembly code and therefore only the analysis of assembly files.

Assembly code

Marking a kernel means inserting byte markers into the assembly file before and after the loop. The start marker has to be inserted directly in front of the loop label and the end marker directly after the jump instruction. For the user's convenience, the IACA byte markers are used in x86 assembly.

x86 Byte Markers

movl    $111,%ebx       #IACA/OSACA START MARKER
.byte   100,103,144     #IACA/OSACA START MARKER
Loop:
  # ...
movl    $222,%ebx       #IACA/OSACA END MARKER
.byte   100,103,144     #IACA/OSACA END MARKER

AArch64 Byte Markers

mov x1, #111            // OSACA START
.byte 213,3,32,31       // OSACA START
  // ...
mov x1, #222            // OSACA END
.byte 213,3,32,31       // OSACA END

Insert IACA markers

Using the --insert-marker flag for a given file, OSACA calls the implemented Kerncraft module to identify and mark the inner-loop block in manual mode. More information about how this is done can be found in the Kerncraft repository. Note that this currently only works for x86 loop kernels.

Example

To illustrate OSACA’s functionality, a sample kernel is analyzed for an Intel CSX core below:

double a[N], b[N];
double s;

// loop
for(int i = 0; i < N; ++i)
    a[i] = s * b[i];

The code performs a simple scalar multiplication: vector b is multiplied by the floating-point number s and the result is written to vector a. After inserting the OSACA byte markers into the assembly, the analysis can be started by typing

osaca --arch CSX PATH/TO/FILE

in the command line.

The output is:

Open Source Architecture Code Analyzer (OSACA) - v0.3
Analyzed file:      scale.s.csx.O3.s
Architecture:       csx
Timestamp:          2019-10-03 23:36:21

 P - Throughput of LOAD operation can be hidden behind a past or future STORE instruction
 * - Instruction micro-ops not bound to a port
 X - No throughput/latency information for this instruction in data file


Throughput Analysis Report
--------------------------
                              Port pressure in cycles
     |  0   - 0DV  |  1   |  2   -  2D  |  3   -  3D  |  4   |  5   |  6   |  7   |
-----------------------------------------------------------------------------------
 170 |             |      |             |             |      |      |      |      |   .L22:
 171 | 0.50        | 0.50 | 0.50   0.50 | 0.50   0.50 |      |      |      |      |   vmulpd        (%r12,%rax), %ymm1, %ymm0
 172 |             |      | 0.50        | 0.50        | 1.00 |      |      |      |   vmovapd       %ymm0, 0(%r13,%rax)
 173 | 0.25        | 0.25 |             |             |      | 0.25 | 0.25 |      |   addq  $32, %rax
 174 | 0.25        | 0.25 |             |             |      | 0.25 | 0.25 |      |   cmpq  %rax, %r14
 175 |             |      |             |             |      |      |      |      | * jne   .L22

       1.00          1.00   1.00   0.50   1.00   0.50   1.00   0.50   0.50


Latency Analysis Report
-----------------------
 171 |  8.0 | | vmulpd      (%r12,%rax), %ymm1, %ymm0
 172 |  5.0 | | vmovapd     %ymm0, 0(%r13,%rax)

       13.0


Loop-Carried Dependencies Analysis Report
-----------------------------------------
173 |  1.0 | addq   $32, %rax                      | [173]

The output shows the whole kernel together with the average port pressure of each instruction form and the overall port binding. Furthermore, it lists the critical path of the loop kernel and all loop-carried dependencies, each with the line numbers that are part of the dependency chain listed on the right.

Credits

Implementation: Jan Laukemann

License

AGPL-3.0

Release history

0.5.3 (Dec 12, 2023)
0.5.2 (Aug 18, 2023)
0.5.1 (Aug 2, 2023)
0.5.0 (Mar 24, 2023)
0.4.13 (Feb 15, 2023)
0.4.12 (Oct 11, 2022)
0.4.11 (Sep 28, 2022)
0.4.10 (Sep 8, 2022)
0.4.9 (Aug 29, 2022)
0.4.8 (Apr 8, 2022)
0.4.7 (Nov 4, 2021)
0.4.6 (Oct 7, 2021)
0.4.5 (Jul 21, 2021)
0.4.4 (May 31, 2021)
0.4.3 (May 10, 2021)
0.4.2 (May 5, 2021)
0.4.1 (Apr 19, 2021)
0.4.0 (Apr 15, 2021)
0.3.14 (Dec 11, 2020)
0.3.13 (Nov 23, 2020)
0.3.12 (Nov 11, 2020)
0.3.11 (Nov 6, 2020)
0.3.10 (Nov 2, 2020)
0.3.9 (Oct 29, 2020)
0.3.8 (Oct 20, 2020)
0.3.7 (Oct 20, 2020)
0.3.6 (Aug 5, 2020)
0.3.4 (Aug 3, 2020)
0.3.3.dev0, pre-release (Mar 16, 2020)
0.3.2 (Mar 10, 2020)
0.3.2.dev5, pre-release (Jan 31, 2020)
0.3.2.dev4, pre-release (Jan 28, 2020)
0.3.2.dev3, pre-release (Jan 22, 2020)
0.3.2.dev2, pre-release (Jan 8, 2020)
0.3.2.dev1, pre-release (Dec 16, 2019)
0.3.1 (Nov 18, 2019)
0.3.1.dev1, pre-release (Oct 16, 2019), this version
0.3.1.dev0, pre-release (Oct 4, 2019)
0.3.0.dev0, pre-release (Sep 27, 2019)
0.2.2 (May 16, 2019)
0.2.1 (Jan 10, 2019)
0.2.0 (Sep 3, 2018)
0.1 (Jan 24, 2018)

Download files

Source Distribution

osaca-0.3.1.dev1.tar.gz (62.7 kB), uploaded Oct 16, 2019

Hashes for osaca-0.3.1.dev1.tar.gz

Algorithm	Hash digest
SHA256	`ee21bf1eafce1094e7b63280d1bb7285f743efe6504122fe2217e4591323b5ec`
MD5	`42c1d74ea88066e866f83f116021f611`
BLAKE2b-256	`2765661f5d3885487ddc8c79c03a098a6cb4a440c0dd2b2f287117888acfff15`