
basicanalysis 0.2b1

Extract and project fundamental factors in MPI applications.

Basic Analysis and Projection of Fundamental Factors

- Point the local variable AUTOMATIC_ANALYSIS to this folder (more information in
- Dimemas installation (the scripts have been evaluated with version 5.2.5)
- Python 2.7.3 (other versions have not been evaluated yet)
- The following Python modules:
  - numpy 1.6.4
  - scipy 0.11.1 (or greater)
  - lmfit 0.7.2 (or greater) --
  numpy and scipy can be installed from the package manager.
  To install lmfit, decompress the file included here and, as the root user, type:

# python install

- Traces to be analyzed (some example traces are included in nekbone_bgq)

NOTE: To verify the versions of the installed modules, you can run:

> python basicanalysis/share/install/

1. Extracting model factors (Load Balance, Serialization, Transfer, Parallel Efficiency):

To obtain a summary of the performance factors:

$ -i indat.cfg -sim {time|cycles} -lat {latency} -bw {bandwidth}
-sc {strong|weak} -phase <name_defined_by_user>
-t <list_of_traces>.prv

Alternatively, all parameters can be supplied through the indat.cfg file alone (an example
named indat_modelfactors.cfg is provided in the example folder inside this directory).

For example:

$ -i indat.cfg -sim time -sc weak -phase nekbone_example
-t nekbone_bgq/*.prv

This extracts the performance factors from all the traces included in nekbone_bgq.
The traces were obtained with a weak scaling approach, which is why -sc weak is passed
to the script (use -sc strong for strong scaling runs). Results are written to a
model_factors_<name_defined_by_user>.csv file and a gnuplot file.
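This README does not spell out how the factors are computed. As a hedged illustration, the sketch below assumes the multiplicative efficiency model commonly used with the BSC tools: Load Balance is the mean useful (non-MPI) computation time over the maximum, Serialization is the maximum useful time over the runtime simulated on an ideal network (the role Dimemas plays here), Transfer is that ideal runtime over the measured runtime, and Parallel Efficiency is the product of the three. The function name and its inputs are hypothetical, and basicanalysis may compute the factors differently.

```python
def model_factors(useful_times, ideal_runtime, measured_runtime):
    """Sketch of the multiplicative factor model (assumed, not the tool's code).

    useful_times: computation time of each process outside MPI (hypothetical input)
    ideal_runtime: runtime simulated with an ideal network
    measured_runtime: elapsed time observed in the trace
    """
    max_useful = max(useful_times)
    mean_useful = sum(useful_times) / len(useful_times)
    factors = {
        # How evenly the useful work is spread across processes.
        "load_balance": mean_useful / max_useful,
        # Loss due to dependencies, even with an ideal network.
        "serialization": max_useful / ideal_runtime,
        # Loss due to the real network (latency and bandwidth).
        "transfer": ideal_runtime / measured_runtime,
    }
    # The three factors multiply into the overall parallel efficiency.
    factors["parallel_efficiency"] = (
        factors["load_balance"] * factors["serialization"] * factors["transfer"]
    )
    return factors


if __name__ == "__main__":
    result = model_factors([8.0, 9.0, 10.0, 9.0], ideal_runtime=11.0, measured_runtime=12.5)
    for name, value in result.items():
        print("{}: {:.3f}".format(name, value))
```

Note that the product simplifies to mean useful time over measured runtime, which is a convenient sanity check on any implementation.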

To see the resulting graph:

$ gnuplot model_factors_timeBased_nekbone_example.gnuplot

1.1 Currently available graphs:
- Fundamental Factors: Serialization, Transfer, Load Balance and Parallel Efficiency.
- Speedup: specific to weak or strong scaling executions.
- Point-to-Point Communications: some metrics about P2P communications in the
traces. Zero if there are none.
- Collective Communications: some metrics about bytes sent and calls performed
at collective level (Allreduce, Bcast, etc.).
- Load Balances: Instruction, IPC, and Time Load Imbalances.
- Instruction rate vs. IPC: observed instruction rate and IPC; it also depends
on the type of scaling (strong or weak).
- Cycles per microsecond: observed cycles per microsecond at each evaluated point, useful to
identify changes among the processes.
- Elapsed time: execution time of each trace.
- Other Efficiency Factors: a summary of cycles per microsecond and load imbalance at
instruction or IPC level. Helpful as a sanity check.
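The Speedup graph depends on the scaling type. One common convention, assumed here and not taken from the package source, measures speedup relative to the run with the fewest processes, P_ref: for strong scaling (fixed total problem) speedup = P_ref * T_ref / T_P, and for weak scaling (fixed work per process) speedup = P * T_ref / T_P, so a perfectly efficient run reaches a speedup of P in both cases. The function and its inputs are hypothetical.

```python
def speedup(procs, times, scaling):
    """Speedup of each run relative to the run with the fewest processes.

    procs: process count of each trace; times: elapsed time of each trace.
    scaling: "strong" (fixed total problem) or "weak" (fixed work per process).
    """
    ref = min(range(len(procs)), key=lambda i: procs[i])
    p_ref, t_ref = procs[ref], times[ref]
    result = []
    for p, t in zip(procs, times):
        if scaling == "strong":
            # Same problem on more processes: ideal time shrinks as p_ref / p.
            result.append(p_ref * t_ref / t)
        elif scaling == "weak":
            # Problem grows with p: ideal time stays t_ref, ideal speedup is p.
            result.append(p * t_ref / t)
        else:
            raise ValueError("scaling must be 'strong' or 'weak'")
    return result


print(speedup([16, 32, 64], [100.0, 52.0, 27.0], "strong"))
print(speedup([16, 32, 64], [100.0, 104.0, 110.0], "weak"))
```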

2. Projection of performance factors based on the knowledge of the application:

To extrapolate the collected performance factors (from a very small number of core counts
to larger core counts), the user must specify the appropriate fitting model for each of
the factors.

Starting from the measured values, Serialization and Transfer can be extrapolated with an
Amdahl's Law-based model or with a pipeline-based model, of the following forms:

Amdahl_fit = elem_0 / (f_elem + (1-f_elem) * P)

Pipeline_fit = (elem_0 * P) / ((1-f_elem) + f_elem*(2*P-1))

*** elem_0 and f_elem are estimated with the least squares method over the collected
measurements, and P is the number of processes used.
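Both models, and a least-squares estimate of elem_0 and f_elem, can be sketched with scipy.optimize.curve_fit. The package itself depends on lmfit; curve_fit is swapped in here only to keep the example short. The Amdahl denominator is written as f_elem + (1 - f_elem) * P so that both models reduce to elem_0 at P = 1. The "measurements" are synthetic values generated from the model itself, not real trace data, and the helper names are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit


def amdahl_fit(P, elem_0, f_elem):
    # Amdahl's Law efficiency; equals elem_0 at P = 1.
    return elem_0 / (f_elem + (1.0 - f_elem) * P)


def pipeline_fit(P, elem_0, f_elem):
    # Pipeline model; also equals elem_0 at P = 1.
    return (elem_0 * P) / ((1.0 - f_elem) + f_elem * (2.0 * P - 1.0))


def fit_factor(model, procs, values, p0=(1.0, 0.999)):
    """Least-squares estimate of (elem_0, f_elem) for one factor."""
    params, _cov = curve_fit(model, procs, values, p0=p0)
    return params


procs = np.array([16.0, 32.0, 64.0, 128.0])
measured = amdahl_fit(procs, 0.98, 0.9995)  # synthetic "measurements"
elem_0, f_elem = fit_factor(amdahl_fit, procs, measured)
print("elem_0 = {:.4f}, f_elem = {:.6f}".format(elem_0, f_elem))
print("projected at 1024 processes:", amdahl_fit(1024.0, elem_0, f_elem))
```

A good initial guess (p0) matters for nonlinear least squares: f_elem close to 1 is a reasonable starting point for factors that degrade slowly with P.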

This version also includes the option to fit Serialization and Transfer with a
logarithmic function of the form:

Logarithmic_fit = f_elem * log10(P) + elem_0
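Because the logarithmic model is linear in log10(P), its two parameters can be estimated with an ordinary degree-1 least-squares fit. The sketch below uses numpy.polyfit for this; the measured values are illustrative, not taken from the nekbone traces, and the function names are hypothetical.

```python
import numpy as np


def log_fit(P, elem_0, f_elem):
    # Logarithmic model: f_elem * log10(P) + elem_0.
    return f_elem * np.log10(P) + elem_0


def fit_log(procs, values):
    """The model is a straight line in log10(P), so a degree-1 polynomial
    least-squares fit returns (slope, intercept) = (f_elem, elem_0)."""
    f_elem, elem_0 = np.polyfit(np.log10(procs), values, 1)
    return elem_0, f_elem


procs = np.array([16.0, 32.0, 64.0, 128.0])
measured = np.array([0.95, 0.93, 0.91, 0.89])  # illustrative values
elem_0, f_elem = fit_log(procs, measured)
print(log_fit(1024.0, elem_0, f_elem))
```

With these values every doubling of P costs 0.02, so the projection to 1024 processes is 0.83.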

Load Balance supports Amdahl-based fitting (amdahl) and, in addition, constant values: the
minimum (min), i.e. the worst value among the collected measurements, and the average (avg)
value. It also supports the logarithmic function described above (log).
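The two constant options for Load Balance are simple to express. A minimal sketch, assuming that min keeps the lowest (worst) measured value and avg the arithmetic mean for every projected process count; the function name is hypothetical and only these two options are covered.

```python
import numpy as np


def project_load_balance(measured, method, target_procs):
    """Project Load Balance with one of the constant options.

    "min": the worst (lowest) measured value is kept for all projected counts.
    "avg": the mean of the measured values is kept instead.
    """
    measured = np.asarray(measured, dtype=float)
    if method == "min":
        value = measured.min()
    elif method == "avg":
        value = measured.mean()
    else:
        raise ValueError("only the constant options 'min' and 'avg' are sketched here")
    return np.full(len(target_procs), value)


lb = [0.93, 0.90, 0.91]
print(project_load_balance(lb, "min", [256, 512, 1024]))
print(project_load_balance(lb, "avg", [256, 512, 1024]))
```

The min option is the pessimistic choice: it assumes imbalance never gets better than the worst case already observed.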

In addition, efficiency loss may not depend directly on the total number of processes.
Therefore, several scenarios for the evolution of efficiency have been implemented. By
default, applications are expected to lose efficiency as the number of processes
increases, and the number of processes P is used as is (the linear option).

In some parallel applications, processes do not interact with all their partners. They
may exchange data with, say, a number of processes close to the cube root of the total
number of processes, or with a number that grows as the base-2 logarithm of it.

For this reason, there are three options (linear, cubic, and log) that change how the
number of processes is interpreted for each of the fundamental factors.
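A minimal sketch of how the three options could map the real process count P to the "effective" count that enters the fitting formulas, assuming cubic means the cube root of P and log means log2(P), as described above. The function name is hypothetical.

```python
import numpy as np


def effective_procs(P, mode):
    """Map the real process count to the count used by a fitting model."""
    P = np.asarray(P, dtype=float)
    if mode == "linear":
        return P            # every process counts (default)
    if mode == "cubic":
        return np.cbrt(P)   # partners grow as the cube root of P
    if mode == "log":
        return np.log2(P)   # partners grow as log2(P)
    raise ValueError("mode must be 'linear', 'cubic' or 'log'")


print(effective_procs([8, 64, 512], "cubic"))
print(effective_procs([8, 64, 512], "log"))
```

The transformed values would then replace P in the Amdahl, pipeline, or logarithmic formulas for the factor in question.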

The framework is therefore called as follows:

$ -P_ser {linear|cubic|log}
-ser_fit <serialization> {amdahl|pipeline|log}
-P_trf {linear|cubic|log} -trf_fit <transfer>{amdahl|pipeline|log}
-P_lb {linear|cubic|log} -lb_fit <loadbalance> {min|avg|amdahl|log}
-f <name_of_csv_file>.csv
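To make the valid option combinations explicit, here is a hypothetical argparse parser that mirrors the options documented above, with the defaults stated later in this README (linear for the number of processes, Amdahl's model for every factor). It is an illustration only, not the package's own argument handling.

```python
import argparse

PROCS_MODES = ["linear", "cubic", "log"]


def build_parser():
    """Hypothetical parser mirroring the documented projection options."""
    parser = argparse.ArgumentParser(description="Projection options (sketch)")
    parser.add_argument("-i", dest="config", help="configuration file, e.g. indat.cfg")
    # How the number of processes is interpreted for each factor.
    parser.add_argument("-P_ser", choices=PROCS_MODES, default="linear")
    parser.add_argument("-P_trf", choices=PROCS_MODES, default="linear")
    parser.add_argument("-P_lb", choices=PROCS_MODES, default="linear")
    # Fitting model for each factor; Load Balance also accepts constants.
    parser.add_argument("-ser_fit", choices=["amdahl", "pipeline", "log"], default="amdahl")
    parser.add_argument("-trf_fit", choices=["amdahl", "pipeline", "log"], default="amdahl")
    parser.add_argument("-lb_fit", choices=["min", "avg", "amdahl", "log"], default="amdahl")
    parser.add_argument("-f", dest="csv_file", help="model factors .csv file")
    return parser


args = build_parser().parse_args(
    ["-P_ser", "linear", "-ser_fit", "pipeline", "-lb_fit", "min",
     "-f", "model_factors_timeBased_nekbone_example.csv"]
)
print(args)
```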

Continuing the previous example, the projection can be run with the default options:

$ -f model_factors_timeBased_nekbone_example.csv

or with explicit fitting options:

$ -P_ser linear -ser_fit pipeline -lb_fit min
-f model_factors_timeBased_nekbone_example.csv

The second command generates the extrapolation of the performance factors and compares the
measurements with the projected values. Only Transfer is fitted with Amdahl's model (the
default); Serialization is fitted with the pipeline model, and Load Balance uses the minimum
measured value as a constant. Everything is summarized in three gnuplot files and one .csv
file; the latter makes it easy to move the data into a spreadsheet.
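To see how the fitted factors combine, the sketch below evaluates the same choices as the command above (Serialization with the pipeline model, Transfer with Amdahl's model, Load Balance as a constant) and multiplies them into a projected parallel efficiency, assuming that parallel efficiency is the product of the three factors. The parameter values are placeholders standing in for least-squares results, not output of the tool, and the Amdahl denominator is written as f_elem + (1 - f_elem) * P so the model equals elem_0 at P = 1.

```python
import numpy as np


def amdahl_fit(P, elem_0, f_elem):
    # Amdahl's Law efficiency; equals elem_0 at P = 1.
    return elem_0 / (f_elem + (1.0 - f_elem) * P)


def pipeline_fit(P, elem_0, f_elem):
    # Pipeline model; equals elem_0 at P = 1.
    return (elem_0 * P) / ((1.0 - f_elem) + f_elem * (2.0 * P - 1.0))


def project_efficiency(P, ser_params, trf_params, lb_constant):
    """Serialization -> pipeline, Transfer -> Amdahl, Load Balance -> constant."""
    ser = pipeline_fit(P, *ser_params)
    trf = amdahl_fit(P, *trf_params)
    lb = np.full_like(ser, lb_constant)
    # Assumed multiplicative model for the overall parallel efficiency.
    return ser, trf, lb, lb * ser * trf


P = np.array([128.0, 256.0, 512.0, 1024.0])
# Placeholder (elem_0, f_elem) pairs, as if already obtained by least squares.
ser, trf, lb, pe = project_efficiency(P, (0.99, 0.505), (0.97, 0.9999), 0.90)
for row in zip(P, ser, trf, lb, pe):
    print("P={:6.0f}  Ser={:.3f}  Trf={:.3f}  LB={:.3f}  PE={:.3f}".format(*row))
```

Writing the rows with Python's csv module instead of print would give a file similar in spirit to the .csv the tool produces for spreadsheets.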

For this example, the number of processes is treated as linear. The cubic option was
implemented for applications whose data is distributed among the processes in a cubic
decomposition (e.g. HACC, from the CORAL benchmarks), where each process mainly interacts
with a small group of the other processes. By default, the projection uses linear for the
number of processes and Amdahl's model to fit all performance factors. To change these
values, call the framework with pipeline or min instead of amdahl as the parameter (where
the corresponding performance factor supports that fitting option). For example:

$ -i indat.cfg -P_ser linear -ser_fit pipeline
-trf_fit pipeline -lb_fit min
-f model_factors_timeBased_nekbone_example.csv

Alternatively, all parameters can be supplied through indat_projection.cfg alone (a copy
is provided in the example folder inside this directory).
New fitting modules and additional enhancements are still under development.

For further questions or doubts, please contact:
File                        Type    Py Version  Uploaded on  Size
basicanalysis-0.2b1.tar.gz  Source              2015-04-09   9MB