Note: SSA has been deprecated and will not run in the Intel version 16 and above compilers, and results cannot be viewed in the Inspector version 2016 and above. It can still be run in older compiler and Inspector versions.

Modern x86 processors include vector units that can operate on multiple data objects with a single instruction, otherwise known as Single Instruction, Multiple Data (or SIMD) units. These are implemented in the 128-bit Streaming SIMD Extensions (SSE) and starting with Intel's Sandy Bridge architecture, the 256-bit Advanced Vector eXtensions (AVX).

Vampir is a full featured tool suite for analyzing the performance and message passing characteristics of parallel applications. Vampir is based on run-time tracing of program events collected as OTF format files by other tools/libraries, such as VampirTrace, TAU, Score-P, Open|SpeedShop, etc.

TotalView is a sophisticated and powerful tool used for debugging and analyzing both serial and parallel programs. TotalView provides source level debugging for serial, parallel, multi-process, multi-threaded, accelerator/GPU and hybrid applications written in C/C++ and Fortran. Most HPC platforms and systems are supported. Both a graphical user interface and command line interface are provided.

TAU (Tuning and Analysis Utilities) is a comprehensive profiling and tracing toolkit for performance analysis of parallel programs written in Fortran, C, C++, Java, and Python. It is capable of gathering performance information through instrumentation of functions, methods, basic blocks, and statements. All C++ language features are supported including templates and namespaces.


PapiEx is a PAPI-based program for measuring hardware performance events of an application using the command-line. It supports both PAPI preset events and native events. It supports multiple threads of execution as well, including pthreads and OpenMP threads. For MPI programs, PapiEx can gather statistics across tasks. PapiEx also measures the total time spent in I/O and MPI calls.


The PAPI Performance Application Programming Interface provides machine and operating system independent (portable) access to hardware performance counters found on most modern processors. Any of over 100 preset events can be counted through either a simple high level programming interface or a more complete low level interface from either C or Fortran.

memP is a parallel heap profiling library based on the mpiP MPI profiling tool. The intent of memP is to identify the heap allocation that causes a task to reach its memory in use high water mark (HWM) for each task in a parallel job. Currently, memP requires that all tasks call MPI_Init and MPI_Finalize.

There are two types of memP reports:

Intel's VTune Amplifier is a performance profiling tool for C, C++, and Fortran code that can identify where in the code time is being spent in both serial and threaded applications. For threaded applications, it can also determine the amount of concurrency and identify bottlenecks created by synchronization primitives.

Intel Inspector is a code correctness tool that can identify threading and memory errors in C, C++, and Fortran codes. This tool replaces Intel's old Thread Checker tool, adding memory debugging capabilities and a standalone GUI. Inspector works by performing dynamic analysis (that is, instrumenting and analyzing execution at run time). It can detect intermittent and non-deterministic errors, even if the errors do not manifest themselves.