Survey™

The Survey collector/analytics framework is a new-generation, high-level, lightweight tool for collecting performance metrics from HPC applications.

Product Overview

Survey is a broad collection and reporting tool with lower overhead than more in-depth performance tools. It is a multi-platform Linux tool that collects and analyzes high-level performance metrics for applications running on both single-node and large-scale platforms, including Cray platforms.

The data collected can serve as input to your existing analysis tools and dashboards covering many use cases.

A product demo and free trial of the Survey tool can be arranged by contacting us.

Use Cases

  • Use Survey to support application development efforts and to help understand the resource capabilities of computer architectures and software environments. Integrate it into system and performance studies and into the planning process for system procurements.

  • Integrate Survey into your development framework (e.g. GitLab CI) to provide a continuous, real-time measure of performance impacts as development progresses. If you are developing for multiple architectures, compilers, etc., Survey can be used to proactively alert you to potential issues that may impact performance.

  • Integrate Survey into periodic system test suites that ensure that the system is healthy. Survey collects performance and system metadata that can be used to track and monitor identified performance levels and surface potential outliers.
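The continuous-integration and system-health use cases above both come down to comparing a new set of metrics against a known-good baseline. The following is a minimal sketch of such a check, not part of Survey itself. The metric names, the flat name-to-value JSON layout, and the `flag_regressions` helper are all hypothetical, and the check assumes every metric is "lower is better".

```python
import json


def flag_regressions(baseline, current, tolerance=0.05):
    """Return {metric: relative_change} for metrics that got worse.

    Assumption: every metric is "lower is better" (times, memory high
    water mark), so an increase beyond `tolerance` counts as a regression.
    """
    flagged = {}
    for name, base_value in baseline.items():
        if name not in current or base_value <= 0:
            continue  # skip metrics that cannot be compared
        change = (current[name] - base_value) / base_value
        if change > tolerance:
            flagged[name] = change
    return flagged


# Hypothetical metric names and values; in practice these would be parsed
# from the .json files produced by two runs.
baseline = json.loads(
    '{"mpi_time_s": 10.0, "io_time_s": 2.0, "mem_high_water_mb": 512.0}'
)
current = json.loads(
    '{"mpi_time_s": 10.2, "io_time_s": 2.5, "mem_high_water_mb": 512.0}'
)

regressions = flag_regressions(baseline, current)
for name, change in sorted(regressions.items()):
    print(f"{name}: +{change:.0%} versus baseline")
```

In a CI job, a non-empty result could be turned into a failing exit status so that the pipeline flags the change for review.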

Survey - Key Features

  • The Survey collector is designed to work on sequential, MPI, OpenMP, and hybrid codes and directly leverages several tool interfaces available in current MPI implementations, including MPICH, MVAPICH, MPT, and Open MPI. It also supports multiple architectures and has been tested on machines based on Intel, AMD, ARM, and IBM POWER8/9 processors and integrated GPUs.

    • Is very lightweight, with a target of 1% overhead

    • Gathers multiple application performance metrics in one run

    • Gathers metadata covering the job, hardware, and system

    • Gives a high-level performance overview (no mapping back to the source)

    • Identifies areas where you may want to use a more detailed tool such as Open|SpeedShop or HPCToolkit

    • Creates data files for application metrics (.csv and .json) and metadata (.json) for ingestion into local analysis frameworks

    • Lets you pull specific metrics and metadata using our extractor

    • Makes all raw per-thread-of-execution .csv files available after the run (in a .dir directory that contains the per-thread files)

    • Reports minimum, maximum, and average values across threads of execution (including top-down for Intel)

  • executable

    • command line

    • linked libraries

    • launch - start/end

    • # threads/ranks

  • hardware

    • cpu

    • memory

    • file systems

    • HW components

  • system

    • operating system

    • resource limits

    • environment variables

    • Slurm information (resource manager)

  • Aggregate metrics across all MPI ranks and OpenMP threads

    • Memory information (e.g. high water mark, memory allocation and free calls, allocation sizes)

    • Hardware counter information and derived metrics

    • Input/output information, I/O time, read/write times and byte counts

    • MPI information, MPI time, and percent across the threads of execution

    • OpenMP information, serial time, and time spent in OpenMP regions

  • The Survey framework can incorporate external collection tools to build a data store that covers additional aspects of the machine and environment (e.g. nvidia-smi integration). An application output collection process is also being integrated to collect and monitor application-specific data.

  • Survey can extract from your collected data and report comparisons across collection sets.
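To illustrate the min, max, and average reduction across threads of execution mentioned in the feature list, here is a minimal sketch of how per-thread CSV data could be summarized in a local analysis script. It is an illustration under stated assumptions, not Survey's implementation: the two-column `metric,value` layout, the metric names, and the `summarize_threads` helper are hypothetical, and the real per-thread files may be structured differently.

```python
import csv
import io
from statistics import mean


def summarize_threads(csv_texts):
    """Reduce per-thread CSV contents to min/max/avg per metric.

    Assumption: each CSV has a header row with "metric" and "value"
    columns and one row per metric (a hypothetical layout).
    """
    values = {}
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            values.setdefault(row["metric"], []).append(float(row["value"]))
    return {
        name: {"min": min(v), "max": max(v), "avg": mean(v)}
        for name, v in values.items()
    }


# In-memory stand-ins for two per-thread files; real files would be read
# from the per-thread directory left behind after a run.
thread_files = [
    "metric,value\nmpi_time_s,1.0\nio_time_s,0.5\n",
    "metric,value\nmpi_time_s,3.0\nio_time_s,0.5\n",
]

summary = summarize_threads(thread_files)
for name, stats in summary.items():
    print(name, stats)
```

A large gap between the minimum and maximum of a metric such as MPI time is one simple indicator of load imbalance worth examining with a more detailed tool.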