head 1.1; branch 1.1.1; access; symbols netbsd-11-0-RC4:1.1.1.5 netbsd-11-0-RC3:1.1.1.5 netbsd-11-0-RC2:1.1.1.5 netbsd-11-0-RC1:1.1.1.5 gcc-14-3-0:1.1.1.5 perseant-exfatfs-base-20250801:1.1.1.5 netbsd-11:1.1.1.5.0.10 netbsd-11-base:1.1.1.5 gcc-12-5-0:1.1.1.5 netbsd-10-1-RELEASE:1.1.1.5 perseant-exfatfs-base-20240630:1.1.1.5 gcc-12-4-0:1.1.1.5 perseant-exfatfs:1.1.1.5.0.8 perseant-exfatfs-base:1.1.1.5 netbsd-8-3-RELEASE:1.1.1.2 netbsd-9-4-RELEASE:1.1.1.4 netbsd-10-0-RELEASE:1.1.1.5 netbsd-10-0-RC6:1.1.1.5 netbsd-10-0-RC5:1.1.1.5 netbsd-10-0-RC4:1.1.1.5 netbsd-10-0-RC3:1.1.1.5 netbsd-10-0-RC2:1.1.1.5 netbsd-10-0-RC1:1.1.1.5 gcc-12-3-0:1.1.1.5 gcc-10-5-0:1.1.1.5 netbsd-10:1.1.1.5.0.6 netbsd-10-base:1.1.1.5 netbsd-9-3-RELEASE:1.1.1.4 gcc-10-4-0:1.1.1.5 cjep_sun2x-base1:1.1.1.5 cjep_sun2x:1.1.1.5.0.4 cjep_sun2x-base:1.1.1.5 cjep_staticlib_x-base1:1.1.1.5 netbsd-9-2-RELEASE:1.1.1.4 cjep_staticlib_x:1.1.1.5.0.2 cjep_staticlib_x-base:1.1.1.5 gcc-10-3-0:1.1.1.5 netbsd-9-1-RELEASE:1.1.1.4 gcc-9-3-0:1.1.1.5 gcc-7-5-0:1.1.1.4 phil-wifi-20200421:1.1.1.4 phil-wifi-20200411:1.1.1.4 is-mlppp:1.1.1.4.0.4 is-mlppp-base:1.1.1.4 phil-wifi-20200406:1.1.1.4 netbsd-8-2-RELEASE:1.1.1.2 gcc-8-4-0:1.1.1.4 netbsd-9-0-RELEASE:1.1.1.4 netbsd-9-0-RC2:1.1.1.4 netbsd-9-0-RC1:1.1.1.4 phil-wifi-20191119:1.1.1.4 gcc-8-3-0:1.1.1.4 netbsd-9:1.1.1.4.0.2 netbsd-9-base:1.1.1.4 phil-wifi-20190609:1.1.1.4 netbsd-8-1-RELEASE:1.1.1.2 netbsd-8-1-RC1:1.1.1.2 pgoyette-compat-merge-20190127:1.1.1.2.14.2 pgoyette-compat-20190127:1.1.1.4 gcc-7-4-0:1.1.1.4 pgoyette-compat-20190118:1.1.1.3 pgoyette-compat-1226:1.1.1.3 pgoyette-compat-1126:1.1.1.3 gcc-6-5-0:1.1.1.3 pgoyette-compat-1020:1.1.1.2 pgoyette-compat-0930:1.1.1.2 pgoyette-compat-0906:1.1.1.2 netbsd-7-2-RELEASE:1.1.1.1 pgoyette-compat-0728:1.1.1.2 netbsd-8-0-RELEASE:1.1.1.2 phil-wifi:1.1.1.2.0.16 phil-wifi-base:1.1.1.2 pgoyette-compat-0625:1.1.1.2 netbsd-8-0-RC2:1.1.1.2 pgoyette-compat-0521:1.1.1.2 pgoyette-compat-0502:1.1.1.2 pgoyette-compat-0422:1.1.1.2 netbsd-8-0-RC1:1.1.1.2 pgoyette-compat-0415:1.1.1.2 pgoyette-compat-0407:1.1.1.2 pgoyette-compat-0330:1.1.1.2 pgoyette-compat-0322:1.1.1.2 pgoyette-compat-0315:1.1.1.2 netbsd-7-1-2-RELEASE:1.1.1.1 pgoyette-compat:1.1.1.2.0.14 pgoyette-compat-base:1.1.1.2 gcc-6-4-0:1.1.1.2 netbsd-7-1-1-RELEASE:1.1.1.1 gcc-5-5-0:1.1.1.2 matt-nb8-mediatek:1.1.1.2.0.12 matt-nb8-mediatek-base:1.1.1.2 perseant-stdc-iso10646:1.1.1.2.0.10 perseant-stdc-iso10646-base:1.1.1.2 netbsd-8:1.1.1.2.0.8 netbsd-8-base:1.1.1.2 prg-localcount2-base3:1.1.1.2 prg-localcount2-base2:1.1.1.2 prg-localcount2-base1:1.1.1.2 prg-localcount2:1.1.1.2.0.6 prg-localcount2-base:1.1.1.2 pgoyette-localcount-20170426:1.1.1.2 bouyer-socketcan-base1:1.1.1.2 pgoyette-localcount-20170320:1.1.1.2 netbsd-7-1:1.1.1.1.0.14 netbsd-7-1-RELEASE:1.1.1.1 netbsd-7-1-RC2:1.1.1.1 netbsd-7-nhusb-base-20170116:1.1.1.1 bouyer-socketcan:1.1.1.2.0.4 bouyer-socketcan-base:1.1.1.2 pgoyette-localcount-20170107:1.1.1.2 netbsd-7-1-RC1:1.1.1.1 pgoyette-localcount-20161104:1.1.1.2 netbsd-7-0-2-RELEASE:1.1.1.1 localcount-20160914:1.1.1.2 netbsd-7-nhusb:1.1.1.1.0.12 netbsd-7-nhusb-base:1.1.1.1 pgoyette-localcount-20160806:1.1.1.2 pgoyette-localcount-20160726:1.1.1.2 pgoyette-localcount:1.1.1.2.0.2 pgoyette-localcount-base:1.1.1.2 gcc-5-4-0:1.1.1.2 netbsd-7-0-1-RELEASE:1.1.1.1 gcc-5-3-0:1.1.1.2 netbsd-7-0:1.1.1.1.0.10 netbsd-7-0-RELEASE:1.1.1.1 gcc-4-8-5-pre-gcc-old-import:1.1.1.1 netbsd-7-0-RC3:1.1.1.1 netbsd-7-0-RC2:1.1.1.1 post-gcc-4-8-5-merge:1.1.1.1 gcc-4-8-5:1.1.1.1 netbsd-7-0-RC1:1.1.1.1 gcc-4-8-4:1.1.1.1 gcc-4-8-20141009:1.1.1.1 tls-maxphys-base:1.1.1.1 tls-maxphys:1.1.1.1.0.8 netbsd-7:1.1.1.1.0.6 netbsd-7-base:1.1.1.1 gcc-4-8-3:1.1.1.1 yamt-pagecache:1.1.1.1.0.4 yamt-pagecache-base9:1.1.1.1 tls-earlyentropy:1.1.1.1.0.2 tls-earlyentropy-base:1.1.1.1 riastradh-xf86-video-intel-2-7-1-pre-2-21-15:1.1.1.1 riastradh-drm2-base3:1.1.1.1 gcc-4-8-3-pre-r208254:1.1.1.1 gcc-4-8-3-pre-r206687:1.1.1.1 FSF:1.1.1; locks; strict; comment @# @; 1.1 date 2014.03.01.08.41.30; author mrg; state Exp; branches 1.1.1.1; next ; commitid TtaB91QNTknAoYqx; 1.1.1.1 date 2014.03.01.08.41.30; author mrg; state Exp; branches 1.1.1.1.4.1 1.1.1.1.8.1; next 1.1.1.2; commitid TtaB91QNTknAoYqx; 1.1.1.2 date 2016.01.24.06.05.43; author mrg; state Exp; branches 1.1.1.2.14.1 1.1.1.2.16.1; next 1.1.1.3; commitid uWWfbLp08zOK79Sy; 1.1.1.3 date 2018.11.04.00.12.37; author mrg; state Exp; branches; next 1.1.1.4; commitid bulspy67pMB6EyYA; 1.1.1.4 date 2019.01.19.10.14.11; author mrg; state Exp; branches; next 1.1.1.5; commitid VQ8OwWIg5RS9kn8B; 1.1.1.5 date 2020.09.05.07.52.18; author mrg; state Exp; branches; next ; commitid ZRYA7IOuwfMjAPmC; 1.1.1.1.4.1 date 2014.03.01.08.41.30; author yamt; state dead; branches; next 1.1.1.1.4.2; commitid DX8bafDLmqEbpyBx; 1.1.1.1.4.2 date 2014.05.22.16.37.45; author yamt; state Exp; branches; next ; commitid DX8bafDLmqEbpyBx; 1.1.1.1.8.1 date 2014.03.01.08.41.30; author tls; state dead; branches; next 1.1.1.1.8.2; commitid jTnpym9Qu0o4R1Nx; 1.1.1.1.8.2 date 2014.08.19.23.54.46; author tls; state Exp; branches; next ; commitid jTnpym9Qu0o4R1Nx; 1.1.1.2.14.1 date 2018.11.26.01.50.57; author pgoyette; state Exp; branches; next 1.1.1.2.14.2; commitid Zj4q5SspGdKXto1B; 1.1.1.2.14.2 date 2019.01.26.21.59.32; author pgoyette; state Exp; branches; next ; commitid JKpcmvSjdT25dl9B; 1.1.1.2.16.1 date 2019.06.10.21.54.49; author christos; state Exp; branches; next ; commitid jtc8rnCzWiEEHGqB; desc @@ 1.1 log @Initial revision @ text @ Design

Design

Table 19.1. Profile Code Location

Code LocationUse
libstdc++-v3/include/std/*Preprocessor code to redirect to profile extension headers.
libstdc++-v3/include/profile/*Profile extension public headers (map, vector, ...).
libstdc++-v3/include/profile/impl/*Profile extension internals. Implementation files are only included from impl/profiler.h, which is the only file included from the public headers.

Wrapper Model

In order to get our instrumented library version included instead of the release one, we use the same wrapper model as the debug mode. We subclass entities from the release version. Wherever _GLIBCXX_PROFILE is defined, the release namespace is std::__norm, whereas the profile namespace is std::__profile. Using plain std translates into std::__profile.

Whenever possible, we try to wrap at the public interface level, e.g., in unordered_set rather than in hashtable, in order not to depend on implementation.

Mixing object files built with and without the profile mode must not affect the program execution. However, there are no guarantees to the accuracy of diagnostics when using even a single object not built with -D_GLIBCXX_PROFILE. Currently, mixing the profile mode with debug and parallel extensions is not allowed. Mixing them at compile time will result in preprocessor errors. Mixing them at link time is undefined.

Instrumentation

Instead of instrumenting every public entry and exit point, we chose to add instrumentation on demand, as needed by individual diagnostics. The main reason is that some diagnostics require us to extract bits of internal state that are particular only to that diagnostic. We plan to formalize this later, after we learn more about the requirements of several diagnostics.

All the instrumentation points can be switched on and off using -D[_NO]_GLIBCXX_PROFILE_<diagnostic> options. With all the instrumentation calls off, there should be negligible overhead over the release version. This property is needed to support diagnostics based on timing of internal operations. For such diagnostics, we anticipate turning most of the instrumentation off in order to prevent profiling overhead from polluting time measurements, and thus diagnostics.

All the instrumentation on/off compile time switches live in include/profile/profiler.h.

Run Time Behavior

For practical reasons, the instrumentation library processes the trace partially rather than dumping it to disk in raw form. Each event is processed when it occurs. It is usually attached a cost and it is aggregated into the database of a specific diagnostic class. The cost model is based largely on the standard performance guarantees, but in some cases we use knowledge about GCC's standard library implementation.

Information is indexed by (1) call stack and (2) instance id or address to be able to understand and summarize precise creation-use-destruction dynamic chains. Although the analysis is sensitive to dynamic instances, the reports are only sensitive to call context. Whenever a dynamic instance is destroyed, we accumulate its effect to the corresponding entry for the call stack of its constructor location.

For details, see paper presented at CGO 2009.

Analysis and Diagnostics

Final analysis takes place offline, and it is based entirely on the generated trace and debugging info in the application binary. See section Diagnostics for a list of analysis types that we plan to support.

The input to the analysis is a table indexed by profile type and call stack. The data type for each entry depends on the profile type.

Cost Model

While it is likely that cost models become complex as we get into more sophisticated analysis, we will try to follow a simple set of rules at the beginning.

  • Relative benefit estimation: The idea is to estimate or measure the cost of all operations in the original scenario versus the scenario we advise to switch to. For instance, when advising to change a vector to a list, an occurrence of the insert method will generally count as a benefit. Its magnitude depends on (1) the number of elements that get shifted and (2) whether it triggers a reallocation.

  • Synthetic measurements: We will measure the relative difference between similar operations on different containers. We plan to write a battery of small tests that compare the times of the executions of similar methods on different containers. The idea is to run these tests on the target machine. If this training phase is very quick, we may decide to perform it at library initialization time. The results can be cached on disk and reused across runs.

  • Timers: We plan to use timers for operations of larger granularity, such as sort. For instance, we can switch between different sort methods on the fly and report the one that performs best for each call context.

  • Show stoppers: We may decide that the presence of an operation nullifies the advice. For instance, when considering switching from set to unordered_set, if we detect use of operator ++, we will simply not issue the advice, since this could signal that the use care require a sorted container.

Reports

There are two types of reports. First, if we recognize a pattern for which we have a substitute that is likely to give better performance, we print the advice and estimated performance gain. The advice is usually associated to a code position and possibly a call stack.

Second, we report performance characteristics for which we do not have a clear solution for improvement. For instance, we can point to the user the top 10 multimap locations which have the worst data locality in actual traversals. Although this does not offer a solution, it helps the user focus on the key problems and ignore the uninteresting ones.

Testing

First, we want to make sure we preserve the behavior of the release mode. You can just type "make check-profile", which builds and runs the whole test suite in profile mode.

Second, we want to test the correctness of each diagnostic. We created a profile directory in the test suite. Each diagnostic must come with at least two tests, one for false positives and one for false negatives.

@ 1.1.1.1 log @import GCC 4.8 branch at r206687. highlights from: http://gcc.gnu.org/gcc-4.6/changes.html GCC now has stricter checks for invalid command-line options New -Wunused-but-set-variable and -Wunused-but-set-parameter warnings Many platforms have been obsoleted Link-time optimization improvements A new switch -fstack-usage has been added A new function attribute leaf was introduced A new warning, enabled by -Wdouble-promotion Support for selectively enabling and disabling warnings via #pragma GCC diagnostic has been added There is now experimental support for some features from the upcoming C1X revision of the ISO C standard Improved experimental support for the upcoming C++0x ISO C++ standard G++ now issues clearer diagnostics in several cases Updates for ARM, x86, MIPS, PPC/PPC64, SPARC Darwin, FreeBSD, Solaris 2, MinGW and Cygwin now all support __float128 on 32-bit and 64-bit x86 targets. [*1] highlights from: http://gcc.gnu.org/gcc-4.7/changes.html The -fconserve-space flag has been deprecated Support for a new parameter --param case-values-threshold=n was added Interprocedural and Link-time optimization improvements A new built-in, __builtin_assume_aligned, has been added A new warning option -Wunused-local-typedefs was added A new experimental command-line option -ftrack-macro-expansion was added Support for atomic operations specifying the C++11/C11 memory model has been added There is support for some more features from the C11 revision of the ISO C standard Improved experimental support for the new ISO C++ standard, C++11 Updates for ARM, x86, MIPS, PPC/PPC64, SH, SPARC, TILE* A new option (-grecord-gcc-switches) was added highlights from: http://gcc.gnu.org/gcc-4.8/changes.html GCC now uses C++ as its implementation language. This means that to build GCC from sources, you will need a C++ compiler that understands C++ 2003 DWARF4 is now the default when generating DWARF debug information A new general optimization level, -Og, has been introduced A new option -ftree-partial-pre was added The option -fconserve-space has been removed The command-line options -fipa-struct-reorg and -fipa-matrix-reorg have been removed Interprocedural and Link-time optimization improvements AddressSanitizer, a fast memory error detector, has been added [*2] A new -Wsizeof-pointer-memaccess warning has been added G++ now supports a -std=c++1y option for experimentation with features proposed for the next revision of the standard, expected around 2014 Improved experimental support for the new ISO C++ standard, C++11 A new port has been added to support AArch64 Updates for ARM, x86, MIPS, PPC/PPC64, SH, SPARC, TILE* [*1] we should support this too! [*2] we should look into this. https://code.google.com/p/address-sanitizer/ @ text @@ 1.1.1.2 log @import GCC 5.3.0. see these urls for details which are too large to include here: http://gcc.gnu.org/gcc-4.9/changes.html http://gcc.gnu.org/gcc-5/changes.html (note that GCC 5.x is a release stream like GCC 4.9.x, 4.8.x, etc.) the main issues we will have are: The default mode for C is now -std=gnu11 instead of -std=gnu89. ARM: The deprecated option -mwords-little-endian has been removed. The options -mapcs, -mapcs-frame, -mtpcs-frame and -mtpcs-leaf-frame which are only applicable to the old ABI have been deprecated. MIPS: The o32 ABI has been modified and extended. The o32 64-bit floating-point register support is now obsolete and has been removed. It has been replaced by three ABI extensions FPXX, FP64A, and FP64. The meaning of the -mfp64 command-line option has changed. It is now used to enable the FP64A and FP64 ABI extensions. @ text @d3 1 a3 1

Table 19.1. Profile Code Location

Code LocationUse
libstdc++-v3/include/std/*Preprocessor code to redirect to profile extension headers.
libstdc++-v3/include/profile/*Profile extension public headers (map, vector, ...).
libstdc++-v3/include/profile/impl/*Profile extension internals. Implementation files are @ 1.1.1.2.16.1 log @Sync with HEAD @ text @d2 2 a3 2 Design

Design

Table 19.1. Profile Code Location

Code LocationUse
libstdc++-v3/include/std/*Preprocessor code to redirect to profile extension headers.
libstdc++-v3/include/profile/*Profile extension public headers (map, vector, ...).
libstdc++-v3/include/profile/impl/*Profile extension internals. Implementation files are d63 1 a63 1 paper presented at @ 1.1.1.2.14.1 log @Sync with HEAD, resolve a couple of conflicts @ text @d2 2 a3 2 Design

Design

Table 19.1. Profile Code Location

Code LocationUse
libstdc++-v3/include/std/*Preprocessor code to redirect to profile extension headers.
libstdc++-v3/include/profile/*Profile extension public headers (map, vector, ...).
libstdc++-v3/include/profile/impl/*Profile extension internals. Implementation files are @ 1.1.1.2.14.2 log @Sync with HEAD @ text @d63 1 a63 1 paper presented at @ 1.1.1.3 log @import GCC 6.5.0. this is largely a maint release with no particularly features listed here: http://gcc.gnu.org/gcc-6/changes.html this fixes over 250 PRs in the GCC bugzilla: https://gcc.gnu.org/bugzilla/buglist.cgi?bug_status=RESOLVED&resolution=FIXED&target_milestone=6.5 @ text @d2 2 a3 2 Design

Design

Table 19.1. Profile Code Location

Code LocationUse
libstdc++-v3/include/std/*Preprocessor code to redirect to profile extension headers.
libstdc++-v3/include/profile/*Profile extension public headers (map, vector, ...).
libstdc++-v3/include/profile/impl/*Profile extension internals. Implementation files are @ 1.1.1.4 log @import GCC 7.4.0. main changes include: The non-standard C++0x type traits has_trivial_default_constructor, has_trivial_copy_constructor and has_trivial_copy_assign have been removed. On ARM targets (arm*-*-*), a bug introduced in GCC 5 that affects conformance to the procedure call standard (AAPCS) has been fixed. Many optimiser improvements DWARF-5 support. Many new and enhanced warnings. Warnings about format strings now underline the pertinent part of the string, and can offer suggested fixes. Several new warnings related to buffer overflows and buffer truncation. New __builtin_add_overflow_p, __builtin_sub_overflow_p, __builtin_mul_overflow_p built-ins added that test for overflow. The C++ front end has experimental support for all of the current C++17 draft. The -fverbose-asm option has been expanded to prints comments showing the source lines that correspond to the assembly. The gcc and g++ driver programs will now provide suggestions for misspelled arguments to command-line options. AArch64 specific: GCC has been updated to the latest revision of the procedure call standard (AAPCS64) to provide support for parameter passing when data types have been over-aligned. The ARMv8.2-A and ARMv8.3-A architecture are now supported. ARM specific: Support for the ARMv5 and ARMv5E architectures has been deprecated (which have no known implementations). A new command-line option -mpure-code has been added. It does not allow constant data to be placed in code sections. x86 specific: Support for the AVX-512 4FMAPS, 4VNNIW, VPOPCNTDQ and Software Guard Extensions (SGX) ISA extensions has been added. PPC specific: GCC now diagnoses inline assembly that clobbers register r2. RISC-V specific: Support for the RISC-V instruction set has been added. SH specific: Support for SH5/SH64 has been removed. Support for SH2A has been enhanced. @ text @d63 1 a63 1 paper presented at @ 1.1.1.5 log @initial import of GCC 9.3.0. changes include: - live patching support - shell completion help - generally better diagnostic output (less verbose/more useful) - diagnostics and optimisation choices can be emitted in json - asan memory usage reduction - many general, and specific to switch, inter-procedure, profile and link-time optimisations. from the release notes: "Overall compile time of Firefox 66 and LibreOffice 6.2.3 on an 8-core machine was reduced by about 5% compared to GCC 8.3" - OpenMP 5.0 support - better spell-guesser - partial experimental support for c2x and c++2a - c++17 is no longer experimental - arm AAPCS GCC 6-8 structure passing bug fixed, may cause incompatibility (restored compat with GCC 5 and earlier.) - openrisc support @ text @d63 1 a63 1 paper presented at @ 1.1.1.1.8.1 log @file profile_mode_design.html was added on branch tls-maxphys on 2014-08-19 23:54:46 +0000 @ text @d1 121 @ 1.1.1.1.8.2 log @Rebase to HEAD as of a few days ago. @ text @a0 121 Design

Design

Table 19.1. Profile Code Location

Code LocationUse
libstdc++-v3/include/std/*Preprocessor code to redirect to profile extension headers.
libstdc++-v3/include/profile/*Profile extension public headers (map, vector, ...).
libstdc++-v3/include/profile/impl/*Profile extension internals. Implementation files are only included from impl/profiler.h, which is the only file included from the public headers.

Wrapper Model

In order to get our instrumented library version included instead of the release one, we use the same wrapper model as the debug mode. We subclass entities from the release version. Wherever _GLIBCXX_PROFILE is defined, the release namespace is std::__norm, whereas the profile namespace is std::__profile. Using plain std translates into std::__profile.

Whenever possible, we try to wrap at the public interface level, e.g., in unordered_set rather than in hashtable, in order not to depend on implementation.

Mixing object files built with and without the profile mode must not affect the program execution. However, there are no guarantees to the accuracy of diagnostics when using even a single object not built with -D_GLIBCXX_PROFILE. Currently, mixing the profile mode with debug and parallel extensions is not allowed. Mixing them at compile time will result in preprocessor errors. Mixing them at link time is undefined.

Instrumentation

Instead of instrumenting every public entry and exit point, we chose to add instrumentation on demand, as needed by individual diagnostics. The main reason is that some diagnostics require us to extract bits of internal state that are particular only to that diagnostic. We plan to formalize this later, after we learn more about the requirements of several diagnostics.

All the instrumentation points can be switched on and off using -D[_NO]_GLIBCXX_PROFILE_<diagnostic> options. With all the instrumentation calls off, there should be negligible overhead over the release version. This property is needed to support diagnostics based on timing of internal operations. For such diagnostics, we anticipate turning most of the instrumentation off in order to prevent profiling overhead from polluting time measurements, and thus diagnostics.

All the instrumentation on/off compile time switches live in include/profile/profiler.h.

Run Time Behavior

For practical reasons, the instrumentation library processes the trace partially rather than dumping it to disk in raw form. Each event is processed when it occurs. It is usually attached a cost and it is aggregated into the database of a specific diagnostic class. The cost model is based largely on the standard performance guarantees, but in some cases we use knowledge about GCC's standard library implementation.

Information is indexed by (1) call stack and (2) instance id or address to be able to understand and summarize precise creation-use-destruction dynamic chains. Although the analysis is sensitive to dynamic instances, the reports are only sensitive to call context. Whenever a dynamic instance is destroyed, we accumulate its effect to the corresponding entry for the call stack of its constructor location.

For details, see paper presented at CGO 2009.

Analysis and Diagnostics

Final analysis takes place offline, and it is based entirely on the generated trace and debugging info in the application binary. See section Diagnostics for a list of analysis types that we plan to support.

The input to the analysis is a table indexed by profile type and call stack. The data type for each entry depends on the profile type.

Cost Model

While it is likely that cost models become complex as we get into more sophisticated analysis, we will try to follow a simple set of rules at the beginning.

  • Relative benefit estimation: The idea is to estimate or measure the cost of all operations in the original scenario versus the scenario we advise to switch to. For instance, when advising to change a vector to a list, an occurrence of the insert method will generally count as a benefit. Its magnitude depends on (1) the number of elements that get shifted and (2) whether it triggers a reallocation.

  • Synthetic measurements: We will measure the relative difference between similar operations on different containers. We plan to write a battery of small tests that compare the times of the executions of similar methods on different containers. The idea is to run these tests on the target machine. If this training phase is very quick, we may decide to perform it at library initialization time. The results can be cached on disk and reused across runs.

  • Timers: We plan to use timers for operations of larger granularity, such as sort. For instance, we can switch between different sort methods on the fly and report the one that performs best for each call context.

  • Show stoppers: We may decide that the presence of an operation nullifies the advice. For instance, when considering switching from set to unordered_set, if we detect use of operator ++, we will simply not issue the advice, since this could signal that the use care require a sorted container.

Reports

There are two types of reports. First, if we recognize a pattern for which we have a substitute that is likely to give better performance, we print the advice and estimated performance gain. The advice is usually associated to a code position and possibly a call stack.

Second, we report performance characteristics for which we do not have a clear solution for improvement. For instance, we can point to the user the top 10 multimap locations which have the worst data locality in actual traversals. Although this does not offer a solution, it helps the user focus on the key problems and ignore the uninteresting ones.

Testing

First, we want to make sure we preserve the behavior of the release mode. You can just type "make check-profile", which builds and runs the whole test suite in profile mode.

Second, we want to test the correctness of each diagnostic. We created a profile directory in the test suite. Each diagnostic must come with at least two tests, one for false positives and one for false negatives.

@ 1.1.1.1.4.1 log @file profile_mode_design.html was added on branch yamt-pagecache on 2014-05-22 16:37:45 +0000 @ text @d1 121 @ 1.1.1.1.4.2 log @sync with head. for a reference, the tree before this commit was tagged as yamt-pagecache-tag8. this commit was splitted into small chunks to avoid a limitation of cvs. ("Protocol error: too many arguments") @ text @a0 121 Design

Design

Table 19.1. Profile Code Location

Code LocationUse
libstdc++-v3/include/std/*Preprocessor code to redirect to profile extension headers.
libstdc++-v3/include/profile/*Profile extension public headers (map, vector, ...).
libstdc++-v3/include/profile/impl/*Profile extension internals. Implementation files are only included from impl/profiler.h, which is the only file included from the public headers.

Wrapper Model

In order to get our instrumented library version included instead of the release one, we use the same wrapper model as the debug mode. We subclass entities from the release version. Wherever _GLIBCXX_PROFILE is defined, the release namespace is std::__norm, whereas the profile namespace is std::__profile. Using plain std translates into std::__profile.

Whenever possible, we try to wrap at the public interface level, e.g., in unordered_set rather than in hashtable, in order not to depend on implementation.

Mixing object files built with and without the profile mode must not affect the program execution. However, there are no guarantees to the accuracy of diagnostics when using even a single object not built with -D_GLIBCXX_PROFILE. Currently, mixing the profile mode with debug and parallel extensions is not allowed. Mixing them at compile time will result in preprocessor errors. Mixing them at link time is undefined.

Instrumentation

Instead of instrumenting every public entry and exit point, we chose to add instrumentation on demand, as needed by individual diagnostics. The main reason is that some diagnostics require us to extract bits of internal state that are particular only to that diagnostic. We plan to formalize this later, after we learn more about the requirements of several diagnostics.

All the instrumentation points can be switched on and off using -D[_NO]_GLIBCXX_PROFILE_<diagnostic> options. With all the instrumentation calls off, there should be negligible overhead over the release version. This property is needed to support diagnostics based on timing of internal operations. For such diagnostics, we anticipate turning most of the instrumentation off in order to prevent profiling overhead from polluting time measurements, and thus diagnostics.

All the instrumentation on/off compile time switches live in include/profile/profiler.h.

Run Time Behavior

For practical reasons, the instrumentation library processes the trace partially rather than dumping it to disk in raw form. Each event is processed when it occurs. It is usually attached a cost and it is aggregated into the database of a specific diagnostic class. The cost model is based largely on the standard performance guarantees, but in some cases we use knowledge about GCC's standard library implementation.

Information is indexed by (1) call stack and (2) instance id or address to be able to understand and summarize precise creation-use-destruction dynamic chains. Although the analysis is sensitive to dynamic instances, the reports are only sensitive to call context. Whenever a dynamic instance is destroyed, we accumulate its effect to the corresponding entry for the call stack of its constructor location.

For details, see paper presented at CGO 2009.

Analysis and Diagnostics

Final analysis takes place offline, and it is based entirely on the generated trace and debugging info in the application binary. See section Diagnostics for a list of analysis types that we plan to support.

The input to the analysis is a table indexed by profile type and call stack. The data type for each entry depends on the profile type.

Cost Model

While it is likely that cost models become complex as we get into more sophisticated analysis, we will try to follow a simple set of rules at the beginning.

  • Relative benefit estimation: The idea is to estimate or measure the cost of all operations in the original scenario versus the scenario we advise to switch to. For instance, when advising to change a vector to a list, an occurrence of the insert method will generally count as a benefit. Its magnitude depends on (1) the number of elements that get shifted and (2) whether it triggers a reallocation.

  • Synthetic measurements: We will measure the relative difference between similar operations on different containers. We plan to write a battery of small tests that compare the times of the executions of similar methods on different containers. The idea is to run these tests on the target machine. If this training phase is very quick, we may decide to perform it at library initialization time. The results can be cached on disk and reused across runs.

  • Timers: We plan to use timers for operations of larger granularity, such as sort. For instance, we can switch between different sort methods on the fly and report the one that performs best for each call context.

  • Show stoppers: We may decide that the presence of an operation nullifies the advice. For instance, when considering switching from set to unordered_set, if we detect use of operator ++, we will simply not issue the advice, since this could signal that the use care require a sorted container.

Reports

There are two types of reports. First, if we recognize a pattern for which we have a substitute that is likely to give better performance, we print the advice and estimated performance gain. The advice is usually associated to a code position and possibly a call stack.

Second, we report performance characteristics for which we do not have a clear solution for improvement. For instance, we can point to the user the top 10 multimap locations which have the worst data locality in actual traversals. Although this does not offer a solution, it helps the user focus on the key problems and ignore the uninteresting ones.

Testing

First, we want to make sure we preserve the behavior of the release mode. You can just type "make check-profile", which builds and runs the whole test suite in profile mode.

Second, we want to test the correctness of each diagnostic. We created a profile directory in the test suite. Each diagnostic must come with at least two tests, one for false positives and one for false negatives.

@