From: Nicholas Nethercote
Date: Sat, 8 Jun 2002 14:06:37 +0000 (+0000)
Subject: Updated Cachegrind section for the CPUID-addition/vg_cachegen-removal.
X-Git-Tag: svn/VALGRIND_1_0_3~74
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=9e8a864df1e8b0363a879553f45677c9270e4a41;p=thirdparty%2Fvalgrind.git

Updated Cachegrind section for the CPUID-addition/vg_cachegen-removal.

git-svn-id: svn://svn.valgrind.org/valgrind/trunk@401
---
diff --git a/cachegrind/docs/manual.html b/cachegrind/docs/manual.html
index 3b797dfce8..b5b1e2e963 100644
--- a/cachegrind/docs/manual.html
+++ b/cachegrind/docs/manual.html
@@ -1982,24 +1982,8 @@
 info (the -g flag). But by contrast with normal Valgrind use, you probably do want to turn optimisation on, since you should profile your program as it will be normally run.
-The three steps are:
+The two steps are:
    -
  1. Generate a cache simulator for your machine's cache - configuration with the supplied vg_cachegen - program, and recompile Valgrind with make install. -

    - The default settings are for an AMD Athlon, and you will get - useful information with the defaults, so you can skip this step - if you want. Nevertheless, for accurate cache profiles you will - need use vg_cachegen to customise - cachegrind for your system. -

    - This step only needs to be done once, unless you are interested - in simulating different cache configurations (eg. first - concentrating on instruction cache misses, then on data cache - misses). -

  2. -

  3. Run your program with cachegrind in front of the normal command line invocation. When the program finishes, Valgrind will print summary cache statistics. It also collects @@ -2025,56 +2009,12 @@ The three steps are: The steps are described in detail in the following sections.

    - -

    7.3  Generating a cache simulator

    - -Although Valgrind comes with a pre-generated cache simulator, it most -likely won't match the cache configuration of your machine, so you -should generate a new simulator.

    - -You need to generate three files, one for each of the I1, D1 and L2 -caches. For each cache, you need to know the: -

      -
    • Cache size (bytes); -
    • Line size (bytes); -
    • Associativity. -
    - -vg_cachegen takes three options: -
      -
    • --I1=size,line_size,associativity -
    • --D1=size,line_size,associativity -
    • --L2=size,line_size,associativity -
    - -You can specify one, two or all three caches per invocation of -vg_cachegen. It checks that the configuration is sensible before -generating the simulators; to see the allowed values, run -vg_cachegen -h.

    - -An example invocation would be: - -

    - vg_cachegen --I1=65536,64,2 --D1=65536,64,2 --L2=262144,64,8 -
    - -This simulates a machine with a 128KB split L1 2-way associative -cache, and a 256KB unified 8-way associative L2 cache. Both caches -have 64B lines.

    +

    7.3  Cache simulation specifics

    -If you don't know your cache configuration, you'll have to find it -out. (Ideally vg_cachegen could auto-identify your cache -configuration using the CPUID instruction, which could be done -automatically during installation, and this whole step could be -skipped.)

    - - -

    7.4  Cache simulation specifics

    - -vg_cachegen only generates simulations for a machine with -a split L1 cache and a unified L2 cache. This configuration is used -for all (modern) x86-based machines we are aware of. Old Cyrix CPUs -had a unified I and D L1 cache, but they are ancient history now.

    +Cachegrind uses a simulation for a machine with a split L1 cache and a unified +L2 cache. This configuration is used for all (modern) x86-based machines we +are aware of. Old Cyrix CPUs had a unified I and D L1 cache, but they are +ancient history now.

    The more specific characteristics of the simulation are as follows. @@ -2097,6 +2037,15 @@ The more specific characteristics of the simulation are as follows. from L1. Ditto AMD Durons and most modern VIAs.

    +The cache configuration simulated (cache size, associativity and line size) is +determined automagically using the CPUID instruction. If you have an old +machine that (a) doesn't support the CPUID instruction, or (b) supports it in +an early incarnation that doesn't give any cache information, then Cachegrind +will fall back to using a default configuration (that of a model 3/4 Athlon). +Cachegrind will tell you if this happens. You can manually specify one, two or +all three levels (I1/D1/L2) of the cache from the command line using the +--I1, --D1 and --L2 options.
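The fallback order described in the added paragraph (command-line option first, then CPUID, then the built-in default) can be sketched as follows. This is only an illustration in Python, not Cachegrind's code: `CacheConfig`, `DEFAULTS` and `resolve_caches` are hypothetical names, the CPUID result is passed in as a plain dictionary, and the default values are assumed from the old vg_cachegen example (64KB 2-way I1 and D1, 256KB 8-way L2, 64-byte lines).

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class CacheConfig:
    size: int       # total cache size in bytes
    assoc: int      # associativity (number of ways)
    line_size: int  # line size in bytes


# Assumed fallback values, borrowed from the old vg_cachegen example;
# the real defaults live in the Cachegrind source.
DEFAULTS: Dict[str, CacheConfig] = {
    "I1": CacheConfig(65536, 2, 64),
    "D1": CacheConfig(65536, 2, 64),
    "L2": CacheConfig(262144, 8, 64),
}


def resolve_caches(
    cpuid: Optional[Dict[str, CacheConfig]],
    overrides: Dict[str, CacheConfig],
) -> Tuple[Dict[str, CacheConfig], bool]:
    """Pick a configuration for each cache level.

    cpuid is None when CPUID is missing or reports no cache information.
    Returns the resolved configuration plus a flag saying whether the
    built-in defaults were used as the base, so the caller can warn.
    """
    used_defaults = cpuid is None
    base = DEFAULTS if cpuid is None else cpuid
    # A manually specified level always wins over the detected/default one.
    resolved = {
        level: overrides.get(level, base[level]) for level in ("I1", "D1", "L2")
    }
    return resolved, used_defaults
```

Overriding only L2 on a machine without usable CPUID data would leave I1 and D1 at the defaults and set the flag, which is the case where a warning would be printed.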

    + Other noteworthy behaviour:

    If you are interested in simulating a cache with different properties, it is -not particularly hard to write your own cache simulator, or to modify existing -ones in vg_cachesim_I1.c, vg_cachesim_I1.c and -vg_cachesim_I1.c. We'd be interested to hear from anyone who -does. - +not particularly hard to write your own cache simulator, or to modify the +existing ones in vg_cachesim_I1.c, vg_cachesim_D1.c, +vg_cachesim_L2.c and vg_cachesim_gen.c. We'd be +interested to hear from anyone who does. -

    7.5  Profiling programs

    +

    7.4  Profiling programs

    Cache profiling is enabled by using the --cachesim=yes option to the valgrind shell script. Alternatively, it is probably more convenient to use the cachegrind script. -This automatically turns off Valgrind's memory checking functions, +Either way automatically turns off Valgrind's memory checking functions, since the cache simulation is slow enough already, and you probably don't want to do both at once.

    @@ -2173,7 +2121,7 @@ to the row's total).

    Combined instruction and data figures for the L2 cache follow that.

    -

    7.6  Output file

    +

    7.5  Output file

    As well as printing summary information, Cachegrind also writes line-by-line cache profiling information to a file named @@ -2193,6 +2141,30 @@ Things to note about the cachegrind.out file: of around 15 MB. + +

    7.6  Cachegrind options

    +Cachegrind accepts all the options that Valgrind does, although some of them +(ones related to memory checking) don't do anything when cache profiling.

    + +The interesting cache-simulation specific options are: + +

    • --I1=<size>,<associativity>,<line_size>

    + --D1=<size>,<associativity>,<line_size>

    + --L2=<size>,<associativity>,<line_size>

    + [default: uses CPUID for cache configuration]

    + + Manually specifies the I1/D1/L2 cache configuration, where + size and line_size are measured in bytes. The + three items must be comma-separated, but with no space, eg: + +

    cachegrind --I1=65536,2,64
    + + You can specify one, two or three of the caches. Any level not manually + specified will be simulated using the configuration found in the normal + way (via the CPUID instruction, or failing that, via defaults). + + +
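To make the option format concrete, here is a minimal sketch of parsing and sanity-checking one `--I1`/`--D1`/`--L2` argument. It is written in Python for illustration only; `CacheSpec` and `parse_cache_option` are hypothetical names, and the checks (power-of-two line size and set count) are assumptions about what a sensible configuration looks like, not Cachegrind's actual validation rules.

```python
import re
from typing import NamedTuple


class CacheSpec(NamedTuple):
    level: str      # "I1", "D1" or "L2"
    size: int       # total cache size in bytes
    assoc: int      # associativity (number of ways)
    line_size: int  # line size in bytes


# Three comma-separated integers with no spaces, as the manual requires.
_OPTION = re.compile(r"--(I1|D1|L2)=(\d+),(\d+),(\d+)")


def _is_power_of_two(n: int) -> bool:
    return n > 0 and n & (n - 1) == 0


def parse_cache_option(arg: str) -> CacheSpec:
    """Parse e.g. '--I1=65536,2,64' into a CacheSpec, or raise ValueError."""
    match = _OPTION.fullmatch(arg)
    if match is None:
        raise ValueError(f"malformed cache option: {arg!r}")
    level = match.group(1)
    size, assoc, line_size = (int(g) for g in match.groups()[1:])

    # Assumed sanity checks: the cache must divide evenly into sets of
    # `assoc` lines, and line size and set count must be powers of two.
    if not _is_power_of_two(line_size):
        raise ValueError("line size must be a power of two")
    if assoc <= 0 or size % (assoc * line_size) != 0:
        raise ValueError("size must be a multiple of associativity * line size")
    if not _is_power_of_two(size // (assoc * line_size)):
        raise ValueError("number of sets must be a power of two")
    return CacheSpec(level, size, assoc, line_size)
```

Under these assumed checks a size such as 65535 is rejected, because it does not divide into whole 64-byte lines.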

    7.7  Annotating C/C++ programs

    @@ -2517,7 +2489,7 @@ There are a couple of situations in which vg_annotate issues warnings. -

    7.10  Things to watch out for

    +

    7.11  Things to watch out for

    Some odd things that can occur during annotation:
      @@ -2600,7 +2572,7 @@ annotations that look like might be caused by a bug in the stabs reader, please let us know.

      -

      7.11  Accuracy

      +

      7.12  Accuracy

      Valgrind's cache profiling has a number of shortcomings:
        @@ -2640,12 +2612,8 @@ While these factors mean you shouldn't trust the results to be super-accurate, hopefully they should be close enough to be useful.

        -

        7.12  Todo

        +

        7.13  Todo

          -
        • Use CPUID instruction to auto-identify cache configuration during - installation. This would save the user from having to know their cache - configuration and using vg_cachegen.
        • -

        • Program start-up/shut-down calls a lot of functions that aren't interesting and just complicate the output. Would be nice to exclude these somehow.
        • diff --git a/coregrind/docs/manual.html b/coregrind/docs/manual.html index 3b797dfce8..b5b1e2e963 100644 --- a/coregrind/docs/manual.html +++ b/coregrind/docs/manual.html @@ -1982,24 +1982,8 @@ info (the -g flag). But by contrast with normal Valgrind use, you probably do want to turn optimisation on, since you should profile your program as it will be normally run. -The three steps are: +The two steps are:
            -
          1. Generate a cache simulator for your machine's cache - configuration with the supplied vg_cachegen - program, and recompile Valgrind with make install. -

            - The default settings are for an AMD Athlon, and you will get - useful information with the defaults, so you can skip this step - if you want. Nevertheless, for accurate cache profiles you will - need use vg_cachegen to customise - cachegrind for your system. -

            - This step only needs to be done once, unless you are interested - in simulating different cache configurations (eg. first - concentrating on instruction cache misses, then on data cache - misses). -

          2. -

          3. Run your program with cachegrind in front of the normal command line invocation. When the program finishes, Valgrind will print summary cache statistics. It also collects @@ -2025,56 +2009,12 @@ The three steps are: The steps are described in detail in the following sections.

            - -

            7.3  Generating a cache simulator

            - -Although Valgrind comes with a pre-generated cache simulator, it most -likely won't match the cache configuration of your machine, so you -should generate a new simulator.

            - -You need to generate three files, one for each of the I1, D1 and L2 -caches. For each cache, you need to know the: -

              -
            • Cache size (bytes); -
            • Line size (bytes); -
            • Associativity. -
            - -vg_cachegen takes three options: -
              -
            • --I1=size,line_size,associativity -
            • --D1=size,line_size,associativity -
            • --L2=size,line_size,associativity -
            - -You can specify one, two or all three caches per invocation of -vg_cachegen. It checks that the configuration is sensible before -generating the simulators; to see the allowed values, run -vg_cachegen -h.

            - -An example invocation would be: - -

            - vg_cachegen --I1=65536,64,2 --D1=65536,64,2 --L2=262144,64,8 -
            - -This simulates a machine with a 128KB split L1 2-way associative -cache, and a 256KB unified 8-way associative L2 cache. Both caches -have 64B lines.

            +

            7.3  Cache simulation specifics

            -If you don't know your cache configuration, you'll have to find it -out. (Ideally vg_cachegen could auto-identify your cache -configuration using the CPUID instruction, which could be done -automatically during installation, and this whole step could be -skipped.)

            - - -

            7.4  Cache simulation specifics

            - -vg_cachegen only generates simulations for a machine with -a split L1 cache and a unified L2 cache. This configuration is used -for all (modern) x86-based machines we are aware of. Old Cyrix CPUs -had a unified I and D L1 cache, but they are ancient history now.

            +Cachegrind uses a simulation for a machine with a split L1 cache and a unified +L2 cache. This configuration is used for all (modern) x86-based machines we +are aware of. Old Cyrix CPUs had a unified I and D L1 cache, but they are +ancient history now.

            The more specific characteristics of the simulation are as follows. @@ -2097,6 +2037,15 @@ The more specific characteristics of the simulation are as follows. from L1. Ditto AMD Durons and most modern VIAs.

        +The cache configuration simulated (cache size, associativity and line size) is +determined automagically using the CPUID instruction. If you have an old +machine that (a) doesn't support the CPUID instruction, or (b) supports it in +an early incarnation that doesn't give any cache information, then Cachegrind +will fall back to using a default configuration (that of a model 3/4 Athlon). +Cachegrind will tell you if this happens. You can manually specify one, two or +all three levels (I1/D1/L2) of the cache from the command line using the +--I1, --D1 and --L2 options.

        + Other noteworthy behaviour:

          @@ -2119,19 +2068,18 @@ Other noteworthy behaviour:
        If you are interested in simulating a cache with different properties, it is -not particularly hard to write your own cache simulator, or to modify existing -ones in vg_cachesim_I1.c, vg_cachesim_I1.c and -vg_cachesim_I1.c. We'd be interested to hear from anyone who -does. - +not particularly hard to write your own cache simulator, or to modify the +existing ones in vg_cachesim_I1.c, vg_cachesim_D1.c, +vg_cachesim_L2.c and vg_cachesim_gen.c. We'd be +interested to hear from anyone who does. -

        7.5  Profiling programs

        +

        7.4  Profiling programs

        Cache profiling is enabled by using the --cachesim=yes option to the valgrind shell script. Alternatively, it is probably more convenient to use the cachegrind script. -This automatically turns off Valgrind's memory checking functions, +Either way automatically turns off Valgrind's memory checking functions, since the cache simulation is slow enough already, and you probably don't want to do both at once.

        @@ -2173,7 +2121,7 @@ to the row's total).

        Combined instruction and data figures for the L2 cache follow that.

        -

        7.6  Output file

        +

        7.5  Output file

        As well as printing summary information, Cachegrind also writes line-by-line cache profiling information to a file named @@ -2193,6 +2141,30 @@ Things to note about the cachegrind.out file: of around 15 MB.
      + +

      7.6  Cachegrind options

      +Cachegrind accepts all the options that Valgrind does, although some of them +(ones related to memory checking) don't do anything when cache profiling.

      + +The interesting cache-simulation specific options are: + +

    • --I1=<size>,<associativity>,<line_size>

      + --D1=<size>,<associativity>,<line_size>

      + --L2=<size>,<associativity>,<line_size>

      + [default: uses CPUID for cache configuration]

      + + Manually specifies the I1/D1/L2 cache configuration, where + size and line_size are measured in bytes. The + three items must be comma-separated, but with no space, eg: + +

      cachegrind --I1=65536,2,64
      + + You can specify one, two or three of the caches. Any level not manually + specified will be simulated using the configuration found in the normal + way (via the CPUID instruction, or failing that, via defaults). +
    + +

    7.7  Annotating C/C++ programs

    @@ -2517,7 +2489,7 @@ There are a couple of situations in which vg_annotate issues warnings. -

    7.10  Things to watch out for

    +

    7.11  Things to watch out for

    Some odd things that can occur during annotation:
      @@ -2600,7 +2572,7 @@ annotations that look like might be caused by a bug in the stabs reader, please let us know.

      -

      7.11  Accuracy

      +

      7.12  Accuracy

      Valgrind's cache profiling has a number of shortcomings:
        @@ -2640,12 +2612,8 @@ While these factors mean you shouldn't trust the results to be super-accurate, hopefully they should be close enough to be useful.

        -

        7.12  Todo

        +

        7.13  Todo

          -
        • Use CPUID instruction to auto-identify cache configuration during - installation. This would save the user from having to know their cache - configuration and using vg_cachegen.
        • -

        • Program start-up/shut-down calls a lot of functions that aren't interesting and just complicate the output. Would be nice to exclude these somehow.
        • diff --git a/docs/manual.html b/docs/manual.html index 3b797dfce8..b5b1e2e963 100644 --- a/docs/manual.html +++ b/docs/manual.html @@ -1982,24 +1982,8 @@ info (the -g flag). But by contrast with normal Valgrind use, you probably do want to turn optimisation on, since you should profile your program as it will be normally run. -The three steps are: +The two steps are:
            -
          1. Generate a cache simulator for your machine's cache - configuration with the supplied vg_cachegen - program, and recompile Valgrind with make install. -

            - The default settings are for an AMD Athlon, and you will get - useful information with the defaults, so you can skip this step - if you want. Nevertheless, for accurate cache profiles you will - need use vg_cachegen to customise - cachegrind for your system. -

            - This step only needs to be done once, unless you are interested - in simulating different cache configurations (eg. first - concentrating on instruction cache misses, then on data cache - misses). -

          2. -

          3. Run your program with cachegrind in front of the normal command line invocation. When the program finishes, Valgrind will print summary cache statistics. It also collects @@ -2025,56 +2009,12 @@ The three steps are: The steps are described in detail in the following sections.

            - -

            7.3  Generating a cache simulator

            - -Although Valgrind comes with a pre-generated cache simulator, it most -likely won't match the cache configuration of your machine, so you -should generate a new simulator.

            - -You need to generate three files, one for each of the I1, D1 and L2 -caches. For each cache, you need to know the: -

              -
            • Cache size (bytes); -
            • Line size (bytes); -
            • Associativity. -
            - -vg_cachegen takes three options: -
              -
            • --I1=size,line_size,associativity -
            • --D1=size,line_size,associativity -
            • --L2=size,line_size,associativity -
            - -You can specify one, two or all three caches per invocation of -vg_cachegen. It checks that the configuration is sensible before -generating the simulators; to see the allowed values, run -vg_cachegen -h.

            - -An example invocation would be: - -

            - vg_cachegen --I1=65536,64,2 --D1=65536,64,2 --L2=262144,64,8 -
            - -This simulates a machine with a 128KB split L1 2-way associative -cache, and a 256KB unified 8-way associative L2 cache. Both caches -have 64B lines.

            +

            7.3  Cache simulation specifics

            -If you don't know your cache configuration, you'll have to find it -out. (Ideally vg_cachegen could auto-identify your cache -configuration using the CPUID instruction, which could be done -automatically during installation, and this whole step could be -skipped.)

            - - -

            7.4  Cache simulation specifics

            - -vg_cachegen only generates simulations for a machine with -a split L1 cache and a unified L2 cache. This configuration is used -for all (modern) x86-based machines we are aware of. Old Cyrix CPUs -had a unified I and D L1 cache, but they are ancient history now.

            +Cachegrind uses a simulation for a machine with a split L1 cache and a unified +L2 cache. This configuration is used for all (modern) x86-based machines we +are aware of. Old Cyrix CPUs had a unified I and D L1 cache, but they are +ancient history now.

            The more specific characteristics of the simulation are as follows. @@ -2097,6 +2037,15 @@ The more specific characteristics of the simulation are as follows. from L1. Ditto AMD Durons and most modern VIAs.

        +The cache configuration simulated (cache size, associativity and line size) is +determined automagically using the CPUID instruction. If you have an old +machine that (a) doesn't support the CPUID instruction, or (b) supports it in +an early incarnation that doesn't give any cache information, then Cachegrind +will fall back to using a default configuration (that of a model 3/4 Athlon). +Cachegrind will tell you if this happens. You can manually specify one, two or +all three levels (I1/D1/L2) of the cache from the command line using the +--I1, --D1 and --L2 options.

        + Other noteworthy behaviour:

          @@ -2119,19 +2068,18 @@ Other noteworthy behaviour:
        If you are interested in simulating a cache with different properties, it is -not particularly hard to write your own cache simulator, or to modify existing -ones in vg_cachesim_I1.c, vg_cachesim_I1.c and -vg_cachesim_I1.c. We'd be interested to hear from anyone who -does. - +not particularly hard to write your own cache simulator, or to modify the +existing ones in vg_cachesim_I1.c, vg_cachesim_D1.c, +vg_cachesim_L2.c and vg_cachesim_gen.c. We'd be +interested to hear from anyone who does. -

        7.5  Profiling programs

        +

        7.4  Profiling programs

        Cache profiling is enabled by using the --cachesim=yes option to the valgrind shell script. Alternatively, it is probably more convenient to use the cachegrind script. -This automatically turns off Valgrind's memory checking functions, +Either way automatically turns off Valgrind's memory checking functions, since the cache simulation is slow enough already, and you probably don't want to do both at once.

        @@ -2173,7 +2121,7 @@ to the row's total).

        Combined instruction and data figures for the L2 cache follow that.

        -

        7.6  Output file

        +

        7.5  Output file

        As well as printing summary information, Cachegrind also writes line-by-line cache profiling information to a file named @@ -2193,6 +2141,30 @@ Things to note about the cachegrind.out file: of around 15 MB.
      + +

      7.6  Cachegrind options

      +Cachegrind accepts all the options that Valgrind does, although some of them +(ones related to memory checking) don't do anything when cache profiling.

      + +The interesting cache-simulation specific options are: + +

    • --I1=<size>,<associativity>,<line_size>

      + --D1=<size>,<associativity>,<line_size>

      + --L2=<size>,<associativity>,<line_size>

      + [default: uses CPUID for cache configuration]

      + + Manually specifies the I1/D1/L2 cache configuration, where + size and line_size are measured in bytes. The + three items must be comma-separated, but with no space, eg: + +

      cachegrind --I1=65536,2,64
      + + You can specify one, two or three of the caches. Any level not manually + specified will be simulated using the configuration found in the normal + way (via the CPUID instruction, or failing that, via defaults). +
    + +

    7.7  Annotating C/C++ programs

    @@ -2517,7 +2489,7 @@ There are a couple of situations in which vg_annotate issues warnings. -

    7.10  Things to watch out for

    +

    7.11  Things to watch out for

    Some odd things that can occur during annotation:
      @@ -2600,7 +2572,7 @@ annotations that look like might be caused by a bug in the stabs reader, please let us know.

      -

      7.11  Accuracy

      +

      7.12  Accuracy

      Valgrind's cache profiling has a number of shortcomings:
        @@ -2640,12 +2612,8 @@ While these factors mean you shouldn't trust the results to be super-accurate, hopefully they should be close enough to be useful.

        -

        7.12  Todo

        +

        7.13  Todo

          -
        • Use CPUID instruction to auto-identify cache configuration during - installation. This would save the user from having to know their cache - configuration and using vg_cachegen.
        • -

        • Program start-up/shut-down calls a lot of functions that aren't interesting and just complicate the output. Would be nice to exclude these somehow.
        • diff --git a/memcheck/docs/manual.html b/memcheck/docs/manual.html index 3b797dfce8..b5b1e2e963 100644 --- a/memcheck/docs/manual.html +++ b/memcheck/docs/manual.html @@ -1982,24 +1982,8 @@ info (the -g flag). But by contrast with normal Valgrind use, you probably do want to turn optimisation on, since you should profile your program as it will be normally run. -The three steps are: +The two steps are:
            -
          1. Generate a cache simulator for your machine's cache - configuration with the supplied vg_cachegen - program, and recompile Valgrind with make install. -

            - The default settings are for an AMD Athlon, and you will get - useful information with the defaults, so you can skip this step - if you want. Nevertheless, for accurate cache profiles you will - need use vg_cachegen to customise - cachegrind for your system. -

            - This step only needs to be done once, unless you are interested - in simulating different cache configurations (eg. first - concentrating on instruction cache misses, then on data cache - misses). -

          2. -

          3. Run your program with cachegrind in front of the normal command line invocation. When the program finishes, Valgrind will print summary cache statistics. It also collects @@ -2025,56 +2009,12 @@ The three steps are: The steps are described in detail in the following sections.

            - -

            7.3  Generating a cache simulator

            - -Although Valgrind comes with a pre-generated cache simulator, it most -likely won't match the cache configuration of your machine, so you -should generate a new simulator.

            - -You need to generate three files, one for each of the I1, D1 and L2 -caches. For each cache, you need to know the: -

              -
            • Cache size (bytes); -
            • Line size (bytes); -
            • Associativity. -
            - -vg_cachegen takes three options: -
              -
            • --I1=size,line_size,associativity -
            • --D1=size,line_size,associativity -
            • --L2=size,line_size,associativity -
            - -You can specify one, two or all three caches per invocation of -vg_cachegen. It checks that the configuration is sensible before -generating the simulators; to see the allowed values, run -vg_cachegen -h.

            - -An example invocation would be: - -

            - vg_cachegen --I1=65536,64,2 --D1=65536,64,2 --L2=262144,64,8 -
            - -This simulates a machine with a 128KB split L1 2-way associative -cache, and a 256KB unified 8-way associative L2 cache. Both caches -have 64B lines.

            +

            7.3  Cache simulation specifics

            -If you don't know your cache configuration, you'll have to find it -out. (Ideally vg_cachegen could auto-identify your cache -configuration using the CPUID instruction, which could be done -automatically during installation, and this whole step could be -skipped.)

            - - -

            7.4  Cache simulation specifics

            - -vg_cachegen only generates simulations for a machine with -a split L1 cache and a unified L2 cache. This configuration is used -for all (modern) x86-based machines we are aware of. Old Cyrix CPUs -had a unified I and D L1 cache, but they are ancient history now.

            +Cachegrind uses a simulation for a machine with a split L1 cache and a unified +L2 cache. This configuration is used for all (modern) x86-based machines we +are aware of. Old Cyrix CPUs had a unified I and D L1 cache, but they are +ancient history now.

            The more specific characteristics of the simulation are as follows. @@ -2097,6 +2037,15 @@ The more specific characteristics of the simulation are as follows. from L1. Ditto AMD Durons and most modern VIAs.

        +The cache configuration simulated (cache size, associativity and line size) is +determined automagically using the CPUID instruction. If you have an old +machine that (a) doesn't support the CPUID instruction, or (b) supports it in +an early incarnation that doesn't give any cache information, then Cachegrind +will fall back to using a default configuration (that of a model 3/4 Athlon). +Cachegrind will tell you if this happens. You can manually specify one, two or +all three levels (I1/D1/L2) of the cache from the command line using the +--I1, --D1 and --L2 options.

        + Other noteworthy behaviour:

          @@ -2119,19 +2068,18 @@ Other noteworthy behaviour:
        If you are interested in simulating a cache with different properties, it is -not particularly hard to write your own cache simulator, or to modify existing -ones in vg_cachesim_I1.c, vg_cachesim_I1.c and -vg_cachesim_I1.c. We'd be interested to hear from anyone who -does. - +not particularly hard to write your own cache simulator, or to modify the +existing ones in vg_cachesim_I1.c, vg_cachesim_D1.c, +vg_cachesim_L2.c and vg_cachesim_gen.c. We'd be +interested to hear from anyone who does. -

        7.5  Profiling programs

        +

        7.4  Profiling programs

        Cache profiling is enabled by using the --cachesim=yes option to the valgrind shell script. Alternatively, it is probably more convenient to use the cachegrind script. -This automatically turns off Valgrind's memory checking functions, +Either way automatically turns off Valgrind's memory checking functions, since the cache simulation is slow enough already, and you probably don't want to do both at once.

        @@ -2173,7 +2121,7 @@ to the row's total).

        Combined instruction and data figures for the L2 cache follow that.

        -

        7.6  Output file

        +

        7.5  Output file

        As well as printing summary information, Cachegrind also writes line-by-line cache profiling information to a file named @@ -2193,6 +2141,30 @@ Things to note about the cachegrind.out file: of around 15 MB.
      + +

      7.6  Cachegrind options

      +Cachegrind accepts all the options that Valgrind does, although some of them +(ones related to memory checking) don't do anything when cache profiling.

      + +The interesting cache-simulation specific options are: + +

    • --I1=<size>,<associativity>,<line_size>

      + --D1=<size>,<associativity>,<line_size>

      + --L2=<size>,<associativity>,<line_size>

      + [default: uses CPUID for cache configuration]

      + + Manually specifies the I1/D1/L2 cache configuration, where + size and line_size are measured in bytes. The + three items must be comma-separated, but with no space, eg: + +

      cachegrind --I1=65536,2,64
      + + You can specify one, two or three of the caches. Any level not manually + specified will be simulated using the configuration found in the normal + way (via the CPUID instruction, or failing that, via defaults). +
    + +

    7.7  Annotating C/C++ programs

    @@ -2517,7 +2489,7 @@ There are a couple of situations in which vg_annotate issues warnings. -

    7.10  Things to watch out for

    +

    7.11  Things to watch out for

    Some odd things that can occur during annotation:
      @@ -2600,7 +2572,7 @@ annotations that look like might be caused by a bug in the stabs reader, please let us know.

      -

      7.11  Accuracy

      +

      7.12  Accuracy

      Valgrind's cache profiling has a number of shortcomings:
        @@ -2640,12 +2612,8 @@ While these factors mean you shouldn't trust the results to be super-accurate, hopefully they should be close enough to be useful.

        -

        7.12  Todo

        +

        7.13  Todo

          -
        • Use CPUID instruction to auto-identify cache configuration during - installation. This would save the user from having to know their cache - configuration and using vg_cachegen.
        • -

        • Program start-up/shut-down calls a lot of functions that aren't interesting and just complicate the output. Would be nice to exclude these somehow.