<para>
<application>pg_test_timing</application> is a tool to measure the
timing overhead on your system and confirm that the system time never
- moves backwards. It simply reads the system clock over and over again
+ moves backwards. It reads supported clock sources over and over again
as fast as it can for a specified length of time, and then prints
- statistics about the observed differences in successive clock readings.
+ statistics about the observed differences in successive clock readings,
+ as well as which clock source will be used.
</para>
<para>
Smaller (but not zero) differences are better, since they imply both
<para>
This tool is also helpful to determine if
the <varname>track_io_timing</varname> configuration parameter is likely
- to produce useful results.
+ to produce useful results, and whether the
+ <acronym>TSC</acronym> clock source (see
+ <xref linkend="guc-timing-clock-source"/>) is available and if it will be
+ used by default.
</para>
</refsect1>
However, the largest observed difference is always shown.
</para>
<para>
- The example results below show that 99.99% of timing loops took between
- 8 and 31 nanoseconds, with the worst case somewhere between 32768 and
- 65535 nanoseconds. In the second block, we can see that typical loop
- time is 16 nanoseconds, and the readings appear to have full nanosecond
- precision.
+ On platforms that support the <acronym>TSC</acronym> clock source,
+ additional output sections are shown for the <command>RDTSCP</command>
+ instruction (used for general timing needs, such as
+ <varname>track_io_timing</varname>) and the <command>RDTSC</command>
+ instruction (used for <command>EXPLAIN ANALYZE</command>). At the end
+ of the output, the <acronym>TSC</acronym> frequency, which may either be
+ sourced from CPU information directly, or the alternate calibration
+ mechanism are shown, as well as whether the <acronym>TSC</acronym> clock
+ source will be used by default.
+ </para>
+
+ <para>
+ The example results below show system clock timing where 99.99% of loops
+ took between 16 and 63 nanoseconds. In the second block, we can see that
+ the typical loop time is 40 nanoseconds, and the readings appear to have
+ full nanosecond precision. Following the system clock results, the
+ <acronym>TSC</acronym> clock source results are shown, in the same fashion.
+ The <command>RDTSCP</command> instruction shows most loops completing in
+ 20–30 nanoseconds, while the <command>RDTSC</command> instruction is
+ the fastest with an average loop time of 20 nanoseconds. In this example
+ the <acronym>TSC</acronym> clock source will be used by default, but can be
+ disabled by setting <varname>timing_clock_source</varname> to
+ <literal>system</literal>.
</para>
<para>
<screen><![CDATA[
+System clock source: clock_gettime (CLOCK_MONOTONIC)
Testing timing overhead for 3 seconds.
-Average loop time including overhead: 16.40 ns
+Average loop time including overhead: 44.67 ns
+Histogram of timing durations:
+ <= ns % of total running % count
+ 0 0.0000 0.0000 0
+ 1 0.0000 0.0000 0
+ 3 0.0000 0.0000 0
+ 7 0.0000 0.0000 0
+ 15 0.0000 0.0000 0
+ 31 24.0606 24.0606 5385707
+ 63 75.8342 99.8948 16974658
+ 127 0.0900 99.9848 20143
+ 255 0.0069 99.9917 1542
+ 511 0.0014 99.9932 322
+ 1023 0.0003 99.9935 68
+ 2047 0.0001 99.9936 23
+ 4095 0.0036 99.9972 813
+ 8191 0.0018 99.9990 402
+ 16383 0.0005 99.9995 120
+ 32767 0.0001 99.9997 32
+ 65535 0.0001 99.9998 24
+
+Observed timing durations up to 99.9900%:
+ ns % of total running % count
+ 29 3.6921 3.6921 826442
+ 30 16.6755 20.3676 3732628
+ 31 3.6930 24.0606 826637
+ 40 75.7761 99.8368 16961658
+ 41 0.0019 99.8387 431
+...
+ 190 0.0003 99.9901 65
+...
+29657159 0.0000 100.0000 1
+
+Clock source: RDTSCP
+Average loop time including overhead: 37.32 ns
+Histogram of timing durations:
+ <= ns % of total running % count
+ 0 0.0000 0.0000 0
+ 1 0.0000 0.0000 0
+ 3 0.0000 0.0000 0
+ 7 0.0000 0.0000 0
+ 15 0.0000 0.0000 0
+ 31 99.9499 99.9499 26782299
+ 63 0.0381 99.9880 10220
+ 127 0.0008 99.9889 224
+ 255 0.0052 99.9941 1403
+ 511 0.0013 99.9954 340
+ 1023 0.0001 99.9954 17
+ 2047 0.0000 99.9955 7
+ 4095 0.0021 99.9976 569
+ 8191 0.0013 99.9989 357
+ 16383 0.0005 99.9994 128
+ 32767 0.0003 99.9996 70
+ 65535 0.0001 99.9997 19
+
+Observed timing durations up to 99.9900%:
+ ns % of total running % count
+ 20 16.9064 16.9064 4530201
+ 29 41.5214 58.4279 11125972
+ 30 41.5220 99.9499 11126126
+ 40 0.0089 99.9587 2374
+...
+ 130 0.0007 99.9902 181
+...
+18501572 0.0000 100.0000 1
+
+Fast clock source: RDTSC
+Average loop time including overhead: 27.12 ns
Histogram of timing durations:
<= ns % of total running % count
0 0.0000 0.0000 0
1 0.0000 0.0000 0
3 0.0000 0.0000 0
7 0.0000 0.0000 0
- 15 4.5452 4.5452 8313178
- 31 95.4527 99.9979 174581501
- 63 0.0001 99.9981 253
- 127 0.0001 99.9982 165
- 255 0.0000 99.9982 35
- 511 0.0000 99.9982 1
- 1023 0.0013 99.9994 2300
- 2047 0.0004 99.9998 690
- 4095 0.0000 99.9998 9
- 8191 0.0000 99.9998 8
- 16383 0.0002 100.0000 337
- 32767 0.0000 100.0000 2
- 65535 0.0000 100.0000 1
+ 15 1.2247 1.2247 456231
+ 31 98.7566 99.9813 36789785
+ 63 0.0109 99.9921 4049
+ 127 0.0029 99.9951 1087
+ 255 0.0008 99.9959 305
+ 511 0.0007 99.9966 279
+ 1023 0.0000 99.9966 7
+ 2047 0.0001 99.9967 22
+ 4095 0.0018 99.9985 673
+ 8191 0.0010 99.9995 383
+ 16383 0.0003 99.9998 94
+ 32767 0.0001 99.9999 38
+ 65535 0.0000 99.9999 9
Observed timing durations up to 99.9900%:
ns % of total running % count
- 15 4.5452 4.5452 8313178
- 16 58.3785 62.9237 106773354
- 17 33.6840 96.6078 61607584
- 18 3.1151 99.7229 5697480
- 19 0.2638 99.9867 482570
- 20 0.0093 99.9960 17054
+ 9 0.6316 0.6316 235290
+ 10 0.5931 1.2247 220941
+ 20 91.4328 92.6574 34061442
+ 29 3.6427 96.3001 1357007
+ 30 3.6811 99.9813 1371336
+ 40 0.0089 99.9902 3325
...
- 38051 0.0000 100.0000 1
+61594291 0.0000 100.0000 1
+
+TSC frequency in use: 2449228 kHz
+TSC frequency from calibration: 2448603 kHz
+TSC clock source will be used by default, unless timing_clock_source is set to 'system'.
]]></screen>
</para>
<simplelist type="inline">
<member><xref linkend="sql-explain"/></member>
+ <member><xref linkend="guc-timing-clock-source"/></member>
<member><ulink url="https://wiki.postgresql.org/wiki/Pg_test_timing">Wiki
discussion about timing</ulink></member>
</simplelist>
static void handle_args(int argc, char *argv[]);
-static uint64 test_timing(unsigned int duration);
+static void test_system_timing(void);
+#if PG_INSTR_TSC_CLOCK
+static void test_tsc_timing(void);
+#endif
+static uint64 test_timing(unsigned int duration, TimingClockSourceType source, bool fast_timing);
static void output(uint64 loop_count);
int
main(int argc, char *argv[])
{
- uint64 loop_count;
-
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_test_timing"));
progname = get_progname(argv[0]);
/* initialize timing infrastructure (required for INSTR_* calls) */
pg_initialize_timing();
- loop_count = test_timing(test_duration);
+ test_system_timing();
- output(loop_count);
+#if PG_INSTR_TSC_CLOCK
+ test_tsc_timing();
+#endif
return 0;
}
exit(1);
}
- printf(ngettext("Testing timing overhead for %u second.\n",
- "Testing timing overhead for %u seconds.\n",
+ printf(ngettext("Testing timing overhead for %u second.\n\n",
+ "Testing timing overhead for %u seconds.\n\n",
test_duration),
test_duration);
}
+/*
+ * This tests default (non-fast) timing code. A clock source for that is
+ * always available. Hence, we can unconditionally output the result.
+ */
+static void
+test_system_timing(void)
+{
+ uint64 loop_count;
+
+ loop_count = test_timing(test_duration, TIMING_CLOCK_SOURCE_SYSTEM, false);
+ output(loop_count);
+}
+
+/*
+ * If on a supported architecture, test the TSC clock source. This clock
+ * source is not always available. In that case we print an informational
+ * message indicating as such.
+ *
+ * We first emit "slow" timings (RDTSCP on x86), which are used for higher
+ * precision measurements when the TSC clock source is enabled. We emit
+ * "fast" timings second (RDTSC on x86), which is used for faster timing
+ * measurements with lower precision.
+ */
+#if PG_INSTR_TSC_CLOCK
+static void
+test_tsc_timing(void)
+{
+ uint64 loop_count;
+ uint32 calibrated_freq;
+
+ printf("\n");
+ loop_count = test_timing(test_duration, TIMING_CLOCK_SOURCE_TSC, false);
+ if (loop_count > 0)
+ {
+ output(loop_count);
+ printf("\n");
+
+ /* Now, emit fast timing measurements */
+ loop_count = test_timing(test_duration, TIMING_CLOCK_SOURCE_TSC, true);
+ output(loop_count);
+ printf("\n");
+
+ printf(_("TSC frequency in use: %u kHz\n"), timing_tsc_frequency_khz);
+
+ calibrated_freq = pg_tsc_calibrate_frequency();
+ if (calibrated_freq > 0)
+ printf(_("TSC frequency from calibration: %u kHz\n"), calibrated_freq);
+ else
+ printf(_("TSC calibration did not converge\n"));
+
+ pg_set_timing_clock_source(TIMING_CLOCK_SOURCE_AUTO);
+ if (pg_current_timing_clock_source() == TIMING_CLOCK_SOURCE_TSC)
+ printf(_("TSC clock source will be used by default, unless timing_clock_source is set to 'system'.\n"));
+ else
+ printf(_("TSC clock source will not be used by default, unless timing_clock_source is set to 'tsc'.\n"));
+ }
+ else
+ printf(_("TSC clock source is not usable. Likely unable to determine TSC frequency. Are you running in an unsupported virtualized environment?\n"));
+}
+#endif
+
static uint64
-test_timing(unsigned int duration)
+test_timing(unsigned int duration, TimingClockSourceType source, bool fast_timing)
{
uint64 loop_count = 0;
instr_time start_time,
end_time,
prev,
cur;
+ const char *time_source = NULL;
+
+ if (!pg_set_timing_clock_source(source))
+ return 0;
+
+ time_source = PG_INSTR_SYSTEM_CLOCK_NAME;
+
+#if PG_INSTR_TSC_CLOCK
+ if (pg_current_timing_clock_source() == TIMING_CLOCK_SOURCE_TSC)
+ time_source = fast_timing ? PG_INSTR_TSC_CLOCK_NAME_FAST : PG_INSTR_TSC_CLOCK_NAME;
+#endif
+
+ if (fast_timing)
+ printf(_("Fast clock source: %s\n"), time_source);
+ else if (source == TIMING_CLOCK_SOURCE_SYSTEM)
+ printf(_("System clock source: %s\n"), time_source);
+ else
+ printf(_("Clock source: %s\n"), time_source);
/*
* Pre-zero the statistics data structures. They're already zero by
instr_time diff_time;
prev = cur;
- INSTR_TIME_SET_CURRENT(cur);
+
+ if (fast_timing)
+ INSTR_TIME_SET_CURRENT_FAST(cur);
+ else
+ INSTR_TIME_SET_CURRENT(cur);
diff_time = cur;
INSTR_TIME_SUBTRACT(diff_time, prev);