:class:`NormalDist` Examples and Recipes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Classic probability problems
****************************

:class:`NormalDist` readily solves classic probability problems.
For example, given historical data for SAT exams showing that scores
are normally distributed with a mean of 1060 and a standard deviation
of 195, find the deciles for the SAT scores:

.. doctest::

   >>> sat = NormalDist(1060, 195)
   >>> list(map(round, sat.quantiles(n=10)))
   [810, 896, 958, 1011, 1060, 1109, 1162, 1224, 1310]
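The same distribution can answer range questions with the cumulative
distribution function. The sketch below is a small extension of the
example above; the mean of 1060 and standard deviation of 195 are
inferred from the decile output shown, and the 0.5 continuity
correction at each endpoint is a modeling choice:

```python
from statistics import NormalDist

# SAT scores modeled as normally distributed with mean 1060 and
# standard deviation 195 (consistent with the deciles shown above).
sat = NormalDist(1060, 195)

# Percentage of test takers scoring between 1100 and 1200, with a
# 0.5 continuity correction at each endpoint.
fraction = sat.cdf(1200 + 0.5) - sat.cdf(1100 - 0.5)
print(round(fraction * 100.0, 1))
```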

Monte Carlo inputs for simulations
**********************************

To estimate the distribution for a model that isn't easy to solve
analytically, :class:`NormalDist` can generate input samples for a `Monte
Carlo simulation <https://en.wikipedia.org/wiki/Monte_Carlo_method>`_:

.. doctest::

   >>> quantiles(map(model, X, Y, Z))       # doctest: +SKIP
   [1.4591308524824727, 1.8035946855390597, 2.175091447274739]
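The call above presumes that ``model`` and the input samples ``X``,
``Y``, and ``Z`` were defined earlier. A self-contained sketch might
look like the following; the model function, the three input
distributions, and the seeds are illustrative assumptions, not the
original example's values:

```python
from statistics import NormalDist, quantiles

def model(x, y, z):
    # Hypothetical model that would be awkward to solve analytically.
    return (3*x + 7*x*y - 5*y) / (11 * z)

# Assumed input distributions; seeds are fixed for reproducibility.
n = 10_000
X = NormalDist(10, 2.5).samples(n, seed=3652260728)
Y = NormalDist(15, 1.75).samples(n, seed=4582495471)
Z = NormalDist(50, 1.25).samples(n, seed=6582483453)

# Quartile cut points of the simulated output distribution.
q = quantiles(map(model, X, Y, Z))
print(q)
```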

Approximating binomial distributions
************************************

Normal distributions can be used to approximate `Binomial
distributions <https://mathworld.wolfram.com/BinomialDistribution.html>`_
when the sample size is large and when the probability of a successful
trial is near 50%.

.. doctest::

   >>> mean(trial() <= k for i in range(10_000))
   0.8398
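The snippet above presumes earlier definitions of ``trial``, ``k``,
and ``mean``. A self-contained sketch of the technique follows; the
numbers ``n``, ``p``, and ``k`` and the class labels are illustrative
assumptions, and the normal approximation is cross-checked against a
direct simulation:

```python
from math import sqrt
from random import choices, seed
from statistics import NormalDist, mean

# Illustrative numbers (assumptions for this sketch): n independent
# trials with success probability p, and a threshold k.
n, p, k = 750, 0.65, 500
q = 1.0 - p

# A Binomial(n, p) count is approximately normal with mean n*p and
# standard deviation sqrt(n*p*q); the 0.5 is a continuity correction.
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * q)).cdf(k + 0.5)

# Cross-check by simulating the binomial counts directly.
seed(8675309)

def trial():
    return choices(('success', 'failure'), (p, q), k=n).count('success')

simulated = mean(trial() <= k for i in range(1_000))

print(round(approx, 4), round(simulated, 4))
```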

Naive Bayesian classifier
*************************

Normal distributions commonly arise in machine learning problems.
Wikipedia has a `nice example of a Naive Bayesian Classifier
<https://en.wikipedia.org/wiki/Naive_Bayes_classifier>`_.
The challenge is to predict a person's gender from measurements of
normally distributed features including height, weight, and foot size.
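The Wikipedia approach can be sketched with :class:`NormalDist` as
follows. The training measurements are taken from that Wikipedia
example (heights in feet, weights in pounds, foot sizes in inches);
treat the exact numbers as illustrative:

```python
from statistics import NormalDist

# Fit a normal distribution to each feature, per class.
height_male = NormalDist.from_samples([6, 5.92, 5.58, 5.92])
height_female = NormalDist.from_samples([5, 5.5, 5.42, 5.75])
weight_male = NormalDist.from_samples([180, 190, 170, 165])
weight_female = NormalDist.from_samples([100, 150, 130, 150])
foot_size_male = NormalDist.from_samples([12, 11, 12, 10])
foot_size_female = NormalDist.from_samples([6, 8, 7, 9])

# Classify a new person, assuming equal prior probabilities and
# (naively) independent features.
ht, wt, fs = 6.0, 130, 8
prior_male = prior_female = 0.5
posterior_male = (prior_male * height_male.pdf(ht) *
                  weight_male.pdf(wt) * foot_size_male.pdf(fs))
posterior_female = (prior_female * height_female.pdf(ht) *
                    weight_female.pdf(wt) * foot_size_female.pdf(fs))

# The prediction goes to the larger posterior (maximum a posteriori).
print('male' if posterior_male > posterior_female else 'female')
```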

Kernel density estimation
*************************

It is possible to estimate a continuous probability density function
from a fixed number of discrete samples.

The basic idea is to smooth the data using `a kernel function such as a
normal distribution, triangular distribution, or uniform distribution
<https://en.wikipedia.org/wiki/Kernel_(statistics)#Kernel_functions_in_common_use>`_.
The degree of smoothing is controlled by a single
parameter, ``h``, representing the variance of the kernel function.

.. testcode::

   import math

   def kde_normal(sample, h):
       "Create a continuous probability density function from a sample."
       # Smooth the sample with a normal distribution of variance h.
       kernel_h = NormalDist(0.0, math.sqrt(h)).pdf
       n = len(sample)
       def pdf(x):
           return sum(kernel_h(x - x_i) for x_i in sample) / n
       return pdf

`Wikipedia has an example
<https://en.wikipedia.org/wiki/Kernel_density_estimation#Example>`_
where we can use the ``kde_normal()`` recipe to generate and plot
a probability density function estimated from a small sample:

.. doctest::

   >>> sample = [-2.1, -1.3, -0.4, 1.9, 5.1, 6.2]
   >>> f_hat = kde_normal(sample, h=2.25)
   >>> xarr = [i/100 for i in range(-750, 1100)]
   >>> yarr = [f_hat(x) for x in xarr]

The points in ``xarr`` and ``yarr`` can be used to make a PDF plot:

.. image:: kde_example.png
   :alt: Scatter plot of the estimated probability density function.

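As a quick sanity check on the recipe (this check is an addition, not
part of the original example), the estimated density should integrate
to approximately 1 over any interval wide enough to contain nearly
all of the probability mass:

```python
import math
from statistics import NormalDist

def kde_normal(sample, h):
    "Create a continuous probability density function from a sample."
    kernel_h = NormalDist(0.0, math.sqrt(h)).pdf
    n = len(sample)
    def pdf(x):
        return sum(kernel_h(x - x_i) for x_i in sample) / n
    return pdf

sample = [-2.1, -1.3, -0.4, 1.9, 5.1, 6.2]
f_hat = kde_normal(sample, h=2.25)

# Riemann sum of the estimated density over [-20, 20]; the total
# probability mass should come out close to 1.0.
dx = 0.01
mass = sum(f_hat(i * dx) * dx for i in range(-2000, 2001))
print(round(mass, 3))
```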
..
   # This modeline must appear within the last ten lines of the file.
kate: indent-width 3; remove-trailing-space on; replace-tabs on; encoding utf-8;