### Author Topic: Descriptive Statistics by Bruno Schaefer  (Read 3936 times)

#### The Librarian

• Moderator
• Newbie
• Posts: 42

##### Descriptive Statistics by Bruno Schaefer
« on: June 28, 2018, 06:41:36 AM »
Descriptive Statistics

Author: @BSpinoza Bruno Schaefer, Losheim am See, Germany
Author contact: bup.schaefer (.at.) web.de
Source: Submission
Version: 2018-06-16
Tags: [maths] [statistics]

Description:
This program calculates basic descriptive statistics of univariate data:
n, sum, standard error, range, mean, geometric mean, root mean square, variance,
standard deviation, coefficient of variation, minimum, 1st quartile, median,
3rd quartile, maximum, interquartile range, skewness, kurtosis, and excess kurtosis.
A dataset must have at least 4 values.
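
For reference, the core quantities can be written out as below. This is a summary read from the source listing, not a statement by the author; the symbols ($x_1,\dots,x_n$ for the data, $\bar{x}$ for the mean, $s$ and $\sigma$ for the sample and population standard deviation) are notation assumed here.

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2, \qquad
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2
$$

$$
\mathrm{SE} = \frac{s}{\sqrt{n}}, \qquad
\mathrm{CV} = 100\,\frac{s}{\bar{x}}\ \%, \qquad
\bar{x}_{\mathrm{geo}} = \Bigl(\prod_{i=1}^{n} x_i\Bigr)^{1/n}, \qquad
\mathrm{RMS} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}
$$

As a small check with the data 1, 2, 3, 4, 5: $\bar{x} = 3$, $s^2 = 10/4 = 2.5$, $\sigma^2 = 10/5 = 2$ and $\mathrm{SE} = \sqrt{2.5/5} \approx 0.7071$.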

Remarks on kurtosis and skewness:
Kurtosis and skewness use the same equations as SPSS, PAST and Excel.
Other programs may give slightly different results, especially for
small sample sizes.

kurtosis (peak shape):
• > 3 (excess > 0) leptokurtic: distribution with a sharp peak and fat tails
• = 3 (excess = 0) mesokurtic: similar to the bell-shaped normal distribution
• < 3 (excess < 0) platykurtic: flat distribution with thin tails

skewness (symmetry):
• > 0 skewed right: the right tail is longer and most of the distribution is at the left
• = 0 symmetrical (not skewed)
• < 0 skewed left: the left tail is longer and most of the distribution is at the right
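
Written out, the expressions in the listing's skew# and kurt# functions appear to be the following, with $s$ the sample standard deviation. The labels $G_1$, $G_2$, $g_1$ and $g_2$ are chosen here for illustration and do not occur in the program.

$$
G_1 = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s}\right)^{3}
$$

$$
G_2 = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s}\right)^{4} - \frac{3(n-1)^2}{(n-2)(n-3)}
$$

The "population" values in the output are then derived from these:

$$
g_1 = G_1\,\frac{n-2}{\sqrt{n(n-1)}}, \qquad
g_2 = \frac{\dfrac{(n-2)(n-3)}{n-1}\,G_2 - 6}{n+1}
$$

Because of the subtracted term, $G_2$ is centred on 0 for normally distributed data, so it corresponds to the excess kurtosis; adding 3 gives the kurtosis on the scale used in the remarks above. For the data 1, 2, 3, 4, 5 the fourth-power sum is $34/6.25 = 5.44$, which gives $G_1 = 0$, $G_2 = 1.25 \cdot 5.44 - 8 = -1.2$ and $g_2 = (1.5 \cdot (-1.2) - 6)/6 = -1.3$.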

Note that this program includes extended ASCII characters and may not copy/paste correctly. If the interface does not draw correctly, use the attached source listing.

Source code:
Code: QB64:
'PROGRAM: descriptiveStatistics.bas
'================= Descriptive Statistics  ================
'        written by Bruno Schaefer, Losheim am See, Germany
'                                       created: 15.12.2016
'                                   last review: 16.06.2018
'============================================================================================================
' This program calculates basic descriptive statistics of univariate data:
' n, sum, standard error, range, mean, geometric mean, root mean square, variance,
' standard deviation, coefficient of variation, minimum, 1st quartile, median,
' 3rd quartile, maximum, interquartile range, skewness, kurtosis, and excess kurtosis.
' A dataset must have at least 4 values.
' For kurtosis and skewness the same equation as SPSS, PAST and Excel is used.
' Slightly different results may occur using other programs, especially for
' small sample sizes.
' kurtosis: peak shape  > 3 (excess > 0) leptokurtic: distribution with a sharp peak and fat tails
'                       = 3 (excess = 0) mesokurtic: similar to the bell-shaped normal distribution
'                       < 3 (excess < 0) platykurtic: flat distribution with thin tails
' skewness: symmetry    > 0 skewed right: the right tail is longer and most of the distribution is at the left
'                       = 0 symmetrical (not skewed)
'                       < 0 skewed left: the left tail is longer and most of the distribution is at the right
'===============================================================================================================
_TITLE "descriptive statistics"
SCREEN _NEWIMAGE(680, 520, 256)
WEITER$ = "y" 'loop variable
_CLIPBOARD$ = "" 'clears the clipboard
DO 'main loop: one statistical evaluation per pass
    _LIMIT 30
    DO 'repeat the prompt until at least 4 values are requested
        CLS , 14
        COLOR 0, 14
        'title box built from CP437 character codes so that it survives copy/paste
        PRINT " " + CHR$(201) + STRING$(45, 205) + CHR$(187)
        PRINT " " + CHR$(186) + "  DESCRIPTIVE STATISTICS OF UNIVARIATE DATA  " + CHR$(186)
        PRINT " " + CHR$(200) + STRING$(45, 205) + CHR$(188)
        PRINT "  number of values (n>3): ";
        COLOR 9, 14
        INPUT "", n 'input of the number of values
    LOOP UNTIL n > 3
    REDIM SHARED sample(1 TO n) 'indexed 1..n so that LBOUND/UBOUND address the smallest/largest value after sorting
    FOR I = 1 TO n
        COLOR 0, 14
        PRINT "  value no. " + STR$(I) + ": ";
        COLOR 12, 14
        INPUT "", Wert#
        sample(I) = Wert# '               fills the data array with values
    NEXT I
    ' ----- SORT of the values (bubble sort, ascending) ----------
    DO
        ic = 0 'set to 1 whenever a pass swaps two values
        FOR I = 1 TO n - 1
            IF sample(I) > sample(I + 1) THEN
                h = sample(I)
                sample(I) = sample(I + 1)
                sample(I + 1) = h
                ic = 1
            END IF
        NEXT I
    LOOP UNTIL ic = 0
    ' -----------  calculations and output of the results ------------
    ' kurt# returns the excess kurtosis (as Excel's KURT does), so 3 is added for the plain kurtosis
    COLOR 0, 14
    PRINT " =========================== RESULTS =================================="
    COLOR 2, 14
    PRINT "  n (number of values):          "; n
    PRINT "  sum (sum of values):           "; sum#(sample())
    PRINT "  standard error:                "; StdDev.s#(sample()) / SQR(n)
    PRINT "  range (xmax - xmin):           "; sample(n) - sample(1)
    COLOR 12, 14
    PRINT "  mean:                          "; mean#(sample())
    PRINT "  geometric mean:                "; geomean#(sample())
    PRINT "  root mean square RMS:          "; rms#(sample())
    PRINT "  variance (sample):             "; variance.s#(sample())
    PRINT "  std.dev. (sample):             "; StdDev.s#(sample()); " = "; _ROUND((StdDev.s#(sample()) * 100 / mean#(sample())) * 100) / 100; " %"
    PRINT "  coeff. of variation:           "; 100 * StdDev.s#(sample()) / mean#(sample())
    COLOR 9, 14
    PRINT "  variance (population):         "; variance.p#(sample())
    PRINT "  std.dev. (population):         "; StdDev.p#(sample()); " = "; _ROUND((StdDev.p#(sample()) * 100 / mean#(sample())) * 100) / 100; " %"
    PRINT "  coefficient of variation:      "; 100 * StdDev.p#(sample()) / mean#(sample())
    COLOR 6, 14
    PRINT "  minimum:                       "; sample(1)
    PRINT "  1st quartile (percentile 25%): "; quantile#(sample(), 0.25)
    PRINT "  median (percentile 50%):       "; quantile#(sample(), 0.50)
    PRINT "  standard error of the median:  "; 1.2533 * StdDev.s#(sample()) / SQR(n) 'normal approximation: SQR(pi / 2) * s / SQR(n)
    PRINT "  3rd quartile (percentile 75%): "; quantile#(sample(), 0.75)
    PRINT "  maximum:                       "; sample(n)
    PRINT "  interquartile range:           "; quantile#(sample(), 0.75) - quantile#(sample(), 0.25)
    COLOR 9, 14
    PRINT "  skewness (sample):             "; _ROUND(skew#(sample()) * 100000) / 100000
    PRINT "  kurtosis (sample):             "; _ROUND(kurt#(sample()) * 100000) / 100000 + 3
    PRINT "  excess kurtosis (sample):      "; _ROUND(kurt#(sample()) * 100000) / 100000
    PRINT "  skewness (population):         "; _ROUND(skew#(sample()) * (n - 2) / SQR(n * (n - 1)) * 100000) / 100000
    PRINT "  kurtosis (population):         "; _ROUND((kurt#(sample()) * (n - 2) * (n - 3) / (n - 1) - 6) / (n + 1) * 100000) / 100000 + 3
    PRINT "  excess kurtosis (population):  "; _ROUND((kurt#(sample()) * (n - 2) * (n - 3) / (n - 1) - 6) / (n + 1) * 100000) / 100000
    COLOR 0, 14
    PRINT " ======================================================================"
    ' ----------- the same results as text for the clipboard ------------
    DIM CrLf AS STRING * 2
    CrLf = CHR$(13) + CHR$(10)
    _CLIPBOARD$ = _CLIPBOARD$ + " ========================================= " + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " DESCRIPTIVE STATISTICS OF UNIVARIATE DATA      " + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " ========================================= " + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " sorted data:" + CrLf
    FOR I = 1 TO n
        _CLIPBOARD$ = _CLIPBOARD$ + "    " + STR$(sample(I)) + CrLf
    NEXT I
    _CLIPBOARD$ = _CLIPBOARD$ + " ---------------------------------------------------------" + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " n (number of values):                  " + STR$(n) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " sum (sum of values):                   " + STR$(sum#(sample())) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " standard error:                        " + STR$(StdDev.s#(sample()) / SQR(n)) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " range (xmax - xmin):                   " + STR$(sample(n) - sample(1)) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " mean:                                  " + STR$(mean#(sample())) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " geometric mean:                        " + STR$(geomean#(sample())) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " root mean square RMS:                  " + STR$(rms#(sample())) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " variance (sample):                     " + STR$(variance.s#(sample())) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " standard deviation (sample):           " + STR$(StdDev.s#(sample())) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " standard deviation (sample) %:         " + STR$(_ROUND((StdDev.s#(sample()) * 100 / mean#(sample())) * 100) / 100) + " %" + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " coefficient of variation (sample):     " + STR$(100 * StdDev.s#(sample()) / mean#(sample())) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " variance (population):                 " + STR$(variance.p#(sample())) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " standard deviation (population):       " + STR$(StdDev.p#(sample())) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " standard deviation (population) %:     " + STR$(_ROUND((StdDev.p#(sample()) * 100 / mean#(sample())) * 100) / 100) + " %" + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " coefficient of variation (population): " + STR$(100 * StdDev.p#(sample()) / mean#(sample())) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " minimum:                               " + STR$(sample(1)) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " 1st quartile (25% percentile):         " + STR$(quantile#(sample(), 0.25)) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " median: 2nd quartile (50% percentile): " + STR$(quantile#(sample(), 0.50)) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " standard error of the median:          " + STR$(1.2533 * StdDev.s#(sample()) / SQR(n)) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " 3rd quartile (75% percentile):         " + STR$(quantile#(sample(), 0.75)) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " maximum:                               " + STR$(sample(n)) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " interquartile range:                   " + STR$(quantile#(sample(), 0.75) - quantile#(sample(), 0.25)) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " skewness (sample):                     " + STR$(_ROUND(skew#(sample()) * 100000) / 100000) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " kurtosis (sample):                     " + STR$(_ROUND(kurt#(sample()) * 100000) / 100000 + 3) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " excess kurtosis (sample):              " + STR$(_ROUND(kurt#(sample()) * 100000) / 100000) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " skewness (population):                 " + STR$(_ROUND(skew#(sample()) * (n - 2) / SQR(n * (n - 1)) * 100000) / 100000) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " kurtosis (population):                 " + STR$(_ROUND((kurt#(sample()) * (n - 2) * (n - 3) / (n - 1) - 6) / (n + 1) * 100000) / 100000 + 3) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " excess kurtosis (population):          " + STR$(_ROUND((kurt#(sample()) * (n - 2) * (n - 3) / (n - 1) - 6) / (n + 1) * 100000) / 100000) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " ---------------------------------------------------------" + CrLf
    PRINT " All results are stored in the clipboard!"
    PRINT " Do you want to start a new statistical evaluation  [y/n]? ";
    DO 'wait for a key press; INKEY$ on its own returns immediately
        _LIMIT 30
        WEITER$ = INKEY$
    LOOP UNTIL WEITER$ <> ""
LOOP WHILE (WEITER$ = "y") OR (WEITER$ = "Y")
COLOR 12, 14
LOCATE 10, 25: PRINT " E N D   O F   P R O G R A M "
LOCATE 12, 25: PRINT "         - - - -"
LOCATE 14, 25: PRINT "      Press any key ": PRINT
SLEEP 'wait for the final key press
SYSTEM 'close the program window
'FUNCTIONS
'Each function takes n from UBOUND(x), because the main program's n is not SHARED.
'============= sum ==========
FUNCTION sum# (x())
    n = UBOUND(x)
    s# = 0
    FOR i = 1 TO n
        s# = s# + x(i)
    NEXT i
    sum# = s#
END FUNCTION
'============= mean ==========
FUNCTION mean# (x())
    mean# = sum#(x()) / UBOUND(x)
END FUNCTION
'========= variance (sample) ==========
FUNCTION variance.s# (x())
    n = UBOUND(x)
    m# = mean#(x())
    s# = 0
    FOR i = 1 TO n
        s# = s# + (x(i) - m#) ^ 2
    NEXT i
    variance.s# = s# / (n - 1)
END FUNCTION
'========= variance (population) ==========
FUNCTION variance.p# (x())
    n = UBOUND(x)
    m# = mean#(x())
    s# = 0
    FOR i = 1 TO n
        s# = s# + (x(i) - m#) ^ 2
    NEXT i
    variance.p# = s# / n
END FUNCTION
'======= standard deviation (sample) ========
FUNCTION StdDev.s# (x())
    StdDev.s# = SQR(variance.s#(x()))
END FUNCTION
'======= standard deviation (population) ========
FUNCTION StdDev.p# (x())
    StdDev.p# = SQR(variance.p#(x()))
END FUNCTION
'============== median (expects sorted data) =====================
FUNCTION median# (x())
    n = UBOUND(x)
    IF (n / 2) = INT(n / 2) THEN
        'even: mean of the two middle values
        median# = (x(n / 2) + x((n / 2) + 1)) / 2
    ELSE
        'odd: the middle value
        median# = x((n + 1) / 2)
    END IF
END FUNCTION
'============================ quantile ========================
'linear interpolation between neighbouring sorted values; a must be below 1
FUNCTION quantile# (x(), a)
    n = UBOUND(x)
    rang# = a * (n - 1) + 1 'fractional rank
    index% = INT(rang#)
    gewicht# = rang# - index% 'interpolation weight
    quantile# = x(index%) + gewicht# * (x(index% + 1) - x(index%))
END FUNCTION
'============================ skewness ========================
FUNCTION skew# (x())
    n = UBOUND(x)
    m# = mean#(x())
    s# = StdDev.s#(x())
    IF s# <> 0 THEN
        sk# = 0
        FOR J = 1 TO n
            sk# = sk# + ((x(J) - m#) / s#) ^ 3
        NEXT J
        skew# = sk# * (n / ((n - 1) * (n - 2)))
    ELSE
        skew# = 0 'all values identical
    END IF
END FUNCTION
'============================ kurtosis ========================
'returns the sample excess kurtosis (0 for a normal distribution)
FUNCTION kurt# (x())
    n = UBOUND(x)
    m# = mean#(x())
    s# = StdDev.s#(x())
    IF s# <> 0 THEN
        krt# = 0
        FOR j = 1 TO n
            krt# = krt# + ((x(j) - m#) / s#) ^ 4
        NEXT j
        kurt# = ((krt# * (n + 1) * n) / ((n - 1) * (n - 2) * (n - 3))) - ((3 * (n - 1) ^ 2) / ((n - 2) * (n - 3)))
    ELSE
        kurt# = 0 'all values identical
    END IF
END FUNCTION
'====================== geometric mean ========================
'only meaningful if all values are positive
FUNCTION geomean# (x())
    n = UBOUND(x)
    gm# = 1
    FOR j = 1 TO n
        gm# = gm# * x(j)
    NEXT j
    geomean# = gm# ^ (1 / n)
END FUNCTION

'============ root mean square ===================
FUNCTION rms# (x())
    n = UBOUND(x)
    ms# = 0
    FOR j = 1 TO n
        ms# = ms# + x(j) ^ 2
    NEXT j
    rms# = SQR(ms# / n)
END FUNCTION
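
The interpolation used by quantile# in the listing can be sketched as a formula. Here $x_{(1)} \le \dots \le x_{(n)}$ are the sorted values and $a$ is the requested fraction; the symbols $r$, $k$ and $w$ are names assumed here for the listing's rang#, index% and gewicht#.

$$
r = a\,(n-1) + 1, \qquad k = \lfloor r \rfloor, \qquad w = r - k, \qquad
Q(a) = x_{(k)} + w\,\bigl(x_{(k+1)} - x_{(k)}\bigr)
$$

For the four sorted values 10, 20, 30, 40 this gives $r = 1.75$ and $Q(0.25) = 10 + 0.75 \cdot 10 = 17.5$, then $r = 2.5$ and $Q(0.5) = 25$, then $r = 3.25$ and $Q(0.75) = 30 + 0.25 \cdot 10 = 32.5$. The formula needs $x_{(k+1)}$, so it applies for $a < 1$; the program itself only requests 0.25, 0.50 and 0.75.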

« Last Edit: March 06, 2020, 05:27:21 AM by Qwerkey »