Author Topic: Descriptive Statistics by Bruno Schaefer  (Read 710 times)

Offline The Librarian

  • Moderator
Descriptive Statistics by Bruno Schaefer
« on: June 28, 2018, 07:41:36 AM »
Descriptive Statistics

Author: Bruno Schaefer (a.k.a BSpinoza), Losheim am See, Germany
Author contact: bup.schaefer (.at.) web.de
Source: Submission
Version: 2018-06-16
Tags: [maths] [statistics]

Description:
This program calculates basic descriptive statistics of univariate data:
         n, Std.error, sum, standard error, mean, geometrical mean, variance,
         standard deviation, coefficient of variation, minimum, 1st quartile, median,
         2rd quartile, maximum,skewness, kurtosis, and excess kurtosis.
A dataset must have at least 4 values.
 
Remarks to kurtosis and skewness:
    For kurtosis and skewness the same equation as SPSS, PAST and Excel is used.
    Slightly different results may occur using other programs, especially for
    small sample sizes.
    kurtosis: peak shape  > 3 (excess > 0) leptokurtic: distribution with tapered peak and fat tails
                                    = 3 (excess = 0) mesokurtic: similar to normal bell-curved distribution
                                    < 3 (excess < 0) platykurtic: flat distribution with thin tails
     skewness: symmetry    > 0 skewed right: its right tail is longer and most of the distribution is at the left.
                                         = 0 symmetrical (not skewed)
                                         < 0 skewed left: the left tail is longer and most of the distribution is at the right


Note that this program includes extended ASCII characters and may not copy/paste correctly. If the interface does not draw correctly, use the attached source listing.

Source code:
Code: [Select]
'PROGRAM: descriptiveStatistics.bas
'================= Descriptive Statistics  ================
'        written by Bruno Schaefer, Losheim am See, Germany
'                                       created: 15.12.2016
'                                   last review: 16.06.2018
'============================================================================================================
' This programm calculates basic descriptive statistics of univariate data:
' n, Std.error, sum, standard error, mean, geometrical mean, variance,
' standard deviation, coefficient of variation, minimum, 1st quartile, median,
' 2rd quartile, maximum,skewness, kurtosis, and excess kurtosis.
' A dataset must have at least 4 values.
' For kurtosis and skewness the same equation as SPSS, PAST and Excel is used.
' Slightly different results may occur using other programs, especially for
' small sample sizes.
' kurtosis: peak shape  > 3 (excess > 0) leptokurtic: distribution with tapered peak and fat tails
'                       = 3 (excess = 0) mesokurtic: similar to normal bell-curved distribution
'                       < 3 (excess < 0) platykurtic: flat distribution with thin tails
' skewness: symmetry    > 0 skewed right: its right tail is longer and most of the distribution is at the left.
'                       = 0 symmetrical (not skewed)
'                       < 0 skewed left: the left tail is longer and most of the distribution is at the right
'===============================================================================================================
_TITLE "descriptive statistics"
SCREEN _NEWIMAGE(680, 520, 256)
WEITER$ = "y" 'loop variable
_CLIPBOARD$ = "" 'clears the clipboard
COMMON SHARED n AS INTEGER
OPTION BASE 1
DO
    _LIMIT 30
    DO
        CLS , 14
        COLOR 0, 14
        PRINT " ÉÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ»   "
        PRINT " º  DESCRIPTIVE STATISTICS OF UNIVARIATE DATA  º   "
        PRINT " ÈÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍŒ   "
        PRINT "  number of values (n>3): ";
        COLOR 9, 14
        INPUT "", n 'input of the number of values
    LOOP UNTIL n > 3
    REDIM SHARED sample(n)
    FOR I = 1 TO n
        COLOR 0, 14
        PRINT "  value no. " + STR$(I) + ": ";
        COLOR 12, 14
        INPUT "", Wert#
        sample(I) = Wert# '               fills the data array with values
    NEXT I
    ' ----- SORT of the values ----------
    DO
        ic = 0
        FOR I = 1 TO n - 1
            IF sample(I) > sample(I + 1) THEN
                h = sample(I)
                sample(I) = sample(I + 1)
                sample(I + 1) = h
                ic = 1
            END IF
        NEXT I
    LOOP UNTIL ic = 0
    ' -----------  calculations and output of the results ------------
    CLS
    COLOR 0, 14
    PRINT
    PRINT " =========================== RESULTS =================================="
    COLOR 2, 14
    PRINT "  n (number of values):          "; n
    PRINT "  sum (sum of values):           "; sum#(sample())
    PRINT "  standard error:                "; StdDev.s#(sample()) / SQR(n) ' stderr#(sample())
    PRINT "  range (xmax - xmin):           "; sample(UBOUND(sample)) - sample(LBOUND(sample))
    COLOR 12, 14
    PRINT "  mean:                          "; mean#(sample())
    PRINT "  geometrical mean:              "; geomean#(sample())
    PRINT "  root mean square RMS:          "; rms#(sample())
    PRINT "  variance (sample):             "; variance.s#(sample())
    PRINT "  std.dev. (sample):             "; StdDev.s#(sample()); " = "; _ROUND((StdDev.s#(sample()) * 100 / mean#(sample())) * 100) / 100; " %"
    PRINT "  coeff. of variation:           "; 100 * StdDev.s#(sample()) / mean#(sample())
    COLOR 9, 14
    PRINT "  variance (population):         "; variance.p#(sample())
    PRINT "  std.dev. (population):         "; StdDev.p#(sample()); " = "; _ROUND((StdDev.p#(sample()) * 100 / mean#(sample())) * 100) / 100; " %"
    PRINT "  coefficient of variation:      "; 100 * StdDev.p#(sample()) / mean#(sample())
    COLOR 6, 14
    PRINT "  minimum:                       "; sample(LBOUND(sample))
    PRINT "  1st quartile (percentile 25%): "; quantile#(sample(), 0.25)
    PRINT "  median (percentile 50%):       "; quantile#(sample(), 0.50)
    PRINT "  standard error of the median:  "; variance.p#(sample()) / SQR(n)
    PRINT "  3rd quartile (percentile 75%): "; quantile#(sample(), 0.75)
    PRINT "  maximum:                       "; sample(UBOUND(sample))
    PRINT "  interquartile range:           "; quantile#(sample(), 0.75) - quantile#(sample(), 0.25)
    COLOR 9, 14
    PRINT "  skewness (sample):             "; _ROUND(skew#(sample()) * 100000) / 100000
    PRINT "  kurtosis (sample):             "; _ROUND(kurt#(sample()) * 100000) / 100000
    PRINT "  excess kurtosis(sample):       "; _ROUND(kurt#(sample()) * 100000) / 100000 - 3
    PRINT "  skewness (population):         "; _ROUND(skew#(sample()) * (n - 2) / SQR(n * (n - 1)) * 100000) / 100000
    PRINT "  kurtosis (population):         "; _ROUND((kurt#(sample()) * (n - 2) * (n - 3) / (n - 1) - 6) / (n + 1) * 100000) / 100000
    PRINT "  excess kurtosis (population):  "; _ROUND((kurt#(sample()) * (n - 2) * (n - 3) / (n - 1) - 6) / (n + 1) * 100000) / 100000 - 3
    COLOR 0, 14
    PRINT " ======================================================================"
    DIM CrLf AS STRING * 2
    CrLf = CHR$(13) + CHR$(10)
    _CLIPBOARD$ = _CLIPBOARD$ + " ========================================= " + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " DESCRIPTIVE STATISTICS OF UNIVARIATE DATA      " + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " ========================================= " + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " sorted data:" + CrLf
    FOR I = 1 TO n
        _CLIPBOARD$ = _CLIPBOARD$ + "    " + STR$(sample(I)) + CrLf
    NEXT I
    _CLIPBOARD$ = _CLIPBOARD$ + " ---------------------------------------------------------" + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " n (number of values):                  " + STR$(n) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " sum (sum of values):                   " + STR$(sum#(sample())) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " standard error:                        " + STR$(StdDev.s#(sample()) / SQR(n)) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " range (xmax - xmin):                   " + STR$(sample(UBOUND(sample)) - sample(LBOUND(sample))) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " mean:                                  " + STR$(mean#(sample())) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " geometrical mean                       " + STR$(geomean#(sample())) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " root mean square RMS:                  " + STR$(rms#(sample())) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " variance (sample):                     " + STR$(variance.s#(sample())) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " standard deviation (sample):           " + STR$(StdDev.s#(sample())) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " standard deviation (sample) %:         " + STR$(_ROUND((StdDev.s#(sample()) * 100 / mean#(sample())) * 100) / 100) + " %" + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " coefficient of variation (sample):     " + STR$(100 * StdDev.s#(sample()) / mean#(sample())) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " variance (population):                 " + STR$(variance.p#(sample())) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " standard deviation(population):        " + STR$(StdDev.p#(sample())) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " standard deviation (population) %:     " + STR$(_ROUND((StdDev.p#(sample()) * 100 / mean#(sample())) * 100) / 100) + " %" + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " coefficient of variation (population): " + STR$(100 * StdDev.p#(sample()) / mean#(sample())) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " minimum:                               " + STR$(sample(LBOUND(sample))) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " 1st quartile (25% percentile):         " + STR$(quantile#(sample(), 0.25)) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " median: 2nd quartile (50% percentile): " + STR$(quantile#(sample(), 0.50)) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " standard error of the median:          " + STR$(variance.p#(sample()) / SQR(n)) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " 3rd quartile (75%) :                   " + STR$(quantile#(sample(), 0.75)) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " maximum:                               " + STR$(sample(UBOUND(sample))) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " interquartile range:                   " + STR$(quantile#(sample(), 0.75) - quantile#(sample(), 0.25)) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " skewness (sample):                     " + STR$(_ROUND(skew#(sample()) * 100000) / 100000) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " kurtosis (sample):                     " + STR$(_ROUND(kurt#(sample()) * 100000) / 100000) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " excess kurtosis (sample):              " + STR$(_ROUND(kurt#(sample()) * 100000) / 100000 - 3) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " skewness (population):                 " + STR$(_ROUND(skew#(sample()) * (n - 2) / SQR(n * (n - 1)) * 100000) / 100000) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " kurtosis (population):                 " + STR$(_ROUND((kurt#(sample()) * (n - 2) * (n - 3) / (n - 1) - 6) / (n + 1) * 100000) / 100000) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " excess kurtosis (population):          " + STR$(_ROUND((kurt#(sample()) * (n - 2) * (n - 3) / (n - 1) - 6) / (n + 1) * 100000) / 100000 - 3) + CrLf
    _CLIPBOARD$ = _CLIPBOARD$ + " ---------------------------------------------------------" + CrLf
    PRINT
    PRINT " All results are stored in the clipboard!"
    PRINT " Do you want to start a new statistical evaluation  [y/n]? ";
    SLEEP
    WEITER$ = INKEY$
LOOP WHILE (WEITER$ = "y") OR (WEITER$ = "Y")
COLOR 12, 14
CLS
LOCATE 10, 25: PRINT " E N D   O F   P R O G R A M "
LOCATE 12, 25: PRINT "         - - - -"
LOCATE 14, 25: PRINT "      Press any key ": PRINT
SLEEP
SYSTEM
END
'FUNCTIONS
'============= sum =========="
FUNCTION sum# (x())
    s# = 0
    FOR i = 1 TO n
        s# = s# + x(i)
    NEXT i
    sum# = s#
END FUNCTION
'============= mean =========="
FUNCTION mean# (x())
    mean# = sum#(x()) / n
END FUNCTION
'========= variance (sample) =========="
FUNCTION variance.s# (x())
    m# = mean#(x())
    s# = 0
    FOR i = 1 TO n
        s# = s# + (x(i) - mean#(x())) ^ 2
    NEXT i
    variance.s# = s# / (n - 1)
END FUNCTION
'========= variance population) =========="
FUNCTION variance.p# (x())
    m# = mean#(x())
    s = 0
    FOR i = 1 TO n
        s# = s# + (x(i) - mean#(x())) ^ 2
    NEXT i
    variance.p# = s# / n
END FUNCTION
'======= standard deviation (sample) ========"
FUNCTION StdDev.s# (x())
    StdDev.s# = SQR(variance.s#(x()))
END FUNCTION
'======= standard deviation (population) ========"
FUNCTION StdDev.p# (x())
    StdDev.p# = SQR(variance.p#(x()))
END FUNCTION
'============== median ====================="
FUNCTION median# (x())
    IF (n / 2) = INT(n / 2) THEN
        'even
        median# = (sample(n / 2) + sample((n / 2) + 1)) / 2
    ELSE
        'odd
        median# = sample((n + 1) / 2)
    END IF
END FUNCTION
'============================ quantile ========================
FUNCTION quantile# (x(), a)
    rang# = a * (n - 1) + 1
    index% = INT(rang#)
    gewicht# = rang# - index%
    quantile# = x(index%) + gewicht# * (x(index% + 1) - x(index%))
END FUNCTION
'============================ skewness ========================
FUNCTION skew# (x())
    m# = mean#(x())
    s# = StdDev.s#(x())
    sk# = 0
    FOR J = 1 TO n
        sk# = sk# + ((x(J) - m#) / s#) ^ 3
    NEXT J
    IF s# <> 0 THEN
        skew# = sk# * (n / ((n - 1) * (n - 2)))
    ELSE
        skew# = 0
    END IF
END FUNCTION
'============================ kurtosis ========================
FUNCTION kurt# (x())
    m# = mean#(x())
    s# = StdDev.s#(x())
    krt# = 0
    FOR j = 1 TO n
        krt# = krt# + ((x(j) - m#) / s#) ^ 4
    NEXT j
    IF s# <> 0 THEN
        kurt# = ((krt# * (n + 1) * n) / ((n - 1) * (n - 2) * (n - 3))) - ((3 * (n - 1) ^ 2) / ((n - 2) * (n - 3)))
    ELSE
        kurt# = 0
    END IF
END FUNCTION
'====================== geometrical mean ========================
FUNCTION geomean# (x())
    gm# = 1
    FOR j = 1 TO n
        gm# = gm# * x(j)
    NEXT j
    geomean# = gm# ^ (1 / n)
END FUNCTION

'============ mean square error ===================
FUNCTION rms# (x())
    ms# = 0
    FOR j = 1 TO n
        ms# = ms# + x(j) ^ 2
    NEXT j
    rms# = SQR(ms# / n)
END FUNCTION

« Last Edit: August 09, 2018, 12:15:28 AM by odin »