Statistical Software --- STAT-F-413 ---
Maarten Jansen and Toufik Zahaf
Objectives
-
Retrieve and analyse your own real data
-
Use at least two different software systems and two different types of analyses
(typically ANOVA and regression, but others are equally welcome: principal
component analysis, etc.)
-
Find your data
-
At a company, hospital, bank, or insurance company: this option is by far the
best. If you get data, also try to find out what business questions the
company or organization is trying to answer, and use the data to respond
to those questions.
-
Otherwise (but less preferable), on the internet, e.g. government data
(such as statbel.gov.be). This option has the drawback that it is
harder to be original and harder to focus on specific business questions.
The data should be original, in the sense that they must not be popular in
scientific papers or textbooks as an illustration of a method.
-
Number of births per municipality
-
Macroeconomic data: per country, European, regional, provincial, etc.
-
Socio-economic data
Not allowed:
-
Time series: time dependence of your data is allowed (longitudinal), but time
must not be the dominant explanatory variable
-
Birth weights of babies
Deadlines
Submission of paper with model description, analyses:
Monday May 5, 2025, 14h.
Oral discussion: Wednesday May 14, 2025; 20 minutes:
(STRICT maximum of) 13 minutes presentation + 7 minutes discussion
In the August session, the deadline for submission of the paper is the first day of the session.
The date of the oral defense will be proposed in due time, most probably after
submission of the paper (please do not contact us about the oral defense
date before then).
Available material
-
Class schedule, time table
-
Introduction and descriptive statistics
Slides introduction
Slides Descriptive Statistics
-
ANOVA and SAS OnDemand for Academics: Enterprise Guide
Follow this link at SAS
to create a profile (user id) at the SAS website, and again to register
as a student for this course. (You register as a student by "buying" a license
for 0 euro.)
-
The R project for Statistical computing
-
These slides include the following data files:
-
-
This page links to a list
of packages that can be downloaded into your installation of R.
Some of these packages come with documentation (e.g., an online book), which is listed under the link
Contributed documentation
How to install a package on your computer?
-
Download and installation proceed in one step from within R, using the command
install.packages("mypackage", dependencies = TRUE)
Mind the quotes; mypackage must be replaced by the name of the package selected
from the list (package names are case sensitive).
-
On Windows, R opens by default as a graphical user interface (the R GUI), which also lets you install packages by clicking on the Packages button in
the GUI menu; this starts a dialogue.
-
After installation of such an optional package, the package has to be
loaded in every R session (i.e., every time you start R). This is done
with
library(mypackage)
(No quotes this time.) The package is installed in a directory under the
library directory of the R installation.
-
After loading the package, you need to load the data that come with it. This is
done with the command
data(mydataset)
(where mydataset has to be replaced with the correct name of the data
set that you want to use).
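The three steps above (install, load, read the data) can be sketched in one short script. The package MASS and its data set Boston are chosen here only as an illustration and are not prescribed by the course; MASS normally ships with R, so the installation step is usually skipped.

```r
# Illustrative choice only: replace "MASS" and Boston with your own
# package and data set names.
pkg <- "MASS"

# Install the package only if it is not yet available on this computer.
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg, dependencies = TRUE)
}

# Load the package: needed in every new R session (no quotes here).
library(MASS)

# Load a data set that comes with the package and inspect its structure.
data(Boston)
str(Boston)
```

Checking with requireNamespace first avoids downloading the package again every time the script is run.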
A class on R
We will implement in R a routine that performs a simulation study of the
construction of confidence intervals (CIs) for the mean of a normal random
variable with unknown standard deviation. We will use vectorization
where possible in order to avoid unnecessary loops. We also plot the resulting
confidence intervals, thereby visualizing whether each CI contains
the true parameter value.
A raw version of the code
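Independently of the linked course code, a simulation of this kind might be sketched as follows. This is a minimal sketch under stated assumptions, not the code used in class: the true mean and standard deviation, the sample size, the number of simulations, the confidence level, and the seed are all arbitrary illustrative values. The intervals are the usual t-based intervals, mean ± t quantile × s/√n, with n - 1 degrees of freedom.

```r
# All numerical values below are assumptions chosen for illustration.
set.seed(413)
mu    <- 10     # true mean
sigma <- 2      # true standard deviation (treated as unknown by the CI)
n     <- 20     # observations per sample
n_sim <- 100    # number of simulated samples
level <- 0.95   # confidence level

# Draw all samples at once: each column of x is one sample of size n.
x <- matrix(rnorm(n * n_sim, mean = mu, sd = sigma), nrow = n)

# Vectorized sample means and standard deviations, one per column.
xbar <- colMeans(x)
s    <- sqrt(colSums(sweep(x, 2, xbar)^2) / (n - 1))

# t-based confidence limits for the mean.
tq    <- qt(1 - (1 - level) / 2, df = n - 1)
half  <- tq * s / sqrt(n)
lower <- xbar - half
upper <- xbar + half

# Which intervals contain the true mean, and the empirical coverage.
covers   <- lower <= mu & mu <= upper
coverage <- mean(covers)
print(coverage)

# One horizontal segment per interval; intervals missing mu are drawn in red.
plot(NA, xlim = range(lower, upper), ylim = c(1, n_sim),
     xlab = "confidence interval", ylab = "simulation")
segments(lower, seq_len(n_sim), upper, seq_len(n_sim),
         col = ifelse(covers, "black", "red"))
abline(v = mu, lty = 2)
```

Storing the samples as columns of a matrix lets colMeans, colSums and sweep replace an explicit loop over the simulations, and segments draws all intervals in a single call.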
This page is maintained by Maarten Jansen
(maarten.jansen-AT-ulb.ac.be)
URL: https://maarten.jansen.web.ulb.be/teaching/STAT-F-413/index.html