Statistics for Data Science
There is currently strong demand for professionals with some knowledge of data science. As a result, many professionals need either to refresh or to acquire the basic statistical knowledge that Data Science requires. This course aims to meet that need.
The course runs from 7 to 21 March 2022.
Schedule: Mondays, Wednesdays and Thursdays, from 11.30am to 1.30pm.
The course is aimed at PhD students, university students and professionals interested in getting into Data Science.
Advanced knowledge of statistics is not required. Programming skills in R or other software are not required either, but basic computer skills are necessary.
During the course, Google Colab, which requires no prior installation, will be used to present R functions that help illustrate the statistical concepts. To follow the course, we recommend a laptop or desktop computer (rather than a tablet) with the Chrome browser installed.
The course is taught live online. It lasts three weeks, with combined theory and practice sessions of 2 hours each. All sessions will be held on Zoom; they will be recorded and made available to students.
During the sessions, exercises will be set for students to work on independently, since individual practice is essential in programming. Worked solutions to these exercises will be made available to students a few days later. Students can resolve any questions in tutorials arranged by e-mail.
The course is divided into three blocks:
- Block I. Basic concepts of probability and statistics. Most commonly used distributions.
- Block II. Statistical inference: Confidence intervals and hypothesis testing.
- Block III. Linear and logistic regression.
The course programme is as follows:
- Session 1: Monday, 7 March. Basic concepts of probability and statistics. Definition of probability. Discrete and continuous random variables. The concept of a probability distribution. Descriptive statistics.
- Session 2: Wednesday, 9 March. Presentation of some widely used probability distributions and introduction to the concepts associated with them.
- Session 3: Thursday, 10 March. Statistical inference. Point estimation and confidence intervals. Hypothesis testing using parametric tests for two populations, both independent and paired.
- Session 4: Monday, 14 March. Review of concepts from the previous session and extension to non-parametric tests for two populations, both independent and paired. Contingency tables.
- Session 5: Wednesday, 16 March. Parametric and non-parametric tests for more than two independent populations: ANOVA and the Kruskal-Wallis H test.
- Session 6: Thursday, 17 March. Introduction to the concepts of correlation and regression. Simple and multiple linear regression.
- Session 7: Monday, 21 March. Logistic regression.
Assessment will consist of three multiple-choice questionnaires, one on each of the blocks covered in the sessions. You will have several days to complete each questionnaire and can raise any questions about them in the tutoring sessions.
The final grade for the course will be the arithmetic mean of the three questionnaire scores. At the end of the course, a certificate will be sent to those who have passed.
Attendance: at least 80% attendance at the sessions is required to be considered C.
The steps to complete enrolment in the course are as follows:
1. Complete and submit the pre-registration form before 14.00h on Thursday, 24 February 2022.
2. On Friday, 25 February, an email will be sent informing you whether you have been admitted to the course and specifying the final course schedule.
3. If you are admitted, you will receive a link to the payment gateway to complete your enrolment. Enrolment must be completed before 14.00h on Wednesday, 2 March; otherwise, your place will be offered to another candidate.
Course fees:
- General public: €350.
- Employees of the University of Navarra (75% discount, not cumulative): €87.50.
- University of Navarra students (15% discount, not cumulative): €297.50.
- Large families (8% discount, not cumulative): €322.
The course will only run if a minimum number of students enrol; if enrolment is high enough, two groups will be formed.