DISCIPLINE – COMPUTER SCIENCE
TYPE OF STUDY – LECTURES
In the first 3-hour block, students will be introduced to the basic concepts of data mining. The main data mining tasks will be identified and characterized, and the Cross-Industry Standard Process for Data Mining (CRISP-DM) will be presented. Issues related to data preprocessing will be discussed. Finally, methods for assessing the quality of data mining models will be presented.
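As one possible illustration of model quality assessment, the sketch below computes accuracy, precision, recall, and F1 for a binary classifier from its confusion matrix. The function names and the toy labels are hypothetical and are not taken from the course material.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def quality_measures(y_true, y_pred, positive=1):
    """Compute common quality measures; undefined ratios are reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy data (made up for illustration only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
measures = quality_measures(y_true, y_pred)
print(measures)
```

In practice these measures would be computed on data held out from training, for example with a train/test split or cross-validation.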
In the 3-hour block on clustering algorithms, students will learn about the concepts of cluster, centroid, distance, density, and similarity. The topics will include a classification of clustering algorithms (hierarchical, partitional, density-based) and a detailed analysis of selected algorithms from each group, namely AHC (agglomerative hierarchical clustering), k-means, k-modes, and DBSCAN. For each algorithm, its computational complexity will be discussed and its advantages and disadvantages will be indicated. Finally, indices for assessing the quality of the resulting clusters will be presented.
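To make the notions of centroid and distance concrete, here is a minimal sketch of k-means (Lloyd's iteration) in plain Python. It assumes Euclidean distance and, purely to keep the example deterministic, takes the first k points as initial centroids; the function name and the sample points are hypothetical.

```python
import math


def kmeans(points, k, max_iter=100):
    """Partition points into k clusters; returns (centroids, labels)."""
    # Assumption: the first k points serve as initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(x) / len(members) for x in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


# Two visually separated groups (made-up data).
points = [(1.0, 1.0), (8.0, 8.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

Each iteration costs on the order of n·k distance computations for n points, which is the kind of complexity argument the block refers to. The result depends on the initial centroids, a commonly cited weakness of k-means.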
This 3-hour block covers reducts (tests), decision trees, and decision rules and their application in data analysis. Examples of problems that can be represented as a decision table will be presented, together with the use of decision trees, decision rules, and reducts as tools (algorithms) for solving them. Issues related to supervised machine learning, i.e., the use of trees, rules, and reducts in data classification, will also be covered.
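A small sketch may help fix the idea of a decision table and a reduct. Here a reduct is taken to mean a minimal subset of conditional attributes that still distinguishes all rows with different decisions; this reading, the brute-force search, the function names, and the toy table are assumptions made for illustration. The table is assumed to be consistent and to contain at least two different decisions.

```python
from itertools import combinations


def is_consistent(rows, decisions, attrs):
    """True if rows that agree on attrs never have different decisions."""
    seen = {}
    for row, d in zip(rows, decisions):
        key = tuple(row[a] for a in attrs)
        if seen.setdefault(key, d) != d:
            return False
    return True


def reducts(rows, decisions, n_attrs):
    """Enumerate all minimal consistent attribute subsets (exponential time)."""
    found = []
    for size in range(1, n_attrs + 1):
        for attrs in combinations(range(n_attrs), size):
            # A superset of an already found reduct cannot be minimal.
            if any(set(r) <= set(attrs) for r in found):
                continue
            if is_consistent(rows, decisions, attrs):
                found.append(attrs)
    return found


def decision_rules(rows, decisions, attrs):
    """Map each combination of values on attrs to its decision."""
    return {tuple(row[a] for a in attrs): d for row, d in zip(rows, decisions)}


# Toy decision table: three conditional attributes, one decision per row.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0), (0, 0, 0)]
decisions = [0, 1, 1, 1, 0]

found = reducts(rows, decisions, n_attrs=3)
rules = decision_rules(rows, decisions, found[0])
print(found)
print(rules)
```

In this table no single attribute is sufficient, while attributes 0 and 1 together determine the decision, so the third attribute can be dropped. The exhaustive search is only practical for small tables.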
Knowledge of at least one programming language is indispensable for every IT specialist. Current trends in the development of programming languages aim not only at improving code efficiency and extending capabilities, but also at making programs easier to write. Programmers benefit both from simpler languages and from the availability of numerous frameworks. Python is a language that has focused from the beginning on ease of writing programs and on a simple code structure. During the lectures, students will be introduced to the basics of this language. The lectures will also discuss the advantages and capabilities of Python for building both advanced desktop programs and web applications, as well as for data analysis and presentation.
In research, the ability to justify the statistical significance of formulated hypotheses is essential. The lecture aims to familiarize students with the basic concepts of statistical inference and with the available software for performing statistical tests. Another goal is to teach students how to choose the appropriate test for the given samples and hypotheses and how to interpret the results. Test selection will take into account whether the samples are dependent or independent, the number of samples, the normality of the distributions, and the homogeneity of variances. The tests considered will include: Z-test, t-test, Wilcoxon test, McNemar test, analysis of variance F test, Friedman test, Cochran test, Welch test, Mann-Whitney test, Kruskal-Wallis test, and chi-squared test.
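The selection criteria listed above can be read as a decision procedure. The sketch below encodes one common textbook heuristic; it is an assumption made for illustration, not the selection scheme taught in the lecture. The function name and parameters are hypothetical, the Z-test is left out, "Cochran test" is taken here to mean Cochran's Q test, and the McNemar and Cochran branches assume binary outcomes.

```python
def choose_test(n_samples, paired, normal, equal_variances=True, categorical=False):
    """Suggest a test name from simple properties of the samples.

    n_samples       -- number of samples (groups) being compared, at least 2
    paired          -- True if the samples are dependent (repeated measures)
    normal          -- True if normality of the distributions is plausible
    equal_variances -- True if homogeneity of variances is plausible
    categorical     -- True for nominal data instead of numeric measurements
    """
    if n_samples < 2:
        raise ValueError("at least two samples are needed for a comparison")
    two = n_samples == 2
    if categorical:
        if paired:
            return "McNemar test" if two else "Cochran test"
        return "chi-squared test"
    if paired:
        if normal:
            return "paired t-test" if two else "repeated-measures analysis of variance"
        return "Wilcoxon test" if two else "Friedman test"
    if not normal:
        return "Mann-Whitney test" if two else "Kruskal-Wallis test"
    if two:
        return "t-test" if equal_variances else "Welch test"
    return "analysis of variance F test" if equal_variances else "Welch's ANOVA"


print(choose_test(2, paired=False, normal=True, equal_variances=False))
print(choose_test(3, paired=True, normal=False))
```

A real analysis would first check the assumptions themselves (for example with a normality test) rather than pass them in as flags.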
Modern metaheuristics are mainly used to solve combinatorial optimization problems. These techniques include evolutionary algorithms, ant colony systems, differential evolution, particle swarm optimization, and many others. A good designer of such metaheuristics must balance diversification and intensification of the search of the solution space and skillfully choose the parameter values that determine their effectiveness.
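As a minimal sketch of how a single parameter steers the search, the example below runs a (1+1) evolutionary algorithm on the OneMax toy problem (maximize the number of ones in a bit string). The choice of problem, the function names, and the parameter values are assumptions made for illustration. A higher mutation rate favors diversification, while a lower one favors intensification around the current solution.

```python
import random


def one_max(bits):
    """Fitness: the number of ones in the bit string."""
    return sum(bits)


def one_plus_one_ea(n_bits, mutation_rate, iterations, seed=0):
    """Run a (1+1) EA; returns the final solution and the fitness history."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    current = [rng.randint(0, 1) for _ in range(n_bits)]
    history = [one_max(current)]
    for _ in range(iterations):
        # Mutation: flip each bit independently with the given probability.
        candidate = [1 - b if rng.random() < mutation_rate else b for b in current]
        # Elitist selection: keep the candidate only if it is not worse.
        if one_max(candidate) >= one_max(current):
            current = candidate
        history.append(one_max(current))
    return current, history


best, history = one_plus_one_ea(n_bits=20, mutation_rate=1 / 20, iterations=500)
print(one_max(best), history[0])
```

Because selection is elitist, the fitness never decreases over the run. Trying different values of `mutation_rate` shows how strongly the parameter choice affects the speed of convergence.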
Data visualization is one of the basic data presentation techniques. A well-prepared visualization that uses appropriately selected techniques can reveal many dependencies in the data and is much more readable for the user than, for example, a tabular presentation. During the lectures, students will learn about modern approaches to visual data presentation, from static charts that describe data to dynamic visualizations that help in finding dependencies in the data.