SIA’s asymmetric rules approximation to hierarchical clustering in Learning Analytics: mathematical issues

Rubén Pazmiño¹, Francisco Garcia², Miguel Conde³

¹ School of Mathematics & Data Science Research Group, Escuela Superior Politécnica de Chimborazo, Ecuador. E-mail: rpazmino@espoch.edu.ec
² Department of Computer Science, University of Salamanca, Spain. E-mail: fgarcia@usal.es
³ Department of Mechanical, Computer Science and Aerospace Engineering, University of León, Spain. E-mail: miguel.conde@unileon.es

Introduction

We use the definition set out at the first International Conference on Learning Analytics and Knowledge and adopted by the Society for Learning Analytics Research: “Learning analytics is the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimizing learning and the environments in which it occurs.”¹

Bichsel proposes an analytics maturity model for evaluating progress in the use of academic and learning analytics. The results show some progress, but most institutions are below the 80% level. Most institutions also scored low for data analytics tools, reporting, and expertise [2]. In addition, one task concerning the methods of Data Mining and Learning Analytics is to analyze them (precision, accuracy, sensitivity, coherence, fitness measures, cosine, confidence, lift, similarity weights) in order to optimize and adapt them [9].

Learning Analytics (LA) was, and continues to be, an emerging technology [7]. The time horizon for its adoption is one year or less, but how many institutions, teachers, learners and data analytics tools are ready? The principal aim of this paper is to set out the mathematical issues involved in formally approximating hierarchical clustering in LA with SIA's asymmetric rules.
Learning Analytics (LA) and clustering

Clustering in Learning Analytics is, and remains, an emerging method, as the following scientific articles show. Papamitsiou and Economides [11] examined the literature on experimental case studies conducted in the domains of Learning Analytics and Educational Data Mining from 2008 to 2013, and found that 60% of the Learning Analytics literature uses classification or clustering, while 40% uses regression, text mining, association rule mining, social network analysis, discovery with models, visualization or statistics. A recent study [6] shows that the methods currently used in Learning Analytics are decision trees, clustering, association rules, time sequence analysis and visualization techniques. The same study [6] shows that non-hierarchical algorithms account for 73% (K-means, C-means, Fuzzy K-means, K-prototypes, Fuzzy Clustering) and hierarchical algorithms for 27% (Agglomerative Clustering, Markov Clustering, Discrete Markov Model). The novelty of the approach is the possibility of using the additional options offered by SIA’s asymmetric rules in LA clustering.

Statistical Implicative Analysis (SIA) and asymmetric rules

Statistical implicative analysis is a non-symmetric method of analyzing data that crosses subjects or objects with variables of any type: Boolean, numerical, modal, vectorial, sequential, interval, fuzzy and rank². Statistical Implicative Analysis [8] was created by Régis Gras [7] 48 years ago. SIA is a statistical theory that provides a group of data analytics tools for extracting knowledge. The approach starts from the generation of asymmetric rules [5], which are similar to the dendrograms used in hierarchical clustering [14]. But can asymmetric rules be used like a hierarchical cluster?

¹ https://tekri.athabascau.ca/analytics/
An intuitive approximation between asymmetric rules and hierarchical clusters was given in [13], based on the visual perception of simple black-and-white images. One of its conclusions is that 69.14% of the participants in the experiment agree or strongly agree with the kind of grouping presented by the hierarchy trees and asymmetric rules in Statistical Implicative Analysis.

Elia and Gagatsis [4] present a comparative example of hierarchical clustering of variables, implicative statistical analysis and confirmatory factor analysis. The example addresses the teaching of the concept of function and analyzes the level of understanding that students show for this type of abstract definition. The outcomes of the three methods were found to coincide and to complement each other.

Anastasiadou, in order to study the appropriate approach that a teacher should use when teaching the theory of probability distributions, compares two statistical tools: principal components analysis and asymmetric rules. In her conclusions she writes that hierarchical clustering of variables and implicative statistical analysis show stable and similar results, but that each has its own advantages and a different perspective [1].

Michael et al. [10] compare implicative statistical analysis, hierarchical clustering of variables and confirmatory factor analysis in a study of how 6th graders learn about geometrical figures. The paper concludes that the outcomes of the three methods were found to coincide.

Some new possibilities for complementing the asymmetric rules are shown in [12]: supplementary variables can be used to identify which subjects, or classes of subjects, are most responsible for the computed implications; contribution indicates which subjects are most representative of an implication; and typicality indicates the typical subjects.

All previous research shows an approximation between the asymmetric rules of SIA and other hierarchical methods, but none of these approximations is formal.
In this paper, we want to identify a formal way to demonstrate that asymmetric rules can be considered a hierarchical clustering method. We also indicate which formal demonstrations need to be carried out, together with some alternatives.

Math issues [3]

1) Let $V$ be a finite non-empty set of binary variables. Prove that $(V, \alpha)$ is an indexed hierarchy, where $\alpha = c(a, b)$ and $c(a, b)$ is the cohesion of an R-rule $a \to b$ of degree 1:
$$c(a, b) = \left[ 1 - \left( -p \log_2 p - (1 - p) \log_2 (1 - p) \right)^2 \right]^{1/2} \text{ if } p > 0.5, \qquad c(a, b) = 0 \text{ otherwise.}$$
2) For all $x$, the binary relation $R_x$ on $V$, defined by $i \, R_x \, j$ if $i, j \in C$ with $\alpha(C) \le x$, is an equivalence relation.
3) Let $V$ be a finite non-empty set of binary variables. Prove that there exists $\mu$ such that $(V, \mu)$ is an ultrametric space.

If 1), 2) and 3) are true, then we can represent $(V, \mu)$ by a dendrogram whose ends are the elements of $V$.

Acknowledgements: University of Salamanca and PhD Programme on Education in the Knowledge Society.

² https://fr.wikipedia.org/wiki/Analyse_statistique_implicative

References

[1] ANASTASIADOU, S., 2010. Pre-service teachers’ performance on the learning of probability distributions and the role of projects: A multilevel statistical analysis. ASI5, 21.
[2] BICHSEL, J., 2012. Analytics in higher education: Benefits, barriers, progress, and recommendations. EDUCAUSE Center for Applied Research.
[3] CUADRAS, C.M., 2007. Nuevos métodos de análisis multivariante. CMC Editions.
[4] ELIA, I. and GAGATSIS, A., 2008. A comparison between the hierarchical clustering of variables, implicative statistical analysis and confirmatory factor analysis. In Statistical Implicative Analysis, Springer, 131-162.
[5] GRAS, R., COUTURIER, R., GUILLET, F., and SPAGNOLO, F., 2005. Extraction de règles en incertain par la méthode statistique implicative. Comptes rendus des 12èmes Rencontres de la Société Francophone de Classification, 148-151.
[6] HWANG, G.-J., CHU, H.-C., and YIN, C., 2017. Objectives, methodologies and research issues of learning analytics. Interactive Learning Environments 25, 2, 143–146.
DOI= http://dx.doi.org/10.1080/10494820.2017.1287338.
[7] KITCHENHAM, B., PRETORIUS, R., BUDGEN, D., BRERETON, O.P., TURNER, M., NIAZI, M., and LINKMAN, S., 2010. Systematic literature reviews in software engineering–a tertiary study. Information and Software Technology 52, 8, 792-805.
[8] KOTSIANTIS, S. and KANELLOPOULOS, D., 2006. Association rules mining: A recent overview. GESTS International Transactions on Computer Science and Engineering 32, 1, 71-82.
[9] LI, K.C., LAM, H.K., and LAM, S.S., 2015. A Review of Learning Analytics in Educational Research. In International Conference on Technology in Education, Springer, 173-184.
[10] MICHAEL, P., ELIA, I., GAGATSIS, A., and KALOGIROU, P., 2010. Examining primary school students’ operative apprehension of geometrical figures through a comparison between the hierarchical clustering of variables, implicative statistical analysis and confirmatory factor analysis. ASI5, 19.
[11] PAPAMITSIOU, Z.K. and ECONOMIDES, A.A., 2014. Learning analytics and educational data mining in practice: A systematic literature review of empirical evidence. Educational Technology & Society 17, 4, 49-64.
[12] PAZMIÑO-MAJI, R.A., GARCÍA-PEÑALVO, F.J., and CONDE-GONZÁLEZ, M.A., 2016. Approximation of statistical implicative analysis to learning analytics: a systematic review. In Proceedings of the Fourth International Conference on Technological Ecosystems for Enhancing Multiculturality, ACM, 355-376.
[13] PAZMIÑO-MAJI, R.A., GARCÍA-PEÑALVO, F.J., and CONDE-GONZÁLEZ, M.A., 2017. Is it possible to apply Statistical Implicative Analysis in hierarchical cluster Analysis? First issues and answers. In Congreso Internacional de Ciencia y Tecnología, P. GIADE Ed., ESPOCH, Riobamba, Ecuador, 63-66.
[14] RITSCHARD, G., 2005. De l’usage de la statistique implicative dans les arbres de classification. Troisieme Rencontre Internationale-Analyse Statistique Implicative, 305-316.
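
As an illustration of Math issues 1) and 3), the following minimal Python sketch evaluates the cohesion formula for a given p and tests the ultrametric inequality on a small distance matrix. It is not a proof and not part of any SIA software. The function names `cohesion` and `is_ultrametric`, the assumption that p is supplied as a number in [0, 1], the tolerance `tol`, and the example matrices in the note below the code are all hypothetical choices made for this illustration.

```python
import math
from itertools import product


def cohesion(p: float) -> float:
    """Cohesion c(a, b) of a rule a -> b, following the formula in Math issue 1).

    Assumption: p is supplied by the caller as a number in [0, 1];
    how p is obtained from the data is outside this sketch.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p <= 0.5:
        # The formula sets the cohesion to 0 whenever p is not above 0.5.
        return 0.0
    if p == 1.0:
        # Limit case: the entropy term vanishes, so the cohesion is 1.
        return 1.0
    entropy = -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)
    # max(...) guards against a tiny negative value caused by rounding.
    return math.sqrt(max(0.0, 1.0 - entropy ** 2))


def is_ultrametric(d, tol=1e-12):
    """Check whether the square matrix d (list of lists) is an ultrametric.

    Verified conditions: zero diagonal, strictly positive off-diagonal
    entries, symmetry, and d[i][k] <= max(d[i][j], d[j][k]) for all i, j, k.
    """
    n = len(d)
    for i, j in product(range(n), repeat=2):
        if i == j and abs(d[i][j]) > tol:
            return False
        if i != j and d[i][j] <= tol:
            return False
        if abs(d[i][j] - d[j][i]) > tol:
            return False
    for i, j, k in product(range(n), repeat=3):
        if d[i][k] > max(d[i][j], d[j][k]) + tol:
            return False
    return True
```

For instance, `is_ultrametric([[0, 1, 2], [1, 0, 2], [2, 2, 0]])` holds, while `[[0, 1, 3], [1, 0, 1], [3, 1, 0]]` fails because 3 exceeds max(1, 1). A numerical check of this kind can only refute a candidate µ on a given data set; issue 3) still requires a formal argument.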