Automatic generation of software interfaces for supporting decision-making processes. An application of domain engineering & machine learning
Ph.D. Thesis
Ph.D. Programme of Computer Science
Department of Computer Science and Automation
University of Salamanca (https://ror.org/02f40zc51)
Salamanca, Spain
Candidate
Andrea Vázquez Ingelmo
Supervisors
Francisco José García-Peñalvo, Ph.D.
Roberto Therón Sánchez, Ph.D.

Francisco José García-Peñalvo, Ph.D., Full Professor in the Department of Computer Science and Automation at the University of Salamanca (Spain), and Roberto Therón Sánchez, Ph.D., Full Professor in the same department and university, as supervisors of the Ph.D. thesis entitled “Automatic generation of software interfaces for supporting decision-making processes. An application of domain engineering and machine learning”, developed by Andrea Vázquez Ingelmo,
Hereby declare that
This Ph.D. thesis, developed in the context of the Ph.D. Programme of Computer Science of the Department of Computer Science and Automation at the University of Salamanca (https://ror.org/02f40zc51), presents enough merits (theoretical and practical), evaluated through the proper assessment, publications, and original proposals, to be presented and defended publicly.
Salamanca, Spain, June 2022.
Francisco José García-Peñalvo, Ph.D.
University of Salamanca
Roberto Therón Sánchez, Ph.D.
University of Salamanca

Acknowledgments
First, I want to thank my thesis supervisors, from whom I learned (and continue to learn) the most.
This would not have been possible without the support and guidance, both at professional and personal levels, of my brilliant colleagues (and friends) at the GRIAL Research Group, where I basically found my passion, and I cannot be thankful enough for it.
To my amazing family (grandma, aunts, uncles, cousins… and pets), especially my parents, who always give me all the support, and meals, anyone could wish for. And to my awesome friends: to Bea, an unparalleled source of good times; the same applies to Marta and our never-ending talks (especially about our addiction to board games and restaurants) and, of course, to our Blasco Holidays with Marina, undoubtedly one of my favorite times of the year. To Isa and our random chats about literally anything, and to Irene and our therapeutic voice messages. To Misol and the quality time we spend whenever we are together, particularly at the faculty’s cafeteria (special mention to Vicente). To Dena and the countless times she has made me laugh just by laughing herself. To the outstanding plans with Clara, Raquel, and Carmen, and to the cooking days with Laura and Jano (where they cook, and I watch). In short, to all the people who have been, explicitly or implicitly, part of my life during this period.
I also want to mention a person who has been following my work since before I even started the Ph.D. degree (probably because I told her about it during an interminable thread of direct messages on Twitter). It seems destiny wanted to bring us together again before I finished this thesis so that I could mention you in these acknowledgments, MJ ♡. Each of you has made this journey incredibly easy.
Finally, to all those people who indirectly helped me to clear my mind in the meantime: Avril, Nicholas, Angelina, Lydia, Jennifer, Priyanka, Claire, Victoria, Sophie, Deena, Danielle, Nicole, Milagros, Kristen, Regina, Drew, Samantha, David, Lalisa, Cameron, Joe, Alba, Kevin, Blair, Vincent, Charlotte, Paul, Robert, Robyn, Michael, Chelo, María, Ronald, Lucy, and many more.

Abstract
Data analysis is a key process to foster knowledge generation in particular domains or fields of study. With a strong informative foundation derived from the analysis of collected data, decision-makers can make strategic choices with the aim of obtaining valuable benefits in their specific areas of action. However, given the steady growth of data volumes, data analysis needs to rely on powerful tools to enable knowledge extraction. Information dashboards offer a software solution to analyze large volumes of data visually to identify patterns and relations and make decisions according to the presented information. But decision-makers may have different goals and, consequently, different necessities regarding their dashboards. Moreover, the variety of data sources, structures, and domains can hamper the design and implementation of these tools.
This Ph.D. thesis tackles the challenge of improving the development process of information dashboards and data visualizations while enhancing their quality and features in terms of personalization, usability, and flexibility, among others.
Several research activities have been carried out to support this thesis. First, a systematic literature mapping and review was performed to analyze different methodologies and solutions related to the automatic generation of tailored information dashboards. The outcomes of the review led to the selection of a model-driven approach in combination with the software product line paradigm to deal with the automatic generation of information dashboards.
In this context, a meta-model was developed following a domain engineering approach. This meta-model represents the skeleton of information dashboards and data visualizations through the abstraction of their components and features and has been the backbone of the subsequent generative pipeline of these tools.
The meta-model and generative pipeline have been tested through their integration in different scenarios, both theoretical and practical. Regarding the theoretical dimension of the research, the meta-model has been successfully integrated with other meta-models to support knowledge generation in learning ecosystems, and used as a framework to conceptualize and instantiate information dashboards in different domains. In terms of the practical applications, the focus has been put on how to transform the meta-model into an instance adapted to a specific context, and how to finally transform this latter model into code, i.e., the final, functional product. These practical scenarios involved the automatic generation of dashboards in the context of a Ph.D. Programme, the application of Artificial Intelligence algorithms in the process, and the development of a graphical instantiation platform that combines the meta-model and the generative pipeline into a visual generation system. Finally, different case studies have been conducted in the employment and employability, health, and education domains.
The number of applications of the meta-model in theoretical and practical dimensions and domains is also a result in itself. Every outcome associated with this thesis is driven by the dashboard meta-model, which also proves its versatility and flexibility when it comes to conceptualizing, generating, and capturing knowledge related to dashboards and data visualizations.
Keywords: Data Visualization, Information Visualization, Information Dashboards, Model-Driven Development, Model-Driven Architecture, Software Product Lines, Meta-Modeling, Knowledge Discovery, Graphical User Interfaces, Human-Computer Interaction. i Table of Contents 1 Introduction ............................................................................................................................... 1 1.1 Motivation of this research .................................................................................................... 2 1.2 Context of this research .......................................................................................................... 4 1.3 Hypotheses and goals ............................................................................................................. 9 1.4 Methodology .......................................................................................................................... 20 1.4.1 Action-research methodology ......................................................................................................... 21 1.4.2 Systematic Literature Review .......................................................................................................... 22 1.4.3 Meta-modeling and Software Product Lines ................................................................................ 23 1.5 Document structure ............................................................................................................... 25 2 State-of-the-art of information dashboards and tailoring capabilities ................... 27 2.1 Methodology .......................................................................................................................... 28 2.1.1 Review and planning process ............................................................................................................... 
28 2.1.2 Data extraction process .......................................................................................................................... 34 2.2 Results of the systematic literature mapping ................................................................... 37 2.3 Results of the SLR ................................................................................................................. 45 2.4 Conclusions ............................................................................................................................ 54 3 Dashboard meta-model .................................................................................................... 57 3.1 Dashboard meta-model ........................................................................................................ 58 3.1.1 Basic layout ........................................................................................................................................ 61 3.1.2 Including the components’ specification ....................................................................................... 62 3.1.3 Including user characteristics .......................................................................................................... 66 3.1.4 Detailing goals and tasks ................................................................................................................. 68 3.1.5 Including interaction ........................................................................................................................ 70 3.1.6 Including data domain characteristics ........................................................................................... 71 3.1.7 Final version of the meta-model ..................................................................................................... 74 3.2 Generative pipeline ............................................................................................................... 
75 3.2.1 Methodology ..................................................................................................................................... 76 3.2.2 Approach ........................................................................................................................................... 77 3.3 Conclusions ............................................................................................................................ 79 ii 4 Results ................................................................................................................................. 83 4.1 Meta-model validation ......................................................................................................... 84 4.2 Theoretical applications ....................................................................................................... 87 4.2.1. Integration with other meta-models ................................................................................................... 87 4.2.1. Meta-model as a conceptual framework ............................................................................................ 92 4.3 Practical applications .......................................................................................................... 102 4.3.1. M1-M0 transformations: Automatic generation of code ................................................................ 104 4.3.2. M2-M1 transformations: Combining meta-modeling with AI ...................................................... 109 4.3.3. MetaViz: a graphical instantiation application ............................................................................... 113 4.4 Integrations in real-world scenarios ................................................................................ 118 4.4.1. Employment and employability........................................................................................................ 119 4.4.2. 
Health .................................................................................................................................................... 124 4.4.3. Education .............................................................................................................................................. 134 4.5 Conclusions .......................................................................................................................... 139 5 Discussion ......................................................................................................................... 141 6 Conclusions ....................................................................................................................... 147 6.1 Future work ........................................................................................................................... 150 6.2 Ph.D. thesis’ outcomes ........................................................................................................ 151 References .................................................................................................................................. 167 7 Appendixes ........................................................................................................................ 183 7.1 Appendix A. Scaffolding the OEEU’s data-driven ecosystem to analyze the employability of Spanish graduates ............................................................................................. 185 7.2 Appendix B. Improving the OEEU’s data-driven technological ecosystem’s interoperability with GraphQL ..................................................................................................... 207 7.3 Appendix C. How different versions of layout and complexity of web forms affect users after they start it? A pilot experience .................................................................................. 217 7.4 Appendix D. 
Enabling adaptability in web forms based on user characteristics detection through A/B testing and machine learning ................................................................ 229 7.5 Appendix E. Domain engineering for generating dashboards to analyze employment and employability in the academic context ......................................................... 247 7.6 Appendix F. Addressing fine-grained variability in user-centered software product lines: a case study on dashboards .................................................................................................. 255 7.7 Appendix G. Tailored information dashboards: A systematic mapping of the literature ............................................................................................................................................. 267 iii 7.8 Appendix H. Information dashboards and tailoring capabilities – A systematic literature review ................................................................................................................................ 277 7.9 Appendix I. Automatic generation of software interfaces for supporting decision- making processes. An application of domain engineering and machine learning ............. 295 7.10 Appendix J. Capturing high-level requirements of information dashboards components through meta-modeling ............................................................................................ 303 7.11 Appendix K. Extending a dashboard meta-model to account for users’ characteristics and goals for enhancing personalization .......................................................... 313 7.12 Appendix L. Taking advantage of the software product line paradigm to generate customized user interfaces for decision-making processes: a case study on university employability .................................................................................................................................... 323 7.13 Appendix M. 
Dashboard meta-model for knowledge management in technological ecosystem: a case study in healthcare ........................................................................................... 355 7.14 Appendix N. Connecting domain-specific features to source code: towards the automatization of dashboard generation ..................................................................................... 371 7.15 Appendix O. Representing Data Visualization Goals and Tasks Through Meta- Modeling to Tailor Information Dashboards ............................................................................. 387 7.16 Appendix P. A Meta-Model Integration for Supporting Knowledge Discovery in Specific Domains: A Case Study in Healthcare .......................................................................... 407 7.17 Appendix Q. Aggregation Bias: A Proposal to Raise Awareness Regarding Inclusion in Visual Analytics ......................................................................................................... 429 7.18 Appendix R. A meta-model to develop learning ecosystems with support for knowledge discovery and decision-making processes.............................................................. 441 7.19 Appendix S. Generating Dashboards Using Fine-Grained Components: A Case Study for a PhD Programme ........................................................................................................... 449 7.20 Appendix T. Specifying information dashboards’ interactive features through meta-model instantiation ................................................................................................................ 463 7.21 Appendix U. A Dashboard to Support Decision-Making Processes in Learning Ecosystems: A Metamodel Integration ......................................................................................... 479 7.22 Appendix V. 
Beneficios de la aplicación del paradigma de líneas de productos software para generar dashboards en contextos educativos ..................................................... 489 7.23 Appendix W. Towards a Technological Ecosystem to Provide Information Dashboards as a Service: A Dynamic Proposal for Supplying Dashboards Adapted to Specific Scenarios ............................................................................................................................. 509 7.24 Appendix X. Following up the progress of doctoral students and advisors' workload through data visualizations: A case study in a PhD programme............................................. 525 7.25 Appendix Y. Proof‐of‐concept of an information visualization classification approach based on their fine‐grained features ........................................................................... 541 iv 7.26 Appendix Z. A Meta-Modeling Approach To Take Into Account Data Domain Characteristics and Relationships In Information Visualizations .......................................... 559 7.27 Appendix AA. A platform for management and visualization of medical data and medical imaging ................................................................................................................................ 573 7.28 Appendix AB. A platform to support the visual analysis of the SALMANTICOR study outcomes: conveying cardiological data to lay users ...................................................... 581 7.29 Appendix AC. Application of Artificial Intelligence Algorithms Within the Medical Context for Non-Specialized Users: the CARTIER-IA Platform ............................................. 595 7.30 Appendix AD. User-Centered Design Approach for a Machine Learning Platform for Medical Purposes ....................................................................................................................... 605 7.31 Appendix AE. 
Bringing machine learning closer to non-experts: proposal of a user- friendly machine learning tool in the healthcare domain ........................................................ 621 7.32 Appendix AF. Fostering Decision-Making Processes in Health Ecosystems through Visual Analytics and Machine Learning ...................................................................................... 635 7.33 Appendix AG. Content-validation questionnaire of a meta-model to ease the learning of data visualization concepts ........................................................................................ 651 7.34 Appendix AH. A proposal to measure the understanding of data visualization elements in visual analytics applications .................................................................................... 661 7.35 Appendix AI. KoopaML: A graphical platform for building machine learning pipelines adapted to health professionals ................................................................................... 673 7.36 Appendix AJ. MetaViz - A graphical meta-model instantiator for generating information dashboards and visualizations ................................................................................ 685 7.37 Appendix AK. Resumen extendido: Generación automática de interfaces software para el soporte a la toma de decisiones. Aplicación de ingeniería de dominio y machine learning ............................................................................................................................................... 701 v Table of Figures Figure 1. A spectrum of data visualization tools. Source: [15]. ........................................................... 3 Figure 2. Conceptual map of the objectives and associated research. Source: own elaboration. . 11 Figure 3. Levels of the MDA framework. Source: own elaboration. ................................................ 24 Figure 4. 
Phases and results of the first review process carried out on January 22, 2019, using the PRISMA 2009 flow diagram. Source: own elaboration....................................................................... 36 Figure 5. Phases and results of the updated review process carried out on May 18, 2022, using the PRISMA 2020 flow diagram. Source: own elaboration. ............................................................... 37 Figure 6. Distribution of papers per year. Source: own elaboration. ............................................... 38 Figure 7. Classification of the retrieved solutions in terms of their tailoring method. Source: [48], own elaboration. ....................................................................................................................................... 46 Figure 8. Correspondence of the MDA framework levels with the followed approach in the dashboards and data visualizations domain. Source: own elaboration. .......................................... 59 Figure 9. Iterations in the development of the dashboard meta-model. Source: own elaboration. .................................................................................................................................................................... 60 Figure 10. Dashboard meta-model (Initial version). Source: own elaboration, published in [70]. .................................................................................................................................................................... 61 Figure 11. Dashboards' components specification (increment #1). Source: own elaboration, published in [50]. ...................................................................................................................................... 62 Figure 12. Identification of commonalities in data visualizations following a domain engineering approach. Source: own elaboration. ................................................................................ 
66 Figure 13. Users’ specification (increment #2). Source: own elaboration, published in [51]. ....... 67 Figure 14. Goals and tasks characterization (increment #3). Source: own elaboration, published in [52]. ........................................................................................................................................................ 69 Figure 15. Interaction patterns specification (increment #4). Source: own elaboration, published in [53]. ........................................................................................................................................................ 70 Figure 16. Example of the effect of different scales when visualizing data. Both figures represent the same set of data points. Source: own elaboration, based on the example from [150], published in [54]. ...................................................................................................................................... 72 Figure 17. Specification of context and domain characteristics in the meta-model (increment #5). Source: own elaboration, published in [54]. ......................................................................................... 74 Figure 18. Final version of the dashboard meta-model [154]. Source: own elaboration, published in https://zenodo.org/record/7037624. .............................................................................................. 75 vi Figure 19. Using macros to label and materialize fine-grained features of information dashboards into code. Source: own elaboration. ................................................................................. 78 Figure 20. Excerpt of a code template rendering process. Source: own elaboration. ..................... 79 Figure 21. Meta-models organized in the four-layer metamodel architecture. Source: own elaboration, published in [59]. ................................................................................................................ 
88 Figure 22. Connection between both meta-models. Source: own elaboration, published in [59]. 89 Figure 23. Methodology employed to integrate the learning ecosystem meta-model and the dashboard meta-model organized in the four-layer meta-model architecture of the OMG. Source: own elaboration, published in [60]. ......................................................................................... 91 Figure 24. Dashboard component prototype for the health ecosystem. Source: own elaboration, published in [60]. ...................................................................................................................................... 92 Figure 25. Healthcare ecosystem architecture using the C4 model. Source: own elaboration, published in [61]. ...................................................................................................................................... 93 Figure 26. M1 model for a dashboard following the user goals and tasks. Source: own elaboration, published in [61]. ................................................................................................................ 95 Figure 27. M1 model for a scatter chart. Source: own elaboration, published in [61]. ................... 96 Figure 28. Schematic view of the dashboard generator service architecture. Source: own elaboration, published in [64]. .............................................................................................................. 102 Figure 29. Location of the different models following the MDA architecture [89]. Source: own elaboration. ............................................................................................................................................. 103 Figure 30. An information dashboard generated using the SPL approach with LA data. Source: own elaboration, published in [65]. ..................................................................................................... 105 Figure 31. Ph.D. 
student dashboard proposal. Source: own elaboration, published in [66]. ...... 106 Figure 32. Ph.D. advisor dashboard proposal. Source: own elaboration, published in [66]. ...... 107 Figure 33. PhD manager dashboard proposal. Source: own elaboration, published in [66]....... 107 Figure 34. Visual representation of the SUS questionnaire results regarding the PhD portal data visualizations' usability and learnability scores. Source: own elaboration, published in [67]. ... 109 Figure 35. A demonstration of the generation process output, with the different visualizations (left side of the figure) followed by their configuration and the labeling functionalities (right side of the figure). Source: own elaboration, published in [68]. ...................................................... 111 Figure 36. Methodology followed to create the training dataset. Source: own elaboration, published in [68]. .................................................................................................................................... 112 Figure 37. MetaViz architecture and connections among its different modules. The right section of the figure shows the MDA architecture layers adapted to this context. The color of each module matches the MDA layer that it addresses. Source: own elaboration. ............................... 114 vii Figure 38. MetaViz interface. The left section of the figure shows the MDA architecture layers adapted to this context. The color of each container matches the MDA layer that it addresses. Source: own elaboration. ....................................................................................................................... 115 Figure 39. Business Process Model of the instantiation workflow. Source: own elaboration. .... 117 Figure 40. Example instantiation of the dashboard meta-model and the generated visualizations using MetaViz. Source: own elaboration. 
........................................................................................... 118 Figure 41. Feature model for the OEEU dashboards product line. Source: own elaboration, published in [70] ..................................................................................................................................... 121 Figure 42. Feature model for the scatter chart component. Source: own elaboration, published in [70] ............................................................................................................................................................ 121 Figure 43. Excerpt of the specification of a dashboard using the DSL. Source: own elaboration, published in [70] ..................................................................................................................................... 122 Figure 44. Example dashboard generated through the SPL. Source: own elaboration, published in [70]. ...................................................................................................................................................... 123 Figure 45. Screenshot of a manual segmentation (left) and the AI algorithm output (right). Source: own elaboration, published in [36]. ....................................................................................... 125 Figure 46. Example pipeline to train a Random Forest classifier. Source: own elaboration. ...... 128 Figure 47. Generated dashboard to explore training datasets. Source: own elaboration, published in [35]. .................................................................................................................................... 129 Figure 48. Generated dashboard to explore training results. Source: own elaboration. ............. 129 Figure 49. Risk factors exploration view. Source: own elaboration, published in [37]. ............... 132 Figure 50. Advanced exploration of variables. 
The leftmost section provides a summary of each variable and the possibility of dragging them into the middle section to craft a visualization. The rightmost section displays the generated visualizations. Source: own elaboration, published in [37].
Figure 51. Indications regarding the scale concept in the data visualization domain. Source: own elaboration.
Figure 52. Indications regarding the channels/encodings concept in the data visualization domain. Source: own elaboration.
Figure 53. Questions regarding visualizations created in Tableau (a) and MetaViz (b). Source: own elaboration.
Figure 54. Summary of the outcomes and associated publications (with their related appendix number) of the present thesis.

Table of Tables
Table 1. Authors' addressing variability on dashboards. Source: own elaboration.
Table 2. Papers grouped by type of publication. Source: own elaboration.
Table 3. Papers grouped by target domain. Source: own elaboration.
Table 4. Papers grouped by variability factors. Source: own elaboration.
Table 5. Papers grouped by the target of the variability. Source: own elaboration.
Table 6. Papers grouped by variability stage.
Source: own elaboration.
Table 7. Papers grouped by variability methods. Source: own elaboration.
Table 8. Papers grouped in terms of testing. Source: own elaboration.
Table 9. Classification of the solutions regarding their tailoring capabilities. Source: own elaboration.
Table 10. Summary of user-testing methods applied in the retrieved articles. Source: own elaboration.
Table 11. Results of the expert validation. Attributes key: CL=Clarity, CO=Coherence, RL=Relevance. Scores key: H=High, M=Medium, L=Low. Source: own elaboration.
Table 12. Classification report of the Random Forest classifier. Source: own elaboration, published in [68].

1 Introduction
Our world is data. Our lives are data. Our actions are data. Everything we do, everything we say, everything we think is data. Even the most mundane process is led by data we collect through our senses and data we compute through our brains. While technical concepts like “data-driven” and “decision-making” can be seen as business jargon, it is necessary to be aware that these concepts influence every aspect of our day-to-day lives. We carry out decision-making processes when organizing our time to catch the bus promptly or when looking at weather reports to decide if we should bring an umbrella with us on a cloudy day. We are, consciously or not, making data-driven decisions constantly.
But although we face these decisions every day, understanding data is neither a simple task nor a straightforward process that can be taken for granted. Several mechanisms are triggered [1] to assist us in the journey of generating brand-new knowledge from tiny morsels of data [2-4]. However, these mechanisms are not infallible. In fact, living in a highly connected society, where data streams are continuously generated, can hamper these processes due to information overload [5, 6]. In this context, technological support is crucial to ease the generation of knowledge in such a convoluted environment with significant quantities of data.
Data visualizations and information dashboards are powerful allies when it comes to understanding complex datasets [7-12]. These tools consist of visual displays where raw data is transformed and mapped to visual elements through their properties, such as their position, color, size, etc.
Still, the necessity of drawing conclusions quickly to be part of the current debate can lower our guard when it comes to gaining quality insights from data. This context is a breeding ground for unfair practices that take advantage of this need for immediacy by introducing fake facts, for instance. Moreover, our perception mechanisms can also be tricked, purposely or not, into reaching wrong conclusions through cognitive biases [13] or even manipulated data [7, 14].
Data analysis tools such as data visualizations and dashboards must be aware of all these potential issues and overcome them to provide the best experience and insightful, honest outcomes from data. But how can we efficiently introduce these notions and concepts into the design and development processes of these tools?

1.1 Motivation of this research
As introduced, data visualizations and information dashboards are powerful but also complex. Many elements need to be involved to deliver effective visual displays of information.
Data visualizations and dashboards are, in the end, a set of visual elements arranged and characterized following the input data. But these elements influence each other, and there are concepts that are not even on the screen but are crucial, such as the user characteristics. Due to this complexity, it is important to rely on expert knowledge when developing data visualizations and dashboards.
However, a domain expert will not always be available to guide users through the process and apply appropriate design principles whenever they want to create a visualization. Several tools have tried to tackle this issue by assisting the user and automating the generation of data visualizations and dashboards through generative processes that capture and apply experts’ knowledge, adapting the visual elements depending on the data and context. This is the case with commercial tools like Tableau, Microsoft Excel, Google Charts, etc. Although these platforms are very powerful, there is still a problem related to transferring the expert knowledge to practitioners, and with the expressiveness of the obtained visualizations (Figure 1). On the other hand, declarative and imperative programming libraries can improve the expressiveness of the developed data visualizations and dashboards, but they usually come with a steep learning curve that hampers the implementation process.
Figure 1. A spectrum of data visualization tools. Source: [15].
In this sense, a generative dashboard pipeline should merge the best of interactive systems and programming languages, offering a good experience for non-expert users, but also a powerful specification to understand every element involved in the final product.

1.2 Context of this research
This thesis has been developed in the GRoup of InterAction and e-Learning (GRIAL) [16, 17] research group and in the context of the Ph.D.
Programme of Computer Science of the Department of Computer Science and Automation at the University of Salamanca. GRIAL is a Recognized Research Group of the University of Salamanca and a Recognized Group of Excellence by the Regional Council of Castile and León. The group is formed by many researchers from different fields of knowledge. Most members have a technical or a pedagogical profile, but there are also members with expertise in e-Learning project management, Humanities, Sciences, etc. The research activity of the group in these last few years has ranged from purely technical and computing projects to the development of pedagogical methodologies and models of reference in the field of online learning, which have gained international recognition and awards.
The main work lines of the GRIAL research group involve:
• Digital humanities.
• eLearning methodologies.
• ICT and educational innovation.
• Information science.
• Interactive learning systems.
• Learning Technologies.
• Quality and assessment in education.
• Social responsibility and inclusion.
• Strategic management of knowledge and technology.
• Technological ecosystems.
• Visual analytics.
• Web engineering and software architecture.
The present thesis is the result of multiple projects in which the author has been involved since joining the GRIAL research group in 2016, namely:
• Technological ecosystem for the University Employability and Employment Observatory (OEEU) of the UNESCO Chair in University Management and Policy of the Polytechnic University of Madrid [18-22].
Collaboration from 2014 to 2018 with the UNESCO Chair of University Management and Policy of the Polytechnic University of Madrid for the implementation of the technological ecosystem for the University Employability and Employment Observatory, with which the I Barometer of University Employability and Employment in Spain [23] and the Barometer of University Employability and Employment (Master's Edition 2017) [24] have been developed, both of them funded by the Fundación la Caixa.
• TE-CUIDA, proposal of a TEchnological Ecosystem to support caregivers (ref. SA061P17) [25, 26]. Project funded by the Ministry of Education of the Junta de Castilla y León in the program of support for research projects co-financed by the European Regional Development Fund. Its duration is from 26-7-2017 to 31-12-2019. Resolution ORDER EDU/986/2017, November 8. It seeks to provide support to caregivers, both formal and informal, to improve the quality of care and even reduce the caregiver's burden, thereby helping elderly people, especially those with loss of autonomy, to keep living in their community environment and in their own home while receiving the best possible care.
• A Digital Ecosystem Framework for an Interoperable NEtwork-based Society. (DEFINES) (ref. TIN2016-80172-R) [27-32]. Project funded by the Ministry of Economy and Competitiveness in the 2016 call for R&D&I projects of the State Program for Research, Development and Innovation Oriented to the Challenges of Society, with a duration from 1-1-2017 to 31-12-2020. It pursues two main objectives. On the one hand, to propose a technological ecosystem to support services for corporate knowledge management. On the other hand, to transform the current knowledge management processes and achieve a better adaptation to the context of the Digital Society.
• Visual Analytics and Machine Learning for decision-making in Health ecosystems. (AVisSA) (ref.
PID2020-118345RB-I00) [33-35]. Project funded by the Ministry of Science and Innovation in the 2020 call for R&D&I projects of the State Program for Research, Development, and Innovation 2017-2020, with a duration from 1-10-2021 to 30-09-2025. This project aims at developing a system of automatic dashboard generation (meta-dashboard) with Domain Engineering and Artificial Intelligence (AI) techniques, to obtain dashboards adapted to each case and application domain from the flow of data in technological ecosystems, so that the system automatically adapts to the needs of analysis and knowledge management in heterogeneous contexts. The medical domain is taken as a reference due to its complexity and the diversity of information management needs, which appear in the different medical specialties, to improve these processes within the health system, with a remarkable impact on the decision-making processes.
• Design and implementation of a technological ecosystem for research and intelligent data analysis in the Cardiology Department of the Hospital Clínico Universitario de Salamanca [35-40]. Collaboration since 2019 with the Cardiology Department of the Hospital Clínico Universitario de Salamanca for the implementation of different platforms related to data management and AI applications in the medical domain.
In terms of the doctoral theses developed within the GRIAL research group, two of them should be highlighted because of their close relationship with the present research.
First, the research carried out by Juan Cruz-Benito "On data-driven systems analyzing, supporting and enhancing Human-Computer Interaction", supervised by Dr. Francisco José García-Peñalvo and Dr. Roberto Therón-Sánchez, both from the University of Salamanca.
The main objective of this work is to explore how the collection and analysis of user-computer interaction information can be performed to improve such interaction in several typical scenarios: highly interactive and immersive scenarios, scenarios with many users, scenarios with excessive information and high task complexity [41, 42].
Second, the research carried out by Alicia García-Holgado "Integration analysis of solutions based on software as a service to implement Educational Technological Ecosystems", supervised by Dr. Francisco José García-Peñalvo from the University of Salamanca. This research is focused on providing an architectural framework that allows improving the definition, development, and sustainability of technological ecosystems for learning through model-driven engineering [43-46]. This work has also been crucial to the decision to follow a model-driven approach during the whole development process of this thesis.
Regarding the PhD Program in Computer Science (https://doctorado.usal.es/es/doctorado/ingenier%C3%ADa-inform%C3%A1tica), it has been proposed by the Department of Computer Science and Automation in collaboration with the Department of Applied Mathematics of the University of Salamanca. The programme aims at training in different areas of research in the field of Computer Science, especially those related to the research lines of the groups that are integrated in the proposal, such as intelligent systems, software engineering and knowledge engineering, human-computer interaction, data mining, visual analytics and information visualization, robotics, intelligent control, cryptography and information security, mathematical modeling, and numerical analysis.
Two research stays were carried out during the development of this thesis. First, a virtual internship from July 1, 2021, to October 10, 2021, at Østfold University College, Computer Science Department (Halden, Norway). This research stay was focused on validating the meta-model.
Second, as part of the international research stay required to obtain the International Mention for the Ph.D., the author was a visiting scholar at the Department of Computer Graphics Technology of Purdue University (West Lafayette, Indiana, United States of America) from January 10, 2022, to April 14, 2022. The research was related to data visualization applications and the results can be consulted in the last case study of this thesis.
Moreover, the author received four awards for publications related to her thesis:
1. Best paper award in the track International Workshop on Software Engineering for E-Learning (ISELEAR’17) within the International Conference on Technological Ecosystems for Enhancing Multiculturality (TEEM) 2017 held in Cádiz, Spain, on October 18-20, 2017. Award granted for the paper “Improving the OEEU's data-driven technological ecosystem's interoperability with GraphQL” developed jointly with J. Cruz-Benito and F. J. García-Peñalvo.
2. Best paper award in the track International Workshop on Software Engineering for E-Learning (ISELEAR’18) within the International Conference on Technological Ecosystems for Enhancing Multiculturality (TEEM) 2018 held in Salamanca, Spain, on October 24-26, 2018. Award granted for the paper “Domain engineering for generating dashboards to analyze employment and employability in the academic context” developed jointly with F. J. García-Peñalvo and R. Therón.
3. Best paper award in the track International Workshop on Software Engineering for E-Learning (ISELEAR’19) within the International Conference on Technological Ecosystems for Enhancing Multiculturality (TEEM) 2019 held in León, Spain, on October 16-18, 2019. Award granted for the paper “Capturing high-level requirements of information dashboards’ components through meta-modeling” developed jointly with F. J. García-Peñalvo and R. Therón.
4.
Conference best paper award of the Learning Analytics Summer Institute (LASI Spain), held in Salamanca, Spain, on June 20-21, 2022. Award granted for the paper “A proposal to measure the understanding of data visualization elements in visual analytics applications” developed jointly with F. J. García-Peñalvo, R. Therón, V. Byrd, and J. D. Camba.
Finally, from the economic point of view, this doctoral thesis has received funding from the Spanish Ministry of Education and Vocational Training under an FPU fellowship (FPU17/03276).

1.3 Hypotheses and goals
This research aims at exploring the benefits, downsides, and applications of automating the development processes of information dashboards and data visualizations. Although automating processes could yield several gains in different dimensions, it is crucial to understand how automating the implementation of dashboards influences their performance as well as their functional and non-functional features.
Given these considerations, the main hypothesis of this work is stated as follows:
H1. Automating the development of tailored user interfaces for supporting decision-making processes increases their benefits in terms of functional and non-functional features.
In other words, the primary goal of the research is to design and implement a generative framework for the automatic and systematic development of information dashboards, as well as to discuss the insights reached from automating the generation of these tools. The generative framework needs to involve tailoring mechanisms to adapt the layout, visual design, data sources, and interaction mechanisms. Through this approach, the focus is on fostering individualization, usability, and flexibility to maximize the benefits derived from the generated tools.
A series of sub-objectives have been posed to reach the main goal. These sub-objectives can be categorized into four main phases:
1.
Conceptualization:
• Identification of common characteristics of information dashboards at a meta-level.
• Identification of connection mechanisms to enable a model-driven approach to build concrete products of the Software Product Line (SPL).
2. Implementation:
• Implementation of mechanisms that foster interoperability to allow the connection of different data sources.
• Definition and implementation of reusable and configurable core assets to generate specific products of the SPL.
3. Validation:
• Evaluation of the SPL at a generative and functional level.
• Evaluation of the generated dashboards in terms of usability and expressiveness.
• Study of the automatic adaptation of the dashboards through AI mechanisms.
4. Applications:
• Study of the integration of the dashboards SPL within different technological ecosystems and case studies.
The outcomes of each phase have been published in different research articles that will be summarized throughout this thesis. Figure 2 shows a conceptual map of the phases and their related research. As can be observed, the conceptualization-implementation-validation-application processes are iterative, following the action-research methodology, which will be detailed in the following subsection. The continuous evaluation of the outcomes at every stage has resulted in the improvement of the meta-model and the generative pipeline of information dashboards.
Figure 2. Conceptual map of the objectives and associated research. Source: own elaboration.
Figure 2 presents the previous objectives and classifies the outcomes of each sub-objective in terms of the articles published during the development of this thesis:
1. Conceptualization:
1.1. Systematic Literature Review (SLR):
• A. Vázquez-Ingelmo, F. J. García-Peñalvo and R.
Therón, "Tailored information dashboards: A systematic mapping of the literature," in Proceedings of the XX International Conference on Human Computer Interaction (Donostia, Gipuzkoa, Spain, June 25-28, 2019) Article Number 26, New York, NY, USA: ACM, 2019. doi: 10.1145/3335595.3335628 [47].
• A. Vázquez-Ingelmo, F. J. García-Peñalvo, and R. Therón, "Information Dashboards and Tailoring Capabilities - A Systematic Literature Review," IEEE Access, vol. 7, pp. 109673-109688, 2019, doi: 10.1109/ACCESS.2019.2933472 [48].
1.2. Application of the domain engineering paradigm to the dashboards’ domain¹:
• A. Vázquez-Ingelmo, F. J. García-Peñalvo and R. Therón, "Domain engineering for generating dashboards to analyze employment and employability in the academic context," in TEEM’18 Proceedings of the Sixth International Conference on Technological Ecosystems for Enhancing Multiculturality (Salamanca, Spain, October 24th-26th, 2018), F. J. García-Peñalvo, Ed. pp. 896-901, New York, NY, USA: ACM, 2018. doi: 10.1145/3284179.3284329 [49].
1.3. Construction of the meta-model for information dashboards and visualizations:
• A. Vázquez-Ingelmo, F. J. García-Peñalvo and R. Therón, "Capturing high-level requirements of information dashboards’ components through meta-modeling," in TEEM’19 Proceedings of the Seventh International Conference on Technological Ecosystems for Enhancing Multiculturality (Leon, Spain, October 16th-18th, 2019), M. Á. Conde-González, F. J. Rodríguez-Sedano, C. Fernández-Llamas and F. J. García-Peñalvo, Eds. ICPS: ACM International Conference Proceedings Series, pp. 815-821, New York, NY, USA: ACM, 2019. doi: 10.1145/3362789.3362837 [50].
• A. Vázquez Ingelmo, F. J. García-Peñalvo, R. Therón Sánchez, and M. Á.
Conde González, "Extending a dashboard meta-model to account for users’ characteristics and goals for enhancing personalization," Proceedings of LASI-SPAIN 2019. Learning Analytics Summer Institute Spain 2019: Learning Analytics in Higher Education (Vigo, Spain, June 27-28, 2019). CEUR Workshop Proceedings Series, 2019. [Online]. Available: http://hdl.handle.net/10366/139803 [51].
• A. Vázquez-Ingelmo, F. J. García-Peñalvo, R. Therón, and M. Á. Conde, "Representing Data Visualization Goals and Tasks through Meta-Modeling to Tailor Information Dashboards," Applied Sciences, vol. 10, no. 7, 2306, 2020. [Online]. Available: https://www.mdpi.com/2076-3417/10/7/2306 [52].
• A. Vázquez-Ingelmo, F. J. García-Peñalvo, R. Therón, and A. García-Holgado, "Specifying information dashboards’ interactive features through meta-model instantiation," in Proceedings of LASI-SPAIN 2020. Learning Analytics Summer Institute Spain 2020: Learning Analytics. Time for Adoption? (Valladolid, Spain, June 15-16, 2020), A. Martínez-Monés, A. Álvarez, M. Caeiro-Rodríguez, and Y. Dimitriadis Eds., (CEUR Workshop Proceedings Series, no. 2671). Aachen, Germany: CEUR-WS.org, 2020, pp. 47-59 [53].
• A. Vázquez-Ingelmo, A. García-Holgado, F. J. García-Peñalvo and R. Therón, "A Meta-modeling Approach to Take into Account Data Domain Characteristics and Relationships in Information Visualizations," in Trends and Innovations in Information Systems and Technologies, WorldCIST 2021, vol. 2, Á. Rocha, H. Adeli, G. Dzemyda, F. Moreira and A. M. Ramalho Correia, Eds. Advances in Intelligent Systems and Computing Series, no. 1366, pp. 570-580, Cham, Switzerland: Springer Nature, 2021. doi: 10.1007/978-3-030-72651-5_54 [54].
¹ This publication appears as a case study in the domain of employment and employability; however, it is also framed within the conceptualization phase, so it supports both sections of this work.
2. Implementation:
2.1.
Application of the SPL paradigm following the obtained meta-model entities and relationships:
• A. Vázquez-Ingelmo, F. J. García-Peñalvo and R. Therón, "Automatic generation of software interfaces for supporting decision-making processes. An application of domain engineering and machine learning," in TEEM’19 Proceedings of the Seventh International Conference on Technological Ecosystems for Enhancing Multiculturality (Leon, Spain, October 16th-18th, 2019), M. Á. Conde-González, F. J. Rodríguez-Sedano, C. Fernández-Llamas and F. J. García-Peñalvo, Eds. ICPS: ACM International Conference Proceedings Series, pp. 1007-1011, New York, NY, USA: ACM, 2019. doi: 10.1145/3362789.3362923 [55].
2.2. Code templates as a method to materialize the SPL variability points:
• A. Vázquez-Ingelmo, F. J. García-Peñalvo and R. Therón, "Addressing Fine-Grained Variability in User-Centered Software Product Lines: A Case Study on Dashboards," in Knowledge in Information Systems and Technologies, vol. 1, Á. Rocha, H. Adeli, L. P. Reis and S. Costanzo, Eds. Advances in Intelligent Systems and Computing, no. AISC 930, pp. 855-864, Switzerland: Springer Nature, 2019. doi: 10.1007/978-3-030-16181-1_80 [56].
3. Validation:
3.1. Expert validation of the dashboard meta-model:
• A. Vázquez-Ingelmo, A. García-Holgado, F. J. García-Peñalvo, R. Therón, and R. Colomo-Palacios, "Content-validation questionnaire of a meta-model to ease the learning of data visualization concepts," presented at the Learning Analytics Summer Institute Spain 2022 (LASI Spain 22), Salamanca, Spain, 20-21 June, 2022 [57].
4. Applications:
4.1. Theoretical applications:
4.1.1. Integration of different meta-models: holistic meta-model:
• A. Vázquez-Ingelmo, A. García-Holgado, F. J. García-Peñalvo, and R.
Therón, "A meta-model to develop learning ecosystems with support for knowledge discovery and decision-making processes," in 2020 15th Iberian Conference on Information Systems and Technologies (CISTI), 24-27 June 2020, pp. 1-6, doi: 10.23919/CISTI49556.2020.9140986 [58].
• A. Vázquez-Ingelmo, A. García-Holgado, F. J. García-Peñalvo and R. Therón, "A Dashboard to Support Decision-Making Processes in Learning Ecosystems: A Metamodel Integration," in Proceedings of the 2020 European Symposium on Software Engineering - ESSE 2020 (November 6-8, 2020, Rome, Italy). International Conference Proceedings Series, pp. 80-87, New York, NY, USA: ACM, 2020. doi: 10.1145/3393822.3432326 [59].
• A. Vázquez-Ingelmo, A. García-Holgado, F. J. García-Peñalvo, and R. Therón, "A Meta-Model Integration for Supporting Knowledge Discovery in Specific Domains: A Case Study in Healthcare," Sensors, vol. 20, no. 15, 4072, 2020. [Online]. Available: https://www.mdpi.com/1424-8220/20/15/4072 [60].
4.1.2. Using the meta-model as a framework:
• A. Vázquez-Ingelmo, A. García-Holgado, F. J. García-Peñalvo, and R. Therón, "Dashboard Meta-Model for Knowledge Management in Technological Ecosystem: A Case Study in Healthcare," Proceedings, vol. 31, no. 1, 2019, doi: 10.3390/proceedings2019031044 [61].
• A. Vázquez-Ingelmo, F. J. García-Peñalvo and R. Therón, "Aggregation Bias: A Proposal to Raise Awareness Regarding Inclusion in Visual Analytics," in Trends and Innovations in Information Systems and Technologies, WorldCIST 2020, vol. 3, Á. Rocha, H. Adeli, L. P. Reis, S. Costanzo, I. Orovic and F. Moreira, Eds. Advances in Intelligent Systems and Computing Series, no. AISC 1161, pp. 409-417, Cham, Switzerland: Springer Nature, 2020. doi: 10.1007/978-3-030-45697-9_40 [62].
• A. Vázquez-Ingelmo, F. J. García-Peñalvo, R. Therón, D. Amo Filvà, and D.
Fonseca Escudero, "Connecting domain-specific features to source code: towards the automatization of dashboard generation," Cluster Computing, vol. 23, no. 3, pp. 1803-1816, 2020, doi: 10.1007/s10586-019-03012-1 [63].
• A. Vázquez-Ingelmo, F. J. García-Peñalvo, and R. Therón, "Towards a Technological Ecosystem to Provide Information Dashboards as a Service: A Dynamic Proposal for Supplying Dashboards Adapted to Specific Scenarios," Applied Sciences, vol. 11, no. 7, art. 3249, 2021, doi: 10.3390/app11073249 [64].
4.2. Practical applications:
4.2.1. Generation of information dashboards:
• A. Vázquez-Ingelmo and R. Therón, "Beneficios de la aplicación del paradigma de líneas de productos software para generar dashboards en contextos educativos," RIED. Revista Iberoamericana de Educación a Distancia, vol. 23, no. 2, pp. 169-185, 2020, doi: 10.5944/ried.23.2.26389 [65].
• A. Vázquez-Ingelmo, F. J. García-Peñalvo and R. Therón, "Generating Dashboards Using Fine-Grained Components: A Case Study for a PhD Programmes," in Learning and Collaboration Technologies. Design, Experiences. 7th International Conference, LCT 2020, Held as Part of the 22nd HCI International Conference, HCII 2020, Copenhagen, Denmark, July 19–24, 2020, Proceedings, Part I, P. Zaphiris and A. Ioannou, Eds. Lecture Notes in Computer Science, no. 12205, pp. 303-314, Cham, Switzerland: Springer Nature, 2020. doi: 10.1007/978-3-030-50513-4_23 [66].
• A. Vázquez Ingelmo, A. García-Holgado, H. Hernández-Payo, F. J. García-Peñalvo, and R. Therón Sánchez, "Following up the progress of doctoral students and advisors’ workload through data visualizations: a case study in a PhD program," Proceedings of LASI-SPAIN 2021. Learning Analytics Summer Institute Spain 2021: Learning Analytics in times of COVID-19: Opportunity from crisis (Barcelona, Spain, July 7-9, 2021). CEUR Workshop Proceedings Series, 2021. [Online]. Available: http://ceur-ws.org/Vol-3029/paper06.pdf [67].
4.2.2.
Application of Machine Learning to the generation of information dashboards:
• A. Vázquez-Ingelmo, A. García-Holgado, F. J. García-Peñalvo, and R. Therón, “Proof-of-concept of an information visualization classification approach based on their fine-grained features,” Expert Systems, e12872, 2022, doi: 10.1111/exsy.12872 [68].
4.2.3. Design and implementation of a graphical instantiation platform based on the dashboard meta-model and following the model-driven architecture:
• A. Vázquez Ingelmo, F. J. García-Peñalvo, and R. Therón, "MetaViz - A graphical meta-model instantiator for generating information dashboards and visualizations," Journal of King Saud University - Computer and Information Science, In Press, doi: https://doi.org/10.1016/j.jksuci.2022.09.015 [69].
4.3. Integration in real-world environments:
4.3.1. Integration of the generative pipeline of dashboards in the employment and employability context:
• A. Vázquez-Ingelmo, J. Cruz-Benito, F. J. García-Peñalvo, and M. Martín-González, "Scaffolding the OEEU's Data-Driven Ecosystem to Analyze the Employability of Spanish Graduates," in Global Implications of Emerging Technology Trends, F. J. García-Peñalvo Ed. Hershey, PA, USA: IGI Global, 2018, pp. 236-255 [18].
• A. Vázquez-Ingelmo, J. Cruz-Benito, and F. J. García-Peñalvo, "Improving the OEEU's data-driven technological ecosystem's interoperability with GraphQL," in Proceedings of the 5th International Conference on Technological Ecosystems for Enhancing Multiculturality, Cádiz, Spain, 2017, New York, NY, USA: Association for Computing Machinery, p. Article 89, doi: 10.1145/3144826.3145437 [21].
• J. Cruz-Benito, J. C. Sánchez-Prieto, A. Vázquez-Ingelmo, R. Therón, F. J. García-Peñalvo, and M. Martín-González, "How Different Versions of Layout and Complexity of Web Forms Affect Users After They Start It? A Pilot Experience," Cham, 2018: Springer International Publishing, in Trends and Advances in Information Systems and Technologies, pp.
971-979 [19].
• J. Cruz-Benito, A. Vázquez-Ingelmo, J. C. Sánchez-Prieto, R. Therón, F. J. García-Peñalvo, and M. Martín-González, "Enabling Adaptability in Web Forms Based on User Characteristics Detection Through A/B Testing and Machine Learning," IEEE Access, vol. 6, pp. 2251-2265, 2018, doi: 10.1109/ACCESS.2017.2782678 [41].
• A. Vázquez-Ingelmo, F. J. García-Peñalvo, and R. Therón, "Domain engineering for generating dashboards to analyze employment and employability in the academic context," presented at the Proceedings of the Sixth International Conference on Technological Ecosystems for Enhancing Multiculturality, Salamanca, Spain, 2018. [Online]. Available: https://doi.org/10.1145/3284179.3284329 [49].
• A. Vázquez-Ingelmo, F. J. García-Peñalvo, and R. Therón, "Taking advantage of the software product line paradigm to generate customized user interfaces for decision-making processes: a case study on university employability," PeerJ Computer Science, vol. 5, e203, 2019, doi: 10.7717/peerj-cs.203 [70].
4.3.2. Integration of the generative pipeline of dashboards in the health context:
• A. Vázquez-Ingelmo et al., "A platform for management and visualization of medical data and medical imaging," in Proceedings TEEM’20. Eighth International Conference on Technological Ecosystems for Enhancing Multiculturality (Salamanca, Spain, October 21st - 23rd, 2020), F. J. García-Peñalvo, Ed. ICPS: ACM International Conference Proceedings Series, New York, NY, USA: ACM, 2020. doi: 10.1145/3434780.3436652 [40].
• F. J. García-Peñalvo et al., "Application of Artificial Intelligence Algorithms Within the Medical Context for Non-Specialized Users: the CARTIER-IA Platform," International Journal of Interactive Multimedia and Artificial Intelligence, vol. 6, no. 6, pp. 46-53, 2021, doi: 10.9781/ijimai.2021.05.005 [36].
• A.
García-Holgado et al., "User-Centered Design Approach for a Machine Learning Platform for Medical Purpose," in HCI-COLLAB 2021, Sao Paulo, Brazil, 8-10, September 2021, Cham, Switzerland: Springer International Publishing, in Human-Computer Interaction, pp. 237-249, doi: 10.1007/978-3-030-92325-9_18 [38].
• A. Vázquez-Ingelmo et al., "Bringing Machine Learning Closer to Non-Experts: Proposal of a User-Friendly Machine Learning Tool in the Healthcare Domain," in Proceedings TEEM’21. Ninth International Conference on Technological Ecosystems for Enhancing Multiculturality (Barcelona, Spain, October 27th – 29th, 2021) ICPS: ACM International Conference Proceedings Series, pp. 324-329, New York, USA: ACM, 2021. doi: 10.1145/3486011.3486469 [39].
• A. Vázquez-Ingelmo et al., "A platform to support the visual analysis of the SALMANTICOR study outcomes: conveying cardiological data to lay users," in Proceedings TEEM’21. Ninth International Conference on Technological Ecosystems for Enhancing Multiculturality (Barcelona, Spain, October 27th – 29th, 2021). ICPS: ACM International Conference Proceedings Series, pp. 335-341, New York, USA: ACM, 2021. doi: 10.1145/3486011.3486471 [37].
• F. J. García-Peñalvo, A. Vázquez-Ingelmo, and A. García-Holgado, "Fostering Decision-Making Processes in Health Ecosystems through Visual Analytics and Machine Learning," presented at the 9th International Conference on Learning and Collaboration Technologies, Virtual, June 28, 2022 [34].
• F. J. García-Peñalvo et al., "KoopaML: A graphical platform for building machine learning pipelines adapted to health professionals," International Journal of Interactive Multimedia and Artificial Intelligence, In Press [35].
4.3.3. Integration of the generative pipeline of dashboards in the education context:
• A. Vázquez-Ingelmo, F. J. García-Peñalvo, R. Therón, V. Byrd, and J. D.
Camba, "A proposal to measure the understanding of data visualization elements in visual analytics applications," presented at the Learning Analytics Summer Institute Spain 2022 (LASI Spain 22), Salamanca, Spain, 20-21 June, 2022 [71].

1.4 Methodology

This subsection covers the methodologies employed throughout the thesis, including the general methodological framework used to carry out the research as well as the paradigms followed to design and develop the generative pipeline of information dashboards. An overview of the methodology has been published and can be consulted in Appendix I. Automatic generation of software interfaces for supporting decision-making processes. An application of domain engineering and machine learning [55].

1.4.1 Action-research methodology

Due to the mixed nature of the artifacts and proposed scenarios of this research, this thesis has been carried out following an iterative process in which the knowledge gained through past experiences and the outcomes of the different cycles is crucial for the following stages. The Action-Research methodological framework [72] was followed to accomplish this process.

Kemmis posed Action-Research [73] as an inquiry method carried out by the participants in social situations with the aim of improving and understanding their own social practices and their contexts. Later, McTaggart & Kemmis described the characteristics of this methodology. The Action-Research methodology is based on a cyclic spiral of research and actions composed of a series of phases and sequences [74]. Therefore, Action-Research is an iterative process where each cycle provides an output that will be the input for the next cycle. The iterative nature of the methodology enables the researcher to address previously identified problems, thus obtaining more refined solutions.
However, as previously represented in Figure 2, the problem to be addressed must be formalized before the Action-Research cycles can start. Similar problems and previously developed solutions have been studied to understand the context and the current state of the field. The methodology used for this step (an SLR) is detailed in the next section.

Once the problem is formalized, two Action-Research cycles are proposed to develop a proposal for generating dashboards and evaluating them in real contexts. Evaluation is necessary to obtain feedback to improve the proposal.

The chosen framework for software development is an agile approach based on SCRUM [75]. This framework provides the necessary processes, rules, practices, roles, and artifacts to increase the productivity of development teams through an iterative and incremental software development cycle [76].

A mixed methods research approach has been employed to evaluate the artifacts. The research has been conducted using both quantitative and qualitative methods [77], leveraging the two perspectives to obtain a wider view of the results before facing the next Action-Research cycles.

1.4.2 Systematic Literature Review

As introduced above, an SLR [78] is a powerful method to gain knowledge about previous solutions and similar problems. The SLR helps contextualize the problem to be solved and opens new research lines by identifying weaknesses and strengths in previous solutions.

The SLR is conducted under the guidelines proposed by Kitchenham [79]. Following these guidelines [79, 80], the SLR is composed of three main phases: planning, conducting, and reporting the study. However, before planning the review, a preliminary search was performed to verify that there were no recent reviews about the target topic. Had a recent SLR been found, there would have been no need to conduct a new one.
This preliminary search was performed using different electronic databases (Scopus, Web of Science (WoS), IEEE Xplore, and SpringerLink) and using terms related to literature reviews (“SLR”, “systematic literature review”, etc.), as well as terms related to the target of the review (“dashboards”). The result of this search confirmed that, at the time of performing the queries, there was no previous SLR about tailored dashboards, so the need to perform a literature review was justified.

1.4.3 Meta-modeling and Software Product Lines

Given the complexity of the dashboards' design processes, it is necessary to understand their domain deeply. Dashboards can present different features, different visual designs, different purposes, etc. However, dashboards also share common features that are always present. These common features can be abstracted to obtain generic schemas or models that help with the understanding of the domain and the systematic reuse of software components.

The technique for identifying shared properties and variabilities within a specific domain is called domain engineering [81-86]. Domain engineering is based on the reuse of knowledge about a specific domain. This approach is an essential phase of the software product line (SPL) paradigm [83, 87]. The SPL methodology allows the reuse of software components and their configuration to match certain requirements; that is why identifying common features and variabilities is an essential step.

Once the domain has been studied, it is possible to develop a generic model (a meta-model) that captures every abstract property of dashboards, as well as the relationships among the identified entities. Meta-models are crucial artifacts in model-driven development (MDD) paradigms [88-90], as they allow mapping entities from high-abstraction levels to more detailed entities and even source code through transformations.
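The idea of separating common and variable features and then transforming a concrete model into code can be illustrated with a small sketch. It is only a toy under stated assumptions: the class names, fields, supported visualization kinds, and the HTML output format are hypothetical and are not the dashboard meta-model or generator proposed in this thesis.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List

# NOTE: every class, field, and function name below is a hypothetical
# illustration; this is not the dashboard meta-model proposed in the thesis.

SUPPORTED_KINDS = {"bar", "line", "scatter"}  # assumed set of visualization types


@dataclass
class Visualization:
    """Variable part: each dashboard picks its own kinds and data fields."""
    kind: str
    data_field: str


@dataclass
class Dashboard:
    """Common part: every dashboard has a name, a layout, and visualizations."""
    name: str
    columns: int = 2
    visualizations: List[Visualization] = field(default_factory=list)


def validate(dashboard: Dashboard) -> None:
    """Reject concrete models that do not conform to the abstract definition."""
    if dashboard.columns < 1:
        raise ValueError("a dashboard needs at least one layout column")
    for viz in dashboard.visualizations:
        if viz.kind not in SUPPORTED_KINDS:
            raise ValueError(f"unsupported visualization kind: {viz.kind!r}")


def to_html(dashboard: Dashboard) -> str:
    """Model-to-text transformation: turn a concrete model into markup."""
    validate(dashboard)
    lines = [
        f'<section class="dashboard" data-columns="{dashboard.columns}">',
        f"  <h1>{escape(dashboard.name)}</h1>",
    ]
    for viz in dashboard.visualizations:
        lines.append(
            f'  <div class="viz viz-{viz.kind}" '
            f'data-field="{escape(viz.data_field)}"></div>'
        )
    lines.append("</section>")
    return "\n".join(lines)


# A concrete model: one specific dashboard configuration.
employability = Dashboard(
    name="Employability overview",
    columns=1,
    visualizations=[
        Visualization("bar", "employment_rate"),
        Visualization("line", "salary_by_year"),
    ],
)
markup = to_html(employability)
print(markup)
```

In this sketch the classes play the role of the generic model, the `employability` object is one configuration of it, and `to_html` stands in for a transformation toward source code. A real pipeline would rely on a much richer model and on dedicated transformation tooling.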
The Object Management Group (OMG) proposes the model-driven architecture (MDA) as a guideline to implement this approach. This architecture provides a framework for software development that employs models to describe and define the target system [91]. The main difference between MDD and MDA is that the OMG proposal uses a set of standards: meta-object facility (MOF), unified modeling language (UML), XML (Extensible Markup Language) metadata interchange (XMI), and query/view/transformation (QVT).

The MDA framework is composed of four architectural layers. Each layer represents a level of abstraction of the represented entities. The most abstract layer (M3 level) is known as the meta-meta-model level. This layer defines basic structures and concepts to represent less-abstract layers as well as itself, and it can be implemented with the mentioned MOF standard. The M2 level, namely the meta-model level, complies with the meta-meta-model and represents abstract entities and relationships. Meta-models can be seen as Domain Specific Languages (DSL) that express common and generic features of the target domain. The M1 level defines models that instantiate and specify the abstract features contained in the meta-model, and its syntax must comply with the M2 level. Finally, the M0 level represents real-world applications based on a previously defined M1 model. Figure 3 summarizes the MDA framework layers.

Figure 3. Levels of the MDA framework. Source: own elaboration.

The combination of the MDD and SPL paradigms increases productivity in terms of development processes, but also knowledge reuse, as the study of the domain is represented by the meta-model. Moreover, both methodologies provide mechanisms to address several requirements from different profiles and contexts, which is crucial in this domain.

1.5 Document structure

This manuscript is organized into 6 chapters and 36 appendixes.
The present chapter introduces the dissertation with the research context, motivation, goals, and methodologies followed to carry out this research.

Chapter 2 presents the state-of-the-art of automatic generation of information dashboards with tailoring capabilities. An SLR and a mapping were carried out to frame the possibilities of creating a generative pipeline focused on improving the user experience and understanding.

Chapter 3 details one of the main artifacts of this research: the dashboard meta-model. This chapter describes the domain engineering process and the iterations made until reaching the final version of the meta-model.

Chapter 4 presents the results derived from the validation and application of the dashboard meta-model in different contexts, including theoretical, practical, and real-world applications.

Chapter 5 discusses all the obtained results, while Chapter 6 presents the general conclusions of the research, future research lines, and the achievements obtained by the author while carrying out the thesis.

The first 35 appendixes, on the other hand, include every published paper with the results derived from this thesis, while the last appendix (Appendix AK. Resumen extendido: Generación automática de interfaces software para el soporte a la toma de decisiones. Aplicación de ingeniería de dominio y machine learning) contains an extended abstract of this document in Spanish.

2 State-of-the-art of information dashboards and tailoring capabilities

As introduced in the previous chapter, tailoring capabilities are vital because there is no “one size fits all” information dashboard: not every user has the same knowledge, goals, interests, or preferences when visualizing data.
However, before planning new methodologies for the automatic generation of tailored information dashboards, it is crucial to analyze previous findings that deal with this issue to find caveats and challenges new solutions could address.

For these reasons, an SLR on how tailoring capabilities are achieved in the domain of information dashboards [92-94] has been carried out. This process aims to provide a comprehensive view of existing solutions, their limitations, and the methods employed to offer suitable dashboard configurations to specific users.

Apart from the obtained landscape of solutions, the other main outcome of the literature review is a critical analysis of the methodologies and architectures found in the selected papers. This kind of analysis offers a good starting point for designing and implementing the first proposal of a system for the automatic generation of information dashboards.

The mapping and SLR can be consulted in Appendix G. Tailored information dashboards: A systematic mapping of the literature [47] and Appendix H. Information dashboards and tailoring capabilities – A systematic literature review [48], respectively. This section provides the updated analysis and outcomes of the systematic review (see footnote 2).

2.1 Methodology

The systematic process to conduct the literature review follows the SLR methodology by Kitchenham and Charters [79, 80]. The SLR has been complemented with a systematic literature mapping following the method proposed in [95]. The mapping results provide a quantitative analysis regarding the state-of-the-art of the target domain.

The SLR comprises three main phases: planning, conducting, and reporting the study [79, 80]. This subsection describes the protocol followed during the SLR to enable the traceability of the outcomes.
Footnote 2: The update was carried out on May 18, 2022.

2.1.1 Review and planning process

Research questions

The questions raised to analyze the state-of-the-art of tailoring capabilities in information dashboards are organized into three categories: technical aspects (RQ1-RQ4), artificial intelligence (AI) application (RQ5), and evaluation of the solutions (RQ6).
• RQ1. How have existing dashboard solutions tackled the necessity of tailoring capabilities?
• RQ2. Which methods have been applied to support tailoring capabilities within the dashboards’ domain?
• RQ3. How do the proposed solutions manage the dashboard’s requirements?
• RQ4. Can the proposed solutions be transferred to different domains?
• RQ5. Has any artificial intelligence approach been applied to the dashboards’ tailoring processes, and, if applicable, how have these approaches been involved in the dashboards’ tailoring processes?
• RQ6. How mature are tailored dashboards regarding their evaluation?

On the other hand, the mapping questions focus on categorizing and analyzing the collected solutions quantitatively.
• MQ1. How many studies have been published over the years?
• MQ2. Who are the most active authors in the area?
• MQ3. What type of papers are published?
• MQ4. To which contexts have the variability processes been applied (BI, learning analytics, etc.)?
• MQ5. Which are the factors that condition the dashboards’ variability process?
• MQ6. What is the target of the variability process (visual components, KPIs, interaction, the entire dashboard, etc.)?
• MQ7. At which development stage is the variability achieved?
• MQ8. Which methods have been used for enabling variability?
• MQ9. How many studies have tested their proposed solutions in real-world environments?

The review scope was defined following the PICOC method [96] along with the posed research questions.
• Population (P): Software solutions.
• Intervention (I): Provide support to tailor (information) dashboards.
• Comparison (C): No comparison intervention in this study, as the primary goal of the present SLR is to analyze existing approaches regarding tailoring capabilities and gain knowledge about them.
• Outcomes (O): Information dashboard proposals.
• Context (C): Environments related to data visualization and (or) decision making (in academia, industry, etc.).

Inclusion and exclusion criteria

A series of inclusion criteria (IC) and exclusion criteria (EC) was defined to select articles that could answer the RQs and dismiss those unrelated to the review scope.
• IC1. The paper describes a dashboard solution (proposal, architecture, software design, model, tool, etc.) AND
• IC2. The solution is applied to information dashboards AND
• IC3. The solution supports or addresses tailoring capabilities (customization, personalization, adaptation, variation) regarding information dashboards AND
• IC4. The tailoring capabilities of the dashboard are related to its design, components, or KPIs AND
• IC5. The papers are written in English or Spanish AND
• IC6. The articles are published in peer-reviewed Journals, Books, or Conferences AND
• IC7. The publication is the most recent or complete of the set of related publications regarding the same study.

The exclusion criteria are derived from the inclusion criteria as their opposite.
• EC1. The paper does not describe a dashboard solution (proposal, architecture, software design, model, tool, etc.) OR
• EC2. The solution is not applied to information dashboards OR
• EC3. The solution does not support or address tailoring capabilities (customization, personalization, adaptation, variation) regarding information dashboards OR
• EC4. The tailoring capabilities of the dashboard are not related to its design, components, or KPIs OR
• EC5. The papers are not written in English or Spanish OR
• EC6.
The articles are not published in peer-reviewed Journals, Books, or Conferences OR
• EC7. The publication is not the most recent or complete of the set of related publications regarding the same study.

Search strategy

Four electronic databases (Scopus, Web of Science (WoS), IEEE Xplore, and SpringerLink) were selected to perform the search. The selection process used the following criteria:
• It is a reference database in the research scope.
• It is a relevant database in the research context of this literature review.
• It allows using similar search strings to the rest of the selected databases and using Boolean operators to enhance the outcomes of the retrieval process.

The search concepts employed to build the search query are detailed in [48]. The search was conducted on January 22, 2019, for the first version and on May 18, 2022, for the updated version.

Query strings

Scopus:
TITLE-ABS-KEY ((meta-dashboard*) OR ((dashboard*) W/10 (custom* OR personal* OR adapt* OR flexib* OR config* OR tailor* OR context-aware OR generat* OR compos* OR select* OR template* OR driven)) OR ((dashboard*) AND ( (heterogeneous OR different OR diverse OR dynamic) W/0 (requirement* OR stakeholder* OR user* OR need* OR task* OR necess*)) )) AND NOT TITLE-ABS-KEY (car OR vehicle OR automo*) AND NOT DOCTYPE(cr)

Web of Science:
TS=((meta-dashboard*) OR ((dashboard*) NEAR/10 (custom* OR personal* OR adapt* OR flexib* OR config* OR tailor* OR context-aware OR generat* OR compos* OR select* OR template* OR driven)) OR ((dashboard*) AND ((heterogeneous OR different OR diverse OR dynamic) NEAR/0 (requirement* OR stakeholder* OR user* OR need* OR task* OR necess*)))) NOT TS= ( car OR vehicle OR automo* )

IEEE Xplore:
(((meta-dashboard) OR ((dashboard) NEAR/10 (custom* OR personal* OR adapt* OR flexib* OR tailor OR tailored OR configurable OR context-aware OR generation OR generated OR generative OR composed OR composition OR selection OR selecting OR template OR driven)) OR
((dashboard) AND ((heterogeneous OR different OR diverse OR dynamic) NEAR/0 (requirement OR stakeholder OR user OR need OR task OR necessities)))) AND NOT (car OR vehicle OR automo*))

SpringerLink:
((meta-dashboard*) OR ((dashboard*) NEAR/10 (custom* OR personal* OR adapt* OR flexib* OR config* OR tailor* OR context-aware OR generat* OR compos* OR select* OR template* OR driven)) OR ((dashboard*) AND ((heterogeneous OR different OR diverse OR dynamic) NEAR/0 (requirement* OR stakeholder* OR user* OR need* OR task* OR necess*))))

Quality criteria

Another set of criteria was also defined to assess the quality of each work with respect to answering the RQs before including it in the final literature review. Each criterion can be scored with three values: 1 (the paper meets the criterion), 0.5 (the paper partially meets the criterion), and 0 (the paper does not meet the criterion).
1. The research goals of the work are focused on addressing the variability, adaptability, customization, or personalization of an information dashboard to improve individual user experience (UX).
• Partial: not every research goal tries to address UX through tailoring capabilities.
2. A software solution that supports the variability of the dashboard components is presented.
• Partial: the software supports customization of the dashboard, but this is not the focus.
3. A model, framework, architecture, or any software engineering artifact that addresses the dashboard components' variation and interaction methods is adequately exposed.
• Partial: a model, framework, architecture, or any software engineering artifact is exposed but not detailed, i.e., the nature of the referred elements is mentioned, but their internal structures and details are not further explained.
4. The employed methods or paradigms to achieve tailoring capabilities are appropriately described.
• Partial: the employed methods or paradigms to achieve tailoring capabilities are partially described, i.e., the methodology is mentioned but not detailed in the application context.
5. The context or domain of application of the dashboard is described.
• Partial: the context or domain of application is mentioned but not detailed.
6. The proposed solution has been tested with real users.
• Partial: real users have used it and tested its functionality, but no further testing has been performed.
7. Issues or limitations regarding the proposed solution are identified.
• Partial: problems or limitations are mentioned but not detailed.

Each paper can obtain a maximum of 7 points regarding its quality following this methodology. This 0-to-7 score was transformed into a 0-to-10 scale, and a value of seven was chosen as the threshold for including a paper in the final synthesis. A paper that obtained fewer than seven points on the 0-to-10 scale was dismissed from the review, as it did not meet the minimum quality to answer the stated research questions.

2.1.2 Data extraction process

Once the search was performed (on January 22, 2019, for the first version and on May 18, 2022, for the updated version), the paper selection process was carried out through the following procedure:
1. The raw results (i.e., the records obtained from each selected database) were gathered in a Git repository (footnote 3) and arranged into a spreadsheet (footnote 4). A total of 2185 papers were retrieved: 595 (254 + 341) from Web of Science, 1035 (501 + 534) from Scopus, 192 (97 + 95) from IEEE Xplore, and 363 (182 + 181) from SpringerLink.
2. After organizing the records, duplicate works were removed. Specifically, 755 records were removed, retaining 1430 works (65.45% of the raw records) for the next phase.
3. The retained papers were analyzed by reading their titles, abstracts, and keywords and applying the inclusion and exclusion criteria.
1327 papers were discarded as they did not meet the requirements, retaining 103 articles (7.20% of the unique papers retrieved) for the next phase.
4. The selected 103 papers were read in detail and further analyzed. The papers were scored regarding their quality to answer the research questions using the quality assessment checklist described in the previous section. One paper was added after checking the references of the assessed works, leaving 104 records for this quality assessment phase.
5. After applying the quality criteria, a total of 30 papers (2.10% of the unique papers retrieved and 29.12% of the full text assessed articles) were selected for the present review.

Footnote 3: https://github.com/AndVazquez/slr-tailored-dashboards/tree/master/update-2022
Footnote 4: https://bit.ly/3KvwygV

Although 36 papers were above the 7-score threshold, six records were finally discarded during the last phase. This exclusion was because the six works were previous or partial versions of other studies found within the retrieved records. The decision was to keep the more complete and/or more recent work.

The update process added 9 more works to the original SLR’s included articles (23). However, two of these 9 works were more complete or recent versions of articles among the original 23, so the two old versions [49, 97] were replaced with the newer ones, resulting in the final 30 included papers.

The PRISMA flow diagram has been employed to detail the data extraction process. Specifically, the PRISMA 2009 [98] flow diagram was used for the first version of the SLR (Figure 4), and the detailed paper selection can be consulted in Appendix H. Information dashboards and tailoring capabilities – A systematic literature review [48]. In the case of the updated version, the PRISMA 2020 [99, 100] guidelines were followed.
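The arithmetic behind the selection funnel and the quality threshold described in this subsection can be reproduced with a short script. This is a minimal sketch: the record counts are the ones reported above, while the function and variable names (`pct`, `rescale_quality`, `passes_quality`) are hypothetical helpers introduced only for illustration and are not part of the published notebook or repository.

```python
# Records retrieved per database as (first search 2019, updated search 2022),
# taken from the data extraction process reported in the text.
retrieved = {
    "Web of Science": (254, 341),
    "Scopus": (501, 534),
    "IEEE Xplore": (97, 95),
    "SpringerLink": (182, 181),
}

total = sum(first + update for first, update in retrieved.values())
unique = total - 755           # after removing duplicate records
screened_in = unique - 1327    # after title/abstract/keyword screening
assessed = screened_in + 1     # one paper added from reference checking
included = 36 - 6              # above threshold, minus superseded versions


def pct(part: int, whole: int) -> float:
    """Percentage rounded to two decimals, as reported in the text."""
    return round(100 * part / whole, 2)


def rescale_quality(raw_score: float, n_criteria: int = 7) -> float:
    """Map a 0-to-n_criteria checklist score onto a 0-to-10 scale."""
    return raw_score * 10 / n_criteria


def passes_quality(criterion_scores, threshold: float = 7.0) -> bool:
    """Hypothetical helper: True if the rescaled score reaches the threshold."""
    if any(score not in (0, 0.5, 1) for score in criterion_scores):
        raise ValueError("each criterion must be scored 0, 0.5, or 1")
    raw = sum(criterion_scores)
    return rescale_quality(raw, len(criterion_scores)) >= threshold


print(total, unique, screened_in, assessed, included)
print(pct(unique, total), pct(screened_in, unique), pct(included, unique))
print(passes_quality([1, 1, 1, 1, 1, 0, 0]))    # 5 of 7 points
print(passes_quality([1, 1, 1, 1, 0.5, 0, 0]))  # 4.5 of 7 points
```

Under this reading of the threshold, a paper needs at least 5 of the 7 raw points to be included, since 4.5 points rescale to roughly 6.4 on the 0-to-10 scale.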
Figure 5 shows the PRISMA 2020 flow diagram for the updated version of the SLR and the paper selection procedure described at the beginning of this subsection.

Figure 4. Phases and results of the first review process carried out on January 22, 2019, using the PRISMA 2009 flow diagram. Source: own elaboration.

Figure 5. Phases and results of the updated review process carried out on May 18, 2022, using the PRISMA 2020 flow diagram. Source: own elaboration.

2.2 Results of the systematic literature mapping

This section presents the updated mapping results of the collected records. A Jupyter notebook (http://jupyter.org) based on the work developed by Cruz-Benito (http://bit.ly/2tS9JgF) was employed to support the analysis process of the raw data.

MQ1. How many studies have been published over the years?

The results of the first review range from 2011 to 2018, plus one work from 2007 [101]. A few records were published in 2011 [102, 103], 2012 [104], 2013 [105], 2014 [106-108] and 2016 [109, 110]. However, most records are concentrated in 2017 [111-116] and 2018 [117-121], with six and five papers, respectively. The update of the SLR also shows that this field has remained active between the first SLR (2018) and the current version (2022), with two works from 2019 [70, 122], three from 2020 [123-125], and four from 2021 [68, 126-128]. The number of selected papers per year can be consulted in Figure 6.

Figure 6. Distribution of papers per year. Source: own elaboration.

MQ2. Who are the most active authors in the area?

Five authors have more than one record within the retrieved results. On the one hand, Kintz presents a model-driven solution for generating dashboards [104, 113]; one paper presents the semantic description language, and the other extends it to consider user roles in the dashboards’ generation process.
On the other hand, Van Hoecke is one of the authors of two publications regarding dynamic monitoring dashboards through semantic technologies [110, 123]. Finally, Vázquez-Ingelmo, García-Peñalvo, and Therón (author and supervisors of this thesis, respectively) describe the application of the SPL paradigm for generating employability dashboards [70] and also for classifying potentially misleading visualizations [68]. The rest of the authors appear only once in this mapping study. Table 1 shows all authors and their number of papers in the scope of this literature mapping.

Some authors also had more than one paper related to tailored dashboards. However, the additional papers were omitted because of exclusion criterion EC7, so only the most recent and complete paper about each study reached the final phase.

Table 1. Authors addressing variability on dashboards. Source: own elaboration.
Author | Total
García-Peñalvo, F.J.; Kintz, M.; Therón, R.; Van Hoecke, S.; Vázquez-Ingelmo, A. | 2
Amer-Yahia, S.; Arjun, S.; B. Mayer; Barros, R.; Bastidas, V.; Bederson, B.B.; Belo, O.; Bezerianos, A.; Borges, M. R. S.; Bose, J.; Bouarour, N.; Cabrera, C.; Cardoso, A.; Castelnovo, C.; Chan, A.L.; Chowdhary, P.; Chua Zhen Liang, D.; Chua, G.G.; Ciucanu, R.; Collet, P.; Correia, H.; Da Col, S.; Dabbebi, I.; Danaisawat, K.; Dantas, V.; De Paepe, D.; Elias, M.; Elmqvist, N.; Filonik, D.; Medland, R.; Foth, M.; Rittenbruch, M.; Furtado, V.; García-Holgado, A.; Garlatti, S.; George, S.; Gilliot, J.M.; Guo, S.; Hautte, S.V.; Hruška, T.; Huys, C.; Hynek, J.; Iksal, S.; Janssens, O.; Ji, M.; Karstens, E.; Khunkornsiri, T.; Kochanowski, M.; Koetter, F.; Kukolj, S.; Kumar, K.; Lavoue, E.; Logre, I.; Magnoni, L.; Majstorović, B.; Mak, M.T.; Mariani, L.; May, M.; McGuinness, D.L.; Michel, C.; Mihaila, G.; Min Chim Lim, P.; Miotto, G.L.; Mobilio, M.; Moens, P.; Mosser, S.; Nascimento, B.
S.; Noonpakdee, W.; Ongenae, F.; Orlovskyi, D.; Kopp, A.; Palpanas, T.; Pastushenko, O.; Petasis, G.; Phothichai, A.; Pinel, F.; Pinheiro, P.; R. Weinreich; Radovanović, S.; Riganelli, O.; Riveill, M.; Rodrigues, P.; Rojas, E.; Santos, H.; Siong Ng, W.; Sloper, J.E.; Soare, M.; Soni, S. K.; Sousa Pinto, J.; Steenwinckel, B.; Triantafillou, A.; Tundo, A.; Van Herwegen, J.; Verborgh, R.; Verstichel, S.; Vieira Teixeira, C.J.; Vivacqua, A. S.; Yalcin, M.A.; de Walle, R.V. | 1

MQ3. What type of papers has been published?

Each consulted electronic database provides the metadata to answer this mapping question. According to the inclusion and exclusion criteria, only peer-reviewed papers (either in journals, conferences, books, or workshops) are included. The complete list of types regarding the analyzed records can be consulted in Table 2.

Table 2. Papers grouped by type of publication. Source: own elaboration.
Type | Total | Papers
Conference paper | 21 | [102] [103] [104] [105] [106] [107] [108] [109] [110] [111] [112] [113] [115] [116] [117] [120] [121] [122] [125] [126] [127]
Article | 9 | [101] [114] [118] [119] [70] [123] [124] [128] [68]

MQ4. To which contexts have the variability processes been applied?

Dashboards can be used in any domain; the only requirement is to have enough data to visualize. Regarding customizable and/or personalized dashboards, Business Intelligence (BI) is the most common application domain, followed by Internet of Things (IoT), services monitoring, Learning Analytics (LA), and generic solutions (Table 3).

Table 3. Papers grouped by target domain. Source: own elaboration.
Domain | Total | Papers
Business Intelligence | 9 | [101, 103, 104, 106, 113, 115, 117, 122, 126]
IoT | 4 | [108, 110, 123, 124]
Services monitoring | 3 | [112, 118, 125]
Learning Analytics | 2 | [114, 116]
Generic | 2 | [119, 127]
Communication | 1 | [68]
Disaster situations | 1 | [120]
Economics | 1 | [121]
Emergency management | 1 | [109]
Energy monitoring | 1 | [105]
Interface evaluation | 1 | [128]
Microservices monitoring | 1 | [111]
Physics | 1 | [103]
Sensor monitoring | 1 | [107]
Social sciences | 1 | [70]

MQ5. Which are the factors that condition the dashboards’ variability process?

One of the first steps to perform a variability process is determining the factors that will condition the dashboards’ variation, i.e., the customization and/or personalization stage inputs. Most of the included papers use user preferences to modify the dashboard appearance and functionality (Table 4).

Table 4. Papers grouped by variability factors. Source: own elaboration.
Factor | Total | Papers
User preferences | 21 | [68, 70, 102, 103, 105, 107, 109-112, 114, 115, 117-119, 122-125, 127, 128]
Data structure | 6 | [115, 116, 119, 121, 126, 127]
Business process | 3 | [101, 104, 113]
User role | 2 | [101, 113]
Design guidelines | 2 | [117, 128]
Usage profiles | 1 | [106]
Data sources | 2 | [108, 110]
Goals | 2 | [104, 113]
User description | 1 | [116]
Analysis scenario | 1 | [116]
User abilities | 1 | [120]

MQ6. What is the target of the variability process?

Variability processes have a target that will change or be modified after the variation has been accomplished. In the case of dashboards, several elements could be the target of the variation: visualization types, layout, displayed data, visual design (i.e., color palettes, font sizes, etc.), and even interaction (pan, zoom, etc.) or functionalities (filters, exportation, etc.). Table 5 lists the different variability targets identified in the included papers.

Table 5. Papers grouped by the target of the variability. Source: own elaboration.
Target | Total | Papers
Visualization types | 28 | [68, 70, 101-112, 114-119, 121-124, 126-128]
Layout | 24 | [68, 70, 101-112, 114-119, 124-128]
Displayed data | 27 | [68, 70, 101-119, 121-125, 127]
Visual design | 2 | [118, 120]
Interaction | 2 | [104, 120]
Functionalities | 1 | [70]

MQ7. At which development stage is the variability achieved?

The modification of dashboard features can be performed at different stages. In this case, four stages were identified: compile-time, run-time, pre-configuration time (i.e., a phase before the creation of the dashboard in which the end-user or any other stakeholder defines its configuration), and user-configuration time (i.e., at run-time, but with the user being responsible for the configuration of their dashboard). Pre-configuration and user-configuration seem to be the preferred stages to customize or personalize the dashboards (Table 6).

Table 6. Papers grouped by variability stage. Source: own elaboration.

Stage | Total | Papers
Pre-configuration | 13 | [68, 70, 101, 104, 107, 113, 115, 117, 118, 122, 124, 125, 128]
User-configuration | 9 | [102, 103, 105, 109, 111, 112, 114, 121, 123]
Run-time | 8 | [106, 108, 110, 116, 119, 120, 126, 127]
Compile-time | 1 | [108]

MQ8. Which methods have been used for enabling variability?

Several methods have been identified in the included papers. The most frequent method consists of configuration wizards that allow users to tailor their dashboards. Some solutions give extra support to these wizards with visual mapping to ease the selection of proper visualizations given the data structure to be visualized [103, 109, 119, 121]. Other common methods are configuration files, agents, SPL, and model-driven development. The detailed list of methods can be consulted in Table 7.

Table 7. Papers grouped by variability methods. Source: own elaboration.
Method | Total | Papers
Configuration wizard | 10 | [102, 103, 105, 109, 111, 114, 119, 121-123]
Visual mapping | 5 | [103, 109, 119, 121, 126]
Model-driven | 5 | [101, 104, 113, 124, 125]
Configuration files | 4 | [112, 118, 122, 128]
SPL | 3 | [68, 70, 107]
Agents | 2 | [106, 108]
Pre-defined templates | 2 | [101, 117]
Semantic reasoner | 2 | [110, 123]
Inclusive user modeling | 1 | [120]
Context-aware generator | 1 | [116]
Indicator ontology | 1 | [115]
Knowledge graphs | 1 | [115]
Machine Learning | 1 | [127]

MQ9. How many studies have tested their proposed solutions in real environments?

The last mapping question concerns the tests performed on the included dashboard solutions. Thirteen of the solutions have been tested in real-world scenarios involving real data and real users, while 6 have not been tested with real users or real data (Table 8). Eleven solutions have been partially tested in a real-world scenario, i.e., they have been tested with real data but not with real users, or vice versa.

Table 8. Papers grouped in terms of testing. Source: own elaboration.

Tested? | Total | Papers
Yes | 13 | [102, 103, 105, 107-110, 113, 116, 117, 119-121]
Partially | 11 | [68, 70, 101, 114, 122-128]
No | 6 | [104, 106, 111, 112, 115, 118]

2.3 Results of the SLR

RQ1. How have existing dashboard solutions tackled the necessity of tailoring capabilities?

The selected works were categorized in terms of their tailoring process. Each paper was analyzed to answer the questions that would assign it to a specific category. The final classification can be seen in Figure 7. Most of the selected works fall into the "customizable" category, meaning that the tailoring process of the dashboard is driven by explicit user requirements [68, 70, 103, 105, 107, 111, 112, 114, 117, 118, 122, 124, 125, 128].
Customizable solutions mainly involve manual approaches (detailed in RQ2), meaning that users need to perform a set of explicit actions to tailor their dashboard according to their needs [102, 105, 111, 114, 122]. However, manual user interactions are not the only means of arranging the tool: some of these customizable dashboards involve generative or automated approaches through the specification of configuration files [112, 118, 128], models [70, 107], or pre-defined templates [117].

Figure 7. Classification of the retrieved solutions in terms of their tailoring method. Source: [48], own elaboration.

Although these solutions involve automation, the inputs (configuration files, models, templates, etc.) of the generative pipelines still need to be filled with the user requirements. In the end, these generative approaches add an abstraction layer that helps users to configure their dashboards without requiring programming skills. For these reasons, these solutions are also classified as customizable dashboards.

In contrast with customizable solutions, personalized solutions infer a suitable configuration based on implicit data about users, tasks, or goals [101, 104, 113, 120]. Adaptive solutions, on the other hand, can adapt themselves at run-time based on environmental changes. These environmental changes are detected through the analysis of user queries [106], interaction history [116], and explicit user feedback [127]. The last solution takes advantage of machine learning (ML) to adapt the dashboard’s views depending on the user’s interactions.

Other tailored solutions have been identified that cannot be framed in the previous categories (customizable, personalized, or adaptive). On the one hand, solutions identified as "hybrid" are mainly personalized or adaptive dashboards that allow the user to have the last word regarding the dashboard configuration, or that need user actions to complete the tailoring process.
Works in this category analyze the data sources to personalize or adapt the dashboards’ visualization types [108, 126] or indicators [110, 115], but allow users to customize or create the final display. On the other hand, there are customizable solutions that assist users in building their dashboards according to a series of factors; these are identified as "customizable with system support" solutions. Visual mapping is the preferred method to assist users in the selection of the best visualization types for their dashboards [103, 109, 119, 121]. Also, [123] presents an extension of [110] in which the semantic reasoner supports and guides the implementation of the dashboard through a graphical interface.

Classifying these tools regarding their tailoring capabilities is complex. The selected papers describe many different solutions, implemented through various methods and with different goals, so this classification of tailored dashboards should be seen as a spectrum that allows for dashboards mixing features of different approaches. However, framing them in distinct categories provides a better understanding of the existing solutions and the current state of the field. Table 9 summarizes this categorization.

Table 9. Classification of the solutions regarding their tailoring capabilities. Source: own elaboration.

Type | Total | Papers
Customizable | 14 | [68, 70, 103, 105, 107, 111, 112, 114, 117, 118, 122, 124, 125, 128]
Customizable with system support | 5 | [102, 109, 119, 121, 123]
Personalized | 4 | [101, 104, 113, 120]
Hybrid | 4 | [108, 110, 115, 126]
Adaptive | 3 | [106, 116, 127]

RQ2. Which methods have been applied to support tailoring capabilities within the dashboards’ domain?

The preferred method for customizing dashboards is the use of configuration wizards that support the users' decisions when building their customized dashboards without programming skills.
For example, [102, 105, 111, 114, 122] use graphical user interfaces that ease the selection of widgets and the data to be displayed. Configuration wizards are also the preferred method for customizable dashboards with system assistance, in conjunction with visual mapping methods that ease the selection of visualization types given the data types or structure [103, 109, 119, 121, 123]. Although it is considered a hybrid solution, the authors of [126] also make use of mapping (by using thresholds) to match data sources to specific visualizations. Users configure their dashboards based on their needs, and the system then provides feedback to support the customization process and potentially obtain more effective dashboards.

Another common method to customize dashboards is to configure them by using structured configuration files [112, 118, 122], which also allow users to tailor their dashboards with a higher level of abstraction (through JSON files, XML files, etc.) and through richer and more domain-specific syntaxes than programming languages. Although [122] uses a graphical interface to select widgets, the final dashboard specification and data schema are stored as JSON files. In this case, a series of parameters are set to render a concrete and functional dashboard.

Some works also take advantage of the SPL paradigm [68, 70, 107] or Model-Driven Development (MDD) [101, 104, 113, 124, 125]. For example, in [125], a meta-model is used to transform the definition of different KPIs into an arranged set of visualizations. In this case, the meta-model only references the layout of the dashboard. The final rendering process is performed through external tools, such as Kibana or Grafana, as the authors mention. These paradigms are used to finally generate a dashboard that fits previously defined feature models (in the case of SPL) or meta-models (in the case of MDD).
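To make the configuration-file and visual-mapping methods discussed above more concrete, the following minimal sketch resolves a declarative dashboard specification into widget definitions. It is only an illustration under stated assumptions: the JSON structure, the mapping rules, and the names `MAPPING_RULES`, `suggest_visualization`, and `resolve_widgets` are hypothetical and are not taken from any of the reviewed papers.

```python
import json

# Hypothetical visual-mapping rules keyed by (x data type, y data type).
# The reviewed papers do not publish their rules; these are illustrative only.
MAPPING_RULES = {
    ("categorical", "quantitative"): "bar chart",
    ("temporal", "quantitative"): "line chart",
    ("quantitative", "quantitative"): "scatter plot",
}

# Hypothetical configuration file: a data schema plus a list of widgets.
CONFIG_JSON = """
{
  "schema": {"month": "temporal", "region": "categorical", "sales": "quantitative"},
  "widgets": [
    {"title": "Sales over time", "x": "month", "y": "sales"},
    {"title": "Sales by region", "x": "region", "y": "sales", "type": "pie chart"}
  ]
}
"""


def suggest_visualization(x_type: str, y_type: str) -> str:
    """Return the mapped visualization type, falling back to a plain table."""
    return MAPPING_RULES.get((x_type, y_type), "table")


def resolve_widgets(config: dict) -> list:
    """Turn the declarative configuration into concrete widget specifications."""
    schema = config["schema"]
    resolved = []
    for widget in config["widgets"]:
        suggested = suggest_visualization(schema[widget["x"]], schema[widget["y"]])
        # An explicit user choice always wins; the mapping only fills the gap.
        resolved.append({"title": widget["title"], "type": widget.get("type", suggested)})
    return resolved


widgets = resolve_widgets(json.loads(CONFIG_JSON))
```

In this sketch the user keeps the last word (an explicit `type` overrides the suggestion), which mirrors the "customizable with system support" idea: the system proposes a visualization from the data types, and the user may accept or replace it.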
A similar MDD approach is followed in [116], although the authors do not explicitly indicate that they followed this paradigm. In this case, a context-aware generator that takes users’ data and visualization models as inputs oversees the generation of the dashboard instances. Still, the internal features of the dashboard generator are not detailed.

Regarding adaptive solutions, agents are a common method for managing changing requirements [106, 108]. Machine learning has also been used in [127] to adapt the dashboard and to learn from users’ interactions in order to subsequently show better data visualizations. Other methods found in the selected papers include inclusive user modeling for adapting the dashboard interface to the user abilities [120], semantic reasoners for selecting appropriate data sources and compositions [110, 123], and knowledge graphs and ontologies to adapt the dashboards to the target data domain [115].

RQ3. How do the proposed solutions manage the dashboard’s requirements?

The second research question shows that configuration wizards are popular methods to manage these requirements by giving users the responsibility of building their own dashboards based on their needs. These solutions allow users to freely customize their displays while using their dashboards, thus performing the tailoring process at user-configuration time (i.e., at run-time, but with the user’s intervention through explicit actions).

All the solutions found that use a configuration wizard approach [102, 103, 105, 109, 111, 114, 119, 121, 122] manage individual user requirements by implementing authentication and account management services, persistently associating each user with their dashboard configuration. This user management approach is also applied in other solutions, such as [120], where a user creates an account and fills in a questionnaire about their abilities to finally access a personalized view based on this information.
Also, in [106], users’ behavior and events need to be stored to adapt the display. However, these works do not further discuss the storage method or the possibility of storing different versions of a user dashboard over time, which could be very useful for tracking the evolution of preferences or user behavior. The same applies to [127]: although interactions with the system need to be captured to trigger the recommendation process, the work only mentions that the feedback from the user is stored, without giving further details.

On the other hand, various selected works take advantage of structured files or models to hold individual dashboard requirements, which finally serve as inputs of generators that provide a configured dashboard instance meeting the original specifications. This category includes solutions based on configuration files [112, 118, 128], data models [108], context models [116], software product lines [68, 70, 107], or model-driven development [101, 104, 113, 124, 125]. In this case, user requirements are managed “outside” the dashboard systems before their exploitation and stored within individual files or models.

In the case of [117], no requirement management is explicitly performed, as the pre-defined templates enclose general requirements collected during the gathering and analysis phase. The same happens in [126], which only references the characteristics of the datasets to personalize the dashboards, while the final dashboard implementation is undisclosed.

The remaining solutions use semantic reasoners [110, 123] and knowledge graphs [115] to manage the dashboards’ information requirements, but the management of the end-users’ requirements is not further discussed.

RQ4. Can the proposed solutions be transferred to different domains?

Most of the solutions can be transferred to different domains.
This is the case of solutions in which data sources can be uploaded or specified [103, 106, 119, 122, 126, 127], solutions based on MDD [101, 104, 113, 124, 125] or SPL strategies [68, 70, 107], and some solutions based on configuration files [112, 128]. In the case of [126], although the application domain is business intelligence, the applied thresholds refer to abstract features of the datasets, such as their size, so this solution could be employed in other contexts. This is also the case of [127], in which the ML recommendation process is triggered by user interactions and the suggested visualizations are adapted to the target dataset, no matter the domain.

Some solutions allow freedom when configuring the dashboards, but only within the original domain (environmental performance [105], micro-services monitoring [111, 118], emergency situations [109], learning analytics [114, 116], physics [102], economics [121]). The works that focus on sensor monitoring [110, 123] and device clouds [108] employ methodologies that could be reused for other domains. Still, in the end, the dashboard solutions would need to be built from scratch to adapt them to new domains.

The remaining solutions are tightly coupled to their original context, which is the case of [120] (the adaptation is focused on users’ physical abilities), [117] (the templates are related to specific areas of BI), and [115] (a specific Smart City ontology is employed to tailor the dashboards).

As a clarification, it is worth stating that every methodology employed in the selected papers could be applied to develop dashboards in different data domains. However, the purpose of this research question is to identify the most flexible and powerful solutions regarding their abstraction and, therefore, their potential reuse in other domains in an automated manner (i.e., avoiding manually developing the same solution for new domains).

RQ5.
Has any artificial intelligence approach been applied to the dashboards’ tailoring processes, and, if applicable, how have these approaches been involved in the dashboards’ tailoring processes?

Only a few works have applied or mentioned AI when presenting their dashboard solutions. The most explicit application of AI can be found in [127], in which the authors use a Multi-Armed Bandits (MABs) reinforcement learning model to improve the recommendations and adaptation of the dashboards’ visual components based on the users’ feedback.

In [106], the Apriori algorithm [129], a technique from the data mining field, is used to compute association rules. This solution takes advantage of “pairs of events that have happened in sequence”, which are fed to the Apriori algorithm to obtain a set of if-then rules that are then used to restructure the dashboard in terms of the presented data and the visualization types employed. In a study referencing those mentioned above [130], the same authors specify that their solution also supports the restructuring of the dashboards through other methods, like Markov chains or top-k queries, but they do not detail these processes.

Also, in [110, 123], a semantic reasoner is employed to discover potentially interesting data compositions through a knowledge base and semantically annotated visualization and data services. However, no details about the implementation of the reasoner are given in these works.

Finally, although it is not explicitly involved in the dashboard generation process, the work presented in [68] focuses on labeling data and training ML models to obtain a classifier that detects potentially misleading visualizations.

Other papers mention the possibility of introducing AI techniques, like [128], to rate the generated dashboards through classification algorithms. Still, the authors state that this is out of the scope of the paper and refer to [131] as an inspiration.
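As an illustration of how a multi-armed bandit can drive this kind of feedback-based adaptation, the following minimal sketch applies an epsilon-greedy strategy to choose among candidate visualization types. It is not the model used in [127], whose details are not reproduced here; the epsilon-greedy policy, the class `EpsilonGreedyBandit`, the function `simulate`, the arm names, and the binary reward are all assumptions made for the example.

```python
import random


class EpsilonGreedyBandit:
    """Epsilon-greedy bandit whose arms are candidate visualization types."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)  # seeded so that runs are repeatable
        self.counts = {arm: 0 for arm in self.arms}
        self.values = {arm: 0.0 for arm in self.arms}

    def select(self):
        # Explore with probability epsilon; otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda arm: self.values[arm])

    def update(self, arm, reward):
        # Incremental mean of the rewards observed for this arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def simulate(bandit, preferred, rounds=500):
    """Simulate a user who gives positive feedback only to one visualization."""
    for _ in range(rounds):
        arm = bandit.select()
        bandit.update(arm, 1.0 if arm == preferred else 0.0)
    return bandit


bandit = simulate(
    EpsilonGreedyBandit(["bar chart", "line chart", "scatter plot"]),
    preferred="line chart",
)
```

In a real dashboard, the reward would come from explicit or implicit user feedback rather than from a simulated preference, and the arms could be any tailorable feature (layout, encoding, displayed indicators).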
There is also a work that mentions inference methods [116] to provide a suitable dashboard given the context, user description, and analysis scenario, although no further details are given and the inference method is not named.

RQ6. How mature are tailored dashboards regarding their evaluation?

Only 11 articles mention any kind of user testing. The testing methodologies for each of these works are summarized in Table 10. Some works, like [104, 108, 109, 113], mention user testing outcomes and the collection of user feedback, but they describe neither the methods employed nor the sample sizes.

Table 10. Summary of user-testing methods applied in the retrieved articles. Source: own elaboration.

Article | Interview | Survey | Data exploration | Other | #Participants
[105] | Yes | - | - | Expert evaluation | 5 experts, 13 participants
[111] | Yes | Yes | - | - | 15 participants
[108, 109] | - | - | - | Undisclosed | Undisclosed
[117] | - | - | - | Multi-criteria evaluation [132] | 40 enterprises
[104, 113] | - | - | - | User feedback | 2 participants
[103] | Yes | - | Yes | - | 7 novice users, 8 BI experts
[119] | Yes [133] | - | Think-aloud protocol | - | 6 novice users
[114] | Yes [134] | - | Yes | - | 12 participants
[121] | - | Yes | Yes | - | 60+ participants

The solutions presented in [68, 70, 101, 102, 106, 107, 110, 112, 115, 116, 118, 120, 122-128] did not mention any formal testing regarding end-users’ perceptions of the dashboard solutions, leaving these evaluations as future work. Some of these proposed tools were tested in real-world scenarios to prove their applicability and functionalities, but this research question is focused on user perceptions of the solutions.

2.4 Conclusions

The SLR aimed at identifying current trends and solutions within the domain of study of the present Ph.D. thesis: the automatic generation of tailored information dashboards.
The research questions covered relevant aspects to consider when addressing generative workflows of dashboards. With the collected information, it is possible to select the best strategy to implement approaches that tackle the automatic generation of these tools.

Based on the SLR outcomes, the decision is to follow a meta-modeling approach to conceptualize the generative framework and the SPL paradigm to materialize and transform abstract features into source code. The analysis of the retrieved articles has shown that these two approaches are feasible in this domain; almost one third of the selected works (8 out of 30) employ one of these two paradigms.

However, the feasibility of the solutions was not the only object of study. Other attributes, like flexibility and evolving capabilities, the possibility of transferring the solutions to any data domain, the traceability of the dashboard requirements, and the potential to integrate AI algorithms to adapt the dashboard features to environmental changes, were also under the focus of this review.

On the other hand, other challenges and open research paths have been identified during the analysis of the selected works. For example, a few works mention leveraging Machine Learning (ML) models to unburden users from complex tasks such as configuring the dashboard layout. However, these applications are not detailed or are in their first development stages. This research line is very promising, as these methodologies could yield several benefits to assist users and provide them with useful guidelines to learn and understand how to design effective dashboards and visualizations.

In this sense, choosing a meta-modeling approach is also suitable for applying ML methods, as these models require structured data to learn from. Instances of the meta-model can be provided as inputs to identify patterns that make specific configurations useful, efficient, effective, usable, etc.
In short, carrying out this analysis has provided clear evidence that solutions following meta-modeling and/or SPL paradigms meet these properties, concluding that the versatility of these solutions provides a great starting point for implementing a generative dashboard system.

3 Dashboard meta-model

This chapter addresses the main outcome of the present research: the dashboard meta-model. According to the results of the SLR, a meta-modeling approach has proved to be suitable in the information dashboards’ domain for several relevant reasons.

First, it enables the abstraction of commonalities. Although information dashboards can present disparities in their design and look very different at first sight [92], they are developed using common, low-level elements [135].

Second, the meta-model provides structures for these low-level features to arrange them into a set of entities and relationships, capturing how the different elements that comprise information dashboards influence each other.

Third, this approach supports the automatic generation of products through methodologies such as Model-Driven Architecture (MDA) and Model-Driven Development (MDD), or Software Product Lines (SPLs). This implies that the obtained meta-model can be translated into source code to develop real-world, functional products.

Finally, adopting a model-driven generative approach entails traceability of every design decision taken until obtaining the final product, from theoretical specifications (model instances) to tangible system features (dashboards' code).

Traceability is crucial in the context of information dashboards and visualizations. As will be discussed in subsequent sections, this attribute improves the transparency of the whole design process, which could result in a better understanding of data visualizations. In addition, it enables easier version control of each model instance, keeping track of the evolution of individual dashboard requirements.
In addition to the above, meta-modeling has many more benefits: increased flexibility, faster development, reusability of core assets, reusability of knowledge, etc. [136].

The suitability of a model-driven approach in this context led to the proposal of a dashboard meta-model to tackle the automatic generation of software interfaces for supporting decision-making processes.

The rest of this chapter details the developed dashboard meta-model and the domain engineering methodology followed to obtain the final version (Section 3.1), the approach conceived to implement a generative pipeline (Section 3.2), and the conclusions derived from the application of this proposal to the information dashboards’ domain (Section 3.3).

3.1 Dashboard meta-model

The dashboard meta-model consists of a series of elements, properties, and relationships among them. As stated in Chapter 2, the dashboard meta-model is framed within the MDA paradigm [137].

The dashboard meta-model is part of the four-layer meta-model architecture proposed by the OMG, in which a model at one layer is used to specify models in the layer below [18]. Figure 8 shows the correspondence of the dashboards domain with the followed MDA paradigm [89].

Figure 8. Correspondence of the MDA framework levels with the followed approach in the dashboards and data visualizations domain. Source: own elaboration.

The dashboard meta-model is an M2 model, which is defined by a meta-meta-model, and it will be used to instantiate dashboard models which, in turn, will be transformed into real-world, functional dashboards. The first version of the dashboard meta-model was an instance of MOF; however, it was finally transformed into an instance of Ecore using Graphical Modelling for Ecore, included in the Eclipse Modeling Framework (EMF).

The development of the meta-model has been subject to several iterations with the aim of improving the captured domain elements.
Figure 9 shows an overview of the improvements made in each iteration. The first two iterations were focused on identifying the static, tangible elements of data visualizations and dashboards (layout, visual components, resources, etc.). The next two iterations, on the other hand, deal with the user characteristics and their intents with the dashboard. The fourth iteration aims at modeling interactive, dynamic behavior between the dashboard elements. Finally, the last improvement included more complex and higher-level concepts like the domain of the data to be displayed. An animated overview of this evolution can be consulted at https://youtu.be/ZyAZIRZXogc.

As can be seen, the meta-model development process was incremental, which helped in focusing on the entities of the dashboard domain one by one to better identify their properties and relationships.

Figure 9. Iterations in the development of the dashboard meta-model. Source: own elaboration.

The following subsections detail the meta-model development process towards obtaining the current version of the meta-model. All these iterations have been published in [49-54, 70].

3.1.1 Basic layout

The first iteration of the meta-model development process was focused on identifying the basic skeleton that makes up information dashboards and visualizations. This version of the meta-model was employed to develop dashboards within the employment and employability domain. This case study is detailed in Section 4.4.1, and the associated publications are included in Appendix E ("Domain engineering for generating dashboards to analyze employment and employability in the academic context") and Appendix L ("Taking advantage of the software product line paradigm to generate customized user interfaces for decision-making processes: a case study on university employability") [49, 70].

The very first version of the dashboard meta-model included the layout specification and the user entity (Figure 10).

Figure 10. Dashboard meta-model (Initial version). Source: own elaboration, published in [70].

The meta-model was defined as an instance of MOF and captures the highest-level entities involved in the domain. In this case, the <> uses a <>, which is composed of one or more pages that contain one or more containers (divisions of the screen with a width and height). Containers can be specified as rows or columns, and they can contain, in turn, more containers recursively. Finally, each container holds a <>, which references any kind of resource in the dashboard: data visualizations, text, images, etc.

3.1.2 Including the components’ specification

The second iteration included the detailed specification of dashboards’ components, specifically data visualizations [50]. This part is very complex because several primitive elements and combinations are involved when building these tools. To capture these elements, different types of visualizations were analyzed to identify common and abstract features among them. As can be seen in Figure 11, this meta-model excerpt extends the initial version’s <> entity. This version is still an instance of MOF.

Figure 11. Dashboards' components specification (increment #1). Source: own elaboration, published in [50].

Specifically, the features of the meta-model have been identified through domain engineering [81-84, 86], a review of visualization grammars [135, 138], and the outcomes of the previous literature review [48]. These methodologies and resources were complemented with an example-driven approach [139], which helped to identify common fine-grained components of information dashboards and data visualizations.

First, a component in a dashboard display does not necessarily have to be a visualization. Some of the dashboard's containers can hold graphical resources (e.g., images or illustrations) or text, to provide context for the displayed information or instructions about how to employ the tool.
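The layout structure introduced in Section 3.1.1 (a user who uses a dashboard made of pages, which hold recursively nested row or column containers, each of which may hold a component) can be sketched with plain data classes. This is only an illustrative approximation, not the MOF/Ecore definition: the class and attribute names (`User`, `Dashboard`, `Page`, `Container`, `Component`, `orientation`, `kind`) are hypothetical labels chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Component:
    """Any resource shown in a container: visualization, text, image, etc."""
    kind: str
    name: str


@dataclass
class Container:
    """A division of the screen, laid out as a row or a column."""
    orientation: str  # "row" or "column" (assumed encoding)
    width: float
    height: float
    children: List["Container"] = field(default_factory=list)
    component: Optional[Component] = None

    def components(self) -> List[Component]:
        # Depth-first walk collecting the components of nested containers.
        found = [self.component] if self.component else []
        for child in self.children:
            found.extend(child.components())
        return found


@dataclass
class Page:
    containers: List[Container]


@dataclass
class Dashboard:
    pages: List[Page]


@dataclass
class User:
    name: str
    dashboard: Dashboard


chart = Component(kind="visualization", name="employment-rate-bar-chart")
note = Component(kind="text", name="usage-instructions")
root = Container("row", 100, 100, children=[
    Container("column", 70, 100, component=chart),
    Container("column", 30, 100, component=note),
])
user = User("analyst", Dashboard(pages=[Page(containers=[root])]))
names = [c.name for c in root.components()]
```

The recursive `children` attribute reflects the statement that containers can contain more containers, while the optional `component` covers the case in which a container only groups other containers.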
But in the end, the main components of dashboards are the information visualizations that present the domain's data.

On the other hand, components can also be controls or tools that can affect several visualizations at once. For example, filters that select or highlight the data points meeting certain conditions across every visualization in the dashboard are identified as “global control” in the meta-model.

A visualization can be affected by the aforementioned global controls and by "local" controls (i.e., controls that only affect a specific visualization). This distinction allows controlling visualizations at both global and local levels, thus letting users explore data more freely. In this case, a control is understood as any explicit handler that allows modifications of visualizations in any dimension: displayed data, design, visual encoding, etc.

Moreover, a visualization can be decomposed into lower-level elements that are shared among all the potential instances. That is why the meta-model reflects that a visualization is composed of one or more primitives. The <> class is a high-level class that encompasses different elements. These elements can be axes, annotations, marks, and resources (images, text, etc.).

But before detailing the meaning of these low-level components, it is important to clarify that the visualizations' local controls can affect these primitives; as introduced, a control allows the modification of the visualizations, that is, of their primitives, which hold the actual information.

In addition, the primitives of a given visualization can also be modified by the available interaction methods. For example, a visualization that allows zooming will change its primitives when this interaction method is employed.

Once these classes and associations have been clarified, each primitive will be detailed. First, axes are among the most important primitives of a visualization.
Axes contain information about the scales and thus, about some properties that can influence the appearance of a visual mark, as it will be explained. Axes can take different forms, which are encoded as a meta-class attribute (type); for example, an axis can be linear or radial, presenting curvature in its presentation. Axes can be labeled to clarify their role, or the variable being represented. A meta-class <