Research

Research Areas

I. Software Engineering Areas of Research


Research in this area concentrates on open issues in the context of predicting the overall cost of developing or enhancing software systems based on incomplete, imprecise, uncertain and/or noisy input. The two main approaches investigated are the quantitative and the qualitative approach. The quantitative approach examines the use of various forms of Intelligent Systems, such as Artificial Neural Networks (ANN), Probabilistic Systems, Fuzzy Decision Trees (FDT), Fuzzy Inference Systems (FIS) and Hybrid Systems (ANN combined with Genetic Algorithms (GA); Ridge Regression combined with GA; Conditional Sets combined with GA; Classification and Regression Trees combined with FIS), to model and forecast software development effort. To this end, four different datasets have been utilised, namely the COCOMO, Kemerer, Albrecht and Desharnais datasets. Each dataset includes historical data on a number of software projects (such as lines of code, function points and effort). Relevant research activities have also included the investigation of the ISBSG dataseries, developing models for forecasting development effort and isolating the factors with the highest descriptive power over the predicted variable (effort). The ISBSG dataset, obtained from the International Software Benchmarking Standards Group, contains cost data on software projects belonging to a broad cross-section of industry and coming from various countries; the projects also range widely in size, effort, platform, language and development technique. In addition, the dataset contains a vast record of software projects with measures of a large number of attributes. Other techniques, including Statistical Regression, Conditional Sets, Classification and Regression Decision Trees and Genetic Programming, along with various Clustering and Classification Algorithms, are used to understand and quantify the effect of project attributes and to estimate effort by analogy. The projects that are classified in the same chance nodes and are shown to comply with the same regression equations, association rules and genetically evolved ranges are then used to identify any correlations between effort and cost attributes. Research in this area also utilises Input Sensitivity Analysis (ISA), Attribute Ranking and Feature Subset Selection (FSS) algorithms to extract the optimum subset of features and establish accurate cost estimations for a particular technique. Related research has employed Conformal Predictors (CP) to produce reliable confidence measures and define ranges of effort that yield improved project effort estimations. The qualitative approach identifies the critical cost factors and attempts to model cost estimation using Fuzzy Cognitive Maps (FCM), representing the factors that affect cost as nodes with certain interrelationships. Once such a model is finalised, various scenarios can be simulated to test the validity of the model and its efficiency in providing indications for cost estimation, as well as to provide support in the context of project management and decision support.
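As a rough illustration of the estimation-by-analogy idea mentioned above, the following sketch estimates effort as the average effort of the most similar historical projects. The project features, values and the choice of two nearest neighbours are hypothetical and serve only to show the mechanism.

```python
# Minimal sketch of effort estimation by analogy over a toy set of past
# projects described by size (KLOC) and team experience; all values are
# hypothetical and only illustrate the nearest-neighbour mechanism.
import math

past_projects = [
    # (kloc, experience_level, effort_person_months)
    (10.0, 3, 24.0),
    (25.0, 2, 80.0),
    (40.0, 4, 95.0),
    (60.0, 1, 260.0),
]

def estimate_effort(kloc, experience, k=2):
    """Average the effort of the k most similar historical projects."""
    def distance(project):
        return math.sqrt((project[0] - kloc) ** 2 + (project[1] - experience) ** 2)
    neighbours = sorted(past_projects, key=distance)[:k]
    return sum(p[2] for p in neighbours) / k

print(estimate_effort(kloc=30.0, experience=3))  # analogy-based estimate
```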

Software reliability is one of the most significant quality characteristics according to the ISO 9126 software quality standard. This research aims at investigating the nature and structure of a set of dataseries known as the “Musa datasets” for software reliability. Non-parametric methods, such as R/S analysis and its variations, were employed to test for the presence of long-term dependence in the datasets. So far, results have shown that these software reliability dataseries follow a pink-noise structure with randomness being the dominant characteristic. Artificial Neural Networks (ANNs) have been employed to investigate the forecasting ability attainable on these datasets, and the results were compared against the previous R/S findings, which strongly supported the random structure. Future work will focus on collecting new data from modern software systems (i.e., operating systems, e-mail servers, web browsers) and testing the new datasets with R/S analysis to investigate consistencies in the structure of software reliability. Finally, hybrid systems comprising Artificial Neural Networks (ANNs) and Genetic Algorithms (GA) will be employed for prediction purposes and the new results will be contrasted with those obtained for Musa’s dataseries so as to detect whether, and to what extent, modern approaches to producing software affect their reliability.
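The core of R/S analysis can be illustrated with a short sketch: compute the rescaled range over windows of increasing size and estimate the Hurst exponent from the slope of log(R/S) against log(window size). The input below is synthetic random data, not the Musa datasets, and the window sizes are arbitrary.

```python
# Minimal sketch of rescaled-range (R/S) analysis for detecting long-term
# dependence in a series; the data are synthetic random numbers, not the
# actual Musa reliability datasets.
import numpy as np

def rescaled_range(window):
    """R/S statistic for one window of observations."""
    deviations = window - window.mean()
    cumulative = np.cumsum(deviations)
    r = cumulative.max() - cumulative.min()
    s = window.std(ddof=0)
    return r / s if s > 0 else 0.0

def hurst_exponent(series, window_sizes=(8, 16, 32, 64, 128)):
    """Estimate H from the slope of log(R/S) against log(window size)."""
    log_n, log_rs = [], []
    for n in window_sizes:
        windows = [series[i:i + n] for i in range(0, len(series) - n + 1, n)]
        rs_values = [rescaled_range(w) for w in windows]
        log_n.append(np.log(n))
        log_rs.append(np.log(np.mean(rs_values)))
    slope, _ = np.polyfit(log_n, log_rs, 1)
    return slope  # H close to 0.5 suggests randomness, H > 0.5 persistence

rng = np.random.default_rng(0)
print(hurst_exponent(rng.standard_normal(1024)))
```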

Software components management involves the handling of components in a repository and their extraction based on ideas borrowed from the field of Computational Intelligence. The clustering of components uses techniques to group components in heterogeneous sets aiming for both the efficient storing and retrieval of the most suitable components from a repository. One such technique employs GAs, where a binary representation of components in a repository is used to compute an optimal component categorisation based on initially randomised classifiers along with a threshold of bit similarity which guides the assignment of a component to a class. Users are then able to select a component by providing their preference, which is matched against the optimal classifiers produced by the GA; the components assigned to the best-matching class are subsequently retrieved and displayed for the user to select from. Another technique uses a hybrid combination of entropy and fuzzy k-modes clustering. With this approach, components in a repository are preliminarily grouped based on an entropy-based clustering algorithm as a pre-processing step in order to discover which components are the most representative of the repository as well as how many clusters are inherent in the repository. Thereafter, a fuzzy k-modes algorithm is employed to perform the actual clustering of the components using the outputs of entropy-based clustering. With the addition of fuzziness, the k-modes algorithm allows components to belong to several clusters with a degree of participation. As a result, it deals with the uncertainty residing in a component repository and also provides flexibility as the repository grows in size. Users give their preference to be matched against the final cluster centres (or representatives), which are computed through the hybrid algorithm. The nearest cluster is selected and the most suitable components assigned to that nearest cluster are presented for the user to choose from. Both approaches were thoroughly tested and validated, and the results show that the utilisation of Computational Intelligence methods greatly assists in the classification and retrieval of software components and in component management as a whole.
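The retrieval step described above can be sketched as follows: a user preference encoded as a binary feature vector is compared against cluster representatives, and a fuzzy degree of membership is computed for each cluster. The feature encoding, cluster modes and component labels are hypothetical, not the actual repository model.

```python
# Minimal sketch of matching a binary component preference against cluster
# representatives with a fuzzy degree of participation per cluster; the bit
# patterns and cluster labels are hypothetical.
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def fuzzy_memberships(vector, modes, m=2.0):
    """Degrees of participation of a vector in each cluster (sum to 1)."""
    distances = [hamming(vector, mode) + 1e-9 for mode in modes]
    weights = [d ** (-1.0 / (m - 1.0)) for d in distances]
    total = sum(weights)
    return [w / total for w in weights]

# Cluster representatives as would be produced by entropy-based pre-clustering
# followed by fuzzy k-modes (hypothetical bit patterns over 8 features).
cluster_modes = [
    [1, 1, 0, 0, 1, 0, 0, 0],   # e.g., GUI components
    [0, 0, 1, 1, 0, 1, 0, 0],   # e.g., persistence components
    [0, 0, 0, 0, 0, 1, 1, 1],   # e.g., networking components
]

user_preference = [1, 1, 0, 0, 0, 0, 0, 1]
memberships = fuzzy_memberships(user_preference, cluster_modes)
best_cluster = max(range(len(cluster_modes)), key=lambda i: memberships[i])
print(memberships, best_cluster)   # the nearest cluster's components would be shown
```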

In this research area, two different approaches have been followed. The first involves a new elicitation methodology for enhancing the traditional requirements engineering process. The methodology is based on Human, Social and Organizational (HSO) factors that exist in the business environment of the client organisation and affect the functional and non-functional part of a software system under development, as well as its future users. Such factors include, for example, the working procedures of potential users, the working habits and customs of users, their workload, the communication and cooperation between users within their working environment, their psychology and temperament, the organisation and visibility of their everyday working activities, the level to which the product will promote employees’ productivity and contentment, and any legal and ethical issues posed. The methodology proposes a set of activities for uncovering HSO factors, assessing their effect on known system requirements and recording new requirements resulting from these factors. The second approach constitutes a new and complete requirements engineering process which is based on Natural Language Syntax and Semantics (NLSS). Although at a preliminary stage, this newly proposed process focuses on the way requirements are elicited, analysed and recorded. Basic elements of the syntax and semantics of a sentence (e.g., verbs, nouns, adjectives, roles, etc.) guide elicitation activities so as to ask specific, predetermined questions and gather the relevant functional and constraint information. This information is then written in dedicated syntactic forms of requirements classes. The resulting requirements are thus more complete, while they are written in a semi-formal natural language style with less ambiguity and vagueness, reducing the time-consuming effort of engineering requirements from large documents. Once requirements are expressed as semi-formal statements of the proposed type, a dedicated CASE tool, specially developed to support the NLSS process, reads the statements and automatically produces semi-formal diagrammatic notations, such as Data Flow Diagrams and Class Diagrams.
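As a rough illustration of the kind of processing such a CASE tool could perform, the sketch below parses a requirement written in a fixed semi-formal template into its syntactic roles. The template and field names are hypothetical and are not the actual NLSS syntactic forms.

```python
# Hypothetical sketch: parse a semi-formal requirement of the form
# "The <actor> shall <action> <object> [when <condition>]." into its roles.
# The template is illustrative only, not the actual NLSS requirement classes.
import re

PATTERN = re.compile(
    r"^The (?P<actor>.+?) shall (?P<action>\w+) (?P<object>.+?)"
    r"(?: when (?P<condition>.+?))?\.$"
)

def parse_requirement(statement):
    match = PATTERN.match(statement)
    return match.groupdict() if match else None

print(parse_requirement(
    "The librarian shall register new members when a valid ID is provided."
))
```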

In the current age of information overload, it is becoming increasingly difficult to find relevant content. This problem is not only widespread but also alarming. Over the last 10-15 years, recommender systems technologies have been introduced to help people deal with this vast amount of information, and they have been widely used in research as well as in e-commerce applications. The main aim of a Recommender System is to provide accurate recommendations to the user for items he/she may not have thought about or may have found difficult to locate. Recommender Systems are divided into two main categories: Content-Based (CB) systems and Collaborative Filtering (CF) systems. Content-based systems provide recommendations to the active user based on text similarities that exist between different articles. This method is limited to media files to which text or tags can be assigned. Collaborative Filtering systems are divided into memory-based and model-based algorithms. Memory-based algorithms provide the active user with recommendations based on similarities with his/her neighbors (users with similar characteristics) or between items he/she has previously bought or viewed. Model-based algorithms, such as matrix factorization or latent factor models, try to build a profile from the user-item matrix and then predict and provide accurate recommendations back to the user. Recommender systems face many challenges and limitations (sparsity, the cold-start problem, etc.), so, to deal with these problems, hybrid techniques that combine CB and CF techniques have been introduced. Other algorithms and techniques have also been developed to address the limitations of Recommender Systems and provide accurate recommendations back to the user.
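A minimal sketch of the memory-based approach described above: the rating of the active user for an unseen item is predicted as a similarity-weighted average of the ratings of other users. The rating matrix is a small hypothetical example, with 0 denoting an unrated item.

```python
# Minimal sketch of user-based collaborative filtering with cosine similarity
# over co-rated items; the rating matrix is hypothetical (0 = unrated).
import numpy as np

ratings = np.array([
    # items ->  A    B    C    D
    [5.0, 3.0, 0.0, 1.0],   # user 0 (active user)
    [4.0, 0.0, 0.0, 1.0],   # user 1
    [1.0, 1.0, 0.0, 5.0],   # user 2
    [1.0, 0.0, 5.0, 4.0],   # user 3
])

def cosine(u, v):
    mask = (u > 0) & (v > 0)              # compare only co-rated items
    if not mask.any():
        return 0.0
    return float(u[mask] @ v[mask] /
                 (np.linalg.norm(u[mask]) * np.linalg.norm(v[mask])))

def predict(active, item):
    """Similarity-weighted average of the neighbours' ratings for one item."""
    num, den = 0.0, 0.0
    for other in range(ratings.shape[0]):
        if other == active or ratings[other, item] == 0:
            continue
        sim = cosine(ratings[active], ratings[other])
        num += sim * ratings[other, item]
        den += abs(sim)
    return num / den if den else 0.0

print(predict(active=0, item=2))   # predicted rating of user 0 for item C
```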

The fact that Cloud Computing is steadily becoming one of the most significant fields of Information and Communication Technology (ICT) has led many organizations to consider the benefits of migrating their business operations to the Cloud. Decision makers face significant challenges when assessing the feasibility of adopting Cloud Computing for their organizations. Cloud adoption is a multi-level decision influenced by a number of intertwined factors and concerns, making it a complex and difficult-to-model real-world problem. The extremely fast-moving nature of the cloud computing environment makes the decision-making process particularly difficult. Any model or framework that aims to support cloud computing adoption should therefore be flexible and dynamically adaptable. Guided by these assumptions/prerequisites, we approached the problem using adapted computational intelligence techniques, which have shown promising results and a strong ability to capture the dynamics of complex environments. A brief description of our related work in this area follows. The decision to adopt the cloud environment was first addressed using an approach based on Fuzzy Cognitive Maps (FCM), which models the parameters that potentially influence such a decision. The construction and analysis of the map is based on factors reported in the relevant literature and the utilization of experts’ opinion. The proposed approach is evaluated through four real-world experimental cases and the suggestions of the model are compared with the customers’ final decisions. The evaluation indicated that the proposed approach is capable of capturing the dynamics behind the interdependencies of the participating factors. Further research on the same topic suggested the use of Influence Diagrams (ID) as modeling tools to support the decision process. The developed ID model combines a number of factors identified through a literature review and input received from field experts. The proposed approach is validated against four experimental cases, two realistic and two real-world, and its performance proved highly capable of estimating and predicting the right decision. Continuing in the same direction, we proposed two decision support modeling approaches based on ID, aiming to model the answer to the question “Adopt Cloud Services or Not?” Two models are developed and tested; the first is a generic ID with nodes interacting in a probabilistic manner, while the second is a more flexible version that utilizes Fuzzy Logic. Both models combine several factors that influence the decision to be taken, identified through a literature review and input received from field experts. The proposed approaches are validated using five experimental scenarios, two synthetic and three real-world cases, and their performance suggests that they are highly capable of supporting the right decision. Our most recent work proposes a multi-layer FCM approach which models a number of factors that play a decisive role in the cloud adoption issue and offers the means to study their influence. The factors are organized in different layers, each focusing on specific aspects of the cloud environment, which, on one hand, enables tracking the causes of the decision outcome and, on the other, offers the ability to study the dependencies between the leading determinants of the decision. The construction and analysis of the model is based on factors reported in the relevant literature and the utilization of experts’ opinion. The efficacy and applicability of the proposed approach are demonstrated through four real-world experimental cases.
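To make the FCM-based modelling concrete, the sketch below iterates a tiny map whose concepts influence an “Adopt cloud” decision node until the activations settle. The concepts, weights and initial scenario are hypothetical and much smaller than the validated models described above.

```python
# Minimal sketch of a Fuzzy Cognitive Map simulation for a cloud-adoption
# scenario; concepts, weights and initial activations are hypothetical.
import numpy as np

concepts = ["Cost savings", "Security concerns", "Vendor lock-in", "Adopt cloud"]

# weights[i, j] = causal influence of concept i on concept j, in [-1, 1]
weights = np.array([
    [0.0, 0.0, 0.0,  0.7],
    [0.0, 0.0, 0.0, -0.6],
    [0.0, 0.3, 0.0, -0.4],
    [0.0, 0.0, 0.0,  0.0],
])

def sigmoid(x, lam=1.0):
    return 1.0 / (1.0 + np.exp(-lam * x))

def run_fcm(initial, steps=20):
    """Iterate A(t+1) = f(A(t) + W^T A(t)) for a fixed number of steps."""
    a = np.array(initial, dtype=float)
    for _ in range(steps):
        a = sigmoid(a + weights.T @ a)
    return a

# Scenario: strong cost savings, moderate security concerns, low lock-in risk.
final = run_fcm([0.8, 0.5, 0.2, 0.0])
print(dict(zip(concepts, final.round(2))))
```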

Research carried out in the area of software project management focuses on two main topics. The first topic involves solving the problem of human resource allocation and task scheduling in software development projects using computational intelligence. Traditional approaches for automated decision support or tools for project scheduling and staffing found in the literature often make heavy assumptions so as to lower the complexity of the whole process. Examples of such assumptions may be that all software developers have the same skills and/or level of experience, that developers are expected to deliver with the same productivity regardless of the working environment and their teammates, that members of a team either possess a skill or not (with no intermediate case where a skill may be possessed at different levels by different employees), that tasks are worked on in the same way, and so on. The work carried out in this topic includes the use of evolutionary algorithms in order to minimize project duration and cost based on developer productivity, type of task interdependence and communication overhead. The second topic concerns project staffing using personality types, aiming to improve various features of software development, such as software quality, job satisfaction, team performance, and social cohesion and conflict. Attributes of personality in the development process are often partially or totally neglected, leading to significantly inaccurate estimates in terms of time, cost and quality. Studies focus on investigating personality issues of software professionals to help assess the effectiveness of a team as a whole, in order to form teams that will not have communication and cooperation problems because of personality type mismatches. Additionally, personality types are used for associating developers with tasks, so as to ensure that the right type of personality is selected to undertake a task. Teams formed in these ways will be able to execute tasks and activities with the maximum possible productivity, at the same time shortening development schedules and lowering effort.
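The evolutionary scheduling idea can be sketched as a simple genetic algorithm that searches for an assignment of developers to tasks minimising project duration. The task efforts and developer productivities are hypothetical, and task interdependencies and communication overhead are deliberately left out.

```python
# Minimal sketch of a genetic algorithm assigning developers to tasks to
# minimise makespan; efforts and productivities are hypothetical, and task
# dependencies / communication overhead are omitted for brevity.
import random

task_effort = [8, 5, 13, 3, 21, 8]        # person-days of work per task
productivity = [1.0, 0.8, 1.3]            # relative productivity per developer

def duration(assignment):
    """Makespan when each developer works on their tasks sequentially."""
    load = [0.0] * len(productivity)
    for task, dev in enumerate(assignment):
        load[dev] += task_effort[task] / productivity[dev]
    return max(load)

def evolve(pop_size=30, generations=100, mutation_rate=0.1):
    pop = [[random.randrange(len(productivity)) for _ in task_effort]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=duration)
        survivors = pop[: pop_size // 2]              # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, len(task_effort))
            child = a[:cut] + b[cut:]                 # one-point crossover
            if random.random() < mutation_rate:       # random reassignment
                child[random.randrange(len(child))] = random.randrange(len(productivity))
            children.append(child)
        pop = survivors + children
    return min(pop, key=duration)

best = evolve()
print(best, round(duration(best), 1))
```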

Research here addresses several open issues in the context of software testing and in particular deals with static and dynamic program analysis. Problems encountered in this area involve the creation and presentation of control flow graphs, code initialisation sequences, the support of large and complex programs in a variety of programming languages, the extraction of program paths, and the identification of variable scopes and methods. Another equally important issue raised by programmers is the presentation of program analysis results in a friendly graphical user interface with interactive capabilities. So far, a novel multilayered architecture has been proposed that attempts to offer specific solutions to the aforementioned problems by providing a set of embedded cooperating software modules for program analysis and by focusing on practical, static and dynamic analysis tools for programmers. Two types of analysis are supported: runtime and non-runtime; each type comprises modules that collaborate and provide an interactive user-programmer GUI displaying the results in a relatively short execution time, the latter being proportional to the size of the program (in lines of code). Based on the proposed architecture, the issue of automatically producing software test cases using intelligent optimisation algorithms, such as Genetic Algorithms (GAs), has been investigated. More specifically, a new evolutionary algorithm has been proposed, which is able to automatically produce a set of test data for a given program according to a specified criterion (e.g., statement, edge, condition/edge). The performance of the algorithm, measured over a pool of sample programs and benchmarks, may be characterised as highly successful. In addition, program slicing is currently being studied, as well as the automatic creation of source code dependence graphs and the integration of these into the main architecture. Finally, techniques of program slicing and testing using symbolic execution and model-based checking (e.g., using JML) are also under examination.
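The flavour of search-based test data generation can be shown with a small sketch that evolves inputs towards covering a hard-to-reach branch of a toy function, using the branch distance as fitness. The function under test and the search parameters are hypothetical, and the search is far simpler than the evolutionary algorithm described above.

```python
# Minimal sketch of search-based test data generation: evolve inputs that
# cover the target branch of a toy function, guided by branch distance.
import random

def under_test(x, y):
    if x * x - y == 10:          # target branch: unlikely to be hit at random
        return "target"
    return "other"

def branch_distance(x, y):
    """0 when the target condition holds; larger the further away we are."""
    return abs(x * x - y - 10)

def search(pop_size=40, generations=200):
    pop = [(random.randint(-100, 100), random.randint(-100, 100))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: branch_distance(*ind))
        if branch_distance(*pop[0]) == 0:
            return pop[0]                    # covering test case found
        parents = pop[: pop_size // 2]       # keep the fittest half
        pop = parents + [
            (random.choice(parents)[0] + random.randint(-3, 3),
             random.choice(parents)[1] + random.randint(-3, 3))
            for _ in range(pop_size - len(parents))
        ]
    return None

test_case = search()
print(test_case, under_test(*test_case) if test_case else "not found")
```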

Mobile Commerce (M-Commerce) is an evolving area of e-Commerce, where users interact with service providers through a mobile and wireless network, using mobile devices for information retrieval and transaction processing. M-Commerce services and applications can be accessed through different wireless and mobile networks, with the aid of a variety of mobile devices. However, constraints inherent in both mobile networks and devices influence their operational performance; therefore, there is a strong need to take these constraints into consideration in the design and development phases of m-Commerce services and applications in order to improve their quality. Another important factor in designing quality m-Commerce services and applications is the identification of mobile users’ requirements. Furthermore, m-Commerce services and applications need to be classified based on the functionality they provide to mobile users. This kind of classification results in two major classes: directory and transaction-oriented services and applications. This research builds upon and extends the lab’s previous work on designing and developing m-Commerce services and applications. The approach takes account of mobile users’ needs and requirements, the classification of m-Commerce services and applications, as well as the current technologies for mobile and wireless computing and their constraints. In this context, the different characteristics and capabilities of modern mobile devices (e.g., smart phones, PDAs, etc.) have been studied to determine the extent to which such devices affect the quality that mobile software must provide for the needs of the modern mobile user.

The term Smart Data is used to emphasize the latent value inherent in widely dispersed and unconnected data sources. The decisive criterion here is not necessarily the amount of data available, but smart content techniques that promote not only the collection and accumulation of related data, but also its context and understanding. This requires Systems of Deep Insight that discover (hidden) associations between the data, prioritize results, find useful insights, discover large-scale patterns and trends within the data to reveal a wider picture that is more relevant to the problem at hand, and react to them.

The World Wide Web has become a major delivery platform for a variety of complex and sophisticated enterprise applications in several domains. In addition to their inherent multifaceted functionality, these Web applications exhibit complex behaviour and place some unique demands on their usability, performance, security, and ability to grow and evolve. However, the vast majority of these applications continue to be developed in an ad hoc way, contributing to problems of usability, maintainability, quality and reliability. While Web development can benefit from established practices in other related disciplines, it has certain distinguishing characteristics that demand special considerations. In recent years, there have been developments towards addressing these considerations. Web engineering focuses on the methodologies, techniques and tools that are the foundation of Web application development and which support their design, development, evolution and evaluation. Web application development has certain characteristics that make it different from traditional software, information system, or computer application development.

Targeting faster application delivery and born out of agile methodologies, the DevOps approach has recently started to be applied to software development and operation processes. Beyond various process tasks such as testing, deployment and reconfiguration, the DevOps approach fosters a new culture between development and operations teams, aiming to achieve closer collaboration and communication without silos and barriers between these teams, and to emphasize automation and sharing. The term DevOps also describes the main stakeholders involved in producing and supporting a distributed software service or application, who possess a mixture of coding and system operation skills. The study of the DevOps phenomenon and its various aspects is closely related to cloud computing. Although cloud computing and DevOps are two independent computing/software paradigms or strategies, where neither is a prerequisite for the other, they are often used together in the modern software development process, which advocates delivering everything as a service. Our research focuses on filling the gaps in the DevOps area that arise from the literature review and mainly relies on applied research. Among others, this research includes introducing methods and techniques and implementing tools and processes to support DevOps.

Service-Oriented Architecture (SOA), as the most recent emerging distributed development architecture, represents an appropriate architectural model for composite services, enabling, even dynamically, combinations of a variety of independently developed services to form distributed software systems. Microservices share a similar definition with SOAP and RESTful services, which highlights the relationship between microservices and Service-Oriented Architecture (SOA). Although microservices can be seen as an evolution of SOA, they are inherently different regarding sharing and reuse: SOA is built on the concept of fostering reuse, a share-as-much-as-possible architectural style, whereas the microservices architecture is built on the concept of a share-as-little-as-possible architectural style. Our research in this area mainly focuses on service composition, in terms of introducing frameworks and tools to support the decision-making needed to enable this architecture, taking into account all related aspects such as QoS, performance, cost, etc.
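Since the research focuses on QoS-aware decision support for service composition, a tiny sketch of one common building block is shown below: scoring candidate services for a composition step by a weighted sum of normalised quality attributes. The candidate services, attribute values and weights are hypothetical.

```python
# Minimal sketch of QoS-aware service selection for one composition step:
# candidates are ranked by a weighted sum of normalised quality attributes.
# Candidate names, QoS values and weights are hypothetical.
candidates = [
    # (name, response time in ms, availability, cost per 1k calls)
    ("payments-a", 120, 0.999, 0.40),
    ("payments-b",  80, 0.990, 0.55),
    ("payments-c", 200, 0.995, 0.20),
]
weights = {"response": 0.4, "availability": 0.4, "cost": 0.2}

def normalise(values, lower_is_better):
    """Scale attribute values to [0, 1], where 1 is always the best."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return [(hi - v) / span if lower_is_better else (v - lo) / span for v in values]

rt = normalise([c[1] for c in candidates], lower_is_better=True)
av = normalise([c[2] for c in candidates], lower_is_better=False)
co = normalise([c[3] for c in candidates], lower_is_better=True)

scores = [weights["response"] * rt[i] + weights["availability"] * av[i] + weights["cost"] * co[i]
          for i in range(len(candidates))]
best = max(range(len(candidates)), key=lambda i: scores[i])
print(candidates[best][0], round(scores[best], 3))
```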



II. Intelligent Information Systems Areas of Research


A new framework for developing IIS has been proposed, which is based on Fuzzy Logic, Neuro-fuzzy computing and Genetic Algorithms (GAs). More specifically, we have introduced and used a modified form of Fuzzy Cognitive Maps (FCM), enhanced by a specially designed Fuzzy Knowledge Base and Genetic Algorithms, to produce a hybrid form of IIS. Such a hybrid system is able to handle the problems of traditional FCM (e.g., the limit-cycle phenomenon) and offers the ability to perform multi-objective scenario analysis and optimisation. The proposed IIS has been successfully employed for crisis modelling and decision support in various real-world problems (such as the settlement of the Cyprus issue, the S-300 missiles and Imia crises), yielding very promising results. This part of the research also examines the extension of FCMs, proposing a multi-layered structure comprising smaller FCMs that operate using inheritance characteristics. In particular, the multi-layered FCM targets problems that involve factors with high levels of complexity. These factors may be decomposed into sub-factors describing the behaviour of their originator. This decomposition may continue over several layers, resulting in a hierarchy of elementary and more easily manageable factors. The modelling of these elementary pieces of information is based on the multi-layered structure of FCMs, where each composite node-factor is essentially a child FCM at a lower level. In addition, multi-objective optimisation is addressed in order to provide the means for introducing hypothetical scenarios and enable simulation at various levels and concepts of interest.
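Building on the FCM simulation sketch shown earlier, the fragment below illustrates only the layering idea: the steady-state activation of a child map feeds the activation of a composite node in the parent map. All concepts and weights are hypothetical.

```python
# Minimal sketch of the multi-layered FCM idea: a composite factor in the
# parent map takes its activation from the output of a child map; all
# concepts and weight values are hypothetical.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def run_fcm(weights, activations, steps=25):
    """Iterate A(t+1) = f(A(t) + W^T A(t)) for a fixed number of steps."""
    a = np.array(activations, dtype=float)
    for _ in range(steps):
        a = sigmoid(a + weights.T @ a)
    return a

# Child map decomposing a composite factor into two sub-factors;
# node 1 is the child's output node.
child_weights = np.array([[0.0, 0.6],
                          [0.0, 0.0]])
composite_activation = run_fcm(child_weights, [0.7, 0.0])[1]

# Parent map: node 0 is the composite factor, node 1 a decision concept.
parent_weights = np.array([[0.0, 0.8],
                           [0.0, 0.0]])
print(run_fcm(parent_weights, [composite_activation, 0.0]))
```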

Swarm intelligence is the discipline that deals with natural and artificial systems composed of many individuals that coordinate using decentralized control and self-organization. In particular, the discipline focuses on the collective behaviors that result from the local interactions of the individuals with each other and with their environment. Examples of systems studied by swarm intelligence are colonies of ants and termites, schools of fish, flocks of birds, herds of land animals. Some human artifacts also fall into the domain of swarm intelligence, notably some multi-robot systems, and also certain computer programs that are written to tackle optimization and data analysis problems.
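A classic example of an artificial swarm-intelligence system is particle swarm optimisation, sketched below on a simple test function. The coefficient values are conventional textbook choices, not tuned parameters from any particular study.

```python
# Minimal sketch of particle swarm optimisation minimising the sphere
# function; swarm size, step count and coefficients are illustrative.
import random

def f(x, y):
    return x ** 2 + y ** 2            # sphere function, minimum at (0, 0)

n_particles, steps = 20, 100
pos = [[random.uniform(-5, 5), random.uniform(-5, 5)] for _ in range(n_particles)]
vel = [[0.0, 0.0] for _ in range(n_particles)]
pbest = [p[:] for p in pos]                        # personal best positions
gbest = min(pbest, key=lambda p: f(*p))            # global best position

for _ in range(steps):
    for i in range(n_particles):
        for d in range(2):
            r1, r2 = random.random(), random.random()
            vel[i][d] = (0.7 * vel[i][d]                         # inertia
                         + 1.5 * r1 * (pbest[i][d] - pos[i][d])  # cognitive pull
                         + 1.5 * r2 * (gbest[d] - pos[i][d]))    # social pull
            pos[i][d] += vel[i][d]
        if f(*pos[i]) < f(*pbest[i]):
            pbest[i] = pos[i][:]
            if f(*pos[i]) < f(*gbest):
                gbest = pos[i][:]

print(gbest, f(*gbest))
```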

A neural network is a series of algorithms that endeavors to recognize underlying relationships in a set of data through a process that mimics the way the human brain operates. In this sense, neural networks refer to systems of neurons, either organic or artificial in nature. Neural networks can adapt to changing input, so the network generates the best possible result without the output criteria needing to be redesigned. The concept of neural networks, which has its roots in artificial intelligence, is swiftly gaining popularity in the development of trading systems.
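The smallest possible example of such a system is a single artificial neuron learning the logical OR function with the classic perceptron update rule. This illustrates the learning mechanism only and is unrelated to the trading-systems application mentioned above.

```python
# Minimal sketch of a single perceptron learning logical OR; weights are
# adjusted whenever the prediction disagrees with the target output.
samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b, lr = [0.0, 0.0], 0.0, 0.1        # weights, bias, learning rate

def predict(x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

for _ in range(20):                     # a few passes over the data suffice
    for x, target in samples:
        error = target - predict(x)
        w[0] += lr * error * x[0]
        w[1] += lr * error * x[1]
        b += lr * error

print([predict(x) for x, _ in samples])  # expected: [0, 1, 1, 1]
```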

In fuzzy mathematics, fuzzy logic is a form of many-valued logic in which the truth values of variables may be any real number between 0 and 1, both inclusive. It is employed to handle the concept of partial truth, where the truth value may range between completely true and completely false. By contrast, in Boolean logic, the truth values of variables may only be the integer values 0 or 1. Fuzzy logic is based on the observation that people make decisions based on imprecise and non-numerical information. Fuzzy models or sets are mathematical means of representing vagueness and imprecise information (hence the term fuzzy). These models have the capability of recognising, representing, manipulating, interpreting, and utilising data and information that are vague and lack certainty. Fuzzy logic has been applied to many fields, from control theory to artificial intelligence.
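A small, self-contained example of the idea: triangular membership functions measure the degree to which a temperature is “cold” or “hot”, and two rules are combined by a weighted average to choose a fan speed. The membership ranges and rule outputs are hypothetical.

```python
# Minimal sketch of fuzzy reasoning: membership degrees for "cold" and "hot"
# drive two rules whose outputs are blended by a weighted average.
def triangular(x, a, b, c):
    """Membership rising from a to b and falling from b to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fan_speed(temperature):
    cold = triangular(temperature, 0, 10, 20)     # degree of "cold"
    hot = triangular(temperature, 18, 30, 40)     # degree of "hot"
    # Rule 1: if cold then fan ~ 10%; Rule 2: if hot then fan ~ 90%.
    if cold + hot == 0:
        return 50.0                               # no rule fires: default speed
    return (cold * 10.0 + hot * 90.0) / (cold + hot)

print(fan_speed(19))   # partly cold and partly hot -> intermediate fan speed
```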

In computer science, evolutionary computation is a family of algorithms for global optimization inspired by biological evolution, and the subfield of artificial intelligence and soft computing studying these algorithms. In technical terms, they are a family of population-based trial and error problem solvers with a metaheuristic or stochastic optimization character. In evolutionary computation, an initial set of candidate solutions is generated and iteratively updated. Each new generation is produced by stochastically removing less desired solutions, and introducing small random changes. In biological terminology, a population of solutions is subjected to natural selection (or artificial selection) and mutation. As a result, the population will gradually evolve to increase in fitness, in this case the chosen fitness function of the algorithm. Evolutionary computation techniques can produce highly optimized solutions in a wide range of problem settings, making them popular in computer science. Many variants and extensions exist, suited to more specific families of problems and data structures. Evolutionary computation is also sometimes used in evolutionary biology as an in silico experimental procedure to study common aspects of general evolutionary processes.
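The generate-evaluate-select loop described above can be written in a few lines; the sketch below minimises a simple one-dimensional fitness function with truncation selection and Gaussian mutation, purely as an illustration of the generic scheme.

```python
# Minimal sketch of the generic evolutionary loop: keep the fitter half of
# the population and produce offspring by small random (Gaussian) changes.
import random

def fitness(x):
    return (x - 3.0) ** 2            # lower is better, optimum at x = 3

population = [random.uniform(-10, 10) for _ in range(20)]
for generation in range(100):
    population.sort(key=fitness)
    parents = population[:10]                                  # selection
    offspring = [p + random.gauss(0, 0.3) for p in parents]    # mutation
    population = parents + offspring

print(round(min(population, key=fitness), 3))   # should approach 3.0
```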