Model level code smell detection using EGAPSO based on similarity measures

1. Introduction

Software evolution and maintenance involve high costs of the development process, particularly as systems become larger and more complex. A usual concern that makes software maintenance and evolution difficult is the existence of structural design problems. These design problems are otherwise known as code smells. The majority of the existing approaches of code smell detection is in the code level [1], [2], [3], [4], [5].

The quality of the software can be improved if code smells are identified in the model level of the software. Detecting the code smells in the model level is very difficult as all the metrics are not supported in model level [6]. A model maintenance activity includes adding new functionalities, incorporating modifications if needed, detecting poor design fragments and correcting them [4].

In model level maintenance, majority of the existing approaches is rule based [6], [7]. All code smells cannot be detected in rule based approach because finding the right threshold value of the metrics is a tedious task. For example, a rule that detects large classes involves metrics related to the class size. Even though the metrics of large classes are easily measurable, finding the right threshold value is significant. To be precise, a class considered as a large class in a program can be an average class in any other program. The objective of the work is to identify the code smells in the model level. This paper focuses on detecting the code smell in the UML class diagram, which does not rely on the definition of rules. Thus the approach does not require a specification of the metrics to use their related threshold values for code smell detection.

In this work, code smell detection is based on, estimating the similarities between the initial model and the base example. To this end, EGAPSO [8] is used for finding the code smells in the model level. Three open source softwares are used in this work. Two of them are considered as the initial models and the third one is considered as the base example. Metrics are computed from both initial model and base example. Based on the metrics, the similarity between the class of the initial model (CIM) and the class of the base example (CBE) is determined. Finally, using EGAPSO, all the code smells are detected. The EGAPSO is compared with the code smell detection approaches namely genetic algorithm (GA), DEtection and CORrection (DÉCOR), Parallel Evolutionary Algorithm (PEA) and Multi-Objective Genetic Programming (MOGP).

The remainder of this paper is organized as follows: Section 2 describes a literature review of code smell detection. Section 3 presents the relevant background material for the presented work. Section 4 describes the approach for code smell detection using EGAPSO; empirical study definition and design are discussed in Section 5. In Section 6 experimental results and its analysis are discussed; Section 7 presents the threats that could affect the validity of the code smell detection using EGAPSO is discussed. Finally, conclusion and scope for future work are discussed in Section 8.

2. Literature review

Smell detection has been addressed in the literature from different perspectives. The approach for code smell detection can be classified into three broad categories. The first category is search based approach, the second category is non-search based approach and the third category is the model based approach.

2.1. Search based techniques

Seng et al. presented an approach that uses genetic algorithm for suggesting the list of refactoring that do not change the external behaviour of the software [9]. The researchers used a genetic algorithm for finding the set of refactoring that has to be applied in the software. They used a fitness function that depends on an existing set of object oriented metrics which is evaluated on the open source software JHotDraw. Using this approach, more than 90% of refactoring operations were found.

Harman et al. used Pareto optimality to improve the search-based refactoring, which combines metrics and provides numerous sequences of optimal refactoring [10]. Several runs of this search-based refactoring system produced Pareto front, whose values represent Pareto optimal sequences of refactoring. The drawback is, the Pareto optimal search explicitly focused only on suggesting the refactoring, not on detecting the defects in the code.

Jensen et al. proposed an approach REMODEL based on genetic programming and a set of software metrics to make out the most appropriate set of refactoring that introduce design patterns to the software [11]. The researchers used Quality Model for Object Oriented Design (QMOOD) [12] a suite of OO metrics to improve the quality of the software design. The approach REMODEL was validated using the Repository for Model Driven Development (REMODD) a web based software system. The limitation of the approach REMODEL is that the design defects are not detected explicitly as the main focus is on suggesting the suitable refactoring to improve the quality of the software.

Kessentini et al. used genetic programming for an automatic approach to define rules for code smell detection. The rules are the combination of metrics and threshold values [5]. The researchers compared their approach to Harmony Search (HS), Particle Swarm Optimization (PSO) and Simulated Annealing (SA) to find a near optimal set of detection rules. When compared to other approaches, the researchers found a better result of an average of 75% of known code smells.

Ouni et al. had developed an approach for code smell detection and correction [13]. The approach developed by Ouni et al. has two objectives, one is code smell detection and the other is the correction of code smell. The correction of code smells focuses on suggesting the suitable refactoring operations. Detection is based on the technique of Genetic Programming whereas Non-dominated Sorting Genetic Algorithm (NSGA-II) is used for the correction of code smells. This approach had been evaluated on six open source projects, namely Gantt project, Quick UML, Azureus, Log4J, ArgoUML and Xerces. The approach attained 94% precision and 97% recall for Ganttproject, 91% precision and 91% recall for Xerces, 87% precision and 93% recall for ArgoUML, 86% precision and 93% recall for QuickUML, 73% precision and 87% recall for Azureus and 79% precision and 83% recall for Log4J.

Ghannem et al. proposed an approach to detect code smell based on Genetic Algorithm [14]. The approach detects code smells in the design stage of the project with the help of well-known design defects. The approach was evaluated on four open source program, namely GanttProject, ArgoUML, Log4J and Xerces which targets to detect the three design defects namely blob, functional decomposition and data class. This approach attained 95% precision and 75% recall.

Shizhe Fu et al. proposed a novel approach to detect code smells through Evolutionary data mining [29]. They used association rules mined from the change history of software systems along with the heuristics algorithms to detect code smells. Experimentation was performed on 5 open source projects namely Eclipse, Junit, Guava, Closure Compiler and Maven to detect 3 code smells such as duplicate code, shotgun surgery and divergent change. On the basis of their experiments and results they concluded that their approach achieves precision between 50% and 100%, recalls between 60% and 100% and F-measure between 58% and 100%. The researchers have compared their approach with SonarQube which detects more duplicate code but achieves low F-measure in the detection of duplicate code. They have also compared their approach with DECOR, JDeodorant and DCPP to show that their approach outperforms other approaches.

Kessentini et al. (2014) proposed parallel evolutionary algorithm (PEA) for the detection of code smell namely blob, functional decomposition, spaghetti code, feature envy, data class, long parameter list, lazy class and shotgun surgery [33]. In PEA the researchers used genetic programming and genetic algorithm in parallel to generate the code smell detection rules and the detectors from well-designed code examples respectively. The approach has been evaluated on nine open source projects namely JFreechart, Ganttproject, ApacheAnt V1.5.2, ApacheAnt V1.7.0, Nutch, Log4J, Lucene, Xerces-J and Rhino. The PEA used for code smell detection approach is compared with random search (RS) algorithm and two single population based approaches and two code smell detection techniques that are not based on meta-heuristics search. In the nine open source software, the approach PEA attained an average precision and recall of 85% on eight different types of code smell detection.

Mansoor et al. introduced a multi-objective approach to generate rules for the detection of code smell [34]. The researchers used Multi-Objective Genetic Programming (MOGP) to find the best combination of metrics that maximize the detection of code smell examples and minimize the detection of well-designed code examples. They achieved 86% precision and 91% recall.

The major limitation of using these search based techniques is that, most of them can be applied only for refactoring the software system. The results produced by search based techniques are complementary when compared with other code analysis techniques. Some of the search based techniques require generation of rules for identifying the code smells. In this paper, a search based technique EGAPSO is used, which overcomes all the limitations of the existing approaches. It does not require the generation of rules for defect detection; instead it uses defect examples to identify the code smells.

2.2. Non-search based techniques

Marinescu proposed a metric-based approach to detect code smells with detection strategies [4]. The metric based approach defines a set of rules for detecting the defects in the object oriented design at method, class or subsystem level. The strategies capture deviations from good design principles and heuristics. The detection strategies have been evaluated on multiple industrial case studies of size ranging from 700 KLOC to 2000 KLOC. For all the strategies, the accuracy rate is over 50%, and the average accuracy is over 67%. However, there are some limitations in their detection strategies, such as they have no justification for choosing the metrics, thresholds and the combination of the metrics and threshold.

Moha et al. introduced an approach called DÉCOR that includes all the necessary steps for the specification and detection of code and design smells [15]. This approach describes the symptoms of the defects in abstract rule language. These descriptions are then mapped to detection algorithms. To overcome the threshold, this method uses some heuristics to approximate some notions. The approach DÉCOR had detected the code smells namely Blob, Swiss army knife, Functional decomposition, Spaghetti code and their underlying fifteen code smells. The approach is tested on the open source project Xerces-J and obtained an overall precision of 60.5% and recall of 100%. However, they have not compared approach DÉCOR with the other existing approaches related to smell detection.

Budi et al. proposed an approach to detect the defects in layered oriented architecture based on stereotypes [16]. In this layered approach, the classes are automatically separated into three groups boundary, entity and control to detect the design flaws associated with each stereotype. The limitation of this approach is that it can detect only the defects associated with the stereotype and not the real design defects.

Palomba et al. have proposed an approach Historical Information for Smell deTection (HIST) to detect the code smells in the source code [17]. The approach HIST detected five different code smells, namely Shotgun Surgery, Parallel Inheritance, Divergent Change, Feature Envy and Blob by exploiting change history information mined from versioning systems. For example, classes that change frequently are considered as a blob. The researchers evaluated the approach HIST to eight projects such as Apache Ant, Apache Tomcat, jEdit and five Android applications and obtained an overall precision of 71% and overall recall of 81%. Vimala et al. proposed a refactoring framework based on game theory to restructure the PL/SQL code [18]. The similarity measure has been used to compute the payoff matrix and refactoring is based on game theory.

Fontana et al. proposed an approach based on Machine learning techniques[30]. The focus of the approach was based on the detection of code smells namely God class, Data class, Long method, Feature Envy. They considered 76 systems for the analysis and experimented on 6 different machine learning algorithms. Experimentation was performed on J48, Random forest, Naive Bayes, Jrip, SMO, LibSVM classifiers through k-fold cross validation in different configurations to find the best one. They have also produced a learning curve to determine which algorithm learns more quickly. On the basis of the experiment and results, they concluded that J48, Random Forest, Jrip and SMO have accuracy values greater than 90% for all data sets and an average they have best performances. Naive Bayes has slightly lower performances on Data class and Feature Envy than on the other two smells. LibSVM performances are lower than the others. The limitation of this approach is that they have also used default parameters instead of optimized parameters. Better results can be obtained with optimized setting of parameters.

Hamid et al. formed a comparative study on two tools namely JDeodorant and incode to detect two bad smells such as God class and Feature Envy [31]. The incode uses metrics based approach and JDeodorant uses the agglomerative clustering technique to detect God class smell. Feature envy smells detected by incode tool are static methods whereas JDeodorant detects non-static methods. They have also extracted some suggestions on the bases of variations found in the results of both detection tools. They concluded that besides development of new tools current tools also need to be revised. They have differentiated static and non-static methods with the use of two tools and are unable to find the reason why inCode does not detect non-static methods detected by Jdeodorant.

Rasool et al. discussed analysis of code smell mining techniques. They illustrated generic code smell detection process steps to diagnose symptoms in the design [32]. They presented a comparison of seven approaches namely manual approach, symptoms based, metric based, probabilistic approach, search-based approach and cooperative approach based on their key features. They highlighted some of the issues in the code smell detection tools such as the integration of tool with other tools is not considered by most authors. They also pointed out the missing feature of tools in the detection of design and architectural smells. Through the review of 46 selected primary papers, the authors have not found any language-independent code smell detection technique that requires the attention of the researchers. They presented a summary of techniques/tools, and stated that not a single technique/tool is capable of detecting all 22 code smells of Fowler. Experiments reveal the differences in results computed by different tools. The majority reasons in the disparity of results are due to the different threshold values on the number of parameters used by these tools. Though an up-to-date review on the state of the art techniques and tools used for mining code smells are presented, they are limited by covering the area of code smells on detection rather than covering all aspects of code smells.

To conclude, the major drawback of existing non-search based techniques is mainly its difficulty in defining the threshold values for metrics which are used in identifying the defects.

2.3. Model based approaches

Model maintenance is defined as the different modifications made to the model in-order to improve its quality, to add new functionalities to the model, to easily detect the badly designed fragments in the model, to correct the defects in the model and to modify the model. When arbitrary changes are made to a model, it is quite likely that its behaviour consistency is not preserved. To preserve the behaviour, the maintainers apply the refactoring operations. The goal of the refactoring operation was to improve the structure of the model and preserve the behavioural properties. Van Der Straeten et al. explored the relationship between behaviour consistencies of model refinements and behaviour preservation of model refactoring [6]. Moreover, the researchers developed a plug-in which is an extension of the Poseidon plug-in to show the practical use of relation model refinement and refactoring. This plug-in detects automatically, whether model refinements and model refactoring are behaviourally consistent. It has been tested in ATM software developed by Prof. Russell Bjork from Gordon University. The main drawback of this approach is that the experiment has been carried out only on small projects.

Van Kempen et al. described the refactoring for the UML model [19]. The researchers used the class diagram of Software Architecture Analysis Tool (SAAT) as a case study for refactoring. From the class diagram of SAAT, the researchers calculated the metrics. Using the metrics, they identified the design defects and the corresponding refactoring operation is applied. The researchers also used Communicating Sequential Process (CSP) to prove the behaviour equivalence of the SAAT after refactoring. The drawback is that this process becomes difficult when it is operated over more complex designs.

Zhang et al. proposed a rule based approach that describes how the model refactoring is applied [20]. The researchers developed a model transformation engine named as C-SAW, which provides the general refactoring tool for manipulating models, and then it has been integrated with the refactoring browser to enable the automation of refactoring methods. Ivan Porres defined model refactoring as a rule-based model transformation which is similar to Zhang et al. But, the refactoring tool of Porres does not support automated refactoring and domain-specific model refactoring [21].

Most of the model based approaches are based either on the rules or on graph transformation and focus on refactoring operations. However, complete code smell detection was not considered in model level.

The existing search based techniques in general can be applied only to refactoring the software system. Also they have limitation in defining the rules and ensuring the correctness of the rules in identifying the code smells. The existing non-search based technique presents complexity in finding the correct threshold values for metrics which are used to detect the defects. The existing model based approaches do not focus in code smell detection. Hence, to overcome these issues, an EGAPSO based approach is proposed to detect code smell in the model level. The proposed EGAPSO based approach aims in overcoming these limitations of rule based approach using a similarity search based optimization search technique rather than relying on a simple threshold value. Hence the proposed approach yields better results than the other approaches.

3. Background and problem statement

This section provides the necessary background material used for the detection of code smells. The definition of code smells, the metrics used for code smell detection and EGAPSO are discussed below:

3.1. Code smell

Code smells are first defined by Fowler and Beck [22]. They defined code smells as the symptoms of code and design problems [22]. In this paper, code smells detection that occurs in the model level, especially in the class diagram is focused, thereby improving the model quality. The following five code smells were considered and evaluated in this paper.

•
Blob: Blob is found in designs where one large class monopolizes the behaviour of a system. It is a large class that has many fields and methods with low cohesion and almost the class does not have parent class and children. It is a class which implements different responsibilities and has large size.
•
Functional Decomposition (FD): FD occurs when a class is designed with the intent of performing a single function. FD is found in a class, where inheritance and polymorphism are poorly used. This is found in class diagrams produced by non experienced object-oriented developers.
•
Data Class (DC): It is a class that has only data. It does not have any processing on the data. The only methods that are defined by this class are the getters and the setters.
•
Feature Envy: Feature envy is a classic smell that occurs when a method get fields of another method in some other class than the one it is actually implemented in. It is often characterized by a large number of dependencies with the envied class, so it reduces the cohesion and increases the coupling of the class.
•
Spaghetti Code: Spaghetti Code is a classic code smell that occurs when the code does not use appropriate structuring mechanisms. Spaghetti Code prevents the use of object oriented mechanisms, namely polymorphism and inheritance. Classes affected by this smell, are characterized by complex methods, with no parameters and interaction between them using instance variables.

3.2. Software metrics

Software metrics provide some useful information that helps in assessing the quality of the software [23]. It can also be used for detecting the similarities between the software systems [24]. In this work, ten metrics are considered, where, all these metrics are related to the class entity in the class diagram. Among the ten metrics, NA, NPvA, NpbA, NprotA, NM, NPvM, NPbM, NprotM metrics give information about the number of methods and attributes in the class diagram, while NAss and Ngen give the relationship between classes. The metrics that are considered in this work with expressiveness and usefulness are listed in Table 1 [25].

Table 1. Metrics description.

Metrics	Description of the metrics
NA	The total number of attributes per class
NPvA	The total number of private attributes per class
NpbA	The total number of public attributes per class
NprotA	The total number of protected attributes per class
NM	The total number of methods per class
NPvM	The total number of Private methods per class
NPbM	The total number of Public methods per class
NprotM	The total number of Protected methods per class
NAss	The total number of Associations
Ngen	The total number of Generalization relationships

3.3. Euclidean distance based genetic algorithm and particle swarm optimization (EGAPSO)

The EGAPSO [8] is a population based heuristic search optimization algorithm developed by Kim and Park in 2006. The EGAPSO is a hybrid approach of Particle Swarm Optimization (PSO) [26] and Genetic Algorithm (GA) [27] which is based on Euclidean distance. This algorithm was used to tune the proportional integral derivative (PID) controller in a steam temperature control system of the thermal power plant, industrial system of chemical process and also in biomedical process [8]. EGAPSO works as follows: In EGAPSO, each particle is considered as individuals, whereas a group of particles is called as a swarm. Each particle in the swarm has its own position and its velocity. Initially the particles are placed at random positions in the search space, and the velocity of the particle is represented as zero. For each generation the position of the particle and its velocity in EGAPSO can be updated using the formula given in the Eqs. (1), (2).(1) $v_{i} (t + 1) \leftarrow ω v_{i} (t) + c_{1} r_{1} ({pbest}_{i} (t) - x_{i} (t)) + c_{2} r_{2} ({gbest}_{i} (t) - x_{i} (t))$ (2) $x_{i} (t + 1) \leftarrow x_{i} (t) + v_{i}$

In the Eqs. (1), (2), $x_{i} (t)$ and $x_{i} (t + 1)$ represent the position of the particle at time (t) and time (t + 1), respectively. $v_{i} (t)$ is the velocity of the particle at time (t), ${pbest}_{i} (t)$ is the best position of the particle found so far, ${gbest}_{i} (t)$ is the global best position of the particles, $c_{1}$ and $c_{2}$ are the acceleration coefficients that influence the best position of the particles, $r_{1}$ and $r_{2}$ are the random variables and $ω$ represents the inertia weight of the particles.

During the search process, the position of a particle is guided by two factors, namely local best ( $pbest$ ) and global best ( $gbest$ ). The best visited position for the particle by itself is local best ( $pbest$ ) and it arrives at the best position obtained so far by any particle in the neighbourhood is global best ( $gbest$ ). During iterations particle converges to its local point and it moves according to the following iterative mathematical models:(3) ${pbest}_{i} (t + 1) = \{\begin{matrix} {pbest}_{i} (t) & if f ({pbest}_{i} (t)) ⩽ f (x_{i} (t + 1)) \\ x_{i} (t + 1) & if f ({pbest}_{i} (t)) > f (x_{i} (t + 1)) \end{matrix}$ (4) $gbest (t + 1) = \min {f (y), f ({gbest}_{i} (t))}$ where $y \in {{pbest}_{0} (t), {pbest}_{1} (t) \dots \dots \dots \dots . . {pbest}_{s} (t)}$ .

After computing the local best and global best of the particles, Euclidean distance is computed between each particles pbest and the gbest value using Eq. (5).(5) ${Euclideandistance}_{(j, i)} = α . f (p_{j}) - f (p_{i}) / | | p_{j} - p_{i} | |$ (6) $α = | | s | | / f (p_{g}) - f (p_{w})$ (7) $S = \sqrt{\sum_{i = 1}^{p} {(x_{i 1} - x_{i 2})}^{2}}$

In the above Eqs. (5), (6), (7) $f (p_{j})$ represents the pbest value of the $j^{th}$ particle, $f (p_{i})$ represents the pbest value of the $i^{th}$ particle, ‘ $α$ ’ represents the scaling factor which can be computed from the formula given in the Eq. (6). In Eq. (6) $f (p_{g})$ represents the global best of the particle and $f (p_{w})$ represents the global worst of the particle. ‘S’ represents the size of the search space which can be computed using the formula given in Eq. (7). In Eq. (7) p represents the size of the swarm, $x_{i 1}$ and $x_{i 2}$ represent the position of the particle. Based on the value of the Euclidean distance, the particles at a minimum distance with the gbestare chosen for single pair crossover and random mutation operation. The obtained offsprings from the random mutation operation are added to the existing particles for the next generations. The process is repeated until the optimal solution is returned.

4. A search based approach for detecting code smells

In this paper, EGAPSO is used for identifying the code smells present in the given model specifically from class diagrams. In the EGAPSO, the knowledge from the class of the base example (CBE) and software metrics are used to detect the code smell in the class of the initial model (CIM). In the following subsections, the paper describes how the code smell detection is encoded using EGAPSO algorithm.

4.1. Individual representation

An individual is represented as a block. A block is a triplet which consists of three parts. It includes CIM, CBE and the code smell detected from the base example. An example of an individual is represented below in Fig. 1. The set of individuals is combined to form the initial population.

4.2. Fitness function

The fitness function denotes the quality of the individuals. It indicates the similarity between the model under analysis (initial model) and the existing model (base example) to infer the code smell that must be corrected. In this work, the similarity between CIM and CBE is calculated. If the similarity value is high, then the code smell in the base example exists in the initial model. The formula for computing the similarity between CIM and CBE is given in Eqs. (8), (9).(8) $Similarity (CIM, CBE) = \frac{1}{m} \sum_{i = 1}^{m} Sim ({CIM}_{i}, {CBE}_{i})$ (9) $Sim ({CIM}_{i}, {CBE}_{i}) = \{\begin{matrix} 1 & if {CIM}_{i} = {CBE}_{i} \\ 0 & if ({CIM}_{i} = 0 and CBE \neq 0) or ({CIM}_{i} \neq 0 and {CBE}_{i} = 0) \\ {CIM}_{i} / {CBE}_{i} & if {CIM}_{i} < {CBE}_{i} \\ {CBE}_{i} / {CIM}_{i} & if {CBE}_{I} < {CIM}_{i} \end{matrix}$ where $m$ is the number of metrics and $Sim ({CIM}_{i}, {CBE}_{i})$ is given in Eq. (9). ${CIM}_{i}$ denotes the metric value of $i^{th}$ class in the initial model and ${CBE}_{i}$ denotes the metric value of $i^{th}$ class in the base example. Then the $fitnessfunction$ is calculated using Eq. (10).(10) $fitnessfunction = f_{1} + f_{2} / 2$

The value of the $fitnessfunction$ is normalized in the range [0,1]. Using similarity between the classes, the function $f_{1}$ is calculated using Eq. (11) and the function $f_{2}$ ensures the completeness of an individual using Eq. (12).(11) $f_{1} = \frac{1}{n} \sum_{j = 1}^{n} Similarity ({CIM}_{Bj}, {CBE}_{Bj})$ where $n$ is the number of blocks and ${CIM}_{Bj}$ and ${CBE}_{Bj}$ are the classes composing the first two parts of the jth block.(12) $f_{2} = IndividualSize / MaximumIndividualSize$

4.2.1. Illustration of fitness function

To illustrate the computation of fitness function of two individuals, two classes are taken, one from initial model (Abstract Chart Implementation) and the other class from base example (Abstract DOM Parser). First the metric values are calculated from the two classes which are represented in Table 2. $\begin{matrix} f_{1} = 1 / 1 [1 / 10 [(10 / 51 + 6 / 10 + 0 + 1 + 1 + 0 + 7 / 8 + 8 / 35 + 1 + 0)]] \\ f_{2} = 1 \\ finalfitness = 0.74 \end{matrix}$

Table 2. Classes from the initial model, base example and their metric values.

Classes	NA	NPvA	NProtA	NPbA	NM	NPvM	NProtM	NPbM	NAss	Ngen
AbstractChart Implementation	10	10	0	0	45	2	8	35	0	1
AbstractDOM Parser	51	6	45	0	45	0	7	8	0	0

After computing the fitness function, the individuals are sorted based on their fitness values.

4.3. Crossover of the particles

In this work, single point crossover is applied. Two parent individuals for crossover are selected based on the minimum Euclidean distance of the particles. From the selected individuals the CBE block is interchanged to produce two new offsprings. The crossover operation on individuals is shown in Fig. 2.

4.4. Mutation of the particles

The individuals for mutation are selected based on minimum Euclidean distance. From the selected individuals, the CBE and Defects from base example of one individual are replaced with the CBE and Defects from base example of other individual. In this paper, the random mutation operation is applied. Fig. 3shows the effect of the mutation operation that the class Abstract DOM parser (base example) of the design defect Blob is replaced with the class CMNode Factory (base example) of the design defect Functional decomposition.