
Clone Detection with Microsoft Phoenix

This project investigates the use of Microsoft's Phoenix research platform to detect code clones. In this context, clones are sections of code that are duplicated in various parts of a program. Clones usually arise when a programmer copies and pastes code from one part of the program to another. When an error is found in the original copy of the code, or when the original needs to be updated, the same fix or update must be applied to every clone.

Finding code clones therefore becomes a necessity during bug fixes and software updates, and being able to do so automatically offers a significant advantage. A clone detector can also support software improvement by locating duplicated code that could be restructured or refactored. In addition, clone detection in aspect-oriented software development can help identify aspects (a task often called aspect mining). Aspects typically crosscut different modules to realize a specific concern, so the presence of multiple clones may suggest the emergence of an aspect. A specialized clone detector can look for such aspects and identify their locations in the source.

This project uses Microsoft's new Phoenix research platform to host the clone detection process. Although Microsoft has given research universities access to this platform, it is not an entirely research-oriented tool. It is poised to become the heart of all Microsoft development tools in the future.

In Phoenix, plugins can be added during the compilation process to perform a variety of tasks including software analysis. This project is focused on the research and development of a clone detection plugin that can be integrated within Phoenix to search for clones during the compilation of a program.

The process of detecting clones uses abstract syntax trees and suffix trees. Abstract syntax trees are used because they provide a structural representation of the code once lexical analysis and parsing are complete. Cloned code keeps its structure even when variable names change. Suffix trees, in turn, have been applied successfully to find duplicate sections in biological sequences. In this project, the nodes of the abstract syntax tree become the "sequences", and a suffix tree is built over them to find the runs of nodes that are duplicated.
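The idea can be illustrated with a minimal sketch. It is not the project's plugin and does not use the Phoenix API: it assumes Python's standard `ast` module as a stand-in for the compiler's syntax tree, and it swaps the suffix tree for a simpler sorted list of suffixes (a naive suffix array), which finds the same longest repeated run on small inputs. The names `node_sequence`, `longest_repeat`, and `SAMPLE` are hypothetical.

```python
import ast

# Two functions with the same structure but different identifiers (a renamed clone).
SAMPLE = """
def total(xs):
    s = 0
    for x in xs:
        s = s + x
    return s

def summed(items):
    acc = 0
    for item in items:
        acc = acc + item
    return acc
"""

def node_sequence(source):
    """Flatten the AST into a pre-order list of node-type names.

    Identifiers and literal values are dropped, so renamed copies
    of the same code produce identical subsequences.
    """
    seq = []

    def visit(node):
        seq.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return seq

def longest_repeat(seq):
    """Return (length, start_a, start_b) of the longest repeated run in seq.

    Suffixes are sorted lexicographically; the longest repeat is then the
    longest common prefix of some pair of neighbouring suffixes. The two
    occurrences are allowed to overlap in this simplified version.
    """
    order = sorted(range(len(seq)), key=lambda i: seq[i:])
    best = (0, 0, 0)
    for a, b in zip(order, order[1:]):
        n = 0
        while a + n < len(seq) and b + n < len(seq) and seq[a + n] == seq[b + n]:
            n += 1
        if n > best[0]:
            best = (n, min(a, b), max(a, b))
    return best

seq = node_sequence(SAMPLE)
length, first, second = longest_repeat(seq)
```

Here `seq[first:first + length]` and `seq[second:second + length]` are the two matching runs of node types, one per function. Sorting whole suffixes is quadratic or worse in the sequence length, which is why a real detector would use a suffix tree, as the project does, or a linear-time suffix array construction.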

Figure 1: Clone detection process

Figure 2: Clone detection process in Phoenix

Software Composition and Modeling Laboratory


[ Demo in Windows Media | Demo in RealOne Player ]

Note: You may need to install the Camtasia codec to view the Windows Media version.
Download codec


Phoenix-Based Clone Detection Using Suffix Trees
[ Paper | Presentation ]

This project was presented at the 2005 ACM Mid-Southeast Conference.
Awarded "First Place: Master's Division"

This project was presented at the 2006 Graduate School Research Day at UAB.
Awarded "First Place: Session 6 - Mathematics and Computer & Information Sciences"
