Numerous techniques involve mining change data captured in software archives to assist engineering efforts.
We observed that important changes to software artifacts are sometimes accompanied by numerous non-essential modifications,
such as local variable refactorings or textual differences induced as part of a rename refactoring.
We developed a tool-supported technique (called DiffCat) for detecting non-essential differences in the revision histories of software
systems, and used our technique to investigate code changes in over 24 000 change sets gathered from the change histories of seven
long-lived open-source systems. The details of our technique, as well as the observations supported by our investigation, were
accepted for publication in the 33rd ACM/IEEE International Conference on Software Engineering.
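To make the idea concrete, here is a small hypothetical illustration (the class and method names are invented for this page and do not come from the systems we studied). The two methods stand for two revisions of the same method. The second revision only renames a local variable, so a textual diff reports several changed lines even though the behavior is identical. This is the kind of difference we call non-essential.

```java
// Hypothetical illustration: two revisions of one method that differ only
// in the name of a local variable (a non-essential difference).
class RenameExample {

    // Revision 1: the accumulator is called "tmp".
    static int sumBefore(int[] values) {
        int tmp = 0;
        for (int v : values) {
            tmp += v;
        }
        return tmp;
    }

    // Revision 2: the accumulator was renamed to "total". Three lines differ
    // textually, yet the method computes exactly the same result.
    static int sumAfter(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        // Both revisions agree on every input; only the text changed.
        System.out.println(sumBefore(data) == sumAfter(data));
    }
}
```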
This website allows readers to download DiffCat - our prototype Eclipse implementation that we used to scan change
histories (CVS/SVN), to detect fine-grained structural differences in change sets from those change histories, and to identify which
of those structural differences were non-essential. This website also allows readers to download our full experimental data package.
Our current implementation of DiffCat is a proof-of-concept prototype rather than a fully reusable API. We therefore release DiffCat
as a suite of (open-source) Eclipse plugin projects. These projects can be downloaded
here and consist of four plugin projects:
- ca.mcgill.cs.swevo.diffcat
- ca.mcgill.cs.swevo.util
- ca.mcgill.cs.swevo.diffcat.view
- DiffCompare
The first two projects make up DiffCat's diffing component. The last two projects make up DiffCat's experimental (and optional) Eclipse
viewer. You can import either just the first two projects or all four. The code is distributed under the
Eclipse Public License (EPL), version 1.0, except for the rebundled
org.eclipse.compare
project (inside DiffCompare), which is redistributed as-is under the license stated in its source files.
DiffCat is implemented as an extension to the
SemDiff repository analysis
framework. SemDiff facilitates the retrieval of repository information and viewing the results of a diffing analysis. To use
DiffCat, you'll need to run it via SemDiff (see below).
If you'd like to use DiffCat
programmatically, you may download the latest
snapshot
(July 26th, 2011). This snapshot lets you use DiffCat's diffing service from your own Eclipse plugins without setting up any SemDiff
repositories/DBs (you'll still need to install SemDiff, though, to resolve dependencies). To use DiffCat programmatically, please follow the instructions
below.
DiffCat Installation Prerequisites
DiffCat is built on two existing research prototypes:
SemDiff
and
ChangeDistiller.
An installation of DiffCat will also require a prior installation of these two projects. We outline the requisite steps for a full installation below.
System requirements: Eclipse 3.6, Java 1.6, SemDiff, and ChangeDistiller. DiffCat has been tested on Linux, but I've managed to run
it on Windows as well. I've restricted my testing of all DiffCat components to the Eclipse "RCP Development" release. Although this shouldn't be a problem
for the main DiffCat components (diffcat + util), the viewer component (view + DiffCompare) might break on other Eclipse releases.
Installing SemDiff: Full instructions can be found here. Please make sure to
install the latest version (2.3.1). SemDiff's licensing information can be found on its website. For the absolute best performance, I suggest that, after
installing SemDiff, you overwrite its PPA distro with the latest PPA release from PPA's update site (http://www.sable.mcgill.ca/ppa/site_latest/). I've
found numerous PPA bugs in the past few months, and most have been fixed, but the fixes have not yet been released with SemDiff.
Installing ChangeDistiller:
- Register an account with the software evolution and architecture lab (s.e.a.l.)
at the University of Zurich.
- Register the following site with the Eclipse update mechanism: https://www.evolizer.org/updates/evolizer. You will need to enter your s.e.a.l. credentials.
- Select and install the "ChangeDistiller" and "Evolizer Core" projects and restart Eclipse.
ChangeDistiller's licensing information can be found on its website.
Installing DiffCat
To install DiffCat, just import the desired combination of plugin projects (see above) into your Eclipse workspace.
Running DiffCat via SemDiff
To try out DiffCat, follow these steps:
- Launch project ca.mcgill.cs.swevo.diffcat as an Eclipse application by right-clicking on the project and selecting Run As ... | Eclipse Application.
This will open up a new Eclipse test environment through which DiffCat can be used. The rest of the instructions should be carried out within this test environment.
- If you've already created a SemDiff database, you'll need to refresh that database every time you re-install DiffCat. This allows SemDiff
to handle the results of your newly installed DiffCat instance.
To do this, go to SemDiff | Update Database and re-enter the information for the database you'll be working with. Otherwise,
if this is your first install, you don't need to refresh anything.
- Set up an SVN/CVS repository and select change sets via SemDiff's UI
(SemDiff | Run Detectors ...).
- After entering the change set range, click Next and check the DiffCat detector.
- Launch the analysis.
- The results can be viewed in SemDiff's Transaction View or in our (experimental) viewer, as we describe below.
Viewing the Results
To view DiffCat's results, I recommend you use our experimental DiffCat viewer. To open and use this viewer, you'll need
the
ca.mcgill.cs.swevo.diffcat.view and
DiffCompare projects in your workspace. Then follow the steps below:
- To open the view, select Window | Show View | Other ... | DiffCat View.
- Use the arrow buttons in the top right corner to navigate to the change sets that have been processed by DiffCat.
- If results are present, use the view to open up the files and methods that were found to have been modified.
- Double-click on any of the diffs. The viewer will use the Eclipse compare view (somewhat crazily) to show the diff as best it can.
Programming against DiffCat
To program against DiffCat, make sure you have installed the latest code snapshot. Then, start an Eclipse plugin project and declare the following
dependencies in your manifest file:
- ca.mcgill.cs.swevo.diffcat
- ca.mcgill.cs.swevo.util
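In your plug-in's META-INF/MANIFEST.MF, these two dependencies might be declared as in the fragment below. This is only a sketch: it assumes that each bundle's symbolic name is the same as its project name, so check the manifests of the DiffCat projects if Eclipse cannot resolve them. Note that a continuation line in a manifest starts with a single space.

```
Require-Bundle: ca.mcgill.cs.swevo.diffcat,
 ca.mcgill.cs.swevo.util
```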
You'll probably have to resolve some other dependencies. For these, look into DiffCat's manifest file to find the requisite plugins (it is straightforward).
Once you've resolved these dependencies, refer to DiffCat's
ca.mcgill.cs.swevo.diffcat.MainController class and use its
findStructuralDiffs
method. This method requires you to specify the files you'd like to diff as maps of (type-resolved)
org.eclipse.jdt.core.dom.CompilationUnit instances. Each
map must associate a file path with each file, so that identical CompilationUnit instances can be properly distinguished during diffing. DiffCat treats
file versions with identical file paths (including the name of the file) as versions of the same file. The rest will be treated as either file
insertions/deletions or class renames, depending on the similarity between unmatched files.
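The path-based pairing described above can be sketched as follows. This is a simplified illustration and not DiffCat's actual code: the class and method names are hypothetical, plain strings stand in for CompilationUnit instances, and the similarity-based detection of class renames is left out, so every unmatched path is simply reported as present on one side only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch of path-based version matching; not DiffCat's code.
class PathMatchSketch {

    // Paths present in both maps are treated as two versions of the same file.
    static List<String> matched(Map<String, String> left, Map<String, String> right) {
        List<String> result = new ArrayList<>();
        for (String path : new TreeSet<>(left.keySet())) { // sorted for stable output
            if (right.containsKey(path)) {
                result.add(path);
            }
        }
        return result;
    }

    // Paths present only in `side` are candidates for file insertions/deletions
    // (or class renames, which this sketch does not try to detect).
    static List<String> onlyIn(Map<String, String> side, Map<String, String> other) {
        List<String> result = new ArrayList<>();
        for (String path : new TreeSet<>(side.keySet())) {
            if (!other.containsKey(path)) {
                result.add(path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> left = new HashMap<>();
        left.put("src/A.java", "class A {}");
        left.put("src/B.java", "class B {}");

        Map<String, String> right = new HashMap<>();
        right.put("src/A.java", "class A { int x; }");
        right.put("src/C.java", "class C {}");

        System.out.println("same file:  " + matched(left, right));
        System.out.println("left only:  " + onlyIn(left, right));
        System.out.println("right only: " + onlyIn(right, left));
    }
}
```

Keying both maps by the full path (including the file name) is what lets two files with identical contents stay distinct in this sketch.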
DiffCat's diff model is straightforward. Each
DiffCatResult instance returned by the MainController embodies one fine-grained structural difference,
as would be returned by ChangeDistiller. There are a number of self-explanatory getter methods for accessing the various components of each diff, e.g., the change type,
the left and right AST nodes that were affected by the change, their enclosing method, class, and field signatures (as applicable), their positions in the
original code file, etc. To weed out non-essential differences (as we've defined them so far), just do:
Collection<DiffCatResult> results = ...
CollectionUtil.removeAll(results, DiffCodes.NON_ESSENTIAL_DIFF);
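If you'd like to see the effect of such a filter without a DiffCat installation, the stand-alone sketch below mimics it. Everything in it is a hypothetical stand-in: the Diff class and Code enum are invented for illustration and do not reflect DiffCat's actual DiffCatResult, DiffCodes, or CollectionUtil classes. The sketch returns a new list rather than modifying its input, which is a choice made here and not a statement about how DiffCat's utility behaves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-alone sketch with hypothetical stand-in types; not DiffCat's classes.
class FilterSketch {

    // Stand-in for a diff classification code.
    enum Code { ESSENTIAL, NON_ESSENTIAL }

    // Stand-in for one fine-grained structural difference.
    static final class Diff {
        final String description;
        final Code code;

        Diff(String description, Code code) {
            this.description = description;
            this.code = code;
        }
    }

    // Returns a new list without the non-essential differences.
    // The input list is left untouched.
    static List<Diff> essentialOnly(List<Diff> diffs) {
        List<Diff> kept = new ArrayList<>();
        for (Diff d : diffs) {
            if (d.code != Code.NON_ESSENTIAL) {
                kept.add(d);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Diff> diffs = Arrays.asList(
                new Diff("condition changed in if statement", Code.ESSENTIAL),
                new Diff("local variable renamed", Code.NON_ESSENTIAL));
        for (Diff d : essentialOnly(diffs)) {
            System.out.println(d.description);
        }
    }
}
```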
Programming against DiffCat (within SemDiff)
If you'd like to use DiffCat within the SemDiff framework, you may develop your own SemDiff recommender, as outlined
on SemDiff's
help page. Your detector will need to declare
a dependency on the DiffCat detector (id =
ca.mcgill.cs.swevo.diffcat). You can then access the results using this id within your code.