Design flaws - bugs dataset

This data set can be used to study the relationship between design flaws and software defects. It is composed of two sets of matrixes: the design flaws matrixes and the post-release defects matrixes.

The matrixes include data for multiple versions of six software systems: Lucene, Maven and Mina from the Apache Software Foundation, and Eclipse CDT, Eclipse PDE UI and Equinox from the Eclipse community. The number of versions per software system ranges from 70 to 108.

The matrixes have the following properties, exemplified in the figure for the brain method design flaw:

  • Each column represents a version of the system.
  • Each row represents a class.
  • The value of the cell at row r and column c is the number of instances of the design flaw in the class represented by r at the version represented by c.
  • Since some classes exist in some versions but not in others, the set of rows is the union of all the classes existing in at least one version. If the class at row r does not exist at the version in column c, the value of the cell at row r and column c is set to -1.
The post-release defects matrix has analogous properties; the only difference is that each cell represents the number of post-release defects instead of the number of design flaws.
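
The layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names and cell values are invented, the in-memory dictionaries stand in for whatever file format the data set actually uses, and the helper names (total_per_version, paired_cells) are hypothetical. The point it shows is that -1 cells mark a class that does not exist in a version, so they must be skipped rather than counted.

```python
# Toy matrixes: each key is a class (row), each list position is a version (column).
# All names and values are invented for illustration; they are NOT from the data set.
MISSING = -1  # marker for "class does not exist in this version"

flaws = {
    "org.example.Parser": [0, 2, 3],
    "org.example.Lexer": [-1, 1, 1],      # class introduced in the second version
    "org.example.OldCache": [1, -1, -1],  # class removed after the first version
}
defects = {
    "org.example.Parser": [1, 0, 4],
    "org.example.Lexer": [-1, 0, 2],
    "org.example.OldCache": [0, -1, -1],
}


def total_per_version(matrix):
    """Sum each column, skipping cells marked as missing."""
    n_versions = len(next(iter(matrix.values())))
    return [
        sum(row[c] for row in matrix.values() if row[c] != MISSING)
        for c in range(n_versions)
    ]


def paired_cells(flaw_matrix, defect_matrix):
    """Yield (class, version index, flaws, defects) for cells where the class exists."""
    for name, flaw_row in flaw_matrix.items():
        defect_row = defect_matrix[name]
        for c, (f, d) in enumerate(zip(flaw_row, defect_row)):
            if f != MISSING and d != MISSING:
                yield name, c, f, d


print(total_per_version(flaws))    # [1, 3, 4]
print(total_per_version(defects))  # [1, 0, 6]
print(len(list(paired_cells(flaws, defects))))  # 6
```

Pairing cells this way yields one (design flaws, defects) observation per class and version, which is the kind of input a correlation or regression analysis would start from.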

 

To download the entire data set, just send me an email; I will be happy to send it to you.

Copyright © Marco D'Ambros