ICST 2021
Mon 12 - Fri 16 April 2021
Tue 13 Apr 2021 15:30 - 16:00 at Boa Viagem - Journal First I Chair(s): Heike Wehrheim

White-box test generation analyzes the code of the system under test, selects relevant test inputs, and captures the observed behavior of the system as expected values in the tests. However, if the implementation contains a fault, that fault can be encoded in the assertions (expectations) of the tests. The fault is recognized only if the developer using test generation is also aware of the real expected behavior. Otherwise, the fault remains silent in both the test and the implementation. A common assumption is that developers using white-box test generation techniques inspect the generated tests and their assertions, and validate whether the tests encode a fault or represent the real expected behavior. Our goal is to provide insights into how well developers perform in this classification task.
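The problem described above can be illustrated with a minimal sketch. The function `absolute_value` and the toy `generate_test` helper are hypothetical and are not taken from the study or from any real test generator; they only mimic the idea of recording observed behavior as the expected value.

```python
def absolute_value(x: int) -> int:
    """Intended behavior: return |x|. This implementation is deliberately faulty."""
    if x < 0:
        return x  # Fault: should be -x
    return x


def generate_test(func, test_input):
    """Toy stand-in for a white-box test generator (an assumption for illustration).

    It runs the code once and records the observed output as the expected value.
    """
    observed = func(test_input)

    def test():
        # The assertion encodes whatever the implementation did, right or wrong.
        assert func(test_input) == observed

    return test, observed


generated_test, expected = generate_test(absolute_value, -5)
generated_test()  # Passes, although the recorded expectation reflects the fault
```

Here the generated test passes and asserts that the result for `-5` is `-5`. Only a developer who knows that the intended result is `5` can recognize that the test encodes a fault.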

We designed an exploratory study to investigate the performance of developers. We also conducted an internal replication to increase the validity of the results. The two studies were carried out in a laboratory setting with 106 graduate students altogether. The tests were generated for four open-source projects. The results were analyzed quantitatively (binary classification metrics and timing measurements) and qualitatively (by observing and coding the activities of participants from screen captures and detailed logs). The results showed that participants tend to misclassify tests, both those encoding expected behavior and those encoding faulty behavior (with a median misclassification rate of 20%). The time required to classify one test varied widely, with an average of 2 minutes. This classification task is an essential step in white-box test generation that notably affects the real fault detection capability of such tools.
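As a rough sketch of the kind of quantitative measure mentioned above, a per-participant misclassification rate and its median could be computed as follows. All labels and participant names here are made up for illustration and do not come from the study's data.

```python
from statistics import median


def misclassification_rate(truth, answers):
    """Fraction of tests whose participant label differs from the ground truth."""
    if len(truth) != len(answers):
        raise ValueError("truth and answers must have the same length")
    wrong = sum(t != a for t, a in zip(truth, answers))
    return wrong / len(truth)


# True = the test encodes faulty behavior, False = it encodes expected behavior.
# Hypothetical ground truth and answers, not data from the study.
ground_truth = [True, False, True, False]
participants = {
    "P1": [True, False, True, False],   # all four correct
    "P2": [False, False, True, False],  # one of four wrong
    "P3": [False, True, True, False],   # two of four wrong
}

rates = {
    name: misclassification_rate(ground_truth, answers)
    for name, answers in participants.items()
}
median_rate = median(list(rates.values()))
```

With these invented labels the individual rates are 0.0, 0.25, and 0.5, so the median is 0.25.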

We recommend a conceptual framework to describe the classification task and suggest taking this problem into account when using or evaluating white-box test generators.

Example of participants' classification performance. Rows represent the tests to classify, and columns denote each participant's results.

Tue 13 Apr

Displayed time zone: Brasilia, Distrito Federal, Brazil

15:00 - 16:30
Journal First I (Journal-First Papers) at Boa Viagem
Chair(s): Heike Wehrheim Paderborn University
Adaptive metamorphic testing with contextual bandits
Journal-First Papers
Helge Spieker (Simula Research Laboratory, Norway), Arnaud Gotlieb (Simula Research Laboratory)
Classifying generated white-box tests: an exploratory study
Journal-First Papers
Dávid Honfi, Zoltán Micskei (Budapest University of Technology and Economics)
Hansie: Hybrid and Consensus Regression Test Prioritization
Journal-First Papers
Shouvick Mondal (Federal University of Pernambuco, Brazil), Rupesh Nasre (IIT Madras, India)