UTFix: Change Aware Unit Test Repairing using LLM
Software updates, including bug fixes and feature additions, are frequent in modern applications, but they often leave test suites outdated, resulting in undetected bugs and an increased risk of system failures. A recent study by Meta revealed that 14%-22% of software failures stem from outdated test cases that fail to reflect changes in the codebase. This highlights the need to keep test cases in sync with code changes to ensure software reliability. In this paper, we present UTFix, a novel approach for repairing unit tests when their corresponding focal methods undergo modifications. UTFix addresses two critical issues: assertion failures and reduced code coverage caused by changes in the focal method.
Our approach leverages large language models (LLMs) to repair test cases by providing contextual information such as static code slices, dynamic execution traces, and failure messages. We evaluate UTFix on both synthetic and real-world benchmarks. The synthetic benchmark comprises diverse changes drawn from popular open-source Python projects, on which UTFix successfully repaired 89.2% of assertion failures and achieved 100% code coverage for 96 tests. We also curated real-world examples, on which UTFix repaired a significant number of assertion failures and coverage issues. To the best of our knowledge, this is the first comprehensive study of test case repair in evolving Python projects. Our contributions include the development of UTFix, the creation of synthetic and real-world benchmarks, and a demonstration of the effectiveness of LLM-based methods in addressing test failures caused by software evolution.
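To make the contextual-prompting idea concrete, the sketch below shows one way a repair prompt could be assembled from the kinds of context listed above (the changed focal method, a static code slice, a dynamic execution trace, and the failure message). The names and prompt structure (`RepairContext`, `build_repair_prompt`) are illustrative assumptions for exposition, not UTFix's actual interface.

```python
# Minimal sketch of context-driven prompt assembly for test repair.
# All names and the prompt layout are assumptions, not UTFix's API.
from dataclasses import dataclass

@dataclass
class RepairContext:
    focal_method_diff: str   # old vs. new focal method source
    static_slice: str        # statements the failing assertion depends on
    execution_trace: str     # dynamic trace of the failing test run
    failure_message: str     # e.g. pytest assertion error or coverage report

def build_repair_prompt(test_source: str, ctx: RepairContext) -> str:
    """Pair the failing test with its change context for the LLM."""
    return "\n\n".join([
        "You are repairing a Python unit test after its focal method changed.",
        f"Changed focal method (diff):\n{ctx.focal_method_diff}",
        f"Relevant code slice:\n{ctx.static_slice}",
        f"Execution trace of the failing run:\n{ctx.execution_trace}",
        f"Failure message:\n{ctx.failure_message}",
        f"Failing test:\n{test_source}",
        "Rewrite the test so its assertions match the new behavior "
        "and it covers the changed code.",
    ])
```

In such a setup, the model's response would be a candidate patched test, which could then be re-executed to check whether the assertion failure is resolved and coverage of the changed focal method is restored.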