Friday, November 12, 2010

How to regression test a database application (part 2)

This is part 2 of 2. In part 1 I characterize what is meant by "database application". I also quickly review some different types of regression testing, and then describe a hypothetical retail inventory example system which will serve as the subject of detailed discussion. In part 2, I explore the "final table" type of regression test in detail, and apply it to our retail inventory system. I conclude with an argument for why "final table" testing should always be included on your project.

The challenge

I’m the project manager for the AIR system, which was described in some detail in part 1. The system has been humming along in production for years, and now my developers are working on some significant functional changes that are scheduled to be released shortly. My biggest concern is not to break something that already works. For that I need some type of regression testing, but I’m unsure what the best regression strategy is. Some, but not all, of my system is already covered by Unit tests, and there are a couple of higher-level functional tests that each cover a slice. But there is no end-to-end, fully integrated systems test.
In an ideal (fantasy) world, I would now instruct my developers to write regression tests, at all levels, covering every aspect of the system. They would write low-level unit tests for every DB function, DB stored procedure, and Java method. They would then write a higher-level functional test for each slice. Finally, they would script, or automate, a top-level, fully integrated, end-to-end systems test.
Figures 1 & 2 (same as in part 1) diagram the system.
http://1.bp.blogspot.com/_EkaVamlSbJE/TNnmVv_KV7I/AAAAAAAAAFs/KMGqzvDfzHY/s1600/inventory_system.png
Figure 1: System
   
http://1.bp.blogspot.com/_EkaVamlSbJE/TNnlbct1JcI/AAAAAAAAAGg/1xLTBrLwG5A/s1600/inventory_slice.png
Figure 2: Forecast slice
Taking the Forecast slice as an example, if I want to completely cover this slice with Unit tests, my developers will need to write several for the getProductList() stored procedure. More than one Unit will be needed in order to exercise all of the possible boundary and error conditions. The test harness will also need some way to populate the underlying tables/views that are read by getProductList(). Writing just the Unit tests required for decent coverage of getProductList() could take an entire day. Next, my Java developers need to provide Units to blanket the forecast() method. Since forecast() calls the getProductList() stored procedure, the forecast() Units will need to mock the stored procedure. Setting up mock stored procedures is itself an exacting and laborious endeavour. Again, providing comprehensive Unit coverage for just forecast() might pin down a Java developer for an entire day. Finally, all of the tables and views touched by this slice will also need their own Unit tests. These Unit tests will ensure that defaults, invariants, and constraints are always observed within the tables and views. If the views implement any business logic, that too must be covered by Unit testing.
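As a rough illustration of what one of those forecast() Units might look like, here is a minimal Java sketch. It assumes, hypothetically, that forecast() reaches the stored procedure through a small interface (ProductListSource), so that a test can substitute a hand-written stub for the real JDBC call. The Product record and the forecasting rule (last week's units scaled by a growth factor) are invented for illustration; they are not the actual AIR logic.

```java
import java.util.List;

public class ForecastUnitTestSketch {

    // Hypothetical row shape returned by the getProductList() stored procedure.
    public record Product(String sku, int unitsSoldLastWeek) {}

    // Hypothetical seam: production code would implement this with a JDBC
    // call to the stored procedure; the unit test supplies a stub instead.
    public interface ProductListSource {
        List<Product> getProductList(String storeId);
    }

    // Hypothetical forecast rule: next week's demand is last week's total
    // units scaled by a growth factor, rounded to the nearest unit.
    public static long forecast(ProductListSource source, String storeId, double growth) {
        long total = 0;
        for (Product p : source.getProductList(storeId)) {
            total += p.unitsSoldLastWeek();
        }
        return Math.round(total * growth);
    }

    public static void main(String[] args) {
        // Stub standing in for the stored procedure: fixed, known data,
        // no database required.
        ProductListSource stub = storeId -> List.of(
                new Product("A-100", 40),
                new Product("B-200", 60));

        long result = forecast(stub, "STORE-1", 1.1);
        if (result != 110) {
            throw new AssertionError("expected 110, got " + result);
        }
        System.out.println("forecast() unit test passed: " + result);
    }
}
```

A mocking library such as Mockito could generate the stub instead; the hand-written lambda just keeps the sketch dependency-free.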
All of the above Unit tests are data intensive. So in each case, the test writer must think long and hard about the range of data values that the subject function might be exposed to. The test writer will need to consider both the expected values for each parameter or data set and the unexpected values that could nonetheless occur. After all, one of the goals of software testing is to ensure that the test subject continues to operate properly even in the face of unexpected or "flaky" data or parameters.
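To make the "expected versus possible values" point concrete, here is a hedged sketch of a table-driven Java test. The forecast() variant, its handling of null lists, missing quantities, and negative totals, and the test cases themselves are all assumptions, chosen only to show the kind of flaky data a test writer has to anticipate.

```java
import java.util.Arrays;
import java.util.List;

public class ForecastEdgeCaseSketch {

    // Hypothetical forecast rule, hardened against "flaky" inputs:
    // a null list is treated as empty, rows with a missing quantity are
    // skipped, and a negative total (more returns than sales) is clamped to 0.
    public static long forecast(List<Integer> unitsSold, double growth) {
        if (unitsSold == null) {
            return 0;
        }
        long total = 0; // long, so large int quantities cannot overflow
        for (Integer units : unitsSold) {
            if (units != null) {
                total += units;
            }
        }
        return Math.max(0, Math.round(total * growth));
    }

    public static void main(String[] args) {
        // Each case: input rows, growth factor, expected forecast.
        Object[][] cases = {
            { List.of(40, 60), 1.1, 110L },                        // typical data
            { List.of(), 1.1, 0L },                                // empty result set
            { null, 1.1, 0L },                                     // nothing returned at all
            { Arrays.asList(5, null, 5), 1.0, 10L },               // missing quantity
            { List.of(-30, 10), 1.0, 0L },                         // returns exceed sales
            { List.of(Integer.MAX_VALUE, 1), 1.0, 2147483648L }    // beyond int range
        };
        for (Object[] c : cases) {
            @SuppressWarnings("unchecked")
            List<Integer> input = (List<Integer>) c[0];
            double growth = (Double) c[1];
            long expected = (Long) c[2];
            long actual = forecast(input, growth);
            if (actual != expected) {
                throw new AssertionError("input " + input + ": expected " + expected + ", got " + actual);
            }
        }
        System.out.println("All " + cases.length + " edge cases passed");
    }
}
```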
It’s clear that providing deep Unit test coverage for just the Forecast slice will entail significant effort. Three developer days (one for Java, one for the stored procedure, and one for the tables/views) might be generous, but it’s not an unrealistic estimate, considering all of the different concerns involved. The Forecast slice is just one of many slices that make up the whole system. In a complex enterprise system, and that includes most Big Enterprise systems, there will be hundreds or thousands of slices.
If my whole system has just 100 slices, it will take 300 developer days to achieve exquisite, comprehensive, rock-solid, Unit test coverage. Current software testing theory dictates that this is exactly the type of Unit testing you should have on a well run project. But as the manager of project AIR, I don’t have 300 developer days to write Unit tests. Purists will object that most of the Unit tests should already have been written before the current iteration, but on project AIR (as on many, if not most, Enterprise systems) they weren’t. That’s the fault of the previous project manager, but he has since moved on, so now it’s my problem. The theory dictates perfect Unit testing, but it’s hard to square that with time and resource constraints:
In theory there is no difference between theory and practice. In practice there is.
Jan L. A. van de Snepscheut
Alternatively, I can abandon the idea of blanket Unit test coverage. Instead, I could go to the other extreme and build just one fully integrated, end-to-end systems test. One way to carry out this systems test, which I call the final table test, is to:
  • capture a complete set of inputs (files, tables, whatever) from the current production (pre-change) system.
  • in an isolated, controlled, test environment, run those inputs through the complete current production (pre-change) system in order to produce outputs. I’ll call the outputs "final tables", because all of the outputs are captured in DB tables and those tables are at the end of a chain (or pipeline) of processing. The outputs from this run are labelled "expected".
  • in an isolated, controlled, test environment, run those same inputs through the complete current in-development (post-change) system in order to produce outputs. Label those outputs "actual".
  • compare the expected final tables with the actual final tables. They should match, except in the areas where there are intended functional changes that are meant to produce changes to the outputs.
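The four steps above can be sketched in Java roughly as follows. This is a simplified model under stated assumptions: each system (pre-change and post-change) is represented as a hypothetical function from captured inputs to its named final tables, tables are lists of rows already sorted into a deterministic order, and intended functional changes are expressed as a set of table names to skip. In a real shop the two runs would be separate executions against isolated databases, not in-process calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class FinalTableTestSketch {

    // A final table: rows of field values, assumed already sorted by key so
    // that two runs over the same data produce rows in the same order.
    public record Table(List<List<String>> rows) {}

    // Hypothetical model of a whole system: captured inputs in, named final tables out.
    public interface SystemUnderTest extends Function<List<String>, Map<String, Table>> {}

    // Runs the same captured inputs through the reference (pre-change) and
    // candidate (post-change) systems and returns the sorted names of final
    // tables that differ, skipping tables where a change is intended.
    public static List<String> compare(SystemUnderTest reference, SystemUnderTest candidate,
                                       List<String> capturedInputs, Set<String> intendedChanges) {
        Map<String, Table> expected = reference.apply(capturedInputs);
        Map<String, Table> actual = candidate.apply(capturedInputs);
        List<String> mismatches = new ArrayList<>();
        for (String name : expected.keySet()) {
            if (!intendedChanges.contains(name) && !expected.get(name).equals(actual.get(name))) {
                mismatches.add(name); // differs, or missing from the candidate run
            }
        }
        for (String name : actual.keySet()) {
            if (!intendedChanges.contains(name) && !expected.containsKey(name)) {
                mismatches.add(name); // final table that only the candidate produced
            }
        }
        Collections.sort(mismatches);
        return mismatches;
    }

    // Splits comma-separated input lines into rows of fields.
    public static Table toTable(List<String> lines) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(Arrays.asList(line.split(",")));
        }
        return new Table(rows);
    }

    public static void main(String[] args) {
        List<String> capturedInputs = List.of("A-100,40", "B-200,60");

        // Pre-change system: produces INVENTORY and FORECAST final tables.
        SystemUnderTest preChange = in -> Map.of(
                "INVENTORY", toTable(in),
                "FORECAST", new Table(List.of(List.of("TOTAL", "100"))));

        // Post-change system: FORECAST changes on purpose; INVENTORY must not.
        SystemUnderTest postChange = in -> Map.of(
                "INVENTORY", toTable(in),
                "FORECAST", new Table(List.of(List.of("TOTAL", "110"))));

        List<String> mismatches = compare(preChange, postChange, capturedInputs, Set.of("FORECAST"));
        System.out.println(mismatches.isEmpty() ? "PASS" : "FAIL: " + mismatches);
    }
}
```

Whole-table equality is used here only to keep the workflow visible; it says which final table differs, not which field.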
Some benefits of this scheme are worth considering carefully. This is the ultimate "black box" test. The testers do not need to understand anything at all about the internals of the system. They only need to know which buttons to press in order to run the system, and which "final tables" they need to compare. So this is the type of test that an independent testing group (say, QA) should be able to carry out. Another interesting aspect of this approach is test data selection. The good part is that it relies on current production inputs, which reflect exactly the real-world data conditions that the application will experience in practice. Also, simply reusing production data relieves the test team of the burden of carefully considering and synthesizing test data. This could be a significant labor savings.
But perhaps the most important aspect of this type of testing is that it effectively tests the system at all levels. Put another way, the final table test can provide much of the test coverage that would be provided by the collection of Units, whereas Units cannot provide any coverage at the functional or system levels. As a result, even if you have a blanket of Unit tests, you still really do need some form of functional or systems test. That’s because the interfaces, or boundaries, between functions, modules, and slices have to be exercised. As an example, look at the Forecast slice. Suppose I write individual Unit tests for getProductList(), forecast(), and the tables and views. I test the forecast() method by mocking up getProductList(). That’s not sufficient to provide confidence that the slice is production ready. I need at least some functional tests that exercise forecast() making a real call to the getProductList() stored procedure. Otherwise, the Unit tests, in mocking getProductList(), might make assumptions about its parameters, behaviour, or side effects that are not consistent with the real (non-mock) getProductList(). Likewise, it’s not enough for me to simply write a functional test that covers each slice. I need a still higher-level test that exercises the interfaces between slices. So the bottom line is that, if you absolutely have to, you can live without Unit tests and rely exclusively on a single, integrated systems test (the final table test). But the converse is not true. Even if you have superb Unit test coverage, you will still need to perform some integrated systems testing.
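Here is a small, hypothetical Java illustration of how a mock can hide an interface mismatch. The stub used by the unit test ignores its argument, while an in-memory stand-in for the real getProductList() (an assumption; the real one would be a stored procedure) is keyed by store code and returns nothing for a store name. The unit test is happy; only the call against the "real" procedure reveals that forecast() passes the wrong key.

```java
import java.util.List;
import java.util.Map;

public class MockMismatchSketch {

    // Hypothetical seam between forecast() and getProductList().
    public interface ProductListSource {
        List<Integer> getProductList(String storeKey);
    }

    // Hypothetical forecast(): passes the store's display name as the key.
    public static long forecast(ProductListSource source, String storeName) {
        long total = 0;
        for (int units : source.getProductList(storeName)) {
            total += units;
        }
        return total;
    }

    // In-memory stand-in for the real stored procedure: it is keyed by store
    // code and silently returns an empty list for a key it does not recognise.
    public static final ProductListSource REAL_PROCEDURE = storeKey ->
            Map.of("S001", List.of(40, 60)).getOrDefault(storeKey, List.of());

    public static void main(String[] args) {
        // Unit test: the stub ignores its argument, so the wrong key goes unnoticed.
        ProductListSource stub = storeKey -> List.of(40, 60);
        System.out.println("unit test (stub):        " + forecast(stub, "Main St"));           // 100
        // Integration test: the real call exposes the parameter mismatch.
        System.out.println("integration test (real): " + forecast(REAL_PROCEDURE, "Main St")); // 0
    }
}
```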
Of course, the final table test is hardly bulletproof. Probably the biggest problem with this test is that it is a relative test, whereas the Units are absolute. By that I mean the final table test only tells you that the changed system behaves the same as the reference (current production) system that you compare it against. If the current production system is producing incorrect results, and your changed system passes the final table test, then your changed system will also be producing incorrect results in exactly the same fashion. The Unit tests, on the other hand, are measured against absolute expectations, not relative ones, so they are better grounded. Also, the final table test only tests those aspects of the system that are exercised by the current production data inputs. There may be bugs lurking in the newly developed (changed) system that the current production inputs simply do not trigger. The production data could change at any time in a way that does trigger such a bug, and then it would surface in production. One more material drawback of the final table test is that it can only be carried out late in the software development cycle, whereas Unit tests can uncover problems very early in the cycle. That’s important because it’s generally, though not always, more expensive to correct bugs late in the cycle than early.
Table 1. Units versus final table test

                          Units               Final Table
bug detection             early               late
level                     lowest, function    highest, system
test type                 white box           black box
coverage                  isolated            comprehensive
results type              absolute            relative
effort                    large               small
implementation expertise  programming         operations
data                      synthetic           production
The key to deriving maximum benefit from the final table test is that the final tables comparison be carried out at the actual field value level. That is, every row+column in the expected final tables must be compared with its corresponding row+column in the actual final tables. Remarkably, very few shops that utilize a fully integrated systems test actually carry out these detailed comparisons. Instead, they rely on gross (summary-level) comparisons and visual inspection of the resulting data sets. That is completely inadequate. Many bugs can, and do, elude gross and visual comparisons.
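A minimal Java sketch of a field-level comparison, set against the kind of gross check described above. Tables are modelled, as an assumption, as maps from a primary key to a row of column values; in practice the rows would be fetched from the expected and actual final tables. The example data swaps two quantities between products, so the row count and the column total still agree while two individual fields are wrong.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

public class FieldLevelCompareSketch {

    // Compares two final tables cell by cell. Each table maps a primary key
    // to a row, and each row maps a column name to a field value.
    public static List<String> diff(Map<String, Map<String, String>> expected,
                                    Map<String, Map<String, String>> actual) {
        List<String> differences = new ArrayList<>();
        Set<String> keys = new TreeSet<>(expected.keySet()); // sorted for stable reports
        keys.addAll(actual.keySet());
        for (String key : keys) {
            Map<String, String> e = expected.get(key);
            Map<String, String> a = actual.get(key);
            if (e == null) { differences.add(key + ": unexpected row"); continue; }
            if (a == null) { differences.add(key + ": missing row"); continue; }
            Set<String> columns = new TreeSet<>(e.keySet());
            columns.addAll(a.keySet());
            for (String col : columns) {
                String ev = e.get(col);
                String av = a.get(col);
                if (!Objects.equals(ev, av)) {
                    differences.add(key + "." + col + ": expected " + ev + ", actual " + av);
                }
            }
        }
        return differences;
    }

    // A "gross" check of the kind warned about above: row count plus a column total.
    public static boolean summaryMatches(Map<String, Map<String, String>> expected,
                                         Map<String, Map<String, String>> actual,
                                         String numericColumn) {
        return expected.size() == actual.size()
                && total(expected, numericColumn) == total(actual, numericColumn);
    }

    static long total(Map<String, Map<String, String>> table, String column) {
        long sum = 0;
        for (Map<String, String> row : table.values()) {
            sum += Long.parseLong(row.get(column));
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> expected = Map.of(
                "A-100", Map.of("STORE", "S001", "QTY", "40"),
                "B-200", Map.of("STORE", "S001", "QTY", "60"));
        // Quantities swapped between two products: the totals still agree.
        Map<String, Map<String, String>> actual = Map.of(
                "A-100", Map.of("STORE", "S001", "QTY", "60"),
                "B-200", Map.of("STORE", "S001", "QTY", "40"));

        System.out.println("summary check passes: " + summaryMatches(expected, actual, "QTY"));
        diff(expected, actual).forEach(System.out::println);
    }
}
```

Running main prints that the summary check passes (true), followed by the two field-level differences, A-100.QTY and B-200.QTY, which the gross comparison missed.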
The conclusion is that you must perform some type of fully integrated systems test, and a very powerful and cost effective option for this is the "final table test". Of course, high quality projects will also have low-level Unit tests and mid-level functional tests. However, if you are in a bind, constrained by very limited developer and testing resources, I would recommend that the first and foremost test you should carry out is the final table test.
