Friday, August 27, 2010

0.6.6 allows composing plans from multiple files

It's fairly common to have information needed for a plan that you would like to reuse across multiple plans. For instance, if you are diffing DB sources, it would be nice to isolate the DB connection information in one file that could be reused across different comparisons.

0.6.6 now allows this via the -planfiles argument. -planfiles takes as its value either a single file name or multiple comma-separated file names. It then composites the files to create a single plan. Here's an example:

 joe$ java -jar diffkit-app.jar -planfiles test3.plan.xml,sources.xml  

file: test3.plan.xml
      <bean id="plan" class="org.diffkit.diff.conf.DKPassthroughPlan">
           <property name="lhsSource" ref="lhs.source" />
           <property name="rhsSource" ref="rhs.source" />
           <property name="sink" ref="sink" />
           <property name="tableComparison" ref="table.comparison" />
      </bean>

file: sources.xml
      <bean id="rhs.source" class="org.diffkit.diff.sns.DKFileSource">
           <constructor-arg index="0"
                value="./test3.rhs.csv" />
           <constructor-arg index="1" ref="lhs.table.model" />
           <constructor-arg index="2">
                <null />
           </constructor-arg>
           <constructor-arg index="3">
                <null />
           </constructor-arg>
           <constructor-arg index="4" value="\," />
           <constructor-arg index="5" value="true" />
           <constructor-arg index="6" value="true" />
      </bean>

Note that the bean id="rhs.source" is not defined anywhere in the file test3.plan.xml. It's only defined in the file sources.xml.
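The first step of handling a -planfiles value, turning it into a list of file names, can be sketched with the JDK alone. This is not DiffKit's actual code: the class name PlanFilesArg is hypothetical, and trimming whitespace and skipping empty segments are my assumptions about reasonable behavior, not documented features.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of DiffKit.
final class PlanFilesArg {

    // Splits "a.xml,b.xml" into ["a.xml", "b.xml"].
    // Assumption: surrounding whitespace is trimmed and empty segments are skipped.
    static List<String> split(String value) {
        List<String> names = new ArrayList<String>();
        if (value == null) {
            return names;
        }
        for (String part : value.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }
}
```

With the example above, PlanFilesArg.split("test3.plan.xml,sources.xml") yields the two file names in the order given; a single file name yields a one-element list.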

much improved command line interface in 0.6.6

0.6.6 was just released. By default, all of the logback INFO level spam is gone from the standard output. Instead, there is now a minimalist output that includes a simple, high-level, diff report. e.g.

 joe$ java -jar diffkit-app.jar -planfiles test9.plan.xml  
 diff'd 9 rows, found:  
 !4 row diffs  
 @2 column diffs  

It's still possible to get the logback spam. You need to edit conf/logback.xml to change the root logger level to info (instead of warn), and then pass the -Dlogback.configurationFile system property on the command line to force diffkit-app.jar to use that logback.xml file.

Thursday, August 26, 2010

0.6.5 released -- add support for most SQL data types


Saturday, August 21, 2010

0.6.2 released -- DiffKit now diffs across different databases

The LHS table can be in one DB, while the RHS table can be in another. TestCase 18 demos the feature.

Friday, August 20, 2010

DB2 -- more trouble

Turns out that the version of the jdbc driver we are using has a different behavior than all other jdbc drivers. The javadoc indicates that you should be able to make repeated calls to and simply get a null when the end of the RS has been reached. But the db2 driver blows up after the first call to an exhausted (consumed) RS:

Caused by: [ibm][db2][jcc][10120][10898] Invalid
 operation: result set is closed.
        at [diffkit-app.jar:na]
        at [diffkit-app.jar:na]
        at [diffkit-app.jar:na]
        at org.diffkit.diff.sns.DKDBSource.getNextRow( [diff

According to this, there is a new db2 jdbc driver that cures this behavior, and as a bonus doesn't require a license file!

db2 9.5

This is part of the 9.5 release, so I'm not sure it will be compatible with lower versions, but seems worth a try.
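Independently of the driver upgrade, a defensive workaround can be sketched: remember that the ResultSet has been exhausted and never touch the driver again after that. This is only an illustration under my own assumptions. ExhaustionGuard is a hypothetical class, not how DKDBSource is actually written.

```java
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical wrapper, not part of DiffKit.
final class ExhaustionGuard {
    private final ResultSet resultSet;
    private boolean exhausted;

    ExhaustionGuard(ResultSet resultSet) {
        this.resultSet = resultSet;
    }

    // Returns true when positioned on a new row.
    // Once the underlying ResultSet reports the end, every later call
    // returns false without calling the driver again.
    boolean next() throws SQLException {
        if (exhausted) {
            return false;
        }
        if (!resultSet.next()) {
            exhausted = true;
            return false;
        }
        return true;
    }
}
```

The caller can then poll next() as often as it likes after the end of the data, which is the behavior the other drivers already provide.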

DiffKit now builds and executes under Java 1.5

The 0.5.3 release builds and passes all TCs under JDK/JRE 1.5 on Windoze.

Thursday, August 19, 2010

0.5.1 released

There is now a source and a bin distribution. The standalone executable application successfully executes the TestCases on Windoze (at least under JVM 1.6).

Monday, August 16, 2010

Standalone application now runs the TestCases

Previously, TestCases could only be run through the DiffKit project. You needed to download the source code, have Groovy installed, and then figure out how to invoke the TestCaseRunner.

Now, the standalone executable diffkit-app.jar carries the whole TestCase suite with it. All TestCases can be executed in place simply:

java -jar diffkit-app.jar -test

So now it's very easy for us to collect end-user supplied validation of the kit from different environments.

Saturday, August 14, 2010

JarClassLoader -- it's not Groovy related

Further experimentation turned up that the problem does not stem from an interaction between JarClassLoader and Groovy. The same effects can manifest with JarClassLoader even if the target class is Java.

Simply put, Class.getPackage() does not always work with non-default ClassLoaders. But if you only need the package name, ClassUtils.getPackageName() does. Here's a link on the problem:
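The name-based approach can be sketched with the JDK alone: derive the package from the fully qualified class name instead of asking the ClassLoader for a Package object. This is my own minimal version, not the ClassUtils implementation, and PackageNames is a hypothetical class name.

```java
// Hypothetical helper, not part of DiffKit or of any ClassUtils library.
final class PackageNames {

    // Derives the package name from the class name, so it does not depend
    // on the ClassLoader having defined a Package object for the class.
    static String packageNameOf(Class<?> cls) {
        Class<?> type = cls;
        // For arrays, use the element type (String[] -> String).
        while (type.isArray()) {
            type = type.getComponentType();
        }
        String name = type.getName();
        int lastDot = name.lastIndexOf('.');
        // Primitives and default-package classes have no dot in their name.
        return lastDot < 0 ? "" : name.substring(0, lastDot);
    }
}
```

Because Class.getName() is always available, this works no matter which ClassLoader loaded the class.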

JarClassLoader & Groovy -- side effects

It appears that when you ask JarClassLoader to load compiled Groovy classes (compiled with Ant groovyc), the resulting classes do not behave the same way as regular java classes. In particular, the Groovy classes don't know what package they belong to: 
 println MyClass.class.getPackage() 
 println MyClass.class.getPackage() 

The Java class gives the correct answer whether it is loaded by ClasspathClassLoader or by JarClassLoader. However, the Groovy class only works when called by ClasspathClassLoader, not by JarClassLoader.

Wednesday, August 11, 2010

application now processes command line using Apache commons CLI

Entry point to application is now conf/DKApplication:

   private static final String VERSION_OPTION_KEY = "version";
   private static final String HELP_OPTION_KEY = "help";
   private static final String TEST_OPTION_KEY = "test";
   private static final String PLAN_FILE_OPTION_KEY = "planfile";
   private static final Options OPTIONS = new Options();

   static {
    OPTIONS.addOption(new Option(VERSION_OPTION_KEY,
      "print the version information and exit"));
    OPTIONS.addOption(new Option(HELP_OPTION_KEY, "print this message"));
    OPTIONS.addOption(new Option(TEST_OPTION_KEY, "run embedded TestCase suite"));
   }

   public static void main(String[] args_) {
    LOG.debug("args_->{}", Arrays.toString(args_));
    try {
      CommandLineParser parser = new PosixParser();
      CommandLine line = parser.parse(OPTIONS, args_);
      if (line.hasOption(VERSION_OPTION_KEY))
        ...
      else if (line.hasOption(HELP_OPTION_KEY))
        ...
      else if (line.hasOption(TEST_OPTION_KEY))
        ...

commons CLI is very easy to use.

Tuesday, August 10, 2010

controlling logback configuration in standalone app

Application end-users must have the ability to change the logback logging level applied to all of the DK code, without having to explode the application jar.

Logback looks for a config file named logback.xml on the classpath. So first we copy a new logback.xml file into cwd and try setting the classpath on the command line:

java -cp . -jar diffkit-app.jar

No joy. When the JVM is launched with -jar, it ignores -cp and takes the classpath from the jar itself.

But this works: 

java -Dlogback.configurationFile=./logback.xml -jar diffkit-app.jar

Simple enough.

Also, logback can find the logback.xml file embedded in the application jar if the System property is defined this way:

java -Dlogback.configurationFile=conf/logback.xml -jar diffkit-app.jar

In this case, logback treats the property value as a classpath resource specification.
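Both results are consistent with a lookup that tries several interpretations of the property value in turn. The sketch below is my own approximation of that kind of lookup, not logback's code. The class name ConfigLocator and the exact order (URL, then classpath resource, then file system path) are assumptions.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical illustration, not logback's implementation.
final class ConfigLocator {

    // Resolves a config location, or returns null if nothing matches.
    // Assumed order: absolute URL, classpath resource, file system path.
    static URL locate(String value, ClassLoader loader) {
        try {
            URI uri = new URI(value);
            if (uri.isAbsolute()) {
                return uri.toURL();
            }
        } catch (URISyntaxException | MalformedURLException | IllegalArgumentException notAUrl) {
            // Not a usable URL; fall through to the other interpretations.
        }
        // e.g. "conf/logback.xml" packaged inside the application jar
        URL resource = loader.getResource(value);
        if (resource != null) {
            return resource;
        }
        // e.g. "./logback.xml" in the current working directory
        File file = new File(value);
        if (file.isFile()) {
            try {
                return file.toURI().toURL();
            } catch (MalformedURLException e) {
                return null;
            }
        }
        return null;
    }
}
```

Under these assumptions, ./logback.xml is not found as a classpath resource and falls through to the file system, while conf/logback.xml is found inside the jar.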

Saturday, August 7, 2010

First download released on google code

0.5.0 is now available:

Download the zip and unzip it. diffkit-app.jar is a completely self-contained executable jar. To run it, you only need to have Java 1.6 installed on your system (it might even work with Java 1.5, but that is not yet tested). Here's an example invocation:

java -jar diffkit-app.jar example.plan.xml

=== example.plan.xml ===

<?xml version="1.0" encoding="UTF-8"?>

<beans xmlns=""

<bean id="plan" class="org.diffkit.diff.conf.DKMagicPlan">
<property name="lhsFilePath" value="./example.lhs.csv" />
<property name="rhsFilePath" value="./example.rhs.csv" />
<property name="keyColumnNames">
<property name="sinkFilePath" value="./example.sink.diff" />


Hats off to JDotSoft

Integrated JarClassLoader from JDotSoft. It only took me about half an hour to figure the whole thing out. For my tastes, it's much simpler and more embeddable than One-JAR.

Now the dist ant target builds a standalone, executable jar: diffkit-app.jar. For embedded applications, users simply explode diffkit-app.jar; the diffkit library jar is in the root, and all dependent jars are in lib/.
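For embedders who want to see what is packaged before exploding the jar, the nested jars can be listed with java.util.jar. This is a minimal sketch under my own assumptions: NestedJarLister is a hypothetical class, and it simply reports every entry whose name ends in .jar.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper, not part of DiffKit or JarClassLoader.
final class NestedJarLister {

    // Returns the entry names of all jars embedded in the given outer jar.
    static List<String> nestedJars(File outerJar) throws IOException {
        List<String> names = new ArrayList<String>();
        try (JarFile jar = new JarFile(outerJar)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (!entry.isDirectory() && entry.getName().endsWith(".jar")) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }
}
```

Run against diffkit-app.jar, this should report the library jar in the root plus the dependent jars under lib/, given the layout described above.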

One-JAR 0.97

One-JAR works exactly as advertised. Unfortunately, it's a bit of a pain to work with. The fundamental problem is that it assumes a particular world view/structure. For instance, the jar that is produced from the target project source is named 'main.jar' in the output, and stored in main/main.jar. That's not ideal for DiffKit distribution purposes, since I really need the DiffKit jar to carry version information in its name. Documentation for One-JAR is sparse. I can't find any reference in the documentation regarding how I can change the name of the main.jar.

Verdict: One-JAR is a nifty product that just works. But it's not very malleable, which can be a problem if your desired output does not have exactly the structure envisioned by the One-JAR author.

I'm going to look for a simpler alternative that is easier to bend to my purposes.

Here's one place to start

Thursday, August 5, 2010

packaging DiffKit for distribution

Right now, DiffKit is source code only. We need some way of packaging a binary distribution to satisfy two audiences:

  1. Users who run DiffKit as a standalone application.
  2. Users who access DiffKit programmatically and need to embed a DiffKit jar in their own application.

Remarkably, there are no standard Java mechanisms for this in Java 1.6. JSR 277 seems to have gone comatose. Apparently Java 1.7 introduces some kind of modularized packaging implementation: java module system in java 7; but 1.7 is not in sight.

After reading about OSGi for about an hour, I fatigued and decided on something simple and immediate for the short run. One-JAR™ uses a custom class loader to allow references to jars that are embedded in other jars. So the plan is to create an executable binary (jar) using One-JAR for the application users, and embedded developers can unjar the One-JAR to access the diffkit.jar and any of its dependent jars that they might require.