ID | Type | Status | Priority | Summary | AllLabels |
29 | Defect | Fixed | Medium | vhl summary reports wrong number of rows diff'd | Type-Defect, Priority-Medium, OpSys-All |
30 | Defect | Fixed | Critical | TestCase 23 fails on linux | Type-Defect, Priority-Critical, OpSys-All |
31 | Enhancement | Fixed | High | create README file | Type-Enhancement, Priority-High, OpSys-All |
32 | Enhancement | Fixed | High | create QuickStart document | Type-Enhancement, Priority-High, OpSys-All |
34 | Defect | Duplicate | Medium | create User Guide | Type-Defect, Priority-Medium |
35 | Enhancement | Fixed | High | create faq | Type-Enhancement, Priority-High, OpSys-All |
Monday, September 20, 2010
0.6.12 adds some documentation, fixes minor issues
Thursday, September 16, 2010
TextDiffor in 0.6.11
0.6.11 introduces the TextDiffor, which is useful for diff'ng chunks of text that might have small formatting differences that you would like to ignore. A good example of this is programming language code. For instance, I was recently trying to diff SQL schemas (DDL) using metadata tables. A troublesome schema object to diff was the TEXT definition of stored procedures. One side looked like this:
DECLARE
number1 NUMBER(2);
number2 NUMBER(2) := 17; -- value default
text1 VARCHAR2(12) := 'Hello world';
text2 DATE := SYSDATE; -- current date and time
BEGIN
SELECT street_number
INTO number1
FROM address
WHERE name = 'INU';
END;
while the other side looked like this:
DECLARE
number1 NUMBER(2);
number2 NUMBER(2) := 17; -- value default
text1 VARCHAR2(12) := 'Hello world';
text2 DATE := SYSDATE; -- current date and time
BEGIN
SELECT street_number INTO number1 FROM address WHERE name = 'INU';
END;
Note that the text1 and text2 variable declarations have small alignment differences between the two sides, and that the SQL SELECT statement spans multiple lines in the first case but a single line in the second. The two snippets are the same PL/SQL program (they produce the same AST), but they differ textually.
The TextDiffor will, by default, see these two snippets as identical. It applies a very simple text normalization before performing the String comparison:
1) replace all tabs and newlines ([\t\r\n]) with a single space character
2) compress all multi-character whitespace runs to a single space character
3) trim all whitespace from both ends
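The three steps above can be sketched in a few lines. This is only an illustration in Python, not the TextDiffor source; the names `normalize` and `texts_match` are hypothetical, and the SQL strings are trimmed-down versions of the snippets above.

```python
import re


def normalize(text: str) -> str:
    """Apply the three normalization steps described above (illustrative only)."""
    # 1) replace each tab, carriage return, or newline with a single space
    text = re.sub(r"[\t\r\n]", " ", text)
    # 2) compress every run of two or more whitespace characters to one space
    text = re.sub(r"\s{2,}", " ", text)
    # 3) trim whitespace from both ends
    return text.strip()


def texts_match(lhs: str, rhs: str) -> bool:
    """Compare two texts after normalization (hypothetical helper)."""
    return normalize(lhs) == normalize(rhs)


multiline = (
    "BEGIN\n"
    "  SELECT street_number\n"
    "    INTO number1\n"
    "    FROM address\n"
    "   WHERE name = 'INU';\n"
    "END;"
)
oneline = (
    "BEGIN\n"
    "\tSELECT street_number INTO number1 FROM address WHERE name = 'INU';\n"
    "END;"
)

print(texts_match(multiline, oneline))
```

Because the comparison only collapses whitespace, a difference inside a token (for example 'INU' versus 'INV') still shows up as a diff.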
Saturday, September 11, 2010
Results summarization from MagicPlan
Release 0.6.10 introduces a new, optional capability to the MagicPlan. Previously, the file (report) produced by the MagicPlan displayed only individually itemized diffs. That is, each diff appeared exactly once in the output, as a detailed description of just that discrete diff. 0.6.10 allows you to instruct the MagicPlan to produce aggregate-level summary information in addition to the detailed individual diffs.
If you include this:
<property name="withSummary" value="TRUE" />
as a property of the MagicPlan, the output file will have a header that looks like this:
--- vhl summary ---
diff'd 8 rows in 0:00:38.011, found:
!4 row diffs
@7 column diffs
-------------------
--- row diff summary ---
1 row diffs <
3 row diffs >
------------------------
--- column diff summary ---
columns having diffs->(column3, column4, column2)
column3 has 4 diffs
column4 has 2 diffs
column2 has 1 diffs
---------------------------
--- column diffs clustered ---
columnClusters having diffs->(column3, column2.column3.column4, column3.column4)
column3 has 2 diffs
column2.column3.column4 has 1 diffs
column3.column4 has 1 diffs
---------------------------
Above is the output from TestCase 23, which provides functional test coverage for the results summarization feature. The input data that produced this report:
lhs:                               rhs:
column1,column2,column3,column4    column1,column2,column3,column4
----------------------------       1, 0000, x, aaaa
2, 1111, x, aaaa                   ----------------------------
3, 2222, y, aaaa                   3, 2222, x, aaaa
4, 0000, z, bbbb                   4, 3333, x, aaaa
5, 4444, z, bbbb                   5, 4444, x, aaaa
6, 5555, u, aaaa                   6, 5555, x, aaaa
7, 0000, v, aaaa                   ----------------------------
8, 1111, x, aaaa                   ----------------------------
Note that column1 is the primary key of both the lhs and rhs tables, so DK uses column1 as the diff'ng key to align the rows.
Dissecting this report section by section: first, there is the Very High Level (vhl) summary:
--- vhl summary ---
diff'd 8 rows in 0:00:38.011, found:
!4 row diffs
@7 column diffs
-------------------
The first line tells us how many rows were diff'd and how long it took. In this case 8 unique rows were evaluated for diffs. If a row occurs on only one side (is a ROW_DIFF), it counts as 1 row diff'd. When DK is able to match an lhs row with a rhs row, the pair counts as 1 row diff'd, not 2. So the 8 rows that were diff'd are: the 1 row that appears only on the rhs (1), the 3 rows that appear only on the lhs (2, 7, 8), and the 4 rows that appear on both sides (3, 4, 5, 6).
0:00:38.011 is the elapsed time, written in the style of an ISO 8601 time (h:mm:ss.sss). It represents 0 hours, 0 minutes, 38 seconds, and 11 milliseconds.
The next line, "!4 row diffs", starts with the ! mark, which is the symbol for ROW_DIFF in both the summary and detail sections of the report. The 4 row diffs are the rows of dashed lines in the tables above: 1 on the lhs, and 3 on the rhs. The terminology that DK uses is: "there are 4 rows missing".
The final line of the vhl summary, "@7 column diffs", shows that there are a total of 7 individual column (or cell) value diffs. The @ sign is the symbol for COLUMN_DIFF in both the summary and detail sections of the report. The 7 column diffs are: row 3 column3, row 4 column2, row 4 column3, row 4 column4, row 5 column3, row 5 column4, row 6 column3.
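The counts in the vhl summary can be reproduced from the TestCase 23 input with a small sketch. It is written in Python for illustration and is not how DK computes them internally; the dictionaries `LHS` and `RHS` and the function `vhl_summary` are hypothetical names, with rows keyed by column1.

```python
# Rows keyed by column1 (the diff'ng key); values are (column2, column3, column4).
LHS = {
    2: ("1111", "x", "aaaa"),
    3: ("2222", "y", "aaaa"),
    4: ("0000", "z", "bbbb"),
    5: ("4444", "z", "bbbb"),
    6: ("5555", "u", "aaaa"),
    7: ("0000", "v", "aaaa"),
    8: ("1111", "x", "aaaa"),
}
RHS = {
    1: ("0000", "x", "aaaa"),
    3: ("2222", "x", "aaaa"),
    4: ("3333", "x", "aaaa"),
    5: ("4444", "x", "aaaa"),
    6: ("5555", "x", "aaaa"),
}


def vhl_summary(lhs, rhs):
    """Count rows diff'd, row diffs, and column diffs (illustrative only)."""
    only_lhs = sorted(lhs.keys() - rhs.keys())  # rows missing from the rhs
    only_rhs = sorted(rhs.keys() - lhs.keys())  # rows missing from the lhs
    matched = sorted(lhs.keys() & rhs.keys())   # rows present on both sides
    # A column diff is one cell whose value differs within a matched row.
    column_diffs = sum(
        1
        for key in matched
        for left, right in zip(lhs[key], rhs[key])
        if left != right
    )
    return {
        # A matched pair counts once, so this is the number of distinct keys.
        "rows_diffd": len(only_lhs) + len(only_rhs) + len(matched),
        "row_diffs": len(only_lhs) + len(only_rhs),
        "column_diffs": column_diffs,
        "only_lhs": only_lhs,
        "only_rhs": only_rhs,
    }


summary = vhl_summary(LHS, RHS)
print(f"diff'd {summary['rows_diffd']} rows, found:")
print(f"!{summary['row_diffs']} row diffs")
print(f"@{summary['column_diffs']} column diffs")
```

The `only_lhs` and `only_rhs` lists are the rows that appear on just one side, which is the same information the dashed lines in the tables convey.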
The next section is the row diff summary:
--- row diff summary ---
1 row diffs <
3 row diffs >
------------------------
This breaks down the row diffs according to which side they occur on. The line, "1 row diffs <", tells us that there is 1 row missing from the lhs: row 1. The next line states that there are 3 rows missing from the rhs: row 2, row 7, and row 8.
Next is the column diff summary section:
--- column diff summary ---
columns having diffs->(column3, column4, column2)
column3 has 4 diffs
column4 has 2 diffs
column2 has 1 diffs
---------------------------
This is a very straightforward grouping of the COLUMN_DIFFs, grouped according to which column the diff occurs in. column3 has 4 diffs: row 3, row 4, row 5, and row 6. column4 has 2 diffs: row 4 and row 5. column2 has 1 diff: row 4.
Finally, the column diffs clustered section:
--- column diffs clustered ---
columnClusters having diffs->(column3, column2.column3.column4, column3.column4)
column3 has 2 diffs
column2.column3.column4 has 1 diffs
column3.column4 has 1 diffs
---------------------------
This groups the COLUMN_DIFF columns according to which row the diffs occur in. "Cluster" is another name for "pattern of column names having diffs all in the same row". The first line tells us that there are 3 clusters, and which columns participate in each cluster. The column3 cluster has 2 diffs. That is, there are two rows where the only COLUMN_DIFFs are in column3: row 3 and row 6. The column2.column3.column4 cluster has 1 diff: row 4. Finally, the column3.column4 cluster has 1 diff: row 5. Column diff clusters are useful for spotting patterns of linked or related column diffs, which can be helpful in understanding the origin of diffs.
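Both the per-column summary and the clustering amount to grouping the same 7 column diffs in two different ways. The sketch below shows that idea in Python; it is an illustration under the assumption that a cluster is named by joining its sorted column names with dots, and `COLUMN_DIFFS`, `column_summary`, and `cluster_summary` are hypothetical names, not DK identifiers.

```python
from collections import Counter, defaultdict

# The 7 column diffs from TestCase 23, as (row key, column name).
COLUMN_DIFFS = [
    (3, "column3"),
    (4, "column2"), (4, "column3"), (4, "column4"),
    (5, "column3"), (5, "column4"),
    (6, "column3"),
]


def column_summary(diffs):
    """Count diffs per column, most frequent first."""
    return Counter(column for _, column in diffs).most_common()


def cluster_summary(diffs):
    """Count rows per cluster; a cluster is the set of columns differing in one row."""
    columns_by_row = defaultdict(list)
    for row, column in diffs:
        columns_by_row[row].append(column)
    # Name each cluster by its sorted column names, joined with dots.
    return Counter(".".join(sorted(columns)) for columns in columns_by_row.values())


for column, count in column_summary(COLUMN_DIFFS):
    print(f"{column} has {count} diffs")
for cluster, count in cluster_summary(COLUMN_DIFFS).items():
    print(f"{cluster} has {count} diffs")
```

Grouping by column answers "which columns are noisy", while grouping by row pattern answers "which columns tend to change together", which is why the two sections report different counts for column3.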
Friday, September 10, 2010
diff'ng CLOBs
0.6.10 introduced a new default behavior for CLOB diff'ng. CLOBs usually represent formatted text. When diff'ng formatted text, users typically would like certain incidental aspects of the formatting to be ignored. So by default, CLOBs are now diff'd in a way that is insensitive to both *nix and Windows newlines (\n and \r, in any combination).
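A newline-insensitive comparison of this kind might look like the following Python sketch. It assumes, going only by the wording of issue 28 below, that newline characters are replaced with spaces before comparing; treating each run of \r and \n as a single space is my assumption, and `normalize_clob` and `clobs_match` are hypothetical names rather than DK identifiers.

```python
import re


def normalize_clob(text: str) -> str:
    """Replace each run of \\r and/or \\n with one space (assumed behavior)."""
    return re.sub(r"[\r\n]+", " ", text)


def clobs_match(lhs: str, rhs: str) -> bool:
    """Compare two CLOB values while ignoring the newline convention."""
    return normalize_clob(lhs) == normalize_clob(rhs)


unix_text = "line one\nline two"
windows_text = "line one\r\nline two"
print(clobs_match(unix_text, windows_text))
```

Under this assumption a blank line also collapses to one space, so the sketch ignores differences in the number of consecutive newlines as well as their style.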
0.6.10 fixes the following issues
ID | Type | Status | Priority | Summary | AllLabels |
21 | Enhancement | Fixed | Critical | FileSink should be able to produce summaries | Type-Enhancement, Priority-Critical, OpSys-All |
22 | Defect | Fixed | High | displayColumnNames should be validated | Type-Defect, Priority-High, OpSys-All |
23 | Enhancement | Fixed | High | extend TestCaseRunner to test for failures, exceptions | Type-Enhancement, Priority-High, OpSys-All |
24 | Defect | Fixed | Medium | add cluster information to column diff summary | Type-Defect, Priority-Medium |
25 | Enhancement | Fixed | High | add group by (column list) option to Sink summary | Type-Enhancement, Priority-High, OpSys-All |
28 | Defect | Fixed | High | Replace newline characters with spaces in clobs | Type-Defect, Priority-High |
Wednesday, September 1, 2010
0.6.9 fixes the following issues
ID | Type | Priority | Summary | AllLabels |
13 | Enhancement | High | ant build target to execute JUnit tests | Type-Enhancement, Priority-High, OpSys-All |
14 | Defect | High | ant build target to execute TestCases | Type-Defect, Priority-High, OpSys-All |
15 | Enhancement | Medium | add elapsed diff time to user output from standalone app | Type-Enhancement, Priority-Medium, OpSys-All |
16 | Defect | Medium | add diff progress indicator to output from standalone app | Type-Defect, Priority-Medium, OpSys-All |
17 | Defect | Critical | MagicPlan does not accept diffKind parameter | Type-Defect, Priority-Critical, OpSys-All |
18 | Defect | Critical | maxDiffs property does not work in MagicPlan | Type-Defect, Priority-Critical, OpSys-All |