Splitting a table imported from Word?

Submitted by: Brent Hartwig; last updated: 2006-08-04

Brent Hartwig answers a question about importing a table from a Microsoft Word document, and shares a "post import hook" function.

=Question=

Our customer wants to import a Word table into Epic (into the CALS table model). I am not familiar with Epic's Interchange. The source table is very complex: it is set up as a single table, and the customer expects the import routine to break it into separate tables at a particular breaking point. Also, the Word table has a row that serves as a header and recurs at this breaking point. The customer wants this row stripped out during the import routine.

My question is: does it sound like Interchange can do this?

=Brent Hartwig answers=

I found (at least with Epic 4.4K and Word 2000-2002) that each Word table produces a single //table/tgroup/tbody structure. Any modification to that table structure had to be made within the postimporthook.
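Before writing any restructuring logic, it can help to confirm what Interchange actually produced. The Java sketch below counts the `tgroup`, `tbody`, and body `row` elements under each `//table` using the JDK's standard `javax.xml.xpath` API on a standalone XML string. It does not use Arbortext's AOM, the class and method names are hypothetical, and it assumes unprefixed CALS element names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: summarizes the CALS structure of every //table.
final class TableStructureReport {

    // Evaluates an XPath count() expression relative to a context node.
    static int count(XPath xp, String expr, Node context) throws XPathExpressionException {
        return ((Number) xp.evaluate(expr, context, XPathConstants.NUMBER)).intValue();
    }

    // Returns one line per table: its tgroup, tbody and body-row counts.
    static String report(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xp = XPathFactory.newInstance().newXPath();
            NodeList tables = (NodeList) xp.evaluate("//table", doc, XPathConstants.NODESET);
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < tables.getLength(); i++) {
                Node t = tables.item(i);
                out.append("table ").append(i + 1)
                   .append(": tgroup=").append(count(xp, "count(tgroup)", t))
                   .append(" tbody=").append(count(xp, "count(tgroup/tbody)", t))
                   .append(" rows=").append(count(xp, "count(tgroup/tbody/row)", t))
                   .append('\n');
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(report("<doc><table><tgroup cols='2'><tbody>"
                + "<row><entry>A</entry><entry>B</entry></row>"
                + "<row><entry>1</entry><entry>2</entry></row>"
                + "</tbody></tgroup></table></doc>"));
    }
}
```

For an imported Word table you would expect each line to read `tgroup=1 tbody=1`, with all the Word rows (including any recurring header rows) counted under that one `tbody`.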

''If the header row you mentioned marks the point at which the table needs to split, you will need to identify that row programmatically. Once you can, find the first one and (a) copy the table, everything except the rows, to a valid point after the table you're in; (b) delete the "split indicator" row; (c) cut all rows after it; (d) paste them into the table shell copied in step (a); (e) search for the next "split indicator" row in the new table, and repeat. Because step (b) deletes each indicator row, this also strips the recurring header row your customer wants removed.''
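Steps (a) through (e) map fairly directly onto DOM operations. The following Java sketch applies them to a standalone document through the JDK's standard `org.w3c.dom` API rather than Arbortext's AOM, so how you obtain the table inside Epic is not shown. It assumes a plain `table/tgroup/tbody/row` structure and, as a hypothetical identification rule, that a split-indicator row is any body row other than the first whose whitespace-normalized text equals a given string. `CalsTableSplitter` and its methods are illustrative names, not an existing API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: splits a CALS table at recurring "split indicator" rows.
final class CalsTableSplitter {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    // First child element of parent with the given name, or null.
    static Element firstChild(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && name.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        return null;
    }

    // Assumes every table has a tgroup containing a tbody.
    static Element tbodyOf(Element table) {
        return firstChild(firstChild(table, "tgroup"), "tbody");
    }

    // All <row> children of a tbody, in document order.
    static List<Element> rows(Element tbody) {
        List<Element> result = new ArrayList<>();
        for (Node n = tbody.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && "row".equals(n.getNodeName())) {
                result.add((Element) n);
            }
        }
        return result;
    }

    // Text content with runs of whitespace collapsed.
    static String text(Node n) {
        return n.getTextContent().replaceAll("\\s+", " ").trim();
    }

    // Splits the table in place; returns how many new tables were created.
    static int split(Element table, String indicatorText) {
        int created = 0;
        Element current = table;
        while (true) {
            Element tbody = tbodyOf(current);
            List<Element> bodyRows = rows(tbody);
            int idx = -1;
            // Start at 1 so a header row at the very top is kept (assumption).
            for (int i = 1; i < bodyRows.size() && idx < 0; i++) {
                if (text(bodyRows.get(i)).equals(indicatorText)) {
                    idx = i;
                }
            }
            if (idx < 0) {
                return created; // no more split points
            }
            List<Element> following = bodyRows.subList(idx + 1, bodyRows.size());
            tbody.removeChild(bodyRows.get(idx)); // (b) drop the split-indicator row
            if (following.isEmpty()) {
                return created; // indicator was the last row; nothing to split off
            }
            // (a) copy the table minus its body rows (colspec/thead are kept).
            Element shell = (Element) current.cloneNode(true);
            shell.removeAttribute("id"); // avoid a duplicate table ID
            Element shellBody = tbodyOf(shell);
            for (Element r : rows(shellBody)) {
                shellBody.removeChild(r);
            }
            current.getParentNode().insertBefore(shell, current.getNextSibling());
            // (c) + (d) move the remaining rows into the shell.
            for (Element r : following) {
                shellBody.appendChild(r);
            }
            created++;
            current = shell; // (e) keep searching in the new table
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<doc><table id='t1'><tgroup cols='1'><tbody>"
                + "<row><entry>Item</entry></row><row><entry>a</entry></row>"
                + "<row><entry>Item</entry></row><row><entry>b</entry></row>"
                + "</tbody></tgroup></table></doc>");
        Element table = (Element) doc.getElementsByTagName("table").item(0);
        System.out.println("new tables: " + split(table, "Item"));
    }
}
```

On the sample in `main`, the one four-row table becomes two tables (`Item`/`a` and `b`) and the second `Item` row is gone. Only the shell's top-level `id` is removed; IDs on nested elements such as `tgroup` or header entries would still need handling in a real document.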

I scanned the ACL table functions and didn't see anything that made this much easier. I did find tbl_grid_split, but no tbl_split. I also tried a recursive split between two CALS rows, but even with context rules turned off, Epic would not perform the first split.

Within the postimporthook you are not limited to ACL. An alternative is Java: manipulate the DOM via the AOM (Arbortext Object Model).

If someone can think of an easier way, please share.

As encouragement: Interchange has supported most of what I needed it to do. There will, however, be some table-specific issues that Interchange (and/or the versions of Word you support) will not let you resolve programmatically. We drew our line of support short of anything that would require us to traverse the Word document after Interchange was done with it.

Below is my postimporthook, provided to show the kinds of things I did in this hook. I cut out the repetitive parts (the same function called with different XPaths).

```
#
# Entry point after (Word to Intermediate XML) import. Clean up the XML and
# force a completeness check. Invoked after file imported but before file
# opened in editor.
#
# Beyond clean up, this function is responsible for the following:
#  1. Split the para element containing //para//split-para nodes; thus,
#     providing support for multiple paragraphs within list items. Opted not to
#     limit to //item/para nodes given the XSLT should not receive any
#      elements.
#  2. Move graphic files and change XML to use relative path to new location;
#     thus, providing support for relative image paths.
#  3. Add response-ident attribute to //fib-response nodes, defining the
#     given fib-response's ordinal of its fib-item.
#  4. SCR 4209: Convert  to
#     . Part of workaround for Interchange
#     limitation.
#  5. SCR 4440: Split xlink:href attribute value of, setting 5 of
#     its attributes.
#
# WARNING: Given this hook is called before the document is loaded into a
#          window, some built-in variables are not yet set. For example,
#          $main::filename is an empty string in this hook; use doc_path(
#          $doc ) instead. (ATI29270)
#
# WARNING: It is imperative that the pub caret PI in the Intermediate XML
#          template does not trail the end tag; rather, it should be between
#          the start and end tags. (SCR3652 and ATI30960)
#
function postImportHook( doc ) {
    # Restrict to intermediate instances.
    if ( doc_type( doc ) == package_name ) {
        current_doc( doc );
        local win = current_window;
        local xpath;
        local arr[];

        # Update user
        uWindow::setMessage( "Performing Post-Import Cleanup...", win );

        # Counters
        local modifiedCnt = 0; # Counter for modified nodes
        local totalCnt    = 0; # Counter for total nodes
        local optionalCnt = 0; # Optional counter

        uLog::setLogMsgPrefix; # reset/default
        uLog::add( "START - post-import cleanup" );
        uLog::setLogMsgPrefix( "\t" );

        # Extract  from , scraping
        # any residual content.
        xpath = "//inline-graphic-temp";
        modifiedCnt = processInlineGraphics( xpath, doc );
        uLog::add( "Total \"$xpath\" nodes processed: " . modifiedCnt );

        # Remove empty elements. Empty elements are created
        # when a Word doc begins at the part level even when user selects
        # the part template.
        xpath = "//part";
        totalCnt = 0;
        modifiedCnt = 0;
        removeEmptyNodes( xpath, totalCnt, modifiedCnt, optionalCnt );
        uLog::add( "Empty \"$xpath\" nodes deleted: " . modifiedCnt );
        uLog::add( "Total \"$xpath\" nodes processed: " . totalCnt );

        # Split based on
        modifiedCnt = splitParas( doc = current_doc );
        uLog::add( "Total \"para\" nodes split: " . modifiedCnt );

        # Convert soft return characters into  elements
        modifiedCnt = processSoftReturns( totalCnt, doc );
        uLog::add( "Converted soft returns: " . modifiedCnt );
        uLog::add( "Total soft returns processed: " . totalCnt );

        # Scrub elements, removing leading tab characters and
        #    empty
        # elements. Tab removal necessary as tabs from list items are imported;
        # see ATI case# 27490 for more info.
        xpath = "//para";
        totalCnt = 0;
        modifiedCnt = 0;
        optionalCnt = 0;
        removeEmptyNodes( xpath, totalCnt, modifiedCnt, optionalCnt, 1 );
        uLog::add( "Empty \"$xpath\" nodes deleted: $modifiedCnt" );
        uLog::add( "Leading tabs removed from \"$xpath\" nodes: $optionalCnt" );
        uLog::add( "Total \"$xpath\" nodes processed: $totalCnt" );

        # Convert  elements into  elements. The  is created
        # to support if-ancestor tests, preventing underscore from coming
        # through as well.
        xpath = "//double-underscore";
        delete( arr );
        arr[ "styles" ] = "doubleunderscore"; # Attribute name/value pair
        modifiedCnt = changeNodes( xpath, "style", arr );
        uLog::add( "Total \"$xpath\" nodes converted: " . modifiedCnt );

        # Process all nodes, setting 5 of its attributes based on
        # its current xlink:href attribute value.
        processXrefs( doc );

        # Merge back-to-back //hyperlink elements where the first has an
        # "xlink:href" attribute and the following ones do not. Back-to-back
        # elements are created when the character-style
        # combination changes yet still includes the "Link Text" style or
        # when the "Link Text" style contains cross-references.
        xpath = "//hyperlink";
        totalCnt = 0;
        modifiedCnt = 0;
        mergeNodes( xpath, totalCnt, modifiedCnt, doc, \
                    HYPERLINK_ATTR_NAME_HREF );
        uLog::add( "Pre-Merge \"$xpath\" node count: $totalCnt" );
        uLog::add( "Post-Merge \"$xpath\" node count: " . \
                   ( totalCnt - modifiedCnt ) );

        # Merge back-to-back //fib-item//table//fib-response elements.
        # Back-to-back  elements are created when the
        # character-style combination changes yet still includes the
        # "FIB Answer" style.
        xpath = "//fib-item//table//fib-response";
        totalCnt = 0;
        modifiedCnt = 0;
        mergeNodes( xpath, totalCnt, modifiedCnt );
        uLog::add( "Pre-Merge \"$xpath\" node count: $totalCnt" );
        uLog::add( "Post-Merge \"$xpath\" node count: " . \
                   ( totalCnt - modifiedCnt ) );

        # See if we need to move some graphics.
        totalCnt = 0;
        modifiedCnt = 0;
        optionalCnt = 0;
        moveGraphics( doc, totalCnt, modifiedCnt, optionalCnt );
        uLog::add( "Graphic files moved: $optionalCnt" );
        uLog::add( "Graphic references updated: $modifiedCnt" );
        uLog::add( "Total graphic references: $totalCnt" );

        # Add id to //fib-response nodes.
        modifiedCnt = addFIBResponseIds;
        xpath = "//$FIB_ITEM_ELEMENT_NAME//$FIB_RESPONSE_ELEMENT_NAME"; # log msg only
        uLog::add( "Set the \"$FIB_RESPONSE_ATTR_NAME\" attribute on $modifiedCnt \"$xpath\" nodes." );

        # Place caret at the top of the file
        if ( oid_valid( oid_first ) ) {
            goto_oid( oid_first );
        }

        # Save this function's edits
        execute( "save;" );

        # Clear our message
        uWindow::setMessage( " ", win );

        # Perform completeness check.
        check_completeness -full;

        uLog::setLogMsgPrefix; # reset/default
        uLog::add( "END - post-import cleanup" );
    }
}
```
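The hook's comments describe merging back-to-back //hyperlink elements when the first has an "xlink:href" attribute and the following ones do not, but the mergeNodes helper itself is not shown. Purely as an illustration of one way such a merge could work, here is a Java DOM sketch. The class name, the non-namespace-aware parse (so `xlink:href` is matched as a literal attribute name), and the rule that "back-to-back" means adjacent element siblings with no text node between them are all assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: merges back-to-back <hyperlink> siblings.
final class HyperlinkMerger {

    // Literal attribute name; assumes a non-namespace-aware DOM.
    static final String HREF = "xlink:href";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Folds each run of adjacent <hyperlink> elements lacking xlink:href into
    // the preceding <hyperlink> that has one. Returns the elements removed.
    static int merge(Document doc) {
        NodeList live = doc.getElementsByTagName("hyperlink");
        List<Element> links = new ArrayList<>();
        for (int i = 0; i < live.getLength(); i++) {
            links.add((Element) live.item(i)); // snapshot; the live list shrinks
        }
        int removed = 0;
        for (Element first : links) {
            if (first.getParentNode() == null || !first.hasAttribute(HREF)) {
                continue; // already merged away, or not the start of a run
            }
            Node next = first.getNextSibling();
            while (next instanceof Element
                    && "hyperlink".equals(next.getNodeName())
                    && !((Element) next).hasAttribute(HREF)) {
                Element follower = (Element) next;
                next = follower.getNextSibling();
                while (follower.getFirstChild() != null) {
                    first.appendChild(follower.getFirstChild()); // move content
                }
                follower.getParentNode().removeChild(follower);
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Document doc = parse("<p xmlns:xlink='http://www.w3.org/1999/xlink'>"
                + "<hyperlink xlink:href='#s1'>See </hyperlink>"
                + "<hyperlink>Section 1</hyperlink> and <hyperlink>x</hyperlink></p>");
        System.out.println("merged: " + merge(doc));
        System.out.println(doc.getElementsByTagName("hyperlink").item(0).getTextContent());
    }
}
```

In the `main` example only the directly adjacent `Section 1` link is folded into `See `; the last `<hyperlink>` follows a text node, so under this sketch's adjacency rule it is left alone.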