CODT 1.0 Atlantic

Atlantic is the codename for the upcoming general availability version of the Configurable Ontology to Data model Transformation, CODT.

Ontology Graph to Entity Replationship diagram

The Enterprise Data Management Council, EDMC created the Financial Industry Business Ontology as an industry-standard business conceptual model.

FIB-DM, the Financial Industry Business Data Model, opened the finance industry standard ontology for data architects and modelers. More than four hundred people from over a hundred financial institutions downloaded the FIBO data model.

CODT is the patent-pending technology used to derive the enterprise data model from the industry-standard ontology. Last year’s article described the mappings and outcome; the generated PowerDesigner model. Watch the Semantics for Data Architects YouTube videos for a detailed look at model structure and content. This second article explains how the transformation works.

Atlantic enables FIB-DM commercial licensees to transform their proprietary FIBO extensions and other ontologies into data models.

Finally, Semantics for Large Banks

FIB-DM marketing has been targeting midsize financial institutions. As external FIBO product page on the EDM Council website states:

Many midsize EDMC members want to leverage the industry standard, but don’t have ontology tooling, databases, and the human expertise inhouse yet.

“Semantics for midsize banks” is the most popular data model resource with one hundred thirty views on YouTube and three hundred downloads of the PowerPoint from FIB-DM and the Bank Ontology website.

Asset size, as in the EDM Council membership tiers, Large meaning over $200 billion, is a poor proxy for sophistication in semantic technologies. Some “midsize” financial institutions are very advanced in FIBO implementation. Some global financial institutions still don’t have an RDF-store in production or rolled out the FIBO across the enterprise.

Not invented here - stone age cartoon by Oxford Creativity

Anyhow, the majority of financial institutions downloading FIB-DM are smaller banks – so far, only five global banks downloaded the FIBO data model. Most large banks already developed their proprietary transformation and data models derived from the ontology. I invite you to run a PowerDesigner/ERwin complete compare of your FIBO Data Model vs. FIB-DM. Does your model have the full FIBO documentation? How does it handle inverse object properties and complex class restrictions?

Regardless of size, once a financial institution customizes and extends the Financial Industry Business Ontology, they need to update their FIBO data model.

A “Semantics for Large banks” education PowerPoint and video come out with the software release.

Semantic Enterprise Information Architecture

SEIA, the Semantic Enterprise information Architecture replaces the enterprise data (or class) model at the apex of model-driven development with the ontology.

Semantic Enterprise Information Architecture diagram
Semantic Enterprise Information Architecture.

When the Enterprise Data Management Council (EDMC) created the financial industry standard, they choose Ontology Web Language (OWL) rather than the relational model. I fully support the EDMC’s choice, because the semantic model expresses business rules, including values and multiple-entity joins.

In SEIA, transformations derive the enterprise data model, FIB-DM, object models, FIBUM and in the future, messages, and process models.

We recommend using RDF/OWL, with the business-friendly FIB Concept Maps rather than a conceptual data model to define and describe the enterprise for non-technical users. For Financial Institutions, the enterprise ontology must derive from the industry standard, the FIBO.

You leverage the industry standard and using the same names, concepts, definitions, and design patterns across the enterprise.

As the EDM Council FIB-DM page states:

Common names and definitions across the enterprise facilitate integration between applications, RDBMS and RDF-stores.

Of course, in order to make the arrows in the pretty picture work, you want to propagate changes to your Enterprise Ontology to downstream implementation models.

Metadata Sets

The approach to derive a data model from an ontology is basic ETL

The diagram shows the transformation from RDF/OWL to PowerDesigner with ETL in between.
CODT is ETL

We extract metadata from the ontology, transform and load into the data modeling tool or repository. The preferred extraction method is to query an ontology on a Triple Store or development platform. We issue commands in SPARQL, the query language and store the result sets.

The load takes files in a format consumable for the data modeling tool. PowerDesigner, for example, has the functionality to import MS-Excel files.

The complex part of CODT is the transformation from ontology to Entity-Relationship metamodel.

Metadata is data that provides information about other data. A data set is a collection of data records. As defined herein, a Metadata Set is metadata stored in data sets.

An RDBMS store information about the database schema in metadata sets, the system tables. For example, an RDBMS SYSCOLUMNS table comprises COLUMN_NAME, TABLE_NAME, DATATYPE, NULLS, and other information about the column. COLUMN_NAME and TABLE_NAME define a unique key for the SYSCOLUMNS table.

Likewise, the CODT metadata sets are isomorphic representations of ontology, entity-relationship, and data modeling tool-specific metadata.

A diagram of the the Configurable Ontology to Data model Transformation with Metadata Sets for Ontology, generic Entity Relationship model and tool-specific metadata.
CODT Metadata Set transformation

The transformation works in two steps:

  1. Transforming Ontology Metadata into generic Entity-Relationship metadata
  2. Transforming the Generic ER into Tool specific metadata.

In the diagram, the same generic ER Metadata Set is the source for both PowerDesigner and Sparx EA metadata sets.

MS-Excel Application

The patent drawing shows the CODT Extraction component interfacing with the Ontology Platform, Protege, Topbraid, or any RDF-Store, and the Load component importing metadata into the modeling tool or repository. Internally there are Configuration, Analytics, and a User Interface component.

CODT system components

Any platform and programming language can implement the system, metadata sets, and method. Microsoft Excel is the tool of choice to view and analyze tabular data. Every data architect has Excel and knows how to view, search, and filter. Hence MS-Excel is not only a fast prototyping tool for the CODT Metadata Sets but also makes the transformation easy to deploy.

Atlantic CODT installs with seven Excel Workbooks, Configuration, Analytics (of the source ontology), a User Interface for orchestration and one-click transformation, an Excel Add-in for common VBA functions, and the three metadata sets.

In this article, we look at the core of the transformation process:

ComponentMetadata SetExcel Workbook
ExtractionOntology MetadataOntology MDS.xlsx
TransformationGeneric ER Metadata Entity Relationship MDS.xlsx
LoadPowerDesignerPowerDesigner MDS.xlsx

In the following paragraphs, we follow the easiest transformation, ontology class to data model entity, to understand how CODT works. The PowerDesigner Metadata Set for import into the modeling tool is a simple Excel spreadsheet with columns for Entity Code, Name, Comment, and lineage attributes Prefix, Localname, and URI. PowerDessigner imports the spreadsheet and creates a data model entity for every row.

Future articles and Semantics for Large Banks will explain the very complex transformation of ontology object properties into data model associative entities, and the configuration options.

Extract SPARQL query

Some data modeling tools like Sparx Enterprise Architect and IBM Infosphere Data Architect have imports of RDF/OWL files. The file import approach parses the ontology file and creates data model objects as they go along. CODT doesn’t parse files. I believe that your ontology development platform, Protege or Topbraid, or your Graph, RDF-Store DB, already invented the parsing wheel. Hence, Atlantic CODT uses the ontology query language, SPARQL on a FIBO graph to extract the metadata.

The CODT Configuration component generates a simple SPARQL query – all named classes. We query the class name, qualified name, the namespace (URI), and as per the CODT FIBO configuration, the SKOS definition or the class.

# Owl Classes.rq
SELECT ?class ?qname ?namespace ?skos_definition
WHERE {
	?class a owl:Class .
 	BIND(afn:namespace(?class) AS ?namespace) .
 	FILTER (smf:isBound(?namespace) ).
	BIND (smf:qname(?class) AS ?qname ) .
	OPTIONAL { ?class skos:definition ?skos_definition }
	FILTER (?class NOT IN (owl:Nothing,  owl:Thing))
}

You do not want unmanned classes, as in anonymous class restrictions in your data model. Hence, we only extract classes that the EDM Council FIBO teams have named by stipulating a URI.

We also filter the ontology root classes owl:Nothing and owl:Thing that won’t become entities in a data model.

The raw query result set is a comma-separated value, CSV text file with a tabulator as the delimiter.

A notepad screnshot of the Classes SPARQL query result set.
qrClasses – raw results of the Classes SPARQL query

The above query and resultset are for the Topbraid Composer ontology development platform. You can adapt the query for your SPARQL dialect to produce the same metadata extract.

Extraction – Ontology MDS Workbook

The Ontology Metadata Set is an import of the raw CSV into MS-EXcel.

The ontology metadata workbook imports the raw extract and performs simple format conversions from the raw result set.

The CODT ontology Metadata Set in MS-Excel
Ontology MDS – Classes

We have the Class, Qualified Name, Namespace, the CODT configured main descriptive annotation property, Prefix, Localname, and FIBO URI.

Notice the other Excel tabs, ontology metadata sets for Object Properties, Domain, Range, Sub-class, and Sub-property. Upcoming articles and PowerPoint educations decks will explain them in detail.

Entity Relationship MDS Workbook

The Entity-Relationship (ER) workbook is a generic Conceptual Data Model representation. The ER MDS is independent of data modeling tool specifics.

The class-to-entity transformation is straightforward. The Entity Code must uniquely identify an entity in the ER Metadata Set. The Ontology MDS qualified name (= prefix+localname) is already unique for an ontology class. Hence, the configuration for the FIBO transformation simply uses the QName as the entity code without further transformation. Likewise, Prefix, Localname, and Namespace are copies of the Ontology MDS source.

The CODT Entity-Relationship Metadata Set in MS-Excel
CODT Entity Relationship MDS – Entities

The only real transformation is a string conversion from the FIBO Camel Case Localname to Conceptual/Logical Data Model naming convention. A UNCAMEL() function in the CODT add-in finds lower to uppercase changes and inserts a space. BoardAgreement becomes Board Agreement.

Data Modeling tools have a prominent Comment or Definition properties for all model objects. A Configuration option tells CODT what Ontology Metadata Set column to use. All FIBO classes have a SKOS Definition

The consistent design for all MDS has sources columns to the left and derived columns like the Entity Name to the right.

Other tabs in the workbook transform supertype/subtype relations, associative entities derived from object properties, and attributes derived from data properties.

PowerDesigner MDS Workbook

The data model tool-specific MDS uses names and formats required for a direct load. In other words, PowerDesigner imports this MDS workbook.

The CODT PowerDesigner import Metadata Set in MS-Excel
PowerDesigner MDS – Entity

For the trivial import of Entities, we don’t need further transformation. The Code, Comment, Prefix, Localname, URI, and (logical) Name are a direct copy of the ER MDS values.

The other tabs, Inheritance, Inheritance Links, Association reflect PowerDesigner object names that are different from the generic ER names.

PowerDesigner Import

SAP PowerDesigner has a very robust metadata import. Rather than CSV files the tool imports MS-Excel spreadsheets. We define a mapping from the Excel tab to the PD model object (entity) and from columns to model object properties.

A PowerDesigner screenshot of the Excel import properties for the CODT metadata set.
PowerDesigner Excel Import – Entities

The Imported File specified the path and Excel filename, the Mapping Description lists column and properties.

The import also facilities orchestration. We can import several tabs withing a single import job.

The screenshot below shows the outcome of the entity import. The tool created an entity for every row in the PowerDesigner MDS.

A PowerDesigner screenshot of the imported entities and the Board Agreement properties.
CODT PowerDesigner imported entities

The Entity Properties show Name, Code, and Comment of the Board Agreement.

Performance is an issue with the PowerDesigner import. It takes more than an hour to import 2000 entities and several hours for Relationships and Relationship links.

Shortcomings the CODT “working product”

The pre-Atlantic CODT, implemented in MS-Excel transformed the FIBO into a high-quality Enterprise Data Model. The FIB-DM non-provisional patent application with the United States Patent and Trademark Office (USPTO), #16985414, describes the invention offering a “working product”, CODT.

The “working product” was a prototype, the FIBO data model is the Proof of Concept. While good enough to demonstrate the model transformation for USPTO examiners, the prototype is not user-friendly. For example:

 ==IF(VLOOKUP(C54,'D:\Local Documents\Financial Regulation Ontology\Data Model\Query\Transformation[Object Properties.xlsx]Object Properties'!$A$1:$I$495,8)<>"",
VLOOKUP(C54,'D:\Local Documents\Financial Regulation Ontology\Data Model\Query\Transformation[Object Properties.xlsx]Object Properties'!$A$1:$I$495,8),"") 

Arcane Excel formulas and undocumented VBA code.

While fully implementing the invention, including the patent storage medium Metadata-Sets, and system comprising of ETL, configuration and analytics components, the method required too much operator manual action, copying data between metadata sets and extending formula ranges.

In short, a USPTO working product, but not ready for general availability.

Atlantic requirements

Just like FIB-DM is an enterprise data model, CODT remains enterprise-level software. There will never be a shrink wrap ontology to data model converter or a CODT import function within the data modeling tool.

  • Robust
  • Transparent
  • Easy
  • Configurable
  • Extensible

Atlantic CODT is an open platform, fully disclosing Metadata Sets, Excel formulas, Visual Basic for Applications (VBA), and Power Query M-language code.

Data Architects and Developers and easily extend the application supporting additional configuration options and OWL constructs.

Game-changing Microsoft Excel tables and Power Query

Tables and Power Query in the Atlantic version significantly reduced code, manual intervention, and improved readability.

Excel Tables

Excel tables identify a list of data with headers and provide additional functionality to manage and refer to the data.

For Atlantic, this means the formulas refer to meaningful column names rather than letters, digits, and the $ sign. For example, the Entity-Relationship MDS Name formula becomes: =UNCAMEL([@Localname])

Excel applies the formula automatically to the entire column. In other words, we don’t have to worry about named ranges and manually (or per macro) copy formulas.

Tables make the Metadata Set transparent and reduces errors.

Excel Power Queries

Power Query is data connectivity and transformation technology for external data sources. While there is no RDF-Store connector for Power Query available yet, CODT uses the technology to load the raw CSV query results and to move and transform data from Ontology to Entity-Relationship, and on to the tool-specific metadata set.

A Power Query screenshot of the Ontology Metadata Set connections.
CODT Ontology MDS – queries and connections

The screenshot shows our Classes tab in the Ontology MDS and its Queries & Connections.

Every tab in the workbook contains an Excel Table which populates from a Power Query on the SPARQL CSV result set.

A major benefit over the pre-Atlantic Text Import is the defined mapping of CSV to Excel Table column and powerful value transformation.

Furthermore, we can refresh individual or all queries in the workbook, if our raw ontology metadata changes.

We no longer have to open CSV files, copy values, format and transform the data.

Let’s take a closer look at the Class query.



The Power Query Editor formula bar shows our source file, the CSV Document CODT_HOME & "\Ontology Source\qrClasses.txt.

A Power Query screenshot of the Ontology Metadata Set queries.
CODT Ontology MDS Class query

The Applied steps list Source, type changes to the text, and to promote the first CSV record to table column headers.

M Programming language

Underneath the User interface lies M, a powerful ETL language. The snippet below shows the M-code of our Class query transformation.

    Source = Csv.Document(File.Contents(CODT_HOME & "Ontology SourceqrClasses.txt"),[Delimiter="	", Columns=4, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Changed Type", [PromoteAllScalars=true])

Just like SQL and SPARQL, M is a declarative language. We simply define the WHAT we want, without concern of HOW Power Query accomplishes the transformation.

For the Atlantic version of CODT, Power Query eliminated error-prone macros and transformation procedures.

Complex queries and transformations

Beyond simple transformations, Power Query fully supports Joins, Aggregations, and Transpositions.

The snapshot below is an outlook to upcoming Atlantic CODT articles and PowerPoints, and videos about the complex Object Property Transformations.

Ontology Object Properties are the most complex transformation challenge. The common mapping as ER Relationships is erroneous:

  1. Object properties, especially in the FIBO have a sub-property hierarchy. Hence we must transform into associative entities – not relationships.
  2. Inverse object properties do not transform into two relationships or associative entities. For example in a data model, we do not have two association identifies and is identified by between Legal Entity and LEI. You, the data architect must specify which object property (usually the active verb form) to transform.
  3. Various class restrictions must be consolidated into ER relationship cardinalities.

Queries rather than complex VLOOKUP() formulas in Atlantic make complex transformation rules transparent.

The Power Query Dependencies diagram shows the Ontology and the Entity-Relationship MDS – the arrows indicating a data source.

A MS PowerQuery scrennshot of CODT Ontology and ER Metada Sets
CODT Metadata Sets – PowerQuery dependencies

We visualize the query dependencies and hence the movement of metadata from the ontology graph into the data modeling tool.

Atlantic summary

For midsize FIB-DM licensees starting out with SEIA, Atlantic provides a single click transformation or your FIBO extensions into your FIBO Data Model. You can easily change configuration parameters to support your data modeling standards and naming convention.

Large Financial Institutions take Atlantic CODT source code to develop an deploy proprietary enterprise embodiments.

Ontology Source  Transformation System  Data Model  
TypeSubtypeExtractionOSApplication typeUser InterfaceData Model TypeModeling ToolTool Interface
Ontology platformDevelopment ToolSPARQLWindowsMS-ExcelWhite BoxConceptualPowerDesignerImport
RDF Store, Semantic EndpointLogicalERWinApplication Programming Interface
RDF/OWL filesLocalParserUnixStandaloneGraphicalPhysicalOther
World Wide WebObjectXML Metadata Exchange

 

Develop a Graph DB connector, or source the raw Ontology metadata from your RDF-Store in a batch job. Replace the modeling tool import with your proprietary synchronization of Enterprise Ontology and Data Model Repository.

Licensing Atlantic CODT

The re-coding from working product to full utilization of declarative language is still in progress. The upcoming major FIB-DM release, based on FIBO Q1/2020 new modules is the development testcase.

General availability of Atlantic CODT is scheduled for fall, this year, after pilot/POC with two banks.

In a few months, there will be an opportunity for one or two institutional licensees to try out Atlantic with your FIBO extensions and other ontologies.

Conclusion

Atlantic CODT enables the Semantic Enterprise Information Architecture.

With Atlantic, FIB-DM users transform their FIBO extensions and related proprietary or Open Source ontologies into data models.

Atlantic is robust, transparent, easy to use, configurable, and extensible.

Thanks for reading; you are welcome to comment on the LinkedIn version of this article or email me your questions.

Jurgen Ziemer, Ontologist and Data Architect at Jayzed Data Models
Email: jziemer@jayzed.com

References and further reading

  • Finance Ontology transformed into an Enterprise Data Model article
  • Semantics for Data Architects PowerPoint online presentation
  • FIBO on Enterprise Data Management Council website:
  • Collect, Combine, and Transform Data Using Power Query in Excel and Power BI 1st Edition by Gil Raviv
  • Excel 2019 Power Programming with VBA 1st Edition by Michael Alexander, Dick Kusleika
  • Microsoft Excel 2019 VBA and Macros (Business Skills) by Jelen, Bill
  • Stone age cartoon used with permission from Oxford Creativity, https://www.triz.co.uk