CODT 1.0 Atlantic

Atlantic is the codename for the latest version of the Configurable Ontology to Data Model Transformation, CODT.

Ontology Graph to Entity Replationship diagram

The Enterprise Data Management Council, EDMC created the Financial Industry Business Ontology as an industry-standard business conceptual model.

FIB-DM, the Financial Industry Business Data Model, opened the finance industry standard ontology for data architects and modelers. Over three thousand people and over one hundred banks downloaded the FIBO data model.

CODT is a patented technology that derives enterprise data models from industry-standard ontologies. Last year’s article described the mappings and outcome of the generated PowerDesigner model. Watch the Semantics for Data Architects YouTube videos for a detailed look at model structure and content. This second article explains how the transformation works.

Atlantic enables FIB-DM commercial licensees to transform their proprietary FIBO extensions and other ontologies into data models.

Finally, Semantics for Large Banks

FIB-DM marketing has been targeting midsize financial institutions. As the external FIBO product page on the EDM Council website states:

Many midsize EDMC members want to leverage the industry standard, but don’t have ontology tooling, databases, and the human expertise inhouse yet.

“Semantics for midsize banks” is the most popular data model resource, with one hundred thirty views on YouTube and three hundred downloads of the PowerPoint from FIB-DM and the Bank Ontology website.

As in the EDM Council membership tiers, asset size is large, meaning over $200 billion, and is a poor proxy for sophistication in semantic technologies. Some “midsize” financial institutions are very advanced in FIBO implementation. Some global financial institutions still don’t have an RDF store in production or have rolled out the FIBO across the enterprise.

Not invented here - stone age cartoon by Oxford Creativity

Most financial institutions downloading FIB-DM are smaller banks—so far, twenty global banks have downloaded the FIBO data model. Most large banks have already developed their proprietary transformation and data models derived from the ontology. I invite you to complete a PowerDesigner/ERwin comparison of your FIBO Data Model vs. FIB-DM. Does your model have the full FIBO documentation? How does it handle inverse object properties and complex class restrictions?

Regardless of size, once a financial institution customizes and extends the Financial Industry Business Ontology, it must update its FIBO data model.

A “Semantics for Large Banks” education PowerPoint and video came out with the software release.

Semantic Enterprise Information Architecture

The Semantic Enterprise Information Architecture SEIA replaces the enterprise data (or class) model at the apex of model-driven development with the ontology.

Semantic Enterprise Information Architecture diagram
Semantic Enterprise Information Architecture.

When the Enterprise Data Management Council (EDMC) created the financial industry standard, it chose Ontology Web Language (OWL) rather than the relational model. I fully support the EDMC’s choice because the semantic model expresses business rules, including values and multiple-entity joins.

In SEIA, transformations derive the enterprise data model, FIB-DM, object models, FIBUM, and, in the future, messages and process models.

We recommend using RDF/OWL, with the business-friendly FIB Concept Maps, rather than a conceptual data model to define and describe the enterprise for non-technical users. For Financial Institutions, the enterprise ontology must derive from the industry standard, the FIBO.

You leverage the industry standard and use the same names, concepts, definitions, and design patterns across the enterprise.

As the EDM Council FIB-DM page states:

Common names and definitions across the enterprise facilitate integration between applications, RDBMS and RDF-stores.

Of course, in order to make the arrows in the pretty picture work, you want to propagate changes to your Enterprise Ontology to downstream implementation models.

Metadata Sets

The approach to derive a data model from an ontology is basic ETL

The diagram shows the transformation from RDF/OWL to PowerDesigner with ETL in between.
CODT is ETL

We extract metadata from the ontology, transform it, and load it into the data modeling tool or repository. The preferred extraction method is to query an ontology on a Triple Store or development platform. We issue commands in SPARQL, the query language, and store the result sets.

The load takes files in a format consumable by the data modeling tool. PowerDesigner, for example, can import MS Excel files.

The complex part of CODT is the transformation from ontology to Entity-Relationship metamodel.

Metadata is data that provides information about other data. A data set is a collection of data records. As defined herein, a Metadata Set is metadata stored in data sets.

An RDBMS stores information about the database schema in metadata sets, the system tables. For example, an RDBMS SYSCOLUMNS table comprises COLUMN_NAME, TABLE_NAME, DATATYPE, NULLS, and other information about the column. COLUMN_NAME and TABLE_NAME define a unique key for the SYSCOLUMNS table.

Likewise, the CODT metadata sets are isomorphic representations of ontology, entity-relationship, and data modeling tool-specific metadata.

A diagram of the the Configurable Ontology to Data model Transformation with Metadata Sets for Ontology, generic Entity Relationship model and tool-specific metadata.
CODT Metadata Set transformation

The transformation works in two steps:

  1. Transforming Ontology Metadata into generic Entity-Relationship metadata
  2. Transforming the Generic ER into Tool-specific metadata.

The same generic ER Metadata Set in the diagram is the Source for PowerDesigner and Sparx EA metadata sets.

MS-Excel Application

The patent drawing shows the CODT Extraction component interfacing with the Ontology Platform, Protege, Topbraid, or any RDF-Store, and the Load component importing metadata into the modeling tool or repository. Internally, there are Configuration, Analytics, and a User Interface component.

FIG 2 is a UML component diagram of the CODT system and the two external systems: the ontology platform and the Data Modeling Tool.
CODT Patent FIG 2 System UML Component diagram (color)

Any platform and programming language can implement the system, metadata sets, and method. Microsoft Excel is the tool of choice for viewing and analyzing tabular data. Every data architect has Excel and knows how to view, search, and filter. Hence, MS Excel is a fast prototyping tool for the CODT Metadata Sets and makes the transformation easy to deploy.

Atlantic CODT installs four Excel Workbooks: Configuration, Ontology, Entity Relationship, PowerDesigner, and an Excel Add-in for common VBA functions.

In this article, we look at the core of the transformation process:

ComponentMetadata SetExcel Workbook
ExtractionOntology MetadataOntology MDS.xlsx
TransformationGeneric ER Metadata Entity Relationship MDS.xlsx
LoadPowerDesignerPowerDesigner MDS.xlsx

In the following paragraphs, we follow the easiest transformation, ontology class to data model entity, to understand how CODT works. The PowerDesigner Metadata Set for import into the modeling tool is a simple Excel spreadsheet with columns for Entity Code, Name, Comment, and lineage attributes Prefix, Localname, and URI. PowerDessigner imports the spreadsheet and creates a data model entity for every row.

Future articles and Semantics for Large Banks will explain the complex transformation of ontology object properties into data model associative entities and the configuration options.

Extract SPARQL query

Some data modeling tools like Sparx Enterprise Architect and IBM Infosphere Data Architect have imports of RDF/OWL files. The file import approach parses the ontology file and creates data model objects as they go along. CODT doesn’t parse files. Your ontology development platform, Protege or Topbraid, or your Graph, RDF-Store DB, already invented the parsing wheel. Hence, Atlantic CODT uses the ontology query language SPARQL on a FIBO graph to extract the metadata.

The CODT Configuration component generates a simple SPARQL query for all named classes. We query the class name, qualified name, namespace (URI), and, per the CODT FIBO configuration, the SKOS definition or class.

# Owl Classes.rq
SELECT ?class ?qname ?namespace ?skos_definition
WHERE {
	?class a owl:Class .
 	BIND(afn:namespace(?class) AS ?namespace) .
 	FILTER (smf:isBound(?namespace) ).
	BIND (smf:qname(?class) AS ?qname ) .
	OPTIONAL { ?class skos:definition ?skos_definition }
	FILTER (?class NOT IN (owl:Nothing,  owl:Thing))
}

You do not want unmanned classes, as in anonymous class restrictions in your data model. Hence, we only extract classes that the EDM Council FIBO teams have named by stipulating a URI.

We also filter the ontology root classes owl:Nothing and owl:Thing that won’t become entities in a data model.

The raw query result set is a comma-separated value, a CSV text file with a tabulator as the delimiter.

FIG 5 is a screenshot of the query results in a text editor, Windows Notepad.
CODT Patent FIG 5 Ontology CSV Extract screenshot.

The above query and resultset are for the Topbraid Composer ontology development platform. You can adapt the query for your SPARQL dialect to produce the same metadata extract.

Extraction – Ontology MDS Workbook

The Ontology Metadata Set is an import of the raw CSV into MS-EXcel.

The ontology metadata workbook imports the raw extract and performs simple format conversions from the raw result set.

FIG 6 is an MS Excel screenshot of the Ontology Metadata Set.
CODT Patent FIG 6 Ontology MDS MS Excel screenshot (color).

We have the Class, Qualified Name, Namespace, and SKOS definition, the CODT configured main descriptive annotation property.

Notice the other Excel tabs and ontology metadata sets for Object Properties, Domain, Range, Sub-class, and Sub-property. The CODT Patent Specification explains them in detail.

Entity Relationship MDS Workbook

The Entity-Relationship (ER) workbook is a generic Conceptual Data Model representation. The ER MDS is independent of data modeling tool specifics.

The class-to-entity transformation is straightforward. The Entity Code must uniquely identify an entity in the ER Metadata Set. The Ontology MDS qualified name (= prefix+localname) is already unique for an ontology class. Hence, the configuration for the FIBO transformation uses the QName as the entity code without further transformation. Likewise, Prefix, Localname, and Namespace are copies of the Ontology MDS source.

FIG 9 is a CODT screenshot of the Entity-Relationship MDS.
CODT Patent FIG 9 Entity Relationship MDS Excel screenshot (color)

The only real transformation is a string conversion from the FIBO Camel Case Localname to the Conceptual/Logical Data Model naming convention. A UNCAMEL() function in the CODT add-in finds lower to uppercase changes and inserts a space. BoardAgreement becomes Board Agreement.

Data Modeling tools have prominent Comment or Definition properties for all model objects. A Configuration option tells CODT what Ontology Metadata Set column to use. All FIBO classes have a SKOS Definition.

The consistent design for all MDS has source columns to the left and derived columns like the Entity Name to the right.

Other tabs in the workbook transform supertype/subtype relations, associative entities derived from object properties, and attributes derived from data properties.

PowerDesigner MDS Workbook

The data model tool-specific MDS uses names and formats required for a direct load. In other words, PowerDesigner imports this MDS workbook.

FIG 11 is a screenshot of the PowerDesigner_MDS in the CODT Excel implementation.
CODT Patent FIG 11 PowerDesigner MDS Excel screenshot (color)

For the trivial import of entities, we don’t need further transformation. The Code, Comment, Prefix, Localname, URI, and (logical) Name are direct copies of the ER MDS values.

The other tabs, Inheritance, Inheritance Links, and Association, reflect PowerDesigner object names that differ from generic ER names.

PowerDesigner Import

Unlike CSV files, SAP PowerDesigner has a robust metadata import, and the tool imports MS Excel spreadsheets. We define a mapping from the Excel tab to the PD model object (entity) and from columns to model object properties.

FIG 12 is a screenshot of the PowerDesigner data modeling tool.
CODT Patent FIG 12 PowerDesigner Import screenshot (color)

The Imported File specified the path and Excel filename, the Mapping Description lists column and properties.

The import also facilities orchestration. We can import several tabs withing a single import job.

The screenshot below shows the outcome of the entity import. The tool created an entity for every row in the PowerDesigner MDS.

A PowerDesigner screenshot of the imported entities and the Board Agreement properties.
CODT PowerDesigner imported entities

The Entity Properties show the Name, Code, and Comment of the Board Agreement.

The PowerDesigner import is performing poorly. It takes over an hour to import 2000 entities and several hours for Relationships and Relationship links.

Shortcomings of the CODT “working product”

The pre-Atlantic CODT, implemented in MS Excel, transformed the FIBO into a high-quality Enterprise Data Model. The FIB-DM non-provisional patent application with the United States Patent and Trademark Office (USPTO), #16985414, describes the invention as offering a “working product”, CODT.

The “working product” was a prototype, and the FIBO data model is the Proof of Concept. While good enough to demonstrate the model transformation for USPTO examiners, the prototype is not user-friendly. For example:

 ==IF(VLOOKUP(C54,'D:\Local Documents\Financial Regulation Ontology\Data Model\Query\Transformation[Object Properties.xlsx]Object Properties'!$A$1:$I$495,8)<>"",
VLOOKUP(C54,'D:\Local Documents\Financial Regulation Ontology\Data Model\Query\Transformation[Object Properties.xlsx]Object Properties'!$A$1:$I$495,8),"") 

Arcane Excel formulas and undocumented VBA code.

While fully implementing the invention, including the patented storage medium Metadata sets and a system comprising ETL, configuration, and analytics components, the method required too much operator manual action, copying data between metadata sets and extending formula ranges.

In short, it was a USPTO working product that was not ready for general availability.

Atlantic requirements

Just like FIB-DM is an enterprise data model, CODT remains enterprise-level software. The data modeling tool will never include a shrink-wrapped ontology-to-data model converter or a CODT import function.

  • Robust
  • Transparent
  • Easy
  • Configurable
  • Extensible

Atlantic CODT is an open platform that fully discloses Metadata Sets, Excel formulas, Visual Basic for Applications (VBA), and Power Query M-language code.

Data Architects and Developers can easily extend the application by supporting additional configuration options and OWL constructs.

Game-changing Microsoft Excel tables and Power Query

Tables and Power Query in the Atlantic version significantly reduced code, manual intervention and improved readability.

Excel Tables

Excel tables identify a list of data with headers and provide additional functionality to manage and refer to the data.

For Atlantic, the formulas refer to meaningful column names rather than letters, digits, and the $ sign. For example, the Entity-Relationship MDS Name formula becomes: =UNCAMEL([@Localname])

Excel applies the formula automatically to the entire column. In other words, we don’t have to worry about named ranges and manually (or per macro) copy formulas.

Tables make the Metadata Set transparent and reduce errors.

Excel Power Queries

Power Query is data connectivity and transformation technology for external data sources. While no RDF-Store connector for Power Query is available, CODT uses the technology to load the raw CSV query results and move and transform data from Ontology to Entity-Relationship onto the tool-specific metadata set.

A Power Query screenshot of the Ontology Metadata Set connections.
CODT Ontology MDS – queries and connections

The screenshot shows our Classes tab in the Ontology MDS and its Queries & Connections.

Every tab in the workbook contains an Excel Table, which populates from a Power Query on the SPARQL CSV result set.

A major benefit of the pre-Atlantic Text Import is the defined mapping of CSV to Excel Table columns and powerful value transformation.

Furthermore, we can refresh individual or all queries in the workbook if our raw ontology metadata changes.

We no longer have to open CSV files, copy values, format, and transform the data.

Let’s take a closer look at the Class query.


The Power Query Editor formula bar shows our source file, the CSV Document CODT_HOME & "\Ontology Source\qrClasses.txt.

FIG 8 shows the Class query in the PowerQuery Editor.
CODT Patent FIG 8 Ontology MDS in PowerQuery screenshot (color)

The Applied steps list the Source, type changes to the text, and promote the first CSV record to table column headers.

M Programming language

Underneath the User interface lies M, a powerful ETL language. The snippet below shows the M-code of our Class query transformation.

    Source = Csv.Document(File.Contents(CODT_HOME & "Ontology SourceqrClasses.txt"),[Delimiter="	", Columns=4, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Changed Type", [PromoteAllScalars=true])

Like SQL and SPARQL, M is a declarative language. We define what we want without concern for how Power Query accomplishes the transformation.

For the Atlantic version of CODT, Power Query eliminated error-prone macros and transformation procedures.

Complex queries and transformations

Beyond simple transformations, Power Query fully supports Joins, Aggregations, and Transpositions.

The snapshot below is an outlook to upcoming Atlantic CODT articles and PowerPoints, and videos about the complex Object Property Transformations.

Ontology Object Properties are the most complex transformation challenge. The common mapping of ER Relationships is erroneous:

  1. Object properties, especially in the FIBO, have a sub-property hierarchy. Hence, we must transform them into associative entities, not relationships.
  2. Inverse object properties do not transform into two relationships or associative entities. For example, we do not have two association identifiers in a data model, and we are identified between Legal Entity and LEI. You, the data architect, must specify which object property (usually the active verb form) to transform.
  3. Various class restrictions must consolidate into ER relationship cardinalities.

Queries rather than complex VLOOKUP() formulas in Atlantic make complex transformation rules transparent.

The Power Query Dependencies diagram shows the Ontology and the Entity-Relationship MDS – the arrows indicating a data source.

A MS PowerQuery scrennshot of CODT Ontology and ER Metada Sets
CODT Metadata Sets – PowerQuery dependencies

We visualize the query dependencies and the metadata movement from the ontology graph into the data modeling tool.

Atlantic summary

For midsize FIB-DM licensees starting with SEIA, Atlantic provides a single-click transformation of your FIBO extensions into your FIBO Data Model. You can easily change configuration parameters to support data modeling standards and naming conventions.

Large financial institutions use the Atlantic CODT source code to develop and deploy proprietary enterprise embodiments.

Develop a Graph DB connector or source the raw Ontology metadata from your RDF store in a batch job. Replace the modeling tool import with your proprietary Enterprise Ontology and Data Model Repository synchronization.

Licensing Atlantic CODT

The general availability of Atlantic CODT is scheduled for fall this year after pilot/POC with two banks.

In a few months, one or two institutional licensees can try out Atlantic with your FIBO extensions and other ontologies.

Conclusion

Atlantic CODT enables the Semantic Enterprise Information Architecture.

With Atlantic, FIB-DM users transform their FIBO extensions and related proprietary or Open Source ontologies into data models.

Atlantic is robust, transparent, easy to use, configurable, and extensible.

Thanks for reading; you are welcome to comment on the LinkedIn version of this article or email me your questions.

Jurgen Ziemer, Ontologist and Data Architect at Jayzed Data Models
Email: jziemer@jayzed.com

References and further reading

  • NEW! Semantics for extra Large Banks – the CODT Tutorial video
    and the CODT Tutorial PowerPoint
  • Finance Ontology transformed into an Enterprise Data Model article
  • Semantics for Data Architects PowerPoint online presentation
  • FIBO on Enterprise Data Management Council website:
  • Collect, Combine, and Transform Data Using Power Query in Excel and Power BI 1st Edition by Gil Raviv
  • Excel 2019 Power Programming with VBA 1st Edition by Michael Alexander, Dick Kusleika
  • Microsoft Excel 2019 VBA and Macros (Business Skills) by Jelen, Bill
  • Stone Age cartoon used with permission from Oxford Creativity, https://www.triz.co.uk