Configurable Ontology to Data-model Transformation (CODT)

Abstract

CODT is a method, metadata specification, and system for transforming an ontology into a data model. A user can configure transformation rules and ontology to data model mapping. CODT has configurable metadata sets for a source ontology schema, an entity-relationship model, and a target data modeling tool-specific metamodel, which may have extensions for source ontology lineage, semantics, and annotations. The method loads the source ontology into the metadata set, transforms into data model metadata sets, and loads the metadata into the data modeling tool. The system has components for extraction, transformation, and load and may have a user interface and analytics component and interfaces to the ontology platform and data modeling tool.

This article is an abridged version of the CODT patent application. It is reworded and reformatted for better readability.

Background

There is a need to align and synchronize newly emerging semantic information models with traditional entity-relationship based information models.

Since Edgar F. Codd’s 1970 seminal publication “A Relational Model of Data for Large Shared Data Banks” Relational Database Management Systems (RDBMS) quickly became the standard way to store information. RDBMS present data to the user in the form of tables with sets of columns and rows. A Primary Key specifies one or more columns that uniquely identify a row in the table. A Foreign Key column in a table establishes a relation to the Primary Key another table.

In 1976 Peter Chen published “The Entity-Relationship Model – Toward A Unified View of Data.” The Entity-Relationship (ER) model describes information in the form of Entities, Attributes, and Relationships. ER modeling notation standardizes diagram shapes and connectors.

Data modeling tools are software applications that provide a graphical environment to create consistent information models. SAP’s PowerDesigner and Erwin, Inc.’s ERWin are the most widely used data modeling tool by far.

Model transformations are an automated way to generate a target model from a source model:

  • All data modeling tools can transform a logical data model (LDM) into a physical data model (PDM), and then generate SQL Data Definition Language (DDL) scripts to deploy the schema on an RDBMS.
  • Most data modeling tool can open native PowerDesigner or ERWin model file and transform it into their proprietary format or the Object Management Group’s standard, XML Metadata Interchange Standard (IMI).
  • Some enterprise modeling tools such as PowerDesigner or Sparx EA can transform data models into object models and vice Versa.

Enterprise Data Models (EDM) is a single integrated ER definition of data for a company or organization, independent of the physical databases. ERM diagrams a critical communication tool between business and IT. Data architects map exiting RDBMS to the ERM and transform (parts of) the ERM into departmental and project LDMs. Industry Models are generic ERMs for a domain.

In 2001 Berners-Lee, Hendler, and Larissa published “The Semantic Web” article in Scientific American. The World Wide Web Consortium (W3C), founded by Tim Berners-Lee, is the primary international standards organization for the World Wide Web.

In 2014 the W3C published the Resource Description Framework (RDF) specification. RDF describes information resources in expressions of the form subject-predicate-object. RDF Schema (RDFS) provides structure to RDF with predefined classes and properties. Ontology Web Language (OWL) further extends RDFS enabling the modeler to define complex class restrictions. RDF stores or “triple-stores” are specialized databases to store and retrieve triples via a query language SPARQL.

While RDBMS are dominating, RDF-stores fill a niche in data architecture for large enterprises. However, OWL has replaced the ERM and become the model of choice business and enterprise architects. Prominent examples for semantic industry models are the Gene Ontology and the Financial Industry Business Ontology (FIBO).

Enterprise Architects need ontology to data model transformation tools to leverage their ontologies for traditional RDBMS. However, existing tools fail to meet the requirements for a quality ontology generated EDM:

  • Ontology development platforms do not have the functionality to generate LDMs.
  • The most widely used data modeling tools, ERWin, and PowerDesigner do not have an RDF/OWL import option.

Some less widely used object modeling tools, Sparx EA, and IBM InfoSphere Data Architect can import RDF/OWL files, but they are a Black Box, and the user has no control over the transformation process.

  • The transformed LDM entity names do not conform to Logical model naming standards.
  • The Transformed LDM contains entities generated from class restrictions that have no place in a data model.

Trying to transform the FIBO into an EDM, I have found that InfoSphere Data Architect aborted without any output or diagnostic files and that Sparx EA was unable to process ontology imports. In other words, the tool can only transform a single ontology file – FIBO has more than 160 ontology files. Furthermore, the tool uses full resource URIs as entity names.

CODT advantages

Although other invented OWL to LDM transformation, various aspects of my transformation are superior because:

  • The user can fully configure the scope of the source ontology, mapping, and transformation rules.
  • CODT can transform very large ontologies because it can retrieve metadata from an RDF store using SPARQL rather than parsing ontology files.
  • The invention enriches the output data model with metamodel extensions.
  • The transformation process is bi-directional. In other words, the invention can transform a data model into an ontology.

Detailed Description – First Embodiment

Figure 1 illustrates from a user perspective what the invention does. To the left, the diagram depicts a source ontology graph and the generated conceptual data model (CDM) on the right-hand side. The arrows show the mappings and transformations from ontology to CDM.

Mapping diagram: Ontology graph to Conceptual Data Model
Ontology graph to Conceptual Data Model mapping

CODT transforms an ontology class, DepositoryInstitution into a data model entity, Depository Institution. The translation from ontology names, Camel Case to data model object names, LDM convention, is configurable. The PDFS subClassOf object property from the domain, Depository Institution to the range, Bank transforms to an LDM subtype relationship between the two entities. Depending on the user configuration, ontology Object Properties identifies and isProvidedBy transform into CDM Association and LDM associative entities. Depending on the configuration settings, ontology Class Restrictions and Object Property domain and Range transform into data model relationships and their cardinalities.

The first embodiment comprises the core transformation method, system, the non-transitory computer-readable medium (CRM). Subsequent embodiments build out the core with interfaces, additional configuration settings and target model types, and support for other data modeling tools.

Working Product, Prototype

The data model in Figure 1 is part of the Financial Industry Business Data Model (FIB-DM.COM), transformed from FIBO as a proof of concept (POC) for CODT. The transformation software is a Working Product or Prototype of the first embodiment of the invention. The prototype is an implementation of the core method, datasets, and system components.

  • The prototype is an MS-Excel application.
  • The encoded configuration options are tuned to generate an EDM from the FIBO source ontology.
  • The target native data model is a PowerDesigner CDM.       

The invention achieves its results with metadata sets for ontology, generic ERM, and tool-specific model. MS-Excel facilitates rapid prototyping of data set driven algorithms because Excel cells can contain formulas for populating and manipulating data values. PowerDesigner can import a data model from tool-specific Excel worksheets. The prototype is a White Box because all interim metadata sets are open to user inspection.

Hence, for a team skilled in the art, I recommend building the first embodiment as Excel/Powerdesigner prototype first, validate the output, and then proceed to encode other configuration options, other target models and tools, or a standalone Windows application. Screenshots and program source files in this description are from the POC.  

However, the First Embodiment is not the preferred embodiment.

System components

The diagram shows the Configurable Ontology to Data Model Transformation at the center and the two external Ontology Platform and Data Modeling Tool systems at the top.

CODT System component diagram
CODT System component diagram

As used herein, an Ontology Platform is a computer application that can load an ontology and provides an interface to query the ontology and return results. Hence the ontology platform is either an ontology development application or an RDF-store.

The Transformation component is the central module of the CODT system. It is self-contained with computer code to transform and move metadata from ontology to generic ER, and finally to modeling tool-specific metadata-sets. For example, in the Excel prototype spreadsheets for metadata sets, formulas and Visual Basic code comprise the transformation component.

The Extraction component, an aggregate extension of the Transformation component, populate the ontology metadata set. It comprises of SPARQL templates and computer code to parametrize and thus convert the template into executable SPARQL SELECT statements. The component further comprises structures to retrieve raw data, query results, and code to convert data values into metadata set formats. In core embodiment, the interface to the ontology platform is file-based.

The Load component is also an aggregate extension of the Transform component. In the core embodiment, the interface from the Load to Data Modeling tool is file-based. The Load component has computer code and templates to directly create data model import files for direct consumption by the modeling tool.

The Configuration allows the user to maintain  parameters, options, and settings stored on the CRM.. The associated Transformation component reads configuration values to determine the scope, mappings, and transformation rules.

Computer-readable medium

CODT stores computer instruction, metadata-sets, configuration settings, and query templates on a local or network drive.

Metadata sets

Metadata is data that provides information about other data. A data set is a collection of data records. As defined herein, a metadata set is metadata stored in data sets.

An RDBMS store information about the database schema in metadata sets, the system tables. For example, an RDBMS SYSCOLUMNS table comprises COLUMN_NAME, TABLE_NAME, DATATYPE, NULLS, and other information about the column. COLUMN_NAME and TABLE_NAME define a unique key for the SYSCOLUMNS table.
A script SQL Data Definition Language (DDL), system tables, and a Physical Data Model (PDM) can be isomorphic representations of the same database schema. A data-modeling tool can generate DDL from a PDM and reverse engineer a DDL script into a PDM. An RDBMS can populate records in the SYSTEM tables from a DDL script and generate DDL based on information in its system tables.

Likewise, the CODT metadata sets are isomorphic representations of ontology, entity-relationship, and data modeling tool-specific metadata. CODT stores the metadata sets on the CRM. The metadata sets can serialize as files, tables in a database, MS-Excel sheets, or any other suitable format. They can be local or remote. The prototype metadata sets are Excel spreadsheets.

The data movement model diagram illustrades the metadata sets.

CODT Metadata sets data flow
Metadata sets data flow diagram

The large rectangles delineate the Ontology, Generic ERM, and Data Modeling Tool-specific metadata sets. The arrows connect the input and output of instructions for OWL-to-ERM and ERM-to-DM transformation and movement on the CRM.

Ontology metadata sets

RDFS/OWL defines a vocabulary of classes and properties to describe ontology metadata in the form of triples. CODT holds the same information in the form of datasets or tables. The ontology metadata sets are a tabular isomorphic representation of the ontology schema. In other words, a SPARQL SELECT executed on the Ontology Platform can be written to produce a tabular result set. Likewise, we can generate SPARQL CONSTRUCT statements to produce metadata triples based on a CODT metadata set.

Each ontology metadata set has one or more SPARQL SELECT templates associated. The template converted to executable SPARQL is the query to populate the ontology metadata set. The ontology metadata sets are source for the subsequent metadata movement into generic ERM metadata sets.

Ontology metadata sets comprise the types of ontologies, classes, object and data properties, annotations, and instances. Metadata sets of a particular type have the same key. However, CODT does not prescribe the organization of metadata set for a type. For example, the development team may choose to have one or more metadata sets for classes to facilitate processing and performance.

The list of core ontology metadata sets below shows the central triple in the query WHERE-clause. The specification discloses queries in the appendix.  

The Ontologies metadata set comprises of ontology namespace and prefix.

The Ontology Annotations metadata set has the ontology module as a unique key and additional columns for the annotation properties listed in the configuration. For example, dct:abstract, dct:license, sm:copyright.

The Classes metadata set comprises of the following columns, shown with a sample value:
Class: fibo-be-corp-corp:BoardAgreement
Label: board agreement
Prefix: fibo-be-corp-corp
Namespace: https://spec.edmcouncil.org/fibo/ontology/BE/Corporations/Corporations/
Localname: BoardAgreement
Unique keys on the dataset are class, Localname + Prefix, and Localname + Namespace.
The CRM stores a corresponding SPARQL template to SELECT all named classes and their associated metadata set values (Tripple pattern rdf:type owl:Class).

The Subclasses metadata set has two columns, super-class, and sub-class. The two columns constitute a unique key (Tripple pattern rdfs:subClassOf).

The Superclasses metadata set has a unique column Class and a boolean indicator Exclusive. Records in the metadata set consist of all classes that have subclasses. The Exclusive indicator is TRUE for classes; whose children are disjoint (Tripple pattern owl:disjointWith).

The Class Annotations metadata set has the Class as a unique key and additional column for annotation properties listed in the configuration. For example: skos_definition, skos_editorial_note, skos_example,  rdfs_comment, rdfs_is_defined_by, rdfs_label, rdfs_see_also, sm_direct_source, sm_related_specification. (Tripple pattern owl: AnnotationProperty)

The Object Properties metadata set comprises of the Property, Label, Namespace, Localname, Parent Property, Functional, and Inverse Functional indicators. (Tripple pattern rdf:type owl:ObjectProperty, owl:FunctionalProperty, owl:InverseFunctionalProperty).

The Object Property Inverse metadata set consists of two columns, Property, and Inverse Property. (Tripple pattern owl inverseOf).

The Object Sub Properties metadata set has two columns, super-property, and sub-property. (Tripple pattern  rdfs:subPropertyOf)

The Object Property Domain Range metadata set comprises of the object property its domain and range. (Triple pattern rdfs:range, rdfs:domain)

The Object Property Annotations metadata set has the object property as a key and additional columns for annotation properties listed in the configuration. (Tripple pattern owl: AnnotationProperty)

The Object Property Restrictions metadata set has a key of class, object property, and restriction. Non-key columns express the cardinality. (Tripple pattern owl:Restriction, owl:onProperty, owl:someValuesFrom, owl:someValuesFrom, owl:cardinality, …)

The Data Properties metadata set comprises of the data property, range, domain, Label, Namespace, Localname, and parent property. (Tripple pattern owl:DatatypeProperty, rdfs:subPropertyOf)

The Data Property Restrictions metadata set comprises of Class, Restriction, Data Property, and cardinality columns.

Generic Entity-Relationship Model metadata sets

The metadata sets are an interim metadata representation between the ontology and the modeling tool-specific metadata sets. They follow the Entity-Relationship Model and describe a generic logical data model in tabular form.

The non-transitory computer-readable medium stores the metadata sets, along with mappings to the ontology metadata sets, and computer instructions to move and transform the metadata.

An ER metadata set has a reference to one or more source ontology metadata sets.

In the CODT prototype, the ER metadata sets are MS-Excel spreadsheets. The computer instructions are Excel formulas to populate cells in the ER spreadsheet from cells in the ontology spreadsheet, and Visual Basic code to move and manipulate data. The ER metadata sets may have additional derived columns to break down formula complexity.

Data Modeling tools differentiate between a model object code and name. The object name is a label to be used in diagrams and reports. The code is a technical shorthand. The modeling tool may require the code (and name) to be unique. The default configuration for ER metadata sets uses the ontology element (Prefix: Localname) as a code. The default object name convention has capitalized English words separated by spaces. For example Code=fibo-be-corp-corp:BoardAgreement, Name=Board Agreement.

The ER Modules metadata set comprises of Code, Name, Parent Module.
Its source reference is the Ontologies metadata set. The unique module code has the value of ontology code. The Name is its value derived from the Ontologies Namespace.

The ER Entities metadata set comprises of Entity Code, Name, Package, URI, and Parent Entity.
The source reference is the ontology Classes metadata set.

The ER Attributes metadata set comprises of Attribute Code, Data Type, Length and Precision

The ER Entity Attribute metadata set comprises of Entity Code, Attribute Code, and an Optional/Mandatory flag.

The ER Relationships metadata set comprises of the Relationship, Parent Entity, and Child Entity Code, Cardinalities, indicators for Generalization, Many-to-Many, the Passive Inverse Relationship, the Parent Relationship.

Four ER Annotation metadata sets, hold documentation items for Module, Entity, Attribute, and Relationships. The ontology sources are the corresponding ontology, class, data property, and object property annotation.

Data Modeling Tool-specific metadata sets

The metamodel of a modeling tool differs from the ER model and between modeling tools. There may be different structures and different names for metamodel object and properties. For example, PowerDesigner calls subtypes relationships inheritances; other tools call them generalizations. Hence, CODT has tool-specific metadata sets for direct load into the modeling tool. Therefore the tool-specific metadata sets express the data model in the target tool’s vocabulary and structure.
Modeling tools have a prominent property to define or describe a model object, the Comment field in PowerDesigner. Configuration settings specify which ER Documentation item, annotation property in the ontology should populate the Comment or Definition field.

PowerDesigner’s import facility can add objects specified in MS-Excel Workbooks to the open data model. 

The CODT prototype has 21 Excel workbooks for PowerDesigner CDM import. The numbering within the filename prescribes the import order. The highly incremental import is the best way to implement a prototype because the developer can debug and validate smaller files. The developer can then proceed to consolidate the metadata sets.

1. Entities.xlsx is the import spreadsheet to load entities into the data model.
Code: the code of the entity
Name: the name of the entity
Parent: the code of the model package that contains the entity.

1.1 Entity Inheritances.xlsx is the import spreadsheet to load metadata for subtype symbols into an LDM or CDM.
Code: a concatenation of the supertype entity code and “_ST.”
Name a concatenation of the supertype entity name and “ subtype.”
Comment: a concatenation of “Instances of” and the supertype entity name.

1.2 Entity Inheritance Links.xlsxis the import spreadsheet load the links from subtype symbol to participating subtype entities into an LDM of CDM.
Inheritance: The inheritance code from 1.1 Entity Inheritances
Child Entity: The participating entity, a valid code from 1. Entities.

2. Associations.xlsx is the spreadsheet to load associations into a conceptual data model; columns are code, name, and comment.

2.1 Association Links (parent).xlsx is the spreadsheet to load the link to the participating parent entity.
Entity: the code of the participating entity, sources from the ER Relationships Parent Entity
Association: the code of the association
Role: a concatenation of entity name and association name
Identifier: an indicator (Y/N), whether the relationship is identifying on the association or not
Cardinality: The cardinality of the relationship, for example, “0,n”.

2.2 Association Links (child).xlsx is the spreadsheet to load the link to the participating child entity.

3. Associative Entities.xlsx is the spreadsheet to load associative entities into a logical or conceptual data model. Columns Code, Name, and Comment are the same as in 2. Associations. Configuration settings determine whether ER Relationships transform into CDM Associations or Associative Entities.
Stereotype: fixed value “Associative Entity”.

3.1 Associative Entity-Relationship (parent).xlsx is the spreadsheet to load the relationship to the participating parent entity into the data model.
Entity 1: The code of the Parent Entity
Entity 2: The code of the Child Entity is the associative entity.
Code: The code for the relationship, a concatenation of Entity 1 and “-“ and Entity 2
Name: A concatenation of Entity 1 Name <space> and Entity 2 Name

3.2 Associative Entity-Relationship (child).xlsx is the spreadsheet to load the relationship to the participating child entity into the data model, analogous to 3.1 Associative Entity-Relationship (parent).

3.3 Associative Entity Inheritance.xlsx is the import spreadsheet to load metadata for subtype symbols for associative entities into an LDM or CDM, analogous to 1.1 Entity Inheritances.

3.4 Associative Entity Inheritance Links.xlsx is the import spreadsheet load the links from subtype symbol to participating subtype entities into an LDM of CDM.

3.5 Associative entity Relationship (parent of inverse).xlsx is the spreadsheet to load additional relationships of participating parent entities into the data model, analogous to 3.1. Associative Entity-Relationship (parent). See Inverse Object Properties in the method specification.

4 Data Items.xlsx is the spreadsheet to load data items into a conceptual data model. The metadata set refers to the EA Attributes metadata set.
Code: The unique code for the data item; sourced from EA Attributes.
Name: The name of the attribute. Confirming with LDM/CDM naming standards, the default configuration removes the ontology naming convention leading verb. For example, the Localname “hasPostalCode” transforms to “Postal Code”.
Comment: derived from annotation as per configuration setting.
Datatype: The generic ER attribute type, can be overwritten to translate to modeling-tool specific data types in the configuration settings.  

4.1 Entity Data Item.xlsx is the spreadsheet to load Entity Attributes into the CDM or LDM.
Entity: The code of the entity
Attribute: The code of the data item
Mandatory:  (Y/N) indicator

5.1 Entity Annotations.xlsx is the spreadsheet to load documentation from annotations into the model.
The first column, Entity is the code of the entity. Additional columns refer values from the ER Entity Annotation metadata set. The column names are PD extended attributes. (see method specification)


 5.2 Associative Entity Annotations.xlsxis the spreadsheet to load documentation for associative entities, analogous to 5.1 Entity Annotations with EA Relationship Annotations as source reference.

5.3 Association Annotations.xlsx is the spreadsheet to load documentation for associations, analogous to 5.2 Associative Entity Annotations.

5.4 Data Item Annotations.xlsx is the spreadsheet to load documentation for data items, analogous to 5.1 Entity Annotations with EA Attributes as source reference.

6 Packages flat.xlsx is the spreadsheet to create packages in the data model.
Code: the unique code for the package, referred from EA Modules.
Comment: The description of the packaged as per the annotation specified in the configuration settings.
Use parent namespace: A Y/N indicator defaulting to ‘Y’. For PowerDesigner packages using the parent-namespace means that object names must be unique across the packages.

6.1 Entities package.xlsx is the spreadsheet to placing entities within a package.
Code: The code of the entity
Parent: The code of the package

6.1 Package Annotations.xlsx is the spreadsheet to load documentation for associations, analogous to 5.2 Associative Entity Annotations.

6.8 Package hierarchy.xlsx is the spreadsheet to load the package hierarchy into the model.
Code: The code of the package
Parent: The code of the parent package.

Ontology Metadata profile stored on the CRM.

The Metadata profile is a statistical analysis of the ontology. The statistics are counts of ontology elements with a particular schema element. For example, the Annotation statistics, Table 1 lists annotation properties per object type and the number of instances.

[object_type] annotation Count
owl:Class rdfs:label 1618
owl:Class skos:definition 1615
owl:Class fibo-fnd-utl-av:adaptedFrom 981
owl:Class fibo-fnd-utl-av:explanatoryNote 782
owl:Class fibo-fnd-utl-av:abbreviation 513
owl:Class fibo-fnd-utl-av:synonym 130
owl:Class skos:editorialNote 111
owl:Class
owl:DatatypeProperty rdfs:label 151

Configuration Settings stored on the CRM

The configuration determines ontology to data model mappings and transformation rules. The configuration comprises of parameters and initial settings in the form of tag and value, and value lists. CODT stores the configuration on the computer-readable medium.

Table 1 shows the default configuration to transform a domain ontology into a conceptual data model.

CONFIGURATION
OPTION VALUE
Source Ontology
Platform Topbraid Composer
File or graph /FIBO/2018Q4/FIBO_all_llc.ttl
Excluded modules None
Target Model
Modeling Tool PowerDesigner
Type Conceptual Data Model
Name FIB-DM
Object Naming Rules
Code Prefix:Localname
Name Uncamel Localname
Transformation Rules
Infer Packages Yes
Transform anonymous classes No
Transform equivalent classes Separate entities
EA Relationships
Default Relationships
Many-to-Many Association
Parent/Child Associative Entity

Table 2 – configuration settings

The Configuration list of annotations, Table 2 specifies the annotation properties that should transform into extended data model attributes, also known as user-defined properties.

ANNOTATIONS
Ontology Class Object Property Data Property
dct_abstract dct_source skos_definition skos_definition
dct_issued av_abbreviation skos_editorial_note skos_editorial_note
dct_license av_adapted_from skos_example skos_example
dct_source av_definition_origin skos_note skos_note
av_explanatary_note av_explanatary_note skos_scope_note skos_scope_note
rdfs_label av_synonym av_adapted_from av_adapted_from
rdfs_see_also av_usage_note av_definition_origin av_definition_origin
skos_change_note owl_depreciated av_explanatary_note av_explanatary_note
skos_editorial_note rdfs_comment av_synonym av_synonym
skos_history_note rdfs_is_defined_by av_usage_note av_usage_note
skos_scope_note rdfs_label owl_depreciated owl_depreciated
sm_address_for_comments rdfs_see_also rdfs_comment rdfs_comment
sm_content_language skos_alt_label rdfs_is_defined_by rdfs_is_defined_by
sm_contributer skos_definition rdfs_label rdfs_label
sm_copyright skos_editorial_note sm_direct_source sm_direct_source
sm_depends_on skos_example
sm_direct_source skos_note
sm_file_abbreviation skos_pref_label
sm_file_abstract skos_scope_note
sm_filename sm_direct_source
sm_is_normative sm_related_specification
sm_keyword
sm_publication_date
sm_related_specification

Table 3 shows part of the inverse object properties configuration list for the FIBO proof of concept.

The user designates the ‘active’ data property to transform into an ER metadata set relationship.

Inverse Object Property (passive) Object Property to transform (active)
fibo-be-fct-fct:isCharacterizedBy fibo-fnd-rel-rel:characterizes
fibo-be-fct-pub:hasPublisher fibo-be-fct-pub:publishes
fibo-be-ge-ge:isElectedBy fibo-be-oac-exec:elects
fibo-be-ge-ge:isJurisdictionOf fibo-be-ge-ge:hasJurisdiction
fibo-be-le-fbo:isSubUnitOf fibo-be-le-fbo:hasSubUnit
fibo-fnd-pas-pas:sellsTo fibo-fnd-pas-pas:buysFrom
skos:narrowMatch skos:broadMatch
skos:narrower skos:broader
skos:narrowerTransitive skos:broaderTransitive
skos:topConceptOf skos:hasTopConcept

Query templates stored on the CRM

As used herein, the term query templates include SELECT statements with placeholders, query fragments, and static executable SPARQL. There is a least one query template liked to each ontology metadata set. For example, the static query template below is the query to populate the Ontology Classes metadata set.

SELECT ?ont_class ?label ?prefix ?namespace ?localname
WHERE {
       ?ont_class rdf:type owl:Class .
       BIND ( smf:prefix(?ont_class) AS ?prefix ) .
       OPTIONAL {?ont_class rdfs:label ?label}
       BIND(afn:namespace(?ont_class) AS ?namespace) .
       BIND (afn:localname(?ont_class) AS ?localname) .
       FILTER (smf:isBound(?namespace)) .
       FILTER(?ont_class NOT IN (owl:Thing, owl:Nothing)) .
}

How CODT achieves its results

CODT is a method to transform an ontology into a data model. The method becomes an integrated set of computer processes when implemented on a computer system. The process depends on metadata sets, query templates, and configuration setting stored on the computer-readable medium. The CODT process is specified in the Business Process Modeling Notation (BPMN).

Transformation process overview

The process diagram presents a high-level overview of the transformation.

CODT Transformation method overview
Transformation method overview

The Start symbol at the left is a BPNM start event. In this case, the user begins an ontology to data model transformation. Different source ontologies require different configuration settings. Hence the first activity is to Analyze ontology. The Source Ontology indicated with a database symbol is an input to the Analyze ontology activity. The Source Ontology resides on the external Ontology Platform system, for example, an RDF-store. The plus-sign on the Analyze activity indicates that the activity is a sub-task. In other words, the process model expands with a detailed diagram of the activity.

Analyze ontology creates an output data object, the Metadata profile. All data objects reside on the computer-readable medium. The Metadata profile consists of query results to assist the user in the following Configure Transformation activity.

The Configure Transformation activity has the Metadata profile as an input object and creates the Configuration for the following Transform into Data Model activity. Preparatory activities, ‘Analyze ontology’ and ‘Configure Transformation’ are manual user tasks without encoded computer support in the first embodiment.

‘Transform into data model’ activity is the core of CODT and implemented in the first embodiment. The Configuration settings on the CRM and the Source Ontology are input data object for the sub-task The Data Model is the output of the transformation.

Transform into data model

There are four sequential tasks, Retrieve Ontology Metadata, Transform into generic ERM, Transform into DM Tool specific metadata, and finally Import into DM Tool.

The Retrieve Ontology Metadata task reads the Retrieval Configuration setting and produces Ontology Metadata. The Load System Component is the implementation of this task. The Retrieval configuration comprises of Configuration Settings stored on the CRM that are relevant to Source Ontology and scope. Examples are Table 1 Source Ontology settings and the list of annotation properties in Table 2. The output data object of the task is the Ontology Metadata, stored as the Ontology Metadata Set on the CRM.

The Transform into Generic ERM task implements the method to transform the Ontology Metadata into ER metadata. The input data object is the Ontology Metadata set, and the output is the ER Metadata set. MS-Excel formulas are the primary way the task archives the transformation. Formulas in cells of the ER Metadata spreadsheet refer to source cells in the Ontology Metadata spreadsheet. For example, cells in column Code in the Entities.xslsx have the formula:

='[OWL Classes.xlsx]Classes’!$A2

The formula is a simple copy of all class codes into values of the entity code column. Both Class Code and Entity Code are key columns of their spreadsheet. In other words, the values in the columns are unique.

Derived columns are string manipulations, lookups, and logical evaluations based on key columns. For example, the formula to convert the entity Code, ‘fibo-be-corp-corp:BoardAgreement’ into the Entity name, ‘Board Agreement’ is:

=UNCAMEL(RIGHT(A2, LEN(A2)-FIND(“:”,A2)))

The formula extracts the string right of the colon, ‘:’ and then converts the Camel Case notation into LDM naming convention, capitalized, space-separated words.  

The task has the ERM Configuration as an input data object. These are configuration settings stored on the CRM about Ontology to generic ERM transformation. For example, the Object Naming Rules:

Name      Uncamel Localname

The formula to convert Codes into LDM Names can be changed to test for the configuration value:

=IF(‘[ConfigurationFile.xlsx]Configuration’!$B$13=”Uncamel Localname”,
UNCAMEL(RIGHT(B2,LEN(B2)-FIND(“:”,B2))))

We can nest additional configuration values in the formula. For example, support a no conversion configuration setting:

=IF(‘[Configuration File.xlsx]Configuration’!$B$13=”Uncamel Localname”,
UNCAMEL(RIGHT(B2,LEN(B2)-FIND(“:”,B2))),
IF(‘[Configuration File.xlsx]Configuration’!$B$13=”None”,
RIGHT(B2,LEN(B2)-FIND(“:”,B2))))

For configuration settings with multiple options, testing the configuration value within the formula results in overly large and complex formulas. In that case, the task evaluates the configuration setting first and then uses Visual Basic code to copy the applicable formula into the target cells.

In summary, the invention achieves the configurable transformation by testing configuration values and using Excel formulas to move data from source to target metadata spreadsheet.

The EA Modules and EA Attribute metadata spreadsheets transform analogous to ER Entities.

The EA Entity Attribute transform from Ontology Data Properties and Data Property Restrictions spreadsheets. The RDF/OWL open-world assumption means that an ontology schema may define a data property without restricting the domain and classes that are valid for the data property. The ERM closed-world assumption means that an attribute cannot exist without an entity. Hence, the EA Entity Attribute metadata set doesn’t contain standalone object properties.

The transformation infers attributes on an entity from the domain in the Data Properties spreadsheet and stores the transformation results in a temporary EA Entity Attribute from Domain spreadsheet. The default transformation furthermore assumes that all entity attributes should be single-valued, regardless of whether the source ontology explicitly declares them owl:FunctionalProperty or not. The alternative transformation is to transform non-functional data properties into entities. However, this leads to overly normalized and convoluted data models, for source ontologies that don’t rigorously specify the functional property. The user can make the proper configuration setting after analyzing the source ontology.

The transformation infers further attributes on an entity from the Data Properties Restrictions spreadsheet and stores the transformation results in a temporary EA Entity Attribute from Restriction spreadsheet. A formula determines the Optional Mandatory flag from the Data Property Restrictions cardinality columns. The attribute is mandatory if any of the following conditions apply:
Some_values is not empty; Cardinality is 1; Has Value is set; Max Qualified Cardinality is 1.

Finally, the transformation consolidates both temporary spreadsheets into the EA Entity Attribute metadata set. The temporary spreadsheets are not a source for the ERM to Modeling-Tool transformation. However, the task preserves them on the CRM for validation and diagnostics.

The EA Relationships spreadsheet transforms from the six Ontology Object Property spreadsheets.

The common notion that object properties map and transform to data model relationships is an oversimplification. As a result, available non-configurable ontology to data model transformations produce inadequate and inconsistent logical data models.

Object properties have semantics that is different from data model relationships and exceed the relationship expressiveness:

  • Object properties with an owl:inverseOf pairing must transform into a single data model relationship.
  • Object properties with many-to-many cardinalities must transform into associative entities or associations.
  • Object properties with a rdfs:subPropertyOf axiom must transform into subtypes of associative entities to preserve the ontology semantic.

The ER Relationships metadata set is a collection of all metadata about a relationship, including the above inverse relationship, cardinalities expressed in ER notation, and supertype relationships. The codes for Relationship, Parent Entity, and Child Entity constitute the natural key of the dataset.

Analogous to the EA Attributes spreadsheet, the transformation task determines Parent and Child entities from the Object Properties and Object Property Domain Range ontology metadata spreadsheet. The Object Property Restrictions ontology metadata set determines further Parent Entities.

Passive inverse object properties, specified in the configuration settings, are excluded from the ER Relationship metadata set. However, the passive object property may determine further participating entities and cardinalities for the inverse active object property in the Object Property Inverse ontology metadata set. For example, the domain of a passive object property is the range of the active inverse property and hence determines the relationship’s child entity.

The initial pass may create duplicate rows with the same relationship, parent, child entity. For example, object property domain and a class restriction may yield the parent entity. The algorithm removes duplicates and retains the most restrictive cardinality. Furthermore, the algorithm removes incomplete relationships. In other words, both Parent and Child entity must be present.

The ER Annotations metadata sets transform as a simple copy of values in the ontology metadata set.

The Transform into DM Tool-specific metadata task implements the transformation from ER metadata sets to data modeling tool-specific metadata sets.

Configuration settings specify, filters on ER metadata set members and the designated tool-specific metadata set. For example, the EA Relationships configuration setting may specify to transform the many-to-many relationship into associations or associative entities.
The easiest and most transparent way of implementation in MS-Excel is a Decision Model and Notation (DMN, an OMG specification) decision table columns.

DMN Input expressions are characteristics of the EA Relationship, such as the cardinalities or parent relationships, and the configuration setting. The DMN output entry is a Filter column with a formula referring to the input expressions. In the above example, a VB macro filters ER Relationships with the filter value set to “Association” and then copies the subset of EA Relationships into the Associations tool-specific spreadsheet.

Entities spreadsheet rows are a copy of values in the ER Entities spreadsheet.
The filter may exclude entities if specified in the configuration.

Entity Inheritances spreadsheet rows derive from patent entities, as indicated in the ER Entities Parent Entity column.

Entity Inheritance Links rows derive from ER Entities rows with a value in Parent Entity.

The Associations spreadsheet is only populated if the configuration specifies to transform relationships as associations. The default configuration for PowerDesigner CDM specifies associations for many-to-many relationships. The transformation algorithm populates the spreadsheet as a distinct list of EA Relationships with many cardinalities for parent and child entity.

The Association Links (parent) spreadsheet rows derive from EA Relationship Parent Entity for entries in Associations spreadsheet.

The Association Links (child) spreadsheet rows derive from EA Relationship Child Entity for entries in Associations spreadsheet.

The task populates the Associative Entities spreadsheet only populated if the configuration specifies to transform relationships into associative entities. The default configuration specifies associative entities for relationships that have a parent or child relationship.

The algorithm then populates Associative Entity-Relationship (parent), (child) and Associative Entity Inheritance, Associative Entity InheritanceAssociative Entity Inheritance Links.

Data Item spreadsheet rows are a copy of values in the ER Attributes spreadsheet.
In the default configuration the task transforms all EA Attributes into Data Items, but if configured a filter may exclude multi-values attributes. 

The task populates Entity Data Item spreadsheet from values in the ER Entity Attributes spreadsheet.

The task populates the Packages flat, Entities package, and Package hierarchy spreadsheets from ER Modules if set in the configuration settings.

The task populates Annotation spreadsheets from their respective ER Annotation source. The filter is a lookup to check whether configuration lists the annotation or not.

Retrieve Ontology Metadata

Retrieve ontology metadata sub process BPMN diagram
Retrieve ontology metadata sub process

In the first embodiment, the prototype the Ontology Platform gateway defaults to yes, and the Automated Retrieval gateway defaults to no. In other words, the control flow goes to the user activity, Submit Metadata queries. The user opens the SPARQL SELECT from the CRM, runs them against the Source Ontology on the Ontology Platform, and saves the results as CSV files onto the CRM. Further embodiments describe the task Parse OWL files, and scripted task Execute queries.

Load into Data Modeling Tool

CODT Import into DM tool sub process
Import into DM tool subprocess

The first gateway, Tool Ready, is a user determination. The target data model in the data modeling tool must have metamodel extensions for the Annotations. If the tool is not ready, the control flow goes to Extend Tool Metamodel. Once the tool is ready, the flow proceeds to Read configuration and import files. The import objects for Tool specific configuration determines the flow after the Import Modus gateway. In the first embodiment, the modeling tool is PowerDesigner, and the flow is manual Import Metadata.

The developer sets up Excel imports of the tool-specific metadata spreadsheets in PowerDesigner.

PowerDesigner ontology import screenshot
PowerDesigner CODT CDM import (screenshot)

In PowerDesigner the import specifies the mapping from spreadsheet column to extended attribute. The prototype setup loads the metadata spreadsheets individually to facilitate test, validation, and diagnostics.

Once, all individual spreadsheets load as expected; the developer may consolidate spreadsheets for the same object, i.e., Entities and Associative Entities and combine spreadsheets into a single Excel workbook.

Extend Tool Metadata

The input data object, Tool-specific configuration, determines the flow out of the Import modus gateway. In the first embodiment, this is always the manual user activity, Set up extended attributes.

Data Modeling tools enable the modeler to extended properties of model objects. For example, the Entity model object has a standard properties code, name, and comment or definition. The data modeler can add additional properties to describe the model object better. For example, PowerDesigner Extended Attributes or ERWin User Defined Properties.

CODT uses this feature to add ontology documentation, the Annotations to the data model. To prepare the target PowerDesigner model, the developers set up Extended Attributes matching the tool-specific Annotation import file.

PowerDesigner-Entity-extended-attributes for ontology derived models
PowerDesigner-Entity-extended-attributes for ontology derived models

The input object, ‘Metamodel extensions and extended attributes’ is a list of attribute Name, Label, Comment, and Datatype.

Additional and alternative embodiments

Table lists CODT features and additional way of implementation for Ontology Source, Transformation, Target Model.

Ontology Source  Transformation System  Data Model  
TypeSubtypeExtractionOSApplication typeUser InterfaceData Model TypeModeling ToolTool Interface
Ontology platformDevelopment ToolSPARQLWindowsMS-ExcelWhite BoxConceptualPowerDesignerImport
RDF Store, Semantic EndpointLogicalERWinApplication Programming Interface
RDF/OWL filesLocalParserUnixStandaloneGraphicalPhysicalOther
World Wide WebObjectXML Metadata Exchange

 

The Prototype, greyed relies on an Ontology Platform, an ontology Development Tool or RDF Store; the Transformation uses SPARQL for Extraction, it is a White Box implemented in Microsoft Windows and Excel; the Target Model is Conceptual, using PowerDesigner’s Import Tool Interface.

Ontology Source embodiments

The best way of implementation is to populate the Ontology Metadata sets using SPARQL queries on an ontology platform. However, there are use cases where an ontology platform is not available, or the user directly wants to load ontology files locally or from the web.

Commercial and open-source RDF/OWL parsers are widely available. For the Parser embodiment, the Load System Component, Figure 1 includes a parser. The Retrieve Metadata sub-process, Figure 8 enters the Parse OWL files task. The input data objects for the task is the root ontology file. The user selects to ontology file from CRM or the web via its Uniform Resource Identifier (URI).

The task parses triples from the ontology file and its recursive imports and populates the ontology metadata sets.

In the prototype, the Extraction component of Figure 1 is file-based. The component provides SPARQL scripts and receives result sets in CSV files. An alternative embodiment reads connection parameters from the configuration file, establishes a connection to the RDF-store or semantic endpoint, executes the SPARQL queries remotely, and retrieves the result sets programmatically. In this embodiment, the SPARQL Interface of Figure 1 is a client with API. 

Transformation Embodiments

The next step building out the prototype is to extend the User Interface component of Figure 1 into a fully protected graphical Excel application:

  • Protecting cell values and formulas in the metadata spreadsheets from accidental overwrites. The protected interface enables the user to preview and modify transformation rules, change the scope of ontology elements to be transformed, and review the metadata.
  • Changing the text-based configuration spreadsheet into a form with graphical controls.
  • Encoding a configuration wizard to guide the user settings
  • Fully automating the transformation process with Visual Basic for Applications (VBA) and Macros.

A Standalone or embedded application embodiment is an alternative to the MS-Excel implementation. Developers can implement CODT any programming language. Instead of Excel spreadsheets, matrix classes represent the metadata sets, with methods to copy rows and compute items in the matrix. The Excel formulas are pseudo code to guide item method development in the programming language. The application serializes metadata classes on the CRM. The embedded application is an extension of the Ontology Development or Modeling tool, for example:

  • Protégé, Stanford University’s popular open-source ontology development platform integrates extension as plugins. A CODT plugin generates the data models from the loaded ontology
  • PowerDesigner, ERWin, and other data modeling tools import data models in external formats; they have facilities to extend the tool. The data modeling tool embedded embodiment works with the Parser embodiment to open an RDF/OWL file or URI.

The standalone application can be UNIX or any other operating system instead of windows, where both source ontology and target data model reside on the operating system. 

Target Model Embodiments

The three areas of target model embodiments, Data Model Type, Modeling Tool, and Interface can be combined freely.

Data Model Type

CODT can generate other types of models besides the prototype’s Conceptual Data Model.

A Logical Data Model has only two significant differences in the tool-specific metadata sets:

  1. There are no Associations metadata sets. All ER Relationships transform into LDM relationships or associative entities.
  2. There are no Data Item metadata sets. All ER Attributes transform into attributes on the entity.

Physical Data Model (PDM) metadata sets have the same structure as the LDM metadata sets. The LDM-to-PDM transformation is well documented and supported by all modeling tools. PDM metadata spreadsheets define tables instead of entities, columns instead of attributes, and references instead of relationships. There is no LDM equivalent for Views in the PDM. However, Views are the perfect transformation object for defined classes in ontology. Hence, the PDM transformation has metadata sets for views.

Object Model metadata sets have the same structure as the LDM metadata sets. Mappings from logical to object-oriented model (OOM) are well documented; PowerDesigner supports LDM-to-OOM transformation. OOM metadata spreadsheets define classes instead of entities, and associations instead of relationships.

Note that all four model types, CDM, LDM, PDM, and OOM transform from the same ER metadata sets.

Tool Interfaces

Many data modeling tools, including PowerDesigner and ERWin, provide programmatic access via an Application Programming Interface (API) to the application and data model.

The advantages of API embodiments are that the user doesn’t have to switch applications, from CODT to data modeling tool. API embodiments also enable a fully automated transformation without user intervention. 

The System Load Component of Figure 1 includes the API component and wrappers for API function calls.

The method uses the API to add model objects and their properties to the data model. The task Create Model via Tool API of Figure 9, sub-process Import into DM Tool establishes a connection to application and opens the model specified in the configuration settings. The sub-process then enters a loop of task Execute API calls to load record of the metadata sets. The Complete gateway exits the loop.

The tool-specific metadata sets stored on the CRM have the same structure as in the modeling tool import.

XML Metadata Interchange (XMI) is an Object Management Group (OMG) standard for exchanging metadata information via Extensible Markup Language (XML). Most data modeling tools support XMI as a generic model file format for exchanging models, rather than importing other vendors native formats.

For XMI support, the System Load component implements functionality to generate XMI files.

In Import into DM Tool sub-process, Figure 9 the Import Modus gateway branches to task Create XMI file, if specified in the configuration settings. Output object of the task is the XMI Model file. The Import XMI task on the Data Modeling tool swimlane then imports the XMI into the data model.

Support for other Modeling Tools

The System Load Component of Figure 1 can create other native data model provided the modeling tool has an import for tabular metadata or an Application Programming Interface (API). This section discusses a load into ERWin, the second most widely used data modeling tool.

ERWin has the functionality to load Excel metadata spreadsheets into the active data model, the Bulk Editor. However, the import can only add properties to existing model objects; it can not create new entities, attributes, and relationships.

The Tool Interfaces section describes the method.

The tool-specific metadata sets stored on the CRM have the same structure as their PowerDesigner counterparts; only their headings refer to ERWin terminology.

Other embodiments

This section explains embodiments beyond the Ontology Source, Transformation, and Data Model categories.

Reversed Process – Data Model to Ontology Transformation

The metadata sets are by design bi-directional. In other words, metadata can also flow in reverse from tool-specific metadata sets to ER metadata set to ontology metadata sets.

The diagram shows the BPMN process for the Reverse, Data Model to Ontology Transformation.

CODT Reversed Transform data model into ontology
Reversed Transform data model into ontology

The process starts with the Create List reports task on the Data Modeling toll swimlane. Most Data Modeling tools, including PowerDesigner and ERWin, have extensive reporting capability to create MS-Excel or CSV reports. The task has the Data Model as the input object. The Export files output objects are identical to the tool-specific metadata spreadsheets. For example, the task for the PowerDesigner data modeler is to create an Entity list report to populating the Entities.xlsx.

The Transform into generic ERM task has the tool-specific metadata set, the Export files as an input object. Copy routines and formulas in the reverse ER metadata set source the mapped tool-specific metadata set. Note that the mapping between tool-specific and ER metadata set is the same for both directions.

The Transform into Ontology metadata has the ER metadata set as an input object. Support for other Modeling Tools

The task first populates the ontology metadata set from generic ER metadata. Again the mapping remains the same, only copy routines and formulas in the ontology metadata set source from ER metadata sets. Name convert from LDM notation into Camel Case. The task creates SPARQL CONSTRUCT scripts from the Ontology metadata sets as an output object.

For example:

CONSTRUCT {

       fibo-be-corp-corp:BoardAgreement  rdf:type owl:Class .

       fibo-be-corp-corp:JointStockCompany  rdf:type owl:Class .

       fibo-be-corp-corp:PrivatelyHeldCompany  rdf:type owl:Class .

}

WHERE {}

The developer can use joins in the SPARQL SELECT templates to determine the CONSTRUCT triplets. The ontology platform may support bulk construct for a faster load of data than CONSTRUCT statements.

The Reverse Transformation can adapt to most variants described for CODT: Different model types, modeling tools, and interfaces for the source data model, implementation in Excel or standalone, full configuration options, various ontology platforms, even generating RDF/OWL files.

Bypass of ER metadata sets

The benefit of the 2-stage transformation from Ontology to Tool-specific metadata sets with generic ER metadata sets in the middle is twofold. Firstly it brakes up the complexity in formulas and algorithms. Secondly, the same ER metadata set transforms into models of various types and modeling tools.

However, developers may prefer a single-stage transformation bypassing the ER metadata sets for some use cases. For example, a CODT extension integrated into a data modeling tool that imports an ontology into a logical data model. To implement the bypass, the developers can consolidate formulas and copy routines.

Metamodel extensions for lineage and semantics

CODT already extends the tool-specific metamodel with object properties for annotations as described in the Extend Tool Metamodel sub-process.

The metamodel extensions for lineage and semantics provide a traceability link to the source ontology resource and capture semantics that is beyond the data model expressivity.

Screenshot Figure 14 shows an example of the Lineage tab for Entities.

PowerDesigner Lineage tab for ontology derived data models
PowerDesigner Lineage tab for ontology derived data models

Resource Name, Local Name, Prefix, Resource Type, and URI provide the lineage from the data model to source ontology.

The Equivalent property lists equivalent classes and expressions. For the data modeler, may consolidate equivalent entities. Equivalent expressions indicate defined classes that may become Views in the physical data model.

Simple Restrictions validate the relationship cardinality. Complex OWL restrictions are beyond the capabilities of the data model. However, they provide semantic documentation for Database and Application developers encoding data integrity.

Source Ontology Analytics

The Analytics component provides statistical information about the source ontology. The User Interface component launches SPARQL queries and prepares analytical reports. The Analytics component accesses the ontology through the Extraction component. Reported values can pre-populate configuration values.

For example, an analytical report profiles the number of instances for class annotations properties. The user may choose the widely used annotations for the configuration.

The embodiment automates the manual user sub-process.