This page provides an overview of the most relevant concepts and terms in our study.
There are two major sources of concepts and terminology for our study:
- mobilityDCAT-AP for describing datasets and distributions in the mobility domain
- DATEX II, which deals with the content model, its profiling, and schemas
Tourist information office as analogy to National Access Point
Following image uses analogy of Tourist Information Office and NAP, depicts some core concepts and terms and shows systems and other deliverables (at the bottom half of the schema).
Core terms and definitions
- data portal
-
A web-based system that contains a catalog with descriptions of datasets and their distributions.
- National Access Point
-
A data portal facilitating access to transport-related data according to the EC ITS Directive.
- NAP
-
National Access Point
- (data) catalog
-
A collection of catalog records on a data portal.
- catalog record
-
A description of a dataset and its distributions published on the portal.
- dataset
-
A collection of data, published by a single publisher; sharing some common metadata (e.g., theme, rights holder, space); and provided via one or more distributions.
- (data) distribution
-
A dataset served using a single format and protocol.
- rights holder
-
An entity holding rights to the dataset.
- (data) publisher
-
An entity responsible for publishing the dataset distributions (making the dataset distributions technically accessible).
- content model
-
A class model of possible data distribution content describing its logical structure, while abstracting specific serialization format.
- (content model) profile
-
A simplified class model created from the complete content model by removing structures that are not needed, and possibly enforcing the presence of some attributes and structures.
- schema
-
A description of structural constraints for data distribution content using a specific format (e.g., XML, JSON, etc.).
- format
-
A specification that allows for the interpretation of serialized dataset distribution content. It should include a schema and may include samples and other accompanying documentation.
- sample
-
An instance of distribution content.
- protocol
-
A specification of rules and procedures allowing a consumer to get a sample of distribution content.
Terms from NAP context we try to avoid
Our goal is to present things an a way, which is easy to understand. In some cases we found concepts, which have alternative names and we did our best to pick the one, which we feel has better chance to be well understood.
Our preference to use or avoid some terms is scoped to this study. It is perfectly correct to use any of the “better avoid” terms in other scopes.
Here are some examples why we prefer one term over another:
- the term sounds too generic. E.g. “metadata entry” is hard to imagine what the entry is about. We prefer “catalogue record” as it sounds more specific.
- the same term has different meaning in similar domains. E.g. “publication” in mobilityDCAT-AP is different concept from DATEX II Publication.
- having alternative name for the same concept in similar domain. E.g. “Platform Specific Model (PIM)” in DATEX II domain is expressed in form of a (XML/JSON) schema. We prefer (where appropriate) to use a term, which is used across multiple mobility domains.
- Metadata registry
-
Better to use Catalog
- Metadata entry
-
Better to use Catalog Record
- Publication
-
Better to use Dataset. Note: Do not confuse this with a DATEX II Publication.
Terms from DATEX II context we try to avoid:
- Platform Independent Model
-
So-called PIM — see Content PIM.
- Platform Specific Model
-
So-called PSM — see Content PSM.
- Content PIM
-
Better to use content model or (content model) profile.
- Content PSM
-
Better to use schema.
- Publication
-
Better to use DATEX II Publication.
Terms in Context
National Access Point: Data Portal
The National Access Point (NAP) must always fulfill the role of a Data Portal, serving a Catalog that describes available datasets and their distributions.
The NAP may also directly or indirectly serve the content of distributions, but our study does not focus on this architectural aspect. We rely on the declared (data) publisher to take care of this responsibility.
The NAP typically takes the form of a publicly accessible web application that allows human users to search for relevant information.
The NAP may also provide an interface (e.g., SPARQL endpoint) for querying catalog records via an API programmatically.
Roles: Rights Holder and Data Publisher
The (data) publisher is responsible for making the distributions technically accessible. This is a key focus of our study as it affects many quality aspects of data provisioning.
The rights holder holds the rights to the data. Our study does not focus on this legal aspect. We rely on the catalog record to provide relevant legal information and on the user to interpret it accordingly.
A Catalog Record Publisher is responsible for publishing the actual Catalog Record. While this role is important, our study does not attempt to address the entity playing this role.
Data: Datasets and Distributions
A dataset is a rather abstract concept, as it may only be consumed via one or more distributions.
Our study focuses on distributions, as these are the only tangible items allowing for actual testing.
Format: Content Model, Profiles, and Schema
DATEX II provides a framework for modeling provided content on two levels:
- An abstract content class model, which is agnostic to the final serialization format
- A format-specific model in the form of a schema (mostly a W3C XML Schema) expressing how the content can be serialized and interpreted in a given format.
DATEX II also includes relevant class models (e.g., PublicationSituation, usable for publishing safety-related traffic information). These class models are rich in terms of the provided data structures and should be simplified before actual use. The process of simplification is called profiling and may also include strengthening some rules, such as making some optional data structures obligatory.
The publisher is expected to profile the class model and provide the resulting (XML) schema, as it serves as a contract about the used data structures between the publisher and the consumer.
When a schema is missing (which is considered a major failure of the publisher), one can derive a so-called “implicit schema” from the DATEX II content class model. Such a schema allows for validation and use in programming tools, but it adds extra complexity to the schema.
Embrace the Key Concepts!