ASIA (the name stands for Assisted Semantic Interpretation and Annotation of tables) is a table-annotation tool primarily designed to use semantic approaches for the enrichment of tabular data with new information, possibly coming from third-party data sources.
Joining tabular data sets that do not share record ids or other identifying values is not straightforward: it requires creating associations between the table elements and a shared system of identifiers. ASIA helps users create these links by means of purpose-built semantic reconciliation algorithms and a graphical interface. In this context, linking can be performed at two different levels:
- Schema-level linking: linking table schema elements (i.e., the headers of a table) to shared vocabularies and ontologies. User-defined concepts are supported as well. The result of this activity is an RDF mapping that can be used to transform the tabular input data into a semantic format;
- Instance-level linking: linking data values (that is, the content of the cells) to shared systems of identifiers. The result of this phase is a table with a new column containing the ids of the reconciled values. These ids can then be used to retrieve, from the knowledge bases that use them, the information needed to build a data set that is richer from an analytics standpoint.
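The two levels of linking can be illustrated with a minimal Python sketch. It is not ASIA code: the toy table, the `example.org` vocabulary IRIs, the `ex:` identifiers, and the `link_instances` helper are all assumptions made up for illustration, and a plain dictionary lookup stands in for a real reconciliation algorithm.

```python
from typing import Dict, List, Optional, Tuple

# Toy table (illustrative values only).
header: List[str] = ["city", "sales"]
rows: List[List[str]] = [["Milan", "120"], ["Oslo", "85"], ["Atlantis", "3"]]

# Schema-level linking: each header is mapped to a vocabulary term.
# The IRIs are hypothetical placeholders, not terms from a real ontology.
schema_links: Dict[str, str] = {
    "city": "http://example.org/vocab/City",
    "sales": "http://example.org/vocab/sales",
}

# Instance-level linking: cell values are mapped to shared identifiers.
# An exact-match dictionary stands in for a reconciliation service here.
identifier_lookup: Dict[str, str] = {"Milan": "ex:0001", "Oslo": "ex:0002"}


def link_instances(
    header: List[str],
    rows: List[List[str]],
    column: str,
    lookup: Dict[str, str],
) -> Tuple[List[str], List[List[Optional[str]]]]:
    """Return a copy of the table with a new '<column>_id' column appended.

    Values without a match get None, so they can be reviewed later.
    """
    idx = header.index(column)
    new_header = header + [f"{column}_id"]
    new_rows = [row + [lookup.get(row[idx])] for row in rows]
    return new_header, new_rows


linked_header, linked_rows = link_instances(header, rows, "city", identifier_lookup)
```

The original table is left untouched; the result is a new table whose last column holds the reconciled ids, with `None` marking values that could not be linked.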
Schema-level and instance-level links are created by ASIA as annotations for the table. Users create schema-level annotations through the ASIA interface by validating suggestions about the classes and properties to be used. These suggestions come from a multi-language suggestion service built on ABSTAT. If the user types a different class (or property), ASIA suggests classes (or properties) that syntactically match the user’s input (autocomplete functionality). Instance-level annotations, instead, are expected to be created by ASIA automatically, since the size of the data set often makes it impractical to annotate values one by one; nonetheless, ASIA features a dedicated tab that lets the user interact with the candidate entities proposed by the tool. The user can validate or discard candidates and change the matching threshold if needed.
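How a threshold interacts with candidate entities can be sketched as follows. This is an assumption about the general technique (keep the best-scoring candidate only if its score reaches the threshold), not a description of ASIA's internals; the `Candidate` class, the `pick_match` function, the scores, and the `ex:` ids are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    entity_id: str  # identifier in the target system (made up here)
    label: str      # human-readable name shown to the user
    score: float    # matching confidence, assumed to lie in [0, 1]


def pick_match(candidates: List[Candidate], threshold: float) -> Optional[Candidate]:
    """Return the best candidate if its score reaches the threshold, else None."""
    best = max(candidates, key=lambda c: c.score, default=None)
    if best is not None and best.score >= threshold:
        return best
    return None


# Two hypothetical candidates for the cell value "Paris".
paris_candidates = [
    Candidate("ex:paris-fr", "Paris (France)", 0.92),
    Candidate("ex:paris-tx", "Paris (Texas)", 0.61),
]

accepted = pick_match(paris_candidates, threshold=0.80)  # best candidate kept
rejected = pick_match(paris_candidates, threshold=0.95)  # nothing passes
```

Raising the threshold trades coverage for precision: more cells stay unlinked and are left for the user to validate or discard by hand.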
Table annotations underpin two different data quality-related functionalities of ASIA:
- Generation of knowledge graphs from a tabular data set: the schema-level annotations are transformed into Grafterizer data transformations to publish tabular data as a knowledge graph; data values will be used to create new instances and populate the graph.
- Enrichment of tabular data with third-party data: instance-level annotations are used to facilitate the enrichment of user data with data retrieved from reference knowledge graphs, often referred to as ‘core data’ in EW-Shopp lingo. Examples of such sources are GeoNames, Google GeoTargets, DBpedia, and GfK products.
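To make the first functionality more concrete, the sketch below shows the kind of output that results when schema-level annotations are applied to a table: one typed resource per row, plus one property per annotated column. It does not reproduce Grafterizer transformations; the `table_to_triples` function, the `example.org` IRIs, and the simplified N-Triples-style serialization (plain literals, minimal escaping) are assumptions for illustration.

```python
from typing import Dict, List
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def table_to_triples(
    header: List[str],
    rows: List[List[str]],
    subject_column: str,
    class_iri: str,
    property_iris: Dict[str, str],
    base_iri: str,
) -> List[str]:
    """Turn each row into a typed resource with one triple per mapped column."""
    subject_idx = header.index(subject_column)
    triples: List[str] = []
    for row in rows:
        # The subject IRI is minted from the value of the subject column.
        subject = f"<{base_iri}{quote(row[subject_idx])}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{class_iri}> .")
        for column, prop in property_iris.items():
            value = row[header.index(column)]
            # Minimal literal escaping; a real serializer handles more cases.
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'{subject} <{prop}> "{literal}" .')
    return triples


triples = table_to_triples(
    header=["city", "sales"],
    rows=[["Milan", "120"], ["Oslo", "85"]],
    subject_column="city",
    class_iri="http://example.org/vocab/City",            # hypothetical class
    property_iris={"sales": "http://example.org/vocab/sales"},  # hypothetical property
    base_iri="http://example.org/city/",
)
```

In practice a dedicated RDF library would take care of serialization and datatypes; the point here is only that the header annotations decide the class and properties, while the cell values populate the graph.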
The ASIA interface is developed as a component of Grafterizer 2.0 as shown below.
To support schema-level and instance-level annotation, ASIA interoperates with vocabulary suggestion services (e.g., ABSTAT full-text search, Linked Open Vocabularies), reconciliation services (e.g., Wikifier or GeoNames), data extension services (e.g., the Weather extension service) and sameas services (e.g., the GeoNames2LAU service).
The high-level functionalities are listed below:
- User-in-the-loop instance-level semantic annotation of tables: the user is actively involved in the creation of instance-level annotations by interacting with the system, that is, selecting a suitable reconciliation service, validating the entity candidates, and tweaking the accuracy threshold;
- Instance-level reconciliation services: these services support users in linking values in a table to identifiers of known knowledge graphs. This category includes services that reconcile the values of a column against a system of identifiers (reconciliation services), services that use the reconciled ids to guide the user in extracting new pieces of information from a third-party data source (extension services), and sameas services, utilities that map different systems of identifiers to each other, making it possible to reconcile once and extend from several sources;
- RDF creation: based on the instance-level semantic annotations, RDF mappings are generated to achieve linked knowledge graphs when transformations are executed.
- Interactive schema-level semantic annotation of tables: the user interface supports the creation of schema-level semantic annotation and is integrated into the Grafterizer tool to make the semantic annotation and RDF transformation processes integral to the data transformation steps;
- Schema-level vocabulary suggestions: ASIA incorporates schema-level suggestions provided via API using ABSTAT, a knowledge graph profiling tool. It can be configured to use LOV and other terminology recommendation services like the ones;
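The "reconcile once, extend from several sources" idea behind extension and sameas services can be sketched in a few lines of Python. Everything here is hypothetical: the `extend_rows` helper, the `A:`/`B:` identifier systems, and the in-memory dictionaries standing in for remote services and their data.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def extend_rows(
    rows: List[Row],
    id_column: str,
    source: Dict[str, Dict[str, Any]],
    field: str,
    id_map: Optional[Dict[str, str]] = None,
) -> List[Row]:
    """Add `field` from `source` to each row, looked up via the reconciled id.

    If `id_map` is given, it plays the role of a sameas service and translates
    the reconciled id into the identifier system used by `source`.
    """
    extended: List[Row] = []
    for row in rows:
        key = row.get(id_column)
        if id_map is not None:
            key = id_map.get(key)
        record = source.get(key, {})
        extended.append({**row, field: record.get(field)})
    return extended


# Rows already reconciled against a made-up identifier system "A".
reconciled = [
    {"city": "Milan", "city_id": "A:1"},
    {"city": "Atlantis", "city_id": None},  # reconciliation found no match
]

weather_by_a = {"A:1": {"temperature": 21}}  # source keyed by "A" ids
stats_by_b = {"B:77": {"region": "North"}}   # source keyed by "B" ids
a_to_b = {"A:1": "B:77"}                     # sameas mapping from "A" to "B"

step1 = extend_rows(reconciled, "city_id", weather_by_a, "temperature")
step2 = extend_rows(step1, "city_id", stats_by_b, "region", id_map=a_to_b)
```

The table is reconciled only against system "A"; the sameas mapping is what lets the second source, which uses different ids, contribute a column as well. Rows without a reconciled id simply receive empty values.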