PROPOSED Entity Reconciliation Working Group Charter

The mission of the Entity Reconciliation Working Group is to develop a protocol for entity linking on the Web.

Join the Entity Reconciliation Working Group.

This proposed charter is available on GitHub. Feel free to raise issues.

Charter Status: See the group status page and detailed change history.
Start date: [dd monthname yyyy] (date of the "Call for Participation", when the charter is approved)
End date: [dd monthname yyyy] (Start date + 2 years)
Chairs: Fabian Steeg (Hochschulbibliothekszentrum NRW), Antonin Delpeuch (Invited Expert)
Team Contacts: [team contact name] (0.1 FTE)
Meeting Schedule:
  • Teleconferences: teleconferences will focus on discussion of specifications and implementations, and will be conducted on a monthly basis.
  • Face-to-face: we will meet during the W3C's annual Technical Plenary week; additional face-to-face meetings may be scheduled by consent of the participants, usually no more than 3 per year.

Motivation and Background

Because the Web is decentralized, any service can publish a new URI for an entity or concept that already exists elsewhere. As a result, entities have multiple identifiers and varying attributes or descriptions across the web. Linked Open Data seeks to enable the integration of data from different publishers. To accomplish this for a specific entity, it is necessary to determine which URIs refer to the same entity across different services by comparing attributes, investigating discrepancies, and updating records accordingly. Providing an API to support this reconciliation process — which includes matching, previewing, suggesting, and extending — is the primary motivation behind this working group.

One can determine if two database records refer to the same entity by comparing their attributes. For instance, two entries about cities bearing the same name, in the same country and with the same mayor are likely to refer to the same city. The reconciliation API that we want to standardize makes it easier to discover such matches. It is a protocol that a data provider can implement, enabling its consumers to efficiently match their own data to the entities represented by the provider.
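To make the attribute comparison above concrete, here is a minimal sketch in Python. It is not the scoring method of any particular reconciliation service: the function name `attribute_overlap`, the record fields, and the city data are all made up for illustration, and real services typically use fuzzier comparisons than exact string equality.

```python
def attribute_overlap(record_a: dict, record_b: dict) -> float:
    """Return the fraction of shared attributes whose values agree.

    Values are compared as trimmed, lower-cased strings (an assumption
    made for this sketch; real matchers are usually more tolerant).
    """
    shared = set(record_a) & set(record_b)
    if not shared:
        return 0.0
    agreeing = sum(
        1
        for key in shared
        if str(record_a[key]).strip().lower() == str(record_b[key]).strip().lower()
    )
    return agreeing / len(shared)


# Hypothetical local record and two hypothetical provider records.
local = {"name": "Springfield", "country": "US", "mayor": "Jane Doe"}
candidates = {
    "ex:city/1": {"name": "Springfield", "country": "US", "mayor": "Jane Doe"},
    "ex:city/2": {"name": "Springfield", "country": "US", "mayor": "John Roe"},
}

# Rank provider identifiers by how well their attributes agree with the local record.
ranked = sorted(
    candidates,
    key=lambda cid: attribute_overlap(local, candidates[cid]),
    reverse=True,
)
```

Here the name and country alone cannot separate the two candidates; the mayor attribute is what ranks `ex:city/1` first.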

The reconciliation API was originally designed by Metaweb as a protocol for a use case in Gridworks (now OpenRefine) where entities from a local dataset are matched to Freebase records during import. It was later generalized so any database could implement it, allowing data providers such as Crossref, Nomisma, Getty, Wikidata, and VIAF to be added as a Standard Service in OpenRefine. The API, originally documented on OpenRefine’s wiki, also supports optional features such as entity previews, auto-suggest for various inputs, and, since 2018, data extension for pulling data from reconciled records in the target database.

In 2019 the W3C Entity Reconciliation Community Group formed to promote and refine the specification beyond OpenRefine. It published two versions of the specification as CG reports.

Scope

The group aims to write specifications for a Web-based protocol for record linkage. This protocol is largely based on the one popularized by the OpenRefine application, which enables reconciliation of locally-held data to the identifiers curated by an authority database, exposed by a web server.

The Working Group therefore welcomes work in the following forms:

  • Designing the specifications of the reconciliation API;
  • Documenting how other existing protocols can be used for reconciliation on the web;
  • Writing test suites, validators, libraries and other tools for such protocols;
  • Surveying the existing workflows and available implementations;
  • Promoting any suitable protocol around reconciliation to data providers.
Reconciliation generally refers to bipartite matching, where correspondences between two distinct data sources are inferred. It is also in the scope of this group to explore broader notions of entity matching, such as deduplication of records from a single data source.
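As a rough illustration of the single-source case, the sketch below groups records from one dataset that share a normalized name. The helper names `normalize` and `duplicate_groups` and the sample records are hypothetical, and exact-key grouping is only one simple deduplication strategy among many.

```python
from collections import defaultdict


def normalize(name: str) -> str:
    """Lower-case a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def duplicate_groups(records: dict) -> list:
    """Group record ids whose names are identical after normalization.

    Only groups with more than one member are returned, since a
    singleton group has no duplicate candidates.
    """
    groups = defaultdict(list)
    for record_id, name in records.items():
        groups[normalize(name)].append(record_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


# Hypothetical records from a single data source.
records = {"a": "New  York", "b": "new york", "c": "Boston"}
groups = duplicate_groups(records)
```

In the bipartite setting, candidates come from a second source; here both sides of each comparison come from the same dataset.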

Out of Scope

The following features are out of scope and will not be addressed by this Working Group.

  • Implementation of the protocol in specific clients, such as OpenRefine, although clients should of course be kept in mind when designing any protocol;
  • Implementation of the protocol in specific data providers;
  • Creation of datasets for benchmarks of heuristics (this is done by OAEI already);
  • Hosting of reconciliation services.

Deliverables

Updated document status is available on the group publication status page.

Draft state indicates the state of the deliverable at the time of the charter approval. Expected completion indicates when the deliverable is projected to become a Recommendation, or otherwise reach a stable state.

Normative Specifications

The Working Group will deliver the following W3C normative specifications:

Reconciliation protocol

This specification describes mechanisms for web-based record linkage that can be used to reconcile locally-held data to the identifiers curated by an authority database.
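The sketch below shows how a client might assemble a batch of reconciliation queries and pick the top candidate for each from a response. It is loosely modelled on the batch shape used in earlier drafts of the Reconciliation Service API; the field names (`query`, `type`, `limit`, `properties`, `pid`, `v`, `result`, `id`, `score`) are assumptions that should be checked against the draft in use, and the helper functions and sample data are hypothetical. No network request is made.

```python
import json


def build_query_batch(rows: list, limit: int = 3) -> dict:
    """Build a batch of queries keyed q0, q1, ... from local rows.

    Each row is a dict with a "name", an optional "type", and an
    optional "properties" mapping of property id to value.
    """
    batch = {}
    for index, row in enumerate(rows):
        query = {"query": row["name"], "limit": limit}
        if "type" in row:
            query["type"] = row["type"]
        if row.get("properties"):
            query["properties"] = [
                {"pid": pid, "v": value} for pid, value in row["properties"].items()
            ]
        batch[f"q{index}"] = query
    return batch


def best_matches(response: dict) -> dict:
    """Map each query key to the id of its highest-scoring candidate, or None."""
    best = {}
    for key, payload in response.items():
        candidates = payload.get("result", [])
        top = max(candidates, key=lambda c: c["score"], default=None)
        best[key] = top["id"] if top is not None else None
    return best


# Hypothetical local rows and a hand-written response of the assumed shape.
rows = [
    {"name": "Springfield", "type": "ex:City", "properties": {"ex:country": "US"}},
    {"name": "Atlantis"},
]
batch = build_query_batch(rows)
payload = json.dumps(batch)  # what a client would send to the service

sample_response = {
    "q0": {"result": [
        {"id": "ex:city/1", "name": "Springfield", "score": 92.5},
        {"id": "ex:city/2", "name": "Springfield", "score": 40.0},
    ]},
    "q1": {"result": []},
}
matches = best_matches(sample_response)
```

Batching several queries into one payload lets a client reconcile many rows per round trip, and an empty candidate list is a valid outcome that the client has to handle.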

Draft state: Draft Community Group Report

Expected completion: [Q1–4 yyyy]

Initial Draft: Reconciliation Service API (Draft Community Report, latest editor's draft)

Other Deliverables

Other non-normative documents may be created, such as:

  • An up-to-date test bench for all normative deliverables
  • An up-to-date census of the reconciliation ecosystem

Timeline

Put here a timeline view of all deliverables.

  • Month YYYY: First teleconference
  • Month YYYY: First face-to-face meeting
  • Month YYYY: Requirements and Use Cases for FooML
  • Month YYYY: FPWD for FooML
  • Month YYYY: Requirements and Use Cases for BarML
  • Month YYYY: FPWD FooML Primer

Success Criteria

In order to advance to Proposed Recommendation, each normative specification is expected to have at least two independent implementations of every feature defined in the specification.

Mention community adoption: services, libraries, publications (link to census?)

Each specification should contain sections detailing all known security and privacy implications for implementers, Web authors, and end users.

There should be testing plans for each specification, starting from the earliest drafts.

For specifications of technologies that directly impact user experience, such as content technologies, as well as protocols and APIs which impact content: Each specification should contain a section on accessibility that describes the benefits and impacts, including ways specification features can be used to address them, and recommendations for maximising accessibility in implementations.

Consider adopting a healthy testing policy, such as: To promote interoperability, all changes made to specifications should have tests.

Coordination

For all specifications, this Working Group will seek horizontal review for accessibility, internationalization, performance, privacy, and security with the relevant Working and Interest Groups, and with the TAG. Invitation for review must be issued during each major standards-track document transition, including FPWD. The Working Group is encouraged to engage collaboratively with the horizontal review groups throughout development of each specification. The Working Group is advised to seek a review at least 3 months before first entering CR and is encouraged to proactively notify the horizontal review groups when major changes occur in a specification following a review.

Additional technical coordination with the following Groups will be made, per the W3C Process Document:

Add: any dependencies by groups within or outside of W3C on the deliverables of this group (people/services using the protocol?)

In addition to the above catch-all reference to horizontal review which includes accessibility review, please check with chairs and staff contacts of the Accessible Platform Architectures Working Group to determine if an additional liaison statement with more specific information about concrete review issues is needed in the list below.

W3C Groups

[other name] Working Group
[specific nature of liaison]

Note: Do not list horizontal groups here, only specific WGs relevant to your work.

Note: Do not bury normative text inside the liaison section. Instead, put it in the scope section.

External Organizations

[other name] Working Group
[specific nature of liaison]

Participation

To be successful, this Working Group is expected to have 6 or more active participants for its duration, including representatives from the key implementors of this specification, and active Editors and Test Leads for each specification. The Chairs, specification Editors, and Test Leads are expected to contribute half of a working day per week towards the Working Group. There is no minimum requirement for other Participants. Mention: The expected time commitment and level of involvement by the Team, see https://www.w3.org/2021/Process-20211102/#WGCharter

The group encourages questions, comments and issues on its public mailing lists and document repositories, as described in Communication.

The group also welcomes non-Members to contribute technical submissions for consideration upon their agreement to the terms of the W3C Patent Policy.

Participants in the group are required (by the W3C Process) to follow the W3C Code of Ethics and Professional Conduct.

Communication

Technical discussions for this Working Group are conducted in public: the meeting minutes from teleconference and face-to-face meetings will be archived for public review, and technical discussions and issue tracking will be conducted in a manner that can be both read and written to by the general public. Working Drafts and Editor's Drafts of specifications will be developed in public repositories and may permit direct public contribution requests. The meetings themselves are not open to public participation, however.

Information about the group (including details about deliverables, issues, actions, status, participants, and meetings) will be available from the Entity Reconciliation Working Group home page.

Most Entity Reconciliation Working Group teleconferences will focus on discussion of specifications and implementations, and will be conducted on a monthly basis.

This group primarily conducts its technical work on the public mailing list public-[email-list]@w3.org (archive) or on GitHub issues. The public is invited to review, discuss and contribute to this work.

The group may use a Member-confidential mailing list for administrative purposes and, at the discretion of the Chairs and members of the group, for member-only discussions in special cases when a participant requests such a discussion.

Decision Policy

This group will seek to make decisions through consensus and due process, per the W3C Process Document (section 5.2.1, Consensus). Typically, an editor or other participant makes an initial proposal, which is then refined in discussion with members of the group and other reviewers, and consensus emerges with little formal voting being required.

However, if a decision is necessary for timely progress and consensus is not achieved after careful consideration of the range of views presented, the Chairs may call for a group vote and record a decision along with any objections.

To afford asynchronous decisions and organizational deliberation, any resolution (including publication decisions) taken in a face-to-face meeting or teleconference will be considered provisional. A call for consensus (CfC) will be issued for all resolutions (for example, via email, GitHub issue or web-based survey), with a response period from one week to 10 working days, depending on the chair's evaluation of the group consensus on the issue. If no objections are raised by the end of the response period, the resolution will be considered to have consensus as a resolution of the Working Group.

All decisions made by the group should be considered resolved unless and until new information becomes available or unless reopened at the discretion of the Chairs or the Director.

This charter is written in accordance with the W3C Process Document (Section 5.2.3, Deciding by Vote) and includes no voting procedures beyond what the Process Document requires.

Patent Policy

This Working Group operates under the W3C Patent Policy (Version of 15 September 2020). To promote the widest adoption of Web standards, W3C seeks to issue Web specifications that can be implemented, according to this policy, on a Royalty-Free basis. For more information about disclosure obligations for this group, please see the licensing information.

Licensing

This Working Group will use the W3C Software and Document license for all its deliverables.

About this Charter

This charter has been created according to section 3.4 of the Process Document. In the event of a conflict between this document or the provisions of any charter and the W3C Process, the W3C Process shall take precedence.

Charter History

Note: Display this table and update it when appropriate. Requirements for charter extension history are documented in the Charter Guidebook (section 4).

The following table lists details of all changes from the initial charter, per the W3C Process Document (section 4.3, Advisory Committee Review of a Charter):

Charter Period | Start Date | End Date | Changes
Initial Charter | [dd monthname yyyy] | [dd monthname yyyy] | none
Charter Extension | [dd monthname yyyy] | [dd monthname yyyy] | none
Rechartered | [dd monthname yyyy] | [dd monthname yyyy] | [description of change to charter, with link to new deliverable item in charter]

Note: use the class new for all new deliverables, for ease of recognition.

Change log

Changes to this document are documented in this section.