SCORE Project: CSRepo

Title: Metadata Repository for Computing Conferences

Sponsors: Tayana Conte and Altigran Soares da Silva

The DBLP Computer Science Bibliography service provides open bibliographic information on major computer science journals and proceedings. It lists more than 5000 conference and workshop series and more than 1500 journals in Computer Science [1].

DBLP provides a great service to the academic computing community. However, transferring all bibliographic information to DBLP requires a lot of manual work.

We want to automate the collection of bibliographic information by creating an open repository with metadata for major computer science conference proceedings.

Project Description

For this project, each team is expected to develop wrappers that collect metadata on papers published in conference proceedings by a given publisher, e.g., IEEE, ACM, Springer, Elsevier, ACL, and so on. Together, the wrappers will form a library that populates a shared repository. The final framework should have three layers: (i) Data Input, (ii) Data Quality Management, and (iii) Data Publishing.

For Data Input, the sponsors will provide a list of conference identifiers based on internationally agreed standards. Each wrapper should monitor one publisher, looking for new conference proceedings published during a given period of time (e.g., a month). For each newly published proceedings, the wrapper should retrieve the metadata for all papers. The wrapper should output one metadata record per paper, including its title, the list of authors, the publication year, the proceedings title, the volume, and the number of pages. In addition, each metadata record must carry the unique identifier of the conference (the same identifier used in the list provided by the sponsors).
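The record and wrapper described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the names PaperRecord, PublisherWrapper, InMemoryWrapper, and fetch_new_records are hypothetical, and the field types are guesses, since the project description does not prescribe a schema or an interface.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, asdict
from datetime import date
from typing import Iterable


@dataclass(frozen=True)
class PaperRecord:
    """One metadata record with the fields listed in the description."""
    conference_id: str       # identifier taken from the sponsors' list
    title: str
    authors: tuple           # author names, in publication order
    year: int
    proceedings_title: str
    volume: str
    num_pages: int


class PublisherWrapper(ABC):
    """Hypothetical base class: one subclass per monitored publisher."""

    @abstractmethod
    def fetch_new_records(self, since: date, until: date) -> Iterable[PaperRecord]:
        """Return records for proceedings published between since and until."""


class InMemoryWrapper(PublisherWrapper):
    """Stand-in wrapper that serves canned data instead of querying a publisher."""

    def __init__(self, published):
        # published: list of (publication_date, PaperRecord) pairs
        self._published = published

    def fetch_new_records(self, since, until):
        # Both bounds are inclusive.
        return [rec for day, rec in self._published if since <= day <= until]
```

A real wrapper would replace InMemoryWrapper with code that queries one publisher's site or API; asdict(record) then gives a plain dictionary that is easy to serialize for the later layers.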

The Data Quality Management layer should be responsible for validating the data and removing possible duplicates. Finally, the Data Publishing layer should provide web services that clients can call to obtain the metadata from recently published proceedings.
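As one possible starting point for the de-duplication step, the sketch below treats two records as duplicates when they share the conference identifier, the year, and a normalized title. The matching rule and the function names normalize_title and deduplicate are assumptions made for illustration; the project description leaves the actual quality checks to each team.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Lower-case, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9]+", " ", no_accents.lower())
    return cleaned.strip()


def deduplicate(records):
    """Keep the first record seen for each (conference_id, year, title) key."""
    seen = set()
    unique = []
    for rec in records:
        key = (rec["conference_id"], rec["year"], normalize_title(rec["title"]))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

Exact matching on a normalized key is deliberately simple. Catching near-duplicates such as titles with small typos would need a fuzzier comparison, which is outside this sketch.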

Project Scope

Your project needs to support all the features described above, but is not limited to them.

Enhancements and extensions are possible, including the development of a plugin for EasyChair. This plugin should enable the direct export of all metadata from the proceedings of a conference to the CS repository.

Process Requirements

Teams are free to choose the development process; however, this choice should be motivated and discussed in the project report.

Project artifacts should be hosted on a public repository (such as GitHub) so that project activity, including bug reports and bug fixes, can be tracked. The project should be suitable for adoption and extension by other developers by downloading or forking from the repository (which implies adequate developer documentation and lack of gratuitous dependence on the local development environment).

Environmental Constraints

The choice of platform is up to you. The result must be a Web Service that allows clients to remotely call functions to obtain the metadata from recently published proceedings from a specific publisher.
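To make the requirement concrete, here is a minimal sketch of such a service using only the Python standard library. Everything specific in it is an assumption for illustration: the /recent endpoint, the publisher and since query parameters, the record fields, and the in-memory RECORDS list are hypothetical, and a team may equally choose another platform or protocol.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

# Hypothetical in-memory store; a real deployment would query the repository.
RECORDS = [
    {"publisher": "ExamplePress", "conference_id": "conf-001",
     "title": "A Sample Paper", "published": "2014-03-05"},
]


def recent_records(records, publisher, since):
    """Return records from publisher published on or after since (ISO date)."""
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    return [r for r in records
            if r["publisher"] == publisher and r["published"] >= since]


def handle(path, records):
    """Map a request path to (HTTP status, JSON body); kept pure for testing."""
    url = urlparse(path)
    query = parse_qs(url.query)
    if url.path != "/recent":
        return 404, json.dumps({"error": "unknown endpoint"})
    if "publisher" not in query or "since" not in query:
        return 400, json.dumps({"error": "publisher and since are required"})
    found = recent_records(records, query["publisher"][0], query["since"][0])
    return 200, json.dumps(found)


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = handle(self.path, RECORDS)
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body.encode("utf-8"))


def serve(port=8000):
    """Start the service on localhost; blocks until interrupted."""
    HTTPServer(("localhost", port), Handler).serve_forever()
```

After calling serve(), a client could request /recent?publisher=ExamplePress&since=2014-03-01 to obtain the matching records as JSON. Keeping handle free of network code makes the request logic easy to test without starting a server.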

Level of Sponsor Involvement

The sponsors will provide a list of conference identifiers based on internationally agreed standards. The sponsors will attempt to answer questions within a week throughout the contest period. Selected questions deemed important by the sponsors will be gathered and answered in a FAQ at the bottom of this page.

Sponsor contact: tayana (at) icomp (dot) ufam (dot) edu (dot) br, alti (at) icomp (dot) ufam (dot) edu (dot) br

