[CfP] CL-SciSumm Shared Task 2018 @SIGIR’2018: The Scientific Summarization Shared Task

CL-SciSumm 2018





=== Call for Participation ===
The 4th Computational Linguistics Scientific Summarization Shared Task,
CL-SciSumm-18 @ SIGIR 2018: http://wing.comp.nus.edu.sg/~cl-scisumm2018/
You are invited to participate in the CL-SciSumm 2018 Shared Task,
as part of the 3rd Joint Workshop of Bibliometric-enhanced IR and NLP
for Digital Libraries (BIRNDL) at SIGIR 2018 on Thursday, July 12, 2018.
The 4th CL-SciSumm Shared Task on scientific paper summarization
follows up on, and extends the corpus sizes of, the successful Shared Tasks
conducted as part of the BIRNDL workshops in 2016 and 2017, and
the Pilot Task conducted as part of the BiomedSumm Track at the
Text Analysis Conference 2014. In the CL-SciSumm 2017 Shared Task,
fifteen teams from six countries signed up, and ten teams ultimately submitted
and presented their results. The task is run on the CL-SciSumm corpus, the largest
annotated corpus for scientific summarization, comprising over 500
computational linguistics (CL) research papers, interlinked through a citation
network. The corpus is available for free download and use
at https://github.com/WING-NUS/scisumm-corpus. The repository also
archives all results from all prior runs of the Shared Task.
The Shared Task comprises three sub-tasks in automatic research
paper summarization on a new corpus of research papers. This task
is expected to be of interest to a broad community including those
working in CL and NLP, especially in the sub-disciplines of text
summarization, natural language generation, text reuse, discourse
structure in scholarly discourse, paraphrase, textual entailment
and text simplification.
=== The Task ===
Given: A topic consisting of a Reference Paper (RP) and ten or more
Citing Papers (CPs) that all contain citations to the RP. In each CP,
the text spans (i.e., citances) that pertain to a particular citation
to the RP have been identified.
Task 1a: For each citance, identify the spans of text (cited text
spans) in the RP that most accurately reflect the citance. These are
of the granularity of a sentence fragment, a full sentence, or several
consecutive sentences (no more than 5).
Task 1b: For each cited text span, identify what facet of the paper it
belongs to, from a predefined set of facets.
Evaluation: Task 1 will be scored by overlap of text spans in the
system output vs. the gold standard created by human annotators.
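As an illustration of overlap-based scoring, the sketch below computes precision, recall and F1 over sets of RP sentence IDs. It is a minimal sketch under stated assumptions, not the official scorer: the function name span_overlap_scores and the representation of spans as sentence-ID sets are hypothetical, and the official evaluation may measure overlap differently.

```python
def span_overlap_scores(system_ids, gold_ids):
    """Precision, recall and F1 over sets of RP sentence IDs.

    Assumption: each cited text span is represented by the IDs of the
    RP sentences it covers. The official scorer may differ.
    """
    system, gold = set(system_ids), set(gold_ids)
    overlap = len(system & gold)
    precision = overlap / len(system) if system else 0.0
    recall = overlap / len(gold) if gold else 0.0
    # F1 is defined as 0 when there is no overlap at all.
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # System picked sentences 1-3; annotators picked sentences 2-4.
    print(span_overlap_scores([1, 2, 3], [2, 3, 4]))
```

In this example two of the three system sentences match the gold standard, so precision and recall are both two thirds.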
Task 2 (optional bonus task): Finally, generate a structured summary
of the RP from the cited text spans of the RP. The length of the summary
should not exceed 250 words.
Evaluation: Task 2 will be scored using the ROUGE evaluation metric
to compare automatic summaries against paper abstracts, human-written
summaries and community summaries constructed using
the output of Task 1a.
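For participants unfamiliar with ROUGE, the sketch below shows the idea behind ROUGE-N recall (clipped n-gram overlap between a candidate and a reference summary), together with a check of the 250-word limit. It is a simplified illustration only: the function names are hypothetical, tokenization is plain whitespace splitting, and the official evaluation uses the ROUGE toolkit, whose exact variant and settings are not specified here.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Fraction of reference n-grams that also appear in the candidate.

    Assumption: lowercased whitespace tokenization, no stemming and no
    stopword removal. Matches are clipped to the candidate's counts.
    """
    cand = ngram_counts(candidate.lower().split(), n)
    ref = ngram_counts(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, cand[gram]) for gram, count in ref.items())
    return matched / total


def within_word_limit(summary, limit=250):
    """Check a summary against the Task 2 length limit (whitespace tokens)."""
    return len(summary.split()) <= limit


if __name__ == "__main__":
    print(rouge_n_recall("the cat sat", "the cat sat on the mat", n=1))
    print(within_word_limit("a short summary"))
```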
=== Important Dates ===
March 19: Training set posted
April 8: Expression of interest and short system descriptions due
May 1: Test set posted
May 20: System runs from the test set due
May 27: System reports (paper) due
June 25: Camera ready contributions due
July 12, 2018: Participants present at the BIRNDL 2018 workshop in Ann Arbor, MI, USA
=== The Corpus ===
The CL-SciSumm corpus is created by randomly sampling documents from
the ACL Anthology corpus and selecting their citing papers. Citing papers may
include papers from outside the Anthology. For
CL-SciSumm 2018, we have selected three portions of this source
collection to be annotated and serve as training, development and test
collections. The training set of articles is available for download
at GitHub (https://github.com/WING-NUS/scisumm-corpus) and can be used
by participants to pilot their systems. Watch for updates to the
GitHub repository, as we will be updating the repository with announcements
and new files. The system outputs from the test set should be submitted to
the task organizers, for the collation of the final results to be presented at
the workshop.
=== Registration ===
Organizations wishing to participate in the CL Shared Task track at
BIRNDL 2018 are invited to register on EasyChair:
(https://easychair.org/conferences/?conf=birndl2018) by April 8th with
a tentative abstract. Please prefix “CLSciSumm Shared Task: ” to the
title of your submission. Participants are advised to register as soon as
possible in order to receive timely access to evaluation resources,
including training, development and testing data. Registration for the
task does not commit you to participation, but it helps the organizers with
planning. All participants who submit system runs are welcome to
present their systems as posters/selected presentations at the
BIRNDL 2018 Workshop at Ann Arbor, MI, USA.
Dissemination of CL-SciSumm work and results other than in the
workshop proceedings is welcomed, but the conditions of participation
specifically preclude any advertising claims based on these results.
Any questions about conference participation may be sent to the
organizers mentioned below.
=== Organising Committee ===
- Kokil Jaidka (http://kokiljaidka.wordpress.com/)
- Muthu Kumar Chandrasekaran (http://wing.comp.nus.edu.sg/~cmkumar/)
- Michihiro Yasunaga (https://www.linkedin.com/in/michihiro-yasunaga-616762136)
- Dragomir Radev (https://cpsc.yale.edu/people/dragomir-radev)
- Min-Yen Kan (https://www.comp.nus.edu.sg/~kanmy/)
Thanks!
Muthu, Kokil, Michi, Drago, Min
Muthu Kumar Chandrasekaran
Ph.D. Candidate | Web Information Retrieval / Natural Language Processing Group (WING)
School of Computing | National University of Singapore (NUS)
wing.comp.nus.edu.sg/~cmkumar