Resources

Some tools and resources that I'm involved with in one way or another.

BioNLP Shared Task resources

The BioNLP Shared Task is a series of information extraction challenges organized in collaboration with various groups working on biomedical text mining. Since the first event in 2009, the series has introduced many manually annotated corpora and related resources. The BioNLP Shared Task resources and evaluation services remain available for assessing the performance of new methods on the tasks.

I've participated in the BioNLP Shared Task organization, in particular the following tasks (resources available from linked pages):

For the 2011 task we also created a large selection of supporting analyses using various NLP tools.

GENIA resources

The GENIA project of the Tsujii lab (University of Tokyo) introduced many widely used tools and resources during its long run from 1998 to 2012. These resources include the GENIA corpus, manually marked with various layers of linguistic and semantic annotation, and remain available from the GENIA project home page, now hosted by NaCTeM. I was a member of the GENIA project from 2008 and GENIA coordinator from 2010.


Brat Rapid Annotation Tool


brat is a user-friendly web-based tool for text annotation, developed as an open-source project in collaboration between the Aizawa lab of the University of Tokyo, NaCTeM, and (originally) the Tsujii lab of the University of Tokyo. I participate in brat development as project lead and do occasional programming.

Some others in brief

  • Anatomical Entity Mention (AnEM) corpus: random sample of biomedical publication abstracts and full-text extracts manually annotated for mentions of anatomical entities.
  • BioInfer: manually annotated corpus for biomedical information extraction. Annotation layers cover dependency syntax, entities, relations and events.
  • "Five PPI corpora": collection of binary relation extraction corpora focusing on protein-protein interactions, converted into a unified representation.
  • All-paths graph kernel: implementation of a kernel over weighted graphs that implicitly considers all paths in similarity calculations, with application to binary relation extraction.

(More to come; this page is still partly under construction.)