Supporting Resources
The goal of the supporting resources for the BioNLP Shared Task 2016 is to provide task participants with annotations from state-of-the-art automated tools. This minimises the time investment needed to participate in the shared task and lets participants experiment with how to leverage the automated analyses produced by existing Natural Language Processing systems. [responsibility...]
Resources and Formats
The resources available for the shared task are listed below. Each resource is a downloadable archive that contains the output of one tool (described in its entry) on the train and dev datasets.
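Because every resource is distributed as a zip archive, it can help to list an archive's members before relying on any particular internal layout. The sketch below uses only Python's standard `zipfile` module. The function name `list_archive` is a hypothetical helper, and nothing is assumed about the files inside the archive.

```python
import zipfile


def list_archive(path: str) -> list[tuple[str, int]]:
    """Return (member name, uncompressed size in bytes) for each file in a zip archive."""
    with zipfile.ZipFile(path) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()  # skip directory entries
        ]
```

For example, `list_archive("genia-tagger_train+dev_resources.zip")` returns the member names and sizes of the Genia Tagger archive once it has been downloaded to the working directory.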
POS Tagging
Genia Tagger is a tool for part-of-speech tagging, shallow parsing, and named entity recognition for biomedical text.
genia-tagger_train+dev_resources.zip : the resources produced by Genia Tagger on the train and dev datasets
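In its native output, Genia Tagger writes one token per line with five tab-separated fields (word, base form, part-of-speech tag, chunk tag, named-entity tag) and a blank line between sentences. The sketch below reads that layout. It assumes the archive keeps the tagger's native output, which this page does not state, so check the files first. `GeniaToken` and `parse_genia_output` are hypothetical names.

```python
from typing import NamedTuple


class GeniaToken(NamedTuple):
    word: str
    base: str
    pos: str
    chunk: str  # IOB chunk tag, e.g. B-NP
    ne: str     # IOB named-entity tag, e.g. B-protein or O


def parse_genia_output(text: str) -> list[list[GeniaToken]]:
    """Split native Genia Tagger output into sentences of tokens.

    Assumes one token per line with five tab-separated fields and a blank
    line between sentences.
    """
    sentences: list[list[GeniaToken]] = []
    current: list[GeniaToken] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split("\t")
        if len(fields) != 5:
            raise ValueError(f"expected 5 tab-separated fields, got {len(fields)}: {line!r}")
        current.append(GeniaToken(*fields))
    if current:  # the last sentence may not be followed by a blank line
        sentences.append(current)
    return sentences
```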
Parsing
Stanford Parser is a statistical parser...
stanford-parser_train+dev_resources.zip : the resources produced by Stanford Parser on the train and dev datasets.
Enju Parser parses sentences with Enju, an HPSG-based parser for English that outputs phrase structures and predicate-argument dependencies.
enju-parser_train+dev_resources.zip : the resources produced by Enju Parser on the train and dev datasets
CCG Parser provides syntactic parsing based on Combinatory Categorial Grammar (CCG).
ccg-parser_train+dev_resources.zip : the resources produced by CCG Parser on the train and dev datasets.
Term Extraction
BioYatea extracts terms from the corpus using the YaTeA term extractor...
bioyatea_train+dev_resources.zip : the resources produced by BioYatea on the train and dev datasets
Named Entity Recognition
Stanford NER [synopsis...]
stanfordner_train+dev_resources.zip : the resources produced by Stanford NER on the train and dev datasets
LINNAEUS is a tool for species name recognition and normalization...
linnaeus_train+dev_resources.zip : the resources produced by LINNAEUS on the train and dev datasets
OrganismTagger is a hybrid rule-based/machine-learning system that extracts organism mentions from the biomedical literature, normalizes them to their scientific name, and provides grounding to the NCBI Taxonomy database...
organismtagger_train+dev_resources.zip : the resources produced by OrganismTagger on the train and dev datasets.
SR4GN is a tool that performs species recognition for gene normalization...
sr4gn_train+dev_resources.zip : the resources produced by SR4GN on the train and dev datasets
SPECIES identifies taxonomic mentions in documents and maps them to corresponding NCBI Taxonomy entries. If you make use of the SPECIES annotations, please cite: Pafilis, E., Frankild, S.P., Fanini, L., Faulwetter, S., Pavloudi, C., Vasileiadou, A., Arvanitidis, C. and Jensen, L.J. (2013). The SPECIES and ORGANISMS resources for fast and accurate identification of taxonomic names in text. PLoS One, 8(6), p.e65390.
SPECIES_train+dev_resources.zip : the resources produced by SPECIES on the train and dev datasets
Sentence Splitting & Tokenization
Segmentation is an internal AlvisNLP plan that generates...
segmentation_train+dev_resources.zip : the resources produced by segmentation on the train and dev datasets
Data Visualization
Brat is a tool for visualization of annotations...
brat_train+dev_resources.zip : the resources produced by Brat on the train and dev datasets.
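Brat stores annotations in its standoff format. In a `.ann` file, each text-bound annotation is one line with an ID such as `T1`, a tab, the annotation type followed by character offsets (fragments separated by `;` when the span is discontinuous), a tab, and the covered text. The sketch below extracts only these text-bound lines. It assumes the Brat archive contains standoff `.ann` files, which this page does not confirm. `TextBound` and `parse_text_bound` are hypothetical names.

```python
from typing import NamedTuple


class TextBound(NamedTuple):
    ann_id: str
    label: str
    spans: list[tuple[int, int]]  # (start, end) character offsets; several if discontinuous
    text: str


def parse_text_bound(ann: str) -> list[TextBound]:
    """Extract text-bound ("T") annotations from brat standoff .ann content.

    Other annotation kinds (relations R, events E, notes #, ...) are skipped.
    """
    result: list[TextBound] = []
    for line in ann.splitlines():
        if not line.startswith("T"):
            continue
        ann_id, type_and_offsets, text = line.split("\t", 2)
        label, offsets = type_and_offsets.split(" ", 1)
        spans = []
        for fragment in offsets.split(";"):
            start, end = fragment.split()
            spans.append((int(start), int(end)))
        result.append(TextBound(ann_id, label, spans, text))
    return result
```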