This dataset is now obsolete. Check out the new iteration of the Bacteria Biotope task in the BioNLP Open Shared Tasks 2019.
The goal of the supporting resources for the BioNLP Shared Task 2016 is
to provide task participants with annotations from state-of-the-art
automated tools, in order to minimize the time investment necessary to
participate in the shared task and to allow participants to
experiment with how to leverage automated analyses provided by existing
Natural Language Processing systems.
Available resources are listed below. Each resource is a downloadable file containing the output of one of the tools presented below, run on the train, dev and test datasets. Please note that the Shared Task organizers are not
responsible for the data quality; the resources are presented as provided by the tools. If you have any questions regarding the resources, please contact: maiage-bibliome at jouy.inra.fr
Resources and format
POS Tagging
GENIA Tagger is a tool for part-of-speech tagging, shallow parsing, and named entity recognition for biomedical text. If you make use of the tagging from GENIA Tagger, please cite: Tsuruoka, Y., Tateishi, Y., Kim, J. D., Ohta, T., McNaught, J., Ananiadou, S., & Tsujii, J. I. (2005). Developing a robust part-of-speech tagger for biomedical text. Advances in informatics, 382-392.
Parsing
Stanford Parser is a widely used statistical parser. If you make use of the parses from the Stanford Parser, please cite: Klein, D. and Manning, C. (2002). Fast Exact Inference with a Factored Model for Natural Language Parsing. In Advances in Neural Information Processing Systems.
The Enju parser is a robust syntactic parser for English, based on a probabilistic HPSG grammar. If you make use of the Enju parses, please cite: Miyao, Y. and Tsujii, J. (2008). Feature forest models for probabilistic HPSG parsing. Computational Linguistics.
The C&C CCG Parser is a statistical parser based on Combinatory Categorial Grammar (CCG) that can output dependency structures. If you make use of the CCG parses, please cite: Clark, S., & Curran, J. R. (2007). Wide-coverage efficient statistical parsing with CCG and log-linear models. Computational Linguistics, 33(4), 493-552.
BioYaTeA is an extended version of the YaTeA (Aubin and Hamon, 2006) term extractor adapted to the biomedical domain. If you make use of the BioYaTeA resources, please cite: Golik, W., Bossy, R.,
Ratkovic, Z., & Nédellec, C. (2013). Improving term extraction with
linguistic analysis in the biomedical domain. In Proceedings of the
14th International Conference on Intelligent Text Processing and
Computational Linguistics (CICLing13), Special Issue of the journal
Research in Computing Science (pp. 24-30).
Stanford NER is a named entity recognition tool for person, organization and location entities. If you make use of the Stanford NER annotations, please cite: Finkel, J. R., Grenager, T., & Manning, C. (2005, June). Incorporating non-local information into information extraction systems by Gibbs sampling. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics (pp. 363-370). Association for Computational Linguistics.
LINNAEUS is a software tool for species name recognition and normalization. If you make use of the LINNAEUS annotations, please cite: Gerner, M., Nenadic, G., & Bergman, C. M. (2010). LINNAEUS: a species name identification system for biomedical literature. BMC Bioinformatics, 11(1), 85.
SR4GN is a software tool that provides species recognition for gene normalization. If you make use of the SR4GN annotations, please cite: Wei, C. H., Kao, H. Y., & Lu, Z. (2012). SR4GN: a species recognition software tool for gene normalization. PLoS ONE, 7(6), e38460.
Species recognition and normalization annotations produced with an in-house dictionary-based approach (developed at INRA by the MaIAGE-Bibliome team) are also provided.
Sentence splitting and tokenization were performed using in-house tools developed at INRA by the MaIAGE-Bibliome team as part of the AlvisNLP suite.