ABSTRACT
Outbreaks of plant pests and pathogens have the potential to significantly harm the Canadian economy, damage the environment, detrimentally affect the health of citizens, and threaten national food security. The loss of trees caused by the Emerald Ash Borer has had a significant public health impact, reflected in increased mortality related to cardiovascular and respiratory-tract illness [2]. Outbreaks of the UG99 wheat rust strain can cause up to 100% loss of wheat crops, and the strain is viewed as a potential worldwide disaster with massive implications for global food security [6]. In Canada, an outbreak of Potato Wart disease in 2000 resulted in $280 million in costs to the provincial economy of Prince Edward Island [1]. The Canadian Food Inspection Agency (CFIA) has carried out decades of risk analyses for various crop commodities and for specific pests and pathogens. These analyses characterize the risk that exotic pests pose to Canada and support regulatory decision-making and response actions. This information is currently stored in a myriad of disconnected databases, spreadsheets, and technical documents maintained by the Plant Health Risk Assessment Unit. This private corpus is a unique source of analyses tailored to the Canadian context that are not available anywhere else in the world. Additional risk information for other countries is drawn from the European and Mediterranean Plant Protection Organization (EPPO)¹ global database of country-regulated pests and from the U.S. Department of Agriculture's (USDA) Animal and Plant Health Inspection Service (APHIS) regulated plant pest list². The cost of next-generation sequencing (NGS) of DNA has decreased exponentially [10].
As a consequence, academic institutions are aggressively staffing positions in environmental metagenomics. Although these researchers may not be specifically looking for plant pests or pathogens, or even human pathogens, their work will automatically generate large volumes of data containing a wide range of species names, and these data will be made publicly available and searchable on the web. The generation of large amounts of DNA sequence data from Canadian samples creates exciting opportunities. At the same time, it carries an inherent risk to trade: a quarantine organism not currently known to exist in Canada could be detected in a Canadian sample. Interpreting (or misinterpreting) species identifications from such studies at face value could trigger trade issues. This study seeks to evaluate the benefits of, and the means for, actively linking quantitative NGS sample data with qualitative risk assessment information. The aim is to guide policy and operational decision-making for the rapid detection and surveillance of, and response to, plant health threats, and to set a new scientific standard for knowledge-based international trade actions in the agriculture, forestry, and fishing industries. One of the goals of this study is to evaluate the use of natural language processing and machine learning techniques to extract pest risk information from the CFIA risk corpus. The study is restricted to pathogenic fungi, partly to constrain its initial scope. Fungi are a taxonomic group with arguably some of the most complex naming problems, due (among other things) to the anamorph-teleomorph-holomorph [8] issue³, under which the asexual and sexual forms of a single fungus may have been given separate names. Collecting fungal name synonyms is therefore necessary to canonicalize the fungal names found in both the risk sources and the NGS identification results. Previous work on risk has not addressed the kind of risk examined in this study, namely the risk that pathogenic species pose to trade and agriculture.
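The canonicalization step, and the linking of NGS identifications to risk information, can be illustrated with a minimal Python sketch. It is not the project's implementation, and everything in it is a labeled assumption: the synonym table, the hypothetical `RiskRecord` fields (`regulated`, `known_in_canada`), the helper names `canonicalize` and `flag_detections`, and the species names (`Anamorpha exempli`, `Teleomorpha exempli`, `Commonia vulgaris`), which are invented placeholders rather than real taxa or CFIA assessments. Each name is normalized for case and whitespace and mapped to one canonical name, and NGS detections that resolve to a regulated species not recorded as present in Canada are flagged.

```python
from dataclasses import dataclass

# Hypothetical synonym table with invented placeholder names. Keys are
# normalized (lower-case, single-spaced); values are canonical names.
# Here one imaginary fungus has both a teleomorph and an anamorph name.
SYNONYMS = {
    "teleomorpha exempli": "Anamorpha exempli",
    "anamorpha exempli": "Anamorpha exempli",
}


@dataclass
class RiskRecord:
    # Hypothetical fields; not the schema of any CFIA database.
    species: str
    regulated: bool
    known_in_canada: bool


def canonicalize(name: str) -> str:
    """Map a name to its canonical form; unknown names are only whitespace-cleaned."""
    cleaned = " ".join(name.split())
    return SYNONYMS.get(cleaned.lower(), cleaned)


def flag_detections(ngs_names, risk_records):
    """Return canonical names of NGS detections that match a regulated
    species not recorded as present in Canada (each reported once)."""
    index = {canonicalize(r.species): r for r in risk_records}
    flagged = []
    for raw in ngs_names:
        canon = canonicalize(raw)
        rec = index.get(canon)
        if (rec is not None and rec.regulated and not rec.known_in_canada
                and canon not in flagged):
            flagged.append(canon)
    return flagged


if __name__ == "__main__":
    records = [
        RiskRecord("Anamorpha exempli", regulated=True, known_in_canada=False),
        RiskRecord("Commonia vulgaris", regulated=True, known_in_canada=True),
    ]
    detections = ["Teleomorpha  exempli", "Commonia vulgaris", "Unknownia sp."]
    print(flag_detections(detections, records))  # ['Anamorpha exempli']
```

Resolving both the risk records and the NGS results through the same `canonicalize` function is the point of the design: a detection reported under a teleomorph name still matches a risk record filed under the anamorph name.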
Instead, previous work has addressed broader topics such as event extraction for decision support [3] and corporate risk [4]. More recent work has examined risk at a finer granularity, for example in supplier selection [7] and, more specifically, in assessing the likelihood (risk) of child labour among suppliers [9]. The poster will present early results from this stage of the project, primarily the risk extraction from the CFIA (Canadian) corpus, the risk and organism names for other countries from the EPPO and the USDA, and the collection and rationalization of fungal synonyms from MycoBank and Index Fungorum.
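Rationalizing synonyms collected from two nomenclatural sources amounts to merging overlapping synonym sets, which a union-find (disjoint-set) structure handles naturally. The sketch below is illustrative only and assumes, hypothetically, that each source has already been reduced to `(synonym, current_name)` pairs; this is not the actual export format of MycoBank or Index Fungorum, and the function name `rationalize` and the species names are invented. Names linked by any pair end up in one group, and groups in which the sources nominate more than one current name are reported as conflicts for manual review. The alphabetical tie-break is a placeholder, not a nomenclatural rule.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest over name strings."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving: point x at its grandparent while walking up.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def rationalize(sources):
    """Merge (synonym, current_name) pairs from several sources.

    Returns a dict mapping every name to one chosen canonical name, plus a
    list of groups in which the sources nominate more than one current name.
    """
    uf = UnionFind()
    current = set()
    for pairs in sources:
        for synonym, current_name in pairs:
            uf.union(current_name, synonym)
            current.add(current_name)

    # Collect the members of each disjoint set under its root.
    groups = defaultdict(set)
    for name in list(uf.parent):
        groups[uf.find(name)].add(name)

    canonical, conflicts = {}, []
    for members in groups.values():
        # Every group contains at least one nominated current name.
        preferred = sorted(members & current)
        if len(preferred) > 1:
            conflicts.append(preferred)
        chosen = preferred[0]  # placeholder tie-break: alphabetical order
        for member in members:
            canonical[member] = chosen
    return canonical, conflicts


if __name__ == "__main__":
    source_a = [("Teleomorpha exempli", "Anamorpha exempli")]
    source_b = [("Oldname vetus", "Teleomorpha exempli")]
    canonical, conflicts = rationalize([source_a, source_b])
    print(canonical["Oldname vetus"])  # Anamorpha exempli
    print(conflicts)  # [['Anamorpha exempli', 'Teleomorpha exempli']]
```

In this toy case the two sources disagree on which name is current, so all three names collapse into one group and the disagreement surfaces in `conflicts` rather than being silently resolved.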