Annotated datasets from PADI-web for event-based surveillance of Avian Influenza, African Swine Fever, and West-Nile Virus Disease


These datasets contain unstructured data (articles) from news items detected by an event-based surveillance system, PADI-web, between 2022 and 2023. Collected articles were manually annotated by relevance for epidemic intelligence purposes with the help of two epidemiologists. Extracted data include relevant articles (with two possible labels: epidemiological events or general information) and irrelevant information regarding three different diseases: Avian Influenza (AI), African Swine Fever (ASF) and West Nile Virus disease (WNV). The datasets cover different types of diseases (zoonotic, cross-border and vector-borne) and can be used to train or evaluate classification approaches that automatically identify written text on events related to these diseases and classify it by relevance.

The structure of the dataset is as follows:

Alert_id: Article identifier. Each article has a unique ID; if an article reports multiple events, its row is duplicated so that each line represents one event.
Title: Article's title given by the news outlet.
hsource: URL of the news outlet reporting the article.
Source: Name of the news outlet reporting the article.
url: URL of the article reporting the considered event. Note that multiple articles can report the same event.
Issue_date: Date of the article's publication.
Country: Name of the country where the event happened.
Place_name: Name of the administration, city or district where the event happened. If none of these is mentioned in the text, the country's name is reported.
Administrative_division: The administrative level at which the information is reported (country, department, city...).
Disease_name: Name of the disease that is reported in the article.
Species_name: Name of the affected host that is reported in the article.
Manualclass: Manual classification (Relevant or Irrelevant).
Lat: Latitude coordinate of Place_name.
Lon: Longitude coordinate of Place_name.
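As a minimal sketch of how the fields described above might be used, the following Python snippet reads tab-separated rows, keeps those labelled Relevant, and counts events per article through the duplicated Alert_id values. The sample rows are invented for illustration and are not taken from the datasets; the column subset, the exact header spelling in the distributed files, and the helper names (read_rows, relevant_rows, events_per_article) are assumptions.

```python
import csv
import io
from collections import Counter

# Invented sample rows for illustration only (not real dataset content).
# Column names follow the dataset description; the exact header spelling
# in the distributed files is an assumption.
SAMPLE_TSV = (
    "Alert_id\tTitle\tCountry\tDisease_name\tManualclass\tLat\tLon\n"
    "A1\tInvented headline one\tFrance\tAvian Influenza\tRelevant\t46.2\t2.2\n"
    "A1\tInvented headline one\tSpain\tAvian Influenza\tRelevant\t40.4\t-3.7\n"
    "A2\tInvented headline two\tItaly\tWest Nile Virus\tIrrelevant\t\t\n"
)


def read_rows(handle):
    """Read tab-separated rows into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


def relevant_rows(rows):
    """Keep rows whose manual classification is 'Relevant' (case-insensitive)."""
    return [r for r in rows if r["Manualclass"].strip().lower() == "relevant"]


def events_per_article(rows):
    """Count rows per Alert_id; one row corresponds to one reported event."""
    return Counter(r["Alert_id"] for r in rows)


rows = read_rows(io.StringIO(SAMPLE_TSV))
relevant = relevant_rows(rows)
counts = events_per_article(relevant)
print(len(rows), len(relevant), counts["A1"])
```

To apply the same steps to a downloaded file, the in-memory sample could be replaced by an opened file handle, for example open(path, newline="", encoding="utf-8"), where path points to one of the tab-separated files and the encoding is itself an assumption.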

Metadata Access
Creator Boudoua, El Bahdja; Richard, Manon; Roche, Mathieu; Teisseire, Maguelonne; Tran, Annelise
Publisher Recherche Data Gouv
Contributor Boudoua, El Bahdja
Publication Year 2023
Rights etalab 2.0; info:eu-repo/semantics/openAccess
OpenAccess true
Contact Boudoua, El Bahdja (INRAE)
Resource Type Dataset
Format application/pdf; text/tab-separated-values
Size 101686; 347819; 99082; 106659; 124971; 140667
Version 1.0
Discipline Computer Science; Life Sciences