We have annotated the corpus (full papers and their sentences) with the Anatomical Therapeutic Chemical (ATC) codes of the drugs they mention, and made this resource available as an easy-to-query repository. In addition, we are applying probabilistic topic modelling and clustering techniques to group drugs, symptoms, diseases, etc. An example of the queries that we can answer is: which papers report the use of drugs that combine immunosuppressant and antimalarial activity together with macrolide antibiotics?
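As a rough illustration of this kind of query, the sketch below filters papers by ATC code prefix. It assumes each paper has already been reduced to the set of ATC codes found in its sentences. The paper identifiers, the `PAPERS` table and the `papers_with_all` helper are hypothetical and do not describe the actual repository interface.

```python
# Hypothetical annotated records: each paper id maps to the set of ATC codes
# found in its sentences. Identifiers and contents are illustrative only.
PAPERS = {
    "paper-a": {"P01BA02", "J01FA10"},  # hydroxychloroquine + azithromycin
    "paper-b": {"J01FA10"},             # azithromycin only
    "paper-c": {"P01BA02", "L04AC07"},  # hydroxychloroquine + tocilizumab
}

def papers_with_all(papers, prefixes):
    """Return ids of papers that mention at least one code under every ATC prefix."""
    return sorted(
        pid
        for pid, codes in papers.items()
        if all(any(code.startswith(p) for code in codes) for p in prefixes)
    )

# P01B groups antimalarials and J01FA groups macrolides in the ATC hierarchy.
matches = papers_with_all(PAPERS, ["P01B", "J01FA"])
```

Because ATC codes are hierarchical, a prefix match selects a whole therapeutic group without listing every individual drug.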
With the increase in the number of patients affected by COVID-19 and admitted to Intensive Care Units throughout the world, the demand for the drugs necessary for their treatment has increased. Although laboratories have increased production, there is a clear shortage of many drugs in hospital pharmacies. In March 2020, the Allen Institute for Artificial Intelligence released the CORD-19 dataset, a corpus of more than 30,000 papers related to coronavirus, mostly published since 2003. Browsing this corpus may reveal how a drug has been applied or used in the treatment of COVID-19 or related earlier diseases, or help identify the relationships between the drugs described in a treatment protocol.
This work is aimed especially at clinicians and hospital staff who define therapeutic groups and guidelines for treating the different stages of these diseases in different groups of patients.
However, the corpus is difficult to navigate. The same drug is often mentioned under different names (active ingredient, trade name, therapeutic group, etc.). In some cases the mention of a medicine is informative about its possible use; in others it is an incidental phrase with no relevance to this objective.
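The naming problem can be sketched as a dictionary lookup that maps every surface form of a drug to a single ATC code. This is a minimal sketch under stated assumptions: the `NAME_TO_ATC` table and the `annotate` function are hypothetical, the table holds only a few illustrative entries, and the simple token match does not decide whether a mention is relevant.

```python
import re

# Hypothetical lookup table: active ingredients and trade names mapped to
# ATC codes. A real dictionary would be far larger; entries are illustrative.
NAME_TO_ATC = {
    "hydroxychloroquine": "P01BA02",
    "plaquenil": "P01BA02",   # trade name of hydroxychloroquine
    "azithromycin": "J01FA10",
    "zithromax": "J01FA10",   # trade name of azithromycin
}

def annotate(sentence):
    """Return the sorted, de-duplicated ATC codes for drug names in a sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return sorted({NAME_TO_ATC[t] for t in tokens if t in NAME_TO_ATC})

codes = annotate("Patients received Plaquenil and azithromycin.")
```

Mapping both names to one code means a sentence that uses a trade name and a sentence that uses the active ingredient end up with the same annotation.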