Basic Usage

Calling the CLI

Before using any ChemREL CLI commands, cd into the directory in which you initialized ChemREL, as follows. Here, [PATH] is the path of the ChemREL Initial Directory in which you originally ran the chemrel init command.

$ cd [PATH]

Caution

In every new terminal session, cd into this directory before you begin using ChemREL. Otherwise, the necessary files will not be visible to the program.

To print the help text for any command or subcommand, enter the command followed by the --help flag. For example, the following prints the help text for the chemrel predict command.

$ chemrel predict --help
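If you drive the CLI from a script rather than an interactive shell, the same working-directory requirement applies. The following is a minimal sketch, not part of ChemREL itself: the helper names build_chemrel_command and run_chemrel are hypothetical, and the sketch assumes the chemrel executable is available on your PATH. It passes the initialized directory as the working directory of the child process, which stands in for the cd step above.

```python
import shutil
import subprocess
from pathlib import Path


def build_chemrel_command(*args: str) -> list:
    """Return the argument list for a ChemREL CLI call (hypothetical helper)."""
    return ["chemrel", *args]


def run_chemrel(init_dir: str, *args: str) -> subprocess.CompletedProcess:
    """Run a ChemREL CLI command with init_dir as the working directory.

    init_dir is the directory in which `chemrel init` was originally run.
    """
    workdir = Path(init_dir).expanduser()
    if not workdir.is_dir():
        raise NotADirectoryError(f"ChemREL directory not found: {workdir}")
    # Assumption: the `chemrel` executable is installed and on PATH.
    if shutil.which("chemrel") is None:
        raise RuntimeError("The chemrel executable was not found on PATH.")
    # cwd= plays the role of `cd [PATH]` for this single call.
    return subprocess.run(build_chemrel_command(*args), cwd=workdir, check=True)


# Example (replace "[PATH]" with your own initialized directory):
# run_chemrel("[PATH]", "predict", "--help")
```

Because the working directory is set per call, this approach does not depend on the directory from which the script itself was launched.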

Training New Models

ChemREL span categorization, relation extraction, and associated transfer learning models can be trained through the ChemREL CLI.

For a demonstration of training a span categorization model for a new chemical property, see the Span Categorization Demo notebook.

For a full list of available CLI commands, view the ChemREL CLI Reference.

Importing Functions

In addition to the CLI, the ChemREL PyPI package exposes a number of functions that you can import into your own code. First import the submodule containing the desired function from the functions package. For example, to import the auxiliary submodule, run the following line.

from chemrel.functions import auxiliary

You can then reference any available functions within the auxiliary submodule. For example, to call the extract_paper() function, you can run the following.

paper = auxiliary.extract_paper("/example/paper/path")
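In a longer script, it can help to confirm that the path exists before handing it to ChemREL. The following is a minimal sketch under stated assumptions: the wrapper name extract_paper_checked is hypothetical, and the sketch makes no claim about what extract_paper() returns or which file types it accepts, so consult the ChemREL Functions Reference for those details.

```python
from pathlib import Path


def extract_paper_checked(paper_path: str):
    """Call auxiliary.extract_paper() after confirming the path exists.

    Hypothetical wrapper; the return value is passed through unchanged.
    """
    path = Path(paper_path)
    if not path.exists():
        raise FileNotFoundError(f"No paper found at: {path}")
    # Imported here so the path check above can run (and fail early)
    # even in an environment where ChemREL is not installed.
    from chemrel.functions import auxiliary

    return auxiliary.extract_paper(str(path))


# Example:
# paper = extract_paper_checked("/example/paper/path")
```

The wrapper only adds an early, explicit error for a missing path; the extraction itself is still performed by extract_paper().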

For a demonstration of importing functions in a Jupyter notebook, see the Prediction Functions Demo notebook.

For a full list of importable functions, view the ChemREL Functions Reference.

Citation

Any alterations to the models, datasets, or functions included with ChemREL must be properly attributed according to the following citation.

@article{doi:10.1021/acs.jcim.4c00816,
  author = {Alshehri, Abdulelah S. and Horstmann, Kai A. and You, Fengqi},
  title = {Versatile Deep Learning Pipeline for Transferable Chemical Data Extraction},
  journal = {Journal of Chemical Information and Modeling},
  volume = {64},
  number = {15},
  pages = {5888-5899},
  year = {2024},
  doi = {10.1021/acs.jcim.4c00816},
  note = {PMID: 39009039},
  url = {https://doi.org/10.1021/acs.jcim.4c00816},
  eprint = {https://doi.org/10.1021/acs.jcim.4c00816},
  abstract = {Chemical information disseminated in scientific documents offers an untapped potential for deep learning-assisted insights and breakthroughs. Automated extraction efforts have shifted from resource-intensive manual extraction toward applying machine learning methods to streamline chemical data extraction. While current extraction models and pipelines have ushered in notable efficiency improvements, they often exhibit modest performance, compromising the accuracy of predictive models trained on extracted data. Further, current chemical pipelines lack both transferability─where a model trained on one task can be adapted to another relevant task with limited examples─and extensibility, which enables seamless adaptability for new extraction tasks. Addressing these gaps, we present ChemREL, a versatile chemical data extraction pipeline emphasizing performance, transferability, and extensibility. ChemREL utilizes a custom, diverse data set of chemical documents, labeled through an active learning strategy to extract two properties: normal melting point and lethal dose 50 (LD50). The normal melting point is selected for its prevalence in diverse contexts and wider literature, serving as the foundation for pipeline training. In contrast, LD50 evaluates the pipeline’s transferability to an unrelated property, underscoring variance in its biological nature, toxicological context, and units, among other differences. With pretraining and fine-tuning, our pipeline outperforms existing methods and GPT-4, achieving F1-scores of 96.1\% for entity identification and 97.0\% for relation mapping, culminating in an overall F1-score of 95.4\%. More importantly, ChemREL displays high transferability, effectively transitioning from melting point extraction to LD50 extraction with 10 randomly selected training documents. 
Released as an open-source package, ChemREL aims to broaden access to chemical data extraction, enabling the construction of expansive relational data sets that propel discovery.}
}