functions.auxiliary module
- class functions.auxiliary.Paper(filename, doi)
Bases: object
A class to represent research papers.
- build_text(subtext, char_limit)
Appends provided subtext to Paper text content.
- write_to_jsonl(jsonl_path)
Outputs text content to a sequence of JSONL files, one per text chunk, where the text in each file is tokenized by sentence, one sentence per JSONL line. Example: if the provided path is dir/file.jsonl and the Paper text contains two chunks, the files dir/file_1.jsonl and dir/file_2.jsonl will be generated; otherwise, if the Paper text contains one chunk, dir/file.jsonl will be generated.
- Parameters:
jsonl_path (str) – Filepath to save JSONL files to; the filename extension is ignored.
- Return type:
None
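The naming rule above can be sketched as follows. This is a minimal illustration, not the module's implementation: the helper names chunk_paths and write_chunks, the "text" key in each JSON object, and the regex-based sentence splitter are all assumptions made for the example.

```python
import json
import os
import re


def chunk_paths(jsonl_path, n_chunks):
    """Return output paths following the naming rule described above (hypothetical helper)."""
    root, _ext = os.path.splitext(jsonl_path)  # the supplied extension is ignored
    if n_chunks == 1:
        return [root + ".jsonl"]
    return [f"{root}_{i}.jsonl" for i in range(1, n_chunks + 1)]


def write_chunks(chunks, jsonl_path):
    """Write each chunk to its own file, one JSON object per sentence (hypothetical helper)."""
    paths = chunk_paths(jsonl_path, len(chunks))
    for path, chunk in zip(paths, chunks):
        # Assumption: a naive splitter that breaks after ., ! or ? followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", chunk.strip()) if s]
        with open(path, "w", encoding="utf-8") as fh:
            for sentence in sentences:
                # Assumption: each line is an object with a single "text" key.
                fh.write(json.dumps({"text": sentence}) + "\n")
    return paths
```

With two chunks and the path dir/file.jsonl, chunk_paths yields dir/file_1.jsonl and dir/file_2.jsonl; with one chunk it yields dir/file.jsonl, matching the example in the method description.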
- functions.auxiliary.extract_paper(paper_path, char_limit=None)
Converts paper PDF at specified path into a Paper object.
- Parameters:
- Returns:
Paper object containing text from specified paper PDF, chunked by character limit.
- Return type:
Paper
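As a rough illustration of what "chunked by character limit" could mean, the sketch below groups consecutive pieces of text into chunks that stay within a limit. The function name chunk_text and its exact behavior (in particular, keeping an oversized piece whole rather than splitting it) are assumptions for the example; the module's own chunking logic may differ.

```python
def chunk_text(subtexts, char_limit=None):
    """Group consecutive pieces of text into chunks of at most char_limit characters.

    Hypothetical helper. With char_limit=None everything lands in a single chunk.
    A single piece longer than char_limit is kept whole as its own oversized chunk.
    """
    chunks = [""]
    for subtext in subtexts:
        fits = char_limit is None or len(chunks[-1]) + len(subtext) <= char_limit
        if fits or not chunks[-1]:
            # Append to the current chunk (or start it, even if the piece is oversized).
            chunks[-1] += subtext
        else:
            # The current chunk would overflow, so start a new one.
            chunks.append(subtext)
    return chunks
```

Under these assumptions, chunk_text(["aaa", "bbb", "cc"], 5) gives two chunks, "aaa" and "bbbcc", while omitting the limit gives the single chunk "aaabbbcc".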
- functions.auxiliary.find_doi(raw_paper)
Attempts to find the DOI link in a paper. Relies on the assumption that the DOI is present on the first page of the paper.
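One common way to locate a DOI is a regular expression over the first page's text. The sketch below assumes the first page has already been extracted to a plain string; the name find_doi_in_text and the pattern are illustrative assumptions and are not taken from the module.

```python
import re

# Assumption: DOIs start with "10.", a 4-9 digit registrant code, a slash, then a suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+", re.IGNORECASE)


def find_doi_in_text(first_page_text):
    """Return the first DOI-like string in the text, or None if there is none (hypothetical helper)."""
    match = DOI_PATTERN.search(first_page_text)
    if match is None:
        return None
    # Drop punctuation that commonly trails a DOI at the end of a sentence or parenthesis.
    return match.group(0).rstrip(".,;)")
```

The pattern is deliberately permissive, so a caller may still want to validate the result, for example by resolving it, before relying on it.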
- functions.auxiliary.get_elsevier_paper(doi_code, api_key, char_limit=None)
Converts Elsevier paper with specified DOI code into a Paper object.
- Parameters:
- Returns:
Paper object containing text from specified Elsevier paper with given DOI, chunked by character limit.
- Return type: