functions.auxiliary module
- class functions.auxiliary.Paper(filename, doi)
 Bases: object
 A class to represent research papers.
- build_text(subtext, char_limit)
 Appends the provided subtext to the Paper's text content.
- write_to_jsonl(jsonl_path)
 Outputs the text content to a sequence of JSONL files, one per text chunk, in which each JSONL line is tokenized by sentence. Example: if the provided path is dir/file.jsonl and the Paper text contains two chunks, the files dir/file_1.jsonl and dir/file_2.jsonl will be generated; if the Paper text contains only one chunk, dir/file.jsonl will be generated.
- Parameters:
 jsonl_path (str) – Path to save the JSONL files to; the filename extension is ignored.
- Return type:
 None
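The naming rule described for write_to_jsonl can be sketched as follows. This is only an illustration of the documented behaviour, not the module's implementation: the helper name jsonl_output_paths and its num_chunks argument are hypothetical, and the sketch assumes that "ignores filename extension" means any supplied extension is replaced with .jsonl.

```python
from pathlib import Path
from typing import List


def jsonl_output_paths(jsonl_path: str, num_chunks: int) -> List[str]:
    """Hypothetical helper: list the files write_to_jsonl is documented to create."""
    # Drop whatever extension was supplied (assumption: output always ends in .jsonl).
    base = Path(jsonl_path).with_suffix("").as_posix()
    if num_chunks <= 1:
        # A single chunk keeps the plain file name.
        return [f"{base}.jsonl"]
    # Several chunks get a 1-based numeric suffix.
    return [f"{base}_{i}.jsonl" for i in range(1, num_chunks + 1)]


if __name__ == "__main__":
    print(jsonl_output_paths("dir/file.jsonl", 2))
    print(jsonl_output_paths("dir/file.jsonl", 1))
```

With the path and chunk counts from the example above, the helper reproduces the file names given in the description.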
- functions.auxiliary.extract_paper(paper_path, char_limit=None)
 Converts the paper PDF at the specified path into a Paper object.
- Parameters:
 - Returns:
 Paper object containing text from specified paper PDF, chunked by character limit.
- Return type:
 Paper
- functions.auxiliary.find_doi(raw_paper)
 Attempts to find the DOI link in a paper. Relies on the assumption that the DOI appears on the first page of the paper.
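One way such a first-page search could work is a regular-expression scan, sketched below. How find_doi is actually implemented is not documented here: the function name find_doi_in_text, its plain-string argument, and the choice of pattern are assumptions made for illustration. The pattern is one commonly used for modern DOIs and will not match every DOI ever issued.

```python
import re
from typing import Optional

# Commonly used pattern for modern DOIs (assumption: not taken from this module).
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def find_doi_in_text(first_page_text: str) -> Optional[str]:
    """Hypothetical helper: return the first DOI-like string on the page, or None."""
    match = DOI_PATTERN.search(first_page_text)
    if match is None:
        return None
    # Sentence punctuation directly after the DOI is matched by the pattern; trim it.
    return match.group(0).rstrip(".,;")


if __name__ == "__main__":
    sample = "Available at https://doi.org/10.1016/j.cell.2020.01.001. Received 3 May."
    print(find_doi_in_text(sample))
```

Returning None when nothing matches lets the caller decide how to handle papers whose first page carries no DOI, which is the case the assumption above warns about.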
- functions.auxiliary.get_elsevier_paper(doi_code, api_key, char_limit=None)
 Converts the Elsevier paper with the specified DOI code into a Paper object.
- Parameters:
 - Returns:
 Paper object containing text from specified Elsevier paper with given DOI, chunked by character limit.
- Return type: