proquest¶

Overview¶

These are the functions used to process ProQuest files at the backend. They are meant for internal use by metaknowledge.

Functions¶

metaknowledge.proquest.proQuestHandlers.isProQuestFile(infile, checkedLines=2)¶

Determines if infile is the path to a ProQuest file. A file is considered to be a ProQuest file if it has the correct encoding (utf-8) and the following header starts within the first checkedLines lines:

____________________________________________________________

Report Information from ProQuest

Parameters¶

infile : str

The path to the target file

checkedLines : optional [int]

default 2, the number of lines to check for the header

Returns¶

bool

True if the file is a valid ProQuest file
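The check described above can be sketched roughly as follows. This is an illustrative stand-in rather than the library's source: looks_like_proquest, HEADER_TITLE and write_temp are hypothetical names, and the matching rule (a line made up only of underscores, followed by the title as the next non-blank line) is an assumption drawn from the header shown above.

```python
import os
import tempfile

# Assumed from the header shown in this section; not taken from the library source.
HEADER_TITLE = "Report Information from ProQuest"


def looks_like_proquest(path, checked_lines=2):
    """Hypothetical stand-in for the documented header check."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            for _ in range(checked_lines):
                line = handle.readline()
                if not line:
                    return False  # the file ended before a header appeared
                stripped = line.strip()
                if stripped and set(stripped) == {"_"}:
                    # The rule line was found; the next non-blank line
                    # should be the report title.
                    for following in handle:
                        if following.strip():
                            return following.strip() == HEADER_TITLE
                    return False
    except (OSError, UnicodeDecodeError):
        # Unreadable, or not valid UTF-8: treat as "not a ProQuest file".
        return False
    return False


def write_temp(text):
    """Write text to a temporary UTF-8 file and return its path (demo helper)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path
```

A caller would typically run a check like this before handing the path to a parser, so that files in other formats are rejected early.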

metaknowledge.proquest.proQuestHandlers.proQuestParser(proFile)¶

Parses a ProQuest file, proFile, to extract the individual entries.

A ProQuest file has three sections: first a list of the contained entries, second the full metadata, and finally a bibtex-formatted entry for the record. This parser uses only the first two, as the bibtex section contains no information that the second section lacks, and the first section is used only to verify the second. The returned ProQuestRecord contains the data from the second section, keyed by the same strings ProQuest uses; the unlabeled sections are named, in order, 'Name', 'Author' and 'url'.

Parameters¶

proFile : str

A path to a valid ProQuest file, use isProQuestFile to verify

Returns¶

set[ProQuestRecord]

Records for each of the entries

Special Functions¶

Tag Functions¶

metaknowledge.proquest.tagProcessing.tagFunctions.proQuestClassification(value)¶

metaknowledge.proquest.tagProcessing.tagFunctions.proQuestIdentifier_Keyword(value)¶

metaknowledge.proquest.tagProcessing.tagFunctions.proQuestSubject(value)¶

metaknowledge.proquest.tagProcessing.tagFunctions.proQuestTagToFunc(tag)¶: Takes a tag string, tag, and returns the processing function for its data. If there is no predefined function, returns the identity function (lambda x : x).

Parameters¶

tag : str

The requested tag

Returns¶

function

A function to process the tag’s data
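The lookup-with-fallback behaviour described for proQuestTagToFunc can be illustrated with a small dictionary dispatch. Everything in this sketch is hypothetical: the tag names in TAG_TO_FUNC, the split_semicolons processor and the tag_to_func name are made up for illustration and are not the library's actual table.

```python
def split_semicolons(value):
    """Hypothetical processor: split a ';'-separated string into a clean list."""
    return [part.strip() for part in value.split(";") if part.strip()]


# Hypothetical tag names and processors, for illustration only.
TAG_TO_FUNC = {
    "Subject": split_semicolons,
    "Classification": split_semicolons,
}


def tag_to_func(tag):
    """Return the processor for tag, or the identity function if none is defined."""
    return TAG_TO_FUNC.get(tag, lambda x: x)
```

For example, tag_to_func("Subject") returns split_semicolons, while an unknown tag returns a function that hands its input back unchanged.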

Backend¶

class metaknowledge.proquest.recordProQuest.ProQuestRecord(inRecord, recNum=None, sFile='', sLine=0)¶

Bases: metaknowledge.mkRecord.ExtendedRecord

Class for full ProQuest entries.

This class is an ExtendedRecord capable of generating its own id number. You should not create them directly, but instead use proQuestParser() on a ProQuest file.

authGenders(countsOnly=False, fractionsMode=False, _countsTuple=False)¶: Creates a dict mapping 'Male', 'Female' and 'Unknown' to lists of the names of all the authors.

Parameters¶

countsOnly : optional bool

Default False, if True the counts (lengths of the lists) will be given instead of the lists of names

fractionsMode : optional bool

Default False, if True the fraction counts (lengths of the lists divided by the total number of authors) will be given instead of the lists of names. This supersedes countsOnly

Returns¶

dict[str:str or int]

The mapping of genders to author’s names or counts
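The interaction between countsOnly and fractionsMode can be sketched on a plain dictionary. This is a minimal illustration under assumptions: summarise_genders is a hypothetical helper, the gender classification itself is taken as given, and the empty-author case returning 0.0 is a choice made here, not documented behaviour.

```python
def summarise_genders(names_by_gender, counts_only=False, fractions_mode=False):
    """Hypothetical helper mirroring the documented output modes."""
    if fractions_mode:
        # fractions_mode takes precedence over counts_only.
        total = sum(len(names) for names in names_by_gender.values())
        return {
            gender: (len(names) / total if total else 0.0)
            for gender, names in names_by_gender.items()
        }
    if counts_only:
        return {gender: len(names) for gender, names in names_by_gender.items()}
    # Default: lists of names, copied so the caller's lists are not shared.
    return {gender: list(names) for gender, names in names_by_gender.items()}


# Made-up author names, for illustration only.
example = {
    "Male": ["A. One", "B. Two"],
    "Female": ["C. Three"],
    "Unknown": ["D. Four"],
}
```

Calling summarise_genders(example, counts_only=True) gives counts per gender, and adding fractions_mode=True switches the values to shares of all authors.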

authors¶

bibString(maxLength=1000, WOSMode=False, restrictedOutput=False, niceID=True)¶

Makes a string giving the Record as a bibTex entry. If the Record is of a journal article (PT J) the bibTex type is set to 'article', otherwise it is set to 'misc'. The ID of the entry is the WOS number and all the Record’s fields are given as entries with their long names.

Note: This is not meant to be used directly with LaTeX; none of the special characters have been escaped and a large number of unnecessary fields are provided. niceID and maxLength have been provided to make conversions easier.

Note: Record entries that are lists have their values separated with the string ' and '

Parameters¶

maxLength : optional [int]

default 1000, the maximum length for a continuous string. Most bibTex implementations only allow strings of up to 1000 characters (source); this splits longer values into substrings and then uses the native string concatenation (the '#' character) to allow for longer strings

WOSMode : optional [bool]

default False, if True the data produced will be unprocessed and use double curly braces. This is the style WOS produces bib files in, and the output mostly matches it.

restrictedOutput : optional [bool]

default False, if True the tags output will be limited to those found in metaknowledge.commonRecordFields

niceID : optional [bool]

default True, if True the ID used will be derived from the authors, publishing date and title, if False it will be the UT tag

Returns¶

str

The bibTex string of the Record
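The effect of maxLength can be sketched as follows. This is a minimal illustration, not the library's code: split_for_bibtex is a hypothetical name, the use of double-quoted pieces is an assumption, and no escaping of special characters is attempted (in line with the note above).

```python
def split_for_bibtex(text, max_length=1000):
    """Hypothetical helper: break text into quoted pieces joined with ' # '."""
    if max_length < 1:
        raise ValueError("max_length must be at least 1")
    # Slice the text into pieces of at most max_length characters;
    # an empty input still yields one (empty) piece.
    chunks = [
        text[start:start + max_length]
        for start in range(0, len(text), max_length)
    ] or [""]
    # '#' is the string concatenation operator, so the pieces are
    # reassembled into one value when the entry is read.
    return " # ".join('"{}"'.format(chunk) for chunk in chunks)
```

With max_length=3, the value abcdefgh becomes three quoted pieces joined by ' # '; a value shorter than the limit is returned as a single quoted string.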

copy()¶: Correctly copies the Record

Returns¶

Record

A completely decoupled copy of the original

createCitation(multiCite=False)¶: Creates a citation string, using the same format as other WOS citations, for the Record by reading the relevant special tags ('year', 'J9', 'volume', 'beginningPage', 'DOI') and using them to create a Citation object.

Parameters¶

multiCite : optional [bool]

Default False, if True a tuple of Citations is returned, each having a different one of the Record's authors as the author

Returns¶

Citation

A Citation object containing a citation for the Record.

encoding()¶: An abstractmethod, gives the encoding string of the record.

Returns¶

str

The encoding

get(tag, default=None, raw=False)¶: Allows access to the raw values or is an Exception safe wrapper to __getitem__.

Parameters¶

tag : str

The requested tag

default : optional [Object]

Default None, the object returned when tag is not found

raw : optional [bool]

Default False, if True the unprocessed value of tag is returned

Returns¶

Object

The processed value of tag or default
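The two roles of get (raw access and a safe wrapper around __getitem__) can be illustrated with a toy class. TinyRecord is hypothetical and far simpler than the real record classes; in particular, catching only KeyError is an assumption made for this sketch.

```python
class TinyRecord:
    """Hypothetical, minimal record holding raw values plus per-tag processors."""

    def __init__(self, raw_fields, processors=None):
        self._raw = dict(raw_fields)
        self._processors = dict(processors or {})

    def __getitem__(self, tag):
        value = self._raw[tag]  # raises KeyError when the tag is missing
        process = self._processors.get(tag, lambda x: x)
        return process(value)

    def get(self, tag, default=None, raw=False):
        if raw:
            # Skip processing and return the stored value as-is.
            return self._raw.get(tag, default)
        try:
            return self[tag]
        except KeyError:
            return default


record = TinyRecord(
    {"Subject": "a; b"},
    processors={"Subject": lambda value: value.split("; ")},
)
```

Here record.get("Subject") returns the processed list, record.get("Subject", raw=True) returns the stored string, and a missing tag yields the default instead of raising.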

static getAltName(tag)¶: An abstractmethod, gives the alternate name of tag or None

Parameters¶

tag : str

The requested tag

Returns¶

str

The alternate name of tag or None

getCitations(field=None, values=None, pandasFriendly=True)¶

Creates a pandas ready dict with each row a different citation and columns containing the original string, year, journal and author’s name.

There are also options to filter the output citations with field and values

Parameters¶

field : optional str

Default None, if given all citations missing the named field will be dropped.

values : optional str or list[str]

Default None, if field is also given only those citations with one of the strings given in values will be included.

e.g. to get only citations from 1990 or 1991: field = year, values = [1991, 1990]

pandasFriendly : optional bool

Default True, if False a list of the citations will be returned instead of the more complicated pandas dict

Returns¶

dict

A pandas ready dict with all the citations
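The field and values filtering, and the column-oriented shape of a pandas-ready dict, can be sketched without pandas. The citations_to_columns function, the column names and the sample rows below are all hypothetical; citations are modelled as plain dicts rather than Citation objects.

```python
def citations_to_columns(citations, field=None, values=None):
    """Hypothetical helper: filter citation dicts and pivot them into columns."""
    if isinstance(values, (str, int)):
        values = [values]  # accept a single value as well as a list
    columns = {"citeString": [], "year": [], "journal": [], "author": []}
    for cite in citations:
        if field is not None:
            if cite.get(field) is None:
                continue  # drop citations missing the named field
            if values is not None and cite[field] not in values:
                continue  # drop citations whose field value was not requested
        for name in columns:
            columns[name].append(cite.get(name))
    return columns


# Made-up citations, for illustration only.
sample_citations = [
    {"citeString": "One A, 1990, J EX", "year": 1990, "journal": "J EX", "author": "One A"},
    {"citeString": "Two B, 1995, J EX", "year": 1995, "journal": "J EX", "author": "Two B"},
    {"citeString": "Three C", "year": None, "journal": None, "author": "Three C"},
]
```

Because every column is a list of equal length, a result of this shape could be handed to a DataFrame constructor directly.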

id¶

items(raw=False)¶: Like items for dicts but with a raw option

Parameters¶

raw : optional [bool]

Default False, if True the KeysView contains the raw values as the values

Returns¶

KeysView

The key-value pairs of the record

keys() → a set-like object providing a view on D's keys¶

sourceFile¶

sourceLine¶

specialFuncs(key)¶: An abstractmethod, processes the special tag, key, using the whole Record

Parameters¶

key : str

One of the special tags: 'authorsFull', 'keywords', 'grants', 'j9', 'authorsShort', 'volume', 'selfCitation', 'citations', 'address', 'abstract', 'title', 'month', 'year', 'journal', 'beginningPage' and 'DOI'

Returns¶

The processed value of key

subDict(tags, raw=False)¶: Creates a dict of values of tags from the Record. The tags are the keys and the values are the values. If the tag is missing the value will be None.

Parameters¶

tags : list[str]

The list of tags requested

raw : optional [bool]

Default False, if True the returned values of the dict will be unprocessed

Returns¶

dict

A dictionary with the keys tags and the values from the record
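The shape of the subDict result, including None for missing tags, can be shown on a plain mapping. sub_dict and the sample fields are hypothetical stand-ins; the raw option is omitted because this sketch has no processing step.

```python
def sub_dict(fields, tags):
    """Hypothetical helper: pick the requested tags, using None for missing ones."""
    return {tag: fields.get(tag) for tag in tags}


# Made-up field values, for illustration only.
sample_fields = {"Title": "An example entry", "Publication year": "1999"}
selected = sub_dict(sample_fields, ["Title", "Abstract"])
```

Every requested tag appears as a key in the result, so callers can rely on a fixed set of keys even when a record lacks some fields.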

static tagProcessingFunc(tag)¶: An abstractmethod, gives the function for processing tag

Parameters¶

tag : optional [str]

The tag in need of processing

Returns¶

function

The function to process the raw tag

title¶

values(raw=False)¶: Like values for dicts but with a raw option

Parameters¶

raw : optional [bool]

Default False, if True the ValuesView contains the raw values

Returns¶

ValuesView

The values of the record

writeRecord(infile)¶: An abstractmethod, writes the record in its original form to infile

Parameters¶

infile : writable file

The file to be written to

metaknowledge.proquest.recordProQuest.proQuestRecordParser(enRecordFile, recNum)¶: The parser ProQuestRecords use. This takes an entry from proQuestParser() and parses it as part of the creation of a ProQuestRecord.

Parameters¶

enRecordFile : enumerate object

a file wrapped by enumerate()

recNum : int

The number given to the entry in the first section of the ProQuest file

Returns¶

collections.OrderedDict

An ordered dictionary of the key-value pairs in the entry
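The idea of consuming an enumerate-wrapped file and collecting fields in order can be sketched as below. This is only an illustration under a loud assumption: it treats every non-blank line as a 'Key: value' pair, which may not match the real ProQuest layout, and it ignores the unlabeled sections entirely. parse_entry and the sample text are hypothetical.

```python
import collections
import io


def parse_entry(numbered_lines):
    """Hypothetical parser: read (line number, line) pairs into an OrderedDict."""
    fields = collections.OrderedDict()
    for line_number, line in numbered_lines:
        text = line.strip()
        if not text:
            continue  # skip blank separator lines
        key, separator, value = text.partition(": ")
        if not separator:
            # Assumed format violated; report where it happened.
            raise ValueError(
                "line {} has no 'key: value' pair".format(line_number)
            )
        fields[key] = value
    return fields


# Made-up entry text, wrapped in enumerate() as the documented parser expects.
sample = io.StringIO("Title: An example entry\n\nPublication year: 1999\n")
parsed = parse_entry(enumerate(sample, start=1))
```

Wrapping the file in enumerate() keeps the line number available alongside each line, which is what allows error messages to point at the offending line.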