ExtendedRecord(Record)

class metaknowledge.ExtendedRecord(fieldDict, idValue, bad, error, sFile='', sLine=0)

A subclass of Record that adds processing to the dictionary. It also cannot be used directly and must be subclassed.

The ExtendedRecord class is an extension of Record intended for use with the records on scientific papers provided by different organizations, such as WOS or PubMed. The 5 abstract (virtual) methods must be defined for each subclass; they determine how the data in the different fields is processed and how the record can be rewritten to a file.

Processing fields

When an ExtendedRecord is created, a dictionary, fieldDict, must be provided. It contains the raw data from the file reader, usually as lists of strings. tagProcessingFunc is a staticmethod that takes a tag string and returns another function to process it.
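As a rough illustration of the "tag in, function out" contract, the following standalone sketch does not use metaknowledge itself. The processor functions, the `_PROCESSORS` table and the lower-case name `tag_processing_func` are all hypothetical stand-ins; a real subclass defines its own processors.

```python
def join_lines(lines):
    """Collapse a multi-line raw value into a single string."""
    return " ".join(lines)


def first_int(lines):
    """Interpret the first raw line as an integer."""
    return int(lines[0])


# Hypothetical tag-to-processor table, not taken from the library.
_PROCESSORS = {"TI": join_lines, "PY": first_int}


def tag_processing_func(tag):
    """Return the processor for tag; unknown tags get the raw list back unchanged."""
    return _PROCESSORS.get(tag, lambda lines: lines)


# The raw data arrives as lists of strings, as described above.
title = tag_processing_func("TI")(["A study of", "record parsing"])
year = tag_processing_func("PY")(["2015"])
```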

Each tag may also be given a second name, since the names used in the raw data are usually not easy to understand (e.g. 'SO' is the journal name for WOS records). The mapping from the raw tag ('SO') to the human-friendly string ('journal') is done with the getAltName staticmethod. getAltName takes a tag string and returns either None or the other name for that string. Note that getAltName must work in both directions: WOSRecord.getAltName(WOSRecord.getAltName('SO')) == 'SO'.

The last method for processing entries is specialFuncs. The following are the special keys for ExtendedRecords. These must be the alternate names of tags or strings accepted by the specialFuncs method.
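One way to satisfy the two-way requirement is to build the reverse mapping from the forward one, so the two cannot drift apart. This is only a sketch: the table contents and the name `get_alt_name` are assumptions for illustration, not the library's own tables.

```python
# Hypothetical short-to-long table; a real subclass would list all of its tags.
_SHORT_TO_LONG = {"SO": "journal", "TI": "title", "PY": "year"}

# Derive the long-to-short direction so the mapping is bidirectional by construction.
_ALT_NAMES = dict(_SHORT_TO_LONG)
_ALT_NAMES.update({long: short for short, long in _SHORT_TO_LONG.items()})


def get_alt_name(tag):
    """Return the other name for tag, or None if it has no alternate."""
    return _ALT_NAMES.get(tag)


# Applying the lookup twice should give back the original tag.
round_trip = get_alt_name(get_alt_name("SO"))
```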

  • 'authorsFull'
  • 'keywords'
  • 'grants'
  • 'j9'
  • 'authorsShort'
  • 'volume'
  • 'selfCitation'
  • 'citations'
  • 'address'
  • 'abstract'
  • 'title'
  • 'month'
  • 'year'
  • 'journal'
  • 'beginningPage'
  • 'DOI'

When given one of these, specialFuncs must either raise a KeyError or return an object of the same type as that returned by MedlineRecord or WOSRecord, e.g. 'title' would return a string giving the title of the record.

For an example of how this works, let's first look at the 'SO' tag on a WOSRecord accessed with the alternate name 'journal'.

t = R['journal']

First, the private dictionary _computedFields is checked for the key 'journal'. This fails if it is the first time 'journal' or 'SO' has been requested; afterwards the result is stored in the dictionary to speed up future requests.

Then fieldDict is checked for the key, and when that fails the key goes through getAltName and is checked again. If the record has a journal entry this succeeds, and the raw data is given to the function returned by tagProcessingFunc, using the same key as fieldDict, in this case 'SO'.

The results will then be written to _computedFields and returned.

If the requested key were instead 'grants' (g = R['grants']), both lookups in fieldDict would fail and the string 'grants' would be given to specialFuncs, which would return a list of all the grants in the WOSRecord (this is always [] as WOS does not provide grant information).

What if the key were not present anywhere? Then specialFuncs should raise a KeyError, which is caught and re-raised just as a dictionary would for an invalid key lookup.
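The lookup order walked through above can be sketched as a small standalone class. `SketchRecord` is a hypothetical stand-in that only mirrors the described steps (memo dictionary, fieldDict, alternate name, specialFuncs); it is not the library's implementation, and the error message text is invented.

```python
class SketchRecord:
    """Toy record that mimics the lookup order described in the text."""

    _ALT = {"SO": "journal", "journal": "SO"}  # hypothetical two-way table

    def __init__(self, field_dict):
        self._fieldDict = field_dict
        self._computedFields = {}

    @staticmethod
    def getAltName(tag):
        return SketchRecord._ALT.get(tag)

    @staticmethod
    def tagProcessingFunc(tag):
        # Every tag is processed the same way here, purely for brevity.
        return lambda lines: " ".join(lines)

    def specialFuncs(self, key):
        if key == "grants":
            return []
        raise KeyError(key)

    def __getitem__(self, key):
        # 1. Memoized results are returned immediately.
        if key in self._computedFields:
            return self._computedFields[key]
        # 2. Try the key itself, then its alternate name, against the raw data.
        if key in self._fieldDict:
            tag = key
        else:
            alt = self.getAltName(key)
            tag = alt if alt in self._fieldDict else None
        if tag is not None:
            value = self.tagProcessingFunc(tag)(self._fieldDict[tag])
        else:
            # 3. Fall back to specialFuncs and re-raise its KeyError like a dict would.
            try:
                value = self.specialFuncs(key)
            except KeyError:
                raise KeyError("'{}' could not be found".format(key)) from None
        # 4. Store the result so later requests skip the processing.
        self._computedFields[key] = value
        return value


R = SketchRecord({"SO": ["NATURE"]})
t = R["journal"]
g = R["grants"]
```

After the first access, 'journal' is present in `_computedFields`, so a second `R['journal']` returns without touching `fieldDict`.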

File Handling fields

The two other required methods, encoding and writeRecord, define how the records can be rewritten to a file. encoding should return a string giving the encoding Python would use, e.g. 'utf-8' or 'latin-1'. This is the same encoding that the files written by writeRecord should have. When called, writeRecord should write the original record to the provided open file, infile. The opening, closing, header and footer of the file are handled by RecordCollection’s writeFile function, which should be modified accordingly. If the order of the fields in a record is important, you can use a collections.OrderedDict for fieldDict.
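A minimal sketch of this pair of methods follows, assuming a made-up "TAG value" line layout rather than any real WOS or Medline format. `SketchRecord` is a hypothetical standalone class, and an in-memory `io.StringIO` stands in for the open file.

```python
import collections
import io


class SketchRecord:
    """Toy record showing encoding() and writeRecord() working together."""

    def __init__(self, field_dict):
        self._fieldDict = field_dict

    def encoding(self):
        # The encoding the output file is expected to use.
        return "utf-8"

    def writeRecord(self, infile):
        # Hypothetical layout: one "TAG value" line per raw line, in field order.
        for tag, lines in self._fieldDict.items():
            for line in lines:
                infile.write("{} {}\n".format(tag, line))


# An OrderedDict makes the field order explicit.
fields = collections.OrderedDict([("TI", ["A title"]), ("SO", ["NATURE"])])
rec = SketchRecord(fields)

buffer = io.StringIO()
rec.writeRecord(buffer)
written = buffer.getvalue()
```

With a real file, the caller would open it using the record's encoding, for instance `open(path, 'w', encoding=rec.encoding())`, before passing it to writeRecord.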

__Init__

The __init__ of ExtendedRecord takes the same arguments as Record

__contains__(item)

Checks if the tag item is in the Record

__getitem__(key)

Processes the tag requested with key and memoizes the result.

Allows long names, but will still raise a KeyError if the tag is missing, regardless of name used.

__init__(fieldDict, idValue, bad, error, sFile='', sLine=0)

Base constructor for Records

fieldDict : is the unparsed entry dict with tags as keys and their lines as a list of strings

idValue : is the unique ID of the Record, e.g. the WOS number

titleKey : is the tag giving the title of the Record, e.g. the WOS tag is 'TI'

bad : is the bool to flag the Record as having encountered an error

error : is the error that bad indicates

sFile : is the name of the source file

sLine : is the line number of the start of the Record entry

altNames : is a dict that maps the names of tags to an alternative name, i.e. the long names dict. It must be bidirectional: map long to short and short to long

proccessingFuncs : is a dict of functions to process the tags. It has the short names as keys and their processing functions as values. Missing tags will result in the unparsed value being returned.

The Records inheriting from this must implement the following; calling the implementations in Record with super() will not cause errors:

  • writeRecord
  • tagProcessingFunc
  • encoding
  • titleTag
  • getAltName

authGenders(countsOnly=False, fractionsMode=False, _countsTuple=False)

Creates a dict mapping 'Male', 'Female' and 'Unknown' to lists of the names of all the authors.

bibString(maxLength=1000, WOSMode=False, restrictedOutput=False, niceID=True)

Makes a string giving the Record as a BibTeX entry. If the Record is of a journal article (PT J) the BibTeX type is set to 'article', otherwise it is set to 'misc'. The ID of the entry is the WOS number and all the Record’s fields are given as entries with their long names.

Note: This is not meant to be used directly with LaTeX; none of the special characters have been escaped and a large number of unnecessary fields are provided. niceID and maxLength have been provided to make conversions easier.

Note: Record entries that are lists have their values separated with the string ' and '

createCitation(multiCite=False)

Creates a citation string, using the same format as other WOS citations, for the Record by reading the relevant special tags ('year', 'J9', 'volume', 'beginningPage', 'DOI') and using it to create a Citation object.

encoding()

An abstractmethod, gives the encoding string of the record.

get(tag, default=None, raw=False)

Gives the raw value of tag when raw is True; otherwise acts as an exception-safe wrapper around __getitem__, returning default if the tag is missing.

static getAltName(tag)

An abstractmethod, gives the alternate name of tag or None

getCitations(field=None, values=None, pandasFriendly=True)

Creates a pandas ready dict with each row a different citation and columns containing the original string, year, journal and author’s name.

There are also options to filter the output citations with field and values

items(raw=False)

Like items for dicts but with a raw option

specialFuncs(key)

An abstractmethod, processes the special tag, key, using the whole Record

subDict(tags, raw=False)

Creates a dict of values of tags from the Record. The tags are the keys and the values are the values. If the tag is missing the value will be None.
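The described behaviour, with missing tags mapped to None, can be sketched with a free function over any mapping-like object. `sub_dict` and the plain dict used as the record are hypothetical stand-ins, and the raw option is left out for brevity.

```python
def sub_dict(record, tags):
    """Collect the requested tags from record, using None for any that are missing."""
    out = {}
    for tag in tags:
        try:
            out[tag] = record[tag]
        except KeyError:
            out[tag] = None
    return out


# A plain dict stands in for a Record here, since both raise KeyError on a missing key.
record = {"title": "A title", "year": 2015}
picked = sub_dict(record, ["title", "DOI"])
```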

static tagProcessingFunc(tag)

An abstractmethod, gives the function for processing tag

values(raw=False)

Like values for dicts but with a raw option

writeRecord(infile)

An abstractmethod, writes the record in its original form to infile