ExtendedRecord(Record)¶
class metaknowledge.ExtendedRecord(fieldDict, idValue, bad, error, sFile='', sLine=0)¶
A subclass of Record that adds processing to the dictionary. It also cannot be used directly and must be subclassed.
The ExtendedRecord class is an extension of Record that is intended for use with the records on scientific papers provided by different organizations such as WOS or Pubmed. The 5 abstract (virtual) methods must be defined for each subclass; they define how the data in the different fields is processed and how the record can be rewritten to a file.
Processing fields¶
When an ExtendedRecord is created a dictionary, fieldDict, must be provided; this contains the raw data from the file reader, usually as lists of strings. tagProcessingFunc is a staticmethod function that takes in a tag string and returns another function to process it.
Each tag may also be given a second name, as what they are called in the raw data is usually not very easy to understand (e.g. 'SO' is the journal name for WOS records). The mapping from the raw tag ('SO') to the human friendly string ('journal') is done with the getAltName staticmethod. getAltName takes in a tag string and returns either None or the other name for that string. Note, getAltName must go both directions: WOSRecord.getAltName(WOSRecord.getAltName('SO')) == 'SO'.
The last method for processing entries is specialFuncs.
The following are the special keys for ExtendedRecords. These must be the alternate names of tags or strings accepted by the specialFuncs method.
'authorsFull'
'keywords'
'grants'
'j9'
'authorsShort'
'volume'
'selfCitation'
'citations'
'address'
'abstract'
'title'
'month'
'year'
'journal'
'beginningPage'
'DOI'
specialFuncs, when given one of these, must raise a KeyError or return an object of the same type as that returned by the MedlineRecord or WOSRecord, e.g. 'title' would return a string giving the title of the record.
For an example of how this works let's first look at the 'SO' tag on a WOSRecord accessed with the alternate name 'journal'.
t = R['journal']
First the private dictionary _computedFields is checked for the key 'journal', which will fail if this is the first time 'journal' or 'SO' has been requested; after this the results will be added to the dictionary to speed up future requests.
Then the fieldDict will be checked for the key and when that fails the key will go through getAltName and be checked again. If the record had a journal entry this will succeed and the raw data will be given to the function returned by tagProcessingFunc, using the same key as fieldDict, in this case SO.
The results will then be written to _computedFields and returned.
If the requested key was instead 'grants' (g = R['grants']) then both lookups to fieldDict would have failed and the string 'grants' would have been given to specialFuncs, which would return a list of all the grants in the WOSRecord (this is always [] as WOS does not provide grant information).
What if the key were not present anywhere? Then specialFuncs should raise a KeyError, which will be caught then re-raised like a dictionary would with an invalid key lookup.
File Handling fields¶
The two other required methods, encoding and writeRecord, define how the records can be rewritten to a file.
encoding should return a string giving the encoding Python would use, e.g. 'utf-8' or 'latin-1'. This is the same encoding that the files written by writeRecord should have. writeRecord, when called, should write the original record to the provided open file, infile. The opening, closing, header and footer of the file will be handled by RecordCollection’s writeFile function, which should be modified accordingly. If the order of the fields in a record is important you can use a collections.OrderedDict for fieldDict.
__Init__¶
The __init__ of ExtendedRecord takes the same arguments as Record.
__contains__(item)¶
Checks if the tag item is in the Record
__getitem__(key)¶
Processes the tag requested with key and memoizes it.
Allows long names, but will still raise a KeyError if the tag is missing, regardless of name used.
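The lookup order described under Processing fields can be sketched with a small stand-in class. This is only an illustration under stated assumptions: ToyRecord, its alternate-name table and its join-the-lines processing rule are hypothetical and are not the metaknowledge implementation. Caching the result under both names is also an assumption, inferred from the remark that the first request for either 'journal' or 'SO' misses the cache.

```python
class ToyRecord:
    """Hypothetical stand-in used to illustrate the lookup order only."""

    # Assumed bidirectional table; only 'SO' <-> 'journal' comes from the text.
    _ALT_NAMES = {"SO": "journal", "journal": "SO"}

    def __init__(self, fieldDict):
        self._fieldDict = fieldDict    # raw data: tag -> list of strings
        self._computedFields = {}      # cache of already processed values

    @staticmethod
    def getAltName(tag):
        return ToyRecord._ALT_NAMES.get(tag)

    @staticmethod
    def tagProcessingFunc(tag):
        # Toy rule (assumption): every tag is processed by joining its lines.
        return lambda lines: " ".join(lines)

    def specialFuncs(self, key):
        if key == "grants":
            return []                  # mirrors the WOS behaviour described above
        raise KeyError(key)

    def __getitem__(self, key):
        # 1. Check the cache first.
        if key in self._computedFields:
            return self._computedFields[key]
        # 2. Check the raw dict under the key, then under its alternate name.
        alt = self.getAltName(key)
        if key in self._fieldDict:
            tag = key
        elif alt in self._fieldDict:
            tag = alt
        else:
            tag = None
        if tag is not None:
            value = self.tagProcessingFunc(tag)(self._fieldDict[tag])
        else:
            # 3. Fall back to the special keys; re-raise like a dict would.
            try:
                value = self.specialFuncs(key)
            except KeyError:
                raise KeyError(key) from None
        # 4. Memoize under the requested name and, if any, its alternate name.
        self._computedFields[key] = value
        if alt is not None:
            self._computedFields[alt] = value
        return value


R = ToyRecord({"SO": ["NATURE"]})
t = R["journal"]   # found through getAltName, processed, then cached
g = R["grants"]    # not in fieldDict under either name, so specialFuncs answers
```

A key that is absent everywhere, such as R['nope'] in this toy, ends in a KeyError, just as a plain dict lookup would.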
__init__(fieldDict, idValue, bad, error, sFile='', sLine=0)¶
Base constructor for Records
fieldDict : is the unparsed entry dict with tags as keys and their lines as a list of strings
idValue : is the unique ID of the Record, e.g. the WOS number
titleKey : is the tag giving the title of the Record, e.g. the WOS tag is 'TI'
bad : is the bool to flag the Record as having encountered an error
error : is the error that bad indicates
sFile : is the name of the source file
sLine : is the line number of the start of the Record entry
altNames : is a dict that maps the names of tags to an alternative name, i.e. the long names dict. It must be bidirectional: map long to short and short to long
proccessingFuncs : is a dict of functions to process the tags. It has the short names as keys and their processing functions as values. Missing tags will result in the unparsed value being returned.
The Records inheriting from this must implement the following methods; calling the implementations in Record with super() will not cause errors:
- writeRecord
- tagProcessingFunc
- encoding
- titleTag
- getAltName
authGenders(countsOnly=False, fractionsMode=False, _countsTuple=False)¶
Creates a dict mapping 'Male', 'Female' and 'Unknown' to lists of the names of all the authors.
bibString(maxLength=1000, WOSMode=False, restrictedOutput=False, niceID=True)¶
Makes a string giving the Record as a bibTex entry. If the Record is of a journal article (PT J) the bibTex type is set to 'article', otherwise it is set to 'misc'. The ID of the entry is the WOS number and all the Record’s fields are given as entries with their long names.
Note This is not meant to be used directly with LaTeX: none of the special characters have been escaped and there are a large number of unnecessary fields provided. niceID and maxLength have been provided to make conversions easier.
Note Record entries that are lists have their values separated with the string ' and '
createCitation(multiCite=False)¶
Creates a citation string, using the same format as other WOS citations, for the Record by reading the relevant special tags ('year', 'J9', 'volume', 'beginningPage', 'DOI') and using them to create a Citation object.
encoding()¶
An abstractmethod, gives the encoding string of the record.
get(tag, default=None, raw=False)¶
Allows access to the raw values or is an Exception safe wrapper to __getitem__.
static getAltName(tag)¶
An abstractmethod, gives the alternate name of tag or None
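One way to satisfy the both-directions requirement is to build the reverse table from the forward one, so the two cannot drift apart. The sketch below is a hypothetical helper, not the WOSRecord code; get_alt_name and both tables are illustrative names, and the 'TI' to 'title' pairing is assumed from the 'TI' title tag and the 'title' special key mentioned in this page.

```python
# Assumed short-tag -> long-name table for illustration.
_TAG_TO_FULL = {"SO": "journal", "TI": "title"}
# Derive the reverse direction so the mapping is bidirectional by construction.
_FULL_TO_TAG = {full: tag for tag, full in _TAG_TO_FULL.items()}


def get_alt_name(tag):
    """Return the other name for tag, or None if it has no alternate name."""
    if tag in _TAG_TO_FULL:
        return _TAG_TO_FULL[tag]
    return _FULL_TO_TAG.get(tag)


# Round trip in both directions, as the documentation requires.
round_trip_short = get_alt_name(get_alt_name("SO"))
round_trip_long = get_alt_name(get_alt_name("journal"))
unknown = get_alt_name("ZZ")
```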
getCitations(field=None, values=None, pandasFriendly=True)¶
Creates a pandas ready dict with each row a different citation and columns containing the original string, year, journal and author’s name.
There are also options to filter the output citations with field and values
items(raw=False)¶
Like items for dicts but with a raw option
specialFuncs(key)¶
An abstractmethod, processes the special tag, key, using the whole Record
subDict(tags, raw=False)¶
Creates a dict of values of tags from the Record. The tags are the keys and the values are the values. If the tag is missing the value will be None.
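The missing-tag behaviour can be pictured with a plain dict standing in for the Record. This is a rough sketch only: sub_dict is a hypothetical helper and it ignores the raw option.

```python
def sub_dict(record, tags):
    """Map each requested tag to its value, or to None when the tag is missing."""
    return {tag: record.get(tag) for tag in tags}


# A plain dict is used here as a stand-in for a Record (assumption).
toy_record = {"TI": "A title", "SO": "NATURE"}
selected = sub_dict(toy_record, ["TI", "AB"])
```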
static tagProcessingFunc(tag)¶
An abstractmethod, gives the function for processing tag
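A subclass could implement this as a table lookup that returns a function per tag. The sketch below is an assumption-laden illustration, not the library's code: the processors, the 'PY' year tag and the identity fallback (chosen to match the note that missing tags give back the unparsed value) are all hypothetical.

```python
def _join_lines(lines):
    """Collapse a multi-line field into one string."""
    return " ".join(lines)


def _first_int(lines):
    """Read the first line of a field as an integer."""
    return int(lines[0])


# Assumed tag -> processor table; 'PY' as a year tag is an assumption.
_PROCESSORS = {"SO": _join_lines, "PY": _first_int}


def tag_processing_func(tag):
    """Return the function that processes tag, or an identity function."""
    return _PROCESSORS.get(tag, lambda raw: raw)


journal = tag_processing_func("SO")(["J", "INFORMETR"])
year = tag_processing_func("PY")(["2015"])
untouched = tag_processing_func("ZZ")(["raw line"])
```

Returning a function rather than a value keeps the lookup separate from the processing, so the caller decides when the raw data is actually processed.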
values(raw=False)¶
Like values for dicts but with a raw option
writeRecord(infile)¶
An abstractmethod, writes the record in its original form to infile
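The pairing of encoding and writeRecord described under File Handling fields might look like the following in a subclass. This is a minimal sketch with a made-up "TAG value" line layout; ToyWritableRecord is hypothetical, and an in-memory buffer stands in for the open file that would normally be passed in.

```python
import io
from collections import OrderedDict


class ToyWritableRecord:
    """Hypothetical record that can write itself back out."""

    def __init__(self, fieldDict):
        self._fieldDict = fieldDict    # tag -> list of raw lines

    def encoding(self):
        # The encoding that files produced with writeRecord should use.
        return "utf-8"

    def writeRecord(self, infile):
        # Assumed layout: one "TAG value" line per raw line, in field order.
        for tag, lines in self._fieldDict.items():
            for line in lines:
                infile.write(f"{tag} {line}\n")


# OrderedDict makes the intended field order explicit.
record = ToyWritableRecord(OrderedDict([("TI", ["A title"]), ("SO", ["NATURE"])]))
buffer = io.StringIO()                 # stands in for an already opened file
record.writeRecord(buffer)
written = buffer.getvalue()
```

Opening and closing the file, and any header or footer, are left out here because the text assigns them to the caller.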