WOS

Overview

These are the functions used to process Web of Science (WOS) files at the backend. They are meant for internal use by metaknowledge.

Functions

metaknowledge.WOS.wosHandlers.isWOSFile(infile, checkedLines=3)

Determines if infile is the path to a WOS file. A file is considered to be a WOS file if it has the correct encoding (utf-8 with a BOM) and within the first checkedLines a line starts with "VR 1.0".

Parameters

infile : str

The path to the target file

checkedLines : optional [int]

default 3, the number of lines to check for the header

Returns

bool

True if the file is a WOS file
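
The two checks described above can be sketched as follows. This is a minimal illustration and not metaknowledge's own code: looks_like_wos and _write_temp are hypothetical names, and the sample header text is made up for the demo.

```python
import codecs
import os
import tempfile


def looks_like_wos(path, checked_lines=3):
    """Hypothetical stand-in for isWOSFile(); illustrates the two checks."""
    # Check 1: the file must start with the UTF-8 byte order mark.
    with open(path, "rb") as raw:
        if raw.read(len(codecs.BOM_UTF8)) != codecs.BOM_UTF8:
            return False
    # Check 2: one of the first `checked_lines` lines starts with "VR 1.0".
    try:
        with open(path, encoding="utf-8-sig") as text:
            for _ in range(checked_lines):
                if text.readline().startswith("VR 1.0"):
                    return True
    except UnicodeDecodeError:
        return False
    return False


def _write_temp(data):
    """Write bytes to a temporary file and return its path."""
    handle, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(handle, "wb") as out:
        out.write(data)
    return path


# Made-up header lines, written once with and once without a BOM.
header = "FN Example Export\nVR 1.0\nPT J\n".encode("utf-8")
with_bom = _write_temp(codecs.BOM_UTF8 + header)
without_bom = _write_temp(header)
result_with_bom = looks_like_wos(with_bom)
result_without_bom = looks_like_wos(without_bom)
os.remove(with_bom)
os.remove(without_bom)
```

In this sketch the file without a BOM is rejected before any line is read, which mirrors the encoding requirement stated above.
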
metaknowledge.WOS.wosHandlers.wosParser(isifile)

This is a function that is used to create RecordCollections from files.

wosParser() reads the file given by the path isifile, checks that the header is correct then reads until it reaches EF. All WOS records it encounters are parsed with recordParser() and converted into Records. A list of these Records is returned.

BadWOSFile is raised if an issue is found with the file.

Parameters

isifile : str

The path to the target file

Returns

List[Record]

All the Records found in isifile
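
The reading loop described above (check the header, collect records, stop at EF) might look roughly like the sketch below. It works on a list of lines instead of a file path, split_wos_records is a hypothetical name, and it raises ValueError where the library raises BadWOSFile. Treating the first two lines as the header is an assumption made for the demo.

```python
def split_wos_records(lines):
    """Hypothetical sketch: group the lines of a WOS export into records."""
    lines = iter(lines)
    # Assumption: the version line is one of the first two lines.
    header = [next(lines, ""), next(lines, "")]
    if not any(line.startswith("VR 1.0") for line in header):
        raise ValueError("missing 'VR 1.0' header line")
    records, current = [], []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("EF"):    # end-of-file marker: stop reading
            break
        if line.startswith("ER"):    # end-of-record marker: close the record
            records.append(current)
            current = []
        elif line.strip():           # keep non-blank lines of the record
            current.append(line)
    return records


# Made-up sample export with two short records.
sample = [
    "FN Example Export\n", "VR 1.0\n",
    "PT J\n", "AU BREVIK, I\n", "ER\n", "\n",
    "PT J\n", "TI A second record\n", "ER\n", "\n",
    "EF\n",
]
records = split_wos_records(sample)
```

Each inner list would then be handed to a record parser, which is the role recordParser() plays in the description above.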

Help Functions

metaknowledge.WOS.tagProcessing.helpFuncs.getMonth(s)
Known formats:
Month (“%b”)
Month Day (“%b %d”)
Month-Month (“%b-%b”) — this gets coerced to the first %b, dropping the month range
Season (“%s”) — this gets coerced to use the first month of the given season
Month Day Year (“%b %d %Y”)
Month Year (“%b %Y”)
Year Month Day (“%Y %m %d”)
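
A rough sketch of how the formats listed above could be reduced to a month number is shown below. first_month is a hypothetical name and does not reproduce getMonth's code. The season table in particular is an assumption chosen for illustration; the month the library assigns to each season may differ.

```python
import re

MONTHS = {"JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
          "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12}
# Assumption: illustrative first month for each season.
SEASONS = {"SPR": 3, "SUM": 6, "FAL": 9, "WIN": 12}


def first_month(s):
    """Hypothetical sketch: return the month number for a PD-style string."""
    s = s.strip().upper()
    # "Year Month Day" uses a numeric month, so handle it separately.
    numeric = re.fullmatch(r"\d{4} (\d{1,2}) \d{1,2}", s)
    if numeric:
        return int(numeric.group(1))
    # Otherwise the first alphabetic token decides; for a range such as
    # "JUL-AUG" this keeps the first month and drops the rest.
    token = re.match(r"[A-Z]+", s)
    if token is None:
        raise ValueError("no month found in {!r}".format(s))
    key = token.group(0)[:3]
    if key in MONTHS:
        return MONTHS[key]
    if key in SEASONS:
        return SEASONS[key]
    raise ValueError("unknown month or season in {!r}".format(s))


range_month = first_month("JUL-AUG")
numeric_month = first_month("2014 02 17")
```
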
metaknowledge.WOS.tagProcessing.helpFuncs.makeBiDirectional(d)
Helper for generating tagNameConverter
Makes dict that maps from key to value and back
metaknowledge.WOS.tagProcessing.helpFuncs.reverseDict(d)
Helper for generating fullToTag
Makes dict of value to key
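
The two dictionary helpers can be illustrated with a few lines of Python. The function names below are hypothetical snake_case stand-ins, and the two-entry table is only a small excerpt-style example, not the library's full tag table.

```python
def reverse_dict(d):
    """Sketch of reverseDict: map each value back to its key."""
    return {value: key for key, value in d.items()}


def make_bidirectional(d):
    """Sketch of makeBiDirectional: map key to value and value to key."""
    both = dict(d)
    both.update(reverse_dict(d))
    return both


tag_to_full = {"AF": "authorsFull", "TI": "title"}
full_to_tag = reverse_dict(tag_to_full)
converter = make_bidirectional(tag_to_full)
```

Both helpers assume the values are unique and hashable; with duplicate values the later key would silently win.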

Tag Functions

metaknowledge.WOS.tagProcessing.tagFunctions.DOI(val)

The DI Tag

return the DOI number of the record

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The DOI number string
metaknowledge.WOS.tagProcessing.tagFunctions.ISBN(val)

The BN Tag

extracts a list of ISBNs associated with the Record

Parameters

val: list[str]

The raw data from a WOS file

Returns

list

The ISBNs
metaknowledge.WOS.tagProcessing.tagFunctions.ISSN(val)

The SN Tag

extracts the ISSN of the Record

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The ISSN string
metaknowledge.WOS.tagProcessing.tagFunctions.ResearcherIDnumber(val)

The RI Tag

extracts a list of the research IDs of the Record

Parameters

val: list[str]

The raw data from a WOS file

Returns

list[str]

The list of the research IDs
metaknowledge.WOS.tagProcessing.tagFunctions.abstract(val)

The AB Tag

return abstract of the record, with newlines hopefully in the correct places

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The abstract
metaknowledge.WOS.tagProcessing.tagFunctions.articleNumber(val)

The AR Tag

extracts a string giving the article number, not all are integers

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The article number
metaknowledge.WOS.tagProcessing.tagFunctions.authAddress(val)

The C1 Tag

extracts the address of the authors as given by WOS. Warning: the mapping of author to address is not very good and is given in multiple ways.

Parameters

val: list[str]

The raw data from a WOS file

Returns

list[str]

A list of addresses
metaknowledge.WOS.tagProcessing.tagFunctions.authKeywords(val)

The DE Tag

extracts the keywords assigned by the author of the Record. The WOS description is:

Author keywords are included in records of articles from 1991 forward. They are also included in conference proceedings records.

Parameters

val: list[str]

The raw data from a WOS file

Returns

list[str]

The list of keywords
metaknowledge.WOS.tagProcessing.tagFunctions.authorsFull(val)

The AF Tag

extracts a list of authors’ full names

Parameters

val: list[str]

The raw data from a WOS file

Returns

list[str]

A list of authors’ names
metaknowledge.WOS.tagProcessing.tagFunctions.authorsShort(val)

The AU Tag

extracts a list of authors’ shortened names

Parameters

val: list[str]

The raw data from a WOS file

Returns

list[str]

A list of shortened authors’ names
metaknowledge.WOS.tagProcessing.tagFunctions.beginningPage(val)

The BP Tag

extracts the first page the record occurs on, not all are integers

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The first page number
metaknowledge.WOS.tagProcessing.tagFunctions.bookAuthor(val)

The BA Tag

extracts a list of the short names of the authors of a book Record

Parameters

val: list[str]

The raw data from a WOS file

Returns

list[str]

A list of shortened authors’ names
metaknowledge.WOS.tagProcessing.tagFunctions.bookAuthorFull(val)

The BF Tag

extracts a list of the long names of the authors of a book Record

Parameters

val: list[str]

The raw data from a WOS file

Returns

list[str]

A list of authors’ names
metaknowledge.WOS.tagProcessing.tagFunctions.bookDOI(val)

The D2 Tag

extracts the book DOI of the Record

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The DOI number
metaknowledge.WOS.tagProcessing.tagFunctions.citations(val)

The CR Tag

extracts a list of all the citations in the record; each citation is a metaknowledge.Citation object.

Parameters

val: list[str]

The raw data from a WOS file

Returns

list[metaknowledge.Citation]

A list of Citations
metaknowledge.WOS.tagProcessing.tagFunctions.citedRefsCount(val)

The NR Tag

extracts the number of citations, the length of the CR list

Parameters

val: list[str]

The raw data from a WOS file

Returns

int

The number of CRs
metaknowledge.WOS.tagProcessing.tagFunctions.confDate(val)

The CY Tag

extracts the date string of the conference associated with the Record, the date is not normalized

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The date of the conference
metaknowledge.WOS.tagProcessing.tagFunctions.confHost(val)

The HO Tag

extracts the host of the conference

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The host
metaknowledge.WOS.tagProcessing.tagFunctions.confLocation(val)

The CL Tag

extracts the string giving the conference’s location

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The conference’s address
metaknowledge.WOS.tagProcessing.tagFunctions.confSponsors(val)

The SP Tag

extracts a list of sponsors for the conference associated with the record

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The list of sponsors
metaknowledge.WOS.tagProcessing.tagFunctions.confTitle(val)

The CT Tag

extracts the title of the conference associated with the Record

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The title of the conference
metaknowledge.WOS.tagProcessing.tagFunctions.docType(val)

The DT Tag

extracts the type of document the Record contains

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The type of the Record
metaknowledge.WOS.tagProcessing.tagFunctions.documentDeliveryNumber(val)

The GA Tag

extracts the document delivery number of the Record

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The document delivery number
metaknowledge.WOS.tagProcessing.tagFunctions.eISSN(val)

The EI Tag

extracts the EISSN of the Record

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The EISSN string
metaknowledge.WOS.tagProcessing.tagFunctions.editedBy(val)

The BE Tag

extracts a list of the editors of the Record

Parameters

val: list[str]

The raw data from a WOS file

Returns

list[str]

A list of editors
metaknowledge.WOS.tagProcessing.tagFunctions.editors(val)

Needs Work

currently not well understood, returns val

metaknowledge.WOS.tagProcessing.tagFunctions.email(val)

The EM Tag

extracts a list of emails given by the authors of the Record

Parameters

val: list[str]

The raw data from a WOS file

Returns

list[str]

A list of emails
metaknowledge.WOS.tagProcessing.tagFunctions.endingPage(val)

The EP Tag

return the last page the record occurs on as a string, not all are integers

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The final page number
metaknowledge.WOS.tagProcessing.tagFunctions.funding(val)

The FU Tag

extracts a list of the groups funding the Record

Parameters

val: list[str]

The raw data from a WOS file

Returns

list[str]

A list of funding groups
metaknowledge.WOS.tagProcessing.tagFunctions.fundingText(val)

The FX Tag

extracts a string of the funding thanks

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The funding thank-you
metaknowledge.WOS.tagProcessing.tagFunctions.group(val)

The GP Tag

extracts the group associated with the Record

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The name of the group
metaknowledge.WOS.tagProcessing.tagFunctions.groupName(val)

The CA Tag

extracts the name of the group associated with the Record

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The group’s name
metaknowledge.WOS.tagProcessing.tagFunctions.isoAbbreviation(val)

The JI Tag

extracts the iso abbreviation of the journal

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The iso abbreviation of the journal
metaknowledge.WOS.tagProcessing.tagFunctions.issue(val)

The IS Tag

extracts a string giving the issue or range of issues the Record was in, not all are integers

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The issue number/range
metaknowledge.WOS.tagProcessing.tagFunctions.j9(val)

The J9 Tag

extracts the J9 (29-Character Source Abbreviation) of the publication

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The 29-Character Source Abbreviation
metaknowledge.WOS.tagProcessing.tagFunctions.journal(val)

The SO Tag

extracts the full name of the publication and normalizes it to uppercase

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The name of the journal
metaknowledge.WOS.tagProcessing.tagFunctions.keywords(val)

The ID Tag

extracts the WOS keywords of the Record. The WOS description is:

KeyWords Plus are index terms created by Thomson Reuters from significant, frequently occurring words in the titles of an article's cited references.

Parameters

val: list[str]

The raw data from a WOS file

Returns

list[str]

The keyWords list
metaknowledge.WOS.tagProcessing.tagFunctions.language(val)

The LA Tag

extracts the languages of the Record as a string with languages separated by ‘, ‘, usually there is only one language

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The language(s) of the record
metaknowledge.WOS.tagProcessing.tagFunctions.meetingAbstract(val)

The MA Tag

extracts the ID of the meeting abstract prefixed by ‘EPA-‘

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The meeting abstract prefixed
metaknowledge.WOS.tagProcessing.tagFunctions.month(val)

The PD Tag

extracts the month the record was published in as an int with January as 1, February 2, …

Parameters

val: list[str]

The raw data from a WOS file

Returns

int

An integer giving the month
metaknowledge.WOS.tagProcessing.tagFunctions.orcID(val)

The OI Tag

extracts a list of orc IDs of the Record

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The orc ID
metaknowledge.WOS.tagProcessing.tagFunctions.pageCount(val)

The PG Tag

returns an integer giving the number of pages of the Record

Parameters

val: list[str]

The raw data from a WOS file

Returns

int

The page count
metaknowledge.WOS.tagProcessing.tagFunctions.partNumber(val)

The PN Tag

return an integer giving the part of the issue the Record is in

Parameters

val: list[str]

The raw data from a WOS file

Returns

int

The part of the issue of the Record
metaknowledge.WOS.tagProcessing.tagFunctions.pubMedID(val)

The PM Tag

extracts the pubmed ID of the record

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The pubmed ID
metaknowledge.WOS.tagProcessing.tagFunctions.pubType(val)

The PT Tag

extracts the type of publication as a character: conference, book, journal, book in series, or patent

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

A string
metaknowledge.WOS.tagProcessing.tagFunctions.publisher(val)

The PU Tag

extracts the publisher of the Record

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The publisher
metaknowledge.WOS.tagProcessing.tagFunctions.publisherAddress(val)

The PA Tag

extracts the publisher’s address

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The publisher address
metaknowledge.WOS.tagProcessing.tagFunctions.publisherCity(val)

The PI Tag

extracts the city the publisher is in

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The city of the publisher
metaknowledge.WOS.tagProcessing.tagFunctions.reprintAddress(val)

The RP Tag

extracts the reprint address string

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The reprint address
metaknowledge.WOS.tagProcessing.tagFunctions.seriesSubtitle(val)

The BS Tag

extracts the subtitle of the series the Record is in

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The subtitle of the series
metaknowledge.WOS.tagProcessing.tagFunctions.seriesTitle(val)

The SE Tag

extracts the title of the series the Record is in

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The title of the series
metaknowledge.WOS.tagProcessing.tagFunctions.specialIssue(val)

The SI Tag

extracts the special issue value

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The special issue value
metaknowledge.WOS.tagProcessing.tagFunctions.subjectCategory(val)

The SC Tag

extracts a list of the subjects associated with the Record

Parameters

val: list[str]

The raw data from a WOS file

Returns

list[str]

A list of the subjects associated with the Record
metaknowledge.WOS.tagProcessing.tagFunctions.subjects(val)

The WC Tag

extracts a list of subjects as assigned by WOS

Parameters

val: list[str]

The raw data from a WOS file

Returns

list[str]

The subjects list
metaknowledge.WOS.tagProcessing.tagFunctions.supplement(val)

The SU Tag

extracts the supplement number

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The supplement number
metaknowledge.WOS.tagProcessing.tagFunctions.title(val)

The TI Tag

extracts the title of the record

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The title of the record
metaknowledge.WOS.tagProcessing.tagFunctions.totalTimesCited(val)

The Z9 Tag

extracts the total number of citations of the record

Parameters

val: list[str]

The raw data from a WOS file

Returns

int

The total number of citations
metaknowledge.WOS.tagProcessing.tagFunctions.volume(val)

The VL Tag

return the volume the record is in as a string, not all are integers

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The volume number
metaknowledge.WOS.tagProcessing.tagFunctions.wosString(val)

The UT Tag

extracts the WOS number of the record as a string preceded by “WOS:”

Parameters

val: list[str]

The raw data from a WOS file

Returns

str

The WOS number
metaknowledge.WOS.tagProcessing.tagFunctions.wosTimesCited(val)

The TC Tag

extracts the number of times the Record has been cited by records in WOS

Parameters

val: list[str]

The raw data from a WOS file

Returns

int

The number of times the Record has been cited
metaknowledge.WOS.tagProcessing.tagFunctions.year(val)

The PY Tag

extracts the year the record was published in as an int

Parameters

val: list[str]

The raw data from a WOS file

Returns

int

The year
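
All of the tag functions above share one shape: they take val, the list of raw lines stored under a tag, and return a cleaned value. The sketch below illustrates that shape for three tags. The function names carry a _sketch suffix because the bodies are assumptions made for illustration, not the library's implementations; joining continuation lines with a space in particular is a guess.

```python
def year_sketch(val):
    """PY: the year as an int, read from the single raw line."""
    return int(val[0])


def journal_sketch(val):
    """SO: the publication name, normalized to uppercase.

    Assumption: continuation lines are joined with a single space.
    """
    return " ".join(val).upper()


def authors_full_sketch(val):
    """AF: one author per raw line, so a copy of the list is returned."""
    return list(val)


cleaned_year = year_sketch(["1979"])
cleaned_journal = journal_sketch(["Journal of", "Examples"])
cleaned_authors = authors_full_sketch(["BREVIK, I", "ANICIN, B"])
```
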

Dict Functions

metaknowledge.WOS.tagProcessing.funcDicts.isTagOrName(val)

Checks if val is a tag or the full name of a tag; if so, returns True

Parameters

val: str

A string possibly forming a tag or name

Returns

bool

True if val is a tag or name, otherwise False
metaknowledge.WOS.tagProcessing.funcDicts.normalizeToName(val)

Converts tags or full names to full names, case sensitive

Parameters

val: str

A two character string giving the tag or its full name

Returns

str

The full name of val
metaknowledge.WOS.tagProcessing.funcDicts.normalizeToTag(val)

Converts tags or full names to 2 character tags, case insensitive

Parameters

val: str

A two character string giving the tag or its full name

Returns

str

The short name of val
metaknowledge.WOS.tagProcessing.funcDicts.tagToFull(tag)

A wrapper for tagToFullDict, it maps 2 character tags to their full names.

Parameters

tag: str

A two character string giving the tag

Returns

str

The full name of tag
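
The difference between the case-insensitive normalizeToTag and the case-sensitive normalizeToName can be shown with a small sketch. The names to_tag and to_name, the three-entry table, and the choice to raise KeyError for unknown input are all assumptions made for this example.

```python
# Hypothetical three-entry table standing in for the full tag table.
TAG_TO_FULL = {"AF": "authorsFull", "TI": "title", "PY": "year"}
FULL_TO_TAG_UPPER = {name.upper(): tag for tag, name in TAG_TO_FULL.items()}


def to_tag(val):
    """Case-insensitive: accepts 'ti', 'TI', 'title' or 'TITLE'."""
    key = val.upper()
    if key in TAG_TO_FULL:
        return key
    if key in FULL_TO_TAG_UPPER:
        return FULL_TO_TAG_UPPER[key]
    raise KeyError("{} is not a tag or name".format(val))


def to_name(val):
    """Case-sensitive: accepts 'TI' or 'title', but not 'ti'."""
    if val in TAG_TO_FULL:
        return TAG_TO_FULL[val]
    if val in TAG_TO_FULL.values():
        return val
    raise KeyError("{} is not a tag or name".format(val))


tag_from_name = to_tag("title")
tag_from_lower = to_tag("ti")
name_from_tag = to_name("TI")
```
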

Backend

This file contains the Record class for metaknowledge and one helper function for parsing WOS records, recordParser. The Record class is used to represent a single record’s meta-data from WOS.

class metaknowledge.WOS.recordWOS.WOSRecord(inRecord, sFile='', sLine=0)

Bases: metaknowledge.mkRecord.ExtendedRecord

Class for full WOS records

It is meant to be immutable; many of the methods and attributes are evaluated when first called, not when the object is created, and the results are stored privately.

The record’s meta-data is stored in an ordered dictionary labeled by WOS tags. To access the raw data stored in the original record the tags() method can be used. To access data that has been processed and cleaned the attributes named after the tags are used.

Customizations

The Record’s hashing and equality testing are based on the WOS number (the tag is ‘UT’, and also called the accession number). They are strings starting with 'WOS:' and followed by 15 or so numbers and letters, although both the length and character set are known to vary. The numbers are unique to each record so are used for comparisons. If a record is bad all equality checks return False.
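
The hashing and equality rules above can be illustrated with a small stand-in class. RecordSketch is hypothetical and keeps only the fields needed for the comparison; the WOS numbers used are made up.

```python
class RecordSketch:
    """Hypothetical stand-in showing UT-based hashing and equality."""

    def __init__(self, ut, title, bad=False):
        self.UT = ut          # the WOS number, e.g. 'WOS:...'
        self.title = title
        self.bad = bad        # True if parsing the record failed

    def __hash__(self):
        # Only the WOS number contributes to the hash.
        return hash(self.UT)

    def __eq__(self, other):
        if not isinstance(other, RecordSketch):
            return NotImplemented
        # A bad record never compares equal to anything.
        if self.bad or other.bad:
            return False
        return self.UT == other.UT


first = RecordSketch("WOS:000000000000001", "A title")
same_id = RecordSketch("WOS:000000000000001", "A differently cleaned title")
broken = RecordSketch("WOS:000000000000001", "A title", bad=True)
unique = {first, same_id}
```

Because only the WOS number is compared, two records with the same number collapse to one entry in a set even when other fields differ.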

When converted to a string the record’s title is used, so for a record R, R.TI == R.title == str(R), and its representation uses the WOS number instead of memory location.

Attributes

When a record is created if the parsing of the WOS file failed it is marked as bad. The bad attribute is set to True and the error attribute is created to contain the exception object.

Generally, to get the information from a Record its attributes should be used. For a Record R, calling R.CR causes citations() from the tagProcessing module to be called on the contents of the raw ‘CR’ field. Then the result is saved and returned. In this case, a list of Citation objects is returned. You can also call R.citations to get the same effect, as each known field tag has a longer name (currently there are 61 field tags). These names are meant to make accessing tags more readable and the mapping from tag to name can be found in the tagToFull dict. If a tag is known (in tagToFull) but not in the raw data, None is returned instead. Most tags when cleaned return a string or list of strings; the exact results can be found in the help for the particular function.
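
The evaluate-on-first-access behaviour described above can be sketched with __getattr__ and a private cache. LazyRecordSketch, its two-entry tag table and the calls counter are hypothetical and exist only to make the caching visible; the real class is organised differently.

```python
class LazyRecordSketch:
    """Hypothetical sketch of lazily cleaned, cached tag attributes."""

    # Assumed tag table: tag -> (long name, cleaning function).
    _TAGS = {
        "TI": ("title", lambda val: " ".join(val)),
        "PY": ("year", lambda val: int(val[0])),
    }

    def __init__(self, raw):
        self._raw = raw       # tag -> list of raw lines
        self._cache = {}      # tag -> cleaned value
        self.calls = 0        # how many times a tag was actually processed

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        for tag, (long_name, func) in self._TAGS.items():
            if name in (tag, long_name):
                if tag not in self._cache:
                    self.calls += 1
                    raw = self._raw.get(tag)
                    # Known tag missing from the raw data gives None.
                    self._cache[tag] = None if raw is None else func(raw)
                return self._cache[tag]
        raise AttributeError(name)


rec = LazyRecordSketch({"PY": ["1979"]})
by_tag = rec.PY       # processed on first access
by_name = rec.year    # served from the cache
```
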

The attribute authors is also defined as a convenience and returns the same as ‘AF’ or if that is not found ‘AU’.

__Init__

Records are generally created as collections in RecordCollections, and not as individual objects. If you wish to create one on its own it is possible; the arguments are as follows.

Parameters

inRecord: file stream, dict, str or itertools.chain

If it is a file stream the file must be open at the location of the first tag in the record, usually ‘PT’, and the file will be read until ‘ER’ is found, which indicates the end of the record in the file.

If a dict is passed the dictionary is used as the database of fields and tags, so each key is considered a WOS tag and each value a list of the lines of the original associated with the tag. This is the same form of dict that recordParser returns.

For a string the input must be the raw textual data of a single record in the WOS style, like the file stream it must start at the first tag and end in 'ER'.

itertools.chain is treated identically to a file stream and is used by RecordCollections.

sFile : optional [str]

Is the name of the file the raw data was in, by default it is blank. It is mostly used to make error messages more informative.

sLine : optional [int]

Is the line the record starts on in the raw data file. It is mostly used to make error messages more informative.
UT

Returns the UT tag (WOS number) of the record

authGenders(countsOnly=False, fractionsMode=False, _countsTuple=False)

Creates a dict mapping 'Male', 'Female' and 'Unknown' to lists of the names of all the authors.

authors
bibString(maxLength=1000, WOSMode=False, restrictedOutput=False, niceID=True)

Makes a string giving the Record as a BibTeX entry. If the Record is of a journal article (PT J) the BibTeX type is set to 'article', otherwise it is set to 'misc'. The ID of the entry is the WOS number and all the Record’s fields are given as entries with their long names.

Note: This is not meant to be used directly with LaTeX; none of the special characters have been escaped and there are a large number of unnecessary fields provided. niceID and maxLength have been provided to make conversions easier.

Note: Record entries that are lists have their values separated with the string ' and '

copy()

Correctly copies the Record

createCitation(multiCite=False)

Creates a citation string, using the same format as other WOS citations, for the Record by reading the relevant special tags ('year', 'J9', 'volume', 'beginningPage', 'DOI') and using it to create a Citation object.

encoding()

An abstractmethod, gives the encoding string of the record.

get(tag, default=None, raw=False)

Allows access to the raw values or is an Exception safe wrapper to __getitem__.

static getAltName(tag)

An abstractmethod, gives the alternate name of tag or None

getCitations(field=None, values=None, pandasFriendly=True)

Creates a pandas ready dict with each row a different citation and columns containing the original string, year, journal and author’s name.

There are also options to filter the output citations with field and values

id
items(raw=False)

Like items for dicts but with a raw option

keys() → a set-like object providing a view on D's keys
sourceFile
sourceLine
specialFuncs(key)

An abstractmethod, process the special tag, key using the whole Record

subDict(tags, raw=False)

Creates a dict of values of tags from the Record. The tags are the keys and the values are the values. If the tag is missing the value will be None.

static tagProcessingFunc(tag)

An abstractmethod, gives the function for processing tag

title
values(raw=False)

Like values for dicts but with a raw option

wosString

Returns the WOS number (UT tag) of the record

writeRecord(infile)

Writes to infile the original contents of the Record. This is intended for use by RecordCollections to write to file. What is written to infile is bit for bit identical to the original record file (if utf-8 is used). No newline is inserted above the write but the last character is a newline.

metaknowledge.WOS.recordWOS.recordParser(paper)

This is a function that is used to create Records from files.

recordParser() reads the file paper until it reaches ‘ER’. For each field tag it adds an entry to the returned dict with the tag as the key and a list of the entries as the value, the list has each line separately, so for the following two lines in a record:

AF BREVIK, I
   ANICIN, B

The entry in the returned dict would be {'AF' : ["BREVIK, I", "ANICIN, B"]}

Record objects can be created with these dictionaries as the initializer.

Parameters

paper : file stream

An open file, with the current line at the beginning of the WOS record.

Returns

OrderedDict[str : List[str]]

A dictionary mapping WOS tags to lists, the lists are of strings, each string is a line of the record associated with the tag.
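
The tag and continuation-line handling described above can be sketched as follows. parse_record_lines is a hypothetical name, and the sketch assumes each line holds a two-character tag, one space, then the value, with continuation lines indented by three spaces as in the AF example.

```python
from collections import OrderedDict


def parse_record_lines(lines):
    """Hypothetical sketch: map each tag to the list of its raw lines."""
    fields = OrderedDict()
    current = None
    for line in lines:
        line = line.rstrip("\n")
        tag, value = line[:2], line[3:]
        if tag == "ER":               # end of the record
            break
        if tag.strip():               # a new tag starts here
            current = tag
            fields[current] = [value]
        elif current is not None and value:
            # Indented continuation line: belongs to the current tag.
            fields[current].append(value)
    return fields


parsed = parse_record_lines([
    "PT J\n",
    "AF BREVIK, I\n",
    "   ANICIN, B\n",
    "ER\n",
])
```

With the two AF lines from the example above, parsed['AF'] holds both author strings in file order.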