WOS¶
Overview¶
These are the functions used to process Web of Science (WOS) files at the backend. They are meant for internal use by metaknowledge.
Functions¶
-
metaknowledge.WOS.wosHandlers.
isWOSFile
(infile, checkedLines=3)¶ Determines if infile is the path to a WOS file. A file is considered to be a WOS file if it has the correct encoding (
utf-8
with a BOM) and, within the first checkedLines lines, a line starts with "VR 1.0".
Parameters¶
infile :
str
The path to the target file
checkedLines :
optional [int]
default 3, the number of lines to check for the header
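The check described above can be sketched as follows. This is only an illustration under stated assumptions, not metaknowledge's implementation: `looks_like_wos` is a hypothetical name, and the BOM test is done by comparing the first bytes of the file with `codecs.BOM_UTF8`.

```python
import codecs


def looks_like_wos(path, checked_lines=3):
    """Hypothetical check: a UTF-8 BOM plus a 'VR 1.0' line near the top."""
    try:
        with open(path, "rb") as f:
            # The file must begin with the UTF-8 byte order mark.
            if f.read(len(codecs.BOM_UTF8)) != codecs.BOM_UTF8:
                return False
            # Look for the version header within the first few lines.
            for _ in range(checked_lines):
                line = f.readline().decode("utf-8")
                if line.startswith("VR 1.0"):
                    return True
    except (OSError, UnicodeDecodeError):
        # Unreadable or wrongly encoded files are simply not WOS files.
        return False
    return False
```

Returning False on read or decode errors, rather than raising, keeps the function usable as a quick filter over a directory of mixed files.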
-
metaknowledge.WOS.wosHandlers.
wosParser
(isifile)¶ This is a function that is used to create RecordCollections from files.
wosParser() reads the file given by the path isifile, checks that the header is correct, then reads until it reaches EF. All WOS records it encounters are parsed with recordParser() and converted into Records. A list of these
Records
is returned.
BadWOSFile
is raised if an issue is found with the file.
Help Functions¶
-
metaknowledge.WOS.tagProcessing.helpFuncs.
getMonth
(s)¶ - Known formats:
Month (“%b”)
Month Day (“%b %d”)
Month-Month (“%b-%b”): this gets coerced to the first %b, dropping the month range
Season (“%s”): this gets coerced to use the first month of the given season
Month Day Year (“%b %d %Y”)
Month Year (“%b %Y”)
Year Month Day (“%Y %m %d”)
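The coercions listed above could look roughly like the sketch below. It is not the library's code: `month_from_pd` is a hypothetical name, and the season-to-month mapping in `SEASONS` is an assumption made for illustration, since the source only says the first month of the season is used.

```python
import re

MONTHS = {"JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
          "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12}

# Assumption: which month counts as the first of each season.
SEASONS = {"SPR": 3, "SUM": 6, "FAL": 9, "WIN": 12}


def month_from_pd(s):
    """Return the month number for a PD-style date string, or None."""
    tokens = re.split(r"[\s\-]+", s.strip().upper())
    # The first month or season name wins, which drops the end of a range.
    for token in tokens:
        key = token[:3]
        if key in MONTHS:
            return MONTHS[key]
        if key in SEASONS:
            return SEASONS[key]
    # All-numeric "Year Month Day" form, e.g. "2015 03 12".
    if len(tokens) == 3 and all(t.isdigit() for t in tokens):
        return int(tokens[1])
    return None


for example in ("MAR", "SEP-OCT", "AUG 5 2014", "2015 03 12"):
    print(example, "->", month_from_pd(example))
```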
-
metaknowledge.WOS.tagProcessing.helpFuncs.
makeBiDirectional
(d)¶ - Helper for generating tagNameConverter. Makes a dict that maps from key to value and back.
-
metaknowledge.WOS.tagProcessing.helpFuncs.
reverseDict
(d)¶ - Helper for generating fullToTag. Makes a dict of value to key.
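Both dict helpers amount to a few lines each. The sketch below uses hypothetical snake_case names and a two-entry example dict; it assumes the values are hashable and unique, which is needed for the reversal to be lossless.

```python
def reverse_dict(d):
    """Return a new dict mapping each value of d back to its key."""
    return {value: key for key, value in d.items()}


def make_bidirectional(d):
    """Return a new dict holding every key -> value pair and its reverse."""
    both = dict(d)
    both.update(reverse_dict(d))
    return both


# Illustrative subset only; the real tag table is much larger.
tag_to_full = {"TI": "title", "AB": "abstract"}

print(reverse_dict(tag_to_full)["title"])           # TI
print(make_bidirectional(tag_to_full)["abstract"])  # AB
print(make_bidirectional(tag_to_full)["AB"])        # abstract
```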
Tag Functions¶
-
metaknowledge.WOS.tagProcessing.tagFunctions.
DOI
(val)¶ The DI Tag¶
return the DOI number of the record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
ISBN
(val)¶ The BN Tag¶
extracts a list of ISBNs associated with the Record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
ResearcherIDnumber
(val)¶ The RI Tag¶
extracts a list of the research IDs of the Record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
abstract
(val)¶ The AB Tag¶
return abstract of the record, with newlines hopefully in the correct places
-
metaknowledge.WOS.tagProcessing.tagFunctions.
articleNumber
(val)¶ The AR Tag¶
extracts a string giving the article number, not all are integers
-
metaknowledge.WOS.tagProcessing.tagFunctions.
authAddress
(val)¶ The C1 Tag¶
extracts the address of the authors as given by WOS. Warning: the mapping of author to address is not very good and is given in multiple ways.
-
metaknowledge.WOS.tagProcessing.tagFunctions.
authKeywords
(val)¶ The DE Tag¶
extracts the keywords assigned by the author of the Record. The WOS description is:
Author keywords are included in records of articles from 1991 forward. They are also included in conference proceedings records.
The AF Tag¶
extracts a list of the authors' full names
The AU Tag¶
extracts a list of the authors' shortened names
-
metaknowledge.WOS.tagProcessing.tagFunctions.
beginningPage
(val)¶ The BP Tag¶
extracts the first page the record occurs on, not all are integers
-
metaknowledge.WOS.tagProcessing.tagFunctions.
bookAuthor
(val)¶ The BA Tag¶
extracts a list of the short names of the authors of a book Record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
bookAuthorFull
(val)¶ The BF Tag¶
extracts a list of the long names of the authors of a book Record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
bookDOI
(val)¶ The D2 Tag¶
extracts the book DOI of the Record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
citations
(val)¶ The CR Tag¶
extracts a list of all the citations in the record; the citations are instances of the metaknowledge.Citation class.
-
metaknowledge.WOS.tagProcessing.tagFunctions.
citedRefsCount
(val)¶ The NR Tag¶
extracts the number of citations, i.e. the length of the CR list
-
metaknowledge.WOS.tagProcessing.tagFunctions.
confDate
(val)¶ The CY Tag¶
extracts the date string of the conference associated with the Record, the date is not normalized
-
metaknowledge.WOS.tagProcessing.tagFunctions.
confHost
(val)¶ The HO Tag¶
extracts the host of the conference
-
metaknowledge.WOS.tagProcessing.tagFunctions.
confLocation
(val)¶ The CL Tag¶
extracts the string giving the conference’s location
-
metaknowledge.WOS.tagProcessing.tagFunctions.
confSponsors
(val)¶ The SP Tag¶
extracts a list of sponsors for the conference associated with the record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
confTitle
(val)¶ The CT Tag¶
extracts the title of the conference associated with the Record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
docType
(val)¶ The DT Tag¶
extracts the type of document the Record contains
-
metaknowledge.WOS.tagProcessing.tagFunctions.
documentDeliveryNumber
(val)¶ The GA Tag¶
extracts the document delivery number of the Record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
eISSN
(val)¶ The EI Tag¶
extracts the EISSN of the Record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
editedBy
(val)¶ The BE Tag¶
extracts a list of the editors of the Record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
editors
(val)¶ Needs Work¶
currently not well understood, returns val
-
metaknowledge.WOS.tagProcessing.tagFunctions.
email
(val)¶ The EM Tag¶
extracts a list of emails given by the authors of the Record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
endingPage
(val)¶ The EP Tag¶
return the last page the record occurs on as a string, not all are integers
-
metaknowledge.WOS.tagProcessing.tagFunctions.
funding
(val)¶ The FU Tag¶
extracts a list of the groups funding the Record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
fundingText
(val)¶ The FX Tag¶
extracts a string of the funding thanks
-
metaknowledge.WOS.tagProcessing.tagFunctions.
group
(val)¶ The GP Tag¶
extracts the group associated with the Record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
groupName
(val)¶ The CA Tag¶
extracts the name of the group associated with the Record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
isoAbbreviation
(val)¶ The JI Tag¶
extracts the iso abbreviation of the journal
-
metaknowledge.WOS.tagProcessing.tagFunctions.
issue
(val)¶ The IS Tag¶
extracts a string giving the issue or range of issues the Record was in, not all are integers
-
metaknowledge.WOS.tagProcessing.tagFunctions.
j9
(val)¶ The J9 Tag¶
extracts the J9 (29-Character Source Abbreviation) of the publication
-
metaknowledge.WOS.tagProcessing.tagFunctions.
journal
(val)¶ The SO Tag¶
extracts the full name of the publication and normalizes it to uppercase
-
metaknowledge.WOS.tagProcessing.tagFunctions.
keywords
(val)¶ The ID Tag¶
extracts the WOS keywords of the Record. The WOS description is:
KeyWords Plus are index terms created by Thomson Reuters from significant, frequently occurring words in the titles of an article's cited references.
-
metaknowledge.WOS.tagProcessing.tagFunctions.
language
(val)¶ The LA Tag¶
extracts the languages of the Record as a string with languages separated by ‘, ‘, usually there is only one language
-
metaknowledge.WOS.tagProcessing.tagFunctions.
meetingAbstract
(val)¶ The MA Tag¶
extracts the ID of the meeting abstract prefixed by ‘EPA-‘
-
metaknowledge.WOS.tagProcessing.tagFunctions.
month
(val)¶ The PD Tag¶
extracts the month the record was published in as an int with January as 1, February 2, …
-
metaknowledge.WOS.tagProcessing.tagFunctions.
orcID
(val)¶ The OI Tag¶
extracts a list of orc IDs of the Record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
pageCount
(val)¶ The PG Tag¶
returns an integer giving the number of pages of the Record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
partNumber
(val)¶ The PN Tag¶
return an integer giving the part of the issue the Record is in
-
metaknowledge.WOS.tagProcessing.tagFunctions.
pubMedID
(val)¶ The PM Tag¶
extracts the pubmed ID of the record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
pubType
(val)¶ The PT Tag¶
extracts the type of publication as a character: conference, book, journal, book in series, or patent
-
metaknowledge.WOS.tagProcessing.tagFunctions.
publisher
(val)¶ The PU Tag¶
extracts the publisher of the Record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
publisherAddress
(val)¶ The PA Tag¶
extracts the publisher's address
-
metaknowledge.WOS.tagProcessing.tagFunctions.
publisherCity
(val)¶ The PI Tag¶
extracts the city the publisher is in
-
metaknowledge.WOS.tagProcessing.tagFunctions.
reprintAddress
(val)¶ The RP Tag¶
extracts the reprint address string
-
metaknowledge.WOS.tagProcessing.tagFunctions.
seriesSubtitle
(val)¶ The BS Tag¶
extracts the subtitle of the series the Record is in
-
metaknowledge.WOS.tagProcessing.tagFunctions.
seriesTitle
(val)¶ The SE Tag¶
extracts the title of the series the Record is in
-
metaknowledge.WOS.tagProcessing.tagFunctions.
specialIssue
(val)¶ The SI Tag¶
extracts the special issue value
-
metaknowledge.WOS.tagProcessing.tagFunctions.
subjectCategory
(val)¶ The SC Tag¶
extracts a list of the subjects associated with the Record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
subjects
(val)¶ The WC Tag¶
extracts a list of subjects as assigned by WOS
-
metaknowledge.WOS.tagProcessing.tagFunctions.
supplement
(val)¶ The SU Tag¶
extracts the supplement number
-
metaknowledge.WOS.tagProcessing.tagFunctions.
title
(val)¶ The TI Tag¶
extracts the title of the record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
totalTimesCited
(val)¶ The Z9 Tag¶
extracts the total number of citations of the record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
volume
(val)¶ The VL Tag¶
return the volume the record is in as a string, not all are integers
-
metaknowledge.WOS.tagProcessing.tagFunctions.
wosString
(val)¶ The UT Tag¶
extracts the WOS number of the record as a string preceded by “WOS:”
Dict Functions¶
-
metaknowledge.WOS.tagProcessing.funcDicts.
isTagOrName
(val)¶ Checks if val is a tag or the full name of a tag; if so, returns
True
-
metaknowledge.WOS.tagProcessing.funcDicts.
normalizeToName
(val)¶ Converts tags or full names to full names, case sensitive
-
metaknowledge.WOS.tagProcessing.funcDicts.
normalizeToTag
(val)¶ Converts tags or full names to 2 character tags, case insensitive
-
metaknowledge.WOS.tagProcessing.funcDicts.
tagToFull
(tag)¶ A wrapper for
tagToFullDict
, it maps 2 character tags to their full names.
Backend¶
This file contains the Record class for metaknowledge and one helper function for parsing WOS records, recordParser. The Record class is used to represent a single record's metadata from WOS.
-
class
metaknowledge.WOS.recordWOS.
WOSRecord
(inRecord, sFile='', sLine=0)¶ Bases:
metaknowledge.mkRecord.ExtendedRecord
Class for full WOS records
It is meant to be immutable; many of the methods and attributes are evaluated when first called, not when the object is created, and the results are stored privately.
The record’s meta-data is stored in an ordered dictionary labeled by WOS tags. To access the raw data stored in the original record the tags() method can be used. To access data that has been processed and cleaned the attributes named after the tags are used.
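The evaluate-on-first-access behaviour described above can be illustrated with a small stand-in class. This is a sketch under assumptions, not WOSRecord itself: `LazyRecord`, its `raw` method and the `processors` argument are hypothetical names chosen for the example.

```python
from collections import OrderedDict


class LazyRecord:
    """Hypothetical stand-in for a record; not metaknowledge's WOSRecord."""

    def __init__(self, raw_fields, processors):
        self._raw = OrderedDict(raw_fields)  # tag -> list of raw lines
        self._processors = processors        # tag -> cleaning function
        self._cache = {}                     # tag -> processed value

    def raw(self, tag):
        """Return the unprocessed lines stored for tag, or None."""
        return self._raw.get(tag)

    def get(self, tag):
        """Return the cleaned value for tag, computing it at most once."""
        if tag not in self._raw:
            return None
        if tag not in self._cache:
            # Fall back to the raw lines when no processor is registered.
            func = self._processors.get(tag, lambda val: val)
            self._cache[tag] = func(self._raw[tag])
        return self._cache[tag]


calls = []


def title(val):
    calls.append("TI")  # record each call so the caching is visible
    return " ".join(val)


rec = LazyRecord({"TI": ["A TITLE SPLIT", "OVER TWO LINES"]}, {"TI": title})
print(rec.get("TI"))
print(rec.get("TI"))
print(len(calls))  # 1, the second access is served from the cache
```

Storing results privately and never mutating the raw fields is what lets such an object behave as if it were immutable from the outside.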
Customizations¶
The Record’s hashing and equality testing are based on the WOS number (the tag is ‘UT’, also called the accession number). WOS numbers are strings starting with 'WOS:' followed by 15 or so numbers and letters, although both the length and character set are known to vary. The numbers are unique to each record, so they are used for comparisons. If a record is bad, all equality checks return False.
When converted to a string the record’s title is used, so for a record R, R.TI == R.title == str(R), and its representation uses the WOS number instead of the memory location.
Attributes¶
When a record is created, if the parsing of the WOS file failed it is marked as bad. The bad attribute is set to True and the error attribute is created to contain the exception object.
Generally, to get the information from a Record its attributes should be used. For a Record R, calling R.CR causes citations() from the tagProcessing module to be called on the contents of the raw ‘CR’ field. The result is then saved and returned. In this case, a list of Citation objects is returned. You can also call R.citations to get the same effect, as each known field tag has a longer name (currently there are 61 field tags). These names are meant to make accessing tags more readable, and the mapping from tag to name can be found in the tagToFull dict. If a tag is known (in tagToFull) but not in the raw data, None is returned instead. Most tags when cleaned return a string or list of strings; the exact results can be found in the help for the particular function.
The attribute authors is also defined as a convenience and returns the same as ‘AF’ or, if that is not found, ‘AU’.
__Init__¶
Records are generally created as collections in RecordCollections, and not as individual objects. It is possible to create one on its own; the arguments are as follows.
Parameters¶
inRecord :
file stream, dict, str or itertools.chain
If it is a file stream the file must be open at the location of the first tag in the record, usually ‘PT’, and the file will be read until ‘ER’ is found, which indicates the end of the record in the file.
If a dict is passed the dictionary is used as the database of fields and tags, so each key is considered a WOS tag and each value a list of the lines of the original associated with the tag. This is the same form of dict that recordParser returns.
For a string the input must be the raw textual data of a single record in the WOS style; like the file stream, it must start at the first tag and end in 'ER'.
itertools.chain is treated identically to a file stream and is used by RecordCollections.
sFile :
optional [str]
Is the name of the file the raw data was in; by default it is blank. It is mostly used to make error messages more informative.
sLine :
optional [int]
Is the line the record starts on in the raw data file. It is mostly used to make error messages more informative.
-
UT
¶ Returns the UT tag (WOS number) of the record
-
authGenders
(countsOnly=False, fractionsMode=False, _countsTuple=False)¶ Creates a dict mapping 'Male', 'Female' and 'Unknown' to lists of the names of all the authors.
-
bibString
(maxLength=1000, WOSMode=False, restrictedOutput=False, niceID=True)¶ Makes a string giving the Record as a BibTeX entry. If the Record is of a journal article (PT J) the BibTeX type is set to 'article', otherwise it is set to 'misc'. The ID of the entry is the WOS number and all the Record’s fields are given as entries with their long names.
Note: This is not meant to be used directly with LaTeX; none of the special characters have been escaped and there are a large number of unnecessary fields provided. niceID and maxLength have been provided to make conversions easier.
Note: Record entries that are lists have their values separated with the string ' and '
-
copy
()¶ Correctly copies the
Record
-
createCitation
(multiCite=False)¶ Creates a citation string, using the same format as other WOS citations, for the Record by reading the relevant special tags ('year', 'J9', 'volume', 'beginningPage', 'DOI') and using them to create a Citation object.
-
encoding
()¶ An
abstractmethod
, gives the encoding string of the record.
-
get
(tag, default=None, raw=False)¶ Allows access to the raw values or is an Exception safe wrapper to
__getitem__
.
-
static
getAltName
(tag)¶ An
abstractmethod
, gives the alternate name of tag or None
-
getCitations
(field=None, values=None, pandasFriendly=True)¶ Creates a pandas ready dict with each row a different citation and columns containing the original string, year, journal and author’s name.
There are also options to filter the output citations with field and values
-
id
¶
-
items
(raw=False)¶ Like
items
for dicts but with a raw
option
-
keys
() → a set-like object providing a view on D's keys¶
-
sourceFile
¶
-
sourceLine
¶
-
specialFuncs
(key)¶ An
abstractmethod
, processes the special tag key using the whole Record
-
subDict
(tags, raw=False)¶ Creates a dict of values of tags from the Record. The tags are the keys and the values are the values. If the tag is missing the value will be
None
.
-
static
tagProcessingFunc
(tag)¶ An
abstractmethod
, gives the function for processing tag
-
title
¶
-
values
(raw=False)¶ Like
values
for dicts but with a raw
option
-
wosString
¶ Returns the WOS number (UT tag) of the record
-
writeRecord
(infile)¶ Writes to infile the original contents of the Record. This is intended for use by RecordCollections to write to file. What is written to infile is bit for bit identical to the original record file (if utf-8 is used). No newline is inserted above the write but the last character is a newline.
-
-
metaknowledge.WOS.recordWOS.
recordParser
(paper)¶ This is a function that is used to create Records from files.
recordParser() reads the file paper until it reaches ‘ER’. For each field tag it adds an entry to the returned dict with the tag as the key and a list of the entries as the value. The list holds each line separately, so for the following two lines in a record:
AF BREVIK, I
   ANICIN, B
The entry in the returned dict would be
{'AF' : ["BREVIK, I", "ANICIN, B"]}
Record
objects can be created with these dictionaries as the initializer.
Parameters¶
paper :
file stream
An open file, with the current line at the beginning of the WOS record.
Returns¶
OrderedDict[str : List[str]]
A dictionary mapping WOS tags to lists, the lists are of strings, each string is a line of the record associated with the tag.
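The parsing behaviour described for recordParser can be sketched as below. This is an illustration only, not the library's code: `parse_record` is a hypothetical name, and it assumes that every tagged line is "two-character tag, space, value" and that continuation lines are indented by three spaces.

```python
import io
from collections import OrderedDict


def parse_record(lines):
    """Hypothetical parser: map each tag to its lines, stopping at 'ER'."""
    fields = OrderedDict()
    current = None
    for line in lines:
        line = line.rstrip("\n")
        tag, value = line[:2], line[3:]
        if tag == "ER":
            break  # end of this record
        if tag.strip():
            # A non-blank tag column starts a new field.
            current = tag
            fields[current] = [value]
        elif current is not None and line.strip():
            # An indented line continues the previous field.
            fields[current].append(value)
    return fields


sample = io.StringIO("PT J\nAF BREVIK, I\n   ANICIN, B\nER\n")
print(parse_record(sample))
```

Because the function stops at 'ER' without consuming further lines, the same open stream can be handed to it again to read the next record.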