metaknowledge¶
A Python3 package for doing computational research on knowledge
metaknowledge is a Python3 package for doing computational research in bibliometrics, scientometrics, and network analysis. It can also be easily used to simplify the process of doing systematic reviews in any disciplinary context.
metaknowledge reads a directory of plain text files containing meta-data on publications and citations, and writes to a variety of data structures that are suitable for longitudinal research, computational text analysis (e.g. topic models and burst analysis), Reference Publication Year Spectroscopy (RPYS), and network analysis (including multi-modal, multi-level, and dynamic). It handles large datasets (e.g. several million records) efficiently.
metaknowledge currently handles data from the Web of Science, PubMed, Scopus, Proquest Dissertations & Theses, and administrative data from the National Science Foundation and the Canadian tri-council granting agencies: SSHRC, CIHR, and NSERC.
Datasets created with metaknowledge can be analyzed using NetworkX and the standard libraries for data analysis in Python. It is also easy to write data to csv or graphml files for analysis and visualization in R, Stata, Visone, Gephi, or any other tools for data analysis.
metaknowledge also has a simple command line tool for extracting quantitative datasets and network files from Web of Science files. This makes the library more accessible to researchers who do not know Python, and makes it easier to quickly explore new datasets.
Contact¶
Citation¶
If you are using metaknowledge for research that will be published or publicly distributed, please acknowledge us with the following citation:
Reid McIlroy-Young, John McLevey, and Jillian Anderson. 2015. metaknowledge: open source software for social networks, bibliometrics, and sociology of knowledge research. URL: http://www.networkslab.org/metaknowledge.
License¶
metaknowledge is free and open source software, distributed under the GPL License.
Installation¶
Note: For a more recent guide to getting started, please visit the NetLab blog.
metaknowledge has two distributions. The simplest is found under the release branch of the git repo, which can be installed the usual way with pip:
pip3 install metaknowledge
The second version is at the master branch on Github. It comes with extra documents and resources for teaching.
The download from Github includes a customized Vagrant file that installs metaknowledge and other useful Python libraries into a virtual machine. It is the easiest way of getting metaknowledge working if you are not familiar with Python.
Install with Vagrant¶
The Vagrant method is intended for students and anyone not familiar with Python. It creates a virtual machine with metaknowledge installed, along with the Python scientific stack (numpy, scipy, and matplotlib) and a series of iPython notebooks for teaching metaknowledge and Python. Some notebooks are more complete than others.
Those familiar with the command line should use the advanced instructions. Otherwise, it is probably best to use the student install.
Student Install¶
First, you need to install Vagrant and VirtualBox. You need to do this before you can install metaknowledge.
Once Vagrant and VirtualBox are installed, download metaknowledge. Unzip the file. If you are unable to unzip the file, download 7-zip.
Open the directory metaknowledge and go to the vagrant subdirectory. Depending on your operating system, double click either: win_run, mac_run, or linux_run.
A window should pop up and say something like:
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Box 'ubuntu/trusty64' could not be found. Attempting to find and install...
default: Box Provider: virtualbox
default: Box Version: >= 0
You will also see an estimate of how long the download and installation process will take (typically 20 minutes). All you have to do is wait for it to finish. When it is done, a browser window will appear showing the notebooks. If the browser window opens and shows No data received, hit refresh a couple of times.
When you see a page with the following, you have installed everything successfully:
Lesson-1-Getting-Started
Lesson-2-Reading-Files
Lesson-3-Objects
...
To open the page again, just double click on whichever of win_run, mac_run, or linux_run you used. It should take less than a minute the second time.
Advanced Instructions¶
- Install Vagrant and VirtualBox.
- Clone the git repo <https://github.com/networks-lab/metaknowledge.git>.
- Make sure you are on the master branch.
- Go to the vagrant directory.
- Run vagrant up
- Once vagrant has finished go to http://localhost:1159/
What you are doing by running vagrant up is creating an Ubuntu VM and provisioning it with the script bootstrap, which is also in the vagrant directory. If you run vagrant up again it only starts the VM. To access the VM’s notebook once it is created:
- Go to the vagrant directory.
- Run vagrant up
- Once vagrant has finished go to http://localhost:1159/
You can also use vagrant ssh to ssh into the VM or vagrant provision to rerun bootstrap. If vagrant ssh does not work on your machine, you should be able to ssh into it at:
HostName: 127.0.0.1
Port: 2222
Username: vagrant
Password: vagrant
On Windows, PuTTY has been tested and works well.
Install without Vagrant¶
Installing without Vagrant is done with setuptools. Go to the metaknowledge directory and run python3 setup.py install. This is the same version that is installed via pip, plus some extra development command line tools.
Extending MK¶
Coming soon
Questions?¶
If you find bugs, or have questions, please write to:
Documentation¶
Basic Example¶
metaknowledge is a Python3 package that simplifies bibliometric and computational analysis of Web of Science data.
- To load the data from files and make a network:
>>> import metaknowledge as mk
>>> RC = mk.RecordCollection("records/")
>>> print(RC)
Collection of 33 records
>>> G = RC.coCiteNetwork(nodeType = 'journal')
Done making a co-citation network of files-from-records 1.1s
>>> print(len(G.nodes()))
223
>>> mk.writeGraph(G, "Cocitation-Network-of-Journals")
There is also a simple command line program called metaknowledge that comes with the package. It allows for creating networks without any need to know Python. More information about it can be found here.
Overview¶
This package can read the files downloaded from the Thomson Reuters’ Web of Science (WOS), Elsevier’s Scopus, ProQuest and Medline files from PubMed. These files contain entries on the metadata of scientific records, such as authors, title, and citations. metaknowledge can also read grants from various organizations including NSF and NSERC which are handled similarly to records.
The metaknowledge.RecordCollection class can take a path to one or more of these files, then load and parse them. This object is the main way to work with multiple records. For each individual record it creates an instance of the metaknowledge.Record class that contains the results of parsing the record.
The files read by metaknowledge are databases containing a series of tags (implicitly or explicitly), e.g. 'TI' is the title for WOS. Each tag has one or more values, and metaknowledge can read them and extract useful information. As the tags differ between providers, a small set of values can be accessed by special tags, which are listed in commonRecordFields. These special tags can act on the whole Record and as such may contain information provided by any number of other tags.
Citations are handled by a special Citation class. This class can parse the citations given by WOS and journals cited by Scopus and allows for better comparisons when they are used in graphs.
Note for those reading the docstrings: metaknowledge’s docs are written in markdown and are processed to produce the documentation found at metaknowledge.readthedocs.io, but you should have no problem reading them from the help function.
Modules¶
contour¶
Overview¶
This is the only module that depends on anything besides networkx; it also depends on numpy, scipy and matplotlib.
Functions¶
-
metaknowledge.contour.plotting.
graphDensityContourPlot
(G, iters=50, layout=None, layoutScaleFactor=1, overlay=False, nodeSize=10, axisSamples=100, blurringFactor=0.1, contours=15, graphType='coloured')¶ Creates a 3D plot giving the density of nodes on a 2D plane, as a surface in 3D.
Most of the options are for tweaking the final appearance. layout and layoutScaleFactor allow a graph that has already been laid out to be provided. If a layout is not provided, networkx.spring_layout() is used with iters iterations. Once the graph has been laid out, a grid of axisSamples cells by axisSamples cells is overlaid and the number of nodes in each cell is determined; a Gaussian blur is then applied with a sigma of blurringFactor. This forms a surface in 3 dimensions, which is then plotted.
If the resulting image looks too banded, raise the contours number to ~50.
Parameters¶
G : networkx Graph
The graph to be plotted
iters : optional [int]
Default 50, the number of iterations for the spring layout if layout is not provided.
layout : optional [networkx layout dictionary]
Default None, if provided will be used as a layout of the graph; the maximum distance from the origin along any axis must also be given as layoutScaleFactor, which is by default 1.
layoutScaleFactor : optional [double]
Default 1, the maximum distance from the origin allowed along any axis given by layout, i.e. the layout must fit in a square centered at the origin with side lengths 2 * layoutScaleFactor
overlay : optional [bool]
Default False, if True the 2D graph will be plotted on the X-Y plane at Z = 0.
nodeSize : optional [double]
Default 10, the size of the nodes drawn in the overlay
axisSamples : optional [int]
Default 100, the number of cells used along each axis for sampling. A larger number will mean a lower average density.
blurringFactor : optional [double]
Default 0.1, the sigma value used for smoothing the surface density. The higher this number the smoother the surface.
contours : optional [int]
Default 15, the number of different heights drawn. If this number is low the resultant image will look very banded. It is recommended this be raised above 50 if you want your images to look good. Warning: this will make them much slower to generate and interact with.
graphType : optional [str]
Default 'coloured', if 'coloured' the image will have a density based colourization applied, the only other option is 'solid' which removes the colourization.
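The gridding and smoothing steps described for graphDensityContourPlot can be sketched in plain Python. This is only an illustration of the idea, not the package's implementation: the function names are hypothetical, random points stand in for a spring layout, and sigma is measured in grid cells here, which may differ from how blurringFactor is scaled.

```python
import math
import random


def density_grid(positions, axis_samples=100, scale=1.0):
    """Count the points falling in each cell of an axis_samples x axis_samples grid.

    The grid covers the square [-scale, scale] x [-scale, scale].
    """
    grid = [[0.0] * axis_samples for _ in range(axis_samples)]
    cell = 2 * scale / axis_samples
    for x, y in positions:
        # Clamp so points on the boundary still land in a valid cell
        col = min(max(int((x + scale) / cell), 0), axis_samples - 1)
        row = min(max(int((y + scale) / cell), 0), axis_samples - 1)
        grid[row][col] += 1
    return grid


def gaussian_kernel(sigma):
    """Normalised 1D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(grid, kernel):
    """Convolve every row with the kernel, treating cells outside the grid as zero."""
    radius = len(kernel) // 2
    width = len(grid[0])
    out = []
    for row in grid:
        new_row = []
        for c in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                j = c + k - radius
                if 0 <= j < width:
                    acc += w * row[j]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(grid):
    return [list(col) for col in zip(*grid)]


def blurred_density(positions, axis_samples=100, sigma=1.0, scale=1.0):
    """Grid the points, then apply a separable Gaussian blur (rows, then columns)."""
    grid = density_grid(positions, axis_samples, scale)
    kernel = gaussian_kernel(sigma)
    grid = blur_rows(grid, kernel)
    return transpose(blur_rows(transpose(grid), kernel))


# Random points stand in for node positions produced by a layout
random.seed(0)
points = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
surface = blurred_density(points, axis_samples=20, sigma=1.0)
print(len(surface), len(surface[0]))
```

The resulting grid of heights is the kind of surface that would then be handed to a 3D plotting routine.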
-
metaknowledge.contour.plotting.
quickVisual
(G, showLabel=False)¶ Just makes a simple matplotlib figure and displays it, with each node coloured by its type. You can add labels with showLabel. This looks a bit nicer than the one provided by networkx’s defaults.
Parameters¶
showLabel : optional [bool]
Default False, if True labels will be added to the nodes giving their IDs.
grants¶
Overview¶
baseGrant¶
-
class
metaknowledge.grants.baseGrant.
FallbackGrant
(original, grantdDict, sFile='', sLine=0) A subclass of Grant, it has the same attributes and is returned from the fall back constructor for grants.
-
class
metaknowledge.grants.baseGrant.
Grant
(original, grantdDict, idValue, bad, error, sFile='', sLine=0) -
getInstitutions
(tags=None, seperator=';', _getTag=False) Returns a list of the names of institutions. This is done by looking (in order) for any of the fields in tags and splitting the strings on seperator (in case of multiple institutions). If no strings are found an empty list will be returned.
Note: for some Grants getInstitutions has been overridden and will ignore the arguments and simply provide the institutions.
Parameters¶
tags : optional list[str]
A list of the tags to look for institutions in
seperator : optional str
The string that separates each institution's name within the column
-
getInvestigators
(tags=None, seperator=';', _getTag=False) Returns a list of the names of investigators. This is done by looking (in order) for any of the fields in tags and splitting the strings on seperator. If no strings are found an empty list will be returned.
Note: for some Grants getInvestigators has been overridden and will ignore the arguments and simply provide the investigators.
Parameters¶
tags : optional list[str]
A list of the tags to look for investigators in
seperator : optional str
The string that separates each investigator's name within the column
-
update
(other) Adds all the tag-entry pairs from other to the Grant. If there is a conflict other takes precedence.
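The lookup-and-split behaviour described for getInstitutions and getInvestigators can be sketched with an ordinary dict. This is one reading of the description, under the assumption that the first tag with a value wins; the helper name and the column names in the sample row are hypothetical and not part of metaknowledge.

```python
def split_first_matching_field(entry, tags, separator=";"):
    """Return the values of the first tag in `tags` found in `entry`, split on `separator`."""
    for tag in tags:
        value = entry.get(tag)
        if value:
            # Drop surrounding whitespace and empty pieces, e.g. from a trailing separator
            return [part.strip() for part in value.split(separator) if part.strip()]
    return []


# Hypothetical grant row; the column names are invented for this example
grant_row = {"Institution": "University of Waterloo; McMaster University", "PI": "Smith, A"}

print(split_first_matching_field(grant_row, ["Organization", "Institution"]))
print(split_first_matching_field(grant_row, ["Department"]))
```

The first call skips the missing "Organization" column and splits the "Institution" value; the second finds nothing and returns an empty list.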
-
-
metaknowledge.grants.baseGrant.
csvAndLinesReader
(enumeratedFile, *csvArgs, **csvKwargs)
-
metaknowledge.grants.baseGrant.
isFallbackGrantFile
(fileName, useFileName=True, encoding='latin-1', dialect='excel')
-
metaknowledge.grants.baseGrant.
parserFallbackGrantFile
(fileName, encoding='latin-1', dialect='excel')
cihrGrant¶
-
class
metaknowledge.grants.cihrGrant.
CIHRGrant
(original, grantdDict, sFile, sLine)
-
metaknowledge.grants.cihrGrant.
isCIHRfile
(fileName, useFileName=True)
-
metaknowledge.grants.cihrGrant.
parserCIHRfile
(fileName)
medlineGrant¶
-
class
metaknowledge.grants.medlineGrant.
MedlineGrant
(grantString)
nsercGrant¶
-
class
metaknowledge.grants.nsercGrant.
NSERCGrant
(original, grantdDict, sFile, sLine) -
getInstitutions
(tags=None, seperator=';', _getTag=False) Returns a list with the names of the institutions. The optional arguments are ignored.
-
getInvestigators
(tags=None, seperator=';', _getTag=False) Returns a list of the names of investigators. The optional arguments are ignored.
-
update
(other) Adds all the tag-entry pairs from other to the Grant. If there is a conflict other takes precedence.
-
-
metaknowledge.grants.nsercGrant.
isNSERCfile
(fileName, useFileName=True)
-
metaknowledge.grants.nsercGrant.
parserNSERCfile
(fileName)
nsfGrant¶
-
class
metaknowledge.grants.nsfGrant.
NSFGrant
(grantdDict, sFile) -
getInstitutions
(tags=None, seperator=';', _getTag=False) Returns a list with the names of the institutions. The optional arguments are ignored.
-
getInvestigators
(tags=None, seperator=';', _getTag=False) Returns a list of the names of investigators. The optional arguments are ignored.
-
-
metaknowledge.grants.nsfGrant.
isNSFfile
(fileName, useFileName=True)
-
metaknowledge.grants.nsfGrant.
parserNSFfile
(fileName)
scopusGrant¶
-
class
metaknowledge.grants.scopusGrant.
ScopusGrant
(grantString)
journalAbbreviations¶
Overview¶
This module handles the journal title abbreviations that WOS employs in citations. They are known as J29 abbreviations and are given by the J9 tag in WOS Records.
The citations provided by WOS use abbreviated journal titles instead of the full names. The full list of abbreviations can be found in a series of pages divided by letter, starting at images.webofknowledge.com/WOK46/help/WOS/A_abrvjt.html.
The function updatej9DB() is used to scrape and parse the pages; it must be run without error before the other features can be used. If the database is requested by getj9dict(), which is what Citations use, and the database is not found or is corrupted, then updatej9DB() will be run to download the database. If this fails an mkException will be raised. The download and parsing usually takes less than a second on a good internet connection.
The other functions of the module are for manually adding and removing abbreviations from the database. It is recommended that this be done with the command-line tool metaknowledge instead of with a script.
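The fallback described above (use the local database, rebuild it once if it is missing or corrupted, and raise if the rebuild fails) follows a common load-or-rebuild pattern. The sketch below shows that pattern only; it stores the data as JSON purely for illustration, and AbbreviationError, load_or_rebuild and fake_download are hypothetical stand-ins rather than names from metaknowledge.

```python
import json
import os
import tempfile


class AbbreviationError(Exception):
    """Stand-in for the mkException mentioned in the overview."""


def load_or_rebuild(path, rebuild):
    """Load a JSON mapping from `path`; if missing or unreadable, call `rebuild` once and retry."""
    for attempt in range(2):
        try:
            with open(path, encoding="utf-8") as f:
                return json.load(f)
        except (OSError, ValueError):
            if attempt == 1:
                break
            try:
                rebuild(path)
            except Exception as exc:
                raise AbbreviationError("could not rebuild the database") from exc
    raise AbbreviationError("database is still unreadable after rebuilding")


def fake_download(path):
    # Stands in for scraping the WOS pages; writes a one-entry database of invented data
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"J EXAMPLE": ["Journal of Examples"]}, f)


db_path = os.path.join(tempfile.mkdtemp(), "abbreviations.json")
print(load_or_rebuild(db_path, fake_download))
```

The first call finds no file, triggers the rebuild, and then loads the freshly written data; later calls read the file directly.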
Functions¶
-
metaknowledge.journalAbbreviations.backend.
addToDB
(abbr=None, dbname='manualj9Abbreviations')¶ Adds abbr to the database of journals. The database is kept separate from the one scraped from WOS and supersedes it. By default the database is stored with the WOS one and its name is given by metaknowledge.journalAbbreviations.manualDBname. To create an empty database run addToDB without an abbr argument.
Parameters¶
abbr : optional [str or dict[str : str]]
The journal abbreviation to be added to the database. It can either be a single string, in which case that string will be added with itself as the full name, or a dict with the abbreviations as keys and their names as strings; use pipes ('|') to separate multiple names. Note, if the empty string is given as a name the abbreviation will be considered manually excluded, i.e. having excludeFromDB() run on it.
dbname : optional [str]
The name of the database file, default is metaknowledge.journalAbbreviations.manualDBname.
-
metaknowledge.journalAbbreviations.backend.
excludeFromDB
(abbr=None, dbname='manualj9Abbreviations')¶ Marks abbr to be excluded from the database of journals. The database is kept separate from the one scraped from WOS and supersedes it. By default the database is stored with the WOS one and its name is given by metaknowledge.journalAbbreviations.manualDBname. To create an empty database run addToDB() without an abbr argument.
Parameters¶
abbr : optional [str or tuple[str] or list[str]]
The journal abbreviation to be excluded from the database. It can either be a single string, in which case that string will be excluded, or a list/tuple of strings giving the abbreviations.
dbname : optional [str]
The name of the database file, default is metaknowledge.journalAbbreviations.manualDBname.
-
metaknowledge.journalAbbreviations.backend.
getj9dict
(dbname='j9Abbreviations', manualDB='manualj9Abbreviations', returnDict='both')¶ Returns the dictionary of journal abbreviations mapping to a list of the associated journal names. By default the local database is used. The database is in the file dbname in the same directory as this source file.
Parameters¶
dbname : optional [str]
The name of the downloaded database file, the default is determined at run time. It is recommended that this remain untouched.
manualDB : optional [str]
The name of the manually created database file, the default is determined at run time. It is recommended that this remain untouched.
returnDict : optional [str]
default 'both', can be used to get both databases or only one with 'WOS' or 'manual'.
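How the two databases might combine when both are requested can be sketched with plain dicts. This is an assumption-laden illustration, not the package's code: it assumes that manual entries replace scraped ones, that names are pipe-separated as described for addToDB, and that an empty string removes the abbreviation. The function name and the sample data are invented.

```python
def merge_abbreviations(wos, manual):
    """Combine two abbreviation mappings; entries in `manual` override those in `wos`."""
    merged = {abbr: list(names) for abbr, names in wos.items()}
    for abbr, names in manual.items():
        if names == "":
            # An empty string marks a manual exclusion
            merged.pop(abbr, None)
        else:
            merged[abbr] = names.split("|")
    return merged


# Invented sample data
wos = {"J ABC": ["Journal of ABC"], "BAD ENTRY": ["Unknown"]}
manual = {"J ABC": "Journal of ABC|J. ABC", "BAD ENTRY": "", "NEW J": "New Journal"}

print(merge_abbreviations(wos, manual))
```

Copying the scraped lists before merging keeps the original mapping unchanged, which makes it easy to compare the two sources afterwards.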
-
metaknowledge.journalAbbreviations.backend.
j9urlGenerator
(nameDict=False)¶ How to get all the urls for the WOS Journal Title Abbreviations. Each varies by only a few characters. These are the urls currently in use; they may change.
They are of the form:
Where {VAL} is a capital letter or the string “0-9”
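Generating one address per page from a {VAL} placeholder can be sketched as follows. The template is an assumption inferred from the example address given in the module overview (with an http:// scheme added), the ordering of the pages is arbitrary, and abbreviation_urls is a hypothetical name.

```python
import string

# Assumed template, inferred from the example address in the module overview
URL_TEMPLATE = "http://images.webofknowledge.com/WOK46/help/WOS/{VAL}_abrvjt.html"


def abbreviation_urls(template=URL_TEMPLATE):
    """Yield one URL per page: '0-9' first, then A to Z."""
    for val in ["0-9"] + list(string.ascii_uppercase):
        yield template.format(VAL=val)


urls = list(abbreviation_urls())
print(len(urls))
print(urls[1])
```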
-
metaknowledge.journalAbbreviations.backend.
updatej9DB
(dbname='j9Abbreviations', saveRawHTML=False)¶ Updates the database of Journal Title Abbreviations. Requires an internet connection. The database is saved relative to the source file, not the working directory.
Parameters¶
dbname : optional [str]
The name of the database file, default is “j9Abbreviations.db”
saveRawHTML : optional [bool]
Determines if the original HTML of the pages is stored, default False. If True they are saved in a directory inside j9Raws beginning with today's date.
medline¶
Overview¶
These are the functions used to process medline (pubmed) files at the backend. They are meant for internal use by metaknowledge.
Functions¶
-
metaknowledge.medline.medlineHandlers.
isMedlineFile
(infile, checkedLines=2)¶ Determines if infile is the path to a Medline file. A file is considered to be a Medline file if it has the correct encoding (latin-1) and within the first checkedLines a line starts with "PMID- ".
Parameters¶
infile : str
The path to the target file
checkedLines : optional [int]
default 2, the number of lines to check for the header
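A header check of this kind can be sketched in a few lines. This is a minimal illustration of the rule as described, not the package's implementation; looks_like_medline is a hypothetical name and the sample files are created only for the demonstration.

```python
import os
import tempfile


def looks_like_medline(path, checked_lines=2):
    """Return True if one of the first `checked_lines` lines starts with 'PMID- '."""
    try:
        # latin-1 maps every byte to a character, so decoding itself cannot fail
        with open(path, encoding="latin-1") as f:
            for _ in range(checked_lines):
                if f.readline().startswith("PMID- "):
                    return True
    except OSError:
        return False
    return False


tmp_dir = tempfile.mkdtemp()
good = os.path.join(tmp_dir, "good.txt")
with open(good, "w", encoding="latin-1") as f:
    f.write("\nPMID- 26524502\nOWN - NLM\n")
bad = os.path.join(tmp_dir, "bad.txt")
with open(bad, "w", encoding="latin-1") as f:
    f.write("TI  - Not a Medline export\n")

print(looks_like_medline(good), looks_like_medline(bad))
```

Reading only the first few lines keeps the check cheap even for very large export files.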
-
metaknowledge.medline.medlineHandlers.
medlineParser
(pubFile)¶ Parses a medline file, pubFile, to extract the individual entries as MedlineRecords.
A medline file is a series of entries, and each entry is a series of tags. A tag is a 2 to 4 character string, padded with trailing spaces to make it 4 characters, which is followed by a dash and a space ('- '). Everything after the tag, and on all following lines that do not start with a tag, is considered associated with the tag. Each entry’s first tag is PMID, so a first line looks something like PMID- 26524502. Entries end with a single blank line.
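The file layout described for medlineParser can be illustrated with a small stand-alone parser. This is a sketch under the stated format only, not the package's parser: parse_medline is a hypothetical name, the sample text is invented apart from the PMID quoted above, and calling strip() on the tag column makes the code indifferent to which side the padding is on.

```python
def parse_medline(text):
    """Split Medline-style text into entries, each a dict mapping tag -> list of values."""
    entries = []
    current = {}
    tag = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current entry
            if current:
                entries.append(current)
            current, tag = {}, None
        elif line[4:6] == "- ":
            # The first four characters hold the tag, padded with spaces
            tag = line[:4].strip()
            current.setdefault(tag, []).append(line[6:])
        elif tag is not None:
            # Continuation line: belongs to the most recent tag
            current[tag][-1] += " " + line.strip()
    if current:
        entries.append(current)
    return entries


sample = (
    "PMID- 26524502\n"
    "TI  - A long title that\n"
    "      continues on the next line\n"
    "AU  - Smith J\n"
    "AU  - Jones K\n"
    "\n"
    "PMID- 12345678\n"
    "TI  - Second entry\n"
)

for entry in parse_medline(sample):
    print(entry["PMID"][0], entry["TI"][0])
```

Repeated tags such as AU accumulate into a list, which matches the idea that each tag can have one or more values.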
Special Functions¶
-
metaknowledge.medline.tagProcessing.specialFunctions.
DOI
(R)¶
-
metaknowledge.medline.tagProcessing.specialFunctions.
address
(R)¶ Gets the first address of the first author
-
metaknowledge.medline.tagProcessing.specialFunctions.
beginningPage
(R)¶ As pages may not be given as numbers this is the most accurate this function can be
-
metaknowledge.medline.tagProcessing.specialFunctions.
month
(R)¶
-
metaknowledge.medline.tagProcessing.specialFunctions.
volume
(R)¶ Returns the first number/word of the volume field, hopefully trimming something like: '49 Suppl 20' to 49
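The trimming described for volume amounts to keeping the first whitespace-separated token. A minimal sketch, assuming the volume field is available as a plain string (first_token is a hypothetical helper, not the package's function):

```python
def first_token(volume_field):
    """Return the first whitespace-separated token of a volume string, or None if empty."""
    parts = volume_field.split()
    return parts[0] if parts else None


print(first_token("49 Suppl 20"))
```

Note that the token stays a string, since volumes are not always numeric.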
-
metaknowledge.medline.tagProcessing.specialFunctions.
year
(R)¶
Tag Functions¶
-
metaknowledge.medline.tagProcessing.tagFunctions.
AB
(val)¶ Abstract: basically a one liner after parsing
-
metaknowledge.medline.tagProcessing.tagFunctions.
AD
(val)¶ Affiliation: undoes what the parser does, then splits at the semicolons and drops newlines. Extra filtering is required because some ADs end with a semicolon.
-
metaknowledge.medline.tagProcessing.tagFunctions.
AID
(val)¶ ArticleIdentifier: the given values do not require any work
-
metaknowledge.medline.tagProcessing.tagFunctions.
AU
(val)¶ Author
-
metaknowledge.medline.tagProcessing.tagFunctions.
AUID
(val)¶ AuthorIdentifier: one line only, just need to undo the parser’s effects
-
metaknowledge.medline.tagProcessing.tagFunctions.
BTI
(val)¶ BookTitle
-
metaknowledge.medline.tagProcessing.tagFunctions.
CI
(val)¶ CopyrightInformation
-
metaknowledge.medline.tagProcessing.tagFunctions.
CIN
(val)¶ CommentIn
-
metaknowledge.medline.tagProcessing.tagFunctions.
CN
(val)¶ CorporateAuthor
-
metaknowledge.medline.tagProcessing.tagFunctions.
CRDT
(val)¶ CreateDate
-
metaknowledge.medline.tagProcessing.tagFunctions.
CRF
(val)¶ CorrectedRepublishedFrom
-
metaknowledge.medline.tagProcessing.tagFunctions.
CRI
(val)¶ CorrectedRepublishedIn
-
metaknowledge.medline.tagProcessing.tagFunctions.
CTI
(val)¶ CollectionTitle
-
metaknowledge.medline.tagProcessing.tagFunctions.
DA
(val)¶ DateCreated
-
metaknowledge.medline.tagProcessing.tagFunctions.
DCOM
(val)¶ DateCompleted
-
metaknowledge.medline.tagProcessing.tagFunctions.
DDIN
(val)¶ DatasetIn
-
metaknowledge.medline.tagProcessing.tagFunctions.
DEP
(val)¶ DateElectronicPublication
-
metaknowledge.medline.tagProcessing.tagFunctions.
DP
(val)¶ DatePublication
-
metaknowledge.medline.tagProcessing.tagFunctions.
DRIN
(val)¶ DatasetUseReportedIn
-
metaknowledge.medline.tagProcessing.tagFunctions.
EDAT
(val)¶ EntrezDate
-
metaknowledge.medline.tagProcessing.tagFunctions.
EFR
(val)¶ ErratumFor
-
metaknowledge.medline.tagProcessing.tagFunctions.
EIN
(val)¶ ErratumIn
-
metaknowledge.medline.tagProcessing.tagFunctions.
EN
(val)¶ Edition
-
metaknowledge.medline.tagProcessing.tagFunctions.
FAU
(val)¶ FullAuthor
-
metaknowledge.medline.tagProcessing.tagFunctions.
FED
(val)¶ Editor
-
metaknowledge.medline.tagProcessing.tagFunctions.
FIR
(val)¶ InvestigatorFull
-
metaknowledge.medline.tagProcessing.tagFunctions.
FPS
(val)¶ FullPersonalNameSubject
-
metaknowledge.medline.tagProcessing.tagFunctions.
GN
(val)¶ GeneralNote
-
metaknowledge.medline.tagProcessing.tagFunctions.
GR
(val)¶ GrantNumber
-
metaknowledge.medline.tagProcessing.tagFunctions.
GS
(val)¶ GeneSymbol
-
metaknowledge.medline.tagProcessing.tagFunctions.
IP
(val)¶ Issue
-
metaknowledge.medline.tagProcessing.tagFunctions.
IR
(val)¶ Investigator
-
metaknowledge.medline.tagProcessing.tagFunctions.
IRAD
(val)¶ InvestigatorAffiliation
-
metaknowledge.medline.tagProcessing.tagFunctions.
IS
(val)¶ ISSN
-
metaknowledge.medline.tagProcessing.tagFunctions.
ISBN
(val)¶
-
metaknowledge.medline.tagProcessing.tagFunctions.
JID
(val)¶ NLMID
-
metaknowledge.medline.tagProcessing.tagFunctions.
JT
(val)¶ JournalTitle: one line only
-
metaknowledge.medline.tagProcessing.tagFunctions.
LA
(val)¶ Language
-
metaknowledge.medline.tagProcessing.tagFunctions.
LID
(val)¶ LocationIdentifier
-
metaknowledge.medline.tagProcessing.tagFunctions.
LR
(val)¶ DateLastRevised
-
metaknowledge.medline.tagProcessing.tagFunctions.
MH
(val)¶ MeSHTerms
-
metaknowledge.medline.tagProcessing.tagFunctions.
MHDA
(val)¶ MeSHDate
-
metaknowledge.medline.tagProcessing.tagFunctions.
MID
(val)¶ ManuscriptIdentifier
-
metaknowledge.medline.tagProcessing.tagFunctions.
NM
(val)¶ SubstanceName
-
metaknowledge.medline.tagProcessing.tagFunctions.
OABL
(val)¶ OtherAbstract
-
metaknowledge.medline.tagProcessing.tagFunctions.
OCI
(val)¶ OtherCopyright
-
metaknowledge.medline.tagProcessing.tagFunctions.
OID
(val)¶ OtherID
-
metaknowledge.medline.tagProcessing.tagFunctions.
ORI
(val)¶ OriginalReportIn
-
metaknowledge.medline.tagProcessing.tagFunctions.
OT
(val)¶ OtherTerm: nothing needs to be done
-
metaknowledge.medline.tagProcessing.tagFunctions.
OTO
(val)¶ OtherTermOwner: one line field
-
metaknowledge.medline.tagProcessing.tagFunctions.
OWN
(val)¶ Owner
-
metaknowledge.medline.tagProcessing.tagFunctions.
PG
(val)¶ Pagination: all pagination seen so far seems to be only on one line
-
metaknowledge.medline.tagProcessing.tagFunctions.
PHST
(val)¶ PublicationHistoryStatus
-
metaknowledge.medline.tagProcessing.tagFunctions.
PL
(val)¶ PlacePublication
-
metaknowledge.medline.tagProcessing.tagFunctions.
PMC
(val)¶ PubMedCentralIdentifier
-
metaknowledge.medline.tagProcessing.tagFunctions.
PMCR
(val)¶ PubMedCentralRelease
-
metaknowledge.medline.tagProcessing.tagFunctions.
PMID
(val)¶ PubMedUniqueIdentifier
-
metaknowledge.medline.tagProcessing.tagFunctions.
PRIN
(val)¶ PartialRetractionIn
-
metaknowledge.medline.tagProcessing.tagFunctions.
PROF
(val)¶ PartialRetractionOf
-
metaknowledge.medline.tagProcessing.tagFunctions.
PS
(val)¶ PersonalNameSubject
-
metaknowledge.medline.tagProcessing.tagFunctions.
PST
(val)¶ PublicationStatus
-
metaknowledge.medline.tagProcessing.tagFunctions.
PT
(val)¶ PublicationType
-
metaknowledge.medline.tagProcessing.tagFunctions.
PUBM
(val)¶ PublishingModel
-
metaknowledge.medline.tagProcessing.tagFunctions.
RF
(val)¶ NumberReferences
-
metaknowledge.medline.tagProcessing.tagFunctions.
RIN
(val)¶ RetractionIn
-
metaknowledge.medline.tagProcessing.tagFunctions.
RN
(val)¶ RegistryNumber
-
metaknowledge.medline.tagProcessing.tagFunctions.
ROF
(val)¶ RetractionOf
-
metaknowledge.medline.tagProcessing.tagFunctions.
RPF
(val)¶ RepublishedFrom
-
metaknowledge.medline.tagProcessing.tagFunctions.
RPI
(val)¶ RepublishedIn
-
metaknowledge.medline.tagProcessing.tagFunctions.
SB
(val)¶ Subset
-
metaknowledge.medline.tagProcessing.tagFunctions.
SFM
(val)¶ SpaceFlightMission
-
metaknowledge.medline.tagProcessing.tagFunctions.
SI
(val)¶ SecondarySourceID
-
metaknowledge.medline.tagProcessing.tagFunctions.
SO
(val)¶ Source
-
metaknowledge.medline.tagProcessing.tagFunctions.
SPIN
(val)¶ SummaryForPatients
-
metaknowledge.medline.tagProcessing.tagFunctions.
STAT
(val)¶ Status
-
metaknowledge.medline.tagProcessing.tagFunctions.
TA
(val)¶ JournalTitleAbbreviation: one line only
-
metaknowledge.medline.tagProcessing.tagFunctions.
TI
(val)¶ Title: only one per record
-
metaknowledge.medline.tagProcessing.tagFunctions.
TT
(val)¶ TransliteratedTitle
-
metaknowledge.medline.tagProcessing.tagFunctions.
UIN
(val)¶ UpdateIn
-
metaknowledge.medline.tagProcessing.tagFunctions.
UOF
(val)¶ UpdateOf
-
metaknowledge.medline.tagProcessing.tagFunctions.
VI
(val)¶ Volume: the volume as a string, as volume is a single line
-
metaknowledge.medline.tagProcessing.tagFunctions.
VTI
(val)¶ VolumeTitle
Backend¶
-
class
metaknowledge.medline.recordMedline.
MedlineRecord
(inRecord, sFile='', sLine=0)¶ Bases:
metaknowledge.mkRecord.ExtendedRecord
Class for full Medline(Pubmed) entries.
This class is an ExtendedRecord capable of generating its own id number. You should not create them directly, but instead use medlineParser() on a medline file.
-
authGenders
(countsOnly=False, fractionsMode=False, _countsTuple=False)¶ Creates a dict mapping 'Male', 'Female' and 'Unknown' to lists of the names of all the authors.
Parameters¶
countsOnly : optional bool
Default False, if True the counts (lengths of the lists) will be given instead of the lists of names
fractionsMode : optional bool
Default False, if True the fraction counts (lengths of the lists divided by the total number of authors) will be given instead of the lists of names. This supersedes countsOnly
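The relationship between the three output modes of authGenders can be sketched on an already grouped dict. The summarise helper and the sample names are hypothetical; only the mode semantics (counts are list lengths, fractions are lengths divided by the total, and fractions take priority) come from the description above.

```python
def summarise(groups, counts_only=False, fractions_mode=False):
    """Turn {'Male': [...], 'Female': [...], 'Unknown': [...]} into counts or fractions."""
    if fractions_mode:
        total = sum(len(names) for names in groups.values())
        # Avoid dividing by zero when there are no authors at all
        return {k: (len(v) / total if total else 0.0) for k, v in groups.items()}
    if counts_only:
        return {k: len(v) for k, v in groups.items()}
    return groups


# Invented sample grouping
groups = {
    "Male": ["John Smith"],
    "Female": ["Jane Doe", "Mary Major"],
    "Unknown": ["J. Q. Public"],
}

print(summarise(groups, counts_only=True))
print(summarise(groups, counts_only=True, fractions_mode=True))
```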
-
bibString
(maxLength=1000, WOSMode=False, restrictedOutput=False, niceID=True)¶ Makes a string giving the Record as a bibTex entry. If the Record is of a journal article (PT J) the bibTex type is set to 'article', otherwise it is set to 'misc'. The ID of the entry is the WOS number and all the Record’s fields are given as entries with their long names.
Note: This is not meant to be used directly with LaTeX; none of the special characters have been escaped and there are a large number of unnecessary fields provided. niceID and maxLength have been provided to make conversions easier.
Note: Record entries that are lists have their values separated with the string ' and '
Parameters¶
maxLength : optional [int]
default 1000, the max length for a continuous string. Most bibTex implementations only allow strings to be up to 1000 characters (source); this splits them up into substrings then uses the native string concatenation (the '#' character) to allow for longer strings
WOSMode : optional [bool]
default False, if True the data produced will be unprocessed and use double curly braces. This is the style WOS produces bib files in and mostly matches that.
restrictedOutput : optional [bool]
default False, if True the tags output will be limited to those found in metaknowledge.commonRecordFields
niceID : optional [bool]
default True, if True the ID used will be derived from the authors, publishing date and title, if False it will be the UT tag
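The splitting that maxLength describes can be sketched as below. This is an illustration of the idea rather than the package's code: bibtex_value is a hypothetical helper, it uses quoted chunks joined with BibTeX's '#' concatenation operator, and, like bibString itself, it performs no escaping.

```python
def bibtex_value(text, max_length=1000):
    """Render `text` as a BibTeX value, splitting it into quoted chunks joined by '#'."""
    if len(text) <= max_length:
        return '"{}"'.format(text)
    chunks = [text[i:i + max_length] for i in range(0, len(text), max_length)]
    return " # ".join('"{}"'.format(chunk) for chunk in chunks)


# A tiny max_length makes the splitting visible
print(bibtex_value("abcdefgh", max_length=3))
```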
-
copy
()¶ Correctly copies the Record
-
createCitation
(multiCite=False)¶ Creates a citation string, using the same format as other WOS citations, for the Record by reading the relevant special tags ('year', 'J9', 'volume', 'beginningPage', 'DOI') and using it to create a Citation object.
Parameters¶
multiCite : optional [bool]
Default False, if True a tuple of Citations is returned with each having a different one of the record's authors as the author
-
encoding
()¶ An abstractmethod, gives the encoding string of the record.
-
get
(tag, default=None, raw=False)¶ Allows access to the raw values or is an Exception safe wrapper to __getitem__.
Parameters¶
tag : str
The requested tag
default : optional [Object]
Default None, the object returned when tag is not found
raw : optional [bool]
Default False, if True the unprocessed value of tag is returned
-
static
getAltName
(tag)¶ An abstractmethod, gives the alternate name of tag or None
-
getCitations
(field=None, values=None, pandasFriendly=True)¶ Creates a pandas ready dict with each row a different citation and columns containing the original string, year, journal and author’s name.
There are also options to filter the output citations with field and values.
Parameters¶
field : optional str
Default None, if given all citations missing the named field will be dropped.
values : optional str or list[str]
Default None, if field is also given only those citations with one of the strings given in values will be included.
e.g. to get only citations from 1990 or 1991: field = year, values = [1991, 1990]
pandasFriendly : optional bool
Default True, if False a list of the citations will be returned instead of the more complicated pandas dict
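The filtering and the column-oriented output described for getCitations can be sketched with plain dicts. Everything here is illustrative: citations_table, the column names and the sample citations are invented, and real Citation objects are not used.

```python
def citations_table(citations, field=None, values=None):
    """Filter citation dicts, then pivot them into a column-oriented dict."""
    if values is not None and not isinstance(values, (list, tuple, set)):
        values = [values]
    kept = []
    for cite in citations:
        if field is not None:
            if cite.get(field) is None:
                continue  # drop citations missing the field
            if values is not None and cite[field] not in values:
                continue  # drop citations whose value is not wanted
        kept.append(cite)
    columns = ["citeString", "year", "journal", "author"]
    return {col: [cite.get(col) for cite in kept] for col in columns}


# Invented sample citations
cites = [
    {"citeString": "Smith J, 1990, J ABC", "year": 1990, "journal": "J ABC", "author": "Smith J"},
    {"citeString": "Jones K, 1995, J XYZ", "year": 1995, "journal": "J XYZ", "author": "Jones K"},
    {"citeString": "Anon", "year": None, "journal": None, "author": "Anon"},
]

print(citations_table(cites, field="year", values=[1991, 1990]))
```

A dict of equal-length lists like this one is the shape that pandas.DataFrame accepts directly, which is presumably what "pandas ready" refers to.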
-
id
¶
-
items
(raw=False)¶ Like items for dicts but with a raw option
Parameters¶
raw : optional [bool]
Default False, if True the KeysView contains the raw values as the values
-
keys
() → a set-like object providing a view on D's keys¶
-
sourceFile
¶
-
sourceLine
¶
-
specialFuncs
(key)¶ An abstractmethod, processes the special tag, key, using the whole Record
Parameters¶
key :
str
One of the special tags: 'authorsFull', 'keywords', 'grants', 'j9', 'authorsShort', 'volume', 'selfCitation', 'citations', 'address', 'abstract', 'title', 'month', 'year', 'journal', 'beginningPage' and 'DOI'
Returns¶
The processed value of key
-
subDict
(tags, raw=False)¶ Creates a dict of values of tags from the Record. The tags are the keys and the values are the values. If a tag is missing, its value will be None.
Parameters¶
tags :
list[str]
The list of tags requested
raw :
optional [bool]
Default False; if True, the returned values of the dict will be unprocessed
-
static
tagProcessingFunc
(tag)¶ An
abstractmethod
, gives the function for processing tag
-
title
¶
-
values
(raw=False)¶ Like values for dicts but with a raw option
-
writeRecord
(f)¶ Writes the record to f. The output is nearly identical to the original; the FAU tag is the only tag not written in the same place, as doing so would require changing the parser and lots of extra logic.
-
-
metaknowledge.medline.recordMedline.
medlineRecordParser
(record)¶ The parser MedlineRecords use. This takes an entry from medlineParser() and parses it as part of the creation of a MedlineRecord.
proquest¶
Overview¶
These are the functions used to process ProQuest files at the backend. They are meant for internal use by metaknowledge.
Functions¶
-
metaknowledge.proquest.proQuestHandlers.
isProQuestFile
(infile, checkedLines=2)¶ Determines if infile is the path to a ProQuest file. A file is considered to be a ProQuest file if it has the correct encoding (utf-8) and the following header starts within its first checkedLines lines:
____________________________________________________________
Report Information from ProQuest
Parameters¶ infile :
str
The path to the target file
checkedLines :
optional [int]
default 2, the number of lines to check for the header
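The header check described above can be sketched with the standard library alone. This is a minimal illustration, not the code metaknowledge runs: the helper name `header_in_first_lines` is hypothetical, it works on any iterable of lines (an open file would do), and it leaves out the encoding check that isProQuestFile also performs.

```python
import io
from itertools import islice

# Header text taken from the description above
PROQUEST_HEADER = "Report Information from ProQuest"


def header_in_first_lines(lines, checked_lines=2):
    """Return True if one of the first `checked_lines` lines starts with the header.

    Hypothetical helper for illustration only.
    """
    return any(
        line.startswith(PROQUEST_HEADER) for line in islice(lines, checked_lines)
    )


# A stand-in for the top of a ProQuest export: a rule of underscores, then the header
sample = io.StringIO("_" * 60 + "\n" + PROQUEST_HEADER + "\n\n1. Some entry\n")
print(header_in_first_lines(sample))  # True
```

Raising `checked_lines` makes the check more tolerant of leading lines, at the cost of reading further into files that are not ProQuest exports.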
-
metaknowledge.proquest.proQuestHandlers.
proQuestParser
(proFile)¶ Parses a ProQuest file, proFile, to extract the individual entries.
A ProQuest file has three sections: first a list of the contained entries, second the full metadata, and finally a BibTeX-formatted entry for each record. This parser only uses the first two, as the BibTeX section contains no information that the second section does not. Also, the first section is only used to verify the second section. The returned ProQuestRecord contains the data from the second section, with the same key strings as ProQuest uses; the unlabeled sections are named, in order, 'Name', 'Author' and 'url'.
Special Functions¶
Tag Functions¶
-
metaknowledge.proquest.tagProcessing.tagFunctions.
proQuestClassification
(value)¶
-
metaknowledge.proquest.tagProcessing.tagFunctions.
proQuestIdentifier_Keyword
(value)¶
-
metaknowledge.proquest.tagProcessing.tagFunctions.
proQuestSubject
(value)¶
-
metaknowledge.proquest.tagProcessing.tagFunctions.
proQuestTagToFunc
(tag)¶ Takes a tag string, tag, and returns the processing function for its data. If there is no predefined function, returns the identity function (lambda x : x).
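The lookup-with-identity-fallback pattern that proQuestTagToFunc describes can be sketched as follows. The mapping, the tag names and the `split_semicolons` processor are all hypothetical stand-ins; the real processing functions are the ones listed under Tag Functions, and their behaviour is not documented here.

```python
def split_semicolons(value):
    """Hypothetical processor: split a ';'-separated string into clean parts."""
    return [part.strip() for part in value.split(";") if part.strip()]


# Illustrative mapping only; not the table metaknowledge actually uses
TAG_FUNCS = {"Subject": split_semicolons}


def tag_to_func(tag):
    """Return the processor registered for `tag`, or the identity function."""
    return TAG_FUNCS.get(tag, lambda x: x)


print(tag_to_func("Subject")("sociology; networks"))  # ['sociology', 'networks']
print(tag_to_func("Title")("A Thesis"))  # A Thesis
```

Falling back to the identity function means callers never need to check whether a tag is known before processing its value.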
Backend¶
-
class
metaknowledge.proquest.recordProQuest.
ProQuestRecord
(inRecord, recNum=None, sFile='', sLine=0)¶ Bases:
metaknowledge.mkRecord.ExtendedRecord
Class for full ProQuest entries.
This class is an ExtendedRecord capable of generating its own id number. You should not create them directly, but instead use proQuestParser() on a ProQuest file.
-
authGenders
(countsOnly=False, fractionsMode=False, _countsTuple=False)¶ Creates a dict mapping 'Male', 'Female' and 'Unknown' to lists of the names of all the authors.
Parameters¶
countsOnly :
optional bool
Default False; if True, the counts (lengths of the lists) will be given instead of the lists of names
fractionsMode :
optional bool
Default False; if True, the fraction counts (lengths of the lists divided by the total number of authors) will be given instead of the lists of names. This supersedes countsOnly
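To make the relationship between countsOnly and fractionsMode concrete, here is a small sketch that reduces a dict of the shape described above. The function name `summarize_genders` and the sample names are hypothetical, and how the library assigns names to categories is not shown.

```python
def summarize_genders(name_lists, counts_only=False, fractions_mode=False):
    """Reduce a {'Male': [...], 'Female': [...], 'Unknown': [...]} dict.

    Hypothetical helper; fractions_mode takes precedence over counts_only,
    mirroring the documented behaviour.
    """
    if fractions_mode:
        total = sum(len(names) for names in name_lists.values())
        return {
            key: (len(names) / total if total else 0.0)
            for key, names in name_lists.items()
        }
    if counts_only:
        return {key: len(names) for key, names in name_lists.items()}
    return name_lists


sample = {
    "Male": ["Author A"],
    "Female": ["Author B", "Author C"],
    "Unknown": ["Author D"],
}
print(summarize_genders(sample, counts_only=True))
# {'Male': 1, 'Female': 2, 'Unknown': 1}
print(summarize_genders(sample, fractions_mode=True))
# {'Male': 0.25, 'Female': 0.5, 'Unknown': 0.25}
```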
-
bibString
(maxLength=1000, WOSMode=False, restrictedOutput=False, niceID=True)¶ Makes a string giving the Record as a BibTeX entry. If the Record is of a journal article (PT J) the BibTeX type is set to 'article', otherwise it is set to 'misc'. The ID of the entry is the WOS number and all the Record’s fields are given as entries with their long names.
Note: This is not meant to be used directly with LaTeX; none of the special characters have been escaped and there are a large number of unnecessary fields provided. niceID and maxLength have been provided to make conversions easier.
Note: Record entries that are lists have their values separated with the string ' and '
Parameters¶
maxLength :
optional [int]
Default 1000, the maximum length for a continuous string. Most BibTeX implementations only allow strings of up to 1000 characters (source); this splits them into substrings and then uses the native string concatenation (the '#' character) to allow for longer strings
WOSMode :
optional [bool]
Default False; if True, the data produced will be unprocessed and use double curly braces. This is the style WOS produces bib files in, and the output mostly matches it.
restrictedOutput :
optional [bool]
Default False; if True, the tags output will be limited to those found in metaknowledge.commonRecordFields
niceID :
optional [bool]
Default True; if True, the ID used will be derived from the authors, publishing date and title; if False, it will be the UT tag
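The maxLength behaviour, splitting a long value into pieces joined by BibTeX's '#' concatenation operator, can be sketched like this. The function name `split_for_bibtex` is hypothetical and the exact quoting metaknowledge emits may differ; the sketch uses double-quoted chunks and does no escaping.

```python
def split_for_bibtex(text, max_length=1000):
    """Break `text` into quoted chunks joined by BibTeX's '#' operator.

    Hypothetical helper for illustration; it does not escape special characters.
    """
    chunks = [text[i:i + max_length] for i in range(0, len(text), max_length)]
    if not chunks:
        chunks = [""]  # an empty value still needs one (empty) string
    return " # ".join('"{}"'.format(chunk) for chunk in chunks)


# A tiny max_length makes the splitting visible
print(split_for_bibtex("abcdefgh", max_length=3))  # "abc" # "def" # "gh"
```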
-
copy
()¶ Correctly copies the
Record
-
createCitation
(multiCite=False)¶ Creates a citation string, using the same format as other WOS citations, for the Record by reading the relevant special tags ('year', 'J9', 'volume', 'beginningPage', 'DOI') and using them to create a Citation object.
Parameters¶
multiCite :
optional [bool]
Default False; if True, a tuple of Citations is returned, each with a different one of the record’s authors as the author
-
encoding
()¶ An
abstractmethod
, gives the encoding string of the record.
-
get
(tag, default=None, raw=False)¶ Allows access to the raw values, or acts as an exception-safe wrapper around __getitem__.
Parameters¶
tag :
str
The requested tag
default :
optional [Object]
Default None, the object returned when tag is not found
raw :
optional [bool]
Default False; if True, the unprocessed value of tag is returned
-
static
getAltName
(tag)¶ An
abstractmethod
, gives the alternate name of tag or None
-
getCitations
(field=None, values=None, pandasFriendly=True)¶ Creates a pandas ready dict with each row a different citation and columns containing the original string, year, journal and author’s name.
There are also options to filter the output citations with field and values
Parameters¶
field :
optional str
Default None; if given, all citations missing the named field will be dropped.
values :
optional str or list[str]
Default None; if field is also given, only those citations with one of the strings given in values will be included.
e.g. to get only citations from 1990 or 1991:
field = year, values = [1991, 1990]
pandasFriendly :
optional bool
Default True; if False, a list of the citations will be returned instead of the more complicated pandas dict
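The field and values filtering described above can be sketched on plain dicts. This is an assumption-laden illustration: metaknowledge works with Citation objects rather than dicts, and the name `filter_citations` and the sample data are hypothetical.

```python
def filter_citations(citations, field=None, values=None):
    """Drop citations missing `field`; optionally keep only those matching `values`.

    Hypothetical helper operating on plain dicts, for illustration only.
    """
    if field is None:
        return list(citations)
    kept = [c for c in citations if c.get(field) is not None]
    if values is not None:
        if isinstance(values, str):
            values = [values]  # accept a single string as well as a list
        wanted = {str(v) for v in values}
        kept = [c for c in kept if str(c[field]) in wanted]
    return kept


cites = [
    {"author": "Author A", "year": 1990, "journal": "J EXAMPLE"},
    {"author": "Author B", "year": 1995, "journal": "J EXAMPLE"},
    {"author": "Author C", "year": None, "journal": None},
]

# Only citations from 1990 or 1991, as in the example above
print([c["author"] for c in filter_citations(cites, "year", [1991, 1990])])
# ['Author A']
```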
-
id
¶
-
items
(raw=False)¶ Like items for dicts but with a raw option
Parameters¶
raw :
optional [bool]
Default False; if True, the KeysView contains the raw values as the values
-
keys
() → a set-like object providing a view on D's keys¶
-
sourceFile
¶
-
sourceLine
¶
-
specialFuncs
(key)¶ An abstractmethod, processes the special tag, key, using the whole Record
Parameters¶
key :
str
One of the special tags: 'authorsFull', 'keywords', 'grants', 'j9', 'authorsShort', 'volume', 'selfCitation', 'citations', 'address', 'abstract', 'title', 'month', 'year', 'journal', 'beginningPage' and 'DOI'
Returns¶
The processed value of key
-
subDict
(tags, raw=False)¶ Creates a dict of values of tags from the Record. The tags are the keys and the values are the values. If a tag is missing, its value will be None.
Parameters¶
tags :
list[str]
The list of tags requested
raw :
optional [bool]
Default False; if True, the returned values of the dict will be unprocessed
-
static
tagProcessingFunc
(tag)¶ An
abstractmethod
, gives the function for processing tag
-
title
¶
-
values
(raw=False)¶ Like values for dicts but with a raw option
-
writeRecord
(infile)¶ An
abstractmethod
, writes the record in its original form to infile
-
-
metaknowledge.proquest.recordProQuest.
proQuestRecordParser
(enRecordFile, recNum)¶ The parser ProQuestRecords use. This takes an entry from proQuestParser() and parses it as part of the creation of a ProQuestRecord.
Parameters¶
enRecordFile :
enumerate object
A file wrapped by enumerate()
recNum :
int
The number given to the entry in the first section of the ProQuest file
scopus¶
Overview¶
Functions¶
-
metaknowledge.scopus.scopusHandlers.
isScopusFile
(infile, checkedLines=2, maxHeaderDiff=3)¶ Determines if infile is the path to a Scopus csv file. A file is considered to be a Scopus file if it has the correct encoding (utf-8 with BOM (Byte Order Mark)) and within the first checkedLines a line contains the complete header; the list of all header entries, in order, is found in scopus.scopusHeader.
Note: this is for csv files, not plain text files from Scopus; plain text files are not complete.
Parameters¶ infile :
str
The path to the target file
checkedLines :
optional [int]
Default 2, the number of lines to check for the header
maxHeaderDiff :
optional [int]
Default 3, the maximum number of entries by which the potential file’s header may differ from the current known header, metaknowledge.scopus.scopusHeader; if exceeded, False will be returned
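A tolerance-based header check of this kind can be sketched as below. Several things here are assumptions: `KNOWN_HEADER` is a short stand-in for scopus.scopusHeader, the function name `header_matches` is hypothetical, the header is split naively on commas, and "different entries" is read as the symmetric difference between the two sets of column names.

```python
import codecs

# Stand-in for metaknowledge.scopus.scopusHeader, which is much longer
KNOWN_HEADER = ["Authors", "Title", "Year", "Source title", "DOI"]


def header_matches(raw_bytes, checked_lines=2, max_header_diff=3):
    """Check for a UTF-8 BOM and a header close enough to KNOWN_HEADER.

    Hypothetical helper for illustration only.
    """
    if not raw_bytes.startswith(codecs.BOM_UTF8):
        return False
    try:
        text = raw_bytes.decode("utf-8-sig")  # strips the BOM while decoding
    except UnicodeDecodeError:
        return False
    for line in text.splitlines()[:checked_lines]:
        found = {entry.strip() for entry in line.split(",")}
        # Entries present in one header but not the other
        if len(found ^ set(KNOWN_HEADER)) <= max_header_diff:
            return True
    return False


sample = codecs.BOM_UTF8 + "Authors,Title,Year,Source title\n".encode("utf-8")
print(header_matches(sample))  # True: only 'DOI' differs
```

Allowing a few differing entries keeps the check working when an export format gains or drops a column.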
-
metaknowledge.scopus.scopusHandlers.
scopusParser
(scopusFile)¶ Parses a scopus file, scopusFile, to extract the individual lines as ScopusRecords.
A Scopus file is a csv (Comma-separated values) with a complete header, see scopus.scopusHeader for the entries, and each line after it containing a record’s entry. The string valued entries are quoted with double quotes, which means double quotes inside them can cause issues; see scopusRecordParser() for more information.
Special Functions¶
Tag Functions¶
-
metaknowledge.scopus.tagProcessing.tagFunctions.
citeValue
(val)¶
-
metaknowledge.scopus.tagProcessing.tagFunctions.
commaSpaceSeperated
(val)¶
-
metaknowledge.scopus.tagProcessing.tagFunctions.
grantValue
(val)¶
-
metaknowledge.scopus.tagProcessing.tagFunctions.
integralValue
(val)¶
-
metaknowledge.scopus.tagProcessing.tagFunctions.
semicolonSeperated
(val)¶
-
metaknowledge.scopus.tagProcessing.tagFunctions.
semicolonSpaceSeperated
(val)¶
-
metaknowledge.scopus.tagProcessing.tagFunctions.
stringValue
(val)¶
Backend¶
-
class
metaknowledge.scopus.recordScopus.
ScopusRecord
(inRecord, sFile='', sLine=0, header=None)¶ Bases:
metaknowledge.mkRecord.ExtendedRecord
Class for full Scopus entries.
This class is an ExtendedRecord capable of generating its own id number. You should not create them directly, but instead use scopusParser() on a scopus CSV file.
-
authGenders
(countsOnly=False, fractionsMode=False, _countsTuple=False)¶ Creates a dict mapping 'Male', 'Female' and 'Unknown' to lists of the names of all the authors.
Parameters¶
countsOnly :
optional bool
Default False; if True, the counts (lengths of the lists) will be given instead of the lists of names
fractionsMode :
optional bool
Default False; if True, the fraction counts (lengths of the lists divided by the total number of authors) will be given instead of the lists of names. This supersedes countsOnly
-
bibString
(maxLength=1000, WOSMode=False, restrictedOutput=False, niceID=True)¶ Makes a string giving the Record as a BibTeX entry. If the Record is of a journal article (PT J) the BibTeX type is set to 'article', otherwise it is set to 'misc'. The ID of the entry is the WOS number and all the Record’s fields are given as entries with their long names.
Note: This is not meant to be used directly with LaTeX; none of the special characters have been escaped and there are a large number of unnecessary fields provided. niceID and maxLength have been provided to make conversions easier.
Note: Record entries that are lists have their values separated with the string ' and '
Parameters¶
maxLength :
optional [int]
Default 1000, the maximum length for a continuous string. Most BibTeX implementations only allow strings of up to 1000 characters (source); this splits them into substrings and then uses the native string concatenation (the '#' character) to allow for longer strings
WOSMode :
optional [bool]
Default False; if True, the data produced will be unprocessed and use double curly braces. This is the style WOS produces bib files in, and the output mostly matches it.
restrictedOutput :
optional [bool]
Default False; if True, the tags output will be limited to those found in metaknowledge.commonRecordFields
niceID :
optional [bool]
Default True; if True, the ID used will be derived from the authors, publishing date and title; if False, it will be the UT tag
-
copy
()¶ Correctly copies the
Record
-
createCitation
(multiCite=False)¶ Overwriting the general citation creator to deal with scopus weirdness.
Creates a citation string, using the same format as other WOS citations, for the Record by reading the relevant special tags ('year', 'J9', 'volume', 'beginningPage', 'DOI') and using them to create a Citation object.
Parameters¶
multiCite :
optional [bool]
Default False; if True, a tuple of Citations is returned, each with a different one of the record’s authors as the author
-
encoding
()¶ An
abstractmethod
, gives the encoding string of the record.
-
get
(tag, default=None, raw=False)¶ Allows access to the raw values, or acts as an exception-safe wrapper around __getitem__.
Parameters¶
tag :
str
The requested tag
default :
optional [Object]
Default None, the object returned when tag is not found
raw :
optional [bool]
Default False; if True, the unprocessed value of tag is returned
-
static
getAltName
(tag)¶ An
abstractmethod
, gives the alternate name of tag or None
-
getCitations
(field=None, values=None, pandasFriendly=True)¶ Creates a pandas ready dict with each row a different citation and columns containing the original string, year, journal and author’s name.
There are also options to filter the output citations with field and values
Parameters¶
field :
optional str
Default None; if given, all citations missing the named field will be dropped.
values :
optional str or list[str]
Default None; if field is also given, only those citations with one of the strings given in values will be included.
e.g. to get only citations from 1990 or 1991:
field = year, values = [1991, 1990]
pandasFriendly :
optional bool
Default True; if False, a list of the citations will be returned instead of the more complicated pandas dict
-
id
¶
-
items
(raw=False)¶ Like items for dicts but with a raw option
Parameters¶
raw :
optional [bool]
Default False; if True, the KeysView contains the raw values as the values
-
keys
() → a set-like object providing a view on D's keys¶
-
sourceFile
¶
-
sourceLine
¶
-
specialFuncs
(key)¶ An abstractmethod, processes the special tag, key, using the whole Record
Parameters¶
key :
str
One of the special tags: 'authorsFull', 'keywords', 'grants', 'j9', 'authorsShort', 'volume', 'selfCitation', 'citations', 'address', 'abstract', 'title', 'month', 'year', 'journal', 'beginningPage' and 'DOI'
Returns¶
The processed value of key
-
subDict
(tags, raw=False)¶ Creates a dict of values of tags from the Record. The tags are the keys and the values are the values. If a tag is missing, its value will be None.
Parameters¶
tags :
list[str]
The list of tags requested
raw :
optional [bool]
Default False; if True, the returned values of the dict will be unprocessed
-
static
tagProcessingFunc
(tag)¶ An
abstractmethod
, gives the function for processing tag
-
title
¶
-
values
(raw=False)¶ Like values for dicts but with a raw option
-
writeRecord
(f)¶ An
abstractmethod
, writes the record in its original form to infile
-
-
metaknowledge.scopus.recordScopus.
scopusRecordParser
(record, header=None)¶ The parser ScopusRecords use. This takes a line from scopusParser() and parses it as part of the creation of a ScopusRecord.
Note: this is for csv files downloaded from Scopus, not the text records, as those are less complete. Also, Scopus uses double quotes (") to quote strings, such as abstracts, in the csv, so double quotes in the string must be escaped. For reasons not fully understandable by mortals they chose to use two double quotes in a row ("") to represent an escaped double quote. This parser does not unescape these quotes, but it does correctly handle their interactions with the outer double quotes.
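For comparison, the doubled-quote convention can be seen with Python's standard csv module. This is not how scopusRecordParser works: unlike the parser described above, csv.reader does unescape the doubled quotes. The two-column sample is made up for illustration.

```python
import csv
import io

# One header line and one record; the abstract contains an escaped quote pair
raw = 'Title,Abstract\n"A paper","She called it ""novel"", then moved on."\n'

rows = list(csv.reader(io.StringIO(raw)))

# The comma inside the quoted abstract does not split the field,
# and each "" pair is read back as a single double quote
print(rows[1])  # ['A paper', 'She called it "novel", then moved on.']
```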
WOS¶
Overview¶
These are the functions used to process Web of Science (WOS) files at the backend. They are meant for internal use by metaknowledge.
Functions¶
-
metaknowledge.WOS.wosHandlers.
isWOSFile
(infile, checkedLines=3)¶ Determines if infile is the path to a WOS file. A file is considered to be a WOS file if it has the correct encoding (utf-8 with a BOM) and within the first checkedLines a line starts with "VR 1.0".
Parameters¶
infile :
str
The path to the target file
checkedLines :
optional [int]
Default 3, the number of lines to check for the header
-
metaknowledge.WOS.wosHandlers.
wosParser
(isifile)¶ This is a function that is used to create RecordCollections from files.
wosParser() reads the file given by the path isifile, checks that the header is correct, then reads until it reaches EF. All WOS records it encounters are parsed with recordParser() and converted into Records. A list of these Records is returned.
BadWOSFile is raised if an issue is found with the file.
Help Functions¶
-
metaknowledge.WOS.tagProcessing.helpFuncs.
getMonth
(s)¶ Known formats:
Month (“%b”)
Month Day (“%b %d”)
Month-Month (“%b-%b”): this gets coerced to the first %b, dropping the month range
Season (“%s”): this gets coerced to use the first month of the given season
Month Day Year (“%b %d %Y”)
Month Year (“%b %Y”)
Year Month Day (“%Y %m %d”)
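The coercions listed above can be sketched with a small token scan. This is not getMonth itself: the function name `first_month` is hypothetical, and the season abbreviations and the month each season maps to are assumptions, since the source only says the first month of the season is used.

```python
import re

MONTH_NAMES = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
               "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]
MONTHS = {name: number for number, name in enumerate(MONTH_NAMES, start=1)}

# Assumed season-to-first-month mapping, for illustration only
SEASONS = {"SPR": 3, "SUM": 6, "FAL": 9, "WIN": 12}


def first_month(s):
    """Return the month number (January = 1) for a date string, or None."""
    cleaned = s.strip().upper()
    # Split on spaces and hyphens; the first month or season token wins,
    # which drops the end of a range such as JUN-JUL
    for token in re.split(r"[\s\-]+", cleaned):
        key = token[:3]
        if key in MONTHS:
            return MONTHS[key]
        if key in SEASONS:
            return SEASONS[key]
    # Fall back to the "Year Month Day" form, e.g. "2014 03 21"
    match = re.fullmatch(r"\d{4} (\d{1,2}) \d{1,2}", cleaned)
    if match and 1 <= int(match.group(1)) <= 12:
        return int(match.group(1))
    return None


print(first_month("JUN-JUL"))     # 6
print(first_month("AUG 5 2013"))  # 8
print(first_month("2014 03 21"))  # 3
```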
-
metaknowledge.WOS.tagProcessing.helpFuncs.
makeBiDirectional
(d)¶ Helper for generating tagNameConverter. Makes a dict that maps from key to value and back
-
metaknowledge.WOS.tagProcessing.helpFuncs.
reverseDict
(d)¶ Helper for generating fullToTag. Makes a dict of value to key
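Both helpers amount to small dict transformations. The following sketch shows one way to write them; the snake_case names and the two-entry sample mapping are illustrative and are not the library's code.

```python
def reverse_dict(d):
    """Make a dict mapping each value back to its key."""
    return {value: key for key, value in d.items()}


def make_bidirectional(d):
    """Make a dict that maps key to value and value to key."""
    both = dict(d)
    both.update(reverse_dict(d))
    return both


tags = {"TI": "title", "AB": "abstract"}  # sample mapping for illustration
print(reverse_dict(tags))                 # {'title': 'TI', 'abstract': 'AB'}
print(make_bidirectional(tags)["title"])  # TI
print(make_bidirectional(tags)["TI"])     # title
```

Both sketches assume the values are hashable and unique; duplicate values would silently overwrite each other in the reversed mapping.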
Tag Functions¶
-
metaknowledge.WOS.tagProcessing.tagFunctions.
DOI
(val)¶ The DI Tag¶ return the DOI number of the record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
ISBN
(val)¶ The BN Tag¶ extracts a list of ISBNs associated with the Record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
ResearcherIDnumber
(val)¶ The RI Tag¶ extracts a list of the research IDs of the Record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
abstract
(val)¶ The AB Tag¶ return abstract of the record, with newlines hopefully in the correct places
-
metaknowledge.WOS.tagProcessing.tagFunctions.
articleNumber
(val)¶ The AR Tag¶ extracts a string giving the article number, not all are integers
-
metaknowledge.WOS.tagProcessing.tagFunctions.
authAddress
(val)¶ The C1 Tag¶ extracts the address of the authors as given by WOS. Warning the mapping of author to address is not very good and is given in multiple ways.
-
metaknowledge.WOS.tagProcessing.tagFunctions.
authKeywords
(val)¶ The DE Tag¶ extracts the keywords assigned by the author of the Record. The WOS description is:
Author keywords are included in records of articles from 1991 forward. They are also include in conference proceedings records.
The AF Tag¶ extracts a list of authors full names
The AU Tag¶ extracts a list of authors shortened names
-
metaknowledge.WOS.tagProcessing.tagFunctions.
beginningPage
(val)¶ The BP Tag¶ extracts the first page the record occurs on, not all are integers
-
metaknowledge.WOS.tagProcessing.tagFunctions.
bookAuthor
(val)¶ The BA Tag¶ extracts a list of the short names of the authors of a book Record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
bookAuthorFull
(val)¶ The BF Tag¶ extracts a list of the long names of the authors of a book Record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
bookDOI
(val)¶ The D2 Tag¶ extracts the book DOI of the Record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
citations
(val)¶ The CR Tag¶ extracts a list of all the citations in the record, the citations are the metaknowledge.Citation class.
-
metaknowledge.WOS.tagProcessing.tagFunctions.
citedRefsCount
(val)¶ The NR Tag¶ extracts the number of citations, the length of the CR list
-
metaknowledge.WOS.tagProcessing.tagFunctions.
confDate
(val)¶ The CY Tag¶ extracts the date string of the conference associated with the Record, the date is not normalized
-
metaknowledge.WOS.tagProcessing.tagFunctions.
confHost
(val)¶ The HO Tag¶ extracts the host of the conference
-
metaknowledge.WOS.tagProcessing.tagFunctions.
confLocation
(val)¶ The CL Tag¶ extracts the string giving the conference’s location
-
metaknowledge.WOS.tagProcessing.tagFunctions.
confSponsors
(val)¶ The SP Tag¶ extracts a list of sponsors for the conference associated with the record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
confTitle
(val)¶ The CT Tag¶ extracts the title of the conference associated with the Record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
docType
(val)¶ The DT Tag¶ extracts the type of document the Record contains
-
metaknowledge.WOS.tagProcessing.tagFunctions.
documentDeliveryNumber
(val)¶ The GA Tag¶ extracts the document delivery number of the Record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
eISSN
(val)¶ The EI Tag¶ extracts the EISSN of the Record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
editedBy
(val)¶ The BE Tag¶ extracts a list of the editors of the Record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
editors
(val)¶ Needs Work¶ currently not well understood, returns val
-
metaknowledge.WOS.tagProcessing.tagFunctions.
email
(val)¶ The EM Tag¶ extracts a list of emails given by the authors of the Record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
endingPage
(val)¶ The EP Tag¶ returns the last page the record occurs on as a string, not all are integers
-
metaknowledge.WOS.tagProcessing.tagFunctions.
funding
(val)¶ The FU Tag¶ extracts a list of the groups funding the Record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
fundingText
(val)¶ The FX Tag¶ extracts a string of the funding thanks
-
metaknowledge.WOS.tagProcessing.tagFunctions.
group
(val)¶ The GP Tag¶ extracts the group associated with the Record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
groupName
(val)¶ The CA Tag¶ extracts the name of the group associated with the Record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
isoAbbreviation
(val)¶ The JI Tag¶ extracts the iso abbreviation of the journal
-
metaknowledge.WOS.tagProcessing.tagFunctions.
issue
(val)¶ The IS Tag¶ extracts a string giving the issue or range of issues the Record was in, not all are integers
-
metaknowledge.WOS.tagProcessing.tagFunctions.
j9
(val)¶ The J9 Tag¶ extracts the J9 (29-Character Source Abbreviation) of the publication
-
metaknowledge.WOS.tagProcessing.tagFunctions.
journal
(val)¶ The SO Tag¶ extracts the full name of the publication and normalizes it to uppercase
-
metaknowledge.WOS.tagProcessing.tagFunctions.
keywords
(val)¶ The ID Tag¶ extracts the WOS keywords of the Record. The WOS description is:
KeyWords Plus are index terms created by Thomson Reuters from significant, frequently occurring words in the titles of an article's cited references.
-
metaknowledge.WOS.tagProcessing.tagFunctions.
language
(val)¶ The LA Tag¶ extracts the languages of the Record as a string with languages separated by ‘, ‘, usually there is only one language
-
metaknowledge.WOS.tagProcessing.tagFunctions.
meetingAbstract
(val)¶ The MA Tag¶ extracts the ID of the meeting abstract prefixed by ‘EPA-‘
-
metaknowledge.WOS.tagProcessing.tagFunctions.
month
(val)¶ The PD Tag¶ extracts the month the record was published in as an int with January as 1, February 2, …
-
metaknowledge.WOS.tagProcessing.tagFunctions.
orcID
(val)¶ The OI Tag¶ extracts a list of orc IDs of the Record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
pageCount
(val)¶ The PG Tag¶ returns an integer giving the number of pages of the Record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
partNumber
(val)¶ The PN Tag¶ return an integer giving the part of the issue the Record is in
-
metaknowledge.WOS.tagProcessing.tagFunctions.
pubMedID
(val)¶ The PM Tag¶ extracts the pubmed ID of the record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
pubType
(val)¶ The PT Tag¶ extracts the type of publication as a character: conference, book, journal, book in series, or patent
-
metaknowledge.WOS.tagProcessing.tagFunctions.
publisher
(val)¶ The PU Tag¶ extracts the publisher of the Record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
publisherAddress
(val)¶ The PA Tag¶ extracts the publisher’s address
-
metaknowledge.WOS.tagProcessing.tagFunctions.
publisherCity
(val)¶ The PI Tag¶ extracts the city the publisher is in
-
metaknowledge.WOS.tagProcessing.tagFunctions.
reprintAddress
(val)¶ The RP Tag¶ extracts the reprint address string
-
metaknowledge.WOS.tagProcessing.tagFunctions.
seriesSubtitle
(val)¶ The BS Tag¶ extracts the title of the series the Record is in
-
metaknowledge.WOS.tagProcessing.tagFunctions.
seriesTitle
(val)¶ The SE Tag¶ extracts the title of the series the Record is in
-
metaknowledge.WOS.tagProcessing.tagFunctions.
specialIssue
(val)¶ The SI Tag¶ extracts the special issue value
-
metaknowledge.WOS.tagProcessing.tagFunctions.
subjectCategory
(val)¶ The SC Tag¶ extracts a list of the subjects associated with the Record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
subjects
(val)¶ The WC Tag¶ extracts a list of subjects as assigned by WOS
-
metaknowledge.WOS.tagProcessing.tagFunctions.
supplement
(val)¶ The SU Tag¶ extracts the supplement number
-
metaknowledge.WOS.tagProcessing.tagFunctions.
title
(val)¶ The TI Tag¶ extracts the title of the record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
totalTimesCited
(val)¶ The Z9 Tag¶ extracts the total number of citations of the record
-
metaknowledge.WOS.tagProcessing.tagFunctions.
volume
(val)¶ The VL Tag¶ return the volume the record is in as a string, not all are integers
-
metaknowledge.WOS.tagProcessing.tagFunctions.
wosString
(val)¶ The UT Tag¶ extracts the WOS number of the record as a string preceded by “WOS:”
Dict Functions¶
-
metaknowledge.WOS.tagProcessing.funcDicts.
isTagOrName
(val)¶ Checks if val is a tag or the full name of a tag; if so, returns True
-
metaknowledge.WOS.tagProcessing.funcDicts.
normalizeToName
(val)¶ Converts tags or full names to full names, case sensitive
-
metaknowledge.WOS.tagProcessing.funcDicts.
normalizeToTag
(val)¶ Converts tags or full names to 2 character tags, case insensitive
-
metaknowledge.WOS.tagProcessing.funcDicts.
tagToFull
(tag)¶ A wrapper for
tagToFullDict
, it maps 2 character tags to their full names.
Backend¶
This file contains the Record class for metaknowledge and one helper function for parsing WOS records, recordParser. The Record class is used to represent a single record’s meta-data from WOS.
-
class
metaknowledge.WOS.recordWOS.
WOSRecord
(inRecord, sFile='', sLine=0)¶ Bases:
metaknowledge.mkRecord.ExtendedRecord
Class for full WOS records
It is meant to be immutable; many of the methods and attributes are evaluated when first called, not when the object is created, and the results are stored privately.
The record’s meta-data is stored in an ordered dictionary labeled by WOS tags. To access the raw data stored in the original record the tags() method can be used. To access data that has been processed and cleaned the attributes named after the tags are used.
Customizations¶
The Record’s hashing and equality testing are based on the WOS number (the tag is ‘UT’, and also called the accession number). They are strings starting with 'WOS:' and followed by 15 or so numbers and letters, although both the length and character set are known to vary. The numbers are unique to each record so are used for comparisons. If a record is bad, all equality checks return False.
When converted to a string the record’s title is used, so for a record R, R.TI == R.title == str(R), and its representation uses the WOS number instead of memory location.
Attributes¶
When a record is created, if the parsing of the WOS file failed it is marked as bad. The bad attribute is set to True and the error attribute is created to contain the exception object.
Generally, to get the information from a Record its attributes should be used. For a Record R, calling R.CR causes citations() from the tagProcessing module to be called on the contents of the raw ‘CR’ field. Then the result is saved and returned. In this case, a list of Citation objects is returned. You can also call R.citations to get the same effect, as each known field tag has a longer name (currently there are 61 field tags). These names are meant to make accessing tags more readable, and the mapping from tag to name can be found in the tagToFull dict. If a tag is known (in tagToFull) but not in the raw data, None is returned instead. Most tags when cleaned return a string or list of strings; the exact results can be found in the help for the particular function.
The attribute authors is also defined as a convenience and returns the same as ‘AF’ or, if that is not found, ‘AU’.
__Init__¶
Records are generally created as collections in RecordCollections, and not as individual objects. If you wish to create one on its own it is possible; the arguments are as follows.
Parameters¶
inRecord :
file stream, dict, str or itertools.chain
If it is a file stream the file must be open at the location of the first tag in the record, usually ‘PT’, and the file will be read until ‘ER’ is found, which indicates the end of the record in the file.
If a dict is passed the dictionary is used as the database of fields and tags, so each key is considered a WOS tag and each value a list of the lines of the original associated with the tag. This is the same form of dict that recordParser returns.
For a string the input must be the raw textual data of a single record in the WOS style; like the file stream it must start at the first tag and end in 'ER'.
itertools.chain is treated identically to a file stream and is used by RecordCollections.
sFile :
optional [str]
Is the name of the file the raw data was in, by default it is blank. It is mostly used to make error messages more informative.
sLine :
optional [int]
Is the line the record starts on in the raw data file. It is mostly used to make error messages more informative.
-
UT
¶ Returns the UT tag (WOS number) of the record
-
authGenders
(countsOnly=False, fractionsMode=False, _countsTuple=False)¶ Creates a dict mapping
'Male'
,'Female'
and'Unknown'
to lists of the names of all the authors.
-
bibString
(maxLength=1000, WOSMode=False, restrictedOutput=False, niceID=True)¶ Makes a string giving the Record as a BibTeX entry. If the Record is of a journal article (PT J) the BibTeX type is set to 'article', otherwise it is set to 'misc'. The ID of the entry is the WOS number and all the Record’s fields are given as entries with their long names.
Note: This is not meant to be used directly with LaTeX; none of the special characters have been escaped and there are a large number of unnecessary fields provided. niceID and maxLength have been provided to make conversions easier.
Note: Record entries that are lists have their values separated with the string ' and '
-
copy
()¶ Correctly copies the
Record
-
createCitation
(multiCite=False)¶ Creates a citation string, using the same format as other WOS citations, for the Record by reading the relevant special tags ('year', 'J9', 'volume', 'beginningPage', 'DOI') and using them to create a Citation object.
-
encoding
()¶ An
abstractmethod
, gives the encoding string of the record.
-
get
(tag, default=None, raw=False)¶ Allows access to the raw values or is an Exception safe wrapper to
__getitem__
.
-
static
getAltName
(tag)¶ An abstractmethod, gives the alternate name of tag or None
-
getCitations
(field=None, values=None, pandasFriendly=True)¶ Creates a pandas ready dict with each row a different citation and columns containing the original string, year, journal and author’s name.
There are also options to filter the output citations with field and values
-
id
¶
-
items
(raw=False)¶ Like items for dicts but with a raw option
-
keys
() → a set-like object providing a view on D's keys¶
-
sourceFile
¶
-
sourceLine
¶
-
specialFuncs
(key)¶ An abstractmethod, processes the special tag, key, using the whole Record
-
subDict
(tags, raw=False)¶ Creates a dict of values of tags from the Record. The tags are the keys and the values are the values. If the tag is missing the value will be
None
.
-
static
tagProcessingFunc
(tag)¶ An
abstractmethod
, gives the function for processing tag
-
title
¶
-
values
(raw=False)¶ Like values for dicts but with a raw option
-
wosString
¶ Returns the WOS number (UT tag) of the record
-
writeRecord
(infile)¶ Writes to infile the original contents of the Record. This is intended for use by RecordCollections to write to file. What is written to infile is bit for bit identical to the original record file (if utf-8 is used). No newline is inserted above the write but the last character is a newline.
-
-
metaknowledge.WOS.recordWOS.
recordParser
(paper)¶ This is the function that is used to create Records from files.
recordParser() reads the file paper until it reaches ‘ER’. For each field tag it adds an entry to the returned dict with the tag as the key and a list of the entries as the value. The list holds each line separately, so for the following two lines in a record:
AF BREVIK, I
   ANICIN, B
the entry in the returned dict would be
{'AF' : ["BREVIK, I", "ANICIN, B"]}
Record
objects can be created with these dictionaries as the initializer.
Parameters¶
paper :
file stream
An open file, with the current line at the beginning of the WOS record.
Returns¶
OrderedDict[str : List[str]]
A dictionary mapping WOS tags to lists, the lists are of strings, each string is a line of the record associated with the tag.
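The parsing behaviour described above can be illustrated with a small stand-alone sketch. It is not the library's implementation: parse_wos_record is a hypothetical name, and the sketch assumes that a tag occupies the first two characters of a line, that values start at the fourth character, and that continuation lines begin with blanks.

```python
from collections import OrderedDict
import io


def parse_wos_record(stream):
    """Read one WOS-style record from an open text stream.

    Hypothetical helper for illustration only; metaknowledge's own
    recordParser may differ in details such as error handling.
    """
    fields = OrderedDict()
    current_tag = None
    for line in stream:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        tag, value = line[:2], line[3:]
        if tag == "ER":
            break  # 'ER' marks the end of the record
        if tag.strip():
            # A new tag starts a new list of lines
            current_tag = tag
            fields[current_tag] = [value]
        elif current_tag is not None:
            # A line with a blank tag continues the previous tag
            fields[current_tag].append(value)
    return fields


sample = io.StringIO("PT J\nAF BREVIK, I\n   ANICIN, B\nER\n")
record = parse_wos_record(sample)
print(dict(record))
```

Each tag maps to a list with one string per line, which is the shape the Record initializer is described as accepting.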
Classes¶
CIHRGrant(Grant)¶
-
class
metaknowledge.grants.cihrGrant.
CIHRGrant
(original, grantdDict, sFile, sLine)¶
-
metaknowledge.grants.cihrGrant.
isCIHRfile
(fileName, useFileName=True)¶
-
metaknowledge.grants.cihrGrant.
parserCIHRfile
(fileName)¶
Citation(Hashable)¶
-
class
metaknowledge.citation.
Citation
(cite, scopusMode=False)¶ A class to hold citation strings and allow for comparison between them.
The initializer takes in a string representing a WOS citation in the form:
Author, Year, Journal, Volume, Page, DOI
Author
is the author’s name, last name first followed by the first initial, sometimes followed by a period.
Year
is the year of publication.
Journal
is the 29-Character Source Abbreviation of the journal.
Volume
is the volume number(s) of the publication, preceded by a V.
Page
is the page number the record starts on.
DOI
is the DOI number of the cited record, preceded by the letters 'DOI'.
Combined they look like:
Nunez R., 1998, MATH COGNITION, V4, P85, DOI 10.1080/135467998387343
Note: any of the fields have been known to be missing and the requirements for the fields are not always met. If something is in the source string that cannot be interpreted as any of these it is put in the
misc
attribute. That is the reason to use this class: it gracefully handles missing information while still allowing for comparison between WOS citation strings.
Customizations¶
Citation’s hashing and equality checking are based on ID() and use the values of
author
,year
andjournal
.When converted to a string a Citation will return the original string.
Attributes¶
As noted above, citations are considered to be divided into six distinct fields (
Author
,Year
,Journal
,Volume
,Page
and DOI
) with a seventh misc
for anything not in those. Citations thus have an attribute with a name corresponding to each: author
,year
,journal
,V
,P
,DOI
andmisc
respectively. These are created if there is anything in the field. So aCitation
created from the string:'Nunez R., 1998, MATH COGNITION'
would haveauthor
,year
andjournal
defined. While one from'Nunez R.'
would have only the attributemisc
.
If the parsing of a citation string fails, the attribute bad is set to True and the attribute error is created to contain said error, which is a BadCitation object. If no errors occur, bad is False.
The attribute original is the unmodified string (cite) given to create the Citation; it can also be accessed by converting to a string, e.g. with str().
__Init__¶
Citations can be created by Records or by giving the initializer a string containing a WOS style citation.
Parameters¶
cite :
str
A str containing a WOS style citation.-
Extra
()¶ Returns any
V
,P
,DOI
ormisc
values as a string. These are all the values not returned by ID(), they are separated by' ,'
.
-
FullJournalName
()¶ Returns the full name of the Citation’s journal field.
Note: Requires the j9Abbreviations database file and will raise an error if it cannot be found.
-
ID
()¶ Returns all of
author
,year
andjournal
available separated by' ,'
. It is for shortening labels when creating networks, as the resultant strings are often unique. Extra() gets everything not returned by ID().
This is also used for hashing and equality checking.
-
__eq__
(other)¶ First checks DOI for equality, then checks each attribute; if any are not equal, False is returned.
-
__hash__
()¶ A hash for Citation that should be equal to the hash of other citations that are equal to it. Based on the values returned by ID().
-
__init__
(cite, scopusMode=False)¶ Initialize self. See help(type(self)) for accurate signature.
-
__repr__
()¶ the representation of the Citation is its original form
-
__str__
()¶ returns the original string
-
__weakref__
¶ list of weak references to the object (if defined)
-
addToDB
(manualName=None, manualDB='manualj9Abbreviations', invert=False)¶ Adds the journal of this Citation to the user created database of journals. This will cause isJournal() to return
True
for this Citation and all others with its journal
.
Note: Requires the j9Abbreviations database file and will raise an error if it cannot be found.
-
allButDOI
()¶ Returns a string of the normalized values from the Citation excluding the DOI number. Equivalent to getting the ID with ID() then appending the extra values from Extra() and then removing the substring containing the DOI number.
-
isAnonymous
()¶ Checks if the author is given as
'[ANONYMOUS]'
and returnsTrue
if so.
-
isJournal
(dbname='j9Abbreviations', manualDB='manualj9Abbreviations', returnDict='both', checkIfExcluded=False)¶ Returns
True
if theCitation
’sjournal
field is a journal abbreviation from the WOS listing found at http://images.webofknowledge.com/WOK46/help/WOS/A_abrvjt.html, i.e. checks if the citation is citing a journal.
Note: Requires the j9Abbreviations database file and will raise an error if it cannot be found.
Note: All parameters are used for getting the data base with getj9dict.
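The field layout and the ID-based comparison described for Citation can be sketched in a few lines. SketchCitation below is a hypothetical stand-in, not the library class: it assumes fields are comma separated, stores None for absent fields rather than leaving the attribute undefined, joins ID() with a comma and a space, and bases equality on ID() alone.

```python
import re


class SketchCitation:
    """Illustrative stand-in for Citation; not metaknowledge's class."""

    def __init__(self, cite):
        self.original = cite
        self.author = self.year = self.journal = None
        self.V = self.P = self.DOI = self.misc = None
        parts = [p.strip() for p in cite.split(",")]
        # Assumption: a usable citation has at least author, 4-digit year, journal
        if len(parts) >= 3 and re.fullmatch(r"\d{4}", parts[1]):
            self.author, self.year, self.journal = parts[0], int(parts[1]), parts[2]
            leftovers = []
            for part in parts[3:]:
                if re.fullmatch(r"V\d+", part):
                    self.V = part
                elif re.fullmatch(r"P\d+", part):
                    self.P = part
                elif part.upper().startswith("DOI "):
                    self.DOI = part[4:].strip()
                else:
                    leftovers.append(part)
            if leftovers:
                self.misc = ", ".join(leftovers)
        else:
            # Anything that cannot be interpreted goes into misc
            self.misc = cite

    def ID(self):
        present = [str(v) for v in (self.author, self.year, self.journal) if v is not None]
        return ", ".join(present) if present else str(self.misc)

    def __eq__(self, other):
        return isinstance(other, SketchCitation) and self.ID() == other.ID()

    def __hash__(self):
        return hash(self.ID())

    def __str__(self):
        return self.original


full = SketchCitation("Nunez R., 1998, MATH COGNITION, V4, P85, DOI 10.1080/135467998387343")
short = SketchCitation("Nunez R., 1998, MATH COGNITION")
lone = SketchCitation("Nunez R.")
print(full == short, lone.misc)
```

Because hashing follows ID(), the full and short forms collapse to a single element when placed in a set, which is what makes such objects convenient as network node labels.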
-
Collection(MutableSet, Hashable)¶
-
class
metaknowledge.
Collection
(inSet, allowedTypes, collectedTypes, name, bad, errors, quietStart=False)¶ A named hashable set with some error reporting.
Collections have all the methods of builtin sets as well as error reporting with bad and error, and control over the contained items with allowedTypes and collectedTypes.
Customizations¶
When created, name should be a string that allows users to easily determine the source of the Collection.
When created, you must provide a set of types, allowedTypes. When new items are added they will be checked, and if they are not instances of any of the types a CollectionTypeError exception will be raised. The collectedTypes set that is provided should be a set of only the types in the Collection.
If any of the elements in the Collection are bad then bad should be set to True and the dict errors should map the item to its exception.
All of these customizations are managed when operations occur on the Collection, and if 2 Collections are modified with one of the binary operators (|, -, etc) the _collectedTypes and errors attributes will be modified the same way. name will be updated to explain the operation(s) that occurred.
__Init__
As Collection is mostly meant to be a base for other classes, all but one of the arguments in the __Init__ are not optional and the optional one is not used.
Parameters¶
inSet :
set
The objects to be contained
allowedTypes :
set[type]
A set of types, {object} will allow virtually everything
collectedTypes :
set[type]
The types (or supertypes) of the objects in inSet
name :
str
The name of the Collection
bad :
bool
If any of the elements are bad
errors :
dict[:Exception]
A mapping from items to their errors
quietStart :
optional [bool]
Default False, does nothing. This is here for use as an interface by subclasses
-
__eq__
(other)¶ Return self==value.
-
__ge__
(other)¶ Return self>=value.
-
__hash__
()¶ Return hash(self).
-
__init__
(inSet, allowedTypes, collectedTypes, name, bad, errors, quietStart=False)¶ Basically a collections.abc.MutableSet wrapper for a set with a bunch of extra record keeping attached.
-
__le__
(other)¶ Return self<=value.
-
__repr__
()¶ Return repr(self).
-
__str__
()¶ Return str(self).
-
__weakref__
¶ list of weak references to the object (if defined)
-
add
(elem)¶ Adds elem to the collection.
-
chunk
(maxSize)¶ Splits the
Collection
into maxSize size or smallerCollections
-
clear
()¶ Removes all elements from the collection and resets the error handling
-
copy
()¶ Creates a shallow copy of the collection
-
discard
(elem)¶ Removes elem from the collection, will not raise an Exception if elem is missing
-
peek
()¶ Returns a random element from the collection. If run twice the same element will usually be returned
-
pop
()¶ Removes a random element from the collection and returns it
-
remove
(elem)¶ Removes elem from the collection, will raise a KeyError if elem is missing
-
split
(maxSize)¶ Destructively, splits the
Collection
into maxSize size or smallerCollections
. The sourceCollection
will be empty after this operation
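The difference between chunk() and split() is only whether the source survives. A minimal sketch with plain Python sets, assuming nothing about how Collection stores its items (the function names are hypothetical and the name, bad and errors bookkeeping is left out):

```python
def chunk(items, max_size):
    """Non-destructive: return sets of at most max_size items; items is untouched."""
    ordered = list(items)
    return [set(ordered[i:i + max_size]) for i in range(0, len(ordered), max_size)]


def split(items, max_size):
    """Destructive: return sets of at most max_size items and empty the source set."""
    pieces = []
    while items:
        piece = set()
        while items and len(piece) < max_size:
            piece.add(items.pop())  # pop() removes an arbitrary element
        pieces.append(piece)
    return pieces


source = set(range(7))
chunks = chunk(source, 3)
remaining_after_chunk = len(source)   # still 7, chunk() copied the items
pieces = split(source, 3)
remaining_after_split = len(source)   # 0, split() consumed the source
print(remaining_after_chunk, remaining_after_split)
```

The destructive form avoids holding two copies of the items at once, which is a plausible reason to offer it for very large collections.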
-
CollectionWithIDs(Collection)¶
-
class
metaknowledge.
CollectionWithIDs
(inSet, allowedTypes, collectedTypes, name, bad, errors, quietStart=False)¶ A Collection with a few extra methods that assume all the contained items have an id attribute and a bad attribute, e.g. Records or Grants.
__Init__
As CollectionWithIDs is mostly meant to be a base for other classes, all but one of the arguments in the __init__ are not optional and the optional one is not used. The __init__() function is the same as that of a Collection.
-
__init__
(inSet, allowedTypes, collectedTypes, name, bad, errors, quietStart=False)¶ Basically a collections.abc.MutableSet wrapper for a set with a bunch of extra record keeping attached.
-
badEntries
()¶ Creates a new collection of the same type with only the bad entries
-
containsID
(idVal)¶ Checks if the collected items contain the given idVal
-
cooccurrenceCounts
(keyTag, *countedTags)¶ Counts the number of times values from any of the countedTags occur with keyTag. The counts are returned as a dictionary with the values of keyTag mapping to dictionaries with each of the countedTags values mapping to their counts.
Parameters¶
keyTag :
str
The tag used as the key for the returned dictionary
*countedTags :
str, str, str, ...
The tags used as the key for the returned dictionary’s values
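The nested dictionary that cooccurrenceCounts is described as returning can be sketched with plain dicts. This is an illustration under assumptions, not the library code: records are modelled as ordinary dicts whose values are lists, and cooccurrence_counts is a hypothetical name.

```python
from collections import defaultdict


def cooccurrence_counts(records, key_tag, *counted_tags):
    """Map each key_tag value to a dict of counted-tag values and their counts."""
    counts = defaultdict(lambda: defaultdict(int))
    for record in records:
        key_values = record.get(key_tag) or []
        for counted_tag in counted_tags:
            for counted in record.get(counted_tag) or []:
                for key in key_values:
                    counts[key][counted] += 1
    # Convert to plain dicts so the result prints and compares cleanly
    return {key: dict(inner) for key, inner in counts.items()}


records = [
    {"journal": ["OPTICS COMMUNICATIONS"], "authorsFull": ["IMBERT, C", "LEVY, Y"]},
    {"journal": ["OPTICS COMMUNICATIONS"], "authorsFull": ["IMBERT, C"]},
    {"journal": ["APPLIED OPTICS"], "authorsFull": ["LEVY, Y"]},
]
result = cooccurrence_counts(records, "journal", "authorsFull")
print(result)
```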
-
discardID
(idVal)¶ Checks if the collected items contain the given idVal and discards it if it is found; will not raise an exception if the item is not found
-
dropBadEntries
()¶ Removes all the bad entries from the collection
-
getID
(idVal)¶ Looks up an item with idVal and returns it if it is found, returns
None
if it does not find the item
-
glimpse
(*tags, compact=False)¶ Creates a printable table with the most frequently occurring values of each of the requested tags, or if none are provided the top authors, journals and citations. The table will be as wide and as tall as the terminal (or 80x24 if there is no terminal) so
print(RC.glimpse())
should always create a nice looking table. Below is a table created from some of the testing files:
>>> print(RC.glimpse())
+RecordCollection glimpse made at: 2016-01-01 12:00:00++++++++++++++++++++++++++
|33 Records from testFile++++++++++++++++++++++++++++++++++++++++++++++++++++++|
|Columns are ranked by num. of occurrences and are independent of one another++|
|-------Top Authors--------+------Top Journals-------+--------Top Cited--------|
|1 Girard, S|1 CANADIAN JOURNAL OF PH.|1 LEVY Y, 1975, OPT COMM.|
|1 Gilles, H|1 JOURNAL OF THE OPTICAL.|2 GOOS F, 1947, ANN PHYS.|
|2 IMBERT, C|2 APPLIED OPTICS|3 LOTSCH HKV, 1970, OPTI.|
|2 Pillon, F|2 OPTICS COMMUNICATIONS|4 RENARD RH, 1964, J OPT.|
|3 BEAUREGARD, OCD|2 NUOVO CIMENTO DELLA SO.|5 IMBERT C, 1972, PHYS R.|
|3 Laroche, M|2 JOURNAL OF THE OPTICAL.|6 ARTMANN K, 1948, ANN P.|
|3 HUARD, S|2 JOURNAL OF THE OPTICAL.|6 COSTADEB.O, 1973, PHYS.|
|4 PURI, A|2 NOUVELLE REVUE D OPTIQ.|6 ROOSEN G, 1973, CR ACA.|
|4 COSTADEB.O|3 PHYSICS REPORTS-REVIEW.|7 Imbert C., 1972, Nouve.|
|4 PATTANAYAK, DN|3 PHYSICAL REVIEW LETTERS|8 HOROWITZ BR, 1971, J O.|
|4 Gazibegovic, A|3 USPEKHI FIZICHESKIKH N.|8 BRETENAKER F, 1992, PH.|
|4 ROOSEN, G|3 APPLIED PHYSICS B-LASE.|8 SCHILLIN.H, 1965, ANN .|
|4 BIRMAN, JL|3 AEU-INTERNATIONAL JOUR.|8 FEDOROV FI, 1955, DOKL.|
|4 Kaiser, R|3 COMPTES RENDUS HEBDOMA.|8 MAZET A, 1971, CR ACAD.|
|5 LEVY, Y|3 CHINESE PHYSICS LETTERS|9 IMBERT C, 1972, CR ACA.|
|5 BEAUREGA.OC|3 PHYSICAL REVIEW B|9 LOTSCH HKV, 1971, OPTI.|
|5 PAVLOV, VI|3 LETTERE AL NUOVO CIMEN.|9 ASHBY N, 1973, PHYS RE.|
|5 BREVIK, I|3 PROGRESS IN QUANTUM EL.|9 BOULWARE DG, 1973, PHY.|
>>>
Parameters¶
tags :
str, str, ...
Any number of tag strings to be made into columns in the output table
-
networkMultiLevel
(*modes, nodeCount=True, edgeWeight=True, stemmer=None, edgeAttribute=None, nodeAttribute=None, _networkTypeString='n-level network')¶ Creates a network of the objects found by any number of tags modes, with edges between all co-occurring values. If you only want edges between co-occurring values from different tags use networkMultiMode().
A networkMultiLevel() looks at each entry in the collection and extracts its values for the tag given by each of the modes, e.g. the 'authorsFull' tag. Then if multiple are returned an edge is created between them. So in the case of the author tag 'authorsFull' a co-authorship network is created. Then for each other tag the entries are also added and edges between the first tag’s node and theirs are created.
The number of times each object occurs is counted if nodeCount is True and the edges count the number of co-occurrences if edgeWeight is True. Both are True by default.
Note: Do not use this for the construction of co-citation networks; use RecordCollection.networkCoCitation(), it is more accurate and has more options.
Parameters¶
mode :
str
A two character WOS tag or one of the full names for a tag
nodeCount :
optional [bool]
Default True, if True each node will have an attribute called “count” that contains an int giving the number of times the object occurred.
edgeWeight :
optional [bool]
Default True, if True each edge will have an attribute called “weight” that contains an int giving the number of times the two objects co-occurred.
stemmer :
optional [func]
Default None. If stemmer is a callable object, basically a function or possibly a class, it will be called for the ID of every node in the graph; all IDs are strings.
For example, the function
f = lambda x: x[0]
if given as the stemmer will cause all IDs to be the first character of their unstemmed IDs, e.g. the title 'Goos-Hanchen and Imbert-Fedorov shifts for leaky guided modes' will create the node 'G'.
Returns¶
networkx Graph
A networkx Graph with the objects of the tag mode as nodes and their co-occurrences as edges
-
networkMultiMode
(*tags, recordType=True, nodeCount=True, edgeWeight=True, stemmer=None, edgeAttribute=None)¶ Creates a network of the objects found by all tags in tags, each node is marked by which tag spawned it making the resultant graph n-partite.
A networkMultiMode() looks at each item in the collection and extracts its values for the tags given by tags. Then for all objects returned an edge is created between them, regardless of their type. Each node will have an attribute called 'type' that gives the tag that created it or both if both created it, e.g. if 'LA' were in tags the node 'English' would have the type attribute be 'LA'.
For example if tags was set to
['CR', 'UT', 'LA']
, a three mode network would be created, composed of a co-citation network from the 'CR' tag. Then each citation would also have edges to all the languages of Records that cited it and to the WOS number of those Records.
The number of times each object occurs is counted if nodeCount is True and the edges count the number of co-occurrences if edgeWeight is True. Both are True by default.
Parameters¶
tags :
str, str, str, … or list [str]
Any number of tags, or a list of tags
nodeCount :
optional [bool]
Default True, if True each node will have an attribute called 'count' that contains an int giving the number of times the object occurred.
edgeWeight :
optional [bool]
Default True, if True each edge will have an attribute called 'weight' that contains an int giving the number of times the two objects co-occurred.
stemmer :
optional [func]
Default None. If stemmer is a callable object, basically a function or possibly a class, it will be called for the ID of every node in the graph; note that all IDs are strings.
For example, the function
f = lambda x: x[0]
if given as the stemmer will cause all IDs to be the first character of their unstemmed IDs, e.g. the title 'Goos-Hanchen and Imbert-Fedorov shifts for leaky guided modes' will create the node 'G'.
Returns¶
networkx Graph
A networkx Graph with the objects of the tags tags as nodes and their co-occurrences as edges
-
networkOneMode
(mode, nodeCount=True, edgeWeight=True, stemmer=None, edgeAttribute=None, nodeAttribute=None)¶ Creates a network of the objects found by one tag mode. This is the same as networkMultiLevel() with only one tag.
A networkOneMode() looks at each entry in the collection and extracts its values for the tag given by mode, e.g. the 'authorsFull' tag. Then if multiple are returned an edge is created between them. So in the case of the author tag 'authorsFull' a co-authorship network is created.
The number of times each object occurs is counted if nodeCount is True and the edges count the number of co-occurrences if edgeWeight is True. Both are True by default.
Note: Do not use this for the construction of co-citation networks; use RecordCollection.networkCoCitation(), it is more accurate and has more options.
Parameters¶
mode :
str
A two character WOS tag or one of the full names for a tag
nodeCount :
optional [bool]
Default True, if True each node will have an attribute called “count” that contains an int giving the number of times the object occurred.
edgeWeight :
optional [bool]
Default True, if True each edge will have an attribute called “weight” that contains an int giving the number of times the two objects co-occurred.
stemmer :
optional [func]
Default None. If stemmer is a callable object, basically a function or possibly a class, it will be called for the ID of every node in the graph; all IDs are strings.
For example, the function
f = lambda x: x[0]
if given as the stemmer will cause all IDs to be the first character of their unstemmed IDs, e.g. the title 'Goos-Hanchen and Imbert-Fedorov shifts for leaky guided modes' will create the node 'G'.
Returns¶
networkx Graph
A networkx Graph with the objects of the tag mode as nodes and their co-occurrences as edges
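The counting behind a one-mode network, including the effect of a stemmer, can be sketched without a graph library. This is an illustration only: one_mode_counts is a hypothetical name, records are modelled as plain dicts of lists, and the returned Counters stand in for the "count" node attribute and "weight" edge attribute of the real graph.

```python
from collections import Counter
from itertools import combinations


def one_mode_counts(records, mode, stemmer=None):
    """Return (node counts, edge weights) for values of one tag that co-occur."""
    node_count = Counter()
    edge_weight = Counter()
    for record in records:
        values = record.get(mode) or []
        if stemmer is not None:
            # The stemmer rewrites every node ID before counting
            values = [stemmer(v) for v in values]
        # Sort so that each undirected edge is always stored as the same tuple
        values = sorted(set(values))
        node_count.update(values)
        edge_weight.update(combinations(values, 2))
    return node_count, edge_weight


records = [
    {"authorsFull": ["IMBERT, C", "LEVY, Y"]},
    {"authorsFull": ["LEVY, Y", "IMBERT, C", "ROOSEN, G"]},
    {"authorsFull": ["ROOSEN, G"]},
]
nodes, edges = one_mode_counts(records, "authorsFull")
stemmed_nodes, stemmed_edges = one_mode_counts(records, "authorsFull", stemmer=lambda x: x[0])
print(edges)
```

With the stemmer lambda x: x[0] every author collapses to an initial, which shows why a stemmer that is too aggressive merges unrelated nodes.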
-
networkTwoMode
(tag1, tag2, directed=False, recordType=True, nodeCount=True, edgeWeight=True, stemmerTag1=None, stemmerTag2=None, edgeAttribute=None)¶ Creates a network of the objects found by two WOS tags tag1 and tag2, each node marked by which tag spawned it making the resultant graph bipartite.
A networkTwoMode() looks at each Record in the RecordCollection and extracts its values for the tags given by tag1 and tag2, e.g. the 'WC' and 'LA' tags. Then for each object returned by each tag an edge is created between it and every other object of the other tag. So the WOS defined subject tag 'WC' and language tag 'LA' will give a two-mode network showing the connections between subjects and languages. Each node will have an attribute called 'type' that gives the tag that created it or both if both created it, e.g. the node 'English' would have the type attribute be 'LA'.
The number of times each object occurs is counted if nodeCount is True and the edges count the number of co-occurrences if edgeWeight is True. Both are True by default.
The directed parameter, if True, will cause the network to be directed with the first tag as the source and the second as the destination.
Parameters¶
tag1 :
str
A two character WOS tag or one of the full names for a tag, the source of edges on the graph
tag2 :
str
A two character WOS tag or one of the full names for a tag, the target of edges on the graph
directed :
optional [bool]
Default False, if True the returned network is directed
nodeCount :
optional [bool]
Default True, if True each node will have an attribute called “count” that contains an int giving the number of times the object occurred.
edgeWeight :
optional [bool]
Default True, if True each edge will have an attribute called “weight” that contains an int giving the number of times the two objects co-occurred.
stemmerTag1 :
optional [func]
Default None. If stemmerTag1 is a callable object, basically a function or possibly a class, it will be called for the ID of every node given by tag1 in the graph; all IDs are strings.
For example, the function
f = lambda x: x[0]
if given as the stemmer will cause all IDs to be the first character of their unstemmed IDs, e.g. the title 'Goos-Hanchen and Imbert-Fedorov shifts for leaky guided modes' will create the node 'G'.
stemmerTag2 :
optional [func]
Default None, see stemmerTag1 as it is the same but for tag2
Returns¶
networkx Graph or networkx DiGraph
A networkx Graph with the objects of the tags tag1 and tag2 as nodes and their co-occurrences as edges.
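The bipartite structure can be sketched in the same dependency-free way. The helper below is hypothetical and assumes records are plain dicts of lists; it records which tag(s) produced each node and only ever creates edges from a tag1 value to a tag2 value, which is what keeps the result two-mode.

```python
from collections import Counter
from itertools import product


def two_mode_edges(records, tag1, tag2):
    """Return (node types, edge weights) linking tag1 values to tag2 values."""
    node_type = {}
    edge_weight = Counter()
    for record in records:
        left = record.get(tag1) or []
        right = record.get(tag2) or []
        for value in left:
            node_type.setdefault(value, set()).add(tag1)
        for value in right:
            node_type.setdefault(value, set()).add(tag2)
        # Edges always run from a tag1 value to a tag2 value, never within a tag
        edge_weight.update(product(left, right))
    return node_type, edge_weight


records = [
    {"WC": ["Optics", "Physics, Applied"], "LA": ["English"]},
    {"WC": ["Optics"], "LA": ["French"]},
]
types, edges = two_mode_edges(records, "WC", "LA")
print(sorted(edges))
```

Storing edges as (tag1 value, tag2 value) tuples also matches the directed case, where tag1 is the source and tag2 the destination.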
-
rankedSeries
(tag, outputFile=None, giveCounts=True, giveRanks=False, greatestFirst=True, pandasMode=True, limitTo=None)¶ Creates a pandas-ready dict of the ordered list of all the values of tag, ranked by their number of occurrences. A list can also be returned with the counts or ranks added, or it can be written to a file.
Parameters¶
tag :
str
The tag to be ranked
outputFile :
optional str
A file path to write a csv with 2 columns, one the tag values the other their counts
giveCounts :
optional bool
Default True, if True the returned list will be composed of tuples, the first values being the tag value and the second their counts. This supersedes giveRanks.
giveRanks :
optional bool
Default False, if True and giveCounts is False, the returned list will be composed of tuples, the first values being the tag value and the second their ranks. This is superseded by giveCounts.
greatestFirst :
optional bool
Default True, if True the returned list will be ordered with the highest ranked value first, otherwise the lowest ranked will be first.
pandasMode :
optional bool
Default True, if True a dict ready for pandas will be returned, otherwise a list
limitTo :
optional list[values]
Default None, if a list is provided only those values in the list will be counted or returned
Returns¶
dict[str:list[value]] or list[str]
A dict or list
will be returned depending on if pandasMode isTrue
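The two return shapes can be sketched as follows. This is not the library's code: ranked_series is a hypothetical name, records are plain dicts of lists, and the column names "entry" and "count" are assumptions chosen for the illustration.

```python
from collections import Counter


def ranked_series(records, tag, give_counts=True, greatest_first=True, pandas_mode=True):
    """Rank the values of tag by how often they occur."""
    counts = Counter()
    for record in records:
        counts.update(record.get(tag) or [])
    # Sort by count, breaking ties alphabetically so the output is deterministic
    ordered = sorted(
        counts.items(),
        key=lambda kv: (-kv[1] if greatest_first else kv[1], kv[0]),
    )
    if pandas_mode:
        # Column-oriented dict, the layout pandas.DataFrame accepts directly
        return {"entry": [v for v, _ in ordered], "count": [c for _, c in ordered]}
    if give_counts:
        return ordered
    return [v for v, _ in ordered]


records = [
    {"journal": ["APPLIED OPTICS"]},
    {"journal": ["OPTICS COMMUNICATIONS"]},
    {"journal": ["APPLIED OPTICS"]},
]
table = ranked_series(records, "journal")
pairs = ranked_series(records, "journal", pandas_mode=False)
names = ranked_series(records, "journal", give_counts=False, pandas_mode=False)
print(table)
```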
-
removeID
(idVal)¶ Checks if the collected items contain the given idVal and removes it if it is found; will raise a
KeyError
if the item is not found
Creates a list of all the tags of the contained items
-
timeSeries
(tag=None, outputFile=None, giveYears=True, greatestFirst=True, limitTo=False, pandasMode=True)¶ Creates a pandas-ready dict of the ordered list of all the values of tag, ranked by the year they occurred in; multiple year occurrences will create multiple entries. A list can also be returned with the counts or years added, or it can be written to a file.
If no tag is given the Records in the collection will be used.
Parameters¶
tag :
optional str
Default None, if provided the tag will be ordered
outputFile :
optional str
A file path to write a csv with 2 columns, one the tag values the other their years
giveYears :
optional bool
Default True, if True the returned list will be composed of tuples, the first values being the tag value and the second their years.
greatestFirst :
optional bool
Default True, if True the returned list will be ordered with the highest years first, otherwise the lowest years will be first.
pandasMode :
optional bool
Default True, if True a dict ready for pandas will be returned, otherwise a list
limitTo :
optional list[values]
Default None, if a list is provided only those values in the list will be counted or returned
Returns¶
dict[str:list[value]] or list[str]
A dict or list
will be returned depending on if pandasMode isTrue
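The statement that multiple year occurrences create multiple entries can be made concrete with a small sketch. It is an illustration under assumptions: time_series is a hypothetical name, each record is a plain dict with a "year" key, and the column names "entry", "year" and "count" are invented for the example.

```python
from collections import Counter


def time_series(records, tag, greatest_first=True):
    """Return one row per (value, year) pair, ordered by year."""
    pair_counts = Counter()
    for record in records:
        year = record.get("year")
        if year is None:
            continue  # records without a year cannot be placed on the time axis
        for value in record.get(tag) or []:
            pair_counts[(value, year)] += 1
    ordered = sorted(pair_counts, key=lambda pair: (pair[1], pair[0]), reverse=greatest_first)
    return {
        "entry": [value for value, _ in ordered],
        "year": [year for _, year in ordered],
        "count": [pair_counts[pair] for pair in ordered],
    }


records = [
    {"year": 1972, "authorsFull": ["IMBERT, C"]},
    {"year": 1972, "authorsFull": ["IMBERT, C", "LEVY, Y"]},
    {"year": 1975, "authorsFull": ["LEVY, Y"]},
]
series = time_series(records, "authorsFull")
print(series)
```

An author active in two different years appears in two rows, one per year, which is the shape needed for longitudinal plots.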
-
ExtendedRecord(Record)¶
-
class
metaknowledge.
ExtendedRecord
(fieldDict, idValue, bad, error, sFile='', sLine=0)¶ A subclass of
Record
that adds processing to the dictionary. It also cannot be used directly and must be subclassed.
The ExtendedRecord class is an extension of Record that is intended for use with the records on scientific papers provided by different organizations such as WOS or PubMed. The 5 abstract (virtual) methods must be defined for each subclass and define how the data in the different fields is processed and how the record can be rewritten to a file.
Processing fields¶
When an ExtendedRecord is created a dictionary, fieldDict, must be provided; this contains the raw data from the file reader, usually as lists of strings. tagProcessingFunc is a staticmethod function that takes in a tag string and returns another function to process it.
Each tag may also be given a second name, as what they are called in the raw data is usually not very easy to understand (e.g. 'SO' is the journal name for WOS records). The mapping from the raw tag ('SO') to the human friendly string ('journal') is done with the getAltName staticmethod. getAltName takes in a tag string and returns either None or the other name for that string. Note, getAltName must go both directions: WOSRecord.getAltName(WOSRecord.getAltName('SO')) == 'SO'.
The last method for processing entries is specialFuncs. The following are the special keys for ExtendedRecords. These must be the alternate names of tags or strings accepted by the specialFuncs method.
'authorsFull'
'keywords'
'grants'
'j9'
'authorsShort'
'volume'
'selfCitation'
'citations'
'address'
'abstract'
'title'
'month'
'year'
'journal'
'beginningPage'
'DOI'
specialFuncs, when given one of these, must raise a KeyError or return an object of the same type as that returned by the MedlineRecord or WOSRecord, e.g. 'title' would return a string giving the title of the record.
For an example of how this works let's first look at the 'SO' tag on a WOSRecord accessed with the alternate name 'journal'.
t = R['journal']
First the private dictionary _computedFields is checked for the key 'journal', which will fail if this is the first time 'journal' or 'SO' has been requested; after this the results will be added to the dictionary to speed up future requests.
Then the fieldDict will be checked for the key, and when that fails the key will go through getAltName and be checked again. If the record had a journal entry this will succeed and the raw data will be given to the tagProcessingFunc using the same key as fieldDict, in this case SO.
The results will then be written to _computedFields and returned.
If the requested key was instead 'grants' (g = R['grants']), both lookups to fieldDict would have failed and the string 'grants' would have been given to specialFuncs, which would return a list of all the grants in the WOSRecord (this is always [] as WOS does not provide grant information).
What if the key were not present anywhere? Then specialFuncs should raise a KeyError, which will be caught then re-raised like a dictionary would with an invalid key lookup.
File Handling fields¶
The two other required methods, encoding and writeRecord, define how the records can be rewritten to a file. encoding should return a string giving the encoding Python would use, e.g. 'utf-8' or 'latin-1'. This is the same encoding that the files written by writeRecord should have. writeRecord, when called, should write the original record to the provided open file, infile. The opening, closing, header and footer of the file will be handled by RecordCollection’s writeFile function, which should be modified accordingly. If the order of the fields in a record is important you can use a collections.OrderedDict for fieldDict.
__Init__¶
The __init__ of ExtendedRecord takes the same arguments as Record
-
__contains__
(item)¶ Checks if the tag item is in the Record
-
__getitem__
(key)¶ Processes the tag requested with key and memoize it.
Allows long names, but will still raise a KeyError if the tag is missing, regardless of name used.
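The lookup order described under Processing fields (memo dictionary, raw tag, alternate name, then specialFuncs) can be sketched as a small class. SketchRecord is a hypothetical stand-in, not the library's ExtendedRecord: the alternate-name table and the join-the-lines processing function are assumptions made for the illustration.

```python
class SketchRecord:
    """Illustrative stand-in for an ExtendedRecord subclass."""

    # Bidirectional mapping between raw tags and long names (assumed values)
    _alt_names = {"SO": "journal", "journal": "SO", "TI": "title", "title": "TI"}

    def __init__(self, field_dict):
        self._fieldDict = field_dict
        self._computedFields = {}

    @staticmethod
    def getAltName(tag):
        return SketchRecord._alt_names.get(tag)

    @staticmethod
    def tagProcessingFunc(tag):
        # Assumption: every tag is processed by joining its raw lines
        return lambda lines: " ".join(lines)

    def specialFuncs(self, key):
        if key == "grants":
            return []
        raise KeyError(key)

    def __getitem__(self, key):
        # 1. Memoized results are returned directly
        if key in self._computedFields:
            return self._computedFields[key]
        # 2. Try the key itself, then its alternate name, as a raw tag
        if key in self._fieldDict:
            raw_tag = key
        elif self.getAltName(key) in self._fieldDict:
            raw_tag = self.getAltName(key)
        else:
            raw_tag = None
        if raw_tag is not None:
            value = self.tagProcessingFunc(raw_tag)(self._fieldDict[raw_tag])
        else:
            # 3. Fall back to specialFuncs; a KeyError propagates like a dict's
            try:
                value = self.specialFuncs(key)
            except KeyError:
                raise KeyError(key) from None
        self._computedFields[key] = value
        return value


R = SketchRecord({"SO": ["OPTICS COMMUNICATIONS"]})
journal = R["journal"]
grants = R["grants"]
print(journal, grants)
```

After the first access the value sits in _computedFields under the requested key, so a repeated R['journal'] skips the processing step entirely.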
-
__init__
(fieldDict, idValue, bad, error, sFile='', sLine=0)¶ Base constructor for Records
fieldDict : is the unparsed entry dict with tags as keys and their lines as a list of strings
idValue : is the unique ID of the Record, e.g. the WOS number
titleKey : is the tag giving the title of the Record, e.g. the WOS tag is
'TI'
bad : is the bool to flag the Record as having encountered an error
error : is the error that bad indicates
sFile : is the name of the source file
sLine : is the line number of the start of the Record entry
altNames : is a dict that maps the names of tags to an alternative name, i.e. the long names dict. It must be bidirectional: map long to short and short to long
proccessingFuncs : is a dict of functions to process the tags. It has the short names as keys and their processing functions as values. Missing tags will result in the unparsed value being returned.
The Records inheriting from this must implement the following; calling the implementations in Record with super() will not cause errors:
- writeRecord
- tagProcessingFunc
- encoding
- titleTag
- getAltName
-
authGenders
(countsOnly=False, fractionsMode=False, _countsTuple=False)¶ Creates a dict mapping 'Male', 'Female' and 'Unknown' to lists of the names of all the authors.
-
bibString
(maxLength=1000, WOSMode=False, restrictedOutput=False, niceID=True)¶ Makes a string giving the Record as a BibTeX entry. If the Record is of a journal article (
PT J
) the BibTeX type is set to 'article'
, otherwise it is set to 'misc'
. The ID of the entry is the WOS number and all the Record’s fields are given as entries with their long names.
Note: This is not meant to be used directly with LaTeX; none of the special characters have been escaped and a large number of unnecessary fields are provided. niceID and maxLength are provided to make conversions easier.
Note: Record entries that are lists have their values separated with the string
' and '
-
createCitation
(multiCite=False)¶ Creates a citation string, using the same format as other WOS citations, for the Record by reading the relevant special tags ('year', 'J9', 'volume', 'beginningPage', 'DOI') and using them to create a Citation object.
-
encoding
()¶ An
abstractmethod
, gives the encoding string of the record.
-
get
(tag, default=None, raw=False)¶ Allows access to the raw values or is an Exception safe wrapper to
__getitem__
.
-
static
getAltName
(tag)¶ An abstractmethod, gives the alternate name of tag or None
-
getCitations
(field=None, values=None, pandasFriendly=True)¶ Creates a pandas ready dict with each row a different citation and columns containing the original string, year, journal and author’s name.
There are also options to filter the output citations with field and values
-
items
(raw=False)¶ Like
items
for dicts but with a raw
option
-
specialFuncs
(key)¶ An
abstractmethod
, processes the special tag key using the whole Record
-
subDict
(tags, raw=False)¶ Creates a dict of values of tags from the Record. The tags are the keys and the values are the values. If the tag is missing the value will be
None
.
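The behaviour of subDict can be illustrated with a plain dictionary standing in for a Record. This is only a sketch: `sub_dict` and the sample data are hypothetical, and the raw option is left out.

```python
def sub_dict(record, tags):
    """Return {tag: value}, using None for tags the record lacks."""
    return {tag: record.get(tag) for tag in tags}


record = {"title": "A study of things", "year": 2015}
selected = sub_dict(record, ["title", "year", "DOI"])
print(selected)
```

Because missing tags map to None rather than raising, the result always has exactly the requested keys.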
-
static
tagProcessingFunc
(tag)¶ An
abstractmethod
, gives the function for processing tag
-
values
(raw=False)¶ Like
values
for dicts but with a raw
option
-
writeRecord
(infile)¶ An
abstractmethod
, writes the record in its original form to infile
FallbackGrant(Grant)¶
-
class
metaknowledge.grants.
FallbackGrant
(original, grantdDict, sFile='', sLine=0)¶ A subclass of Grant, it has the same attributes and is returned from the fallback constructor for grants.
-
__init__
(original, grantdDict, sFile='', sLine=0)¶ Initialize self. See help(type(self)) for accurate signature.
-
Grant(Record, MutableMapping)¶
-
class
metaknowledge.grants.
Grant
(original, grantdDict, idValue, bad, error, sFile='', sLine=0)¶ -
__init__
(original, grantdDict, idValue, bad, error, sFile='', sLine=0)¶ Initialize self. See help(type(self)) for accurate signature.
-
getInstitutions
(tags=None, seperator=';', _getTag=False)¶ Returns a list of the names of institutions. This is done by looking (in order) for any of the fields in tags and splitting the strings on seperator (in case of multiple institutions). If no strings are found an empty list will be returned.
Note for some Grants getInstitutions has been overwritten and will ignore the arguments and simply provide the institutions.
Parameters¶
tags :
optional list[str]
A list of the tags to look for institutions in
seperator :
optional str
The string that separates each institution’s name within the column
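The lookup described above can be sketched with a plain dictionary in place of a Grant. The helper name, the sample tags and the whitespace stripping are assumptions, and the sketch stops at the first tag that has a value, which is only one reading of "in order".

```python
def split_first_found(grant, tags, separator=";"):
    """Return the separated names from the first tag that holds a value."""
    for tag in tags:
        value = grant.get(tag)
        if value:
            # Split on the separator and drop empty pieces
            return [part.strip() for part in value.split(separator) if part.strip()]
    return []  # none of the tags were present


grant = {"Institution": "University A; University B"}
found = split_first_found(grant, ["Organisation", "Institution"])
missing = split_first_found(grant, ["Organisation"])
print(found, missing)
```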
-
getInvestigators
(tags=None, seperator=';', _getTag=False)¶ Returns a list of the names of investigators. This is done by looking (in order) for any of the fields in tags and splitting the strings on seperator. If no strings are found an empty list will be returned.
Note for some Grants getInvestigators has been overwritten and will ignore the arguments and simply provide the investigators.
Parameters¶
tags :
optional list[str]
A list of the tags to look for investigators in
seperator :
optional str
The string that separates each investigator’s name within the column
-
update
(other)¶ Adds all the tag-entry pairs from other to the
Grant
. If there is a conflict other takes precedence.
-
GrantCollection(CollectionWithIDs)¶
-
class
metaknowledge.
GrantCollection
(inGrants=None, name='', extension='', cached=False, quietStart=False)¶ -
__init__
(inGrants=None, name='', extension='', cached=False, quietStart=False)¶ Basically a collections.abc.MutableSet wrapper for a set with a bunch of extra record keeping attached.
-
networkCoInvestigator
(targetTags=None, tagSeperator=';', count=True, weighted=True, _institutionLevel=False)¶ Creates a co-investigator network from the collection.
Most grants do not have a known investigator tag, so it must be provided by the user in targetTags; the separator character should also be given if it is not a semicolon.
Parameters¶
targetTags :
optional list[str]
A list of all the Grant tags to check for investigators
tagSeperator :
optional str
The character that separates the individual investigators’ names
count :
optional bool
Default True, if True the number of times a name occurs will be given
weighted :
optional bool
Default True, if True the edge weights will be calculated and added to the edges
-
networkCoInvestigatorInstitution
(targetTags=None, tagSeperator=';', count=True, weighted=True)¶ This works the same as networkCoInvestigator() see it for details.
-
MedlineGrant(Grant)¶
MedlineRecord(ExtendedRecord)¶
-
class
metaknowledge.medline.
MedlineRecord
(inRecord, sFile='', sLine=0)¶ Class for full Medline(Pubmed) entries.
This class is an ExtendedRecord capable of generating its own id number. You should not create them directly, but instead use medlineParser() on a medline file.
-
__init__
(inRecord, sFile='', sLine=0)¶ Base constructor for Records
fieldDict : is the unparsed entry dict with tags as keys and their lines as a list of strings
idValue : is the unique ID of the Record, e.g. the WOS number
titleKey : is the tag giving the title of the Record, e.g. the WOS tag is 'TI'
bad : is the bool to flag the Record as having encountered an error
error : is the error that bad indicates
sFile : is the name of the source file
sLine : is the line number of the start of the Record entry
altNames : is a dict that maps the names of tags to an alternative name, i.e. the long names dict. It must be bidirectional: map long to short and short to long
proccessingFuncs : is a dict of functions to process the tags. It has the short names as keys and their processing functions as values. Missing tags will result in the unparsed value being returned.
Records inheriting from this class must implement the following methods; calling the implementations in Record with super() will not cause errors:
- writeRecord
- tagProcessingFunc
- encoding
- titleTag
- getAltName
-
encoding
()¶ An
abstractmethod
, gives the encoding string of the record.
-
static
getAltName
(tag)¶ An
abstractmethod
, gives the alternate name of tag or None
-
specialFuncs
(key)¶ An
abstractmethod
, processes the special tag key using the whole Record
Parameters¶
key :
str
One of the special tags: 'authorsFull', 'keywords', 'grants', 'j9', 'authorsShort', 'volume', 'selfCitation', 'citations', 'address', 'abstract', 'title', 'month', 'year', 'journal', 'beginningPage' and 'DOI'
Returns¶
The processed value of key
-
static
tagProcessingFunc
(tag)¶ An
abstractmethod
, gives the function for processing tag
-
writeRecord
(f)¶ This is nearly identical to the original; the FAU tag is the only tag not written in the same place, as doing so would require changing the parser and lots of extra logic.
-
NSERCGrant(Grant)¶
-
class
metaknowledge.grants.
NSERCGrant
(original, grantdDict, sFile, sLine)¶ -
__init__
(original, grantdDict, sFile, sLine)¶ Initialize self. See help(type(self)) for accurate signature.
-
getInstitutions
(tags=None, seperator=';', _getTag=False)¶ Returns a list with the names of the institution. The optional arguments are ignored
-
getInvestigators
(tags=None, seperator=';', _getTag=False)¶ Returns a list of the names of investigators. The optional arguments are ignored.
-
update
(other)¶ Adds all the tag-entry pairs from other to the
Grant
. If there is a conflict other takes precedence.
-
NSFGrant(Grant)¶
-
class
metaknowledge.grants.
NSFGrant
(grantdDict, sFile)¶ -
__init__
(grantdDict, sFile)¶ Initialize self. See help(type(self)) for accurate signature.
-
getInstitutions
(tags=None, seperator=';', _getTag=False)¶ Returns a list with the names of the institution. The optional arguments are ignored
-
getInvestigators
(tags=None, seperator=';', _getTag=False)¶ Returns a list of the names of investigators. The optional arguments are ignored.
-
ProQuestRecord(ExtendedRecord)¶
-
class
metaknowledge.proquest.
ProQuestRecord
(inRecord, recNum=None, sFile='', sLine=0)¶ Class for full ProQuest entries.
This class is an ExtendedRecord capable of generating its own id number. You should not create them directly, but instead use proQuestParser() on a ProQuest file.
-
__init__
(inRecord, recNum=None, sFile='', sLine=0)¶ Base constructor for Records
fieldDict : is the unparsed entry dict with tags as keys and their lines as a list of strings
idValue : is the unique ID of the Record, e.g. the WOS number
titleKey : is the tag giving the title of the Record, e.g. the WOS tag is 'TI'
bad : is the bool to flag the Record as having encountered an error
error : is the error that bad indicates
sFile : is the name of the source file
sLine : is the line number of the start of the Record entry
altNames : is a dict that maps the names of tags to an alternative name, i.e. the long names dict. It must be bidirectional: map long to short and short to long
proccessingFuncs : is a dict of functions to process the tags. It has the short names as keys and their processing functions as values. Missing tags will result in the unparsed value being returned.
Records inheriting from this class must implement the following methods; calling the implementations in Record with super() will not cause errors:
- writeRecord
- tagProcessingFunc
- encoding
- titleTag
- getAltName
-
encoding
()¶ An
abstractmethod
, gives the encoding string of the record.
-
static
getAltName
(tag)¶ An
abstractmethod
, gives the alternate name of tag or None
-
specialFuncs
(key)¶ An
abstractmethod
, processes the special tag key using the whole Record
Parameters¶
key :
str
One of the special tags: 'authorsFull', 'keywords', 'grants', 'j9', 'authorsShort', 'volume', 'selfCitation', 'citations', 'address', 'abstract', 'title', 'month', 'year', 'journal', 'beginningPage' and 'DOI'
Returns¶
The processed value of key
-
static
tagProcessingFunc
(tag)¶ An
abstractmethod
, gives the function for processing tag
-
writeRecord
(infile)¶ An
abstractmethod
, writes the record in its original form to infile
-
Record(Mapping, Hashable)¶
-
class
metaknowledge.
Record
(fieldDict, idValue, bad, error, sFile='', sLine=0)¶ A dictionary with error handling and an id string.
Record is the base class of all the objects in metaknowledge that contain information as key-value pairs; these are the grants and the records from different sources.
The error handling of the Record is done with the bad attribute. If there is some issue with the data, bad should be True and error given an Exception that was caused by or explains the error.
Customizations¶
Record is a subclass of collections.abc.Mapping, which means it has almost all the methods a dictionary does; the missing ones are those that modify entries. So to access the value of the key 'title' from a Record R, you would use either the square brace notation t = R['title'] or the get() function t = R.get('title'), just like a dictionary. The other methods like keys() or copy() also work.
In addition to being a mapping, Records are also hashable, with their hashes being based on a unique id string they are given on creation, usually some kind of accession number the source gives them. The two optional arguments sFile and sLine, which should be given the name of the file the records came from and the line it started on respectively, are used to make the errors more useful.
__Init__¶
fieldDict is the dictionary the Record will use and idValue is the unique identifier of the Record.
Parameters¶
fieldDict :
dict[str:]
A dictionary that maps from strings to values
idValue :
str
A unique identifier string for the Record
bad :
bool
True if there are issues with the Record, otherwise False
error :
Exception
The Exception that caused whatever error made the record be marked as bad, or None
sFile :
str
A string that gives the source file of the original records
sLine :
int
The first line the original record is found on in the source file
-
__bytes__
()¶ Returns the binary form of the original
-
__contains__
(item)¶ Checks if the tag item is in the Record
-
__eq__
(other)¶ Compares Records using their hashes; if their hashes are the same then True is returned.
-
__getitem__
(key)¶ This is redefined as something interesting for ExtendedRecord
-
__hash__
()¶ Gives a hash of the id or, if bad, returns a hash of the fields combined with the error messages; either of these could be blank. bad Records are more likely to cause hash collisions due to their lack of entropy when created.
-
__init__
(fieldDict, idValue, bad, error, sFile='', sLine=0)¶ Initialize self. See help(type(self)) for accurate signature.
-
__iter__
()¶ Iterates over the tags in the Record
-
__len__
()¶ Returns the number of tags
-
__repr__
()¶ Makes a string with the id of the file and its type
-
__str__
()¶ Makes a string with the title of the file as given by self.title, if there is not one it returns “Untitled record”
-
__weakref__
¶ list of weak references to the object (if defined)
-
copy
()¶ Correctly copies the
Record
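The combination of a read-only mapping with id-based hashing and equality can be sketched in a few lines. `TinyRecord` is a hypothetical stand-in, not the library's class: it omits bad, error, sFile and sLine, and it compares hashes in __eq__ as the description above says.

```python
from collections.abc import Hashable, Mapping


class TinyRecord(Mapping, Hashable):
    """A read-only mapping whose identity is its id string."""

    def __init__(self, field_dict, id_value):
        self._fields = dict(field_dict)
        self.id = id_value

    def __getitem__(self, key):
        return self._fields[key]

    def __iter__(self):
        return iter(self._fields)

    def __len__(self):
        return len(self._fields)

    def __hash__(self):
        return hash(self.id)

    def __eq__(self, other):
        # Two records are equal when their id-based hashes match
        return isinstance(other, TinyRecord) and hash(self) == hash(other)


a = TinyRecord({"title": "First parse"}, "WOS:000000000000001")
b = TinyRecord({"title": "Second parse"}, "WOS:000000000000001")
c = TinyRecord({"title": "Other"}, "WOS:000000000000002")
unique = {a, b, c}
print(a["title"], a.get("year"), len(unique))
```

Since `a` and `b` share an id they collapse to one element in a set, which is how a collection built on sets avoids duplicate records.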
-
RecordCollection(CollectionWithIDs)¶
-
class
metaknowledge.
RecordCollection
(inCollection=None, name='', extension='', cached=False, quietStart=False)¶ A container for a large number of individual records.
RecordCollection provides ways of creating Records from an isi file, string, list of records or directory containing isi files.
When being created, if there are issues the RecordCollection will be declared bad: bad will be set to True and it will then mostly return None or False. The attribute error contains the exception that occurred.
They also possess an attribute name, also accessed with __repr__(); this is used to auto generate the names of files and can be set at creation. Note though that any operations that modify the RecordCollection’s contents will update the name to include what occurred.
Customizations¶
The Records are contained within a set and as such many of the set operations are defined (pop, union, in …). Records are also hashed with their WOS string so no duplication can occur. The comparison operators <, <=, >, >= are based strictly on the number of Records within the collection, while equality looks for an exact match on the Records.
__Init__¶
inCollection is the object containing the information about the Records to be constructed; it can be an isi file, string, list of records or directory containing isi files
Parameters¶
inCollection :
optional [str] or None
the name of the source of WOS records. It can be skipped to produce an empty collection.
If a file is provided, it is first checked to see if it is a WOS file (the header is checked). Then records are read from it one by one until the ‘EF’ string is found, indicating the end of the file.
If a directory is provided, each file in the directory is first checked for the correct header and all those that have it are then read like individual files. The records are then collected into a single set in the RecordCollection.
name :
optional [str]
The name of the RecordCollection, defaults to empty string. If left empty the name of the RecordCollection is set to the name of the file or directory used to create the collection. If provided the name is set to name
extension :
optional [str]
The extension to search for when reading a directory for files. extension is the suffix searched for when a directory is read for files, by default it is empty so all files are read.
cached :
optional [bool]
Default False, if True and the inCollection is a directory (a string giving the path to a directory) then the initialized RecordCollection will be saved in the directory as a Python pickle with the suffix '.mkDirCache'. Then if the RecordCollection is initialized a second time it will be recovered from the file, which is much faster than reparsing every file in the directory.
metaknowledge saves the names of the parsed files as well as their last modification times and will check these when recreating the RecordCollection, so modifying existing files or adding new ones will result in the entire directory being reanalyzed and a new cache file being created. The extension given to __init__() is taken into account as well and each suffix is given its own cache.
Note The pickle allows for arbitrary python code execution so only use caches that you trust.
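The freshness check described for cached directories can be sketched with the standard library alone. This is an illustration of the idea, not metaknowledge's code: `directory_listing` and `cache_is_fresh` are hypothetical names, and no pickle is written here.

```python
import os
import tempfile


def directory_listing(path, extension=""):
    """Map each matching file name to its last-modification time."""
    listing = {}
    for entry in os.scandir(path):
        if entry.is_file() and entry.name.endswith(extension):
            listing[entry.name] = entry.stat().st_mtime
    return listing


def cache_is_fresh(cached_listing, path, extension=""):
    """A cache is reusable only if names and modification times are unchanged."""
    return cached_listing == directory_listing(path, extension)


with tempfile.TemporaryDirectory() as folder:
    open(os.path.join(folder, "a.txt"), "w").close()
    snapshot = directory_listing(folder, ".txt")  # stored next to the cache
    fresh_before = cache_is_fresh(snapshot, folder, ".txt")
    open(os.path.join(folder, "b.txt"), "w").close()  # a new file appears
    fresh_after = cache_is_fresh(snapshot, folder, ".txt")

print(fresh_before, fresh_after)
```

Comparing the whole listing catches both modified files (different times) and added or removed files (different names), which matches the two invalidation cases named above.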
-
__init__
(inCollection=None, name='', extension='', cached=False, quietStart=False)¶ Basically a collections.abc.MutableSet wrapper for a set with a bunch of extra record keeping attached.
-
citeFilter
(keyString='', field='all', reverse=False, caseSensitive=False)¶ Filters Records by some string, keyString, in their citations and returns all Records with at least one citation possessing keyString in the field given by field.
-
dropNonJournals
(ptVal='J', dropBad=True, invert=False)¶ Drops the non journal type
Records
from the collection, this is done by checking ptVal against the PT tag
-
findProbableCopyright
()¶ Finds the (likely) copyright string from all abstracts in the
RecordCollection
-
forBurst
(tag, outputFile=None, dropList=None, lower=True, removeNumbers=True, removeNonWords=True, removeWhitespace=True, stemmer=None)¶ Creates a pandas friendly dictionary with 2 columns, one 'year' and the other 'word'. Each row is a word that occurred in the field given by tag in a Record and the year of the record. Unfortunately getting the month or day with any type of accuracy has proved to be impossible so year is the only option.
-
forNLP
(outputFile=None, extraColumns=None, dropList=None, lower=True, removeNumbers=True, removeNonWords=True, removeWhitespace=True, removeCopyright=False, stemmer=None)¶ Creates a pandas friendly dictionary with each row a Record in the RecordCollection and the columns the fields natural language processing uses (id, title, publication year, keywords and the abstract). By default the abstract is processed to remove non-word, non-space characters and the case is lowered.
-
genderStats
(asFractions=False)¶ Creates a dict (
{'Male' : maleCount, 'Female' : femaleCount, 'Unknown' : unknownCount}
) with the numbers of male, female and unknown names in the collection.
-
getCitations
(field=None, values=None, pandasFriendly=True, counts=True)¶ Creates a pandas ready dict with each row a different citation from the contained Records and columns containing the original string, year, journal, author’s name and the number of times it occurred.
There are also options to filter the output citations with field and values
-
localCiteStats
(pandasFriendly=False, keyType='citation')¶ Returns a dict with all the citations in the CR field as keys and the number of times they occur as the values
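Counting citations across a collection amounts to a frequency count over every CR field. The sketch below uses plain dicts and citation strings as stand-ins for Records and Citation objects; the sample data is invented.

```python
from collections import Counter

# Hypothetical records: each may carry a list of citation strings under 'CR'
records = [
    {"UT": "WOS:A", "CR": ["Smith J, 1999, J EXAMPLE", "Doe A, 2005, J SAMPLE"]},
    {"UT": "WOS:B", "CR": ["Smith J, 1999, J EXAMPLE"]},
    {"UT": "WOS:C"},  # no CR field at all
]

# Flatten every CR list and count how often each citation occurs
cite_counts = Counter(cite for rec in records for cite in rec.get("CR", []))
print(dict(cite_counts))
```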
-
localCitesOf
(rec)¶ Takes in a Record, WOS string, citation string or Citation and returns a RecordCollection of all records that cite it.
-
makeDict
(onlyTheseTags=None, longNames=False, raw=False, numAuthors=True, genderCounts=True)¶ Returns a dict with each key a tag and the values being lists of the values for each of the Records in the collection; None is given when there is no value and they are in the same order across each tag.
When used with pandas: pandas.DataFrame(RC.makeDict()) returns a data frame with each column a tag and each row a Record.
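The column-oriented layout can be shown without pandas. In this sketch `make_dict` is a hypothetical helper working on plain dicts; it ignores the onlyTheseTags, longNames, numAuthors and genderCounts options.

```python
def make_dict(records):
    """Build {tag: [value per record]}, padding missing values with None."""
    tags = []
    for rec in records:
        for tag in rec:
            if tag not in tags:
                tags.append(tag)  # keep first-seen order of tags
    return {tag: [rec.get(tag) for rec in records] for tag in tags}


records = [
    {"TI": "First title", "PY": 2001},
    {"TI": "Second title", "DI": "10.0000/example"},
]
table = make_dict(records)
print(table)
```

Every list has one slot per record, so index i across all columns describes the same record; that is the shape a data frame constructor expects.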
-
networkBibCoupling
(weighted=True, fullInfo=False, addCR=False)¶ Creates a bibliographic coupling network based on citations for the RecordCollection.
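In a bibliographic coupling network two records are linked when they cite at least one reference in common. A minimal sketch of that idea, using a dict of edges instead of a networkx graph, with hypothetical names and data:

```python
from itertools import combinations


def bib_coupling(cited_by_record):
    """Return {(record_a, record_b): number of shared references}."""
    edges = {}
    for (a, refs_a), (b, refs_b) in combinations(sorted(cited_by_record.items()), 2):
        shared = len(set(refs_a) & set(refs_b))
        if shared:
            edges[(a, b)] = shared  # edge weight = size of the overlap
    return edges


cited_by_record = {
    "WOS:A": ["ref1", "ref2"],
    "WOS:B": ["ref2", "ref3"],
    "WOS:C": ["ref4"],
}
coupling = bib_coupling(cited_by_record)
print(coupling)
```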
-
networkCitation
(dropAnon=False, nodeType='full', nodeInfo=True, fullInfo=False, weighted=True, dropNonJournals=False, count=True, directed=True, keyWords=None, detailedCore=True, detailedCoreAttributes=False, coreOnly=False, expandedCore=False, recordToCite=True, addCR=False, _quiet=False)¶ Creates a citation network for the RecordCollection.
-
networkCoAuthor
(detailedInfo=False, weighted=True, dropNonJournals=False, count=True, useShortNames=False, citeProfile=False)¶ Creates a coauthorship network for the RecordCollection.
-
networkCoCitation
(dropAnon=True, nodeType='full', nodeInfo=True, fullInfo=False, weighted=True, dropNonJournals=False, count=True, keyWords=None, detailedCore=True, detailedCoreAttributes=False, coreOnly=False, expandedCore=False, addCR=False)¶ Creates a co-citation network for the RecordCollection.
-
rpys
(minYear=None, maxYear=None, dropYears=None, rankEmptyYears=False)¶ This implements Referenced Publication Years Spectroscopy, a technique for finding important years in citation data. The authors of the original papers have a website with more information, found here.
This function computes the spectra of the RecordCollection and returns a dictionary mapping strings to lists of ints. Each list is ordered and the values of each with the same index form a row and each list a column. The strings are the names of the columns. This is intended to be read directly by pandas DataFrames.
The columns returned are:
'year', the years of the counted citations. Missing years are inserted with a count of 0, unless they are outside the bounds of the highest year or the lowest year and the default value is used, e.g. if the highest year is 2016, 2017 will not be inserted unless maxYear has been set to 2017 or higher
'count', the number of times the year was cited
'abs-deviation', deviation from the 5-year median. Calculated by taking the absolute deviation of the count from the median of it and the next 2 years and the preceding 2 years
'rank', the rank of the year, the highest ranked year being the one with the highest deviation, the second highest being the second highest deviation and so on. All years with 0 count are given the rank 0 by default
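The four columns can be reproduced on a toy list of cited years. This is a sketch under stated assumptions rather than the library's algorithm: the window is simply truncated at the ends of the range, rank 1 is taken to mean the largest deviation, and dropYears and rankEmptyYears are not modelled.

```python
from statistics import median


def rpys_table(cited_years, min_year=None, max_year=None):
    """Build the 'year', 'count', 'abs-deviation' and 'rank' columns."""
    counts = {}
    for year in cited_years:
        counts[year] = counts.get(year, 0) + 1
    low = min(counts) if min_year is None else min_year
    high = max(counts) if max_year is None else max_year
    years = list(range(low, high + 1))  # missing years get a count of 0
    count_col = [counts.get(year, 0) for year in years]
    dev_col = []
    for i, count in enumerate(count_col):
        window = count_col[max(0, i - 2): i + 3]  # up to 2 years either side
        dev_col.append(abs(count - median(window)))
    # Rank only the years that were actually cited, largest deviation first
    cited = [i for i, count in enumerate(count_col) if count > 0]
    cited.sort(key=lambda i: dev_col[i], reverse=True)
    rank_col = [0] * len(years)
    for position, i in enumerate(cited, start=1):
        rank_col[i] = position
    return {"year": years, "count": count_col,
            "abs-deviation": dev_col, "rank": rank_col}


spectrum = rpys_table([2000, 2000, 2000, 2002, 2003, 2003])
for name, column in spectrum.items():
    print(name, column)
```

In this toy input 2001 is never cited, so it appears with a count of 0 and keeps rank 0, while 2000 stands out most from its neighbours.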
-
writeBib
(fname=None, maxStringLength=1000, wosMode=False, reducedOutput=False, niceIDs=True)¶ Writes a bibTex entry to fname for each Record in the collection.
If the Record is of a journal article (PT J) the bibTex type is set to 'article', otherwise it is set to 'misc'. The ID of the entry is the WOS number and all the Record’s fields are given as entries with their long names.
Note This is not meant to be used directly with LaTeX: none of the special characters have been escaped and there are a large number of unnecessary fields provided. niceIDs and maxStringLength have been provided only to make conversions easier.
Note Record entries that are lists have their values separated with the string ' and ', as this is the form bibTex understands
-
writeCSV
(fname=None, splitByTag=None, onlyTheseTags=None, numAuthors=True, genderCounts=True, longNames=False, firstTags=None, csvDelimiter=', ', csvQuote='"', listDelimiter='|')¶ Writes all the
Records
from the collection into a csv file with each row a record and each column a tag.
-
writeFile
(fname=None)¶ Writes the RecordCollection to a file; the written file’s format is identical to those downloaded from WOS. The order of Records written is random.
-
yearSplit
(startYear, endYear, dropMissingYears=True)¶ Creates a RecordCollection of Records from the years between startYear and endYear inclusive.
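The inclusive bounds are the detail most easily gotten wrong, so here is a small sketch on plain dicts. `year_split` is hypothetical, and treating dropMissingYears as "discard records without a year" is an assumption, since the option is not described above.

```python
def year_split(records, start_year, end_year, drop_missing_years=True):
    """Keep records whose year lies in [start_year, end_year]."""
    kept = []
    for rec in records:
        year = rec.get("year")
        if year is None:
            if not drop_missing_years:
                kept.append(rec)  # assumed meaning of the option
        elif start_year <= year <= end_year:
            kept.append(rec)
    return kept


records = [{"id": "a", "year": 1999}, {"id": "b", "year": 2000},
           {"id": "c", "year": 2005}, {"id": "d"}]
subset = [rec["id"] for rec in year_split(records, 2000, 2005)]
print(subset)
```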
-
ScopusRecord(ExtendedRecord)¶
-
class
metaknowledge.scopus.
ScopusRecord
(inRecord, sFile='', sLine=0, header=None)¶ Class for full Scopus entries.
This class is an ExtendedRecord capable of generating its own id number. You should not create them directly, but instead use scopusParser() on a scopus CSV file.
-
__init__
(inRecord, sFile='', sLine=0, header=None)¶ Base constructor for Records
fieldDict : is the unparsed entry dict with tags as keys and their lines as a list of strings
idValue : is the unique ID of the Record, e.g. the WOS number
titleKey : is the tag giving the title of the Record, e.g. the WOS tag is 'TI'
bad : is the bool to flag the Record as having encountered an error
error : is the error that bad indicates
sFile : is the name of the source file
sLine : is the line number of the start of the Record entry
altNames : is a dict that maps the names of tags to an alternative name, i.e. the long names dict. It must be bidirectional: map long to short and short to long
proccessingFuncs : is a dict of functions to process the tags. It has the short names as keys and their processing functions as values. Missing tags will result in the unparsed value being returned.
Records inheriting from this class must implement the following methods; calling the implementations in Record with super() will not cause errors:
- writeRecord
- tagProcessingFunc
- encoding
- titleTag
- getAltName
-
createCitation
(multiCite=False)¶ Overwriting the general citation creator to deal with scopus weirdness.
Creates a citation string, using the same format as other WOS citations, for the Record by reading the relevant special tags ('year', 'J9', 'volume', 'beginningPage', 'DOI') and using them to create a Citation object.
Parameters¶
multiCite :
optional [bool]
Default False, if True a tuple of Citations is returned with each having a different one of the record’s authors as the author
-
encoding
()¶ An
abstractmethod
, gives the encoding string of the record.
-
static
getAltName
(tag)¶ An
abstractmethod
, gives the alternate name of tag or None
-
specialFuncs
(key)¶ An
abstractmethod
, processes the special tag key using the whole Record
Parameters¶
key :
str
One of the special tags: 'authorsFull', 'keywords', 'grants', 'j9', 'authorsShort', 'volume', 'selfCitation', 'citations', 'address', 'abstract', 'title', 'month', 'year', 'journal', 'beginningPage' and 'DOI'
Returns¶
The processed value of key
-
static
tagProcessingFunc
(tag)¶ An
abstractmethod
, gives the function for processing tag
-
writeRecord
(f)¶ An
abstractmethod
, writes the record in its original form to infile
-
WOSRecord(ExtendedRecord)¶
-
class
metaknowledge.WOS.
WOSRecord
(inRecord, sFile='', sLine=0)¶ Class for full WOS records
It is meant to be immutable; many of the methods and attributes are evaluated when first called, not when the object is created, and the results are stored privately.
The record’s meta-data is stored in an ordered dictionary labeled by WOS tags. To access the raw data stored in the original record the tags() method can be used. To access data that has been processed and cleaned the attributes named after the tags are used.
Customizations¶
The Record’s hashing and equality testing are based on the WOS number (the tag is ‘UT’, and also called the accession number). They are strings starting with 'WOS:' and followed by 15 or so numbers and letters, although both the length and character set are known to vary. The numbers are unique to each record so are used for comparisons. If a record is bad all equality checks return False.
When converted to a string the record’s title is used, so for a record R, R.TI == R.title == str(R), and its representation uses the WOS number instead of memory location.
Attributes¶
When a record is created, if the parsing of the WOS file failed it is marked as bad. The bad attribute is set to True and the error attribute is created to contain the exception object.
Generally, to get the information from a Record its attributes should be used. For a Record R, calling R.CR causes citations() from the tagProcessing module to be called on the contents of the raw ‘CR’ field. Then the result is saved and returned. In this case, a list of Citation objects is returned. You can also call R.citations to get the same effect, as each known field tag has a longer name (currently there are 61 field tags). These names are meant to make accessing tags more readable and the mapping from tag to name can be found in the tagToFull dict. If a tag is known (in tagToFull) but not in the raw data, None is returned instead. Most tags when cleaned return a string or list of strings; the exact results can be found in the help for the particular function.
The attribute authors is also defined as a convenience and returns the same as ‘AF’ or, if that is not found, ‘AU’.
__Init__¶
Records are generally created as collections in RecordCollections, and not as individual objects. If you wish to create one on its own it is possible; the arguments are as follows.
Parameters¶
inRecord :
file stream, dict, str or itertools.chain
If it is a file stream the file must be open at the location of the first tag in the record, usually ‘PT’, and the file will be read until ‘ER’ is found, which indicates the end of the record in the file.
If a dict is passed the dictionary is used as the database of fields and tags, so each key is considered a WOS tag and each value a list of the lines of the original associated with the tag. This is the same form of dict that recordParser returns.
For a string the input must be the raw textual data of a single record in the WOS style; like the file stream it must start at the first tag and end in 'ER'.
itertools.chain is treated identically to a file stream and is used by RecordCollections.
sFile :
optional [str]
Is the name of the file the raw data was in, by default it is blank. It is mostly used to make error messages more informative.
sLine :
optional [int]
Is the line the record starts on in the raw data file. It is mostly used to make error messages more informative.
-
UT
¶ Returns the UT tag (WOS number) of the record
-
encoding
()¶ An
abstractmethod
, gives the encoding string of the record.
-
static
getAltName
(tag)¶ An
abstractmethod
, gives the alternate name of tag or None
-
specialFuncs
(key)¶ An
abstractmethod
, processes the special tag key using the whole Record
-
static
tagProcessingFunc
(tag)¶ An
abstractmethod
, gives the function for processing tag
-
wosString
¶ Returns the WOS number (UT tag) of the record
-
writeRecord
(infile)¶ Writes to infile the original contents of the Record. This is intended for use by RecordCollections to write to file. What is written to infile is bit for bit identical to the original record file (if utf-8 is used). No newline is inserted above the write but the last character is a newline.
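The dict form accepted by inRecord (each tag mapped to the list of its lines) can be sketched with a tiny reader for a single record. This is not the library's recordParser: it assumes the plain-text layout in which a two-letter tag starts a line, continuation lines are indented by three spaces, and a line reading ER closes the record. The sample record is invented.

```python
def parse_wos_record(text):
    """Turn one WOS-style record into {tag: [lines belonging to the tag]}."""
    fields = {}
    current = None
    for line in text.splitlines():
        if line.rstrip() == "ER":  # end-of-record marker
            break
        if not line.strip():
            continue
        if line[:2].strip():  # a new two-letter tag begins here
            current = line[:2]
            fields[current] = [line[3:]]
        elif current is not None:  # indented continuation of the last tag
            fields[current].append(line[3:])
    return fields


raw = "PT J\nAU Smith, J\n   Doe, A\nTI An example title\nER\n"
fields = parse_wos_record(raw)
print(fields)
```

Keeping the lines as a list, rather than joining them, preserves enough of the layout that a record could later be written back out in its original order.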
-
Functions¶
-
metaknowledge.citation.
filterNonJournals
(citesLst, invert=False) Removes the Citations from citesLst that are not journals
Parameters¶
citesLst :
list [Citation]
A list of citations to be filtered
invert :
optional [bool]
Default False, if True non-journals will be kept instead of journals
-
metaknowledge.constants.
isInteractive
()¶ A basic check of whether the program is running in interactive mode
-
metaknowledge.diffusion.
diffusionAddCountsFromSource
(grph, source, target, nodeType='citations', extraType=None, diffusionLabel='DiffusionCount', extraKeys=None, countsDict=None, extraMapping=None)¶ Does a diffusion using diffusionCount() and updates grph with it, using the nodes in the graph as keys in the diffusion, i.e. the source. The name of the attribute the counts are added to is given by diffusionLabel. If the graph is not composed of citations from the source and instead is another tag nodeType needs to be given the tag string.
Parameters¶
grph :
networkx Graph
The graph to be updated
source :
RecordCollection
The RecordCollection that created grph
target :
RecordCollection
The RecordCollection that will be counted
nodeType :
optional [str]
default 'citations', the tag that contains the values used to create grph
Returns¶
dict[:int]
The counts dictionary used to add values to grph. Note grph is modified by the function and the return is done in case you need it.
-
metaknowledge.diffusion.
diffusionCount
(source, target, sourceType='raw', extraValue=None, pandasFriendly=False, compareCounts=False, numAuthors=True, useAllAuthors=True, _ProgBar=None, extraMapping=None)¶ Takes in two RecordCollections and produces a
dict
counting the citations of source by the Records of target. By default thedict
usesRecord
objects as keys but this can be changed with the sourceType keyword to any of the WOS tags.Parameters¶
source :
RecordCollection
A metaknowledgeRecordCollection
containing theRecords
being citedtarget :
RecordCollection
A metaknowledgeRecordCollection
containing theRecords
citing those in sourcesourceType :
optional [str]
default'raw'
, if'raw'
the returneddict
will containRecords
as keys. If it is a WOS tag the keys will be of that type.pandasFriendly :
optional [bool]
defaultFalse
, makes the output be a dict with two keys one"Record"
is the list of Records ( or data type requested by sourceType) the other is their occurrence counts as"Counts"
. The lists are the same length.compareCounts :
optional [bool]
defaultFalse
, ifTrue
the diffusion analysis will be run twice, first with source and target setup like the default (global scope) then using only the sourceRecordCollection
(local scope).extraValue :
optional [str]
default
None
, if a tag the returned dictionary will haveRecords
mapped to maps, these maps will map the entries for the tag to counts. If pandasFriendly is alsoTrue
the resultant dictionary will have an additional column called'year'
. This column will contain the year the citations occurred, in addition the Records entries will be duplicated for each year they occur in.For example if
'year'
was given then the count for a singleRecord
could be{1990 : 1, 2000 : 5}
useAllAuthors :
optional [bool]
defaultTrue
, ifFalse
only the first author will be used to generate theCitations
for the sourceRecords
Returns¶
dict[:int]
A dictionary with the type given by sourceType as keys and integers as values.
If compareCounts is True, the values are tuples with the first integer being the diffusion in the target and the second the diffusion in the source.
If pandasFriendly is True, the returned dict has keys with the names of the WOS tags and lists with their values, i.e. a table with labeled columns. The counts are in the column named "TargetCount" and, if compareCounts is set, the local count is in a column called "SourceCount".
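The difference between the default mapping and the pandasFriendly layout can be sketched in plain Python. This is only an illustration of the shapes described above, not metaknowledge's implementation: the helper names `to_columns` and `expand_by_extra` are hypothetical, strings stand in for Record objects, and the column names are taken from the description above rather than checked against the library.

```python
# Stand-in for the default output: keys would normally be Record objects
# (or tag values); plain strings are used here for illustration.
counts = {"paper-A": 3, "paper-B": 1}


def to_columns(counts, key_column="Record", count_column="TargetCount"):
    """Turn {key: count} into {column name: list}, one row per key."""
    keys = list(counts)
    return {key_column: keys, count_column: [counts[k] for k in keys]}


def expand_by_extra(nested, key_column="Record", extra_column="year",
                    count_column="TargetCount"):
    """Turn {key: {extra value: count}} into columns, repeating the key
    once for every extra value it has (hypothetical helper)."""
    table = {key_column: [], extra_column: [], count_column: []}
    for key, per_value in nested.items():
        for value in sorted(per_value):
            table[key_column].append(key)
            table[extra_column].append(value)
            table[count_column].append(per_value[value])
    return table


table = to_columns(counts)
by_year = expand_by_extra({"paper-A": {1990: 1, 2000: 5}})
print(table)
print(by_year)
```

A column-oriented dict like this is the shape that `pandas.DataFrame` accepts directly, which is presumably why the option carries that name.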
-
metaknowledge.diffusion.
diffusionGraph
(source, target, weighted=True, sourceType='raw', targetType='raw', labelEdgesBy=None)¶ Takes in two RecordCollections and produces a graph of the citations of source by the Records in target. By default the nodes in the graph are Record objects, but this can be changed with the sourceType and targetType keywords. The edges of the graph go from the target to the source.
Each node on the output graph has two boolean attributes, "source" and "target", indicating whether it is a source or a target. Note that if the types of the sources and targets are different, the attributes will not be checked for overlap with the other type. For example, if the source type is 'TI' (title) and the target type is 'UT' (WOS number), and there is some overlap between the targets and sources, then the Record corresponding to a source node will not be checked for being one of the titles of the targets; only its WOS number will be considered.
Parameters¶
source : RecordCollection
A metaknowledge RecordCollection containing the Records being cited.
target : RecordCollection
A metaknowledge RecordCollection containing the Records citing those in source.
weighted : optional [bool]
Default True. If True, each edge will have an attribute 'weight' giving the number of times the target has referenced the source.
sourceType : optional [str]
Default 'raw'. If 'raw', the returned graph will contain Records as source nodes.
If Records are not wanted then it can be set to a WOS tag, such as 'SO' (for journals), to make the nodes into the type of object returned by that tag from Records.
targetType : optional [str]
Default 'raw'. If 'raw', the returned graph will contain Records as target nodes.
If Records are not wanted then it can be set to a WOS tag, such as 'SO' (for journals), to make the nodes into the type of object returned by that tag from Records.
labelEdgesBy : optional [str]
Default None. If a WOS tag (or the long name of a WOS tag) is given, the edges of the output graph will have an attribute 'key' that is the value of the referenced tag of the source Record, i.e. if 'PY' is given then each edge will have a 'key' value equal to the publication year of the source.
This option will cause the output graph to be a MultiDiGraph and is likely to result in parallel edges. If a Record has multiple values for a tag (e.g. 'AF'), then each value will create its own edge.
Returns¶
networkx Directed Graph or networkx multi Directed Graph
A directed graph of the diffusion network. If labelEdgesBy is used, the graph will allow parallel edges.
-
metaknowledge.diffusion.
makeNodeID
(Rec, ndType, extras=None)¶ Helper to make a node ID, extras is currently not used
-
metaknowledge.graphHelpers.
dropEdges
(grph, minWeight=-inf, maxWeight=inf, parameterName='weight', ignoreUnweighted=False, dropSelfLoops=False)¶ Modifies grph by dropping edges whose weight is not within the inclusive bounds of minWeight and maxWeight, i.e. after running, grph will only have edges whose weights meet the following inequality: minWeight <= edge’s weight <= maxWeight. A KeyError will be raised if the graph is unweighted unless ignoreUnweighted is True. The weight is determined by examining the attribute parameterName.
Note: none of the default options will result in grph being modified, so only specify the relevant ones, e.g. dropEdges(G, dropSelfLoops = True) will remove only the self loops from G.
Parameters¶
grph : networkx Graph
The graph to be modified.
minWeight : optional [int or double]
Default -inf, the minimum weight for an edge to be kept in the graph.
maxWeight : optional [int or double]
Default inf, the maximum weight for an edge to be kept in the graph.
parameterName : optional [str]
Default 'weight', the key to the weight field in the edge’s attribute dictionary. The default is the same as in networkx and metaknowledge, so it is likely to be correct.
ignoreUnweighted : optional [bool]
Default False. If True, unweighted edges will be kept.
dropSelfLoops : optional [bool]
Default False. If True, self loops will be removed regardless of their weight.
-
metaknowledge.graphHelpers.
dropNodesByCount
(grph, minCount=-inf, maxCount=inf, parameterName='count', ignoreMissing=False)¶ Modifies grph by dropping nodes that do not have a count within the inclusive bounds of minCount and maxCount, i.e. after running, grph will only have nodes whose counts meet the following inequality: minCount <= node’s count <= maxCount.
Count is determined by the count attribute, parameterName; if it is missing, a KeyError will be raised. ignoreMissing can be set to True to suppress the error.
minCount and maxCount default to negative and positive infinity respectively, so without specifying either the output should be the same as the input.
Parameters¶
grph : networkx Graph
The graph to be modified.
minCount : optional [int or double]
Default -inf, the minimum count for a node to be kept in the graph.
maxCount : optional [int or double]
Default inf, the maximum count for a node to be kept in the graph.
parameterName : optional [str]
Default 'count', the key to the count field in the node’s attribute dictionary. The default is the same throughout metaknowledge, so it is likely to be correct.
ignoreMissing : optional [bool]
Default False. If True, nodes missing a count will be kept in the graph instead of raising an exception.
-
metaknowledge.graphHelpers.
dropNodesByDegree
(grph, minDegree=-inf, maxDegree=inf, useWeight=True, parameterName='weight', includeUnweighted=True)¶ Modifies grph by dropping nodes that do not have a degree within the inclusive bounds of minDegree and maxDegree, i.e. after running, grph will only have nodes whose degrees meet the following inequality: minDegree <= node’s degree <= maxDegree.
Degree is determined in one of two ways. By default (useWeight is True) the weight attribute of the edges touching a node is summed, with the attribute’s name given by parameterName; otherwise the number of edges touching the node is used. If includeUnweighted is True then, when useWeight is set, unweighted edges are treated as having a weight of 1.
Parameters¶
grph : networkx Graph
The graph to be modified.
minDegree : optional [int or double]
Default -inf, the minimum degree for a node to be kept in the graph.
maxDegree : optional [int or double]
Default inf, the maximum degree for a node to be kept in the graph.
useWeight : optional [bool]
Default True. If True, the edge weights will be summed to get the degree; if False, the number of edges will be used to determine the degree.
parameterName : optional [str]
Default 'weight', the key to the weight field in the edge’s attribute dictionary. The default is the same as in networkx and metaknowledge, so it is likely to be correct.
includeUnweighted : optional [bool]
Default True. If True, edges with no weight will be considered to have a weight of 1; if False, they will cause a KeyError to be raised.
-
metaknowledge.graphHelpers.
getNodeDegrees
(grph, weightString='weight', strictMode=False, returnType=<class 'int'>, edgeType='bi')¶ Returns a dictionary of nodes to their degrees. The degree is determined by adding up the weights of a node’s edges, where weightString gives the name of the edge attribute containing the weight. The weights are then converted to the type returnType. If weightString is given as False, each edge is instead counted as 1.
edgeType takes one of three strings: ‘bi’, ‘in’, ‘out’. ‘bi’ means both nodes on the edge count it, ‘out’ means only the node the edge comes from counts it, and ‘in’ means only the node the edge goes to counts it. ‘bi’ is the default. Use ‘in’ and ‘out’ only on directed graphs, as otherwise the selected node is random.
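The three edgeType modes can be illustrated with a small stand-alone function over `(source, destination, weight)` triples. This is a sketch of the counting rule as described above, using a hypothetical `node_degrees` helper rather than the library function, and it ignores returnType and strictMode.

```python
from collections import defaultdict


def node_degrees(edges, edge_type="bi"):
    """Sum edge weights per node for directed (src, dst, weight) triples."""
    if edge_type not in ("bi", "in", "out"):
        raise ValueError("edge_type must be 'bi', 'in' or 'out'")
    degrees = defaultdict(int)
    for src, dst, weight in edges:
        if edge_type in ("bi", "out"):
            degrees[src] += weight  # the node the edge comes from
        if edge_type in ("bi", "in"):
            degrees[dst] += weight  # the node the edge goes to
    return dict(degrees)


edges = [("a", "b", 2), ("a", "c", 1), ("b", "c", 4)]
both = node_degrees(edges)
outgoing = node_degrees(edges, "out")
incoming = node_degrees(edges, "in")
print(both, outgoing, incoming)
```

Note that in this sketch a node with no edges in the chosen direction is simply absent from the result; whether the library reports such nodes with a degree of 0 is not stated above.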
-
metaknowledge.graphHelpers.
getWeight
(grph, nd1, nd2, weightString='weight', returnType=<class 'int'>)¶ A way of getting the weight of an edge whether or not it has a weight attribute. Returns the value of the weight attribute converted to returnType if it is present, or 1 (also converted) if not.
-
metaknowledge.graphHelpers.
graphStats
(G, stats=('nodes', 'edges', 'isolates', 'loops', 'density', 'transitivity'), makeString=True, sentenceString=False)¶ Returns a string or tuple containing statistics about the graph G.
graphStats() gives 6 different statistics: number of nodes, number of edges, number of isolates, number of loops, density and transitivity. The ones wanted can be given to stats. By default it returns a string giving each stat on a different line. It can also produce a sentence containing all the requested statistics, or the raw values can be accessed instead by setting makeString to False.
Parameters¶
G : networkx Graph
The graph for which the statistics are to be determined.
stats : optional [list or tuple [str]]
Default ('nodes', 'edges', 'isolates', 'loops', 'density', 'transitivity'), a list or tuple containing any number or combination of the strings: "nodes", "edges", "isolates", "loops", "density" and "transitivity".
At least one occurrence of the corresponding string causes the statistic to be provided in the string output. For the non-string (tuple) output, the returned tuple has the same length as the input and each output is at the same index as the string that requested it, e.g.
stats = ("edges", "loops", "edges")
The return is a tuple with 3 elements, the first and last of which are the number of edges and the second is the number of loops.
makeString : optional [bool]
Default True. If True a string is returned, if False a tuple.
sentenceString : optional [bool]
Default False. If True the returned string is a sentence, otherwise each value has a separate line.
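The index-matching rule for the tuple output can be shown with a toy lookup. The values and the `pick_stats` helper below are made up for illustration; the point is only that the result mirrors the order and length of the requested names, repeats included.

```python
# Made-up statistics for a hypothetical graph.
available = {"nodes": 10, "edges": 14, "loops": 1}


def pick_stats(available, stats):
    """Return one value per requested name, in the order requested."""
    return tuple(available[name] for name in stats)


result = pick_stats(available, ("edges", "loops", "edges"))
print(result)
```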
-
metaknowledge.graphHelpers.
mergeGraphs
(targetGraph, addedGraph, incrementedNodeVal='count', incrementedEdgeVal='weight')¶ A quick way of merging graphs. It is only intended for graphs generated by metaknowledge. It does not check anything and as such may cause unexpected results if the source and target were not generated by the same method.
mergeGraphs() will modify targetGraph in place by adding the nodes and edges found in the second graph, addedGraph. If a node or edge already exists, targetGraph is given precedence, but the node and edge attributes given by incrementedNodeVal and incrementedEdgeVal are added together instead of being overwritten.
Parameters¶
targetGraph : networkx Graph
The graph to be modified; it has precedence.
addedGraph : networkx Graph
The graph that is unmodified; it is added and does not have precedence.
incrementedNodeVal : optional [str]
Default 'count', the name of the count attribute for the graph’s nodes. When merging, this attribute will be the sum of the values in the input graphs, instead of targetGraph’s value.
incrementedEdgeVal : optional [str]
Default 'weight', the name of the weight attribute for the graph’s edges. When merging, this attribute will be the sum of the values in the input graphs, instead of targetGraph’s value.
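The precedence rule can be sketched on plain dicts of node attributes. The `merge_attrs` function is a hypothetical illustration of the described behaviour, not the library's code; treating a missing counter on the target side as 0 is an assumption made here.

```python
def merge_attrs(target, added, incremented="count"):
    """Merge `added` into `target` in place, mimicking the described rule."""
    for key, attrs in added.items():
        if key not in target:
            # New node: copy it over with all of its attributes.
            target[key] = dict(attrs)
        elif incremented in attrs:
            # Existing node: target keeps its attributes, except that the
            # incremented value becomes the sum of both inputs.
            target[key][incremented] = (
                target[key].get(incremented, 0) + attrs[incremented]
            )
    return target


target_nodes = {"n1": {"count": 2, "label": "old"}}
added_nodes = {"n1": {"count": 3, "label": "new"}, "n2": {"count": 1}}
merge_attrs(target_nodes, added_nodes)
print(target_nodes)
```

After the merge, `n1` keeps the target's label but has a count of 5, `n2` is added as-is, and `added_nodes` is left untouched.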
-
metaknowledge.graphHelpers.
readGraph
(edgeList, nodeList=None, directed=False, idKey='ID', eSource='From', eDest='To')¶ Reads the files given by edgeList and nodeList and creates a networkx graph for the files.
This is designed only for the files produced by metaknowledge and is meant to be the reverse of writeGraph(), if this does not produce the desired results the networkx builtin networkx.read_edgelist() could be tried as it is aimed at a more general usage.
The read edge list format assumes the column named eSource (default 'From') is the source node, then the column eDest (default 'To') gives the destination, and all other columns are attributes of the edges, e.g. weight.
The read node list format assumes the column idKey (default 'ID') is the ID of the node for the edge list and the resulting network. All other columns are considered attributes of the node, e.g. count.
Note: If the names of the columns do not match those given to readGraph() a KeyError exception will be raised.
Note: If nodes appear in the edgelist but not the nodeList they will be created silently with no attributes.
Parameters¶
edgeList : str
A string giving the path to the edge list file.
nodeList : optional [str]
Default None, a string giving the path to the node list file.
directed : optional [bool]
Default False. If True, the produced network is directed from eSource to eDest.
idKey : optional [str]
Default 'ID', the name of the ID column in the node list.
eSource : optional [str]
Default 'From', the name of the source column in the edge list.
eDest : optional [str]
Default 'To', the name of the destination column in the edge list.
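The edge list layout described above can be parsed with the standard csv module alone. The sketch below is not how readGraph() is implemented; `read_edges` is a hypothetical helper that reads from an in-memory file and returns plain tuples instead of building a networkx graph, which is enough to show how the source, destination and attribute columns are separated.

```python
import csv
import io

# An in-memory stand-in for an edge list file written by metaknowledge.
edge_csv = io.StringIO('"From","To","weight"\n"a","b","2"\n"b","c","1"\n')


def read_edges(handle, e_source="From", e_dest="To"):
    """Return (source, destination, attributes) triples from a CSV handle."""
    edges = []
    for row in csv.DictReader(handle):
        src = row.pop(e_source)  # KeyError if the column name does not match
        dst = row.pop(e_dest)
        edges.append((src, dst, dict(row)))  # remaining columns are attributes
    return edges


edges = read_edges(edge_csv)
print(edges)
```

Values read this way are strings, so a weight column would still need converting before any arithmetic.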
-
metaknowledge.graphHelpers.
writeEdgeList
(grph, name, extraInfo=True, allSameAttribute=False, _progBar=None)¶ Writes an edge list of grph at the destination name.
The edge list has two columns for the source and destination of the edge, 'From' and 'To' respectively, then, if extraInfo is True, another column is created for each attribute of the edges.
Note: If any edges are missing an attribute it will be left blank by default, enable allSameAttribute to cause a KeyError to be raised.
Parameters¶
grph : networkx Graph
The graph to be written to name.
name : str
The name of the file to be written.
extraInfo : optional [bool]
Default True. If True, the attributes of each edge will be written.
allSameAttribute : optional [bool]
Default False. If True, all the edges must have the same attributes or an exception will be raised. If False, the missing attributes will be left blank.
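The blank-versus-KeyError behaviour for missing attributes can be sketched with the csv module. `write_edges` is a hypothetical helper operating on plain triples, and quoting every field is an assumption made for the example rather than a statement about the exact output of writeEdgeList().

```python
import csv
import io

edges = [("a", "b", {"weight": 2, "year": 1990}), ("b", "c", {"weight": 1})]


def write_edges(handle, edges, all_same_attribute=False):
    """Write From/To plus one column per edge attribute seen in `edges`."""
    columns = []
    for _, _, attrs in edges:
        for name in attrs:
            if name not in columns:
                columns.append(name)
    writer = csv.writer(handle, quoting=csv.QUOTE_ALL)
    writer.writerow(["From", "To"] + columns)
    for src, dst, attrs in edges:
        if all_same_attribute:
            values = [attrs[name] for name in columns]  # KeyError if missing
        else:
            values = [attrs.get(name, "") for name in columns]  # left blank
        writer.writerow([src, dst] + values)


out = io.StringIO()
write_edges(out, edges)
text = out.getvalue()
print(text)
```

The second edge has no 'year', so its last field is written as an empty string; calling `write_edges(out, edges, all_same_attribute=True)` on the same data raises a `KeyError` instead.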
-
metaknowledge.graphHelpers.
writeGraph
(grph, name, edgeInfo=True, typing=False, suffix='csv', overwrite=True, allSameAttribute=False)¶ Writes both the edge list and the node attribute list of grph to files starting with name.
The output file names start with name, then the file type (edgeList, nodeAttributes), then, if typing is True, the type of graph (directed or undirected), then the suffix. The default is as follows:
name_fileType.suffix
Both files are CSVs with comma delimiters and double quote quoting characters. The edge list has two columns for the source and destination of the edge, 'From' and 'To' respectively, then, if edgeInfo is True, another column is created for each attribute of the edges. The node list has one column called "ID" with the node ids used by networkx, and all other columns are the node attributes.
To read back these files use readGraph(), and to write only one type of list use writeEdgeList() or writeNodeAttributeFile().
Warning: this function will overwrite files if they are in the way of the output. To prevent this set overwrite to False.
Note: If any nodes or edges are missing an attribute a KeyError will be raised.
Parameters¶
grph : networkx Graph
A networkx graph of the network to be written.
name : str
The start of the file name to be written, can include a path.
edgeInfo : optional [bool]
Default True. If True, the attributes of each edge are written to the edge list.
typing : optional [bool]
Default False. If True, the directedness of the graph will be added to the file names.
suffix : optional [str]
Default "csv", the suffix of the file.
overwrite : optional [bool]
Default True. If True, files will be overwritten silently, otherwise an OSError exception will be raised.
-
metaknowledge.graphHelpers.
writeNodeAttributeFile
(grph, name, allSameAttribute=False, _progBar=None)¶ Writes a node attribute list of grph to the file given by the path name.
The node list has one column called 'ID' with the node ids used by networkx, and all other columns are the node attributes.
Note: If any nodes are missing an attribute it will be left blank by default, enable allSameAttribute to cause a KeyError to be raised.
Parameters¶
grph : networkx Graph
The graph to be written to name.
name : str
The name of the file to be written.
allSameAttribute : optional [bool]
Default False. If True, all the nodes must have the same attributes or an exception will be raised. If False, the missing attributes will be left blank.
-
metaknowledge.graphHelpers.
writeTnetFile
(grph, name, modeNameString, weighted=False, sourceMode=None, timeString=None, nodeIndexString='tnet-ID', weightString='weight')¶ Writes an edge list designed for reading by the R package tnet.
The networkx graph provided must be a pure two-mode network: the modes must be 2 different values for the node attribute accessed by modeNameString, and all edges must be between different node types. Each node will be given an integer id, stored in the attribute given by nodeIndexString; these ids are then written to the file as the endpoints of the edges. Unless sourceMode is given, which mode is the source (first column) and which the target (second column) is random.
Note the grph will be modified by this function, the ids of the nodes will be written to the graph at the attribute nodeIndexString.
Parameters¶
grph : networkx Graph
The graph that will be written to name.
name : str
The path of the file to write.
modeNameString : str
The name of the attribute grph’s modes are stored in.
weighted : optional bool
Default False. If True, the attribute weightString will be written to the weight column.
sourceMode : optional str
Default None. If given, the name of the mode used for the source (first column) in the output file.
timeString : optional str
Default None. If present, the attribute timeString of an edge will be written to the time column surrounded by double quotes (").
Note: the format used by tnet for dates is very strict; it uses the ISO format, down to the second and without time zones.
nodeIndexString : optional str
Default 'tnet-ID', the name of the attribute to save the id for each node.
weightString : optional str
Default 'weight', the name of the weight attribute.
Record is the base of various objects in mk. It is intended to be used with things that have some sort of key-value relationship and is basically a hashable Python dict. It also has a few extra attributes intended to make debugging and record keeping easier.
bad can be set to True to indicate something is wrong, with the issue being saved in error; the exact details are left to the designer.
_sourceFile and _sourceLine store the original file name and line number and are mostly for improving error messages.
_id should be a unique string that preferably can be used to identify the record from its source, although that is not always possible, so do your best. It is also what is used for hashing and comparison.
_fieldDict contains the base mapping of keys to values; it is the dictionary.
ExtendedRecord is what WOSRecord and its ilk inherit from. It extends Record by adding memoizing and processing of the fields.
ExtendedRecord cannot be invoked directly as it has many abstract (virtual) methods that define how the tags are to be processed, what they are called, what encoding to use when writing to disk, etc.
-
metaknowledge.mkRecord.
_bibFormatter
(s, maxLength)¶ Formats a string, list or number to make it good for a bib file by:
* splitting up the string correctly if it is too long
* trying to use the best quoting characters
* expanding lists into ' and ' separated values, as per the spec for the authors field
Note, this does not escape characters. LaTeX may have issues with the output.
Max length splitting derived from https://www.cs.arizona.edu/~collberg/Teaching/07.231/BibTeX/bibtex.html
-
metaknowledge.recordCollection.
addToNetwork
(grph, nds, count, weighted, nodeType, nodeInfo, fullInfo, coreCitesDict, coreValues, detailedValues, addCR, recordToCite=True, headNd=None)¶ Adds the citations nds to grph, according to the rules given by nodeType, fullInfo, etc.
headNd is the citation of the Record.
-
metaknowledge.recordCollection.
expandRecs
(G, RecCollect, nodeType, weighted)¶ Expand all the citations from RecCollect
-
metaknowledge.recordCollection.
makeID
(citation, nodeType)¶ Makes the id, of the correct type for the network
-
metaknowledge.recordCollection.
makeNodeTuple
(citation, idVal, nodeInfo, fullInfo, nodeType, count, coreCitesDict, coreValues, detailedValues, addCR)¶ Makes a tuple of idVal and a dict of the selected attributes
-
metaknowledge.genders.nameGender.
nameStringGender
(s, noExcept=False)¶ Expects
first, last
Exceptions¶
The exceptions defined by metaknowledge are:
-
exception
metaknowledge.mkExceptions.
BadCitation
¶ Exception thrown by Citation
-
exception
metaknowledge.mkExceptions.
BadGrant
¶
-
exception
metaknowledge.mkExceptions.
BadInputFile
¶
-
exception
metaknowledge.mkExceptions.
BadProQuestFile
¶
-
exception
metaknowledge.mkExceptions.
BadProQuestRecord
¶
-
exception
metaknowledge.mkExceptions.
BadPubmedFile
¶
-
exception
metaknowledge.mkExceptions.
BadPubmedRecord
¶
-
exception
metaknowledge.mkExceptions.
BadRecord
¶
-
exception
metaknowledge.mkExceptions.
BadScopusFile
¶
-
exception
metaknowledge.mkExceptions.
BadScopusRecord
¶
-
exception
metaknowledge.mkExceptions.
BadWOSFile
¶ Exception thrown by wosParser for misformatted files
-
exception
metaknowledge.mkExceptions.
BadWOSRecord
¶ Exception thrown by the record parser to indicate a misformatted record. This occurs when some component of the record does not parse. The messages will be any of:
* Missing field on line (line Number):(line), which indicates a line was too short; there should have been a tag followed by information.
* End of file reached before ER, which indicates the file ended before the 'ER' indicator appeared; 'ER' indicates the end of a record. This is often due to a copy and paste error.
* Duplicate tags in record, which indicates the record had 2 or more lines with the same tag.
* Missing WOS number, which indicates the record did not have a 'UT' tag.
Records with a BadWOSRecord error are likely incomplete or the combination of two or more single records.
-
exception
metaknowledge.mkExceptions.
CollectionTypeError
¶
-
exception
metaknowledge.mkExceptions.
GenderException
¶
-
exception
metaknowledge.mkExceptions.
GrantCollectionException
¶
-
exception
metaknowledge.mkExceptions.
JournalDataBaseError
¶
-
exception
metaknowledge.mkExceptions.
RCTypeError
¶
-
exception
metaknowledge.mkExceptions.
RCValueError
¶
-
exception
metaknowledge.mkExceptions.
RecordsNotCompatible
¶
-
exception
metaknowledge.mkExceptions.
TagError
¶
-
exception
metaknowledge.mkExceptions.
UnknownFile
¶
-
exception
metaknowledge.mkExceptions.
cacheError
¶ Exception raised when loading a cached RecordCollection fails, should only be seen inside metaknowledge and always be caught.
-
exception
metaknowledge.mkExceptions.
mkException
¶
Examples¶
Note: for a more recent example of using metaknowledge, please visit the NetLab blog.
metaknowledge is a python library for creating and analyzing scientific metadata. It uses records obtained from Web of Science (WOS), Scopus and other sources. It is intended to be usable by those who do not know much python. This page will be a short overview of its capabilities, to allow you to use it for your own work.
This document was made from a jupyter notebook, if you know how to use them, you can download the notebook here and the sample file is here if you wish to have an interactive version of this page. Now let’s begin.
About Jupyter Notebooks¶
This document was made from a jupyter notebook and can show and run python code. The document is broken up into what are called cells, each cell is either code, output, or markdown (text). For example this cell is markdown, which means it is plain text with a couple small formatting things, like the link in the first sentence. You can change the cell type using the dropdown menu at the top of the page.
[1]:
#This cell is python
#The cell below it is output
print("This is an output cell")
This is an output cell
The code cells contain python code that you can edit and run yourself. Try changing the one above.
Importing¶
First you need to import the metaknowledge package
[2]:
import metaknowledge as mk
And you will often need the networkx package
[3]:
import networkx as nx
And matplotlib to display the graphs and to make them look nice when displayed
[4]:
import matplotlib.pyplot as plt
%matplotlib inline
metaknowledge also has a matplotlib based graph visualizer that will be used sometimes
[5]:
import metaknowledge.visual as mkv
These lines of code will be at the top of all the other lessons as they are what let us use metaknowledge.
Reading Files¶
First we need to import metaknowledge like we saw in lesson 1.
[1]:
import metaknowledge as mk
We only need metaknowledge for now, so there is no need to import everything.
The files from the Web of Science (WOS) can be loaded into a RecordCollection by creating a RecordCollection with the path to the files given to it as a string.
[2]:
RC = mk.RecordCollection("savedrecs.txt")
repr(RC)
[2]:
'savedrecs'
You can also read a whole directory, in this case it is reading the current working directory
[3]:
RC = mk.RecordCollection(".")
repr(RC)
[3]:
'files-from-.'
metaknowledge can detect if a file is a valid WOS file or not and will read the entire directory and load only those that have the right header. You can also tell it to only read a certain type of file, by using the extension argument.
[4]:
RC = mk.RecordCollection(".", extension = "txt")
repr(RC)
[4]:
'txt-files-from-.'
Now you have a RecordCollection
composed of all the WOS records in the selected file(s).
[5]:
print("RC is a " + str(RC))
RC is a Collection of 32 records
You might have noticed I used two different ways to display the RecordCollection. repr(RC) will give you where metaknowledge thinks the collection came from, while str(RC) will give you a nice string containing the number of Records.
Objects¶
In Python everything is an object, thus everything metaknowledge produces is an object. There are three objects that have been created specifically for it; the types of objects created this way are called classes. The three are Record, a single WOS record; RecordCollection, a group of Records; and Citation, a single WOS citation.
Let's import metaknowledge and read a file
[1]:
import metaknowledge as mk
RC = mk.RecordCollection('../savedrecs.txt') # '..' is one directory above the current one
Now we can look at how the different objects relate to this file.
Record
object¶
Record is an object that contains a simple WOS record, for example a journal article, book, or conference proceedings. They are what RecordCollections contain. To see an individual Record at random from a RecordCollection you can use peak()
[2]:
R = RC.peak()
A single Record can give you all the information it contains about its record, for example its authors.
[3]:
print(R.authorsFull)
print(R.AF)
['BREVIK, I']
['BREVIK, I']
Converting a Record
to a string will give its title
[4]:
print(R)
EXPERIMENTS IN PHENOMENOLOGICAL ELECTRODYNAMICS AND THE ELECTROMAGNETIC ENERGY-MOMENTUM TENSOR
If you try to access a tag the Record does not have, it will return None
[5]:
print(R.GP)
None
There are two ways of getting each tag, one is using the WOS 2 letter abbreviation and the second is to use the human readable name. There is no standard for the human readable names, so they are specific to metaknowledge. To see how the WOS names map to the long names look at tagFuncs. If you want all the tags a Record
has use iter.
[6]:
print(R.__iter__())
['PT', 'AU', 'AF', 'TI', 'SO', 'LA', 'DT', 'C1', 'CR', 'NR', 'TC', 'Z9', 'PU', 'PI', 'PA', 'SN', 'J9', 'JI', 'PY', 'VL', 'IS', 'BP', 'EP', 'DI', 'PG', 'WC', 'SC', 'GA', 'UT']
RecordCollection
object¶
RecordCollection is the object that metaknowledge uses the most. It is your interface with the data you want.
To iterate over all of the Records
you can use a for loop
[7]:
for R in RC:
    print(R)
EXPERIMENTS IN PHENOMENOLOGICAL ELECTRODYNAMICS AND THE ELECTROMAGNETIC ENERGY-MOMENTUM TENSOR
OBSERVATION OF SHIFTS IN TOTAL REFLECTION OF A LIGHT-BEAM BY A MULTILAYERED STRUCTURE
ANGULAR SPECTRUM AS AN ELECTRICAL NETWORK
SHIFTS OF COHERENT-LIGHT BEAMS ON REFLECTION AT PLANE INTERFACES BETWEEN ISOTROPIC MEDIA
DISCUSSIONS OF PROBLEM OF PONDEROMOTIVE FORCES
A Novel Method for Enhancing Goos-Hanchen Shift in Total Internal Reflection
Optical properties of nanostructured thin films
Simple technique for measuring the Goos-Hanchen effect with polarization modulation and a position-sensitive detector
CONSERVATION OF ANGULAR MOMENT WITH SIX COMPONENTS AND ASYMMETRICAL IMPULSE ENERGY TENSORS
INTERFERENCE THEORY OF REFLECTION FROM MULTILAYERED MEDIA
Longitudinal and transverse effects of nonspecular reflection
TRANSVERSE DISPLACEMENT OF A TOTALLY REFLECTED LIGHT-BEAM AND PHASE-SHIFT METHOD
MECHANICAL INTERPRETATION OF SHIFTS IN TOTAL REFLECTION OF SPINNING PARTICLES
WHY ENERGY FLUX AND ABRAHAMS PHOTON MOMENTUM ARE MACROSCOPICALLY SUBSTITUTED FOR MOMENTUM DENSITY AND MINKOWSKIS PHOTON MOMENTUM
SPIN ANGULAR-MOMENTUM OF A FIELD INTERACTING WITH A PLANE INTERFACE
Numerical study of the displacement of a three-dimensional Gaussian beam transmitted at total internal reflection. Near-field applications
LONGITUDINAL AND TRANSVERSE DISPLACEMENTS OF A BOUNDED MICROWAVE BEAM AT TOTAL INTERNAL-REFLECTION
EXCHANGED MOMENTUM BETWEEN MOVING ATOMS AND A SURFACE-WAVE - THEORY AND EXPERIMENT
ASYMMETRICAL MOMENTUM-ENERGY TENSORS AND 6-COMPONENT ANGULAR-MOMENTUM IN PROBLEM CONCERNING 2 PHOTON MOMENTA AND MAGNETODYNAMIC EFFECT PROBLEM
Experimental observation of the Imbert-Fedorov transverse displacement after a single total reflection
RESONANCE EFFECTS ON TOTAL INTERNAL-REFLECTION AND LATERAL (GOOS-HANCHEN) BEAM DISPLACEMENT AT THE INTERFACE BETWEEN NONLOCAL AND LOCAL DIELECTRIC
Goos-Hanchen shift as a probe in evanescent slab waveguide sensors
THEORETICAL NOTES ON AMPLIFICATION OF TRANSVERSE SHIFT BY TOTAL REFLECTION ON MULTILAYERED SYSTEM
INTERNAL PHOTON IMPULSE OF DIELECTRIC AND ON COUPLE APPLIED TO ANISOTROPIC CRYSTAL
SPIN ANGULAR-MOMENTUM OF A FIELD INTERACTING WITH A PLANE INTERFACE
CALCULATION AND MEASUREMENT OF FORCES AND TORQUES APPLIED TO UNIAXIAL CRYSTAL BY EXTRAORDINARY WAVE
Goos-Hanchen and Imbert-Fedorov shifts for leaky guided modes
PREDICTION OF A RESONANCE-ENHANCED LASER-BEAM DISPLACEMENT AT TOTAL INTERNAL-REFLECTION IN SEMICONDUCTORS
GENERAL STUDY OF DISPLACEMENTS AT TOTAL REFLECTION
NONLINEAR TOTALLY REFLECTING PRISM COUPLER - THERMOMECHANIC EFFECTS AND INTENSITY-DEPENDENT REFRACTIVE-INDEX OF THIN-FILMS
DISPLACEMENT OF A TOTALLY REFLECTED LIGHT-BEAM - FILTERING OF POLARIZATION STATES AND AMPLIFICATION
Transverse displacement at total reflection near the grazing angle: a way to discriminate between theories
The individual Records are indexed by their WOS numbers, so you can access a specific one in the collection if you know its number.
[8]:
RC.getWOS("WOS:A1979GV55600001")
[8]:
<metaknowledge.record.Record at 0x7f07784be860>
Citation
object¶
Citation is an object to contain the results of parsing a citation. They can be created from a Record
[9]:
Cite = R.createCitation()
print(Cite)
Pillon F, 2005, APPL PHYS B-LASERS O, V80, P355, DOI 10.1007/s00340-005-1728-2
Citations
allow for the raw strings of citations to be manipulated easily by metaknowledge.
Filtering¶
The for loop shown above is the main way to filter a RecordCollection. There are a few builtin filters, e.g. yearSplit(), but the for loop is an easily generalized way of filtering that is relatively simple to read, so it is the main way you should filter. An example of the workflow is as follows:
First create a new RecordCollection
[10]:
RCfiltered = mk.RecordCollection()
Then add the records that meet your condition, in this case that their titles start with 'A'
[11]:
for R in RC:
    if R.title[0] == 'A':
        RCfiltered.addRec(R)
[12]:
print(RCfiltered)
Collection of 3 records
Now you have a RecordCollection RCfiltered
of all the Records
whose titles begin with 'A'
.
One note about implementing this, the above code does not handle the case in which the title is missing i.e. R.title
is None
. You will have to deal with this on your own.
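One way to guard against a missing title is to test for None before looking at the string. The sketch below uses simple stand-in objects rather than real Records, so only the `title` attribute is assumed; in the notebook the same check would go in the `if` before the Record is added to RCfiltered.

```python
from types import SimpleNamespace

# Stand-ins for Records: only the `title` attribute matters here.
records = [
    SimpleNamespace(title="ANGULAR SPECTRUM AS AN ELECTRICAL NETWORK"),
    SimpleNamespace(title=None),
    SimpleNamespace(title="Optical properties of nanostructured thin films"),
]


def title_starts_with(record, prefix):
    """True only if the record has a title and it begins with prefix."""
    title = record.title
    return title is not None and title.startswith(prefix)


filtered = [r for r in records if title_starts_with(r, "A")]
print(len(filtered))
```

Using `startswith` rather than `title[0]` also avoids an `IndexError` if a title happens to be an empty string.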
Two builtin functions to filter collections are yearSplit() and localCitesOf(). To get a RecordCollection of all Records between 1970 and 1979:
[13]:
RC70 = RC.yearSplit(1970, 1979)
print(RC70)
Collection of 19 records
The second function, localCitesOf(), takes an object that a Citation can be created from and returns a RecordCollection of all the Records that cite it. So to see all the records that cite "Yariv A., 1971, INTRO OPTICAL ELECTR":
[14]:
RCintroOpt = RC.localCitesOf("Yariv A., 1971, INTRO OPTICAL ELECTR")
print(RCintroOpt)
Collection of 1 records
Exporting RecordCollections¶
Now that you have a filtered RecordCollection, you can write it to a file with writeFile():
[15]:
RCfiltered.writeFile("Records_Starting_with_A.txt")
The written file is identical to one of those produced by WOS.
If you wish to have a more useful file, use writeCSV(), which creates a CSV file with all the tags as columns and the Records as rows. If you only care about a few tags, the onlyTheseTags argument lets you control which tags are written.
[16]:
selectedTags = ['TI', 'UT', 'CR', 'AF']
This will give only the title, WOS number, citations, and authors.
[17]:
RCfiltered.writeCSV("Records_Starting_with_A.csv", onlyTheseTags = selectedTags)
The last export feature is for using metaknowledge with other packages, in particular pandas (which you will learn about later), although others should also work. makeDict() creates a dictionary with tags as keys and lists as values, where each index of the lists corresponds to a Record. pandas can accept this dictionary directly to make a DataFrame.
[18]:
import pandas
recDataFrame = pandas.DataFrame(RC.makeDict())
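To make the shape of such a dictionary concrete, here is a minimal sketch in plain Python. It does not reproduce makeDict() itself: make_tag_dict is a hypothetical helper, the records are made-up dictionaries, and filling a missing tag with None is an assumption rather than the library's documented behaviour.

```python
def make_tag_dict(records):
    """Build {tag: [value for each record]} with one list position per record."""
    # Collect every tag that appears in at least one record.
    tags = sorted({tag for rec in records for tag in rec})
    # Each list has the same length as records, so position i always
    # refers to records[i]; a missing tag is filled with None.
    return {tag: [rec.get(tag) for rec in records] for tag in tags}


records = [
    {"TI": "Title one", "PY": 1979, "AF": ["Author, A"]},
    {"TI": "Title two", "PY": 1981},  # this record has no 'AF' tag
]

table = make_tag_dict(records)
print(table["TI"])
print(table["AF"])
```

Because every list has the same length, a dictionary of this shape can be handed to pandas.DataFrame, where each key becomes a column and each list position becomes a row.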
Making a network¶
For this class, most of the types of network you will want to make can be produced by metaknowledge. The first three (co-citation network, citation network and co-author network) are specialized versions of the last three (one-mode network, two-mode network and multi-mode network).
First we need to import metaknowledge and, because we will be dealing with graphs, the graph package networkx should be imported as well:
[1]:
import metaknowledge as mk
import networkx as nx
And so we can visualize the graphs
[2]:
import matplotlib.pyplot as plt
%matplotlib inline
import metaknowledge.contour.plotting as mkv
Before we start we should also get a RecordCollection to work with.
[3]:
RC = mk.RecordCollection('../savedrecs.txt')
Now let's look at the different types of graph.
Making a co-citation network¶
To make a basic co-citation network of Records use networkCoCitation().
[4]:
CoCitation = RC.networkCoCitation()
print(mk.graphStats(CoCitation, makeString = True)) # makeString is True by default, so passing it is not strictly necessary
The graph has 601 nodes, 19492 edges, 0 isolates, 4 self loops, a density of 0.108109 and a transitivity of 0.691662
graphStats() is a function that extracts some of the statistics of a graph and makes them into a nice string.
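The density figure in such a string can be checked by hand. The sketch below assumes the usual definition of density, the number of edges divided by the number of possible edges between distinct nodes; the helper name density is ours and is not part of metaknowledge. Under that assumption, the node and edge counts printed above reproduce the printed density.

```python
def density(n_nodes, n_edges, directed=False):
    """Edges divided by the number of possible edges between distinct nodes."""
    if n_nodes < 2:
        return 0.0
    possible = n_nodes * (n_nodes - 1)
    if not directed:
        # An undirected pair (u, v) is the same edge as (v, u).
        possible /= 2
    return n_edges / possible


# Counts taken from the co-citation graph statistics shown above.
print(f"{density(601, 19492):.6f}")
```

For a directed graph, such as a citation network, the same counts are compared against n * (n - 1) possible edges instead of half that number.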
CoCitation is now a networkx graph of the co-citation network, with the hashes of the Citations as nodes and the full citations stored as an attribute. Let's look at one node:
[5]:
list(CoCitation.nodes(data = True))[0] # list() also works on newer networkx, where nodes() returns a view
[5]:
(5308678917494226943,
{'count': 1, 'info': 'CAVALLERI G, 1974, LETT NUOVO CIMENTO, V12, P626'})
and an edge
[6]:
list(CoCitation.edges(data = True))[0] # list() also works on newer networkx, where edges() returns a view
[6]:
(5308678917494226943, 7204849785423671553, {'weight': 1})
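To make the count and weight attributes more concrete, here is a minimal sketch of the standard co-citation idea: two works are linked whenever they appear in the same reference list, and the edge weight is the number of reference lists that contain both. It uses plain Python on made-up labels and is not metaknowledge's implementation; in particular, reading count as the number of reference lists a work appears in is an assumption.

```python
from collections import Counter
from itertools import combinations

# Each inner list stands for the reference list of one citing record.
reference_lists = [
    ["cite A", "cite B", "cite C"],
    ["cite A", "cite B"],
    ["cite C"],
]

node_counts = Counter()   # how many reference lists mention each work
edge_weights = Counter()  # how many reference lists mention both works

for refs in reference_lists:
    # Deduplicate and sort so each pair has one canonical orientation.
    unique_refs = sorted(set(refs))
    node_counts.update(unique_refs)
    edge_weights.update(combinations(unique_refs, 2))

for (first, second), weight in sorted(edge_weights.items()):
    print(first, second, weight)
```

A record with a single reference adds to the node counts but creates no edges, which is one way isolates can appear in a co-citation network.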
All the graphs metaknowledge uses are networkx graphs. A few functions to trim them are implemented in metaknowledge (see Post processing graphs for examples), but many more useful functions are implemented by networkx itself. Read the networkx documentation for more information.
The networkCoCitation() function has many options for filtering and determining the nodes. The default is to use the Citations themselves. If you wanted to make a network of co-citations of journals, you would have to make the node type 'journal' and remove the non-journals.
[7]:
coCiteJournals = RC.networkCoCitation(nodeType = 'journal', dropNonJournals = True)
print(mk.graphStats(coCiteJournals))
The graph has 89 nodes, 1383 edges, 0 isolates, 40 self loops, a density of 0.353166 and a transitivity of 0.640306
Let's take a look at the graph after a quick spring layout:
[8]:
nx.draw_spring(coCiteJournals)

A bit basic, but it gives a general idea. If you want to make a much better looking and more informative visualization you could try Gephi or Visone. Exporting to them is covered below in Exporting graphs.
Making a citation network¶
The networkCitation() method is nearly identical to networkCoCitation() in its parameters. It has one additional keyword argument, directed, that controls whether it produces a directed network. Read Making a co-citation network to learn more about the options networkCitation() accepts.
One small example is still worth providing. If you want to make a network of the citations of years by other years, keeping only the citations that have the letter 'A' in them, then you would write:
[9]:
citationsA = RC.networkCitation(nodeType = 'year', keyWords = ['A'])
print(mk.graphStats(citationsA))
The graph has 18 nodes, 24 edges, 0 isolates, 1 self loops, a density of 0.0784314 and a transitivity of 0.0344828
[10]:
nx.draw_spring(citationsA, with_labels = True)

Making a co-author network¶
The networkCoAuthor() function produces the co-authorship network of the RecordCollection and is used as shown:
[11]:
coAuths = RC.networkCoAuthor()
print(mk.graphStats(coAuths))
The graph has 45 nodes, 46 edges, 9 isolates, 0 self loops, a density of 0.0464646 and a transitivity of 0.822581
Making a one-mode network¶
In addition to the specialized network generators, metaknowledge lets you make a one-mode co-occurrence network of any of the WOS tags with the oneModeNetwork() function. For example, the WOS subject tag 'WC' can be examined:
[12]:
wcCoOccurs = RC.oneModeNetwork('WC')
print(mk.graphStats(wcCoOccurs))
The graph has 9 nodes, 3 edges, 3 isolates, 0 self loops, a density of 0.0833333 and a transitivity of 0
[13]:
nx.draw_spring(wcCoOccurs, with_labels = True)

Making a two-mode network¶
If you wish to study the relationships between two tags you can use the twoModeNetwork() function, which creates a two-mode network showing the connections between the tags. For example, to look at the connections between titles ('TI') and subjects ('WC'):
[14]:
ti_wc = RC.twoModeNetwork('WC', 'title')
print(mk.graphStats(ti_wc))
The graph has 40 nodes, 35 edges, 0 isolates, 0 self loops, a density of 0.0448718 and a transitivity of 0
The network is directed by default with the first tag going to the second.
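The direction rule can be pictured with a small sketch in plain Python. This is an illustration on made-up records, not the library's code: two_mode_edges and as_list are hypothetical helpers, and the assumption is simply that every value of the first tag in a record points to every value of the second tag in the same record.

```python
def as_list(value):
    """Normalise a tag value so single strings and lists are handled alike."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def two_mode_edges(records, first_tag, second_tag):
    """Return directed (first_tag value, second_tag value) pairs."""
    edges = set()
    for rec in records:
        for source in as_list(rec.get(first_tag)):
            for target in as_list(rec.get(second_tag)):
                edges.add((source, target))
    return edges


records = [
    {"WC": ["Optics", "Physics, Applied"], "TI": "Title one"},
    {"WC": ["Optics"], "TI": "Title two"},
    {"TI": "Title three"},  # no subject tag, so it contributes no edges
]

edges = two_mode_edges(records, "WC", "TI")
for source, target in sorted(edges):
    print(source, "->", target)
```

Every edge joins a value of one tag to a value of the other and never two values of the same tag, which is why a two-mode graph contains no triangles and reports a transitivity of 0.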
[15]:
mkv.quickVisual(ti_wc, showLabel = False) #default is False as there are usually lots of labels

quickVisual() makes a graph with the different types of nodes coloured differently and a couple of other small visual tweaks compared to networkx’s draw_spring.
Making a multi-mode network¶
For any number of tags, the nModeNetwork() function does the same thing as oneModeNetwork(), and it keeps track of the tags' types. So to look at the co-occurrence of titles ('TI'), WOS numbers ('UT') and authors ('AU'):
[16]:
tags = ['TI', 'UT', 'AU']
multiModeNet = RC.nModeNetwork(tags)
mk.graphStats(multiModeNet)
[16]:
'The graph has 108 nodes, 163 edges, 0 isolates, 0 self loops, a density of 0.0282105 and a transitivity of 0.443946'
[17]:
mkv.quickVisual(multiModeNet)

Beware, this can very easily produce hairballs:
[18]:
tags = mk.tagsAndNames #All the tags, twice
sillyMultiModeNet = RC.nModeNetwork(tags)
mk.graphStats(sillyMultiModeNet)
[18]:
'The graph has 1184 nodes, 59573 edges, 0 isolates, 1184 self loops, a density of 0.0850635 and a transitivity of 0.492152'
[19]:
mkv.quickVisual(sillyMultiModeNet)

Post processing graphs¶
If you wish to apply a well known algorithm or process to a graph, networkx is a good place to look, as it implements many of them well.
One feature it lacks, though, is pruning of graphs; metaknowledge has these capabilities. To remove edges outside of some weight range, use dropEdges(). For example, to remove the self loops, edges with weight less than 3 and edges with weight higher than 10 from coCiteJournals:
[20]:
minWeight = 3
maxWeight = 10
proccessedCoCiteJournals = mk.dropEdges(coCiteJournals, minWeight, maxWeight, dropSelfLoops = True)
mk.graphStats(proccessedCoCiteJournals)
[20]:
'The graph has 89 nodes, 466 edges, 1 isolates, 0 self loops, a density of 0.118999 and a transitivity of 0.213403'
Then to remove all the isolates, i.e. nodes with degree less than 1, use dropNodesByDegree()
[21]:
proccessedCoCiteJournals = mk.dropNodesByDegree(proccessedCoCiteJournals, 1)
mk.graphStats(proccessedCoCiteJournals)
[21]:
'The graph has 88 nodes, 466 edges, 0 isolates, 0 self loops, a density of 0.121735 and a transitivity of 0.213403'
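The two pruning steps can be sketched on a plain dictionary of weighted edges. This is an illustration only, not the code behind dropEdges() or dropNodesByDegree(): the helper names are ours, the data is made up, and keeping edges whose weight equals either bound is an assumption.

```python
def drop_edges(edges, min_weight, max_weight, drop_self_loops=True):
    """Keep edges whose weight lies in [min_weight, max_weight]."""
    kept = {}
    for (u, v), weight in edges.items():
        if drop_self_loops and u == v:
            continue
        if min_weight <= weight <= max_weight:
            kept[(u, v)] = weight
    return kept


def find_isolates(nodes, edges):
    """Return the nodes that are not an endpoint of any remaining edge."""
    connected = {node for pair in edges for node in pair}
    return set(nodes) - connected


nodes = {"A", "B", "C", "D"}
edges = {
    ("A", "A"): 5,   # self loop
    ("A", "B"): 1,   # too light
    ("A", "C"): 3,
    ("B", "C"): 4,
    ("C", "D"): 12,  # too heavy
}

pruned = drop_edges(edges, min_weight=3, max_weight=10)
isolates = find_isolates(nodes, pruned)
print(sorted(pruned), sorted(isolates))
```

Dropping edges can leave a node with no neighbours, which is why removing isolates is done as a second step after the edge filter.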
The graph before the processing is shown above in Making a co-citation network. After the processing it looks like this:
[22]:
nx.draw_spring(proccessedCoCiteJournals)

Hm, it looks a bit thinner. Using a visualizer will make the difference a bit more noticeable.
Exporting graphs¶
Now that you have a graph, the last step is to write it to disk. networkx has a few ways of doing this, but they tend to be slow. metaknowledge can write an edge list and a node attribute file that together contain all the information of the graph. The function to do this is called writeGraph(). You give it the start of the file name and it makes two labeled files containing the graph.
[23]:
mk.writeGraph(proccessedCoCiteJournals, "FinalJournalCoCites")
These files are simple CSVs and can be read easily by most systems. If you want to read them back into Python, the readGraph() function will do that.
[24]:
FinalJournalCoCites = mk.readGraph("FinalJournalCoCites_edgeList.csv", "FinalJournalCoCites_nodeAttributes.csv")
mk.graphStats(FinalJournalCoCites)
[24]:
'The graph has 88 nodes, 466 edges, 0 isolates, 0 self loops, a density of 0.121735 and a transitivity of 0.213403'
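The idea of storing a graph as an edge list plus a node attribute file can be sketched with the standard csv module. This is not what writeGraph() and readGraph() do internally: the helper names, the column headers (From, To, weight, ID, count) and the journal labels are all assumptions made for the illustration; only the two file name suffixes are taken from the example above.

```python
import csv
import os
import tempfile


def write_graph_csv(edges, node_attrs, prefix):
    """Write an edge list file and a node attribute file sharing one prefix."""
    edge_path = prefix + "_edgeList.csv"
    node_path = prefix + "_nodeAttributes.csv"
    with open(edge_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["From", "To", "weight"])  # assumed column names
        for (u, v), weight in sorted(edges.items()):
            writer.writerow([u, v, weight])
    with open(node_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["ID", "count"])  # assumed column names
        for node, count in sorted(node_attrs.items()):
            writer.writerow([node, count])
    return edge_path, node_path


def read_graph_csv(edge_path, node_path):
    """Read both files back into the same plain-dictionary form."""
    with open(edge_path, newline="") as f:
        edges = {(row["From"], row["To"]): int(row["weight"])
                 for row in csv.DictReader(f)}
    with open(node_path, newline="") as f:
        node_attrs = {row["ID"]: int(row["count"])
                      for row in csv.DictReader(f)}
    return edges, node_attrs


edges = {("Journal A", "Journal B"): 4, ("Journal A", "Journal C"): 2}
node_attrs = {"Journal A": 6, "Journal B": 4, "Journal C": 2}

with tempfile.TemporaryDirectory() as tmp:
    paths = write_graph_csv(edges, node_attrs, os.path.join(tmp, "SketchGraph"))
    edges_back, attrs_back = read_graph_csv(*paths)

print(edges_back == edges, attrs_back == node_attrs)
```

Keeping node attributes in their own file means nodes without any edges are still recorded, which an edge list alone cannot do.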
This is a full example workflow for metaknowledge. The package is flexible, and hopefully you will be able to customize it to do what you want (I assume you do not want the Records starting with ‘A’).
Command Line Tool¶
metaknowledge comes with a command-line application named metaknowledge. This provides a simple interface to the Python package and allows the generation of most of the networks, along with ways to manage the records themselves.
Overview¶
To start the tool run:
$ metaknowledge
You will be asked for the location of the file or files to use. These can be given by paths to the files or paths to directories with the files. Note: if a directory is used all files with the proper header will be read.
You will then be asked what to do with the records:
A collection of 537 WOS records has been created
What do you wish to do with it:
1) Make a graph
2) Write the collection as a single WOS style file
3) Write the collection as a single WOS style file and make a graph
4) Write the collection as a single csv file
5) Write the collection as a single csv file and make a graph
6) Write all the citations to a single file
7) Go over non-journal citations
i) open python console
q) quit
What is your selection:
Select the option you want by typing the corresponding number or character and pressing enter. The menus after this step are controlled this way as well.
The second-to-last option, i), will start an interactive Python session with all the objects you have created thus far accessible; their names will be given when it starts.
The last option, q), will cause the program to exit. You can also quit at any time by pressing ctrl-c.
Questions?¶
If you find bugs, or have questions, please write to: