Functions¶
-
metaknowledge.citation.
filterNonJournals
(citesLst, invert=False) Removes the Citations from citesLst that are not journals.
Parameters¶
citesLst :
list [Citation]
A list of citations to be filtered.
invert :
optional [bool]
Default False, if True non-journals will be kept instead of journals.
-
metaknowledge.constants.
isInteractive
()¶ A basic check of whether the program is running in interactive mode.
-
metaknowledge.diffusion.
diffusionAddCountsFromSource
(grph, source, target, nodeType='citations', extraType=None, diffusionLabel='DiffusionCount', extraKeys=None, countsDict=None, extraMapping=None)¶ Runs a diffusion using diffusionCount() and updates grph with the result, using the nodes in the graph as the keys in the diffusion, i.e. the source. The name of the attribute the counts are added to is given by diffusionLabel. If the graph is not composed of citations from the source and was instead built from another tag, nodeType needs to be given the tag string.
Parameters¶
grph :
networkx Graph
The graph to be updated.
source :
RecordCollection
The RecordCollection that created grph.
target :
RecordCollection
The RecordCollection that will be counted.
nodeType :
optional [str]
Default 'citations', the tag that contains the values used to create grph.
Returns¶
dict[:int]
The counts dictionary used to add values to grph. Note that grph is modified by the function; the dictionary is returned in case you need it.
-
metaknowledge.diffusion.
diffusionCount
(source, target, sourceType='raw', extraValue=None, pandasFriendly=False, compareCounts=False, numAuthors=True, useAllAuthors=True, _ProgBar=None, extraMapping=None)¶ Takes in two RecordCollections and produces a dict counting the citations of source by the Records of target. By default the dict uses Record objects as keys but this can be changed with the sourceType keyword to any of the WOS tags.
Parameters¶
source :
RecordCollection
A metaknowledge RecordCollection containing the Records being cited.
target :
RecordCollection
A metaknowledge RecordCollection containing the Records citing those in source.
sourceType :
optional [str]
Default 'raw', if 'raw' the returned dict will contain Records as keys. If it is a WOS tag the keys will be of that type.
pandasFriendly :
optional [bool]
Default False, makes the output a dict with two keys: "Record" is the list of Records (or the data type requested by sourceType) and "Counts" is the list of their occurrence counts. The lists are the same length.
compareCounts :
optional [bool]
Default False, if True the diffusion analysis will be run twice, first with source and target set up as in the default (global scope), then using only the source RecordCollection (local scope).
extraValue :
optional [str]
Default None, if a tag the returned dictionary will have Records mapped to maps, and these maps will map the entries for the tag to counts. If pandasFriendly is also True the resultant dictionary will have an additional column called 'year'. This column will contain the year the citations occurred; in addition, the Records entries will be duplicated for each year they occur in.
For example, if 'year' was given then the count for a single Record could be {1990 : 1, 2000 : 5}.
useAllAuthors :
optional [bool]
Default True, if False only the first author will be used to generate the Citations for the source Records.
Returns¶
dict[:int]
A dictionary with the type given by sourceType as keys and integers as values.
If compareCounts is True the values are tuples with the first integer being the diffusion in the target and the second the diffusion in the source.
If pandasFriendly is True the returned dict has keys with the names of the WOS tags and lists with their values, i.e. a table with labeled columns. The counts are in the column named "TargetCount" and, if compareCounts is True, the local count is in a column called "SourceCount".
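The relationship between the nested extraValue output and the table-like pandasFriendly output can be sketched in plain Python. This is only an illustration of the shape described above and does not call metaknowledge; the function name flatten_counts, the "Count" column name and the strings standing in for Records are all assumptions made for the example.

```python
def flatten_counts(nested, extra_label="year"):
    """Flatten {record: {value: count}} into a dict of equal-length columns.

    The column names are illustrative assumptions, not metaknowledge's own.
    """
    table = {"Record": [], extra_label: [], "Count": []}
    for record, per_value in nested.items():
        # One row per (record, value) pair, so a record is repeated
        # once for every value it has a count for.
        for value, count in sorted(per_value.items()):
            table["Record"].append(record)
            table[extra_label].append(value)
            table["Count"].append(count)
    return table


# Plain strings stand in for Record objects here.
nested_counts = {"rec-A": {1990: 1, 2000: 5}, "rec-B": {1995: 2}}
table = flatten_counts(nested_counts)
```

Every list in the result has the same length, which is what lets such a dict be handed to a table constructor column by column.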
-
metaknowledge.diffusion.
diffusionGraph
(source, target, weighted=True, sourceType='raw', targetType='raw', labelEdgesBy=None)¶ Takes in two RecordCollections and produces a graph of the citations of source by the Records in target. By default the nodes in the graph are Record objects but this can be changed with the sourceType and targetType keywords. The edges of the graph go from the target to the source.
Each node on the output graph has two boolean attributes, "source" and "target", indicating if they are targets or sources. Note, if the types of the sources and targets are different the attributes will not be checked for overlap of the other type. For example, if the source type is 'TI' (title) and the target type is 'UT' (WOS number), and there is some overlap of the targets and sources, then the Record corresponding to a source node will not be checked for being one of the titles of the targets; only its WOS number will be considered.
Parameters¶
source :
RecordCollection
A metaknowledge RecordCollection containing the Records being cited.
target :
RecordCollection
A metaknowledge RecordCollection containing the Records citing those in source.
weighted :
optional [bool]
Default True, if True each edge will have an attribute 'weight' giving the number of times the source has referenced the target.
sourceType :
optional [str]
Default 'raw', if 'raw' the returned graph will contain Records as source nodes.
If Records are not wanted then it can be set to a WOS tag, such as 'SO' (for journals), to make the nodes into the type of object returned by that tag from Records.
targetType :
optional [str]
Default 'raw', if 'raw' the returned graph will contain Records as target nodes.
If Records are not wanted then it can be set to a WOS tag, such as 'SO' (for journals), to make the nodes into the type of object returned by that tag from Records.
labelEdgesBy :
optional [str]
Default None, if a WOS tag (or long name of a WOS tag) then the edges of the output graph will have an attribute 'key' that is the value of the referenced tag of the source Record, i.e. if 'PY' is given then each edge will have a 'key' value equal to the publication year of the source.
This option will cause the output graph to be a MultiDiGraph and is likely to result in parallel edges. If a Record has multiple values for a tag (e.g. 'AF') then each value will create its own edge.
Returns¶
networkx Directed Graph or networkx multi Directed Graph
A directed graph of the diffusion network; if labelEdgesBy is used the graph will allow parallel edges.
-
metaknowledge.diffusion.
makeNodeID
(Rec, ndType, extras=None)¶ Helper to make a node ID, extras is currently not used
-
metaknowledge.graphHelpers.
dropEdges
(grph, minWeight=-inf, maxWeight=inf, parameterName='weight', ignoreUnweighted=False, dropSelfLoops=False)¶ Modifies grph by dropping edges whose weight is not within the inclusive bounds of minWeight and maxWeight, i.e. after running, grph will only have edges whose weights meet the following inequality: minWeight <= edge’s weight <= maxWeight. A KeyError will be raised if the graph is unweighted unless ignoreUnweighted is True. The weight is determined by examining the attribute parameterName.
Note: none of the default options will result in grph being modified so only specify the relevant ones, e.g. dropEdges(G, dropSelfLoops = True) will remove only the self loops from G.
Parameters¶
grph :
networkx Graph
The graph to be modified.
minWeight :
optional [int or double]
Default -inf, the minimum weight for an edge to be kept in the graph.
maxWeight :
optional [int or double]
Default inf, the maximum weight for an edge to be kept in the graph.
parameterName :
optional [str]
Default 'weight', key to the weight field in the edge’s attribute dictionary; the default is the same as networkx and metaknowledge so is likely to be correct.
ignoreUnweighted :
optional [bool]
Default False, if True unweighted edges will be kept.
dropSelfLoops :
optional [bool]
Default False, if True self loops will be removed regardless of their weight.
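The inclusive-bounds rule and the handling of unweighted edges can be sketched without a graph library. The helper below is a hypothetical stand-in (keep_edge is not part of metaknowledge) that works on plain attribute dictionaries and only mirrors the behaviour described above.

```python
import math


def keep_edge(attrs, min_weight=-math.inf, max_weight=math.inf,
              parameter_name="weight", ignore_unweighted=False):
    """Return True if an edge with attribute dict attrs passes the bounds."""
    if parameter_name not in attrs:
        if ignore_unweighted:
            return True  # unweighted edges are kept
        raise KeyError(parameter_name)
    # Both bounds are inclusive.
    return min_weight <= attrs[parameter_name] <= max_weight


# Edges as (node, node, attributes) triples; a stand-in for a real graph.
edges = [("a", "b", {"weight": 1}), ("b", "c", {"weight": 3}),
         ("c", "d", {"weight": 5}), ("d", "e", {})]
kept = [(u, v) for u, v, attrs in edges
        if keep_edge(attrs, min_weight=3, max_weight=5, ignore_unweighted=True)]
```

With the default infinite bounds every weighted edge passes, which is why the defaults leave a graph unchanged.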
-
metaknowledge.graphHelpers.
dropNodesByCount
(grph, minCount=-inf, maxCount=inf, parameterName='count', ignoreMissing=False)¶ Modifies grph by dropping nodes that do not have a count that is within the inclusive bounds of minCount and maxCount, i.e. after running, grph will only have nodes whose counts meet the following inequality: minCount <= node’s count <= maxCount.
Count is determined by the count attribute, parameterName, and if missing will result in a KeyError being raised. ignoreMissing can be set to True to suppress the error.
minCount and maxCount default to negative and positive infinity respectively, so without specifying either the output should be the same as the input.
Parameters¶
grph :
networkx Graph
The graph to be modified.
minCount :
optional [int or double]
Default -inf, the minimum count for a node to be kept in the graph.
maxCount :
optional [int or double]
Default inf, the maximum count for a node to be kept in the graph.
parameterName :
optional [str]
Default 'count', key to the count field in the node’s attribute dictionary; the default is the same throughout metaknowledge so is likely to be correct.
ignoreMissing :
optional [bool]
Default False, if True nodes missing a count will be kept in the graph instead of raising an exception.
-
metaknowledge.graphHelpers.
dropNodesByDegree
(grph, minDegree=-inf, maxDegree=inf, useWeight=True, parameterName='weight', includeUnweighted=True)¶ Modifies grph by dropping nodes that do not have a degree that is within the inclusive bounds of minDegree and maxDegree, i.e. after running, grph will only have nodes whose degrees meet the following inequality: minDegree <= node’s degree <= maxDegree.
Degree is determined in one of two ways. By default (useWeight is True) the weight attributes of the edges to a node are summed, with the attribute’s name given by parameterName; otherwise the number of edges touching the node is used. If includeUnweighted is True then useWeight will assign a degree of 1 to unweighted edges.
Parameters¶
grph :
networkx Graph
The graph to be modified.
minDegree :
optional [int or double]
Default -inf, the minimum degree for a node to be kept in the graph.
maxDegree :
optional [int or double]
Default inf, the maximum degree for a node to be kept in the graph.
useWeight :
optional [bool]
Default True, if True the edge weights will be summed to get the degree, if False the number of edges will be used to determine the degree.
parameterName :
optional [str]
Default 'weight', key to the weight field in the edge’s attribute dictionary; the default is the same as networkx and metaknowledge so is likely to be correct.
includeUnweighted :
optional [bool]
Default True, if True edges with no weight will be considered to have a weight of 1, if False they will cause a KeyError to be raised.
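The two ways of computing a degree, and the effect of includeUnweighted, can be sketched as follows. The function node_degree is a hypothetical helper written for this illustration; it works on a plain list of edges rather than a networkx graph and ignores special cases such as self loops.

```python
def node_degree(node, edges, use_weight=True, parameter_name="weight",
                include_unweighted=True):
    """Degree of node over (u, v, attrs) edges, following the rules above."""
    total = 0
    for u, v, attrs in edges:
        if node not in (u, v):
            continue
        if not use_weight:
            total += 1  # plain edge count
        elif parameter_name in attrs:
            total += attrs[parameter_name]
        elif include_unweighted:
            total += 1  # unweighted edges count as weight 1
        else:
            raise KeyError(parameter_name)
    return total


edges = [("a", "b", {"weight": 2}), ("b", "c", {"weight": 4}), ("c", "d", {})]
degrees = {n: node_degree(n, edges) for n in "abcd"}
# Nodes that would survive inclusive bounds of 3 and 6.
survivors = sorted(n for n, d in degrees.items() if 3 <= d <= 6)
```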
-
metaknowledge.graphHelpers.
getNodeDegrees
(grph, weightString='weight', strictMode=False, returnType=<class 'int'>, edgeType='bi')¶ Returns a dictionary of nodes to their degrees. The degree is determined by adding up the weights of a node’s edges, where weightString gives the name of the edge attribute containing the weight. The weights are then converted to the type returnType. If weightString is given as False, each edge is instead counted as 1.
edgeType takes one of three strings: ‘bi’, ‘in’ or ‘out’. ‘bi’ means both nodes on the edge count it, ‘out’ means only the node the edge comes from counts it and ‘in’ means only the node the edge goes to counts it. ‘bi’ is the default. Use ‘in’ and ‘out’ only on directed graphs, as otherwise the node selected is random.
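The three edgeType settings can be sketched on a small directed edge list. This is a standalone illustration, not the library's implementation: node_degrees is a hypothetical name, and treating a missing weight as 1 is an assumption, since the behaviour of strictMode is not described here.

```python
def node_degrees(edges, weight_string="weight", edge_type="bi", return_type=int):
    """Map each node to its degree over directed (src, dst, attrs) edges."""
    degrees = {}
    for src, dst, attrs in edges:
        degrees.setdefault(src, return_type(0))
        degrees.setdefault(dst, return_type(0))
        if weight_string:
            # Assumption: an edge without the attribute counts as 1.
            weight = return_type(attrs.get(weight_string, 1))
        else:
            weight = return_type(1)  # weight_string=False: every edge is 1
        if edge_type in ("bi", "out"):
            degrees[src] += weight  # the node the edge comes from
        if edge_type in ("bi", "in"):
            degrees[dst] += weight  # the node the edge goes to
    return degrees


edges = [("a", "b", {"weight": 2}), ("a", "c", {"weight": 3}), ("b", "c", {})]
both = node_degrees(edges)
outgoing = node_degrees(edges, edge_type="out")
incoming = node_degrees(edges, edge_type="in")
unweighted = node_degrees(edges, weight_string=False)
```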
-
metaknowledge.graphHelpers.
getWeight
(grph, nd1, nd2, weightString='weight', returnType=<class 'int'>)¶ A way of getting the weight of an edge, with or without weight as a parameter. Returns the value of the weight parameter converted to returnType if it is given, or 1 (also converted) if not.
-
metaknowledge.graphHelpers.
graphStats
(G, stats=('nodes', 'edges', 'isolates', 'loops', 'density', 'transitivity'), makeString=True, sentenceString=False)¶ Returns a string or list containing statistics about the graph G.
graphStats() gives 6 different statistics: number of nodes, number of edges, number of isolates, number of loops, density and transitivity. The ones wanted can be given to stats. By default a string giving each stat on a different line is returned; it can also produce a sentence containing all the requested statistics, or the raw values can be accessed instead by setting makeString to False.
Parameters¶
G :
networkx Graph
The graph for which the statistics are to be determined.
stats :
optional [list or tuple [str]]
Default ('nodes', 'edges', 'isolates', 'loops', 'density', 'transitivity'), a list or tuple containing any number or combination of the strings: "nodes", "edges", "isolates", "loops", "density" and "transitivity".
At least one occurrence of the corresponding string causes the statistic to be provided in the string output. For the non-string (tuple) output the returned tuple has the same length as the input and each output is at the same index as the string that requested it, e.g.
stats = ("edges", "loops", "edges")
The return is a tuple with 3 elements, the first and last of which are the number of edges and the second is the number of loops.
makeString :
optional [bool]
Default True, if True a string is returned, if False a tuple.
sentenceString :
optional [bool]
Default False, if True the returned string is a sentence, otherwise each value has a separate line.
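The index-for-index correspondence between the requested names and the tuple output can be sketched with a few statistics computed by hand. The helpers basic_stats and select_stats are hypothetical and cover only counts that are easy to derive from a plain edge list; density and transitivity are left out.

```python
def basic_stats(nodes, edges):
    """Compute a few simple statistics for an undirected edge list."""
    touched = {n for edge in edges for n in edge}
    return {
        "nodes": len(nodes),
        "edges": len(edges),
        "loops": sum(1 for u, v in edges if u == v),
        "isolates": len(set(nodes) - touched),  # nodes with no edges at all
    }


def select_stats(computed, stats):
    """Return one value per requested name, in the order requested."""
    return tuple(computed[name] for name in stats)


nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "c")]
computed = basic_stats(nodes, edges)
result = select_stats(computed, ("edges", "loops", "edges"))
```

Because each requested name produces its own entry, repeating a name repeats the value at the matching positions.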
-
metaknowledge.graphHelpers.
mergeGraphs
(targetGraph, addedGraph, incrementedNodeVal='count', incrementedEdgeVal='weight')¶ A way of merging graphs that is meant to be quick and is only intended for graphs generated by metaknowledge. This does not check anything and as such may cause unexpected results if the source and target were not generated by the same method.
mergeGraphs() will modify targetGraph in place by adding the nodes and edges found in the second graph, addedGraph. If a node or edge exists in both, targetGraph is given precedence, but the edge and node attributes given by incrementedNodeVal and incrementedEdgeVal are added together instead of being overwritten.
Parameters¶
targetGraph :
networkx Graph
The graph to be modified, it has precedence.
addedGraph :
networkx Graph
The graph that is unmodified, it is added and does not have precedence.
incrementedNodeVal :
optional [str]
Default 'count', the name of the count attribute for the graph’s nodes. When merging, this attribute will be the sum of the values in the input graphs, instead of targetGraph’s value.
incrementedEdgeVal :
optional [str]
Default 'weight', the name of the weight attribute for the graph’s edges. When merging, this attribute will be the sum of the values in the input graphs, instead of targetGraph’s value.
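The precedence rule can be sketched with plain dictionaries keyed by node (the same idea applies to edges). The function merge_attribute_maps is a hypothetical helper, and leaving the target's other attributes untouched when a key exists in both is an assumption drawn from the description above.

```python
def merge_attribute_maps(target, added, incremented_val):
    """Merge added into target in place; sum incremented_val on overlap."""
    for key, added_attrs in added.items():
        if key not in target:
            # New node or edge: copy it over with all of its attributes.
            target[key] = dict(added_attrs)
            continue
        target_attrs = target[key]
        if incremented_val in added_attrs:
            target_attrs[incremented_val] = (
                target_attrs.get(incremented_val, 0) + added_attrs[incremented_val]
            )
        # All other attributes keep the target's values (target has precedence).


target_nodes = {"a": {"count": 2, "label": "old"}, "b": {"count": 1}}
added_nodes = {"a": {"count": 3, "label": "new"}, "c": {"count": 4}}
merge_attribute_maps(target_nodes, added_nodes, "count")
```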
-
metaknowledge.graphHelpers.
readGraph
(edgeList, nodeList=None, directed=False, idKey='ID', eSource='From', eDest='To')¶ Reads the files given by edgeList and nodeList and creates a networkx graph for the files.
This is designed only for the files produced by metaknowledge and is meant to be the reverse of writeGraph(). If this does not produce the desired results, the networkx builtin networkx.read_edgelist() could be tried as it is aimed at more general usage.
The read edge list format assumes the column named eSource (default 'From') is the source node, the column eDest (default 'To') gives the destination, and all other columns are attributes of the edges, e.g. weight.
The read node list format assumes the column idKey (default 'ID') is the ID of the node for the edge list and the resulting network. All other columns are considered attributes of the node, e.g. count.
Note: If the names of the columns do not match those given to readGraph() a KeyError exception will be raised.
Note: If nodes appear in the edgeList but not the nodeList they will be created silently with no attributes.
Parameters¶
edgeList :
str
A string giving the path to the edge list file.
nodeList :
optional [str]
Default None, a string giving the path to the node list file.
directed :
optional [bool]
Default False, if True the produced network is directed from eSource to eDest.
idKey :
optional [str]
Default 'ID', the name of the ID column in the node list.
eSource :
optional [str]
Default 'From', the name of the source column in the edge list.
eDest :
optional [str]
Default 'To', the name of the destination column in the edge list.
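The column convention for the edge list can be sketched with the standard csv module. This is not readGraph() itself: read_edge_rows is a hypothetical helper that reads from an in-memory file and returns plain tuples instead of building a networkx graph.

```python
import csv
import io


def read_edge_rows(handle, e_source="From", e_dest="To"):
    """Parse an edge list: two endpoint columns, the rest are attributes."""
    edges = []
    for row in csv.DictReader(handle):
        src = row.pop(e_source)  # KeyError if the column name does not match
        dst = row.pop(e_dest)
        edges.append((src, dst, dict(row)))  # remaining columns, as strings
    return edges


edge_csv = io.StringIO('"From","To","weight"\n"a","b","2"\n"b","c","1"\n')
edges = read_edge_rows(edge_csv)
```

Note that the csv module returns every value as a string, so in this sketch attribute values such as the weight are not converted to numbers.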
-
metaknowledge.graphHelpers.
writeEdgeList
(grph, name, extraInfo=True, allSameAttribute=False, _progBar=None)¶ Writes an edge list of grph at the destination name.
The edge list has two columns for the source and destination of the edge, 'From' and 'To' respectively, then, if edgeInfo is True, for each attribute of the edge another column is created.
Note: If any edges are missing an attribute it will be left blank by default, enable allSameAttribute to cause a KeyError to be raised.
Parameters¶
grph :
networkx Graph
The graph to be written to name.
name :
str
The name of the file to be written.
edgeInfo :
optional [bool]
Default True, if True the attributes of each edge will be written.
allSameAttribute :
optional [bool]
Default False, if True all the edges must have the same attributes or an exception will be raised. If False the missing attributes will be left blank.
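The difference between leaving missing attributes blank and raising a KeyError can be sketched with the csv module. The helper write_edge_rows is hypothetical; sorting the attribute columns and using the csv module's default quoting are assumptions made to keep the example small.

```python
import csv
import io


def write_edge_rows(handle, edges, all_same_attribute=False):
    """Write (u, v, attrs) edges as rows with 'From' and 'To' columns first."""
    attr_names = sorted({key for _, _, attrs in edges for key in attrs})
    writer = csv.writer(handle)
    writer.writerow(["From", "To"] + attr_names)
    for u, v, attrs in edges:
        if all_same_attribute:
            values = [attrs[key] for key in attr_names]  # KeyError if missing
        else:
            values = [attrs.get(key, "") for key in attr_names]  # blank cell
        writer.writerow([u, v] + values)


edges = [("a", "b", {"weight": 2}), ("b", "c", {})]
buffer = io.StringIO()
write_edge_rows(buffer, edges)
lines = buffer.getvalue().splitlines()
```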
-
metaknowledge.graphHelpers.
writeGraph
(grph, name, edgeInfo=True, typing=False, suffix='csv', overwrite=True, allSameAttribute=False)¶ Writes both the edge list and the node attribute list of grph to files starting with name.
The output file names start with name, followed by the file type (edgeList, nodeAttributes), then, if typing is True, the type of graph (directed or undirected), then the suffix. The default is as follows:
name_fileType.suffix
Both files are csv’s with comma delimiters and double quote quoting characters. The edge list has two columns for the source and destination of the edge, 'From' and 'To' respectively, then, if edgeInfo is True, for each attribute of the edge another column is created. The node list has one column called “ID” with the node ids used by networkx and all other columns are the node attributes.
To read back these files use readGraph() and to write only one type of list use writeEdgeList() or writeNodeAttributeFile().
Warning: this function will overwrite files if they are in the way of the output; to prevent this set overwrite to False.
Note: If any nodes or edges are missing an attribute a KeyError will be raised.
Parameters¶
grph :
networkx Graph
A networkx graph of the network to be written.
name :
str
The start of the file name to be written, can include a path.
edgeInfo :
optional [bool]
Default True, if True the attributes of each edge are written to the edge list.
typing :
optional [bool]
Default False, if True the directedness of the graph will be added to the file names.
suffix :
optional [str]
Default "csv", the suffix of the file.
overwrite :
optional [bool]
Default True, if True files will be overwritten silently, otherwise an OSError exception will be raised.
-
metaknowledge.graphHelpers.
writeNodeAttributeFile
(grph, name, allSameAttribute=False, _progBar=None)¶ Writes a node attribute list of grph to the file given by the path name.
The node list has one column called 'ID' with the node ids used by networkx and all other columns are the node attributes.
Note: If any nodes are missing an attribute it will be left blank by default, enable allSameAttribute to cause a KeyError to be raised.
Parameters¶
grph :
networkx Graph
The graph to be written to name.
name :
str
The name of the file to be written.
allSameAttribute :
optional [bool]
Default False, if True all the nodes must have the same attributes or an exception will be raised. If False the missing attributes will be left blank.
-
metaknowledge.graphHelpers.
writeTnetFile
(grph, name, modeNameString, weighted=False, sourceMode=None, timeString=None, nodeIndexString='tnet-ID', weightString='weight')¶ Writes an edge list designed for reading by the R package tnet.
The networkx graph provided must be a pure two-mode network: the modes must be 2 different values for the node attribute accessed by modeNameString and all edges must be between different node types. Each node will be given an integer id, stored in the attribute given by nodeIndexString; these ids are then written to the file as the endpoints of the edges. Unless sourceMode is given, which mode is the source (first column) and which the target (second column) is random.
Note that grph will be modified by this function: the ids of the nodes will be written to the graph at the attribute nodeIndexString.
Parameters¶
grph :
networkx Graph
The graph that will be written to name.
name :
str
The path of the file to write.
modeNameString :
str
The name of the attribute grph’s modes are stored in.
weighted :
optional [bool]
Default False, if True then the attribute weightString will be written to the weight column.
sourceMode :
optional [str]
Default None, if given the name of the mode used for the source (first column) in the output file.
timeString :
optional [str]
Default None, if present the attribute timeString of an edge will be written to the time column surrounded by double quotes (").
Note: The format used by tnet for dates is very strict; it uses the ISO format, down to the second and without time zones.
nodeIndexString :
optional [str]
Default 'tnet-ID', the name of the attribute to save the id for each node.
weightString :
optional [str]
Default 'weight', the name of the weight attribute.
Record
is the base of various objects in mk. It is intended to be used with things that have some sort of key-value relationship and is basically a hashable Python dict. It also has a few extra attributes intended to make debugging and record keeping easier.
bad
can be set to True to indicate something is wrong, with the issue being saved in error; the exact details are left to the designer.
_sourceFile and _sourceLine
store the original file name and line number and are mostly for improving error messages.
_id
should be a unique string that preferably can be used to identify the record from its source, although the latter is not always possible, so do your best. It is also what is used for hashing and comparison.
_fieldDict
contains the base mapping of keys to values; it is the dictionary.
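The idea of a hashable, dict-like record whose identity comes from _id can be sketched as below. SketchRecord is a hypothetical class written for this illustration, not metaknowledge's Record; the constructor arguments are assumptions, and only the attribute names are taken from the description above.

```python
from collections.abc import Mapping


class SketchRecord(Mapping):
    """A read-only mapping that hashes and compares by its id string."""

    def __init__(self, fieldDict, idValue, bad=False, error=None,
                 sFile="", sLine=0):
        self._fieldDict = dict(fieldDict)  # the base mapping of keys to values
        self._id = idValue                 # used for hashing and comparison
        self.bad = bad                     # True when something is wrong
        self.error = error                 # details of what went wrong
        self._sourceFile = sFile           # kept for better error messages
        self._sourceLine = sLine

    def __getitem__(self, key):
        return self._fieldDict[key]

    def __iter__(self):
        return iter(self._fieldDict)

    def __len__(self):
        return len(self._fieldDict)

    def __hash__(self):
        return hash(self._id)

    def __eq__(self, other):
        if not isinstance(other, SketchRecord):
            return NotImplemented
        return self._id == other._id


first = SketchRecord({"title": "A"}, "id-1")
same_id = SketchRecord({"title": "B"}, "id-1")
counts = {first: 3}
```

Because hashing and equality both use the id, such objects can serve as dictionary keys or graph nodes, and two objects with the same id are treated as the same key.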
ExtendedRecord
is what WOSRecord and its ilk inherit from, and it extends Record by adding memoizing and processing of the fields. ExtendedRecord cannot be invoked directly as it has many abstract (virtual) methods that define how the tags are to be processed, what they are called, what encoding to use when writing to disk, etc.
-
metaknowledge.mkRecord.
_bibFormatter
(s, maxLength)¶ Formats a string, list or number to make it suitable for a bib file by:
* if too long, splitting up the string correctly
* trying to use the best quoting characters
* expanding lists into ‘ and ‘ separated values, as per the spec for the authors field
Note, this does not escape characters. LaTeX may have issues with the output.
Max length splitting derived from https://www.cs.arizona.edu/~collberg/Teaching/07.231/BibTeX/bibtex.html
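The list expansion and quoting steps can be sketched as follows. This is not the library's _bibFormatter: bib_value is a hypothetical helper, the rule for choosing between double quotes and braces is an assumption, and the length-based splitting is omitted entirely.

```python
def bib_value(value):
    """Format a string, list or number as a quoted BibTeX field value."""
    if isinstance(value, (list, tuple)):
        # Lists become ' and ' separated values, as BibTeX expects for authors.
        text = " and ".join(str(item) for item in value)
    else:
        text = str(value)
    # Assumed quoting rule: prefer double quotes, fall back to braces
    # when the text itself contains a double quote.
    if '"' in text:
        return "{" + text + "}"
    return '"' + text + '"'


authors = bib_value(["Doe, Jane", "Roe, Richard"])
year = bib_value(1999)
title = bib_value('The "best" title')
```

As with the function described above, nothing is escaped here, so characters that are special to LaTeX pass through unchanged.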
-
metaknowledge.recordCollection.
addToNetwork
(grph, nds, count, weighted, nodeType, nodeInfo, fullInfo, coreCitesDict, coreValues, detailedValues, addCR, recordToCite=True, headNd=None)¶ Adds the citations nds to grph, according to the rules given by nodeType, fullInfo, etc.
headNd is the citation of the Record.
-
metaknowledge.recordCollection.
expandRecs
(G, RecCollect, nodeType, weighted)¶ Expand all the citations from RecCollect
-
metaknowledge.recordCollection.
makeID
(citation, nodeType)¶ Makes the id, of the correct type for the network
-
metaknowledge.recordCollection.
makeNodeTuple
(citation, idVal, nodeInfo, fullInfo, nodeType, count, coreCitesDict, coreValues, detailedValues, addCR)¶ Makes a tuple of idVal and a dict of the selected attributes
-
metaknowledge.genders.nameGender.
nameStringGender
(s, noExcept=False)¶ Expects
first, last