Functions¶

metaknowledge.citation.filterNonJournals(citesLst, invert=False): Removes the Citations from citesLst that are not journals

Parameters¶

citesLst : list [Citation]

A list of citations to be filtered

invert : optional [bool]

Default False, if True non-journals will be kept instead of journals

Returns¶

list [Citation]

A filtered list of Citations from citesLst
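The effect of invert can be pictured with a small stand-in. This is only a sketch: `StubCitation` and `filter_non_journals` are hypothetical names, and it assumes each citation can report whether it is a journal through an `isJournal()` method. It is not the library's implementation.

```python
class StubCitation:
    """Hypothetical stand-in for a Citation; it only records whether it is a journal."""

    def __init__(self, name, journal):
        self.name = name
        self._journal = journal

    def isJournal(self):
        return self._journal


def filter_non_journals(cites, invert=False):
    # Keep journals by default; keep the non-journals when invert is True.
    return [c for c in cites if c.isJournal() != invert]


cites = [StubCitation("NATURE", True), StubCitation("A BOOK", False)]
journals = [c.name for c in filter_non_journals(cites)]
others = [c.name for c in filter_non_journals(cites, invert=True)]
```

Here `journals` holds only the journal entry and `others` holds only the non-journal entry.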

metaknowledge.constants.isInteractive()¶: A basic check of whether the program is running in interactive mode

metaknowledge.diffusion.diffusionAddCountsFromSource(grph, source, target, nodeType='citations', extraType=None, diffusionLabel='DiffusionCount', extraKeys=None, countsDict=None, extraMapping=None)¶: Runs a diffusion analysis with diffusionCount() and updates grph with the result, using the nodes of the graph as the keys in the diffusion, i.e. the source. The counts are added to the node attribute named by diffusionLabel. If the graph was not built from the citations of the source but from another tag, nodeType must be given that tag string.

Parameters¶

grph : networkx Graph

The graph to be updated

source : RecordCollection

The RecordCollection that created grph

target : RecordCollection

The RecordCollection that will be counted

nodeType : optional [str]

default 'citations', the tag that contains the values used to create grph

Returns¶

dict[:int]

The counts dictionary used to add values to grph. Note that grph is modified in place by the function; the dictionary is returned in case you need it.

metaknowledge.diffusion.diffusionCount(source, target, sourceType='raw', extraValue=None, pandasFriendly=False, compareCounts=False, numAuthors=True, useAllAuthors=True, _ProgBar=None, extraMapping=None)¶: Takes in two RecordCollections and produces a dict counting the citations of source by the Records of target. By default the dict uses Record objects as keys but this can be changed with the sourceType keyword to any of the WOS tags.

Parameters¶

source : RecordCollection

A metaknowledge RecordCollection containing the Records being cited

target : RecordCollection

A metaknowledge RecordCollection containing the Records citing those in source

sourceType : optional [str]

default 'raw', if 'raw' the returned dict will contain Records as keys. If it is a WOS tag the keys will be of that type.

pandasFriendly : optional [bool]

default False, if True the output is a dict with two keys: "Record", the list of Records (or the data type requested by sourceType), and "Counts", their occurrence counts. The lists are the same length.

compareCounts : optional [bool]

default False, if True the diffusion analysis will be run twice, first with source and target setup like the default (global scope) then using only the source RecordCollection (local scope).

extraValue : optional [str]

default None, if a tag is given the returned dictionary will have Records mapped to maps, and these maps will map the entries for the tag to counts. If pandasFriendly is also True the resultant dictionary will have an additional column called 'year'. This column will contain the year the citations occurred, and the Records entries will be duplicated for each year they occur in.

For example if 'year' was given then the count for a single Record could be {1990 : 1, 2000 : 5}

useAllAuthors : optional [bool]

default True, if False only the first author will be used to generate the Citations for the source Records

Returns¶

dict[:int]

A dictionary with the type given by sourceType as keys and integers as values.

If compareCounts is True the values are tuples with the first integer being the diffusion in the target and the second the diffusion in the source.

If pandasFriendly is True the returned dict has keys with the names of the WOS tags and lists with their values, i.e. a table with labeled columns. The counts are in the column named "TargetCount" and if compareCounts the local count is in a column called "SourceCount".
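As a rough illustration of the counting described above, the following sketch uses plain dictionaries instead of RecordCollections. The data layout (`source_ids`, `target`, the `"year"` and `"cites"` keys) is hypothetical and chosen only for the example; it shows how the plain counts relate to a per-year breakdown of the kind described for extraValue.

```python
from collections import defaultdict

# Hypothetical data: the IDs of the source records and, for each citing
# (target) record, its publication year and the IDs it cites.
source_ids = {"A", "B"}
target = {
    "T1": {"year": 1990, "cites": ["A", "X"]},
    "T2": {"year": 2000, "cites": ["A", "B"]},
    "T3": {"year": 2000, "cites": ["A"]},
}

# Plain counts: how often each source record is cited by the target records.
counts = {sid: 0 for sid in source_ids}
# Per-year counts: each source record maps to a {year: count} dictionary.
by_year = {sid: defaultdict(int) for sid in source_ids}

for rec in target.values():
    for cited in rec["cites"]:
        if cited in counts:  # citations of records outside source are ignored
            counts[cited] += 1
            by_year[cited][rec["year"]] += 1

by_year = {sid: dict(years) for sid, years in by_year.items()}
```

The citation of `"X"` is ignored because `"X"` is not one of the source records.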

metaknowledge.diffusion.diffusionGraph(source, target, weighted=True, sourceType='raw', targetType='raw', labelEdgesBy=None)¶

Takes in two RecordCollections and produces a graph of the citations of source by the Records in target. By default the nodes in the graph are Record objects but this can be changed with the sourceType and targetType keywords. The edges of the graph go from the target to the source.

Each node on the output graph has two boolean attributes, "source" and "target", indicating if they are targets or sources. Note, if the types of the sources and targets are different the attributes will not be checked for overlap of the other type. For example, if the source type is 'TI' (title), the target type is 'UT' (WOS number) and there is some overlap of the targets and sources, then the Record corresponding to a source node will not be checked for being one of the titles of the targets; only its WOS number will be considered.

Parameters¶

source : RecordCollection

A metaknowledge RecordCollection containing the Records being cited

target : RecordCollection

A metaknowledge RecordCollection containing the Records citing those in source

weighted : optional [bool]

Default True, if True each edge will have an attribute 'weight' giving the number of times the source has referenced the target.

sourceType : optional [str]

Default 'raw', if 'raw' the returned graph will contain Records as source nodes.

If Records are not wanted then it can be set to a WOS tag, such as 'SO' (for journals ), to make the nodes into the type of object returned by that tag from Records.

targetType : optional [str]

Default 'raw', if 'raw' the returned graph will contain Records as target nodes.

If Records are not wanted then it can be set to a WOS tag, such as 'SO' (for journals ), to make the nodes into the type of object returned by that tag from Records.

labelEdgesBy : optional [str]

Default None, if a WOS tag (or long name of a WOS tag) is given then the edges of the output graph will have an attribute 'key' that is the value of the referenced tag of the source Record, i.e. if 'PY' is given then each edge will have a 'key' value equal to the publication year of the source.

This option will cause the output graph to be a MultiDiGraph and is likely to result in parallel edges. If a Record has multiple values for a tag (e.g. 'AF') then each value will create its own edge.

Returns¶

networkx Directed Graph or networkx multi Directed Graph

A directed graph of the diffusion network; if labelEdgesBy is used the graph will allow parallel edges.

metaknowledge.diffusion.makeNodeID(Rec, ndType, extras=None)¶: Helper to make a node ID, extras is currently not used

metaknowledge.graphHelpers.dropEdges(grph, minWeight=-inf, maxWeight=inf, parameterName='weight', ignoreUnweighted=False, dropSelfLoops=False)¶

Modifies grph by dropping edges whose weight is not within the inclusive bounds of minWeight and maxWeight, i.e. after running, grph will only have edges whose weights meet the following inequality: minWeight <= edge’s weight <= maxWeight. The weight is determined by examining the attribute parameterName. A KeyError will be raised if the graph is unweighted unless ignoreUnweighted is True.

Note: none of the default options will result in grph being modified so only specify the relevant ones, e.g. dropEdges(G, dropSelfLoops = True) will remove only the self loops from G.

Parameters¶

grph : networkx Graph

The graph to be modified.

minWeight : optional [int or double]

default -inf, the minimum weight for an edge to be kept in the graph.

maxWeight : optional [int or double]

default inf, the maximum weight for an edge to be kept in the graph.

parameterName : optional [str]

default 'weight', key to weight field in the edge’s attribute dictionary, the default is the same as networkx and metaknowledge so is likely to be correct

ignoreUnweighted : optional [bool]

default False, if True unweighted edges will be kept

dropSelfLoops : optional [bool]

default False, if True self loops will be removed regardless of their weight
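The inclusive-bounds rule can be sketched on a plain list of edges. This is a hedged illustration, not the library's code: `drop_edges` and its snake_case arguments are hypothetical, it returns a new list instead of modifying a graph, and it assumes weights are only inspected when a bound has actually been given.

```python
import math


def drop_edges(edges, min_weight=-math.inf, max_weight=math.inf,
               weight_key="weight", ignore_unweighted=False,
               drop_self_loops=False):
    """edges is a list of (u, v, attrs) tuples; returns the edges that are kept."""
    # Assumption: weights are only examined when a bound was specified.
    bounded = min_weight != -math.inf or max_weight != math.inf
    kept = []
    for u, v, attrs in edges:
        if drop_self_loops and u == v:
            continue  # self loops go regardless of their weight
        if bounded:
            if weight_key not in attrs:
                if not ignore_unweighted:
                    raise KeyError("edge ({}, {}) has no weight".format(u, v))
            elif not (min_weight <= attrs[weight_key] <= max_weight):
                continue  # outside the inclusive bounds
        kept.append((u, v, attrs))
    return kept


edges = [
    ("a", "b", {"weight": 1}),
    ("b", "c", {"weight": 5}),
    ("c", "c", {"weight": 2}),
    ("a", "c", {}),
]
kept_in_bounds = [(u, v) for u, v, _ in
                  drop_edges(edges, min_weight=2, ignore_unweighted=True)]
kept_no_loops = [(u, v) for u, v, _ in drop_edges(edges, drop_self_loops=True)]
```

With `min_weight=2` the edge of weight 1 is dropped while the unweighted edge survives because `ignore_unweighted` is set; with only `drop_self_loops=True` nothing but the self loop is removed.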

metaknowledge.graphHelpers.dropNodesByCount(grph, minCount=-inf, maxCount=inf, parameterName='count', ignoreMissing=False)¶

Modifies grph by dropping nodes that do not have a count that is within the inclusive bounds of minCount and maxCount, i.e. after running, grph will only have nodes whose counts meet the following inequality: minCount <= node’s count <= maxCount.

Count is determined by the count attribute, parameterName, and if missing will result in a KeyError being raised. ignoreMissing can be set to True to suppress the error.

minCount and maxCount default to negative and positive infinity respectively so without specifying either the output should be the input

Parameters¶

grph : networkx Graph

The graph to be modified.

minCount : optional [int or double]

default -inf, the minimum count for a node to be kept in the graph.

maxCount : optional [int or double]

default inf, the maximum count for a node to be kept in the graph.

parameterName : optional [str]

default 'count', key to count field in the node’s attribute dictionary, the default is the same throughout metaknowledge so is likely to be correct.

ignoreMissing : optional [bool]

default False, if True nodes missing a count will be kept in the graph instead of raising an exception

metaknowledge.graphHelpers.dropNodesByDegree(grph, minDegree=-inf, maxDegree=inf, useWeight=True, parameterName='weight', includeUnweighted=True)¶

Modifies grph by dropping nodes that do not have a degree that is within the inclusive bounds of minDegree and maxDegree, i.e. after running, grph will only have nodes whose degrees meet the following inequality: minDegree <= node’s degree <= maxDegree.

Degree is determined in one of two ways. By default (useWeight is True) the weight attribute of the edges touching a node, named by parameterName, is summed; otherwise the number of edges touching the node is used. If includeUnweighted is True then, when weights are used, unweighted edges are given a weight of 1.

Parameters¶

grph : networkx Graph

The graph to be modified.

minDegree : optional [int or double]

default -inf, the minimum degree for a node to be kept in the graph.

maxDegree : optional [int or double]

default inf, the maximum degree for a node to be kept in the graph.

useWeight : optional [bool]

default True, if True the edge weights will be summed to get the degree, if False the number of edges will be used to determine the degree.

parameterName : optional [str]

default 'weight', key to weight field in the edge’s attribute dictionary, the default is the same as networkx and metaknowledge so is likely to be correct.

includeUnweighted : optional [bool]

default True, if True edges with no weight will be considered to have a weight of 1, if False they will cause a KeyError to be raised.

metaknowledge.graphHelpers.getNodeDegrees(grph, weightString='weight', strictMode=False, returnType=<class 'int'>, edgeType='bi')¶

Returns a dictionary of nodes to their degrees. The degree is determined by adding the weights of a node's edges, where weightString gives the name of the edge attribute containing the weight. The weights are then converted to the type returnType. If weightString is given as False, each edge is instead counted as 1.

edgeType takes in one of three strings: ‘bi’, ‘in’, ‘out’. ‘bi’ means both nodes on the edge count it, ‘out’ means only the node the edge comes from counts it and ‘in’ means only the node the edge goes to counts it. ‘bi’ is the default. Use ‘in’ and ‘out’ only on directed graphs, as otherwise the selected node is random.
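The three edgeType settings can be sketched on a list of directed edges. The function `node_degrees` below is hypothetical and only mirrors the description; in particular it assumes an edge with no weight attribute counts as 1, which may differ from what strictMode does in the library.

```python
def node_degrees(edges, nodes, weight_key="weight", edge_type="bi"):
    """edges is a list of directed (src, dst, attrs) tuples."""
    degrees = {n: 0 for n in nodes}
    for src, dst, attrs in edges:
        # Assumption: a missing weight counts as 1; weight_key=False counts every edge as 1.
        w = int(attrs.get(weight_key, 1)) if weight_key else 1
        if edge_type in ("bi", "out"):
            degrees[src] += w  # the node the edge comes from
        if edge_type in ("bi", "in"):
            degrees[dst] += w  # the node the edge goes to
    return degrees


nodes = ["a", "b", "c"]
edges = [("a", "b", {"weight": 2}), ("a", "c", {"weight": 3}), ("b", "c", {})]

both = node_degrees(edges, nodes)
outgoing = node_degrees(edges, nodes, edge_type="out")
incoming = node_degrees(edges, nodes, edge_type="in")
unweighted = node_degrees(edges, nodes, weight_key=False)
```

For every node the ‘in’ and ‘out’ results add up to the ‘bi’ result, which is a quick sanity check when choosing between them.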

metaknowledge.graphHelpers.getWeight(grph, nd1, nd2, weightString='weight', returnType=<class 'int'>)¶: A way of getting the weight of an edge with or without weight as a parameter

Returns the value of the weight attribute converted to returnType if it is present, or 1 (also converted) if not

metaknowledge.graphHelpers.graphStats(G, stats=('nodes', 'edges', 'isolates', 'loops', 'density', 'transitivity'), makeString=True, sentenceString=False)¶

Returns a string or list containing statistics about the graph G.

graphStats() gives 6 different statistics: number of nodes, number of edges, number of isolates, number of loops, density and transitivity. The ones wanted can be given to stats. By default it returns a string giving each stat on a different line; it can also produce a sentence containing all the requested statistics, or the raw values can be accessed instead by setting makeString to False.

Parameters¶

G : networkx Graph

The graph whose statistics are to be computed

stats : optional [list or tuple [str]]

Default ('nodes', 'edges', 'isolates', 'loops', 'density', 'transitivity'), a list or tuple containing any number or combination of the strings:

"nodes", "edges", "isolates", "loops", "density" and "transitivity"

At least one occurrence of the corresponding string causes the statistics to be provided in the string output. For the non-string (tuple) output the returned tuple has the same length as the input and each output is at the same index as the string that requested it, e.g.

stats = ("edges", "loops", "edges")

The return is a tuple with 3 elements, the first and last of which are the number of edges and the second is the number of loops

makeString : optional [bool]

Default True, if True a string is returned if False a tuple

sentenceString : optional [bool]

Default False, if True the returned string is a sentence, otherwise each value has a separate line.

Returns¶

str or tuple [float and int]

The type is determined by makeString and the layout by stats
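The index-for-index layout of the raw output can be sketched as follows. `graph_stats` here is a hypothetical helper working on plain node and edge lists, and it covers only three of the six statistics; it is meant to show the ordering rule, not the library's computation.

```python
def graph_stats(nodes, edges, stats=("nodes", "edges", "loops")):
    """Return one value per requested name, in the order the names were given."""
    values = {
        "nodes": len(nodes),
        "edges": len(edges),
        "loops": sum(1 for u, v in edges if u == v),
    }
    return tuple(values[name] for name in stats)


nodes = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c"), ("c", "c")]

# A repeated name simply yields the same value at each position it occupies.
result = graph_stats(nodes, edges, stats=("edges", "loops", "edges"))
```

The result has the same length as the `stats` argument, so values can be unpacked positionally.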

metaknowledge.graphHelpers.mergeGraphs(targetGraph, addedGraph, incrementedNodeVal='count', incrementedEdgeVal='weight')¶

A quick way of merging graphs, this is meant to be quick and is only intended for graphs generated by metaknowledge. This does not check anything and as such may cause unexpected results if the source and target were not generated by the same method.

mergeGraphs() will modify targetGraph in place by adding the nodes and edges found in the second, addedGraph. If a node or edge exists targetGraph is given precedence, but the edge and node attributes given by incrementedNodeVal and incrementedEdgeVal are added instead of being overwritten.

Parameters¶

targetGraph : networkx Graph

the graph to be modified, it has precedence.

addedGraph : networkx Graph

the graph that is unmodified, it is added and does not have precedence.

incrementedNodeVal : optional [str]

default 'count', the name of the count attribute for the graph’s nodes. When merging this attribute will be the sum of the values in the input graphs, instead of targetGraph’s value.

incrementedEdgeVal : optional [str]

default 'weight', the name of the weight attribute for the graph’s edges. When merging this attribute will be the sum of the values in the input graphs, instead of targetGraph’s value.
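The precedence and summing rules can be sketched with dictionaries of node attributes. `merge_nodes` is a hypothetical helper that handles nodes only (edges would follow the same pattern with the weight attribute), and how it treats a missing count is an assumption made for the example.

```python
def merge_nodes(target_nodes, added_nodes, incremented="count"):
    """Merge added_nodes into target_nodes in place (dicts of node -> attrs)."""
    for node, attrs in added_nodes.items():
        if node not in target_nodes:
            # New node: copy its attributes over unchanged.
            target_nodes[node] = dict(attrs)
        elif incremented in attrs and incremented in target_nodes[node]:
            # Existing node: keep the target's attributes, but sum the count.
            target_nodes[node][incremented] += attrs[incremented]


target = {"a": {"count": 2, "label": "old"}, "b": {"count": 1}}
added = {"a": {"count": 3, "label": "new"}, "c": {"count": 4}}
merge_nodes(target, added)
```

After the call node `"a"` keeps the target's label but has the summed count, node `"c"` has been added, and `added` itself is untouched.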

metaknowledge.graphHelpers.readGraph(edgeList, nodeList=None, directed=False, idKey='ID', eSource='From', eDest='To')¶

Reads the files given by edgeList and nodeList and creates a networkx graph for the files.

This is designed only for the files produced by metaknowledge and is meant to be the reverse of writeGraph(), if this does not produce the desired results the networkx builtin networkx.read_edgelist() could be tried as it is aimed at a more general usage.

The read edge list format assumes the column named eSource (default 'From') is the source node, the column named eDest (default 'To') gives the destination, and all other columns are attributes of the edges, e.g. weight.

The read node list format assumes the column idKey (default 'ID') is the ID of the node for the edge list and the resulting network. All other columns are considered attributes of the node, e.g. count.

Note: If the names of the columns do not match those given to readGraph() a KeyError exception will be raised.

Note: If nodes appear in the edgelist but not the nodeList they will be created silently with no attributes.

Parameters¶

edgeList : str

a string giving the path to the edge list file

nodeList : optional [str]

default None, a string giving the path to the node list file

directed : optional [bool]

default False, if True the produced network is directed from eSource to eDest

idKey : optional [str]

default 'ID', the name of the ID column in the node list

eSource : optional [str]

default 'From', the name of the source column in the edge list

eDest : optional [str]

default 'To', the name of the destination column in the edge list

Returns¶

networkx Graph

the graph described by the input files
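The file layout described above can be sketched with the standard csv module. This is a hedged illustration rather than the library's reader: `read_graph` is hypothetical, it reads from in-memory text instead of paths, returns plain dictionaries and lists instead of a networkx graph, and leaves every attribute value as a string.

```python
import csv
import io

edge_csv = "From,To,weight\na,b,2\nb,c,1\n"
node_csv = "ID,count\na,5\nb,3\n"


def read_graph(edge_file, node_file=None, id_key="ID",
               e_source="From", e_dest="To"):
    nodes, edges = {}, []
    if node_file is not None:
        for row in csv.DictReader(node_file):
            node_id = row.pop(id_key)  # KeyError if the column name does not match
            nodes[node_id] = row       # remaining columns are node attributes
    for row in csv.DictReader(edge_file):
        src, dst = row.pop(e_source), row.pop(e_dest)
        # Nodes seen only in the edge list are created with no attributes.
        nodes.setdefault(src, {})
        nodes.setdefault(dst, {})
        edges.append((src, dst, row))  # remaining columns are edge attributes
    return nodes, edges


nodes, edges = read_graph(io.StringIO(edge_csv), io.StringIO(node_csv))
```

Node `"c"` appears only in the edge list, so it ends up with an empty attribute dictionary, and passing a column name that is not in the header raises a KeyError.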

metaknowledge.graphHelpers.writeEdgeList(grph, name, extraInfo=True, allSameAttribute=False, _progBar=None)¶

Writes an edge list of grph at the destination name.

The edge list has two columns for the source and destination of the edge, 'From' and 'To' respectively, then, if extraInfo is True, for each attribute of the edge another column is created.

Note: If any edges are missing an attribute it will be left blank by default, enable allSameAttribute to cause a KeyError to be raised.

Parameters¶

grph : networkx Graph

The graph to be written to name

name : str

The name of the file to be written

extraInfo : optional [bool]

Default True, if True the attributes of each edge will be written

allSameAttribute : optional [bool]

Default False, if True all the edges must have the same attributes or an exception will be raised. If False the missing attributes will be left blank.
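The blank-versus-error behaviour for missing attributes can be sketched with `csv.DictWriter`. `write_edge_list` below is a hypothetical helper that writes to an in-memory buffer; the column order for attributes and the use of default quoting are assumptions of the example, not statements about the library's output.

```python
import csv
import io

edges = [("a", "b", {"weight": 2, "year": 1990}), ("b", "c", {"weight": 1})]


def write_edge_list(edges, out, all_same_attribute=False):
    # Collect every attribute name used by any edge, in a stable order.
    attr_names = sorted({key for _, _, attrs in edges for key in attrs})
    writer = csv.DictWriter(out, fieldnames=["From", "To"] + attr_names,
                            restval="", lineterminator="\n")
    writer.writeheader()
    for src, dst, attrs in edges:
        if all_same_attribute and set(attrs) != set(attr_names):
            raise KeyError("edge ({}, {}) is missing an attribute".format(src, dst))
        # restval="" leaves the cell blank when an attribute is missing.
        writer.writerow({"From": src, "To": dst, **attrs})


buffer = io.StringIO()
write_edge_list(edges, buffer)
text = buffer.getvalue()
```

The second edge has no `year`, so its row ends with an empty cell; calling the helper with `all_same_attribute=True` raises a KeyError for that edge instead.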

metaknowledge.graphHelpers.writeGraph(grph, name, edgeInfo=True, typing=False, suffix='csv', overwrite=True, allSameAttribute=False)¶

Writes both the edge list and the node attribute list of grph to files starting with name.

The output files start with name, the file type (edgeList, nodeAttributes) then if typing is True the type of graph (directed or undirected) then the suffix, the default is as follows:

name_fileType.suffix

Both files are CSVs with comma delimiters and double quote quoting characters. The edge list has two columns for the source and destination of the edge, 'From' and 'To' respectively, then, if edgeInfo is True, for each attribute of the edge another column is created. The node list has one column called "ID" with the node ids used by networkx and all other columns are the node attributes.

To read back these files use readGraph() and to write only one type of list use writeEdgeList() or writeNodeAttributeFile().

Warning: this function will overwrite files, if they are in the way of the output, to prevent this set overwrite to False

Note: If any nodes or edges are missing an attribute a KeyError will be raised.

Parameters¶

grph : networkx Graph

A networkx graph of the network to be written.

name : str

The start of the file name to be written, can include a path.

edgeInfo : optional [bool]

Default True, if True the attributes of each edge are written to the edge list.

typing : optional [bool]

Default False, if True the directedness of the graph will be added to the file names.

suffix : optional [str]

Default "csv", the suffix of the file.

overwrite : optional [bool]

Default True, if True files will be overwritten silently, otherwise an OSError exception will be raised.

metaknowledge.graphHelpers.writeNodeAttributeFile(grph, name, allSameAttribute=False, _progBar=None)¶

Writes a node attribute list of grph to the file given by the path name.

The node list has one column called 'ID' with the node ids used by networkx and all other columns are the node attributes.

Note: If any nodes are missing an attribute it will be left blank by default, enable allSameAttribute to cause a KeyError to be raised.

Parameters¶

grph : networkx Graph

The graph to be written to name

name : str

The name of the file to be written

allSameAttribute : optional [bool]

Default False, if True all the nodes must have the same attributes or an exception will be raised. If False the missing attributes will be left blank.

metaknowledge.graphHelpers.writeTnetFile(grph, name, modeNameString, weighted=False, sourceMode=None, timeString=None, nodeIndexString='tnet-ID', weightString='weight')¶

Writes an edge list designed for reading by the R package tnet.

The networkx graph provided must be a pure two-mode network: the modes must be 2 different values for the node attribute accessed by modeNameString and all edges must be between different node types. Each node will be given an integer id, stored in the attribute given by nodeIndexString; these ids are then written to the file as the endpoints of the edges. Unless sourceMode is given, which mode is the source (first column) and which is the target (second column) is random.

Note that grph will be modified by this function: the ids of the nodes will be written to the graph at the attribute nodeIndexString.

Parameters¶

grph : networkx Graph

The graph that will be written to name

name : str

The path of the file to write

modeNameString : str

The name of the attribute grph’s modes are stored in

weighted : optional [bool]

Default False, if True then the attribute weightString will be written to the weight column

sourceMode : optional [str]

Default None, if given the name of the mode used for the source (first column) in the output file

timeString : optional [str]

Default None, if present the attribute timeString of an edge will be written to the time column surrounded by double quotes (").

Note: The format used by tnet for dates is very strict; it uses the ISO format, down to the second and without time zones.

nodeIndexString : optional [str]

Default 'tnet-ID', the name of the attribute to save the id for each node

weightString : optional [str]

Default 'weight', the name of the weight attribute

Record is the base of various objects in mk. It is intended to be used with things that have some sort of key-value relationship and is basically a hashable Python dict. It also has a few extra attributes intended to make debugging and record keeping easier.

* bad can be set to True to indicate something is wrong, with the issue being saved in error; the exact details are left to the designer
* _sourceFile and _sourceLine store the original file name and line number and are mostly for improving error messages
* _id should be a unique string that preferably can be used to identify the record from its source, although this is not always possible. It is also what is used for hashing and comparison
* _fieldDict contains the base mapping of keys to values; it is the underlying dictionary
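The idea of a hashable, dict-like object identified by its id can be sketched as below. `SketchRecord` is a hypothetical, simplified stand-in that reuses the attribute names listed above; its constructor arguments are assumptions and it is not the library's class.

```python
from collections.abc import Mapping


class SketchRecord(Mapping):
    """Hypothetical stand-in: a read-only mapping hashed and compared by its id."""

    def __init__(self, field_dict, id_string, source_file="", source_line=0):
        self._fieldDict = dict(field_dict)
        self._id = id_string
        self._sourceFile = source_file
        self._sourceLine = source_line
        self.bad = False
        self.error = None

    def __getitem__(self, key):
        return self._fieldDict[key]

    def __iter__(self):
        return iter(self._fieldDict)

    def __len__(self):
        return len(self._fieldDict)

    def __hash__(self):
        # Hashing uses only the id, so records can be dict keys or set members.
        return hash(self._id)

    def __eq__(self, other):
        return isinstance(other, SketchRecord) and self._id == other._id


r1 = SketchRecord({"title": "A"}, "id-1")
r2 = SketchRecord({"title": "B"}, "id-1")
seen = {r1: "first"}
```

Because comparison and hashing use only the id, `r2` finds the entry stored under `r1` even though their fields differ, which is why the id should be unique.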

ExtendedRecord is what WOSRecord and its ilk inherit from. It extends Record by adding memoizing and processing of the fields. ExtendedRecord cannot be invoked directly as it has many abstract (virtual) methods that define how the tags are to be processed, what they are called, what encoding to use when writing to disk, etc.

metaknowledge.mkRecord._bibFormatter(s, maxLength)¶: Formats a string, list or number to make it good for a bib file by:

* if too long splits up the string correctly

* tries to use the best quoting characters

* expands lists into ' and ' separated values, as per spec for authors field

Note, this does not escape characters. LaTeX may have issues with the output

Max length splitting derived from https://www.cs.arizona.edu/~collberg/Teaching/07.231/BibTeX/bibtex.html

metaknowledge.recordCollection.addToNetwork(grph, nds, count, weighted, nodeType, nodeInfo, fullInfo, coreCitesDict, coreValues, detailedValues, addCR, recordToCite=True, headNd=None)¶

Adds the citations nds to grph, according to the rules given by nodeType, fullInfo, etc.

headNd is the citation of the Record

metaknowledge.recordCollection.expandRecs(G, RecCollect, nodeType, weighted)¶: Expand all the citations from RecCollect

metaknowledge.recordCollection.makeID(citation, nodeType)¶: Makes the id, of the correct type for the network

metaknowledge.recordCollection.makeNodeTuple(citation, idVal, nodeInfo, fullInfo, nodeType, count, coreCitesDict, coreValues, detailedValues, addCR)¶: Makes a tuple of idVal and a dict of the selected attributes

metaknowledge.genders.nameGender.nameStringGender(s, noExcept=False)¶: Expects first, last