Functions¶
-
metaknowledge.citation.filterNonJournals(citesLst, invert=False) Removes the Citations from citesLst that are not journals
Parameters¶
citesLst :
list [Citation] A list of citations to be filtered
invert :
optional [bool] Default False, if True non-journals will be kept instead of journals
-
metaknowledge.constants.isInteractive()¶ A basic check of whether the program is running in interactive mode
-
metaknowledge.diffusion.diffusionAddCountsFromSource(grph, source, target, nodeType='citations', extraType=None, diffusionLabel='DiffusionCount', extraKeys=None, countsDict=None, extraMapping=None)¶ Does a diffusion using diffusionCount() and updates grph with it, using the nodes in the graph as keys in the diffusion, i.e. the source. The name of the attribute the counts are added to is given by diffusionLabel. If the graph is not composed of citations from the source and is instead built from another tag, nodeType needs to be given that tag's string.
Parameters¶
grph :
networkx Graph The graph to be updated
source :
RecordCollection The RecordCollection that created grph
target :
RecordCollection The RecordCollection that will be counted
nodeType :
optional [str] default 'citations', the tag that contains the values used to create grph
Returns¶
dict[:int] The counts dictionary used to add values to grph. Note grph is modified by the function and the return is done in case you need it.
-
metaknowledge.diffusion.diffusionCount(source, target, sourceType='raw', extraValue=None, pandasFriendly=False, compareCounts=False, numAuthors=True, useAllAuthors=True, _ProgBar=None, extraMapping=None)¶ Takes in two RecordCollections and produces a dict counting the citations of source by the Records of target. By default the dict uses Record objects as keys but this can be changed with the sourceType keyword to any of the WOS tags.
Parameters¶
source :
RecordCollection A metaknowledge RecordCollection containing the Records being cited
target :
RecordCollection A metaknowledge RecordCollection containing the Records citing those in source
sourceType :
optional [str] default 'raw', if 'raw' the returned dict will contain Records as keys. If it is a WOS tag the keys will be of that type.
pandasFriendly :
optional [bool] default False, makes the output be a dict with two keys, one "Record" is the list of Records (or data type requested by sourceType), the other is their occurrence counts as "Counts". The lists are the same length.
compareCounts :
optional [bool] default False, if True the diffusion analysis will be run twice, first with source and target set up like the default (global scope) then using only the source RecordCollection (local scope).
extraValue :
optional [str] default None, if a tag the returned dictionary will have Records mapped to maps, these maps will map the entries for the tag to counts. If pandasFriendly is also True the resultant dictionary will have an additional column called 'year'. This column will contain the year the citations occurred, in addition the Records entries will be duplicated for each year they occur in.
For example if 'year' was given then the count for a single Record could be {1990 : 1, 2000 : 5}
useAllAuthors :
optional [bool] default True, if False only the first author will be used to generate the Citations for the source Records
Returns¶
dict[:int] A dictionary with the type given by sourceType as keys and integers as values.
If compareCounts is True the values are tuples with the first integer being the diffusion in the target and the second the diffusion in the source.
If pandasFriendly is True the returned dict has keys with the names of the WOS tags and lists with their values, i.e. a table with labeled columns. The counts are in the column named "TargetCount" and if compareCounts the local count is in a column called "SourceCount".
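The difference between the default return value and the pandasFriendly one can be pictured with a small sketch. It does not call metaknowledge; the helper to_table, the string keys and the counts are made up for illustration, and the column labels are taken from the text above, so check the keys of the dict actually returned before relying on them.

```python
def to_table(counts, key_column="Record", count_column="TargetCount"):
    """Turn a {key: count} mapping into a dict of equal-length lists.

    This is only an illustration of the 'table with labeled columns'
    shape; the column names are assumptions taken from the description.
    """
    table = {key_column: [], count_column: []}
    for key, count in counts.items():
        table[key_column].append(key)
        table[count_column].append(count)
    return table


# Hypothetical counts keyed by made-up identifiers
counts = {"WOS:A": 3, "WOS:B": 0, "WOS:C": 7}
table = to_table(counts)
```

A dict shaped like table can be handed directly to pandas.DataFrame, which is presumably the use the option name refers to.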
-
metaknowledge.diffusion.diffusionGraph(source, target, weighted=True, sourceType='raw', targetType='raw', labelEdgesBy=None)¶ Takes in two RecordCollections and produces a graph of the citations of source by the Records in target. By default the nodes in the graph are Record objects but this can be changed with the sourceType and targetType keywords. The edges of the graph go from the target to the source.
Each node on the output graph has two boolean attributes, "source" and "target" indicating if they are targets or sources. Note, if the types of the sources and targets are different the attributes will not be checked for overlap of the other type, e.g. if the source type is 'TI' (title) and the target type is 'UT' (WOS number), and there is some overlap of the targets and sources, then the Record corresponding to a source node will not be checked for being one of the titles of the targets, only its WOS number will be considered.
Parameters¶
source :
RecordCollection A metaknowledge RecordCollection containing the Records being cited
target :
RecordCollection A metaknowledge RecordCollection containing the Records citing those in source
weighted :
optional [bool] Default True, if True each edge will have an attribute 'weight' giving the number of times the source has referenced the target.
sourceType :
optional [str] Default 'raw', if 'raw' the returned graph will contain Records as source nodes.
If Records are not wanted then it can be set to a WOS tag, such as 'SO' (for journals), to make the nodes into the type of object returned by that tag from Records.
targetType :
optional [str] Default 'raw', if 'raw' the returned graph will contain Records as target nodes.
If Records are not wanted then it can be set to a WOS tag, such as 'SO' (for journals), to make the nodes into the type of object returned by that tag from Records.
labelEdgesBy :
optional [str] Default None, if a WOS tag (or long name of WOS tag) then the edges of the output graph will have an attribute 'key' that is the value of the referenced tag of the source Record, i.e. if 'PY' is given then each edge will have a 'key' value equal to the publication year of the source.
This option will cause the output graph to be a MultiDiGraph and is likely to result in parallel edges. If a Record has multiple values for a tag (e.g. 'AF') then each value will create its own edge.
Returns¶
networkx Directed Graph or networkx multi Directed Graph A directed graph of the diffusion network, if labelEdgesBy is used the graph will allow parallel edges.
-
metaknowledge.diffusion.makeNodeID(Rec, ndType, extras=None)¶ Helper to make a node ID, extras is currently not used
-
metaknowledge.graphHelpers.dropEdges(grph, minWeight=-inf, maxWeight=inf, parameterName='weight', ignoreUnweighted=False, dropSelfLoops=False)¶ Modifies grph by dropping edges whose weight is not within the inclusive bounds of minWeight and maxWeight, i.e. after running grph will only have edges whose weights meet the following inequality: minWeight <= edge's weight <= maxWeight. A KeyError will be raised if the graph is unweighted unless ignoreUnweighted is True, the weight is determined by examining the attribute parameterName.
Note: none of the default options will result in grph being modified so only specify the relevant ones, e.g. dropEdges(G, dropSelfLoops = True) will remove only the self loops from G.
Parameters¶
grph :
networkx Graph The graph to be modified.
minWeight :
optional [int or double] default -inf, the minimum weight for an edge to be kept in the graph.
maxWeight :
optional [int or double] default inf, the maximum weight for an edge to be kept in the graph.
parameterName :
optional [str] default 'weight', key to weight field in the edge's attribute dictionary, the default is the same as networkx and metaknowledge so is likely to be correct
ignoreUnweighted :
optional [bool] default False, if True unweighted edges will be kept
dropSelfLoops :
optional [bool] default False, if True self loops will be removed regardless of their weight
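The filtering rule can be sketched without networkx. The function below is an illustration of the inclusive bounds described above, not the library's code: drop_edges and its snake_case arguments are hypothetical names, edges are plain (node, node, attributes) tuples, and a new list is returned instead of modifying a graph in place. Unlike the documented defaults, this sketch inspects the weight on every call.

```python
import math


def drop_edges(edges, min_weight=-math.inf, max_weight=math.inf,
               parameter_name="weight", ignore_unweighted=False,
               drop_self_loops=False):
    """Return the edges that pass the inclusive weight filter.

    edges is a list of (node_a, node_b, attribute_dict) tuples.
    """
    kept = []
    for node_a, node_b, attrs in edges:
        if drop_self_loops and node_a == node_b:
            continue  # self loops are removed regardless of weight
        if parameter_name not in attrs:
            if ignore_unweighted:
                kept.append((node_a, node_b, attrs))  # unweighted edges are kept
                continue
            raise KeyError(parameter_name)
        if min_weight <= attrs[parameter_name] <= max_weight:
            kept.append((node_a, node_b, attrs))
    return kept


edges = [
    ("a", "b", {"weight": 1}),
    ("b", "c", {"weight": 5}),
    ("c", "c", {"weight": 3}),  # a self loop
    ("a", "c", {}),             # an unweighted edge
]
kept = drop_edges(edges, min_weight=2, ignore_unweighted=True,
                  drop_self_loops=True)
```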
-
metaknowledge.graphHelpers.dropNodesByCount(grph, minCount=-inf, maxCount=inf, parameterName='count', ignoreMissing=False)¶ Modifies grph by dropping nodes that do not have a count that is within inclusive bounds of minCount and maxCount, i.e. after running grph will only have nodes whose counts meet the following inequality: minCount <= node's count <= maxCount.
Count is determined by the count attribute, parameterName, and if missing will result in a KeyError being raised. ignoreMissing can be set to True to suppress the error.
minCount and maxCount default to negative and positive infinity respectively so without specifying either the output should be the input
Parameters¶
grph :
networkx Graph The graph to be modified.
minCount :
optional [int or double] default -inf, the minimum count for a node to be kept in the graph.
maxCount :
optional [int or double] default inf, the maximum count for a node to be kept in the graph.
parameterName :
optional [str] default 'count', key to count field in the node's attribute dictionary, the default is the same throughout metaknowledge so is likely to be correct.
ignoreMissing :
optional [bool] default False, if True nodes missing a count will be kept in the graph instead of raising an exception
-
metaknowledge.graphHelpers.dropNodesByDegree(grph, minDegree=-inf, maxDegree=inf, useWeight=True, parameterName='weight', includeUnweighted=True)¶ Modifies grph by dropping nodes that do not have a degree that is within inclusive bounds of minDegree and maxDegree, i.e. after running grph will only have nodes whose degrees meet the following inequality: minDegree <= node's degree <= maxDegree.
Degree is determined in one of two ways. By default (useWeight is True) the weight attributes of the edges touching a node are summed, with the attribute's name given by parameterName; otherwise the number of edges touching the node is used. If includeUnweighted is True then unweighted edges are counted as having a weight of 1 when useWeight is in effect.
Parameters¶
grph :
networkx Graph The graph to be modified.
minDegree :
optional [int or double] default -inf, the minimum degree for a node to be kept in the graph.
maxDegree :
optional [int or double] default inf, the maximum degree for a node to be kept in the graph.
useWeight :
optional [bool] default True, if True the edge weights will be summed to get the degree, if False the number of edges will be used to determine the degree.
parameterName :
optional [str] default 'weight', key to weight field in the edge's attribute dictionary, the default is the same as networkx and metaknowledge so is likely to be correct.
includeUnweighted :
optional [bool] default True, if True edges with no weight will be considered to have a weight of 1, if False they will cause a KeyError to be raised.
-
metaknowledge.graphHelpers.getNodeDegrees(grph, weightString='weight', strictMode=False, returnType=<class 'int'>, edgeType='bi')¶ Returns a dictionary of nodes to their degrees. The degree is determined by adding the weights of a node's edges, where weightString gives the name of the attribute of each edge containing its weight. The weights are then converted to the type returnType. If weightString is given as False each edge is instead counted as 1.
edgeType takes in one of three strings: 'bi', 'in', 'out'. 'bi' means both nodes on the edge count it, 'out' means only the node the edge comes from counts it and 'in' means only the node the edge goes to counts it. 'bi' is the default. Use this only on directed graphs, as otherwise the selected node is random.
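As a rough illustration of the three edgeType settings, the sketch below computes degrees from a list of directed (source, destination, attributes) tuples. It is written from the description above rather than from the library source: node_degrees and its arguments are hypothetical names, strictMode is not modelled, and including nodes with a total of 0 is an assumption.

```python
def node_degrees(edges, weight_string="weight", return_type=int, edge_type="bi"):
    """Sum edge weights per node for directed (src, dst, attrs) edges."""
    degrees = {}
    for src, dst, attrs in edges:
        # With weight_string given as False every edge counts as 1
        weight = 1 if weight_string is False else attrs[weight_string]
        degrees.setdefault(src, 0)
        degrees.setdefault(dst, 0)
        if edge_type in ("bi", "out"):
            degrees[src] += weight  # the node the edge comes from
        if edge_type in ("bi", "in"):
            degrees[dst] += weight  # the node the edge goes to
    return {node: return_type(total) for node, total in degrees.items()}


edges = [
    ("a", "b", {"weight": 2}),
    ("a", "c", {"weight": 3}),
    ("b", "c", {"weight": 1}),
]
both = node_degrees(edges)
outgoing = node_degrees(edges, edge_type="out")
incoming = node_degrees(edges, edge_type="in")
unweighted = node_degrees(edges, weight_string=False)
```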
-
metaknowledge.graphHelpers.getWeight(grph, nd1, nd2, weightString='weight', returnType=<class 'int'>)¶ A way of getting the weight of an edge with or without weight as a parameter. Returns the value of the weight parameter converted to returnType if it is given, or 1 (also converted) if not
-
metaknowledge.graphHelpers.graphStats(G, stats=('nodes', 'edges', 'isolates', 'loops', 'density', 'transitivity'), makeString=True, sentenceString=False)¶ Returns a string or list containing statistics about the graph G.
graphStats() gives 6 different statistics: number of nodes, number of edges, number of isolates, number of loops, density and transitivity. The ones wanted can be given to stats. By default a string giving each stat on a different line is returned. It can also produce a sentence containing all the requested statistics, or the raw values can be accessed instead by setting makeString to False.
Parameters¶
G :
networkx Graph The graph for the statistics to be determined of
stats :
optional [list or tuple [str]] Default ('nodes', 'edges', 'isolates', 'loops', 'density', 'transitivity'), a list or tuple containing any number or combination of the strings: "nodes", "edges", "isolates", "loops", "density" and "transitivity"
At least one occurrence of the corresponding string causes the statistics to be provided in the string output. For the non-string (tuple) output the returned tuple has the same length as the input and each output is at the same index as the string that requested it, e.g.
stats = ("edges", "loops", "edges")
The return is a tuple with 3 elements, the first and last of which are the number of edges and the second is the number of loops
makeString :
optional [bool] Default True, if True a string is returned, if False a tuple
sentenceString :
optional [bool] Default False, if True the returned string is a sentence, otherwise each value has a separate line.
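The index-for-index behaviour of the tuple output can be shown with a few lines of plain Python. The statistics dictionary below is invented for the example and pick_stats is a hypothetical helper, so this only mirrors the ordering rule described for stats, not how graphStats() computes anything.

```python
def pick_stats(all_stats, requested):
    """Return one value per requested name, in the order requested."""
    return tuple(all_stats[name] for name in requested)


# Made-up statistics for an imaginary graph
all_stats = {
    "nodes": 4,
    "edges": 5,
    "isolates": 0,
    "loops": 1,
    "density": 0.83,
    "transitivity": 0.5,
}
result = pick_stats(all_stats, ("edges", "loops", "edges"))
```

Repeating a name, as with "edges" here, repeats its value at the matching positions.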
-
metaknowledge.graphHelpers.mergeGraphs(targetGraph, addedGraph, incrementedNodeVal='count', incrementedEdgeVal='weight')¶ A quick way of merging graphs, this is meant to be quick and is only intended for graphs generated by metaknowledge. This does not check anything and as such may cause unexpected results if the source and target were not generated by the same method.
mergeGraphs() will modify targetGraph in place by adding the nodes and edges found in the second, addedGraph. If a node or edge exists in both, targetGraph is given precedence, but the edge and node attributes given by incrementedNodeVal and incrementedEdgeVal are added instead of being overwritten.
Parameters¶
targetGraph :
networkx Graph the graph to be modified, it has precedence.
addedGraph :
networkx Graph the graph that is unmodified, it is added and does not have precedence.
incrementedNodeVal :
optional [str] default 'count', the name of the count attribute for the graph's nodes. When merging this attribute will be the sum of the values in the input graphs, instead of targetGraph's value.
incrementedEdgeVal :
optional [str] default 'weight', the name of the weight attribute for the graph's edges. When merging this attribute will be the sum of the values in the input graphs, instead of targetGraph's value.
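The precedence rule can be sketched with plain dictionaries standing in for the node data of two graphs. merge_counts is a hypothetical helper written from the description above. How the real function treats attributes that exist only in addedGraph is not stated, so this sketch simply leaves them out for nodes already in the target.

```python
def merge_counts(target_nodes, added_nodes, incremented_val="count"):
    """Merge added_nodes into target_nodes in place.

    Both arguments map a node id to its attribute dictionary.
    """
    for node, attrs in added_nodes.items():
        if node not in target_nodes:
            # A node new to the target is copied across with its attributes
            target_nodes[node] = dict(attrs)
            continue
        existing = target_nodes[node]
        if incremented_val in attrs:
            # The incremented attribute is summed rather than overwritten
            existing[incremented_val] = (
                existing.get(incremented_val, 0) + attrs[incremented_val]
            )
        # Every other attribute keeps the target's value
    return target_nodes


target = {"a": {"count": 2, "label": "old"}}
added = {"a": {"count": 3, "label": "new"}, "b": {"count": 1}}
merge_counts(target, added)
```

The same pattern would apply to edges, with 'weight' in place of 'count'.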
-
metaknowledge.graphHelpers.readGraph(edgeList, nodeList=None, directed=False, idKey='ID', eSource='From', eDest='To')¶ Reads the files given by edgeList and nodeList and creates a networkx graph for the files.
This is designed only for the files produced by metaknowledge and is meant to be the reverse of writeGraph(), if this does not produce the desired results the networkx builtin networkx.read_edgelist() could be tried as it is aimed at a more general usage.
The read edge list format assumes the column named eSource (default 'From') is the source node, then the column eDest (default 'To') gives the destination and all other columns are attributes of the edges, e.g. weight.
The read node list format assumes the column idKey (default 'ID') is the ID of the node for the edge list and the resulting network. All other columns are considered attributes of the node, e.g. count.
Note: If the names of the columns do not match those given to readGraph() a KeyError exception will be raised.
Note: If nodes appear in the edgelist but not the nodeList they will be created silently with no attributes.
Parameters¶
edgeList :
str a string giving the path to the edge list file
nodeList :
optional [str] default None, a string giving the path to the node list file
directed :
optional [bool] default False, if True the produced network is directed from eSource to eDest
idKey :
optional [str] default 'ID', the name of the ID column in the node list
eSource :
optional [str] default 'From', the name of the source column in the edge list
eDest :
optional [str] default 'To', the name of the destination column in the edge list
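The edge list layout can be illustrated with the standard csv module. This is a sketch of the column convention only, not the reader metaknowledge uses: read_edges is a hypothetical helper, the sample data is invented, and attribute values are left as the strings csv produces.

```python
import csv
import io


def read_edges(handle, e_source="From", e_dest="To"):
    """Read an edge list with a header row into node and edge collections."""
    nodes = {}
    edges = []
    for row in csv.DictReader(handle):
        # A missing column name raises KeyError, as the note above describes
        src = row.pop(e_source)
        dst = row.pop(e_dest)
        # Nodes seen only in the edge list are created with no attributes
        nodes.setdefault(src, {})
        nodes.setdefault(dst, {})
        edges.append((src, dst, dict(row)))  # remaining columns are edge attributes
    return nodes, edges


sample = io.StringIO('"From","To","weight"\n"a","b","2"\n"b","c","1"\n')
nodes, edges = read_edges(sample)
```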
-
metaknowledge.graphHelpers.writeEdgeList(grph, name, extraInfo=True, allSameAttribute=False, _progBar=None)¶ Writes an edge list of grph at the destination name.
The edge list has two columns for the source and destination of the edge, 'From' and 'To' respectively, then, if edgeInfo is True, for each attribute of the edge another column is created.
Note: If any edges are missing an attribute it will be left blank by default, enable allSameAttribute to cause a KeyError to be raised.
Parameters¶
grph :
networkx Graph The graph to be written to name
name :
str The name of the file to be written
edgeInfo :
optional [bool] Default True, if True the attributes of each edge will be written
allSameAttribute :
optional [bool] Default False, if True all the edges must have the same attributes or an exception will be raised. If False the missing attributes will be left blank.
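The blank-versus-KeyError behaviour for missing attributes can be sketched with the standard csv module. write_edge_list is a hypothetical stand-in that writes to any file-like object; the alphabetical column order and the default csv quoting are assumptions of the sketch, not documented behaviour.

```python
import csv
import io


def write_edge_list(edges, handle, all_same_attribute=False):
    """Write (src, dst, attrs) edges as rows under a 'From', 'To' header."""
    # One extra column per attribute name seen on any edge
    attr_names = sorted({name for _, _, attrs in edges for name in attrs})
    writer = csv.writer(handle)
    writer.writerow(["From", "To"] + attr_names)
    for src, dst, attrs in edges:
        if all_same_attribute:
            values = [attrs[name] for name in attr_names]  # KeyError if missing
        else:
            values = [attrs.get(name, "") for name in attr_names]  # blank if missing
        writer.writerow([src, dst] + values)


edges = [("a", "b", {"weight": 2}), ("b", "c", {})]
buffer = io.StringIO()
write_edge_list(edges, buffer)
rows = list(csv.reader(buffer.getvalue().splitlines()))
```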
-
metaknowledge.graphHelpers.writeGraph(grph, name, edgeInfo=True, typing=False, suffix='csv', overwrite=True, allSameAttribute=False)¶ Writes both the edge list and the node attribute list of grph to files starting with name.
The output file names start with name, then the file type (edgeList, nodeAttributes), then if typing is True the type of graph (directed or undirected), then the suffix. The default is as follows:
name_fileType.suffix
Both files are csv's with comma delimiters and double quote quoting characters. The edge list has two columns for the source and destination of the edge, 'From' and 'To' respectively, then, if edgeInfo is True, for each attribute of the edge another column is created. The node list has one column called "ID" with the node ids used by networkx and all other columns are the node attributes.
To read back these files use readGraph() and to write only one type of list use writeEdgeList() or writeNodeAttributeFile().
Warning: this function will overwrite files if they are in the way of the output, to prevent this set overwrite to False
Note: If any nodes or edges are missing an attribute a KeyError will be raised.
Parameters¶
grph :
networkx Graph A networkx graph of the network to be written.
name :
str The start of the file name to be written, can include a path.
edgeInfo :
optional [bool] Default True, if True the attributes of each edge are written to the edge list.
typing :
optional [bool] Default False, if True the directedness of the graph will be added to the file names.
suffix :
optional [str] Default "csv", the suffix of the file.
overwrite :
optional [bool] Default True, if True files will be overwritten silently, otherwise an OSError exception will be raised.
-
metaknowledge.graphHelpers.writeNodeAttributeFile(grph, name, allSameAttribute=False, _progBar=None)¶ Writes a node attribute list of grph to the file given by the path name.
The node list has one column called 'ID' with the node ids used by networkx and all other columns are the node attributes.
Note: If any nodes are missing an attribute it will be left blank by default, enable allSameAttribute to cause a KeyError to be raised.
Parameters¶
grph :
networkx Graph The graph to be written to name
name :
str The name of the file to be written
allSameAttribute :
optional [bool] Default False, if True all the nodes must have the same attributes or an exception will be raised. If False the missing attributes will be left blank.
-
metaknowledge.graphHelpers.writeTnetFile(grph, name, modeNameString, weighted=False, sourceMode=None, timeString=None, nodeIndexString='tnet-ID', weightString='weight')¶ Writes an edge list designed for reading by the R package tnet.
The networkx graph provided must be a pure two-mode network, the modes must be 2 different values for the node attribute accessed by modeNameString and all edges must be between different node types. Each node will be given an integer id, stored in the attribute given by nodeIndexString, these ids are then written to the file as the endpoints of the edges. Unless sourceMode is given, which mode is the source (first column) and which is the target (second column) is random.
Note the grph will be modified by this function, the ids of the nodes will be written to the graph at the attribute nodeIndexString.
Parameters¶
grph :
networkx Graph The graph that will be written to name
name :
str The path of the file to write
modeNameString :
str The name of the attribute grph's modes are stored in
weighted :
optional bool Default False, if True then the attribute weightString will be written to the weight column
sourceMode :
optional str Default None, if given the name of the mode used for the source (first column) in the output file
timeString :
optional str Default None, if present the attribute timeString of an edge will be written to the time column surrounded by double quotes (").
Note: The format used by tnet for dates is very strict, it uses the ISO format, down to the second and without time zones.
nodeIndexString :
optional str Default 'tnet-ID', the name of the attribute to save the id for each node
weightString :
optional str Default 'weight', the name of the weight attribute
Record is the base of various objects in mk, it is intended to be
used with things that have some sort of key-value relationship and is
basically a hashable python dict. It also has a few extra attributes
intended to make debugging and record keeping easier.
bad can be set to True to indicate something is wrong, with the issue being saved in error, the exact details are left to the designer
_sourceFile and _sourceLine store the original file name and line number and are mostly for improving error messages
_id should be a unique string that preferably can be used to identify the record from its source, although this is not always possible, so do your best. It is also what is used for hashing and comparison
_fieldDict contains the base mapping of keys to values, it is the dictionary
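The idea of a hashable dict keyed on a unique id can be sketched in a few lines. HashableMapping is a hypothetical class written only from the attribute descriptions above. It is not the Record implementation and leaves out most of the mapping interface.

```python
class HashableMapping:
    """A dict-like record that is hashed and compared by its id string."""

    def __init__(self, field_dict, id_string, source_file=None, source_line=None):
        self._fieldDict = dict(field_dict)  # the base mapping of keys to values
        self._id = id_string                # used for hashing and comparison
        self._sourceFile = source_file      # kept for error messages
        self._sourceLine = source_line
        self.bad = False                    # set to True if something is wrong
        self.error = None                   # holds the issue when bad is True

    def __getitem__(self, key):
        return self._fieldDict[key]

    def __hash__(self):
        return hash(self._id)

    def __eq__(self, other):
        if not isinstance(other, HashableMapping):
            return NotImplemented
        return self._id == other._id


first = HashableMapping({"TI": "A title"}, "WOS:1")
second = HashableMapping({"TI": "Another title"}, "WOS:1")
unique = {first, second}
```

Because equality and hashing depend only on the id, two objects with the same id collapse to one entry in a set or dict, whatever their fields contain.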
ExtendedRecord is what WOSRecord and its ilk inherit from and
extends Record by adding memoizing and processing of the fields.
ExtendedRecord cannot be invoked directly as it has many abstract
(virtual) methods that define how the tags are to be processed, what they
are called, what encoding to use when writing to disk, etc.
-
metaknowledge.mkRecord._bibFormatter(s, maxLength)¶ Formats a string, list or number to make it good for a bib file by:
* if too long splits up the string correctly
* tries to use the best quoting characters
* expands lists into ' and ' separated values, as per spec for authors field
Note, this does not escape characters. LaTeX may have issues with the output
Max length splitting derived from https://www.cs.arizona.edu/~collberg/Teaching/07.231/BibTeX/bibtex.html
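The list expansion step is the easiest of these to picture. The helper below, join_for_bib, is a hypothetical illustration of joining values with ' and '. It does not split long strings or choose quoting characters, so it covers only one of the behaviours listed for _bibFormatter().

```python
def join_for_bib(value):
    """Expand a list into ' and ' separated text, as BibTeX expects for authors."""
    if isinstance(value, (list, tuple)):
        return " and ".join(str(item) for item in value)
    return str(value)  # strings and numbers pass through as text


authors = join_for_bib(["Smith, J", "Doe, A"])
year = join_for_bib(1999)
```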
-
metaknowledge.recordCollection.addToNetwork(grph, nds, count, weighted, nodeType, nodeInfo, fullInfo, coreCitesDict, coreValues, detailedValues, addCR, recordToCite=True, headNd=None)¶ Adds the citations nds to grph, according to the rules given by nodeType, fullInfo, etc.
headNd is the citation of the Record
-
metaknowledge.recordCollection.expandRecs(G, RecCollect, nodeType, weighted)¶ Expand all the citations from RecCollect
-
metaknowledge.recordCollection.makeID(citation, nodeType)¶ Makes the id, of the correct type for the network
-
metaknowledge.recordCollection.makeNodeTuple(citation, idVal, nodeInfo, fullInfo, nodeType, count, coreCitesDict, coreValues, detailedValues, addCR)¶ Makes a tuple of idVal and a dict of the selected attributes
-
metaknowledge.genders.nameGender.nameStringGender(s, noExcept=False)¶ Expects
first, last