CollectionWithIDs(Collection)

class metaknowledge.CollectionWithIDs(inSet, allowedTypes, collectedTypes, name, bad, errors, quietStart=False)

A Collection with a few extra methods that assume all the contained items have an id attribute and a bad attribute, e.g. Records or Grants.

__Init__

CollectionWithIDs is mostly meant to be a base for other classes, so all but one of the arguments to __init__ are required, and the optional one is not used. The __init__() function is the same as that of Collection.

__init__(inSet, allowedTypes, collectedTypes, name, bad, errors, quietStart=False)

Basically a collections.abc.MutableSet wrapper for a set with a bunch of extra record keeping attached.

badEntries()

Creates a new collection of the same type with only the bad entries

Returns

CollectionWithIDs

A collection of only the bad entries
containsID(idVal)

Checks if the collected items contain the given idVal

Parameters

idVal : str

The queried id string

Returns

bool

True if the item is in the collection
cooccurrenceCounts(keyTag, *countedTags)

Counts the number of times values from any of the countedTags occur with keyTag. The counts are returned as a dictionary with the values of keyTag mapping to dictionaries, in which each of the countedTags values maps to its count.
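
The nesting of the returned dictionary can be illustrated with a minimal sketch. It does not use metaknowledge: the records are plain dicts, the tag names and values are invented, and the helper cooccurrence_counts is a hypothetical stand-in, so it shows the shape of the result rather than the library's implementation.

```python
from collections import Counter, defaultdict

# Toy stand-ins for records: each maps a tag to a list of string values.
# Tag names and values are invented for illustration.
records = [
    {"journal": ["OPTICS COMMUNICATIONS"], "year": ["1975"], "language": ["English"]},
    {"journal": ["OPTICS COMMUNICATIONS"], "year": ["1975"], "language": ["French"]},
    {"journal": ["APPLIED OPTICS"], "year": ["1972"], "language": ["English"]},
]

def cooccurrence_counts(records, key_tag, *counted_tags):
    """Hypothetical helper: count counted_tags values seen alongside each key_tag value."""
    counts = defaultdict(Counter)
    for rec in records:
        for key_val in rec.get(key_tag, []):
            for tag in counted_tags:
                for val in rec.get(tag, []):
                    counts[key_val][val] += 1
    # Convert to plain dicts so the result has the dict[str:dict[str:int]] shape.
    return {k: dict(v) for k, v in counts.items()}

result = cooccurrence_counts(records, "journal", "year", "language")
print(result)
```

Each outer key is a value of the key tag, and each inner dictionary mixes the values of all counted tags.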

Parameters

keyTag : str

The tag used as the key for the returned dictionary

*countedTags : str, str, str, ...

The tags used as the key for the returned dictionary’s values

Returns

dict[str:dict[str:int]]

The dictionary of counts
discardID(idVal)

Checks if the collected items contain the given idVal and discards it if it is found. Will not raise an exception if the item is not found

Parameters

idVal : str

The discarded id string
dropBadEntries()

Removes all the bad entries from the collection
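
The difference between badEntries() and dropBadEntries() is that the first builds a new collection while the second changes the existing one. A minimal sketch with a plain set, assuming only that each item has an id and a bad attribute (the Item class is a hypothetical stand-in for a Record or Grant):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for a Record or Grant: it has an id and a bad flag."""
    id: str
    bad: bool = False

items = {Item("WOS:1"), Item("WOS:2", bad=True), Item("WOS:3")}

# badEntries()-like step: build a new collection holding only the bad items.
bad_only = {item for item in items if item.bad}

# dropBadEntries()-like step: remove the bad items from the original in place.
items.difference_update(bad_only)

remaining_ids = sorted(item.id for item in items)
print(remaining_ids)
```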

getID(idVal)

Looks up an item with idVal and returns it if it is found; returns None if it does not find the item

Parameters

idVal : str

The requested item’s id string

Returns

object

The requested object or None
glimpse(*tags, compact=False)

Creates a printable table with the most frequently occurring values of each of the requested tags, or if none are provided the top authors, journals and citations. The table will be as wide and as tall as the terminal (or 80x24 if there is no terminal), so print(RC.glimpse()) should always create a nice-looking table. Below is a table created from some of the testing files:

>>> print(RC.glimpse())
+RecordCollection glimpse made at: 2016-01-01 12:00:00++++++++++++++++++++++++++
|33 Records from testFile++++++++++++++++++++++++++++++++++++++++++++++++++++++|
|Columns are ranked by num. of occurrences and are independent of one another++|
|-------Top Authors--------+------Top Journals-------+--------Top Cited--------|
|1                Girard, S|1 CANADIAN JOURNAL OF PH.|1 LEVY Y, 1975, OPT COMM.|
|1                Gilles, H|1 JOURNAL OF THE OPTICAL.|2 GOOS F, 1947, ANN PHYS.|
|2                IMBERT, C|2          APPLIED OPTICS|3 LOTSCH HKV, 1970, OPTI.|
|2                Pillon, F|2   OPTICS COMMUNICATIONS|4 RENARD RH, 1964, J OPT.|
|3          BEAUREGARD, OCD|2 NUOVO CIMENTO DELLA SO.|5 IMBERT C, 1972, PHYS R.|
|3               Laroche, M|2 JOURNAL OF THE OPTICAL.|6 ARTMANN K, 1948, ANN P.|
|3                 HUARD, S|2 JOURNAL OF THE OPTICAL.|6 COSTADEB.O, 1973, PHYS.|
|4                  PURI, A|2 NOUVELLE REVUE D OPTIQ.|6 ROOSEN G, 1973, CR ACA.|
|4               COSTADEB.O|3 PHYSICS REPORTS-REVIEW.|7 Imbert C., 1972, Nouve.|
|4           PATTANAYAK, DN|3 PHYSICAL REVIEW LETTERS|8 HOROWITZ BR, 1971, J O.|
|4           Gazibegovic, A|3 USPEKHI FIZICHESKIKH N.|8 BRETENAKER F, 1992, PH.|
|4                ROOSEN, G|3 APPLIED PHYSICS B-LASE.|8 SCHILLIN.H, 1965, ANN .|
|4               BIRMAN, JL|3 AEU-INTERNATIONAL JOUR.|8 FEDOROV FI, 1955, DOKL.|
|4                Kaiser, R|3 COMPTES RENDUS HEBDOMA.|8 MAZET A, 1971, CR ACAD.|
|5                  LEVY, Y|3 CHINESE PHYSICS LETTERS|9 IMBERT C, 1972, CR ACA.|
|5              BEAUREGA.OC|3       PHYSICAL REVIEW B|9 LOTSCH HKV, 1971, OPTI.|
|5               PAVLOV, VI|3 LETTERE AL NUOVO CIMEN.|9 ASHBY N, 1973, PHYS RE.|
|5                BREVIK, I|3 PROGRESS IN QUANTUM EL.|9 BOULWARE DG, 1973, PHY.|
>>>
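
The 80x24 fallback described above matches what the standard library offers through shutil.get_terminal_size(). The sketch below shows how a table builder could size its columns that way; whether glimpse() uses this exact call is an assumption, and the three-column split only mirrors the default table shown above.

```python
import shutil

# Ask for the terminal size, falling back to 80 columns by 24 lines
# when no terminal is attached (for example when output is piped).
size = shutil.get_terminal_size(fallback=(80, 24))
width, height = size.columns, size.lines

# A table builder could then split the width between its columns.
# Three columns is an assumption that mirrors the default table.
n_columns = 3
column_width = width // n_columns
print(width, height, column_width)
```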

Parameters

tags : str, str, ...

Any number of tag strings to be made into columns in the output table

Returns

str

A string containing the table
networkMultiLevel(*modes, nodeCount=True, edgeWeight=True, stemmer=None, edgeAttribute=None, nodeAttribute=None, _networkTypeString='n-level network')

Creates a network of the objects found by any number of tags, given as modes, with edges between all co-occurring values. If you only want edges between co-occurring values from different tags, use networkMultiMode().

A networkMultiLevel() looks at each entry in the collection and extracts its values for the tag given by each of the modes, e.g. the 'authorsFull' tag. If multiple values are returned, an edge is created between them, so in the case of the author tag 'authorsFull' a co-authorship network is created. The values of each other tag are then added as well, and edges are created between the first tag’s nodes and theirs.

The number of times each object occurs is counted if nodeCount is True, and the edges count the number of co-occurrences if edgeWeight is True. Both are True by default.

Note: Do not use this to construct co-citation networks; use RecordCollection.networkCoCitation() instead, as it is more accurate and has more options.

Parameters

mode : str

A two character WOS tag or one of the full names for a tag

nodeCount : optional [bool]

Default True, if True each node will have an attribute called “count” that contains an int giving the number of times the object occurred.

edgeWeight : optional [bool]

Default True, if True each edge will have an attribute called “weight” that contains an int giving the number of times the two objects co-occurred.

stemmer : optional [func]

Default None. If stemmer is a callable object, basically a function or possibly a class, it will be called on the ID of every node in the graph; all IDs are strings. For example:

The function f = lambda x: x[0], if given as the stemmer, will cause all IDs to be the first character of their unstemmed IDs, e.g. the title 'Goos-Hanchen and Imbert-Fedorov shifts for leaky guided modes' will create the node 'G'.
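
The stemmer from the example can be tried on its own, since it is just a callable applied to an ID string. The second function, normalise, is a hypothetical stemmer added here to show a more practical use: merging IDs that differ only in case or surrounding whitespace.

```python
# The stemmer from the text: keep only the first character of each node ID.
f = lambda x: x[0]

title = 'Goos-Hanchen and Imbert-Fedorov shifts for leaky guided modes'
stemmed = f(title)
print(stemmed)  # G

# Hypothetical stemmer: normalise case and surrounding whitespace so that
# differently formatted spellings of one name end up as a single node ID.
def normalise(node_id):
    return node_id.strip().upper()

merged = {normalise(s) for s in ['Applied Optics ', 'APPLIED OPTICS']}
print(merged)
```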

Returns

networkx Graph

A networkx Graph with the objects of the tag mode as nodes and their co-occurrences as edges
networkMultiMode(*tags, recordType=True, nodeCount=True, edgeWeight=True, stemmer=None, edgeAttribute=None)

Creates a network of the objects found by all tags in tags. Each node is marked by which tag spawned it, making the resultant graph n-partite.

A networkMultiMode() looks at each item in the collection and extracts its values for the tags given by tags. Then for all objects returned an edge is created between them, regardless of their type. Each node will have an attribute called 'type' that gives the tag that created it or both if both created it, e.g. if 'LA' were in tags, the node 'English' would have the type attribute 'LA'.

For example, if tags was set to ['CR', 'UT', 'LA'], a three-mode network would be created, composed of a co-citation network from the 'CR' tag. Each citation would also have edges to all the languages of the Records that cited it and to the WOS numbers of those Records.

The number of times each object occurs is counted if nodeCount is True, and the edges count the number of co-occurrences if edgeWeight is True. Both are True by default.

Parameters

tags : str, str, str, … or list [str]

Any number of tags, or a list of tags

nodeCount : optional [bool]

Default True, if True each node will have an attribute called 'count' that contains an int giving the number of times the object occurred.

edgeWeight : optional [bool]

Default True, if True each edge will have an attribute called 'weight' that contains an int giving the number of times the two objects co-occurred.

stemmer : optional [func]

Default None. If stemmer is a callable object, basically a function or possibly a class, it will be called on the ID of every node in the graph; note that all IDs are strings.

For example: the function f = lambda x: x[0], if given as the stemmer, will cause all IDs to be the first character of their unstemmed IDs, e.g. the title 'Goos-Hanchen and Imbert-Fedorov shifts for leaky guided modes' will create the node 'G'.

Returns

networkx Graph

A networkx Graph with the objects of the tags tags as nodes and their co-occurrences as edges
networkOneMode(mode, nodeCount=True, edgeWeight=True, stemmer=None, edgeAttribute=None, nodeAttribute=None)

Creates a network of the objects found by one tag mode. This is the same as networkMultiLevel() with only one tag.

A networkOneMode() looks at each entry in the collection and extracts its values for the tag given by mode, e.g. the 'authorsFull' tag. If multiple values are returned, an edge is created between them, so in the case of the author tag 'authorsFull' a co-authorship network is created.

The number of times each object occurs is counted if nodeCount is True, and the edges count the number of co-occurrences if edgeWeight is True. Both are True by default.
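
The counting described above can be sketched without metaknowledge or networkx. The entries and author names below are invented, and the two Counter objects only play the roles of the node 'count' and edge 'weight' attributes; this is an illustration of the idea, not the library's code.

```python
from collections import Counter
from itertools import combinations

# Toy entries: each list holds the values one entry returns for a single tag,
# for example its authors. The names are invented.
entries = [
    ["Author A", "Author B"],
    ["Author A", "Author B", "Author C"],
    ["Author D"],
]

node_count = Counter()   # plays the role of the 'count' node attribute
edge_weight = Counter()  # plays the role of the 'weight' edge attribute

for values in entries:
    unique = sorted(set(values))
    node_count.update(unique)
    # One undirected edge per pair of values that co-occur in this entry.
    for a, b in combinations(unique, 2):
        edge_weight[(a, b)] += 1

print(dict(node_count))
print(dict(edge_weight))
```

An entry with a single value, such as the last one, adds a node but no edges.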

Note: Do not use this to construct co-citation networks; use RecordCollection.networkCoCitation() instead, as it is more accurate and has more options.

Parameters

mode : str

A two character WOS tag or one of the full names for a tag

nodeCount : optional [bool]

Default True, if True each node will have an attribute called “count” that contains an int giving the number of times the object occurred.

edgeWeight : optional [bool]

Default True, if True each edge will have an attribute called “weight” that contains an int giving the number of times the two objects co-occurred.

stemmer : optional [func]

Default None. If stemmer is a callable object, basically a function or possibly a class, it will be called on the ID of every node in the graph; all IDs are strings. For example:

The function f = lambda x: x[0], if given as the stemmer, will cause all IDs to be the first character of their unstemmed IDs, e.g. the title 'Goos-Hanchen and Imbert-Fedorov shifts for leaky guided modes' will create the node 'G'.

Returns

networkx Graph

A networkx Graph with the objects of the tag mode as nodes and their co-occurrences as edges
networkTwoMode(tag1, tag2, directed=False, recordType=True, nodeCount=True, edgeWeight=True, stemmerTag1=None, stemmerTag2=None, edgeAttribute=None)

Creates a network of the objects found by two WOS tags tag1 and tag2, with each node marked by which tag spawned it, making the resultant graph bipartite.

A networkTwoMode() looks at each Record in the RecordCollection and extracts its values for the tags given by tag1 and tag2, e.g. the 'WC' and 'LA' tags. Then for each object returned by each tag an edge is created between it and every object of the other tag. So the WOS defined subject tag 'WC' and language tag 'LA' will give a two-mode network showing the connections between subjects and languages. Each node will have an attribute called 'type' that gives the tag that created it or both if both created it, e.g. the node 'English' would have the type attribute 'LA'.

The number of times each object occurs is counted if nodeCount is True, and the edges count the number of co-occurrences if edgeWeight is True. Both are True by default.

If the directed parameter is True, the network will be directed, with the first tag as the source and the second as the destination.
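
A small sketch of the two-mode construction, written with plain dictionaries instead of networkx. The tag names 'WC' and 'LA' come from the text; the record values are invented, and the (source, target) tuple ordering is only one way to represent the directed case.

```python
from collections import Counter
from itertools import product

# Toy records with values for two tags; the values are invented.
records = [
    {"WC": ["Optics", "Physics"], "LA": ["English"]},
    {"WC": ["Optics"], "LA": ["French"]},
    {"WC": ["Optics"], "LA": ["English"]},
]

node_type = {}           # node -> tag that produced it (the 'type' attribute)
edge_weight = Counter()  # (source, target) -> number of co-occurrences

for rec in records:
    for value in rec["WC"]:
        node_type[value] = "WC"
    for value in rec["LA"]:
        node_type[value] = "LA"
    # Edges only run from a 'WC' value to an 'LA' value, never within one tag,
    # which is what keeps the graph bipartite.
    for source, target in product(rec["WC"], rec["LA"]):
        edge_weight[(source, target)] += 1

print(node_type)
print(dict(edge_weight))
```

Keeping the tag1 value first in every edge tuple corresponds to the directed case, where tag1 is the source and tag2 the destination.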

Parameters

tag1 : str

A two character WOS tag or one of the full names for a tag, the source of edges on the graph

tag2 : str

A two character WOS tag or one of the full names for a tag, the target of edges on the graph

directed : optional [bool]

Default False, if True the returned network is directed

nodeCount : optional [bool]

Default True, if True each node will have an attribute called “count” that contains an int giving the number of times the object occurred.

edgeWeight : optional [bool]

Default True, if True each edge will have an attribute called “weight” that contains an int giving the number of times the two objects co-occurred.

stemmerTag1 : optional [func]

Default None. If stemmerTag1 is a callable object, basically a function or possibly a class, it will be called on the ID of every node given by tag1 in the graph; all IDs are strings.

For example: the function f = lambda x: x[0], if given as the stemmer, will cause all IDs to be the first character of their unstemmed IDs, e.g. the title 'Goos-Hanchen and Imbert-Fedorov shifts for leaky guided modes' will create the node 'G'.

stemmerTag2 : optional [func]

Default None, the same as stemmerTag1 but applied to the nodes given by tag2

Returns

networkx Graph or networkx DiGraph

A networkx Graph with the objects of the tags tag1 and tag2 as nodes and their co-occurrences as edges.
rankedSeries(tag, outputFile=None, giveCounts=True, giveRanks=False, greatestFirst=True, pandasMode=True, limitTo=None)

Creates a pandas-ready dict of the ordered list of all the values of tag, ranked by their number of occurrences. A list can also be returned with the counts or ranks added, or it can be written to a file.
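
The two output shapes can be sketched with the standard library. The values are invented, the alphabetical tie-break is an assumption, and the column names 'entry' and 'count' are placeholders chosen for this sketch, not necessarily the keys the library uses.

```python
from collections import Counter

# Invented tag values, as if collected from every item in the collection.
values = ["Optics", "Physics", "Optics", "Acoustics", "Optics", "Physics"]

counts = Counter(values)

# Highest count first, similar to greatestFirst=True.
# Breaking ties alphabetically is an assumption of this sketch.
ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# List form, similar to giveCounts=True: (value, count) tuples.
as_list = ranked

# Column-oriented dict that pandas.DataFrame() accepts directly.
# The column names are placeholders.
as_columns = {
    "entry": [value for value, _ in ranked],
    "count": [count for _, count in ranked],
}
print(as_list)
print(as_columns)
```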

Parameters

tag : str

The tag to be ranked

outputFile : optional str

A file path to write a csv with 2 columns, one the tag values the other their counts

giveCounts : optional bool

Default True, if True the returned list will be composed of tuples, the first value being the tag value and the second its count. This supersedes giveRanks.

giveRanks : optional bool

Default False, if True and giveCounts is False, the returned list will be composed of tuples, the first value being the tag value and the second its rank. This is superseded by giveCounts.

greatestFirst : optional bool

Default True, if True the returned list will be ordered with the highest ranked value first, otherwise the lowest ranked will be first.

pandasMode : optional bool

Default True, if True a dict ready for pandas will be returned, otherwise a list

limitTo : optional list[values]

Default None, if a list is provided only those values in the list will be counted or returned

Returns

dict[str:list[value]] or list[str]

A dict or list will be returned depending on if pandasMode is True
removeID(idVal)

Checks if the collected items contain the given idVal and removes it if it is found. Will raise a KeyError if the item is not found
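
The ID methods follow the same conventions as Python's built-in set and dict: discard is silent, remove raises, and lookup returns None on a miss. The class below is a hypothetical stand-in that reuses the method names from this page to show those semantics side by side; it is not the library's implementation.

```python
class IdIndex:
    """Hypothetical stand-in that indexes payloads by their id string."""

    def __init__(self, pairs):
        self._by_id = dict(pairs)

    def containsID(self, idVal):
        return idVal in self._by_id

    def getID(self, idVal):
        # Returns None rather than raising when the id is absent.
        return self._by_id.get(idVal)

    def discardID(self, idVal):
        # Silent no-op when the id is absent.
        self._by_id.pop(idVal, None)

    def removeID(self, idVal):
        # Raises KeyError when the id is absent.
        del self._by_id[idVal]

index = IdIndex([("WOS:1", "first"), ("WOS:2", "second")])

index.discardID("missing")  # no exception
try:
    index.removeID("missing")
    raised = False
except KeyError:
    raised = True

index.removeID("WOS:2")
print(raised, index.containsID("WOS:1"), index.getID("WOS:2"))
```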

Parameters

idVal : str

The removed id string
tags()

Creates a list of all the tags of the contained items

Returns

list [str]

A list of all the tags
timeSeries(tag=None, outputFile=None, giveYears=True, greatestFirst=True, limitTo=False, pandasMode=True)

Creates a pandas-ready dict of the ordered list of all the values of tag, ranked by the year they occurred in; a value that occurs in multiple years will create multiple entries. A list can also be returned with the counts or years added, or it can be written to a file.

If no tag is given the Records in the collection will be used
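
One reading of "multiple year occurrences will create multiple entries" is that each distinct (value, year) pair becomes its own row. The sketch below follows that reading with invented data; the deduplication step, the tie-break, and the column names 'entry' and 'year' are assumptions of the sketch.

```python
# Invented (value, year) observations gathered from the items in a collection.
observations = [
    ("Optics", 1975),
    ("Physics", 1972),
    ("Optics", 1972),
    ("Optics", 1975),
]

# Keep one entry per distinct (value, year) pair, then order with the
# latest year first, similar to greatestFirst=True. Ties are broken
# alphabetically, which is an assumption.
pairs = sorted(set(observations), key=lambda p: (-p[1], p[0]))

# Column-oriented dict that pandas.DataFrame() accepts directly.
# The column names are placeholders.
as_columns = {
    "entry": [value for value, _ in pairs],
    "year": [year for _, year in pairs],
}
print(pairs)
print(as_columns)
```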

Parameters

tag : optional str

Default None, if provided the tag will be ordered

outputFile : optional str

A file path to write a csv with 2 columns, one the tag values the other their years

giveYears : optional bool

Default True, if True the returned list will be composed of tuples, the first value being the tag value and the second its year.

greatestFirst : optional bool

Default True, if True the returned list will be ordered with the highest years first, otherwise the lowest years will be first.

pandasMode : optional bool

Default True, if True a dict ready for pandas will be returned, otherwise a list

limitTo : optional list[values]

Default None, if a list is provided only those values in the list will be counted or returned

Returns

dict[str:list[value]] or list[str]

A dict or list will be returned depending on if pandasMode is True