CollectionWithIDs(Collection)
-
class metaknowledge.CollectionWithIDs(inSet, allowedTypes, collectedTypes, name, bad, errors, quietStart=False)
A Collection with a few extra methods that assume all the contained items have an id attribute and a bad attribute, e.g. Records or Grants.
__Init__
As CollectionWithIDs is mostly meant to be a base for other classes, all but one of the arguments of __init__() are required, and the optional one is not used. The __init__() function is the same as that of a Collection.
-
__init__(inSet, allowedTypes, collectedTypes, name, bad, errors, quietStart=False)
Basically a collections.abc.MutableSet wrapper for a set with a bunch of extra record keeping attached.
-
badEntries()
Creates a new collection of the same type containing only the bad entries.
-
containsID(idVal)
Checks if the collected items contain the given idVal.
-
cooccurrenceCounts(keyTag, *countedTags)
Counts the number of times values from any of the countedTags occur with keyTag. The counts are returned as a dictionary with the values of keyTag mapping to dictionaries, with each of the countedTags values mapping to their counts.
Parameters
keyTag : str
The tag used as the key for the returned dictionary
*countedTags : str, str, str, ...
The tags used as the key for the returned dictionary’s values
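The nested shape of the returned dictionary may be easier to see in a small sketch. The code below is not the library's implementation: plain dicts stand in for the collected items, the helper name cooccurrence_sketch and the toy data are hypothetical, and counting each value once per item is an assumption.

```python
from collections import Counter, defaultdict

def cooccurrence_sketch(items, key_tag, *counted_tags):
    """Toy stand-in: each item is a dict mapping a tag to a list of values."""
    counts = defaultdict(Counter)
    for item in items:
        for key_val in item.get(key_tag, []):
            for tag in counted_tags:
                for val in item.get(tag, []):
                    counts[key_val][val] += 1
    # Convert to plain dicts so the result prints cleanly
    return {key: dict(inner) for key, inner in counts.items()}

# Hypothetical records, not taken from any real file
records = [
    {"authorsFull": ["Girard, S", "Gilles, H"], "journal": ["APPLIED OPTICS"]},
    {"authorsFull": ["Girard, S"], "journal": ["PHYSICAL REVIEW B"]},
]
result = cooccurrence_sketch(records, "authorsFull", "journal")
```

Here each author maps to a dictionary of the journals they appeared with, and each journal maps to the number of co-occurrences.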
-
discardID(idVal)
Checks if the collected items contain the given idVal and discards it if it is found. Will not raise an exception if the item is not found.
-
dropBadEntries()
Removes all the bad entries from the collection.
-
getID(idVal)
Looks up an item with idVal and returns it if it is found; returns None if it does not find the item.
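The ID methods differ mainly in how they treat a missing ID. The sketch below mirrors that contract with a hypothetical ToyIDCollection class backed by a dict; it only illustrates the documented behaviour (including removeID(), which raises a KeyError) and makes no claim about how the library stores its items. The ID string is made up.

```python
from types import SimpleNamespace

class ToyIDCollection:
    """Hypothetical stand-in that stores items keyed by their id attribute."""

    def __init__(self, items):
        self._items = {item.id: item for item in items}

    def containsID(self, idVal):
        return idVal in self._items

    def getID(self, idVal):
        # Returns None rather than raising when the id is absent
        return self._items.get(idVal)

    def discardID(self, idVal):
        # Silent when the id is absent, like set.discard()
        self._items.pop(idVal, None)

    def removeID(self, idVal):
        # Raises KeyError when the id is absent, like set.remove()
        del self._items[idVal]

toy = ToyIDCollection([SimpleNamespace(id="WOS:A1", bad=False)])
print(toy.containsID("WOS:A1"))
print(toy.getID("missing"))
toy.discardID("missing")  # no exception
```

The discard/remove pairing follows the same convention as Python's built-in set, which fits a class described as a MutableSet wrapper.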
-
glimpse(*tags, compact=False)
Creates a printable table with the most frequently occurring values of each of the requested tags, or if none are provided the top authors, journals and citations. The table will be as wide and as tall as the terminal (or 80x24 if there is no terminal) so print(RC.glimpse()) should always create a nice looking table. Below is a table created from some of the testing files:
>>> print(RC.glimpse())
+RecordCollection glimpse made at: 2016-01-01 12:00:00++++++++++++++++++++++++++
|33 Records from testFile++++++++++++++++++++++++++++++++++++++++++++++++++++++|
|Columns are ranked by num. of occurrences and are independent of one another++|
|-------Top Authors--------+------Top Journals-------+--------Top Cited--------|
|1                Girard, S|1 CANADIAN JOURNAL OF PH.|1 LEVY Y, 1975, OPT COMM.|
|1                Gilles, H|1 JOURNAL OF THE OPTICAL.|2 GOOS F, 1947, ANN PHYS.|
|2                IMBERT, C|2          APPLIED OPTICS|3 LOTSCH HKV, 1970, OPTI.|
|2                Pillon, F|2   OPTICS COMMUNICATIONS|4 RENARD RH, 1964, J OPT.|
|3          BEAUREGARD, OCD|2 NUOVO CIMENTO DELLA SO.|5 IMBERT C, 1972, PHYS R.|
|3               Laroche, M|2 JOURNAL OF THE OPTICAL.|6 ARTMANN K, 1948, ANN P.|
|3                 HUARD, S|2 JOURNAL OF THE OPTICAL.|6 COSTADEB.O, 1973, PHYS.|
|4                  PURI, A|2 NOUVELLE REVUE D OPTIQ.|6 ROOSEN G, 1973, CR ACA.|
|4               COSTADEB.O|3 PHYSICS REPORTS-REVIEW.|7 Imbert C., 1972, Nouve.|
|4           PATTANAYAK, DN|3 PHYSICAL REVIEW LETTERS|8 HOROWITZ BR, 1971, J O.|
|4           Gazibegovic, A|3 USPEKHI FIZICHESKIKH N.|8 BRETENAKER F, 1992, PH.|
|4                ROOSEN, G|3 APPLIED PHYSICS B-LASE.|8 SCHILLIN.H, 1965, ANN .|
|4               BIRMAN, JL|3 AEU-INTERNATIONAL JOUR.|8 FEDOROV FI, 1955, DOKL.|
|4                Kaiser, R|3 COMPTES RENDUS HEBDOMA.|8 MAZET A, 1971, CR ACAD.|
|5                  LEVY, Y|3 CHINESE PHYSICS LETTERS|9 IMBERT C, 1972, CR ACA.|
|5              BEAUREGA.OC|3       PHYSICAL REVIEW B|9 LOTSCH HKV, 1971, OPTI.|
|5               PAVLOV, VI|3 LETTERE AL NUOVO CIMEN.|9 ASHBY N, 1973, PHYS RE.|
|5                BREVIK, I|3 PROGRESS IN QUANTUM EL.|9 BOULWARE DG, 1973, PHY.|
Parameters
tags : str, str, ...
Any number of tag strings to be made into columns in the output table
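The sizing rule mentioned for glimpse() (terminal size, or 80x24 when there is no terminal) can be approximated with the standard library. This is only a sketch under assumptions: the helper top_values and the author list are hypothetical, and it prints counts rather than the ranks shown in the table above.

```python
import shutil
from collections import Counter

def top_values(values, height):
    """Return up to `height` (value, count) pairs, most frequent first."""
    return Counter(values).most_common(height)

# Falls back to 80 columns by 24 lines when no terminal can be detected
width, height = shutil.get_terminal_size(fallback=(80, 24))

# Hypothetical values for a single column
authors = ["Girard, S", "Gilles, H", "Girard, S", "IMBERT, C"]
for name, count in top_values(authors, height):
    # Truncate each row so it never exceeds the available width
    print(f"{count} {name}"[:width])
```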
-
networkMultiLevel(*modes, nodeCount=True, edgeWeight=True, stemmer=None, edgeAttribute=None, nodeAttribute=None, _networkTypeString='n-level network')
Creates a network of the objects found by any number of tags modes, with edges between all co-occurring values. If you only want edges between co-occurring values from different tags use networkMultiMode().
A networkMultiLevel() looks at each entry in the collection and extracts its values for the tag given by each of the modes, e.g. the 'authorsFull' tag. Then if multiple are returned an edge is created between them. So in the case of the author tag 'authorsFull' a co-authorship network is created. Then for each other tag the entries are also added and edges between the first tag’s nodes and theirs are created.
The number of times each object occurs is counted if nodeCount is True and the edges count the number of co-occurrences if edgeWeight is True. Both are True by default.
Note: Do not use this for the construction of co-citation networks; use RecordCollection.networkCoCitation(), it is more accurate and has more options.
Parameters
mode : str
A two character WOS tag or one of the full names for a tag
nodeCount : optional [bool]
Default True, if True each node will have an attribute called “count” that contains an int giving the number of times the object occurred.
edgeWeight : optional [bool]
Default True, if True each edge will have an attribute called “weight” that contains an int giving the number of times the two objects co-occurred.
stemmer : optional [func]
Default None, if stemmer is a callable object, basically a function or possibly a class, it will be called for the ID of every node in the graph; all IDs are strings.
For example: the function f = lambda x: x[0] if given as the stemmer will cause all IDs to be the first character of their unstemmed IDs, e.g. the title 'Goos-Hanchen and Imbert-Fedorov shifts for leaky guided modes' will create the node 'G'.
Returns
networkx Graph
A networkx Graph with the objects of the tag mode as nodes and their co-occurrences as edges
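To illustrate what a stemmer does to node IDs, the sketch below passes a few IDs through a callable before counting them, so that IDs mapping to the same stem collapse into one node. The helper apply_stemmer and the ID list are hypothetical, and a plain dict stands in for the graph; the library's own handling may differ.

```python
def apply_stemmer(ids, stemmer=None):
    """Count node IDs after optionally passing each one through stemmer."""
    counts = {}
    for node_id in ids:
        # Only call the stemmer when one was actually supplied
        key = stemmer(node_id) if callable(stemmer) else node_id
        counts[key] = counts.get(key, 0) + 1
    return counts

# Hypothetical IDs: the first two differ only in capitalisation
ids = ["IMBERT, C", "Imbert, C", "Pillon, F"]

unstemmed = apply_stemmer(ids)                              # three distinct nodes
merged = apply_stemmer(ids, stemmer=str.upper)              # case variants merge
first_letters = apply_stemmer(ids, stemmer=lambda x: x[0])  # the example from the text
```

A case-normalising stemmer such as str.upper is one plausible use: it merges spelling variants of the same name into a single node.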
-
networkMultiMode(*tags, recordType=True, nodeCount=True, edgeWeight=True, stemmer=None, edgeAttribute=None)
Creates a network of the objects found by all tags in tags. Each node is marked by which tag spawned it, making the resultant graph n-partite.
A networkMultiMode() looks at each item in the collection and extracts its values for the tags given by tags. Then for all objects returned an edge is created between them, regardless of their type. Each node will have an attribute called 'type' that gives the tag that created it or both if both created it, e.g. if 'LA' were in tags, the node 'English' would have the type attribute be 'LA'.
For example, if tags was set to ['CR', 'UT', 'LA'], a three mode network would be created, composed of a co-citation network from the 'CR' tag. Then each citation would also have edges to all the languages of Records that cited it and to the WOS number of those Records.
The number of times each object occurs is counted if nodeCount is True and the edges count the number of co-occurrences if edgeWeight is True. Both are True by default.
Parameters
tags : str, str, str, … or list [str]
Any number of tags, or a list of tags
nodeCount : optional [bool]
Default True, if True each node will have an attribute called 'count' that contains an int giving the number of times the object occurred.
edgeWeight : optional [bool]
Default True, if True each edge will have an attribute called 'weight' that contains an int giving the number of times the two objects co-occurred.
stemmer : optional [func]
Default None, if stemmer is a callable object, basically a function or possibly a class, it will be called for the ID of every node in the graph; note that all IDs are strings.
For example: the function f = lambda x: x[0] if given as the stemmer will cause all IDs to be the first character of their unstemmed IDs, e.g. the title 'Goos-Hanchen and Imbert-Fedorov shifts for leaky guided modes' will create the node 'G'.
Returns
networkx Graph
A networkx Graph with the objects of the tags tags as nodes and their co-occurrences as edges
-
networkOneMode(mode, nodeCount=True, edgeWeight=True, stemmer=None, edgeAttribute=None, nodeAttribute=None)
Creates a network of the objects found by one tag mode. This is the same as networkMultiLevel() with only one tag.
A networkOneMode() looks at each entry in the collection and extracts its values for the tag given by mode, e.g. the 'authorsFull' tag. Then if multiple are returned an edge is created between them. So in the case of the author tag 'authorsFull' a co-authorship network is created.
The number of times each object occurs is counted if nodeCount is True and the edges count the number of co-occurrences if edgeWeight is True. Both are True by default.
Note: Do not use this for the construction of co-citation networks; use RecordCollection.networkCoCitation(), it is more accurate and has more options.
Parameters
mode : str
A two character WOS tag or one of the full names for a tag
nodeCount : optional [bool]
Default True, if True each node will have an attribute called “count” that contains an int giving the number of times the object occurred.
edgeWeight : optional [bool]
Default True, if True each edge will have an attribute called “weight” that contains an int giving the number of times the two objects co-occurred.
stemmer : optional [func]
Default None, if stemmer is a callable object, basically a function or possibly a class, it will be called for the ID of every node in the graph; all IDs are strings.
For example: the function f = lambda x: x[0] if given as the stemmer will cause all IDs to be the first character of their unstemmed IDs, e.g. the title 'Goos-Hanchen and Imbert-Fedorov shifts for leaky guided modes' will create the node 'G'.
Returns
networkx Graph
A networkx Graph with the objects of the tag mode as nodes and their co-occurrences as edges
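The one-mode construction described above (one node per value, one edge per co-occurring pair, with counts and weights) can be sketched with the standard library alone. This is an illustration under assumptions, not the library's code: two Counter objects stand in for the node "count" and edge "weight" attributes of a networkx Graph, and the helper one_mode_sketch and the records are hypothetical.

```python
from collections import Counter
from itertools import combinations

def one_mode_sketch(items, mode):
    """Count nodes and undirected co-occurrence edges for a single tag."""
    node_count = Counter()
    edge_weight = Counter()
    for item in items:
        # Sort so each unordered pair always appears in the same orientation
        values = sorted(set(item.get(mode, [])))
        node_count.update(values)
        # One undirected edge per unordered pair of co-occurring values
        edge_weight.update(combinations(values, 2))
    return node_count, edge_weight

# Hypothetical records sharing two authors
records = [
    {"authorsFull": ["Girard, S", "Gilles, H"]},
    {"authorsFull": ["Girard, S", "Gilles, H", "Pillon, F"]},
]
nodes, edges = one_mode_sketch(records, "authorsFull")
```

With these records the pair of authors appearing together twice gets a weight of 2, which is the co-authorship reading of the edge weights.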
-
networkTwoMode(tag1, tag2, directed=False, recordType=True, nodeCount=True, edgeWeight=True, stemmerTag1=None, stemmerTag2=None, edgeAttribute=None)
Creates a network of the objects found by two WOS tags tag1 and tag2. Each node is marked by which tag spawned it, making the resultant graph bipartite.
A networkTwoMode() looks at each Record in the RecordCollection and extracts its values for the tags given by tag1 and tag2, e.g. the 'WC' and 'LA' tags. Then for each object returned by each tag an edge is created between it and every other object of the other tag. So the WOS defined subject tag 'WC' and language tag 'LA' will give a two-mode network showing the connections between subjects and languages. Each node will have an attribute called 'type' that gives the tag that created it or both if both created it, e.g. the node 'English' would have the type attribute be 'LA'.
The number of times each object occurs is counted if nodeCount is True and the edges count the number of co-occurrences if edgeWeight is True. Both are True by default.
The directed parameter, if True, will cause the network to be directed with the first tag as the source and the second as the destination.
Parameters
tag1 : str
A two character WOS tag or one of the full names for a tag, the source of edges on the graph
tag2 : str
A two character WOS tag or one of the full names for a tag, the target of edges on the graph
directed : optional [bool]
Default False, if True the returned network is directed
nodeCount : optional [bool]
Default True, if True each node will have an attribute called “count” that contains an int giving the number of times the object occurred.
edgeWeight : optional [bool]
Default True, if True each edge will have an attribute called “weight” that contains an int giving the number of times the two objects co-occurred.
stemmerTag1 : optional [func]
Default None, if stemmerTag1 is a callable object, basically a function or possibly a class, it will be called for the ID of every node given by tag1 in the graph; all IDs are strings.
For example: the function f = lambda x: x[0] if given as the stemmer will cause all IDs to be the first character of their unstemmed IDs, e.g. the title 'Goos-Hanchen and Imbert-Fedorov shifts for leaky guided modes' will create the node 'G'.
stemmerTag2 : optional [func]
Default None, see stemmerTag1 as it is the same but for tag2
Returns
networkx Graph or networkx DiGraph
A networkx Graph with the objects of the tags tag1 and tag2 as nodes and their co-occurrences as edges.
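The bipartite structure can be sketched as follows: every value of tag1 is paired with every value of tag2 from the same item, and no edges are created within a tag. This is a simplified illustration, not the library's implementation. The helper two_mode_sketch and the records are hypothetical, a dict and a Counter stand in for the networkx graph, and a value produced by both tags simply keeps the last tag seen rather than being marked as both.

```python
from collections import Counter
from itertools import product

def two_mode_sketch(items, tag1, tag2):
    """Build node types and (tag1 value, tag2 value) edge weights."""
    node_type = {}
    edge_weight = Counter()
    for item in items:
        sources = item.get(tag1, [])
        targets = item.get(tag2, [])
        for value in sources:
            node_type[value] = tag1
        for value in targets:
            node_type[value] = tag2
        # Edges only run between the two tags, never within one
        edge_weight.update(product(sources, targets))
    return node_type, edge_weight

# Hypothetical records with subject ('WC') and language ('LA') values
records = [
    {"WC": ["Optics"], "LA": ["English"]},
    {"WC": ["Optics", "Physics, Applied"], "LA": ["English"]},
]
types, edges = two_mode_sketch(records, "WC", "LA")
```

Storing each edge as a (tag1 value, tag2 value) pair matches the source-to-destination orientation described for directed=True; an undirected reading would just ignore the order.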
-
rankedSeries(tag, outputFile=None, giveCounts=True, giveRanks=False, greatestFirst=True, pandasMode=True, limitTo=None)
Creates a pandas-ready dict of the ordered list of all the values of tag, ranked by their number of occurrences. A list can also be returned with the counts or ranks added, or it can be written to a file.
Parameters
tag : str
The tag to be ranked
outputFile : optional str
A file path to write a csv with 2 columns, one the tag values the other their counts
giveCounts : optional bool
Default True, if True the returned list will be composed of tuples, the first values being the tag value and the second their counts. This supersedes giveRanks.
giveRanks : optional bool
Default False, if True and giveCounts is False, the returned list will be composed of tuples, the first values being the tag value and the second their ranks. This is superseded by giveCounts.
greatestFirst : optional bool
Default True, if True the returned list will be ordered with the highest ranked value first, otherwise the lowest ranked will be first.
pandasMode : optional bool
Default True, if True a dict ready for pandas will be returned, otherwise a list
limitTo : optional list[values]
Default None, if a list is provided only those values in the list will be counted or returned
Returns
dict[str:list[value]] or list[str]
A dict or list will be returned depending on if pandasMode is True
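How giveCounts, greatestFirst and limitTo interact in list mode can be sketched with a Counter. This is an assumption-laden illustration only: the helper ranked_sketch and the records are hypothetical, ties are broken alphabetically (the library's tie handling is not stated here), and the pandasMode dict, giveRanks and outputFile are left out.

```python
from collections import Counter

def ranked_sketch(items, tag, giveCounts=True, greatestFirst=True, limitTo=None):
    """Rank the values of one tag by how often they occur."""
    counts = Counter()
    for item in items:
        for value in item.get(tag, []):
            # limitTo, when given, restricts which values are counted
            if limitTo is None or value in limitTo:
                counts[value] += 1
    # Sort by count, breaking ties alphabetically so the order is stable
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if not greatestFirst:
        ordered.reverse()
    return ordered if giveCounts else [value for value, _ in ordered]

# Hypothetical records
records = [
    {"journal": ["APPLIED OPTICS"]},
    {"journal": ["APPLIED OPTICS"]},
    {"journal": ["PHYSICAL REVIEW B"]},
]
with_counts = ranked_sketch(records, "journal")
names_only = ranked_sketch(records, "journal", giveCounts=False, greatestFirst=False)
limited = ranked_sketch(records, "journal", limitTo=["PHYSICAL REVIEW B"])
```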
-
removeID(idVal)
Checks if the collected items contain the given idVal and removes it if it is found. Will raise a KeyError if the item is not found.
-
Creates a list of all the tags of the contained items.
-
timeSeries(tag=None, outputFile=None, giveYears=True, greatestFirst=True, limitTo=False, pandasMode=True)
Creates a pandas-ready dict of the ordered list of all the values of tag, ranked by the year they occurred in; multiple year occurrences will create multiple entries. A list can also be returned with the counts or years added, or it can be written to a file.
If no tag is given the Records in the collection will be used.
Parameters
tag : optional str
Default None, if provided the tag will be ordered
outputFile : optional str
A file path to write a csv with 2 columns, one the tag values the other their years
giveYears : optional bool
Default True, if True the returned list will be composed of tuples, the first values being the tag value and the second their years.
greatestFirst : optional bool
Default True, if True the returned list will be ordered with the highest years first, otherwise the lowest years will be first.
pandasMode : optional bool
Default True, if True a dict ready for pandas will be returned, otherwise a list
limitTo : optional list[values]
Default None, if a list is provided only those values in the list will be counted or returned
Returns
dict[str:list[value]] or list[str]
A dict or list will be returned depending on if pandasMode is True
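The phrase "multiple year occurrences will create multiple entries" can be read as: a value seen in several years yields one entry per year. The sketch below follows that reading, which is an assumption, and it ignores counts, pandasMode and outputFile. The helper time_series_sketch, the "year" key and the records are all hypothetical.

```python
def time_series_sketch(items, tag, greatestFirst=True):
    """Return (value, year) pairs ordered by year."""
    pairs = set()
    for item in items:
        for value in item.get(tag, []):
            # A value seen in several years yields one entry per year
            pairs.add((value, item["year"]))
    # Order by year first, then by value so ties are deterministic
    return sorted(pairs, key=lambda p: (p[1], p[0]), reverse=greatestFirst)

# Hypothetical records; "year" is an assumed key name
records = [
    {"year": 1972, "authorsFull": ["IMBERT, C"]},
    {"year": 1975, "authorsFull": ["LEVY, Y", "IMBERT, C"]},
    {"year": 1975, "authorsFull": ["LEVY, Y"]},
]
series = time_series_sketch(records, "authorsFull")
oldest_first = time_series_sketch(records, "authorsFull", greatestFirst=False)
```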
-