scopus¶
Overview¶
Functions¶
-
metaknowledge.scopus.scopusHandlers.isScopusFile(infile, checkedLines=2, maxHeaderDiff=3)¶ Determines if infile is the path to a Scopus CSV file. A file is considered to be a Scopus file if it has the correct encoding (utf-8 with BOM (Byte Order Mark)) and, within the first checkedLines lines, a line contains the complete header. The list of all header entries, in order, is found in `scopus.scopusHeader<#metaknowledge.scopus>`__.
Note this is for CSV files, not plain text files from Scopus; plain text files are not complete.
Parameters¶
infile :
str The path to the target file
checkedLines :
optional [int] default 2, the number of lines to check for the header
maxHeaderDiff :
optional [int] default 3, the maximum number of entries by which the potential file's header may differ from the current known header, metaknowledge.scopus.scopusHeader; if exceeded, False will be returned
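The check described above can be sketched with the standard library alone. This is a rough illustration, not the library's implementation: `KNOWN_HEADER` is a hypothetical, shortened header, `looks_like_scopus` is an invented name, and counting differences as a symmetric set difference is an assumption about what "different entries" means.

```python
import codecs
import csv
import io

# Hypothetical, shortened header; the real list lives in
# metaknowledge.scopus.scopusHeader and is much longer.
KNOWN_HEADER = ["Authors", "Title", "Year", "Source title", "DOI"]


def looks_like_scopus(raw_bytes, checked_lines=2, max_header_diff=3):
    """Return True if raw_bytes starts with a UTF-8 BOM and one of the
    first checked_lines lines is close enough to KNOWN_HEADER."""
    if not raw_bytes.startswith(codecs.BOM_UTF8):
        return False
    # 'utf-8-sig' drops the BOM while decoding.
    text = raw_bytes.decode("utf-8-sig")
    reader = csv.reader(io.StringIO(text))
    for _, row in zip(range(checked_lines), reader):
        # Assumption: entries present in only one of the two headers
        # each count as one difference.
        diff = len(set(row) ^ set(KNOWN_HEADER))
        if diff <= max_header_diff:
            return True
    return False


with_bom = codecs.BOM_UTF8 + b"Authors,Title,Year,Source title,DOI\n"
without_bom = b"Authors,Title,Year,Source title,DOI\n"
```

Here `looks_like_scopus(with_bom)` succeeds, while the same bytes without the BOM are rejected before the header is even inspected.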
-
metaknowledge.scopus.scopusHandlers.scopusParser(scopusFile)¶ Parses a Scopus file, scopusFile, to extract the individual lines as ScopusRecords.
A Scopus file is a CSV (comma-separated values) file with a complete header (see `scopus.scopusHeader<#metaknowledge.scopus>`__ for the entries), followed by one line per record. The string-valued entries are quoted with double quotes, which means double quotes inside them can cause issues; see scopusRecordParser() for more information.
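As a rough picture of the file layout described above, the sketch below reads a tiny, made-up two-column sample with Python's `csv` module. The standard-library reader is only a stand-in here, not the parser the library uses, and `read_rows` and the sample data are hypothetical.

```python
import csv
import io

# Hypothetical two-column sample with a leading BOM; real Scopus
# exports have many more columns.
sample = '\ufeffAuthors,Title\n"Doe J., Roe R.","A study, with a comma"\n'


def read_rows(text):
    """Yield (line_number, dict) pairs, one per record after the header."""
    reader = csv.DictReader(io.StringIO(text.lstrip("\ufeff")))
    # The header is line 1, so records start at line 2 (assuming no
    # field contains an embedded newline).
    for line_number, row in enumerate(reader, start=2):
        yield line_number, dict(row)


rows = list(read_rows(sample))
```

Each yielded dict maps a header entry to that record's value, with the commas inside quoted strings left intact.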
Special Functions¶
Tag Functions¶
-
metaknowledge.scopus.tagProcessing.tagFunctions.citeValue(val)¶
-
metaknowledge.scopus.tagProcessing.tagFunctions.commaSpaceSeperated(val)¶
-
metaknowledge.scopus.tagProcessing.tagFunctions.grantValue(val)¶
-
metaknowledge.scopus.tagProcessing.tagFunctions.integralValue(val)¶
-
metaknowledge.scopus.tagProcessing.tagFunctions.semicolonSeperated(val)¶
-
metaknowledge.scopus.tagProcessing.tagFunctions.semicolonSpaceSeperated(val)¶
-
metaknowledge.scopus.tagProcessing.tagFunctions.stringValue(val)¶
Backend¶
-
class
metaknowledge.scopus.recordScopus.ScopusRecord(inRecord, sFile='', sLine=0, header=None)¶ Bases: metaknowledge.mkRecord.ExtendedRecord
Class for full Scopus entries.
This class is an ExtendedRecord capable of generating its own id number. You should not create instances directly, but instead use scopusParser() on a Scopus CSV file.
-
authGenders(countsOnly=False, fractionsMode=False, _countsTuple=False)¶ Creates a dict mapping 'Male', 'Female' and 'Unknown' to lists of the names of all the authors.
Parameters¶
countsOnly :
optional bool Default False, if True the counts (lengths of the lists) will be given instead of the lists of names
fractionsMode :
optional bool Default False, if True the fraction counts (lengths of the lists divided by the total number of authors) will be given instead of the lists of names. This supersedes countsOnly
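The three output shapes can be illustrated on a plain dict. This is only a sketch of the arithmetic: `summarise` and the author names are hypothetical, and no gender inference is performed.

```python
def summarise(names_by_gender, counts_only=False, fractions_mode=False):
    """Return the name lists, their lengths, or their share of all authors."""
    total = sum(len(names) for names in names_by_gender.values())
    if fractions_mode:
        # Checked first, so it takes precedence over counts_only.
        return {
            gender: (len(names) / total if total else 0.0)
            for gender, names in names_by_gender.items()
        }
    if counts_only:
        return {gender: len(names) for gender, names in names_by_gender.items()}
    return names_by_gender


# Hypothetical input in the shape the method is described as returning.
genders = {
    "Male": ["Doe, John"],
    "Female": ["Roe, Jane", "Poe, Ann"],
    "Unknown": ["Li, X."],
}
counts = summarise(genders, counts_only=True)
fractions = summarise(genders, counts_only=True, fractions_mode=True)
```

With both flags set, `fractions` holds shares of the four authors rather than counts, matching the note that fractionsMode supersedes countsOnly.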
-
bibString(maxLength=1000, WOSMode=False, restrictedOutput=False, niceID=True)¶ Makes a string giving the Record as a bibTex entry. If the Record is of a journal article (PT J) the bibTex type is set to 'article', otherwise it is set to 'misc'. The ID of the entry is the WOS number and all the Record’s fields are given as entries with their long names.
Note This is not meant to be used directly with LaTeX: none of the special characters have been escaped and there are a large number of unnecessary fields provided. niceID and maxLength have been provided to make conversions easier.
Note Record entries that are lists have their values separated with the string ' and '
Parameters¶
maxLength :
optional [int] default 1000, the max length for a continuous string. Most bibTex implementations only allow strings to be up to 1000 characters (source); this splits them up into substrings then uses the native string concatenation (the '#' character) to allow for longer strings
WOSMode :
optional [bool] default False, if True the data produced will be unprocessed and use double curly braces. This is the style WOS produces bib files in and mostly matches that.
restrictedOutput :
optional [bool] default False, if True the tags output will be limited to those found in metaknowledge.commonRecordFields
niceID :
optional [bool] default True, if True the ID used will be derived from the authors, publishing date and title, if False it will be the UT tag
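The splitting that maxLength describes might look roughly like the following. `split_for_bibtex` is a hypothetical helper, not the method's actual code, and it assumes the value contains no double quotes of its own.

```python
def split_for_bibtex(value, max_length=1000):
    """Break value into quoted chunks of at most max_length characters,
    joined with BibTeX's '#' concatenation operator."""
    if max_length < 1:
        raise ValueError("max_length must be positive")
    chunks = [
        value[start:start + max_length]
        for start in range(0, len(value), max_length)
    ] or [""]  # keep a single empty chunk for an empty value
    return " # ".join('"{}"'.format(chunk) for chunk in chunks)


short_demo = split_for_bibtex("abcdefgh", max_length=3)
```

Double-quoted chunks are used because `#` joins quoted strings; a value shorter than max_length comes back as a single quoted string.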
-
copy()¶ Correctly copies the
Record
-
createCitation(multiCite=False)¶ Overrides the general citation creator to deal with Scopus quirks.
Creates a citation string, using the same format as other WOS citations, for the Record by reading the relevant special tags ('year', 'J9', 'volume', 'beginningPage', 'DOI') and using it to create a Citation object.
Parameters¶
multiCite :
optional [bool] Default False, if True a tuple of Citations is returned, each having a different one of the record’s authors as the author
-
encoding()¶ An
abstractmethod, gives the encoding string of the record.
-
get(tag, default=None, raw=False)¶ Allows access to the raw values or is an Exception safe wrapper to __getitem__.
Parameters¶
tag :
str The requested tag
default :
optional [Object] Default None, the object returned when tag is not found
raw :
optional [bool] Default False, if True the unprocessed value of tag is returned
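The behaviour described for get() can be pictured with a small stand-in class. `TinyRecord` is hypothetical and much simpler than a real Record; it only shows how an exception-safe wrapper around `__getitem__` and a raw option might fit together.

```python
class TinyRecord:
    """Hypothetical stand-in: stores raw strings and processes them on access."""

    def __init__(self, raw_fields, processors):
        self._raw = raw_fields
        self._processors = processors

    def __getitem__(self, tag):
        # Raises KeyError when the tag is missing.
        process = self._processors.get(tag, str)
        return process(self._raw[tag])

    def get(self, tag, default=None, raw=False):
        if raw:
            # Unprocessed value, straight from storage.
            return self._raw.get(tag, default)
        try:
            return self[tag]
        except KeyError:
            return default


rec = TinyRecord({"Year": "1991", "Title": "A study"}, {"Year": int})
```

Here `rec.get("Year")` gives the processed integer, `rec.get("Year", raw=True)` gives the stored string, and a missing tag falls back to `default` instead of raising.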
-
static
getAltName(tag)¶ An
abstractmethod, gives the alternate name of tag or None
-
getCitations(field=None, values=None, pandasFriendly=True)¶ Creates a pandas ready dict with each row a different citation and columns containing the original string, year, journal and author’s name.
There are also options to filter the output citations with field and values.
Parameters¶
field :
optional str Default None, if given all citations missing the named field will be dropped.
values :
optional str or list[str] Default None, if field is also given only those citations with one of the strings given in values will be included.
e.g. to get only citations from 1990 or 1991: field = year, values = [1991, 1990]
pandasFriendly :
optional bool Default True, if False a list of the citations will be returned instead of the more complicated pandas dict
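The filtering rules and the column-oriented output can be sketched on plain dicts. Everything here is illustrative: `filter_citations`, the column names and the sample citations are assumptions, and real citations are Citation objects rather than dicts.

```python
def filter_citations(citations, field=None, values=None, pandas_friendly=True):
    """Filter a list of citation dicts, then optionally pivot it to columns."""
    if isinstance(values, (str, int)):
        values = [values]  # allow a single value as well as a list
    kept = []
    for cite in citations:
        if field is not None:
            if cite.get(field) is None:
                continue  # drop citations missing the named field
            if values is not None and cite[field] not in values:
                continue  # drop citations whose field is not in values
        kept.append(cite)
    if not pandas_friendly:
        return kept
    columns = ["original", "year", "journal", "author"]  # assumed names
    return {column: [cite.get(column) for cite in kept] for column in columns}


# Hypothetical citations.
cites = [
    {"original": "Doe J, 1990, J TEST", "year": 1990, "journal": "J TEST", "author": "Doe J"},
    {"original": "Roe R, 1995, J TEST", "year": 1995, "journal": "J TEST", "author": "Roe R"},
    {"original": "Poe A", "year": None, "journal": None, "author": "Poe A"},
]
table = filter_citations(cites, field="year", values=[1991, 1990])
```

A dict of equal-length lists like `table` is the shape that `pandas.DataFrame` accepts directly, which is presumably what "pandas ready" refers to.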
-
id¶
-
items(raw=False)¶ Like items for dicts but with a raw option
Parameters¶
raw :
optional [bool] Default False, if True the KeysView contains the raw values as the values
-
keys() → a set-like object providing a view on D's keys¶
-
sourceFile¶
-
sourceLine¶
-
specialFuncs(key)¶ An abstractmethod, processes the special tag, key, using the whole Record
Parameters¶
key :
str One of the special tags: 'authorsFull', 'keywords', 'grants', 'j9', 'authorsShort', 'volume', 'selfCitation', 'citations', 'address', 'abstract', 'title', 'month', 'year', 'journal', 'beginningPage' and 'DOI'
Returns¶
The processed value of key
-
subDict(tags, raw=False)¶ Creates a dict of values of tags from the Record. The tags are the keys and the values are the values. If the tag is missing the value will be None.
Parameters¶
tags :
list[str] The list of tags requested
raw :
optional [bool] default False, if True the returned values of the dict will be unprocessed
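The missing-tag rule is easy to see on an ordinary dict. `sub_dict` and the sample fields below are hypothetical and stand in for a Record.

```python
def sub_dict(record, tags):
    """Map each requested tag to its value, or None when the tag is absent."""
    return {tag: record.get(tag) for tag in tags}


fields = {"Title": "A study", "Year": 1991}  # hypothetical record contents
picked = sub_dict(fields, ["Title", "DOI"])
```

Every requested tag appears as a key in `picked`, so callers can rely on a fixed set of keys even when some records lack a field.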
-
static
tagProcessingFunc(tag)¶ An
abstractmethod, gives the function for processing tag
-
title¶
-
values(raw=False)¶ Like values for dicts but with a raw option
-
writeRecord(f)¶ An abstractmethod, writes the record in its original form to the file f
-
metaknowledge.scopus.recordScopus.scopusRecordParser(record, header=None)¶ The parser ScopusRecords use. This takes a line from scopusParser() and parses it as a part of the creation of a ScopusRecord.
Note this is for CSV files downloaded from Scopus, not the text records, as those are less complete. Also, Scopus uses double quotes (") to quote strings, such as abstracts, in the CSV, so double quotes in the string must be escaped. For reasons not fully understandable by mortals they chose to use two double quotes in a row ("") to represent an escaped double quote. This parser does not unescape these quotes, but it does correctly handle their interactions with the outer double quotes.