medline¶
Overview¶
These are the functions used to process Medline (PubMed) files at the backend. They are meant for internal use by metaknowledge.
Functions¶
-
metaknowledge.medline.medlineHandlers.
isMedlineFile
(infile, checkedLines=2)¶ Determines if infile is the path to a Medline file. A file is considered to be a Medline file if it has the correct encoding (latin-1) and, within the first checkedLines lines, a line starts with "PMID- ".
Parameters¶
infile : str
The path to the target file
checkedLines : optional [int]
default 2, the number of lines to check for the header
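The check described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the library's implementation: the name looks_like_medline and the sample file contents are hypothetical.

```python
import os
import tempfile


def looks_like_medline(path, checked_lines=2):
    """Hypothetical helper: True if one of the first lines starts with 'PMID- '."""
    try:
        # Medline files are read as latin-1, as described above.
        with open(path, "r", encoding="latin-1") as handle:
            for _ in range(checked_lines):
                if handle.readline().startswith("PMID- "):
                    return True
    except OSError:
        # Assumption: an unreadable path is simply "not a Medline file".
        return False
    return False


# Usage: write a small sample file (made up for this example) and check it.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="latin-1"
) as tmp:
    tmp.write("\nPMID- 26524502\nOWN - NLM\n")
    sample_path = tmp.name

print(looks_like_medline(sample_path))  # True
os.remove(sample_path)
```

With checked_lines=1 the same sample would be rejected, because its first line is blank.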
-
metaknowledge.medline.medlineHandlers.
medlineParser
(pubFile)¶ Parses a Medline file, pubFile, to extract the individual entries as MedlineRecords.
A Medline file is a series of entries, and each entry is a series of tags. A tag is a 2 to 4 character string, padded with trailing spaces to 4 characters and followed by a dash and a space ('- '). Everything after the tag, and on all following lines that do not start with a tag, is considered associated with the tag. Each entry’s first tag is PMID, so a first line looks something like PMID- 26524502. Entries end with a single blank line.
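The file layout described above can be illustrated with a minimal stand-alone parser. It is a sketch, not the code behind medlineParser: the names split_tag and parse_entries are hypothetical, the sample text is made up apart from the PMID quoted above, and it assumes tags are padded to 4 characters so that '- ' always sits at columns 5 and 6.

```python
def split_tag(line):
    """Return (tag, value) if the line starts with a tag, else None."""
    tag = line[:4].rstrip()
    # A tag occupies the first 4 columns and is followed by '- '.
    if line[4:6] == "- " and 2 <= len(tag) <= 4 and tag.isalpha():
        return tag, line[6:]
    return None


def parse_entries(text):
    """Split Medline-style text into a list of {tag: [values]} dicts."""
    entries, current, last_tag = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current entry.
            if current:
                entries.append(current)
            current, last_tag = {}, None
            continue
        parsed = split_tag(line)
        if parsed is not None:
            last_tag, value = parsed
            current.setdefault(last_tag, []).append(value)
        elif last_tag is not None:
            # Lines without a tag continue the previous tag's value.
            current[last_tag][-1] += "\n" + line.strip()
    if current:
        entries.append(current)
    return entries


sample = (
    "PMID- 26524502\n"
    "TI  - A title that runs\n"
    "      over two lines\n"
    "AU  - Doe J\n"
    "AU  - Roe R\n"
    "\n"
    "PMID- 11111111\n"
    "TI  - Another title\n"
)

for entry in parse_entries(sample):
    print(entry["PMID"][0], sorted(entry))
```

Repeated tags such as AU are collected into a list, which is one simple way to keep every value attached to its tag.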
Special Functions¶
-
metaknowledge.medline.tagProcessing.specialFunctions.
DOI
(R)¶
-
metaknowledge.medline.tagProcessing.specialFunctions.
address
(R)¶ Gets the first address of the first author
-
metaknowledge.medline.tagProcessing.specialFunctions.
beginningPage
(R)¶ Pages may not be given as numbers, so this is as accurate as this function can be
-
metaknowledge.medline.tagProcessing.specialFunctions.
month
(R)¶
-
metaknowledge.medline.tagProcessing.specialFunctions.
volume
(R)¶ Returns the first number/word of the volume field, hopefully trimming something like '49 Suppl 20' to 49
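As a rough illustration of the trimming described for volume, the following sketch keeps only the first whitespace-separated token. The function name first_volume_token is hypothetical, and the real function works on a whole record (R) rather than on a bare string.

```python
def first_volume_token(volume_field):
    """Hypothetical helper: first number/word of a volume string."""
    tokens = volume_field.split()
    # Assumption: an empty field yields an empty string.
    return tokens[0] if tokens else ""


print(first_volume_token("49 Suppl 20"))  # 49
```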
-
metaknowledge.medline.tagProcessing.specialFunctions.
year
(R)¶
Tag Functions¶
-
metaknowledge.medline.tagProcessing.tagFunctions.
AB
(val)¶ Abstract
basically a one liner after parsing
-
metaknowledge.medline.tagProcessing.tagFunctions.
AD
(val)¶ Affiliation
Undoes what the parser does, then splits at the semicolons and drops newlines. Extra filtering is required because some ADs end with a semicolon.
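The AD processing described above might look roughly like the sketch below. It assumes the raw value is a single string with embedded newlines, which may not match what the parser actually hands over; split_affiliations and the sample affiliations are hypothetical.

```python
def split_affiliations(raw_value):
    """Hypothetical helper: split an affiliation string at semicolons."""
    # Drop the newlines left over from multi-line values.
    joined = raw_value.replace("\n", " ")
    # A trailing ';' produces an empty last piece, so filter empties out.
    return [part.strip() for part in joined.split(";") if part.strip()]


raw = "Dept. of A, Univ. X;\nDept. of B, Univ. Y;"
print(split_affiliations(raw))
# ['Dept. of A, Univ. X', 'Dept. of B, Univ. Y']
```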
-
metaknowledge.medline.tagProcessing.tagFunctions.
AID
(val)¶ ArticleIdentifier
The given values do not require any work
-
metaknowledge.medline.tagProcessing.tagFunctions.
AU
(val)¶ Author
-
metaknowledge.medline.tagProcessing.tagFunctions.
AUID
(val)¶ AuthorIdentifier
one line only, just need to undo the parser’s effects
-
metaknowledge.medline.tagProcessing.tagFunctions.
BTI
(val)¶ BookTitle
-
metaknowledge.medline.tagProcessing.tagFunctions.
CI
(val)¶ CopyrightInformation
-
metaknowledge.medline.tagProcessing.tagFunctions.
CIN
(val)¶ CommentIn
-
metaknowledge.medline.tagProcessing.tagFunctions.
CN
(val)¶ CorporateAuthor
-
metaknowledge.medline.tagProcessing.tagFunctions.
CRDT
(val)¶ CreateDate
-
metaknowledge.medline.tagProcessing.tagFunctions.
CRF
(val)¶ CorrectedRepublishedFrom
-
metaknowledge.medline.tagProcessing.tagFunctions.
CRI
(val)¶ CorrectedRepublishedIn
-
metaknowledge.medline.tagProcessing.tagFunctions.
CTI
(val)¶ CollectionTitle
-
metaknowledge.medline.tagProcessing.tagFunctions.
DA
(val)¶ DateCreated
-
metaknowledge.medline.tagProcessing.tagFunctions.
DCOM
(val)¶ DateCompleted
-
metaknowledge.medline.tagProcessing.tagFunctions.
DDIN
(val)¶ DatasetIn
-
metaknowledge.medline.tagProcessing.tagFunctions.
DEP
(val)¶ DateElectronicPublication
-
metaknowledge.medline.tagProcessing.tagFunctions.
DP
(val)¶ DatePublication
-
metaknowledge.medline.tagProcessing.tagFunctions.
DRIN
(val)¶ DatasetUseReportedIn
-
metaknowledge.medline.tagProcessing.tagFunctions.
EDAT
(val)¶ EntrezDate
-
metaknowledge.medline.tagProcessing.tagFunctions.
EFR
(val)¶ ErratumFor
-
metaknowledge.medline.tagProcessing.tagFunctions.
EIN
(val)¶ ErratumIn
-
metaknowledge.medline.tagProcessing.tagFunctions.
EN
(val)¶ Edition
-
metaknowledge.medline.tagProcessing.tagFunctions.
FAU
(val)¶ FullAuthor
-
metaknowledge.medline.tagProcessing.tagFunctions.
FED
(val)¶ Editor
-
metaknowledge.medline.tagProcessing.tagFunctions.
FIR
(val)¶ InvestigatorFull
-
metaknowledge.medline.tagProcessing.tagFunctions.
FPS
(val)¶ FullPersonalNameSubject
-
metaknowledge.medline.tagProcessing.tagFunctions.
GN
(val)¶ GeneralNote
-
metaknowledge.medline.tagProcessing.tagFunctions.
GR
(val)¶ GrantNumber
-
metaknowledge.medline.tagProcessing.tagFunctions.
GS
(val)¶ GeneSymbol
-
metaknowledge.medline.tagProcessing.tagFunctions.
IP
(val)¶ Issue
-
metaknowledge.medline.tagProcessing.tagFunctions.
IR
(val)¶ Investigator
-
metaknowledge.medline.tagProcessing.tagFunctions.
IRAD
(val)¶ InvestigatorAffiliation
-
metaknowledge.medline.tagProcessing.tagFunctions.
IS
(val)¶ ISSN
-
metaknowledge.medline.tagProcessing.tagFunctions.
ISBN
(val)¶
-
metaknowledge.medline.tagProcessing.tagFunctions.
JID
(val)¶ NLMID
-
metaknowledge.medline.tagProcessing.tagFunctions.
JT
(val)¶ JournalTitle
One line only
-
metaknowledge.medline.tagProcessing.tagFunctions.
LA
(val)¶ Language
-
metaknowledge.medline.tagProcessing.tagFunctions.
LID
(val)¶ LocationIdentifier
-
metaknowledge.medline.tagProcessing.tagFunctions.
LR
(val)¶ DateLastRevised
-
metaknowledge.medline.tagProcessing.tagFunctions.
MH
(val)¶ MeSHTerms
-
metaknowledge.medline.tagProcessing.tagFunctions.
MHDA
(val)¶ MeSHDate
-
metaknowledge.medline.tagProcessing.tagFunctions.
MID
(val)¶ ManuscriptIdentifier
-
metaknowledge.medline.tagProcessing.tagFunctions.
NM
(val)¶ SubstanceName
-
metaknowledge.medline.tagProcessing.tagFunctions.
OABL
(val)¶ OtherAbstract
-
metaknowledge.medline.tagProcessing.tagFunctions.
OCI
(val)¶ OtherCopyright
-
metaknowledge.medline.tagProcessing.tagFunctions.
OID
(val)¶ OtherID
-
metaknowledge.medline.tagProcessing.tagFunctions.
ORI
(val)¶ OriginalReportIn
-
metaknowledge.medline.tagProcessing.tagFunctions.
OT
(val)¶ OtherTerm
Nothing needs to be done
-
metaknowledge.medline.tagProcessing.tagFunctions.
OTO
(val)¶ OtherTermOwner
one line field
-
metaknowledge.medline.tagProcessing.tagFunctions.
OWN
(val)¶ Owner
-
metaknowledge.medline.tagProcessing.tagFunctions.
PG
(val)¶ Pagination
all pagination seen so far seems to be only on one line
-
metaknowledge.medline.tagProcessing.tagFunctions.
PHST
(val)¶ PublicationHistoryStatus
-
metaknowledge.medline.tagProcessing.tagFunctions.
PL
(val)¶ PlacePublication
-
metaknowledge.medline.tagProcessing.tagFunctions.
PMC
(val)¶ PubMedCentralIdentifier
-
metaknowledge.medline.tagProcessing.tagFunctions.
PMCR
(val)¶ PubMedCentralRelease
-
metaknowledge.medline.tagProcessing.tagFunctions.
PMID
(val)¶ PubMedUniqueIdentifier
-
metaknowledge.medline.tagProcessing.tagFunctions.
PRIN
(val)¶ PartialRetractionIn
-
metaknowledge.medline.tagProcessing.tagFunctions.
PROF
(val)¶ PartialRetractionOf
-
metaknowledge.medline.tagProcessing.tagFunctions.
PS
(val)¶ PersonalNameSubject
-
metaknowledge.medline.tagProcessing.tagFunctions.
PST
(val)¶ PublicationStatus
-
metaknowledge.medline.tagProcessing.tagFunctions.
PT
(val)¶ PublicationType
-
metaknowledge.medline.tagProcessing.tagFunctions.
PUBM
(val)¶ PublishingModel
-
metaknowledge.medline.tagProcessing.tagFunctions.
RF
(val)¶ NumberReferences
-
metaknowledge.medline.tagProcessing.tagFunctions.
RIN
(val)¶ RetractionIn
-
metaknowledge.medline.tagProcessing.tagFunctions.
RN
(val)¶ RegistryNumber
-
metaknowledge.medline.tagProcessing.tagFunctions.
ROF
(val)¶ RetractionOf
-
metaknowledge.medline.tagProcessing.tagFunctions.
RPF
(val)¶ RepublishedFrom
-
metaknowledge.medline.tagProcessing.tagFunctions.
RPI
(val)¶ RepublishedIn
-
metaknowledge.medline.tagProcessing.tagFunctions.
SB
(val)¶ Subset
-
metaknowledge.medline.tagProcessing.tagFunctions.
SFM
(val)¶ SpaceFlightMission
-
metaknowledge.medline.tagProcessing.tagFunctions.
SI
(val)¶ SecondarySourceID
-
metaknowledge.medline.tagProcessing.tagFunctions.
SO
(val)¶ Source
-
metaknowledge.medline.tagProcessing.tagFunctions.
SPIN
(val)¶ SummaryForPatients
-
metaknowledge.medline.tagProcessing.tagFunctions.
STAT
(val)¶ Status
-
metaknowledge.medline.tagProcessing.tagFunctions.
TA
(val)¶ JournalTitleAbbreviation
One line only
-
metaknowledge.medline.tagProcessing.tagFunctions.
TI
(val)¶ Title
only one per record
-
metaknowledge.medline.tagProcessing.tagFunctions.
TT
(val)¶ TransliteratedTitle
-
metaknowledge.medline.tagProcessing.tagFunctions.
UIN
(val)¶ UpdateIn
-
metaknowledge.medline.tagProcessing.tagFunctions.
UOF
(val)¶ UpdateOf
-
metaknowledge.medline.tagProcessing.tagFunctions.
VI
(val)¶ Volume
The volume as a string, as volume is a single line
-
metaknowledge.medline.tagProcessing.tagFunctions.
VTI
(val)¶ VolumeTitle
Backend¶
-
class
metaknowledge.medline.recordMedline.
MedlineRecord
(inRecord, sFile='', sLine=0)¶ Bases:
metaknowledge.mkRecord.ExtendedRecord
Class for full Medline (PubMed) entries.
This class is an ExtendedRecord capable of generating its own id number. You should not create them directly, but instead use medlineParser() on a medline file.
-
authGenders
(countsOnly=False, fractionsMode=False, _countsTuple=False)¶ Creates a dict mapping 'Male', 'Female' and 'Unknown' to lists of the names of all the authors.
Parameters¶
countsOnly : optional bool
Default False, if True the counts (lengths of the lists) will be given instead of the lists of names
fractionsMode : optional bool
Default False, if True the fraction counts (lengths of the lists divided by the total number of authors) will be given instead of the lists of names. This supersedes countsOnly
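To make the interaction between countsOnly and fractionsMode concrete, here is a small sketch that works on a plain dict of names. It does not call authGenders: summarise_genders is a hypothetical helper, the names are made up, and returning 0.0 when there are no authors is an assumption.

```python
def summarise_genders(names_by_gender, counts_only=False, fractions_mode=False):
    """Hypothetical helper mirroring the countsOnly/fractionsMode options."""
    if fractions_mode:
        # fractions_mode supersedes counts_only, as described above.
        total = sum(len(names) for names in names_by_gender.values())
        return {
            key: (len(names) / total if total else 0.0)
            for key, names in names_by_gender.items()
        }
    if counts_only:
        return {key: len(names) for key, names in names_by_gender.items()}
    return names_by_gender


names = {"Male": ["A"], "Female": ["B", "C"], "Unknown": ["D"]}
print(summarise_genders(names, counts_only=True))
# {'Male': 1, 'Female': 2, 'Unknown': 1}
print(summarise_genders(names, counts_only=True, fractions_mode=True))
# {'Male': 0.25, 'Female': 0.5, 'Unknown': 0.25}
```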
-
bibString
(maxLength=1000, WOSMode=False, restrictedOutput=False, niceID=True)¶ Makes a string giving the Record as a BibTeX entry. If the Record is of a journal article (PT J) the BibTeX type is set to 'article', otherwise it is set to 'misc'. The ID of the entry is the WOS number and all the Record’s fields are given as entries with their long names.
Note This is not meant to be used directly with LaTeX: none of the special characters have been escaped and there are a large number of unnecessary fields provided. niceID and maxLength have been provided to make conversions easier.
Note Record entries that are lists have their values separated with the string ' and '
Parameters¶
maxLength : optional [int]
default 1000, the max length for a continuous string. Most BibTeX implementations only allow strings to be up to 1000 characters (source); this splits them up into substrings, then uses the native string concatenation (the '#' character) to allow for longer strings
WOSMode : optional [bool]
default False, if True the data produced will be unprocessed and use double curly braces. This is the style WOS produces bib files in and mostly matches that.
restrictedOutput : optional [bool]
default False, if True the tags output will be limited to those found in metaknowledge.commonRecordFields
niceID : optional [bool]
default True, if True the ID used will be derived from the authors, publishing date and title, if False it will be the UT tag
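The maxLength behaviour can be pictured with the following sketch, which cuts a long value into chunks and joins them with BibTeX's '#' concatenation operator. It is an assumption about the general approach, not bibString's actual code: split_bibtex_string is a hypothetical name, and the sketch uses double-quoted chunks without escaping any characters.

```python
def split_bibtex_string(text, max_length=1000):
    """Hypothetical helper: quote `text` in chunks of at most `max_length`."""
    if max_length < 1:
        raise ValueError("max_length must be at least 1")
    # Cut the text into consecutive slices; keep one empty chunk for "".
    chunks = [
        text[start:start + max_length]
        for start in range(0, len(text), max_length)
    ] or [""]
    # '#' is BibTeX's string concatenation operator.
    return " # ".join('"{}"'.format(chunk) for chunk in chunks)


print(split_bibtex_string("abcdefg", max_length=3))
# "abc" # "def" # "g"
```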
-
copy
()¶ Correctly copies the Record
-
createCitation
(multiCite=False)¶ Creates a citation string, using the same format as other WOS citations, for the Record by reading the relevant special tags ('year', 'J9', 'volume', 'beginningPage', 'DOI') and using them to create a Citation object.
Parameters¶
multiCite : optional [bool]
Default False, if True a tuple of Citations is returned with each having a different one of the record’s authors as the author
-
encoding
()¶ An abstractmethod, gives the encoding string of the record.
-
get
(tag, default=None, raw=False)¶ Allows access to the raw values, or acts as an exception-safe wrapper to __getitem__.
Parameters¶
tag : str
The requested tag
default : optional [Object]
Default None, the object returned when tag is not found
raw : optional [bool]
Default False, if True the unprocessed value of tag is returned
-
static
getAltName
(tag)¶ An abstractmethod, gives the alternate name of tag or None
-
getCitations
(field=None, values=None, pandasFriendly=True)¶ Creates a pandas ready dict with each row a different citation and columns containing the original string, year, journal and author’s name.
There are also options to filter the output citations with field and values
Parameters¶
field : optional str
Default None, if given all citations missing the named field will be dropped.
values : optional str or list[str]
Default None, if field is also given only those citations with one of the strings given in values will be included.
e.g. to get only citations from 1990 or 1991: field = year, values = [1991, 1990]
pandasFriendly : optional bool
Default True, if False a list of the citations will be returned instead of the more complicated pandas dict
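The field and values filtering can be sketched on plain dicts standing in for citations. This is a hypothetical illustration only: filter_citations is not part of metaknowledge, and the sample citations are made up.

```python
def filter_citations(citations, field=None, values=None):
    """Hypothetical helper: keep citations that pass the field/values filter."""
    if field is None:
        return list(citations)
    if values is not None and isinstance(values, (str, int)):
        # Accept a single value as well as a list of values.
        values = [values]
    kept = []
    for citation in citations:
        if citation.get(field) is None:
            # Citations missing the named field are dropped.
            continue
        if values is not None and citation[field] not in values:
            continue
        kept.append(citation)
    return kept


sample_citations = [
    {"author": "Doe J", "year": 1990},
    {"author": "Roe R", "year": 1995},
    {"author": "Poe E"},
]
print(filter_citations(sample_citations, field="year", values=[1991, 1990]))
# [{'author': 'Doe J', 'year': 1990}]
```

Passing only field="year" would keep the first two citations and drop the one with no year.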
-
id
¶
-
items
(raw=False)¶ Like items for dicts but with a raw option
Parameters¶
raw : optional [bool]
Default False, if True the KeysView contains the raw values as the values
-
keys
() → a set-like object providing a view on D's keys¶
-
sourceFile
¶
-
sourceLine
¶
-
specialFuncs
(key)¶ An abstractmethod, processes the special tag, key, using the whole Record
Parameters¶
key : str
One of the special tags: 'authorsFull', 'keywords', 'grants', 'j9', 'authorsShort', 'volume', 'selfCitation', 'citations', 'address', 'abstract', 'title', 'month', 'year', 'journal', 'beginningPage' and 'DOI'
Returns¶
The processed value of key
-
subDict
(tags, raw=False)¶ Creates a dict of values of tags from the Record. The tags are the keys and the values are the values. If the tag is missing the value will be None.
Parameters¶
tags : list[str]
The list of tags requested
raw : optional [bool]
default False, if True the returned values of the dict will be unprocessed
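The None-for-missing behaviour of subDict is easy to mimic on an ordinary dict. The sketch below is hypothetical (sub_dict and the sample record are made up) and ignores the raw option, which only applies to real Records.

```python
def sub_dict(record, tags):
    """Hypothetical helper: map each requested tag to its value or None."""
    return {tag: record.get(tag) for tag in tags}


sample_record = {"TI": "A title", "PMID": "26524502"}
print(sub_dict(sample_record, ["TI", "AB"]))
# {'TI': 'A title', 'AB': None}
```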
-
static
tagProcessingFunc
(tag)¶ An abstractmethod, gives the function for processing tag
-
title
¶
-
values
(raw=False)¶ Like values for dicts but with a raw option
-
writeRecord
(f)¶ This is nearly identical to the original. The FAU tag is the only tag not written in the same place; doing so would require changing the parser and lots of extra logic.
-
-
metaknowledge.medline.recordMedline.
medlineRecordParser
(record)¶ The parser MedlineRecords use. This takes an entry from medlineParser() and parses it as part of the creation of a MedlineRecord.