medline

Overview

These are the functions used to process medline (pubmed) files at the backend. They are meant for internal use by metaknowledge.

Functions

metaknowledge.medline.medlineHandlers.isMedlineFile(infile, checkedLines=2)

Determines if infile is the path to a Medline file. A file is considered to be a Medline file if it has the correct encoding (latin-1) and, within the first checkedLines lines, a line starts with "PMID- ".

Parameters

infile : str

The path to the target file

checkedLines : optional [int]

default 2, the number of lines to check for the header

Returns

bool

True if the file is a Medline file
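
The check described above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: the name looks_like_medline and its checked_lines parameter are hypothetical stand-ins for isMedlineFile and checkedLines.

```python
def looks_like_medline(path, checked_lines=2):
    """Hypothetical sketch of the header check described above."""
    try:
        # Medline files are expected to be latin-1 encoded.
        with open(path, "r", encoding="latin-1") as f:
            for _ in range(checked_lines):
                # readline() returns '' at end of file, which never matches.
                if f.readline().startswith("PMID- "):
                    return True
    except OSError:
        # Unreadable paths (missing file, directory, ...) are not Medline files.
        return False
    return False
```

With the default of 2, a file whose first line is blank and whose second line starts with "PMID- " is accepted, which may be why the default is larger than 1.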
metaknowledge.medline.medlineHandlers.medlineParser(pubFile)

Parses a medline file, pubFile, to extract the individual entries as MedlineRecords.

A medline file is a series of entries, and each entry is a series of tags. A tag is a 2 to 4 character string, padded with spaces on the right to make it 4 characters, and followed by a dash and a space ('- '). Everything after the tag, along with all following lines that do not start with a tag, is considered associated with that tag. Each entry’s first tag is PMID, so a first line looks something like PMID- 26524502. Entries end with a single blank line.

Parameters

pubFile : str

A path to a valid medline file; use isMedlineFile to verify

Returns

set[MedlineRecord]

Records for each of the entries
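
The entry layout described above (entries separated by a blank line, each starting with a PMID tag) can be illustrated with a small sketch. The function split_entries is hypothetical and only groups lines; it does not build MedlineRecords, and the second PMID in the sample is made up.

```python
def split_entries(lines):
    """Group the lines of a Medline file into entries (lists of lines)."""
    entries, current = [], []
    for line in lines:
        if line.strip() == "":
            # A blank line closes the entry being collected, if any.
            if current:
                entries.append(current)
                current = []
        else:
            current.append(line.rstrip("\n"))
    if current:
        # The last entry may not be followed by a blank line.
        entries.append(current)
    return entries


sample = """
PMID- 26524502
OWN - NLM
TI  - A title that is long enough
      to wrap onto a second line.

PMID- 11111111
OWN - NLM
"""
entries = split_entries(sample.splitlines())
```

Here entries holds two lists of lines, one per entry, and the wrapped title line stays with the entry it belongs to.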

Special Functions

metaknowledge.medline.tagProcessing.specialFunctions.DOI(R)
metaknowledge.medline.tagProcessing.specialFunctions.address(R)

Gets the first address of the first author

metaknowledge.medline.tagProcessing.specialFunctions.beginningPage(R)

As pages may not be given as numbers, this is as accurate as this function can be

metaknowledge.medline.tagProcessing.specialFunctions.month(R)
metaknowledge.medline.tagProcessing.specialFunctions.volume(R)

Returns the first number/word of the volume field, hopefully trimming something like: '49 Suppl 20' to 49
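
A minimal sketch of the trimming described above, assuming the volume field is available as a plain string. The helper name first_volume_token is hypothetical, and whether the library returns a string or converts to a number is not stated here.

```python
def first_volume_token(volume_field):
    """Return the first whitespace-separated token of a volume field."""
    parts = volume_field.split()
    # An empty or whitespace-only field has no token to return.
    return parts[0] if parts else None
```

For example, first_volume_token('49 Suppl 20') gives the string '49'.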

metaknowledge.medline.tagProcessing.specialFunctions.year(R)

Tag Functions

metaknowledge.medline.tagProcessing.tagFunctions.AB(val)
Abstract
basically a one liner after parsing
metaknowledge.medline.tagProcessing.tagFunctions.AD(val)
Affiliation
Undoes what the parser does, then splits at the semicolons and drops newlines. Extra filtering is required because some AD’s end with a semicolon.
metaknowledge.medline.tagProcessing.tagFunctions.AID(val)
ArticleIdentifier
The given values do not require any work
metaknowledge.medline.tagProcessing.tagFunctions.AU(val)

Author

metaknowledge.medline.tagProcessing.tagFunctions.AUID(val)
AuthorIdentifier
One line only; just needs to undo the parser’s effects
metaknowledge.medline.tagProcessing.tagFunctions.BTI(val)

BookTitle

metaknowledge.medline.tagProcessing.tagFunctions.CI(val)

CopyrightInformation

metaknowledge.medline.tagProcessing.tagFunctions.CIN(val)

CommentIn

metaknowledge.medline.tagProcessing.tagFunctions.CN(val)

CorporateAuthor

metaknowledge.medline.tagProcessing.tagFunctions.CRDT(val)

CreateDate

metaknowledge.medline.tagProcessing.tagFunctions.CRF(val)

CorrectedRepublishedFrom

metaknowledge.medline.tagProcessing.tagFunctions.CRI(val)

CorrectedRepublishedIn

metaknowledge.medline.tagProcessing.tagFunctions.CTI(val)

CollectionTitle

metaknowledge.medline.tagProcessing.tagFunctions.DA(val)

DateCreated

metaknowledge.medline.tagProcessing.tagFunctions.DCOM(val)

DateCompleted

metaknowledge.medline.tagProcessing.tagFunctions.DDIN(val)

DatasetIn

metaknowledge.medline.tagProcessing.tagFunctions.DEP(val)

DateElectronicPublication

metaknowledge.medline.tagProcessing.tagFunctions.DP(val)

DatePublication

metaknowledge.medline.tagProcessing.tagFunctions.DRIN(val)

DatasetUseReportedIn

metaknowledge.medline.tagProcessing.tagFunctions.EDAT(val)

EntrezDate

metaknowledge.medline.tagProcessing.tagFunctions.EFR(val)

ErratumFor

metaknowledge.medline.tagProcessing.tagFunctions.EIN(val)

ErratumIn

metaknowledge.medline.tagProcessing.tagFunctions.EN(val)

Edition

metaknowledge.medline.tagProcessing.tagFunctions.FAU(val)

FullAuthor

metaknowledge.medline.tagProcessing.tagFunctions.FED(val)

Editor

metaknowledge.medline.tagProcessing.tagFunctions.FIR(val)

InvestigatorFull

metaknowledge.medline.tagProcessing.tagFunctions.FPS(val)

FullPersonalNameSubject

metaknowledge.medline.tagProcessing.tagFunctions.GN(val)

GeneralNote

metaknowledge.medline.tagProcessing.tagFunctions.GR(val)

GrantNumber

metaknowledge.medline.tagProcessing.tagFunctions.GS(val)

GeneSymbol

metaknowledge.medline.tagProcessing.tagFunctions.IP(val)

Issue

metaknowledge.medline.tagProcessing.tagFunctions.IR(val)

Investigator

metaknowledge.medline.tagProcessing.tagFunctions.IRAD(val)

InvestigatorAffiliation

metaknowledge.medline.tagProcessing.tagFunctions.IS(val)

ISSN

metaknowledge.medline.tagProcessing.tagFunctions.ISBN(val)
metaknowledge.medline.tagProcessing.tagFunctions.JID(val)

NLMID

metaknowledge.medline.tagProcessing.tagFunctions.JT(val)
JournalTitle
One line only
metaknowledge.medline.tagProcessing.tagFunctions.LA(val)

Language

metaknowledge.medline.tagProcessing.tagFunctions.LID(val)

LocationIdentifier

metaknowledge.medline.tagProcessing.tagFunctions.LR(val)

DateLastRevised

metaknowledge.medline.tagProcessing.tagFunctions.MH(val)

MeSHTerms

metaknowledge.medline.tagProcessing.tagFunctions.MHDA(val)

MeSHDate

metaknowledge.medline.tagProcessing.tagFunctions.MID(val)

ManuscriptIdentifier

metaknowledge.medline.tagProcessing.tagFunctions.NM(val)

SubstanceName

metaknowledge.medline.tagProcessing.tagFunctions.OABL(val)

OtherAbstract

metaknowledge.medline.tagProcessing.tagFunctions.OCI(val)

OtherCopyright

metaknowledge.medline.tagProcessing.tagFunctions.OID(val)

OtherID

metaknowledge.medline.tagProcessing.tagFunctions.ORI(val)

OriginalReportIn

metaknowledge.medline.tagProcessing.tagFunctions.OT(val)
OtherTerm
Nothing needs to be done
metaknowledge.medline.tagProcessing.tagFunctions.OTO(val)
OtherTermOwner
one line field
metaknowledge.medline.tagProcessing.tagFunctions.OWN(val)

Owner

metaknowledge.medline.tagProcessing.tagFunctions.PG(val)
Pagination
all pagination seen so far seems to be only on one line
metaknowledge.medline.tagProcessing.tagFunctions.PHST(val)

PublicationHistoryStatus

metaknowledge.medline.tagProcessing.tagFunctions.PL(val)

PlacePublication

metaknowledge.medline.tagProcessing.tagFunctions.PMC(val)

PubMedCentralIdentifier

metaknowledge.medline.tagProcessing.tagFunctions.PMCR(val)

PubMedCentralRelease

metaknowledge.medline.tagProcessing.tagFunctions.PMID(val)

PubMedUniqueIdentifier

metaknowledge.medline.tagProcessing.tagFunctions.PRIN(val)

PartialRetractionIn

metaknowledge.medline.tagProcessing.tagFunctions.PROF(val)

PartialRetractionOf

metaknowledge.medline.tagProcessing.tagFunctions.PS(val)

PersonalNameSubject

metaknowledge.medline.tagProcessing.tagFunctions.PST(val)

PublicationStatus

metaknowledge.medline.tagProcessing.tagFunctions.PT(val)

PublicationType

metaknowledge.medline.tagProcessing.tagFunctions.PUBM(val)

PublishingModel

metaknowledge.medline.tagProcessing.tagFunctions.RF(val)

NumberReferences

metaknowledge.medline.tagProcessing.tagFunctions.RIN(val)

RetractionIn

metaknowledge.medline.tagProcessing.tagFunctions.RN(val)

RegistryNumber

metaknowledge.medline.tagProcessing.tagFunctions.ROF(val)

RetractionOf

metaknowledge.medline.tagProcessing.tagFunctions.RPF(val)

RepublishedFrom

metaknowledge.medline.tagProcessing.tagFunctions.RPI(val)

RepublishedIn

metaknowledge.medline.tagProcessing.tagFunctions.SB(val)

Subset

metaknowledge.medline.tagProcessing.tagFunctions.SFM(val)

SpaceFlightMission

metaknowledge.medline.tagProcessing.tagFunctions.SI(val)

SecondarySourceID

metaknowledge.medline.tagProcessing.tagFunctions.SO(val)

Source

metaknowledge.medline.tagProcessing.tagFunctions.SPIN(val)

SummaryForPatients

metaknowledge.medline.tagProcessing.tagFunctions.STAT(val)

Status

metaknowledge.medline.tagProcessing.tagFunctions.TA(val)
JournalTitleAbbreviation
One line only
metaknowledge.medline.tagProcessing.tagFunctions.TI(val)
Title
only one per record
metaknowledge.medline.tagProcessing.tagFunctions.TT(val)

TransliteratedTitle

metaknowledge.medline.tagProcessing.tagFunctions.UIN(val)

UpdateIn

metaknowledge.medline.tagProcessing.tagFunctions.UOF(val)

UpdateOf

metaknowledge.medline.tagProcessing.tagFunctions.VI(val)
Volume
The volume as a string, since the volume is a single line
metaknowledge.medline.tagProcessing.tagFunctions.VTI(val)

VolumeTitle

Backend

class metaknowledge.medline.recordMedline.MedlineRecord(inRecord, sFile='', sLine=0)

Bases: metaknowledge.mkRecord.ExtendedRecord

Class for full Medline(Pubmed) entries.

This class is an ExtendedRecord capable of generating its own id number. You should not create them directly, but instead use medlineParser() on a medline file.

authGenders(countsOnly=False, fractionsMode=False, _countsTuple=False)

Creates a dict mapping 'Male', 'Female' and 'Unknown' to lists of the names of all the authors.

Parameters

countsOnly : optional bool

Default False, if True the counts (lengths of the lists) will be given instead of the lists of names

fractionsMode : optional bool

Default False, if True the fraction counts (lengths of the lists divided by the total number of authors) will be given instead of the lists of names. This supersedes countsOnly

Returns

dict[str:str or int]

The mapping of genders to author’s names or counts
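
The three output modes can be illustrated with a small sketch that starts from an already-built mapping of genders to names. It does not show how names are classified; summarize_genders and the sample names are hypothetical.

```python
def summarize_genders(names_by_gender, counts_only=False, fractions_mode=False):
    """Reduce a gender -> list-of-names mapping to names, counts or fractions."""
    if fractions_mode:
        # fractions_mode takes precedence over counts_only.
        total = sum(len(names) for names in names_by_gender.values())
        return {g: (len(names) / total if total else 0.0)
                for g, names in names_by_gender.items()}
    if counts_only:
        return {g: len(names) for g, names in names_by_gender.items()}
    return {g: list(names) for g, names in names_by_gender.items()}


example = {"Male": ["Doe, John"], "Female": ["Roe, Jane", "Poe, Mary"], "Unknown": ["Li, X"]}
counts = summarize_genders(example, counts_only=True)
fractions = summarize_genders(example, counts_only=True, fractions_mode=True)
```

In this example counts maps each gender to an integer, while fractions maps each gender to its share of the four authors.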
authors
bibString(maxLength=1000, WOSMode=False, restrictedOutput=False, niceID=True)

Makes a string giving the Record as a bibTex entry. If the Record is of a journal article (PT J) the bibTex type is set to 'article', otherwise it is set to 'misc'. The ID of the entry is the WOS number and all the Record’s fields are given as entries with their long names.

Note This is not meant to be used directly with LaTeX: none of the special characters have been escaped, and a large number of unnecessary fields are provided. niceID and maxLength have been provided to make conversions easier.

Note Record entries that are lists have their values separated with the string ' and '

Parameters

maxLength : optional [int]

default 1000, the maximum length for a continuous string. Most bibTex implementations only allow strings of up to 1000 characters (source). Longer values are split into substrings that are joined with bibTex’s native string concatenation (the '#' character), which allows for longer strings

WOSMode : optional [bool]

default False, if True the data produced will be unprocessed and use double curly braces. This is the style WOS produces bib files in, and the output mostly matches it.

restrictedOutput : optional [bool]

default False, if True the tags output will be limited to those found in metaknowledge.commonRecordFields

niceID : optional [bool]

default True, if True the ID used will be derived from the authors, publishing date and title, if False it will be the UT tag

Returns

str

The bibTex string of the Record
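
The effect of maxLength can be sketched as follows. This is an illustration of the splitting idea only: the helper split_bibtex_string is hypothetical, and the use of double-quoted chunks is an assumption rather than a description of the exact text bibString emits.

```python
def split_bibtex_string(value, max_length=1000):
    """Break a long value into quoted chunks joined by bibTex's '#' operator."""
    # Slice the value into pieces of at most max_length characters;
    # an empty value still yields one (empty) chunk.
    chunks = [value[i:i + max_length]
              for i in range(0, len(value), max_length)] or [""]
    return " # ".join('"{}"'.format(chunk) for chunk in chunks)
```

With max_length=4, the value 'abcdef' becomes "abcd" # "ef", which bibTex concatenates back into a single string.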
copy()

Correctly copies the Record

Returns

Record

A completely decoupled copy of the original
createCitation(multiCite=False)

Creates a citation string, using the same format as other WOS citations, for the Record by reading the relevant special tags ('year', 'J9', 'volume', 'beginningPage', 'DOI') and using them to create a Citation object.

Parameters

multiCite : optional [bool]

Default False, if True a tuple of Citations is returned, each having a different one of the record’s authors as the author

Returns

Citation

A Citation object containing a citation for the Record.
encoding()

An abstractmethod, gives the encoding string of the record.

Returns

str

The encoding
get(tag, default=None, raw=False)

Allows access to the raw values or is an Exception safe wrapper to __getitem__.

Parameters

tag : str

The requested tag

default : optional [Object]

Default None, the object returned when tag is not found

raw : optional [bool]

Default False, if True the unprocessed value of tag is returned

Returns

Object

The processed value of tag or default
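
The behaviour of get() can be illustrated with a toy class. TinyRecord is hypothetical and far simpler than MedlineRecord; it assumes raw values are stored per tag and that a missing tag raises KeyError in __getitem__.

```python
class TinyRecord:
    """Toy stand-in for a record with raw and processed tag values."""

    def __init__(self, raw_fields, processors):
        self._raw = raw_fields          # tag -> unprocessed value
        self._processors = processors   # tag -> function applied on access

    def __getitem__(self, tag):
        raw_value = self._raw[tag]      # raises KeyError for unknown tags
        process = self._processors.get(tag, lambda value: value)
        return process(raw_value)

    def get(self, tag, default=None, raw=False):
        if raw:
            # Skip processing and hand back the stored value.
            return self._raw.get(tag, default)
        try:
            return self[tag]
        except KeyError:
            return default


rec = TinyRecord({"TI": ["A", "title"]}, {"TI": " ".join})
```

Here rec.get("TI") returns the processed title, rec.get("TI", raw=True) returns the stored list, and a missing tag returns the default instead of raising.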
static getAltName(tag)

An abstractmethod, gives the alternate name of tag or None

Parameters

tag : str

The requested tag

Returns

str

The alternate name of tag or None
getCitations(field=None, values=None, pandasFriendly=True)

Creates a pandas ready dict with each row a different citation and columns containing the original string, year, journal and author’s name.

There are also options to filter the output citations with field and values

Parameters

field : optional str

Default None, if given all citations missing the named field will be dropped.

values : optional str or list[str]

Default None, if field is also given only those citations with one of the strings given in values will be included.

e.g. to get only citations from 1990 or 1991: field = year, values = [1991, 1990]

pandasFriendly : optional bool

Default True, if False a list of the citations will be returned instead of the more complicated pandas dict

Returns

dict

A pandas ready dict with all the citations
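
A "pandas ready dict" is taken here to mean a dict of equal-length columns. The sketch below shows that shape together with the field and values filtering. The function citations_to_columns, the column names, and the sample citations are all hypothetical, and citations are modelled as plain dicts rather than Citation objects.

```python
def citations_to_columns(citations, field=None, values=None):
    """Turn a list of citation dicts into a dict of equal-length columns."""
    if values is not None and not isinstance(values, (list, tuple, set)):
        values = [values]  # allow a single value as shorthand
    columns = {"citeString": [], "year": [], "journal": [], "author": []}
    for cite in citations:
        if field is not None:
            if cite.get(field) is None:
                continue  # drop citations missing the named field
            if values is not None and cite[field] not in values:
                continue  # drop citations whose field is not in values
        for name in columns:
            columns[name].append(cite.get(name))
    return columns


cites = [
    {"citeString": "Doe J, 1990, J TEST", "year": 1990, "journal": "J TEST", "author": "Doe J"},
    {"citeString": "Roe R, 1995, J TEST", "year": 1995, "journal": "J TEST", "author": "Roe R"},
    {"citeString": "Poe M", "year": None, "journal": None, "author": "Poe M"},
]
only_1990s_start = citations_to_columns(cites, field="year", values=[1991, 1990])
```

A dict in this shape can be passed directly to pandas.DataFrame, with each key becoming a column.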
id
items(raw=False)

Like items for dicts but with a raw option

Parameters

raw : optional [bool]

Default False, if True the KeysView contains the raw values as the values

Returns

KeysView

The key-value pairs of the record
keys() → a set-like object providing a view on D's keys
sourceFile
sourceLine
specialFuncs(key)

An abstractmethod, processes the special tag, key, using the whole Record

Parameters

key : str

One of the special tags: 'authorsFull', 'keywords', 'grants', 'j9', 'authorsShort', 'volume', 'selfCitation', 'citations', 'address', 'abstract', 'title', 'month', 'year', 'journal', 'beginningPage' and 'DOI'

Returns

The processed value of key
subDict(tags, raw=False)

Creates a dict of values of tags from the Record. The tags are the keys and the values are the values. If the tag is missing the value will be None.

Parameters

tags : list[str]

The list of tags requested

raw : optional [bool]

default False, if True the returned values of the dict will be unprocessed

Returns

dict

A dictionary with the keys tags and the values from the record
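
The behaviour described above amounts to a lookup with None for missing tags. A minimal sketch, using a plain dict as a stand-in for the Record (the helper sub_dict is hypothetical):

```python
def sub_dict(record, tags):
    """Map each requested tag to its value, or None when the tag is missing."""
    return {tag: record.get(tag) for tag in tags}


picked = sub_dict({"TI": "A title", "PMID": "26524502"}, ["TI", "AB"])
```

In this example picked contains the title under 'TI' and None under 'AB', since the stand-in record has no abstract.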
static tagProcessingFunc(tag)

An abstractmethod, gives the function for processing tag

Parameters

tag : optional [str]

The tag in need of processing

Returns

function

The function to process the raw tag
title
values(raw=False)

Like values for dicts but with a raw option

Parameters

raw : optional [bool]

Default False, if True the ValuesView contains the raw values

Returns

ValuesView

The values of the record
writeRecord(f)

This is nearly identical to the original. The FAU tag is the only tag not written in the same place; doing so would require changing the parser and adding a lot of extra logic.

metaknowledge.medline.recordMedline.medlineRecordParser(record)

The parser that MedlineRecord uses. This takes an entry from medlineParser() and parses it as part of the creation of a MedlineRecord.

Parameters

record : enumerate object

a file wrapped by enumerate()

Returns

collections.OrderedDict

An ordered dictionary of the key-value pairs in the entry
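
As a rough illustration of this step, the sketch below reads numbered lines from an enumerate object and collects tag values into an OrderedDict. It is not the library's parser: parse_entry is hypothetical, and joining continuation lines with a single space and storing every tag as a list are assumptions made for the example.

```python
from collections import OrderedDict


def parse_entry(numbered_lines):
    """Collect the tags of one entry from (line_number, line) pairs."""
    fields = OrderedDict()
    current_tag = None
    for line_number, line in numbered_lines:
        line = line.rstrip("\n")
        if not line.strip():
            break  # a blank line ends the entry
        if len(line) >= 6 and line[4:6] == "- ":
            # Tag lines: 4 characters of tag and padding, then '- '.
            current_tag = line[:4].rstrip()
            fields.setdefault(current_tag, []).append(line[6:])
        elif current_tag is not None:
            # Continuation lines belong to the most recent tag.
            fields[current_tag][-1] += " " + line.strip()
        else:
            raise ValueError("line {} is not part of any tag".format(line_number))
    return fields


lines = ["PMID- 26524502", "OWN - NLM", "TI  - A long", "      title.",
         "AU  - Doe J", "AU  - Roe R", ""]
parsed = parse_entry(enumerate(lines))
```

Repeated tags such as AU accumulate in order, and the OrderedDict preserves the order in which tags first appear in the entry.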