Usage (API)

class EntrezAPI(tool, email, api_key=None, return_type='json', minimal_interval=0.334, timeout=10, server='https://eutils.ncbi.nlm.nih.gov/entrez/eutils/')
Parameters:
  • tool (str) – Name of application making the E-utility call, for NCBI internal tracking. Value must be a string with no internal spaces. Specify this value to correspond with the name of the end-use application. For example, the biopython library just uses “biopython” as the tool name, see Bio.Entrez. Please see the Frequency, Timing and Registration of E-utility URL Requests section of Entrez Programming Utilities Help for more information on this parameter.

  • email (str) – E-mail address of the E-utility user, for NCBI internal tracking/communications. Value must be a string with no internal spaces, and should be a valid e-mail address. Please see the Frequency, Timing and Registration of E-utility URL Requests section of Entrez Programming Utilities Help for more information on this parameter.

  • api_key (Optional[str]) – Since December 1, 2018, NCBI has enforced the use of an API key for users who post more than 3 requests per second. Please see the API Keys section of Entrez Programming Utilities Help for a full discussion of this policy.

  • return_type (Literal['json', 'xml']) – Retrieval type. Determines the format of the returned output.

  • minimal_interval (float) – The time interval (in seconds) enforced between consecutive requests. The default is slightly over 1/3 of a second, which complies with the Entrez guidelines of at most 3 requests per second without an API key. You may increase it to be kind to others, or decrease it if you have an API key and appropriate consent from Entrez.

  • timeout (float) – The timeout in seconds (default 10 seconds).

  • server (str) – The server address.
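
The effect of minimal_interval can be illustrated with a minimal sketch of request throttling. The class below is a hypothetical helper written for illustration only; it is not part of the library, and the library's own implementation may differ.

```python
import time


class MinimalIntervalThrottle:
    """Hypothetical helper: waits until `minimal_interval` seconds
    have passed since the previous call to `wait()`."""

    def __init__(self, minimal_interval: float = 0.334):
        self.minimal_interval = minimal_interval
        self._last_request = None  # monotonic timestamp of the previous request

    def wait(self) -> float:
        """Sleep if needed and return the number of seconds slept."""
        slept = 0.0
        if self._last_request is not None:
            elapsed = time.monotonic() - self._last_request
            remaining = self.minimal_interval - elapsed
            if remaining > 0:
                time.sleep(remaining)
                slept = remaining
        self._last_request = time.monotonic()
        return slept


throttle = MinimalIntervalThrottle(minimal_interval=0.1)
first = throttle.wait()   # first call: nothing to wait for
second = throttle.wait()  # immediate second call: sleeps for the rest of the interval
```

Calling wait() before each request keeps consecutive requests at least minimal_interval seconds apart, regardless of how quickly the calling code issues them.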

fetch(ids, max_results, database='pubmed', return_type='xml', ignore_max_results_limit=False)

Note: FetchQuery uses xml as the default return_type because JSON is not properly implemented by the E-utilities server.

Functionality:
  • Returns formatted data records for a list of input UIDs

Parameters:
  • database (EntrezDatabaseType) – Database from which to retrieve records. Value must be a valid E-utility database name (default = 'pubmed'). Currently EFetch does not support all Entrez databases. Please see Table 1 for a list of available databases.

  • ids (List[str]) – UID list. Either a single UID or a comma-delimited list of UIDs may be provided. All of the UIDs must be from the database specified by database.

  • max_results (int) – Maximal number of results to return.

Supports batch mode, see in_batches_of().
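
For orientation, the sketch below shows how a list of UIDs could be turned into an EFetch request URL with a comma-delimited id parameter. The function build_efetch_url and the tool and email values are hypothetical; db, id, retmode, tool and email are the standard E-utilities parameter names, but whether the library builds its URLs this way is an assumption.

```python
from urllib.parse import urlencode

SERVER = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/'


def build_efetch_url(ids, database='pubmed', return_type='xml',
                     tool='my-tool', email='user@example.org'):
    """Hypothetical helper: compose an EFetch URL for a list of UIDs."""
    params = {
        'db': database,
        'id': ','.join(str(uid) for uid in ids),  # comma-delimited UID list
        'retmode': return_type,
        'tool': tool,
        'email': email,
    }
    # keep ',' and '@' readable instead of percent-encoding them
    return SERVER + 'efetch.fcgi?' + urlencode(params, safe=',@')


url = build_efetch_url(['17284678', '9997'])
```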

find_citations(citations, database='pubmed')

Note: enforces xml as it is the only supported return_type for the citation endpoint.

Functionality:
  • Retrieves PubMed IDs (PMIDs) that correspond to a set of input citations

Parameters:
  • database – Database to search. The only supported value is ‘pubmed’.

  • citations (List[Citation]) – Input citations (dictionaries complying with the Citation interface).

Examples

Check PMIDs for two citations

>>> entrez_api.find_citations(database='pubmed', citations=[{'journal': 'proc natl acad sci u s a', 'year': 1991, 'volume': 88, 'first_page': 3248, 'author': 'mann bj', 'key': 'Art1'}, {'journal': 'science', 'year': 1987, 'volume': 235, 'first_page': 182, 'author': 'palmenberg ac', 'key': 'Art2'}])
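
The citation endpoint (ECitMatch) expects each citation as a pipe-delimited string of the form journal|year|volume|first_page|author|key|, with multiple citations separated by a carriage return. The sketch below shows that serialization for the two citations above; citation_to_bdata is a hypothetical helper, and how the library performs this step internally is an assumption.

```python
def citation_to_bdata(citation: dict) -> str:
    """Hypothetical helper: serialize one citation dictionary
    into the pipe-delimited ECitMatch format."""
    fields = ('journal', 'year', 'volume', 'first_page', 'author', 'key')
    return '|'.join(str(citation[field]) for field in fields) + '|'


citations = [
    {'journal': 'proc natl acad sci u s a', 'year': 1991, 'volume': 88,
     'first_page': 3248, 'author': 'mann bj', 'key': 'Art1'},
    {'journal': 'science', 'year': 1987, 'volume': 235,
     'first_page': 182, 'author': 'palmenberg ac', 'key': 'Art2'},
]

# citations are separated by a carriage return (sent as %0D in the URL)
bdata = '\r'.join(citation_to_bdata(citation) for citation in citations)
```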

get_info(database=None)
Functionality:
  • Provides a list of the names of all valid Entrez databases

  • Provides statistics for a single database, including lists of indexing fields and available link names

Parameters:
  • database – If not provided, a list of the names of all valid Entrez databases is returned.

in_batches_of(size=100, sleep_interval=3)

link(ids, database, database_from, command)
Functionality:
  • Returns UIDs linked to an input set of UIDs in either the same or a different Entrez database

  • Returns UIDs linked to other UIDs in the same Entrez database that match an Entrez query

  • Checks for the existence of Entrez links for a set of UIDs within the same database

  • Lists the available links for a UID

  • Lists LinkOut URLs and attributes for a set of UIDs

  • Lists hyperlinks to primary LinkOut providers for a set of UIDs

  • Creates hyperlinks to the primary LinkOut provider for a single UID

Parameters:
  • database (EntrezDatabaseType) – Destination database of the link operation, from which the linked UIDs are retrieved. Value must be a valid E-utility database name (default = 'pubmed').

  • database_from (EntrezDatabaseType) – Origin database of the link operation, containing the input UIDs. Value must be a valid E-utility database name (default = 'pubmed'). If database and database_from are set to the same database, ELink returns computational neighbors within that database. Please see the full list of Entrez links for available computational neighbors. Computational neighbors have linknames that begin with dbname_dbname (examples: protein_protein, pcassay_pcassay_activityneighbor).

  • ids (List[str]) – UID list. Either a single UID or a comma-delimited list of UIDs may be provided. All of the UIDs must be from the database specified by database_from.

  • command (CommandType) – ELink command mode. The command mode specifies which function ELink will perform.

Examples

Link from protein to gene

>>> entrez_api.link(database='gene', ids=[15718680, 157427902], database_from='protein', command='neighbor')

Find articles related to PMID 20210808

>>> entrez_api.link(database='pubmed', ids=[20210808], database_from='pubmed', command='neighbor_score')

List all possible links from two protein GIs

>>> entrez_api.link(database=None, ids=[15718680, 157427902], database_from='protein', command='acheck')

List all possible links from two protein GIs to PubMed

>>> entrez_api.link(database='pubmed', ids=[15718680, 157427902], database_from='protein', command='acheck')

Supports batch mode, see in_batches_of().
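
The idea behind batch mode can be sketched as splitting a long UID list into chunks of a fixed size and pausing between consecutive requests. The helpers in_chunks and run_in_batches are hypothetical and only mirror the size and sleep_interval parameters of in_batches_of(); the library's actual batching logic may differ.

```python
import time
from typing import Callable, Iterator, List, Sequence


def in_chunks(ids: Sequence[str], size: int = 100) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `ids`, each with at most `size` elements."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def run_in_batches(ids: Sequence[str], request: Callable[[Sequence[str]], object],
                   size: int = 100, sleep_interval: float = 3) -> List[object]:
    """Hypothetical helper: call `request` once per chunk,
    sleeping `sleep_interval` seconds between consecutive calls."""
    results = []
    for index, chunk in enumerate(in_chunks(ids, size)):
        if index > 0:
            time.sleep(sleep_interval)
        results.append(request(chunk))
    return results


# stand-in for a real request: report how many UIDs each batch received
all_ids = [str(uid) for uid in range(250)]
batch_sizes = run_in_batches(all_ids, request=len, size=100, sleep_interval=0)
```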

search(term, max_results, database='pubmed', min_date=None, max_date=None, ignore_max_results_limit=False)
Functionality:
  • Provides a list of UIDs matching a text query

  • Posts the results of a search on the History server

  • Downloads all UIDs from a dataset stored on the History server

  • Combines or limits UID datasets stored on the History server

  • Sorts sets of UIDs

Parameters:
  • database (EntrezDatabaseType) – Database to search. Value must be a valid E-utility database name (default = 'pubmed').

  • term (Union[str, dict]) – Entrez text query

  • max_results (int) – Maximal number of results to return. Limited to 10,000, following the E-utilities documentation.

  • ignore_max_results_limit (bool) – Ignore the upper limit placed on max_results. Experimentation has shown that some databases allow higher limits, but as this is not documented, higher limits must be explicitly enabled here. Use at your own risk: errors may be hard to predict.
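
The interaction between max_results and ignore_max_results_limit can be sketched as a simple guard. The function check_max_results and the choice of ValueError are assumptions made for illustration; the library may signal the condition differently.

```python
MAX_RESULTS_LIMIT = 10_000  # documented E-utilities limit


def check_max_results(max_results: int, ignore_max_results_limit: bool = False) -> int:
    """Hypothetical guard: reject values above the documented limit
    unless the caller explicitly opted out of the check."""
    if max_results > MAX_RESULTS_LIMIT and not ignore_max_results_limit:
        raise ValueError(
            f'max_results={max_results} exceeds the limit of {MAX_RESULTS_LIMIT}; '
            'pass ignore_max_results_limit=True to override'
        )
    return max_results


accepted = check_max_results(10_000)
overridden = check_max_results(50_000, ignore_max_results_limit=True)
```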

Examples

Find articles about human cancers

>>> entrez_api.search(database='pubmed', term='cancer AND human[organism]', max_results=10000, ignore_max_results_limit=False)

Search PubMed Central for free full text articles containing the query stem cells

>>> entrez_api.search(database='pmc', term='stem cells AND free fulltext[filter]', max_results=10000, ignore_max_results_limit=False)

summarize(ids, max_results, database='pubmed', ignore_max_results_limit=False)
Functionality:
  • Returns document summaries (DocSums) for a list of input UIDs

Parameters:
  • database (EntrezDatabaseType) – Database from which to retrieve DocSums. Value must be a valid E-utility database name (default = 'pubmed').

  • ids (List[str]) – UID list. Either a single UID or a comma-delimited list of UIDs may be provided. All of the UIDs must be from the database specified by database. There is no set maximum for the number of UIDs that can be passed to ESummary. To comply with the recommendation to use the HTTP POST method when the list of UIDs is long, the request method is set to POST by default.

  • max_results (int) – Maximal number of results to return. Limited to 10,000, following the E-utilities documentation.

  • ignore_max_results_limit (bool) – Ignore the upper limit placed on max_results. Experimentation has shown that some databases allow higher limits, but as this is not documented, higher limits must be explicitly enabled here. Use at your own risk: errors may be hard to predict.

Supports batch mode, see in_batches_of().
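
To illustrate why POST suits long UID lists, the sketch below places the parameters in the request body rather than in the URL, using only the standard library. The function build_esummary_request is hypothetical and no request is actually sent; it is not the library's own code.

```python
from urllib.parse import urlencode
from urllib.request import Request

SERVER = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/'


def build_esummary_request(ids, database='pubmed') -> Request:
    """Hypothetical helper: prepare (but do not send) an ESummary POST request."""
    body = urlencode({'db': database, 'id': ','.join(ids)}).encode('ascii')
    # supplying `data` makes urllib use the POST method
    return Request(SERVER + 'esummary.fcgi', data=body)


request = build_esummary_request(['1', '2', '3'])
method = request.get_method()
```

Because the UIDs travel in the body, the URL length stays constant no matter how many UIDs are passed.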

class EntrezResponse(query, response, api)

The wrapper around the Entrez response.

property content_type
property data
is_response_for(response, query)

Determine if response is for given type of query.

Return type:

TypeGuard[EntrezResponse[Element, TypeVar(EntrezQueryT, bound= EntrezQuery)]]

is_xml_response(response)

Determine if response is XML.

Return type:

TypeGuard[EntrezResponse[Element, TypeVar(EntrezQueryT, bound= EntrezQuery)]]
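
A TypeGuard return type lets a type checker narrow the type of the checked value inside the guarded branch. The sketch below shows the mechanism on plain data rather than on EntrezResponse; is_xml_data and root_tag are hypothetical functions, not the library's API.

```python
from typing import Any
from xml.etree.ElementTree import Element, fromstring

try:
    from typing import TypeGuard  # Python 3.10+
except ImportError:  # older interpreters need the backport package
    from typing_extensions import TypeGuard


def is_xml_data(data: Any) -> TypeGuard[Element]:
    """Hypothetical guard: true when `data` is a parsed XML element."""
    return isinstance(data, Element)


def root_tag(data: Any) -> str:
    if is_xml_data(data):
        # a type checker treats `data` as Element here, so `.tag` is known to exist
        return data.tag
    return 'not-xml'


xml_tag = root_tag(fromstring('<eSummaryResult/>'))
json_tag = root_tag({'result': {}})
```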