Dartmouth API Developer Portal

Filtering and Paging

The DartAPI service implements a common filtering technique across most of its APIs, providing a rich set of search capabilities. Filtering lets an API consumer search through resources and page through large result collections.

NOTE: The current implementation of filtering does not check whether the attribute being queried is available to the consumer. When it is not, the query term is discarded without error and results are returned as if the term had not been entered at all. Typical cases are an attribute name misspelled through a typo, or an attribute that is not available under the current security scopes. A future version of filtering will validate the query string inputs and raise errors in these situations.
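
Until server-side validation exists, a client can guard against silently discarded terms itself. The following is a minimal sketch of one way to do that, assuming you maintain your own allow-list of attribute names. KNOWN_ATTRIBUTES and unknown_filter_attributes are hypothetical names, and the set shown only collects attributes mentioned on this page; it is not an official list.

```python
from urllib.parse import parse_qsl

# Hypothetical allow-list: attribute names you have confirmed are visible to
# your API key. Illustrative only, taken from the examples on this page.
KNOWN_ATTRIBUTES = {
    "name", "first_name", "middle_name", "last_name", "netid",
    "account_status", "affiliations.name", "dplans",
}

# Paging parameters are not attributes, so they are never reported.
PAGING_PARAMETERS = {"page", "pagesize", "continuation_key"}


def unknown_filter_attributes(query_string, known=KNOWN_ATTRIBUTES):
    """Return the attribute names in query_string that are not in the allow-list."""
    pairs = parse_qsl(query_string.lstrip("?"), keep_blank_values=True)
    return [
        name for name, _value in pairs
        if name not in known and name not in PAGING_PARAMETERS
    ]


# A misspelled attribute is reported before the request is ever sent.
print(unknown_filter_attributes("affilitations.name=|Staff,Faculty&account_status=Active"))
```

Running such a check before each request turns a typo into a visible error rather than an unexpectedly large result set.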

If the result of a filter request is expected to contain a large number of objects, you should be prepared to make multiple requests to the API, returning pages of objects. Results from a filter request (called the found set) are temporarily cached so that you may request batches of objects as pages from that same found set over a period of time. Currently, the result set is cached for 12 hours.

The typical sequence used when working with subsets is to submit your filter request and then make subsequent requests for different pages of results from the found set. The first filter request returns two response headers that facilitate this paging. See the description for Filter Request below.

The number of objects returned in a single request is specified with the pagesize parameter. The default pagesize is 100.

To ensure that subsequent requests for pages are returned from your found set, you must provide the continuation_key parameter with the value returned as the X-Request-ID header in the first filtering call to the API.
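
The request sequence described above can be sketched as a small URL builder. This is an illustration rather than part of the API: page_url is a hypothetical helper, and it assumes that the first request carries the filter terms while later requests carry only continuation_key, page, and pagesize.

```python
def page_url(base_url, page, pagesize=1000, query_string=None, continuation_key=None):
    """Build the URL for one page of a filter request.

    Pass query_string on the first request and continuation_key (the
    X-Request-ID value from the first response) on every later request.
    """
    if continuation_key is None:
        # First request: send the filter terms themselves (may be empty).
        prefix = query_string or ""
    else:
        # Later requests: refer back to the cached found set.
        prefix = "continuation_key=" + continuation_key
    paging = "pagesize={}&page={}".format(pagesize, page)
    return base_url + "?" + (prefix + "&" if prefix else "") + paging


base = "https://api.dartmouth.edu/api/people"
print(page_url(base, 1, 100, query_string="name=S*"))
print(page_url(base, 2, 100, continuation_key="7f133a20-2ecd-11e8-831d-062da25625b0"))
```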

See the examples below for the process of requesting a filtered list of people and retrieving those results via multiple requests.

NOTE: By default, all searching is case-sensitive. To make a search case-insensitive, include the * wildcard character in the search term.

Multiple filter attributes are logically ANDed together when matching the filters, for example:

affiliations.name=Student&account_status=Active

returns the subset of all People who have Student in their affiliations attribute AND an account_status attribute value of 'Active'.

Query Parameter Syntax

URI query parameters support the following operators; operators that support wildcard characters can cause comparisons to be case-insensitive:

Operator  Description     Wildcard Support  Example                              Returns
=         equal           Yes               ?last_name=Smith                     records with last_name equal to 'Smith' (case-sensitive)
                                            ?name=*smith                         records with name ending in smith (case-insensitive)
=!        not equal       No                ?last_name=!Smith                    records with last_name not equal to 'Smith'
=|        equal OR list   Yes               ?affiliations.name=|Staff,Faculty    records with either 'Staff' or 'Faculty' in affiliations
                                            ?netid=|D35000G*,F00002S*            records with netid 'd35000g' or 'f00002s' (case-insensitive)
=^        equal AND list  Yes               ?affiliations.name=^Staff,Student    records with both 'Staff' and 'Student' in affiliations
                                            ?affiliations.name=^staff*,student*  records with both 'Staff' and 'Student' in affiliations (case-insensitive)
=>        greater than    No                ?last_name=>Smith                    records with last_name greater than 'Smith'
=<        less than       No                ?last_name=<Smith                    records with last_name less than 'Smith'

All attributes associated with a resource (e.g., people) are supported as query parameters; dot notation is used to specify nested attributes (e.g. affiliations.name).

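When multiple query parameters are specified, they are ANDed together, for example:

https://api.dartmouth.edu/api/people?affiliations.name=|Faculty,Staff&first_name=Al*&first_name=*an&last_name=|Smith,*jones*

returns records where affiliations.name includes 'Faculty' or 'Staff', AND first_name starts with 'Al' and ends with 'an', AND last_name is 'Smith' or contains 'jones'. Terms that use the * wildcard are matched case-insensitively.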
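
A query like the one above can be assembled from individual terms rather than typed by hand. The sketch below is one possible approach: term, build_query, and the operator keys ("eq", "ne", "any", "all", "gt", "lt") are hypothetical names chosen for illustration. Values are inserted without URL encoding, mirroring the examples on this page; whether the API needs special characters in values escaped is not covered here.

```python
# Operator prefixes from the operator table; each follows the '=' sign.
OPERATORS = {
    "eq": "",    # =   equal
    "ne": "!",   # =!  not equal
    "any": "|",  # =|  equal OR list
    "all": "^",  # =^  equal AND list
    "gt": ">",   # =>  greater than
    "lt": "<",   # =<  less than
}


def term(attribute, op, *values):
    """Format one filter term, for example term('last_name', 'ne', 'Smith')."""
    if op not in OPERATORS:
        raise ValueError("unknown operator: " + op)
    if not values:
        raise ValueError("at least one value is required")
    if op not in ("any", "all") and len(values) != 1:
        raise ValueError(op + " takes exactly one value")
    return "{}={}{}".format(attribute, OPERATORS[op], ",".join(values))


def build_query(*terms):
    """Join terms with '&'; the API ANDs them together."""
    return "&".join(terms)


query = build_query(
    term("affiliations.name", "any", "Faculty", "Staff"),
    term("first_name", "eq", "Al*"),
    term("first_name", "eq", "*an"),
    term("last_name", "any", "Smith", "*jones*"),
)
print("https://api.dartmouth.edu/api/people?" + query)
```

Keeping attribute names in one place like this also makes it harder for a misspelled attribute to slip into a request unnoticed.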

Querying null values and empty arrays

Any attribute that can have a value of null can be queried for the presence or absence of null. For example, ?middle_name=null returns all records where middle_name is null, and ?middle_name=!null returns all records where middle_name is not null. Note that this feature causes any search for the literal string value "null" to fail; however, if a wildcard is used (e.g., ?middle_name=null*), values that contain "null" will be selected.

Any attribute that is an array can be queried for an empty or non-empty array. For example, ?dplans=[] or ?dplans=![]. Note that you cannot specify any values within the square brackets. To query an array for the presence of one or more values, =| can be used (e.g. ?affiliations=|Staff,Student). To query for the presence of all specified values, =^ can be used.
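
The null and empty-array terms can be generated with two small helpers, and the literal-string limitation can be worked around by filtering the wildcard results on the client. This is a sketch only: null_filter, empty_array_filter, and keep_exact are hypothetical names, and the sample records are made up to stand in for what a wildcard query such as ?middle_name=null* might return.

```python
def null_filter(attribute, is_null=True):
    """Term matching records where attribute is (or is not) null."""
    return attribute + ("=null" if is_null else "=!null")


def empty_array_filter(attribute, is_empty=True):
    """Term matching records where an array attribute is (or is not) empty."""
    return attribute + ("=[]" if is_empty else "=![]")


def keep_exact(records, attribute, wanted):
    """Client-side filter: keep records whose attribute equals wanted exactly."""
    return [record for record in records if record.get(attribute) == wanted]


print(null_filter("middle_name"))                   # term for null values
print(empty_array_filter("dplans", is_empty=False)) # term for non-empty arrays

# Hypothetical records; not real data.
sample = [{"middle_name": "null"}, {"middle_name": "Nullman"}]
print(keep_exact(sample, "middle_name", "null"))
```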

Example Filter Request

Request a subset of People to be returned matching the supplied filters:

/api/people?account_status=Active&affiliations.name=|Staff,Faculty

Parameters

Parameter Name  Type     Required  Description
attribute_name  string   Yes       the attribute to filter on; nested attributes use dot notation, such as telephone_numbers.data_source=sis
page            integer  No        the page number to return when paging through the found set (starting at page 1); 1 thru n (default is 1)
pagesize        integer  No        the number of objects to return on a page; 1 thru n (default is 1000 and cannot be exceeded)

Returns

The filter request returns two important HTTP response headers that will assist you in paging through large datasets:

Response Header  Example                               Description
X-Request-ID     7f133a20-2ecd-11e8-831d-062da25625b0  The value used for the continuation_key query parameter in requests for more data from the found set
X-Total-Count    3826                                  The number of people found via the specified filters

The filter request returns a collection of objects. The attributes included in each object reflect what is allowed under the scopes of the current login.

The number of objects in this collection can vary, depending on one or more factors.

If the results of the filtering yields no objects found, an empty collection is returned. This can be confirmed by noting that the X-Total-Count response header value is 0.

If the combination of page and pagesize results in positioning the paging mechanism past the end of the found set, an empty collection is returned.

The last page of results may contain fewer than pagesize objects if the total number of objects in the found set is not an even multiple of pagesize.
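
The X-Total-Count header and the pagesize together determine how many pages to expect and how full the last one will be. The arithmetic can be sketched as follows; page_count and last_page_size are hypothetical helper names, and 3826 is the X-Total-Count example value from the table above.

```python
import math


def page_count(total_count, pagesize):
    """Number of non-empty pages needed to retrieve total_count objects."""
    if pagesize < 1:
        raise ValueError("pagesize must be at least 1")
    return math.ceil(total_count / pagesize)


def last_page_size(total_count, pagesize):
    """Number of objects on the final non-empty page (0 if nothing was found)."""
    if pagesize < 1:
        raise ValueError("pagesize must be at least 1")
    if total_count == 0:
        return 0
    remainder = total_count % pagesize
    return remainder or pagesize  # an exact multiple means a full last page


print(page_count(3826, 1000))      # pages needed at the maximum pagesize
print(last_page_size(3826, 1000))  # objects on the final page
```

Requesting any page number beyond page_count returns an empty collection, as described above.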

Subsequent Page Requests

Request a subsequent page of results from the found set produced with the filter performed above. Note the use of the continuation_key query parameter containing the value returned in the X-Request-ID response header from the request above. Increment the page by one on each request until all pages have been retrieved.

Parameters

Parameter Name    Type     Required  Description
continuation_key  string   Yes       the value returned in the X-Request-ID header from the Filter Request
page              integer  No        the page number to return when paging through the found set (starting at page 1); 1 thru n (default is 1)
pagesize          integer  No        the number of objects to return on a page; 1 thru n (default is 1000 and cannot be exceeded)
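
The paging loop can also be written as a generator that is independent of any HTTP library, which makes it easy to test. The sketch below rests on several assumptions: iter_found_set and fake_fetch are hypothetical names, fetch_page is a caller-supplied function that performs the GET and returns the objects together with the response headers, and the loop stops once X-Total-Count objects have been seen rather than waiting for an empty page. A real headers object (for example from the requests library) is case-insensitive; the plain dict used here is not.

```python
def iter_found_set(fetch_page, query_string, pagesize=1000):
    """Yield every object in the found set, one page request at a time.

    fetch_page(params) must return (objects, headers) for the given
    query-parameter string.
    """
    objects, headers = fetch_page(
        "{}&pagesize={}&page=1".format(query_string, pagesize))
    continuation_key = headers["X-Request-ID"]
    total = int(headers["X-Total-Count"])
    yield from objects
    seen, page = len(objects), 2
    # Stop when everything has arrived, or if a page unexpectedly comes back empty.
    while seen < total and objects:
        objects, _headers = fetch_page(
            "continuation_key={}&pagesize={}&page={}".format(
                continuation_key, pagesize, page))
        yield from objects
        seen += len(objects)
        page += 1


def fake_fetch(params):
    """Stand-in for an HTTP call: serves five made-up objects in pages."""
    data = list(range(5))
    fields = dict(part.split("=", 1) for part in params.split("&"))
    size, page = int(fields["pagesize"]), int(fields["page"])
    chunk = data[(page - 1) * size:page * size]
    return chunk, {"X-Request-ID": "example-key", "X-Total-Count": str(len(data))}


print(list(iter_found_set(fake_fetch, "name=S*", pagesize=2)))
```

Stopping on the total count saves the final request that would return an empty page; stopping on an empty page, as the full example below does, is simpler and does not depend on the header.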

Full Python Example Querying People

The following example shows how to query a large result set over multiple pages. It can be executed by any API key and does not require any security scopes.


import datetime
import os

import requests

#
# *************************************************************************************************
# * simple logging to console
# *************************************************************************************************
#

def logit(log_msg):
  print(datetime.datetime.now(), log_msg)

#
# *************************************************************************************************
# * get the jwt with any applicable scopes, check that all scopes requested are returned before
# * proceeding
# *************************************************************************************************
#

def login_jwt(login_url, api_key, scopes):
  headers={'Authorization': api_key}

  if scopes:
    url = login_url + '?scope=' + scopes
  else:
    url = login_url

  response = requests.post(url,headers=headers)
  response.raise_for_status()
  response_json = response.json()
  jwt = response_json["jwt"]
  accepted_scopes = response_json["accepted_scopes"]

  logit("accepted scopes="+str(accepted_scopes))

  if scopes:
    for scope in scopes.split(' '):
      if scope not in accepted_scopes:
        raise Exception('A requested scope '+scope+' is not in the set of accepted scopes.')

  return jwt

#
# *************************************************************************************************
# * This function will call the People API with a query string (or null string which gets all
# * people) and returns an array of results.  The function will page through the api result
# * set 1000 records per page.
# *************************************************************************************************
#
def get_people_by_query(jwt, people_url, query_string):

  # set up initial values. Note that Dartmouth APIs only return 1000 results per page max
  page_size = 1000
  page_number = 1

  headers={'Authorization': 'Bearer '+jwt,'Content-Type':'application/json'}

  done = False
  people = []
  continuation_key = None

  logit("get_people_by_query...")

  while not done:
    # first time through the loop, we supply the query string and paging parameters
    # on subsequent requests we supply the continuation key and increment the page number
    if page_number == 1:
      url = people_url + "?"+ query_string + "&pagesize="+str(page_size)+"&page="+str(page_number)
    else:
      url = people_url + "?continuation_key="+continuation_key+"&pagesize="+str(page_size)+"&page="+str(page_number)

    logit("calling people get with url="+url)

    response = requests.get(url, headers=headers)
    response.raise_for_status()

    # on the first page of results, read the "x-request-id" and "x-total-count" response
    # headers.  The x-request-id must be sent as the continuation_key query parameter on
    # subsequent pages so that they are served from this found set.  The x-total-count can
    # be used as a sanity check on the total result set retrieved
    if page_number == 1:
      continuation_key = response.headers.get("x-request-id")
      total_count = int(response.headers.get("x-total-count"))

      logit("x-total-count="+str(total_count))
      logit("x-request-id="+continuation_key)
    response_list = response.json()

    people.extend(response_list)

    page_number = page_number + 1
    if len(response_list) == 0:
      done = True
  # end while

  if len(people) != total_count:
    raise Exception("Number of people records retrieved ("+str(len(people))+") does not match the total in the x-total-count header ("+str(total_count)+")")

  return people
#
# *************************************************************************************************
# * Main Program
# *************************************************************************************************
#
# fail fast with a KeyError if either environment variable is missing
api_key = os.environ['API_KEY']
api_base_url = os.environ['API_BASE_URL']

people_url = api_base_url+"/people"
login_url = api_base_url+"/jwt"

# get a jwt to authorize the People API calls
logit("acquiring dartapi JWT...")
dartapi_jwt = login_jwt(login_url,api_key,None)

# get all people whose name starts with S
people = get_people_by_query(dartapi_jwt,people_url,"name=S*")

logit("number of people returned = " + str(len(people)))