pacer_lib.scraper

search_agent

class pacer_lib.scraper.search_agent(username, password, output_path='./results', auto_login=True, wait_time=1)

Returns a search_agent() object that serves as an interface to the PACER Case Locator. It can query and download both dockets and documents. It is a modified requests.sessions object.

Keyword Arguments

  • username: a valid PACER username
  • password: a valid PACER password that goes with username
  • output_path: allows you to specify the relative path where you would like to save your downloads. The actual docket sheets will be saved to a subfolder within output_path, ‘/local_docket_archive/’. If the folders do not exist, they will be created.
  • auto_login: specify whether to log in when the object is instantiated (you may want to skip logging in if you only use search_agent() to create PACER query strings).
  • wait_time: how long to wait between requests to the PACER website.

download_case_docket(case_no, court_id, other_options={'default_form': 'b', 'court_type': 'all'}, overwrite=False)

Returns a list that indicates the case_no, court_id and any error. download_case_docket also writes the .html docket sheet to self.output_path (in a subfolder, ‘/local_docket_archive/’). If you set overwrite=True, it will overwrite previous dockets. Otherwise, download_case_docket will check whether the docket has already been downloaded before incurring any additional search or download charges.

You can also pass additional POST requests through other_options.
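The redundant-download check described above can be sketched roughly as follows. This is only an illustration of the idea, not pacer_lib's implementation: save_docket_once, the fetch callable and the file name are hypothetical, and only the ‘local_docket_archive’ subfolder name is taken from the description above.

```python
from pathlib import Path


def save_docket_once(output_path, filename, fetch, overwrite=False):
    """Write a docket to <output_path>/local_docket_archive/ unless it exists.

    `fetch` is a hypothetical zero-argument callable that returns the docket
    HTML; it stands in for the billable PACER request.
    Returns True if a download happened, False if the cached file was kept.
    """
    archive = Path(output_path) / "local_docket_archive"
    archive.mkdir(parents=True, exist_ok=True)  # create missing folders
    target = archive / filename

    # Look in the local archive first, so a cached docket costs nothing.
    if target.exists() and not overwrite:
        return False

    target.write_text(fetch(), encoding="utf-8")
    return True
```

Because the existence check runs before fetch() is called, a second call for the same file makes no request unless overwrite=True.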

download_document(case_filename, doc_no, doc_link, no_type='U', overwrite=False)

Returns a list that indicates the case_name, doc_no and any error. download_document also writes the .pdf document to self.output_path (to the subfolder ‘/local_document_archive/’). If you set overwrite=True, it will overwrite previously downloaded documents. Otherwise, download_document will check whether the document has already been downloaded before incurring any additional search or download charges.

(To be implemented) docket_parser() assigns two types of numbers: the listed docket number (i.e., the number listed on the page) and the unique identifier (i.e., the position of the docket entry on the page). We should default to using the unique identifier, but all of the legacy files will be using the listed identifier and we will need to reassociate / convert those documents to their unique identifier.

  • no_type = ‘U’ -> unique identifier
  • no_type = ‘L’ -> listed identifier

We have begun implementing this, but this is not completely finished.

Using the listed identifier should be considered legacy and not advised.

This will be dangerous in terms of redundant download protection.

Document this properly once we finish.

(Not implemented) You can also pass additional POST requests through other_options.

query_case_locator(payload)

Returns the HTML of the search results page as a string. This function passes queries to the PACER Case Locator (https://pcl.uscourts.gov/dquery). It is the simplest interface: you can send any key:value pairs as a POST request.

We do not recommend using this unless you want more advanced functionality.

Keyword Arguments

  • payload: key-value pairs that will be converted into a POST request.

refresh_login()

Logs in to the PACER system using the username and password provided at the initialization of search_agent(). This will create a Requests session that will allow you to query the PACER system. If auto_login=False, refresh_login() must be called before you can query the Case Locator. This function will raise an error if you supply an invalid username or password.

Returns nothing.

request_docket_sheet(docket_link, other_options={})

Returns the HTML of the docket sheet specified by docket_link.

You can also pass additional POST requests through other_options.

request_document(case_filename, document_link, other_options={})

Using a case_filename and a link to the document, this function constructs the necessary POST data and finds the correct document URL to download the specified PDF document.

Returns binary data.

You can also pass additional POST requests through other_options.

(For version 2.1) Currently only implemented for district courts, but should eventually be implemented for bankruptcy and appellate courts.

search_case_locator(case_no, other_options={'default_form': 'b', 'court_type': 'all'})

Passes a query to the PACER Case Locator and returns two objects: a list of search results (results) and a string that indicates whether there was an error.

Keyword Arguments

  • case_no: a string that represents a PACER query.
  • other_options: allows you to determine the payload sent to query_case_locator(). This is validated in search_case_locator() so that you only pass known valid POST requests. The default options are those known to be necessary to get search results.

Output Documentation

Each search result is a dictionary with these keys:

  • searched_case_no
  • result_no
  • case_name
  • listed_case_no
  • court_id
  • nos
  • date_filed
  • date_closed
  • query_link

The second object returned is a string that verbosely describes any errors that occurred. If the search result was found, the string is empty.
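A caller will typically check the error string before touching the results. The sketch below shows one way to do that. pick_result is a hypothetical helper, not part of pacer_lib, and the sample data is made up: its values are placeholders and do not reflect real PACER formats.

```python
def pick_result(results, error, court_id=None):
    """Return the first search result, optionally restricted to one court.

    `results` is a list of dictionaries with the keys listed above and
    `error` is the accompanying error string (empty on success).
    """
    if error:
        raise LookupError(error)
    for result in results:
        if court_id is None or result["court_id"] == court_id:
            return result
    raise LookupError("no result for court_id %r" % (court_id,))


# Made-up placeholder data in the documented shape (only two keys shown).
sample_results = [
    {"court_id": "court-a", "case_name": "Placeholder case A"},
    {"court_id": "court-b", "case_name": "Placeholder case B"},
]

chosen = pick_result(sample_results, "", court_id="court-b")
print(chosen["case_name"])
```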

Other Functions

pacer_lib.scraper.disaggregate_docket_number(combined_docket_number)

Returns a string that indicates the year of the case and the PACER-valid case_id.

Disaggregates the year from the case number when we have combined docket numbers. Combined year and case numbers are often stored as integers, but this leads to the truncation of leading zeroes. We restore these leading zeroes and then return the two-digit year of the case and the case_id. The minimum number of digits for this function is five (which assumes that the case was from 2000). If there are further truncations (e.g., ‘00-00084’ stored as ‘0000084’ and truncated to ‘84’), pre-process your case numbers before calling this function.
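The zero-restoring step can be illustrated with a short sketch. It is not the library's code: split_combined_docket_number is a hypothetical name, it returns a tuple for clarity, and it assumes a two-digit year followed by a five-digit case id (seven digits once padded).

```python
def split_combined_docket_number(combined):
    """Split a combined year + case number into (year, case_id).

    Assumes a 2-digit year followed by a 5-digit case id, i.e. 7 digits
    once the leading zeroes have been restored.
    """
    digits = str(combined).strip()
    if not digits.isdigit():
        raise ValueError("expected only digits, got %r" % (combined,))
    if len(digits) < 5:
        # The case id itself was truncated, so it cannot be recovered here.
        raise ValueError("fewer than five digits: %r" % (combined,))
    if len(digits) > 7:
        raise ValueError("more than seven digits: %r" % (combined,))

    padded = digits.zfill(7)       # restore the truncated leading zeroes
    return padded[:2], padded[2:]  # (two-digit year, five-digit case id)


print(split_combined_docket_number(700123))  # ('07', '00123')
```

A five-digit input such as 12345 is read as year ‘00’ and case id ‘12345’, matching the assumption stated above that such a case is from 2000.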

pacer_lib.scraper.gen_case_query(district, office, year, docket_number, type_code, district_first=True)

Creates a PACER query from the district, office, year, case_id and case_type and returns a tuple of (case_id, court_id, region).

PACER case-numbers can be generated by consolidating the district, office, year, case id and case type information in a specific way. This function formats the district name and type_code correctly and then combines the case identifying information into a single PACER query.

Many other data sources list the district of the court before the state, e.g., EDNY rather than NYED. If this is not the case, turn off the district_first option.
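As an illustration of the reordering that district_first implies, the sketch below moves the state abbreviation to the front. to_state_first is a hypothetical helper, not the library's function, and it assumes that the last two letters of the input are the state abbreviation.

```python
def to_state_first(court_code):
    """Move the two-letter state abbreviation from the end to the front.

    Assumes a district-first code such as 'EDNY', where the last two
    letters are the state.
    """
    code = court_code.strip().upper()
    if len(code) < 3 or not code.isalpha():
        raise ValueError("unexpected court code: %r" % (court_code,))
    return code[-2:] + code[:-2]


print(to_state_first("EDNY"))  # NYED
```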

Keyword Arguments

  • year should be either 2 digits (e.g., 00) or 4 digits (e.g., 1978).
  • case_id should be exactly 5 digits
  • type_code must be one of the following: civil, civ, criminal, crim, bankruptcy, bank, cv, cr, bk
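
The input rules above could be normalized along these lines. This is a sketch under assumptions, not gen_case_query itself: the helper names are hypothetical, the mapping of each accepted type code to cv, cr or bk is assumed, and the ‘office:yy-type-number’ layout is the format commonly seen on PACER dockets rather than a confirmed output of this library.

```python
# Assumed mapping from the accepted type codes to two-letter codes.
TYPE_CODES = {
    "civil": "cv", "civ": "cv", "cv": "cv",
    "criminal": "cr", "crim": "cr", "cr": "cr",
    "bankruptcy": "bk", "bank": "bk", "bk": "bk",
}


def normalize_type_code(type_code):
    key = str(type_code).strip().lower()
    if key not in TYPE_CODES:
        raise ValueError("unknown type_code: %r" % (type_code,))
    return TYPE_CODES[key]


def two_digit_year(year):
    """Accept a 2-digit or 4-digit year and return its last two digits."""
    text = str(year).strip().zfill(2)  # an integer 0 becomes '00'
    if not text.isdigit() or len(text) not in (2, 4):
        raise ValueError("year must have 2 or 4 digits: %r" % (year,))
    return text[-2:]


def format_case_number(office, year, docket_number, type_code):
    """Build an 'office:yy-type-number' string (assumed layout)."""
    number = str(docket_number).strip()
    if not number.isdigit() or len(number) > 5:
        raise ValueError("docket number must have at most 5 digits")
    return "%s:%s-%s-%s" % (
        office,
        two_digit_year(year),
        normalize_type_code(type_code),
        number.zfill(5),  # pad to exactly five digits
    )


print(format_case_number(1, 2007, 123, "civil"))  # 1:07-cv-00123
```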

Returns a tuple (case_number, court_id)

(For Version 2.1) Note: Appellate Courts have not been implemented yet.

Some of this functionality may not be necessary and should be revisited.

Specifically, year can be 2 or 4 digits and case number does not have to be exactly 5 digits (up to 5 digits). Office must be exactly 1 digit.

We could also consider including the specific state in the output. We should also create a list of all valid court_ids and check against it.