pgb_utils.bigquery

The bigquery module facilitates querying Pitt-Google Broker’s BigQuery databases and reading the results.

pgb_utils.pgb_utils.bigquery.check_history_column_names(columns)

Check that user-submitted column names are valid for querying object histories.

Return type: Union[List[str], bool]
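The Union[List[str], bool] return type suggests a "list if valid, False otherwise" contract. The sketch below illustrates that pattern under stated assumptions: check_columns is a hypothetical stand-in (not the library function), the allowed names are passed in explicitly (in pgb_utils they would come from get_history_column_names()), and dropping ‘objectId’ and ‘candid’ from the returned list is an assumption based on those columns being added automatically.

```python
from typing import List, Union

# Assumption: these columns are always added to history queries automatically.
AUTO_COLUMNS = {"objectId", "candid"}


def check_columns(columns: List[str], allowed: List[str]) -> Union[List[str], bool]:
    """Hypothetical validator: return the usable columns, or False if any are invalid."""
    invalid = [c for c in columns if c not in allowed and c not in AUTO_COLUMNS]
    if invalid:
        print(f"These columns cannot be used to query object histories: {invalid}")
        return False
    # objectId and candid are added automatically, so drop them from the user's list.
    return [c for c in columns if c not in AUTO_COLUMNS]
```

For example, check_columns(["jd", "candid"], ["jd"]) returns ["jd"], while an unknown name makes it print a message and return False.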

pgb_utils.pgb_utils.bigquery.cone_search(center, radius, columns, objectIds=None, format='pandas', iterator=False, dry_run=True)

Perform a cone search on the alerts database and return object histories. This uses the coordinates of the most recent observation to determine whether an object is within the cone.

Parameters

center (SkyCoord) – Center of the cone to search within.
radius (Angle) – Radius of the cone to search within.
columns (List[str]) – Names of history columns to select from the alerts table. The ‘objectId’ and ‘candid’ columns are automatically included and do not need to be in this list.
objectIds (Optional[list]) – IDs of ZTF objects to include in the query.
format (str) – One of ‘pandas’ or ‘json’. Query results will be returned in this format. Duplicate observations are dropped.
iterator (bool) – If True, iterate over the objects and return one at a time. Else return the full query results together.
dry_run (bool) – If True, pgb.bigquery.dry_run will be called first and the user will be asked to confirm before continuing.

Returns: Query results in the requested format. If iterator is True, returns a generator; else all results are returned together.

Return type: Union[str, DataFrame, Generator[Union[str, DataFrame], None, None]]
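The iterator flag switches between a lazy generator and a fully materialized result, and cone membership is decided by each object's most recent observation. The stdlib-only sketch below illustrates both ideas. It is not the pgb_utils implementation: the list-of-dicts history shape, the helper names latest and filter_histories, and the caller-supplied in_cone predicate are all hypothetical.

```python
from typing import Callable, Dict, Generator, Iterable, List, Union

# Hypothetical shape: one object's history is a list of observations (dicts).
History = List[Dict[str, float]]


def latest(history: History) -> Dict[str, float]:
    """Return the most recent observation, i.e. the one with the largest 'jd'."""
    return max(history, key=lambda obs: obs["jd"])


def filter_histories(
    histories: Iterable[History],
    in_cone: Callable[[History], bool],
    iterator: bool = False,
) -> Union[List[History], Generator[History, None, None]]:
    """Keep only the objects for which in_cone(history) is True."""
    matches = (h for h in histories if in_cone(h))
    # iterator=True hands back the lazy generator; otherwise materialize everything.
    return matches if iterator else list(matches)
```

A predicate such as lambda h: abs(latest(h)["ra"] - 10.0) < 1.0 stands in here for a real angular-separation test.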

pgb_utils.pgb_utils.bigquery.create_client(project_id)

Open a BigQuery Client.

Parameters: project_id (str) – User’s Google Cloud Platform project ID

pgb_utils.pgb_utils.bigquery.dry_run(query, notify=True)

Perform a dry run to find out how many bytes the query will process.

Parameters

query (str) – SQL query statement.
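With the google-cloud-bigquery client, a byte estimate is typically obtained by submitting the query with QueryJobConfig(dry_run=True, use_query_cache=False) and reading the job's total_bytes_processed. The confirmation step that follows (as described for the dry_run parameter of query_objects and cone_search) can be sketched as below. This is a hypothetical helper, not the library's code: confirm_query, the GiB formatting, and the y/N prompt wording are assumptions, and the meaning of notify is not modeled.

```python
from typing import Callable


def confirm_query(nbytes: int, ask: Callable[[str], str] = input) -> bool:
    """Report a dry-run byte estimate and ask whether to run the real query.

    `ask` defaults to the built-in input(); pass another callable to script it.
    """
    gib = nbytes / 2**30  # bytes -> GiB
    answer = ask(f"Query will process {gib:.3f} GiB. Continue? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}
```

Injecting ask keeps the prompt testable and lets non-interactive callers auto-answer.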

pgb_utils.pgb_utils.bigquery.format_history_query_results(query_job=None, row=None, format='pandas')

Converts the results of a BigQuery query to the desired format. Must pass either query_job or row. Any duplicate observations will be dropped.

Parameters

query_job (Optional[QueryJob]) – Results from an object history query job. The SQL statement needed to create the job can be obtained with object_history_sql_statement(). Must supply either query_job or row.
row (Optional[Row]) – A single row from query_job. Must supply either row or query_job.
format (str) – One of ‘pandas’ or ‘json’. Input query results will be returned in this format.

Returns

histories – Input query results converted to the requested format.
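A row produced by the object history query holds objectId plus one array per column, so formatting means unpacking those arrays into per-observation records and dropping duplicates. The sketch below shows that idea with plain dicts instead of a BigQuery Row. It assumes that ‘candid’ uniquely identifies an observation, and row_to_records and row_to_json are hypothetical names, not part of pgb_utils.

```python
import json
from typing import Any, Dict, List


def row_to_records(row: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Unpack one aggregated row into per-observation records, dropping duplicates."""
    object_id = row["objectId"]
    array_cols = [c for c in row if c != "objectId"]  # every other column holds an array
    records: List[Dict[str, Any]] = []
    seen = set()
    for values in zip(*(row[c] for c in array_cols)):
        record = dict(zip(array_cols, values))
        if record["candid"] in seen:  # same candid -> duplicate observation
            continue
        seen.add(record["candid"])
        record["objectId"] = object_id
        records.append(record)
    return records


def row_to_json(row: Dict[str, Any]) -> str:
    """JSON-encode the de-duplicated records for one object."""
    return json.dumps(row_to_records(row))
```

The same record list could be passed to a DataFrame constructor for the ‘pandas’ format.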

pgb_utils.pgb_utils.bigquery.get_dataset_table_names(dataset='ztf_alerts')[source]¶

Parameters: dataset (str) – Name of the BigQuery dataset.
Returns: tables – List of table names in the dataset.

pgb_utils.pgb_utils.bigquery.get_history_column_names()[source]¶

Returns: historycols – Column names appropriate for querying object histories.

Note: It would be convenient to also return the column descriptions, but that is more complicated, and this function will be completely obsolete if we change the database structure to store only the “candidate” observation and metadata.

pgb_utils.pgb_utils.bigquery.get_table_info(table='all', dataset='ztf_alerts')

Retrieves and prints BigQuery table schemas.

Parameters

table (Union[str, list]) – Name of the BigQuery table or list of the same. ‘all’ will print the info for all tables in the dataset.
dataset (str) – Name of BigQuery dataset that the table(s) belong to.
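Because table accepts a single name, a list of names, or the sentinel ‘all’, one reasonable approach is to normalize it to a list before looping. The sketch below is hypothetical: resolve_tables is not a pgb_utils function, the available names would in practice come from get_dataset_table_names(), and raising ValueError for unknown tables is an assumed design choice.

```python
from typing import List, Union


def resolve_tables(table: Union[str, List[str]], available: List[str]) -> List[str]:
    """Normalize the `table` argument to a list of table names."""
    if table == "all":
        return list(available)
    tables = [table] if isinstance(table, str) else list(table)
    missing = [t for t in tables if t not in available]
    if missing:
        raise ValueError(f"Unknown table(s): {missing}")
    return tables
```

Each resolved name could then be passed to get_table_schema() and printed.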

pgb_utils.pgb_utils.bigquery.get_table_schema(table, dataset='ztf_alerts')

Retrieves information about the columns in a BigQuery table and returns it as a DataFrame.

Parameters

table (str) – Name of the BigQuery table.
dataset (str) – Name of the BigQuery dataset that the table belongs to.

Returns: df – Column information from the BigQuery table schema.

Return type: DataFrame

pgb_utils.pgb_utils.bigquery.object_history_sql_statement(columns, objectIds=None, limit=None)

Convenience function that generates the SQL string needed to query the alerts table and aggregate data by objectId. When the resulting SQL query is executed, the query job will contain one row for each objectId, with the object’s data aggregated into arrays (one array per column in columns) ordered by the observation date.

Note: Arrays may contain duplicated observations; it is the user’s responsibility to clean them.

Parameters

columns (List[str]) – Names of columns to select from the alerts table. The ‘objectId’ and ‘candid’ columns are automatically included and do not need to be in this list.
objectIds (Optional[list]) – IDs of ZTF objects to include in the query.
limit (Optional[int]) – Maximum number of rows to be returned.

Returns

query – SQL statement to query the alerts table and aggregate data by objectId.
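The aggregation described above maps naturally onto BigQuery standard SQL's ARRAY_AGG(... ORDER BY ...) with GROUP BY objectId. The builder below is a sketch of that shape, not the string pgb_utils actually emits. history_sql is a hypothetical name, the fully qualified table name is a caller-supplied placeholder, and ordering by the ‘jd’ column is an assumption about where the observation date lives.

```python
from typing import List, Optional


def history_sql(columns: List[str], table: str,
                objectIds: Optional[List[str]] = None,
                limit: Optional[int] = None) -> str:
    """Build a query returning one row per objectId with date-ordered arrays."""
    # candid is always included; objectId is the grouping key, not an array.
    cols = ["candid"] + [c for c in columns if c not in ("objectId", "candid")]
    aggs = ",\n  ".join(f"ARRAY_AGG({c} ORDER BY jd) AS {c}" for c in cols)
    sql = f"SELECT\n  objectId,\n  {aggs}\nFROM `{table}`"
    if objectIds:
        ids = ", ".join(f"'{oid}'" for oid in objectIds)
        sql += f"\nWHERE objectId IN ({ids})"
    sql += "\nGROUP BY objectId"
    if limit is not None:
        sql += f"\nLIMIT {int(limit)}"  # after GROUP BY, this caps the number of objects
    return sql
```

Interpolating IDs into the string keeps the sketch short. For untrusted input, BigQuery query parameters are the safer choice.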

pgb_utils.pgb_utils.bigquery.object_is_in_cone(object, center, radius)

Checks whether the object’s most recent observation has a position that is within a cone defined by center and radius.

Parameters

object (DataFrame) – DataFrame containing the history of a single objectId. Required columns: [‘jd’, ‘ra’, ‘dec’].
center (SkyCoord) – Center of the cone to search within.
radius (Angle) – Radius of the cone to search within.

Returns

in_cone – True if object is within radius of center, else False.
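The documented function takes an astropy SkyCoord and Angle, where a check like center.separation(SkyCoord(ra, dec, unit="deg")) <= radius is one option. To show the underlying math without astropy, the sketch below computes the great-circle separation with the haversine formula and applies it to the observation with the largest ‘jd’. The helper names, degree units, and list-of-dicts history are assumptions for illustration only.

```python
import math
from typing import Dict, List


def angular_separation_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle distance in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))  # clamp rounding error


def in_cone(history: List[Dict[str, float]], ra0: float, dec0: float,
            radius_deg: float) -> bool:
    """True if the most recent observation lies within radius_deg of (ra0, dec0)."""
    latest = max(history, key=lambda obs: obs["jd"])
    return angular_separation_deg(latest["ra"], latest["dec"], ra0, dec0) <= radius_deg
```

Haversine stays numerically stable at small separations, which is the typical regime for cone searches.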

pgb_utils.pgb_utils.bigquery.query_objects(columns, objectIds=None, limit=None, format='pandas', iterator=False, dry_run=True)

Query the alerts database for object histories.

Parameters

columns (List[str]) – Names of columns to select from the alerts table. The ‘objectId’ and ‘candid’ columns are automatically included and do not need to be in this list.
objectIds (Optional[list]) – IDs of ZTF objects to include in the query.
limit (Optional[int]) – Limit the number of objects returned to N <= limit.
format (str) – One of ‘pandas’, ‘json’, or ‘query_job’. Query results will be returned in this format. Results returned as ‘query_job’ may contain duplicate observations; else duplicates are dropped.
iterator (bool) – If True, iterate over the objects and return one at a time. Else return the full query results together. This parameter is ignored if format == ‘query_job’.
dry_run (bool) – If True, pgb.bigquery.dry_run will be called first and the user will be asked to confirm before continuing.

Returns: Query results in the requested format. If iterator is True, yields one object at a time; else all results are returned together.

Return type: Union[str, DataFrame, QueryJob, Generator[Union[str, DataFrame], None, None]]