pgb_utils.bigquery
The bigquery module facilitates querying Pitt-Google Broker’s BigQuery databases and reading the results.
- pgb_utils.pgb_utils.bigquery.check_history_column_names(columns)
Make sure user-submitted column names are appropriate for querying object histories.
- Return type
Union[List[str], bool]
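As an illustration of what such a check might involve, here is a minimal sketch, not the library's code. It assumes the function returns the cleaned list when every name is valid and False otherwise, as the Union[List[str], bool] return type suggests. ALLOWED_HISTORY_COLUMNS and check_columns_sketch are hypothetical names; the real column set would come from get_history_column_names().

```python
from typing import List, Sequence, Union

# Hypothetical stand-in for get_history_column_names(); the real set of
# valid columns comes from the alerts table schema.
ALLOWED_HISTORY_COLUMNS = {"jd", "ra", "dec", "fid", "magpsf", "sigmapsf"}

# Columns the query functions include automatically.
AUTO_INCLUDED = ("objectId", "candid")


def check_columns_sketch(columns: Sequence[str]) -> Union[List[str], bool]:
    """Return the cleaned column list, or False if any name is not allowed."""
    # Drop auto-included columns so they are not selected twice.
    cleaned = [c for c in columns if c not in AUTO_INCLUDED]
    bad = [c for c in cleaned if c not in ALLOWED_HISTORY_COLUMNS]
    if bad:
        print(f"Invalid history columns: {bad}")
        return False
    return cleaned
```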
- pgb_utils.pgb_utils.bigquery.cone_search(center, radius, columns, objectIds=None, format='pandas', iterator=False, dry_run=True)
Perform a cone search on the alerts database and return object histories. This uses the coordinates of the most recent observation to determine whether an object is within the cone.
- Parameters
  - center (SkyCoord) – Center of the cone to search within.
  - radius (Angle) – Radius of the cone to search within.
  - columns (List[str]) – Names of history columns to select from the alerts table. The ‘objectId’ and ‘candid’ columns are automatically included and do not need to be in this list.
  - objectIds (Optional[list]) – IDs of ZTF objects to include in the query.
  - format (str) – One of ‘pandas’ or ‘json’. Query results will be returned in this format. Duplicate observations are dropped.
  - iterator (bool) – If True, iterate over the objects and return one at a time. Else return the full query results together.
  - dry_run (bool) – If True, pgb.bigquery.dry_run will be called first and the user will be asked to confirm before continuing.
- Returns
Query results in the requested format. If iterator is True, returns a generator; else all results are returned together.
- Return type
Union[str, DataFrame, Generator[Union[str, DataFrame], None, None]]
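The iterator option can be pictured with the following sketch, which is not the library's code. It assumes each history is a plain dict of column name to values and takes the cone test as a predicate (the role object_is_in_cone plays in the real module). select_in_cone and History are hypothetical names.

```python
from typing import Callable, Dict, Generator, Iterable, List, Union

# Hypothetical shape of one object's history: column name -> values.
History = Dict[str, list]


def select_in_cone(
    histories: Iterable[History],
    in_cone: Callable[[History], bool],
    iterator: bool = False,
) -> Union[List[History], Generator[History, None, None]]:
    """Keep only histories for which in_cone(history) is True."""
    # A generator expression is lazy: objects are tested one at a time.
    selected = (h for h in histories if in_cone(h))
    if iterator:
        return selected
    # Otherwise materialize all matching objects at once.
    return list(selected)
```

Returning the generator defers both the cone test and any per-object formatting until the caller asks for the next object, which keeps memory use flat for large result sets.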
- pgb_utils.pgb_utils.bigquery.create_client(project_id)
Open a BigQuery Client.
- Parameters
  - project_id (str) – User’s Google Cloud Platform project ID.
- pgb_utils.pgb_utils.bigquery.dry_run(query, notify=True)
Perform a dry run to find out how many bytes the query will process.
- Parameters
  - query (str) – SQL query statement.
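A dry run like this can be sketched with the google-cloud-bigquery client, whose QueryJobConfig accepts dry_run=True and whose QueryJob reports total_bytes_processed. The helper names below are hypothetical, client is assumed to behave like google.cloud.bigquery.Client, and how the real function uses notify is not shown.

```python
def human_readable_bytes(nbytes: int) -> str:
    """Format a byte count using binary (1024-based) units."""
    size = float(nbytes)
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    for unit in units[:-1]:
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} {units[-1]}"


def estimate_query_bytes(client, query: str) -> int:
    """Ask BigQuery how many bytes `query` would process, without running it."""
    # Imported here so the formatting helper above works without the package.
    from google.cloud import bigquery

    # dry_run=True validates the query and reports its cost; disabling the
    # cache makes the estimate reflect a fresh execution.
    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    query_job = client.query(query, job_config=job_config)
    return query_job.total_bytes_processed
```

For example, human_readable_bytes(estimate_query_bytes(client, sql)) gives a figure suitable for a confirmation prompt.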
- pgb_utils.pgb_utils.bigquery.format_history_query_results(query_job=None, row=None, format='pandas')
Converts the results of a BigQuery query to the desired format. Must pass either query_job or row. Any duplicate observations will be dropped.
- Parameters
  - query_job (Optional[QueryJob]) – Results from an object history query job. The SQL statement needed to create the job can be obtained with object_history_sql_statement(). Must supply either query_job or row.
  - row (Optional[Row]) – A single row from query_job. Must supply either row or query_job.
  - format (str) – One of ‘pandas’ or ‘json’. Input query results will be returned in this format.
- Returns
histories: Input query results converted to the requested format.
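One way to picture the conversion for a single row is sketched below. It assumes the row has been turned into a plain dict (for a google.cloud.bigquery Row, dict(row.items()) does this) and that duplicate observations share a candid. explode_history_row and history_row_to_json are hypothetical helpers, not part of pgb_utils.

```python
import json
from typing import Any, Dict, List


def explode_history_row(row: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one record per observation, keeping the first copy of each candid."""
    object_id = row["objectId"]
    # Every column other than objectId holds one array entry per observation.
    array_cols = [c for c in row if c != "objectId"]
    records: List[Dict[str, Any]] = []
    seen = set()
    for i, candid in enumerate(row["candid"]):
        if candid in seen:  # assumed: a repeated candid is a duplicate observation
            continue
        seen.add(candid)
        record = {"objectId": object_id}
        record.update({c: row[c][i] for c in array_cols})
        records.append(record)
    return records


def history_row_to_json(row: Dict[str, Any]) -> str:
    """Serialize the deduplicated observations as a JSON array."""
    return json.dumps(explode_history_row(row))
```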
- pgb_utils.pgb_utils.bigquery.get_dataset_table_names(dataset='ztf_alerts')
List the tables in a BigQuery dataset.
- Parameters
  - dataset (str) – Name of the BigQuery dataset.
- Returns
tables: List of table names in the dataset.
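A minimal sketch of the underlying call, assuming client behaves like google.cloud.bigquery.Client. Client.list_tables yields TableListItem objects whose table_id is the bare table name. list_table_names is a hypothetical helper, and a fully qualified 'project.dataset' ID may be needed when the dataset lives in a project other than the client's default.

```python
from typing import List


def list_table_names(client, dataset: str = "ztf_alerts") -> List[str]:
    """Return the IDs of all tables in `dataset`."""
    # Client.list_tables yields TableListItem objects; table_id is the bare name.
    return [item.table_id for item in client.list_tables(dataset)]
```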
- pgb_utils.pgb_utils.bigquery.get_history_column_names()
- Returns
historycols: Column names appropriate for querying object histories.
- Note
It would be convenient to also return the column descriptions, but that is more complicated, and this function will become completely obsolete if we change the database structure to store only the “candidate” observation and metadata.
- pgb_utils.pgb_utils.bigquery.get_table_info(table='all', dataset='ztf_alerts')
Retrieves and prints BigQuery table schemas.
- Parameters
  - table (Union[str, list]) – Name of the BigQuery table, or a list of table names. ‘all’ will print the info for all tables in the dataset.
  - dataset (str) – Name of the BigQuery dataset that the table(s) belong to.
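The table argument accepts three shapes: a single name, a list of names, or ‘all’. A hypothetical sketch of normalizing it into a list of names, with the ‘all’ case delegated to a caller-supplied lister (for instance, something wrapping get_dataset_table_names):

```python
from typing import Callable, List, Union


def resolve_tables(
    table: Union[str, List[str]],
    list_all: Callable[[], List[str]],
) -> List[str]:
    """Normalize the `table` argument into a list of table names."""
    if table == "all":
        # Defer to a lister that returns every table in the dataset.
        return list(list_all())
    if isinstance(table, str):
        return [table]
    return list(table)
```

Each resolved name could then be passed to a schema lookup and printed in turn.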
- pgb_utils.pgb_utils.bigquery.get_table_schema(table, dataset='ztf_alerts')
Retrieves information about the columns in a BigQuery table and returns it as a DataFrame.
- Parameters
  - table (str) – Name of the BigQuery table.
  - dataset (str) – Name of the BigQuery dataset that the table belongs to.
- Returns
df: Column information from the BigQuery table schema.
- Return type
DataFrame
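The schema information can be collected roughly as sketched below, assuming client behaves like google.cloud.bigquery.Client, whose get_table returns a Table with a schema list of SchemaField objects. schema_records is a hypothetical helper; pandas.DataFrame(records) would turn its output into a DataFrame. Nested RECORD columns carry sub-fields in field.fields, which this sketch does not expand.

```python
from typing import Dict, List


def schema_records(client, table: str, dataset: str = "ztf_alerts") -> List[Dict[str, str]]:
    """Collect name, type, mode, and description for each top-level column."""
    bq_table = client.get_table(f"{dataset}.{table}")
    # Each SchemaField exposes name, field_type, mode, and description.
    return [
        {
            "name": field.name,
            "type": field.field_type,
            "mode": field.mode,
            "description": field.description or "",
        }
        for field in bq_table.schema
    ]
```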
- pgb_utils.pgb_utils.bigquery.object_history_sql_statement(columns, objectIds=None, limit=None)
Convenience function that generates the SQL string needed to query the alerts table and aggregate data by objectId. When the resulting SQL query is executed, the query job will contain one row for each objectId, with the object’s data aggregated into arrays (one array per column in columns) ordered by the observation date.
Note: Arrays may contain duplicate observations; it is the user’s responsibility to clean them.
- Parameters
  - columns (List[str]) – Names of columns to select from the alerts table. The ‘objectId’ and ‘candid’ columns are automatically included and do not need to be in this list.
  - objectIds (Optional[list]) – IDs of ZTF objects to include in the query.
  - limit (Optional[int]) – Maximum number of rows to be returned.
- Returns
query: SQL statement to query the alerts table and aggregate data by objectId.
- Return type
str
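The shape of such a statement can be sketched as follows. This is a hypothetical reconstruction, not the string the library actually generates: ALERTS_TABLE is a placeholder table reference, and the sketch assumes ‘jd’ (the ZTF Julian Date) is the observation-date column used for ordering.

```python
from typing import List, Optional

# Hypothetical placeholder; the real project/dataset/table may differ.
ALERTS_TABLE = "my-project.ztf_alerts.alerts"


def history_sql_sketch(
    columns: List[str],
    objectIds: Optional[List[str]] = None,
    limit: Optional[int] = None,
) -> str:
    """Build a query that aggregates each column into a jd-ordered array per objectId."""
    # objectId is the grouping key; candid is always aggregated alongside.
    cols = ["candid"] + [c for c in columns if c not in ("objectId", "candid")]
    aggs = ",\n  ".join(f"ARRAY_AGG({c} ORDER BY jd) AS {c}" for c in cols)
    sql = f"SELECT\n  objectId,\n  {aggs}\nFROM `{ALERTS_TABLE}`"
    if objectIds:
        # Plain string quoting; prefer BigQuery query parameters for untrusted input.
        ids = ", ".join(f"'{oid}'" for oid in objectIds)
        sql += f"\nWHERE objectId IN ({ids})"
    sql += "\nGROUP BY objectId"
    if limit is not None:
        # One output row per objectId, so LIMIT caps the number of objects.
        sql += f"\nLIMIT {int(limit)}"
    return sql
```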
- pgb_utils.pgb_utils.bigquery.object_is_in_cone(object, center, radius)
Checks whether the position of the object’s most recent observation lies within the cone defined by center and radius.
- Parameters
  - object (DataFrame) – DataFrame containing the history of a single objectId. Required columns: ‘jd’, ‘ra’, ‘dec’.
  - center (SkyCoord) – Center of the cone to search within.
  - radius (Angle) – Radius of the cone to search within.
- Returns
in_cone: True if the object is within radius of center, else False.
- Return type
bool
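The geometric test reduces to a great-circle separation between the cone center and the latest position. The sketch below swaps astropy's SkyCoord and Angle (which provide SkyCoord.separation) for plain floats in degrees and uses the Vincenty formula. angular_separation_deg and in_cone_sketch are hypothetical names, and the history is assumed to be a dict of lists rather than a DataFrame.

```python
import math
from typing import Dict, List


def angular_separation_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle distance in degrees (Vincenty formula, stable at all angles)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    dra = ra2 - ra1
    num = math.hypot(
        math.cos(dec2) * math.sin(dra),
        math.cos(dec1) * math.sin(dec2) - math.sin(dec1) * math.cos(dec2) * math.cos(dra),
    )
    den = math.sin(dec1) * math.sin(dec2) + math.cos(dec1) * math.cos(dec2) * math.cos(dra)
    return math.degrees(math.atan2(num, den))


def in_cone_sketch(
    history: Dict[str, List[float]], center_ra: float, center_dec: float, radius_deg: float
) -> bool:
    """True if the most recent observation (largest jd) lies within radius_deg of the center."""
    # Index of the latest observation by Julian Date.
    latest = max(range(len(history["jd"])), key=history["jd"].__getitem__)
    sep = angular_separation_deg(center_ra, center_dec, history["ra"][latest], history["dec"][latest])
    return sep <= radius_deg
```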
- pgb_utils.pgb_utils.bigquery.query_objects(columns, objectIds=None, limit=None, format='pandas', iterator=False, dry_run=True)
Query the alerts database for object histories.
- Parameters
  - columns (List[str]) – Names of columns to select from the alerts table. The ‘objectId’ and ‘candid’ columns are automatically included and do not need to be in this list.
  - objectIds (Optional[list]) – IDs of ZTF objects to include in the query.
  - limit (Optional[int]) – Limit the number of objects returned to N <= limit.
  - format (str) – One of ‘pandas’, ‘json’, or ‘query_job’. Query results will be returned in this format. Results returned as ‘query_job’ may contain duplicate observations; else duplicates are dropped.
  - iterator (bool) – If True, iterate over the objects and return one at a time. Else return the full query results together. This parameter is ignored if format == ‘query_job’.
  - dry_run (bool) – If True, pgb.bigquery.dry_run will be called first and the user will be asked to confirm before continuing.
- Returns
Query results in the requested format. If iterator is True, yields one object at a time; else all results are returned together.
- Return type
Union[str, DataFrame, QueryJob, Generator[Union[str, DataFrame], None, None]]
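The dry_run confirmation step can be sketched as a small gate that estimates the cost, asks the user, and only then runs the query. estimate_bytes, run, and ask are hypothetical injected callables (estimate_bytes could wrap a BigQuery dry run); injecting them is a design choice for this sketch that also makes the flow easy to test.

```python
from typing import Callable, Optional


def confirm_then_run(
    query: str,
    estimate_bytes: Callable[[str], int],
    run: Callable[[str], object],
    dry_run: bool = True,
    ask: Callable[[str], str] = input,
) -> Optional[object]:
    """Optionally report the query cost and ask for confirmation before running it."""
    if dry_run:
        nbytes = estimate_bytes(query)
        answer = ask(f"Query will process {nbytes} bytes. Continue? [y/N] ")
        if answer.strip().lower() not in ("y", "yes"):
            # The user declined; nothing is executed.
            return None
    return run(query)
```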