# v0.7.1
- Introduces a system to track metadata for provenance and performance benchmarking.
- Adds custom metadata attributes to messages in all Pub/Sub streams, except “alerts”.
- Adds a script to Night Conductor that processes the Pub/Sub streams each morning, extracts the metadata, and stores it in BigQuery.
- Updates broker_utils with tools to do the above.
See PR #72
Working notes:

- process_streams.md (setup/test the processing script)
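As a rough illustration of the metadata-attribute idea above, the sketch below builds a provenance/benchmarking attribute dict of the kind that could be attached to a Pub/Sub message (Pub/Sub attributes must map strings to strings). The helper and the attribute names are hypothetical, not the broker's actual schema.

```python
from datetime import datetime, timezone

def provenance_attributes(base_attrs=None, publisher="broker", version="0.7.1"):
    """Merge hypothetical provenance attributes into a message-attribute dict.

    Timestamps let a downstream consumer benchmark end-to-end latency.
    All names here are illustrative only.
    """
    attrs = dict(base_attrs or {})
    attrs.setdefault("publisher", publisher)
    attrs.setdefault("broker_version", version)
    attrs.setdefault("publish_time", datetime.now(timezone.utc).isoformat())
    return attrs

# e.g., carry the originating Kafka topic through the pipeline
attrs = provenance_attributes({"kafka_topic": "ztf_20210820_programid1"})
```

With the google-cloud-pubsub client, a dict like this is passed as keyword arguments to `publisher.publish(topic_path, data, **attrs)`.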
## Test the changes
For the results of the tests, see:
### Code used to create and run the broker testing instance
Create/delete a broker testing instance:

```bash
# get the code
git clone https://github.com/mwvgroup/Pitt-Google-Broker
cd Pitt-Google-Broker
git checkout v/0.7.1/tjr
cd broker/setup_broker

# create/delete the instance
# survey="decat"
survey="ztf"
testid="v071"
teardown="False"
# teardown="True"
./setup_broker.sh "$testid" "$teardown" "$survey"

# name some things
consumerVM="${survey}-consumer-${testid}"
nconductVM="${survey}-night-conductor-${testid}"

# machine type references:
# https://cloud.google.com/compute/vm-instance-pricing
# https://cloud.google.com/compute/docs/instances/creating-instance-with-custom-machine-type#e2_shared-core_custom_machine_types
f1="f1-micro"       # 1 vCPU (0.2), 0.6 GB memory
g1="g1-small"       # 1 vCPU (0.5), 1.7 GB memory
e2m="e2-medium"     # 2 vCPU (1), 4 GB memory
e22="e2-standard-2" # 2 vCPU, 8 GB memory
n24="n2-standard-4" # 4 vCPU, 16 GB memory
cpu="micro" # for custom type
mem="10GB"  # for custom type
custom="--custom-vm-type=e2 --custom-cpu=small --custom-memory=4GB" # 2 vCPU (0.5), 4 GB memory

# change machine types after the installs are done and the machines are off
gcloud compute instances set-machine-type "$consumerVM" --machine-type "$g1"
# gcloud compute instances set-machine-type "$nconductVM" --machine-type "$e22"
gcloud compute instances set-machine-type "$nconductVM" --custom-vm-type=e2 --custom-cpu=small --custom-memory=4GB
```
Start the broker:

```bash
topic="${survey}-cue_night_conductor-${testid}"
cue=START
attr=KAFKA_TOPIC=NONE
# attr=topic_date=20210820
gcloud pubsub topics publish "$topic" --message="$cue" --attribute="$attr"
```
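For context on how a cue like this might be consumed on the other end, here is a hypothetical sketch of the branching Night Conductor could perform on the cue body and attributes. The function, its logic, and its return shape are illustrative only, not the actual implementation.

```python
def handle_cue(message_data: bytes, attributes: dict) -> dict:
    """Interpret a cue_night_conductor message (hypothetical logic)."""
    cue = message_data.decode("utf-8")
    if cue not in ("START", "END"):
        raise ValueError(f"unexpected cue: {cue}")
    action = {"cue": cue}
    if cue == "START":
        # KAFKA_TOPIC=NONE presumably means: start without a live Kafka connection
        action["kafka_topic"] = attributes.get("KAFKA_TOPIC", "NONE")
        # topic_date would instead point the consumer at a specific night's topic
        action["topic_date"] = attributes.get("topic_date")
    return action

action = handle_cue(b"START", {"KAFKA_TOPIC": "NONE"})
```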
Run the consumer simulator long enough to get alerts in every counter:

```python
from broker_utils import consumer_sim

testid = 'v071'
survey = 'ztf'
instance = (survey, testid)

# alert_rate = (25, 'once')
alert_rate = 'ztf-active-avg'
runtime = (20, 'min')  # unit options: 'sec', 'min', 'hr', 'night' (= 10 hrs)

consumer_sim.publish_stream(alert_rate, instance, runtime)
```
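The `runtime` tuple reduces to seconds per the unit options noted in the snippet ('night' = 10 hrs). A minimal sketch of that conversion, with a helper name of my own choosing (not part of broker_utils):

```python
def runtime_to_seconds(runtime):
    """Convert a (value, unit) runtime tuple to seconds.

    Units follow the consumer_sim comment: 'sec', 'min', 'hr',
    and 'night', which the notes equate to 10 hours.
    """
    value, unit = runtime
    factors = {"sec": 1, "min": 60, "hr": 3600, "night": 10 * 3600}
    return value * factors[unit]

runtime_to_seconds((20, "min"))  # 1200 seconds
```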
Stop the broker, which triggers Night Conductor to shut everything down and process the streams:

```bash
topic="${survey}-cue_night_conductor-${testid}"
cue=END
gcloud pubsub topics publish "$topic" --message="$cue"
```
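The morning processing step can be pictured as flattening each pulled Pub/Sub message into a row for BigQuery. The sketch below is a guess at that shape; the field names and row layout are hypothetical, not the processing script's actual schema.

```python
def message_to_row(data: bytes, attributes: dict, stream: str) -> dict:
    """Flatten one pulled Pub/Sub message into a (hypothetical) BigQuery row."""
    row = {"stream": stream}
    # keep each custom metadata attribute as its own column-ready field
    row.update(attributes)
    # record the payload size as one simple performance metric
    row["payload_bytes"] = len(data)
    return row

row = message_to_row(
    b'{"objectId": "ZTF21x"}',
    {"kafka_topic": "ztf_20210820_programid1"},
    stream="ztf-exgalac_trans",
)
```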