docs/source/working-notes/troyraen/metadata-tracking.md
Metadata Tracking System
todo
[ ] candid should be stored as an int, but we currently use the string f"{message_id}_unknown" for alerts stream messages that can't be matched to a candid.
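The fallback described in this item could be sketched as below. This is a minimal illustration, not the broker's code: `candid_key` is a hypothetical helper, and it assumes an unmatched alert shows up as `candid` being `None`.

```python
from typing import Optional, Union


def candid_key(candid: Optional[int], message_id: str) -> Union[int, str]:
    """Return the candid as an int, or a string placeholder when it is unknown.

    Hypothetical helper for illustration only.
    """
    if candid is not None:
        return int(candid)
    # Current workaround from the todo item: a string key built from the message ID.
    return f"{message_id}_unknown"


print(candid_key(1234567890, "1234"))  # 1234567890
print(candid_key(None, "1234"))  # 1234_unknown
```

The mixed int/str return type is the problem the todo item flags: a field that can hold either cannot be typed as an integer.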
Testing pieces
export GCP_PROJECT=$GOOGLE_CLOUD_PROJECT
export SURVEY=ztf
export TESTID=metatrack
cd /Users/troyraen/Documents/broker/metadata/broker/cloud_functions/ps_to_gcs
# in Python, started from the ps_to_gcs directory above so that `import main` resolves
from broker_utils import data_utils, gcp_utils
import troy_fncs as troy
import main
msgs = gcp_utils.pull_pubsub('ztf-alerts-reservoir', msg_only=False)
msg = msgs[0].message
alert_bytes = msg.data  # assumption: the alert payload is the Pub/Sub message's data field
attributes = {'kafka.topic': 'ztf_yyyymmdd'}
context = {'attributes': attributes, 'event_id': '1234'}
blob, alert = main.upload_bytes_to_bucket(alert_bytes, attributes)
main.attach_file_metadata(blob, alert, context)
Broker Testing Instance
Create/delete a broker testing instance
# get the code
git clone https://github.com/mwvgroup/Pitt-Google-Broker
cd Pitt-Google-Broker
git checkout tjr/metadata_tracking
cd broker/setup_broker
# create/delete the instance
survey="ztf"
testid="metatrack"
teardown="False"
# teardown="True"
./setup_broker.sh "$testid" "$teardown" "$survey"
# name some things
consumerVM="${survey}-consumer-${testid}"
nconductVM="${survey}-night-conductor-${testid}"
# https://cloud.google.com/compute/vm-instance-pricing
# https://cloud.google.com/compute/docs/instances/creating-instance-with-custom-machine-type#e2_shared-core_custom_machine_types
f1="f1-micro" # 1 vcpu (0.2), 0.6 GB memory
g1="g1-small" # 1 vcpu (0.5), 1.7 GB memory
e2m="e2-medium" # 2 vcpu (1), 4 GB memory
e22="e2-standard-2" # 2 vcpu, 8 GB memory
n24="n2-standard-4" # 4 vcpu, 16 GB memory
cpu="micro" # for custom type
mem="10GB" # for custom type
custom="--custom-vm-type=e2 --custom-cpu=small --custom-memory=4GB" # 2 vcpu (0.5), 4 GB memory
# change machine types (do this only after the installs are done and the machines are stopped)
gcloud compute instances set-machine-type $consumerVM --machine-type g1-small
# gcloud compute instances set-machine-type $nconductVM --machine-type $e22
gcloud compute instances set-machine-type $nconductVM --custom-vm-type=e2 --custom-cpu=small --custom-memory=4GB
Run the consumer simulator long enough to get alerts in every counter
from broker_utils import consumer_sim
testid = 'metatrack'
survey = 'ztf'
instance = (survey, testid)
# alert_rate = (25, 'once')
alert_rate = 'ztf-active-avg'
runtime = (10, 'min')  # unit options: 'sec', 'min', 'hr', 'night' (= 10 hrs)
consumer_sim.publish_stream(alert_rate, instance, runtime)
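The `runtime` tuple pairs a number with a unit string. As a rough sketch of how such a tuple maps to a duration, the hypothetical helper `runtime_to_seconds` below converts it to seconds. It is not part of `broker_utils`; the only fact taken from the notes is that 'night' means 10 hours.

```python
# Seconds per unit; the unit names follow the options listed in the notes.
SECONDS_PER_UNIT = {
    "sec": 1,
    "min": 60,
    "hr": 3600,
    "night": 10 * 3600,  # the notes define 'night' as 10 hrs
}


def runtime_to_seconds(runtime):
    """Convert a (value, unit) tuple such as (10, 'min') to seconds.

    Hypothetical helper for illustration only.
    """
    value, unit = runtime
    if unit not in SECONDS_PER_UNIT:
        raise ValueError(
            f"unknown unit {unit!r}; expected one of {sorted(SECONDS_PER_UNIT)}"
        )
    return value * SECONDS_PER_UNIT[unit]


print(runtime_to_seconds((10, "min")))  # 600
```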
Stop the broker by publishing the END cue, which triggers the night conductor to shut everything down and process the streams.
topic="${survey}-cue_night_conductor-${testid}"
cue=END
gcloud pubsub topics publish "$topic" --message="$cue"