docs/source/working-notes/troyraen/metadata-tracking.md

Metadata Tracking System

todo

  • [ ] candid should be stored as an int, but alerts stream messages that can't be matched to a candid currently use the string f"{message_id}_unknown" instead.
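
A minimal sketch of the mixed typing this item describes. The helper names `candid_key` and `is_placeholder` are hypothetical and not part of the broker code; they only illustrate how an int candid and the string placeholder end up sharing one field.

```python
from typing import Optional, Union


def candid_key(candid: Optional[int], message_id: str) -> Union[int, str]:
    """Return the candid as an int, or the string placeholder when it is unknown.

    Hypothetical helper for illustration; not taken from the broker code.
    """
    if candid is None:
        # Placeholder currently used for unmatched alerts stream messages.
        return f"{message_id}_unknown"
    return int(candid)


def is_placeholder(key: Union[int, str]) -> bool:
    """True when the key is the string placeholder rather than a real candid."""
    return isinstance(key, str) and key.endswith("_unknown")
```

Any consumer of this field has to handle both types until the placeholder is replaced, which is the motivation for the todo item.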

Testing pieces

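# shell: set the environment and move to the cloud function's directory
export GCP_PROJECT=$GOOGLE_CLOUD_PROJECT
export SURVEY=ztf
export TESTID=metatrack
cd /Users/troyraen/Documents/broker/metadata/broker/cloud_functions/ps_to_gcs

# python: run in a session started from that directory
from broker_utils import data_utils, gcp_utils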
import troy_fncs as troy
import main


msgs = gcp_utils.pull_pubsub('ztf-alerts-reservoir', msg_only=False)
msg = msgs[0].message
attributes = {'kafka.topic': 'ztf_yyyymmdd'}
context = {'attributes': attributes, 'event_id': '1234'}

alert_bytes = msg.data  # assumption: the Pub/Sub message payload holds the alert bytes
blob, alert = main.upload_bytes_to_bucket(alert_bytes, attributes)
main.attach_file_metadata(blob, alert, context)
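
The stand-in attributes and context above can also be built by a small helper, sketched below. The name `mock_inputs` is hypothetical, the topic pattern is assumed from the 'ztf_yyyymmdd' placeholder in these notes, and the date is an arbitrary example value.

```python
from datetime import date


def mock_inputs(survey: str, night: date, event_id: str = "1234"):
    """Build stand-in Pub/Sub attributes and a context dict for local testing."""
    # Topic pattern assumed from the placeholder 'ztf_yyyymmdd' in these notes.
    attributes = {"kafka.topic": f"{survey}_{night:%Y%m%d}"}
    context = {"attributes": attributes, "event_id": event_id}
    return attributes, context


# Arbitrary example date, used only to show the shape of the result.
attributes, context = mock_inputs("ztf", date(2021, 9, 1))
print(attributes)
```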

Broker Testing Instance

Create/delete a broker testing instance

# get the code
git clone https://github.com/mwvgroup/Pitt-Google-Broker
cd Pitt-Google-Broker
git checkout tjr/metadata_tracking
cd broker/setup_broker

# create/delete the instance
survey="ztf"
testid="metatrack"
teardown="False"
# teardown="True"
./setup_broker.sh "$testid" "$teardown" "$survey"


# name some things
consumerVM="${survey}-consumer-${testid}"
nconductVM="${survey}-night-conductor-${testid}"

# https://cloud.google.com/compute/vm-instance-pricing
# https://cloud.google.com/compute/docs/instances/creating-instance-with-custom-machine-type#e2_shared-core_custom_machine_types
f1="f1-micro"  # 1 vcpu (0.2), 0.6 GB memory
g1="g1-small"  # 1 vcpu (0.5), 1.7 GB memory
e2m="e2-medium"  # 2 vcpu (1), 4 GB memory
e22="e2-standard-2"  # 2 vcpu, 8 GB memory
n24="n2-standard-4"  # 4 vcpu, 16 GB memory
cpu="micro"  # for custom type
mem="10GB"  # for custom type
custom="--custom-vm-type=e2 --custom-cpu=small --custom-memory=4GB" # 2 vcpu (0.5), 4 GB memory

# change machine types after the installs are done and the machines are off
gcloud compute instances set-machine-type $consumerVM --machine-type g1-small
# gcloud compute instances set-machine-type $nconductVM --machine-type $e22
gcloud compute instances set-machine-type $nconductVM --custom-vm-type=e2 --custom-cpu=small --custom-memory=4GB
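
The resource names in these notes follow a survey-role-testid pattern. A minimal Python sketch of that convention is below; the function name `resource_names` and the dictionary keys are hypothetical, and only the name patterns themselves come from the notes.

```python
def resource_names(survey: str, testid: str) -> dict:
    """Derive the testing instance's resource names from its survey and testid."""
    return {
        "consumer_vm": f"{survey}-consumer-{testid}",
        "night_conductor_vm": f"{survey}-night-conductor-{testid}",
        "cue_topic": f"{survey}-cue_night_conductor-{testid}",
    }


names = resource_names("ztf", "metatrack")
print(names["consumer_vm"])
```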

Run the consumer simulator long enough to get alerts in every counter

from broker_utils import consumer_sim

testid = 'metatrack'
survey = 'ztf'
instance = (survey, testid)
# alert_rate = (25, 'once')
alert_rate = 'ztf-active-avg'
runtime = (10, 'min')  # options: 'sec', 'min', 'hr', 'night'(=10 hrs)

consumer_sim.publish_stream(alert_rate, instance, runtime)
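
For reference, the runtime units listed in the comment above translate to seconds as sketched here. This is not the `consumer_sim` implementation; `runtime_in_seconds` is a hypothetical helper that only encodes the units named in these notes, with 'night' taken as 10 hours.

```python
# Unit lengths in seconds; 'night' is 10 hours per the note above.
UNIT_SECONDS = {"sec": 1, "min": 60, "hr": 3600, "night": 10 * 3600}


def runtime_in_seconds(runtime: tuple) -> int:
    """Convert a (amount, unit) tuple such as (10, 'min') to seconds."""
    amount, unit = runtime
    if unit not in UNIT_SECONDS:
        raise ValueError(f"unknown unit {unit!r}; expected one of {sorted(UNIT_SECONDS)}")
    return amount * UNIT_SECONDS[unit]


print(runtime_in_seconds((10, "min")))
```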

Stop the broker by publishing the END cue, which triggers the night conductor to shut everything down and process the streams.

topic="${survey}-cue_night_conductor-${testid}"
cue=END
gcloud pubsub topics publish "$topic" --message="$cue"
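
The same cue could be sent from Python, as sketched below. The function name `publish_cue` is hypothetical; it expects a publisher object with `topic_path` and `publish` methods, which is assumed here to be a `google.cloud.pubsub_v1.PublisherClient`.

```python
def publish_cue(publisher, project_id: str, topic: str, cue: str = "END"):
    """Publish a cue message to the night conductor's topic.

    `publisher` is assumed to be a google.cloud.pubsub_v1.PublisherClient,
    or any object offering the same `topic_path` and `publish` methods.
    """
    topic_path = publisher.topic_path(project_id, topic)
    # Pub/Sub message payloads must be bytes.
    return publisher.publish(topic_path, data=cue.encode("utf-8"))


# Usage, assuming the google-cloud-pubsub package and credentials are available:
# from google.cloud import pubsub_v1
# future = publish_cue(pubsub_v1.PublisherClient(), "my-project",
#                      "ztf-cue_night_conductor-metatrack")
# future.result()  # block until the message is published
```

Passing the publisher in as an argument keeps the function easy to exercise with a stand-in object when no cloud credentials are at hand.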