desc_broker_workshop.md

DESC Broker Workshop

This document provides notes on the LSST-DESC Broker Workshop. Link to Talks

Thursday Session 1: Science Vision and Precursor Surveys

Lessons learned from microlensing follow-up (Rachel Street)

Microlensing requires rapid followup (Usually < 10 alerts per day)
You need redundancy in followup locations to deal with weather and down time
Data should be shared in real time with clear publication guidelines
- Clear policy from each team over use of their telescope time/data
- Clear procedure if someone is interested in analyzing/publishing on a target/topic
- Clearly identify who should be notified and when
- Incentivize “playing nice”, penalize those who don’t
Don’t just build systems in advance - train people to use them in advance
Avoid gold rush syndrome. Balance detailed follow-up of a small number of targets against broader follow-up of many targets.

Black Hole Microlensing with Parallax (Nathan Golovich @ LLNL)

Discussed why this topic is scientifically interesting
How will the alert system handle long time scale signals with weak signal to noise?
How will the catalogue / broker handle objects that have large parallax?
Understanding the optimal cadence for different objects is important for creating an observing strategy.

Microlensing with ZTF: Breaking in Spin with Brokers (Michael Medford @ LBNL)

ZTF Alert Packet Inspection Tool
Capture
- Ingest ZTF alert packet data through Kafka consumer
- Cross-match detections into long duration light-curves
Detect
- Regularly filter for ongoing and completed microlensing events
- Remove false positives, mainly variable stars
Characterize
- Fit microlensing events to model parameters
- Calculate optical depths, Einstein crossing time distributions, and dark matter constraints
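As a rough illustration of the "fit microlensing events to model parameters" step, the sketch below evaluates the standard point-source, point-lens (Paczyński) light curve and a chi-square against observed fluxes. This is the generic textbook model, not ZAPIT's code; all function and parameter names are chosen here for illustration.

```python
import math


def magnification(u):
    """Point-source, point-lens magnification for a lens-source
    separation u, expressed in Einstein radii."""
    return (u * u + 2.0) / (u * math.sqrt(u * u + 4.0))


def paczynski_flux(t, t0, u0, t_e, f_source, f_blend=0.0):
    """Model flux at time t.

    t0       time of closest approach
    u0       impact parameter (Einstein radii)
    t_e      Einstein crossing time (same units as t)
    f_source unmagnified source flux
    f_blend  unlensed blended flux (assumed zero by default)
    """
    u = math.sqrt(u0 ** 2 + ((t - t0) / t_e) ** 2)
    return f_source * magnification(u) + f_blend


def chi_square(params, times, fluxes, errors):
    """Goodness of fit for params = (t0, u0, t_e, f_source)."""
    t0, u0, t_e, f_source = params
    return sum(
        ((f - paczynski_flux(t, t0, u0, t_e, f_source)) / e) ** 2
        for t, f, e in zip(times, fluxes, errors)
    )
```

A fitter would minimise `chi_square` over the four parameters; the fitted `t_e` values are what feed an Einstein crossing time distribution.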
Existing brokers have specific science cases in mind that don’t always scale to the general community.
NERSC is capable of processing the data from 47 deg^2 every 30 seconds in real time (~O(10^5) per night)
- Host of complementary services including HPSS tape archive, science gateways, NERSC Web Toolkit (NEWT) HTML API, Spin, etc.
ZAPIT v0.5 is currently running on NERSC’s Spin pilot phase
- Containers-as-a-Service platform based on Docker container technology
- All the convenient benefits of docker (DBs, web service, scalability) and all of the computational firepower of NERSC

The ZTF Coadd Facility (Danny Goldstein @ Caltech)

Combines images from multiple observations to create deeper stacks.
Reduces functional cadence, but increases the number of discovered objects
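The depth-versus-cadence trade-off can be made concrete with a small sketch. It assumes uncorrelated, background-limited noise and uses an inverse-variance weighted mean; this is a generic stacking recipe, not the ZTF Coadd Facility's actual pipeline, and the function names are illustrative.

```python
import math


def coadd(pixel_values, variances):
    """Inverse-variance weighted mean of one pixel across exposures.

    Returns (stacked value, variance of the stacked value).
    """
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    value = sum(w * p for w, p in zip(weights, pixel_values)) / total_weight
    return value, 1.0 / total_weight


def depth_gain_mag(n_exposures):
    """Approximate gain in limiting magnitude from stacking n exposures
    of equal noise, using S/N proportional to sqrt(n)."""
    return 2.5 * math.log10(math.sqrt(n_exposures))
```

Under these assumptions, stacking N visits gains about 1.25 log10(N) magnitudes of depth, while the N separate epochs collapse into a single one.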

Thursday Session 2: LSST Prompt Processing Data Products

LSST Prompt Data Products (Melissa Graham @ U. of Washington)

Online LSST forum for DM here. See ls.st/dmtn-102 for alert stream numbers.
Alert production process:
- New image reduced and calibrated
- Difference image created
- Source detection on difference image
- Source association (by coordinate) and characterization
- Alert packet assembled for SNR>5 detections
- Alert sent to community
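Two of the steps above, thresholding difference-image detections at SNR > 5 and associating them with known objects by coordinate, can be sketched as follows. The 1 arcsecond match radius, the data layout, and all function names are assumptions made for this example; the real LSST pipeline is considerably more involved.

```python
import math

SNR_THRESHOLD = 5.0  # the notes quote SNR > 5 for alert generation


def detect_sources(diff_flux, diff_sigma, threshold=SNR_THRESHOLD):
    """Indices of measurements whose |flux| / sigma exceeds the threshold.
    The absolute value is used because difference-image sources can be
    negative (the object faded relative to the template)."""
    return [
        i for i, (f, s) in enumerate(zip(diff_flux, diff_sigma))
        if abs(f) / s > threshold
    ]


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


def associate(source, objects, radius_arcsec=1.0):
    """Id of the nearest known object within the match radius, else None.

    source  : (ra_deg, dec_deg) of the new detection
    objects : dict mapping object id -> (ra_deg, dec_deg)
    """
    best_id, best_sep = None, radius_arcsec / 3600.0
    for obj_id, (ra, dec) in objects.items():
        sep = angular_separation_deg(source[0], source[1], ra, dec)
        if sep <= best_sep:
            best_id, best_sep = obj_id, sep
    return best_id
```

A detection for which `associate` returns `None` would start a new object record.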
Alert packet contents for 60s releases:
- Difference image source parameters
  - ID, coordinate, flux, shape, SNR, association with static and moving catalogs
- Difference image object parameters
  - ~12 month history of proper motion, parallax, mean flux, variability parameters, and ID for the latest Data Release deep stacks
- Image stamps (FITS)
  - At least 6” by 6” with flux, variance, and mask extensions
  - Includes WCS, zero point, PSF, etc.
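To make the packet layout easier to picture, here is a minimal sketch of the three groups listed above as Python dataclasses. Every class and field name is hypothetical; the real packets follow the Avro schemas referenced in these notes, and only the grouping (source, object with history, stamps) is taken from the talk.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DifferenceSource:
    """One detection on a difference image (hypothetical field names)."""
    source_id: int
    ra_deg: float
    dec_deg: float
    flux: float
    snr: float
    associated_object_id: Optional[int] = None  # static or moving catalog match


@dataclass
class DifferenceObject:
    """Summary of the associated object, built from ~12 months of history."""
    object_id: int
    mean_flux: float
    parallax_mas: Optional[float] = None
    proper_motion_mas_yr: Optional[float] = None
    data_release_id: Optional[int] = None  # ID in the latest Data Release stack
    history: List[DifferenceSource] = field(default_factory=list)


@dataclass
class Stamp:
    """Image cutout with the three extensions named in the notes."""
    flux: bytes
    variance: bytes
    mask: bytes


@dataclass
class AlertPacket:
    source: DifferenceSource
    obj: DifferenceObject
    stamps: List[Stamp] = field(default_factory=list)

    def passes_snr_cut(self, threshold: float = 5.0) -> bool:
        """True if the triggering detection exceeds the SNR threshold."""
        return self.source.snr > threshold
```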
You can request larger images than the postage stamp but there will be up to a 24hr delay waiting for the daily data release.
May or may not have CCid information.
Publication time is counted from the end of image readout until brokers are allowed to begin access.
LSST has its own basic filtering service so that users can submit basic queries (e.g., SQL)
- No cross-matching to other catalogs
- No access to other LSST data products
- A user can define a filter to run over the alerts but can receive only a limited number of them (a minimum of 20 full-size alerts per telescope visit, out of 10,000 generated per visit). Remember that alerts carry a 12-month history; you can filter on this information.
- Minimum of 100 simultaneous users filtering the stream, but the number is limited.
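A user-defined filter with a per-visit cap might look like the sketch below. The alert layout (a dict with `snr` and `history_flux` keys) and both function names are invented for this example; the actual filtering service interface is not described in these notes. The cap of 20 is the per-visit figure quoted above.

```python
def run_user_filter(alerts, predicate, max_alerts=20):
    """Apply a user predicate to one visit's alerts and return at most
    max_alerts of those that pass, in stream order."""
    selected = []
    for alert in alerts:
        if predicate(alert):
            selected.append(alert)
            if len(selected) == max_alerts:
                break
    return selected


def bright_and_rising(alert, min_snr=10.0):
    """Example predicate that also uses the history carried in the alert:
    keep high-SNR detections whose last two history points are rising."""
    history = alert["history_flux"]
    return (
        alert["snr"] >= min_snr
        and len(history) >= 2
        and history[-1] > history[-2]
    )
```

Because the cap is applied after the predicate, a loose filter simply returns the first 20 matches of the visit, so tighter predicates make better use of the allowance.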

Plans and Policies for LSST Alert Distribution (Eric Bellm @ U. of Washington)

Key documents:
- Plans and Policies for Alert Distribution (how will community brokers be chosen?) ls.st/LDM-612
- Data Products Definition Document LSE-163 (what will LSST alerts look like?) ls.st/dpdd
- Call for Letters of Intent for Community Alert Brokers (how do I apply to be a community broker?) ls.st/LDM-682
- LSST Alerts: Key Numbers (how many? how much? how often?) dmtn-102.lsst.io
The number of community brokers will be finite.
- Outbound bandwidth from the datacenter is the expected bottleneck; 10 Gbps allocated as a baseline
- Current expectations for number of supported brokers is ~7
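A back-of-the-envelope check shows why outbound bandwidth limits the broker count. The sketch uses figures quoted elsewhere in these notes (about 82 KB per alert, 10,000 alerts per visit), assumes the 82 KB figure refers to LSST alerts, and ignores protocol overhead and compression.

```python
ALERT_SIZE_BYTES = 82_000       # ~82 KB per alert, as quoted in the notes
ALERTS_PER_VISIT = 10_000       # alerts generated per visit, as quoted
OUTBOUND_BITS_PER_S = 10e9      # 10 Gbps baseline allocation
N_BROKERS = 7                   # current expectation for supported brokers

# Size of one visit's full alert stream.
bytes_per_visit = ALERT_SIZE_BYTES * ALERTS_PER_VISIT

# Time to push one visit's full stream to a single broker, and to all brokers
# when each receives its own copy over the shared link.
seconds_one_broker = bytes_per_visit * 8 / OUTBOUND_BITS_PER_S
seconds_all_brokers = seconds_one_broker * N_BROKERS

print(f"{bytes_per_visit / 1e6:.0f} MB per visit")
print(f"{seconds_one_broker:.2f} s for one broker")
print(f"{seconds_all_brokers:.2f} s for {N_BROKERS} brokers")
```

Under these assumptions each additional full-stream broker costs roughly 0.66 s of link time per visit, which has to fit inside the alert release window.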
Require demonstration of technical capability & appropriate personnel
- No requirement to receive the full stream
- No requirement to redistribute the full stream
- No requirement to make products world public
- Will favor proposals that offer these!
The selection process has two phases: an open call for Letters of Intent, and an invitational call for full proposals.
Brokers must demonstrate adequate resources
- Large inbound and outbound network bandwidth (the full alert stream is a few TB/night)
- Petabytes of disk capacity
- Databases capable of handling billions of sources
- Compute resources to handle sophisticated classification and filtering tasks in real time at scale
- Appropriate personnel to develop and maintain the service
- Institutional & funding support to ensure the longevity and stability of the service
Brokers will be evaluated on their contribution to the scientific utilization of LSST.
- Serve a large community
- Enable high-profile science
- Provide unique capabilities
- Contribute to LSST’s four science pillars
- Take advantage of the unique aspects of the LSST alert stream (real-time, world-public)
Letters of intent are due in May. Afterwards there will be a 3-day workshop the week of June 17, 2019, in Seattle, WA; participation is by invitation, for LOI submitters and LSST Project personnel.
Letters of intent due May 15, 2019 (Submission template)
- Proposals can still be in formative stages.
ZTF
- Alert Packet Tools: https://zwicky.tf/4t5
- Alert Schema Documentation: https://zwicky.tf/dm5
- Detailed Pipelines documentation: https://zwicky.tf/ykv
- PASP instrument papers: https://zwicky.tf/3w9
LSST
- Data Products Definition Document: ls.st/dpdd
- Prototype Schemas: https://github.com/lsst-dm/sample-avro-alert
- Kafka-based Alert Stream: https://github.com/lsst-dm/alert_stream
Alert is around 82 KB
ZTF alerts are available in bulk here
Example code for processing alerts is available here

Thursday Session 3: Broker Components

Connexions between LSST-DESC Broker and Machine Learning (Emille E. O. Ishida @ Université Clermont-Auvergne)

Complete representation is not possible in astronomical training sets for machine learning (ML).
One solution is to implement an active learning technique where human inspection is used to supplement the training process.
This can be applied to multiple types of ML models.
Human inspection does not scale to an online learning strategy with data sets as large as LSST. However it does provide a better training set for an initial, offline stage.
This approach can bias you to a particular science case, and needs to be re-run for multiple science objectives.
You can work in a “None of the above” category.
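One common way to choose which objects go to human inspection is uncertainty sampling: query the objects whose predicted class probabilities are closest to uniform. The notes do not say which query strategy the talk used, so the sketch below is only an assumed example, with hypothetical names.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a list of class probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_labeling(predictions, budget):
    """Pick the objects the classifier is least sure about.

    predictions : dict mapping object id -> list of class probabilities
    budget      : number of objects a human can inspect this round
    """
    ranked = sorted(
        predictions,
        key=lambda obj_id: entropy(predictions[obj_id]),
        reverse=True,
    )
    return ranked[:budget]
```

After the selected objects are labeled they are added to the training set and the model is retrained, which is the loop that has to be repeated for each science objective.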

RAPID - Real-time Automated Photometric IDentification (Daniel Muthukrishna @ University of Cambridge)

Trained on PLAsTiCC data set.
SN identification is very similar to voice analysis: quiet followed by a sudden increase in signal on multiple frequencies.
Publicly available via pip
Designed to classify over time, updating classification percentages as more observations become available
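The idea of classification percentages that evolve as observations arrive can be illustrated with a plain Bayesian update. This is explicitly not RAPID's algorithm or API (RAPID is a trained neural network); it is a substituted toy technique with made-up class names and likelihoods, shown only to make "updating over time" concrete.

```python
def update_probabilities(probs, likelihoods):
    """One Bayes step: weight each class by the likelihood of the newest
    observation under that class, then renormalise.

    probs       : dict class name -> current probability
    likelihoods : dict class name -> likelihood of the new observation
    """
    unnormalised = {c: probs[c] * likelihoods[c] for c in probs}
    total = sum(unnormalised.values())
    if total == 0.0:
        # The observation is impossible under every class; keep the prior.
        return dict(probs)
    return {c: v / total for c, v in unnormalised.items()}


def classify_over_time(prior, likelihood_sequence):
    """Return the probabilities after each successive observation."""
    history = []
    current = dict(prior)
    for likelihoods in likelihood_sequence:
        current = update_probabilities(current, likelihoods)
        history.append(current)
    return history
```

Feeding in one likelihood dictionary per epoch yields a probability track per class, which is the kind of output a time-dependent classifier reports.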

Thursday Session 4: Infrastructure in Development

ANTARES (Gautham):

Classify objects, provide summary of object properties, and allow users to apply personalized filters
Expected to scale easily to LSST
Don’t apply one ML model to find all objects. Train multiple models to find specific objects.
Following LSST DM, ANTARES is Dockerized
RAPID training is built in.
Thinking about Google Cloud and Amazon for deployment.
Uses SciServer and JupyterHub to provide a web front end.

Lasair ()

Backend development in place - still building front-end API.
Running on JupyterHub.
Has half an exabyte of storage.

Friday Session 5: Additional Talks and a Group Discussion on “Charting the Course Forward”

NERSC support (Debbie Bard @ NERSC)

Cori is generation NERSC-8. NERSC-9 (Perlmutter) comes online in 2020 and adds GPUs rather than being CPU-only.
NERSC-9 is targeted at data applications and simulations
NVIDIA GPU-accelerated and AMD CPU-only nodes
Back end codes, including ML codes, will come optimized “out of the box” so users won’t have to tune them.
All-flash file system increases I/O speeds, which is a big improvement for ML, which involves a lot of random reads
Spin is a side service running on separate hardware for projects that don’t need access to the full supercomputer resources (Jupyter, web interfaces, etc.).
- Spin is where brokers will live.
- Based on docker containers.

PLAsTiCC update (Renae):

Data and models will be made public

SkyPortal (Stéfan van der Walt @ Berkeley)

SkyPortal is an open source data access portal
Scalable from laptop to cloud services
Dockerized
Low overhead - fast - minimum maintenance
Includes authentication and admin controls

Random Thoughts

Any broker system needs to have redundancies to protect against downtime
Don’t just build systems in advance - train people to use them in advance
Brokers serve to simplify and filter data for easier consumption by the community. This can include a combination of rapid processing filters and slower, more in-depth analysis.
Finding objects is more than nightly image subtraction. You also apply models to find deviations from expected behavior.
How do you encourage follow up data?
Most brokers have a bias to specific science goals. This leaves room for a “None of the Above” broker.
LSST templates for difference images will change annually. How to develop year 1 templates is still being researched.
The system should be dockerized
Allow users to specify their own filters (on the entire stream) and watch lists (cuts on the area on the sky)
There are publicly available ML classifiers that can be applied (e.g., RAPID and PLAsTiCC).
What “value added” products should we include? Look to existing brokers for inspiration.
Log the version of each analysis step so you can track changes over time.
SkyPortal can be used as a front end (a more scalable, astronomy-focused alternative to Flask, Django, etc.)
Follow-up scheduler: https://tomtoolkit.github.io
DESC doesn’t want its own broker but is concerned about understanding the selection function and classification efficiency well.
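One of the ideas listed above, watch lists as cuts on an area of the sky, can be sketched as a set of named cones tested against each alert position. The watch list layout and function names are assumptions for this example. The dot-product test is adequate for degree-scale cones but loses precision for arcsecond-scale radii.

```python
import math


def unit_vector(ra_deg, dec_deg):
    """Cartesian unit vector for a sky position given in degrees."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (
        math.cos(dec) * math.cos(ra),
        math.cos(dec) * math.sin(ra),
        math.sin(dec),
    )


def in_cone(ra_deg, dec_deg, cone):
    """True if the position lies inside cone = (ra_deg, dec_deg, radius_deg)."""
    centre_ra, centre_dec, radius_deg = cone
    a = unit_vector(ra_deg, dec_deg)
    b = unit_vector(centre_ra, centre_dec)
    cos_separation = sum(x * y for x, y in zip(a, b))
    return cos_separation >= math.cos(math.radians(radius_deg))


def matching_watch_lists(ra_deg, dec_deg, watch_lists):
    """Names of all watch-list cones that contain the alert position."""
    return [
        name for name, cone in watch_lists.items()
        if in_cone(ra_deg, dec_deg, cone)
    ]
```

A broker could run `matching_watch_lists` on every incoming alert and route it to the owners of the matching lists, independently of any stream-wide filters they have defined.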