
DESC Broker Workshop

This document provides notes on the LSST-DESC Broker Workshop. Link to Talks

Thursday Session 1: Science Vision and Precursor Surveys

Lessons learned from microlensing follow-up (Rachel Street)

  • Microlensing requires rapid follow-up (usually fewer than 10 alerts per day)

  • You need redundancy in follow-up locations to deal with weather and downtime

  • Data should be shared in real time with clear publication guidelines

    • A clear policy from each team on the use of their telescope time/data

    • Clear procedure if someone is interested in analyzing/publishing on a target/topic

    • Clearly identify who should be notified and when

    • Incentivize “playing nice” and penalize those who don’t

  • Don’t just build systems in advance - train people to use them in advance

  • Avoid gold rush syndrome: balance detailed follow-up of a small number of targets with broader follow-up of many targets

Black Hole Microlensing with Parallax (Nathan Golovich @ LLNL)

  • Discussed why this topic is scientifically interesting

  • How will the alert system handle long-timescale signals with weak signal-to-noise?

  • How will the catalogue / broker handle objects that have large parallax?

  • Understanding the optimal cadence for different objects is important for creating an observing strategy.

Microlensing with ZTF: Breaking in Spin with Brokers (Michael Medford @ LBNL)

  • ZTF Alert Packet Inspection Tool (ZAPIT)

  • Capture

    • Ingest ZTF alert packet data through a Kafka consumer

    • Cross-match detections into long duration light-curves

  • Detect

    • Regularly filter for ongoing and completed microlensing events

    • Remove false positives, mainly variable stars

  • Characterize

    • Fit microlensing models to events to estimate their parameters

    • Calculate optical depths, Einstein crossing time distributions, and dark matter constraints

  • Existing brokers have specific science cases in mind that don’t always scale to the general community.

  • NERSC can process the data from 47 deg² every 30 seconds in real time (~O(10⁵) per night)

    • It also offers a host of complementary services, including the HPSS tape archive, science gateways, the NERSC Web Toolkit (NEWT) REST API, Spin, etc.

  • ZAPIT v0.5 is currently running in the pilot phase of NERSC’s Spin service

    • Spin is a Containers-as-a-Service platform based on Docker container technology

    • All the convenient benefits of Docker (DBs, web services, scalability) and all of the computational firepower of NERSC
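
The Characterize step fits microlensing events to model parameters such as the Einstein crossing time. A minimal sketch, assuming the standard point-source, point-lens (Paczyński) magnification, unit weights, and a brute-force grid over (t0, tE, u0) with the source and blend fluxes solved linearly; this is not ZAPIT's code, and the function names are hypothetical.

```python
import math

def magnification(t, t0, t_e, u0):
    """Point-source, point-lens (Paczynski) magnification at time t.

    u0 must be > 0 so that the impact parameter u never reaches zero."""
    u = math.hypot(u0, (t - t0) / t_e)
    return (u * u + 2.0) / (u * math.sqrt(u * u + 4.0))

def fit_fluxes(times, fluxes, t0, t_e, u0):
    """For fixed (t0, tE, u0), solve F = fs * A(t) + fb by linear least squares.

    Returns (fs, fb, chi2) with unit weights."""
    a = [magnification(t, t0, t_e, u0) for t in times]
    n = len(a)
    sa, sf = sum(a), sum(fluxes)
    saa = sum(x * x for x in a)
    saf = sum(x * f for x, f in zip(a, fluxes))
    det = n * saa - sa * sa  # nonzero as long as A(t) is not constant
    fs = (n * saf - sa * sf) / det
    fb = (sf - fs * sa) / n
    chi2 = sum((f - (fs * x + fb)) ** 2 for x, f in zip(a, fluxes))
    return fs, fb, chi2

def grid_fit(times, fluxes, t0_grid, te_grid, u0_grid):
    """Brute-force search over the nonlinear parameters.

    Returns the best (chi2, t0, tE, u0, fs, fb)."""
    best = None
    for t0 in t0_grid:
        for t_e in te_grid:
            for u0 in u0_grid:
                fs, fb, chi2 = fit_fluxes(times, fluxes, t0, t_e, u0)
                if best is None or chi2 < best[0]:
                    best = (chi2, t0, t_e, u0, fs, fb)
    return best
```

A grid keeps the sketch dependency-free; a real pipeline would refine the best grid point with a nonlinear optimizer and use per-point flux errors.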

The ZTF Coadd Facility (Danny Goldstein @ Caltech)

  • Combines images from multiple observations to create deeper stacks.

  • Reduces the effective cadence but increases the number of discovered objects
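
Why stacking trades cadence for depth can be sketched with simple inverse-variance weighting. This assumes the input images are already aligned and PSF-matched, with a known per-image noise; it is an illustration, not necessarily the ZTF Coadd Facility's algorithm, and the function names are hypothetical.

```python
import math

def inverse_variance_coadd(images, sigmas):
    """Combine aligned images (equal-length lists of pixel values) with weights 1/sigma**2.

    Returns the coadded pixels and the per-pixel noise of the coadd."""
    weights = [1.0 / s ** 2 for s in sigmas]
    total_w = sum(weights)
    npix = len(images[0])
    coadd = [
        sum(w * img[i] for w, img in zip(weights, images)) / total_w
        for i in range(npix)
    ]
    coadd_sigma = 1.0 / math.sqrt(total_w)
    return coadd, coadd_sigma

def depth_gain_mag(n_exposures):
    """Limiting-magnitude gain from stacking n equal-noise exposures (SNR grows as sqrt(n))."""
    return 2.5 * math.log10(math.sqrt(n_exposures))
```

For example, stacking four equal-depth nights gains about 0.75 mag (2.5·log10(2)), at the cost of a single measurement per four nights.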

Thursday Session 2: LSST Prompt Processing Data Products

LSST Prompt Data Products (Melissa Graham @ U. of Washington)

  • The online LSST forum for DM is here. See ls.st/dmtn-102 for alert stream numbers.

  • Alert production process:

    • New image reduced and calibrated

    • Difference image created

    • Source detection on difference image

    • Source association (by coordinate) and characterization

    • Alert packet assembled for SNR>5 detections

    • Alert sent to community

  • Alert packet contents (alerts are released within 60 s):

    • Difference image source parameters

      • ID, coordinate, flux, shape, SNR, association with static and moving catalogs

    • Difference image object parameters

      • ~12-month history of proper motion, parallax, mean flux, variability parameters, and ID for the latest Data Release deep stacks

    • Image stamps (FITS)

      • At least 6” by 6” with flux, variance, and mask extensions

      • Includes WCS, zero point, PSF, etc.

  • You can request larger images than the postage stamp, but there will be up to a 24-hour delay while waiting for the daily data release.

  • Alerts may or may not include CCD ID information.

  • Publication time is measured from the end of image readout to the moment brokers are allowed to begin accessing the alert.

  • LSST has its own basic filtering service so that users can submit simple queries (e.g., SQL)

    • No cross-matching to other catalogs

    • No access to other LSST data products

    • A user can define a filter to run over the alerts but can receive only a limited number of them (a minimum of 20 full-size alerts per telescope visit, out of 10,000 generated per visit). Because alerts carry a 12-month history, filters can use that information.

    • The service supports a minimum of 100 simultaneous users filtering the stream, but the number is limited.
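
The last steps of alert production and the limited filtering service can be sketched as a toy pipeline. The record type and field names below are hypothetical simplifications, not the LSST alert schema, and truncating to the first passing alerts is an assumed policy; the SNR > 5 cut and the 20-alert allocation come from the numbers above.

```python
from dataclasses import dataclass, field
from typing import Callable, List

SNR_THRESHOLD = 5.0         # alerts are assembled for SNR > 5 detections
MAX_ALERTS_PER_VISIT = 20   # minimum per-user allocation quoted above

@dataclass
class DiaSource:
    """Hypothetical, heavily simplified difference-image source."""
    source_id: int
    ra: float
    dec: float
    flux: float
    flux_err: float
    history: List[float] = field(default_factory=list)  # stand-in for the ~12-month history

    @property
    def snr(self) -> float:
        # Difference-image fluxes can be negative, so use the magnitude of the flux.
        return abs(self.flux) / self.flux_err

def assemble_alerts(sources: List[DiaSource]) -> List[DiaSource]:
    """Keep only detections above the SNR threshold."""
    return [s for s in sources if s.snr > SNR_THRESHOLD]

def run_user_filter(alerts: List[DiaSource],
                    predicate: Callable[[DiaSource], bool],
                    limit: int = MAX_ALERTS_PER_VISIT) -> List[DiaSource]:
    """Apply a user predicate, then truncate to the per-visit allocation."""
    return [a for a in alerts if predicate(a)][:limit]

def brightening(alert: DiaSource) -> bool:
    """Example predicate on the history: current flux at least twice the historical mean."""
    if not alert.history:
        return False
    return alert.flux >= 2.0 * sum(alert.history) / len(alert.history)
```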

Plans and Policies for LSST Alert Distribution (Eric Bellm @ U. of Washington)

  • Key documents:

    • Plans and Policies for Alert Distribution (how will community brokers be chosen?) ls.st/LDM-612

    • Data Products Definition Document LSE-163 (what will LSST alerts look like?) ls.st/dpdd

    • Call for Letters of Intent for Community Alert Brokers (how do I apply to be a community broker?) ls.st/LDM-682

    • LSST Alerts: Key Numbers (how many? how much? how often?) dmtn-102.lsst.io

  • The number of community brokers will be finite.

    • Outbound bandwidth from the datacenter is the expected bottleneck; 10 Gbps allocated as a baseline

    • The current expectation for the number of supported brokers is ~7

  • Require demonstration of technical capability & appropriate personnel

    • No requirement to receive the full stream

    • No requirement to redistribute the full stream

    • No requirement to make products world public

    • Will favor proposals that offer these!

  • The selection process has two phases: an open call for Letters of Intent, and an invitational call for full proposals.

  • Brokers must demonstrate adequate resources

    • Large inbound and outbound network bandwidth (the full alert stream is a few TB/night)

    • Petabytes of disk capacity

    • Databases capable of handling billions of sources

    • Compute resources to handle sophisticated classification and filtering tasks in real time at scale

    • Appropriate personnel to develop and maintain the service

    • Institutional and funding support to ensure the longevity and stability of the service.

  • Brokers will be evaluated on their contribution to the scientific utilization of LSST.

    • Serve a large community

    • Enable high-profile science

    • Provide unique capabilities

    • Contribute to LSST’s four science pillars

    • Take advantage of the unique aspects of the LSST alert stream (real-time, world-public)

  • Letters of intent are due in May, followed by a 3-day workshop the week of June 17, 2019, in Seattle, WA, with participation by invitation for LOI submitters and LSST Project personnel

  • Letters of intent due May 15, 2019 (Submission template)

    • Proposals can still be in formative stages.

  • ZTF

  • LSST

  • An alert is around 82 KB

  • ZTF alerts are available in bulk here

  • Example code for processing alerts is available here
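
The figures above can be combined into a back-of-the-envelope check of why outbound bandwidth limits the number of brokers. This assumes decimal units, the ~82 KB alert size and 10,000 alerts per visit quoted in these notes, every broker receiving the full stream, and no compression or protocol overhead; DMTN-102 has the authoritative numbers.

```python
# Figures quoted in the talks above; decimal units (1 KB = 1000 bytes) assumed.
ALERT_SIZE_BYTES = 82_000
ALERTS_PER_VISIT = 10_000
N_BROKERS = 7
OUTBOUND_BITS_PER_S = 10e9  # 10 Gbps baseline

def visit_volume_bytes(alert_size=ALERT_SIZE_BYTES, alerts=ALERTS_PER_VISIT):
    """Bytes in one visit's worth of alerts sent to a single broker."""
    return alert_size * alerts

def seconds_to_ship_visit(n_brokers=N_BROKERS, bandwidth=OUTBOUND_BITS_PER_S):
    """Time to push one visit's full alert stream to every broker (no overhead, no compression)."""
    total_bits = visit_volume_bytes() * 8 * n_brokers
    return total_bits / bandwidth

# Fair share of the outbound link per broker, in Gbps.
per_broker_gbps = OUTBOUND_BITS_PER_S / N_BROKERS / 1e9
```

Under these assumptions each visit is about 820 MB per broker, and serving all seven brokers occupies the full 10 Gbps link for roughly 4.6 s per visit (about 1.4 Gbps per broker).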

Thursday Session 3: Broker Components

Connexions between LSST-DESC Broker and Machine Learning (Emille E. O. Ishida @ Université Clermont-Auvergne)

  • Fully representative training sets for machine learning (ML) are not possible in astronomy.

  • One solution is to implement an active learning technique where human inspection is used to supplement the training process.

  • This can be applied to multiple types of ML models.

  • Human inspection does not scale to an online learning strategy with data sets as large as LSST’s. However, it does provide a better training set for an initial, offline stage.

  • This approach can bias you to a particular science case, and needs to be re-run for multiple science objectives.

  • You can include a “None of the above” category.
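
Active learning with human inspection can be sketched with uncertainty sampling, one common query strategy: repeatedly ask the "human" to label the pool object the current model is least sure about. The nearest-centroid classifier on a single feature and the `oracle` function (standing in for human inspection) are hypothetical toys, not the method presented in the talk.

```python
def centroids(labeled):
    """Mean feature value per class from (x, label) pairs."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def margin(x, cents):
    """Gap between the two nearest class centroids; small means uncertain.

    Requires at least two classes among the labeled examples."""
    d = sorted(abs(x - c) for c in cents.values())
    return d[1] - d[0]

def active_learning(labeled, pool, oracle, n_queries):
    """Query the most uncertain pool object n_queries times, labeling each via the oracle."""
    labeled = list(labeled)
    pool = list(pool)
    for _ in range(min(n_queries, len(pool))):
        cents = centroids(labeled)
        query = min(pool, key=lambda x: margin(x, cents))
        pool.remove(query)
        labeled.append((query, oracle(query)))
    return labeled, centroids(labeled)
```

Queries concentrate near the decision boundary, which is why a small amount of inspection can improve the training set more than labeling random objects.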

RAPID - Real-time Automated Photometric IDentification (Daniel Muthukrishna @ University of Cambridge)

  • Trained on the PLAsTiCC data set.

  • SN identification is very similar to voice analysis: quiet, followed by a sudden increase in signal on multiple frequencies.

  • Publicly available via pip

  • Designed to classify over time, updating classification percentages as more observations become available

Thursday Session 4: Infrastructure in Development

Antares (Gauthem):

  • Classify objects, provide summary of object properties, and allow users to apply personalized filters

  • Expected to scale easily to LSST

  • Don’t apply one ML model to find all objects; train multiple models, each to find specific kinds of objects.

  • Following LSST DM, ANTARES is Dockerized

  • RAPID training is built in.

  • Considering Google Cloud and Amazon for deployment.

  • Uses SciServer and JupyterHub to provide a web front end.
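
The "many specialized models" approach can be sketched as a registry of per-class detectors whose outputs are combined, with a catch-all tag when nothing fires. The detectors, feature names, and threshold are hypothetical placeholders for trained models, not the ANTARES API.

```python
from typing import Callable, Dict, List

# A detector maps an object's feature dict to a score in [0, 1].
Detector = Callable[[dict], float]

def tag_object(features: dict, detectors: Dict[str, Detector], threshold: float = 0.5) -> List[str]:
    """Run every specialized detector and keep the classes whose score passes the threshold."""
    tags = [name for name, detect in detectors.items() if detect(features) >= threshold]
    # Nothing fired: leave the object in a catch-all bucket instead of forcing a label.
    return tags or ["none_of_the_above"]

# Hypothetical detectors and feature names, for illustration only.
DETECTORS: Dict[str, Detector] = {
    "fast_riser": lambda f: 1.0 if f.get("rise_mag_per_day", 0.0) > 1.0 else 0.0,
    "periodic_variable": lambda f: 1.0 if f.get("period_days") is not None else 0.0,
}
```

Because each detector is independent, one can be retrained or added without touching the others, and an object may legitimately carry more than one tag.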

Lasair

  • Backend development is in place; the front-end API is still being built.

  • Running on JupyterHub.

  • Has half an exabyte of storage.

Friday Session 5: Additional Talks and a Group Discussion on “Charting the Course Forward”

NERSC support (Debbie Bard @ NERSC)

  • Cori is the NERSC-8 generation system. NERSC-9 (Perlmutter) comes online in 2020 and adds GPUs alongside CPUs.

  • NERSC-9 is targeted at data applications and simulations

  • NVIDIA GPU-accelerated nodes and AMD CPU-only nodes

  • Back-end codes, including ML codes, will come optimized “out of the box” so users won’t have to tune them.

  • An all-flash file system increases I/O speeds, a big improvement for ML, which involves a lot of random reads

  • Spin is a side service running on separate hardware for projects that don’t need access to the full supercomputer resources (Jupyter, web interfaces, etc.).

    • Spin is where brokers will live.

    • Based on docker containers.

PLAsTiCC update (Renae):

  • Data and models will be made public

SkyPortal (Stéfan van der Walt @ Berkeley)

  • SkyPortal is an open source data access portal

  • Scalable from laptop to cloud services

  • Dockerized

  • Low overhead, fast, minimal maintenance

  • Includes authentication and admin controls

Random Thoughts

  • Any broker system needs to have redundancies to protect against downtime

  • Don’t just build systems in advance - train people to use them in advance

  • Brokers serve to simplify and filter data for easier consumption by the community. This can include a combination of rapid processing filters and slower, more in-depth analysis.

  • Finding objects is more than nightly image subtraction. You also apply models to find deviations from expected behavior.

  • How do you encourage follow-up data?

  • Most brokers have a bias toward specific science goals. This leaves room for a “None of the Above” broker.

  • LSST templates for difference images will change annually. How to develop year 1 templates is still being researched.

  • The system should be Dockerized

  • Allow users to specify their own filters (on the entire stream) and watch lists (cuts on the area on the sky)

  • There are publicly available ML classifiers that can be applied (e.g., RAPID and the PLAsTiCC models).

  • What “value added” products should we include? Look to existing brokers for inspiration.

  • Log the version of each analysis step so you can track changes over time.

  • SkyPortal can be used as a front end (a more scalable, astronomy-focused alternative to Flask, Django, etc.)

  • Follow-up scheduler (TOM Toolkit): https://tomtoolkit.github.io

  • DESC doesn’t want its own broker but is concerned with understanding the selection function and classification efficiency well.
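
User watch lists and versioned analysis steps can be sketched together: match each alert position against user-defined sky regions with a cone search, and stamp every match with the version of the filter that produced it. The `WatchTarget` type, `FILTER_VERSION` string, and output fields are hypothetical; the separation uses the standard haversine formula.

```python
import math
from dataclasses import dataclass

FILTER_VERSION = "watchlist-0.1"  # hypothetical version tag recorded with every match

@dataclass
class WatchTarget:
    name: str
    ra_deg: float
    dec_deg: float
    radius_arcsec: float

def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation via the haversine formula; inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to guard against rounding pushing h slightly above 1.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0

def match_watchlist(alert_ra, alert_dec, watchlist):
    """Return one provenance-stamped record per watch-list target containing the alert."""
    matches = []
    for target in watchlist:
        sep = angular_separation_arcsec(alert_ra, alert_dec, target.ra_deg, target.dec_deg)
        if sep <= target.radius_arcsec:
            matches.append({"target": target.name,
                            "sep_arcsec": sep,
                            "filter_version": FILTER_VERSION})
    return matches
```

Recording the version alongside each result makes it possible to tell later which matches came from which revision of the filter.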