# docs/source/working-notes/early-dev/desc_broker_workshop.md

## DESC Broker Workshop

This document provides notes on the LSST-DESC Broker Workshop. [Link to Talks](https://drive.google.com/drive/folders/1sjYXbdwTID3VnzZNAkcjLbjRfpwNaO_n?usp=sharing)

### Thursday Session 1: Science Vision and Precursor Surveys

#### Lessons learned from microlensing follow-up (Rachel Street)

- Microlensing requires rapid follow-up (usually < 10 alerts per day)
- You need redundancy in follow-up locations to deal with weather and downtime
- Data should be shared in real time with clear publication guidelines
  - Clear policy from each team over use of their telescope time/data
  - Clear procedure if someone is interested in analyzing/publishing on a target/topic
  - Clearly identify who should be notified and when
  - Incentivize “playing nice”, penalize those who don’t
- Don't just build systems in advance - train people to use them in advance
- Avoid gold rush syndrome. Balance detailed follow-up of a small number of targets with broader follow-up of many targets

#### Black Hole Microlensing with Parallax (Nathan Golovich @ LLNL)

- Discussed why this topic is scientifically interesting
- How will the alert system handle long-timescale signals with weak signal-to-noise?
- How will the catalogue / broker handle objects that have large parallax?
- Understanding the optimal cadence for different objects is important for creating an observing strategy.

#### Microlensing with ZTF: Breaking in Spin with Brokers (Michael Medford @ LBNL)

- **Z**TF **A**lert **P**acket **I**nspection **T**ool
  - Capture
    - Ingest ZTF alert packet data through a Kafka consumer
    - Cross-match detections into long-duration light curves
  - Detect
    - Regularly filter for ongoing and completed microlensing events
    - Remove false positives, mainly variable stars
  - Characterize
    - Fit microlensing events to model parameters
    - Calculate optical depths, Einstein crossing time distributions, and dark matter constraints
- Existing brokers have specific science cases in mind that don't always scale to the general community.
- NERSC is capable of processing the data from 47 deg² every 30 seconds in real time (~O(10⁵) per night)
- Host of complementary services including HPSS tape archive, science gateways, NERSC Web Toolkit (NEWT) HTML API, Spin, etc.
- ZAPIT v0.5 is currently running on NERSC’s Spin pilot phase
  - Containers-as-a-Service platform based on Docker container technology
  - All the convenient benefits of Docker (DBs, web service, scalability) and all of the computational firepower of NERSC

#### The ZTF Coadd Facility (Danny Goldstein @ Caltech)

- Combines images from multiple observations to create deeper stacks.
- Reduces functional cadence, but increases the number of discovered objects

### Thursday Session 2: LSST Prompt Processing Data Products

#### LSST Prompt Data Products (Melissa Graham @ U. of Washington)

- Online LSST forum for DM [here](https://community.lsst.org/). See [ls.st/dmtn-102](https://ls.st/dmtn-102) for alert stream numbers.
- Alert production process:
  - New image reduced and calibrated
  - Difference image created
  - Source detection on difference image
  - Source association (by coordinate) and characterization
  - Alert packet assembled for SNR>5 detections
  - Alert sent to community
- Alert packet contents for 60s releases:
  - Difference image source parameters
    - ID, coordinate, flux, shape, SNR, association with static and moving catalogs
  - Difference image object parameters
    - ~12 month history of proper motion, parallax, mean flux, variability parameters, and ID for the latest Data Release deep stacks
  - Image stamps (FITS)
    - At least 6" by 6" with flux, variance, and mask extensions
    - Includes WCS, zero point, PSF, etc.
- You can request larger images than the postage stamp, but there will be up to a 24hr delay waiting for the daily data release.
- May or may not have CCid information.
- Publication times start when image readout ends and broker access is allowed to initiate.
- LSST has its own basic filtering service so that users can submit basic queries (e.g. SQL)
  - No cross-matching to other catalogs
  - No access to other LSST data products
  - A user can define a filter to go through alerts but can only receive a limited number of alerts (minimum of 20 full-size alerts per telescope visit out of 10,000 generated per visit). Remember that alerts have a 12 month history. You can filter on this information.
  - Minimum of 100 simultaneous users filtering the stream, but the number is limited.

#### Plans and Policies for LSST Alert Distribution (Eric Bellm @ U. of Washington)

- Key documents:
  - Plans and Policies for Alert Distribution (how will community brokers be chosen?) [ls.st/LDM-612](https://ls.st/LDM-612)
  - Data Products Definition Document LSE-163 (what will LSST alerts look like?) [ls.st/dpdd](https://ls.st/dpdd)
  - Call for Letters of Intent for Community Alert Brokers (how do I apply to be a community broker?) [ls.st/LDM-682](https://ls.st/LDM-682)
  - LSST Alerts: Key Numbers (how many? how much?
    how often?) [dmtn-102.lsst.io](https://dmtn-102.lsst.io)
- The number of community brokers will be finite.
  - Outbound bandwidth from the datacenter is the expected bottleneck; 10 Gbps allocated as a baseline
  - Current expectation for the number of supported brokers is ~7
- Require demonstration of technical capability & appropriate personnel
- No requirement to receive the full stream
- No requirement to redistribute the full stream
- No requirement to make products world public
- Will favor proposals that offer these!
- The selection process has two phases: an open call for Letters of Intent, and an invitational call for full proposals.
- Brokers must demonstrate adequate resources
  - Large inbound and outbound network bandwidth (the full alert stream is a few TB/night)
  - Petabytes of disk capacity
  - Databases handling billions of sources
  - Compute resources to handle sophisticated classification and filtering tasks in real time at scale
  - Appropriate personnel to develop and maintain the service
  - Institutional & funding support to ensure the longevity and stability of the service.
- Brokers will be evaluated on their contribution to the scientific utilization of LSST.
  - Serve a large community
  - Enable high-profile science
  - Provide unique capabilities
  - Contribute to LSST’s four science pillars
  - Take advantage of the unique aspects of the LSST alert stream (real-time, world-public)
- Letters of intent due in May. Afterwards, a 3-day workshop, week of June 17, 2019, Seattle, WA. Participation is by invitation, for LOI submitters and LSST Project personnel
- Letters of intent due May 15, 2019 ([Submission template](https://github.com/lsst/LDM))
- Proposals can still be in formative stages.
- ZTF
  - Alert Packet Tools: [https://zwicky.tf/4t5](https://zwicky.tf/4t5)
  - Alert Schema Documentation: [https://zwicky.tf/dm5](https://zwicky.tf/dm5)
  - Detailed Pipelines documentation: [https://zwicky.tf/ykv](https://zwicky.tf/ykv)
  - PASP instrument papers: [https://zwicky.tf/3w9](https://zwicky.tf/3w9)
- LSST
  - Data Products Definition Document: [ls.st/dpdd](https://ls.st/dpdd)
  - Prototype Schemas: [https://github.com/lsst-dm/sample-avro-alert](https://github.com/lsst-dm/sample-avro-alert)
  - Kafka-based Alert Stream: [https://github.com/lsst-dm/alert_stream](https://github.com/lsst-dm/alert_stream)
- An alert is around 82 KB
- ZTF alerts are available in bulk [here](https://ztf.uw.edu/alerts/public/)
- Example code for processing alerts is available [here](https://zwicky.tf/bq6)

### Thursday Session 3: Broker Components

#### Connexions between LSST-DESC Broker and Machine Learning (Emille E. O. Ishida @ Université Clermont-Auvergne)

- Complete representation is not possible in astronomical training sets for machine learning (ML).
- One solution is to implement an active learning technique where human inspection is used to supplement the training process.
- This can be applied to multiple types of ML models.
- Human inspection does not scale to an online learning strategy with data sets as large as LSST. However, it does provide a better training set for an initial, offline stage.
- This approach can bias you to a particular science case, and needs to be re-run for multiple science objectives.
- You can work in a "None of the above" category.

#### RAPID - Real-time Automated Photometric IDentification (Daniel Muthukrishna @ University of Cambridge)

- Trained on the PLAsTiCC data set.
- SN identification is very similar to voice analysis: quiet followed by a sudden increase in signal on multiple frequencies.
- Publicly available via pip
- Designed to classify over time, updating classification percentages as more observations become available

### Thursday Session 4: Infrastructure in Development

#### ANTARES (Gauthem):

- Classify objects, provide summary of object properties, and allow users to apply personalized filters
- Expected to scale easily to LSST
- Don't apply 1 ML model to find all objects. Train multiple models to find specific objects.
- Following LSST DM, ANTARES is dockerized
- RAPID training is built in.
- Thinking about Google Cloud and Amazon for deployment.
- Uses SciServer and JupyterHub to provide a web front end.

#### Lasair ()

- Backend development in place - still building front end API.
- Running on JupyterHub.
- Has half an exabyte of storage.

### Friday Session 5: Additional Talks and a Group Discussion on “Charting the Course Forward”

#### NERSC support (Debbie Bard @ NERSC)

- Cori is generation NERSC-8. NERSC-9 (Perlmutter) comes online in 2020 and includes the addition of GPUs instead of just CPUs.
- NERSC-9 is targeted at data applications and simulations
  - NVIDIA GPU-accelerated and AMD CPU-only nodes
  - Back-end codes, including ML codes, will come optimized "out of the box" so users won't have to tune them.
  - All-flash file system increases I/O speeds, a big improvement for ML, which involves a lot of random reads
- Spin is a side service running on separate hardware for projects that don't need access to the full supercomputer resources (Jupyter, web interfaces, etc.).
  - Spin is where brokers will live.
  - Based on Docker containers.

#### PLAsTiCC update (Renae):

- Data and models will be made public

#### SkyPortal (Stéfan van der Walt @ Berkeley)

- SkyPortal is an open source data access portal
- Scalable from laptop to cloud services
- Dockerized
- Low overhead
  - Fast
  - Minimum maintenance
- Includes authentication and admin controls

### Random Thoughts

- Any broker system needs to have redundancies to protect against downtime
- Don't just build systems in advance - train people to use them in advance
- Brokers serve to simplify and filter data for easier consumption by the community. This can include a combination of rapid processing filters and slower, more in-depth analysis.
- Finding objects is more than nightly image subtraction. You also apply models to find deviations from expected behavior.
- How do you encourage follow-up data?
- Most brokers have a bias to specific science goals. This leaves room for a "None of the Above" broker.
- LSST templates for difference images will change annually. How to develop year 1 templates is still being researched.
- The system should be dockerized
- Allow users to specify their own filters (on the entire stream) and watch lists (cuts on the area on the sky)
- There are publicly available ML classifiers that can be applied (e.g. RAPID and PLAsTiCC).
- What "value added" products should we include? Look to existing brokers for inspiration.
- Log the version of each analysis step so you can track changes over time.
- SkyPortal can be used as a front end (more scalable, astronomy-focused alternative to Flask, Django, etc.)
- Follow-up scheduler: https://tomtoolkit.github.io
- DESC doesn't want its own broker but is concerned about understanding the selection function and classification efficiency well.
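
The idea of combining user-defined filters with watch lists (cuts on sky area), noted above, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Alert` and `WatchRegion` classes, their field names, and the `select_alerts` helper are hypothetical and do not correspond to any real LSST or ZTF alert schema or broker API. The watch list is modeled as a set of circular regions, and the separation uses the standard haversine formula.

```python
import math
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass(frozen=True)
class Alert:
    """Hypothetical, minimal alert record (real alert packets carry far more)."""
    alert_id: str
    ra_deg: float
    dec_deg: float
    snr: float


@dataclass(frozen=True)
class WatchRegion:
    """Hypothetical circular region on the sky, defined by center and radius."""
    ra_deg: float
    dec_deg: float
    radius_deg: float


def angular_separation_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation in degrees, using the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2)
    sin_dra = math.sin((ra2 - ra1) / 2)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def in_watch_list(alert: Alert, regions: Iterable[WatchRegion]) -> bool:
    """True if the alert position falls inside at least one watch region."""
    return any(
        angular_separation_deg(alert.ra_deg, alert.dec_deg, r.ra_deg, r.dec_deg)
        <= r.radius_deg
        for r in regions
    )


def select_alerts(
    alerts: Iterable[Alert],
    user_filter: Callable[[Alert], bool],
    watch_list: List[WatchRegion],
) -> Iterator[Alert]:
    """Yield alerts that pass the user's filter AND lie in the watch list."""
    for alert in alerts:
        if user_filter(alert) and in_watch_list(alert, watch_list):
            yield alert


# Example usage with made-up values.
example_alerts = [
    Alert("a", 10.0, 20.0, 8.0),    # inside the region, high SNR
    Alert("b", 10.0, 20.0, 3.0),    # inside the region, low SNR
    Alert("c", 190.0, -20.0, 9.0),  # outside the region
]
example_watch_list = [WatchRegion(ra_deg=10.0, dec_deg=20.0, radius_deg=0.5)]
selected = list(select_alerts(example_alerts, lambda al: al.snr > 5, example_watch_list))
print([al.alert_id for al in selected])
```

Keeping the filter as a plain callable separates the user's science cut from the positional cut, so either can be swapped without touching the other. A production broker would more likely rely on a spatial index than on a linear scan over regions.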