Welcome to the SOSP 2021 Website. Pollux simultaneously considers both aspects. Upon these two primitives, our system can scale to thousands of concurrent enclaves with high resource utilization and eliminate the high-cost initialization of secure memory using fork-style enclave creation without weakening the security guarantees. Simultaneous submission of the same work to multiple venues, submission of previously published work, or plagiarism constitutes dishonesty or fraud. We argue that a key-value interface between a file system and an SSD is superior to the legacy block interface by presenting KEVIN. (Visa applications can take at least 30 working days to process.) Submissions violating the detailed formatting and anonymization rules will not be considered for review. Professor Veloso is the Past President of AAAI (the Association for the Advancement of Artificial Intelligence), and the co-founder, Trustee, and Past President of RoboCup. Today, privacy controls are enforced by data curators with full access to data in the clear. There are two major GNN training obstacles: 1) it relies on high-end servers with many GPUs which are expensive to purchase and maintain, and 2) limited memory on GPUs cannot scale to today's billion-edge graphs. This formulation of memory management, which we call memory programming, is a generalization of paging that allows MAGE to provide a highly efficient virtual memory abstraction for SC. Moreover, to handle dynamic workloads, Nap adopts a fast NAL switch mechanism. After request completion, an I/O device must decide either to minimize latency by immediately firing an interrupt or to optimize for throughput by delaying the interrupt, anticipating that more requests will complete soon and help amortize the interrupt cost. Paper abstracts and proceedings front matter are available to everyone now. Cores can safely and concurrently read from their local kernel replica, eliminating remote NUMA accesses. Mothy received a PhD in 1995 from the Computer Laboratory of the University of Cambridge, where he was a principal designer and builder of the Nemesis OS. Amy Tai, VMware Research; Igor Smolyar, Technion Israel Institute of Technology; Michael Wei, VMware Research; Dan Tsafrir, Technion Israel Institute of Technology and VMware Research. We evaluate PrivateKube and DPF on microbenchmarks and an ML workload on Amazon Reviews data. We prove that DistAI is guaranteed to find the -free inductive invariant that proves the desired safety properties in finite time, if one exists. Message from the Program Co-Chairs. Evaluations show that Vegito can perform 1.9 million TPC-C NewOrder transactions and 24 TPC-H-equivalent queries per second simultaneously, which retain the excellent performance of specialized OLTP and OLAP counterparts (e.g., DrTM+H and MonetDB). HotCRP.com signin Sign in using your HotCRP.com account. 64 papers accepted out of 341 submitted. When registering your abstract, you must provide information about conflicts with PC members. Accepted papers will be allowed 14 pages in the proceedings, plus references. Papers so short as to be considered extended abstracts will not receive full consideration. A.H. Hunter, Jane Street Capital; Chris Kennelly, Paul Turner, Darryl Gove, Tipp Moseley, and Parthasarathy Ranganathan, Google. Lifting predicates and crash framing make the specification easy to use for developers, and logically atomic crash specifications allow for modular reasoning in GoJournal, making the proof tractable despite complex concurrency and crash interleavings. For conference information, . This paper demonstrates that it is possible to achieve s-scale latency using Linux kernel storage stack, even when tens of latency-sensitive applications compete for host resources with throughput-bound applications that perform read/write operations at throughput close to hardware capacity. In particular, responses must not include new experiments or data, describe additional work completed since submission, or promise additional work to follow. Mothy joined the Computer Science Department ETH Zurich in January 2007 and was named Fellow of the ACM in 2013 for contributions to operating systems and networking research. We propose a new framework for computing the embeddings of large-scale graphs on a single machine. JEL codes: Q18, Q28, Q57 . Hence, CLP enables efficient search and analytics on archived logs, something that was impossible without it. Metadata from voice calls, such as the knowledge of who is communicating with whom, contains rich information about peoples lives. Indeed, it is a prime target for powerful adversaries such as nation states. Consensus bugs are extremely rare but can be exploited for network split and theft, which cause reliability and security-critical issues in the Ethereum ecosystem. Sat, Aug 7, 2021 3 min read researches review. Prior or concurrent workshop publication does not preclude publishing a related paper in OSDI. Pollux improves scheduling performance in deep learning (DL) clusters by adaptively co-optimizing inter-dependent factors both at the per-job level and at the cluster-wide level. Graph Neural Networks (GNNs) have gained significant attention in the recent past, and become one of the fastest growing subareas in deep learning. OSDI will provide an opportunity for authors to respond to reviews prior to final consideration of the papers at the program committee meeting. MAGE outperforms the OS virtual memory system by up to an order of magnitude, and in many cases, runs SC computations that do not fit in memory at nearly the same speed as if the underlying machines had unbounded physical memory to fit the entire computation. OSDI is "a premier forum for discussing the design, implementation, and implications of systems software." A total of six research papers from the department were accepted to the . Submissions may include as many additional pages as needed for references but not for appendices. USENIX, like other scientific and technical conferences and journals, prohibits these practices and may, on the recommendation of a program chair, take action against authors who have committed them. Authors must limit their responses to (a) correcting factual errors in the reviews or (b) directly addressing questions posed by reviewers. DMon speeds up PostgreSQL, one of the most popular database systems, by 6.64% on average (up to 17.48%). There is no explicit limit to the response, but authors are strongly encouraged to keep it under 500 words; reviewers are neither required nor expected to read excessively long responses. We present application studies for 8 applications, improving requests-per-second (RPS) by 7.7% and reducing RAM usage 2.4%. Consensus bugs are bugs that make Ethereum clients transition to incorrect blockchain states and fail to reach consensus with other clients. Proceedings Cover | Our evaluation on the SPEC benchmarks shows that SanRazor can reduce the overhead of sanitizers significantly, from 73.8% to 28.062.0% for AddressSanitizer, and from 160.1% to 36.6124.4% for UndefinedBehaviorSanitizer (depending on the applied reduction scheme). USENIX new Date().getFullYear()>document.write(new Date().getFullYear()); Grants for Black Computer Science Students Application, Propose an interesting, compelling solution, Demonstrate the practicality and benefits of the solution, Clearly describe the paper's contributions, Clearly articulate the advances beyond previous work. We also welcome work that explores the interface to related areas such as computer architecture, networking, programming languages, analytics, and databases. We present DistAI, a data-driven automated system for learning inductive invariants for distributed protocols. To remedy this, we introduce DeSearch, the first decentralized search engine that guarantees the integrity and privacy of search results for decentralized services and blockchain apps. We identify that current systems for learning the embeddings of large-scale graphs are bottlenecked by data movement, which results in poor resource utilization and inefficient training. Despite having the same end goals as traditional ML, FL executions differ significantly in scale, spanning thousands to millions of participating devices. DMons targeted optimizations provide 16.83% speedup on average (up to 53.14%), compared to a baseline that uses the highest level of compiler optimization. Web pages today commonly include large amounts of JavaScript code in order to offer users a dynamic experience. This kernel is scaled across NUMA nodes using node replication, a scheme inspired by state machine replication in distributed systems. By submitting a paper, you agree that at least one of the authors will attend the conference to present it. Proceedings Front Matter Last year, 70% of accepted OSDI papers participated in the . We demonstrate that the hardware thread scheduler is able to lower RPC tail response time by about 5 while enabling the system to sustain 20% higher load, relative to traditional thread scheduling techniques. If you submit a paper to either of those venues, you may not also submit it to OSDI 21. OSDI brings together professionals from academic and industrial backgrounds in a premier forum for discussing the design, implementation, and implications of systems software. PET then automatically corrects results to restore full equivalence. The 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI '21) will take place as a virtual event on July 1416, 2021. NrOS replicates kernel state on each NUMA node and uses operation logs to maintain strong consistency between replicas. In this paper, we propose a software-hardware co-design to support dynamic, fine-grained, large-scale secure memory as well as fast-initialization. Papers not meeting these criteria will be rejected without review, and no deadline extensions will be granted for reformatting. Devices employ adaptive interrupt coalescing heuristics that try to balance between these opposing goals. If the conference registration fee will pose a hardship for the presenter of the accepted paper, please contact conference@usenix.org. Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning, Oort: Efficient Federated Learning via Guided Participant Selection, PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections, Modernizing File System through In-Storage Indexing, Nap: A Black-Box Approach to NUMA-Aware Persistent Memory Indexes, Rearchitecting Linux Storage Stack for s Latency and High Throughput, Optimizing Storage Performance with Calibrated Interrupts, ZNS+: Advanced Zoned Namespace Interface for Supporting In-Storage Zone Compaction, DMon: Efficient Detection and Correction of Data Locality Problems Using Selective Profiling, CLP: Efficient and Scalable Search on Compressed Text Logs, Polyjuice: High-Performance Transactions via Learned Concurrency Control, Retrofitting High Availability Mechanism to Tame Hybrid Transaction/Analytical Processing, The nanoPU: A Nanosecond Network Stack for Datacenters, Beyond malloc efficiency to fleet efficiency: a hugepage-aware memory allocator, Scalable Memory Protection in the PENGLAI Enclave, NrOS: Effective Replication and Sharing in an Operating System, Addra: Metadata-private voice communication over fully untrusted infrastructure, Bringing Decentralized Search to Decentralized Services, Finding Consensus Bugs in Ethereum via Multi-transaction Differential Fuzzing, MAGE: Nearly Zero-Cost Virtual Memory for Secure Computation, Zeph: Cryptographic Enforcement of End-to-End Data Privacy, It's Time for Operating Systems to Rediscover Hardware, DistAI: Data-Driven Automated Invariant Learning for Distributed Protocols, GoJournal: a verified, concurrent, crash-safe journaling system, STORM: Refinement Types for Secure Web Applications, Horcrux: Automatic JavaScript Parallelism for Resource-Efficient Web Computation, SANRAZOR: Reducing Redundant Sanitizer Checks in C/C++ Programs, Dorylus: Affordable, Scalable, and Accurate GNN Training with Distributed CPU Servers and Serverless Threads, GNNAdvisor: An Adaptive and Efficient Runtime System for GNN Acceleration on GPUs, Marius: Learning Massive Graph Embeddings on a Single Machine, P3: Distributed Deep Graph Learning at Scale. In this paper, we present P3, a system that focuses on scaling GNN model training to large real-world graphs in a distributed setting. The key to our solution, Horcrux, is to account for the non-determinism intrinsic to web page loads and the constraints placed by the browsers API for parallelism. The program co-chairs will use this information at their discretion to preserve the anonymity of the review process without jeopardizing the outcome of the current OSDI submission. We implement a variant of a log-structured merge tree in the storage device that not only indexes file objects, but also supports transactions and manages physical storage space. (Oct 2018) Awarded an Intel Faculty Grant for Research on automated performance optimization (Sep. 2018) Our paper on Foreshadow is accepted to appear at USENIX Security. The co-chairs may then share that paper with the workshops organizers and discuss it with them. Erhu Feng, Xu Lu, Dong Du, Bicheng Yang, and Xueqiang Jiang, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University; Engineering Research Center for Domain-specific Operating Systems, Ministry of Education, China; Yubin Xia, Binyu Zang, and Haibo Chen, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University; Shanghai AI Laboratory; Engineering Research Center for Domain-specific Operating Systems, Ministry of Education, China. We will look at various problems and approaches, and for each, see if blockchain would help. In this paper, we propose Oort to improve the performance of federated training and testing with guided participant selection. Second, GNNAdvisor implements a novel and highly-efficient 2D workload management tailored for GNN computation to improve GPU utilization and performance under different application settings. OSDI '21 Technical Sessions All the times listed below are in Pacific Daylight Time (PDT). Calibrated interrupts increase throughput by up to 35%, reduce CPU consumption by as much as 30%, and achieve up to 37% lower latency when interrupts are coalesced. Extensive experiments show that GNNAdvisor outperforms the state-of-the-art GNN computing frameworks, such as Deep Graph Library (3.02 faster on average) and NeuGraph (up to 4.10 faster), on mainstream GNN architectures across various datasets. First, it enables a caller to push a message to a callee in two hops, using a new way of assigning mailboxes to users that resembles how a post office assigns PO boxes to its customers. How can we design systems that will be reliable despite misbehaving participants? Professor Veloso earned a Bachelor and Master of Science degrees in Electrical and Computer Engineering from Instituto Superior Tecnico in Lisbon, Portugal, a Master of Arts in Computer Science from Boston University, and Master of Science and PhD in Computer Science from Carnegie Mellon University. Please identify yourself as a presenter and include your mailing address in your email. Submitted papers must be no longer than 12 single-spaced 8.5 x 11 pages, including figures and tables, plus as many pages as needed for references, using 10-point type on 12-point (single-spaced) leading, two-column format, Times Roman or a similar font, within a text block 7 wide x 9 deep. Our further evaluation on 38 CVEs from 10 commonly-used programs shows that SanRazor reduced checks suffice to detect at least 33 out of the 38 CVEs. Session Chairs: Gennady Pekhimenko, University of Toronto / Vector Institute, and Shivaram Venkataraman, University of WisconsinMadison, Aurick Qiao, Petuum, Inc. and Carnegie Mellon University; Sang Keun Choe and Suhas Jayaram Subramanya, Carnegie Mellon University; Willie Neiswanger, Petuum, Inc. and Carnegie Mellon University; Qirong Ho, Petuum, Inc.; Hao Zhang, Petuum, Inc. and UC Berkeley; Gregory R. Ganger, Carnegie Mellon University; Eric P. Xing, MBZUAI, Petuum, Inc., and Carnegie Mellon University. HotNets provides a venue for discussing innovative ideas and for debating future research agendas in networking. NrOS is primarily constructed as a simple, sequential kernel with no concurrency, making it easier to develop and reason about its correctness. To adapt to different workloads, prior works mix or switch between a few known algorithms using manual insights or simple heuristics.