Latest Posts
Google for Education’s latest updates from BETT 2025.
Class tools is a new set of real-time control and student engagement features for managed Chromebooks.
We’re announcing new updates to Google Workspace for Education and Chromebooks to help everyone teach and learn in a way that works for them.
We’re kicking off 2025 with new launches from Chromebooks and Google Workspace for Education.
We’re announcing new admin controls across Google Workspace for Education and Chromebooks.
arXiv:2411.03706v2 Announce Type: replace Abstract: We present 3DGS-CD, the first 3D Gaussian Splatting (3DGS)-based method for detecting physical object rearrangements in 3D scenes. Our approach estimates 3D object-level changes by comparing two sets of unaligned images taken at different times. Leveraging 3DGS's novel view rendering and EfficientSAM's zero-shot segmentation capabilities, we detect 2D object-level changes, which are then associated and fused across views to estimate 3D change masks and object transformations. Our method can accurately identify changes in cluttered environments using sparse (as few as one) post-change images within as little as 18s. It does not rely on depth input, user instructions, pre-defined object classes, or object models -- An object is recognized simply if it has been re-arranged. Our approach is evaluated on both public and self-collected real-world datasets, achieving up to 14% higher accuracy and three orders of magnitude faster performance compared to the state-of-the-art radiance-field-based change detection method. This significant performance boost enables a broad range of downstream applications, where we highlight three key use cases: object reconstruction, robot workspace reset, and 3DGS model update. Our code and data will be made available at https://github.com/520xyxyzq/3DGS-CD.
arXiv:2411.03682v2 Announce Type: replace Abstract: Cross-embodiment imitation learning enables policies trained on specific embodiments to transfer across different robots, unlocking the potential for large-scale imitation learning that is both cost-effective and highly reusable. This paper presents LEGATO, a cross-embodiment imitation learning framework for visuomotor skill transfer across varied kinematic morphologies. We introduce a handheld gripper that unifies action and observation spaces, allowing tasks to be defined consistently across robots. We train visuomotor policies on task demonstrations using this gripper through imitation learning, applying transformation to a motion-invariant space for computing the training loss. Gripper motions generated by the policies are retargeted into high-degree-of-freedom whole-body motions using inverse kinematics for deployment across diverse embodiments. Our evaluations in simulation and real-robot experiments highlight the framework's effectiveness in learning and transferring visuomotor skills across various robots. More information can be found at the project page: https://ut-hcrl.github.io/LEGATO.
arXiv:2411.04418v2 Announce Type: replace Abstract: Over the years, there has been extensive work on fully dynamic algorithms for classic graph problems that admit greedy solutions. Examples include $(\Delta+1)$ vertex coloring, maximal independent set, and maximal matching. For all three problems, there are randomized algorithms that maintain a valid solution after each edge insertion or deletion to the $n$-vertex graph by spending $\polylog n$ time, provided that the adversary is oblivious. However, none of these algorithms work against adaptive adversaries whose updates may depend on the output of the algorithm. In fact, even breaking the trivial bound of $O(n)$ against adaptive adversaries remains open for all three problems. For instance, in the case of $(\Delta+1)$ vertex coloring, the main challenge is that an adaptive adversary can keep inserting edges between vertices of the same color, necessitating a recoloring of one of the endpoints. The trivial algorithm would simply scan all neighbors of one endpoint to find a new available color (which always exists) in $O(n)$ time. In this paper, we break this linear barrier for the $(\Delta+1)$ vertex coloring problem. Our algorithm is randomized, and maintains a valid $(\Delta+1)$ vertex coloring after each edge update by spending $\widetilde{O}(n^{8/9})$ time with high probability.
arXiv:2410.20275v2 Announce Type: replace Abstract: Alternative Current Optimal Power Flow (AC-OPF) is essential for efficient planning and real-time operation in power systems but is NP-hard and non-convex, leading to significant computational challenges. Neural networks (NNs) offer computational speedups in solving OPF but face issues like dependency on large datasets, scalability limitations, and inability to enforce physical constraints, compromising solution reliability. To overcome these limitations, this paper proposes hybrid Quantum Neural Networks (QNNs) that integrate quantum computing principles into neural network architectures. Leveraging quantum mechanics properties such as superposition and entanglement, QNNs can capture complex input-output relationships more effectively and learn from small or noisy datasets.To further improve the performance of QNNs and investigate the interplay between classical and quantum components in hybrid architectures, we incorporate advanced techniques, including residual learning and physics-informed machine learning, into the hybrid QNN designs. These enhancements aim to boost convergence efficiency, lower errors, superior generalization, and robustness to quantum noise. Simulation results demonstrate that these enhanced hybrid QNNs outperform typical hybrid QNNs in solving OPF problems. This work provides valuable insights into the design and optimization of hybrid QNNs, highlighting the potential of quantum computation for broader applications in power systems.
arXiv:2410.19242v2 Announce Type: replace Abstract: The weight spectrum plays a crucial role in the performance of error-correcting codes. Despite substantial theoretical exploration into polar codes with mother code length, a framework for the weight spectrum of rate-compatible polar codes remains elusive. In this paper, we address this gap by enumerating the number of minimum-weight codewords for quasi-uniform punctured, Wang-Liu shortened, and bit-reversal shortened polar codes. Additionally, we propose efficient algorithms for computing the average spectrum of random upper-triangular pre-transformed shortened and punctured polar codes. Notably, our algorithms operate with polynomial complexity relative to the code length. Simulation results affirm that our findings can substantially enhance the practical construction of rate-compatible polar codes, and leading to an improved weight spectrum.
arXiv:2411.01777v2 Announce Type: replace Abstract: Prediction is a fundamental capability of all living organisms, and has been proposed as an objective for learning sensory representations. Recent work demonstrates that in primate visual systems, prediction is facilitated by neural representations that follow straighter temporal trajectories than their initial photoreceptor encoding, which allows for prediction by linear extrapolation. Inspired by these experimental findings, we develop a self-supervised learning (SSL) objective that explicitly quantifies and promotes straightening. We demonstrate the power of this objective in training deep feedforward neural networks on smoothly-rendered synthetic image sequences that mimic commonly-occurring properties of natural videos. The learned model contains neural embeddings that are predictive, but also factorize the geometric, photometric, and semantic attributes of objects. The representations also prove more robust to noise and adversarial attacks compared to previous SSL methods that optimize for invariance to random augmentations. Moreover, these beneficial properties can be transferred to other training procedures by using the straightening objective as a regularizer, suggesting a broader utility for straightening as a principle for robust unsupervised learning.
arXiv:2410.23008v2 Announce Type: replace Abstract: Developing new machine learning applications often requires the collection of new datasets. However, existing datasets may already contain relevant information to train models for new purposes. We propose SoundCollage: a framework to discover new classes within audio datasets by incorporating (1) an audio pre-processing pipeline to decompose different sounds in audio samples, and (2) an automated model-based annotation mechanism to identify the discovered classes. Furthermore, we introduce the clarity measure to assess the coherence of the discovered classes for better training new downstream applications. Our evaluations show that the accuracy of downstream audio classifiers within discovered class samples and a held-out dataset improves over the baseline by up to 34.7% and 4.5%, respectively. These results highlight the potential of SoundCollage in making datasets reusable by labeling with newly discovered classes. To encourage further research in this area, we open-source our code at https://github.com/nokia-bell-labs/audio-class-discovery.
arXiv:2501.04900v2 Announce Type: replace Abstract: In the digital era, managing posthumous data presents a growing challenge, with current technical solutions often falling short in practicality. Existing tools are typically closed-source, lack transparency, fail to offer cross-platform support, and provide limited access control. This paper introduces `Beyond Life', a cross-platform digital will management solution designed to securely handle and distribute digital assets after death. At the core of this solution is a customized Ciphertext-Policy Attribute-Based Encryption (CP-ABE) scheme, referred to as PD-CP-ABE, which enables efficient, fine-grained control over access to will content at scale. Unlike existing systems, Beyond Life operates independently of service providers, offering users greater transparency and control over how their will is generated, stored, and executed. The system is also designed to be portable, allowing users to change their will service provider. The proposed system has been fully developed and rigorously evaluated to ensure performance and real-world feasibility. The system implementation is made publicly available.
arXiv:2410.23649v2 Announce Type: replace Abstract: Parkinson's disease (PD), a degenerative disorder of the central nervous system, is commonly diagnosed using functional medical imaging techniques such as single-photon emission computed tomography (SPECT). In this study, we utilized two SPECT data sets (n = 634 and n = 202) from different hospitals to develop a model capable of accurately predicting PD stages, a multiclass classification task. We used the entire three-dimensional (3D) brain images as input and experimented with various model architectures. Initially, we treated the 3D images as sequences of two-dimensional (2D) slices and fed them sequentially into 2D convolutional neural network (CNN) models pretrained on ImageNet, averaging the outputs to obtain the final predicted stage. We also applied 3D CNN models pretrained on Kinetics-400. Additionally, we incorporated an attention mechanism to account for the varying importance of different slices in the prediction process. To further enhance model efficacy and robustness, we simultaneously trained the two data sets using weight sharing, a technique known as cotraining. Our results demonstrated that 2D models pretrained on ImageNet outperformed 3D models pretrained on Kinetics-400, and models utilizing the attention mechanism outperformed both 2D and 3D models. The cotraining technique proved effective in improving model performance when the cotraining data sets were sufficiently large.
arXiv:2410.17517v2 Announce Type: replace Abstract: Swarm intelligence (SI) explores how large groups of simple individuals (e.g., insects, fish, birds) collaborate to produce complex behaviors, exemplifying that the whole is greater than the sum of its parts. A fundamental task in SI is Collective Decision-Making (CDM), where a group selects the best option among several alternatives, such as choosing an optimal foraging site. In this work, we demonstrate a theoretical and empirical equivalence between CDM and single-agent reinforcement learning (RL) in multi-armed bandit problems, utilizing concepts from opinion dynamics, evolutionary game theory, and RL. This equivalence bridges the gap between SI and RL and leads us to introduce a novel abstract RL update rule called Maynard-Cross Learning. Additionally, it provides a new population-based perspective on common RL practices like learning rate adjustment and batching. Our findings enable cross-disciplinary fertilization between RL and SI, allowing techniques from one field to enhance the understanding and methodologies of the other.
arXiv:2410.20564v2 Announce Type: replace Abstract: Conversational systems rely heavily on speech recognition to interpret and respond to user commands and queries. Despite progress on speech recognition accuracy, errors may still sometimes occur and can significantly affect the end-user utility of such systems. While visual feedback can help detect errors, it may not always be practical, especially for people who are blind or low-vision. In this study, we investigate ways to improve error detection by manipulating the audio output of the transcribed text based on the recognizer's confidence level in its result. Our findings show that selectively slowing down the audio when the recognizer exhibited uncertainty led to a 12% relative increase in participants' ability to detect errors compared to uniformly slowing the audio. It also reduced the time it took participants to listen to the recognition result and decide if there was an error by 11%.
arXiv:2410.18067v3 Announce Type: replace Abstract: This paper studies how transformer models develop robust wavelet-like properties that effectively compensate for the theoretical limitations of Rotary Position Embeddings (RoPE), providing insights into how these networks process sequential information across different scales. Through theoretical analysis and empirical validation across models ranging from 1B to 12B parameters, we show that attention heads naturally evolve to implement multi-resolution processing analogous to wavelet transforms. Our analysis establishes that attention heads consistently organize into complementary frequency bands with systematic power distribution patterns, and these wavelet-like characteristics become more pronounced in larger models. We provide mathematical analysis showing how these properties align with optimal solutions to the fundamental uncertainty principle between positional precision and frequency resolution. Our findings suggest that the effectiveness of modern transformer architectures stems significantly from their development of optimal multi-resolution decompositions that naturally address the theoretical constraints of position encoding.
arXiv:2411.03177v2 Announce Type: replace Abstract: Large-scale training of latent diffusion models (LDMs) has enabled unprecedented quality in image generation. However, the key components of the best performing LDM training recipes are oftentimes not available to the research community, preventing apple-to-apple comparisons and hindering the validation of progress in the field. In this work, we perform an in-depth study of LDM training recipes focusing on the performance of models and their training efficiency. To ensure apple-to-apple comparisons, we re-implement five previously published models with their corresponding recipes. Through our study, we explore the effects of (i)~the mechanisms used to condition the generative model on semantic information (e.g., text prompt) and control metadata (e.g., crop size, random flip flag, etc.) on the model performance, and (ii)~the transfer of the representations learned on smaller and lower-resolution datasets to larger ones on the training efficiency and model performance. We then propose a novel conditioning mechanism that disentangles semantic and control metadata conditionings and sets a new state-of-the-art in class-conditional generation on the ImageNet-1k dataset -- with FID improvements of 7% on 256 and 8% on 512 resolutions -- as well as text-to-image generation on the CC12M dataset -- with FID improvements of 8% on 256 and 23% on 512 resolution.
arXiv:2410.22658v2 Announce Type: replace Abstract: Continual Imitation Learning (CiL) involves extracting and accumulating task knowledge from demonstrations across multiple stages and tasks to achieve a multi-task policy. With recent advancements in foundation models, there has been a growing interest in adapter-based CiL approaches, where adapters are established parameter-efficiently for tasks newly demonstrated. While these approaches isolate parameters for specific tasks and tend to mitigate catastrophic forgetting, they limit knowledge sharing among different demonstrations. We introduce IsCiL, an adapter-based CiL framework that addresses this limitation of knowledge sharing by incrementally learning shareable skills from different demonstrations, thus enabling sample-efficient task adaptation using the skills particularly in non-stationary CiL environments. In IsCiL, demonstrations are mapped into the state embedding space, where proper skills can be retrieved upon input states through prototype-based memory. These retrievable skills are incrementally learned on their corresponding adapters. Our CiL experiments with complex tasks in Franka-Kitchen and Meta-World demonstrate robust performance of IsCiL in both task adaptation and sample-efficiency. We also show a simple extension of IsCiL for task unlearning scenarios.
arXiv:2410.23142v2 Announce Type: replace Abstract: Deep neural networks are susceptible to adversarial attacks and common corruptions, which undermine their robustness. In order to enhance model resilience against such challenges, Adversarial Training (AT) has emerged as a prominent solution. Nevertheless, adversarial robustness is often attained at the expense of model fairness during AT, i.e., disparity in class-wise robustness of the model. While distinctive classes become more robust towards such adversaries, hard to detect classes suffer. Recently, research has focused on improving model fairness specifically for perturbed images, overlooking the accuracy of the most likely non-perturbed data. Additionally, despite their robustness against the adversaries encountered during model training, state-of-the-art adversarial trained models have difficulty maintaining robustness and fairness when confronted with diverse adversarial threats or common corruptions. In this work, we address the above concerns by introducing a novel approach called Fair Targeted Adversarial Training (FAIR-TAT). We show that using targeted adversarial attacks for adversarial training (instead of untargeted attacks) can allow for more favorable trade-offs with respect to adversarial fairness. Empirical results validate the efficacy of our approach.