Prediction-powered Generalization of Causal Inferences

Causal inferences from a randomized controlled trial (RCT) may not pertain to a target population where some effect modifiers have a different distribution. Prior work studies generalizing the results of a trial to a target population for which covariate data, but no outcome data, are available. We show how the limited size of trials makes generalization a statistically infeasible task, as it requires estimating complex nuisance functions. We develop generalization algorithms that supplement the trial data with a prediction model learned from an additional observational study (OS), without making any assumptions on the OS. We theoretically and empirically show that our methods facilitate better generalization when the OS is "high-quality", and remain robust when it is not, for example, when it has unmeasured confounding.
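To make the idea concrete, here is a minimal sketch of a prediction-powered estimator under simplifying assumptions (binary treatment, known randomization probability, and no reweighting of the trial correction toward the target covariates); the function names are illustrative, not the paper's API.

```python
# A minimal sketch (illustrative, not the paper's estimator): train an outcome
# model on the OS with no validity assumptions, impute treatment effects on the
# target covariates, and debias the imputation with the randomized trial.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fit_os_model(X_os, t_os, y_os):
    """Outcome predictor f(x, t) learned from the observational study."""
    return GradientBoostingRegressor().fit(np.column_stack([X_os, t_os]), y_os)

def prediction_powered_ate(model, X_target, X_trial, t_trial, y_trial, p=0.5):
    def tau_hat(X):  # model-implied effect f(x, 1) - f(x, 0)
        return (model.predict(np.column_stack([X, np.ones(len(X))]))
                - model.predict(np.column_stack([X, np.zeros(len(X))])))
    imputed = tau_hat(X_target).mean()                      # OS model on target
    ht = y_trial * (t_trial / p - (1 - t_trial) / (1 - p))  # unbiased in the trial
    return imputed + (ht - tau_hat(X_trial)).mean()         # debiasing correction
```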

Contributors: Ilker Demirel, Ahmed Alaa, Anthony Philippakis

Position: Scarce Resource Allocations That Rely On Machine Learning Should Be Randomized

Contrary to traditional deterministic notions of algorithmic fairness, this paper argues that fairly allocating scarce resources using machine learning often requires randomness. We address why, when, and how to randomize by offering a set of stochastic procedures that more adequately account for all of the claims individuals have to allocations of social goods or opportunities and effectively balance their interests.
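For illustration, a minimal sketch of one such stochastic procedure (not the paper's own): a weighted lottery that fills scarce slots with probability proportional to a claim score, rather than deterministically picking the top-k; the scoring itself is hypothetical.

```python
# A weighted lottery sketch: k winners sampled without replacement,
# proportional to (hypothetical) claim scores.
import numpy as np

def weighted_lottery(scores, k, seed=None):
    rng = np.random.default_rng(seed)
    p = np.asarray(scores, dtype=float)
    return rng.choice(len(p), size=k, replace=False, p=p / p.sum())

winners = weighted_lottery([0.9, 0.7, 0.65, 0.2], k=2, seed=0)
```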

Contributors: Shomik Jain, Kathleen Creel

Mean-field Underdamped Langevin Dynamics and its Spacetime Discretization

We propose a new method called the N-particle underdamped Langevin algorithm for optimizing a special class of non-linear functionals defined over the space of probability measures. Examples of problems with this formulation include training mean-field neural networks, maximum mean discrepancy minimization and kernel Stein discrepancy minimization. Our algorithm is based on a novel spacetime discretization of the mean-field underdamped Langevin dynamics, for which we provide a new, fast mixing guarantee. In addition, we demonstrate that our algorithm converges globally in total variation distance, bridging the theoretical gap between the dynamics and its practical implementation.
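As a rough illustration of the particle system (using plain Euler-Maruyama rather than the paper's spacetime discretization), each of the N particles carries a position and a velocity, and the drift on each particle may depend on the whole empirical measure:

```python
# A minimal sketch of N-particle underdamped Langevin dynamics with a
# mean-field coupling; the discretization here is the naive one.
import numpy as np

def underdamped_step(x, v, grad_field, gamma=1.0, h=1e-2, rng=None):
    """One step for N particles; x, v have shape (N, d). grad_field(x) returns
    the potential gradient at each particle given all N (mean-field term)."""
    rng = rng or np.random.default_rng()
    v = v - h * (gamma * v + grad_field(x)) + np.sqrt(2 * gamma * h) * rng.standard_normal(v.shape)
    x = x + h * v
    return x, v

# Toy example: quadratic confinement plus attraction to the particle mean.
grad = lambda x: x + (x - x.mean(axis=0, keepdims=True))
```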

Contributor: Qiang Fu

Measuring Stochastic Data Complexity with Boltzmann Influence Functions

Estimating the uncertainty of a model’s prediction on a test point is a crucial part of ensuring reliability and calibration under distribution shifts. A minimum description length approach to this problem uses the predictive normalized maximum likelihood (pNML) distribution, which considers every possible label for a data point, and decreases confidence in a prediction if other labels are also consistent with the model and training data. In this work, we propose IF-COMP, a scalable and efficient approximation of the pNML distribution that linearizes the model with a temperature-scaled Boltzmann influence function. IF-COMP can be used to produce well-calibrated predictions on test points as well as to measure complexity in both labelled and unlabelled settings. We experimentally validate IF-COMP on uncertainty calibration, mislabel detection, and OOD detection tasks, where it consistently matches or beats strong baseline methods.
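For intuition, here is a brute-force sketch of the pNML distribution that IF-COMP is built to approximate: for each candidate label, refit the model with the test point and that label appended to the training set, score the label under the refit model, and normalize over labels.

```python
# Brute-force pNML on a small logistic-regression model; the log-normalizer
# is the (stochastic) complexity. IF-COMP replaces the per-label refits with
# a temperature-scaled Boltzmann influence-function approximation.
import numpy as np
from sklearn.linear_model import LogisticRegression

def pnml(X_train, y_train, x_test, labels):
    scores = []
    for y in labels:
        clf = LogisticRegression(max_iter=1000).fit(
            np.vstack([X_train, x_test[None, :]]), np.append(y_train, y))
        scores.append(clf.predict_proba(x_test[None, :])[0, list(clf.classes_).index(y)])
    scores = np.asarray(scores)
    return scores / scores.sum(), np.log(scores.sum())  # (distribution, complexity)
```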

Contributors: Nathan Hoyen Ng, Roger Baker Grosse

Learning Optimal Projection for Forecast Reconciliation of Hierarchical Time Series

Hierarchical time series forecasting requires not only prediction accuracy but also coherency, i.e., forecasts that add up appropriately across the hierarchy. Recent literature has shown that reconciliation via projection outperforms prior methods such as top-down or bottom-up approaches. Unlike existing work that pre-specifies a projection matrix (e.g., orthogonal), we study the problem of learning the optimal oblique projection from data for coherent forecasting of hierarchical time series. In addition to preserving unbiasedness, oblique projection implicitly accounts for the hierarchy structure and assigns different weights to individual time series, providing significant adaptability over orthogonal projection, which treats base forecast errors equally. We examine two broad classes of projections, namely Euclidean projections and general oblique projections. We propose to model the reconciliation step as a learnable, structured projection layer in the neural forecaster architecture. The proposed approach allows for the efficient learning of the optimal projection in an end-to-end framework where both the neural forecaster and the projection layer are learned simultaneously. An empirical evaluation on real-world hierarchical time series datasets demonstrates the superior performance of the proposed method over existing state-of-the-art approaches.
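A minimal sketch of a learnable oblique projection layer, assuming the standard summing-matrix formulation of hierarchical forecasting (the paper's exact parametrization may differ): the projection P = S (SᵀWS)⁻¹ SᵀW keeps reconciled forecasts coherent by construction while the weighting W is learned end-to-end.

```python
# Learnable oblique-projection reconciliation layer (illustrative).
import torch
import torch.nn as nn

class ObliqueReconciliation(nn.Module):
    def __init__(self, S):  # S: (n_total, n_bottom) summing matrix
        super().__init__()
        self.register_buffer("S", S)
        self.L = nn.Parameter(torch.eye(S.shape[0]))  # W = L L^T + eps*I is PSD

    def forward(self, y_hat):  # y_hat: (batch, n_total) base forecasts
        W = self.L @ self.L.T + 1e-4 * torch.eye(self.S.shape[0], device=self.S.device)
        G = torch.linalg.solve(self.S.T @ W @ self.S, self.S.T @ W)
        return y_hat @ (self.S @ G).T  # rows lie in span(S), i.e., coherent
```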

Contributors: Asterios Tsiourvas, Wei Sun, Georgia Perakis, Pin-Yu Chen, Yada Zhu

Overcoming the Optimizer’s Curse: Obtaining Realistic Prescriptions from Neural Networks

We study the problem of obtaining optimal and realistic prescriptions when using ReLU networks for data-driven decision-making. In this setting, the network is used to predict a quantity of interest, and its inputs are then optimized to retrieve the decisions that maximize that quantity (e.g., finding the prices that maximize revenue). However, optimizing over-parameterized models often produces unrealistic prescriptions, far from the data manifold. This phenomenon is known as the Optimizer's Curse. To tackle this problem, we model the requirement that the resulting decisions align with the data manifold as a tractable optimization constraint. This is achieved by reformulating the highly nonlinear Local Outlier Factor (LOF) metric as a single linear or quadratic constraint. To solve the problem efficiently for large networks, we propose an adaptive sampling algorithm that reduces the initial hard-to-solve optimization problem to a small number of significantly easier-to-solve problems by restricting the decision space to realistic polytopes, i.e., polytopes of the decision space that contain at least one realistic data point. Experiments on publicly available networks demonstrate the efficacy and scalability of our approach.
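As a rough sketch of the realism requirement (not the paper's linear/quadratic reformulation), one can discard candidate decisions that the Local Outlier Factor flags as outliers relative to the training data and keep the candidate the network scores best; `predict` below stands in for the trained ReLU network.

```python
# LOF-filtered decision search (illustrative rejection version of the
# realism constraint).
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def realistic_argmax(predict, X_train, candidates, k=20):
    lof = LocalOutlierFactor(n_neighbors=k, novelty=True).fit(X_train)
    inliers = candidates[lof.predict(candidates) == 1]  # on the data manifold
    if len(inliers) == 0:
        raise ValueError("no candidate passed the LOF realism check")
    return inliers[np.argmax(predict(inliers))]
```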

Contributor: Asterios Tsiourvas

Implicit Representations via Operator Learning

The idea of representing a signal as the weights of a neural network, called Implicit Neural Representations (INRs), has led to exciting implications for compression, view synthesis, and 3D volumetric data understanding. One problem in this setting pertains to the use of INRs for downstream processing tasks. Despite some conceptual results, this remains challenging because the INR for a given image/signal often exists in isolation. What does the neighborhood around a given INR correspond to? Motivated by this question, we offer an operator-theoretic reformulation of the INR model, which we call Operator INR (or O-INR). At a high level, instead of mapping positional encodings to a signal, O-INR maps one function space to another function space. A practical form of this general formulation is obtained by appealing to integral transforms. The resultant model does not need the multi-layer perceptrons (MLPs) used in most existing INR models; we show that convolutions are sufficient and offer benefits including numerically stable behavior. We show that O-INR can easily handle most problem settings in the literature, and offers a performance profile similar to baselines. These benefits come with minimal, if any, compromise. Our code is available at https://github.com/vsingh-group/oinr.
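A minimal sketch of the conv-only idea (not the exact O-INR architecture): instead of an MLP applied to per-pixel positional encodings, convolutions map a positional-encoding field over the pixel grid to the represented signal.

```python
# Convolutional INR sketch: a positional-encoding field in, an image out.
import torch
import torch.nn as nn

class ConvINR(nn.Module):
    def __init__(self, enc_ch=16, hidden=64, out_ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(enc_ch, hidden, 3, padding=1), nn.GELU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.GELU(),
            nn.Conv2d(hidden, out_ch, 3, padding=1))

    def forward(self, pos_enc):   # pos_enc: (B, enc_ch, H, W)
        return self.net(pos_enc)  # (B, out_ch, H, W), the decoded signal

def fourier_grid(H, W, n_freq=4):
    """Fourier features of (x, y) as a (1, 4*n_freq, H, W) input field."""
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                            torch.linspace(-1, 1, W), indexing="ij")
    feats = [f(2.0 ** k * torch.pi * g)
             for g in (xs, ys) for k in range(n_freq) for f in (torch.sin, torch.cos)]
    return torch.stack(feats)[None]
```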

Contributors: Sourav Pal, Harshavardhan Adepu, Clinton Wang, Vikas Singh

TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods

The TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) statement was published in 2015 to provide the minimum reporting recommendations for studies developing or evaluating the performance of a prediction model. Methodological advances in the field of prediction have since included the widespread use of artificial intelligence (AI) powered by machine learning methods to develop prediction models. An update to the TRIPOD statement is thus needed. TRIPOD+AI provides harmonised guidance for reporting prediction model studies, irrespective of whether regression modelling or machine learning methods have been used. The new checklist supersedes the TRIPOD 2015 checklist, which should no longer be used. This article describes the development of TRIPOD+AI and presents the expanded 27-item checklist with a more detailed explanation of each reporting recommendation, along with the TRIPOD+AI for Abstracts checklist. TRIPOD+AI aims to promote the complete, accurate, and transparent reporting of studies that develop a prediction model or evaluate its performance. Complete reporting will facilitate study appraisal, model evaluation, and model implementation.

Contributors: Gary S Collins, Karel G M Moons, Paula Dhiman, Richard D Riley, Andrew L Beam, Ben Van Calster, Xiaoxuan Liu, Johannes B Reitsma, Maarten van Smeden, Anne-Laure Boulesteix, Jennifer Catherine Camaradou, Leo Anthony Celi, Spiros Denaxas, Alastair K Denniston, Ben Glocker, Robert M Golub, Hugh Harvey, Georg Heinze, Michael M Hoffman, André Pascal Kengne, Emily Lam, Naomi Lee, Elizabeth W Loder, Lena Maier-Hein, Bilal A Mateen, Melissa D McCradden, Lauren Oakden-Rayner, Johan Ordish, Richard Parnell, Sherri Rose, Karandeep Singh, Laure Wynants, Patricia Logullo

Asymmetry in Low-Rank Adapters of Foundation Models

Parameter-efficient fine-tuning optimizes large, pre-trained foundation models by updating a subset of parameters; in this class, Low-Rank Adaptation (LoRA) is particularly effective. Investigating the different roles of the LoRA matrices during fine-tuning, this paper characterizes and leverages an unexpected asymmetry in the importance of the low-rank adapter matrices. Specifically, when updating the parameter matrices of a neural network by adding a product BA, we observe that the B and A matrices have distinct functions: A extracts features from the input, while B uses these features to create the desired output. Based on this observation, we demonstrate that fine-tuning B is inherently more effective than fine-tuning A, and that a random untrained A should perform nearly as well as a fine-tuned one. Using an information-theoretic lens, we also bound the generalization of low-rank adapters, showing that the parameter savings of exclusively training B improve the bound. We support our conclusions with experiments on RoBERTa, BART-Large, LLaMA-2, and ViTs.
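A minimal sketch of the takeaway (generic names, not a specific library's API): a LoRA update W + (α/r)·BA where A stays frozen at a random initialization and only B receives gradients.

```python
# B-only LoRA linear layer (illustrative).
import torch
import torch.nn as nn

class LoRALinearBOnly(nn.Module):
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base.requires_grad_(False)  # frozen pretrained layer
        self.A = nn.Parameter(torch.randn(r, base.in_features) / r ** 0.5,
                              requires_grad=False)          # random, untrained
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # trained
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T
```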

Contributors: Jiacheng Zhu, Kristjan Greenewald, Kimia Nadjahi, Haitz Sáez de Ocáriz Borde, Rickard Brüel Gabrielsson, Leshem Choshen, Mikhail Yurochkin, Justin Solomon

Integrating Technology into Undergraduate Medical Education: Can Affective Computing Help Teach Empathy?

To the Editor:

Substance use disorders (SUDs) and overdose deaths continue at record levels in the USA. One major barrier to adequate treatment is the stigma attached to the condition. Evidence suggests that clinicians have more negative attitudes and less empathy toward patients with SUDs compared to other medical and mental health conditions, thereby affecting the overall quality of care these patients receive [1]. Stigma can become apparent during clinical interactions where providers may unintentionally convey negative emotions or judgments through their facial expressions.

Empathy toward this patient population was previously thought of as an inherent trait that could not be taught. However, studies in the medical literature have shown that medical trainees can improve their empathy toward patients [2]. Given that a physician’s ability to communicate effectively is associated with better patient outcomes, it is imperative to educate future physicians about how stigma manifests in the clinical setting and the importance of empathetic communication.

A promising approach to achieving this goal is through a technology called affective computing, also known as emotional artificial intelligence. Affective computing enables computers to recognize, interpret, process, and simulate human emotion. Researchers from the MIT Media Lab at the Massachusetts Institute of Technology and Weill Cornell Medical College have developed Medship, a computerized training module. Medship leverages affective computing to educate future medical providers about the stigma toward patients with SUDs. It offers interactions with virtual (i.e., computerized) patients who have a SUD, records the user during these interactions, and analyzes the user’s facial expressions to provide feedback on them in real time. The software used is OpenFace, a lightweight, open-source toolkit for facial behavior analysis.
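For illustration only (this is not Medship's implementation), a pipeline of this kind might read OpenFace's per-frame CSV output and summarize two facial action units commonly used as coarse valence cues, AU12 (lip corner puller, i.e., smile) and AU04 (brow lowerer); the thresholds and feedback strings below are hypothetical.

```python
# Summarizing OpenFace AU-intensity output into simple expression feedback.
import pandas as pd

def summarize_affect(openface_csv):
    df = pd.read_csv(openface_csv, skipinitialspace=True)
    smile, frown = df["AU12_r"].mean(), df["AU04_r"].mean()
    if frown > 1.0 and frown > smile:
        return "Frequent brow lowering; your expression may read as judgmental."
    return "Expression profile appears neutral-to-positive."
```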

This project is split into two studies. The initial study aimed to evaluate the usability and acceptability of Medship among medical students. Given the multitude of educational options currently available to medical students, their willingness to adopt the application is pivotal to its success. The second part of this project will be a randomized controlled trial assessing the module’s impact on decreasing negative attitudes toward this patient population, and will be critical in evaluating its efficacy.

The initial study of this project used a quantitative interventional design, including a cross-sectional survey following a single session of using Medship. The Institutional Review Board at Weill Cornell Medical College granted approval for the study protocol. The online link for Medship was emailed to medical students during the course of their regular education. All feedback was contained in the module. A total of 26 students at Weill Cornell Medical College participated, providing anonymous responses to demographic questions, a System Usability Scale [3], and a System Quality Scale [4]. Usability refers to the ease of using the module, while acceptability gauges students’ willingness to integrate the module into their medical curriculum.

The results from this pilot study demonstrated positive feedback. Regarding usability, all students found the module easy to learn and navigate. Most students reported that the module was both enjoyable and user-friendly (n = 20; 77%) and found the graphics to be of high quality and resolution (n = 25; 96%). Participants assigned an average score of 85 on the System Usability Scale, where a score of 73 or above indicates satisfactory usability. Regarding acceptability, every student believed that their medical institution should offer Medship as part of the educational curriculum, and a substantial portion felt that medical students would greatly benefit from using the module (n = 20; 77%). On the System Quality Scale, participants rated the module an average of 4, where a score of 3 or higher indicates satisfactory acceptability.

One limitation of Medship is the potential lack of cultural diversity in the data its facial action unit analysis algorithms receive. Expression of empathy in Western culture often assumes a “one-size-fits-all” approach that does not take intercultural contexts into consideration. The current version of Medship is limited from a diversity standpoint in the number of inputs it has from users of different backgrounds, cultures, and ethnicities. Future iterations of Medship must address this to enhance external validity.

Previous research has revealed that patients with major depression perceive neutral faces as sad, whereas healthy participants interpret them as happy [5]. This raises the question of whether patients with SUDs might exhibit distinct perceptions of neutral faces, particularly in light of the common comorbidity of SUDs and mood disorders. While one approach could be controlling for these comorbidities, a more clinically valuable direction could be to develop a version of Medship tailored to patients with SUDs and specific comorbidities.

SUDs are becoming increasingly prevalent, remain significantly undertreated, and are stigmatized by clinicians more than other medical and psychiatric illnesses. Affective computing is gaining prominence across industries, and the field of medicine is now exploring both its safety and its efficacy in enhancing patient care. Medship has the potential to improve empathetic communication between providers and their patients. The first iteration of this study has revealed positive results regarding the technology’s usability and acceptability among medical students, and the next portion of this study will focus on assessing Medship’s efficacy as an application.

Contributors: Michael Woods, Giselle Appel, Aidana Daulbayeva, Caleb Harris, Julia Iyasere, Jonathan Avery