cs.LG cs.AI stat.ML
Selective Credit Assignment
Authors: Veronica Chelu, Diana Borsa, Doina Precup, Hado van Hasselt
Abstract: Efficient credit assignment is essential for reinforcement learning algorithms in both prediction and control settings. We describe a unified view on temporal-difference algorithms for selective credit assignment. These selective algorithms apply weightings to quantify the contribution of learning updates. We present insights into applying weightings to value-based learning and planning algorithms, and describe their role in mediating the backward credit distribution in prediction and control. Within this space, we identify some existing online learning algorithms that can assign credit selectively as special cases, as well as add new algorithms that assign credit backward in time counterfactually, allowing credit to be assigned off-trajectory and off-policy.
Submitted 19 February, 2022; originally announced February 2022.
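The core mechanism the abstract describes, scaling each learning update by a weighting that quantifies its contribution, can be sketched in a tabular TD(0) setting. This is a minimal illustration only; the function names and the choice of TD(0) are assumptions, not the paper's algorithm:

```python
import numpy as np

def weighted_td0(transitions, n_states, weight_fn, alpha=0.1, gamma=0.9):
    """Tabular TD(0) where each update is scaled by a per-transition weighting.

    transitions: list of (s, r, s_next) tuples from a trajectory.
    weight_fn:   maps (s, r, s_next) to a nonnegative weighting that
                 modulates how much credit this transition receives.
    """
    v = np.zeros(n_states)
    for s, r, s_next in transitions:
        td_error = r + gamma * v[s_next] - v[s]
        v[s] += alpha * weight_fn(s, r, s_next) * td_error  # selective update
    return v

# A uniform weighting of 1 recovers ordinary TD(0); a weighting of 0
# masks a transition out of learning entirely.
traj = [(0, 0.0, 1), (1, 1.0, 2), (2, 0.0, 0)]
v_uniform = weighted_td0(traj, 3, lambda s, r, sn: 1.0)
v_masked = weighted_td0(traj, 3, lambda s, r, sn: 0.0)
```

State-dependent or trajectory-dependent choices of `weight_fn` are what make the credit assignment selective, e.g. off-policy corrections or masks over states of interest.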
cs.LG cs.AI stat.ML
Chaining Value Functions for Off-Policy Learning
Authors: Simon Schmitt, John Shawe-Taylor, Hado van Hasselt
Abstract: To accumulate knowledge and improve its policy of behaviour, a reinforcement learning agent can learn `off-policy' about policies that differ from the policy used to generate its experience. This is important to learn counterfactuals, or because the experience was generated out of its own control. However, off-policy learning is non-trivial, and standard reinforcement-learning algorithms can be unstable and divergent. In this paper we discuss a novel family of off-policy prediction algorithms which are convergent by construction. The idea is to first learn on-policy about the data-generating behaviour, and then bootstrap an off-policy value estimate on this on-policy estimate, thereby constructing a value estimate that is partially off-policy. This process can be repeated to build a chain of value functions, each time bootstrapping a new estimate on the previous estimate in the chain. Each step in the chain is stable and hence the complete algorithm is guaranteed to be stable. Under mild conditions this comes arbitrarily close to the off-policy TD solution when we increase the length of the chain. Hence it can compute the solution even in cases where off-policy TD diverges. We prove that the proposed scheme is convergent and corresponds to an iterative decomposition of the inverse key matrix. Furthermore it can be interpreted as estimating a novel objective -- that we call a `k-step expedition' -- of following the target policy for finitely many steps before continuing indefinitely with the behaviour policy. Empirically we evaluate the idea on challenging MDPs such as Baird's counterexample and observe favourable results.
Submitted 2 February, 2022; v1 submitted 17 January, 2022; originally announced January 2022.
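The chaining construction can be illustrated in matrix form on a small MDP: link 0 is the on-policy solution for the behaviour policy, and each later link applies one target-policy backup to the previous link, matching the `k-step expedition' objective. The exact linear-algebra solves below stand in for the paper's sample-based TD learning; the example MDP is invented for illustration:

```python
import numpy as np

def chained_values(P_mu, r_mu, P_pi, r_pi, gamma, chain_length):
    """Build the value-function chain: link 0 is the on-policy solution for
    the behaviour policy mu; each later link bootstraps one target-policy
    backup on the previous link, so link k values following pi for k steps
    and then mu forever after."""
    n = len(r_mu)
    v = np.linalg.solve(np.eye(n) - gamma * P_mu, r_mu)  # on-policy base
    chain = [v]
    for _ in range(chain_length):
        v = r_pi + gamma * P_pi @ v  # bootstrap on the previous link
        chain.append(v)
    return chain

# Two-state example: as the chain grows, the last link approaches the
# off-policy TD solution v_pi = (I - gamma * P_pi)^-1 r_pi.
P_mu = np.array([[0.5, 0.5], [0.5, 0.5]])
P_pi = np.array([[0.0, 1.0], [1.0, 0.0]])
r_mu = np.array([0.0, 0.0])
r_pi = np.array([1.0, 0.0])
chain = chained_values(P_mu, r_mu, P_pi, r_pi, gamma=0.9, chain_length=100)
v_pi = np.linalg.solve(np.eye(2) - 0.9 * P_pi, r_pi)
```

Each step of the loop is an on-policy-style backup onto a fixed, already-computed estimate, which is why every link (and hence the whole chain) is stable even where direct off-policy TD would diverge.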
Quantifying agent impacts on contact sequences in social interactions
Human social behavior plays a crucial role in how pathogens like SARS-CoV-2 or fake news spread in a population. Social interactions determine the contact network among individuals, while spreading, requiring individual-to-individual transmission, takes place on top of the network. Studying the topological aspects of a contact network, therefore, not only has the potential of leading to valuable insights into how the behavior of individuals impacts spreading phenomena, but it may also open up possibilities for devising effective behavioral interventions. Because of the temporal nature of interactions - since the topology of the network, containing who is in contact with whom, when, for how long, and in which precise sequence, varies (rapidly) in time - analyzing them requires developing network methods and metrics that respect temporal variability, in contrast to those developed for static (i.e., time-invariant) networks. Here, by means of event mapping, we propose a method to quantify how quickly agents mingle by transforming temporal network data of agent contacts. We define a novel measure called 'contact sequence centrality', which quantifies the impact of an individual on the contact sequences, reflecting the individual's behavioral potential for spreading. Comparing contact sequence centrality across agents allows for ranking the impact of agents and identifying potential 'behavioral super-spreaders'. The method is applied to social interaction data collected at an art fair in Amsterdam. We relate the measure to the existing network metrics, both temporal and static, and find that (mostly at longer time scales) traditional metrics lose their resemblance to contact sequence centrality. Our work highlights the importance of accounting for the sequential nature of contacts when analyzing social interactions.
https://arxiv.org/abs/2107.01443
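The abstract does not spell out the formula for `contact sequence centrality', so the following is only a hypothetical illustration of the underlying idea: measure an agent's impact by how much time-respecting reachability among the other agents collapses when that agent's contacts are removed. All names here are invented for the sketch:

```python
def reachability(events, agents):
    """Pairs (a, b) such that b is reachable from a through a
    time-respecting (chronological) sequence of contacts."""
    reach = {a: {a} for a in agents}
    for t, i, j in sorted(events):  # events are (time, agent, agent)
        can_reach_i = [a for a in agents if i in reach[a]]
        can_reach_j = [a for a in agents if j in reach[a]]
        for a in can_reach_i:  # contacts are undirected: reaching i
            reach[a].add(j)    # by time t now also reaches j, and
        for a in can_reach_j:  # vice versa
            reach[a].add(i)
    return {(a, b) for a in agents for b in reach[a] if b != a}

def sequence_impact(events, agents, x):
    """Hypothetical impact score for agent x: the number of
    time-respecting reachability pairs among the *other* agents that
    vanish when x's contacts are removed from the event sequence."""
    others = [a for a in agents if a != x]
    kept = [e for e in events if x not in (e[1], e[2])]
    full = {(a, b) for (a, b) in reachability(events, agents)
            if a != x and b != x}
    return len(full - reachability(kept, others))

# Chain of contacts A-B, then B-C, then C-D (times 1, 2, 3): removing B
# severs A from C and D, while removing D breaks nothing among the rest.
events = [(1, "A", "B"), (2, "B", "C"), (3, "C", "D")]
agents = ["A", "B", "C", "D"]
```

This toy score already shows why the precise sequence of contacts matters: the same set of edges in a different temporal order would produce different reachability and hence different impact rankings.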
Using a Cognitive Network Model of Moral and Social Beliefs to Explain Belief Change
Scepticism towards childhood vaccines and genetically modified food has grown despite scientific evidence of their safety. Beliefs about scientific issues are difficult to change because they are entrenched within many related moral concerns and beliefs about what others think. We propose a cognitive network model which estimates the relationships, dissonance, and randomness between all related beliefs to derive predictions of the circumstances under which beliefs change. Using a probabilistic nationally representative longitudinal study, we found support for our model's predictions: Randomness of the belief networks decreased over time, for many participants their estimated dissonance related positively to their self-reported dissonance, and individuals who had high estimated dissonance of their belief network were more likely to change their beliefs to reduce this dissonance. This study is the first to combine a unifying predictive model with an experimental intervention and sheds light on dynamics of dissonance reduction leading to belief change.
https://arxiv.org/abs/2102.10751
A Network Perspective on Attitude Strength: Testing the Connectivity Hypothesis
Attitude strength is a key characteristic of attitudes. Strong attitudes are durable and impactful, while weak attitudes are fluctuating and inconsequential. Recently, the Causal Attitude Network (CAN) model was proposed as a comprehensive measurement model of attitudes, which conceptualizes attitudes as networks of causally connected evaluative reactions (i.e., beliefs, feelings, and behavior toward an attitude object). Here, we test the central postulate of the CAN model that highly connected attitude networks correspond to strong attitudes. We use data from the American National Election Studies 1980-2012 on attitudes toward presidential candidates (total n = 18,795). We first show that political interest predicts connectivity of attitude networks toward presidential candidates. Second, we show that connectivity is strongly related to two defining features of strong attitudes - stability of the attitude and the attitude's impact on behavior. We conclude that network theory provides a promising framework to advance the understanding of attitude strength.
Network Structure Explains the Impact of Attitudes on Voting Decisions
Attitudes can have a profound impact on socially relevant behaviours, such as voting. However, this effect is not uniform across situations or individuals, and it is at present difficult to predict whether attitudes will predict behaviour in any given circumstance. Using a network model, we demonstrate that (a) more strongly connected attitude networks have a stronger impact on behaviour, and (b) within any given attitude network, the most central attitude elements have the strongest impact. We test these hypotheses using data on voting and attitudes toward presidential candidates in the US presidential elections from 1980 to 2012. These analyses confirm that the predictive value of attitude networks depends almost entirely on their level of connectivity, with more central attitude elements having stronger impact. The impact of attitudes on voting behaviour can thus be reliably determined before elections take place by using network analyses.
https://arxiv.org/abs/1704.00910
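Both attitude-network papers revolve around two measurable quantities: global connectivity of the network, and the centrality of individual attitude elements. A minimal sketch with simple proxies follows (mean absolute association and node strength; the papers themselves use more refined network estimation):

```python
import numpy as np

def connectivity(assoc):
    """Global connectivity of an attitude network: mean absolute
    association between distinct evaluative reactions (beliefs,
    feelings, behaviours toward the attitude object)."""
    a = np.abs(np.asarray(assoc, dtype=float))
    n = a.shape[0]
    return a[~np.eye(n, dtype=bool)].mean()

def strength_centrality(assoc):
    """Node strength per attitude element: the sum of its absolute
    associations with all other elements. The hypothesis is that
    high-strength elements have the strongest impact on behaviour
    such as voting."""
    a = np.abs(np.asarray(assoc, dtype=float))
    np.fill_diagonal(a, 0.0)
    return a.sum(axis=1)

# A small attitude network over three evaluative reactions.
assoc = [[1.0, 0.8, 0.2],
         [0.8, 1.0, 0.4],
         [0.2, 0.4, 1.0]]
```

Under the connectivity hypothesis, higher `connectivity` marks a stronger (more stable, more behaviour-predictive) attitude, and the element with the largest `strength_centrality` is the best single predictor of behaviour.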
ECE/CS 598 AM: CRYPTOGRAPHY WITH IDEAL FUNCTIONALITIES

Spring 2022
Times: Tu Th 9:30am – 10:50am
Location: 2015 Electrical & Computer Eng Bldg (Jan 25 and later)
Zoom link for synchronous lectures (see Piazza)
Piazza: piazza.com/illinois/spring2022/ece598am
Instructor: Andrew Miller Office hours: Thursdays 3pm-4pm, and by appointment
Teaching Assistant (unofficial): Surya Bakshi Office hours: TBD
The Ideal Functionalities model (or “Universal Composability” (UC)) is considered the gold standard for defining security in many cryptographic tasks, such as multiparty computation and zero knowledge proofs.
It can be considered a unification of property-based definition styles, where instead of describing one property at a time (i.e., one game for confidentiality, one game for integrity, and so on), we give a concrete instance of an idealized program that exhibits all these properties at once. While UC is broadly adopted in cryptography, it has yet to gain traction elsewhere in software engineering and in distributed systems.
The aim of this course is to explore the connections between UC in cryptography versus in other domains like fault tolerant systems, and to see what UC can offer to software engineers concerned with implementing large systems and not just modelling small primitives.
The course will give a self-contained introduction to UC, making use of our research software prototypes, Haskell-SaUCy and Python-SaUCy, which are programming frameworks that implement UC. We’ll then survey the UC-based cryptography literature for a range of cryptographic tasks, including well known applications like key exchange and multiparty computation, as well as more challenging cases like non-interactive primitives and smart contract blockchain protocols. Using the software frameworks as a secret weapon, we’ll try to improve on and simplify prior UC proofs.
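To make the "idealized program" idea concrete, here is a toy ideal functionality for one-shot authenticated message transfer, written in plain Python. The class names and interfaces are invented for this sketch and are not the SaUCy API:

```python
class PassiveAdversary:
    """Records whatever the functionality leaks to it."""
    def __init__(self):
        self.observed = []

    def leak(self, info):
        self.observed.append(info)

class FAuth:
    """Toy ideal functionality in the spirit of UC's authenticated-channel
    functionality. Rather than proving integrity and authenticity game by
    game, the ideal program simply cannot deliver anything other than what
    was sent, so the desired properties hold by construction."""
    def __init__(self, adversary):
        self.adversary = adversary
        self.message = None

    def send(self, sender, receiver, msg):
        # The adversary learns the message (authenticated, not secret)
        # and controls delivery timing, but cannot alter the contents.
        self.message = (sender, receiver, msg)
        self.adversary.leak(self.message)

    def deliver(self):
        # Whenever the adversary schedules delivery, integrity is
        # guaranteed: the stored message comes back unchanged.
        return self.message

adv = PassiveAdversary()
f_auth = FAuth(adv)
f_auth.send("alice", "bob", "hi bob")
```

A real protocol (say, signatures over an insecure channel) is then proven secure by showing it is indistinguishable from this ideal program composed with a simulator.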
The course is built on recent research efforts to provide software tools for UC, ILC (PLDI’19) and is supported by NSF grants #1801321 “Automated Support for Writing High-Assurance Smart Contracts” and #1943499 “CAREER: Composable Programming Abstractions for Secure Distributed Computing and Blockchain Applications.”
Prerequisites:
It is not necessary to have background knowledge of Ideal Functionalities and UC. However, some mathematical maturity and familiarity with cryptography is expected, such as experience writing traditional game-based security proofs.
Reference texts (all freely available online):
- Security and Composition of Cryptographic Protocols: A Tutorial (Canetti 2006). https://eprint.iacr.org/2006/465
- Pragmatic MPC http://securecomputation.org/
- Python-SaUCy https://github.com/amiller/ece598-uc-contracts
- Course slides and course notes from Canetti 2004 http://courses.csail.mit.edu/6.897/spring04/materials.html
Emergent Bartering Behaviour in Multi-Agent Reinforcement Learning
Advances in artificial intelligence often stem from the development of new environments that abstract real-world situations into a form where research can be done conveniently. This paper contributes such an environment based on ideas inspired by elementary Microeconomics. Agents learn to produce resources in a spatially complex world, trade them with one another, and consume those that they prefer. We show that the emergent production, consumption, and pricing behaviors respond to environmental conditions in the directions predicted by supply and demand shifts in Microeconomics. We also demonstrate settings where the agents' emergent prices for goods vary over space, reflecting the local abundance of goods. After the price disparities emerge, some agents then discover a niche of transporting goods between regions with different prevailing prices -- a profitable strategy because they can buy goods where they are cheap and sell them where they are expensive. Finally, in a series of ablation experiments, we investigate how choices in the environmental rewards, bartering actions, agent architecture, and ability to consume tradable goods can either aid or inhibit the emergence of this economic behavior. This work is part of the environment development branch of a research program that aims to build human-like artificial general intelligence through multi-agent interactions in simulated societies. By exploring which environment features are needed for the basic phenomena of elementary microeconomics to emerge automatically from learning, we arrive at an environment that differs from those studied in prior multi-agent reinforcement learning work along several dimensions. For example, the model incorporates heterogeneous tastes and physical abilities, and agents negotiate with one another as a grounded form of communication.
Large Language Models are Zero-Shot Reasoners
Pretrained large language models (LLMs) are widely used in many sub-fields of natural language processing (NLP) and generally known as excellent few-shot learners with task-specific exemplars. Notably, chain of thought (CoT) prompting, a recent technique for eliciting complex multi-step reasoning through step-by-step answer examples, achieved state-of-the-art performance in arithmetic and symbolic reasoning, difficult system-2 tasks that do not follow the standard scaling laws for LLMs. While these successes are often attributed to LLMs' ability for few-shot learning, we show that LLMs are decent zero-shot reasoners by simply adding ``Let's think step by step'' before each answer. Experimental results demonstrate that our Zero-shot-CoT, using the same single prompt template, significantly outperforms zero-shot LLM performances on diverse benchmark reasoning tasks including arithmetic (MultiArith, GSM8K, AQUA-RAT, SVAMP), symbolic reasoning (Last Letter, Coin Flip), and other logical reasoning tasks (Date Understanding, Tracking Shuffled Objects), without any hand-crafted few-shot examples, e.g. increasing the accuracy on MultiArith from 17.7% to 78.7% and GSM8K from 10.4% to 40.7% with an off-the-shelf 175B parameter model. The versatility of this single prompt across very diverse reasoning tasks hints at untapped and understudied fundamental zero-shot capabilities of LLMs, suggesting high-level, multi-task broad cognitive capabilities may be extracted through simple prompting. We hope our work not only serves as the minimal strongest zero-shot baseline for the challenging reasoning benchmarks, but also highlights the importance of carefully exploring and analyzing the enormous zero-shot knowledge hidden inside LLMs before crafting finetuning datasets or few-shot exemplars.
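The two-stage prompting pipeline the abstract describes, first eliciting a reasoning chain with "Let's think step by step" and then extracting the answer, can be sketched as follows. The stub model and helper names are invented; any text-completion API could be substituted for `call_llm`:

```python
def zero_shot_cot_prompts(question, answer_trigger="Therefore, the answer is"):
    """Build the two Zero-shot-CoT prompts: stage 1 elicits a reasoning
    chain, stage 2 appends that chain plus an answer-extraction trigger."""
    stage1 = f"Q: {question}\nA: Let's think step by step."

    def stage2(reasoning):
        return f"{stage1} {reasoning}\n{answer_trigger}"

    return stage1, stage2

def run_zero_shot_cot(question, call_llm):
    """call_llm: any function mapping a prompt string to a completion."""
    stage1, stage2 = zero_shot_cot_prompts(question)
    reasoning = call_llm(stage1)        # model writes its step-by-step chain
    return call_llm(stage2(reasoning))  # model completes with the answer

# A stub stands in for a real model, purely to show the mechanics.
def stub_llm(prompt):
    if prompt.endswith("Let's think step by step."):
        return "There were 5 apples and 3 were eaten, so 5 - 3 = 2 remain."
    return " 2"

answer = run_zero_shot_cot("5 apples, 3 eaten. How many remain?", stub_llm)
```

The key point from the paper is that no few-shot exemplars appear anywhere: the same two templates are reused across every benchmark.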
Facing Up to the Problem of Consciousness
David J. Chalmers
Philosophy Program
Research School of Social Sciences
Australian National University
Introduction
Consciousness poses the most baffling problems in the science of the mind. There is nothing that we know more intimately than conscious experience, but there is nothing that is harder to explain. All sorts of mental phenomena have yielded to scientific investigation in recent years, but consciousness has stubbornly resisted. Many have tried to explain it, but the explanations always seem to fall short of the target. Some have been led to suppose that the problem is intractable, and that no good explanation can be given.
To make progress on the problem of consciousness, we have to confront it directly. In this paper, I first isolate the truly hard part of the problem, separating it from more tractable parts and giving an account of why it is so difficult to explain. I critique some recent work that uses reductive methods to address consciousness, and argue that these methods inevitably fail to come to grips with the hardest part of the problem. Once this failure is recognized, the door to further progress is opened. In the second half of the paper, I argue that if we move to a new kind of nonreductive explanation, a naturalistic account of consciousness can be given.
I put forward my own candidate for such an account: a nonreductive theory based on principles of structural coherence and organizational invariance and a double-aspect view of information.
Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design
Michael Dennis, Natasha Jaques, Eugene Vinitsky, Alexandre Bayen, Stuart Russell, Andrew Critch, Sergey Levine
Abstract
A wide range of reinforcement learning (RL) problems — including robustness, transfer learning, unsupervised RL, and emergent complexity — require specifying a distribution of tasks or environments in which a policy will be trained. However, creating a useful distribution of environments is error prone, and takes a significant amount of developer time and effort. We propose Unsupervised Environment Design (UED) as an alternative paradigm, where developers provide environments with unknown parameters, and these parameters are used to automatically produce a distribution over valid, solvable environments. Existing approaches to automatically generating environments suffer from common failure modes: domain randomization cannot generate structure or adapt the difficulty of the environment to the agent’s learning progress, and minimax adversarial training leads to worst-case environments that are often unsolvable. To generate structured, solvable environments for our protagonist agent, we introduce a second, antagonist agent that is allied with the environment-generating adversary. The adversary is motivated to generate environments which maximize regret, defined as the difference between the protagonist and antagonist agent’s return. We call our technique Protagonist Antagonist Induced Regret Environment Design (PAIRED). Our experiments demonstrate that PAIRED produces a natural curriculum of increasingly complex environments, and PAIRED agents achieve higher zero-shot transfer performance when tested in highly novel environments.
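The regret objective at the heart of PAIRED fits in a few lines. The sketch below uses one common form of the estimate, the best antagonist episode minus the protagonist's average return on the same generated environment; the surrounding training loop and agent architectures are omitted:

```python
def paired_regret(protagonist_returns, antagonist_returns):
    """Regret signal driving the environment adversary in PAIRED: the gap
    between the best antagonist episode and the protagonist's mean return
    on the same generated environment."""
    return (max(antagonist_returns)
            - sum(protagonist_returns) / len(protagonist_returns))

# An unsolvable environment yields zero regret (everyone fails), so the
# adversary gains nothing from it; a solvable-but-hard environment yields
# positive regret, steering generation toward the frontier of solvability.
hard_regret = paired_regret(protagonist_returns=[0.1, 0.2],
                            antagonist_returns=[0.9, 0.7])
unsolvable_regret = paired_regret(protagonist_returns=[0.0, 0.0],
                                  antagonist_returns=[0.0, 0.0])
```

The adversary maximizes this quantity, the protagonist minimizes it, and the antagonist maximizes its own return, which is what produces the curriculum of structured, solvable, progressively harder environments.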