cs.LG cs.AI stat.ML
Selective Credit Assignment
Authors: Veronica Chelu, Diana Borsa, Doina Precup, Hado van Hasselt
Abstract: Efficient credit assignment is essential for reinforcement learning algorithms in both prediction and control settings. We describe a unified view on temporal-difference algorithms for selective credit assignment. These selective algorithms apply weightings to quantify the contribution of learning updates. We present insights into applying weightings to value-based learning and planning algorithms, and describe their role in mediating the backward credit distribution in prediction and control. Within this space, we identify some existing online learning algorithms that can assign credit selectively as special cases, as well as add new algorithms that assign credit backward in time counterfactually, allowing credit to be assigned off-trajectory and off-policy.
Submitted 19 February, 2022; originally announced February 2022.
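The core mechanism the abstract describes, scaling each learning update by a weighting that quantifies its contribution, can be sketched in a tabular TD(0) setting. This is a minimal illustration only; the function names and the choice of TD(0) are assumptions, not the paper's algorithm:

```python
import numpy as np

def weighted_td0(transitions, n_states, weight_fn, alpha=0.1, gamma=0.9):
    """Tabular TD(0) where each update is scaled by a per-transition weighting.

    transitions: list of (s, r, s_next) tuples from a trajectory.
    weight_fn:   maps (s, r, s_next) to a nonnegative weighting that
                 modulates how much credit this transition receives.
    """
    v = np.zeros(n_states)
    for s, r, s_next in transitions:
        td_error = r + gamma * v[s_next] - v[s]
        v[s] += alpha * weight_fn(s, r, s_next) * td_error  # selective update
    return v

# A uniform weighting of 1 recovers ordinary TD(0); a weighting of 0
# masks a transition out of learning entirely.
traj = [(0, 0.0, 1), (1, 1.0, 2), (2, 0.0, 0)]
v_uniform = weighted_td0(traj, 3, lambda s, r, sn: 1.0)
v_masked = weighted_td0(traj, 3, lambda s, r, sn: 0.0)
```

State-dependent or trajectory-dependent choices of `weight_fn` are what make the credit assignment selective, e.g. off-policy corrections or masks over states of interest.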
cs.LG cs.AI stat.ML
Chaining Value Functions for Off-Policy Learning
Authors: Simon Schmitt, John Shawe-Taylor, Hado van Hasselt
Abstract: To accumulate knowledge and improve its policy of behaviour, a reinforcement learning agent can learn `off-policy' about policies that differ from the policy used to generate its experience. This is important to learn counterfactuals, or because the experience was generated out of its own control. However, off-policy learning is non-trivial, and standard reinforcement-learning algorithms can be unstable and divergent. In this paper we discuss a novel family of off-policy prediction algorithms which are convergent by construction. The idea is to first learn on-policy about the data-generating behaviour, and then bootstrap an off-policy value estimate on this on-policy estimate, thereby constructing a value estimate that is partially off-policy. This process can be repeated to build a chain of value functions, each time bootstrapping a new estimate on the previous estimate in the chain. Each step in the chain is stable and hence the complete algorithm is guaranteed to be stable. Under mild conditions this comes arbitrarily close to the off-policy TD solution when we increase the length of the chain. Hence it can compute the solution even in cases where off-policy TD diverges. We prove that the proposed scheme is convergent and corresponds to an iterative decomposition of the inverse key matrix. Furthermore it can be interpreted as estimating a novel objective -- that we call a `k-step expedition' -- of following the target policy for finitely many steps before continuing indefinitely with the behaviour policy. Empirically we evaluate the idea on challenging MDPs such as Baird's counterexample and observe favourable results.
Submitted 2 February, 2022; v1 submitted 17 January, 2022; originally announced January 2022.
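The chaining construction can be illustrated in matrix form on a small MDP: link 0 is the on-policy solution for the behaviour policy, and each later link applies one target-policy backup to the previous link, matching the `k-step expedition' objective. The exact linear-algebra solves below stand in for the paper's sample-based TD learning; the example MDP is invented for illustration:

```python
import numpy as np

def chained_values(P_mu, r_mu, P_pi, r_pi, gamma, chain_length):
    """Build the value-function chain: link 0 is the on-policy solution for
    the behaviour policy mu; each later link bootstraps one target-policy
    backup on the previous link, so link k values following pi for k steps
    and then mu forever after."""
    n = len(r_mu)
    v = np.linalg.solve(np.eye(n) - gamma * P_mu, r_mu)  # on-policy base
    chain = [v]
    for _ in range(chain_length):
        v = r_pi + gamma * P_pi @ v  # bootstrap on the previous link
        chain.append(v)
    return chain

# Two-state example: as the chain grows, the last link approaches the
# off-policy TD solution v_pi = (I - gamma * P_pi)^-1 r_pi.
P_mu = np.array([[0.5, 0.5], [0.5, 0.5]])
P_pi = np.array([[0.0, 1.0], [1.0, 0.0]])
r_mu = np.array([0.0, 0.0])
r_pi = np.array([1.0, 0.0])
chain = chained_values(P_mu, r_mu, P_pi, r_pi, gamma=0.9, chain_length=100)
v_pi = np.linalg.solve(np.eye(2) - 0.9 * P_pi, r_pi)
```

Each step of the loop is an on-policy-style backup onto a fixed, already-computed estimate, which is why every link (and hence the whole chain) is stable even where direct off-policy TD would diverge.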
Quantifying agent impacts on contact sequences in social interactions
Human social behavior plays a crucial role in how pathogens like SARS-CoV-2 or fake news spread in a population. Social interactions determine the contact network among individuals, while spreading, requiring individual-to-individual transmission, takes place on top of the network. Studying the topological aspects of a contact network, therefore, not only has the potential of leading to valuable insights into how the behavior of individuals impacts spreading phenomena, but it may also open up possibilities for devising effective behavioral interventions. Because of the temporal nature of interactions - since the topology of the network, containing who is in contact with whom, when, for how long, and in which precise sequence, varies (rapidly) in time - analyzing them requires developing network methods and metrics that respect temporal variability, in contrast to those developed for static (i.e., time-invariant) networks. Here, by means of event mapping, we propose a method to quantify how quickly agents mingle by transforming temporal network data of agent contacts. We define a novel measure called 'contact sequence centrality', which quantifies the impact of an individual on the contact sequences, reflecting the individual's behavioral potential for spreading. Comparing contact sequence centrality across agents allows for ranking the impact of agents and identifying potential 'behavioral super-spreaders'. The method is applied to social interaction data collected at an art fair in Amsterdam. We relate the measure to the existing network metrics, both temporal and static, and find that (mostly at longer time scales) traditional metrics lose their resemblance to contact sequence centrality. Our work highlights the importance of accounting for the sequential nature of contacts when analyzing social interactions.
https://arxiv.org/abs/2107.01443
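The abstract does not spell out the formula for `contact sequence centrality', so the following is only a hypothetical illustration of the underlying idea: measure an agent's impact by how much time-respecting reachability among the other agents collapses when that agent's contacts are removed. All names here are invented for the sketch:

```python
def reachability(events, agents):
    """Pairs (a, b) such that b is reachable from a through a
    time-respecting (chronological) sequence of contacts."""
    reach = {a: {a} for a in agents}
    for t, i, j in sorted(events):  # events are (time, agent, agent)
        can_reach_i = [a for a in agents if i in reach[a]]
        can_reach_j = [a for a in agents if j in reach[a]]
        for a in can_reach_i:  # contacts are undirected: reaching i
            reach[a].add(j)    # by time t now also reaches j, and
        for a in can_reach_j:  # vice versa
            reach[a].add(i)
    return {(a, b) for a in agents for b in reach[a] if b != a}

def sequence_impact(events, agents, x):
    """Hypothetical impact score for agent x: the number of
    time-respecting reachability pairs among the *other* agents that
    vanish when x's contacts are removed from the event sequence."""
    others = [a for a in agents if a != x]
    kept = [e for e in events if x not in (e[1], e[2])]
    full = {(a, b) for (a, b) in reachability(events, agents)
            if a != x and b != x}
    return len(full - reachability(kept, others))

# Chain of contacts A-B, then B-C, then C-D (times 1, 2, 3): removing B
# severs A from C and D, while removing D breaks nothing among the rest.
events = [(1, "A", "B"), (2, "B", "C"), (3, "C", "D")]
agents = ["A", "B", "C", "D"]
```

This toy score already shows why the precise sequence of contacts matters: the same set of edges in a different temporal order would produce different reachability and hence different impact rankings.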
Using a Cognitive Network Model of Moral and Social Beliefs to Explain Belief Change
Scepticism towards childhood vaccines and genetically modified food has grown despite scientific evidence of their safety. Beliefs about scientific issues are difficult to change because they are entrenched within many related moral concerns and beliefs about what others think. We propose a cognitive network model which estimates the relationships, dissonance, and randomness between all related beliefs to derive predictions of the circumstances under which beliefs change. Using a probabilistic nationally representative longitudinal study, we found support for our model's predictions: Randomness of the belief networks decreased over time, for many participants their estimated dissonance related positively to their self-reported dissonance, and individuals who had high estimated dissonance of their belief network were more likely to change their beliefs to reduce this dissonance. This study is the first to combine a unifying predictive model with an experimental intervention and sheds light on dynamics of dissonance reduction leading to belief change.
https://arxiv.org/abs/2102.10751
A Network Perspective on Attitude Strength: Testing the Connectivity Hypothesis
Attitude strength is a key characteristic of attitudes. Strong attitudes are durable and impactful, while weak attitudes are fluctuating and inconsequential. Recently, the Causal Attitude Network (CAN) model was proposed as a comprehensive measurement model of attitudes, which conceptualizes attitudes as networks of causally connected evaluative reactions (i.e., beliefs, feelings, and behavior toward an attitude object). Here, we test the central postulate of the CAN model that highly connected attitude networks correspond to strong attitudes. We use data from the American National Election Studies 1980-2012 on attitudes toward presidential candidates (total n = 18,795). We first show that political interest predicts connectivity of attitude networks toward presidential candidates. Second, we show that connectivity is strongly related to two defining features of strong attitudes - stability of the attitude and the attitude's impact on behavior. We conclude that network theory provides a promising framework to advance the understanding of attitude strength.
Network Structure Explains the Impact of Attitudes on Voting Decisions
Attitudes can have a profound impact on socially relevant behaviours, such as voting. However, this effect is not uniform across situations or individuals, and it is at present difficult to predict whether attitudes will predict behaviour in any given circumstance. Using a network model, we demonstrate that (a) more strongly connected attitude networks have a stronger impact on behaviour, and (b) within any given attitude network, the most central attitude elements have the strongest impact. We test these hypotheses using data on voting and attitudes toward presidential candidates in the US presidential elections from 1980 to 2012. These analyses confirm that the predictive value of attitude networks depends almost entirely on their level of connectivity, with more central attitude elements having stronger impact. The impact of attitudes on voting behaviour can thus be reliably determined before elections take place by using network analyses.
https://arxiv.org/abs/1704.00910
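Both attitude-network papers revolve around two measurable quantities: global connectivity of the network, and the centrality of individual attitude elements. A minimal sketch with simple proxies follows (mean absolute association and node strength; the papers themselves use more refined network estimation):

```python
import numpy as np

def connectivity(assoc):
    """Global connectivity of an attitude network: mean absolute
    association between distinct evaluative reactions (beliefs,
    feelings, behaviours toward the attitude object)."""
    a = np.abs(np.asarray(assoc, dtype=float))
    n = a.shape[0]
    return a[~np.eye(n, dtype=bool)].mean()

def strength_centrality(assoc):
    """Node strength per attitude element: the sum of its absolute
    associations with all other elements. The hypothesis is that
    high-strength elements have the strongest impact on behaviour
    such as voting."""
    a = np.abs(np.asarray(assoc, dtype=float))
    np.fill_diagonal(a, 0.0)
    return a.sum(axis=1)

# A small attitude network over three evaluative reactions.
assoc = [[1.0, 0.8, 0.2],
         [0.8, 1.0, 0.4],
         [0.2, 0.4, 1.0]]
```

Under the connectivity hypothesis, higher `connectivity` marks a stronger (more stable, more behaviour-predictive) attitude, and the element with the largest `strength_centrality` is the best single predictor of behaviour.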
ECE/CS 598 AM: CRYPTOGRAPHY WITH IDEAL FUNCTIONALITIES

Spring 2022
Times: Tu Th 9:30am – 10:50am
Location: 2015 Electrical & Computer Eng Bldg (Jan 25 and later)
Zoom link for synchronous lectures (see Piazza)
Piazza: piazza.com/illinois/spring2022/ece598am
Instructor: Andrew Miller Office hours: Thursdays 3pm-4pm, and by appointment
Teaching Assistant (unofficial): Surya Bakshi Office hours: TBD
The Ideal Functionalities model (or “Universal Composability” (UC)) is considered the gold standard for defining security in many cryptographic tasks, such as multiparty computation and zero knowledge proofs.
It can be considered a unification of property-based definition styles, where instead of describing one property at a time (i.e., one game for confidentiality, one game for integrity, and so on), we give a concrete instance of an idealized program that exhibits all these properties at once. While UC is broadly adopted in cryptography, it has yet to gain traction elsewhere in software engineering and in distributed systems.
The aim of this course is to explore the connections between UC in cryptography versus in other domains like fault tolerant systems, and to see what UC can offer to software engineers concerned with implementing large systems and not just modelling small primitives.
The course will give a self-contained introduction to UC, making use of our research software prototypes, Haskell-SaUCy and Python-SaUCy, which are programming frameworks that implement UC. We’ll then survey the UC-based cryptography literature for a range of cryptographic tasks, including well known applications like key exchange and multiparty computation, as well as more challenging cases like non-interactive primitives and smart contract blockchain protocols. Using the software frameworks as a secret weapon, we’ll try to improve on and simplify prior UC proofs.
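To make the "idealized program" idea concrete, here is a toy ideal functionality for one-shot authenticated message transfer, written in plain Python. The class names and interfaces are invented for this sketch and are not the SaUCy API:

```python
class PassiveAdversary:
    """Records whatever the functionality leaks to it."""
    def __init__(self):
        self.observed = []

    def leak(self, info):
        self.observed.append(info)

class FAuth:
    """Toy ideal functionality in the spirit of UC's authenticated-channel
    functionality. Rather than proving integrity and authenticity game by
    game, the ideal program simply cannot deliver anything other than what
    was sent, so the desired properties hold by construction."""
    def __init__(self, adversary):
        self.adversary = adversary
        self.message = None

    def send(self, sender, receiver, msg):
        # The adversary learns the message (authenticated, not secret)
        # and controls delivery timing, but cannot alter the contents.
        self.message = (sender, receiver, msg)
        self.adversary.leak(self.message)

    def deliver(self):
        # Whenever the adversary schedules delivery, integrity is
        # guaranteed: the stored message comes back unchanged.
        return self.message

adv = PassiveAdversary()
f_auth = FAuth(adv)
f_auth.send("alice", "bob", "hi bob")
```

A real protocol (say, signatures over an insecure channel) is then proven secure by showing it is indistinguishable from this ideal program composed with a simulator.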
The course is built on recent research efforts to provide software tools for UC, ILC (PLDI’19) and is supported by NSF grants #1801321 “Automated Support for Writing High-Assurance Smart Contracts” and #1943499 “CAREER: Composable Programming Abstractions for Secure Distributed Computing and Blockchain Applications.”
Prerequisites:
It is not necessary to have background knowledge of Ideal Functionalities and UC. However, some mathematical maturity and familiarity with cryptography is expected, such as experience writing traditional game-based security proofs.
Reference texts (all freely available online):
- Security and Composition of Cryptographic Protocols: A Tutorial (Canetti 2006). https://eprint.iacr.org/2006/465
- Pragmatic MPC http://securecomputation.org/
- Python-SaUCy https://github.com/amiller/ece598-uc-contracts
- Course slides and course notes from Canetti 2004 http://courses.csail.mit.edu/6.897/spring04/materials.html
Emergent Bartering Behaviour in Multi-Agent Reinforcement Learning
Advances in artificial intelligence often stem from the development of new environments that abstract real-world situations into a form where research can be done conveniently. This paper contributes such an environment based on ideas inspired by elementary Microeconomics. Agents learn to produce resources in a spatially complex world, trade them with one another, and consume those that they prefer. We show that the emergent production, consumption, and pricing behaviors respond to environmental conditions in the directions predicted by supply and demand shifts in Microeconomics. We also demonstrate settings where the agents' emergent prices for goods vary over space, reflecting the local abundance of goods. After the price disparities emerge, some agents then discover a niche of transporting goods between regions with different prevailing prices -- a profitable strategy because they can buy goods where they are cheap and sell them where they are expensive. Finally, in a series of ablation experiments, we investigate how choices in the environmental rewards, bartering actions, agent architecture, and ability to consume tradable goods can either aid or inhibit the emergence of this economic behavior. This work is part of the environment development branch of a research program that aims to build human-like artificial general intelligence through multi-agent interactions in simulated societies. By exploring which environment features are needed for the basic phenomena of elementary microeconomics to emerge automatically from learning, we arrive at an environment that differs from those studied in prior multi-agent reinforcement learning work along several dimensions. For example, the model incorporates heterogeneous tastes and physical abilities, and agents negotiate with one another as a grounded form of communication.
Large Language Models are Zero-Shot Reasoners
Pretrained large language models (LLMs) are widely used in many sub-fields of natural language processing (NLP) and generally known as excellent few-shot learners with task-specific exemplars. Notably, chain of thought (CoT) prompting, a recent technique for eliciting complex multi-step reasoning through step-by-step answer examples, achieved state-of-the-art performance in arithmetic and symbolic reasoning, difficult system-2 tasks that do not follow the standard scaling laws for LLMs. While these successes are often attributed to LLMs' ability for few-shot learning, we show that LLMs are decent zero-shot reasoners by simply adding ``Let's think step by step'' before each answer. Experimental results demonstrate that our Zero-shot-CoT, using the same single prompt template, significantly outperforms zero-shot LLM performances on diverse benchmark reasoning tasks including arithmetic (MultiArith, GSM8K, AQUA-RAT, SVAMP), symbolic reasoning (Last Letter, Coin Flip), and other logical reasoning tasks (Date Understanding, Tracking Shuffled Objects), without any hand-crafted few-shot examples, e.g. increasing the accuracy on MultiArith from 17.7% to 78.7% and GSM8K from 10.4% to 40.7% with an off-the-shelf 175B parameter model. The versatility of this single prompt across very diverse reasoning tasks hints at untapped and understudied fundamental zero-shot capabilities of LLMs, suggesting high-level, multi-task broad cognitive capabilities may be extracted through simple prompting. We hope our work not only serves as the minimal strongest zero-shot baseline for the challenging reasoning benchmarks, but also highlights the importance of carefully exploring and analyzing the enormous zero-shot knowledge hidden inside LLMs before crafting finetuning datasets or few-shot exemplars.
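The two-stage prompting pipeline the abstract describes, first eliciting a reasoning chain with "Let's think step by step" and then extracting the answer, can be sketched as follows. The stub model and helper names are invented; any text-completion API could be substituted for `call_llm`:

```python
def zero_shot_cot_prompts(question, answer_trigger="Therefore, the answer is"):
    """Build the two Zero-shot-CoT prompts: stage 1 elicits a reasoning
    chain, stage 2 appends that chain plus an answer-extraction trigger."""
    stage1 = f"Q: {question}\nA: Let's think step by step."

    def stage2(reasoning):
        return f"{stage1} {reasoning}\n{answer_trigger}"

    return stage1, stage2

def run_zero_shot_cot(question, call_llm):
    """call_llm: any function mapping a prompt string to a completion."""
    stage1, stage2 = zero_shot_cot_prompts(question)
    reasoning = call_llm(stage1)        # model writes its step-by-step chain
    return call_llm(stage2(reasoning))  # model completes with the answer

# A stub stands in for a real model, purely to show the mechanics.
def stub_llm(prompt):
    if prompt.endswith("Let's think step by step."):
        return "There were 5 apples and 3 were eaten, so 5 - 3 = 2 remain."
    return " 2"

answer = run_zero_shot_cot("5 apples, 3 eaten. How many remain?", stub_llm)
```

The key point from the paper is that no few-shot exemplars appear anywhere: the same two templates are reused across every benchmark.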
Facing Up to the Problem of Consciousness
David J. Chalmers
Philosophy Program
Research School of Social Sciences
Australian National University
Introduction
Consciousness poses the most baffling problems in the science of the mind. There is nothing that we know more intimately than conscious experience, but there is nothing that is harder to explain. All sorts of mental phenomena have yielded to scientific investigation in recent years, but consciousness has stubbornly resisted. Many have tried to explain it, but the explanations always seem to fall short of the target. Some have been led to suppose that the problem is intractable, and that no good explanation can be given.
To make progress on the problem of consciousness, we have to confront it directly. In this paper, I first isolate the truly hard part of the problem, separating it from more tractable parts and giving an account of why it is so difficult to explain. I critique some recent work that uses reductive methods to address consciousness, and argue that these methods inevitably fail to come to grips with the hardest part of the problem. Once this failure is recognized, the door to further progress is opened. In the second half of the paper, I argue that if we move to a new kind of nonreductive explanation, a naturalistic account of consciousness can be given.
I put forward my own candidate for such an account: a nonreductive theory based on principles of structural coherence and organizational invariance and a double-aspect view of information.
Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design
Michael Dennis, Natasha Jaques, Eugene Vinitsky, Alexandre Bayen, Stuart Russell, Andrew Critch, Sergey Levine
Abstract
A wide range of reinforcement learning (RL) problems — including robustness, transfer learning, unsupervised RL, and emergent complexity — require specifying a distribution of tasks or environments in which a policy will be trained. However, creating a useful distribution of environments is error prone, and takes a significant amount of developer time and effort. We propose Unsupervised Environment Design (UED) as an alternative paradigm, where developers provide environments with unknown parameters, and these parameters are used to automatically produce a distribution over valid, solvable environments. Existing approaches to automatically generating environments suffer from common failure modes: domain randomization cannot generate structure or adapt the difficulty of the environment to the agent’s learning progress, and minimax adversarial training leads to worst-case environments that are often unsolvable. To generate structured, solvable environments for our protagonist agent, we introduce a second, antagonist agent that is allied with the environment-generating adversary. The adversary is motivated to generate environments which maximize regret, defined as the difference between the protagonist and antagonist agent’s return. We call our technique Protagonist Antagonist Induced Regret Environment Design (PAIRED). Our experiments demonstrate that PAIRED produces a natural curriculum of increasingly complex environments, and PAIRED agents achieve higher zero-shot transfer performance when tested in highly novel environments.
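The regret objective at the heart of PAIRED fits in a few lines. The sketch below uses one common form of the estimate, the best antagonist episode minus the protagonist's average return on the same generated environment; the surrounding training loop and agent architectures are omitted:

```python
def paired_regret(protagonist_returns, antagonist_returns):
    """Regret signal driving the environment adversary in PAIRED: the gap
    between the best antagonist episode and the protagonist's mean return
    on the same generated environment."""
    return (max(antagonist_returns)
            - sum(protagonist_returns) / len(protagonist_returns))

# An unsolvable environment yields zero regret (everyone fails), so the
# adversary gains nothing from it; a solvable-but-hard environment yields
# positive regret, steering generation toward the frontier of solvability.
hard_regret = paired_regret(protagonist_returns=[0.1, 0.2],
                            antagonist_returns=[0.9, 0.7])
unsolvable_regret = paired_regret(protagonist_returns=[0.0, 0.0],
                                  antagonist_returns=[0.0, 0.0])
```

The adversary maximizes this quantity, the protagonist minimizes it, and the antagonist maximizes its own return, which is what produces the curriculum of structured, solvable, progressively harder environments.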