AI Papers Site (爱可可 AI Paper Recommendations)

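Unlike softmax confidence scores, energy scores are provably aligned with the input density: a sample with higher energy can be interpreted as data with a lower probability of occurrence. Energy scores are therefore less susceptible to the overconfidence problem and can substantially improve detection performance. On a CIFAR-10 pre-trained WideResNet, the energy score reduces the average FPR (at 95% TPR) by 18.03% compared with the softmax confidence score. Using a single GPU and a single environment instance, with the same computational budget and training time, DreamerV2 outperforms the top model-free single-GPU agents Rainbow and IQN.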

LG - Machine Learning, CV - Computer Vision, CL - Computation and Language

1、[LG] *Energy-based Out-of-distribution Detection

W Liu, X Wang, J D. Owens, Y Li

[University of California, San Diego & University of California, Davis & University of Wisconsin-Madison]

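This paper replaces the softmax confidence score with an energy-based score for out-of-distribution (OOD) detection. The core idea is a non-probabilistic energy function that assigns lower values to in-distribution data and higher values to out-of-distribution data. Unlike softmax confidence scores, energy scores are provably aligned with the input density: a sample with higher energy can be interpreted as data with a lower probability of occurrence. Energy scores are therefore less susceptible to the overconfidence problem and can substantially improve detection performance. The energy score can be derived from a purely discriminative classifier, with no explicit reliance on a density estimator, which sidesteps the optimization involved in training generative models such as JEM. Within this framework, energy can be used flexibly as a scoring function for any pre-trained neural classifier, or as a trainable cost function that explicitly shapes the energy surface for OOD detection. On a CIFAR-10 pre-trained WideResNet, the energy score reduces the average FPR (at 95% TPR) by 18.03% compared with the softmax confidence score.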

Determining whether inputs are out-of-distribution (OOD) is an essential building block for safely deploying machine learning models in the open world. However, previous methods relying on the softmax confidence score suffer from overconfident posterior distributions for OOD data. We propose a unified framework for OOD detection that uses an energy score. We show that energy scores better distinguish in- and out-of-distribution samples than the traditional approach using the softmax scores. Unlike softmax confidence scores, energy scores are theoretically aligned with the probability density of the inputs and are less susceptible to the overconfidence issue. Within this framework, energy can be flexibly used as a scoring function for any pre-trained neural classifier as well as a trainable cost function to shape the energy surface explicitly for OOD detection. On a CIFAR-10 pre-trained WideResNet, using the energy score reduces the average FPR (at TPR 95%) by 18.03% compared to the softmax confidence score. With energy-based training, our method outperforms the state-of-the-art on common benchmarks.
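
To make the contrast concrete, here is a minimal sketch of an energy score computed from classifier logits, assuming the score takes the negative log-sum-exp form E(x) = -T * log(sum_k exp(f_k(x) / T)). The function names, the temperature default, and the toy logits are illustrative assumptions, not code from the paper.

```python
import numpy as np

def energy_score(logits, temperature=1.0):
    """Energy per row of logits: -T * log(sum_k exp(logit_k / T))."""
    z = np.asarray(logits, dtype=float) / temperature
    m = z.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    lse = m.squeeze(-1) + np.log(np.exp(z - m).sum(axis=-1))
    return -temperature * lse

def max_softmax_score(logits):
    """Softmax confidence: the largest class probability per row."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max(axis=-1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return p.max(axis=-1)

# Made-up logits: the second row is the first row shifted down by 8.
# Softmax is invariant to that shift, so its confidence cannot tell them apart,
# while the energy score rises by exactly the size of the shift.
large_logits = np.array([[10.0, 0.0, 0.0]])
small_logits = large_logits - 8.0

same_confidence = np.isclose(max_softmax_score(large_logits),
                             max_softmax_score(small_logits))
energy_gap = energy_score(small_logits) - energy_score(large_logits)
```

In use, an input would be flagged as OOD when its energy exceeds a threshold chosen on held-out in-distribution data, for example so that 95% of in-distribution inputs fall below it.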

https://weibo.com/1402400261/JoDln8nmD

2、[LG] *Mastering Atari with Discrete World Models

D Hafner, T Lillicrap, M Norouzi, J Ba

[Google Brain & DeepMind & University of Toronto]

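DreamerV2, a reinforcement learning agent built on a discrete world model, reaches human-level performance on Atari games. It learns how to act purely from predictions in the compact latent space of a powerful world model; the world model uses discrete representations and is trained separately from the policy. Using a single GPU and a single environment instance, with the same computational budget and training time, DreamerV2 outperforms the top model-free single-GPU agents Rainbow and IQN.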

Intelligent agents need to generalize from past experience to achieve goals in complex environments. World models facilitate such generalization and allow learning behaviors from imagined outcomes to increase sample-efficiency. While learning world models from image inputs has recently become feasible for some tasks, modeling Atari games accurately enough to derive successful behaviors has remained an open challenge for many years. We introduce DreamerV2, a reinforcement learning agent that learns behaviors purely from predictions in the compact latent space of a powerful world model. The world model uses discrete representations and is trained separately from the policy. DreamerV2 constitutes the first agent that achieves human-level performance on the Atari benchmark of 55 tasks by learning behaviors inside a separately trained world model. With the same computational budget and wall-clock time, DreamerV2 reaches 200M frames and exceeds the final performance of the top single-GPU agents IQN and Rainbow.
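
As a rough illustration of what a discrete latent representation can look like, the sketch below samples a latent state as a vector of independent categorical variables encoded as one-hot vectors. The sizes (32 variables with 32 classes each), the function name, and the use of the Gumbel-max trick are assumptions for illustration; the gradient handling needed to train such latents is omitted.

```python
import numpy as np

def sample_discrete_latent(logits, rng):
    """Sample one class per categorical variable and return concatenated one-hots.

    logits: array of shape (n_vars, n_classes).
    """
    logits = np.asarray(logits, dtype=float)
    # Gumbel-max trick: argmax(logits + Gumbel noise) is a draw from softmax(logits).
    noise = rng.gumbel(size=logits.shape)
    idx = (logits + noise).argmax(axis=-1)
    one_hot = np.eye(logits.shape[-1])[idx]  # (n_vars, n_classes)
    return one_hot.reshape(-1)               # flat latent vector

rng = np.random.default_rng(0)
latent = sample_discrete_latent(np.zeros((32, 32)), rng)  # uniform logits as a toy input
```

A policy or value network would then take this flat vector, possibly together with a deterministic recurrent state, as its input.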

https://weibo.com/1402400261/JoDuK4E6R

3、[CL] *LEGAL-BERT: The Muppets straight out of Law School

I Chalkidis, M Fergadiotis, P Malakasiotis, N Aletras, I Androutsopoulos

[Athens University of Economics and Business & University of Sheffield]

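LEGAL-BERT is a family of BERT models specialised for the legal domain. The paper explores approaches for applying BERT to downstream legal tasks, with the aim of assisting legal NLP research, computational law, and legal technology applications. The best strategy for porting BERT to a new domain may vary: use the original BERT out of the box, adapt BERT through additional pre-training on domain-specific corpora, or pre-train BERT from scratch on domain-specific corpora. The paper also argues that an extended grid search over hyper-parameters should always be used when fine-tuning BERT for end tasks, because it has a major impact on performance.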

BERT has achieved impressive performance in several NLP tasks. However, there has been limited investigation on its adaptation guidelines in specialised domains. Here we focus on the legal domain, where we explore several approaches for applying BERT models to downstream legal tasks, evaluating on multiple datasets. Our findings indicate that the previous guidelines for pre-training and fine-tuning, often blindly followed, do not always generalize well in the legal domain. Thus we propose a systematic investigation of the available strategies when applying BERT in specialised domains. These are: (a) use the original BERT out of the box, (b) adapt BERT by additional pre-training on domain-specific corpora, and (c) pre-train BERT from scratch on domain-specific corpora. We also propose a broader hyper-parameter search space when fine-tuning for downstream tasks and we release LEGAL-BERT, a family of BERT models intended to assist legal NLP research, computational law, and legal technology applications.
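
The broader hyper-parameter search mentioned above can be sketched as a plain grid search. This is a minimal sketch: the search-space values, the helper names, and the `toy_dev_score` stand-in (which replaces an actual fine-tuning run and development-set evaluation) are all hypothetical and not taken from the paper.

```python
from itertools import product

# Hypothetical search space; the values are illustrative only.
search_space = {
    "learning_rate": [1e-5, 2e-5, 3e-5, 4e-5, 5e-5],
    "batch_size": [4, 8, 16, 32],
    "dropout": [0.1, 0.2],
}

def grid(space):
    """Enumerate every combination of hyper-parameter values as a dict."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*(space[k] for k in keys))]

def select_best(configs, evaluate):
    """Return the configuration with the highest development-set score."""
    return max(configs, key=evaluate)

def toy_dev_score(cfg):
    """Stand-in for 'fine-tune with cfg, then score on the dev set' (made up)."""
    return (-abs(cfg["learning_rate"] - 3e-5) * 1e4
            - abs(cfg["batch_size"] - 16) / 100
            - cfg["dropout"])

configs = grid(search_space)
best = select_best(configs, toy_dev_score)
```

In a real setup `toy_dev_score` would be replaced by a function that fine-tunes the model with the given configuration and returns its development-set metric.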

https://weibo.com/1402400261/JoDADieQM

4、[LG] Gradient Flow in Sparse Neural Networks and How Lottery Tickets Win

U Evci, Y A. Ioannou, C Keskin, Y Dauphin

[Google]

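This paper studies gradient flow in sparse neural networks to improve their initialization, and tries to answer two questions: (1) why does training unstructured sparse networks from random initialization perform poorly, and (2) why are Lottery Tickets (LTs) and Dynamic Sparse Training (DST) the exceptions? Experiments show that randomly initialized unstructured sparse networks have poor gradient flow at initialization, and the authors propose an alternative initialization that scales the initial variance of each neuron separately. Compared with traditional sparse training methods, DST methods significantly improve gradient flow during training. LTs do not improve gradient flow; their success lies in re-learning the pruning solution they are derived from.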

Sparse Neural Networks (NNs) can match the generalization of dense NNs using a fraction of the compute/storage for inference, and also have the potential to enable efficient training. However, naively training unstructured sparse NNs from random initialization results in significantly worse generalization, with the notable exception of Lottery Tickets (LTs) and Dynamic Sparse Training (DST). In this work, we attempt to answer: (1) why training unstructured sparse networks from random initialization performs poorly and; (2) what makes LTs and DST the exceptions? We show that sparse NNs have poor gradient flow at initialization and propose a modified initialization for unstructured connectivity. Furthermore, we find that DST methods significantly improve gradient flow during training over traditional sparse training methods. Finally, we show that LTs do not improve gradient flow, rather their success lies in re-learning the pruning solution they are derived from - however, this comes at the cost of learning novel solutions.
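
One way to picture an initialization that scales the variance of each neuron separately is to base each neuron's scale on the number of connections that survive in its own row of the sparsity mask. The sketch below assumes a He-style variance of 2 / fan_in; the function name and that specific scaling rule are assumptions for illustration and may differ from the paper's exact formulation.

```python
import numpy as np

def sparse_he_init(mask, rng):
    """Initialize a masked weight matrix with a per-neuron scale.

    mask: binary array of shape (fan_out, fan_in); row i marks the inputs
    that output neuron i keeps. Each row uses its own surviving fan-in
    instead of the dense layer width.
    """
    mask = np.asarray(mask, dtype=float)
    fan_in = mask.sum(axis=1, keepdims=True)      # surviving inputs per neuron
    std = np.sqrt(2.0 / np.maximum(fan_in, 1.0))  # guard neurons with no inputs
    return rng.standard_normal(mask.shape) * std * mask

rng = np.random.default_rng(0)
mask = np.zeros((2, 5000))
mask[0, :] = 1.0      # a dense neuron
mask[1, :1000] = 1.0  # a neuron that keeps 20% of its inputs
weights = sparse_he_init(mask, rng)
row_energy = (weights ** 2).sum(axis=1)  # close to 2 for both neurons
```

With a single layer-wide scale based on the dense fan-in, the sparser neuron would instead start with a much smaller sum of squared weights, which is one intuition for weak signal propagation at initialization.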

https://weibo.com/1402400261/JoDHMcClb

5、[LG] Online Safety Assurance for Deep Reinforcement Learning

N H. Rotman, M Schapira, A Tamar

[Hebrew University of Jerusalem & Technion]

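Online safety assurance for deep reinforcement learning. Safely deploying learning-driven systems requires determining, in real time, whether system behavior is coherent (that is, whether the operational environment matches the training environment), so that the system can default to a reasonable heuristic when it is not. The authors call this the online safety assurance problem (OSAP). The paper presents three approaches to quantifying decision uncertainty, which differ in the signal used to infer uncertainty. In the video-streaming case study, deep RL beats other approaches when the operational and training environments match, but is outperformed by simple heuristics when they differ, which motivates switching to a default policy once decision uncertainty is detected.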

Recently, deep learning has been successfully applied to a variety of networking problems. A fundamental challenge is that when the operational environment for a learning-augmented system differs from its training environment, such systems often make badly informed decisions, leading to bad performance. We argue that safely deploying learning-driven systems requires being able to determine, in real time, whether system behavior is coherent, for the purpose of defaulting to a reasonable heuristic when this is not so. We term this the online safety assurance problem (OSAP). We present three approaches to quantifying decision uncertainty that differ in terms of the signal used to infer uncertainty. We illustrate the usefulness of online safety assurance in the context of the proposed deep reinforcement learning (RL) approach to video streaming. While deep RL for video streaming bests other approaches when the operational and training environments match, it is dominated by simple heuristics when the two differ. Our preliminary findings suggest that transitioning to a default policy when decision uncertainty is detected is key to enjoying the performance benefits afforded by leveraging ML without compromising on safety.
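
The fallback idea can be sketched as a small decision rule. This sketch assumes one possible uncertainty signal, namely disagreement among an ensemble of policies; the function names, the distance measure, and the threshold value are hypothetical choices for illustration, not the paper's definitions.

```python
import numpy as np

def ensemble_disagreement(action_probs):
    """Mean total variation distance between each member and the ensemble mean.

    action_probs: array of shape (n_members, n_actions), one action
    distribution per ensemble member for the current state.
    """
    p = np.asarray(action_probs, dtype=float)
    mean = p.mean(axis=0)
    return float(0.5 * np.abs(p - mean).sum(axis=1).mean())

def safe_action(action_probs, heuristic_action, threshold=0.2):
    """Use the learned policy unless the ensemble disagrees too much."""
    if ensemble_disagreement(action_probs) > threshold:
        return heuristic_action, "default"
    return int(np.asarray(action_probs).mean(axis=0).argmax()), "learned"

# Made-up distributions: members agree in the first state and disagree in the second.
agree = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.8, 0.1, 0.1]]
disagree = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]

in_distribution_choice = safe_action(agree, heuristic_action=2)
shifted_choice = safe_action(disagree, heuristic_action=2)
```

The threshold would need to be tuned on data from the training environment so that the learned policy is rarely overridden when the two environments match.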

https://weibo.com/1402400261/JoDOCchd8