Software

Influence Theory

Authors: Jillian Fisher

This repository contains code that computes the influence function and the most influential subsets for two transformer language model experiments. It also includes code that computes the influence function for two linear models, using both synthetic and economic data.

Authorship Obfuscation

Authors: Jillian Fisher

This repository contains code to implement JAMDEC, a lightweight, user-controlled, unsupervised inference-time algorithm for authorship obfuscation that can be applied to arbitrary text. We provide code that demonstrates JAMDEC on two obfuscation datasets composed of academic articles (AMT) and diary-style writings (BLOG).

Pluralistic Alignment

Authors: Jillian Fisher

This repository contains code for the experiments in “Roadmap to Pluralistic Alignment”. Specifically, it explores the hypothesis that current LLM alignment techniques reduce distributional pluralism with respect to diverse populations. To test the extent to which this hypothesis holds, we compare a suite of vanilla pretrained LLMs with their “aligned” (RLHFed, finetuned) counterparts on two diverse multiple-choice datasets: GlobalOpinionQA (Durmus et al., 2023a) and the Machine Personality Inventory (MPI) (Jiang et al., 2022a). We then compare model distributions, averaged over 5 samples, to the human target distributions using Jensen-Shannon distance.
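
The comparison step can be illustrated with a minimal sketch. It is not taken from the repository: the function names are hypothetical, the answer distributions are made-up placeholders, and it assumes that the sampled model distributions are averaged first and the average is then compared with the human distribution. It also assumes base-2 logarithms, which bound the distance between 0 and 1.

```python
import math


def average_distribution(samples):
    """Average several answer distributions option by option (assumed aggregation)."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def kl_divergence(p, q, base=2.0):
    """Kullback-Leibler divergence KL(p || q); terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi, base) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q, base=2.0):
    """Jensen-Shannon distance: square root of the Jensen-Shannon divergence."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m, base) + 0.5 * kl_divergence(q, m, base)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(jsd, 0.0))


# Placeholder data: 5 sampled model distributions over 4 answer options.
model_samples = [
    [0.70, 0.20, 0.05, 0.05],
    [0.65, 0.25, 0.05, 0.05],
    [0.75, 0.15, 0.05, 0.05],
    [0.70, 0.20, 0.05, 0.05],
    [0.70, 0.20, 0.05, 0.05],
]
human_target = [0.40, 0.30, 0.20, 0.10]  # placeholder human distribution

model_mean = average_distribution(model_samples)
distance = js_distance(model_mean, human_target)
print(f"Jensen-Shannon distance: {distance:.3f}")
```

A distance of 0 means the two distributions are identical, and with base-2 logarithms a distance of 1 means they place their mass on disjoint options. Averaging the per-sample distances instead of the distributions is an equally plausible reading of the description and would give a slightly different number.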

StyleRemix: Authorship Obfuscation Method

Authors: Jillian Fisher

This repository contains code to implement StyleRemix, an adaptive and interpretable obfuscation method that perturbs specific, fine-grained style elements of the original input text. We provide code which demonstrates StyleRemix on four datasets: presidential speeches, fiction writing, academic articles, and diary-style writings.