I am a Statistics + CSE PhD student at the University of Washington researching AI alignment, safety, and societal impact.
Advised by Yejin Choi in the Paul G. Allen School of Computer Science and Engineering and Thomas Richardson in the Statistics Department, I focus on projects that leverage statistical tools to advance methods and insights in human-centric NLP challenges. Currently, my work centers on alignment, controllable generation, and the impact of AI on society.
Jillian Fisher
Preprints
Fisher, J., Feng, S., Aron, R., Richardson, T., Choi, Y., Fisher, D., Pan, J., Tsvetkov, Y., & Reinecke, K. (2024). Biased AI can Influence Political Decision-Making. arxiv.org/abs/2410.06415.
@unpublished{bias_llm,
title = {Biased AI can Influence Political Decision-Making},
author = {Jillian Fisher, Shangbin Feng, Robert Aron, Thomas Richardson, Yejin Choi, Daniel W. Fisher, Jennifer Pan, Yulia Tsvetkov, Katharina Reinecke},
year = {2024},
note = {https://arxiv.org/abs/2410.06415},
file = {bias_llm.pdf}
}
As modern AI models become integral to everyday tasks, concerns about their inherent biases and their potential impact on human decision-making have emerged. While bias in models is well documented, less is known about how these biases influence human decisions. This paper presents two interactive experiments investigating the effects of partisan bias in AI language models on political decision-making. Participants interacted freely with a liberal-biased, conservative-biased, or unbiased control model while completing political decision-making tasks. We found that participants exposed to politically biased models were significantly more likely to adopt opinions and make decisions aligning with the AI’s bias, regardless of their personal political partisanship. However, we also discovered that prior knowledge about AI could lessen the impact of the bias, highlighting the possible importance of AI education for robust bias mitigation. Our findings not only highlight the critical effects of interacting with biased AI and its ability to impact public discourse and political conduct, but also highlight potential techniques for mitigating these risks in the future.
Conference Proceedings
Fisher, J., Hallinan, S., Lu, X., Gordon, M., Harchaoui, Z., & Choi, Y. StyleRemix: Interpretable Authorship Obfuscation via Distillation and Perturbation of Style Elements. EMNLP (2024). http://www.arxiv.org/abs/2408.15666
@unpublished{styleremix,
title = {StyleRemix: Interpretable Authorship Obfuscation via Distillation and Perturbation of Style Elements},
author = {Jillian Fisher, Skyler Hallinan, Ximing Lu, Mitchell Gordon, Zaid Harchaoui, Yejin Choi},
journal = {EMNLP},
year = {2024},
note = {http://www.arxiv.org/abs/2408.15666},
file = {styleremix.pdf}
}
Authorship obfuscation, rewriting a text to intentionally obscure the identity of the author, is an important but challenging task. Current methods using large language models (LLMs) lack interpretability and controllability, often ignoring author-specific stylistic features, resulting in less robust performance overall.
To address this, we develop StyleRemix, an adaptive and interpretable obfuscation method that perturbs specific, fine-grained style elements of the original input text. StyleRemix uses pre-trained Low Rank Adaptation (LoRA) modules to rewrite an input specifically along various stylistic axes (e.g., formality and length) while maintaining low computational cost. StyleRemix outperforms state-of-the-art baselines and much larger LLMs in a variety of domains as assessed by both automatic and human evaluation.
Additionally, we release AuthorMix, a large set of 30K high-quality, long-form texts from a diverse set of 14 authors and 4 domains, and DiSC, a parallel corpus of 1,500 texts spanning seven style axes in 16 unique directions.
Feng, S., Sorensen, T., Liu, Y., Fisher, J., Park, C. Y., Choi, Y., & Tsvetkov, Y. Modular Pluralism: Pluralistic Alignment via Multi-LLM Collaboration. EMNLP (2024). https://arxiv.org/abs/2406.15951.
@unpublished{modular_pluralism,
title = {Modular Pluralism: Pluralistic Alignment via Multi-LLM Collaboration},
author = {Shangbin Feng, Taylor Sorensen, Yuhan Liu, Jillian Fisher, Chan Young Park, Yejin Choi, Yulia Tsvetkov},
journal = {EMNLP},
year = {2024},
note = {https://arxiv.org/abs/2406.15951},
file = {modular_pluralism.pdf}
}
While existing alignment paradigms have been integral in developing large language models (LLMs), LLMs often learn an averaged human preference and struggle to model diverse preferences across cultures, demographics, and communities. We propose Modular Pluralism, a modular framework based on multi-LLM collaboration for pluralistic alignment: it "plugs into" a base LLM a pool of smaller but specialized community LMs, where models collaborate in distinct modes to flexibly support three modes of pluralism: Overton, steerable, and distributional. Modular Pluralism is uniquely compatible with black-box LLMs and offers the modular control of adding new community LMs for previously underrepresented communities. We evaluate Modular Pluralism with six tasks and four datasets featuring questions/instructions with value-laden and perspective-informed responses. Extensive experiments demonstrate that Modular Pluralism advances the three pluralism objectives across six black-box and open-source LLMs. Further analysis reveals that LLMs are generally faithful to the inputs from smaller community LMs, allowing seamless patching by adding a new community LM to better cover previously underrepresented communities.
Sorensen, T., Moore, J., Fisher, J., Gordon, M., Mireshghallah, N., Rytting, C., Ye, A., Jiang, L., Lu, X., Dziri, N., Althoff, T., & Choi, Y. A Roadmap to Pluralistic Alignment. ICML (2024). https://arxiv.org/abs/2402.05070
@unpublished{pluralism_roadmap,
title = {A Roadmap to Pluralistic Alignment},
author = {Sorensen, Taylor and Moore, Jared and Fisher, Jillian and Gordon, Mitchell and Mireshghallah, Niloofar and Rytting, Christopher Michael and Ye, Andre and Jiang, Liwei and Lu, Ximing and Dziri, Nouha and Althoff, Tim and Choi, Yejin},
journal = {ICML},
year = {2024},
note = {https://arxiv.org/abs/2402.05070},
file = {roadmap_to_pluralistic_alignment.pdf}
}
With increased power and prevalence of AI systems, it is ever more critical that AI systems are designed to serve all, i.e., people with diverse values and perspectives. However, aligning models to serve pluralistic human values remains an open research question. In this piece, we propose a roadmap to pluralistic alignment, specifically using language models as a test bed. We identify and formalize three possible ways to define and operationalize pluralism in AI systems: 1) Overton pluralistic models that present a spectrum of reasonable responses; 2) Steerably pluralistic models that can steer to reflect certain perspectives; and 3) Distributionally pluralistic models that are well-calibrated to a given population in distribution. We also propose and formalize three possible classes of pluralistic benchmarks: 1) Multi-objective benchmarks, 2) Trade-off steerable benchmarks, which incentivize models to steer to arbitrary trade-offs, and 3) Jury-pluralistic benchmarks which explicitly model diverse human ratings. We use this framework to argue that current alignment techniques may be fundamentally limited for pluralistic AI; indeed, we highlight empirical evidence, both from our own experiments and from other work, that standard alignment procedures might reduce distributional pluralism in models, motivating the need for further research on pluralistic alignment.
Fisher, J., Lu, X., Jung, J., Jiang, L., & Choi, Y. JAMDEC: Unsupervised Authorship Obfuscation using Constrained Decoding over Small Language Models. NAACL (2024). https://arxiv.org/abs/2402.08761.
@unpublished{author_obf,
author = {Fisher, Jillian and Lu, Ximing and Jung, Jaehun and Jiang, Liwei and Choi, Yejin},
title = {JAMDEC: Unsupervised Authorship Obfuscation using Constrained Decoding over Small Language Models},
journal = {NAACL},
note = {https://arxiv.org/abs/2402.08761},
file = {JAMDEC.pdf},
year = {2024}
}
The permanence of online content, combined with enhanced authorship identification techniques, calls for stronger computational methods to protect the identity and privacy of online authorship when needed, e.g., blind reviews for scientific papers, anonymous online reviews, or anonymous interactions in mental health forums. In this paper, we propose an unsupervised inference-time approach to authorship obfuscation to address the unique challenges of the task: the lack of supervision data for diverse authorship and domains, and the need for a sufficient level of revision beyond simple paraphrasing to obfuscate the authorship, all the while preserving the original content and fluency. We introduce MASQUERADE Decoding, a user-controlled, inference-time algorithm for authorship obfuscation that can in principle be applied to any text and authorship. Our approach builds on small language models such as GPT2-XL in order to help avoid disclosing the original content to proprietary LLMs’ APIs, while also reducing the performance gap between small and large language models via algorithmic enhancement. The key idea behind our approach is to boost the creative power of smaller language models through constrained decoding, while also allowing for user-specified controls and flexibility. Experimental results demonstrate that our approach based on GPT2-XL outperforms previous state-of-the-art methods based on comparably small models, while performing competitively against GPT3 175B, a proprietary model that is two orders of magnitude larger.
Jung, J., West, P., Jiang, L., Brahman, F., Lu, X., Fisher, J., Sorensen, T., & Choi, Y. Impossible Distillation: from Low-Quality Model to High-Quality Dataset & Model for Summarization and Paraphrasing. NAACL (2023). https://arxiv.org/abs/2305.16635.
@unpublished{impossible_distillation,
title = {Impossible Distillation: from Low-Quality Model to High-Quality Dataset & Model for Summarization and Paraphrasing},
author = {Jung, Jaehun and West, Peter and Jiang, Liwei and Brahman, Faeze and Lu, Ximing and Fisher, Jillian and Sorensen, Taylor and Choi, Yejin},
year = {2023},
note = {https://arxiv.org/abs/2305.16635},
file = {impossible_distill.pdf}
}
It is commonly perceived that the strongest language models (LMs) rely on a combination of massive scale, instruction data, and human feedback to perform specialized tasks, e.g., summarization and paraphrasing, without supervision. In this paper, we propose that language models can learn to summarize and paraphrase sentences with none of these three factors. We present Impossible Distillation, a framework that distills a task-specific dataset directly from an off-the-shelf LM, even when it is impossible for the LM itself to reliably solve the task. By training a student model on the generated dataset and amplifying its capability through self-distillation, our method yields a high-quality model and dataset from a low-quality teacher model, without the need for scale or supervision. Using Impossible Distillation, we are able to distill a model an order of magnitude smaller (with only 770M parameters) that outperforms 175B parameter GPT-3, in both quality and controllability, as confirmed by automatic and human evaluations. Furthermore, as a useful byproduct of our approach, we obtain DIMSUM+, a high-quality dataset with 3.4M sentence summaries and paraphrases. Our analyses show that this dataset, as a purely LM-generated corpus, is more diverse and more effective for generalization to unseen domains than all human-authored datasets, including Gigaword with 4M samples.
West, P., Lu, X., Dziri, N., Brahman, F., Li, L., Hwang, J. D., Jiang, L., Fisher, J., Ravichander, A., Chandu, K., Newman, B., Koh, P. W., Ettinger, A., & Choi, Y. The Generative AI Paradox: "What It Can Create, It May Not Understand". ICLR (2024). https://arxiv.org/abs/2311.00059.
@inproceedings{generative_paradox,
author = {West, Peter and Lu, Ximing and Dziri, Nouha and Brahman, Faeze and Li, Linjie and Hwang, Jena D. and Jiang, Liwei and Fisher, Jillian and Ravichander, Abhilasha and Chandu, Khyathi and Newman, Benjamin and Koh, Pang Wei and Ettinger, Allyson and Choi, Yejin},
title = {The Generative AI Paradox: "What It Can Create, It May Not Understand"},
year = {2024},
note = {https://arxiv.org/abs/2311.00059},
file = {generation_paradox.pdf},
month = may,
address = {Vienna},
publisher = {International Conference on Learning Representations (ICLR)}
}
The recent wave of generative AI has sparked unprecedented global attention, with both excitement and concern over potentially superhuman levels of artificial intelligence: models now take only seconds to produce outputs that would challenge or exceed the capabilities even of expert humans. At the same time, models still show basic errors in understanding that would not be expected even in non-expert humans. This presents us with an apparent paradox: how do we reconcile seemingly superhuman capabilities with the persistence of errors that few humans would make? In this work, we posit that this tension reflects a divergence in the configuration of intelligence in today’s generative models relative to intelligence in humans. Specifically, we propose and test the Generative AI Paradox hypothesis: generative models, having been trained directly to reproduce expert-like outputs, acquire generative capabilities that are not contingent upon, and can therefore exceed, their ability to understand those same types of outputs. This contrasts with humans, for whom basic understanding almost always precedes the ability to generate expert-level outputs. We test this hypothesis through controlled experiments analyzing generation vs. understanding in generative models, across both language and image modalities. Our results show that although models can outperform humans in generation, they consistently fall short of human capabilities in measures of understanding, showing weaker correlation between generation and understanding performance, and more brittleness to adversarial inputs. Our findings support the hypothesis that models’ generative capability may not be contingent upon understanding capability, and call for caution in interpreting artificial intelligence by analogy to human intelligence.
Lu, X., Brahman, F., West, P., Jung, J., Chandu, K., Ravichander, A., Ammanabrolu, P., Jiang, L., Ramnath, S., Dziri, N., Fisher, J., Lin, B., Hallinan, S., Qin, L., Ren, X., Welleck, S., & Choi, Y. Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning. EMNLP (2023). https://aclanthology.org/2023.emnlp-main.424.
@inproceedings{ipa,
title = {Inference-Time Policy Adapters ({IPA}): Tailoring Extreme-Scale {LM}s without Fine-tuning},
author = {Lu, Ximing and Brahman, Faeze and West, Peter and Jung, Jaehun and Chandu, Khyathi and Ravichander, Abhilasha and Ammanabrolu, Prithviraj and Jiang, Liwei and Ramnath, Sahana and Dziri, Nouha and Fisher, Jillian and Lin, Bill and Hallinan, Skyler and Qin, Lianhui and Ren, Xiang and Welleck, Sean and Choi, Yejin},
editor = {Bouamor, Houda and Pino, Juan and Bali, Kalika},
booktitle = {Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing},
month = dec,
year = {2023},
address = {Singapore},
publisher = {EMNLP},
note = {https://aclanthology.org/2023.emnlp-main.424},
doi = {10.18653/v1/2023.emnlp-main.424},
pages = {6863--6883},
file = {ipa.pdf}
}
While extreme-scale language models have demonstrated exceptional performance on a variety of language tasks, the degree of control over these language models through pure prompting can often be limited. Directly fine-tuning such language models can be effective for tailoring them, but it can be either extremely costly (e.g., GPT-3) or not even feasible for the broader community (e.g., GPT-4). We propose Inference-time Policy Adapters (IPA), which efficiently tailors a language model such as GPT-3 without fine-tuning it. IPA guides a large base model during decoding time through a lightweight policy adapter trained to optimize an arbitrary user objective with reinforcement learning. On five challenging text generation tasks, such as toxicity reduction and lexically constrained generation, IPA consistently brings significant improvements over off-the-shelf language models. It outperforms competitive baseline methods, sometimes even including expensive fine-tuning. In particular, tailoring GPT-2 with IPA can outperform GPT-3, while tailoring GPT-3 with IPA brings a major performance boost over GPT-3 (and sometimes even over GPT-4). Our promising results highlight the potential of IPA as a lightweight alternative to tailoring extreme-scale language models.
Fisher, J., Liu, L., Pillutla, K., Choi, Y., & Harchaoui, Z. Statistical and Computational Guarantees for Influence Diagnostics. AISTATS (2023). https://proceedings.mlr.press/v206/fisher23a/fisher23a.pdf.
@inproceedings{influence_theory,
title = {Statistical and Computational Guarantees for Influence Diagnostics},
author = {Fisher, Jillian and Liu, Lang and Pillutla, Krishna and Choi, Yejin and Harchaoui, Zaid},
year = {2023},
month = apr,
address = {Valencia},
publisher = {Artificial Intelligence and Statistics (AISTATS)},
doi = {10.48550/arXiv.2212.04014},
note = {https://proceedings.mlr.press/v206/fisher23a/fisher23a.pdf},
primaryclass = {stat.ML},
file = {influence_theory.pdf}
}
Influence diagnostics such as influence functions and approximate maximum influence perturbations are popular in machine learning and in AI domain applications. Influence diagnostics are powerful statistical tools to identify influential datapoints or subsets of datapoints. We establish finite-sample statistical bounds, as well as computational complexity bounds, for influence functions and approximate maximum influence perturbations using efficient inverse-Hessian-vector product implementations. We illustrate our results with generalized linear models and large attention-based models on synthetic and real data.
Journal Articles
Baird, S. O., Rinck, M., Rosenfield, D., Davis, M. L., Fisher, J., Becker, E. S., Powers, M. B., & Smits, J. A. J. Reducing approach bias to achieve smoking cessation: A pilot randomized placebo-controlled trial. Cognitive Therapy and Research (2017), 41(4).
@article{smoking_cessation,
author = {Baird, Scarlett O. and Rinck, Mike and Rosenfield, David and Davis, Michelle L. and Fisher, Jillian and Becker, Eni S. and Powers, Mark B. and Smits, Jasper A. J.},
journal = {Cognitive Therapy and Research},
number = {4},
title = {Reducing approach bias to achieve smoking cessation: A pilot randomized placebo-controlled trial.},
volume = {41},
year = {2017},
file = {smoking_cessation.pdf}
}
This study aimed to provide a preliminary test of the efficacy of a brief cognitive bias modification program for reducing approach bias in adult smokers motivated to quit. Participants were 52 smokers who were randomly assigned to four sessions of approach bias modification training (AAT) or sham training. Participants were asked to make a self-guided quit attempt upon completion of the final training session. Approach bias was assessed at baseline and at the end of each session, and the number of days abstinent was assessed 1 week after the quit attempt. Individuals assigned to the AAT condition evidenced significantly greater reductions in approach bias relative to those in the sham condition (p < .001). Baseline approach bias did not moderate the between-group effect (ps > 0.41); however, higher levels of approach bias at baseline were associated with greater approach bias reduction over time irrespective of condition (p < .001). Consistent with our hypothesis, the reduction in approach bias during the intervention period was significantly related to the number of days abstinent following the quit attempt (p = .033). The present study extends recent work in alcohol use disorders by showing that approach bias reduction, in this case for smoking-related stimuli, may also facilitate smoking cessation. Clinical and research implications are discussed.
Dictionary:
NAACL: North American Chapter of the Association for Computational Linguistics
EMNLP: Empirical Methods in Natural Language Processing
ICML: International Conference on Machine Learning
ICLR: International Conference on Learning Representations