Socially Responsible Language Modelling Research (SoLaR) 2024

Contact: solar-neurips@googlegroups.com.

@solarneurips

Accepted Papers:

List of accepted papers

Panel Details

Workshop Panel

Panelists

• Yoshua Bengio
• Margaret Mitchell
• Jeff Clune

Moderator

Jakob Foerster

Invited Talks

Been Kim	Towards Interpretability for Humanity	(Speaker)
Peter Henderson	Aligning Machine Learning and Law for Responsible Real-World Deployments	(Speaker)
Zico Kolter	LLM Robustness: Recent progress and the challenges ahead	(Speaker)
Rida Qadri	AI's Cultural Futures: Designing for a Culturally Rich World	(Speaker)
Hannah Rose Kirk	A Tale of Two RCTs: Building a rigorous evidence base on the societal impacts of frontier AI inside the UK Government	(Speaker)

Key Dates

Submissions Open on OpenReview	August 20, 2024
Submission Deadline	September 14, 2024, AoE
Acceptance Notification	October 9, 2024, AoE
Camera-Ready Deadline	December 1, 2024, AoE
Workshop Date	December 14, 2024

All deadlines are specified in AoE (Anywhere on Earth).

Description/Call For Papers

The Socially Responsible Language Modelling Research (SoLaR) workshop at NeurIPS 2024 is an interdisciplinary gathering that aims to foster responsible and ethical research in the field of language modeling. Recognizing the significant risks and harms [33-37] associated with the development, deployment, and use of language models, the workshop emphasizes the need for researchers to focus on addressing these risks starting from the early stages of development. The workshop brings together experts and practitioners from various domains and academic fields with a shared commitment to promoting fairness, equity, accountability, transparency, and safety in language modeling research.

Given the wide-ranging impacts of LMs, our workshop will welcome a broad array of submissions. We briefly detail some specific topic areas and an illustrative selection of pertinent works:

Security and privacy concerns of LMs [13, 30, 25, 49, 55].
Bias and exclusion in LMs [12, 2, 26, 53, 44].
Analysis of the development and deployment of LMs, including crowdwork [42, 50], deployment protocols [52, 47], and societal impacts from deployment [10, 21].
Safety, robustness, and alignment of LMs [51, 8, 35, 32, 7].
Auditing, red-teaming, and evaluations of LMs [41, 40, 29, 15, 11].
Examination of risks and harms from any novel input and/or output modalities that are introduced in LMs [14, 28, 54].
Transparency, explainability, and interpretability of LMs [39, 17, 3, 46, 22, 38].
Applications of LMs for social good, including sector-specific applications [9, 31, 16] and LMs for low-resource languages [4, 5, 36].
Perspectives from other domains that can inform socially responsible LM development and deployment [48, 1].

We will also have a separate track, with a separate reviewer pool, for sociotechnical submissions from disciplines such as philosophy, law, and policy. We provide a brief illustrative list of works we would welcome:

Studies on economic impacts of LMs, e.g., labor-market disruptions [18, 34].
Risk assessment [33, 24, 37, 23].
Regulation and governance of LMs [45, 6, 27].
Philosophical examination of concepts related to alignment and safety [19, 43, 20].

Papers from the previous iteration of SoLaR can be found here.

References

[1] The Grey Hoodie Project: Big Tobacco, Big Tech, and the Threat on Academic Integrity. In AIES 2021.
[2] Persistent Anti-Muslim Bias in Large Language Models. In AIES 2021.
[3] Post hoc Explanations may be Ineffective for Detecting Unknown Spurious Correlation. In ICLR 2022.
[4] A Few Thousand Translations Go a Long Way! Leveraging Pre-trained Models for African News Translation. In NAACL 2022.
[5] MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition. In EMNLP 2022.
[6] Managing Emerging Risks to Public Safety, Sept. 2023. URL http://arxiv.org/abs/2307.03718.
[7] Foundational challenges in assuring alignment and safety of large language models. arXiv preprint arXiv:2404.09932, 2024.
[8] Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback, Apr. 2022. URL http://arxiv.org/abs/2204.05862. arXiv:2204.05862 [cs].
[9] Fine-tuning language models to find agreement among humans with diverse preferences. In A. H. Oh, A. Agarwal, D. Belgrave, and K. Cho, editors, Advances in Neural Information Processing Systems, 2022.
[10] On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’21.
[11] AI auditing: The broken bus on the road to AI accountability. arXiv preprint arXiv:2401.14462, 2024.
[12] Stereotyping Norwegian Salmon: An Inventory of Pitfalls in Fairness Benchmark Datasets. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1004–1015, Online, Aug. 2021.
[13] What Does it Mean for a Language Model to Preserve Privacy? In 2022 ACM Conference on Fairness, Accountability, and Transparency. ACM, June 2022.
[14] Are aligned neural networks adversarially aligned? Advances in Neural Information Processing Systems, 36, 2023.
[15] Black-box access is insufficient for rigorous AI audits. arXiv preprint arXiv:2401.14446, 2024.
[16] Analyzing Polarization in Social Media: Method and Application to Tweets on 21 Mass Shootings. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2970–3005, Minneapolis, Minnesota, June 2019.
[17] Towards A Rigorous Science of Interpretable Machine Learning, Mar. 2017. URL http://arxiv.org/abs/1702.08608.
[18] GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models, Aug. 2023. arXiv:2303.10130.
[19] Artificial intelligence, values, and alignment. Minds and machines, 30(3):411–437, 2020. Publisher: Springer.
[20] The ethics of advanced AI assistants. arXiv preprint arXiv:2404.16244, 2024.
[21] Predictability and Surprise in Large Generative Models. In 2022 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’22, pages 1747–1764, New York, NY, USA, June 2022. Association for Computing Machinery.
[22] Datasheets for datasets. Communications of the ACM, 64(12):86–92, Dec. 2021.
[23] The false promise of risk assessments. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. ACM, Jan. 2020.
[24] Algorithmic Risk Assessments Can Alter Human Decision-Making Processes in High-Stakes Government Contexts. Proceedings of the ACM on Human-Computer Interaction, 5(CSCW2):418:1–418:33, Oct. 2021.
[25] Predictability and Surprise in Large Generative Models. In 2022 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’22, pages 1747–1764, New York, NY, USA, June 2022.
[26] Datasheets for datasets. Communications of the ACM, 64(12):86–92, Dec. 2021.
[27] The false promise of risk assessments. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. ACM, Jan. 2020.
[28] Algorithmic Risk Assessments Can Alter Human Decision-Making Processes in High-Stakes Government Contexts. Proceedings of the ACM on Human-Computer Interaction, 5(CSCW2):418:1–418:33, Oct. 2021.
[29] Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection, May 2023.
[30] Bias runs deep: Implicit reasoning biases in persona-assigned LLMs. arXiv preprint arXiv:2311.04892, 2023.
[31] A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis, Feb. 2024. URL http://arxiv.org/abs/2307.12856. arXiv:2307.12856 [cs].
[32] The Future of AI Governance, Apr. 2023. URL http://arxiv.org/abs/2304.04914.
[33] Uncovering bias in large vision-language models with counterfactuals. arXiv preprint arXiv:2404.00166, 2024.
[34] Automatically Auditing Large Language Models via Discrete Optimization, Mar. 2023. URL http://arxiv.org/abs/2303.04381.
[35] Deduplicating Training Data Mitigates Privacy Risks in Language Models. In Proceedings of the 39th International Conference on Machine Learning, pages 10697–10707. PMLR, June 2022.
[36] ChatGPT for good? On opportunities and challenges of large language models for education. Learning and Individual Differences, 103:102274, Apr. 2023.
[37] Alignment of Language Agents, Mar. 2021. URL http://arxiv.org/abs/2103.14659.
[38] Model Cards for Model Reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 220–229, Jan. 2019.
[39] In-context Learning and Induction Heads. Transformer Circuits Thread, 2022.
[40] Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark, Apr. 2023.
[41] Discovering Language Model Behaviors with Model-Written Evaluations, Dec. 2022. URL http://arxiv.org/abs/2212.09251.
[42] The Coloniality of Data Work in Latin America. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society. ACM, July 2021.
[43] A human rights-based approach to responsible AI. arXiv preprint arXiv:2210.02667, 2022.
[44] AI’s regimes of representation: A community-centered study of text-to-image models in South Asia. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, pages 506–517, 2023.
[45] Outsider Oversight: Designing a Third Party Audit Ecosystem for AI Governance. In Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society. ACM, July 2022.
[46] Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, May 2019.
[47] Structured access: an emerging paradigm for safe AI deployment, Apr. 2022. URL http://arxiv.org/abs/2201.05159. arXiv:2201.05159 [cs].
[48] The Offense-Defense Balance of Scientific Knowledge: Does Publishing AI Research Reduce Misuse? In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, AIES ’20, pages 173–179, New York, NY, USA, Feb. 2020.
[49] Detecting pretraining data from large language models. arXiv preprint arXiv:2310.16789, 2023.
[50] Beyond Fair Pay: Ethical Implications of NLP Crowdsourcing. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3758–3769, Online, June 2021. Association for Computational Linguistics.
[51] Defining and Characterizing Reward Hacking. In A. H. Oh, A. Agarwal, D. Belgrave, and K. Cho, editors, Advances in Neural Information Processing Systems, 2022.
[52] The Gradient of Generative AI Release: Methods and Considerations, Feb. 2023.
[53] “Kelly is a warm person, Joseph is a role model”: Gender biases in LLM-generated reference letters. arXiv preprint arXiv:2310.09219, 2023.
[54] Debiasing large visual language models. arXiv preprint arXiv:2403.05262, 2024.
[55] Universal and transferable adversarial attacks on aligned language models. arXiv preprint arXiv:2307.15043, 2023.