Scaling LLMs for Science and Ongoing Collaborations




Sam Foreman
Venkat Vishwanath

saforem2/{scaling4science, Megatron-DS-Benchmarking}

Loooooooooong Sequence Lengths

  • Working with the Microsoft DeepSpeed team to enable longer sequence lengths (context windows) for LLMs
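As a rough illustration of why long sequence lengths are hard, the memory needed just to hold the attention score matrices grows quadratically with sequence length. A minimal back-of-the-envelope sketch (the head/layer counts are illustrative placeholders, not the actual 25B/33B configurations):

```python
def attn_score_memory_gib(seq_len, n_heads, n_layers,
                          micro_batch=1, bytes_per_el=2):
    """Approximate GiB for the raw attention score matrices alone:
    batch * heads * seq_len^2 elements per layer, fp16/bf16 (2 bytes)."""
    elements = micro_batch * n_heads * seq_len ** 2 * n_layers
    return elements * bytes_per_el / 2 ** 30

# Hypothetical large-model config (illustrative numbers only)
for s in (2048, 8192, 32768):
    print(f"seq_len={s:6d}: ~{attn_score_memory_gib(s, 64, 48):.1f} GiB")
```

Each 4x increase in sequence length costs 16x in attention score memory, which is why techniques like those in DeepSpeed are needed to reach long context windows.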

Figure 1: Maximum (achievable) SEQ_LEN for the 25B and 33B models [WIP]

Ongoing Work & Collaborations

Thank you!

  • Link to slides

  • Huge shout out to

    • Venkat Vishwanath
    • James Osborn
    • Xiao-Yong Jin
    • Rao Kotamarthi
    • Romit Maulik
    • Troy Arcomano
    • Microsoft DeepSpeed Team
    • ALCF Data Science Team (everyone!)
    • ALCF Staff (Ops, Performance, Software, User Support / Documentation, …)


This research used resources of the Argonne Leadership Computing Facility,
which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.