On-Demand Workshop

Speed Up LLM Development with Gretel and Predibase

Hear from the team behind the #1 trending dataset on Hugging Face, and the crew that published a groundbreaking technical report on fine-tuning LLMs that rival GPT-4.

Recorded on May 15, 2024

Access to quality training data for large language models (LLMs), along with efficient LLM fine-tuning and serving, is a critical challenge in building generative AI applications. Gretel Navigator opens the door to generating high-quality, diverse synthetic data quickly and on demand. This allows teams to innovate faster, shorten the time it takes to bring ML solutions to production, and substantially lower the costs of AI development. Predibase (maintainers of Ludwig and LoRAX) is the developer platform for LLM fine-tuning and efficient serving. It offers out-of-the-box, state-of-the-art fine-tuning techniques such as low-rank adaptation, quantization, and memory-efficient distributed training, so that your fine-tuning jobs run fast and efficiently on commodity GPUs and yield highly accurate models.

In this workshop, you'll learn how to leverage Gretel and Predibase together to quickly and cost-effectively train LLMs that outperform commercial options. We dive into how Gretel Navigator generated the synthetic text-to-SQL dataset, an open-source, high-quality training dataset for modern AI development. It quickly became the #1 trending dataset on Hugging Face, with 200+ likes and 1k+ downloads in one week, reinforcing the need for high-quality, accessible data in the market.

After discussing why Gretel Navigator was instrumental in generating this dataset, we touch on "LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4", a technical report recently published by Predibase. The report demonstrates that LoRA fine-tuning significantly enhances LLM performance, surpassing non-fine-tuned base models and GPT-4. We then leverage Predibase to fine-tune a small open-source LLM on the Gretel text-to-SQL dataset and benchmark its performance against other LLMs on text-to-SQL tasks.
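To make the benchmarking step concrete, here is a minimal sketch of one common way to score text-to-SQL output: run the model's query and a reference query against the same database and compare the results. This is an illustration only, not the evaluation used in the workshop. The helper names (`run_query`, `execution_match`), the toy `orders` schema, and the example queries are all hypothetical, and the comparison ignores row order for simplicity.

```python
import sqlite3


def run_query(conn, sql):
    """Run a query and return its rows in a canonical order, or None if it fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    # Sort by repr so rows with mixed or NULL values still compare deterministically.
    return sorted(rows, key=repr)


def execution_match(schema_sql, predicted_sql, reference_sql):
    """Return True if both queries run and produce the same set of rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_sql)
        predicted = run_query(conn, predicted_sql)
        reference = run_query(conn, reference_sql)
    finally:
        conn.close()
    return predicted is not None and predicted == reference


# Hypothetical toy schema and (model output, reference) query pairs.
schema = """
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL);
INSERT INTO orders VALUES (1, 'ada', 20.0), (2, 'bob', 5.0), (3, 'ada', 7.5);
"""

examples = [
    # Same result, different phrasing: counts as a match.
    ("SELECT customer, SUM(total) FROM orders GROUP BY customer",
     "SELECT customer, SUM(total) AS spend FROM orders GROUP BY customer ORDER BY customer"),
    # Runs, but returns a different count: no match.
    ("SELECT COUNT(*) FROM orders WHERE total > 10",
     "SELECT COUNT(*) FROM orders WHERE total > 6"),
    # Refers to a column that does not exist: no match.
    ("SELECT nope FROM orders",
     "SELECT id FROM orders"),
]

matches = [execution_match(schema, pred, ref) for pred, ref in examples]
accuracy = sum(matches) / len(matches)
print(f"execution accuracy: {accuracy:.2f}")
```

Comparing query results rather than query text gives credit to predictions that are phrased differently but return the same rows. A real benchmark would also need to handle queries where row order matters.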

Key topics covered are: 

  • Generate synthetic training data with Gretel Navigator
  • Design and iterate on synthetic data for your specific needs
  • Leverage Predibase to fine-tune an open-source LLM using state-of-the-art techniques
  • Speed up innovation and reduce AI development costs

Join us in the Synthetic Data Community Discord: https://gretel.ai/discord