Speed Up LLM Development with Synthetic Data via Gretel Navigator and Predibase

Two critical challenges in building generative AI applications are getting access to quality training data for Large Language Models (LLMs) and finding efficient LLM fine-tuning and serving solutions. Gretel Navigator generates high-quality, diverse synthetic data quickly and on demand. This allows teams to innovate faster, shorten the time it takes to bring ML solutions to production, and substantially lower the cost of AI development. Predibase (maintainers of Ludwig and LoRAX) is a developer platform for LLM fine-tuning and efficient serving. It offers state-of-the-art fine-tuning techniques out of the box, such as low-rank adaptation, quantization, and memory-efficient distributed training, so that fine-tuning jobs run fast and efficiently on commodity GPUs and yield highly accurate models.

In this workshop, you'll learn how to use Gretel and Predibase together to quickly and cost-effectively train LLMs that outperform commercial options. We dive into how Gretel Navigator generated the synthetic Text-to-SQL dataset, an open-source, high-quality training dataset for modern AI development. It quickly became the #1 trending dataset on Hugging Face, with 200+ likes and 1k+ downloads in one week, underscoring the market's need for high-quality, accessible data.

After discussing why Gretel Navigator was instrumental in generating this dataset, we touch on "LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4", a technical report recently published by Predibase. The report shows that LoRA fine-tuning significantly improves LLM performance, surpassing non-fine-tuned base models and GPT-4. We then use Predibase to fine-tune a small open-source LLM on the Gretel text-to-SQL dataset and benchmark its performance against other LLMs on text-to-SQL tasks.
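One common way to benchmark text-to-SQL output is execution accuracy: run the predicted query and the reference query against the same database and compare the rows returned. The sketch below illustrates that idea with Python's built-in sqlite3 module. It is not the workshop's actual benchmark harness; the schema, the example queries, and the dictionary keys `schema`, `reference`, and `predicted` are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical toy schema and data, used only for this illustration.
SCHEMA = """
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL);
INSERT INTO employees VALUES (1, 'Ada', 'eng', 120.0);
INSERT INTO employees VALUES (2, 'Bob', 'eng', 100.0);
INSERT INTO employees VALUES (3, 'Cy', 'ops', 90.0);
"""

# Hypothetical model outputs paired with reference queries.
EXAMPLES = [
    {   # Same rows in a different order: counted as a match.
        "schema": SCHEMA,
        "reference": "SELECT name FROM employees WHERE dept = 'eng'",
        "predicted": "SELECT name FROM employees WHERE dept = 'eng' ORDER BY name DESC",
    },
    {   # Wrong aggregate: the results differ.
        "schema": SCHEMA,
        "reference": "SELECT AVG(salary) FROM employees",
        "predicted": "SELECT SUM(salary) FROM employees",
    },
    {   # Invalid SQL: counted as a miss.
        "schema": SCHEMA,
        "reference": "SELECT COUNT(*) FROM employees",
        "predicted": "SELEC COUNT(*) FROM employees",
    },
]


def run_query(conn, sql):
    """Return the query's rows as a sorted list, or None if the SQL fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    # Sort by repr so rows containing NULLs or mixed types still compare safely.
    # Note: this makes the comparison order-insensitive, even for ORDER BY queries.
    return sorted(rows, key=repr)


def execution_match(schema_sql, predicted_sql, reference_sql):
    """True if both queries execute and return the same set of rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_sql)
        expected = run_query(conn, reference_sql)
        actual = run_query(conn, predicted_sql)
    finally:
        conn.close()
    return expected is not None and actual == expected


def execution_accuracy(examples):
    """Fraction of examples whose predicted SQL matches the reference result."""
    if not examples:
        return 0.0
    hits = sum(
        execution_match(ex["schema"], ex["predicted"], ex["reference"])
        for ex in examples
    )
    return hits / len(examples)


if __name__ == "__main__":
    print(f"execution accuracy: {execution_accuracy(EXAMPLES):.2f}")
```

Comparing results rather than query strings gives credit to predictions that are written differently but are semantically equivalent. The trade-off is that a wrong query can still pass by coincidence on a small test database.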

Key topics covered are:

Generate synthetic training data with Gretel Navigator
Design and iterate on synthetic data for your specific needs
Leverage Predibase to fine-tune an open-source LLM using state-of-the-art techniques
Speed up innovation and reduce AI development costs
