Whether sharing data internally, training large language models (LLMs), or building vector stores for retrieval-augmented generation (RAG), organizations constantly juggle the need to protect personally identifiable information (PII) against the demand to extract valuable insights. With unstructured data types like call center transcripts or clinical and doctor notes, adequately protecting PII requires both sophisticated models that can label entities based on their values and the surrounding context, and flexible transformations that can remove PII entities while preserving as much of the text's utility as possible for downstream applications.
Transform v2 is a powerful tool that simplifies detecting and redacting PII in tabular and free-text formats using Named Entity Recognition (NER). With this update, you can now:
🏷 Label arbitrary PII entity types by simply listing them in the config, without additional training.
🛡 Use four robust NER functions to remove PII entities while optimizing the de-identified text for your specific use case.
Try Transform today and experience the ease of protecting PII while unlocking valuable insights from your data. #dataprotection #dataquality #PII #RAG https://lnkd.in/eVE3wbf8
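To make the "label, then transform" idea in the post above concrete, here is a minimal sketch in plain Python. It is not Transform v2's API or configuration format, which the post does not show. The `redact` function, the `Span` tuple, and the labels `NAME` and `PHONE` are hypothetical, and the entity spans are supplied by hand where a real pipeline would get them from an NER model.

```python
from typing import List, Tuple

# Hypothetical span format: (start offset, end offset, entity label).
# A real NER model would produce these; here they are supplied by hand.
Span = Tuple[int, int, str]


def redact(text: str, spans: List[Span]) -> str:
    """Replace each labeled span with a placeholder such as <NAME>."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            continue  # skip spans that overlap one already redacted
        pieces.append(text[cursor:start])  # keep the non-PII text as-is
        pieces.append(f"<{label}>")        # substitute the entity
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


if __name__ == "__main__":
    note = "Call Jane Doe at 555-0100."
    name_start = note.index("Jane Doe")
    phone_start = note.index("555-0100")
    spans = [
        (name_start, name_start + len("Jane Doe"), "NAME"),
        (phone_start, phone_start + len("555-0100"), "PHONE"),
    ]
    print(redact(note, spans))
```

Replacing an entity with its label, instead of deleting it, keeps the sentence structure readable for downstream use, which is one way to trade privacy against utility.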
Gretel.ai
Software Development
Palo Alto, California 17,368 followers
Gretel's APIs automatically fine-tune AI models to generate accurate and safe synthetic data on demand.
About us
Gretel is solving the data bottleneck problem for AI scientists, developers, and data scientists by providing them with safe, fast, and easy access to data without compromising on accuracy or privacy. Designed by developers for developers, Gretel’s APIs make it easy to generate anonymized and safe synthetic data so you can preserve privacy and innovate faster. You can learn more about synthetic data from Gretel's engineers, data scientists, and AI research team on our blog: https://gretel.ai/blog
- Website
- https://gretel.ai
- Industry
- Software Development
- Company size
- 51-200 employees
- Headquarters
- Palo Alto, California
- Type
- Privately Held
- Founded
- 2020
- Specialties
- Generative AI, Synthetic Data, Privacy, AI, and Deep Learning
Products
The Developer Stack for Synthetic Data.
Data Privacy Management Software
Synthetic data that’s as good as, or even better than, the data you have. Or don’t have. Create and share data with best-in-class accuracy and privacy guarantees – on demand.
Locations
- Primary: Palo Alto, California, US
- San Diego, California 92122, US
Updates
-
In this video, Gretel's CPO Alexander Watson walks through the process of generating differentially private synthetic text data using Gretel GPT. This topic has been popular among our users who need privacy guarantees when creating training sets for language models. I will explain how differential privacy works and provide a practical example using clinical notes. You can try running this experiment yourself. A link to the dataset is in the comments below. https://lnkd.in/e2pz9QQD
Generating Differentially Private Synthetic Text using Gretel GPT
https://www.youtube.com/
-
Gretel will be at the Databricks Data+AI Summit (June 10-13) in San Francisco. We’ll be joining a panel discussion and also presenting on privacy-preserving synthetic data for #AI - stay tuned for more details. If you’re attending, you can drop by our booth #E1, or book a meeting in advance with our team here: https://lnkd.in/eHjemNpZ #DataAISummit #syntheticdata #privacy
-
Gretel.ai reposted this
⭐ After releasing our state-of-the-art synthetic Text-to-SQL dataset, we have now successfully fine-tuned the CodeLlama-7B and CodeLlama-13B models. The results? Enhanced SQL query capabilities, evidenced by nice improvements in the Bird-Benchmark EX and VES scores. This is just the start as we continue to unlock the transformative potential of synthetic data! 🚀 🙏 A huge shoutout to AWS Jumpstart for streamlining our fine-tuning process, and thanks to Dr. Qiong (Jo) Zhang and Shashi Raina for their collaboration. 🔗 Want to learn more about how we fine-tuned the CodeLlama models? Dive into our blogpost: [https://lnkd.in/gdhvpB3z] #gretel #AWS #codellama #syntheticdata
CodeLlama + Amazon SageMaker + synthetic data = nearly 40% improvement in model performance 🚀 🚀 🚀 Check out our latest blog below to learn how to leverage the world's largest Text-to-SQL dataset (created by Gretel) and Amazon SageMaker to fine-tune CodeLlama models and significantly improve model performance. It's really that simple: better data makes better models. #syntheticdata #CodeLlama #AWS https://lnkd.in/erCy9krn
Fine-Tuning CodeLlama on Gretel & AWS SageMaker JumpStart
gretel.ai
-
Excited to share our latest breakthroughs in generating differentially private synthetic text data for advanced language model training and fine-tuning using Gretel GPT. In this post, we dive into the challenges of utilizing sensitive text data, like customer call logs or patient interactions, for model development and demonstrate how differentially private #syntheticdata offers a secure solution. Differential privacy plays a crucial role in safeguarding the private information of individuals or entities by adding calibrated noise during the learning process. This significantly reduces the risk of exposing unique linguistic patterns or specific contextual details, which can be exploited by adversarial attacks. Learn how you can leverage Gretel GPT to fine-tune language models with provable privacy guarantees: https://lnkd.in/e3XAXhP7
Generate Differentially Private Synthetic Text with Gretel GPT
gretel.ai
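The post above describes differential privacy as "adding calibrated noise during the learning process". One common way to do that is the DP-SGD recipe: clip each example's gradient to a fixed norm, sum the clipped gradients, and add Gaussian noise scaled to that norm. The sketch below illustrates that recipe in plain Python. The post does not say which mechanism Gretel GPT uses, so treat the technique choice, the function names, and the parameters `max_norm` and `noise_multiplier` as assumptions made for illustration.

```python
import math
import random
from typing import List


def clip_gradient(grad: List[float], max_norm: float) -> List[float]:
    """Scale a gradient down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


def private_mean_gradient(
    per_example_grads: List[List[float]],
    max_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> List[float]:
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Noise is calibrated to the clipping norm: after clipping, no single
    # example can move the sum by more than max_norm.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.0, 1.0], [0.2, -0.1]]
    print(private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping bounds how much any one record can influence an update, and the noise masks what influence remains. Turning the noise level into a formal privacy guarantee requires a privacy accountant, which this sketch leaves out.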
-
Gretel Navigator, our advanced AI system for high-quality synthetic data generation, will soon be available to #AzureAI developers as a service. Navigator gives enterprise developers the ability to seamlessly create, augment, or enhance data from scratch, and safely generate differentially private proprietary datasets for training and fine-tuning custom models. We’re thrilled to expand our partnership with Microsoft, a company dedicated to scaling data privacy alongside generative AI, and helping developers build great products. For more info on Gretel or to meet us this week at Microsoft Build, click here: https://lnkd.in/dGGpzdhz #MaaS #SyntheticData
-
Gretel.ai reposted this
We are excited to announce a deeper partnership with Microsoft as part of the new Models-as-a-Service Platform. This collaboration will expand the reach of the Gretel.ai Synthetic Data Platform, enabling organizations to seamlessly operationalize and scale to meet the growing data needs in #AI. Recent trends have shown that training general-purpose models on large volumes of low-quality, privacy-compromised data is costly. As we shift to training models on private domain data, we need high-quality, safe, and smaller models for sustainable organizational power. This future, in collaboration with Microsoft, aims to solve data bottlenecks. It's not about how good your models are, but how good your data is.
-
This week, Gretel is in the wild in Seattle at #MicrosoftBuild. 🏗 Here's where you can find us:
Will we see you at Microsoft Build 2024? Come meet Gretel's staff and #syntheticdata experts, and learn how to generate high-quality, privacy-preserving data to power your next #AI initiative. You can book a meeting with us in advance at the link below, or catch one of our live demos at space #FP55 during the following PT times:
- Tuesday, May 21, 1:30pm - 2:45pm
- Wednesday, May 22, 11:25am - 12:50pm
- Thursday, May 23, 3:35pm - 5:00pm
#MicrosoftBuild https://lnkd.in/eSyjZrQQ
-
Gretel.ai reposted this
The AI community should take OpenAI's GPT-4o tokenizer issue as a valuable lesson. To build safe and capable AI systems, we need high-quality, representative training data. Synthetic data generation is a promising approach that even OpenAI should take more advantage of 🧪🔬 Gretel.ai https://lnkd.in/gbNHF9X4
GPT-4o’s Chinese token-training data is polluted by spam and porn websites
technologyreview.com
-
Gretel.ai reposted this
The new Google DeepMind blog on differentially private synthetic text generation is giving us déjà vu! 😄 Probably because we published our research on the same topic, with the same example datasets, use cases and metrics at Gretel.ai way back in January. Great minds think alike, but we've been pioneering this approach for a while now. 😉 Check out our original blog post for a deep dive into generating high-quality synthetic data with strong privacy guarantees: https://lnkd.in/gDEVFN55. In all seriousness, excited to see more research energy going into this important space - the future of safe data sharing is bright! 🚀 #dataprivacy #syntheticdata #werehiring