Great Expectations


Revolutionizing the speed and integrity of data collaboration

About us

The mission of Great Expectations is to revolutionize the speed and integrity of data collaboration. Great Expectations is the leading open source tool for defeating pipeline debt through data testing, documentation, and profiling. Data teams all over the world use Great Expectations to boost the confidence, integrity, and speed of their work.
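The data-testing idea described above can be illustrated with a short, self-contained sketch. This is not the actual Great Expectations API — the implementation and result fields below are invented for illustration — but it shows the expectation pattern: a declarative check about data that returns a structured result instead of raising an exception.

```python
# Minimal illustration of the "expectation" pattern behind data testing.
# NOT the Great Expectations API; this standalone function is a sketch.

def expect_column_values_to_be_not_null(rows, column):
    """Check that every row has a non-None value for `column`."""
    failures = [i for i, row in enumerate(rows) if row.get(column) is None]
    return {
        "success": not failures,       # True only if no row failed
        "column": column,
        "unexpected_count": len(failures),
        "unexpected_rows": failures,   # indices of failing rows
    }

orders = [
    {"order_id": 1, "amount": 9.99},
    {"order_id": 2, "amount": None},
    {"order_id": None, "amount": 4.50},
]

result = expect_column_values_to_be_not_null(orders, "order_id")
print(result["success"])           # False: the row at index 2 has no order_id
print(result["unexpected_count"])  # 1
```

Returning a result object rather than raising is what lets checks like this double as documentation and profiling output, not just pass/fail gates.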

Website
https://greatexpectations.io
Industry
Software Development
Company size
11-50 employees
Headquarters
Remote
Type
Privately Held
Founded
2017
Specialties
data science, data engineering, data pipelines, pipeline debt, data quality, data monitoring, and MLOps


Updates

  • View organization page for Great Expectations

    3,794 followers

    #dataquality is 🔑

    View profile for Chad Sanderson

    CEO @ Gable.ai (Shift Left Data Platform)

    "Data Quality is our largest barrier to AI adoption" - said by one of the world's most sophisticated technology companies.

    While the hype cycle of AI is still on an exponential upward trend (for now, anyway), many teams are starting to run into a stark reality: the data required to power the models executives want either A.) does not exist or B.) does exist but isn't trustworthy.

    Andrew Ng, founder of the AI Fund, once stated that the purpose of MLOps is to ensure the availability of high-quality data throughout the ML project lifecycle. I will go one step further than Andrew: I do not believe that Data Quality is a problem that can be solved by data scientists or AI teams. Data Quality is a problem that MUST be solved by data producers. If quality does not exist at the source, then every action you take downstream is reactive and remedial in nature.

    Because the variety of possible changes to data is virtually limitless, it is impossible to write a test for every change in advance without understanding how each change impacts downstream systems. Unfortunately, it will usually take a major incident before most executives acknowledge the risks of not having preventative, proactive data quality solutions in place, with explicitly defined ownership at the source.

    Downstream teams must do the hard work of communicating to data producers the impact on AI and other data products when incidents occur. This is the only way engineering organizations will take this problem seriously. Good luck!

  • View organization page for Great Expectations

    Understanding your data’s quality is about finding the right answers, right? Nope. Understanding the quality of your data isn’t the process of finding the right answer; it’s the process of ruling out all the wrong answers. This is good news, because that framing actually empowers us to develop confidence in our data. Learn more ➡️ https://hubs.ly/Q02y_VBd0 #dataquality #dataengineering #dataengineer #dataanalyst #datascience

    Why data quality is actually really difficult

    greatexpectations.io

  • View organization page for Great Expectations

    Thanks for the shoutout, Fabiana Clemente!

    View profile for Fabiana Clemente

    Data quality for data science | Data-Centric AI | Data Preparation & Synthetic data

    Here's a bit of a rant... #DataQuality often gets overlooked because it's not glamorous or exciting, and it lacks a dedicated champion. But ignoring it could jeopardize your entire #AI strategy. Despite being crucial to the success of AI, data quality and the tools that ensure it have not received the attention they deserve. The #datacentricAI movement seemed to be the beginning of a change, but #LLMs and #foundationalmodels took over the hype. In my opinion, #dataquality should take the narrative back, and here's why:

    1. Poor-quality data results in inaccurate models, biased outcomes, and unreliable AI systems.
    2. Neglecting data quality tools in favor of model development and productization has hindered innovation in the field. While progress is being made in improving data quality tooling, it’s not keeping pace with the rapid development of new AI models.
    3. Investing in data quality is essential for the future of AI, regardless of the models or architectures used. In an AI-dominated world, the key differentiator is your data!

    As someone active in the data community and #opensource, I believe we should invest in robust data management infrastructure, promote data literacy among AI practitioners, and encourage collaboration between data engineers and data scientists. Strong data governance frameworks are also essential to ensure data quality, privacy, and security.

    A big shoutout to the companies that are giving visibility to the importance of #dataquality: Cleanlab, YData, Great Expectations! What tooling do you use? Tag others that have been relevant for your journey into #ai and #data.

  • View organization page for Great Expectations

    If data quality problems are so universal, why is solving them so hard? Shouldn’t we have the tools by now? Is that even the right question? 🤔 Spoiler: it’s not. We have the tools ⚒️, but we’re not putting them together in the right way. Today on the GX blog, we talk about the role of surprise in data quality 😯 and go down the rabbit hole 🐇 to discover a framing for data quality that actually empowers data teams. 📖Read it here: https://hubs.ly/Q02y_Srz0 #dataquality #dataengineering #dataengineer #dataanalyst #datascience

    Why data quality is actually really difficult

  • View organization page for Great Expectations

    Great thoughts from Chad Sanderson on the importance of #dataquality. "Don't treat Data Quality as a nice to have..."

    View profile for Chad Sanderson

    CEO @ Gable.ai (Shift Left Data Platform)

    Here are a few simple truths about Data Quality:

    1. Data without quality isn't trustworthy.
    2. Data that isn't trustworthy isn't useful.
    3. Data that isn't useful is low ROI.

    Investing in AI while the underlying data is low ROI will never yield high-value outcomes. Businesses must put as much time and effort into the quality of their data as into the development of the models themselves.

    Many people see data debt as just another form of technical debt - it's worth it to move fast and break things, after all. This couldn't be more wrong. Data debt is orders of magnitude WORSE than tech debt. Tech debt results in scalability issues, though the core function of the application is preserved. Data debt results in trust issues: the underlying data no longer means what its users believe it means.

    Tech debt is a wall, but data debt is an infection. Once distrust drips into your data lake, everything it touches will be poisoned. The poison works slowly at first, and data teams might be able to keep up manually with hotfixes and filters layered on top of hastily written SQL. But over time, the spread of the poison will be so great and so deep that it becomes nearly impossible to trust any dataset at all. A single low-quality dataset is enough to corrupt thousands of data models and tables downstream. The impact is exponential.

    My advice? Don't treat Data Quality as a nice-to-have, or something you can afford to 'get around to' later. By the time you start thinking about governance, ownership, and scale, it will already be too late, and there won't be much you can do besides burning the system down and starting over. What seems manageable now becomes a disaster later on. The earlier you can get a handle on data quality, the better. If you even suspect the business may want to use the data for AI (or some other operational purpose), then you should begin thinking about the following:

    1. What will the data be used for?
    2. What are all the sources for the dataset?
    3. Which sources can we control, and which can we not?
    4. What are the expectations of the data?
    5. How sure are we that those expectations will remain the same?
    6. Who should be the owner of the data?
    7. What does the data mean semantically?
    8. If something about the data changes, how is that handled?
    9. How do we preserve the history of changes to the data?
    10. How do we revert to a previous version of the data/metadata?

    If you can affirmatively answer all 10 of those questions, you have a solid foundation of data quality for any dataset and a playbook for managing scale as the use case or intermediary data changes over time.

    And speaking of foundations... my first live course on Building AI-Ready Data Infrastructure starts THIS weekend, so if you're interested in joining, I'll be covering these topics and more. Here's the link: https://lnkd.in/gpGTpErD Good luck! #dataengineering
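The ten questions in the post above amount to a data contract. As a hypothetical sketch (the class and field names are invented here, not taken from any particular tool), the answers can be captured as structured metadata that travels with the dataset:

```python
# Hypothetical sketch: recording answers to the ten questions as a
# machine-readable "contract" for a dataset. All names are invented.
from dataclasses import dataclass

@dataclass
class DataContract:
    dataset: str
    used_for: list             # 1. what the data will be used for
    sources: dict              # 2-3. each source, mapped to whether we control it
    expectations: list         # 4. expectations about the data
    expectations_stable: bool  # 5. whether we expect those to remain the same
    owner: str                 # 6. accountable owner at the source
    semantics: str             # 7. what the data means semantically
    change_policy: str         # 8. how changes to the data are handled
    versioned: bool            # 9-10. history preserved and revertable

contract = DataContract(
    dataset="orders",
    used_for=["revenue reporting", "churn model features"],
    sources={"checkout_service": True, "third_party_payments": False},
    expectations=["order_id is unique", "amount >= 0"],
    expectations_stable=True,
    owner="checkout-team@example.com",
    semantics="One row per completed customer order.",
    change_policy="Schema changes require downstream sign-off.",
    versioned=True,
)
print(contract.owner)  # checkout-team@example.com
```

The point of writing the answers down in one structure is that "affirmatively answering all 10 questions" becomes a reviewable artifact rather than tribal knowledge.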

  • View organization page for Great Expectations

    Check out this recent podcast featuring our CEO Abe Gong discussing data quality on The Darius Gant Show! #AI #dataquality

    View profile for M. Darius Gant, CPA

    Pioneering AI Talent Acquisition Across the Americas

    New #AI Episode: Abe Gong shares his insights into the world of data quality and how Great Expectations is solving the systemic data quality issue in organizations. (link below) We cover topics such as:

    - Common errors and issues in data quality
    - Identifying and solving data quality issues
    - How Great Expectations supports companies deploying AI models
    - Involvement in generative AI use cases
    - Building a remote-first team with a focus on open-source collaboration
    - Great Expectations' fundraising journey
    - Importance of technical leads on data teams
    - Difference between enterprise software sales and open source models

    Apple: https://lnkd.in/g-HNVRER
    Spotify: https://lnkd.in/gD9gxQvu

  • View organization page for Great Expectations

    Choosing the right data quality testing framework is one of the best things a data architect or data engineer can do for their organization. It will help them create a data quality program that produces real, actionable insights, and a data quality culture that includes, and is accessible to, everyone at their organization. Here’s how to see if the tests you’re using can do all that: https://hubs.li/Q02xhz6C0 #dataquality #dataengineer #dataarchitect

    The 3 E’s of good data quality testing

  • View organization page for Great Expectations

    View profile for Zach Wilson

    Founder @ DataExpert.io | YouTube: Data with Zach | ADHD | contact: [email protected]

    Quick wisdom from my 10-year data engineering career:

    - Data quality should be built into the pipeline from the beginning, not tacked on later.
    - Monthly and weekly metrics should come from cumulative tables to avoid high cloud bills.
    - Anonymizing data should happen in line with your company policy. You can still get tons of rich analytics from anonymized data.
    - Aggregate data is the history books of your company. Publish and maintain aggregates as soon as possible to tell the most complete story possible.

    #dataengineering
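The cumulative-table tip can be made concrete with a small sketch (table and column names are invented for illustration): keep one running total per day, and derive any period's metric by differencing two cumulative values instead of rescanning raw events for each report.

```python
# Hypothetical sketch of the cumulative-table pattern: daily cumulative
# totals let any period's metric be computed as a difference of two rows,
# instead of re-aggregating raw events (and re-paying the scan cost)
# every time a weekly or monthly number is needed.
cumulative_events = {  # day -> total events since the beginning of history
    "2024-04-30": 10_000,
    "2024-05-31": 13_500,
    "2024-06-30": 18_200,
}

def period_total(cumulative, start_day, end_day):
    """Events in (start_day, end_day]: one subtraction, no raw-event scan."""
    return cumulative[end_day] - cumulative[start_day]

may_total = period_total(cumulative_events, "2024-04-30", "2024-05-31")
june_total = period_total(cumulative_events, "2024-05-31", "2024-06-30")
print(may_total, june_total)  # 3500 4700
```

The design choice is classic space-for-compute: the cumulative table grows by one row per day, while each report reads two rows rather than a month of raw events.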


Funding

Great Expectations: 2 total rounds

Last Round

Series B

US$ 40.0M

See more info on Crunchbase