Sign up for a GX Cloud workshop: https://hubs.li/Q02Dwk3c0
Great Expectations
Software Development
Revolutionizing the speed and integrity of data collaboration
About us
The mission of Great Expectations is to revolutionize the speed and integrity of data collaboration. Great Expectations is the leading open source tool for defeating pipeline debt through data testing, documentation, and profiling. Data teams all over the world use Great Expectations to boost the confidence, integrity, and speed of their work.
- Website
- https://greatexpectations.io
- Industry
- Software Development
- Company size
- 11-50 employees
- Headquarters
- Remote
- Type
- Privately Held
- Founded
- 2017
- Specialties
- data science, data engineering, data pipelines, pipeline debt, data quality, data monitoring, and MLOps
Locations
- Primary: Remote, US
Updates
-
"7...fix your DQ issues." Good advice Benjamin Rogojan!
9 pieces of advice after 8+ years of helping companies set up their data infra:
1. Start simple: the fewer tools and pipelines you have, the less you have to maintain.
2. Focus on building process maturity along with data infrastructure maturity.
3. Get buy-in early from the business.
4. Spend time understanding the business: their problems, operations, etc.
5. Don't get distracted by ad hoc requests (figure out what you should and shouldn't do, and why).
6. Always try to fix data issues at the source.
7. Being $1 off today means you could be off thousands tomorrow; fix your DQ issues.
8. Set up a peer review process for your analytics and data science projects.
9. Communicate, communicate, communicate with your stakeholders, the business, etc., whether you have good or bad news.
What lessons have you learned?
-
Glad to be part of your stack, Zach Wilson!
Founder @ DataExpert.io | YouTube: Data with Zach | ADHD
My favorite stack to build a data analytics product:
- Apache Spark (for processing)
- Amazon S3 (for storage)
- Apache Iceberg (for metadata)
- Apache Airflow (for scheduling)
- Apache Superset (for visualization)
- Great Expectations (for data quality)
#dataengineering
-
📣 The new GX Expectations Gallery is here! 🎉 You asked, we listened! We’ve been collecting feedback and are thrilled to announce a new Expectations Gallery. 💡Learn more about the improved search and navigation, sample data and pass/fail case examples, and other updates on the GX blog: https://hubs.li/Q02CKSK50 👀 Visit the new Expectations Gallery and enjoy a better way to build your validations. https://hubs.li/Q02CKKgq0
-
If you love the advantages of GX OSS highlighted in this post, be sure to check out GX Cloud for even more benefits! https://lnkd.in/gDMbuj43
DAY 12 - Project: Automated data quality testing using Great Expectations and BigQuery. Data engineering is all about delivering high-quality data to the right people at the right time. Data quality metrics can be grouped into business dimensions (metric accuracy, etc.) and technical dimensions (no missing data, no duplicates, etc.). Read more about data quality in the introduction to this project: https://lnkd.in/dsWA-FEe Once we understand why data quality is critical, we will implement Great Expectations, one of the most powerful tools for automating data quality testing. Follow along here: https://lnkd.in/d8ARch2w Side note: I also added a section about `dbt test`; take a look and compare it with Great Expectations to see how they differ. Next time, I will also cover OpenMetadata, another exciting tool. Stay tuned! #DataEngineeringIn30Days
-
📣 Our June community meetup is tomorrow! We've got a full schedule of demos and updates to share. Join us! When: Tuesday, June 18 at 9am PT / 12pm ET / 4pm UTC. Register: Visit https://lnkd.in/eFYSs5Cc to sign up. See you there! PS: We've adjusted the cadence of our meetups; the next one after tomorrow's will be August 20.
Great Expectations Community Events
addevent.com
-
#dataquality is 🔑
"Data Quality is our largest barrier to AI adoption" - said by one of the world's most sophisticated technology companies. While the hype cycle of AI is still on an exponentially upward trend (for now, anyway) many teams are starting to run into a stark reality: The data required to power the models executives want either A.) does not exist or B.) does exist but isn't trustworthy. Andrew Ng, founder of the AI Fund once stated that the purpose of MLOps is to ensure the availability of high-quality data throughout the ML project lifecycle. I will go one step further than Andrew. I do not believe that Data Quality is a problem that can be solved by data scientists or AI teams. Data Quality is a problem that MUST be solved by data producers. If quality does not exist at the source, then every action you might take is reactive and remedial in nature. Due to the variety of changes to data being virtually limitless, it is impossible to write a test before every possible change without understanding how that change impacts downstream systems. Unfortunately, it will usually take a major incident before most executives acknowledge the potential risks of not having preventative / proactive data quality solutions in place, with explicitly defined ownership at the source. Downstream teams must do the hard work of communicating to data producers the outcomes on AI and other data products when incidents occur. This is the only way engineering organizations will take this problem seriously. Good luck!
-
Understanding your data’s quality is about finding the right answers, right? Nope. Understanding the quality of your data isn’t the process of finding the right answer; it’s the process of ruling out all the wrong answers. This is actually good news, because it gives us a framing that empowers us to develop confidence in our data. Learn more ➡️ https://hubs.ly/Q02y_VBd0 #dataquality #dataengineering #dataengineer #dataanalyst #datascience
Why data quality is actually really difficult
greatexpectations.io
-
Thanks for the shoutout, Fabiana Clemente!
Here's a bit of a rant... #DataQuality often gets overlooked because it's not glamorous or exciting, and it lacks a dedicated champion. But ignoring it could jeopardize your entire #AI strategy. Despite being crucial to the success of AI, data quality and the development tools that ensure it have not received the attention they deserve. The #datacentricAI movement seemed to be the beginning of a change, but #LLMs and #foundationalmodels took over the hype. In my opinion, #dataquality should take back the narrative, and here's why:
1. Poor-quality data results in inaccurate models, biased outcomes, and unreliable AI systems.
2. Neglecting data quality tools in favor of model development and productization has hindered innovation in the field. While progress is being made in improving data quality tooling, it's not keeping pace with the rapid development of new AI models.
3. Investing in data quality is essential for the future of AI, regardless of the models or architectures used. In an AI-dominated world, the key differentiator is your data!
As someone active in the data community and in #opensource, I believe that we should invest in robust data management infrastructure, promote data literacy among AI practitioners, and encourage collaboration between data engineers and data scientists. Strong data governance frameworks are also essential to ensure data quality, privacy, and security. A big shoutout to the companies that are giving visibility to the importance of #dataquality: Cleanlab, YData, Great Expectations! What tooling do you use? Tag others who have been relevant to your journey into #ai and #data.
-
☀️ It’s almost June, and that means new GX Cloud workshops! 📆 June dates:
- Tuesday, June 11, 12pm ET (9am PT): “Getting started with GX Cloud and PostgreSQL”
- Tuesday, June 25, 12pm ET (9am PT): “Getting started with GX Cloud and Snowflake”
🖊️ Register here: https://hubs.ly/Q02y_W7M0
GX Cloud workshop signup
pages.greatexpectations.io