The 7 hidden costs of in-house data automation
Consumer packaged goods (CPG) companies are increasingly embracing internal digital transformations to streamline operations, improve decision-making, and enhance customer experiences. Central to this transformation is the availability of retail data to provide rich insights for sales planning, demand forecasting, marketing budgeting, and supply chain optimization.
Retail data ingestion is the process of collecting and processing retail data from various sources such as point-of-sale (POS) systems, e-commerce platforms, warehouses, and distribution centers. This data is transformed into a clean, usable format and centralized for easy analysis.
Retail data automation: in-house vs software solution
The process of automating retail data ingestion is essential for CPG companies to gain regular insights into their inventory levels, sales performance, and business health with retail partners.
It ensures timely and centralized data availability, allowing business analysts and other teams to access up-to-date information seamlessly. The goal is to quickly identify and respond to market trends and shifts.
However, automating retail data ingestion is a complex, resource-intensive process that can vary from retailer to retailer. It typically involves the following steps for each data pipeline:
- Research and planning: This involves identifying the data sources that need to be ingested, understanding the data formats, and developing a plan for how the data will be collected and processed.
- Building: Develop the necessary code or scripts to regularly collect and process the data.
- Testing: Thoroughly test the data ingestion process to ensure that the data is collected on schedule and processed correctly. Make adjustments as necessary.
- Maintaining and updating: This involves maintaining the data ingestion process and making regular updates to ensure that it continues to work properly.
Note: For automated retail data to be usable, it also needs to be cleaned and normalized, as raw data often contains inconsistencies, errors, duplicates, or irrelevant information. These can distort analysis results and lead to inaccurate business decisions.
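To make the cleaning and normalizing step concrete, here is a minimal Python sketch. The field names (`retailer`, `store_id`, `sku`, `sale_date`, `units`), the two date formats, and the rule of stripping leading zeros from store IDs are all illustrative assumptions, not a description of any particular retailer feed or of how a vendor platform works.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class SalesRow:
    """One normalized sales record (hypothetical schema)."""
    retailer: str
    store_id: str
    sku: str
    sale_date: date
    units: int


def parse_date(raw: str) -> date:
    """Accept the two date formats assumed for this example."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")


def clean_rows(raw_rows):
    """Normalize raw dict rows, drop exact duplicates, and collect rejects."""
    seen = set()
    cleaned = []
    rejected = []
    for raw in raw_rows:
        try:
            row = SalesRow(
                retailer=raw["retailer"].strip().lower(),
                # Assumption: "0042" and "42" refer to the same store.
                store_id=raw["store_id"].strip().lstrip("0") or "0",
                sku=raw["sku"].strip().upper(),
                sale_date=parse_date(raw["sale_date"]),
                units=int(raw["units"]),
            )
        except (KeyError, ValueError, TypeError, AttributeError) as exc:
            # Keep the bad row and the reason so someone can review it.
            rejected.append((raw, str(exc)))
            continue
        if row in seen:
            continue  # duplicate after normalization
        seen.add(row)
        cleaned.append(row)
    return cleaned, rejected
```

Returning the rejected rows alongside the cleaned ones, rather than silently dropping them, makes it easier to spot when a source starts sending data in an unexpected shape.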
Keeping all of this in mind, an automated retail data solution can be built in-house or outsourced. Outsourcing saves significant time, money, and resources, allowing companies to focus more on their core strengths as a CPG organization and less on becoming a technology developer.
Here, we’ll explore the underlying costs associated with DIY retail data ingestion, and how reliable, integrated software can prove the best solution for companies of all sizes.
7 costs of internal data pipelines
Amid retail’s digital transformation, DIY retail data automation may seem cost-effective, but it often conceals hidden costs that can snowball over time. We’ve spoken to hundreds of CPG companies, and even the most advanced struggle to maintain automation for half of their retailer data.
Here are seven costs associated with managing digital data ingestion in-house:
- High up-front and operational costs: Building an in-house data ingestion system from the ground up requires a hefty initial investment in planning, project management, and technology. Moreover, hiring and retaining a dedicated team to engineer retail data ingestion can be a tall order. It requires specialized knowledge and skills, and the recruitment process can be time-consuming and costly.
Example: Research shows that it takes two engineers and one project manager 18 weeks on average to build automated ingestion for a new data source, at an estimated cost of $100k each time. That figure covers a single data source. Each source has its own intricacies, but building out an internal operation for just one retailer source can cost this much. From there, it costs companies an average of $500k per year to maintain all of their data pipelines.
- Computing costs: Daily retail data ingestion can demand substantial computational power. Companies might face escalating costs, especially if they have underestimated the infrastructure and processing power required for timely and efficient data ingestion.
Example: Consider a Crisp customer with 4.7 million points of distribution (PODs), which amounts to 100 billion rows of data, or over 100 TB. On top of the staggering computing costs of managing this much data in-house, latency issues can lead to delayed data availability and affect responsive decision-making.
- Retail portal outages and updates: Keeping up with frequent changes in retailer data structures or portal updates is a constant challenge. Retailers may not always communicate these modifications, leading to data discrepancies and downstream reporting disruptions. Even a minor addition like a new data field can derail the entire ingestion, harmonization, and reporting process.
Now, how long does it take a team to identify the update? How long does it take to discover where it impacts reports? And finally, how long does it take to learn what the new data sets reference and what insights can teams use from them?
Fixing these interruptions can be time-consuming and just as one issue is addressed, another might emerge, resulting in a reactive rather than proactive approach to data management. The ongoing costs can be unsustainable to manage.
Example: Retailers update portals and reports every 3-6 months, meaning the pipelines must be consistently and constantly maintained.
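One way a team might shorten the time it takes to notice a portal change is to compare each incoming file's header against the columns the pipeline expects. The sketch below is illustrative only: the expected column set and the function names are hypothetical, and a real feed would need its own list.

```python
# Hypothetical set of columns the pipeline was built against.
EXPECTED_COLUMNS = frozenset({"store_id", "sku", "sale_date", "units"})


def detect_schema_drift(header, expected=EXPECTED_COLUMNS):
    """Compare a feed's header row with the expected columns.

    Returns a dict listing columns that disappeared ("missing") and
    columns that showed up without notice ("unexpected").
    """
    actual = {name.strip().lower() for name in header}
    return {
        "missing": sorted(expected - actual),
        "unexpected": sorted(actual - expected),
    }


def check_feed(header):
    """Fail fast on missing columns; report new ones for review."""
    drift = detect_schema_drift(header)
    if drift["missing"]:
        raise ValueError(f"feed is missing columns: {drift['missing']}")
    return drift["unexpected"]
```

Running a check like this before ingestion turns a silent reporting error into an explicit alert, although someone still has to work out what a new field means and where it affects reports.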
Opportunity costs represent the benefits a business misses out on when choosing one alternative over another. While they may not always be easily quantified, understanding opportunity costs is critical to the “build vs buy” equation. Here are some opportunity costs tied to DIY retail data automation:
- Time and resources: The process of DIY data ingestion demands a significant allocation of time and resources — particularly from highly skilled teams like engineers, IT staff, and analysts. This redirection of resources toward maintaining data pipelines can lead to potential slowdowns in other areas, limiting the company's ability to innovate, respond to market changes, or capitalize on new opportunities.
Example: Maintaining just one or two data pipelines often takes up ~25% of an engineering team’s time, keeping them from making progress on higher-return projects.
- Delay in strategic initiatives: Given the unpredictable nature of DIY data ingestion, where unexpected challenges or retail portal changes can arise suddenly, companies may find themselves constantly reacting rather than proactively moving forward. Such interruptions can lead to delays in launching strategic projects, rolling out new campaigns, or entering new markets. Over time, these delays can compound, causing missed market opportunities.
Example: Consider a CPG analyst who spends most of their time managing and cleaning data. With automated and centralized data, their team could be conducting a deeper analysis on the data, for example by overlaying it with weather trends. Such insights could guide supply chain and marketing decisions, boosting revenue.
The time spent managing rather than analyzing data represents a significant opportunity cost for the company.
- Reduced media spend efficiency: When companies consolidate retail data into a single hub, there are applications for various departments. Marketers may be accustomed to using syndicated data sources like SPINS, but real-time, shelf-level data is a new frontier with major implications for campaign optimization. With insights from precise sales and inventory data, marketing teams can refine their media budgets, pinpointing opportune regions and diverting spend from areas with potential stock shortages. This precise targeting can drive a significant surge in return on ad spend (ROAS).
When relying on DIY data pipelines for such information, marketers may find themselves making decisions based on outdated or incomplete data, if it is even available for reference at all. These inefficiencies could lead to wasted budgets and missed advertising opportunities.
- Scalability challenges: As businesses grow and the volume of data they deal with increases, DIY solutions might not be equipped to handle the increased load. Scaling the infrastructure and system can involve additional unexpected costs or even a complete overhaul. Time spent on data management can hinder emerging opportunities to grow and expand.
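The build and maintenance figures quoted in the first cost above can be combined into a rough back-of-the-envelope estimate. The sketch below uses only those quoted averages. It assumes the $500k annual maintenance figure stays flat regardless of how many sources are built, which is a simplification, and it leaves out computing and opportunity costs entirely.

```python
# Averages quoted earlier in this article.
BUILD_COST_PER_SOURCE = 100_000  # USD per new data source
ANNUAL_MAINTENANCE = 500_000     # USD per year, all pipelines
BUILD_WEEKS = 18                 # weeks per source
BUILD_TEAM_SIZE = 3              # two engineers plus one project manager


def diy_cost(num_sources: int, years: int) -> int:
    """Rough DIY cost in USD: one-time builds plus flat yearly maintenance.

    Assumption: maintenance does not scale with the number of sources.
    """
    return num_sources * BUILD_COST_PER_SOURCE + years * ANNUAL_MAINTENANCE


def build_person_weeks(num_sources: int) -> int:
    """Total person-weeks spent building, at 18 weeks x 3 people per source."""
    return num_sources * BUILD_WEEKS * BUILD_TEAM_SIZE


if __name__ == "__main__":
    # Example: five retailer sources maintained over three years.
    print(diy_cost(5, 3))         # 2000000
    print(build_person_weeks(5))  # 270
```

Under these assumptions, five sources over three years come to $2M and 270 person-weeks of build effort, before any of the other costs described above.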
The advantages of retail data solutions
In an era where data drives decision-making, having a robust and reliable data management solution is non-negotiable. A dedicated retail data software solution like Crisp offers many advantages beyond convenience and cost savings.
- Single source of truth: Crisp automates the flow of data from 40+ retailers and distributors directly into the tools teams use most, including leading cloud and BI platforms like Snowflake, Databricks, Google Cloud, Power BI, Excel, and more. Additionally, Crisp's native dashboards are made available to all users with unlimited licenses, ensuring everyone in an organization has access to vital data insights, regardless of the retailer account they work on or the tools they use.
- Efficiency and speed: With Crisp, even the largest organizations can gain access to real-time data in less than 30 days, inclusive of historical backfill, with data refreshes occurring mostly daily.
- Quality and accuracy: Crisp ensures data is not just available, but also clean, usable, and harmonized across retail data sources. This gives companies a holistic view of their retail business while also getting to business-critical insights faster.
- Reliability: Crisp ensures consistent and unbroken data flows. The platform is built to adapt quickly to portal changes and to seamlessly integrate new feeds as they become available, offering a foolproof solution for uninterrupted data access.
- Advanced analytical capabilities: Crisp offers advanced reporting including a retail voids dashboard that uses machine learning to identify at the individual store level where products should be selling, but aren’t.
- Major cost savings: By adopting the Crisp platform, companies can sidestep the hefty development and maintenance costs associated with DIY solutions right away. This leads to substantial short-term and long-term financial savings.
Retail data software solution: what’s the cost?
Investing in Crisp is not just about streamlining retail data ingestion; it's about the broader value it brings to the entire organization. With integrations to over 40 retailers and distributors, Crisp offers a solution tailored to your company's specific needs. Whether you're a small business or a large enterprise, Crisp's pricing structure is designed to be scalable, based on points of distribution or PODs. This approach ensures you only pay for what you need.
The initial cost pays off quickly in time savings, relieving technical teams of maintenance tasks. And when considering the benefits — such as Carbone's reported $500k savings from reduced out-of-stocks, or one enterprise customer’s savings of $100k a week in inventory allocation — the return on investment in Crisp becomes clear.
Compared to DIY retail data pipelines, a technology solution like Crisp offers immediate cost savings and convenience, a future-proof infrastructure, and countless insights to grow your business.
Get insights from your retail data
Crisp connects, normalizes, and analyzes disparate retail data sources, providing CPG brands with up-to-date, actionable insights to grow their business.