Your data analysis may be wrong.
If you compare unequal time periods, react to normal fluctuations, or lose track of your data's previous state, you can make bad decisions. Snapshots address these issues by recording the results of a data query at a specific point in time. By taking snapshots on a consistent schedule, you can see what's changed, compare equal time periods for improved analysis, and stop reacting to normal fluctuations in your data.
Snapshots help you see what changed and what didn’t. They allow you to manage by exception, detect trends and, when done right, ignore normal fluctuations. In a world focused on real-time reporting, snapshots give you needed longer-term perspective.
Why Snapshots Are Needed
Snapshots aid in decision-making by saving data that would otherwise be lost or corrupted:
- Status Indicators
Indicators like lead status, % done and current cost can change at any moment. Snapshots record the value at a specific point in time to aid in later analysis. Finding out what changed from last week to this week loses accuracy if you compare last Monday at 9am to this Tuesday at 2pm.
- Summary Results
Summaries of underlying data like total open issues, number of active leads or average usage by payment plan can return different results when the individual data items change. Summaries that rely on filtering, searching or grouping attributes whose values change can cause inconsistency in your reports, showing incorrect growth or loss numbers, or triggering false alerts.
- Deleted Items
Deleted data items can bias your summaries, leading to wrong conclusions. Remove a product from your catalog and your weekly sales report may exclude that product from the prior week sales comparisons. Survivorship bias, when you ignore removed or failed entities, can skew metric calculations like success rates.
- Added Items
New data items can also bias your summaries. Start a new project and your average percent done across all projects will drop, even if every other project made progress that week. Snapshots help you make sense of such shifts by letting you normalize your metrics when items are added.
- Derived Values
Values derived from any of the above items, such as % change in cost, new open issues or lead-to-opportunity ratio, will return different results if the underlying data changes. Using these derived values to monitor or analyze your data can lead you to incorrect conclusions.
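The added-items bias above is easy to demonstrate with a little arithmetic. This is a minimal sketch (the project names and percentages are invented for illustration) showing how comparing only the items present in both snapshots removes the bias:

```python
# Last week's snapshot: three projects, average 70% done.
last_week = {"A": 80, "B": 60, "C": 70}
avg_last = sum(last_week.values()) / len(last_week)   # 70.0

# This week every existing project advanced 10 points,
# but a brand-new project D (0% done) was added.
this_week = {"A": 90, "B": 70, "C": 80, "D": 0}
avg_this = sum(this_week.values()) / len(this_week)   # 60.0

# The naive average dropped even though all prior work progressed.
# Restricting the comparison to projects present in both snapshots
# shows the real movement:
shared = set(last_week) & set(this_week)
avg_shared = sum(this_week[p] for p in shared) / len(shared)  # 80.0
```

Without last week's snapshot you couldn't compute the shared-item average at all, because the old values would already have been overwritten.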
Techniques like recording the creation date of an item or saving audit trails for specific attributes can solve these issues, but targeting the problem at the field or row level adds complexity that can easily introduce errors. Snapshots address the problem at the data-set level: the implementation details of individual queries become irrelevant, providing a consistent solution across all reports.
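To make the data-set-level idea concrete, here is a minimal sketch of a snapshot recorder, assuming an in-memory store; the names (`take_snapshot`, `store`) are illustrative, not an established API. The key point is that the recorder only sees the query's result set, so how the query is implemented never matters:

```python
import datetime as dt
import json

def take_snapshot(name, query_fn, store):
    """Run a query and record its full result set under a timestamp.

    `query_fn` is any zero-argument callable returning JSON-serializable
    rows -- the snapshot layer is indifferent to how the query works.
    """
    taken_at = dt.datetime.now(dt.timezone.utc).isoformat()
    # Round-trip through JSON to deep-copy the rows, so later mutation
    # of the source data cannot alter the recorded snapshot.
    frozen_rows = json.loads(json.dumps(query_fn()))
    store.setdefault(name, []).append({"taken_at": taken_at, "rows": frozen_rows})
    return taken_at

# Usage: two snapshots of the same named query, taken as data changes.
store = {}
take_snapshot("open_issues", lambda: [{"id": 1, "status": "open"}], store)
take_snapshot("open_issues", lambda: [{"id": 1, "status": "open"},
                                      {"id": 2, "status": "open"}], store)
```

In a real system the store would be a database table and the schedule an automated job, but the shape of the solution is the same.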
Snapshots should be saved on a consistent schedule based on how often you analyze your data and how much it varies.
If you have highly irregular daily metrics, avoid daily snapshots. Choose a weekly or monthly snapshot that smooths out variations and allows you to make meaningful period-over-period comparisons.
If you have highly consistent daily or weekly metrics, be careful with monthly or quarterly snapshots. Variations in the number of days in a month or quarter can skew your numbers: Q4 has two more days than Q1 (92 versus 90 in a non-leap year), a difference of about 2%. Normalize your data first when possible.
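One simple normalization is to convert period totals to per-day rates before comparing them. A sketch, using Python's standard `calendar` module (the revenue figures are invented for illustration):

```python
import calendar

def per_day(total, year, quarter):
    """Normalize a quarterly total to a per-day rate so that
    different-length quarters compare fairly."""
    months = range(3 * (quarter - 1) + 1, 3 * quarter + 1)
    days = sum(calendar.monthrange(year, m)[1] for m in months)
    return total / days

# Identical quarterly totals yield different daily rates,
# because Q1 2023 has 90 days while Q4 has 92.
q1_rate = per_day(9000, 2023, 1)   # 100.0 per day
q4_rate = per_day(9000, 2023, 4)   # ~97.8 per day
```

Comparing the raw totals would call the two quarters equal; the per-day rates reveal that Q4 actually performed slightly worse per day.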
Regardless, for maximum consistency, use an automated system to schedule your snapshots rather than relying on manual ones.
A good snapshot engine will not only automatically save snapshots, but provide tools for you to effectively analyze your snapshots. Useful snapshot operations include:
- Calculate the % change of your metrics between two snapshots. Use it to monitor period-over-period changes.
- Calculate metric statistics across a series of snapshots. Use them to identify the average, minimum or maximum value of a metric, or to compute a moving average that balances recent fluctuations against historical performance.
- Adjust the value of a metric to improve analysis. Use it to remove biases from different-length reporting periods, changes in how a metric is calculated, or additions to and removals from the data set.
- See only items whose metrics changed (or didn't change) since the last snapshot, and find out what was added or removed. Use it to focus on only the items relevant to your current analysis.
- Modify the values of specific data items, or add or remove data items, without affecting the source snapshot. Use it to adjust your snapshots for business-process differences, data quality issues or what-if scenarios.
- Select specific data items or values that you want to track. Use it to watch important items for future changes.
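The first and fourth operations above can be sketched in a few lines. This is a minimal illustration, not a real snapshot engine; the function names (`pct_change`, `diff`) and the sample rows are assumptions:

```python
def pct_change(prev, curr):
    """Period-over-period % change between two snapshot values."""
    return (curr - prev) / prev * 100

def diff(prev_rows, curr_rows, key="id"):
    """Items added, removed, or changed between two snapshots."""
    prev = {r[key]: r for r in prev_rows}
    curr = {r[key]: r for r in curr_rows}
    added   = [curr[k] for k in curr.keys() - prev.keys()]
    removed = [prev[k] for k in prev.keys() - curr.keys()]
    changed = [curr[k] for k in prev.keys() & curr.keys() if prev[k] != curr[k]]
    return added, removed, changed

# Usage: compare last week's snapshot with this week's.
last = [{"id": 1, "status": "open"}, {"id": 2, "status": "open"}]
this = [{"id": 2, "status": "closed"}, {"id": 3, "status": "open"}]
added, removed, changed = diff(last, this)   # added: 3, removed: 1, changed: 2

growth = pct_change(100, 115)   # 15.0
```

Because both inputs are full result sets rather than individual fields, the same two functions work for any query, which is exactly the simplification snapshots buy you.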
Snapshot operations work best when integrated into an existing reporting or database management system, allowing users to compare entire snapshots with the same ease they compare individual rows and columns in Excel.
Snapshots can help users analyze their data more effectively and consistently. Modern databases support snapshots at the database level, but use those snapshots mainly for data recovery. And no database I know of supports snapshots at the query level. In the future, I predict we’ll see snapshot technology standardized and used increasingly for data analysis.
In an upcoming article, I’ll explore applying the Photoshop concept of layers to data sets and how that relates to improved analysis using snapshots. In the meantime, to see an example of snapshots in action, check out this video showing the analytic snapshot feature in Salesforce.com.
Credits: The image in this article uses a photo taken by Paul Downey.