26 May Dirty Data – Preventing the Pollution of Your IoT Data Lake
We’ve all heard it before: garbage in, garbage out. This is especially problematic with IoT data, where dirty data (data that is well-formed but wrong) can set off a butterfly-effect chain reaction that pollutes the data lake and leads to bad decisions and bad business. In this episode of the IoT Business Show, I speak with James Branigan about what bad data is, how to identify it, how to stop it, and how to deal with the aftermath.
James is the Co-Founder of Bright Wolf, an enterprise IoT tech and solution provider headquartered in Durham, NC. He has over 15 years’ experience as an architect and developer of M2M and large-scale industrial IoT systems.
In every long-term IoT deployment, chances are that something will go wrong with the integrity of your data. As software, firmware and hardware are updated over time, especially at the edge, bugs will creep in and data will get corrupted. By following the best practices discussed in this episode, most of it can be prevented or spotted en route to the data lake, but not all. For the rest, you need to go into forensic mode, which is only possible if you planned for it. Like cyberattacks, dirty data must be planned for upfront. The best way to fight dirty data is with more data, or contextual data to be exact: the type you store in your application protocol.
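To make the idea of contextual data concrete, here is a minimal sketch of what carrying context alongside each reading might look like. It is not taken from the episode: the `Reading` fields, the `suspect_readings` helper, the device names, firmware versions and timestamps are all hypothetical, chosen only to illustrate how stored context lets you trace bad values back to their source later.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Reading:
    """One sensor value plus the context needed to investigate it later.

    All field names are illustrative assumptions, not a real protocol.
    """
    device_id: str
    firmware_version: str  # which build produced this value
    sensor: str
    unit: str
    value: float
    recorded_at: str       # ISO 8601 timestamp reported by the device


def suspect_readings(readings: Iterable[Reading],
                     bad_firmware: Set[str]) -> List[Reading]:
    """Return readings produced by firmware builds later found to be faulty."""
    return [r for r in readings if r.firmware_version in bad_firmware]


# Made-up sample data: firmware 1.5.0 is assumed to have shipped with a bug.
readings = [
    Reading("pump-1", "1.4.0", "temp", "C", 21.5, "2020-01-01T10:00:00Z"),
    Reading("pump-1", "1.5.0", "temp", "C", 70.7, "2020-01-02T10:00:00Z"),
    Reading("pump-2", "1.4.0", "temp", "C", 22.1, "2020-01-02T10:00:00Z"),
]

flagged = suspect_readings(readings, {"1.5.0"})
for r in flagged:
    print(r.device_id, r.firmware_version, r.value)
```

Without the firmware version stored next to each value, the second reading would be well-formed and indistinguishable from a good one; with it, every value from the faulty build can be found and quarantined after the fact.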
Here’s What We’ll Cover in this Episode
- The key to avoiding HIPAA regulation problems.
- What dirty data is and how to avoid it.
- The difference between cleaning IoT big data and non-IoT big data.
- The four sources of dirty data.
- The three different categories of dirty data.
- The gotchas with OTA systems in real IoT deployments.
- The three ways of finding out whether there are any data issues.
- How to fix data problems once they’ve happened.
- The differences between discrete product data issues and those found in IoT systems and processes.
Have an opinion? Join the discussion in our LinkedIn group
Most deployments will have dirty data at one time or another. Have you found any in yours?