Data Thing Part 10: Notes dump
as per the last data thing, here are bits of my notes that i used to present a case for going forward with this project. odds and sods on strategy and a modern approach to analytics. no time to edit them and add context so just gonna paste em here and be done with it...
The “Modern Data Stack”
Traditional | Modern |
---|---|
On premise | Cloud based |
Requires maintenance: patching and upgrading | Data and integrations first, maintenance is automated |
Manual processing | Automated processing with version control and reusability |
Separate pipelines for each business use | Focus on delivering a data model so users can self serve, joining information from multiple sources |
Sources are often limited to single, structured back end databases | Data Lake / Lakehouse capability can easily process structured and unstructured data |
Focus on delivering specific reports or analyses, output is siloed | |
Scaling limited by ICT development resource | |
Data maturity levels – beyond dashboards
We cannot move up the maturity scale with fancier reports or dashboards - we need to modernise behind the scenes
Ad-hoc
Data is collected and stored in an unstructured or inconsistent manner. There is little to no governance or analysis.
- Separate files and databases
- Sources are not linked
- No standardization or quality practices
- Manual data manipulation
Organized
Data is collected and stored in a more structured way, but there is still a lack of standardization and integration. Basic reporting and analysis may be performed.
- More data stored in databases.
- Basic access controls, cleansing and validation
- Basic analytics – descriptive statistics, simple reports, backwards looking, separate from work
Integrated
Data from various sources is integrated and standardized, enabling more comprehensive analysis. Data governance and quality practices are starting to be implemented.
- Integrating data across systems and departments
- Analytics are customizable, interactive and self-service
- Usage is focused on supporting decision making, not describing past performance
- Analytics and data usage are starting to be embedded
Managed
Data is managed as a strategic asset, with robust governance, quality, and security measures in place. Data is used to support decision-making across the organization.
- More advanced use of predictive analytics and forecasting
- Data usage and management is an organisational priority – not a function limited to ICT
- Data driven decision making is integrated into user workflows as opposed to separate reporting and dashboards
Optimized
Data is fully optimized to drive innovation and competitive advantage. Advanced analytics and machine learning techniques are used to uncover new insights and create value.
- Cloud data management is geared towards achieving organisational outcomes, over maintaining on premise systems
- Definition of “data” goes well beyond the information in databases – encompassing 3rd party data and unstructured data
on embedding analytics and "Data driven decision making is integrated into user workflows"
roll out the trusty screenshot of google maps. THIS is data embedded into workflow - look up a restaurant and see ratings, the travel time from where you are now, the time it closes today etc.
it's not a dashboard of stats on the top restaurants in X area. it's data presented in the thing you are using (maps), relevant to the context (time and day, location) and the job you are trying to do (find somewhere for dinner)
shout out to Benn Stancil who articulates this much better than me
Notes on why
ICT focus moves to designing and maintaining data models that reflect business priorities and activities, over producing specific outputs
Users have flexibility and ability to self serve - power users can be given enhanced permissions
Cloud hosted, no maintenance overhead
Automates as much as possible - a data lakehouse can handle a huge variety of structured and unstructured data.
Integrates with other 365 apps.
Users work on a copy of the data that is read only
The purpose of the product is to integrate between systems and sources. We currently focus on specific reporting outputs.
Current way of working means each analytics / reporting project exists in isolation. Models are part of the query which runs and produces the output. Code can be reused but the process is manual. A query only produces one fixed output. Everything is centred around the back office database it relates to (e.g. SQL database? we'll do you an SSRS report). Very few combine data from multiple systems because it's difficult, and none combine with data from places outside the council.
Access to processing power for machine learning
Harness the analytics skills and business area knowledge across the organization
The data coding etc is not that complex for us (ICT), it's time consuming but simple. The difficult bit is making the work effective. We need to educate people on outcome based work, identifying decision points they can influence, translating general goals into measurable ones, and being user focused.
Dashboards based on databases are easy and pretty but not always useful.
A person DOING the work will always be best placed and motivated to produce something really useful that benefits them and their work. So we need to safely and easily open up analytics etc to business areas
Current: Inefficient and old fashioned approach, only reaching a small area. Reactive, "decorative" reporting.
Siloed - constrained by back office systems software and architecture.
As we move more systems to cloud it becomes more difficult to draw information from different sources together – consolidation on prem negates the benefits of moving to the cloud. Need a more modern approach if we want to enable being data driven.
Our corporate strategies contain objectives to be data driven and unlock our data - we need to turn these terms into something tangible for us. I am working on a definition: unlock our data = open it up to business areas to work with, and expand our definition of data
Hypothesis
We will get the most value from business areas being able to design and maintain solutions themselves, and from them having the skills to identify and implement effective use cases and continuously improve. There is unrecognised value outside the traditional data sources (back end databases): unstructured stuff, and things we don't own but do interact with, that mirror the world we operate in
Enable this with strategy:
- Data needs to be accessible to end users – physically accessible but also cleaned and easy to use
- Handle as much processing as possible "up stream" to maximise reuse
- Have clear standards for standardisation enabling systems to be linked up
- Design models that reflect our priorities, not the architecture of back office systems
- Identify information outside system databases that should be incorporated
- Technical training for end users
- Educating users on identifying valuable cases – linking data to decision making points
- Prioritise opportunities to embed data into workflows, over descriptive reporting
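A rough sketch of the "process up stream, reuse downstream" and "clear standards so systems can be linked" points from the list above, in plain Python. Everything here is made up for illustration: the two extracts, their field names and the choice of postcode as the linking key are assumptions, not the real systems. The idea is just that the cleaning happens once, in one place, and more than one use case reads from the same model.

```python
from collections import defaultdict


def standardise_postcode(raw: str) -> str:
    """Apply one agreed standard: no whitespace, upper case."""
    return "".join(raw.split()).upper()


# Hypothetical extracts from two separate back office systems.
# Note the same postcodes are formatted differently in each.
housing_repairs = [
    {"postcode": "ab1 2cd", "open_repairs": 3},
    {"postcode": "EF3 4GH ", "open_repairs": 1},
]
waste_complaints = [
    {"postcode": "AB12CD", "complaints": 2},
    {"postcode": "ef3 4gh", "complaints": 5},
]


def build_model(*sources):
    """Up stream step: merge every source into one row per standardised postcode."""
    model = defaultdict(dict)
    for source in sources:
        for record in source:
            key = standardise_postcode(record["postcode"])
            for field, value in record.items():
                if field != "postcode":
                    model[key][field] = value
    return dict(model)


model = build_model(housing_repairs, waste_complaints)

# Two different downstream uses, both reading the same cleaned model.
total_open_repairs = sum(row.get("open_repairs", 0) for row in model.values())
most_complaints = max(model, key=lambda pc: model[pc].get("complaints", 0))

print(model)
print(total_open_repairs, most_complaints)
```

Neither downstream use has to know how the postcodes were cleaned or which system a field came from, which is the bit that makes self serve realistic.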