Dashboards vs Reports vs What? First steps on the Analytics path

This is a post about the differences between dashboards and reports, where they overlap, and how either (or both) can be used to form the foundation of an analytics framework. While most (if not all) of the main points apply across multiple sectors and industries, the focus will be on the use of these tools in rail and the wider transport sector - that's the intent, anyway!

The first thing to ask is what the difference between a dashboard and a report actually is - isn't a dashboard just a collection of reports or highlights? The answer is, unfortunately, "yes and no": all too often that is exactly what a dashboard presents, so 'yes'. It is not, however, what a dashboard should be (in my view, anyway): a dashboard should provide, in one place, an overview of the information and facts an individual needs to know at the time they view it. Not a snapshot, some time-based 'photograph' of how things looked the last time the reports were run, but up-to-date information, and it certainly should not require a full data re-import each time it loads for a specific user. That does not mean dashboards cannot contain reports in part, of course: some data change slowly, and sometimes what the user wants to see is performance as it will appear in a management report. Sometimes it is not possible to capture data in "real time", in which case the dashboard should show the most recent data available - and, again, not require a lengthy data import each time it is opened.

The keys to dashboards are simple: speed and ease of use, and not necessarily in that order. If it takes too long to load, people won't use it as a routine tool, and the effort put into creating it is partly wasted; if it doesn't show the information people want, in an accessible and easy-to-grasp form, then they definitely won't use it. One classic mistake is to create a dashboard that shows all available information to all users and requires a refresh or re-load whenever someone logs in: "MyFirstDashboard" (we've all done it - and some egregious examples take two or three minutes to load, or worse!).
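As a purely illustrative sketch of the "no full re-import" point, the snippet below caches a small, pre-aggregated summary for a few minutes, so repeated dashboard loads hit the cache rather than the warehouse. The table name, the summary columns and the run_query callable are invented for the example, not taken from any particular product.

```python
import time

# Hypothetical illustration: cache a pre-aggregated query so that every
# dashboard load does NOT trigger a full data re-import. The query callable
# and table/column names are placeholders, not a real system.

_CACHE = {}               # key -> (timestamp, result)
CACHE_TTL_SECONDS = 300   # refresh at most every five minutes

def fetch_fleet_summary(run_query):
    """Return the pre-aggregated fleet summary, re-querying only when stale.

    `run_query` is whatever callable executes SQL against the warehouse; the
    SQL below assumes a pre-built summary table, which is the point - the
    heavy aggregation happens once, on a schedule, not on every page load.
    """
    now = time.time()
    cached = _CACHE.get("fleet_summary")
    if cached and now - cached[0] < CACHE_TTL_SECONDS:
        return cached[1]                       # fast path: serve from cache

    result = run_query(
        "SELECT unit_id, faults_open, punctuality_pct "
        "FROM fleet_summary_latest"            # small, pre-aggregated table
    )
    _CACHE["fleet_summary"] = (now, result)
    return result
```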

A report, on the other hand, is typically a more static view: obviously dynamic reports exist (often called "on demand"), and in many cases they are extremely useful. In common use, though, a report is something prepared for a specific area, featuring specific data over a specific time period (sales last quarter, or average passengers on the route A - B last month by day of week). Fault category reports (doors, traction, passengers causing trouble, and so on) are not usually built on dynamic data - they can be, obviously, but higher-level managers typically want to see recent results and trends over time, not "what is happening right now?". The other main distinction is not just the timeliness of the data but when the report is required: the majority of reports are likely to be run or used on a scheduled and predictable basis - once a day, once a week, once a month and so on. Depending on the compute model chosen (in house, cloud, hybrid) it may even be possible to run these reports during "quiet time", when the dashboard servers are lightly used, then store them for people to access when they wish (a far more efficient use of computing resources than running each report every time someone requests it, which some dashboard-based solutions would do).
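To make the "quiet time" idea concrete, here is a minimal sketch of a report job that could be run overnight by cron (or any equivalent scheduler) and written to disk, so that daytime users simply read the stored output. Again, the table names, output path and the run_query callable are placeholders rather than any real system.

```python
import csv
import datetime
import pathlib

# Hypothetical sketch: generate a routine report overnight and store the
# output, so daytime requests just read a file rather than re-running the
# query. Table, column and path names are illustrative only.

REPORT_DIR = pathlib.Path("/srv/reports/daily")

def run_daily_fault_report(run_query):
    """Build yesterday's fault-category report and write it to disk."""
    yesterday = datetime.date.today() - datetime.timedelta(days=1)
    rows = run_query(
        "SELECT fault_category, COUNT(*) AS fault_count "
        "FROM faults WHERE fault_date = %s "
        "GROUP BY fault_category ORDER BY fault_count DESC",
        (yesterday,),
    )
    REPORT_DIR.mkdir(parents=True, exist_ok=True)
    out_file = REPORT_DIR / f"faults_{yesterday.isoformat()}.csv"
    with out_file.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["fault_category", "fault_count"])
        writer.writerows(rows)
    return out_file

# Scheduled from cron (or similar) during quiet hours, for example:
#   15 03 * * *  /usr/bin/python3 /opt/reports/run_daily_fault_report.py
```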

Beyond both of these, in terms of capabilities (and in some ways complexity), there are the analytics tools. These are usually capable of both producing reports and providing dashboards, and then offer more advanced capabilities: trend analysis, forecasting, statistical modelling, anomaly detection and data exploration are typical features, although not day-to-day requirements for most users. For some, of course, these are an essential requirement; for most users, though, where forecasting is required it is within the context of a report or dashboard. Statistical models can be created for real-time performance monitoring, with anomaly or outlier detection driving alerts: the configuration and development of those models is probably not an efficient use of control centre or depot staff time. (Responding to those alerts, absolutely: that can be used as the basis of an incident response system for some types of incidents.)
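By way of illustration only, the kind of outlier check that might sit behind such an alert could be as simple as a rolling z-score over recent readings. The window size, threshold and door-timing figures below are made up; a real deployment would be tuned against historical data.

```python
from collections import deque
from statistics import mean, stdev

# Illustrative only: a rolling z-score outlier check of the sort that could
# sit behind a "raise an alert" rule. Window size and threshold are invented.

class RollingAnomalyDetector:
    def __init__(self, window: int = 60, threshold: float = 3.0):
        self.values = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, value: float) -> bool:
        """Return True if `value` looks anomalous against recent history."""
        is_anomaly = False
        if len(self.values) >= 10:                  # need some history first
            mu = mean(self.values)
            sigma = stdev(self.values)
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                is_anomaly = True
        self.values.append(value)
        return is_anomaly

# Example: door-closing times (seconds) streaming in from one unit
detector = RollingAnomalyDetector()
for reading in [3.1, 3.0, 3.2, 3.1, 2.9, 3.0, 3.1, 3.2, 3.0, 3.1, 9.8]:
    if detector.observe(reading):
        print(f"Alert: door closing time {reading}s is well outside the recent range")
```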

There is then the use of such models and modelling techniques for predictive failure analysis: techniques such as regression modelling can be used, trend analysis can be combined with inputs such as meteorological data to predict possible failures before they happen, and so on. Trends can be monitored to watch for increasing failures in a particular subsystem, or as components age (and whether that changes with season, perhaps): these can then be integrated with a "normal" schedule to provide optimised maintenance. Of course, static data such as fault counts by subsystem can be used as a very basic start to that process, but if you are already using "clever" analytics tools then the more advanced capabilities are not that difficult to reach. This may need vendor or other external support, or perhaps not: sometimes internal specialists have domain knowledge that an external consultant simply does not have. Whether they are available and have 'free' time to support such a project is, of course, a different question!
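As a very rough sketch of the regression idea, the example below fits an ordinary least squares model relating weekly fault counts for a subsystem to time since overhaul and ambient temperature, then scores the coming week. The figures are invented purely to make the example run; a real model would be built, and validated, on actual maintenance and weather history.

```python
import numpy as np

# Minimal sketch: relate weekly fault counts for a subsystem to fleet age and
# temperature, then score the coming week. All numbers are invented for the
# example; this is not a fitted model from any real fleet.

# Feature columns: [weeks_since_overhaul, mean_ambient_temp_C]
X = np.array([
    [ 4, 12.0], [ 8, 15.0], [12, 19.0], [16, 24.0],
    [20, 27.0], [24, 22.0], [28, 14.0], [32,  9.0],
], dtype=float)
y = np.array([2, 3, 4, 7, 9, 7, 6, 6], dtype=float)   # faults per week

# Ordinary least squares with an intercept term
A = np.column_stack([np.ones(len(X)), X])
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict_faults(weeks_since_overhaul, mean_temp_c):
    return coeffs @ np.array([1.0, weeks_since_overhaul, mean_temp_c])

# Forecast next week given the maintenance plan and the weather forecast
print(f"Expected faults next week: {predict_faults(36, 11.0):.1f}")
```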

The next big step is a move to live data: perhaps, in some ways, this is a target that should be kept in sight while addressing the other areas to this point. Once operational data can be collected in real time from on-board, signalling and other wayside systems, a live view of the operation of the railway can be created. This can then be compared with any number of models for anomaly detection or early warning of failures in service; it can also be used to assist the implementation of software changes, the integration of rolling stock and new or updated signalling, and so on. Fed into a maintenance optimisation scheme, this makes it possible to monitor a train in passenger service and bring it back before failures occur, or, in the event of a lower-impact failure (something that does not affect service, for example), to move that train to the "front of the queue" for maintenance, ahead of scheduled works. If the asset management system is also linked, service history can be a data source for modelling: at this point, it should become possible to track component versions and fault cases, and not just generate a report identifying which units or trains need attention when a particular generation or version of component has an issue, but identify that component as the source of an in-service problem and have a replacement waiting for the train's return. The "joined up" use of data for both monitoring and modelling provides the foundation for a step change in both operations and overall management, in terms of the speed of both information presentation and decision support, and of monitoring the results of any such decisions.
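As a final, hypothetical illustration of that "joined up" linkage, the snippet below joins an asset register (which component revision is fitted to which unit) against open fault cases to flag the units carrying a suspect revision. The unit numbers, column names and revisions are all invented for the example.

```python
import pandas as pd

# Hypothetical data linkage sketch: join the asset register against open
# fault cases to flag units fitted with a suspect component revision.
# Unit numbers, columns and revision labels are invented for illustration.

fitted = pd.DataFrame({
    "unit_id":   ["390001", "390002", "390003", "390004"],
    "component": ["door_controller"] * 4,
    "revision":  ["revB", "revC", "revC", "revB"],
})

faults = pd.DataFrame({
    "unit_id":  ["390002", "390003"],
    "category": ["doors", "doors"],
    "status":   ["open", "open"],
})

SUSPECT_REVISION = "revC"

# Units fitted with the suspect revision, with any open faults attached
suspect = fitted[fitted["revision"] == SUSPECT_REVISION]
flagged = suspect.merge(faults, on="unit_id", how="left")

print(flagged[["unit_id", "revision", "category", "status"]])
# Units carrying revC with open door faults would go to the front of the
# maintenance queue, with replacement parts pre-staged for their return.
```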