February 2016

Understanding big data

By Sean Dessureault


Interest in big data, specifically the value of applying advanced analytics to the massive amount of data today’s operations produce, has soared in recent years. This interest is fueled by the growing number of equipment monitoring technologies offered by many vendors, each with their own custom database structure, servers to host the data and reporting systems. The result of this expansion has been that many mines possess large data assets: repositories of information, such as machine health data and work orders, that could provide new production management approaches, reveal new resources and lower safety and environmental risks. But before the disparate monitoring systems can be integrated and the pursuit of big data analytics begun, it is important to understand the state and type of data most relevant to analytics used in mines.

Relational Data

Mobile equipment monitoring data is generated both from operator input via an in-cab touch screen and from embedded systems that automatically monitor important parameters such as location and machine health. This data is then transmitted to an on-site server, where it is stored for analysis in a relational database. Other software, such as enterprise systems or computerized maintenance management systems, also has back-end relational databases that process transactions such as payments or work order requests and fulfillments.

Relational databases use tables that have interconnected relationships. For example, a truck haulage record is contained in a table that lists truck cycles: which shovel loaded a particular truck, the time it was loaded and the duration of the haul. A user may be interested in generating reports that list production by shovel type. Those shovel details are contained in a separate table linked through a defined relationship, hence the expression relational database.
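The truck-cycle example above can be sketched in code. The sketch below uses Python's built-in sqlite3 module; the table names, columns and figures are hypothetical, invented purely to illustrate how a join across related tables produces a production-by-shovel-type report.

```python
import sqlite3

# Hypothetical schema illustrating the truck-cycle example:
# each cycle references a shovel in a related table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shovels (
    shovel_id   INTEGER PRIMARY KEY,
    shovel_type TEXT
);
CREATE TABLE truck_cycles (
    cycle_id          INTEGER PRIMARY KEY,
    shovel_id         INTEGER REFERENCES shovels(shovel_id),
    load_time         TEXT,
    haul_duration_min REAL,
    tonnes            REAL
);
""")
conn.executemany("INSERT INTO shovels VALUES (?, ?)",
                 [(1, "rope"), (2, "hydraulic")])
conn.executemany("INSERT INTO truck_cycles VALUES (?, ?, ?, ?, ?)",
                 [(101, 1, "2016-02-01 08:05", 12.5, 220.0),
                  (102, 2, "2016-02-01 08:11", 14.0, 210.0),
                  (103, 1, "2016-02-01 08:20", 13.2, 225.0)])

# Production by shovel type: a join follows the relationship
# between the two tables, then GROUP BY aggregates the tonnes.
rows = conn.execute("""
    SELECT s.shovel_type, SUM(c.tonnes) AS total_tonnes
    FROM truck_cycles c
    JOIN shovels s ON c.shovel_id = s.shovel_id
    GROUP BY s.shovel_type
    ORDER BY s.shovel_type
""").fetchall()
for shovel_type, total in rows:
    print(shovel_type, total)
```

The report never stores the shovel type alongside each cycle; it is resolved at query time through the relationship, which is exactly what makes the structure relational.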

Ad hoc data exported from a mine planning system or spreadsheets that have safety data may also be loosely considered relational, where details of particular columns are found in other tables.

Process Data

Process data is typically analog and control data collected in processing plants. These plants are highly instrumented and automated to the point where a single operator can often manipulate most settings in the plant to adjust to conditions or to modify the process output.

There are several techniques to safely and efficiently extract data from these process control systems. The most common mechanism is through software known as process historians that scan the control network and record the analog values and set points for future reference. These historians have highly efficient algorithms that compress the analog signals, allowing for easy transmission of analog data. Recently this technology has been used to log mobile equipment’s analog data, such as engine exhaust temperature. If compressed, this information can be transmitted wirelessly so that a maintenance technician can monitor the machine as though it were a small plant. New features that have been added to some historians also allow for automated detection of events (such as the start and end of a period of zero weight on a conveyor belt) through pattern recognition algorithms.
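The compression idea the historians rely on can be illustrated with a much simpler stand-in. Real historians typically use swinging-door trending; the sketch below uses a basic deadband filter instead, and the signal values and the deadband threshold are invented for illustration. The principle is the same: archive a sample only when it deviates meaningfully from the last archived value, so that a slowly drifting analog signal such as engine exhaust temperature shrinks to a handful of points.

```python
def deadband_compress(samples, deadband):
    """Keep only samples that deviate from the last archived value
    by more than the deadband. A simplified stand-in for the
    swinging-door algorithms real process historians use."""
    if not samples:
        return []
    archived = [samples[0]]          # always keep the first sample
    _, last_value = samples[0]
    for t, value in samples[1:]:
        if abs(value - last_value) > deadband:
            archived.append((t, value))
            last_value = value
    return archived

# Hypothetical exhaust-temperature readings (time, deg C):
# mostly flat, with one excursion worth transmitting.
signal = [(0, 400.0), (1, 400.2), (2, 400.1),
          (3, 430.0), (4, 430.3), (5, 400.4)]
compressed = deadband_compress(signal, deadband=5.0)
print(compressed)  # [(0, 400.0), (3, 430.0), (5, 400.4)]
```

Six readings compress to three, and the excursion at t=3 survives intact; it is this reduction that makes wireless transmission from mobile equipment practical.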

Unstructured Data

Unstructured data is information, including text or media files, that does not fit neatly into a structured database like those above, and whose underlying technologies are web- and mobile-based languages and approaches such as JavaScript Object Notation (JSON) or document structures. Because application developers now change data elements so frequently, relational data modellers are unable to continuously alter the database structure to accommodate the changes. Large text and media files are also being stored in this unstructured format, without an immediate plan for use. As a result, databases need to accommodate this unstructured data. Since the volume of data that needs to be stored and processed is so large, a single processor is unable to cope. The technology therefore relies on a stack of relatively recent developments, such as MapReduce, a mechanism in which a large data set is split across a set of servers, processed in parallel by "map" functions, and the intermediate results are then combined by "reduce" functions into a final answer.
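The map and reduce phases can be shown in miniature on a single machine. The sketch below counts event types across JSON-like equipment records; the record fields and event names are hypothetical, and in a real cluster the map calls would run on many servers in parallel rather than in one loop.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical JSON-like records, as a document store might hold them.
records = [
    {"machine": "truck-07", "event": "overheat"},
    {"machine": "truck-03", "event": "idle"},
    {"machine": "truck-07", "event": "idle"},
]

def map_phase(record):
    # Emit (key, 1) pairs for each record; on a cluster, each
    # server runs this over its own slice of the data.
    yield (record["event"], 1)

def reduce_phase(pairs):
    # Combine the intermediate pairs: sum the counts per key.
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

pairs = chain.from_iterable(map_phase(r) for r in records)
counts = reduce_phase(pairs)
print(counts)  # {'overheat': 1, 'idle': 2}
```

Because each map call touches only one record and each reduce key is independent, the same computation scales out by adding servers, which is the point of the approach for data volumes a single processor cannot handle.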

Although the mining industry has little unstructured data today, it is undoubtedly the future for numerous applications. Many of the new analytical tools being developed use this flexible form of data structure. The next generation of equipment monitoring technology will be developed on low-cost flexible platforms such as tablets and use low-cost Internet of Things (IoT) sensors, both of which make extensive use of unstructured data.

Most of our current mining and cost data is in a relational format, though, and our processing data is stored in semi-structured databases and often only partially contextualized in historians. Yet there is enormous untapped potential within our existing data structures today. Signal processing of analog data and the mining of relational data are technically within our grasp now. A practical, honest approach to integration using current techniques in both a relational and process environment can be applied immediately while we wait for the next generation of technology to roll out and for our work processes to be reengineered to accommodate these new capabilities.

Sean Dessureault is the president and CEO of Mining Information Systems and Operations Management (MISOM) Technologies.
Got an opinion on one of our columns? Send your comments to editor@cim.org.