# 17 analytics to transform your organisation, Part 4

(Feel free to have a look at the previous part of our series.)

**14 - Linear Programming (Linear Optimization)**

Linear programming is a problem-solving method based on the "economic function" (more commonly called the objective function). It is used mainly for optimisation purposes in the fields of logistics, production and finance. It was developed between 1948 and 1950 and had its heyday in the late 1960s, but it is still widely used on account of its simplicity of implementation and effectiveness in dealing with basic problems.

This approach considers an economic phenomenon as the result of several basic effects (also referred to as constraints). For example, the production cost of a transformed product can be derived (schematically) from addition: cost of raw materials + cost of transportation to production units + cost of machines and their operation + cost of packaging and shipping to sales networks; including the costs of direct labour and support at each stage. Each cost item is a basic effect, and the various effects are added together to produce a final cost; they are described as "additive".

Furthermore, for most of the problems commonly dealt with in linear programming, each effect is (at least approximately) proportional to its cause. For example, cost of transportation is proportional to distance (or number of kilometres to be travelled). Each constraint can thus be associated with a proportionality coefficient.

Ultimately, the overall formula representing the whole equation is a linear function with n constraints, each associated with a multiplier coefficient.

Continuing with our example, to optimise the cost of a product, we need to solve the corresponding linear problem in order to obtain the best possible (i.e. the lowest possible) final cost. This is referred to as minimising the linear function; when the quantity of interest is a profit rather than a cost, the function is maximised instead.

The main solution technique is the simplex method. Initially developed in algebraic and geometric form, the simplex method then developed into an algorithmic form via the use of matrices.
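To make the idea concrete, here is a minimal sketch in Python of a two-variable cost minimisation. All figures are hypothetical (two supply routes with unit costs of 4 and 3, a demand of 10 units, and capacities of 8 and 6). It does not implement the simplex method itself: it relies on the related fact that, when the feasible region is bounded, the optimum of a linear function lies at a corner of that region, and simply tests every corner. This is only practical for very small problems.

```python
from itertools import combinations

# Hypothetical unit costs for two supply routes (x and y).
COST = (4.0, 3.0)

# Each constraint is (a, b, c), meaning a*x + b*y <= c.
CONSTRAINTS = [
    (-1.0, -1.0, -10.0),  # x + y >= 10 (demand to be met)
    (1.0, 0.0, 8.0),      # x <= 8 (capacity of route x)
    (0.0, 1.0, 6.0),      # y <= 6 (capacity of route y)
    (-1.0, 0.0, 0.0),     # x >= 0
    (0.0, -1.0, 0.0),     # y >= 0
]


def intersection(c1, c2):
    """Point where the boundary lines of two constraints cross (Cramer's rule)."""
    a1, b1, r1 = c1
    a2, b2, r2 = c2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        return None  # parallel boundaries never cross
    return ((r1 * b2 - r2 * b1) / det, (a1 * r2 - a2 * r1) / det)


def feasible(point, constraints, tol=1e-9):
    """True if the point satisfies every constraint."""
    x, y = point
    return all(a * x + b * y <= c + tol for a, b, c in constraints)


def total_cost(point, cost=COST):
    return cost[0] * point[0] + cost[1] * point[1]


def solve_min(cost, constraints):
    """Enumerate the corners of the feasible region and keep the cheapest."""
    corners = []
    for c1, c2 in combinations(constraints, 2):
        point = intersection(c1, c2)
        if point is not None and feasible(point, constraints):
            corners.append(point)
    if not corners:
        raise ValueError("no feasible corner found")
    return min(corners, key=lambda p: total_cost(p, cost))


best = solve_min(COST, CONSTRAINTS)
print(best, total_cost(best))
```

With these assumed figures, the cheapest plan saturates the cheaper route (y = 6) and covers the remaining demand with the other (x = 4), for a total cost of 34. A real solver reaches the same kind of answer without visiting every corner.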

As is often the case with technology cycles, the military (and the US Army in particular) were the first mainstream adopters of linear programming for their work. The most striking success in this area (pun intended) was 1991's Operation Desert Storm in Kuwait.

From September 1990, the goal was to find the quickest and cheapest possible way to transport an entire army to the Arabian Peninsula. Over four months, the US Army sent 400,000 soldiers, 1,500 helicopters, 900 tanks, 2,000 armoured vehicles and 1,800 artillery pieces, mainly by air (in addition to 1,200 aircraft, 80 combat vessels and 6 CATOBAR aircraft carriers). This prompted the US Military Airlift Command to organise more than a hundred rotations per day of C-5 and C-141 cargo aircraft, transporting freight and personnel in the largest airlift ever undertaken by the United States in such a short time period, and mobilising 95% of its fleet.

And of course, civilian organisations have similar stories to tell: in 1994, Delta Air Lines used linear programming to maximise its profits by refining the distribution of aircraft types for its 2,500 domestic flights in the United States.

Today, digital linear programming tools are available to all, notably through the implementation of a dedicated solver in Excel.

A powerful solver is available in Excel through the Solver add-in, whose "Simplex LP" solving method is designed to maximise or minimise linear functions. Menus are used to easily add constraints and adjust the various parameters and resolution options.

Recent applications are easy to find; for example, in France's food industry, studies are often carried out to optimise the nutritional quality of processed foods. Because manufacturing processes tend to degrade the nutritional value of raw materials, the final composition needs to be adjusted to favour certain nutrients and reduce others. A calculation establishes progress margins, incorporating technical, regulatory and cost constraints... and of course, taste constraints too! This is the method used to establish quantities of synthetic supplements to be added to obtain (for example) a product rich in folate (Vitamin B9) or iron... and to be able to state this fact on the packaging.

In recent Linear Programming technical developments, modern digital tools are making it possible to fully parametrize a linear function by adjusting coefficients to produce an algorithmic simulation of all possible solutions. This is an extension of the initial possibilities afforded by the simplex method, with considerably increased solving power. Modern personal computers offer more than enough calculation power to place linear programming in anyone's hands; all that is needed is to learn how to use the solver.

**15 - Factor Analysis**

Factor analysis is hardly a recent addition to the toolbox, either. Invented in 1904 by Charles Spearman for psychology studies, it has seen a number of developments since. Its first incarnation was Principal Components Analysis (PCA); then, from the 1960s onwards, Jean-Paul Benzécri invented and popularised Correspondence Analysis (CA). PCA remains the most widely used method today. From the 1970s onwards, these techniques have been implemented and incorporated as standard in flagship software in this domain, such as SAS (Statistical Analysis System) and SPSS (Statistical Package for the Social Sciences).

Factor analysis is the name given to a collection of statistical techniques for analysing data sets arranged into rows and columns on one or more axes. It forms the core of surveying and polling techniques for analysing and processing collected data.

These data generally relate to a population of individuals or group of organisations whose characteristics we wish to analyse, either by comparing or by opposing them.

Logically, the starting point for this type of analysis is one or more large tables which need to be manipulated in order to view and rank the information. Schematically, each row is imagined as representing an individual, and the columns as bearing the characteristics or variables associated with that individual, representing axes of analysis.

Such data quickly becomes hard to interpret when there is too much redundancy between several variables, or when a number of variables belong to the same categories scattered throughout the table. The primary goal is therefore to reduce their complexity by distributing them over a limited number of dimensions.

PCA is an exploratory method to be used when it is not clear in advance which categories to break the data down into: indeed, its purpose is to enable correlated variables to be grouped into a single main component. This operation involves the construction of a correlation matrix, with scores (or weightings) calculated. It is also a good choice when faced with heterogeneous data. Another of its purposes is to find latent variables; that is, variables which are not directly observable, but which are thought to influence the answers given in surveys or polls.
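As an illustration of the correlation-matrix step, the sketch below uses only the Python standard library. The data are hypothetical (marks of six students in three subjects), and the first principal component is extracted by power iteration, a simple stand-in chosen for readability rather than the routine used by packages such as SPSS or SAS.

```python
import math

# Hypothetical marks: one row per student, columns = maths, physics, literature.
SCORES = [
    [12, 11, 14],
    [15, 16, 9],
    [9, 8, 13],
    [18, 17, 10],
    [11, 12, 15],
    [14, 13, 8],
]


def correlation_matrix(rows):
    """Pearson correlation between every pair of columns."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
        for j in range(m)
    ]
    # Standardise each column (assumes no column is constant).
    z = [[(r[j] - means[j]) / stds[j] for j in range(m)] for r in rows]
    return [
        [sum(zr[i] * zr[j] for zr in z) / n for j in range(m)]
        for i in range(m)
    ]


def first_component(corr, iterations=200):
    """Largest eigenvalue and its eigenvector, found by power iteration."""
    m = len(corr)
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * sum(corr[i][j] * v[j] for j in range(m)) for i in range(m)
    )
    return eigenvalue, v


corr = correlation_matrix(SCORES)
eigenvalue, loadings = first_component(corr)
print("share of variance explained:", eigenvalue / len(corr))
print("loadings (maths, physics, literature):", loadings)
```

In this made-up data set, maths and physics marks move together and load on the component with the same sign, while literature loads with the opposite sign, so one axis summarises most of what the three columns say.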

For example, in a study of higher-education paths across a set of high schools, the literary, scientific, technical and other streams can be grouped together, with the aim of determining the impact of the schools' geographical location as a latent variable.

The following operation consists of redistributing data on several axes to make them more readable by omitting non-relevant axes; this is referred to as a rotation. The main aim is to obtain analysis axes which are independent of one another.

When using CA, the approach is significantly different, and assumes that all our data items can be compared with one another. The first step is to perform an independence calculation. In our case (a study of high schools), we will be able to establish an average distribution of study paths for the group as a whole. For example, 50% of all high school students choose to pursue scientific studies, 30% literary studies and 20% technical studies. This allows us to posit theoretical results. The entire school-by-school analysis is then based on the identification and assessment of variables that deviate from the theoretical result. Each time a deviation is found, a dependency coefficient is calculated. If the deviation is nil, the variable is said to be independent. Using the coefficients, with knowledge of the overall result, it is possible to break down the initial table and group all high schools with the same deviation profile, and thus (for example) identify schools that have a greater than average tendency towards literary studies, and identify factors that encourage such a trend.
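The independence calculation described above can be sketched in a few lines of Python. The table is hypothetical: three schools of 100 students each, chosen so that the overall split is the 50% / 30% / 20% of the example. The relative deviation used here, (observed - expected) / expected, is one simple choice of coefficient made for illustration; a full correspondence analysis goes further and decomposes these deviations into axes.

```python
# Hypothetical counts: rows = schools A, B, C;
# columns = scientific, literary, technical study paths.
OBSERVED = [
    [50, 30, 20],
    [40, 45, 15],
    [60, 15, 25],
]


def expected_counts(table):
    """Counts each cell would hold if school and study path were independent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]


def deviations(table):
    """Relative gap between observed and theoretical counts, cell by cell."""
    expected = expected_counts(table)
    return [
        [(obs - exp) / exp for obs, exp in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]


for school, row in zip("ABC", deviations(OBSERVED)):
    print(school, [round(d, 2) for d in row])
```

School A matches the theoretical distribution exactly, so all its deviations are nil, whereas school B sends 50% more students than expected towards literary studies: exactly the kind of profile the method is designed to surface.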

In Factor Analysis with SPSS (briefly renamed PASW in 2009, and now IBM SPSS Statistics), the first operation is to reduce data complexity and extract the relevant axes of analysis.

Digital tools make factor analysis much more accessible for those without extensive knowledge of statistical mathematics. Users can experiment and focus on the proper use of the method, without concerning themselves about the manner in which results are obtained. In SPSS, a single integrated interface is used to manipulate data simply and easily. There are then dedicated menus for the factor extraction stage, and for simulating rotations. Similarly, in SAS, a step-by-step approach offers users a full guide throughout the variable reduction procedure.

There have been a number of theoretical developments in factor analysis since the early 2000s, particularly in the field of image processing, with each pixel being considered as a component of a two-dimensional matrix.

Over a few years, there has been a proliferation of extensions to the method, tailored to various data forms:

MCA or Multiple Correspondence Analysis (qualitative variables), FAMD or Factor Analysis of Mixed Data (quantitative + qualitative variables), MFA or Multiple Factor Analysis (variables structured into groups), and HMFA or Hierarchical Multiple Factor Analysis (variables arranged into themes / sub-themes). This trend is evidence of the constant pursuit of ever-greater precision and rigour in this field of study. MCA is now the basic default method for processing opinion surveys based on MCQs (Multiple Choice Questionnaires)... and other fruitful developments are sure to follow.

**16 - Neural Networks Analysis & Advanced Analytics: Machine Learning, Deep Learning and Cognitive Computing**

Neural networks undoubtedly represent the future of analytics with a capital A. However, it has taken considerable time and effort to get to where we are today. Since their invention in the 1950s, they have always fuelled passions, fantasies, procrastination – and even rejection – by turns. It has been a long and convoluted journey that has finally led, in the last ten years, to the development of Machine Learning in the first instance, followed by Deep Learning, and now Extreme Learning Machines.

A neural network consists of layers of interconnected artificial cells. Each layer resembles a grid, with rows and columns, in which cells are placed. Each cell simulates the operation of a human neuron, hence the name: it receives one or more input signals, produces a single output signal, and has its own individual excitability threshold. Each input connection is said to carry a synaptic weight.

The principle of connectivity is that the output of a neuron in one layer acts as one of the inputs to a neuron in the immediately adjacent layer. In the most advanced networks, it is possible to have "fully connected" layers in which each neuron is connected to every neuron in the adjacent layer.
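A minimal sketch of these ideas in Python: each artificial neuron sums its weighted inputs and fires when the total reaches its threshold, and the outputs of one layer become the inputs of the next. The weights and thresholds below are hand-picked for illustration (they make the tiny network compute an "exclusive or"); in a real network they would be learned.

```python
def neuron(inputs, weights, threshold):
    """Fire (1) if the weighted sum of the inputs reaches the threshold."""
    activation = sum(x * w for x, w in zip(inputs, weights))
    return 1 if activation >= threshold else 0


def layer(inputs, weight_rows, thresholds):
    """Fully connected layer: every neuron sees every input."""
    return [neuron(inputs, w, t) for w, t in zip(weight_rows, thresholds)]


def tiny_network(a, b):
    """Two inputs -> two hidden neurons -> one output neuron."""
    # Hidden layer: the first neuron acts as OR, the second as AND.
    hidden = layer([a, b], [[1, 1], [1, 1]], [1, 2])
    # Output neuron: fires when OR is on and AND is off.
    return neuron(hidden, [1, -1], 1)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", tiny_network(a, b))
```

The output neuron fires only when exactly one of the two inputs is on, something a single threshold neuron cannot do by itself, which is why the internal layer matters.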

The Perceptron – the first neural network, built in 1958 – comprised three layers: an input layer, an internal layer and an output layer. Since then, all "traditional" networks have been formed of three to five layers, i.e. containing between one and three internal layers.

The Perceptron Mark 1 machine was able to recognise 400-pixel images with a 20x20 grid of interconnected photoelectric cells. It was built by Frank Rosenblatt in 1958. In the foreground is the patch panel, for wiring the input combinations. On the right are the racks of potentiometers for varying the weights of the cells.

The most recent networks are able to contain a large number of internal layers referred to as "deep layers". For example, in 2015 Microsoft developed a network containing more than a hundred layers. Such a network is then described as "extremely deep". Today, architectures are evolving very quickly, with the most sophisticated systems using multiple networks working in tandem.

To operate correctly and produce the expected results, a neural network must follow a learning process. In the traditional approach, the network parameters (the synaptic weights) are initialised randomly before entering a supervised learning phase. This means that the final parameters are developed step by step by an expert or by an entire team. In new "Deep Learning" approaches, with networks boasting a large number of layers, learning for each layer is supervised, after which the network learns "on its own". The approach can also be reversed, starting with the unsupervised pre-training of each layer, followed by the supervised final optimisation of parameters for all layers. In all cases, autonomous learning requires a very large volume of data samples.

A real technical milestone was reached in 2005 when neural networks began to use graphics processing units (GPUs) instead of the traditional central processing units (CPUs). The result was an immediate performance boost. GPUs offer computing power at low cost, and thanks to their architecture, it is easy to use several (or even many) for massively parallel distributed processing.

There are numerous applications for neural networks, starting with traditional regression or classification problems, for which they have been shown to be more refined and precise than statistical techniques. The next set of preferred applications involve recognition: of shapes, language, writing, etc., and the detection of weak signals or anomalies.

Research for each class of use is progressing well, regularly producing new generations of architecture and technical solutions. A few recent innovations have attracted attention.

Convolutional Neural Networks (CNNs) are used mainly for image processing and recognition. Convolution is a matrix-based calculation process, in which an image is formed of a matrix of pixels and each pixel equates to a numerical value which can be interpreted or recalculated. An image can then be fully mapped, cut into tiles, searched for specific elements, described, recomposed, etc.
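The matrix calculation at the heart of a CNN can be sketched as follows. The 4x4 "image" and the 2x2 filter are hypothetical, and the function slides the filter without flipping it, which is the convention generally used in deep learning libraries (strictly speaking a cross-correlation).

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and sum the element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical image: dark on the left (0), bright on the right (1).
IMAGE = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Filter that responds where brightness increases from left to right.
VERTICAL_EDGE = [
    [-1, 1],
    [-1, 1],
]

for row in convolve2d(IMAGE, VERTICAL_EDGE):
    print(row)
```

Every row of the result is `[0, 2, 0]`: the filter stays silent over the uniform areas and responds only at the column where the image switches from dark to bright. A CNN learns many such filters instead of having them written by hand.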

This particular area of image recognition has seen considerable progress thanks to the ILSVRC (ImageNet Large Scale Visual Recognition Challenge), also known as the ImageNet Contest.

ImageNet is the largest image database dedicated to research into computer vision and pattern recognition. Created at Princeton University and presented for the first time in 2009, it now contains more than 14 million images divided into 20,000 categories.

The aim of the ILSVRC competition, launched in 2010, is to pit the world's best networks against one another. The challenge is to identify 1,000 discrete classes of objects in images with the lowest possible error rate. In the beginning, in 2010, this figure stood at an average of around 25%. In 2012, the AlexNet CNN beat this result, lowering the error rate to 15.3%. It was itself overtaken in the following years by much deeper CNNs: Google's GoogLeNet in 2014 and Microsoft's ResNet in 2015 both achieved single-figure error rates. It is now possible to conceive of applications and industrial development for these systems.

And for "animated" images, a development of CNN known as CNN-LSTM specialises in activity recognition and video description.

Recurrent Neural Networks (RNNs) are distinctive in that they loop some or all of their output signals back to the input layer – hence the "recurrent" part of their name. They are, in a sense, able to produce their own input themselves. This gives them a very dynamic pattern of behaviour, and they are used in particular for real-time writing or speech recognition.
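The feedback loop can be illustrated with a single recurrent cell, sketched below in Python. The two weights are arbitrary values chosen for the example, and the cell is deliberately reduced to one number of internal state; real recurrent networks use vectors and learned weight matrices.

```python
import math

# Hypothetical weights: one for the fresh input, one for the fed-back output.
W_INPUT = 0.5
W_FEEDBACK = 0.8


def run_recurrent_cell(inputs, w_input=W_INPUT, w_feedback=W_FEEDBACK):
    """Feed a sequence through one cell whose output loops back as an input."""
    previous_output = 0.0
    outputs = []
    for x in inputs:
        # The new output depends on the current input AND on the last output.
        previous_output = math.tanh(w_input * x + w_feedback * previous_output)
        outputs.append(previous_output)
    return outputs


# A single pulse followed by silence.
print(run_recurrent_cell([1.0, 0.0, 0.0]))
```

Although the last two inputs are both zero, the outputs differ and fade gradually, because the cell keeps being fed its own previous output. This sensitivity to what came before is what makes the architecture suited to sequences such as handwriting or speech.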

Generative networks, as their name suggests, are capable of generating original content from a collection of samples supplied as input. Of course, this collection requires a significant volume of data before the network can produce useful results.

Many networks are "generative" in addition to their other characteristics.

GANs (Generative Adversarial Networks) are generative networks which learn in adversarial fashion. Adversarial learning is a type of entirely unsupervised learning which consists of pitting two networks against one another: a generator network produces samples, while a discriminator (adversary) network detects whether the samples are real, or produced by the generator.

This type of network is, for example, used in bioinformatic and pharmaceutical research to create new molecular structures.

ARTs are generator networks specialising in the automatic generation of music or images.

Lastly, VAEs (Variational AutoEncoders) pair an encoder with a decoder, and are used to generate images and improve learning.

And, of course, a whole collection of variants and possibilities can be created by combining the different network types: MLP (Multi Layer Perceptron), HCNN (High-performance Convolutional Neural Network), FCN (Fully Convolutional Network), C-RNN-GAN (Continuous Recurrent Neural Network with adversarial training), CNN LSTM (Convolutional Neural Network Long Short Term Memory network), and LRCN (Long Term Recurrent Convolutional Network)… to name but a few.

All these recent developments belong to the class known as Extreme Learning Machines (ELM). Deep Learning currently appears to be the most promising approach, particularly for its ability to manage without a previous model when producing results. This gives it genuine viability in economic terms, because the machine works alone, and also tackles problems too complex to be analysed using conventional methods.

The only limits are, on the one hand, the need to have access to large volumes of data, and on the other, the fact that – for now – neural networks have no memory. They must therefore repeat every calculation, even if the same inputs are submitted twice in a row. In fact, they do precisely what they were programmed to do.

In future, a key challenge will be to develop increasingly powerful learning strategies in order to implement networks that are able to learn with a minimum of human intervention, and thus automatically absorb more and more use cases.

**17 - Meta-Analytics – Literature Analysis**

Meta-analytics consists of aggregating pre-existing studies on the subject of a given problem to produce a full report into the available knowledge and conclusions (the so-called literature). An initial analysis in this genre was carried out in 1955 by Henry K. Beecher on the subject of a medical treatment. This first step marked the start of its large-scale and increasingly widespread use in clinical studies. Its techniques and methodology were then significantly improved in the 1970s, with the term "meta-analysis" itself being coined in 1976.

In the beginning, these studies were quantitative only, but from 1994 onwards, qualitative meta-analyses began to appear. Their main purpose was to establish whether or not a problem required additional studies to be conducted. Their secondary goals were, on the one hand, to produce an overall interpretation of studies already carried out, and on the other hand, to detect possible method biases in these pre-existing studies.

In any event, the study protocol needs to be reproducible. The main benefit of a meta-analysis is the increase in the number of cases the study covers, with the re-use of already available data and results. In a way, the aggregation of all these data results in a statistical power greater than that obtained from studies conducted separately. The intended goal is either the creation of new knowledge, or the re-interpretation/checking of existing knowledge.

Of course, meta-analytics has now expanded beyond its original purely medical scope, and is being applied to all areas of research to guide and plan programmes and to inform investments, particularly in biotechnologies, energy, logistics and finance.

In its most recent development, meta-analytics has been augmented using "meta-heuristic" techniques, and the two are now fully merged.

Meta-heuristics are a class of methods developed in the 1960s and 1970s to understand and efficiently solve problems that are too big, or too complex, for traditional optimisation methods; this includes many real-life problems encountered in the worlds of business, science and industry.

This new incarnation of meta-analytics is therefore a set of advanced techniques specially tailored to the complexity of the real world. It combines and encompasses optimisation, prediction and machine learning techniques. At the heart of meta-analytics lie cutting-edge developments in terms of algorithms, such as evolutionary algorithms, Tabu search, swarm intelligence and memetic algorithms.

Meta-analytics research is now very active, with many topics of investigation: the creation of extensions to existing methods, performance improvements, the exploration and refinement of promising ideas which have not been sufficiently investigated, the development of new research proposals, the creation of specialist tools and interfaces for better interpretation of results, the development of more comprehensive methods to solve higher-level problems, the analysis and explanation of how alternative approaches work, and the identification of areas in which meta-analytics offers significant advances.

All of this work in the field of meta-analytics now foreshadows the next generation of analytics, in which systems are natively hybrid (hybrid by design) and use a wide range of tools and techniques to deduce useful information from data. These hybrid systems are able to combine a number of experimental designs and methods within the same system ("system of systems").

There is a wide range of promising fields of application:

- In multidisciplinary research (NBICS: nanotechnologies, biotechnologies, information technologies and cognitive sciences), robotics and life sciences,
- In the health sector, epidemiological studies, psychiatric and psychological studies, health reference works (DSM),
- In economics, models of production, distribution and consumption of goods and services,
- In finance, risk assessment,
- etc.

The Random Effects Model is one of the main methods for aggregating statistical data from several studies by harmonising the local effects associated with divergences between locations, contexts and populations (participants, panels) for each isolated study. The vertical grey line is the average result of all studies.
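As a rough illustration, the sketch below pools a handful of study results with the DerSimonian-Laird estimator, one common way of fitting a random effects model; the choice of estimator is an assumption made for this example. The effect sizes and variances are invented. Each study is weighted by the inverse of its variance, and an extra between-study variance (`tau2`) is added when the studies disagree more than chance alone would explain.

```python
import math


def random_effects(effects, variances):
    """Pool study effects with the DerSimonian-Laird random effects estimator.

    Returns (pooled effect, its standard error, between-study variance tau2).
    """
    k = len(effects)
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)

    # Fixed-effect average, used to measure how much the studies disagree.
    fixed_mean = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - fixed_mean) ** 2 for w, y in zip(weights, effects))

    # Between-study variance, floored at zero when the studies agree.
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Re-weight each study with the extra variance, then average again.
    re_weights = [1.0 / (v + tau2) for v in variances]
    sum_re = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum_re
    return pooled, math.sqrt(1.0 / sum_re), tau2


# Hypothetical effect sizes and variances from three separate studies.
pooled, std_error, tau2 = random_effects([0.1, 0.5, 0.3], [0.01, 0.01, 0.01])
print(pooled, std_error, tau2)
```

With these invented figures the pooled effect is 0.3 and the between-study variance is 0.03. When all studies report the same effect, `tau2` falls to zero and the calculation reduces to a plain inverse-variance average.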

Meta-analytics has the ability to compare results from different studies, thus enabling the construction of models that cover all results, the identification of sources of disagreement and divergences, or the identification of other interesting phenomena that only emerge from the perspective of multiple studies.

It is then important to avoid falling into one of the biases often seen with this type of large-scale study.

The first – and best-known – of these is publication bias, which consists of looking at only work and studies that have been officially published, rather than all the available results.

The other – and sadly more insidious – problem is the possibility of a conflict of interest in the agenda of a meta-analysis (or meta-study), if it is commissioned for a parliamentary deadline, by a pharmaceutical or an industrial lobby, or for a political party with a known stance on the subject.

In 1998, a US federal judge annulled chapters 1 to 6 and all the appendices of an EPA (Environmental Protection Agency) study concerning cancer risks associated with passive smoking ("Respiratory Health Effects of Passive Smoking: Lung Cancer and Other Disorders"), because its authors had excluded nearly half of the studies produced on the subject: those contradicting the expected conclusions.

The way meta-analytics has developed leaves it very well equipped to play a key role in the future of analytics – or at least in one of its possible futures. However, let's not forget that this represents a further step towards complexity, with all the difficulties of technical development and understanding entailed therein.

**Conclusion**

With this grand tour of the 17 most widespread types of analytics around the world, it would appear at first sight that this concept of analytics, while omnipresent, is not at all indicative of the nature of the methods and techniques employed on a case-by-case basis to analyse and exploit data. In truth, one might even say that one of the functions of this word "analytics" is to cover over the reality of practices in this sphere, and keep them at arm's length. And this reality is a multifaceted set of varied techniques, each developed to a high level of sophistication, and reserved for specialists.

My provisional conclusion is that with the inexorable march towards ever-greater data volumes, we also need a precise vision of the effective possibilities of such analytics; that is, of what they are, and are not, capable of doing. This therefore means making progress in our shared understanding of these special technologies, enabling us to move towards a much more evenly-shared analytics culture.

A quick visit to Google Trends also provides an enlightening insight into trends since 2004: we clearly see the appearance and early days of Sentiment Analysis and Data Visualisation (in green and purple), but the most striking phenomenon is an ever-growing interest in the Neural Network (in red), which has marched emphatically ahead of Data Mining (in blue), although the latter is still very much with us.

Interest in neural network technologies has not waned because they form the heart of Artificial Intelligence (AI), unanimously tipped to be a major topic in the 21st century and beyond. The most favourable estimates (in this case, produced by the Chinese government) show around 200,000 professionals currently active in the AI sphere worldwide, plus an estimated 150,000 students currently in training. The largest contingents are in the USA and China. However, a note of caution: if we consider only genuine "high-level" experts, this figure falls to 22,000.

By way of comparison, GitHub, acquired by Microsoft in 2018, is used by 31 million programmers. A reminder: GitHub is currently the leading platform for hosting and managing the development of open-source software, mainly in JavaScript. In other words, it would be well wide of the mark to talk of large-scale industrial deployments of AI: that would require at least a tenfold increase in the number of experts!

This is clear evidence that the implementation of a Deep Learning system is not within everyone's reach. Indeed, when implementing its Skywise offer, no less a key player than Airbus was obliged to partner with Palantir, a US company with its share of controversy, but with unique expertise in predictive algorithms.

The immediate future of analytics is currently highly constrained, and shaped by two key factors: an educational factor concerning the mass training of the researchers and professionals of the future; and a cultural factor as we learn to live with analytics, rather than under them, and learn to interpret them and use them for our benefit.
