InData Labs data science experts Denis Pirshtuk and Denis Dus talked to Bel.Biz about how InData Labs solves business problems with its own algorithms, using advanced technologies in big data and data science. They also spoke about the prospects for the world of big data and shared their advice on how to organize work with big data to move your business forward. Enjoy the interview below.
According to the company’s experience, many business owners are confused by the big data boom. They want to introduce the technology into their business but do not know how. “We have to explain that this is not a magic elixir and that information has to be accumulated wisely. In most cases, the strategy of ‘collecting absolutely everything’ does not bring valuable insights.”
You Need a Strategy to Collect Data
– Each year the volume of data collected in the world grows by about 30%, and since 2006 the amount of information has increased 30-fold. The companies that have benefited from big data had a good data strategy in the first place.
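As a rough consistency check on the two figures above, the sketch below computes how many years of steady compound growth at 30% per year it takes for a quantity to grow 30-fold. Constant year-over-year growth is an assumption made here for illustration; the interview does not say that growth was uniform, and the function name is hypothetical.

```python
import math


def years_to_reach(multiple: float, annual_growth: float) -> float:
    """Years of compound growth needed for a quantity to grow `multiple`-fold.

    Solves (1 + annual_growth) ** years == multiple for `years`.
    """
    return math.log(multiple) / math.log(1.0 + annual_growth)


# Assumption: growth is a constant 30% every year.
years = years_to_reach(multiple=30, annual_growth=0.30)
print(f"About {years:.1f} years of 30% annual growth give a 30-fold increase")
```

Under that assumption the two numbers agree with each other if roughly thirteen years have passed since 2006.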
Social media contains 25% of all Internet data. Such volumes of unstructured data cannot be fully analyzed manually. To extract valuable insights from it, you need to collect the data and preprocess it automatically: conduct sentiment analysis, extract named entities (proper names), determine the subject of each statement, and so on. On the basis of the preprocessed data, a specialist can compose reports that help to make business decisions and implement changes in a service or a product.
A lot of attention is now paid to image processing. People share a lot of personal information on social media. An ordinary person’s Instagram account fully reflects their life preferences: where they go, what they eat, with whom they communicate. There is a lot of data of this kind, and it is not structured in any way. Extracting such information can help improve many business processes, for example, targeted advertising.
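To make the preprocessing step more concrete, here is a deliberately simplified sketch of turning raw posts into structured records. The word lists, the capitalization heuristic for proper names, the sample posts and all function names are illustrative assumptions; a production system would use trained sentiment and named-entity models rather than these toy rules.

```python
import re

# Hypothetical toy lexicons; real systems learn sentiment from labeled data.
POSITIVE = {"love", "great", "good", "excellent", "happy"}
NEGATIVE = {"hate", "bad", "terrible", "slow", "broken"}


def sentiment(text: str) -> int:
    """Crude score: number of positive words minus number of negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def named_entities(text: str) -> list[str]:
    """Naive heuristic: capitalized words that do not start a sentence."""
    entities = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for token in sentence.split()[1:]:
            word = token.strip(".,!?;:\"'()")
            if word[:1].isupper():
                entities.append(word)
    return entities


def preprocess(post: str) -> dict:
    """Turn one raw post into a structured record an analyst can aggregate."""
    return {
        "text": post,
        "sentiment": sentiment(post),
        "entities": named_entities(post),
    }


posts = [
    "I love the new camera on my Pixel. Delivery from Amazon was great!",
    "The app is slow and the login is broken since the last update.",
]
records = [preprocess(p) for p in posts]
for record in records:
    print(record["sentiment"], record["entities"])
```

The point of the sketch is the shape of the output: each unstructured post becomes a record with fields that can be counted, filtered and put into a report.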
The processes of collecting and analyzing big data are changing. Now we need to create models that capture complex patterns but can still scale to large audiences of a million or ten million active users. This requires a solid mathematical foundation and deep knowledge of software engineering. Badly written code simply does not work at this scale, because there is so much more data to process.
The demand for data scientists is very high at the moment, especially for developers with knowledge of machine learning. The market is looking for people who can analyze data in depth and have strong engineering skills.
However, it is impossible to create a universal solver. So far, all existing applications are built on weak artificial intelligence (AI): the algorithms in them are aimed at a specific task. The same is true of Siri, which consists of a chain of weak AIs that can, for example, show an exchange rate or find something on the Internet. Self-driving cars also include many models, each trained for a specific task such as calculating the distance to the nearest obstacle.
But because each model is only a weak AI, the work of data scientists is still valuable. Creating and implementing truly innovative solutions requires a lot of analytical, mathematical and technical skills. People need to explore data, design algorithms, write programs, feed them with data, and finally obtain trained predictive models, the weak AIs.
When you have tens or hundreds of gigabytes of scattered, unstructured data, you can use deep learning, a method that helps you extract valuable knowledge directly from raw data. It works the following way: the input layer receives the raw data, and the output layer produces the target variable, the one we want to predict. The system is then constructed end to end, from input to output. A human takes little part in the process, influencing only the structure of the neural network and the parameters of its training. The system is computationally expensive, but it makes it possible to process data that was hard or even impossible to work with before, especially texts, videos, and images.
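The idea of raw inputs on one side, a target variable on the other, and a human choosing only the network structure and training parameters can be illustrated with a very small sketch. The code below is a toy feed-forward network with one hidden layer, trained by plain gradient descent on the XOR problem. The dataset, layer size, learning rate and function names are all assumptions chosen for illustration; it is nowhere near the scale or tooling of a real deep learning system.

```python
import math
import random


def train(inputs, targets, hidden=4, epochs=3000, lr=0.5, seed=0):
    """Train a one-hidden-layer network end to end with full-batch gradient descent.

    The caller only picks the structure (`hidden`) and the training
    parameters (`epochs`, `lr`); the weights are learned from the data.
    """
    rng = random.Random(seed)
    n_in = len(inputs[0])
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        # Hidden layer uses tanh, output layer uses a sigmoid.
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(w1, b1)]
        z = sum(w * hi for w, hi in zip(w2, h)) + b2
        return h, 1.0 / (1.0 + math.exp(-z))

    def loss():
        # Mean squared error over the whole dataset.
        return sum((forward(x)[1] - y) ** 2
                   for x, y in zip(inputs, targets)) / len(inputs)

    history = [loss()]
    for _ in range(epochs):
        g_w1 = [[0.0] * n_in for _ in range(hidden)]
        g_b1 = [0.0] * hidden
        g_w2 = [0.0] * hidden
        g_b2 = 0.0
        for x, y in zip(inputs, targets):
            h, p = forward(x)
            d_out = 2.0 * (p - y) * p * (1.0 - p)  # gradient at the output unit
            g_b2 += d_out
            for j in range(hidden):
                g_w2[j] += d_out * h[j]
                d_hid = d_out * w2[j] * (1.0 - h[j] ** 2)  # backpropagated gradient
                g_b1[j] += d_hid
                for i in range(n_in):
                    g_w1[j][i] += d_hid * x[i]
        step = lr / len(inputs)
        b2 -= step * g_b2
        for j in range(hidden):
            w2[j] -= step * g_w2[j]
            b1[j] -= step * g_b1[j]
            for i in range(n_in):
                w1[j][i] -= step * g_w1[j][i]
    history.append(loss())
    return (lambda x: forward(x)[1]), history


# Toy "raw data" and target variable: the XOR function.
XOR_X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
XOR_Y = [0.0, 1.0, 1.0, 0.0]

predict, history = train(XOR_X, XOR_Y)
print("loss before and after training:", history[0], history[-1])
```

No features are engineered by hand here: the same loop would run unchanged on different inputs and targets, which is the sense in which the system is end to end.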
We use deep learning to work with multidimensional sparse time series, where it is very difficult to look for dependencies because such series often do not fit any well-known mathematical model. They are non-stationary, highly variable, and often have very non-trivial periodic components. Deep learning is the only way to extract useful information from them in the shortest time possible. In general, the intelligent combination of classical machine learning and deep learning shows good results.
Individual Approach is Important in Data Science
Data analysis can solve classic business problems. The most common tasks include segmenting the client base and choosing the right strategy for working with each segment. The main goal of such tasks is to understand the audience more deeply. This makes it possible to predict customer churn and understand whether the consumption level will decrease or cease altogether. Recommender systems are also popular: data helps to build a collaborative filtering model that shows individual recommendations to each user.
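As an illustration of the collaborative filtering idea mentioned above, here is a minimal user-based sketch: items a user has not seen are scored by the ratings of other users, weighted by how similar those users are. The ratings, user names and item labels are made up, and the similarity measure (cosine over commonly rated items) is one common choice among several, not necessarily the one InData Labs uses.

```python
import math

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5, "D": 5},
    "cara": {"C": 5, "D": 2, "E": 4},
}


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two users' rating vectors."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[item] * v[item] for item in common)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)


def recommend(user: str, ratings: dict, top_n: int = 2) -> list[str]:
    """Rank unseen items by the similarity-weighted average of others' ratings."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        if sim <= 0:
            continue
        for item, rating in other_ratings.items():
            if item in ratings[user]:
                continue  # only recommend items the user has not rated yet
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]


print(recommend("ann", ratings))
```

In this toy data, "ann" rates items much like "bob" does, so the item that "bob" rated highly and "ann" has not seen comes first in her recommendations.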
The best example of using big data is advertising. The market has changed radically: personalized Internet advertising has replaced billboards and television. It draws on a lot of accumulated knowledge about the user, and the fee is calculated not per impression but per click on the ad, image or banner.
It is important to approach each big data problem individually. Data-driven product companies (especially startups) sometimes “copy” each other and offer similar solutions to the same problems. In a growing market it is quite possible to survive even with such “tactics”. But copying each other is unreasonable: it is better to look for a new niche or to introduce fundamentally new functionality.
Data Encryption is a Modern Dilemma
Today you can make access to data very complicated or even impossible. But there is always a human factor: hackers do not guess passwords, they exploit vulnerabilities. From a mathematical point of view, the systems themselves and the encryption tools they use are normally very secure. But it is important to make sure that no weak points remain that stem from human irresponsibility.
Experts argue about the ethical issues of using certain personal data and providing it to third parties. On the one hand, scientific and technological progress requires working with large amounts of information, and any restrictions hinder that progress.
On the other hand, many experts promote full privacy, where end-to-end encryption makes it impossible to obtain any information (for example, someone else’s correspondence).
There is a contradiction: if we encrypt all the information transmitted over the Internet, it will be difficult to fight crimes such as espionage and terrorism. It is necessary to seek compromise solutions: to protect the personal data of each individual, but to use information in a generalized form for security measures.
As a big data consulting company, we benefit from working with established businesses, since the solutions in this area are more or less similar to each other. It is clear right away what to do when you need to implement customer segmentation, app personalization or a recommender system. Each client receives a personal solution, but the same knowledge is used to prepare it. This is very cost-effective, as new projects are implemented faster.
Working with startups involves a more custom-tailored approach. As a rule, startups do not have large volumes of data. InData Labs consults on all projects but can deliver the best results only when a company has enough data.
It does not matter what stage the startup is at. The main thing is that it can collect enough of the necessary data while it is still quite young. Sometimes fairly experienced teams come to us from specific spheres where it is difficult to get information, for example, finance and insurance. In this case, we can only advise how to collect big data or find a partner who will provide it. But we are still happy to work on a data strategy for such a company.