
Category: Data Engineering

  • Keys to building robust data infrastructure for a data science project

    Once you decide to leverage data science techniques in your company, it is time to make sure the data infrastructure is ready for it. Starting a data science project is a big investment, and not just a financial one: it involves a lot of time, effort, and preparatory work. Data science is about leveraging a company’s data…

  • Converting Spark RDD to DataFrame and Dataset. Expert Opinion.

    Generally speaking, Spark provides three main abstractions for working with data. First, we give a holistic view of all of them in one place. Second, we explore each option with examples. RDD (Resilient Distributed Dataset): the main approach for working with unstructured data. It is quite similar to a distributed collection that is…
