Data Scientist /NLP/

We are looking for a Data Scientist / Machine Learning Engineer to join our general-purpose data science team and work on tasks covering a wide variety of business needs, with a soft focus on NLP applications.

In this position, you will work with multiple data sources (usually textual, numerical and time-related data) and with datasets both large and small to develop, validate and deploy machine learning models, tune their performance and integrate them into data processing pipelines.

Responsibilities:

  • Work with both structured and unstructured data, collaborate with data engineers on defining data storage formats, and state data collection requirements;
  • Go beyond solving technical tasks: understand business needs, offer appropriate solutions, and explain the chosen approach to non-technical people;
  • Set up reproducible experiments: selection, training, validation and optimization of machine learning models, evaluation of their quality in business-related terms;
  • Integrate data preprocessing and model inference into general data processing pipelines;
  • Research new tools, papers, etc. in the machine learning area.

Requirements:

  • Strong knowledge and deep understanding of
    • Classical machine learning (linear models, decision trees, ensembles for classification and regression tasks, clustering and dimensionality reduction)
    • Main concepts and stages of the modelling process (validation scheme, regularization, overfitting and generalization, data leaks, feature selection, etc.)
  • Experience with Python scientific, visualization and ML-related libraries (NumPy, SciPy, scikit-learn, etc.)
  • Experience with different clustering techniques
  • Experience with classic NLP tools and techniques (NLTK, spaCy, n-grams, skip-grams, TF-IDF, tokenizers, lemmatization, dependency parsing, etc.)
  • Experience with NN frameworks, NLP-related architectures and libraries (PyTorch / TensorFlow, Hugging Face, fastText, Flair, Sentence Transformers, Word2Vec, ELMo, RNN, CNN, Transformer, BERT, etc.)
  • Experience in fine-tuning pre-trained models for different NLP tasks
  • Good Python programming skills
  • Good spoken and written English (at least B1)
  • Ability and desire to convert raw business requests into strictly formulated machine learning tasks
  • Ability to formulate data gathering (or data labelling) requirements
  • At least 2 years of experience in machine learning

Would be a plus:

  • Experience with relational databases and SQL, familiarity with non-relational databases (Cassandra, Elasticsearch, MongoDB, etc.)
  • Experience with distributed data processing (PySpark)
  • Experience with Cloud ML services (Amazon ML & SageMaker, Microsoft Azure ML & AI Platform, Google Cloud AutoML & AI Platform)
  • Experience in software engineering, deployment and integration with data delivery systems and other components, building microservices, and providing APIs for access to models
  • Experience in developing recommender systems and in time series analysis
  • Experience with gradient boosting libraries (xgboost, LightGBM, CatBoost)
  • Experience with similarity search optimization (FAISS, NMSLIB)
  • Experience in classic and deep learning computer vision
  • Experience in data collection (labelling) process setup using third-party or self-made tools
  • Participation in ML competitions (Kaggle, etc.)
  • Master's degree, PhD, or equivalent experience in Mathematics or Computer Science

You will work with smart people who love to solve hard problems, and who not only expect but also foster high performance.

If you fit the description above, we’d love to hear from you! Email us at hrm@indatalabs.com.