awesome-feature-engineering is a resource in the Open Source & Models category: a curated list of materials on feature engineering techniques in machine learning. It helps users quickly find resources for a specific technique or data type.
awesome-feature-engineering is a curated, open-source list of resources on feature engineering techniques for machine learning. The repository covers numeric, textual, image, categorical, time series, and geospatial data. It links to libraries, articles, and tutorials for methods such as scaling, ranking, quantization, Box-Cox transformation, feature interactions, clustering, t-SNE, PCA, Bag of Words, TF-IDF, word embeddings, one-hot encoding, count encoding, label encoding, mean encoding, hashing, rolling window features, and lag features. The list is maintained by Andrei Khobnia and is aimed at data scientists and machine learning engineers who want to strengthen their feature engineering skills and find practical implementations.
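To make a few of the listed methods concrete, here is a minimal sketch of min-max scaling, one-hot encoding, and count encoding in plain Python. It is not taken from the repository or from any library it links to; the function names and sample data are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Illustrative helpers only: names and data are assumptions, not from the list.

def min_max_scale(values):
    """Rescale numeric values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread, so map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_hot_encode(categories):
    """Return the sorted distinct values and one 0/1 row per input item."""
    columns = sorted(set(categories))
    rows = [[1 if c == col else 0 for col in columns] for c in categories]
    return columns, rows

def count_encode(categories):
    """Replace each category with the number of times it occurs."""
    counts = Counter(categories)
    return [counts[c] for c in categories]

ages = [20, 30, 40]
colors = ["red", "blue", "red"]

scaled = min_max_scale(ages)               # [0.0, 0.5, 1.0]
columns, one_hot = one_hot_encode(colors)  # ["blue", "red"], [[0, 1], [1, 0], [0, 1]]
counts = count_encode(colors)              # [2, 1, 2]
```

In practice, the libraries linked from the list would normally be used instead of hand-written helpers like these, since they handle unseen categories and missing values.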
Best used for
It is best suited to data scientists and machine learning engineers who want to explore feature engineering techniques, find practical implementations, and deepen their understanding of data preprocessing. It is especially useful for readers who want resources organized by data type, such as numeric, textual, and image data.
What types of feature engineering techniques are covered?
The resource covers techniques for numeric data (scaling, binning, transformations), textual data (Bag of Words, TF-IDF, embeddings), image data (computer vision, deep learning features), categorical data (encoding methods), time series data (rolling window and lag features), and geospatial data.
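As an illustration of the time series entries, the sketch below builds a lag feature and a rolling-window mean in plain Python. It is a hedged example under stated assumptions: the helper names and the sample series are hypothetical and do not come from the repository.

```python
# Illustrative helpers only: names and data are assumptions, not from the list.

def lag_feature(series, lag):
    """Value observed `lag` steps earlier; None where no history exists yet."""
    return [series[i - lag] if i >= lag else None for i in range(len(series))]

def rolling_mean(series, window):
    """Mean over the current value and the previous window - 1 values.

    Positions without a full window are filled with None.
    """
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = series[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out

sales = [10, 12, 14, 16, 18]

lag_1 = lag_feature(sales, 1)    # [None, 10, 12, 14, 16]
roll_3 = rolling_mean(sales, 3)  # [None, None, 12.0, 14.0, 16.0]
```

Note that this rolling mean includes the current value. When the feature is used to predict that same value, the window is usually shifted back by one step so that it only sees past observations.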
Is this resource suitable for beginners in machine learning?
The list is comprehensive but best suited to users with an intermediate to advanced understanding of machine learning. It links to specific techniques and libraries and assumes foundational knowledge of data science and programming.
How can I contribute to the awesome-feature-engineering list?
The project is open source and welcomes contributions. To contribute, open a pull request on the GitHub repository that adds new resources, corrects existing information, or suggests improvements to the list.