Automated Feature Engineering

Bibliographic Details
Main Author: Salon, Data
Format: eBook
Language: English
Published: Data Science Salon, 2019
Edition: 1st edition
Collection: O'Reilly (for collection details, see MPG.ReNa)
LEADER 02292nmm a2200301 u 4500
001 EB001910035
003 EBX01000000000000001072937
005 00000000000000.0
007 cr|||||||||||||||||||||
008 210123 ||| eng
100 1 |a Salon, Data 
245 0 0 |a Automated Feature Engineering  |h [electronic resource]  |c Salon, Data 
250 |a 1st edition 
260 |b Data Science Salon  |c 2019 
300 |a 1 video file, approximately 18 min. 
653 |a Streaming video 
653 |a Internet videos 
653 |a Internet videos / http://id.loc.gov/authorities/subjects/sh2007001612 
653 |a streaming video / aat 
653 |a Streaming video / http://id.loc.gov/authorities/subjects/sh2005005237 
041 0 7 |a eng  |2 ISO 639-2 
989 |b OREILLY  |a O'Reilly 
500 |a Mode of access: World Wide Web 
500 |a Made available through: Safari, an O'Reilly Media Company 
776 |z 0000000T91R98JO0 
856 4 0 |u https://learning.oreilly.com/videos/~/00000OOT9LR98JO0/?ar  |x Publisher  |3 Full text 
082 0 |a E VIDEO 
520 |a Presented by Namita Lokare. Feature engineering plays a significant role in the success of a machine learning model: most of the effort in training a model goes into data preparation and choosing the right representation. In this talk, I will focus on a robust feature engineering method, Randomized Union of Locally Linear Subspaces (RULLS), which generates sparse, non-negative, and rotation-invariant features in an unsupervised fashion. RULLS aggregates features from a random union of subspaces by describing each point in terms of globally chosen landmarks. These landmarks serve as anchor points for choosing subspaces, and the method selects the features that are relevant in the neighborhood around each chosen landmark. The distances from each data point to its k closest landmarks are encoded in the feature matrix, and the final feature representation is the union of the features from all chosen subspaces. The effectiveness of the algorithm is shown on various real-world datasets for clustering and classification, both on raw data and in the presence of noise. We compare our method with existing feature generation methods; the results show high performance on both classification and clustering tasks.
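The abstract describes RULLS only at a high level. The Python sketch below illustrates the general landmark-and-subspace idea under explicit assumptions and is not the presenter's implementation. Everything beyond the abstract is hypothetical: landmarks are sampled uniformly from the data; each subspace is anchored at a randomly chosen landmark and consists of the coordinates with the largest variance among that landmark's nearest neighbors; each of the k nearest-landmark distances is stored as 1 / (1 + distance); and the "union" is a concatenation across subspaces. The function name `landmark_subspace_features` and all of its parameters are illustrative. The sketch does not attempt the rotation invariance mentioned in the abstract.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def _dist(a: Sequence[float], b: Sequence[float], dims: Sequence[int]) -> float:
    """Euclidean distance between a and b using only the coordinates in dims."""
    return math.sqrt(sum((a[j] - b[j]) ** 2 for j in dims))


def _variance(points: List[Vector], j: int) -> float:
    """Population variance of coordinate j over the given points."""
    mean = sum(p[j] for p in points) / len(points)
    return sum((p[j] - mean) ** 2 for p in points) / len(points)


def landmark_subspace_features(
    X: List[Vector],
    n_landmarks: int = 4,
    n_subspaces: int = 3,
    subspace_dim: int = 2,
    k: int = 2,
    neighborhood: int = 5,
    seed: int = 0,
) -> List[Vector]:
    """Hypothetical RULLS-style encoder (illustration only, not the authors' code).

    Returns one feature vector per point, of length n_subspaces * n_landmarks.
    """
    if not (0 < k <= n_landmarks <= len(X)) or neighborhood < 1:
        raise ValueError("need 0 < k <= n_landmarks <= len(X) and neighborhood >= 1")
    rng = random.Random(seed)
    all_dims = list(range(len(X[0])))

    # Assumption: landmarks are a uniform random sample of the data points.
    landmarks = rng.sample(X, n_landmarks)

    features: List[Vector] = [[] for _ in X]
    for _ in range(n_subspaces):
        # Assumption: each subspace is anchored at a randomly picked landmark.
        anchor = rng.choice(landmarks)
        # The anchor's neighborhood: its nearest data points in the full space.
        neigh = sorted(X, key=lambda p: _dist(p, anchor, all_dims))[:neighborhood]
        # Assumption: the "relevant" coordinates are those with the largest
        # variance inside that neighborhood.
        dims = sorted(all_dims, key=lambda j: _variance(neigh, j), reverse=True)[:subspace_dim]

        for row, x in zip(features, X):
            dists = [_dist(x, lm, dims) for lm in landmarks]
            nearest = set(sorted(range(n_landmarks), key=lambda i: dists[i])[:k])
            # Encode only the k closest landmarks; 1 / (1 + distance) is strictly
            # positive for finite distances, so every other entry stays exactly 0.
            # Concatenating across subspaces plays the role of the "union".
            row.extend(1.0 / (1.0 + dists[i]) if i in nearest else 0.0 for i in range(n_landmarks))
    return features


if __name__ == "__main__":
    data = [
        [0.0, 0.1, 5.0], [0.2, 0.0, 4.8],
        [3.0, 3.1, 0.0], [2.9, 3.0, 0.2],
        [6.0, 0.0, 1.0], [6.1, 0.2, 1.1],
    ]
    F = landmark_subspace_features(data, n_landmarks=4, n_subspaces=3,
                                   subspace_dim=2, k=2, neighborhood=3, seed=0)
    print(len(F), len(F[0]))  # 6 12
```

In this sketch each row has n_subspaces × n_landmarks entries, of which exactly n_subspaces × k are non-zero. Keeping k smaller than n_landmarks is what makes the representation sparse, and the distance-based encoding keeps every entry non-negative.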