# Feature Engineering in Machine Learning

Feature engineering means transforming raw data into a feature vector. Expect to spend significant time doing feature engineering.

## Mapping Numeric Values to Machine Learning Features

Mapping numeric values to features is straightforward. Integer and floating-point data don't need a special encoding because they can be multiplied directly by a numeric weight. As suggested in Figure 2, converting the raw integer value 6 to the feature value 6.0 is trivial.
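As a minimal sketch (the record and field names below are illustrative, not from the original text), the raw numeric value simply passes through, with at most a cast to floating point:

```python
# A hypothetical raw record; numeric fields need no special encoding.
raw = {"num_rooms": 6, "lot_size": 0.25}

features = {
    "num_rooms": float(raw["num_rooms"]),  # raw integer 6 -> feature value 6.0
    "lot_size": float(raw["lot_size"]),    # already floating-point; unchanged
}
```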

## Mapping Categorical Values to Machine Learning Features

Categorical features have a discrete set of possible values. For example, there might be a feature called `street_name` with options that include:

`{'Charleston Road', 'North Shoreline Boulevard', 'Shorebird Way', 'Rengstorff Avenue'}`

Since models cannot multiply strings by the learned weights, we use feature engineering to convert strings to numeric values.

One quick approach is to map our street names to numbers:

- map Charleston Road to 0
- map North Shoreline Boulevard to 1
- map Shorebird Way to 2
- map Rengstorff Avenue to 3
- map everything else (out of vocabulary, OOV) to 4
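For illustration, here is what that naive mapping might look like in Python (the dictionary, constant, and function names are assumptions, not part of the original text):

```python
# Hypothetical vocabulary; index 4 is the out-of-vocabulary (OOV) bucket.
STREET_TO_INDEX = {
    'Charleston Road': 0,
    'North Shoreline Boulevard': 1,
    'Shorebird Way': 2,
    'Rengstorff Avenue': 3,
}
OOV_INDEX = 4

def street_to_ordinal(street_name):
    """Map a street name to its integer index, falling back to the OOV bucket."""
    return STREET_TO_INDEX.get(street_name, OOV_INDEX)

print(street_to_ordinal('Shorebird Way'))  # 2
print(street_to_ordinal('Castro Street'))  # 4 (not in vocabulary -> OOV)
```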

However, this approach has two disadvantages:

1. Our model assigns weights (θ) to features, so we'd be learning a single weight that applies to all streets. For example, if we learn a weight of 6 for `street_name`, then we will multiply it by 0 for Charleston Road, by 1 for North Shoreline Boulevard, by 2 for Shorebird Way, and so on. It is unlikely that house price adjusts linearly with the index assigned to a street name.
2. We aren't accounting for cases where `street_name` may take multiple values. For example, many houses are located at the corner of two streets, and there's no way to encode that information in the `street_name` value if it contains a single index.

So, to overcome these limitations, we can instead create a binary vector for each categorical feature in our model that represents values as follows:

- For values that apply to the example, set corresponding vector elements to `1`.
- Set all other elements to `0`.

The length of this vector is equal to the number of elements in the vocabulary. This representation is called a one-hot encoding when a single value is 1, and a multi-hot encoding when multiple values are 1.

This approach effectively creates a Boolean variable for every feature value (e.g., every street name). Here, if a house is on Shorebird Way, then the binary value is 1 only for Shorebird Way, so the model uses only the weight learned for Shorebird Way.

> **Note:** One-hot encoding extends to numeric data that you do not want to directly multiply by a weight, such as a postal code.

Similarly, if a house is at the corner of two streets, then two binary values are set to 1, and the model uses both of their respective weights.
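Putting the two cases together, a minimal Python sketch of this encoding might look as follows (the vocabulary list and function name are illustrative):

```python
VOCABULARY = [
    'Charleston Road',
    'North Shoreline Boulevard',
    'Shorebird Way',
    'Rengstorff Avenue',
]

def encode_streets(street_names):
    """Return a binary vector over the vocabulary: one-hot for a house on a
    single street, multi-hot for a house at the corner of two streets."""
    return [1.0 if street in street_names else 0.0 for street in VOCABULARY]

print(encode_streets(['Shorebird Way']))
# [0.0, 0.0, 1.0, 0.0]  -- one-hot: a single value is 1

print(encode_streets(['Charleston Road', 'Rengstorff Avenue']))
# [1.0, 0.0, 0.0, 1.0]  -- multi-hot: a corner lot sets two values to 1
```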

## Sparse Representation of Features

Suppose that you had 1,000,000 different street names in your data set that you wanted to include as values for `street_name`. Explicitly creating a binary vector of 1,000,000 elements in which only 1 or 2 elements are nonzero is a very inefficient representation, in terms of both storage and the computation time needed to process these vectors. In this situation, a common approach is to use a sparse representation in which only nonzero values are stored. In sparse representations, an independent model weight is still learned for each feature value, as described above.
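As a rough illustration (the vocabulary size and indices below are made up), a sparse representation stores just the indices of the nonzero elements, and computations only touch those entries:

```python
# Dense representation: 1,000,000 elements, almost all zero.
vocabulary_size = 1_000_000
dense = [0.0] * vocabulary_size
dense[2] = 1.0      # e.g., 'Shorebird Way'
dense[417] = 1.0    # e.g., the second street of a corner lot

# Sparse representation: store only the nonzero indices (all values are 1.0).
sparse = [2, 417]

# The model still learns an independent weight per feature value, but a dot
# product with the weight vector only needs to visit the stored indices.
weights = [0.0] * vocabulary_size  # one learned weight per vocabulary entry
prediction_term = sum(weights[i] for i in sparse)
```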
