Machine Learning – Scale
This chapter covers scaling data for machine learning in Python, with worked examples.
Scale Features
When your data has values in different units and on different numeric ranges, it can be hard to compare them.
How do pounds (lbs) compare to kilograms (kg)?
How about altitude versus time?
Scaling is the solution to this problem. Scaling data allows us to compare new values more easily.
Check out the table below, where liabilities are given in USD.
Firm | Department | Employees | Liabilities (USD) | Performance |
Amazon Inc | warehouse | 30000 | 90000000 | 89 |
Amazon Inc | IT | 5000 | 45000000 | 200 |
Amazon Inc | Support | 10000 | 25000000 | 400 |
Apple Inc | Designer Dep | 1000 | 11000000 | 80 |
Apple Inc | Audit | 500 | 1750000 | 90 |
Apple Inc | Tech | 25000 | 187500000 | 130 |
BlackRock Inc | Advisors | 5000 | 91500000 | 90 |
BlackRock Inc | Analysts | 2000 | 56300000 | 134 |
BlackRock Inc | Tech | 7500 | 18000000 | 100 |
BlackRock Inc | Sales | 10000 | 34500000 | 66 |
BlackRock Inc | Consultants | 22000 | 97680000 | 76 |
China Petroleum & Chemical Corp. (SNP) | Mechanical | 75000 | 166500000 | 68 |
China Petroleum & Chemical Corp. (SNP) | Research | 2500 | 12500000 | 30 |
China Petroleum & Chemical Corp. (SNP) | Supply | 110000 | 220000000 | 240 |
CVS Health Corp | Research | 10000 | 92300000 | 30 |
CVS Health Corp | Maintenance | 5000 | 8390000 | 69 |
CVS Health Corp | PharmD | 2500 | 8500000 | 508 |
Google Inc | AI | 25000 | 49975000 | 157 |
Google Inc | Advert | 40000 | 180000000 | 782 |
Google Inc | Research | 10000 | 134300000 | 52 |
Google Inc | Finance | 5000 | 14000000 | 89 |
Microsoft Inc | Research | 24000 | 244800000 | 50 |
Microsoft Inc | AI | 5000 | 468000000 | 29 |
Microsoft Inc | OS | 27000 | 151200000 | 1130 |
Royal Dutch Shell PLC | Research | 15000 | 184500000 | 40 |
Royal Dutch Shell PLC | Finance | 10000 | 43800000 | 78 |
Tesla Inc | Engineer | 8000 | 48000000 | 210 |
Tesla Inc | Assemble | 17000 | 81600000 | 330 |
Tesla Inc | Finance | 3000 | 10500000 | 99 |
Tesla Inc | Advisors | 1000 | 18860000 | 40 |
Tesla Inc | Audit | 1300 | 6501300 | 94 |
Walmart Inc | Supply | 134000 | 368500000 | 566 |
Walmart Inc | Finance | 100000 | 335700000 | 79 |
Walmart Inc | warehouse | 155000 | 376960000 | 198 |
Walmart Inc | Tech | 200000 | 800000000 | 164 |
Walmart Inc | Support | 360000 | 1620000000 | 303 |
If we scale both columns into comparable values, it becomes much easier to see how a liabilities figure such as 90000000 relates to an employee count such as 30000.
Scaling data can be achieved in a variety of ways. We will use a method called standardization in this tutorial.
This formula is used in the standardization method:
z = (x - u) / s
In this equation, z represents the new value, x represents the original value, u represents the mean, and s represents the standard deviation.
Based on the above data set, the first value of liabilities is 90000000, and the scaled value is:
(90000000 - 175100452.77) / 293691058.741973 = -0.28976181
According to the data set above, the first value of the employees column is 30000, and the scaled value is:
(30000 - 40647.222222222) / 72005.90441111 = -0.147865961
Instead of comparing 90000000 with 30000, you can now compare -0.28 with -0.14.
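The two worked examples above can be checked directly in Python, using the mean and standard deviation values quoted in the text:

```python
# Mean and standard deviation quoted in the text above
z_liabilities = (90000000 - 175100452.77) / 293691058.741973
z_employees = (30000 - 40647.222222222) / 72005.90441111

print(z_liabilities)  # roughly -0.2898
print(z_employees)    # roughly -0.1479
```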
The sklearn.preprocessing package provides several common utility functions and transformer classes for converting raw feature vectors into a representation suitable for downstream estimators.
The standardization of data sets generally benefits learning algorithms.
Python's sklearn module has a class called StandardScaler(), which returns a scaler object with methods for transforming data sets.
Scaling should be done as follows:
Example
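A minimal sketch of the scaling step. The two columns are embedded inline from the table above so the snippet is self-contained; in practice you would more likely load them from a file, and the variable names here are illustrative:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Liabilities (USD) and employee counts, row by row from the table above
liabilities = [
    90000000, 45000000, 25000000, 11000000, 1750000, 187500000,
    91500000, 56300000, 18000000, 34500000, 97680000, 166500000,
    12500000, 220000000, 92300000, 8390000, 8500000, 49975000,
    180000000, 134300000, 14000000, 244800000, 468000000, 151200000,
    184500000, 43800000, 48000000, 81600000, 10500000, 18860000,
    6501300, 368500000, 335700000, 376960000, 800000000, 1620000000,
]
employees = [
    30000, 5000, 10000, 1000, 500, 25000, 5000, 2000, 7500, 10000,
    22000, 75000, 2500, 110000, 10000, 5000, 2500, 25000, 40000,
    10000, 5000, 24000, 5000, 27000, 15000, 10000, 8000, 17000,
    3000, 1000, 1300, 134000, 100000, 155000, 200000, 360000,
]

# Each column is standardized independently: z = (x - u) / s
X = np.column_stack([liabilities, employees])
scale = StandardScaler()
scaledX = scale.fit_transform(X)

print(scaledX)
```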
Result:
As you can see, the first two values correspond to our calculations: -0.28 and -0.14.
[[-0.28976181 -0.14786596]
 [-0.44298404 -0.49505971]
 [-0.51108281 -0.42562096]
 [-0.55875195 -0.55061071]
 [-0.59024763 -0.55755459]
 [ 0.0422197  -0.21730471]
 [-0.2846544  -0.49505971]
 [-0.40450824 -0.53672296]
 [-0.53491738 -0.46034034]
 [-0.4787359  -0.42562096]
 [-0.26361188 -0.25896796]
 [-0.02928401  0.47708279]
 [-0.55364455 -0.52977909]
 [ 0.1528802   0.96315404]
 [-0.28193045 -0.42562096]
 [-0.56763884 -0.49505971]
 [-0.5672643  -0.52977909]
 [-0.42604447 -0.21730471]
 [ 0.01668266 -0.00898846]
 [-0.13892303 -0.42562096]
 [-0.54853714 -0.49505971]
 [ 0.23732267 -0.23119246]
 [ 0.99730495 -0.49505971]
 [-0.08137957 -0.18952921]
 [ 0.03200488 -0.35618221]
 [-0.44706997 -0.42562096]
 [-0.43276923 -0.45339646]
 [-0.31836329 -0.32840671]
 [-0.56045442 -0.52283521]
 [-0.53198914 -0.55061071]
 [-0.57406975 -0.54644439]
 [ 0.65851357  1.29646004]
 [ 0.54683159  0.82427654]
 [ 0.68731935  1.58810279]
 [ 2.12774454  2.21305154]
 [ 4.91979413  4.43509154]]
Predict Performance
When the data set is scaled, you must apply the same scale when predicting new values:
Predict the performance of a department with 10 employees and liabilities of 100,000 dollars:
Example
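A sketch of the prediction step under the same assumptions (data embedded inline from the table; a linear regression relates the two scaled columns to Performance). The new observation must be transformed with the same fitted scaler before predicting:

```python
import numpy as np
from sklearn import linear_model
from sklearn.preprocessing import StandardScaler

# Data from the table above, row by row
liabilities = [
    90000000, 45000000, 25000000, 11000000, 1750000, 187500000,
    91500000, 56300000, 18000000, 34500000, 97680000, 166500000,
    12500000, 220000000, 92300000, 8390000, 8500000, 49975000,
    180000000, 134300000, 14000000, 244800000, 468000000, 151200000,
    184500000, 43800000, 48000000, 81600000, 10500000, 18860000,
    6501300, 368500000, 335700000, 376960000, 800000000, 1620000000,
]
employees = [
    30000, 5000, 10000, 1000, 500, 25000, 5000, 2000, 7500, 10000,
    22000, 75000, 2500, 110000, 10000, 5000, 2500, 25000, 40000,
    10000, 5000, 24000, 5000, 27000, 15000, 10000, 8000, 17000,
    3000, 1000, 1300, 134000, 100000, 155000, 200000, 360000,
]
performance = [
    89, 200, 400, 80, 90, 130, 90, 134, 100, 66, 76, 68, 30, 240,
    30, 69, 508, 157, 782, 52, 89, 50, 29, 1130, 40, 78, 210, 330,
    99, 40, 94, 566, 79, 198, 164, 303,
]

X = np.column_stack([liabilities, employees])
y = performance

# Fit the scaler on the training data, then train a regression
# on the scaled features
scale = StandardScaler()
scaledX = scale.fit_transform(X)

regr = linear_model.LinearRegression()
regr.fit(scaledX, y)

# Scale the new observation with the SAME scaler before predicting
scaled = scale.transform([[100000, 10]])
predicted = regr.predict(scaled)
print(predicted)
```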
Result:
Now you know:
- When working with many machine learning algorithms, data scaling is recommended as a pre-processing step.
- Input and output variables can be normalized or standardized to achieve data scaling.
- Standardization and normalization can be applied to improve the performance of predictive modeling algorithms.
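Normalization, mentioned above but not used in this tutorial's examples, rescales each column to a fixed range instead of centering on the mean. A minimal sketch with sklearn's MinMaxScaler, applied to the first three rows of the table:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# First three rows of the table: liabilities and employees
X = np.array([
    [90000000, 30000],
    [45000000, 5000],
    [25000000, 10000],
])

# Each column is mapped to [0, 1]: (x - min) / (max - min)
normalized = MinMaxScaler().fit_transform(X)
print(normalized)
```

The column minimum becomes 0, the maximum becomes 1, and everything else falls in between.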