Best Python Libraries for Machine Learning
Top 10 Python libraries for machine learning in 2021
According to the Stack Overflow Developer Survey 2020, Python was the most wanted programming language of 2020.
Python is among the most widely used programming languages for real-world data science and machine learning projects, largely because of its readily available, easy-to-use libraries.
In this post, we will talk about the most popular Python libraries for machine learning.
Table of contents
· NumPy
· Pandas
· Matplotlib
· Seaborn
· Scikit-learn
· XGBoost
· OpenCV
· TensorFlow
· NLTK
· spaCy
· Conclusion
NumPy
NumPy is the fundamental package for array computing with Python.
NumPy provides a multidimensional array object and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.
>>> import numpy as np
>>> a = np.arange(15).reshape(3, 5)
>>> a
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
>>> a.shape
(3, 5)
>>> a.ndim
2
>>> a.dtype.name
'int64'
>>> a.itemsize
8
>>> a.size
15
>>> type(a)
<class 'numpy.ndarray'>
>>> b = np.array([6, 7, 8])
>>> b
array([6, 7, 8])
>>> type(b)
<class 'numpy.ndarray'>
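A few of the routine families listed above can be sketched in a handful of lines. This is a minimal illustration, not a tour of the API, and the array values are made up for the example.

```python
import numpy as np

# Made-up sample data for illustration
scores = np.array([[72, 85, 90],
                   [60, 95, 78]])

doubled = scores * 2                    # element-wise arithmetic, no explicit loop
high = scores[scores > 80]              # boolean mask selects matching elements
sorted_rows = np.sort(scores, axis=1)   # sort each row independently
col_means = scores.mean(axis=0)         # basic statistics along an axis
flat = scores.reshape(-1)               # shape manipulation: 2-D to 1-D
product = scores @ scores.T             # basic linear algebra: matrix product

print(high)        # [85 90 95]
print(col_means)   # [66. 90. 84.]
```

Each operation works on the whole array at once, which is where most of NumPy's speed advantage over plain Python loops comes from.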
Pandas
Powerful data structures for data analysis, time series, and statistics
Pandas provides fast, flexible, and expressive data structures designed to make working with structured and time series data both easy and intuitive. It aims to be the most powerful and flexible open source tool for practical, real-world data analysis in Python.
>>> import pandas as pd
>>> data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'],
...         'Age': [28, 34, 29, 42]}
>>> df = pd.DataFrame(data)
>>> print(df)
    Name  Age
0    Tom   28
1   Jack   34
2  Steve   29
3  Ricky   42
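To illustrate the "structured and time series data" part of the description, here is a small sketch that filters a DataFrame and resamples a time series. The time series values and dates are invented for the example.

```python
import pandas as pd

# Structured data: the same made-up people as in the example above
df = pd.DataFrame({"Name": ["Tom", "Jack", "Steve", "Ricky"],
                   "Age": [28, 34, 29, 42]})
over_30 = df[df["Age"] > 30]            # boolean filtering on a column
mean_age = df["Age"].mean()             # 33.25

# Time series data: six daily readings (values are invented)
ts = pd.Series([1, 2, 3, 4, 5, 6],
               index=pd.date_range("2021-01-01", periods=6, freq="D"))
two_day_sum = ts.resample("2D").sum()   # aggregate into 2-day bins

print(over_30["Name"].tolist())   # ['Jack', 'Ricky']
print(two_day_sum.tolist())       # [3, 7, 11]
```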
Matplotlib
A comprehensive library for creating static, animated, and interactive visualizations
Matplotlib provides an object-oriented API for embedding plots in applications built with standard GUI toolkits such as Qt, wxPython, GTK+, or Tkinter. It produces high-quality two-dimensional figures in a variety of formats.
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2, 100)  # 100 evenly spaced sample points in [0, 2]

plt.plot(x, x, label='linear')
plt.plot(x, x**2, label='quadratic')
plt.plot(x, x**3, label='cubic')
plt.xlabel('x label')
plt.ylabel('y label')
plt.title("Simple Plot")
plt.legend()
plt.show()  # display the figure
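The same plot can also be written against the object-oriented API mentioned in the description, where the Figure and Axes are explicit objects instead of implicit pyplot state. The sketch below assumes a headless environment, so it selects the non-interactive Agg backend and renders into an in-memory buffer; passing a file path to savefig would work as well.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no GUI toolkit is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2, 100)

fig, ax = plt.subplots()               # explicit Figure and Axes objects
ax.plot(x, x, label="linear")
ax.plot(x, x**2, label="quadratic")
ax.plot(x, x**3, label="cubic")
ax.set_xlabel("x label")
ax.set_ylabel("y label")
ax.set_title("Simple Plot")
ax.legend()

buf = io.BytesIO()
fig.savefig(buf, format="png")         # render the figure as PNG bytes
```

Working with explicit objects tends to scale better once a script manages several figures or subplots at the same time.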
Seaborn
Seaborn is built on top of Matplotlib and closely integrated with pandas data structures for making statistical graphics in Python.
Seaborn’s charting functions work with dataframes and arrays containing entire datasets, performing the necessary semantic mapping and statistical aggregation internally to generate useful graphs. Its dataset-oriented, declarative API allows you to concentrate on the meaning of your charts rather than the mechanics of drawing them.
# Import seaborn
import seaborn as sns
# Apply the default theme
sns.set_theme()
# Load an example dataset
tips = sns.load_dataset("tips")
# Create a visualization
sns.relplot(
    data=tips,
    x="total_bill", y="tip", col="time",
    hue="smoker", style="smoker", size="size",
)
Scikit-learn
Scikit-learn is an open-source Python library for machine learning and data mining
Scikit-learn is one of the most widely used and robust general-purpose machine learning packages in Python. It provides a selection of efficient tools for machine learning and statistical modeling, including classification, regression, clustering, and dimensionality reduction. It is built on top of NumPy, SciPy and Matplotlib.
>>> from sklearn import datasets
>>> from sklearn.svm import SVC
>>> iris = datasets.load_iris()
>>> clf = SVC()
>>> clf.fit(iris.data, iris.target)
SVC()
>>> list(clf.predict(iris.data[:3]))
[0, 0, 0]
>>> clf.fit(iris.data, iris.target_names[iris.target])
SVC()
>>> list(clf.predict(iris.data[:3]))
['setosa', 'setosa', 'setosa']
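The example above predicts on the same samples it was trained on, which says little about how the model generalizes. A minimal sketch of the more usual workflow holds out a test set and measures accuracy on it. The 30% split and the random seed are arbitrary choices made for this illustration.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()

# Hold out 30% of the samples; the split ratio and seed are arbitrary choices
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0,
    stratify=iris.target)

clf = SVC()
clf.fit(X_train, y_train)               # train only on the training split

accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"Held-out accuracy: {accuracy:.2f}")
```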
XGBoost
XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable
It implements machine learning algorithms under the gradient boosting framework. XGBoost provides parallel tree boosting (also known as GBDT or GBM) that solves many data science problems quickly and accurately. It has been used in many winning solutions in data science competitions.
import xgboost as xgb

# read in data (paths assume the demo files shipped in the XGBoost repository)
dtrain = xgb.DMatrix('demo/data/agaricus.txt.train')
dtest = xgb.DMatrix('demo/data/agaricus.txt.test')

# specify parameters via map
param = {'max_depth': 2, 'eta': 1, 'objective': 'binary:logistic'}
num_round = 2
bst = xgb.train(param, dtrain, num_round)

# make prediction
preds = bst.predict(dtest)
OpenCV
OpenCV is an open-source library for computer vision and image processing
OpenCV is a powerful library for computer vision tasks such as face detection, face recognition, blob detection, edge detection, image filtering, and template matching. It also supports real-time image and video processing.
# Example for drawing different geometric shapes with OpenCV
import numpy as np
import cv2 as cv

# Create a black image
img = np.zeros((512, 512, 3), np.uint8)

# Draw a diagonal blue line with thickness of 5 px
cv.line(img, (0, 0), (511, 511), (255, 0, 0), 5)
cv.rectangle(img, (384, 0), (510, 128), (0, 255, 0), 3)
cv.circle(img, (447, 63), 63, (0, 0, 255), -1)
cv.ellipse(img, (256, 256), (100, 50), 0, 0, 180, 255, -1)
pts = np.array([[10, 5], [20, 30], [70, 20], [50, 10]], np.int32)
pts = pts.reshape((-1, 1, 2))
cv.polylines(img, [pts], True, (0, 255, 255))
font = cv.FONT_HERSHEY_SIMPLEX
cv.putText(img, 'OpenCV', (10, 500), font, 4, (255, 255, 255), 2, cv.LINE_AA)
TensorFlow
TensorFlow is an open source machine learning framework for everyone.
TensorFlow is an end-to-end open source machine learning platform. It has a comprehensive, flexible ecosystem of tools and libraries that lets researchers push the state of the art in ML and lets developers easily build and deploy ML-powered applications. It was originally developed by researchers and engineers from the Google Brain team within Google’s AI organization.
# Example for creating a deep learning model in TensorFlow
# (assumes X_train, y_train, X_test, y_test and n_features are already prepared)
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# define model
model = Sequential()
model.add(Dense(10, activation='relu', input_shape=(n_features,)))
model.add(Dense(8, activation='relu'))
model.add(Dense(3, activation='softmax'))

# compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# fit the model
model.fit(X_train, y_train, epochs=150, batch_size=32, verbose=0)

# evaluate the model
loss, acc = model.evaluate(X_test, y_test, verbose=0)
print('Test Accuracy:', acc)
NLTK
The Natural Language Toolkit (NLTK) is a Python package for natural language processing.
NLTK is a popular Python package for working with human language data. It includes a set of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, as well as wrappers for industrial-strength NLP libraries.
# Example: tokenize and tag some text
>>> import nltk
>>> sentence = """At eight o'clock on Thursday morning
... Arthur didn't feel very good."""
>>> tokens = nltk.word_tokenize(sentence)
>>> tokens
['At', 'eight', "o'clock", 'on', 'Thursday', 'morning',
'Arthur', 'did', "n't", 'feel', 'very', 'good', '.']
>>> tagged = nltk.pos_tag(tokens)
>>> tagged[0:6]
[('At', 'IN'), ('eight', 'CD'), ("o'clock", 'JJ'), ('on', 'IN'),
('Thursday', 'NNP'), ('morning', 'NN')]
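Stemming, also mentioned in the description, reduces words to a common root form. A minimal sketch with NLTK's Porter stemmer follows; the word list is made up for the example, and unlike tokenization and tagging this step should not need any downloaded NLTK data.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Made-up word list for illustration
words = ["running", "connection", "flies", "easily"]
stems = [stemmer.stem(word) for word in words]

for word, stem in zip(words, stems):
    print(word, "->", stem)
```

For instance, "running" is reduced to "run" and "connection" to "connect". Stems are not always dictionary words, which is acceptable when they are only used to group related terms.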
spaCy
Industrial-strength Natural Language Processing (NLP) in Python
spaCy is a library for advanced Natural Language Processing in Python. It offers fast performance and neural network models for tagging, parsing, named entity recognition, text classification, and more, as well as a production-ready training system and simple model packaging, deployment, and workflow management.
# Example for Named Entity Recognition in spaCy
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

# output
# Apple 0 5 ORG
# U.K. 27 31 GPE
# $1 billion 44 54 MONEY
Conclusion
We have seen the most popular Python libraries for machine learning. Knowing these libraries makes it much easier and quicker to tackle a wide range of machine learning, computer vision, and natural language processing tasks.
Happy machine learning!!
Follow me on Medium: Ashok kumar Palivela