Goalist Developers Blog

Audio Classification using AutoML Vision

For a given audio dataset, can we do audio classification using spectrograms? Well, let's try it out ourselves, and let's use Google AutoML Vision to fail fast :D

We'll convert our audio files into their respective spectrograms and use those spectrogram images for our classification problem.

Here is the formal definition of a spectrogram:

A Spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time.

For this experiment, I'm going to use the following audio dataset from Kaggle

www.kaggle.com

Go ahead and download the dataset. (Caution: the dataset is over 5 GB, so you need to be patient while you perform any action on it. For my experiment, I have rented a Linux virtual machine on Google Cloud Platform (GCP), and I'll be performing all the steps from there. Moreover, you need a GCP account to follow this tutorial.)

Step 1: Download the Audio Dataset

Training Data (4.1 GB)

curl https://zenodo.org/record/2552860/files/FSDKaggle2018.audio_train.zip?download=1 --output audio_train.zip

unzip audio_train.zip

Test Data (524 MB)

curl https://zenodo.org/record/2552860/files/FSDKaggle2018.audio_test.zip?download=1 --output audio_test.zip

unzip audio_test.zip

Metadata (150 KB)

curl https://zenodo.org/record/2552860/files/FSDKaggle2018.meta.zip?download=1 --output meta_data.zip

unzip meta_data.zip

After downloading and unzipping, you should have the following things in your folder.
(Note: I have renamed the folders after unzipping.)

f:id:vivek081166:20190325191702p:plain

Step 2: Generate Spectrograms

Now that we have our audio data in place, let's create spectrograms for each audio file.

We'll need FFmpeg to create spectrograms of audio files

ffmpeg.org

Install FFmpeg using the following command

sudo apt-get install ffmpeg

Try it out yourself… go into the folder which has an audio file and run the following command to create its spectrogram

ffmpeg -i audioFileName.wav -lavfi showspectrumpic=s=1024x512 anyName.jpg

For example, "00044347.wav" from training dataset will sound like this

clyp.it

and the spectrogram of "00044347.wav" looks like this

f:id:vivek081166:20190326105505j:plain

As you can see, the red areas show the loudness of the different frequencies present in the audio file, represented over time. In the above example, you heard a hi-hat: the first part of the file is loud, and then the sound fades away, and the same can be seen in its spectrogram.

The above ffmpeg command creates a spectrogram with a legend; however, we do not need the legend for image processing, so let's drop it and create plain spectrograms for all our image data.

Use the following shell script to convert all your audio files into their respective spectrograms
(Create and run the shell script at the directory level where the "audio_data" folder is present.)

gist.github.com
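In case the gist above doesn't load, here is a minimal Python sketch along the same lines: it loops over the audio files and shells out to ffmpeg for each one. The folder names follow the post ("audio_data" in, "spectro_data" out), and legend=0 is the showspectrumpic option that drops the legend as discussed above; treat the details as assumptions and adapt them to your setup.

import os
import subprocess

os.makedirs("spectro_data", exist_ok=True)

for name in os.listdir("audio_data"):
    if not name.lower().endswith(".wav"):
        continue
    src = os.path.join("audio_data", name)
    dst = os.path.join("spectro_data", os.path.splitext(name)[0] + ".jpg")
    # same filter as before, but without the legend
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-lavfi", "showspectrumpic=s=1024x512:legend=0", dst],
        check=True,
    )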

I have moved all the generated image files into the folder "spectro_data".

f:id:vivek081166:20190325192534p:plain

Step 3: Move image files to Storage

Now that we have generated spectrograms for our training audio data, let's move all these image files to Google Cloud Storage (GCS); from there we will use them in the AutoML Vision UI.

Use the following command to copy image files to GCS

gsutil cp spectro_data/* gs://your-bucket-name/spectro-data/

f:id:vivek081166:20190325192818p:plain

Step 4: Prepare file paths and their labels

I have created the following CSV file using the metadata that we downloaded earlier. Removing all the other columns, I have kept only the image file location and its label, because that's all AutoML needs.

f:id:vivek081166:20190325193037p:plain

docs.google.com

You will have to put this CSV file on your Cloud Storage where the other data is stored.
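If you'd like to script this step, here is a minimal pandas sketch of how such a CSV could be built. The metadata filename, its column names ("fname", "label"), and the bucket path are assumptions; adjust them to match your download and your bucket.

import pandas as pd

# path to the metadata CSV from the FSDKaggle2018 meta archive (the name may differ)
meta = pd.read_csv("meta_data/train_post_competition.csv")

bucket = "gs://your-bucket-name/spectro-data/"
automl_df = pd.DataFrame({
    "image": bucket + meta["fname"].str.replace(".wav", ".jpg", regex=False),
    "label": meta["label"],
})

# AutoML Vision expects rows of "gs://path,label" with no header row
automl_df.to_csv("automl_labels.csv", index=False, header=False)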

Step 5: Create a new Dataset and Import Images

Go to the AutoML Vision UI and create a new dataset

cloud.google.com

f:id:vivek081166:20190325193335p:plain

Enter a dataset name of your choice, and for importing images, choose the second option, "Select a CSV file on Cloud Storage", and provide the path to the CSV file in your Cloud Storage.

f:id:vivek081166:20190325193453p:plain

The process of importing images may take a while, so sit back and relax. You'll get an email from AutoML once the import is completed.

After the image data has been imported, you'll see something like this

f:id:vivek081166:20190325193633p:plain

Step 6: Start Training

This step is super simple… just verify your labels and start training. All the uploaded images will be automatically divided into training, validation, and test sets.

f:id:vivek081166:20190325193844p:plain

Give a name to your new model and select a training budget.
For our experiment, let's select 1 node hour (free*) as the training budget, start training the model, and see how it performs.

f:id:vivek081166:20190325193935p:plain

Now, again, wait for the training to complete. You'll receive an email once training is done, so you may leave the screen and come back later; meanwhile, let the model train.

f:id:vivek081166:20190325194009p:plain

Step 7: Evaluate

and here are the results…

f:id:vivek081166:20190325194207p:plain

Hurray… with very minimal effort, our model did pretty well.

f:id:vivek081166:20190325194242p:plain

Congratulations! With only a few hours of work and the help of AutoML Vision, we are now fairly confident that classifying audio files from their spectrograms can be done with a machine-learning vision approach. With this conclusion, we can now build our own vision model using a CNN, tune its parameters, and produce more accurate results.

Or, if you don't want to build your own model, go ahead and train the same model with more node hours and use the instructions given in the PREDICT tab to use your model in production.

That's it for this post. I'm Vivek Amilkanthawar from Goalist. See you soon with another one next time; until then, Happy Learning :)

goalist.co.jp

Choosing a Deep Learning Framework

Implementing deep learning algorithms from scratch using Python and NumPy is a good way to get an understanding of the basic concepts, and to see what these algorithms are really doing by unfolding the deep learning black box.

However, as you start to implement very large or more complex models, such as convolutional neural networks (CNNs) or recurrent neural networks (RNNs), it becomes increasingly impractical, at least for most people like me, to implement everything yourself from scratch.

You may understand how to do matrix multiplication and be able to implement it in your code. But as you build very large applications, you'll probably not want to implement your own matrix multiplication function; instead, you'll want to call a numerical linear algebra library that can do it more efficiently for you.

The efficiency of your algorithm will help you fail fast 😃 and thus complete your iterations through the IDEA -> EXPERIMENT -> CODE cycle much more quickly.

f:id:vivek081166:20190228134309p:plain

I think this is crucially important when you are in the middle of the deep learning pipeline.

f:id:vivek081166:20190228134333p:plain

So let's take a look at the frameworks out there…

Today, there are many deep learning frameworks that make it easy for you to implement neural networks, and here are some of the leading ones.

f:id:vivek081166:20190228134407p:plain

Each of these frameworks has a dedicated user and developer community, and I think each of them is a credible choice for some subset of applications. However, when I see the graph below, my obvious choice is TensorFlow.

f:id:vivek081166:20190228134323p:plain

Well, I just said that I would choose TensorFlow. But are popularity scores the only thing that matters when choosing a framework for your deep learning project? It turns out not…

I think many of these frameworks are evolving and getting better very rapidly. A framework that tops the popularity charts in 2018 may not hold the same position by the end of 2019.

There are a lot of people writing articles comparing these deep learning frameworks and how quickly they change. Because these frameworks are often evolving and getting better month to month, I'll leave you to do a few internet searches yourself if you want to see the arguments on the pros and cons of some of them.

So, how can you make a decision about which framework to use?

Rather than strongly endorsing any of these frameworks, I would like to share three factors that Stanford Professor Andrew Ng considers important enough to influence your decision.

1) Ease of programming

This includes developing, iterating, and finally, deploying your neural network to production where it may be used by millions of users.

2) Running Speeds

Training on large data sets can take a lot of time, and differences in training speed between frameworks can make your workflow a lot more time efficient.

3) Openness

This last criterion is not often discussed, but Andrew Ng believes it is also very important. A truly open framework must be open source, of course, but must also be governed well.
So it is important to use a framework from a company that you can trust: as more and more people start to use the software, the company should not gradually close off what was open source, or move the functionality into its own proprietary cloud services.

But at least in the short term, depending on your language preference, whether you prefer Python or Java or C++ or something else, and depending on what application you're working on, whether that is computer vision or natural language processing or online advertising or something else, I think several of these frameworks could be a good choice.

f:id:vivek081166:20190228135807p:plain

So that was just a high-level overview of deep learning programming frameworks. Any of these frameworks can make you more efficient as you develop machine learning applications.

In a subsequent post, we'll take a step from zero → one to learn TensorFlow

That's it for this post. My name is Vivek from Goalist. See you soon with another one next time; until then, Happy Learning :)

goalist.co.jp

I played around with a word dictionary for sentiment analysis and drew lots of pretty graphs.

Hello everyone. This is Chinapa from Goalist!

f:id:c-pattamada:20190228110449p:plain

Apparently this is what my character looks like.

Previously, around here http://developers.goalist.co.jp/entry/2018/11/16/150000 there was an article on guessing the emotions of faces with machine learning, but this time the article is about the emotions of words rather than faces.

To understand the sentiment of a document, we first have to understand the sentiment of words. Today we'll examine the data in a "dictionary" that is often used for sentiment analysis.

I'm using this dictionary:

http://www.lr.pi.titech.ac.jp/~takamura/pubs/pn_ja.dic

There is also an English dictionary from the same source, so I'd like to look at that too!

http://www.lr.pi.titech.ac.jp/~takamura/pubs/pn_en.dic

In this dictionary, each entry has a word, its reading (furigana), its part of speech, and a number from 1 to -1. The number is meant to express how "positive" or "negative" the word is. So, let's see what's inside!

First, open the files

The required imports are as follows:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

Download the files from the URLs above and load them into Python.

# en_dict_path / jp_dict_path point to the downloaded pn_en.dic / pn_ja.dic files
english_words = pd.read_csv(en_dict_path, sep=':', names=['words', 'type', 'score'], header=None)
jp_words = pd.read_csv(jp_dict_path, sep=':', names=['words', 'reading', 'type', 'score'], header=None)

Let's get a rough sense of what's in there:

print(jp_words.type.unique())
=> ['動詞' '形容詞' '名詞' '副詞' '助動詞']

print(jp_words.groupby('type').type.count())
=>

type
副詞      1207
助動詞        2
動詞      4252
名詞     48999
形容詞      665

Pandas makes this fairly easy. In short, nouns (名詞) are overwhelmingly the most common, and the auxiliary verbs (助動詞) look like they can be ignored.

Let's draw some graphs!

(The read_csv calls earlier also set the column names.)

sns.set(style='ticks', palette='Set2')
sns.despine()

And with that, the graphs are set up to come out nicely.

Now let's look at how the data as a whole is distributed.

sns.distplot(jp_words.score)

f:id:c-pattamada:20190228003327p:plain
Negative/positive score distribution of Japanese words

Oh, that's surprising! Overall, the words lean to the left. I wonder why. Let's compare with English!

sns.distplot(english_words.score)

f:id:c-pattamada:20190228003247p:plain
Negative/positive score distribution of English words

This one isn't skewed, though it is very concentrated around the middle. So it seems Japanese has many words that carry a slightly negative nuance. If I try to guess the reason, the only thing I can think of is that the Japanese texts this dictionary was originally built from contained a lot of negative sentiment.

Finally, I'd like to look at one more axis. There were five parts of speech, and I'd also like to see how each of them is distributed.

Graphs split by part of speech

In other words, I want to see what things look like for each category: verbs, nouns, and so on.

g = sns.FacetGrid(jp_words, hue='type', height=6)
g.map(plt.hist, 'score')
g.add_legend()
new_labels = ['verb', 'adjective', 'noun', 'adverb', 'auxiliary verb']  # English legend labels
for t, l in zip(g._legend.texts, new_labels):
    t.set_text(l)

f:id:c-pattamada:20190228014530p:plain
So many nouns!

Ugh... so many nouns! You can't see anything else; you can only faintly sense the adverbs near the bottom. Naturally, the noun data looks almost exactly like the shape of the whole dataset, which is to be expected given its overwhelming volume. No helping it, let's take the nouns out and look at the graph again.

sns.FacetGrid(jp_words[jp_words.type != '名詞'], hue='type', height=6)

Edit the code above like this and it works.
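Putting it together, the edited snippet looks roughly like this:

# same plot as before, but with the nouns filtered out
g = sns.FacetGrid(jp_words[jp_words.type != '名詞'], hue='type', height=6)
g.map(plt.hist, 'score')
g.add_legend()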

f:id:c-pattamada:20190228014611p:plain
The plot without nouns

I see: the adjectives are quite extreme. There are fairly many verbs, and they lean slightly further left than the nouns.

Conclusion

With this, I think we've gotten a better grasp of the data we have by using pandas and seaborn. The next stage is to ask and investigate questions like "why are the words skewed to the left?" New questions may come up there, and you become able to judge how this dictionary can really be put to use for your project, though the answer will differ depending on the time and situation. It's like a journey: use "data" to get "information", and use "information" to make "decisions". I think that is the heart of data science.

Let's take the journey together~

Data Visualization in Python

こんにちは、 ゴーリストのビベックです。
Hello World! This is Vivek from Goalist.

If you want to build a very powerful machine learning algorithm on structured data, the first step is to explore the data every which way you can. You draw a histogram; you draw a correlogram; you draw cross plots. You really want to understand what is in that data: what each variable means, what its distribution is, and ideally how it was collected.

Only once you have a real rock-solid understanding of what's in the data can you smoothly move on to creating your machine learning model.

f:id:vivek081166:20190227141304j:plain

In this post, let's go through the different libraries in Python and gain some insight into our data by visualizing it.

Along with me, you may want to try and experiment with the packages that we are about to explore. Use Google Colab to follow along.

Well, you may ask what is Google Colab?

Colaboratory is a free Jupyter notebook environment that requires no setup and runs entirely in the cloud.
With Colaboratory you can write and execute code, save and share your analyses, and access powerful computing resources, all for free from your browser.

Learn more about Google Colab here…

youtu.be

Follow this URL to create your notebook

colab.research.google.com

So let's get started…
We'll be using the following packages to plot different graphs

  • Matplotlib
  • Seaborn
  • Bokeh

Out of these, Matplotlib is the most common charting package; see its documentation for details and its examples for inspiration.

1) Line Graphs

A line graph is commonly used to show trends over time. It can help readers to understand your data in a few seconds.

f:id:vivek081166:20190227145617p:plain

Sample code to generate a Line Graph is given below. Go ahead and open the sample code in Colab and experiment with it.

gist.github.com
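In case the gist doesn't load, here is a minimal matplotlib sketch of a line graph; the data values are made up purely for illustration.

import matplotlib.pyplot as plt

years = [2014, 2015, 2016, 2017, 2018]
sales = [120, 150, 170, 160, 210]

plt.plot(years, sales, marker='o')
plt.xlabel('Year')
plt.ylabel('Sales')
plt.title('Sales trend over time')
plt.show()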

2) Bar Charts

Bar graphs, also known as column charts, use vertical or horizontal bars to visually represent data along an x-axis and a y-axis. Each bar represents one value. When the bars are placed next to one another, the viewer can compare the different bars, or values, at a glance.

f:id:vivek081166:20190227201142p:plain

Sample code to generate a Bar Chart and 3D Bar Chart is given below.

gist.github.com
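As a fallback for the gist, here is a minimal sketch of a plain 2D bar chart with matplotlib (the 3D variant is left to the gist); the values are invented for illustration.

import matplotlib.pyplot as plt

languages = ['Python', 'Java', 'C++', 'JavaScript']
users = [35, 30, 15, 40]

plt.bar(languages, users, color='steelblue')
plt.xlabel('Language')
plt.ylabel('Users (millions)')
plt.title('Users per language')
plt.show()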

3) Histogram

A histogram is used to summarize discrete or continuous data. In other words, it provides a visual interpretation of numerical data by showing the number of data points that fall within a specified range of values (called "bins"). 

It is similar to a vertical bar graph. However, a histogram, unlike a vertical bar graph, shows no gaps between the bars.

f:id:vivek081166:20190227201314p:plain

gist.github.com
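A minimal histogram sketch, in case the gist is unavailable; it bins 1,000 random values drawn from a normal distribution.

import numpy as np
import matplotlib.pyplot as plt

data = np.random.normal(loc=0, scale=1, size=1000)

plt.hist(data, bins=30, edgecolor='black')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Histogram of a normal sample')
plt.show()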

4) Scatter Plots

A scatter plot is a graph in which the values of two variables are plotted along two axes, with the pattern of the resulting points revealing any correlation present.

f:id:vivek081166:20190227201410p:plain

Sample code to generate a Scatter Plot and 3D Scatter Plot is given below.

gist.github.com
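Here is a minimal 2D scatter-plot sketch (the 3D version is left to the gist); the points are random but roughly correlated so the pattern is visible.

import numpy as np
import matplotlib.pyplot as plt

x = np.random.rand(100)
y = 2 * x + np.random.normal(scale=0.1, size=100)  # roughly linear in x

plt.scatter(x, y, alpha=0.7)
plt.xlabel('x')
plt.ylabel('y')
plt.title('Scatter plot')
plt.show()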

5) Pie Charts

A pie chart (or a circle chart) is a circular statistical graphic, which is divided into slices to illustrate numerical proportion. In a pie chart, the arc length of each slice (and consequently its central angle and area), is proportional to the quantity it represents.

f:id:vivek081166:20190227201516j:plain
(A Donut Chart is a variation of a Pie Chart but with a space in the center.)

gist.github.com
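A minimal pie-chart sketch as a fallback for the gist (the donut variant is not shown); the shares are made up for illustration.

import matplotlib.pyplot as plt

labels = ['Chrome', 'Safari', 'Firefox', 'Other']
shares = [60, 20, 10, 10]

plt.pie(shares, labels=labels, autopct='%1.1f%%', startangle=90)
plt.axis('equal')  # keep the pie circular
plt.show()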

6) Wireframe Plots

Wireframe plots are used to graphically represent skeletal sketches of functions defined over a rectangular grid. Geographic data is an example of where this type of graph would be used.

f:id:vivek081166:20190227201611j:plain

gist.github.com

7) Regplot

A regplot is a simple scatter plot with a regression line fitted to it.
We'll use Seaborn's regplot() to draw it.

f:id:vivek081166:20190227202106p:plain

Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
Here are some more examples of Seaborn for inspiration

gist.github.com
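If the gist doesn't load, this minimal sketch draws a regplot on seaborn's built-in "tips" dataset (loading it needs internet access the first time).

import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset('tips')
sns.regplot(x='total_bill', y='tip', data=tips)
plt.show()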

8) HeatMap

A heat map is a graphical representation of data where the individual values contained in a matrix are represented as colors. GitHub's contribution calendar is an example of a heatmap.

f:id:vivek081166:20190227202430p:plain

gist.github.com
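A minimal heatmap sketch on a random matrix, as a stand-in for the gist.

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

matrix = np.random.rand(8, 12)
sns.heatmap(matrix, cmap='viridis')
plt.title('Heatmap of a random matrix')
plt.show()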

9) Bubble Chart

A bubble chart is similar to a scatter plot in that it can show distribution or relationship. A third variable is indicated by the size of the bubble or circle.
f:id:vivek081166:20190227202519p:plain

To draw an interactive bubble chart, let's use Bokeh.

Bokeh is an interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of versatile graphics, and to extend this capability with high-performance interactivity over very large or streaming datasets. Bokeh can help anyone who would like to quickly and easily create interactive plots, dashboards, and data applications.

gist.github.com
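In case the gist is unavailable, here is a minimal Bokeh sketch of a bubble chart; the data is random and the third variable is mapped to bubble size. It writes an HTML file you can open in the browser.

import numpy as np
from bokeh.io import output_file, show
from bokeh.plotting import figure

x = np.random.rand(30)
y = np.random.rand(30)
size = np.random.rand(30) * 30 + 5  # third variable encoded as bubble size

output_file('bubble.html')
p = figure(title='Bubble chart', tools='pan,wheel_zoom,reset,hover')
p.scatter(x, y, size=size, alpha=0.5)
show(p)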

That's all for this post. See you soon with another one next time; until then,
Happy Learning :)

Scan documents using OpenCV python

こんにちは、 ゴーリストのビベックです。 Hello World! This is Vivek from Goalist.

In this blog post, let's play around with the OpenCV library and write our own Python script to scan documents like receipts, business cards, pages of a book, etc.

f:id:vivek081166:20190212112431p:plain

For those who are not aware of OpenCV, let's quickly answer a few questions about this library

What is OpenCV?
OpenCV (Open Source Computer Vision) is a library of programming functions mainly aimed at real-time computer vision. The library is cross-platform and free for use under the open-source BSD license. OpenCV supports the deep learning frameworks TensorFlow, PyTorch, and Caffe.

What can OpenCV do?
1. Read and write images
2. Detect faces and their features
3. Detect shapes like circles, rectangles, etc. in an image
4. Recognize text in images
5. Modify image quality and colors
6. Develop augmented reality apps
and much more.....

Which languages does OpenCV support?
1. C++
2. Python
3. Java
4. Matlab/Octave
5. C
6. There are wrappers in other languages like JavaScript, C#, Perl, Haskell, and Ruby to encourage adoption by a wider audience.

The initial version of OpenCV was released in June 2000, which means (at the time of writing this post) the library has been in use for almost 19 years.

Some papers also highlight the fact that OpenCV was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in commercial products.

So let's get started and let's see what we can build with it...

Step 1: Setting up the environment

We will be using Python 3 for our project, so ensure that you have Python 3 as your development environment.
You may refer to the following link to set up Python on your machine.

www.python.org

Step 2: Gather required packages

We will need the following packages in our project:
1) Pre-built OpenCV packages for Python
opencv-python==4.0.0.21
opencv-contrib-python==4.0.0.21

2) For Array computation
numpy==1.16.1

3) For applying filters to images (image processing)
scikit-image==0.14.2

4) Utility package for image manipulation
imutils==0.5.2

Step 3: Let's make it work

Import the installed packages into your Python script.

import cv2  # opencv-python
import numpy as np
from skimage.filters import threshold_local  # scikit-image
import imutils
from imutils.perspective import four_point_transform  # used later to crop and deskew

Read the image to be scanned into your script by using OpenCV's imread() function.

We are going to perform edge detection on the input image, so to improve accuracy in the edge-detection phase we may want to resize the image. Compute the ratio of the old height to the new height and resize() the image using imutils.

Also keep a cloned copy of the original image (original_image) for later use.

# read the input image
image = cv2.imread("test_image.jpg")

# clone the original image
original_image = image.copy()

# resize using ratio (old height to the new height)
ratio = image.shape[0] / 500.0
image = imutils.resize(image, height=500)

Generally, paper (its edges, at least) is white, so you may have better luck by moving to a different color space like YUV, which better separates luminosity. (Read more about this here: YUV - Wikipedia.)
To change the color space of the input image, use OpenCV's cvtColor() function.
From the YUV image, let's get rid of the chrominance {color} (UV) components and use only the luma {black-and-white} (Y) component for further processing.

#  change the color space to YUV
image_yuv = cv2.cvtColor(image, cv2.COLOR_BGR2YUV)

# grab only the Y component
image_y = np.zeros(image_yuv.shape[0:2], np.uint8)
image_y[:, :] = image_yuv[:, :, 0]

f:id:vivek081166:20190212132516p:plain

The text on the paper is another problem when detecting edges, so let's apply a blur with GaussianBlur() to remove this high-frequency noise (hopefully to some extent).

# blur the image to reduce high-frequency noise
image_blurred = cv2.GaussianBlur(image_y, (3, 3), 0)

It's time to detect edges in our input image.
Use the Canny() function to detect edges. You may have to tweak the threshold parameters of this function to get the desired output.

# find edges in the image
edges = cv2.Canny(image_blurred, 50, 200, apertureSize=3)

f:id:vivek081166:20190212142140p:plain

Now that we have detected edges in our input image, let's find contours around the edges and draw them on the image.

# find contours
contours, hierarchy = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# draw all contours on the original image
cv2.drawContours(image, contours, -1, (0, 255, 0), 1)
# !! Attention !! Do not draw contours on the image at this point
# I have drawn all the contours just to show below image

f:id:vivek081166:20190212142135p:plain

Now that we have a bunch of contours, it's time to find the right ones.
For each contour cnt, first find the convex hull (Convex hull - Wikipedia), then use approxPolyDP() to simplify the contour as much as possible.

# to collect all the detected polygons
polygons = []

# loop over the contours
for cnt in contours:
    # find the convex hull
    hull = cv2.convexHull(cnt)
    
    # compute the approx polygon and put it into polygons
    polygons.append(cv2.approxPolyDP(hull, 0.01 * cv2.arcLength(hull, True), False))

Sort the detected polygons in descending order of contour area so that the polygon with the largest area in the image comes first.

# sort polygons in desc order of contour area
sortedPoly = sorted(polygons, key=cv2.contourArea, reverse=True)

# draw the points of only the largest polygon in red
cv2.drawContours(image, sortedPoly[0], -1, (0, 0, 255), 5)

f:id:vivek081166:20190212155446p:plain

We now check whether the largest detected polygon has four points.
If it does, congratulations: we have detected the four corners of the document in the image.

It's time to crop the image and transform its perspective with respect to these four points.

# get the contours of the largest polygon in the image
simplified_cnt = sortedPoly[0]

# check if the polygon has four points
if len(simplified_cnt) == 4:
    # transform the perspective of the original image
    cropped_image = four_point_transform(original_image, simplified_cnt.reshape(4, 2) * ratio)

Refer to the following to learn more about the four_point_transform() function in detail (it is also available as imutils.perspective.four_point_transform).

Finally, binarize the cropped image to get its scanned version.

# Binarize the cropped image
gray_image = cv2.cvtColor(cropped_image, cv2.COLOR_BGR2GRAY)
T = threshold_local(gray_image, 11, offset=10, method="gaussian")
binarized_image = (gray_image > T).astype("uint8") * 255

# Show images
cv2.imshow("Original", original_image)
cv2.imshow("Scanned", binarized_image)
cv2.imshow("Cropped", cropped_image)
cv2.waitKey(0)

f:id:vivek081166:20190212161727p:plain

🎉There we go... we just managed to scan a document from a raw image with the help of OpenCV.

That's all for this post. See you soon with another one next time; until then,
Happy Learning :)

Generalization is always important! (I suppose that's a generalization too)

This is Chinapa! Getting right to it: A few useful things to know about machine learning - Pedro Domingos

https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf

I'd like to continue with this paper! I previously wrote about the three parts of a classifier summarized in it (representation, evaluation, and optimization), so if you're curious, please have a read here!

developers.goalist.co.jp

Continuing on from that, I'd first like to explain generalization!

Generalization is everything!

In school and in society, "generalization" may not have a very good image. For example, when a friend who hates math says "all Indians can do two-digit multiplication in their heads!", I always push back strongly with something like "that's just a generalization! Everyone is different!" In situations like that, the weakness of generalization is certainly clear, but using that limited viewpoint to reach the conclusion that "all generalization is bad" is going a bit far.

An important tool that humans use too

Generalization is not perfect, but it is one of the most important thinking tools humans have. Unless you have experienced "everything" in the world, when you land in a new situation there is little to rely on other than generalization. When a junior high school student goes to high school, or a university student enters the workforce, they naturally try to apply what they've learned from their experience so far to the new place. It's not perfect, but I think generalization is the reason humans can quickly function and keep learning in new situations.

And now computers are trying to use that power too.

Machine learning, too, uses the data it has seen so far as "experience" to try to solve new (possibly similar) problems it has never seen.

When building a machine learning model, your responsibility as a data scientist or AI engineer is to build it so that the model acquires the ability to generalize.

How can you build a model that generalizes?

Before the how-to-build part, it's important to have a way to verify it.

First of all, keep separate training data, validation data, and evaluation (test) data.

Accuracy on the training data is useful for knowing whether the model can find patterns at all, but it doesn't tell you how good the model really is. To put it another way: if accuracy on the training data is low, you know the model is not good, but even if accuracy on the training data is 100%, you cannot conclude that the model is good.

It's the same as how a human student repeating the textbook word for word doesn't prove they really understand the concept. That's why we need an evaluation on something different from the textbook, i.e. the training data. This is called "test data".

So, since the training data (the textbook) and the test data (the exam) are not the same thing, we should be able to tell whether the model has generalized... right?

Well... not quite yet.

To see why, let's first imagine a practical situation.

Here's how it goes in reality

Researcher A is trying to build a model. Let's say he has 100 data points. As above, he properly splits the data into training data (90 items) and test data (10 items). So far so good.

Round 1: Train using a handful of features. Result: 40% on the test, so try again!

Round 2: Improve the features a bit, create some new ones, and train. Result: 41% on the test, so try again!

Round 3: Tweak the model a bit, edit the features too, and train. Result: 50% on the test, better, but once more!

One month later

Round 21: After editing the model and the features over and over, train again. Result: 93% on the test. Yes! Over 90%, amazing!

Or so he thinks.

Then he lets someone else try it out, and...

"Huh?"

When his friend tries it, the accuracy comes out at only about 80%. What's more, it often confuses two particular classes.

What happened?

Let's get a firm grip on the cause of this despair

It's true that different data was used for training and for evaluation, so overfitting (the failure to generalize) wasn't very visible there. But because the model was edited over and over in response to the test results, he ended up building a model that happens to fit the test data in particular.

And so, without knowing the true situation, he bragged to everyone about "over 90%!"... Despair.

An ideal process for verifying generalization

You don't have to verify generalization every single time you train, but when you finally have several candidate models to consider and want to actually use one, you can verify generalization in the following way.

Split into training data and test data

First split all the data into two parts. For example, make training and test data with an 80/20 split. (There's some flexibility in the ratio.)

Set this test data aside for now, and split the remaining training data 80/20 again. This time the 20% is called the "cross-validation data".

Use the cross-validation data

Train → evaluate on the cross-validation data. Even if you edit the model many times and the score on this cross-validation data keeps climbing, you still have test data that has never been used.

The cross-validation data doesn't have to stay the same data forever.

Shuffle the training data and cross-validation data

If you want to check that you are generalizing without touching the test data, mix the cross-validation data and the training data together and split them randomly again.

For example: if at first items 1-64 were used as the training data

and items 65-80 were used for cross-validation, then after shuffling

you might use items 1-16 for cross-validation and items 17-80 as the training data this time,

and you can confirm that the evaluation doesn't fluctuate wildly between these configurations.

Evaluate on the test data

When you are satisfied with all your model revisions, you finally use the test data, and you get a publishable evaluation against data that has never been seen and should be random.
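Putting the steps above together, here is a minimal sketch with scikit-learn and dummy data (the library choice and variable names are my own; the post itself doesn't prescribe them).

import numpy as np
from sklearn.model_selection import train_test_split

# dummy data standing in for the 100 examples in the story above
X = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)

# 1) split everything 80/20 and set the test data aside
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 2) split the remainder 80/20 again into training and cross-validation data
X_train, X_cv, y_train, y_cv = train_test_split(X_rest, y_rest, test_size=0.2, random_state=0)

# 3) iterate on the model against (X_cv, y_cv); to reshuffle, rerun step 2 with a
#    different random_state. Touch (X_test, y_test) only once, at the very end.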

Summary

This time I explained generalization in detail. Hopefully you now understand why generalization matters, and you can use a few techniques to avoid being fooled by a model that doesn't generalize!

See you next time!

Chinapa.

Business Card Reader : Part 2 : Frontend (Ionic App)

Hello World! My name is Vivek Amilkanthawar.
In the last blog post, we wrote a cloud function for our Business Card Reader app to do the heavy lifting of text recognition and store the results in the database, using Firebase, the Google Cloud Vision API, and the Google Natural Language API.

We had broken down the entire process into the following steps.
1) User uploads an image to Firebase storage via @angular/fire in Ionic.
2) The upload triggers a storage cloud function.
3) The cloud function sends the image to the Cloud Vision API
4) The result of the image analysis is then sent to the Cloud Language API, and the final results are saved in Firestore.
5) The final result is then updated in real time in the Ionic UI.

Out of these, step #2, step #3, and step #4 are explained in the last blog post.
If you have missed the last post, you can find it here...

developers.goalist.co.jp

In this blog post, we'll be working on the frontend to create an Ionic app for iOS and Android (step #1 and step #5)

The final app will look something like this on the iOS platform:

f:id:vivek081166:20190121142143g:plain

So let's get started

Step 1: Create and initialize an Ionic project

Let's generate a new Ionic app using the blank template. I have named my app meishi (めいし), which means 'business card' in Japanese.

ionic start meishi blank
cd meishi

Make sure you are in the Ionic root directory, then generate a new page with the following command:

ionic g page vision

We'll use VisionPage as our Ionic root page inside app.component.ts:

import { VisionPage } from '../pages/vision/vision';
@Component({
  templateUrl: 'app.html'
})
export class MyApp {
  rootPage:any = VisionPage;
  // ...skipped
}

Add the @angular/fire and firebase dependencies to our Ionic project for communicating with Firebase.

npm install @angular/fire firebase --save

Add @ionic-native/camera to use the native camera to capture the business card image for processing.

ionic cordova plugin add cordova-plugin-camera
npm install --save @ionic-native/camera

At this point, let's register AngularFire and the native camera plugin in the app.module.ts
(add your own Firebase project credentials in firebaseConfig)

import {BrowserModule} from '@angular/platform-browser';
import {ErrorHandler, NgModule} from '@angular/core';
import {IonicApp, IonicErrorHandler, IonicModule} from 'ionic-angular';
import {SplashScreen} from '@ionic-native/splash-screen';
import {StatusBar} from '@ionic-native/status-bar';

import {MyApp} from './app.component';
import {HomePage} from '../pages/home/home';
import {VisionPage} from '../pages/vision/vision';

import {AngularFireModule} from '@angular/fire';
import {AngularFirestoreModule} from '@angular/fire/firestore';
import {AngularFireStorageModule} from '@angular/fire/storage';

import {Camera} from '@ionic-native/camera';

const firebaseConfig = {
  apiKey: 'xxxxxx',
  authDomain: 'xxxxxx.firebaseapp.com',
  databaseURL: 'https://xxxxxx.firebaseio.com',
  projectId: 'xxxxxx',
  storageBucket: 'xxxx.appspot.com',
  messagingSenderId: 'xxxxxx',
};

@NgModule({
  declarations: [
    MyApp,
    HomePage,
    VisionPage,
  ],
  imports: [
    BrowserModule,
    IonicModule.forRoot(MyApp),
    AngularFireModule.initializeApp(firebaseConfig),
    AngularFirestoreModule,
    AngularFireStorageModule,
  ],
  bootstrap: [IonicApp],
  entryComponents: [
    MyApp,
    HomePage,
    VisionPage,
  ],
  providers: [
    StatusBar,
    SplashScreen,
    {provide: ErrorHandler, useClass: IonicErrorHandler},
    Camera,
  ],
})
export class AppModule {
}

Step 2: Let's make it work

There is a lot going on in the VisionPage component, so let's break it down and go through it step by step.

1) User clicks "Capture Image" button which triggerscaptureAndUpload()to bring up the device camera.

2) The camera returns the image as a Base64 string. I have reduced the quality of the image in order to reduce processing time; for me, even at 50% image quality, the Google Vision API does well.

3) We generate an ID that is used for both the image filename and the Firestore document ID.

4) We then listen to this location in Firestore.

5) An upload task is created to transfer the file to storage.

6) We wait for the cloud function (refer to my last post) to update Firestore.

7) Once the data is received from Firestore we use helper methods extractEmail() and extractContact() to extract email and contact information from the received string.

8) And it's done!!

import {Component} from '@angular/core';
import {IonicPage, Loading, LoadingController} from 'ionic-angular';

import {Observable} from 'rxjs/Observable';
import {filter, tap} from 'rxjs/operators';

import {AngularFireStorage, AngularFireUploadTask} from '@angular/fire/storage';
import {AngularFirestore} from '@angular/fire/firestore';

import {Camera, CameraOptions} from '@ionic-native/camera';

@IonicPage()
@Component({
  selector: 'page-vision',
  templateUrl: 'vision.html',
})
export class VisionPage {

  // Upload task
  task: AngularFireUploadTask;

  // Firestore data
  result$: Observable<any>;

  loading: Loading;
  image: string;

  constructor(
    private storage: AngularFireStorage,
    private afs: AngularFirestore,
    private camera: Camera,
    private loadingCtrl: LoadingController) {

    this.loading = this.loadingCtrl.create({
      content: 'Running AI vision analysis...',
    });
  }

  startUpload(file: string) {

    // Show loader
    this.loading.present();

    // const timestamp = new Date().getTime().toString();
    const docId = this.afs.createId();

    const path = `${docId}.jpg`;

    // Make a reference to the future location of the firestore document
    const photoRef = this.afs.collection('photos').doc(docId);

    // Firestore observable
    this.result$ = photoRef.valueChanges().pipe(
      filter(data => !!data),
      tap(_ => this.loading.dismiss()),
    );

    // The main task
    this.image = 'data:image/jpg;base64,' + file;
    this.task = this.storage.ref(path).putString(this.image, 'data_url');
  }

  // Gets the pic from the native camera then starts the upload
  async captureAndUpload() {
    const options: CameraOptions = {
      quality: 50,
      destinationType: this.camera.DestinationType.DATA_URL,
      encodingType: this.camera.EncodingType.JPEG,
      mediaType: this.camera.MediaType.PICTURE,
      sourceType: this.camera.PictureSourceType.PHOTOLIBRARY,
    };

    const base64 = await this.camera.getPicture(options);

    this.startUpload(base64);
  }

  extractEmail(str: string) {
    const emailRegex = /(([^<>()\[\]\\.,;:\s@"]+(\.[^<>()\[\]\\.,;:\s@"]+)*)|(".+"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))/;
    const {matches, cleanedText} = this.removeByRegex(str, emailRegex);
    return matches;
  };

  extractContact(str: string) {
    const contactRegex = /(?:(\+?\d{1,3}) )?(?:([\(]?\d+[\)]?)[ -])?(\d{1,5}[\- ]?\d{1,5})/;
    const {matches, cleanedText} = this.removeByRegex(str, contactRegex);
    return matches;
  }

  removeByRegex(str, regex) {
    const matches = [];
    const cleanedText = str.split('\n').filter(line => {
      const hits = line.match(regex);
      if (hits != null) {
        matches.push(hits[0]);
        return false;
      }
      return true;
    }).join('\n');
    return {matches, cleanedText};
  };

}

Step 3: Display your result

Let's create a basic UI using ionic components

<!--
  Generated template for the VisionPage page.

  See http://ionicframework.com/docs/components/#navigation for more info on
  Ionic pages and navigation.
-->
<ion-header>

  <ion-navbar>
    <ion-title>Meishi</ion-title>
  </ion-navbar>

</ion-header>


<ion-content padding>
  <ion-row>

    <ion-col col-12 text-center>

      <button ion-button icon-start (tap)="captureAndUpload()">
        <ion-icon name="camera"></ion-icon>
        Capture Image
      </button>

    </ion-col>

    <ion-col col-12>
      <img width="100%" height="auto" [src]="image">
    </ion-col>

    <ion-col *ngIf="result$ | async as result">

      <h4>
        <span class="title">名前: </span><br>
        {{result.requiredEntities.PERSON}}
      </h4>
      <h4>
        <span class="title">Email:</span><br>
        <span *ngFor="let email of extractEmail(result.text)">{{email}}<br></span>
      </h4>
      <h4>
        <span class="title">電話番号:</span><br>
        <span *ngFor="let phone of extractContact(result.text)">{{phone}}<br></span>
      </h4>
      <h4>
        <span class="title">組織:</span><br>
        {{result.requiredEntities.ORGANIZATION}}
      </h4>
      <h4>
        <span class="title">住所:</span><br>
        {{result.requiredEntities.LOCATION}}
      </h4>

      <h4><span class="title">認識されたテキスト</span></h4>

      <h5>
        {{result.text}}
      </h5>


    </ion-col>
  </ion-row>

</ion-content>

Step 4: Generate an app for the platform of your choice

Finally, let's generate the application for iOS or Android.
Run the following command to create a build of the app for iOS:

 ionic cordova build ios

Similarly, to generate the Android app, run the following command:

 ionic cordova build android

Open the app on an emulator or on an actual device and test it yourself

Congrats!! We just created a Business Card Reader app powered by machine learning :)

That's it for this post. See you soon with another one next time; until then,
Happy Learning :)