Goalist Developers Blog

Teachable Desktop Automation

Teach your computer to recognize gestures and trigger a set of actions whenever a certain gesture is recognized.

Hello World! I'm very excited to share my recent experiment, in which I taught my computer certain gestures so that whenever one of them is recognized, a certain action is performed.

In this blog, I'll explain all you need to do to achieve the following:

IF: I wave to my webcam
THEN: move the mouse pointer a little to its right

I have used the power of Node.js to achieve this. The idea is to create a native desktop app that has access to the operating system to perform actions like a mouse click or a keyboard button press, and to train our model and draw inferences locally inside that same app.

To make it work, I thought of using TensorFlow.js and robotjs in an Electron app created using Angular.

f:id:vivek081166:20190329175638p:plain

So, are you ready? Let's get started…

Generate the Angular App

Let's start by creating a new Angular project from scratch using the angular-cli

npm install -g @angular/cli

ng new teachable-desktop-automation

cd teachable-desktop-automation

Install Electron

Add Electron and its type definitions to the project as dev dependencies:

npm install electron --save-dev

npm install @types/electron --save-dev

Configuring the Electron App

Create a new directory inside the project's root directory and name it "electron". We will use this folder for all Electron-related files.

Afterward, make a new file called "main.ts" inside the "electron" folder. This file will be the main entry point of our Electron application.

Finally, create a new "tsconfig.json" file inside the same directory. We need this file to compile the TypeScript file into a JavaScript one.

Use the following as the content of "tsconfig.json" file.

gist.github.com

Now it's time to fill the "main.ts" with some code to fire up our electron app.

gist.github.com

Visit electronjs for details

Make a custom build command

Create a custom build command for compiling main.ts and starting Electron. To do this, update "package.json" in your project as shown below:

{
  "name": "teachable-desktop-automation",
  "version": "0.0.0",
  "main": "electron/dist/main.js", // <-- this was added
  "scripts": {
    "ng": "ng",
    "start": "ng serve",
    "build": "ng build",
    "test": "ng test",
    "lint": "ng lint",
    "e2e": "ng e2e",
    "electron": "ng build --base-href ./ && tsc --p electron && electron ."  // <-- this was added
  },
  // ...omitted
}

We can now start our app using npm:

npm run electron

f:id:vivek081166:20190329180447p:plain

There we go… our native desktop app is up and running! However, it is not doing anything yet.

Let's make it work and also add some intelligence to it…

Add Robotjs to the project

In order to simulate a mouse click or a keyboard button press, we will need robotjs in our project.
I installed robotjs with the following command

npm install robotjs

and then tried to use it in the project by referring to some examples in the official documentation. However, I struggled a lot to make robotjs work in the Electron app. Here is the workaround that I finally came up with.

Add ngx-electron to the project

npm install ngx-electron

Then inject its service into the component where you want to use robotjs, and use remote.require() to load the robotjs package.

import { Component } from '@angular/core';
import { ElectronService } from 'ngx-electron';

@Component({
  selector: 'app-root',
  templateUrl: './app.component.html',
  styleUrls: ['./app.component.scss'],
})
export class AppComponent {
  private robot: any;

  constructor(private electronService: ElectronService) {
    this.robot = this.electronService.remote.require('robotjs');

    // move the mouse pointer a little to the right
    const mousePosition = this.robot.getMousePos();
    this.robot.moveMouse(mousePosition.x + 5, mousePosition.y);
  }
}

Add TensorFlow.js to the project

We'll be creating a KNN classifier that can be trained live in our electron app (native desktop app) with images from the webcam.

npm install @tensorflow/tfjs

npm install @tensorflow-models/knn-classifier

npm install @tensorflow-models/mobilenet

A quick reference for the KNN Classifier and MobileNet packages

Here is a quick reference to the methods that we'll be using in our app. You can always refer to tfjs-models for all the implementation details.

KNN Classifier

  • knnClassifier.create() : Returns a KNNImageClassifier.
  • .addExample(example, classIndex) : Adds an example to the specific class training set.
  • .predictClass(image) : Runs the prediction on the image, and returns an object with a top class index and confidence score.

MobileNet

  • .load() : Loads and returns a model object.
  • .infer(image, endpoint) : Get an intermediate activation or logit as Tensorflow.js tensors. Takes an image and the optional endpoint to predict through.

Finally, make it work

For this blog post, I'll set the cosmetics (the CSS, I mean) aside and concentrate only on the core functionality.

Using some boilerplate code from Teachable Machine and injecting robotjs into the app component, here is how it looks:

gist.github.com

And now, when you run npm run electron, you see me (kidding).

f:id:vivek081166:20190329181118p:plain

Let's Test it

I'll train the image classifier on me waving to the webcam (Class 2) and also with me doing nothing (Class 1).

The following actions are associated with these two classes:

Class 1: Do nothing
Class 2: Move mouse pointer slightly to the right

youtu.be

With this, your computer can learn your gestures and can perform a whole lot of different things because you have direct access to your operating system.

The source code of this project can be found at the URL below…

github.com

I have shared my simple experiment with you. Now it's your turn: try to build something with it, and consider sharing it with me as well :D

I have just scratched the surface; any enhancements or improvements to the project are welcome through GitHub pull requests.

Well, that's it for this post… Thank you for reading until the end. My name is Vivek Amilkanthawar, see you soon with another one.

I Tried Out Google's BERT!

Hello, this is Chinapa!

The other day, I used Word2vec to build a dictionary that maps words to numbers. As a follow-up, let's try out "bert" (Bidirectional Encoder Representations from Transformers), which Google released recently.

f:id:c-pattamada:20190329191856p:plain
Better than humans!

What is BERT?

First, let me explain what it actually is and why it's such a big deal.

Until now, the typical approach in natural language processing has been to read a sentence word by word, building up context from the words seen so far, so that the meaning of each new word is interpreted in the context of the words that came before it.

That was done with RNN architectures.

Last year, ELMo and BERT started using "Attention" to produce more accurate results.

For example, if you were building a chatbot, you can picture an Attention architecture as composing a reply while keeping the keywords of the previous message in mind.

BERT pushes that idea much further.

It is described as "multi-headed attention": a huge model that analyzes text while paying attention to several different places at once.

On SQuAD, one of the benchmarks for English natural language processing, it can apparently beat human performance with ease.

BERT can also be used as-is with its pretrained word representations, so let's give it a try!

Wonderful! So how do we use it?

BERT is fairly heavy, so this time I ran it on Colab.

!pip install bert-tensorflow

from google.colab import drive
drive.mount('/content/gdrive')

Grant access to your Drive, and let's fetch BERT's vocabulary data from TensorFlow Hub.

from bert.tokenization import FullTokenizer
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow_hub as hub

bert_path = "https://tfhub.dev/google/bert_multi_cased_L-12_H-768_A-12/1"  # multilingual model that also covers Japanese

sess = tf.Session()

def create_tokenizer_from_hub_module():
    """Get the vocab file and casing info from the Hub module."""
    bert_module =  hub.Module(bert_path)
    tokenization_info = bert_module(signature="tokenization_info", as_dict=True)
    vocab_file, do_lower_case = sess.run(
        [
            tokenization_info["vocab_file"],
            tokenization_info["do_lower_case"],
        ]
    )

    return FullTokenizer(vocab_file=vocab_file, do_lower_case=do_lower_case)
  
tokenizer = create_tokenizer_from_hub_module()

With this, we've created a BERT tokenizer instance. Like MeCab, it splits a string into tokens; in BERT, every kanji becomes its own single-character token.

Running tokenizer.tokenize('こんにちは、今日の天気はいかがでしょうか?') gives you

['こ',
 '##ん',
 '##に',
 '##ち',
 '##は',
 '、',
 '今',
 '日',
 'の',
 '天',
 '気',
 'は',
 '##い',
 '##か',
 '##が',
 '##で',
 '##し',
 '##ょう',
 '##か',
 '?']

which is a rather wild-looking result.

"##" が続いているような風に見える。トークンから文字列に戻そうとするとこのあたりは気をつけないと… 後は、変な分からない文字がありましたら、このように出てきます。

tokenizer.tokenize('゛')

# output => ['[UNK]']

Now, let's vectorize.

To turn tokens into vectors with BERT, three inputs are needed.

tokenizer.convert_tokens_to_ids(...)

creates the ids, but in addition to those we also need segment ids and an input mask.

input_ids is what lets BERT convert text into vectors quickly. segment_ids holds the sentence number, which lets you separate, say, a message from its reply. And input_mask is used so that padding inputs can be ignored.

f:id:c-pattamada:20190329200016p:plain
Yes, that's the stuff!

For now, let's process just a single sentence. The three vectors above can be built with a method like this.

def convert_string_to_bert_input(tokenizer, input_string, max_seq_length=128):

    tokens = []
    tokens.append("[CLS]")
    tokens.extend(tokenizer.tokenize(input_string))
    if len(tokens) > max_seq_length - 2:
        tokens = tokens[0 : (max_seq_length - 2)]
    tokens.append("[SEP]")
    
    segment_ids = [0] * len(tokens)
    input_ids = tokenizer.convert_tokens_to_ids(tokens)
    # so that the padding added below can be ignored
    input_mask = [1] * len(tokens)
    
    while len(input_ids) < max_seq_length:
        input_ids.append(0)
        input_mask.append(0)
        segment_ids.append(0)

    return (np.array(input_ids),
            np.array(input_mask),
            np.array(segment_ids))

You can see that the method above adds "[CLS]" and "[SEP]". These are the "start" token and the "sentence separator" token used by BERT; when there are multiple sentences, "[SEP]" goes between them.

Let's load the data we want to vectorize with Pandas and convert it with the method above.

data_path = ...
input_column = ...
df = pd.read_csv(data_path)

features = df[input_column].map(
                       lambda my_string:
                           convert_string_to_bert_input(tokenizer, my_string)
                       )
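The features Series above ends up holding one (input_ids, input_mask, segment_ids) tuple per row. Before feeding them into a model you'll probably want three stacked arrays instead; here is a minimal sketch of that step (the stack_bert_features helper is my own, not part of the original code):

import numpy as np

def stack_bert_features(features):
    # stack the per-row tuples into three arrays of shape (num_examples, max_seq_length)
    input_ids = np.stack([f[0] for f in features])
    input_masks = np.stack([f[1] for f in features])
    segment_ids = np.stack([f[2] for f in features])
    return input_ids, input_masks, segment_ids

train_input_ids, train_input_masks, train_segment_ids = stack_bert_features(features)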

Using BERT with Keras

All right, now for the magic of using BERT inside a Keras model.

https://github.com/strongio/keras-bert/blob/master/keras-bert.ipynb

This notebook was a very helpful reference.

from tensorflow.keras import backend as K

class BertLayer(tf.keras.layers.Layer):
    def __init__(self, n_fine_tune_layers=10, **kwargs):
        self.n_fine_tune_layers = n_fine_tune_layers
        self.trainable = False
        self.output_size = 768
        super(BertLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.bert = hub.Module(
            bert_path,
            trainable=self.trainable,
            name="{}_module".format(self.name)
        )

        trainable_vars = self.bert.variables

        # Remove unused layers
        trainable_vars = [var for var in trainable_vars if not "/cls/" in var.name]

        # Select how many layers to fine tune
        trainable_vars = trainable_vars[-self.n_fine_tune_layers :]

        # Add to trainable weights
        for var in trainable_vars:
            self._trainable_weights.append(var)
            
        for var in self.bert.variables:
            if var not in self._trainable_weights:
                self._non_trainable_weights.append(var)

        super(BertLayer, self).build(input_shape)

    def call(self, inputs):
        inputs = [K.cast(x, dtype="int32") for x in inputs]
        input_ids, input_mask, segment_ids = inputs
        bert_inputs = dict(
            input_ids=input_ids, input_mask=input_mask, segment_ids=segment_ids
        )
        result = self.bert(inputs=bert_inputs, signature="tokens", as_dict=True)[
            "pooled_output"
        ]
        return result

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_size)

This code creates a custom Keras layer and uses BERT inside it. This page is also very handy when writing your own custom Keras layers: https://keras.io/layers/writing-your-own-keras-layers/

There are three important parts.

1. build()

def build(self, input_shape):
    self.bert = hub.Module(
            bert_path,
            trainable=self.trainable,
            name="{}_module".format(self.name)
        )

Here we use TensorFlow Hub to load the model defined by bert_path.

2. call()

input_ids, input_mask, segment_ids = inputs
        bert_inputs = dict(
            input_ids=input_ids, input_mask=input_mask, segment_ids=segment_ids
        )
        result = self.bert(inputs=bert_inputs, signature="tokens", as_dict=True)[
            "pooled_output"
        ]

Here we turn the inputs we prepared earlier into a dict and ask the BERT model for the result. At the moment only signature="tokens" is supported (as of 29/03/2019). Besides "pooled_output" there is also "sequence_output"; the pooled output is the one we use for building our vectors here, so that's what we go with.

3. output size

self.output_size = 768

This number depends on the model you're using: each layer of the base model comes back as 768 dimensions, while the Large model returns 1024.

So we put the three arrays we created earlier into a dict and get the result from self.bert. The layer above can then be used in an ordinary Keras model.

Example:

def get_model(max_seq_length=128, num_classes=5):
    input_ids = tf.keras.layers.Input(shape=(max_seq_length,), name="input_ids")
    in_mask = tf.keras.layers.Input(shape=(max_seq_length,), name="input_masks")
    in_segment = tf.keras.layers.Input(shape=(max_seq_length,), name="segment_ids")
    bert_inputs = [input_ids, in_mask, in_segment]
    
    bert_output = BertLayer(n_fine_tune_layers=1)(bert_inputs)
    dense = tf.keras.layers.Dense(128, activation='relu')(bert_output)
    pred = tf.keras.layers.Dense(num_classes, activation='sigmoid')(dense)
    
    model = tf.keras.models.Model(inputs=bert_inputs, outputs=pred)
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.summary()
    return model

Give it a try!
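Here is a rough usage sketch (the train_* arrays come from the preprocessing step above; train_labels is an assumed one-hot label array, not something defined in this post). Because BertLayer wraps a TF1-style hub.Module, its variables and lookup tables need explicit initialization before training:

import tensorflow as tf
from tensorflow.keras import backend as K

K.set_session(sess)  # reuse the session created earlier
model = get_model(max_seq_length=128, num_classes=5)

# the hub module's variables and lookup tables need explicit initialization in TF1
sess.run(tf.global_variables_initializer())
sess.run(tf.tables_initializer())

model.fit(
    [train_input_ids, train_input_masks, train_segment_ids],
    train_labels,  # assumed one-hot labels, shape (num_examples, num_classes)
    epochs=1,
    batch_size=32,
)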

Summary

In this post, I gave a brief overview of how natural language processing has been done up to now and a quick explanation of BERT. We then converted strings into the form BERT expects, so that they can be used in a model.

There are still parts of this that I haven't fully understood myself, so I'd love to hear your feedback!

I Built a Dictionary with Word2Vec

Hello! This is Chinapa! Japanese words are hard, and I figured I needed a dictionary. I use https://jisho.org a lot, but it's rather hard to explain it to a computer...

OK, that was a joke.

In natural language processing, we need a "dictionary" for Japanese (or any other language). You may already know what kind of "dictionary" I mean: one that converts words into numeric vectors of a fixed dimension.

f:id:c-pattamada:20190325201950j:plain
A vector dictionary?!

This can be done at the character level, at the word level, and more recently even at the sentence level.

For the character and word levels there are methods such as GloVe and Word2Vec. Today, let's build a simple word-level dictionary in Python with Word2Vec.

Note: in natural language processing, optimizing the dictionary can noticeably improve your results, but for now this will be good enough for a demo.

What you'll need

  • About 2-3 GB of free space on your machine (you could use the cloud or Colab, but this time I'll work locally)
  • The following libraries:
    • MeCab, wget, gensim, jaconv
import MeCab
import wget
from gensim.corpora import WikiCorpus
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence
import jaconv
from multiprocessing import cpu_count
from typing import List
import os
VECTORS_SIZE = 50

wiki_file_name = 'jawiki-latest-pages-articles1.xml-p1p106175.bz2'
ja_wiki_latest_url = 'https://dumps.wikimedia.org/jawiki/latest/jawiki-latest-pages-articles1.xml-p1p106175.bz2'

# The wiki file above is only a partial dump.
# Other dumps can be found at the url below:
# https://dumps.wikimedia.org/jawiki/latest/ 

For a character-level dictionary, around 20 dimensions may be enough, but at the word level I recommend at least 50. Below that, in my experience, the vectors often fail to capture the meaning of the words.

Getting the data

First of all, we need a lot of data, so let's download Wikipedia!

if not os.path.isfile(wiki_file_name):
    wget.download(ja_wiki_latest_url, bar=wget.bar_adaptive)

This time I'm downloading only a part of it, which took about 15 minutes. If you download all the articles, expect around 3 GB and roughly 3 hours.

In Japanese, the same word or character can appear in both full-width and half-width forms. (In romaji, upper and lower case usually carry the same meaning, as in "Water" and "water".) Here we use the jaconv library to normalize everything to full-width.

def normalize_text(text):
    return jaconv.h2z(text, digit=True, ascii=True, kana=True).lower()

If you want to normalize to half-width instead, you can use the jaconv.z2h method. When you later use the dictionary, you need to normalize your input the same way.

Using the method above, let's convert the Wikipedia data we downloaded earlier to full-width and save it to a plain txt file.

wiki_text_file_name = 'wiki.txt'

def read_wiki(wiki_data, save_file):
    if os.path.isfile(save_file):
        print('Skipping reading wiki file...')
        return
    with open(save_file, 'w') as out:
            wiki = WikiCorpus(wiki_data, lemmatize=False, dictionary={}, processes=cpu_count())
            wiki.metadata = True
            texts = wiki.get_texts()
            for i, article in enumerate(texts):
                text = article[0]  # article[1] is the article title
                sentences = [normalize_text(line) for line in text]
                text = ' '.join(sentences) + u'\n'
                out.write(text)
                if i % 1000 == 0 and i != 0:
                    print('Logged', i, 'articles')
    print('wiki保存完了')


read_wiki(wiki_file_name, wiki_text_file_name)

Here, gensim's WikiCorpus makes parsing the dump easy. Now that we have data to train on, let's move to the next step.

Creating word tokens

This is where MeCab comes in.

def get_words(text: str, mt: MeCab.Tagger) -> List[str]:
    mt.parse('')
    parsed = mt.parseToNode(text)
    components = []
    while parsed:
        if len(parsed.surface) >= 1:  # this is here to skip EOS nodes
            components.append(parsed.surface)
        parsed = parsed.next
    return components

The get_words method splits text into words (tokens). The result of MeCab's parseToNode(text) has both surface and feature, but here we only use surface.

Now let's tokenize the Wikipedia data we created earlier.

token_file = 'tokens.txt'

def tokenize_text(input_filename: str, output_filename: str, mt: MeCab.Tagger):
    lines = 0
    if os.path.isfile(output_filename):
        lines = count_lines(output_filename)  # when resuming, check how many lines to skip
    batch = []
    with open(input_filename, 'r') as data:
        for i, text in enumerate(data.readlines()):
            if i < lines:
                continue
            tokenized_text = ' '.join(get_words(text, mt))
            batch.append(tokenized_text)
            if i % 10000 == 0 and i != 0:
                write_tokens(batch, output_filename)
                batch = []
                print('Tokenized ,', i, 'lines')
    write_tokens(batch, output_filename)
    print('トーケン作成完了')


def write_tokens(batch: List[str], file_name: str):
    with open(file_name, 'a+') as out:
        for out_line in batch:
            out.write(out_line)
            out.write('\n')

            
def count_lines(file: str) -> int:
    count = 0
    with open(file) as d:
        for line in d:
            count += 1
    return count
    
tagger = MeCab.Tagger('-d /usr/local/lib/mecab/dic/mecab-ipadic-neologd')  # if you don't have neologd, another dictionary is fine too
tokenize_text(wiki_text_file_name, token_file, tagger)

Each Wikipedia article is converted into tokens separated by spaces and written back out to a file. Keeping everything in memory is an option, but it often eats a lot of RAM, so I write to a file instead.

Vectorization

At last it's time to create the vectors. The code itself is simple.

vector_file = 'ja-MeCab-50.data.model'
def generate_vectors(input_filename, output_filename):
    if os.path.isfile(output_filename):
        return
    model = Word2Vec(LineSentence(input_filename),
                     size=VECTORS_SIZE, window=5, min_count=5,
                     workers=cpu_count(), iter=5)
    model.save(output_filename)
    print('ベクトル作成完了。')

generate_vectors(token_file, vector_file)

Gensim's Word2Vec creates the 50-dimensional vectors defined above. min_count is the minimum number of occurrences a word needs in order to be included in the dictionary; adjust it to suit your data.

Creating the vectors also takes quite a while, so if you run it locally I recommend letting it train overnight. Alternatively, it may be better to run it on Google Datalab or AWS.

Time to try it out!

Now that we've built it, how do we use it?

from gensim.models import Word2Vec
import pprint

model = Word2Vec.load(vector_file)
model = model.wv

pprint.pprint(model['東京'])
pprint.pprint(model.most_similar(positive='東京', topn=5))

Give it a try!

f:id:c-pattamada:20190326105354p:plain
The results of my test

Summary

In this post I explained how to build a vector dictionary: we trained on Wikipedia data, tokenized it with MeCab, and finally created a Word2Vec vector dictionary.

The resulting vectors can also be used as an Embedding in a neural network, but that's a topic for another time.
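(As a small preview, here is a minimal sketch, not from the original post, of how the trained vectors could initialize a Keras Embedding layer. It assumes the gensim 3.x API used above; the helper name and the trainable flag are my own choices.)

import numpy as np
from tensorflow.keras.layers import Embedding

def build_embedding_layer(wv, trainable=False):
    # row 0 is reserved for padding; word i of the vocabulary goes to row i + 1
    vocab_size = len(wv.index2word) + 1
    matrix = np.zeros((vocab_size, wv.vector_size))
    word_index = {}
    for i, word in enumerate(wv.index2word, start=1):
        matrix[i] = wv[word]
        word_index[word] = i
    layer = Embedding(vocab_size, wv.vector_size, weights=[matrix], trainable=trainable)
    return layer, word_index

# `model` here is the KeyedVectors object loaded in the try-out section above
embedding_layer, word_index = build_embedding_layer(model)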

The top photo is from: https://www.pexels.com/photo/black-and-white-book-business-close-up-267669/

The code for this post is here: gist.github.com

This was also a helpful reference: GitHub - philipperemy/japanese-words-to-vectors: Word2vec (word to vectors) approach for Japanese language using Gensim and Mecab.

Audio Classification using AutoML Vision

For a given audio dataset, can we do audio classification using spectrograms? Well, let's try it out ourselves, and let's use Google AutoML Vision to fail fast :D

We'll be converting our audio files into their respective spectrograms and using the spectrograms as images for our classification problem.

Here is the formal definition of the Spectrogram

A Spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time.

For this experiment, I'm going to use the following audio dataset from Kaggle

www.kaggle.com

Go ahead and download the dataset. (Caution: the dataset is over 5 GB, so you need to be patient while you perform any action on it. For my experiment, I rented a Linux virtual machine on Google Cloud Platform (GCP), and I'll be performing all the steps from there. Moreover, you need a GCP account to follow this tutorial.)

Step 1: Download the Audio Dataset

Training Data (4.1 GB)

curl https://zenodo.org/record/2552860/files/FSDKaggle2018.audio_train.zip?download=1 --output audio_train.zip

unzip audio_train.zip

Test Data (524 MB)

curl https://zenodo.org/record/2552860/files/FSDKaggle2018.audio_test.zip?download=1 --output audio_test.zip

unzip audio_test.zip

Metadata (150 KB)

curl https://zenodo.org/record/2552860/files/FSDKaggle2018.meta.zip?download=1 --output meta_data.zip

unzip meta_data.zip

After downloading and unzipping, you should have the following things in your folder.
(Note: I have renamed the folders after unzipping.)

f:id:vivek081166:20190325191702p:plain

Step 2: Generate Spectrograms

Now that we have our audio data in place, let's create spectrograms for each audio file.

We'll need FFmpeg to create spectrograms of audio files

ffmpeg.org

Install FFmpeg using the following command

sudo apt-get install ffmpeg

Try it out yourself… go into a folder that has an audio file and run the following command to create its spectrogram:

ffmpeg -i audioFileName.wav -lavfi showspectrumpic=s=1024x512 anyName.jpg

For example, "00044347.wav" from training dataset will sound like this

clyp.it

and the spectrogram of "00044347.wav" looks like this

f:id:vivek081166:20190326105505j:plain

As you can see, the red areas show the loudness of the different frequencies present in the audio file, represented over time. In the above example, you heard a hi-hat: the first part of the file is loud, then the sound fades away, and the same can be seen in its spectrogram.

The ffmpeg command above creates a spectrogram with a legend; however, we do not need the legend for image processing, so let's drop it and create plain spectrograms for all of our audio files.

Use the following shell script to convert all your audio files into their respective spectrograms.
(Create and run the shell script at the directory level where the "audio_data" folder is present.)

gist.github.com
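In case the embedded script doesn't load, here is a minimal Python sketch that does the same job by calling ffmpeg through subprocess (the folder names are assumptions; adjust them to your layout):

import os
import subprocess

input_dir = 'audio_data/audio_train'
output_dir = 'spectro_data'
os.makedirs(output_dir, exist_ok=True)

for file_name in os.listdir(input_dir):
    if not file_name.endswith('.wav'):
        continue
    out_path = os.path.join(output_dir, file_name.replace('.wav', '.jpg'))
    # legend=0 drops the legend so we get a plain spectrogram
    subprocess.run([
        'ffmpeg', '-y', '-i', os.path.join(input_dir, file_name),
        '-lavfi', 'showspectrumpic=s=1024x512:legend=0',
        out_path,
    ], check=True)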

I have moved all the generated image files into the folder "spectro_data".

f:id:vivek081166:20190325192534p:plain

Step 3: Move image files to Storage

Now that we have generated spectrograms for our training audio data, let's move all these image files to Google Cloud Storage (GCS); from there we will use them in the AutoML Vision UI.

Use the following command to copy image files to GCS

gsutil cp spectro_data/* gs://your-bucket-name/spectro-data/

f:id:vivek081166:20190325192818p:plain

Step 4: Prepare file paths and their label

I have created the following CSV file using the metadata that we downloaded earlier. Removing all the other columns, I have kept only the image file location and its label, because that's all AutoML needs.

f:id:vivek081166:20190325193037p:plain

docs.google.com

You will have to put this CSV file on your Cloud Storage where the other data is stored.
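Here is a sketch of how such a CSV could be assembled with pandas (the metadata file name, column names, and bucket path are assumptions; adjust them to match your download and your bucket):

import pandas as pd

meta = pd.read_csv('meta_data/train.csv')  # assumed to contain 'fname' and 'label' columns

bucket_prefix = 'gs://your-bucket-name/spectro-data/'
automl_df = pd.DataFrame({
    'image_path': bucket_prefix + meta['fname'].str[:-4] + '.jpg',  # xxx.wav -> xxx.jpg
    'label': meta['label'],
})

# AutoML Vision expects rows of "gs://path,label" with no header
automl_df.to_csv('spectro_data.csv', index=False, header=False)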

Step 5: Create a new Dataset and Import Images

Go to AutoML Vision UI and create a new dataset

cloud.google.com

f:id:vivek081166:20190325193335p:plain

Enter a dataset name of your choice, and for importing images, choose the second option, "Select a CSV file on Cloud Storage", and provide the path to the CSV file on your Cloud Storage.

f:id:vivek081166:20190325193453p:plain

The process of importing images may take a while, so sit back and relax. You'll get an email from AutoML once the import is completed.

After the image data has been imported, you'll see something like this:

f:id:vivek081166:20190325193633p:plain

Step 6: Start Training

This step is super simple… just verify your labels and start training. All the uploaded images will be automatically divided into training, validation, and test sets.

f:id:vivek081166:20190325193844p:plain

Give a name to your new model and select a training budget.
For our experiment, let's select 1 node hour (free*) as the training budget, start training the model, and see how it performs.

f:id:vivek081166:20190325193935p:plain

Now wait again for the training to complete. You'll receive an email once it's done, so you may leave the screen and come back later; meanwhile, let the model train.

f:id:vivek081166:20190325194009p:plain

Step 7: Evaluate

and here are the results…

f:id:vivek081166:20190325194207p:plain

Hurray! With very minimal effort, our model did pretty well.

f:id:vivek081166:20190325194242p:plain

Congratulations! With only a few hours of work, and with the help of AutoML Vision, we can now be fairly confident that classifying audio files by their spectrograms works with a machine-learning vision approach. With this conclusion, we can now build our own vision model using a CNN, tune its parameters, and produce more accurate results.

Or, if you don't want to build your own model, go ahead and train the same model with more node hours, and use the instructions given in the PREDICT tab to use your model in production.

That's it for this post. I'm Vivek Amilkanthawar from Goalist. See you soon with another one; until then, Happy Learning :)

goalist.co.jp

Choosing a Deep Learning Framework

Implementing deep learning algorithms from scratch using Python and NumPy is a good way to get an understanding of the basic concepts, and to understand what these deep learning algorithms are really doing by unfolding the deep learning black box.

However, as you start to implement very large or more complex models, such as convolutional neural networks (CNNs) or recurrent neural networks (RNNs), it becomes increasingly impractical, at least for most people like me, to implement everything yourself from scratch.

You may understand how matrix multiplication works and be able to implement it in your code, but as you build very large applications, you'll probably not want to implement your own matrix multiplication function; instead, you'll want to call a numerical linear algebra library that can do it more efficiently for you. Right?

The efficiency of your algorithm will help you fail fast 😃 and thus help you complete each iteration through the IDEA -> EXPERIMENT -> CODE cycle much more quickly.

f:id:vivek081166:20190228134309p:plain

I think this is crucially important when you are in the middle of Deep Learning pipeline.

f:id:vivek081166:20190228134333p:plain

So let's take a look at the frameworks out there…

Today, there are many deep learning frameworks that make it easy for you to implement neural networks, and here are some of the leading ones.

f:id:vivek081166:20190228134407p:plain

Each of these frameworks has a dedicated user and developer community, and I think each of them is a credible choice for some subset of applications. However, when I see the graph below, my obvious choice is TensorFlow.

f:id:vivek081166:20190228134323p:plain

Well, I just said that I would choose TensorFlow. But do the popularity scores above really matter that much when choosing a framework for your deep learning project? It turns out they don't…

Many of these frameworks are evolving and getting better very rapidly. A framework that tops the popularity charts in 2018 may not hold the same position by the end of 2019.

There are a lot of articles comparing these deep learning frameworks and how quickly they change. Because they often evolve and improve month to month, I'll leave you to do a few internet searches yourself if you want to see the arguments on the pros and cons of some of them.

So, how can you make a decision about which framework to use?

Rather than strongly endorsing any of these frameworks, I would like to share three factors that Stanford Professor Andrew Ng considers important enough to influence your decision.

1) Ease of programming

This includes developing, iterating, and finally, deploying your neural network to production where it may be used by millions of users.

2) Running Speeds

Training on large data sets can take a lot of time, and differences in training speed between frameworks can make your workflow a lot more time efficient.

3) Openness

This last criterion is not often discussed, but Andrew Ng believes it is also very important. A truly open framework must be open source, of course, but must also be governed well.
So it is important to use a framework from a company that you can trust. As more people start to use the software, the company should not gradually close off what was open source, or move the functionality into its own proprietary cloud services.

But at least in the short term, depending on your language preferences, whether you prefer Python or Java or C++ or something else, and depending on what application you're working on, whether that's computer vision, natural language processing, online advertising, or something else, I think several of these frameworks could be a good choice.

f:id:vivek081166:20190228135807p:plain

So that was just a high-level view of deep learning programming frameworks. Any of these frameworks can make you more efficient as you develop machine learning applications.

In a subsequent post, we'll take a step from zero → one to learn TensorFlow

That's it for this post. My name is Vivek from Goalist. See you soon with another one; until then, Happy Learning :)

goalist.co.jp

Playing with a Word-Sentiment Dictionary, and Drawing Lots of Pretty Graphs

Hello everyone. This is Chinapa from Goalist!

f:id:c-pattamada:20190228110449p:plain

Apparently this is my character.

A while back, around here http://developers.goalist.co.jp/entry/2018/11/16/150000 , there was an article about guessing the emotion on a face with machine learning. This time, the article is not about the feelings of faces but about the feelings of words.

To understand the sentiment of a document, we first have to understand the sentiment of its words. Today we'll look at a "dictionary" that is often used for sentiment analysis.

We'll be using this dictionary:

http://www.lr.pi.titech.ac.jp/~takamura/pubs/pn_ja.dic

There is also an English version of the dictionary, so let's take a look at that too!

http://www.lr.pi.titech.ac.jp/~takamura/pubs/pn_en.dic

Each entry in this dictionary has the word, its reading, its part of speech, and a number between 1 and -1. The number is meant to express how "positive" or "negative" the word is. Let's take a look at what's inside!

First, open it up

The imports we need are as follows:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

Download the files from the URLs above and load them into Python.

english_words = pd.read_csv(en_dict_path, sep=':', names=['words', 'type', 'score'], header=None)
jp_words = pd.read_csv(jp_dict_path, sep=':', names=['words', 'reading', 'type', 'score'], header=None)

Let's get a rough idea of what's in there.

print(jp_words.type.unique())
# => ['動詞' '形容詞' '名詞' '副詞' '助動詞']

print(jp_words.groupby('type').type.count())
# =>

type
副詞      1207
助動詞        2
動詞      4252
名詞     48999
形容詞      665

Pandas makes this fairly easy. In short, nouns are overwhelmingly dominant, and the auxiliary verbs look like they can safely be ignored.

Let's draw some graphs!

(With the code above, the column names are set as well.)

sns.set(style='ticks', palette='Set2')
sns.despine()

And with that, we've configured the graphs to come out looking nice.

Now let's look at the overall distribution of the data.

sns.distplot(jp_words.score)

f:id:c-pattamada:20190228003327p:plain
Sentiment distribution of Japanese words

Oh, that's surprising! Overall, the words lean to the left. I wonder why. Let's compare with English!

sns.distplot(english_words.score)

f:id:c-pattamada:20190228003247p:plain
Sentiment distribution of English words

This one isn't skewed, but it is very concentrated in the middle. So it seems Japanese has many words carrying a slightly negative nuance. If I had to guess at a reason, perhaps the original sample of Japanese text this dictionary was built from contained a lot of negative sentiment… that's the only explanation that comes to mind.

Finally, let's look at one more axis. There were five parts of speech, and I'd like to see how each of them is distributed as well.

Graphs split by part of speech

In other words, I want to see what the distribution looks like for each category: verbs, nouns, and so on.

g = sns.FacetGrid(jp_words, hue='type', height=6)
g.map(plt.hist, 'score')
g.add_legend()
new_labels = ['verb', 'adj', 'adv', 'ex']
for t, l in zip(g._legend.texts, new_labels): t.set_text(l)

f:id:c-pattamada:20190228014530p:plain
So many nouns!

Ugh… so many nouns! You can't see anything else; you can only faintly sense the adverbs near the bottom. Of course the noun distribution looks almost exactly like the overall one, since nouns dominate the counts. No helping it, let's redraw the graph without the nouns.

g = sns.FacetGrid(jp_words[jp_words.type != '名詞'], hue='type', height=6)

You can do this by editing the code above as shown.

f:id:c-pattamada:20190228014611p:plain
The plot without nouns

I see. Adjectives turn out to be quite extreme, and verbs are fairly numerous and sit a little further to the left than nouns.

Conclusion

With this, I think we can use pandas and seaborn to get a better grasp of the data we have. The next step is to ask questions like "why are the words skewed to the left?" and dig into them. New questions will probably come up along the way, and you'll become able to judge how to really make use of this dictionary in your project, though the answer will differ depending on the time and situation. It's like a journey: use "data" to get "information", and use "information" to make "decisions". I think that's the heart of data science.

Let's take this journey together~

Data Visualization in Python

こんにちは、 ゴーリストのビベックです。
Hello World! This is Vivek from Goalist.

If you want to build a very powerful machine learning algorithm on structured data, then the first step is to explore the data every which way you can. You draw a histogram; you draw a correlogram; you draw cross plots. You really want to understand what is in that data: what each variable means, what its distribution looks like, and ideally how it was collected.

Once you have a real rock solid understanding of what's in the data, only then can you smoothly move into creating your machine learning model.

f:id:vivek081166:20190227141304j:plain

In this post, let's go through the different libraries in Python and gain some insight into our data by visualizing it.

Along with me, you may want to try and experiment with the packages that we are about to explore. Use Google Colab to follow along.

Well, you may ask what is Google Colab?

Colaboratory is a free Jupyter notebook environment that requires no setup and runs entirely in the cloud.
With Colaboratory you can write and execute code, save and share your analyses, and access powerful computing resources, all for free from your browser.

Learn more about Google Colab here…

youtu.be

Follow this URL to create your notebook

colab.research.google.com

So let's get started…
We'll be using the following packages to plot different graphs

  • Matplotlib
  • Seaborn
  • Bokeh

Out of these, Matplotlib is the most common charting package, see its documentation for details, and its examples for inspiration.

1) Line Graphs

A line graph is commonly used to show trends over time. It can help readers to understand your data in a few seconds.

f:id:vivek081166:20190227145617p:plain

Sample code to generate a Line Graph is given below. Go ahead and open the sample code in Colab and experiment with it.

gist.github.com
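If the embedded gist doesn't load for you, here is a minimal line-graph sketch with made-up data:

import matplotlib.pyplot as plt

years = [2014, 2015, 2016, 2017, 2018]
sales = [120, 150, 170, 160, 200]

plt.plot(years, sales, marker='o')
plt.xlabel('Year')
plt.ylabel('Sales')
plt.title('Sales over time')
plt.show()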

2) Bar Charts

Bar graphs, also known as column charts, use vertical or horizontal bars to represent data along both an x-axis and a y-axis visually. Each bar represents one value. When the bars are stacked next to one another, the viewer can compare the different bars, or values, at a glance.

f:id:vivek081166:20190227201142p:plain

Sample code to generate a Bar Chart and 3D Bar Chart is given below.

gist.github.com
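A minimal bar-chart sketch along the same lines (made-up data; the 3D variant from the gist is left out here):

import matplotlib.pyplot as plt

languages = ['Python', 'Java', 'C++', 'JavaScript']
users = [50, 40, 30, 45]

plt.bar(languages, users, color='steelblue')
plt.ylabel('Users (millions)')
plt.title('Users per language')
plt.show()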

3) Histogram

A histogram is used to summarize discrete or continuous data. In other words, it provides a visual interpretation of numerical data by showing the number of data points that fall within a specified range of values (called "bins"). 

It is similar to a vertical bar graph. However, a histogram, unlike a vertical bar graph, shows no gaps between the bars.

f:id:vivek081166:20190227201314p:plain

gist.github.com
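A minimal histogram sketch with random data and 20 bins:

import numpy as np
import matplotlib.pyplot as plt

values = np.random.normal(loc=0, scale=1, size=1000)

plt.hist(values, bins=20, edgecolor='black')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()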

4) Scatter Plots

A scatter plot is a graph in which the values of two variables are plotted along two axes, with the pattern of the resulting points revealing any correlation present.

f:id:vivek081166:20190227201410p:plain

Sample code to generate a Scatter Plot and 3D Scatter Plot is given below.

gist.github.com
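A minimal scatter-plot sketch with two correlated random variables (the 3D variant from the gist is left out here):

import numpy as np
import matplotlib.pyplot as plt

x = np.random.rand(100)
y = 2 * x + np.random.normal(scale=0.1, size=100)

plt.scatter(x, y, alpha=0.7)
plt.xlabel('x')
plt.ylabel('y')
plt.show()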

5) Pie Charts

A pie chart (or a circle chart) is a circular statistical graphic, which is divided into slices to illustrate numerical proportion. In a pie chart, the arc length of each slice (and consequently its central angle and area), is proportional to the quantity it represents.

f:id:vivek081166:20190227201516j:plain
(A Donut Chart is a variation of a Pie Chart but with a space in the center.)

gist.github.com
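A minimal pie-chart sketch with made-up proportions:

import matplotlib.pyplot as plt

labels = ['Chrome', 'Safari', 'Firefox', 'Other']
share = [60, 20, 10, 10]

plt.pie(share, labels=labels, autopct='%1.1f%%', startangle=90)
plt.axis('equal')  # draw the pie as a circle
plt.show()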

6) Wireframe Plots

Wireframe plots are used to graphically represent skeletal sketches of functions defined over a rectangular grid. Geographic data is an example of where this type of graph would be used.

f:id:vivek081166:20190227201611j:plain

gist.github.com
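A minimal wireframe sketch over a rectangular grid:

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401, registers the 3D projection

x = np.linspace(-5, 5, 50)
y = np.linspace(-5, 5, 50)
X, Y = np.meshgrid(x, y)
Z = np.sin(np.sqrt(X ** 2 + Y ** 2))

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_wireframe(X, Y, Z, rstride=2, cstride=2)
plt.show()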

7) Regplot

A regplot is a simple scatterplot with a nice regression line fit to it.
We'll use Seaborn's regplot function to draw it.

f:id:vivek081166:20190227202106p:plain

Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
Here are some more examples of Seaborn for inspiration

gist.github.com
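A minimal regplot sketch using Seaborn's built-in "tips" dataset:

import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset('tips')
sns.regplot(x='total_bill', y='tip', data=tips)
plt.show()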

8) HeatMap

A heat map is a graphical representation of data where the individual values contained in a matrix are represented as colors. GitHub's contribution calendar is an example of a heatmap.

f:id:vivek081166:20190227202430p:plain

gist.github.com
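A minimal heatmap sketch: the correlation matrix of Seaborn's built-in "iris" dataset:

import seaborn as sns
import matplotlib.pyplot as plt

iris = sns.load_dataset('iris')
corr = iris.drop(columns='species').corr()

sns.heatmap(corr, annot=True, cmap='YlGnBu')
plt.show()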

9) Bubble Chart

A bubble chart is similar to a scatter plot in that it can show distribution or relationship. There is a third data set, which is indicated by the size of the bubble or circle.
f:id:vivek081166:20190227202519p:plain

To draw an interactive bubble chart, let's use Bokeh.

Bokeh is an interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of versatile graphics, and to extend this capability with high-performance interactivity over very large or streaming datasets. Bokeh can help anyone who would like to quickly and easily create interactive plots, dashboards, and data applications.

gist.github.com
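A minimal interactive bubble-chart sketch with Bokeh and made-up data; the circle size encodes the third variable:

import numpy as np
from bokeh.plotting import figure, output_file, show

x = np.random.rand(30)
y = np.random.rand(30)
size = np.random.rand(30) * 30 + 5  # third data set, shown as bubble size

output_file('bubble.html')
p = figure(title='Bubble chart', tools='pan,wheel_zoom,reset,hover')
p.circle(x, y, size=size, alpha=0.5)
show(p)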

That's all for this post. See you soon with another one; until then,
Happy Learning :)