The third "insight" runs from top to bottom: the topmost band is orange, the next is blue, the next orange, and the bottom one blue. Within this, the neuron also seems to have noticed that when blue leans toward the upper left, orange leans toward the upper right. Interesting.
If you're curious how this came about, hover over the lines going into this neuron and you'll see something like "weight is 〇〇".
Here, the graph is built as (-0.33 x X1) + (0.3 x X2) + (sin(X1) x 1.6) + (sin(X2) x 2.7), but let's leave the math at that.
// Feed raw audio files directly into the deep neural network without any feature extraction. //
As you may have observed, conventional audio and speech analysis systems are typically built with a pipeline structure, where the first step is to extract various low-dimensional hand-crafted acoustic features (e.g., MFCC, pitch, RMSE, chroma, and so on).
Although hand-crafted acoustic features are typically well designed, it is still not possible to retain all useful information, due to human knowledge bias and the high compression ratio. And of course, the feature engineering you will have to perform will depend on the type of audio problem that you are working on.
But how about learning directly from raw waveforms (i.e., feeding raw audio files directly into the deep neural network)?
In this post, let's take the learnings from this paper and try to apply them to the following Kaggle dataset.
Each file in the downloaded dataset is labelled either "normal", "unlabelled", or one of the various categories of abnormal heartbeats.
Our objective here is to solve the heartbeat classification problem by directly feeding raw audio files to a deep neural network without doing any hand-crafted feature extraction.
Prepare Data
Let's prepare the data to make it easily accessible to the model.
extract_class_id(): Each audio file name contains its label, so let's separate the files based on their names and assign each one a class id. For this experiment, let's treat "unlabelled" as a separate class. So, as shown above, we'll have 5 classes in total.
convert_data(): We'll normalize the raw audio data and also make all audio files of equal length by cutting them down to 10 s; if a file is shorter than 10 s, we pad it with zeros. For each audio file, we finally put the class id, sampling rate, and audio data together and dump them into a .pkl file, making sure while doing this to keep a proper split between the train and test datasets. A rough sketch of both helpers is shown below.
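Here is a minimal Python sketch of what these two helpers might look like. The class names, file-naming scheme, sampling rate, 80/20 split, and the use of librosa for loading are all assumptions made for illustration; the original implementation may differ.

import os
import pickle
import numpy as np
import librosa

CLASSES = ['normal', 'murmur', 'extrahls', 'artifact', 'unlabelled']  # assumed class names

def extract_class_id(file_name):
    # The label is assumed to be embedded in the file name, e.g. "normal__201101070538.wav"
    for class_id, class_name in enumerate(CLASSES):
        if class_name in file_name:
            return class_id
    return CLASSES.index('unlabelled')

def convert_data(file_path, class_id, target_seconds=10, sr=16000):
    # Load and peak-normalize the waveform, then force it to exactly `target_seconds` of audio
    audio, sr = librosa.load(file_path, sr=sr)
    audio = audio / (np.max(np.abs(audio)) + 1e-9)
    target_len = target_seconds * sr
    if len(audio) > target_len:
        audio = audio[:target_len]                           # cut files longer than 10 s
    else:
        audio = np.pad(audio, (0, target_len - len(audio)))  # zero-pad shorter files
    return {'class_id': class_id, 'sr': sr, 'audio': audio}

def prepare_dataset(audio_dir, out_prefix='heartbeat'):
    # Dump all records into train/test pickle files (an 80/20 split is assumed)
    records = [convert_data(os.path.join(audio_dir, f), extract_class_id(f))
               for f in sorted(os.listdir(audio_dir)) if f.endswith('.wav')]
    split = int(0.8 * len(records))
    with open(out_prefix + '_train.pkl', 'wb') as fp:
        pickle.dump(records[:split], fp)
    with open(out_prefix + '_test.pkl', 'wb') as fp:
        pickle.dump(records[split:], fp)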
As written in the research paper, this architecture takes input time-series waveforms, represented as a long 1D vector, instead of hand-tuned features or specially designed spectrograms.
There are many models with different complexities explained in the paper. For our experiment, we will use the m5 model.
The m5 model has 4 convolutional layers, each followed by batch normalization and pooling. A keras.callbacks callback is also assigned to the model to reduce the learning rate if the accuracy does not improve over 10 epochs; a hedged sketch of the architecture and this callback is shown below.
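Purely as a reference, here is roughly what such an m5 model could look like in Keras. The exact filter counts, kernel sizes, optimizer, and input length (10 s at the 16 kHz assumed above) are assumptions based on the paper's description of m5, not the code used in the original experiment.

from tensorflow.keras import layers, models, optimizers, callbacks

def build_m5(input_length=16000 * 10, num_classes=5):
    model = models.Sequential()
    # First block: wide kernel with a large stride over the raw waveform
    model.add(layers.Conv1D(128, 80, strides=4, padding='same', input_shape=(input_length, 1)))
    model.add(layers.BatchNormalization())
    model.add(layers.Activation('relu'))
    model.add(layers.MaxPooling1D(4))
    # Three more conv blocks: Conv1D -> BatchNorm -> ReLU -> MaxPool
    for filters in (128, 256, 512):
        model.add(layers.Conv1D(filters, 3, padding='same'))
        model.add(layers.BatchNormalization())
        model.add(layers.Activation('relu'))
        model.add(layers.MaxPooling1D(4))
    model.add(layers.GlobalAveragePooling1D())
    model.add(layers.Dense(num_classes, activation='softmax'))
    model.compile(optimizer=optimizers.Adam(1e-3),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# Reduce the learning rate when accuracy plateaus for 10 epochs, as described above
lr_callback = callbacks.ReduceLROnPlateau(monitor='accuracy', patience=10, factor=0.5, verbose=1)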
Let's start training our model and see how it performs on the heartbeat sound dataset.
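The training call itself could be as simple as the sketch below. The batch size and the step that loads the pickled data into NumPy arrays are assumptions; the 400 epochs and the learning-rate callback follow the description in this post.

# x_train / y_train and x_test / y_test are assumed to be NumPy arrays built from the
# .pkl files, with the audio shaped as (num_files, 160000, 1)
model = build_m5(num_classes=5)
history = model.fit(x_train, y_train,
                    validation_data=(x_test, y_test),
                    epochs=400,
                    batch_size=32,
                    callbacks=[lr_callback])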
As per the above code, the model will be trained over 400 epochs; however, the loss curve flattened out at around 42 epochs for me, and these were the results. How did yours do?
Congratulations! You've saved a lot of the time and effort of extracting features from audio files. Moreover, by directly feeding in the raw audio files, the model is doing pretty well.
With this, we learned how to feed raw audio files to a deep neural network. Now you can take this knowledge and apply it to the audio problem that you want to solve. You just need to collect audio data, normalize it, and feed it to your model.
The above code is available at the following GitHub repository:
// Teach your computer to recognize gestures and trigger a set of actions to perform after a certain gesture is recognized.//
Hello World! I'm very excited to share with you my recent experiment, wherein I tried to teach my computer certain gestures so that whenever those gestures are recognized, certain actions are performed.
In this blog, I'll explain all you need to do to achieve the following:
IF: I wave to my webcam THEN: move the mouse pointer a little to its right
I have used the power of Node.js to achieve this. The idea is to create a native desktop app that has access to the operating system to perform certain actions, like a mouse click or a keyboard key press, and, within the same native desktop app, to train our model and draw inferences locally.
To make it work, I thought of using tensorflow.js and robotjs in an Electron App created using Angular.
So, are you ready? Let's get started…
Generate the Angular App
Let's start by creating a new Angular project from scratch using the angular-cli
npm install -g @angular/cli
ng new teachable-desktop-automation
cd teachable-desktop-automation
Install Electron
Add Electron and also add its type definitions to the project as dev-dependency
npm install electron --save-dev
npm install @types/electron --save-dev
Configuring the Electron App
Create a new directory inside of the project's root directory and name it "electron". We will use this folder to place all Electron-related files.
Afterward, make a new file called "main.ts" inside of the "electron" folder. This file will be the main entry point of our Electron application.
Finally, create a new "tsconfig.json" file inside of the same directory. We need this file to compile the TypeScript file into a JavaScript one.
Use the following as the content of the "tsconfig.json" file.
Create a custom build command for compiling main.ts & starting electron. To do this, update "package.json" in your project as shown below
{
"name": "teachable-desktop-automation",
"version": "0.0.0",
"main": "electron/dist/main.js", // <-- this was added
"scripts": {
"ng": "ng",
"start": "ng serve",
"build": "ng build",
"test": "ng test",
"lint": "ng lint",
"e2e": "ng e2e",
"electron": "ngbuild --base-href ./ && tsc --pelectron && electron ." // <-- this was added},
// ...omitted}
We can now start our app using npm:
npm run electron
There we go… our native desktop app is up and running! However, it is not doing anything yet.
Let's make it work and also add some intelligence to it…
Add Robotjs to the project
In order to simulate a mouse click or a keyboard button press, we will need robotjs in our project.
I installed robotjs with the following command
npm install robotjs
and then tried to use it in the project by referring to some examples in their official documentation. However, I struggled a lot to make robotjs work in the Electron app. Finally, here is the workaround that I came up with:
Add ngx-electron to the project
npm install ngx-electron
And then inject its service into the component where you want to use robotjs, and use remote.require() to capture the robotjs package.
A quick reference for KNN Classifier and MobileNet package
Here is a quick reference to the methods that we'll be using in our app. You can always refer to tfjs-models for all the details of the implementation.
KNN Classifier
knnClassifier.create() : Returns a KNNClassifier.
.addExample(example, classIndex) : Adds an example to the specific class training set.
.predictClass(image) : Runs the prediction on the image, and returns an object with a top class index and confidence score.
MobileNet
.load() : Loads and returns a model object.
.infer(image, endpoint) : Get an intermediate activation or logit as Tensorflow.js tensors. Takes an image and the optional endpoint to predict through.
Finally make it work
For this blog post, I'll set the cosmetic part (the CSS, I mean) aside and concentrate only on the core functionality.
Using some boilerplate code from Teachable Machine and injecting robotjs into the app component, here is how it looks:
With this, your computer can learn your gestures and can perform a whole lot of different things because you have direct access to your operating system.
The source code of this project can be found at the URL below…
import MeCab
import wget
from gensim.corpora import WikiCorpus
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence
import jaconv
from multiprocessing import cpu_count
import os
wiki_text_file_name = 'wiki.txt'

def read_wiki(wiki_data, save_file):
    if os.path.isfile(save_file):
        print('Skipping reading wiki file...')
        return
    with open(save_file, 'w') as out:
        wiki = WikiCorpus(wiki_data, lemmatize=False, dictionary={}, processes=cpu_count())
        wiki.metadata = True
        texts = wiki.get_texts()
        for i, article in enumerate(texts):
            text = article[0]  # article[1] is the article title
            sentences = [normalize_text(line) for line in text]
            text = ' '.join(sentences) + u'\n'
            out.write(text)
            if i % 1000 == 0 and i != 0:
                print('Logged', i, 'articles')
    print('Finished saving wiki text')

read_wiki(wiki_file_name, wiki_text_file_name)
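The MeCab tokenization and Word2Vec training steps are not shown in this excerpt. Given the imports above, a hedged sketch of those steps might look like the following; the MeCab options, hyperparameters, and the tokenized file name are assumptions, and vector_file is assumed to be defined earlier in the original post.

def tokenize_file(in_file, out_file):
    # Assumed step: split each line into space-separated tokens with MeCab (wakati mode)
    tagger = MeCab.Tagger('-Owakati')
    with open(in_file) as fin, open(out_file, 'w') as fout:
        for line in fin:
            fout.write(tagger.parse(line).strip() + '\n')

tokenized_file_name = 'wiki_tokenized.txt'  # assumed file name
tokenize_file(wiki_text_file_name, tokenized_file_name)

# Train word2vec over the tokenized corpus (hyperparameters are assumptions)
model = Word2Vec(LineSentence(tokenized_file_name),
                 size=200, window=5, min_count=5, workers=cpu_count())
model.save(vector_file)  # vector_file is assumed to be defined earlier in the post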
from gensim.models import Word2Vec
import pprint

# Load the trained model and keep only the word vectors
model = Word2Vec.load(vector_file)
model = model.wv
pprint.pprint(model['東京'])
pprint.pprint(model.most_similar(positive='東京', topn=5))
For a given audio dataset, can we do audio classification using spectrograms? Well, let's try it out ourselves, and let's use Google AutoML Vision to fail fast :D
We'll be converting our audio files into their respective spectrograms and using the spectrograms as images for our classification problem.
Here is the formal definition of a spectrogram:
A Spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time.
For this experiment, I'm going to use the following audio dataset from Kaggle
Go ahead and download the dataset. {Caution!! : The dataset is over 5 GB, so you need to be patient while you perform any action on it. For my experiment, I rented a Linux virtual machine on Google Cloud Platform (GCP), and I'll be performing all the steps from there. Moreover, you need a GCP account to follow this tutorial.}
As you can see, the red area shows the loudness of the different frequencies present in the audio file, represented over time. In the above example, you heard a hi-hat. The first part of the file is loud, and then the sound fades away, and the same can be seen in its spectrogram.
The above ffmpeg command creates a spectrogram with the legend; however, we do not require the legend for image processing, so let's drop it and create plain spectrograms for all our image data.
Use the following shell script to convert all your audio files into their respective spectrograms
(Create and run the following shell script at the directory level where the "audio_data" folder is present.)
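The original shell script is not reproduced here, but a rough Python equivalent that loops over the audio files and calls the same kind of ffmpeg spectrogram filter might look like this (the folder names, image size, and ffmpeg options are assumptions):

import os
import subprocess

AUDIO_DIR = 'audio_data'   # assumed input folder
OUT_DIR = 'spectro_data'   # assumed output folder
os.makedirs(OUT_DIR, exist_ok=True)

for file_name in sorted(os.listdir(AUDIO_DIR)):
    if not file_name.endswith('.wav'):
        continue
    out_path = os.path.join(OUT_DIR, os.path.splitext(file_name)[0] + '.png')
    # showspectrumpic renders a single spectrogram image; legend=disabled drops the axes and legend
    subprocess.run(['ffmpeg', '-y', '-i', os.path.join(AUDIO_DIR, file_name),
                    '-lavfi', 'showspectrumpic=s=1024x512:legend=disabled',
                    out_path],
                   check=True)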
I have moved all the generated image files into the folder "spectro_data".
Step 3: Move image files to Storage
Now that we have generated spectrograms for our training audio data, let's move all these image files to Google Cloud Storage (GCS), and from there we will use those files in the AutoML Vision UI.
Use the following command to copy image files to GCS
I have created the following CSV file using the metadata that we downloaded earlier. Removing all the other columns, I have kept only the image file location and its label, because that's what is needed for AutoML. A rough sketch of how such a CSV could be put together is shown below.
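Purely as an illustration, the CSV could be built with a few lines of pandas. The metadata file name, its column names, and the bucket path below are assumptions, not the exact ones from the dataset.

import os
import pandas as pd

BUCKET = 'gs://your-bucket/spectro_data'  # assumed GCS path where the spectrograms were copied
meta = pd.read_csv('metadata.csv')        # assumed metadata file with 'fname' and 'label' columns

rows = pd.DataFrame({
    'image_path': meta['fname'].apply(
        lambda f: f"{BUCKET}/{os.path.splitext(os.path.basename(f))[0]}.png"),
    'label': meta['label'],
})
# AutoML Vision expects rows of "gs://path/to/image.png,label" with no header row
rows.to_csv('automl_labels.csv', index=False, header=False)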
Enter a dataset name of your choice, and for importing images, choose the second option, "Select a CSV file on Cloud Storage", and provide the path to the CSV file in your Cloud Storage bucket.
The process of importing images may take a while, so sit back and relax. You'll get an email from AutoML once the import is completed.
After the import of the image data is done, you'll see something like this:
Step 6: Start Training
This step is super simple… just verify your labels and start training. All the uploaded images will be automatically divided into training, validation and test set.
Give a name to your new model and select a training budget
For our experiment let's select 1 node hour (free*) as training budget and start training the model and see how it performs.
Now, again, wait for the training to complete. You'll receive an email once the training is completed, so you may leave the screen and come back later; meanwhile, let the model train.
Step 7: Evaluate
and here are the results…
Hurray… with very minimal effort, our model did pretty well.
Congratulations! With only a few hours of work, and with the help of AutoML Vision, we are now fairly sure that classification of the given audio files using their spectrograms can be done with a machine learning vision approach.
With this conclusion, we can now build our own vision model using a CNN, tune its parameters, and produce more accurate results.
Or, if you don't want to build your own model, go ahead and train the same model with a larger number of node hours and use the instructions given in the PREDICT tab to use your model in production.
That's it for this post. I'm Vivek Amilkanthawar from Goalist. See you soon with another one of these; until then, Happy Learning :)
Implementing deep learning algorithms from scratch using Python and NumPy is a good way to get an understanding of the basic concepts, and to understand what these deep learning algorithms are really doing by unfolding the deep learning black box.
However, as you start to implement very large or more complex models, such as convolutional neural networks (CNNs) or recurrent neural networks (RNNs), it is increasingly impractical, at least for most people like me, to implement everything yourself from scratch.
You may understand how to do matrix multiplication and be able to implement it in your code, but as you build very large applications, you probably won't want to implement your own matrix multiplication function; instead, you'll want to call a numerical linear algebra library that can do it more efficiently for you. Right?
The efficiency of your algorithm will help you fail fast 😃 and thus will help you to complete your iteration throughout the IDEA -> EXPERIMENT -> CODE cycle much more quickly.
I think this is crucially important when you are in the middle of a deep learning pipeline.
So let's take a look at the frameworks out there…
Today, there are many deep learning frameworks that make it easy for you to implement neural networks, and here are some of the leading ones.
Each of these frameworks has a dedicated user and developer community, and I think each of them is a credible choice for some subset of applications. However, when I see the graph below, my obvious choice is TensorFlow.
Well, I just said that I would choose TensorFlow. But do the above popularity scores alone matter when choosing a framework for your deep learning project? It turns out they don't…
I think many of these frameworks are evolving and getting better very rapidly. If a framework tops the popularity charts in 2018, it may not hold the same position by the end of 2019.
There are a lot of people writing articles comparing these deep learning frameworks and how quickly they change. And because these frameworks are often evolving and getting better month to month, I'll leave you to do a few internet searches yourself if you want to see the arguments on the pros and cons of some of these frameworks.
So, how can you make a decision about which framework to use?
Rather than strongly endorsing any of these frameworks, I would like to share three factors that Stanford Professor Andrew Ng considers important enough to influence your decision.
1) Ease of programming
This includes developing, iterating, and finally, deploying your neural network to production where it may be used by millions of users.
2) Running Speeds
Training on large data sets can take a lot of time, and differences in training speed between frameworks can make your workflow a lot more time efficient.
3) Openness
This last criterion is not often discussed, but Andrew Ng believes it is also very important. A truly open framework must be open source, of course, but must also be governed well.
So it is important to use a framework from a company that you can trust. As more and more people start to use the software, the company should not gradually close off what was open source, or move the functionality into its own proprietary cloud services.
But at least in the short term, depending on your preference of language, whether you prefer Python or Java or C++ or something else, and depending on what application you're working on, whether it is computer vision or natural language processing or online advertising or something else, I think multiple of these frameworks could be a good choice.
So that was just a high-level overview of deep learning programming frameworks. Any of these frameworks can make you more efficient as you develop machine learning applications.
In a subsequent post, we'll take a step from zero → one to learn TensorFlow.
That's it for this post. My name is Vivek from Goalist. See you soon with another one of these; until then, Happy Learning :)