Nowadays Python is everywhere - academia, data science, machine learning, enterprise applications, web applications, scripting... you name it, Python is there. Whatever you do, Python is around either to help you or to give you a headache.
Let's say you have learned Python programming and are ready to use it to develop applications. That sounds great, so you jump into coding Python scripts and eventually start installing Python packages. From there, one follows a dangerous path into a developer's nightmare.
Installing packages can lead to incompatibility issues or break other applications. And you may discover that your code does not work on some machines even though it runs flawlessly on your local machine. Why??? It's because of the Python environment.
To save yourself from incompatibility issues, create a separate virtual Python environment for each project.
A virtual environment is a set of scripts and directories that can run Python in isolation. By using a virtual environment, each Python project can have its own dependencies, regardless of other projects and the system-wide Python environment.
In this blog post, I would like to share with you my environment for working with data and doing machine learning. You most definitely do not need to copy anyone's setup, but perhaps take the parts that fit you best.
Every programmer has different preferences when it comes to their programming environment: vim versus emacs, tabs versus spaces, virtualenv versus anaconda.
To start with, we need to talk about pip. Anyone working with Python knows that pip is Python's package manager.
It has been bundled with Python for quite a while now, so if you have Python, you likely already have pip.
pip installs packages like tensorflow, numpy, pandas, and jupyter, and many more, along with their dependencies.
Many Python resources are delivered as pip packages. Sometimes you may see a file called requirements.txt in someone's folder of Python scripts. Typically, that file lists all of the pip packages
that the project uses, so you can easily install everything needed by using
pip install -r requirements.txt
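A requirements.txt file is just a plain text list of package names, one per line, optionally pinned to versions. The packages and version numbers below are only an illustration, not from any particular project:

```
numpy==1.21.6
pandas>=1.3
jupyter
```

Running `pip freeze > requirements.txt` writes out the exact versions currently installed, which is a common way to produce this file.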
As part of this ecosystem, there's a whole world of version numbers and dependencies. You sometimes need different versions of a given library for different projects that you are working on.
So you need a way to organize groups of packages into separate, isolated environments. Otherwise, the version errors will make you want to bang your head against the wall.
There are currently two popular options for managing your different pip packages: virtualenv and anaconda.
1) Virtualenv
Virtualenv is a package that allows you to create named virtual environments where you can install pip packages in an isolated manner. This tool is great if you want to have detailed control over which packages you install for each environment you create.
For example, you could create an environment for web development with one set of libraries, and a different environment for data science. This way, you won't need to have unrelated libraries interacting with each other, and it allows you to create environments dedicated to specific purposes.
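As a minimal sketch of this idea, here is the same concept using the standard library's venv module (virtualenv itself is a third-party package that works similarly; the environment name webdev_env is made up for this example):

```python
import os
import venv

# Create an isolated environment; each environment gets its own
# interpreter and site-packages directory, so libraries installed
# in one never leak into another.
venv.create('webdev_env', with_pip=False)  # with_pip=True also bundles pip

# The environment's configuration file marks it as a virtual environment.
print(os.path.isfile(os.path.join('webdev_env', 'pyvenv.cfg')))  # True
```

With virtualenv proper, the equivalent is `virtualenv webdev_env` on the command line, followed by `source webdev_env/bin/activate`.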
2) Anaconda
Now, if you're primarily doing data science work, Anaconda is also a great option. Anaconda is created by Continuum Analytics, and it is a Python distribution that comes preinstalled with lots of useful Python libraries for data science. Anaconda is popular because it brings many of the tools used in data science and machine learning in just one install, so it's great for a short and simple setup.
Like Virtualenv, Anaconda uses the concept of environments to isolate different libraries and versions. Anaconda also introduces its own package manager, called conda, from which you can install libraries.
Additionally, Anaconda still interoperates with pip, which lets you install any additional libraries that are not available in the Anaconda package manager.
Follow the instructions to download and install Anaconda from here.
On top of that, you get a nice UI to manage your projects and environments.
So... which one to use, virtualenv or anaconda?
Well, it's nice to try out different libraries with both virtualenv and anaconda, but the two package managers don't always play nicely with each other on one system.
In my case, I have opted to use both, but I manage the whole thing using a library called pyenv.
Conceptually, pyenv sits on top of both virtualenv and anaconda. It can be used to control not only which virtualenv or Anaconda environment is in use, but also whether I'm running Python 2 or Python 3.
One final feature of pyenv is the ability to set a default environment for a given directory, so that the desired environment is automatically activated when you enter that directory.
~/code $ cd myproject
(py35_tf12) ~/code/myproject $
I find this to be way easier than trying to remember which environment I want to use every time I work on a project.
So which package manager should you use?
It really comes down to your workflow and preferences. If you typically use just the core data science tools
and are not concerned with having some extra libraries installed that you don't use, Anaconda can be a great choice, since it leads to a simpler workflow for your needs.
But if you are someone who loves to customize your environment and make it exactly how you want it,
then perhaps something like virtualenv or even pyenv may be more to your liking.
There's no one right way to manage Python libraries, and there's certainly more out there than the options
that I just presented.
As different tools come and go, it's important to remember that everyone has different needs
and preferences, so choose for yourself the best one that fits your needs.
That's it for this post. My name is Vivek Amilkanthwar. See you soon with another one of these; until then, Happy Learning :)
file_name = ... # path to your model
model = load_model(file_name)
If you don't already have such a file, you can create a model as shown below and then try out the tutorial.
from keras.applications.resnet50 import ResNet50
model = ResNet50(weights='imagenet')
The third "insight" concerns the layers from top to bottom: the top one is orange, the next blue, the next orange, and the bottom one blue. The network also seems to have noticed that where blue leans toward the left shoulder, orange leans toward the right shoulder, which is interesting.
If you are curious how this came about, hover over the lines feeding into this neuron and you will see something like "weight is 〇〇".
Here, the graph is built as (-0.33 x X1) + (0.3 x X2) + (sin(X1) x 1.6) + (sin(X2) x 2.7), but let's leave the math at that.
Feed raw audio files directly into a deep neural network without any feature extraction.
As you may have observed, conventional audio and speech analysis systems are typically built as a pipeline, where the first step is to extract various low-dimensional hand-crafted acoustic features (e.g., MFCC, pitch, RMSE, chroma, and so on).
Although hand-crafted acoustic features are typically well designed, it is still not possible to retain all useful information, due to human knowledge bias and the high compression ratio. And of course, the feature engineering you have to perform depends on the type of audio problem you are working on.
But, how about learning directly from raw waveforms (i.e., raw audio files are directly fed into the deep neural network)?
In this post, let's take the learnings from this paper and apply them to the following Kaggle dataset.
The downloaded dataset labels each file as "normal", "unlabelled", or one of several categories of abnormal heartbeat.
Our objective here is to solve the heartbeat classification problem by directly feeding raw audio files to a deep neural network without doing any hand-crafted feature extraction.
Prepare Data
Let's prepare the data to make it easily accessible to the model.
extract_class_id(): Each audio file name contains its label, so let's separate the files based on their names and assign each a class id. For this experiment, we'll treat "unlabelled" as a separate class. So as shown above, in total we'll have 5 classes.
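A minimal sketch of what extract_class_id() could look like (the exact file-name patterns and the class ordering here are assumptions for illustration, not taken from the repository):

```python
import os

# Assumed class list; "unlabelled" is treated as its own class, giving 5 total.
CLASSES = ['artifact', 'extrahls', 'murmur', 'normal', 'unlabelled']

def extract_class_id(file_path):
    """Return the class id encoded in a file name such as 'normal__123.wav'."""
    name = os.path.basename(file_path).lower()
    for class_id, label in enumerate(CLASSES):
        if label in name:
            return class_id
    return CLASSES.index('unlabelled')  # fall back to the unlabelled class

print(extract_class_id('normal__201101070538.wav'))  # 3
```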
convert_data(): We'll normalize the raw audio data and make all audio files equal length: if a file is longer than 10 s, cut it down to 10 s; if it is shorter, pad it with zeros. For each audio file, we then put the class id, sampling rate, and audio data together and dump them into a .pkl file, taking care to maintain a proper split between the train and test datasets.
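The padding and normalization step inside convert_data() can be sketched as follows (the 10 s duration comes from the description above; the sampling-rate handling and function names are assumptions):

```python
import numpy as np

def fix_length(audio, sr, duration=10):
    """Cut or zero-pad a 1-D signal to exactly `duration` seconds."""
    target = sr * duration
    if len(audio) >= target:
        return audio[:target]                            # too long: cut
    return np.pad(audio, (0, target - len(audio)))       # too short: zero-pad

def normalize(audio):
    """Scale the signal into [-1, 1]; all-zero clips are returned unchanged."""
    peak = np.max(np.abs(audio))
    return audio / peak if peak > 0 else audio

clip = normalize(fix_length(np.random.randn(3 * 8000), sr=8000))
print(clip.shape)  # (80000,)
```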
As written in the research paper, this architecture takes input time-series waveforms, represented as a long 1D vector, instead of hand-tuned features or specially designed spectrograms.
There are many models with different complexities explained in the paper. For our experiment, we will use the m5 model.
M5 has 4 convolutional layers, each followed by batch normalization and pooling. A keras.callbacks callback is also attached to the model to reduce the learning rate if the accuracy does not improve over 10 epochs.
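A sketch of the M5 architecture in Keras might look like this. The filter counts and kernel sizes follow the paper's M5 description, but the input length (10 s at an assumed 8 kHz sampling rate) and the exact learning-rate callback settings are assumptions for illustration:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.callbacks import ReduceLROnPlateau

def build_m5(num_classes=5, input_len=10 * 8000):
    inp = layers.Input(shape=(input_len, 1))  # raw waveform as a long 1-D vector
    x = inp
    # 4 conv layers, each followed by batch normalization and pooling.
    for filters, kernel, stride in [(128, 80, 4), (128, 3, 1),
                                    (256, 3, 1), (512, 3, 1)]:
        x = layers.Conv1D(filters, kernel, strides=stride, padding='same')(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation('relu')(x)
        x = layers.MaxPooling1D(4)(x)
    x = layers.GlobalAveragePooling1D()(x)
    out = layers.Dense(num_classes, activation='softmax')(x)
    return models.Model(inp, out)

model = build_m5()
print(model.output_shape)  # (None, 5)

# Halve the learning rate when validation accuracy stalls for 10 epochs;
# pass callbacks=[lr_cb] to model.fit().
lr_cb = ReduceLROnPlateau(monitor='val_accuracy', factor=0.5, patience=10)
```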
Let's start training our model and see how it performs on the heartbeat sound dataset.
As per the code above, the model will train for up to 400 epochs; however, the loss gradient flattened out at around 42 epochs for me, and these were the results. How did yours do?
Congratulations! You've saved a lot of the time and effort of extracting features from audio files. Moreover, even fed raw audio files directly, the model does pretty well.
With this, we learned how to feed raw audio files to a deep neural network. Now you can take this knowledge and apply it to the audio problem you want to solve. You just need to collect audio data, normalize it, and feed it to your model.
The code above is available at the following GitHub repository.