Image Recognition in Ruby
A few steps to help you get started with image recognition in Ruby
This blog post documents my journey through the Google Cloud Vision API documentation, which you can view here
Google Cloud Vision is an API that allows developers to understand the content of an image by applying Google's machine learning and artificial intelligence models. Here are some interesting use cases of image recognition.
Competitive Landscape
Vision API comparison chart, courtesy of Kairos
Google, Amazon, and Microsoft offer the most popular vision APIs. Although all three deliver similar end results, I chose to work with Google because it connects to the entire Google Cloud: once you have it set up, it gives you access to many different artificial intelligence capabilities. I also found their documentation easy to follow, so if you haven't read it yet, take a look and then come back to this article for my own personal experience.
You can get started for free with up to $300 in credit, and paid plans add tech support.
Many smaller companies offer compelling use cases as well.
Use Cases
Authentication and Emotion
Microsoft Project Oxford AI
Microsoft's Project Oxford offers facial recognition that can identify users in photos and let them log into their phones using authentication. It is also used by marketing companies as a tool to gauge success metrics.
A Self Driving Future
Self-driving car image classification
Self-driving cars perform something known as image classification: a model is trained to recognize various objects, such as other cars and pedestrians. Detected and classified images are passed to a neural network and localized, where the artificial intelligence connects the input with a valid suggestion or result.
Augmented Reality
Marker Based Augmented Reality
Marker-based augmented reality works off of triggers in your local environment. These markers, which rely on image recognition, can provide users with entire experiences based on their location and in-situ environment.
Getting Started
1. Create a new Google Cloud Project
a) Sign up quickly using your Gmail account.
b) On the dashboard tap the “My First Project” dropdown and select “New Project”
c) Enter your project name and hit create
d) You should see an empty dashboard
Google Cloud Platform Dashboard
2. Start Cloud Shell
To run the cloud environment you need a server. Instead of downloading one, for the purposes of this article you can use Google Cloud Shell, a command-line environment that runs in the cloud.
Run the following command to confirm that you are authenticated
gcloud auth list
Command Output
Credentialed accounts:
- <myaccount>@<mydomain>.com (active)
Run this code next
gcloud config list project
Command Output
project = <PROJECT_ID>
3. Enable the vision API
Run this in the cloud shell
gcloud services enable vision.googleapis.com
Command Output
4. Create a service account
In order to use the API, you need to create a service account to make requests. There are a few different ways to go about this step, and it is the most complex, but here’s what worked best for me.
- Go to the Service Account Key Page
- From the Service account list, select New service account.
- In the Service account name field, enter a name.
- From the Role list, select Project > Owner.
- Click Create. A JSON file that contains your key downloads to your computer.
- Remember the save location as you will need to access it in the next step
After creating and downloading your service account key as JSON, locate the file path in your terminal and copy it to your clipboard.
After confirming the file opens properly, set it to the proper keys by entering the following in your terminal:
export GOOGLE_APPLICATION_CREDENTIALS="/Users/AndyXcode/Downloads/Mod-3-Blogpost-fff77bdf1b10.json"
This sets your application credentials
export GOOGLE_CLOUD_PROJECT="<PROJECT_ID>"
This sets your cloud project. Note that GOOGLE_APPLICATION_CREDENTIALS points at the JSON key file, while GOOGLE_CLOUD_PROJECT should be set to your project ID, the same value printed earlier by gcloud config list project. Also note that if your key file name has any spaces, edit the file name before setting the credentials. Once you have this set up, the rest is pretty straightforward.
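Before moving on, it can save some debugging time to confirm the setup from Ruby itself. This is a small illustrative helper of my own (it is not part of Google's docs or the gem) that just checks whether the credentials variable points at a real file:

```ruby
# Illustrative sanity check: true only when GOOGLE_APPLICATION_CREDENTIALS
# is set and points at an existing key file.
def credentials_ready?
  path = ENV["GOOGLE_APPLICATION_CREDENTIALS"]
  !path.nil? && !path.empty? && File.exist?(path)
end

puts credentials_ready? ? "Credentials configured" : "Credentials missing"
```

If this prints "Credentials missing", re-run the export commands above in the same terminal session before starting IRB.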
5. Install the Cloud Vision Gem
gem install google-cloud-vision -v 0.27.0
6. Download the sample ruby repository
git clone https://github.com/GoogleCloudPlatform/ruby-docs-samples.git
cd ruby-docs-samples
git checkout "a902f30dd449ce82469cc315610c8a3d4888ff5a"
change directory into ruby-docs-samples/vision
cd vision
Run bundle install in your vision folder
7. Start IRB
irb --noecho
8. Sample an Image
Run the following code in your IRB
require "google/cloud/vision"
vision = Google::Cloud::Vision.new
Select an image
image = vision.image "images/otter_crossing.jpg"
9. Explore API
The Vision API offers many different ways to analyze an image, and Google also provides Ruby files with information on how to use each of its different features.
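Whichever feature you call, annotations come back with confidence scores, so a common post-processing step is filtering out low-confidence results. Here's an illustrative sketch; the hashes below are stand-ins I made up for the gem's annotation objects, not real API output:

```ruby
# Keep only confident annotations, highest score first.
def confident_descriptions(labels, threshold = 0.5)
  labels.select { |l| l[:score] >= threshold }
        .sort_by { |l| -l[:score] }
        .map { |l| l[:description] }
end

# Hypothetical label annotations for an otter photo.
labels = [
  { description: "Otter",  score: 0.97 },
  { description: "Mammal", score: 0.92 },
  { description: "Sign",   score: 0.41 }
]

puts confident_descriptions(labels).inspect  # => ["Otter", "Mammal"]
```

Tuning the threshold depends on your use case: a self-driving car wants high recall, while a photo-tagging app may prefer high precision.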
10. Perform image analysis
Here's the code Google provides for us to identify landmarks. It iterates over the landmarks detected in the image and puts each landmark's description, then puts the latitude and longitude of each of the landmark's locations.
image.landmarks.each do |landmark|
  puts landmark.description
  landmark.locations.each do |location|
    puts "#{location.latitude}, #{location.longitude}"
  end
end
Let’s run this image
A caption will be provided by Google… shortly
image = vision.image "images/imagine.png"
image.landmarks
=> [#<Google::Cloud::Vision::Annotation::Entity mid: "/m/013397", locale: "", description: "Strawberry Fields (memorial)", score: 0.416331022977829, confidence: 0.0, topicality: 0.0, bounds: 4, locations: 1, properties: {}>]
It successfully pulled the image description, and its location property:
Strawberry Fields (memorial)
40.775744, -73.975281
Let's paste that location into Google Maps
Strawberry Fields (memorial) 40.775744, -73.975281
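If you'd rather not paste coordinates by hand, you can build a Google Maps search link straight from the landmark's latitude and longitude, using the standard maps.google.com query-string format:

```ruby
# Build a Google Maps search URL from a landmark location's coordinates.
def maps_url(latitude, longitude)
  "https://www.google.com/maps/search/?api=1&query=#{latitude},#{longitude}"
end

puts maps_url(40.775744, -73.975281)
```

Opening the printed URL in a browser drops a pin on Strawberry Fields in Central Park.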
Let's try to perform text detection on that image as well and see if the API can pick up on the text "imagine".
puts image.text
close…
IMAG NE
=> nil
It's missing the 'i' in imagine.
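OCR output like this often mixes formatting noise (stray whitespace) with genuinely missed characters. A small normalization step of my own (illustrative, not part of the gem) shows the difference: stripping the space cleans up the string, but it can't recover the dropped 'i':

```ruby
# Uppercase and strip everything except letters and digits before
# comparing OCR output against the expected text.
def normalize_ocr(text)
  text.to_s.upcase.gsub(/[^A-Z0-9]/, "")
end

puts normalize_ocr("IMAG NE")               # "IMAGNE" -- space removed
puts normalize_ocr("IMAG NE") == "IMAGINE"  # false -- the 'i' is truly missing
```

For a more forgiving comparison you could go further with fuzzy matching, but normalization alone handles most spacing artifacts.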
Now for the final test
Take a screenshot of a student in the class and run emotion detection
image coming soon
JOY:
ANGER:
SORROW:
SURPRISE:
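The API reports each emotion as a likelihood enum ranging from VERY_UNLIKELY to VERY_LIKELY rather than a raw score. Here's an illustrative formatter that turns those values into the report above; the hash is a hypothetical stand-in for a face annotation's likelihood data, not the gem's actual object:

```ruby
# Format a face's emotion likelihoods into a simple report.
def emotion_report(face)
  %w[joy anger sorrow surprise].map do |emotion|
    "#{emotion.upcase}: #{face[emotion]}"
  end.join("\n")
end

# Hypothetical likelihoods for one detected face.
face = {
  "joy"      => "VERY_LIKELY",
  "anger"    => "VERY_UNLIKELY",
  "sorrow"   => "VERY_UNLIKELY",
  "surprise" => "UNLIKELY"
}

puts emotion_report(face)
```

From here you could aggregate reports across a whole classroom of faces to gauge overall engagement.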
As you can tell, once you have the available data from the image input, there are a lot of different directions you can take this app. The API offers unlimited possibilities. The next step for me would be to take the API out of IRB and bring it into a functional JavaScript application that I can work with. All of these possibilities make the Google Vision API one of the most exciting developments in the industry.