Designing a Lab for ELEC 474

Two weeks ago my supervisor asked if I would be interested in designing the 5th lab exercise for his 4th year course, ELEC 474 Machine Vision, and I accepted. My task was very open ended: all he wanted was a lab centered around Eigenfaces that would be appropriate for the students.

Designing the lab was an interesting experience. On one monitor I was working on the program the students would eventually have to write, and on the other I had the LaTeX file of the lab handout. It was tough to figure out how long to make the lab and what the right difficulty would be. I decided to split it into 2 main components: in the pre-lab the students would prepare the face data and learn how to use cv::FileStorage, while the actual Eigenface implementation would be done in the lab period. I ended up coding both the pre-lab and the lab in their entirety and then removing the code from certain functions for the students to fill in.

The lab itself went quite well! As part of the pre-lab the students had to code a function that would combine their face database with a classmate's database, and of course this process could be repeated so they could all share data. This ended up working way better than I expected, and some students who coded the function generically were able to use it for all sorts of handy data manipulation to quickly generate the test cases we wanted to see during marking. I think the students really enjoyed visualizing the Eigenfaces in particular, and some of them really got into trying the different ColorMaps available in OpenCV. I think Hot and Bone produced the coolest results!

Eigenfaces using JET ColorMap

The one thing I might do differently is the effort marks I put in the grading scheme. I put them in because I wanted the students to put some time into the “softer” parts of the lab, like choosing good and interesting faces to test on and labelling their data nicely. But in the end it just became tough to rationalize what grade a student was getting. In my teaching assistant positions I have noticed that when a grade is given for a demonstration with the student present, it’s a lot harder to give a less than perfect grade because they will debate it with you, but if it’s an assignment that is handed in, students seem to accept non-perfect grades without saying anything. Not sure why this is.

All in all it was an interesting experience that I would definitely do again. It was particularly interesting to see where the common pitfalls and misunderstandings were, which gave me some more insight into teaching. I enjoyed designing the lab. I like teaching others, especially when it is on a topic I like so much.

For anyone who is interested here are the lab materials!

lab5_prelab

lab5

Pre-Lab skeleton code:

#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/objdetect/objdetect.hpp>
#include <opencv2/highgui/highgui.hpp>

#include <iostream>
#include <vector>
#include <string>

using namespace cv;
using namespace std;

Mat detectFace(const Mat &image, CascadeClassifier &faceDetector);
void resizeFace(Mat &face);
void addToDataSet(Mat &data, vector<string> &labels, Mat &newData, vector<string> &newLabels);

int main()
{
  // Set up face detector
  CascadeClassifier faceDetector;
  if(!faceDetector.load("lbpcascade_frontalface.xml"))
  {
    cerr << "ERROR: Could not load classifier cascade" << endl;
    return -1;
  }

  // fill these variables with your data set
  Mat samples;
  vector<string> labels;

  return 0;
}

Mat detectFace(const Mat &image, CascadeClassifier &faceDetector)
{
  vector<Rect> faces;
  faceDetector.detectMultiScale(image, faces, 1.1, 2, 0|CV_HAAR_SCALE_IMAGE, Size(30, 30));

  if(faces.size() == 0)
  {
    cerr << "ERROR: No Faces found" << endl;
    return Mat();
  }
  if(faces.size() > 1)
  {
    cerr << "ERROR: Multiple Faces Found" << endl;
    return Mat();
  }

  //Mat detected = image.clone();
  //for(unsigned int i = 0; i < faces.size(); i++)
  //{
  //    rectangle(detected, faces[i].tl(), faces[i].br(), Scalar(255,0,0));
  //}

  //imshow("faces detected", detected);
  //waitKey();

  return image(faces[0]).clone();
}

void resizeFace(Mat &face)
{
  // code here
}

void addToDataSet(Mat &samples, vector<string> &labels, Mat &newSamples, vector<string> &newLabels)
{
  // code here
}
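
The combining step described above is left for the students, so the following is only a rough sketch of the idea and not the lab solution. To keep it free of OpenCV it uses a hypothetical `FaceDataSet` struct (one flattened face per row) and a hypothetical `appendDataSet` function; in the real skeleton the same thing would be done on a `cv::Mat` and a label vector, for example with `cv::Mat::push_back` or `cv::vconcat`.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the lab's data set: one flattened face per row.
struct FaceDataSet
{
    std::vector<std::vector<float> > samples; // each inner vector is one face
    std::vector<std::string> labels;          // labels[i] names samples[i]
};

// Append every sample and label from `extra` onto the end of `base`.
void appendDataSet(FaceDataSet &base, const FaceDataSet &extra)
{
    if (extra.samples.size() != extra.labels.size())
        throw std::invalid_argument("each sample needs exactly one label");

    // All faces must have the same length, otherwise the merged set cannot
    // be treated as a single matrix later on.
    if (!base.samples.empty() && !extra.samples.empty() &&
        base.samples[0].size() != extra.samples[0].size())
        throw std::invalid_argument("face sizes differ between data sets");

    base.samples.insert(base.samples.end(), extra.samples.begin(), extra.samples.end());
    base.labels.insert(base.labels.end(), extra.labels.begin(), extra.labels.end());
}
```

Because the function only appends, it can be called repeatedly to fold in one classmate's data after another, which is what made sharing data across the class practical.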

Lab skeleton code:

#include <opencv2/core/core.hpp>
#include <opencv2/contrib/contrib.hpp>
#include <opencv2/ml/ml.hpp>
#include <opencv2/highgui/highgui.hpp>

#include <iostream>
#include <vector>
#include <string>

using namespace cv;
using namespace std;

void addToDataSet(Mat &data, vector<string> &labels, Mat &newData, vector<string> &newLabels);
Mat norm_0_255(Mat src);
string recognizeFace(Mat query, Mat samples, vector<string> labels);

int main()
{
  // Load your data and combine it with the data set of several of your peers using:
  // addToDataSet

  // Perform PCA
  // cv::PCA pca(....);

  // Visualize Mean
  Mat meanFace = pca.mean;
  // normalize and reshape mean
  imshow("meanFace", meanFace);
  waitKey();

  // Visualize Eigenfaces
  for(int i = 0; i < pca.eigenvectors.rows; i++)
  {
    Mat eigenface;
    eigenface = pca.eigenvectors.row(i).clone();
    // normalize and reshape eigenface
    //applyColorMap(eigenface, eigenface, COLORMAP_JET);

    imshow(format("eigenface_%d", i), eigenface);
    waitKey();
  }

  // Project all samples into the Eigenspace
  // code..

  // ID Faces
  // code..

  return 0;
}

void addToDataSet(Mat &samples, vector<string> &labels, Mat &newSamples, vector<string> &newLabels)
{
  // your code from the pre lab
}

Mat norm_0_255(Mat src)
{
  // Create and return normalized image
  // should work for 1 and 3 channel images
}

string recognizeFace(Mat query, Mat samples, vector<string> labels)
{
  // given a query sample find the training sample it is closest to and return the proper label
  // implement a nearest neighbor algorithm to achieve this
}
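
The nearest neighbor step in recognizeFace is the part students fill in, so this is only a hedged sketch of the technique and not the lab solution. It is written without OpenCV: `squaredDistance` and `nearestLabel` are hypothetical names, and each sample is assumed to be a vector of coefficients already projected into the Eigenspace (in the lab that projection would come from something like `cv::PCA::project`).

```cpp
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Squared Euclidean distance between two equally sized coefficient vectors.
double squaredDistance(const std::vector<double> &a, const std::vector<double> &b)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); i++)
    {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Return the label of the training sample closest to the query.
// Returns an empty string when there is nothing to compare against.
std::string nearestLabel(const std::vector<double> &query,
                         const std::vector<std::vector<double> > &samples,
                         const std::vector<std::string> &labels)
{
    std::string best;
    double bestDist = std::numeric_limits<double>::max();

    for (std::size_t i = 0; i < samples.size() && i < labels.size(); i++)
    {
        if (samples[i].size() != query.size())
            continue; // skip rows that do not match the query length

        const double dist = squaredDistance(query, samples[i]);
        if (dist < bestDist)
        {
            bestDist = dist;
            best = labels[i];
        }
    }
    return best;
}
```

Comparing squared distances avoids a square root and picks the same neighbor as the true Euclidean distance would.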

Feel free to email me if you want the solutions!


ARPool in London UK!

Jeremy Clarkson, Me, Stephen Fry, Salar on the set of Gadget Man

Just got back from taking ARPool to London, England (wow, what a trip!). I’ll write a blog post about the trip later, but right now I just want to focus on ARPool.

On Thursday Chris and Alex from North One Television were about an hour late picking us up at our hotel, so we started setting up behind schedule. I’m pretty thankful that the schedule was changed so that filming was late afternoon on Friday rather than the morning, or we would have been very pressed for time. I had to help build the truss (so much for this being a show up and plug in a computer gig). Anyways, it wasn’t too big a deal, and at least I didn’t have to carry and assemble a pool table like at past events. We got the structure up, mounted the camera and projector (I am a wizard with zip-ties!) and fired up the calibration.

Things were not as smooth as I had expected. The table calibration step, the final step that creates the transformation matrices, was giving us grief: the final image after the transformation was either rotated 90 degrees or had some really strange jagged distortion. I thought I had fixed this step but it needs more work. I expect the problem is in the line intersection code, possibly a bad solution or an overflow problem. It is really weird because you can perform this step multiple times without altering the code and get wildly different results. I think that drawing more intermediate images, especially during the line intersection process, will help me fix this problem.

* edit * The week after we got back I added more intermediate debugging visuals, which are nice for confirming the calibration is doing what we expect but didn’t solve the issue. I still can’t believe this, but our source of grief was Qt being unable to display images with an odd number of rows or columns. That alone makes sense, but what still confounds me is how this hadn’t come up before…

Anyways, thanks to the randomness of the table calibration procedure it eventually worked for us and we moved on to finish the calibration. We skipped ball training and simply ran the system. Surprisingly, the classifier was still able to find the cue ball!

Stephen Fry playing ARPool - also I swear I am not photoshopped into this photo, it's just a weird effect from the light behind me!

The filming process was very interesting. There were tons of crew members and, with the exception of the director, I could never really figure out who was in charge. I got to briefly explain to Stephen Fry and Jeremy Clarkson how the system worked and how to use it before they played a game on film. Stephen Fry got to use ARPool while Jeremy was on his own. The shoot went pretty well, but after the first shot Stephen Fry sort of forgot how to use it and didn’t hold the cue close enough to the cue ball for the system to detect the shot. They also both kept bumping into the truss, which of course can throw off the whole calibration (wince!). After their game we filmed a few set-up sequences of Stephen using ARPool, and I was there to guide him through making the shot. ARPool isn’t quite a hands-off demo at this point! We got a few more shots after the celebrities left. Everyone was pretty happy with how it went and I am sure that it will look great!


ARPool London Preparation

I’ve spent a great deal of time this past month getting ARPool ready for our trip to London, UK to be on Stephen Fry Gadget Man. So much has happened that it is going to be tough for me to remember it all!

It began with testing all the code I wrote after OCE, which essentially worked on the first try, a minor miracle! After confirming that ARPool 2.0 ran, we spent the next week improving the calibration process. I added much better debugging visuals and several new functions, including a really handy calibration test function which tests the mapping from image coordinates to projector coordinates by drawing test points on the table so you can confirm the calibration accuracy.
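
To give a feel for what such a calibration test checks, here is a small sketch that is not the ARPool code. It assumes the image-to-projector mapping is a single 3x3 homography, which this post does not actually state, and the names `Point2`, `Mat3`, `applyHomography` and `reprojectionError` are made up for illustration. With OpenCV the mapping itself could be done with `cv::perspectiveTransform`; plain C++ is used here so the arithmetic is visible.

```cpp
#include <array>
#include <cmath>
#include <stdexcept>

struct Point2
{
    double x;
    double y;
};

// Row-major 3x3 matrix, assumed to map image pixels to projector pixels.
typedef std::array<std::array<double, 3>, 3> Mat3;

// Map one point through the homography using homogeneous coordinates.
Point2 applyHomography(const Mat3 &H, const Point2 &p)
{
    const double u = H[0][0] * p.x + H[0][1] * p.y + H[0][2];
    const double v = H[1][0] * p.x + H[1][1] * p.y + H[1][2];
    const double w = H[2][0] * p.x + H[2][1] * p.y + H[2][2];

    if (std::fabs(w) < 1e-12)
        throw std::runtime_error("point maps to infinity");

    return Point2{u / w, v / w};
}

// Distance in projector pixels between where a test point lands and
// where it was expected to land; small values mean a good calibration.
double reprojectionError(const Mat3 &H, const Point2 &imagePt, const Point2 &expectedProjectorPt)
{
    const Point2 mapped = applyHomography(H, imagePt);
    return std::hypot(mapped.x - expectedProjectorPt.x, mapped.y - expectedProjectorPt.y);
}
```

Drawing the mapped points on the table gives the same information visually: if a projected dot does not land on the physical spot the camera saw, the calibration is off by roughly that distance.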

With calibration looking pretty good I moved on to the ARRecognition class. This class is the workhorse of ARPool: all the vision algorithms are in here. I started by cleaning the code and adding documentation. You heard right, in ~4 years of the project no one has documented it… I made a nice comment block above each function clearly explaining its purpose, its inputs (in detail, such as whether the image is rectified or not) and its outputs. There is also a debugging tips section.

I made many small improvements to ARRecognition and a few major ones. Notably I re-did the motionDetection algorithm, which detects when the shot is finished and restarts the process. I also worked on the detectShot algorithm to fix a bug where another ball next to the cue ball would cause a shot to not be detected. The first major change was in the detectBalls function, where I added code for detecting balls whose blobs are joined after background subtraction. The final algorithm here is quite slick: if the blob is bigger than a single ball, a matchTemplate is run to find the 2, 3, 4 etc. best balls in the image. A max search is performed on the output of matchTemplate, the location is recorded, and then a black circle is drawn centered at that point before running another max search. This ensures the balls that are found do not overlap at all. This change made a big difference in the performance of the system, especially since we can now be more relaxed on the background subtraction threshold because ball blobs joining is no longer an issue. I also added some code for the special case of finding the balls when they are set up for the break shot.

That explanation was kind of brutal, and if you’re actually reading this you probably like code, so here is the source!

/************************************************************************
/   segmentBallClusters
/
/   Input: blobs image and the contour to be segmented
/          numBalls - the number of balls the contour should be segmented into
/          centers - a vector of ball centers not an input but an output via reference
/
/   Return: the ball centers in the vector by reference
/
/   Purpose: Segment a blob into a number of balls, gets called by detectBallCenters
/            when a blob has the area of multiple balls, e.g. two balls are touching
/
/   Debugging Tips: Check the blobs image saved by detectBallCenters
/                   watch what the erode is doing
/                   imshow the "results" image
/
************************************************************************/
void ARRecognition::segmentBallClusters(cv::Mat &blobs, std::vector<cv::Point> &contour, int numBalls, std::vector<cv::Point2f> &centers)
{
    // draw the blob onto the blobs image
    std::vector<std::vector<cv::Point> > contours;
    contours.push_back(contour);
    cv::drawContours(blobs, contours, -1, cv::Scalar(255,255,255), CV_FILLED);

    // get an ROI of the blob we need to segment
    cv::Rect rect = cv::boundingRect(contour);
    cv::Mat roi = blobs(rect).clone();
    cv::cvtColor(roi,roi,CV_RGB2GRAY);

    // template for the ball
    cv::Mat templ = cv::Mat::zeros(m_ball_radius*2,m_ball_radius*2,CV_8U);
    cv::circle(templ, cv::Point(m_ball_radius, m_ball_radius), m_ball_radius, cv::Scalar(255,255,255), CV_FILLED);

    cv::Mat result;
    cv::matchTemplate(roi,templ,result,CV_TM_CCOEFF_NORMED);

    cv::Point offset(rect.x, rect.y);

    if(numBalls != 15)
    {
        for(int i = 0; i < numBalls; i++)
        {
            // find the maximum
            double minVal;
            double maxVal;
            cv::Point minLoc;
            cv::Point maxLoc;
            cv::minMaxLoc(result,&minVal,&maxVal,&minLoc,&maxLoc);

            // add ball center at max loc
            cv::Point center(maxLoc.x + offset.x + m_ball_radius, maxLoc.y + offset.y + m_ball_radius);
            centers.push_back(center);

            // remove this max by drawing a black circle
            cv::circle(result,maxLoc,m_ball_radius,cv::Scalar(0,0,0),CV_FILLED);

            //cv::imshow("result",result);
            //cv::waitKey();
        }
    }
    // numBalls == 15 -- Special case for the "break"
    else
    {
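        // shrink the rack so its outline is roughly the triangle through the corner balls
        cv::erode(roi,roi,cv::Mat(),cv::Point(-1,-1),(int)m_ball_radius/2);
        cv::findContours(roi.clone(), contours, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_NONE);

        // nothing survived the erode, so there is no rack to fit
        if(contours.empty())
        {
            return;
        }

        std::vector<cv::Point> polygon;
        double precision = 1.0;

        // loosen the approximation until the outline collapses to a triangle
        while(polygon.size() != 3)
        {
            //std::cout << "precision" << precision << " num polygon vertices" << polygon.size() << std::endl;
            cv::approxPolyDP(contours[0], polygon,precision,true);
            precision += 0.1;
        }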

        // add offset to polygon points
        for(int k = 0; k < 3; k++)
        {
            polygon[k] += offset;
        }

        cv::polylines(blobs,polygon,true,cv::Scalar(0,255,0),3);

        // interpolate ball centers
        centers.push_back(polygon[0]);
        centers.push_back(polygon[0] - 0.25*(polygon[0] - polygon[1]) - 0.25*(polygon[0] - polygon[2]));

        centers.push_back(polygon[0] + 0.7 * (polygon[1] - polygon[0]));
        centers.push_back(polygon[0] + 0.5 * (polygon[1] - polygon[0]));
        centers.push_back(polygon[0] + 0.3 * (polygon[1] - polygon[0]));

        centers.push_back(polygon[1]);
        centers.push_back(polygon[1] - 0.25*(polygon[1] - polygon[2]) - 0.25*(polygon[1] - polygon[0]));

        centers.push_back(polygon[1] + 0.7 * (polygon[2] - polygon[1]));
        centers.push_back(polygon[1] + 0.5 * (polygon[2] - polygon[1]));
        centers.push_back(polygon[1] + 0.3 * (polygon[2] - polygon[1]));

        centers.push_back(polygon[2]);
        centers.push_back(polygon[2] - 0.25*(polygon[2] - polygon[1]) - 0.25*(polygon[2] - polygon[0]));

        centers.push_back(polygon[2] + 0.7 * (polygon[0] - polygon[2]));
        centers.push_back(polygon[2] + 0.5 * (polygon[0] - polygon[2]));
        centers.push_back(polygon[2] + 0.3 * (polygon[0] - polygon[2]));
    }

    return;
}

The biggest change I made was to re-do the ball identification system. The existing one was not working great and the code was very messy, which I didn’t like. I wanted to make it cleaner and easier to maintain, and I wanted it to use the OpenCV machine learning module. I gathered a bunch of training and test data and started playing around with it. I found great success with CvBoost classifiers. The new system is a bit different as it employs different classifiers for different purposes. First a classifier picks the cue ball from all the other balls (1 vs all) and another does the same for the 8 ball. Another classifier classifies the remaining balls as either stripes or solids, before finally 2 separate classifiers assign actual numbers. This is great because each level is a fail-safe. Our primary concern is finding the cue ball properly, next is the 8 ball. The actual number is the least important, but it is good to know if the ball is a stripe or a solid.
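
The layered structure is easier to see in code. This is a minimal sketch of the decision order only, not the ARPool implementation: the trained boosted classifiers are replaced by plain `std::function` callbacks, and `Features`, `BallClassifiers`, `BallId` and `identifyBall` are hypothetical names.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical per-ball feature vector (e.g. colour statistics of the blob).
typedef std::vector<float> Features;

// One callback per classifier level; in practice each would wrap a trained model.
struct BallClassifiers
{
    std::function<bool(const Features &)> isCueBall;   // cue ball vs all others
    std::function<bool(const Features &)> isEightBall; // 8 ball vs all others
    std::function<bool(const Features &)> isStripe;    // stripe vs solid
    std::function<int(const Features &)> solidNumber;  // number for a solid
    std::function<int(const Features &)> stripeNumber; // number for a stripe
};

struct BallId
{
    std::string kind; // "cue", "eight", "stripe" or "solid"
    int number;       // 0 is used for the cue ball
};

// Ask the most important questions first, so a mistake at a later level
// can never turn the cue ball or the 8 ball into something else.
BallId identifyBall(const BallClassifiers &c, const Features &f)
{
    if (c.isCueBall(f))
        return BallId{"cue", 0};
    if (c.isEightBall(f))
        return BallId{"eight", 8};
    if (c.isStripe(f))
        return BallId{"stripe", c.stripeNumber(f)};
    return BallId{"solid", c.solidNumber(f)};
}
```

Even if the number classifiers get it wrong, the `kind` field is still decided by the earlier, more reliable levels, which is the fail-safe behaviour described above.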

With ARRecognition in the best shape it’s ever been in, I started investigating the graphics code and why our drawImage function was not working properly any more. I wasted the whole first afternoon just trying to find a ready-made solution online. It was one of those things where I just didn’t feel like diving in, and as a result I got nowhere. The next day, though, I had motivation again and restarted the graphics code from scratch, adding back one function at a time. This turned out to be a really good thing to do because now I understand the graphics code. I was also able to consolidate all the OpenGL code from about 4 different .cpp files into a single file, so now all the drawing code is in one place. Brilliant! I added the same documentation as I had for ARRecognition. Man, this project is starting to look very nice!

All of this work has been very satisfying for me, but it makes me a bit sad that no one except me truly appreciates all the vast improvements I have made. The system looks the same when it is running, but the code behind it is so much more reliable and easier to work with. I’m sure it will pay off in London!
