Sunday, May 1, 2011

Plan B

My next step was going to be creating a C++ wrapper for OpenTLD by compiling it and generating a shared library with the Matlab compiler, mcc. The standard way to call C/C++ functions from Matlab is via mex files, which is what OpenTLD does. I incorrectly assumed that mcc came with my student version of Matlab. After prepping the environment, I was disappointed (understatement) to find out that mcc is not only unavailable with the current student version of Matlab but actually incompatible with it. Since Matlab is not designed for multithreading, I will continue looking for a pre-compiled C++ wrapper, but in the meantime I've decided to interleave the two trackers within a single thread: each iteration will update both trackers on the same frame and then redraw both bounding boxes on that common frame.
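
Here's a minimal sketch of what that interleaving loop could look like. The Tracker interface and the stub classes are hypothetical stand-ins (not real OpenCV or OpenTLD APIs); the actual Camshift and TLD updates would live behind them:

#include <opencv2/opencv.hpp>

// Hypothetical common interface; the real Camshift and TLD wrappers
// would implement update() and return their current bounding box.
struct Tracker {
    virtual cv::Rect update(const cv::Mat& frame) = 0;
    virtual ~Tracker() {}
};

// Stubs standing in for the two real trackers.
struct CamshiftStub : Tracker {
    cv::Rect update(const cv::Mat&) { return cv::Rect(100, 100, 60, 60); }
};
struct TldStub : Tracker {
    cv::Rect update(const cv::Mat&) { return cv::Rect(300, 120, 80, 50); }
};

int main() {
    cv::VideoCapture cap("grocery_footage.avi");  // hypothetical filename
    CamshiftStub hand;     // hand tracker
    TldStub product;       // product tracker

    cv::Mat frame;
    while (cap.read(frame)) {
        // Interleave: update both trackers on the same frame, in turn,
        // instead of running them in separate threads.
        cv::Rect handBox = hand.update(frame);
        cv::Rect productBox = product.update(frame);

        // Redraw both bounding boxes on the common frame.
        cv::rectangle(frame, handBox, cv::Scalar(0, 255, 0), 2);
        cv::rectangle(frame, productBox, cv::Scalar(0, 0, 255), 2);
        cv::imshow("trackers", frame);
        if (cv::waitKey(1) == 27) break;  // Esc quits
    }
    return 0;
}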

In case I have access to mcc in the future, creating C/C++ wrappers for Matlab functions would be pretty cool. Here's an example of how to do it:
http://www.codeproject.com/KB/DLL/MatlabSharedLib.aspx
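
For future reference, here's roughly what the C++ side could look like if I ever do compile part of OpenTLD with mcc. tldStep.m, libtld, and the scalar input are made-up placeholders; the initialize/terminate calls and the mlf naming follow the usual mcc conventions:

// Build step (run once, in Matlab or a shell):
//   mcc -W lib:libtld -T link:lib tldStep.m
#include "libtld.h"  // header generated by mcc

int main() {
    // Start the Matlab Compiler Runtime, then the generated library.
    if (!mclInitializeApplication(NULL, 0)) return 1;
    if (!libtldInitialize()) return 1;

    mxArray* out = NULL;
    mxArray* in = mxCreateDoubleScalar(42.0);  // placeholder input
    mlfTldStep(1, &out, in);  // generated entry point: 1 output, 1 input

    // ... use `out` here ...
    mxDestroyArray(in);
    mxDestroyArray(out);

    libtldTerminate();
    mclTerminateApplication();
    return 0;
}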

Friday, April 29, 2011

New scenarios

Here are some new scenarios I filmed at a very popular market that begins with the letter 'R':

- Crispy Fox
- Jello
- Snuggle
- Refrigerated
- Pop-Tarts

[video clips for each scenario were embedded here]

I am compiling a library of scenarios in order to gain some intuition about what kinds of situations may require tweaking the trackers to make them perform better. My next step is to compile the MATLAB portion of OpenTLD into a library and use it in a C++ application that will instantiate two trackers and integrate the data harvested from each in order to provide feedback to the user.
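
As a toy sketch of that integration step, here's one way feedback could be derived once each tracker reports a bounding box. The box values and the left/right, up/down mapping are made up for illustration:

#include <opencv2/core/core.hpp>
#include <cstdio>

// Direction from the hand toward the product, in image coordinates.
static cv::Point2f guidance(const cv::Rect& hand, const cv::Rect& product) {
    cv::Point2f handCenter(hand.x + hand.width * 0.5f,
                           hand.y + hand.height * 0.5f);
    cv::Point2f productCenter(product.x + product.width * 0.5f,
                              product.y + product.height * 0.5f);
    return productCenter - handCenter;
}

int main() {
    cv::Rect hand(100, 200, 60, 60);     // would come from the Camshift tracker
    cv::Rect product(300, 180, 80, 40);  // would come from the TLD tracker
    cv::Point2f dir = guidance(hand, product);
    std::printf("move %s and %s\n",
                dir.x > 0 ? "right" : "left",
                dir.y > 0 ? "down" : "up");
    return 0;
}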


Friday, April 22, 2011

Slow Predator?

I currently have both OpenCV's Camshift tracker and OpenTLD (nicknamed "Predator") running on my in-house grocery footage. The Camshift tracker runs in real time but is imprecise because it is based only on colorspace information, since it was primarily designed to track faces. By changing some parameters it was easily adapted to track hands. One major problem, however, is that since there is no simultaneous learning, when another object (any other mass of skin) obscures the one currently being tracked and the color palette is roughly the same, the tracker can start following the new object in the foreground. For the same reason, Camshift isn't able to "remember" the object and re-recognize it if it disappears from view for a few frames. Using OpenTLD instead would offer better recovery from such situations.

Unfortunately, TLD's performance is pretty dismal when run on my footage because the processing frame rate is so low. Some Googling informed me that others have had similar problems, and the slowdown seems to be partially attributable to the fact that half the tracker is still written in Matlab, as it hasn't been fully translated to C yet. Since translating the tracker to C myself isn't feasible in the given time frame, I tried lowering the resolution to make my video resemble the demo input as much as possible. I haven't achieved real-time detection yet, and even if I had, the real test would be to get the tracker working on live data. After playing around with the canned data I was able to get it to operate at a higher frame rate, albeit with many tweaks and cheats:

(1) I reduced the resolution by 50% (see the sketch after this list)
(2) I removed every other frame (my footage was collected at 30 fps, so it is now reduced to 15 fps)
(3) I turned off the learning aspect of the tracker after some initial learning had completed; the frame rate jumped up while the accuracy it had accrued from the learning was maintained.
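
The first two tweaks are easy to do as a preprocessing step in OpenCV. A minimal sketch, assuming the footage lives in a file (the filename is made up):

#include <opencv2/opencv.hpp>

int main() {
    cv::VideoCapture cap("grocery_footage.avi");  // hypothetical filename
    cv::Mat frame, half;
    int n = 0;
    while (cap.read(frame)) {
        if (n++ % 2 != 0) continue;                     // keep every other frame: 30 -> 15 fps
        cv::resize(frame, half, cv::Size(), 0.5, 0.5);  // 50% resolution
        // ... hand `half` to the tracker here ...
        cv::imshow("downsampled", half);
        if (cv::waitKey(1) == 27) break;  // Esc quits
    }
    return 0;
}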

Since the GroZi user is cooperating with the system and not trying to "lose" the tracker, it may be acceptable to use less data. However, this must be balanced against the need to deliver feedback to the user frequently enough to facilitate smooth guidance.

The following video demonstrates the ability of TLD to improve its tracking after some initial learning. The first time through the sequence, the fruit bars are tracked pretty well--until the hand occludes them. By the second time through the footage, however, the recovery after the occlusion is improved. During the third time through the sequence I turned off the learning, and the accuracy gain attributable to the learning portion of the tracker is maintained while the frame rate increases.


My next step is to try to harness this recovery power of TLD while minimizing the time it takes to learn in order to ensure good performance. One approach could be to try to do some of the learning offline, but that pretty much eliminates the portion of Predator that sets it apart from other trackers.

Some notes

Compilation note: 

While trying to get Matlab to work with the OpenCV library in order to run OpenTLD, I encountered a problem: Matlab wasn't able to find g++'s standard library when it tried executing the mex files. This is due to an incorrect symbolic link: I'm using a newer version of g++ than Matlab R2010a expected, so Matlab was looking for the wrong libstdc++ version. To fix this I created a new symbolic link so that Matlab could find the correct library:

cd ~/Matlab/sys/os/glnx86
mv libstdc++.so.6 libstdc++.so.6.bak   # back up Matlab's stale link first
ln -s /usr/lib/libstdc++.so.6.0.13 libstdc++.so.6

Notes about tools:

Avidemux, a video editing utility for Ubuntu, has proven very useful in dealing with my raw footage, in particular for...
(1) converting movies into a series of jpgs (file -> save -> save selection as JPEG images)
(2) adjusting the resolution to more closely mimic the demo footage conditions. My movies were processing at less than half the speed of the demo (about 2-4 fps instead of 6-7 fps). To test whether I could attribute this to the disparity in resolutions between the two movies, I reduced a few of my sample videos to 50% of their original resolution. To do this in Avidemux: open the file you wish to convert; in the left toolbar under "Video" select MPEG-4 ASP; click "Filters" and add MPlayerResize, setting the new size to 50% of the original; click "Done", then "Configure"; for encoding mode select "two pass - average bitrate" and set kb/sec to 1050; then save. Done.

RecordMyDesktop is a serviceable utility for recording within a designated window (other tools were too slow and seemed to affect how fast the tracker ran because they used too many resources). Even RMD may be slowing the processing down, but if it is, it's not terribly noticeable.

Even though it ended up not being useful for this project, I played around with a tool for converting a large number of image files to other formats and came across ImageMagick. Pretty cool.

Monday, April 11, 2011

Camshift vs Predator

I got OpenCV to build correctly on Friday. I immediately tried out the camshiftdemo and it worked like a charm! Sort of. Some problems I encountered right away:

(1) the default parameters were incorrect for the lighting of the room, which I had set up to approximate that of a grocery store.
(2) when I moved out of the field of view and returned, the tracker was not able to re-recognize my face.
(3) this tracker would only be appropriate for the hand portion of the tracking because it is dependent on colorspace information.
(4) there may be issues with using this as a hand tracker because the arm will be included, and since an ellipse is usually fitted to the face while tracking, it may not be directly adaptable to the arm.

If I do use a camshift tracker for tracking the hand, I may try to fit the ellipse to the hand alone and figure out how to ignore the arm.
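
For reference, here's a condensed version of what the camshift loop looks like in the OpenCV C++ API, adapted for a hand: build a hue histogram from an initial window over the hand, then back-project and CamShift each frame. The initial window location and the saturation/value thresholds below are guesses that would need tuning:

#include <opencv2/opencv.hpp>

int main() {
    cv::VideoCapture cap(0);  // webcam, as in the demo
    cv::Mat frame, hsv, mask, backproj, hist;

    int histSize = 16;                 // hue bins
    float hueRange[] = {0, 180};
    const float* ranges[] = {hueRange};
    int channels[] = {0};              // hue channel only

    cv::Rect track(200, 150, 80, 80);  // guess: hand starts here
    bool modelBuilt = false;

    while (cap.read(frame)) {
        cv::cvtColor(frame, hsv, cv::COLOR_BGR2HSV);
        // Mask out dark / washed-out pixels where hue is unreliable.
        cv::inRange(hsv, cv::Scalar(0, 60, 32), cv::Scalar(180, 255, 255), mask);

        if (!modelBuilt) {
            // Build the skin-hue histogram from the initial window.
            cv::Mat roi(hsv, track), maskRoi(mask, track);
            cv::calcHist(&roi, 1, channels, maskRoi, hist, 1, &histSize, ranges);
            cv::normalize(hist, hist, 0, 255, cv::NORM_MINMAX);
            modelBuilt = true;
        }

        cv::calcBackProject(&hsv, 1, channels, hist, backproj, ranges);
        backproj &= mask;
        cv::RotatedRect box = cv::CamShift(backproj, track,
            cv::TermCriteria(cv::TermCriteria::EPS | cv::TermCriteria::COUNT, 10, 1));

        cv::ellipse(frame, box, cv::Scalar(0, 255, 0), 2);
        cv::imshow("camshift", frame);
        if (cv::waitKey(30) == 27) break;  // Esc quits
    }
    return 0;
}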

I wanted to use the markerless AR tracking demonstrated by Taehee Lee here, but it looks like I won't have time to use this approach for determining the coordinate system. Not only is it relatively involved, it would also require quite a bit of revision to work for my needs: the computations rely heavily on "seeing" all five fingers at once, which isn't realistic when the data comes from someone reaching naturally for an object (many of their fingers, if not all of them, will be obscured by the hand itself).

I have acquired the Predator code for fast, scale-invariant, and very adaptable tracking of objects and may try to use it for tracking grocery products. This would be ideal because
(1) it does not rely on colorspace information like camshift does
(2) if the product goes out of the field of view and then returns, Predator will recover just fine--this would be helpful in the case of occlusion (if a person walks by while they are searching for the product or if they simply move too far away from a product).

I have been trying to figure out how to use git to access this code. Hopefully I'll figure that out in the next few days.
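
If the code is hosted on GitHub (the URL below is my guess at the repository), it should be as simple as:

git clone https://github.com/zk00006/OpenTLD.git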

I have downloaded a very simple camshift wrapper to get started on the hand tracking (here). There was a useful explanation of how the tracker works and how to use it here. The demo works with captured footage from a webcam, so to get a quick jumpstart I should be able to adapt it pretty easily by just capturing from an .avi file (one of the test movies I've taken) instead. I have this coded but am having trouble compiling with pkg-config because the newly introduced .h files still aren't found... hopefully this gets resolved soon.
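
For reference, the capture swap is one line in the OpenCV C API (the filename below is a placeholder):

// In the demo the capture comes from the webcam:
//   CvCapture* capture = cvCaptureFromCAM(0);
// For canned footage, point it at a file instead:
CvCapture* capture = cvCaptureFromAVI("shelf_test.avi");

And compiling against OpenCV with pkg-config looks like this; any of the wrapper's own headers also need their include directory passed with -I:

g++ camshift_demo.cpp -o camshift_demo `pkg-config --cflags --libs opencv`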

next steps:
(1) get a git repository and unpack the Predator code!
(2) get camshift running on my home movies
(3) record new movies in which
- my hand is not obscured by a target (since camshift relies on colorspace data)
- the background is not a green whiteboard, but a real product on a shelf
- still no occlusion, not much clutter

(4) get the Predator code running on the same movies, but not yet at the same time
(5) figure out how to integrate the two trackers, each tracking their respective targets in the same movie!

Friday, April 1, 2011

Captured Data

While I'm figuring out what machine + environment I'm going to end up working with, I decided to gather some preliminary data, as the first baby steps are going to be performed offline with canned data. I am almost definitely going to change this around but wanted to start with something I could tweak. I nailed a green whiteboard to a white wall and placed a red block with a black "X" on the back of my hand. Some aspects of these videos that I noticed right away and that will require thought are (a) standardizing the time of day I take the movies so the lighting will be constant(ish), (b) how far away my hand is from the whiteboard (depth), and (c) whether my camera's field of view is wide enough (I'm currently using a 10-megapixel Canon PowerShot... this may have to change).

Sunday, March 27, 2011

Uh-oh

Hmm, I seem to have doomed myself from the start: I am running Eclipse in Ubuntu in a VM on Windows, and it appears that this setup will be too slow for me to be productive in. The laptop I'm currently using is not mine, so I don't have the option to dual boot. I'll need to figure something else out.