Friday, April 22, 2011

Slow Predator?

I currently have both OpenCV's Camshift tracker and OpenTLD (nicknamed "Predator") running on my in-house grocery footage. The Camshift tracker runs in real time but is imprecise because it relies only on color information, since it was primarily designed to track faces. By changing a few parameters I was easily able to adapt it to track hands. One major problem, however, is that there is no simultaneous learning. When another object with roughly the same color palette (any other mass of skin, for example) obscures the object being tracked, Camshift can start tracking the new object in the foreground. For the same reason, Camshift isn't able to "remember" the object and re-recognize it if it disappears from view for a few frames.

Using OpenTLD instead would offer better recovery from such situations. Unfortunately, TLD's performance is pretty dismal on my footage because the processing frame rate is so low. Some Googling informed me that others had similar problems, and the slowdown seems to be partially attributable to the fact that half the tracker is still written in Matlab, as it hasn't been fully translated to C yet.

Since translating the tracker to C myself isn't feasible in the given time frame, I tried lowering the resolution to make my video resemble the demo input as closely as possible. I haven't achieved real-time detection yet, and even if I did, the real test would be to get the tracker working on live data. After playing around with the canned data I was able to get it to operate at a higher frame rate, albeit with many tweaks and cheats:

(1) I reduced the resolution by 50%
(2) I removed every other frame (my footage was collected at 30 fps so it is now reduced to 15 fps)
(3) I turned off the learning component of the tracker once some initial learning had been completed. The frame rate jumped up, and the tracker kept the accuracy it had gained from that learning.
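
The three tweaks above can be sketched roughly as follows. This is only an illustration in plain Python, not the code I actually ran: frames are stood in for by nested lists of pixel values, "50%" is assumed to mean halving each dimension by simple subsampling (no smoothing), and the names halve_resolution, preprocess, learning_enabled, and feedback_interval_ms are made up for this example. None of them are part of OpenTLD or OpenCV.

```python
from typing import Iterable, Iterator, List, Tuple

# A grayscale frame as rows of pixel values (a stand-in for a real image array).
Frame = List[List[int]]


def halve_resolution(frame: Frame) -> Frame:
    """Tweak (1): keep every second row and column, halving width and height."""
    return [row[::2] for row in frame[::2]]


def preprocess(frames: Iterable[Frame], frame_step: int = 2) -> Iterator[Tuple[int, Frame]]:
    """Tweaks (1) and (2): yield (original_index, downscaled_frame) for every
    frame_step-th frame, so frame_step=2 turns 30 fps footage into 15 fps."""
    for index, frame in enumerate(frames):
        if index % frame_step == 0:
            yield index, halve_resolution(frame)


def learning_enabled(processed_count: int, learning_budget: int) -> bool:
    """Tweak (3): learn only during the first learning_budget processed frames.
    The budget is a hypothetical knob; the right value depends on the footage."""
    return processed_count < learning_budget


def feedback_interval_ms(source_fps: float, frame_step: int) -> float:
    """Time between tracker updates, ignoring processing time."""
    return 1000.0 * frame_step / source_fps
```

With 30 fps footage and frame_step=2, the tracker sees a new frame about every 67 ms at best, which is the lower bound on how often feedback could reach the user.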

Since the GroZi user is cooperating with the system and not trying to "lose" the tracker, it may be acceptable to use less data. However, this must be kept in check by the need to deliver feedback to the user frequently in order to facilitate smooth guidance.

The following video demonstrates the ability of TLD to improve its tracking after some initial learning. The first time through the sequence, the fruit bars are tracked pretty well--until the hand occludes them. By the second time through the footage, however, the recovery after the occlusion is improved. During the third time through the sequence I turned off the learning, and the accuracy gain attributable to the learning portion of the tracker is maintained while the frame rate increases.


My next step is to try to harness TLD's recovery power while minimizing the time it spends learning, so that the frame rate stays acceptable. One approach could be to do some of the learning offline, but that pretty much eliminates the part of Predator that sets it apart from other trackers.
