Tesla needs to do a better job predicting pedestrian intent.
Diffusion seems to be more compute-efficient than transformers for vision.
From 2019:
Develop an effective variant for estimating
visibility statuses of objects while tracking them in videos. Dealing
with partial or full occlusions is a long standing problem in
computer vision but largely remains unsolved. In this work,
we cast the above problem as a Markov Decision Process
and develop a policy-based jump-diffusion method to jointly
track object locations in videos and estimate their visibility
statuses. Our method employs a set of jump dynamics to change
object’s visibility statuses and a set of diffusion dynamics to
track objects in videos. Different from traditional jump-diffusion
process that stochastically generates dynamics, we utilize deep
policy functions to determine the best dynamic for the present
state and learn the optimal policies using reinforcement learning
methods. Our method is capable of tracking objects with full or
partial occlusions in crowded scenes.