Currently when receiving video over RTP we store only
a calculated samples on the frame. When starting the video
it can take some time for this calculation to actually yield
a value as it requires constant changing timestamps. As well
if a video frame passes over multiple RTP packets this calculation
will fail as the timestamp is the same as the previous RTP
packet and the number of samples calculated will be 0.
This change preserves the timestamp on the frame and allows
it to pass through the core. When sending the video this timestamp
is used instead of a new one being calculated.