We present a computational model, based on the heat conduction equation, that accounts well for human performance in depth interpolation. The model assumes that depth information is represented locally and that spatial integration is achieved through iterative interaction between neighboring units. It reconstructs a dynamically transforming surface that agrees well with the results of psychophysical experiments on the depth perception of an untextured (uniformly colored) surface moving in depth. The model also explains the temporal-frequency characteristics of human perception. We conclude that local ambiguity, which is common in everyday visual scenes, is resolved by an interpolation mechanism based on iterative local interaction among locally represented visual information.
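The iterative local interaction described above can be illustrated with a minimal sketch. It assumes a one-dimensional depth profile, an explicit finite-difference form of the heat conduction equation, and depth values that are fixed at the surface edges and unknown in the untextured interior. The function name `diffuse_depth` and the parameters `alpha` and `steps` are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
def diffuse_depth(depth, known, alpha=0.25, steps=2000):
    """Interpolate a 1-D depth profile by iterating a discrete heat equation.

    depth : initial depth values (entries at unknown positions are starting guesses)
    known : booleans; True where depth is given and must stay fixed
    alpha : diffusion step size (assumed; must be <= 0.5 for stability)
    steps : number of iterations (assumed)
    The two end points are never updated, so they act as fixed boundaries.
    """
    d = list(depth)
    for _ in range(steps):
        nxt = d[:]
        for i in range(1, len(d) - 1):
            if not known[i]:
                # Each unit interacts only with its two immediate neighbors.
                nxt[i] = d[i] + alpha * (d[i - 1] - 2.0 * d[i] + d[i + 1])
        d = nxt
    return d


if __name__ == "__main__":
    # Depth is defined only at the two edges of the untextured surface.
    initial = [0.0, 0.0, 0.0, 0.0, 1.0]
    fixed = [True, False, False, False, True]
    print(diffuse_depth(initial, fixed, steps=1))   # interior lags behind the edge
    print(diffuse_depth(initial, fixed))            # approaches a linear ramp
```

In this sketch the interior values approach a linear ramp between the two edges after many iterations, while after only a few iterations the interior still lags behind a changed edge value. That lag is one way a scheme of this kind could produce a dependence on temporal frequency; whether it matches the model's actual dynamics is an assumption.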