Why is feed_dict constructed when running an epoch in the PTB tutorial on TensorFlow?

Question

Q1: I am following this tutorial on Recurrent Neural Networks, and I am wondering why you need to create feed_dict in the following part of the code:

def run_epoch(session, model, eval_op=None, verbose=False):
  state = session.run(model.initial_state)

  fetches = {
      "cost": model.cost,
      "final_state": model.final_state,
  }
  if eval_op is not None:
    fetches["eval_op"] = eval_op

  for step in range(model.input.epoch_size):
    feed_dict = {}
    for i, (c, h) in enumerate(model.initial_state):
      feed_dict[c] = state[i].c
      feed_dict[h] = state[i].h

    vals = session.run(fetches, feed_dict)

I tested it, and it seems that the code also runs if you remove this part:

def run_epoch(session, model, eval_op=None, verbose=False):

  fetches = {
      "cost": model.cost,
      "final_state": model.final_state,
  }
  if eval_op is not None:
    fetches["eval_op"] = eval_op

  for step in range(model.input.epoch_size):
    vals = session.run(fetches)

So my question is: why do you need to reset the initial state to zeros after you feed a new batch of data?

Q2: Also, from what I understand, using feed_dict is considered slow, which is why it is recommended to feed data using the tf.data APIs. Is using feed_dict also an issue in this case? If so, how can feed_dict be avoided in this example?

UPD: Thanks a lot, @jdehesa, for your detailed response. It helps a lot! Before I close this question and accept your answer, could you clarify one point you mentioned in your answer to Q1?

I now see the purpose of feed_dict. However, I am not sure that it is actually implemented in the tutorial. You say:

"At the beginning of each epoch, the code first takes the default 'zero state' and then goes on to a loop where the current state is given as initial, the model is run and the output state is set as new current state for the next iteration."

I just looked at the source code of the tutorial again, and I do not see where the output state is set as the new current state for the next iteration. Is it done implicitly somewhere, or am I missing something?

I may also be missing something on the theoretical side. Just to make sure that I understand it correctly, here is a quick example. Assume the input data is an array that stores integer values from 0 to 120. We set the batch size to 5, the number of data points in one batch to 24, and the number of time steps in the unrolled RNN to 10. In this case, you only use the data points at time points from 0 to 20. Then you process the data in two steps (model.input.epoch_size = 2). When you iterate over model.input.epoch_size:

state = session.run(model.initial_state)
# ...
for step in range(model.input.epoch_size):
  feed_dict = {}
  for i, (c, h) in enumerate(model.initial_state):
    feed_dict[c] = state[i].c
    feed_dict[h] = state[i].h

  vals = session.run(fetches, feed_dict)

you feed a batch of data like this:

> Iteration (step) 1:
x:
 [[  0   1   2   3   4   5   6   7   8   9]
 [ 24  25  26  27  28  29  30  31  32  33]
 [ 48  49  50  51  52  53  54  55  56  57]
 [ 72  73  74  75  76  77  78  79  80  81]
 [ 96  97  98  99 100 101 102 103 104 105]]
y:
 [[  1   2   3   4   5   6   7   8   9  10]
 [ 25  26  27  28  29  30  31  32  33  34]
 [ 49  50  51  52  53  54  55  56  57  58]
 [ 73  74  75  76  77  78  79  80  81  82]
 [ 97  98  99 100 101 102 103 104 105 106]]
At each iteration, you construct a new feed_dict with the initial state of the recurrent units set to zero. So at each step you assume that you start processing the sequence from scratch. Is that correct?
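For reference, the batches in this example can be reproduced with a small pure-Python sketch. It assumes the slicing scheme described above (rows of length len(data) // batch_size, windows of num_steps, targets shifted by one position); ptb_style_batches is a hypothetical helper name used only for illustration and is not part of the tutorial.

```python
def ptb_style_batches(raw_data, batch_size, num_steps):
    """Yield (x, y) pairs laid out like the example in the question.

    Assumption: the data is cut into batch_size contiguous rows, and each
    step takes a window of num_steps columns, with y shifted by one.
    """
    batch_len = len(raw_data) // batch_size
    rows = [raw_data[i * batch_len:(i + 1) * batch_len]
            for i in range(batch_size)]
    # One column is lost because the targets are shifted by one position.
    epoch_size = (batch_len - 1) // num_steps
    for step in range(epoch_size):
        start = step * num_steps
        x = [row[start:start + num_steps] for row in rows]
        y = [row[start + 1:start + num_steps + 1] for row in rows]
        yield x, y


batches = list(ptb_style_batches(list(range(121)), batch_size=5, num_steps=10))
print(len(batches))  # 2, matching model.input.epoch_size = 2

first_x, first_y = batches[0]
print(first_x[0])    # first row of x in step 1
print(first_y[4])    # last row of y in step 1

second_x, _ = batches[1]
print(second_x[0])   # first row of x in step 2 continues where step 1 ended
```

Note that each row of the second step continues the same row of the first step, which is why the state at the end of one step is a meaningful starting point for the next.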

Priyaj · Answer

Q1. feed_dict is used in this case to set the initial state of the recurrent units. By default, on each call to run, recurrent units process data starting from an initial "zero" state. However, if your sequences are long, you may need to split them into several steps. It is important that, after each step, you save the final state of the recurrent units and feed it in as the initial state for the next step; otherwise it would be as if the next step were the beginning of the sequence again (in particular, if your output is only the final output of the network after processing the whole sequence, it would be like discarding all the data prior to the last step). At the beginning of each epoch, the code first takes the default "zero" state and then enters a loop in which the current state is given as the initial state, the model is run, and the output state becomes the new current state for the next iteration.

Q2. The claim that "feed_dict is slow" can be somewhat misleading when taken as a general truism (I am not blaming you for saying it; I have seen it many times too). The issue with feed_dict is that its job is to bring non-TensorFlow data (typically NumPy data) into the TensorFlow world. It is not that it is terrible at that; it just takes some extra time to move the data around, which is especially noticeable when a lot of data is involved. For example, if you want to input a batch of images through feed_dict, you need to load them from disk, decode them, convert them to a big NumPy array and pass it into feed_dict; TensorFlow would then copy all the data into the session (GPU memory or whatever), so you would have two copies of the data in memory and additional memory exchanges. tf.data helps because it does everything within TensorFlow (which also reduces the number of Python/C round trips and is sometimes more convenient in general).

In your case, what is being fed through feed_dict are the initial states of the recurrent units. Unless you have several quite big recurrent layers, I'd say the performance impact is probably rather small. It is possible, though, to avoid feed_dict in this case too: you would need a set of TensorFlow variables holding the current state, set up the recurrent units to use those variables as their initial state (with the initial_state parameter of tf.nn.dynamic_rnn), and use their final state to update the variable values. Then, whenever a new set of sequences begins (at the start of each epoch in the tutorial), you would have to reset the variables to the "zero" state again. However, I would make sure that this is going to have a significant benefit before going down that route (e.g. measure the runtime with and without feed_dict, even though the results without it will be wrong).
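To make the point of Q1 concrete, here is a minimal pure-Python sketch of why the state has to be carried from one step to the next. It does not use TensorFlow: toy_rnn is a hypothetical stand-in for a recurrent layer (a decaying running sum), and this run_epoch is a simplified imitation of the tutorial's loop, not its actual code.

```python
def toy_rnn(inputs, initial_state=0.0):
    """Hypothetical stand-in for a recurrent layer: a decaying running sum."""
    state = initial_state
    for x in inputs:
        state = 0.5 * state + x
    return state


def run_epoch(sequence, num_steps, carry_state):
    """Process a long sequence in chunks of num_steps values."""
    state = 0.0  # the "zero" state taken once, at the start of the epoch
    for start in range(0, len(sequence), num_steps):
        chunk = sequence[start:start + num_steps]
        # Feeding the saved state plays the role of feed_dict; without it,
        # every chunk starts again from zeros.
        initial = state if carry_state else 0.0
        # The returned value plays the role of the fetched final state.
        state = toy_rnn(chunk, initial)
    return state


sequence = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
whole = toy_rnn(sequence)
carried = run_epoch(sequence, num_steps=2, carry_state=True)
reset = run_epoch(sequence, num_steps=2, carry_state=False)

print(whole == carried)                 # True: chunking changes nothing
print(reset == toy_rnn(sequence[-2:]))  # True: only the last chunk counts
```

With the state carried over, processing in chunks matches processing the whole sequence; with a reset on every step, everything before the last chunk is discarded. As far as I recall, the tutorial's run_epoch does the carrying with a line like state = vals["final_state"] right after the session.run call, which the snippet quoted in the question leaves out; please check this against the source.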
