Hey, thank you so much for your kind words. Yes, your understanding of the concept is correct.
You can try transfer learning models such as VGG-16 or ResNet50. However, you will need to resize the inputs first, since the images in our dataset have shape (48, 48, 1). These models were trained on the ImageNet dataset with RGB inputs of shape (224, 224, 3). To use them, you will have to either stack the single grayscale channel three times or convert the images to RGB, and then scale them up to the expected size. Unfortunately, I could not achieve very good results with these transfer learning models on the emotions dataset, but you can still experiment with them. In the 2nd part of this series, I trained the gestures model using VGG-16 and achieved very good accuracy.
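As a rough illustration of the channel stacking and resizing described above, here is a minimal NumPy-only sketch. It assumes the images are already a batch array of shape (N, 48, 48, 1). The helper name gray_to_rgb_resized is hypothetical, and the simple nearest-neighbour resize is an assumption chosen to keep the example dependency-free. In a Keras pipeline you would more likely use tf.image.resize (bilinear by default).

```python
import numpy as np


def gray_to_rgb_resized(images, size=(224, 224)):
    """Hypothetical helper: (N, 48, 48, 1) grayscale -> (N, H, W, 3) RGB.

    Assumption: a simple nearest-neighbour (floor) index mapping is used
    for resizing; bilinear interpolation is a common alternative.
    """
    images = np.asarray(images)
    n, h, w, c = images.shape
    if c != 1:
        raise ValueError(f"expected 1 channel, got {c}")
    out_h, out_w = size
    # Map every output row/column back to a source row/column.
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    # Advanced indexing on axes 1 and 2 gives shape (N, out_h, out_w, 1).
    resized = images[:, rows[:, None], cols[None, :], :]
    # Repeat the single gray channel 3 times to get identical R, G, B channels.
    return np.repeat(resized, 3, axis=-1)


if __name__ == "__main__":
    batch = np.random.rand(8, 48, 48, 1).astype("float32")
    rgb = gray_to_rgb_resized(batch)
    print(rgb.shape)  # (8, 224, 224, 3)
```

Keep in mind that the pretrained Keras models also expect their own input preprocessing, for example tf.keras.applications.vgg16.preprocess_input for VGG-16. You would apply it after this conversion.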
I ran the code on a GTX 1660, and I think it took around 30 minutes.
Thank you.