To train a StyleGAN2 model from scratch, you would need at least tens of thousands of images to get a good result. With transfer learning, the amount is much smaller. However, for a general category like food, there's no pretrained model that I know of. There are a few ways I could go about this: separate the images into types of food and train on each, which would result in a different model per type, or combine all of them into one training set and hopefully get a single model that can generate all types of food.


The dataset was scraped from Reddit. I tried Instagram first but was blocked quickly since I wasn't using the Instagram API (and as far as I know, their API doesn't expose a way to fetch images anyway, just comments and such). Reddit, though, has an API I had access to, so I could scrape food subreddits without issue. The Reddit API didn't let me iterate through all posts from years back, so I decided to just scrape the current posts every day. This meant it took a long time before I finally had enough photos.
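The daily loop only needs two tricks: keep just the posts that link directly to an image, and remember post IDs across days so re-scraping the same listing doesn't download duplicates. A minimal sketch of that filtering step (the post dicts here stand in for whatever the Reddit API returns; `select_new_image_posts` and `seen_ids` are my own hypothetical names, not part of any Reddit client library):

```python
from urllib.parse import urlparse

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}

def select_new_image_posts(posts, seen_ids):
    """Keep posts that link directly to an image and weren't already
    saved by a previous day's scrape. Each post is a dict with at
    least an 'id' and a 'url' key; seen_ids is mutated in place."""
    fresh = []
    for post in posts:
        if post["id"] in seen_ids:
            continue  # already downloaded on an earlier day
        path = urlparse(post["url"]).path.lower()
        if any(path.endswith(ext) for ext in IMAGE_EXTS):
            fresh.append(post)
            seen_ids.add(post["id"])
    return fresh
```

In the real scraper the `posts` list would come from a subreddit's "new" listing each day; this just shows the dedup-by-post-ID step that makes scraping the same subreddits every day safe.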


After getting all the photos, I needed to make sure each one was valid: it must actually contain food and be relatively high quality, and preferably each crop should contain only one type of food. So the pipeline is: detect food in each photo (with YOLOv5), crop each detection into a square, and filter out crops that are too low resolution, maybe upsampling the survivors afterwards for better resolution. I haven't yet tried YOLOv5 on a category as general as "food", but I'm assuming it works well on distinctive items like burgers and ramen.
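The crop-and-filter step can be sketched as a pure function: take a detection box, expand it to a square centered on the detection, shift it back inside the image bounds, and reject it if the resulting side is below some minimum resolution. The function name and the 256-pixel threshold are my own assumptions, not from any library:

```python
def square_crop(box, img_w, img_h, min_side=256):
    """Expand a detection box (x1, y1, x2, y2) in pixel coordinates to a
    square centered on the detection, clamped to the image. Returns the
    square box, or None if the crop would be too low-resolution (or the
    square can't fit inside the image)."""
    x1, y1, x2, y2 = box
    side = max(x2 - x1, y2 - y1)
    if side < min_side or side > min(img_w, img_h):
        return None  # too small to keep, or bigger than the image allows
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    # center the square on the detection, then shift it back in-bounds
    left = min(max(cx - side / 2, 0), img_w - side)
    top = min(max(cy - side / 2, 0), img_h - side)
    return (int(left), int(top), int(left + side), int(top + side))
```

The returned box can then be fed straight to something like Pillow's `Image.crop`, and square crops mean no distortion when resizing everything to StyleGAN2's fixed input resolution.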


To detect food I used YOLOv5, and manually labeled a few thousand images for it to train on.
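For the manual labeling, YOLOv5 expects one `.txt` file per image, with one line per object in the form `class x_center y_center width height`, all normalized to [0, 1] (with a single "food" class, the class ID is just 0). A small helper for converting pixel-space boxes into that format (the function name is mine, but the label format itself is YOLOv5's):

```python
def to_yolo_label(box, img_w, img_h, class_id=0):
    """Convert a pixel-space box (x1, y1, x2, y2) into a YOLOv5 label
    line: 'class x_center y_center width height', normalized to [0, 1]."""
    x1, y1, x2, y2 = box
    xc = (x1 + x2) / 2 / img_w
    yc = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"
```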


The result is __ photos.