I have finished the first working version of a system to autonomously gather data. This system is built on two policies that each resets the other's environment allowing for a closed loop that gathers data without intervention. A separate YOLO vision model verifies the success or failure of each attempted episode to ensure all data collected is valid. I am now gathering tons of data from this system and experimenting on training policies partially and fully on this autonomously gathered data. I also plan on extending this system to other tasks as currently the task (taking an object in and out of a box) is not a useful or difficult task, and was chosen as a first test for ease of implementation.