These two weeks, I took the object detection model and began finding ways to improve it. I probably trained a hundred different models this week, varying the learning rate schedule, the number of frozen layers in the base network, and the batch size. The metric I used to compare them is validation/test mAP (mean average precision). Within the first week, I improved this from last week’s ~0.1 baseline to ~0.5.
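To give a feel for how quickly the number of runs grows, here is a minimal sketch of a hyperparameter grid over the three knobs mentioned above. The names and values are hypothetical placeholders, not the ones from my actual experiments.

```python
from itertools import product

# Hypothetical search space; these values are placeholders for illustration.
LR_SCHEDULES = ["constant", "step_decay", "cosine"]
FROZEN_LAYERS = [0, 10, 20]
BATCH_SIZES = [8, 16, 32]


def build_configs():
    """Return one config dict per combination of hyperparameters."""
    return [
        {"lr_schedule": schedule, "frozen_layers": frozen, "batch_size": batch}
        for schedule, frozen, batch in product(
            LR_SCHEDULES, FROZEN_LAYERS, BATCH_SIZES
        )
    ]


if __name__ == "__main__":
    configs = build_configs()
    print(len(configs), "configurations")
    print(configs[0])
```

Three values per knob already gives 27 runs, so a hundred models is only a few such grids.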
I wrote my own scripts to automate training the model with various hyperparameters, split the data in a stratified way, store training summaries, and make pretty matplotlib plots. I learned a few tips about multiprocessing, CPU/GPU usage, and how my model utilizes those resources, so I can run multiple models in parallel depending on which resource is the bottleneck.
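For readers unfamiliar with stratified splitting, the idea is to split each class separately so that rare classes still show up in both sets. The sketch below is a simplified standard-library version under my own assumptions (samples are `(item, label)` pairs, and the function name and `val_fraction` default are made up for illustration); it is not the script I actually used.

```python
import random
from collections import defaultdict


def stratified_split(samples, val_fraction=0.2, seed=0):
    """Split (item, label) pairs so every label keeps roughly the same
    train/validation ratio."""
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append(item)

    rng = random.Random(seed)  # seeded so the split is reproducible
    train, val = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        if len(items) < 2:
            n_val = 0  # a single example can only go to training
        else:
            # At least one validation item, at least one training item.
            n_val = min(max(round(len(items) * val_fraction), 1), len(items) - 1)
        val.extend((item, label) for item in items[:n_val])
        train.extend((item, label) for item in items[n_val:])
    return train, val
```

With hundreds of classes and only a handful of images for some of them, a plain random split can easily leave a class out of the validation set entirely, which is the case this guards against.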
My main task the second week was retrieving, cleaning, and processing a new dataset of about 6000 images covering more than 400 kinds of logos. The creators of the dataset only gave me a list of URLs as annotated on MTurk. To give you an idea of how much dirty work there can be in machine learning: the dumb MTurk job organizer didn’t specify instructions well, and the resulting labels came in many aććentś and languages (reflecting the diverse backgrounds of MTurk annotators), so the dataset creators (fortunately not me) had to write a block of 400+ string-parsing if-statements to sort the annotations into the right classes. The painful part for me was modifying the existing pipeline to handle the format the new dataset came in.
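As an illustration of the kind of cleanup involved, here is a minimal sketch that strips accents, case, and stray whitespace from a free-text label and then looks it up in an alias table. This is a different technique from the if-statement block the dataset creators wrote, and the alias table and function names are hypothetical.

```python
import unicodedata

# Hypothetical alias table: cleaned-up spellings mapped to canonical classes.
ALIASES = {
    "adidas": "adidas",
    "addidas": "adidas",
    "coca cola": "coca-cola",
    "coca-cola": "coca-cola",
}


def normalize_label(raw):
    """Strip accents, case, and extra whitespace from a free-text label."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", raw)
    no_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(no_accents.lower().split())


def to_class(raw):
    """Return the canonical class for a label, or None if it is unknown."""
    return ALIASES.get(normalize_label(raw))
```

Labels written in another language would still need their own alias entries; normalization only handles accents, capitalization, and spacing.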
The only moments of fun, intellectual work come when I analyze the training summary/plot (which usually looks like a chronologically inverted Bitcoin price history graph) and think about what changes to make for the next training session.
I used to think of software engineering (perhaps due to public belief) as a sort of Bar Mitzvah you must pass to earn the holy reins of machine learning. In reality, I’m starting to think of the two jobs as two tines of a fork, with good engineering practices as the common stem. One uses abstraction, the other intuition. In an MBTI analogy, good software engineering requires Ne (abstraction) and Ti (decomposition), while building high-performing machine learning models requires Ni (intuition) and Te (execution). I prefer the latter, so this internship is affirming my decision to pursue this field.