For detecting deformable linear objects (DLOs) such as cables, CNN-based methods remain insufficient for industrial applications such as bin picking. In this paper, we test several state-of-the-art (s.o.t.a.) baseline models, identify the reasons they fail when detecting cables, and, based on those findings, propose a two-stage method. As an example, we combine a vision Transformer with HTC (Fig. 1), which achieves the best cable-detection performance among the s.o.t.a. methods compared.
Without bells and whistles, the proposed method obtains gains of 21.5% AP and 4.7% segm AP over state-of-the-art methods.
Depending on the scale of the target classes, local spikes are observed on the feature maps.
Depending on the number of instances per class, a loss function designed for imbalanced datasets is used to test the assumption that class imbalance affects accuracy.
The Swin Transformer and ResNet are compared to demonstrate the negative effect of crossing and side-by-side objects.
Anchor-based and anchor-free Region Proposal Networks (RPNs) are implemented to observe whether the regression distance affects accuracy.
An oriented-proposal-based RPN (i.e., one with rotatable boxes) is implemented to observe whether a one-to-one relation between box and segmentation affects accuracy.
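The class-imbalance test in the list above relies on a loss function designed for imbalanced datasets, which is not named here. As an illustration only, the sketch below assumes focal loss, a common choice for this purpose, and compares it with plain cross-entropy on a single prediction. The function names and the default values of `alpha` and `gamma` are assumptions made for this example and are not taken from the paper.

```python
import math


def binary_cross_entropy(p, y, eps=1e-7):
    """Plain cross-entropy for one prediction.

    p: predicted foreground probability, y: ground-truth label (0 or 1).
    """
    p = min(max(p, eps), 1.0 - eps)        # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p         # probability assigned to the true class
    return -math.log(p_t)


def binary_focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Focal loss for one prediction (assumed stand-in for the imbalance loss).

    The factor (1 - p_t) ** gamma shrinks the loss of well-classified examples,
    so the many easy samples of a frequent class contribute less to training.
    alpha weights the positive class, (1 - alpha) the negative class.
    """
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # An easy positive (p = 0.9) versus a hard positive (p = 0.1).
    for p in (0.9, 0.1):
        print(p, binary_cross_entropy(p, 1), binary_focal_loss(p, 1))
```

With `gamma = 0` and `alpha = 0.5`, the focal loss reduces to half the cross-entropy, so the two can be compared under the same training setup by changing only these two parameters.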