detect-system-externals: add threading
Use the concurrent.futures interface. Add a --thread (-t) option to enable threading and an -np option to select the number of threads.
By default, threading is used with 4 threads.
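
For illustration only, a minimal sketch of how an -np value could feed a concurrent.futures thread pool. This is not the code in this commit: detect_one, detect_all, and parse_args are hypothetical names, and the handling of --thread is left out because its exact semantics are not spelled out here.

```python
import argparse
from concurrent.futures import ThreadPoolExecutor, as_completed


def detect_one(name):
    # Hypothetical stand-in for probing a single external package.
    return name, {"found": True}


def detect_all(names, num_threads=4):
    # Run the probes on a thread pool and collect results in one dict.
    results = {}
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(detect_one, n) for n in names]
        for fut in as_completed(futures):
            name, info = fut.result()
            # Only the calling thread writes to the dict, so no lock is needed.
            results[name] = info
    return results


def parse_args(argv=None):
    # Assumed option shape: -np takes an integer and defaults to 4.
    parser = argparse.ArgumentParser()
    parser.add_argument("-np", type=int, default=4, help="number of threads")
    return parser.parse_args(argv)
```

Collecting results in the submitting thread keeps all writes to the result dict in one place, which is one reason a thread pool is simpler here than separate processes.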
multiprocessing is not implemented because managing shared memory for nested dicts across processes is difficult.
Of the thread counts tried (4, 8, 16, 32), np=16 seems to work best.
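
A comparison like this can be reproduced with a small timing loop. The sketch below is an assumption about how one might measure it, not the procedure used for the numbers above; fake_probe is a hypothetical I/O-bound stand-in for one detection call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_probe(_):
    # Hypothetical stand-in: sleep to imitate waiting on an external command.
    time.sleep(0.01)


def time_thread_counts(counts=(4, 8, 16, 32), tasks=64):
    # Return wall-clock seconds for running `tasks` probes at each pool size.
    timings = {}
    for n in counts:
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=n) as pool:
            # list() forces all mapped calls to finish before the timer stops.
            list(pool.map(fake_probe, range(tasks)))
        timings[n] = time.perf_counter() - start
    return timings
```

Real timings depend on the machine and on how much of each probe is spent waiting on I/O, so the best count may differ elsewhere.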