Zexin Li



TensorFlow issues & solutions

Problems on the NVIDIA Jetson Development Kit

  1. Reference environment variables:

    os.environ['TF_CPP_MIN_VLOG_LEVEL'] = '1' # Low TensorFlow verbosity; raise it up to 10 when you need to locate bugs.
    os.environ['CUDA_CACHE_MAXSIZE'] = "2147483648" # Enlarge the CUDA cache to avoid repeated long JIT compilation.
    os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true' # Avoid allocating all GPU memory at once in one process.
    os.environ['TF_FORCE_UNIFIED_MEMORY'] = '1' # Use unified memory to reduce data-transfer time.
    os.environ['TF_ENABLE_GPU_GARBAGE_COLLECTION'] = '0' # Boost performance by disabling GPU garbage collection; re-enable it only when hitting OOM.
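These variables are only read when TensorFlow initializes, so they should be placed in the environment before `import tensorflow` runs. A minimal sketch of the ordering (the specific variables shown are the ones from the list above):

```python
import os

# Set the variables first; they take effect only if they are in the
# environment before TensorFlow is imported and the GPU is initialized.
os.environ['TF_CPP_MIN_VLOG_LEVEL'] = '1'
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'

# import tensorflow as tf  # import only after the variables are in place
print(os.environ['TF_FORCE_GPU_ALLOW_GROWTH'])
```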
  2. pip3 install h5py is very slow or its compilation fails: use the root account to install pip dependencies.
  3. local_cuda_not_found: switch to one of the verified known-good versions listed above.
  4. C++ compiling error + cannot write file: at least 32 GB of extra disk storage is needed; the NVIDIA Jetson internal storage is insufficient for the large amount of intermediate files produced while compiling TensorFlow.
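As a quick pre-flight sanity check, free space can be queried from the Python standard library. This is a sketch, not part of the original build instructions; the `/tmp` mount point is illustrative, so point it at your actual build disk:

```python
import shutil

# Pre-flight check before compiling TensorFlow: the build produces a large
# amount of intermediate files, so verify the build disk has room first.
REQUIRED_GIB = 32
free_gib = shutil.disk_usage('/tmp').free / 2**30  # '/tmp' is illustrative
if free_gib < REQUIRED_GIB:
    print(f'Only {free_gib:.1f} GiB free; mount external storage before building.')
else:
    print(f'{free_gib:.1f} GiB free; enough for the build.')
```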
  5. C++ compiling error + process xx killed: OOM error; for NVIDIA Jetson TX2/Nano, set up 8 GB of swap first.
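One common way to add swap is a swap file. The helper below only builds the command list; the `/swapfile` path and the `fallocate`-based approach are assumptions, and the commands must be run as root on the board:

```python
def swapfile_commands(size='8G', path='/swapfile'):
    """Return the shell commands that create and enable a swap file.

    Sketch only: run the returned commands as root on the Jetson.
    """
    return [
        ['fallocate', '-l', size, path],  # reserve the file
        ['chmod', '600', path],           # swap files must not be world-readable
        ['mkswap', path],                 # format it as swap
        ['swapon', path],                 # enable it for the running system
    ]

for cmd in swapfile_commands():
    print(' '.join(cmd))
```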
  6. cannot import name ‘function_pb2’: change your current directory; don’t run import tensorflow from inside the TensorFlow source tree.
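The import breaks because Python searches the current directory (or the script’s directory) before installed packages, so the incomplete source checkout shadows the installed wheel. A hypothetical helper to spot the situation; the `tensorflow/python` directory heuristic is an assumption:

```python
import os

def looks_like_tf_source_tree(path='.'):
    # Heuristic (assumed marker): the TensorFlow repo has a tensorflow/python/ dir.
    return os.path.isdir(os.path.join(path, 'tensorflow', 'python'))

if looks_like_tf_source_tree():
    print('Move out of the TensorFlow source tree before importing tensorflow.')
```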
  7. Compilation and pip installation succeed, but tests fail and get stuck when executing the test files: the JetPack default CUDA/cuDNN versions may be incompatible with the versions in the official TensorFlow guidance. Possible solutions: (1) use the known tested versions above; (2) download the corresponding CUDA/cuDNN versions from JetPack and compile TensorFlow against them; (3) go to the NVIDIA forum to ask for official help.
  8. C++ compilation of rule ‘//tensorflow/python:bfloat16_lib’ failed (Exit 1): for tensorflow<=2.2, downgrade the numpy version:

    pip install 'numpy<1.19.0'
    # conda install 'numpy<1.19.0'
  9. Runtime error “CUDA driver version is insufficient for CUDA runtime version”: the cuda10.2 + cudnn7.0 combination is incompatible with the installed driver; re-create the soft link to point at cuda9.0 + cudnn7.0 and compile again.
  10. A long-running Python TensorFlow script may hit CUDA_UNKNOWN_ERROR: possibly a TensorFlow internal bug or a memory problem. Possible solutions: reboot the board; pip uninstall tensorflow; pip install tensorflow-xxx.whl.
  11. Performance bug of TensorFlow: extremely long GPU initialization on TX2 (e.g., initializing ResNet50 training on TX2 takes over 20 min): set the environment variable export CUDA_CACHE_MAXSIZE="2147483648" and run the TensorFlow code twice; the second run reuses the CUDA cache.
  12. When using unified memory, seeing ‘NvMapReserveOp 0x80000001 failed [22]’: limit the memory TensorFlow allocates.

    import tensorflow as tf

    config = tf.compat.v1.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 0.2 # or another small value less than 1.0
    config.gpu_options.experimental.use_unified_memory = True
    with tf.compat.v1.Session(config=config) as s:
        your_program
  13. Performance bug: W tensorflow/core/common_runtime/bfc_allocator.cc:311] Garbage collection: deallocate free memory regions (i.e., allocations) so that we can re-allocate a larger region to avoid OOM due to memory fragmentation. If you see this message frequently, you are running near the threshold of the available device memory and re-allocation may incur great performance overhead. You may try smaller batch sizes to observe the performance impact. Set TF_ENABLE_GPU_GARBAGE_COLLECTION=false if you'd like to disable this feature.: set the environment variable

    os.environ['TF_ENABLE_GPU_GARBAGE_COLLECTION'] = '0'
