Jetson PyTorch Bypass Distributed Errors
Synopsis
For anyone who needs to run HuggingFace-based PyTorch model inference on NVIDIA Jetson.
Problem
When running HuggingFace-based PyTorch model inference on NVIDIA Jetson, the following error may occur:
File "BERT.py", line 60, in <module>
    model = AutoModelForCausalLM.from_pretrained("bert-base-uncased").to(device)
  File "/experiment/miniforge3/envs/dytransformer/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
    return model_class.from_pretrained(
  File "/experiment/miniforge3/envs/dytransformer/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2359, in from_pretrained
    if is_fsdp_enabled():
  File "/experiment/miniforge3/envs/dytransformer/lib/python3.8/site-packages/transformers/modeling_utils.py", line 118, in is_fsdp_enabled
    return torch.distributed.is_initialized() and strtobool(os.environ.get("ACCELERATE_USE_FSDP", "False")) == 1
AttributeError: module 'torch.distributed' has no attribute 'is_initialized'
Solution
This error occurs because the PyTorch wheel provided for Jetson is built without distributed support, so torch.distributed does not expose functions such as is_initialized(). One quick workaround is to bypass the distributed check.
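Before patching anything, you can confirm that the wheel really lacks distributed support (this check is my suggestion, relying on the standard torch.distributed.is_available() API, and is not part of the original post):

# On a wheel built without distributed support, is_available() returns False
# and is_initialized() does not exist, hence the AttributeError above.
import torch
print(torch.distributed.is_available())
print(hasattr(torch.distributed, "is_initialized"))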
In the file "/experiment/miniforge3/envs/dytransformer/lib/python3.8/site-packages/transformers/modeling_utils.py", change line 118 from
return torch.distributed.is_initialized() and strtobool(os.environ.get("ACCELERATE_USE_FSDP", "False")) == 1