
Jetson PyTorch Bypass Distributed Errors

Posted on 2023-09-26 Edited on 2024-03-30 In config

Synopsis

For anyone who needs to run inference with PyTorch-based Hugging Face models on NVIDIA Jetson.

Problem

When running inference with a PyTorch-based Hugging Face model on NVIDIA Jetson, the following error may occur:

  File "BERT.py", line 60, in <module>
    model = AutoModelForCausalLM.from_pretrained("bert-base-uncased").to(device)
  File "/experiment/miniforge3/envs/dytransformer/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
    return model_class.from_pretrained(
  File "/experiment/miniforge3/envs/dytransformer/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2359, in from_pretrained
    if is_fsdp_enabled():
  File "/experiment/miniforge3/envs/dytransformer/lib/python3.8/site-packages/transformers/modeling_utils.py", line 118, in is_fsdp_enabled
    return torch.distributed.is_initialized() and strtobool(os.environ.get("ACCELERATE_USE_FSDP", "False")) == 1
AttributeError: module 'torch.distributed' has no attribute 'is_initialized'

Solution

This error occurs because the PyTorch wheel built for Jetson is compiled without distributed support, so torch.distributed does not provide is_initialized. A quick workaround is to bypass the distributed check.
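Before changing anything, it can help to confirm that the installed wheel really lacks distributed support. The following is a minimal sketch, not part of the original fix: distributed_status is a hypothetical helper name, and it relies on torch.distributed.is_available(), which reports whether the distributed package was compiled in.

```python
def distributed_status(torch_module):
    """Describe the distributed support of a torch-like module.

    Hypothetical helper for illustration; pass the imported `torch` module.
    """
    dist = getattr(torch_module, "distributed", None)
    if dist is None:
        return "torch.distributed is missing"
    # is_available() returns False when PyTorch was built without distributed.
    is_available = getattr(dist, "is_available", None)
    if callable(is_available) and not is_available():
        return "built without distributed support"
    # This is the attribute the traceback complains about.
    if not hasattr(dist, "is_initialized"):
        return "is_initialized is missing"
    return "distributed support present"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print(distributed_status(torch))
```

If this reports that PyTorch was built without distributed support, the AttributeError above is expected and the bypass applies.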

In the file "/experiment/miniforge3/envs/dytransformer/lib/python3.8/site-packages/transformers/modeling_utils.py", change line 118 (the body of is_fsdp_enabled) from

    return torch.distributed.is_initialized() and strtobool(os.environ.get("ACCELERATE_USE_FSDP", "False")) == 1

to

    return False
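An edit inside site-packages is lost whenever transformers is reinstalled or upgraded. As a possible alternative, the same bypass can be sketched from the calling script by giving torch.distributed a stub is_initialized that always returns False, before transformers is imported. This is an assumption-laden sketch rather than the fix described above: patch_is_initialized is a hypothetical helper name, and it only covers the is_initialized call shown in the traceback, so other missing distributed attributes could still raise errors.

```python
def patch_is_initialized(dist_module):
    """Install a stub is_initialized() that returns False, if it is absent.

    Hypothetical helper for illustration. Returns True when the stub was
    installed and False when the attribute already existed.
    """
    if hasattr(dist_module, "is_initialized"):
        return False
    # With the stub in place, is_fsdp_enabled() short-circuits to False,
    # which matches the effect of replacing line 118 with `return False`.
    dist_module.is_initialized = lambda: False
    return True


if __name__ == "__main__":
    try:
        import torch.distributed as dist
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        patched = patch_is_initialized(dist)
        print("stub installed" if patched else "is_initialized already present")
        # Import transformers only after this point so it sees the stub.
```

On a PyTorch build that already has distributed support the helper leaves torch.distributed untouched, so the same script can run on both Jetson and desktop environments.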
