Commit 5f22779c authored by Bethany Lusch

Update README.md with VTune

parent e3945cd6
......@@ -136,3 +136,26 @@ with torch.profiler.profile(
prof.step()
```
For convenience, you can try these edits by commenting and uncommenting lines in `ipex_example.py` and `run_ipex_example.sh`. It runs in a few minutes.
## VTune
[VTune](https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html#gs.9ocuuy) is a profiler by Intel. PyTorch has a [tutorial specifically on using VTune to profile PyTorch](https://pytorch.org/tutorials/recipes/profile_with_itt.html). VTune provides an Instrumentation and Tracing Technology (ITT) API, which is integrated into PyTorch. We can use it to label parts of the code so that they show up as named ranges in the VTune results. Following the tutorial, we can add ITT labeling this way:
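```
max_batches = 5
# Emit ITT events so that VTune can see the labeled ranges
with torch.autograd.profiler.emit_itt():
    for batch_idx, (data, target) in enumerate(train_loader):
        # Label each training iteration as its own ITT range
        with torch.profiler.itt.range(f'iteration_{batch_idx}'):
            train_step(data, target, batch_idx)
        # batch_idx starts at 0, so stop once max_batches batches have run
        if batch_idx + 1 >= max_batches:
            break
```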
You might want to run for a limited number of steps, since VTune logs are extensive and take time to finalize. For this example, if we only profile 5 batches, collection and finalization both finish within a 10-minute job.
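Another way to cap the number of profiled batches is to slice the loader itself rather than check a counter inside the loop. The following is a minimal sketch, not code from `ipex_example.py`: `limited_batches` is a hypothetical helper, and a plain list stands in for `train_loader` so the snippet runs without PyTorch.

```python
from itertools import islice


def limited_batches(loader, max_batches):
    """Return an iterator of (batch_idx, batch) over at most max_batches batches."""
    # islice stops after max_batches items, so the loader is not consumed further.
    return enumerate(islice(loader, max_batches))


# Stand-in for train_loader (assumption): any iterable of (data, target) pairs works.
fake_loader = [(f"data_{i}", f"target_{i}") for i in range(100)]

seen = [batch_idx for batch_idx, (data, target) in limited_batches(fake_loader, 5)]
print(seen)
```

In the ITT loop above, `for batch_idx, (data, target) in limited_batches(train_loader, max_batches):` would take the place of the `enumerate` call and the `break` check.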
The PyTorch tutorial explains how to launch the application from the VTune GUI, but on Sunspot it is likely easier to run VTune from the command line. The tutorial mentions that "To profile a PyTorch script, it is recommended to wrap all manual steps, including activating a Python environment and setting required environment variables, into a bash script, then profile this bash script." We can handle this by keeping most steps in `run_ipex_example.sh`. The script `submit_ipex_example.sh` handles PBS and launches `run_ipex_example.sh`. Here we need a special environment variable, and then we can run VTune on `run_ipex_example.sh`:
```
export AMPLXE_EXPERIMENTAL=gpu-multi-tile-metrics
ZE_AFFINITY_MASK=0.0 vtune -collect gpu-hotspots -result-dir=./vtune_log/ -- bash run_ipex_example.sh
```
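If you prefer to launch the collection from a Python driver, the same command can be assembled as an argument list. This is a sketch under stated assumptions: `build_vtune_command` is a hypothetical helper, and the timestamped subdirectory is an addition for keeping repeated runs separate (the command above writes to a fixed `./vtune_log/`). The flags and environment variables are copied from the shell commands above.

```python
import os
from datetime import datetime


def build_vtune_command(script="run_ipex_example.sh", log_root="./vtune_log"):
    """Return (argv, env) for the VTune gpu-hotspots collection shown above."""
    # Timestamped result directory: an assumption, not part of the original command.
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    result_dir = os.path.join(log_root, stamp)

    # Copy the current environment and add the two variables from the shell version.
    env = dict(os.environ)
    env["AMPLXE_EXPERIMENTAL"] = "gpu-multi-tile-metrics"
    env["ZE_AFFINITY_MASK"] = "0.0"

    argv = [
        "vtune", "-collect", "gpu-hotspots",
        f"-result-dir={result_dir}",
        "--", "bash", script,
    ]
    return argv, env


argv, env = build_vtune_command()
print(" ".join(argv))
# To actually launch the collection (requires vtune on PATH):
# import subprocess; subprocess.run(argv, env=env, check=True)
```

The helper only builds the command, so it can be inspected or logged before anything is launched.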
After the job ends, we want to view the results in VTune. See the section "After collecting the performance data, VTune profiler web server can be used for the post-processing" in the [Aurora documentation for VTune](https://docs.alcf.anl.gov/aurora/performance-tools/vtune/#after-collecting-the-performance-data-vtune-profiler-web-server-can-be-used-for-the-post-processing).
For convenience, you can try these edits by commenting and uncommenting lines in `submit_ipex_example.sh` and `ipex_example.py`.