Update README.md with VTune

5f22779c · Bethany Lusch · e3945cd6 · 5f22779c
Commit 5f22779c authored 10 months ago by Bethany Lusch
--- a/README.md
+++ b/README.md
@@ -136,3 +136,26 @@ with torch.profiler.profile(
    prof.step()
 ```
 For convenience, you can try these edits by commenting and uncommenting lines in `ipex_example.py` and `run_ipex_example.sh`. It runs in a few minutes.
+
+## VTune
+[VTune](https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html#gs.9ocuuy) is a profiler by Intel. PyTorch has a [tutorial on using VTune to profile PyTorch](https://pytorch.org/tutorials/recipes/profile_with_itt.html). VTune provides an Instrumentation and Tracing Technology (ITT) API, which is integrated into PyTorch, so we can use it to label parts of the code. Following the tutorial, we can add ITT labeling this way:
+```
+max_batches = 5
+with torch.autograd.profiler.emit_itt():
+    for batch_idx, (data, target) in enumerate(train_loader):
+        with torch.profiler.itt.range(f'iteration_{batch_idx}'):
+            train_step(data, target, batch_idx)
+        if batch_idx + 1 >= max_batches:  # stop once max_batches batches have run
+            break
+```
+VTune logs are extensive, so you might want to profile only a limited number of steps. For this example, if we profile only 5 batches, VTune's finalization step completes within a 10-minute job.
+
+The PyTorch tutorial explains launching the application with the VTune GUI, but on Sunspot it is likely easier to run VTune from the command line. The tutorial mentions that "To profile a PyTorch script, it is recommended to wrap all manual steps, including activating a Python environment and setting required environment variables, into a bash script, then profile this bash script." We can handle this by keeping most steps in `run_ipex_example.sh`. The script `submit_ipex_example.sh` handles PBS and launches `run_ipex_example.sh`. Here we need to set a special environment variable, and then we can run VTune on `run_ipex_example.sh`:
+```
+export AMPLXE_EXPERIMENTAL=gpu-multi-tile-metrics
+ZE_AFFINITY_MASK=0.0 vtune -collect gpu-hotspots -result-dir=./vtune_log/ -- bash run_ipex_example.sh
+```
+
+After the job ends, we want to view the results in VTune. See the section "After collecting the performance data, VTune profiler web server can be used for the post-processing" in the [Aurora documentation for VTune](https://docs.alcf.anl.gov/aurora/performance-tools/vtune/#after-collecting-the-performance-data-vtune-profiler-web-server-can-be-used-for-the-post-processing). 
+
+For convenience, you can try these edits by commenting and uncommenting lines in `submit_ipex_example.sh` and `ipex_example.py`.
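
The ITT labeling loop from the VTune section can also be written so that the same script runs whether or not ITT is available. The following is only a minimal sketch, not the code in `ipex_example.py`: the helper names `itt_range`, `profile_batches`, and `ITT_AVAILABLE` are hypothetical, and it assumes that `train_loader` yields `(data, target)` pairs and that `train_step` takes `(data, target, batch_idx)` as in the snippet above. When PyTorch is missing or was built without ITT support, the sketch falls back to no-op context managers.

```python
import contextlib
import itertools

try:
    import torch
    # ITT ranges are only usable when PyTorch was built with ITT support.
    ITT_AVAILABLE = torch.profiler.itt.is_available()
except (ImportError, AttributeError):
    torch = None
    ITT_AVAILABLE = False


def itt_range(name):
    """Hypothetical helper: an ITT range if available, else a no-op context."""
    if ITT_AVAILABLE:
        return torch.profiler.itt.range(name)
    return contextlib.nullcontext()


def profile_batches(train_loader, train_step, max_batches=5):
    """Run at most max_batches labeled training steps; return how many ran."""
    if ITT_AVAILABLE:
        emit = torch.autograd.profiler.emit_itt()
    else:
        emit = contextlib.nullcontext()

    n_run = 0
    with emit:
        # islice stops after max_batches items, so no extra batch is profiled.
        limited = itertools.islice(train_loader, max_batches)
        for batch_idx, (data, target) in enumerate(limited):
            with itt_range(f"iteration_{batch_idx}"):
                train_step(data, target, batch_idx)
            n_run += 1
    return n_run
```

Limiting the loop with `itertools.islice` instead of a `break` keeps the batch count explicit, which matters here because each additional profiled batch adds to the VTune log size and finalization time.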