Commit 63d893e6 authored by Bethany Lusch

Update README.md with legacy PyTorch profiler
For convenience, you can try these edits by commenting and uncommenting lines in `ipex_example.py` and `run_ipex_example.sh`. Both approaches only run for about two minutes.
## PyTorch Profiler
The PyTorch profiler measures the time and memory consumption of model operators. (Note that it only covers PyTorch functions.)\
Tutorial: https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html
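As a starting point, the tutorial's current `torch.profiler` API can be sketched in a few lines. This is a minimal CPU-only example; the model and input here are hypothetical stand-ins, not part of `ipex_example.py`:

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Hypothetical stand-in model and batch, just to have something to profile.
model = torch.nn.Linear(128, 10)
inputs = torch.randn(32, 128)

# Profile CPU activity and memory for one forward pass.
with profile(activities=[ProfilerActivity.CPU], profile_memory=True) as prof:
    model(inputs)

# Print operators sorted by their own CPU time.
print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))
```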
### PyTorch Legacy Profiler
One option is to use PyTorch's legacy profiler API, which is under `torch.autograd`. See https://pytorch.org/docs/stable/autograd.html#profiler.
For example:
- You could add this around `train_model()` in `ipex_example.py`:
```
with torch.autograd.profiler_legacy.profile(
use_xpu=True, profile_memory=True, record_shapes=True, with_stack=True,
) as prof:
...
```
- `with_stack=True` records the file and line number of each operation in the trace; it adds overhead but can be helpful.
- `record_shapes=True` collects information about input dimensions, but it might skew the timings.
- You could then print a sorted table of timings. (This version of `ipex_example.py` ran for about 6 minutes.) If you later want to explore other ways to sort the table, you could copy it into something like Excel.
```
print(prof.key_averages().table(sort_by="self_cpu_time_total"))
```
- You could also save a Chrome trace, which you can view by opening chrome://tracing in the Chrome browser and loading the trace file.
```
prof.export_chrome_trace("trace_file.json")
```
- In this example, the Chrome trace is 360 MB, which still loads without problems. However, trace files saved this way are often too large to load: you'll either get an error, or it will quietly fail with no trace appearing. (One workaround might be splitting the JSON file into smaller segments before loading it.)
- You could also add the PyTorch profiler elsewhere in the code in order to save smaller traces. For example, the profiler could be around just one training step (`train_step(data, target, batch_idx)`). If, for your application, the trace is still too big to load in Chrome, you might want to wrap one forward pass (`output = model(data)`) and one backward pass (`loss.backward()`) separately so that two different trace files get saved.
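Put together, the edits above can be sketched as follows. This is a minimal CPU-only stand-in: the model and data are placeholders for what `ipex_example.py` actually builds, and on Sunspot you would also pass `use_xpu=True` as shown earlier:

```python
import torch

# Hypothetical stand-in for the example's model and data.
model = torch.nn.Linear(128, 10)
data = torch.randn(32, 128)

# On Sunspot you would also pass use_xpu=True (requires Intel Extension
# for PyTorch); this CPU-only sketch drops that argument.
with torch.autograd.profiler_legacy.profile(
    profile_memory=True, record_shapes=True, with_stack=True,
) as prof:
    output = model(data)
    output.sum().backward()

# Sorted timing table, then a Chrome trace viewable at chrome://tracing.
print(prof.key_averages().table(sort_by="self_cpu_time_total"))
prof.export_chrome_trace("trace_file.json")
```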
For convenience, you can try these edits by commenting and uncommenting lines in `ipex_example.py`. Whether I printed the table or saved the Chrome trace, the whole job took about six minutes.
Notes:
- If I used the legacy PyTorch profiler on Sunspot to create a Chrome trace and loaded it in Chrome, I got a JSON parsing error. I fixed it using this:
`sed 's/{{}}}}/{}}/g' trace_file.json > trace_file_repaired.json`
- You may want to use `torch.profiler.record_function` so that regions of the code will be conveniently annotated.
- Important: The printed table includes columns about XPU time. However, the Chrome trace does not contain information about what happens on the XPU.
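The `record_function` annotation mentioned above can be used like this. The labeled regions show up as rows in the printed table and as named spans in the Chrome trace; the model here is a hypothetical stand-in:

```python
import torch
from torch.profiler import record_function

model = torch.nn.Linear(128, 10)  # hypothetical stand-in model
data = torch.randn(32, 128)

with torch.autograd.profiler.profile() as prof:
    # Each record_function label becomes an annotated region in the
    # profiler output.
    with record_function("forward_pass"):
        output = model(data)
    with record_function("backward_pass"):
        output.sum().backward()

print(prof.key_averages().table(sort_by="cpu_time_total"))
```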