ezpz.utils
Utility functions for the ezpz package.
- See ezpz/utils
Overview
The ezpz.utils module provides utility functions for common tasks, including:
- Debugging and breakpoint management
- Timestamp generation and formatting
- String normalization and formatting
- Tensor/array conversion utilities
- Memory monitoring
- Model summary generation
- Data serialization and deserialization
- DeepSpeed configuration generation
Key Functions
ezpz/utils/__init__.py
DistributedPdb
Bases: Pdb
Supports using PDB from inside a multiprocessing child process.
Usage: DistributedPdb().set_trace()
Source code in src/ezpz/utils/__init__.py
breakpoint(rank=0)
Set a breakpoint on a single rank. All other ranks wait until you are done with the breakpoint before continuing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rank | int | Which rank to break on. | 0 |

format_pair(k, v, precision=6)
Format a key-value pair as a string.
Formats a key-value pair where the value can be an integer, boolean, or float. Integers and booleans are formatted without decimal places, while floats are formatted with the specified precision.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| k | str | The key/name of the parameter. | required |
| v | ScalarLike | The value to format (int, bool, float, or numpy scalar). | required |
| precision | int | Number of decimal places for float values. Defaults to 6. | 6 |

Returns:
| Type | Description |
|---|---|
| str | Formatted key-value pair string in the format "key=value". |

Example

    >>> format_pair("lr", 0.001)
    'lr=0.001000'
    >>> format_pair("epochs", 10)
    'epochs=10'
    >>> format_pair("verbose", True)
    'verbose=True'

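
The formatting rule described above (integers and booleans printed as-is, floats printed with a fixed number of decimal places) can be sketched in a few lines. This is an illustrative re-implementation, not the ezpz source; the name format_pair_sketch is hypothetical, and it assumes any value that is not an int or bool can be converted with float().

```python
def format_pair_sketch(k: str, v, precision: int = 6) -> str:
    """Illustrative stand-in for format_pair (not the ezpz implementation)."""
    # bool is a subclass of int, so both are rendered without decimals.
    if isinstance(v, (bool, int)):
        return f"{k}={v}"
    # Assumption: every other value is float-like and uses fixed precision.
    return f"{k}={float(v):.{precision}f}"


print(format_pair_sketch("lr", 0.001))      # lr=0.001000
print(format_pair_sketch("epochs", 10))     # epochs=10
print(format_pair_sketch("verbose", True))  # verbose=True
```
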
get_bf16_config_json(enabled=True)
Get the DeepSpeed bf16 config as a dict.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| enabled | bool | Whether to use bf16. | True |

Returns:
| Type | Description |
|---|---|
| dict | DeepSpeed bf16 config. |

get_deepspeed_adamw_optimizer_config_json(auto_config=True)
Get the DeepSpeed AdamW optimizer config as a dict.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| auto_config | bool | Whether to use the auto config. | True |

Returns:
| Type | Description |
|---|---|
| dict | DeepSpeed AdamW optimizer config. |

get_deepspeed_config_json(auto_config=True, gradient_accumulation_steps=1, gradient_clipping='auto', steps_per_print=10, train_batch_size='auto', train_micro_batch_size_per_gpu='auto', wall_clock_breakdown=False, wandb=True, bf16=True, fp16=None, flops_profiler=None, optimizer=None, scheduler=None, zero_optimization=None, stage=0, allgather_partitions=None, allgather_bucket_size=int(500000000.0), overlap_comm=None, reduce_scatter=True, reduce_bucket_size=int(500000000.0), contiguous_gradients=None, offload_param=None, offload_optimizer=None, stage3_max_live_parameters=int(1000000000.0), stage3_max_reuse_distance=int(1000000000.0), stage3_prefetch_bucket_size=int(500000000.0), stage3_param_persistence_threshold=int(1000000.0), sub_group_size=None, elastic_checkpoint=None, stage3_gather_16bit_weights_on_model_save=None, ignore_unused_parameters=None, round_robin_gradients=None, zero_hpz_partition_size=None, zero_quantized_weights=None, zero_quantized_gradients=None, log_trace_cache_warnings=None, save_config=True, output_file=None, output_dir=None)
Write a DeepSpeed config to the output directory.
get_deepspeed_warmup_decay_scheduler_config_json(auto_config=True)
Get the DeepSpeed warmup decay scheduler config as a dict.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| auto_config | bool | Whether to use the auto config. | True |

Returns:
| Type | Description |
|---|---|
| dict | DeepSpeed warmup decay scheduler config. |

get_deepspeed_zero_config_json(zero_config)
get_flops_profiler_config_json(enabled=True, profile_step=1, module_depth=-1, top_modules=1, detailed=True)
Get the DeepSpeed FLOPS profiler config as a dict.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| enabled | bool | Whether to use the flops profiler. | True |
| profile_step | int | The step to profile. | 1 |
| module_depth | int | The depth of the module. | -1 |
| top_modules | int | The number of top modules to show. | 1 |
| detailed | bool | Whether to show detailed profiling. | True |

Returns:
| Type | Description |
|---|---|
| dict | DeepSpeed FLOPS profiler config. |

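
The exact dict returned by get_flops_profiler_config_json is not shown on this page. As a rough guide, the sketch below maps the five parameters onto the keys of the same names in DeepSpeed's flops_profiler config section. The helper name flops_profiler_section is hypothetical, and the one-to-one key mapping is an assumption about ezpz, not a confirmed detail.

```python
import json


def flops_profiler_section(
    enabled: bool = True,
    profile_step: int = 1,
    module_depth: int = -1,
    top_modules: int = 1,
    detailed: bool = True,
) -> dict:
    """Hypothetical helper: build a DeepSpeed-style flops_profiler section."""
    # Assumption: each argument maps to the DeepSpeed key of the same name.
    return {
        "enabled": enabled,
        "profile_step": profile_step,
        "module_depth": module_depth,
        "top_modules": top_modules,
        "detailed": detailed,
    }


# Embed the section in a larger config and profile step 5 instead of step 1.
config = {"flops_profiler": flops_profiler_section(profile_step=5)}
print(json.dumps(config, indent=2))
```
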
get_fp16_config_json(enabled=True)
Get the DeepSpeed fp16 config as a dict.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| enabled | bool | Whether to use fp16. | True |

Returns:
| Type | Description |
|---|---|
| dict | DeepSpeed fp16 config. |

get_max_memory_allocated(device)
Get the maximum memory allocated on the specified device.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| device | device | The device to check memory allocation for. | required |

get_timestamp(fstr=None)
Get formatted timestamp.
Returns the current date and time as a formatted string. By default, returns a timestamp in the format 'YYYY-MM-DD-HHMMSS'. A custom format string can be provided to change the output format.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| fstr | str | Format string for strftime. If None, uses default format '%Y-%m-%d-%H%M%S'. Defaults to None. | None |

Returns:
| Type | Description |
|---|---|
| str | Formatted timestamp string. |

Example

    >>> get_timestamp()  # Returns something like '2023-12-01-143022'
    >>> get_timestamp("%Y-%m-%d")  # Returns something like '2023-12-01'

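
The behavior described above corresponds closely to a call to datetime.strftime with a fallback format. The sketch below is an illustrative stand-in rather than the ezpz source; the name get_timestamp_sketch and the outputs/ directory are hypothetical.

```python
from datetime import datetime
from typing import Optional


def get_timestamp_sketch(fstr: Optional[str] = None) -> str:
    """Illustrative stand-in for get_timestamp (not the ezpz implementation)."""
    # Fall back to the documented default format when no format is given.
    if fstr is None:
        fstr = "%Y-%m-%d-%H%M%S"
    return datetime.now().strftime(fstr)


# A common use: build a unique, sortable run directory name.
run_dir = f"outputs/{get_timestamp_sketch()}"
print(run_dir)
```
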
grab_tensor(x, force=False)
Convert various tensor/array-like objects to numpy arrays.
This function converts different types of array-like objects (tensors, lists, etc.) to numpy arrays for consistent handling. Supports PyTorch tensors, numpy arrays, and nested lists.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | Any | The object to convert to a numpy array. Can be None, scalar values, lists, numpy arrays, or PyTorch tensors. | required |
| force | bool | Force conversion even if it requires copying data. Defaults to False. | False |

Returns:
| Type | Description |
|---|---|
| Union[ndarray, ScalarLike, None] | Numpy array representation of the input, or the original scalar value, or None if the input was None. |

Raises:
| Type | Description |
|---|---|
| ValueError | If unable to convert a list to an array. |

Example

    >>> import torch
    >>> import numpy as np
    >>> grab_tensor([1, 2, 3])
    array([1, 2, 3])
    >>> grab_tensor(torch.tensor([1, 2, 3]))
    array([1, 2, 3])
    >>> grab_tensor(np.array([1, 2, 3]))
    array([1, 2, 3])

model_summary(model, verbose=False, depth=1, input_size=None)
Print a summary of the model using torchinfo.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | | The model to summarize. | required |
| verbose | bool | Whether to print the summary. | False |
| depth | int | The depth of the summary. | 1 |
| input_size | Optional[Sequence[int]] | The input size for the model. | None |

Returns:
| Type | Description |
|---|---|
| ModelStatistics \| None | The model summary if torchinfo is available, otherwise None. |

normalize(name)
Normalize a name by replacing special characters with dashes and converting to lowercase.
This function replaces each run of hyphens, underscores, and periods with a single dash, then converts the result to lowercase.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name to normalize. | required |

Returns:
| Type | Description |
|---|---|
| str | The normalized name with only lowercase letters, numbers, and dashes. |

Example

    >>> normalize("Test_Name.Sub-Name")
    'test-name-sub-name'
    >>> normalize("example__file..name")
    'example-file-name'

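
The rule described above can be expressed as a single regular-expression substitution. The sketch below is an assumption about how such a function might be written, not the ezpz source; the name normalize_sketch is hypothetical. It reproduces both documented examples.

```python
import re


def normalize_sketch(name: str) -> str:
    """Illustrative stand-in for normalize (not the ezpz implementation)."""
    # Collapse each run of '-', '_' or '.' into one dash, then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


print(normalize_sketch("Test_Name.Sub-Name"))   # test-name-sub-name
print(normalize_sketch("example__file..name"))  # example-file-name
```
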
summarize_dict(d, precision=6)
Summarize a dictionary into a string with formatted key-value pairs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| d | dict | The dictionary to summarize. | required |
| precision | int | The precision for floating point values. | 6 |

Returns:
| Type | Description |
|---|---|
| str | A string representation of the dictionary with formatted key-value pairs. |

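
This page does not state how summarize_dict joins the formatted pairs. The sketch below assumes a comma-and-space separator and the same int/bool/float rules as format_pair. The name summarize_dict_sketch and the sep parameter are hypothetical and exist only for illustration.

```python
def summarize_dict_sketch(d: dict, precision: int = 6, sep: str = ", ") -> str:
    """Illustrative stand-in for summarize_dict (not the ezpz implementation)."""
    parts = []
    for key, value in d.items():
        if isinstance(value, (bool, int)):
            # Integers and booleans are printed without decimal places.
            parts.append(f"{key}={value}")
        else:
            # Assumption: all remaining values are float-like.
            parts.append(f"{key}={float(value):.{precision}f}")
    # Assumption: pairs are joined with `sep`; the real separator may differ.
    return sep.join(parts)


metrics = {"epoch": 3, "loss": 0.123456789, "converged": False}
print(summarize_dict_sketch(metrics, precision=4))
# epoch=3, loss=0.1235, converged=False
```
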
write_deepspeed_zero12_auto_config(zero_stage=1, output_dir=None)
Write a DeepSpeed ZeRO stage 1 or 2 auto config to the output directory (stage 1 by default).
write_deepspeed_zero3_auto_config(zero_stage=3, output_dir=None)
Write a DeepSpeed ZeRO stage 3 auto config to the output directory.
write_generic_deepspeed_config(gradient_accumulation_steps=1, gradient_clipping='auto', steps_per_print=10, train_batch_size='auto', train_micro_batch_size_per_gpu='auto', wall_clock_breakdown=False, wandb=None, bf16=None, fp16=None, flops_profiler=None, optimizer=None, scheduler=None, zero_optimization=None)
Write a generic DeepSpeed config to the output directory.
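
The file name and exact layout that write_generic_deepspeed_config produces are not documented here. The sketch below shows one plausible shape: top-level keys taken from the signature defaults, optional sections included only when they are not None, and the result written as JSON. The helper write_config_sketch, the file name ds_config.json, and the skip-if-None behavior are all assumptions.

```python
import json
import tempfile
from pathlib import Path


def write_config_sketch(output_dir, **sections) -> Path:
    """Hypothetical helper: write a DeepSpeed-style config as JSON."""
    # Top-level defaults mirror the signature shown above.
    config = {
        "gradient_accumulation_steps": 1,
        "gradient_clipping": "auto",
        "steps_per_print": 10,
        "train_batch_size": "auto",
        "train_micro_batch_size_per_gpu": "auto",
        "wall_clock_breakdown": False,
    }
    # Assumption: optional sections (bf16, fp16, ...) are skipped when None.
    config.update({k: v for k, v in sections.items() if v is not None})
    out_path = Path(output_dir) / "ds_config.json"  # hypothetical file name
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(config, indent=2))
    return out_path


with tempfile.TemporaryDirectory() as tmp:
    path = write_config_sketch(
        tmp,
        bf16={"enabled": True},
        fp16=None,
        zero_optimization={"stage": 1},
    )
    # Read the file back to confirm it is valid JSON.
    loaded = json.loads(path.read_text())
    print(sorted(loaded))
```
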