Changelog
All notable changes to this project are documented here. Format follows Keep a Changelog.
[0.25.0] - 2026-05-15
Added
- --summary flag: prints a compact executive summary after any audit run, covering member counts, UC grant totals by level, and key risk indicators (redundancy, stale grants, escalations, workspace-local groups). For --output json/csv/html the summary is written to stderr so machine-readable stdout is not corrupted.
- Better error messages: HTTP errors (401/403/404/429) and network failures now produce plain-English messages instead of stack traces. 401 hints at credential misconfiguration; 403 mentions --auto-elevate; 404 includes the failing URL; ConnectionError advises checking network reachability.
- Bandit + pip-audit security CI: .github/workflows/security.yml runs static analysis and dependency CVE scanning on every push/PR.
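The stdout/stderr split described for --summary can be illustrated with a minimal sketch. The helper names (summary_stream, emit_summary) and the MACHINE_FORMATS set are hypothetical and are not taken from the project's code; the sketch only assumes that json, csv, and html are the machine-readable formats named above.

```python
import io
import sys

# Assumption: these are the formats whose stdout must stay machine-readable.
MACHINE_FORMATS = {"json", "csv", "html"}


def summary_stream(output_format, stdout=None, stderr=None):
    """Return stderr for machine-readable formats, stdout otherwise."""
    stdout = stdout if stdout is not None else sys.stdout
    stderr = stderr if stderr is not None else sys.stderr
    return stderr if output_format in MACHINE_FORMATS else stdout


def emit_summary(lines, output_format, stdout=None, stderr=None):
    """Hypothetical helper: write each summary line to the chosen stream."""
    stream = summary_stream(output_format, stdout, stderr)
    for line in lines:
        print(line, file=stream)


if __name__ == "__main__":
    # With --output json the summary goes to stderr, leaving stdout untouched.
    emit_summary(["members: 3", "redundant grants: 1"], "json")
```

Passing explicit streams keeps the helper easy to test with io.StringIO.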
Changed
- requests lower bound raised to >=2.33.0 on Python 3.10+ (clears CVE-2026-25645); Python 3.9 retains >=2.28 for compatibility.
- bandit and pip-audit added to [dev] extras.
Tests
- 7 new tests in tests/test_cli.py: --summary output for group and principal audit, --summary goes to stderr for JSON output, and _handle_fatal error message formatting for HTTP 401/403/404 and ConnectionError.
- Total: 599 tests.
[0.23.0] - 2026-05-14
Added
- --scan-volumes flag: scan Unity Catalog volume-level grants in group audit, principal audit, and snapshots. Volumes are a GA UC securable type and were previously invisible to the tool.
- VolumeGrant model: new dataclass for volume grants; purple .t-vol tag in HTML output; appears in grants table (not in Mermaid chart, same as tables).
- VolumePermissionScanner class: exported from the public API.
- system.access.audit schema introspection (--stale-days): StaleGrantChecker now probes the audit table schema at runtime before running the activity query. Modern accounts use the user_identity struct; legacy accounts use flat user_name / service_principal_name columns. The probe result is cached per checker instance. If the schema is unrecognised, a RuntimeError with a GitHub issues link is raised rather than silently producing wrong results.
- Weekly SDK compatibility CI: .github/workflows/sdk-compat.yml runs the test suite on Mondays against both the latest databricks-sdk and the minimum pinned version (0.20.0).
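The probe-then-cache behaviour described for the audit schema introspection could look roughly like the sketch below. AuditSchemaProbe and its fetch_columns callable are hypothetical names, not the project's StaleGrantChecker; the rule that either flat column is enough to identify the legacy layout is also an assumption.

```python
from typing import Callable, Iterable, Optional


class AuditSchemaProbe:
    """Hypothetical helper: classify audit-table columns once per instance."""

    def __init__(self, fetch_columns: Callable[[], Iterable[str]]):
        self._fetch_columns = fetch_columns  # returns the table's column names
        self._variant: Optional[str] = None  # cached probe result

    def variant(self) -> str:
        if self._variant is None:  # only probe on first use
            cols = set(self._fetch_columns())
            if "user_identity" in cols:
                self._variant = "struct"
            elif cols & {"user_name", "service_principal_name"}:
                # Assumption: either flat column marks the legacy layout.
                self._variant = "flat"
            else:
                # Fail loudly instead of querying with a guessed layout.
                raise RuntimeError(f"Unrecognised audit schema: {sorted(cols)}")
        return self._variant
```

Raising on an unknown layout mirrors the stated goal of never silently producing wrong results.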
Changed
- databricks-sdk dependency upper-bounded to <2.0 to prevent unvetted major-version breakage. The raw HTTP fallback (--no-sdk) remains available if SDK issues arise.
Tests
- tests/test_stale_checker.py: added _probe_response() fixture helper; 10 new introspection tests covering both schema variants, caching, unknown schema error, and end-to-end query routing.
- tests/test_volume_scanner.py: 11 new tests (new file).
- Total: 590 tests.
[0.22.1] - 2026-05-10
Fixed
- Documentation corrections: test count in CLAUDE.md (477 → 570), depth toggle description in capabilities.md and docs/use-cases/access-map.md.
[0.22.0] - 2026-05-10
Added
- UC hierarchy in principal audit HTML chart: catalog nodes connect to schema nodes via dashed structural edges. Groups with ALL_PRIVILEGES on a catalog no longer draw redundant schema-level arrows; hierarchy edges imply them. The collapse is conservative: a schema-level arrow is suppressed only when the same group holds ALL_PRIVILEGES on the direct parent, so no information is lost.
- Schema hierarchy in group audit HTML chart: group audit Mermaid chart now includes schema nodes when --scan-schemas is used, using the same conservative collapse logic.
- Depth toggle on HTML charts: both principal and group HTML output now have a "Schema view" / "Catalog view" toggle button. Catalog view (default) always stays readable; schema view renders lazily on first click via mermaid.run() so the catalog diagram is never blocked by hidden-element rendering issues.
- Azure AD B2B guest UPN identity resolution: --principal now correctly resolves B2B guest identities. When a principal has a home-tenant email (e.g. alice@gmail.com) and a guest UPN (e.g. alice_gmail.com#EXT#@tenant.onmicrosoft.com), the tool now chains workspace SCIM externalId filter → account SCIM userName filter to discover all alternate SCIM IDs. BFS is run from each ID and the results are merged, so UC grants and group memberships stored against either identity are surfaced.
- REVOKE SQL embedded in group audit HTML: --revoke-script --output html now embeds the generated SQL in a dark <pre> block inside the HTML report instead of appending raw text after </html>.
- Built-in group styling: account users and admins are styled separately in the principal audit chart (grey dashed nodes) with a "built-in" badge in the group memberships table.
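The conservative collapse rule for schema-level arrows can be sketched as a simple filter. The function name and the tuple shapes below are hypothetical and chosen only for illustration; the renderer's real data model is not shown in this changelog.

```python
def visible_schema_edges(catalog_grants, schema_grants):
    """Drop schema-level edges already implied by the parent catalog.

    catalog_grants: iterable of (group, catalog, privilege)
    schema_grants:  iterable of (group, catalog, schema, privilege)
    Both shapes are assumptions made for this sketch.
    """
    # (group, catalog) pairs where the group holds ALL_PRIVILEGES on the catalog.
    covered = {
        (group, catalog)
        for group, catalog, privilege in catalog_grants
        if privilege == "ALL_PRIVILEGES"
    }
    # Suppress an edge only when the SAME group covers the DIRECT parent.
    return [
        edge for edge in schema_grants
        if (edge[0], edge[1]) not in covered
    ]
```

A group with a weaker catalog privilege, or a different group, keeps its schema-level arrows, which is what makes the collapse lossless.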
Changed
- Group audit UC grants table: Member Direct rows removed from the HTML grants table. Member personal grants are already fully covered by the redundancy section; mixing them into the group grants table conflated two different questions and made the table unmanageably long at scale. Member Direct grants continue to appear in --output csv, --output text, and --output json.
- HTML table overflow: long securable names no longer break table layout (word-break: break-word, overflow-wrap: anywhere, max-width: 340px on td; tables wrapped in overflow-x: auto).
- Principal audit HTML progress messages (Auditing principal: ...) routed to stderr for non-text output formats so they never appear in redirected HTML files.
Fixed
- --revoke-script --output html no longer prints raw SQL to stdout after the HTML closing tag.
Tests
- 570 tests (unchanged count; all existing tests pass against new renderer behaviour).
[0.21.0] - 2026-05-09
Fixed
- Removed real SP application ID that had slipped into docs/use-cases/principal-audit.md; replaced with a generic placeholder UUID.
[0.20.0] - 2026-05-09
Added
- UC-only group ancestor annotation: when a group appears in the UC-only section of a principal audit (UC grants but no direct workspace assignment), the output now shows which parent group actually provides the workspace access, e.g. test-audit-data-eng [members get workspace access via test-audit-org]. Groups with genuinely no workspace path continue to show without annotation. Applies to both --tree and --output html.
Changed
- Automated PyPI publishing via GitHub Actions (TestPyPI on develop, prod PyPI on main).
- Removed prototype Jupyter notebook (superseded by the CLI and Python API).
[0.19.0] - 2026-05-07
Added
- Resource audit (--resource): new audit mode that inverts the principal/group perspective: given a Unity Catalog resource (catalog, schema, or table) or a workspace, discover every identity that has access to it. Auto-detects resource type from the name format: 0 dots = catalog, 1 dot = schema, 2+ dots = table, https:// or "databricks" in the name = workspace.
- --no-expand-groups: for --resource mode, show only the direct grants on the resource without expanding group members to individual users and service principals. Default is to expand groups.
- ResourceAuditor (resource_auditor.py): parallel workspace scanner, SCIM-based principal classification with cache, group membership expansion via GroupMembershipResolver, deduplication by (principal_name, via_group, frozenset(privileges)).
- ResourceGrant / ResourceAuditResult models in models.py.
- _resource_html_renderer.py: self-contained HTML page with teal gradient header, stat cards, Mermaid LR flowchart (resource → direct principals, group nodes → member nodes with dashed edges), direct grants table, and via-group grants table.
- write_resource_audit_csv() in csv_output.py: CSV with 8 columns: resource_type, resource_name, principal_name, principal_type, principal_source, privileges, via_group, workspace_name.
- detect_resource_type(): module-level utility function exported from resource_auditor.py.
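The name-format rules for resource type detection are concrete enough to sketch. The function below is an illustration with a hypothetical name (guess_resource_type), not the project's detect_resource_type(); checking the workspace rule first is an assumption, made because workspace URLs also contain dots.

```python
def guess_resource_type(name: str) -> str:
    """Classify a resource name using the dot-count rules from the changelog."""
    lowered = name.lower()
    # Assumption: workspace detection runs first, since URLs contain dots too.
    if lowered.startswith("https://") or "databricks" in lowered:
        return "workspace"
    dots = name.count(".")
    if dots == 0:
        return "catalog"
    if dots == 1:
        return "schema"
    return "table"  # 2+ dots


if __name__ == "__main__":
    for candidate in ("main", "main.sales", "main.sales.orders"):
        print(candidate, "->", guess_resource_type(candidate))
```

One consequence of the substring rule is that a catalog whose name contains "databricks" would be treated as a workspace under this sketch.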
Tests
- 567 tests (up from 527 before this release cycle): 37 new tests covering detect_resource_type, _classify_principal (email / group / SP / default / cache), _scan_uc_resource (catalog, 404 silence, group expansion, no-expand), _scan_workspace_resource (basic, expand), audit() catalog/workspace modes (result type, dedup, not-found error), model field checks, HTML renderer (resource name, Mermaid, no-grants, HTML escaping, via-group section), CSV column header/data/via-group, and full CLI integration (text/csv/json/html output, workspace-not-found → exit 1, mutual-exclusion with --group).
[0.18.7] - 2026-05-07
Fixed
- Principal --tree and --output html: direct workspace assignments (ADMIN set explicitly on the principal, not via a group) were rendering as a fake via (direct) group section instead of the "Direct" block. Root cause: principal_auditor.py sets via_group="(direct)" on direct WorkspaceRole objects; the renderers bucketed by r.via_group or "__direct__", but "(direct)" is truthy so it never reached the __direct__ sentinel. Fixed in both _tree_renderer.py and _html_renderer.py to treat via_group == "(direct)" identically to None.
- Group --tree: Unity Catalog rows were showing the group/principal name in the workspace column instead of the workspace name. Root cause: _print_uc used *_, ws to unpack the grant tuple, grabbing principal (last element) instead of workspace_name (second-to-last). Fixed with explicit unpacking.
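The truthiness pitfall behind the first fix is easy to reproduce in isolation. The sketch below uses hypothetical function names and plain (via_group, role) tuples in place of the project's WorkspaceRole objects.

```python
from collections import defaultdict

DIRECT_SENTINEL = "__direct__"


def bucket_buggy(roles):
    """Original pattern: `or` only falls through for falsy values."""
    buckets = defaultdict(list)
    for via_group, role in roles:
        # "(direct)" is a non-empty string, so it never maps to the sentinel.
        buckets[via_group or DIRECT_SENTINEL].append(role)
    return dict(buckets)


def bucket_fixed(roles):
    """Treat via_group == "(direct)" identically to None."""
    buckets = defaultdict(list)
    for via_group, role in roles:
        key = DIRECT_SENTINEL if via_group in (None, "(direct)") else via_group
        buckets[key].append(role)
    return dict(buckets)
```

With the buggy version a "(direct)" entry gets its own bucket and renders like a group; the fixed version folds it into the direct block.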
[0.18.6] - 2026-05-07
Added
- Group audit --tree: ASCII tree view for --group mode, organised by grant source rather than securable type. Upstream (parent-group-inherited) grants are shown per parent group; direct grants the group holds itself form their own branch; member-direct personal grants appear in a compact summary with redundancy warnings. Workspace objects included when --scan-workspace-objects is set. Redundancy callout line printed before the footer when full or partial overlaps are found.
Tests
- 527 tests (up from 525): 2 new tests for group audit --tree output structure and member-count presence.
[0.18.5] - 2026-05-07
Added
- Group audit --output html: self-contained HTML page for --group mode. Green-themed header with IdP vs Databricks classification, member counts, and timestamp. Mermaid LR flowchart showing the group's access footprint: parent groups (dashed edges), workspaces, and UC catalogs. Summary stats grid highlights redundant grant count in amber when non-zero. Redundancy findings are surfaced in a prominent banner and dedicated table before the full grant list. Combined Unity Catalog grants table (catalog + schema + table) with grant-source tags. Progress messages routed to stderr.
- Snapshot diff --output html: self-contained HTML diff page for --baseline mode. Works for both group and principal audits. Slate-themed header with a baseline → current timeline. Summary cards show +/− counts for grants and members in green/red. Color-coded rows: green background for additions, red for removals. Renders a clean "No changes detected" state when there are no differences, suitable for committing to a repo as a compliance artifact. --output html is now supported in _print_diff, which is shared by both audit modes.
Fixed
- _login_run_group_audit now routes progress messages to stderr for all non-text output modes (was only routing for json).
Tests
- 525 tests (up from 519): 6 new tests covering group audit HTML output structure, section headings, progress-to-stderr isolation, diff HTML no-changes state, diff HTML with additions and removals, and diff HTML principal-mode member label.
[0.18.4] - 2026-05-06
Added
- --output html: self-contained HTML access map for principal audit. Embeds a Mermaid LR flowchart (principal → groups → workspaces + UC securables) with solid edges for direct group memberships and dashed edges for transitive ones. Includes a summary stats grid and four data tables (group memberships, workspace access, UC permissions, workspace objects). No server required: one file, renders in any browser. Progress messages are routed to stderr so the HTML on stdout is clean.
- --tree: ASCII tree view for principal audit, reorganising output by granting entity rather than securable type. Each section shows "via" with the workspace roles and UC grants beneath it; direct grants and workspace objects have their own nodes; escalation findings appear when --escalation-check is set.
- Visualizing Access use-case page: docs/use-cases/access-map.md covering when to use --tree vs --output html vs CSV, how to compose with --scan-workspace-objects and --escalation-check, and the "show this to a manager" scenario.
- CLI reference updated: docs/reference/cli.md now documents --output html and --tree with examples.
Fixed
- Progress messages (Auditing principal: …) were leaking onto stdout in --output html mode. _login_run_principal_audit now routes to stderr for all non-text output modes (json, csv, html).
Tests
- 519 tests (up from 513): 6 new tests in tests/test_cli.py covering --tree output structure, --output html content, HTML progress-to-stderr isolation, and --tree with --output json.
[0.18.3] - 2026-05-05
Added
- --compare A B: pure-read membership diff between two principals. Shows which groups are unique to each principal and which are shared. Each group is annotated with source (external = IdP-managed, internal = Databricks-managed), directness (is_direct), and the full membership chain. Available in text, json, and csv output formats.
- --clone-from SOURCE --to TARGET: provisioning report that classifies each of the source's direct group memberships into one of four actions:
  - Databricks: the group is Databricks-managed and has a workspace assignment or UC grants; the tool can perform the SCIM PATCH when --apply is passed.
  - IdP required: the group is synced from an external IdP (Entra / Okta); the target must be added in the identity provider, because Databricks has no write access to IdP-managed group membership.
  - Unverified: the group is Databricks-managed but has no detected workspace assignment; UC grants are not checked by default (pass --scan-uc to resolve these into Databricks or Skipped).
  - Skipped: verified dead-end: no workspace assignment and no UC grants (requires --scan-uc).
- --apply: when passed alongside --clone-from / --to, executes the SCIM PATCH for every Databricks-classified group, adding the target to each group.
- --scan-uc: optional flag for --clone-from; scans Unity Catalog catalog grants in parallel to resolve Unverified groups into Databricks (has grants) or Skipped (dead-end). Adds catalog-scan API calls per workspace, so it is off by default.
- PrincipalComparer: Python API class wrapping the compare logic. Takes two principal identifiers, BFS-walks group memberships for both, and returns a CompareResult.
- AccessCloner: Python API class with build_report() (dry-run analysis) and apply() (SCIM writes). apply() mutates the CloneReport in place, setting applied=True or error=... per action.
- New models: GroupComparison, CompareResult, CloneActionType, CloneAction, CloneReport in models.py.
- CSV output functions: write_compare_csv() and write_clone_report_csv() in csv_output.py.
- Access Provisioning use-case page: docs/use-cases/access-provisioning.md covering the "match one user's access to another" scenario with CLI and Python API examples, IdP vs Databricks group classification explanation, and --scan-uc guidance.
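The four-way classification used by the clone report can be read as a small decision function. The sketch below is an interpretation of the descriptions above; the Action enum, the classify function, and its parameters are hypothetical and do not reproduce CloneActionType or AccessCloner.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    DATABRICKS = "Databricks"
    IDP_REQUIRED = "IdP required"
    UNVERIFIED = "Unverified"
    SKIPPED = "Skipped"


def classify(is_external: bool, has_workspace: bool,
             has_uc_grants: Optional[bool]) -> Action:
    """Classify one group; has_uc_grants is None when UC was not scanned."""
    if is_external:
        # IdP-synced group: membership must be changed in the identity provider.
        return Action.IDP_REQUIRED
    if has_workspace or has_uc_grants:
        return Action.DATABRICKS
    if has_uc_grants is None:
        # No workspace assignment and UC not checked: cannot decide yet.
        return Action.UNVERIFIED
    return Action.SKIPPED  # verified dead-end: no workspace, no UC grants
```

Modelling "not scanned" as None keeps the Unverified state distinct from a confirmed absence of grants.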
Tests
- 513 tests (up from 477): 12 new tests in tests/test_principal_comparer.py, 10 new tests in tests/test_access_cloner.py, 22 new tests in tests/test_cli.py (compare and clone modes across all output formats, --apply success/error paths, missing --to guard, mutually-exclusive mode validation).
[0.18.2] - 2026-05-04
Added
- uc_only_groups in principal audit: groups with no workspace permission assignment but that still grant Unity Catalog access are now separated from true dead-end groups. PrincipalAuditResult gains a new uc_only_groups: List[str] field. Text output shows two labelled buckets: UC-only groups (intentional pattern, access via UC grants only) and Unused groups (no workspace or UC grants, safe to review for removal). JSON and CSV output include both fields.
- via_path inheritance chain on workspace roles and UC permissions: WorkspaceRole and EffectivePermission now carry via_path: List[str] (the full membership chain from the principal to the grant-holding group, e.g. ["alice@company.com", "team-A", "data-engineers"]). Built from the BFS walk at zero extra API cost. Text output shows the chain in brackets; CSV adds a via_path column; JSON and snapshots include the field. Parallel paths (same securable reachable via multiple groups) each appear as separate entries with distinct chains.
- Permission Hygiene and Stale Access use-case pages: docs/use-cases/permission-hygiene.md and docs/use-cases/stale-access.md added to the docs site, covering redundancy analysis, REVOKE SQL generation, --stale-days usage, SQL warehouse prerequisites, threshold tuning, and Python API examples.
Tests
- 3 CSV tests updated to account for the new via_path column in workspace-roles and permissions headers.
- test_dead_end_groups_detected renamed to test_workspace_unassigned_groups_split_into_uc_only_and_dead_end; updated assertions verify groups with UC grants land in uc_only_groups and groups with no grants land in dead_end_groups.
- test_cli.py asserts uc_only_groups key present in principal JSON output.
[0.18.1] - 2026-05-05
Added
- MkDocs documentation site: full docs published to GitHub Pages at https://lukaleet.github.io/databricks-access-audit. Sections: Getting Started, Capabilities, Use Cases (offboarding, access review, incident response, compliance snapshots), Reference (CLI flags, Python API, output formats), How It Works (architecture, grant classification, Azure B2B guests).
- Capabilities page: each core feature (multi-workspace scanning, recursive group resolution, permission inheritance tracking, schema drill-down, redundancy analysis, resilient API calls) documented with example commands and sample output.
- GitHub Actions docs workflow (.github/workflows/docs.yml): automatically redeploys the site on every push to main that touches docs/ or mkdocs.yml.
- [docs] optional dependency: pip install "databricks-access-audit[docs]" installs mkdocs-material for local doc builds.
- README slimmed to ~100 lines: hook, two modes, capabilities list, quick-start examples, and links to the full docs site. Detailed reference content moved to GitHub Pages.
- ~/.databrickscfg profile-based authentication: credentials are now resolved in priority order: CLI flags → environment variables → ~/.databrickscfg profile → default azure cloud. New --profile NAME flag (env: DATABRICKS_CONFIG_PROFILE, default: DEFAULT) selects a named profile. DATABRICKS_CONFIG_FILE points to a non-default config file path.
- Cloud auto-detection from profile host: when --cloud is not explicitly passed, the cloud provider is inferred from the host field in the profile (accounts.azuredatabricks.net → azure, accounts.cloud.databricks.com → aws, accounts.gcp.databricks.com → gcp). No need to pass --cloud on every invocation when using a profile.
- New module config.py: load_profile() reads named sections from ~/.databrickscfg (merging DEFAULT fallbacks via configparser); cloud_from_host() maps account host URLs to cloud identifiers.
- Improved credential error message: when credentials are missing, the error now mentions --profile and ~/.databrickscfg as the resolution path.
- Package renamed: databricks-group-audit → databricks-access-audit; module renamed databricks_group_audit → databricks_access_audit; CLI command renamed to databricks-access-audit.
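The host-to-cloud mapping listed under cloud auto-detection can be sketched as a lookup table. The function name infer_cloud is hypothetical and is not the project's cloud_from_host(); returning None for unknown hosts and accepting hosts with or without a scheme are assumptions of this sketch.

```python
from typing import Optional
from urllib.parse import urlparse

# Account hosts and cloud identifiers as listed in the changelog entry.
HOST_TO_CLOUD = {
    "accounts.azuredatabricks.net": "azure",
    "accounts.cloud.databricks.com": "aws",
    "accounts.gcp.databricks.com": "gcp",
}


def infer_cloud(host: str) -> Optional[str]:
    """Map an account host (with or without scheme) to a cloud identifier."""
    # urlparse only fills hostname when a netloc is present, so add "//".
    parsed = urlparse(host if "://" in host else "//" + host)
    return HOST_TO_CLOUD.get(parsed.hostname)  # None for unknown hosts


if __name__ == "__main__":
    print(infer_cloud("https://accounts.cloud.databricks.com"))
```

Returning None rather than a default leaves the fallback choice (for example the azure default mentioned above) to the caller.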
Tests
- 477 tests (up from 451): 14 new tests in tests/test_config.py covering load_profile and cloud_from_host; 12 new integration tests in tests/test_cli.py covering _resolve_credentials.
[0.18.0] - 2026-04-30
Added
- 8 new workspace object types: WorkspaceObjectScanner now covers 13 object types (up from 5):
  - SQL / Analytics: sql_queries (/api/2.0/sql/queries), sql_alerts (/api/2.0/sql/alerts), lakeview_dashboards (/api/2.0/lakeview/dashboards), genie_spaces (/api/2.0/genie/spaces)
  - AI / ML: mlflow_experiments (/api/2.0/mlflow/experiments/list), registered_models (/api/2.0/mlflow/registered-models/list), serving_endpoints (/api/2.0/serving-endpoints), apps (/api/2.0/apps)
  - All use the same classify_grant path and are available via --workspace-object-types filtering.
  - Agent Bricks coverage comes from the three AI/ML types (experiments, registered models, serving endpoints) that underpin the platform.
- Bare-array response handling in _list_objects: some DBSQL endpoints return a raw JSON array instead of a wrapped dict; _list_objects now detects isinstance(resp, list) and handles both shapes without error.
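Handling both response shapes comes down to one isinstance check. The helper below is a hypothetical illustration (extract_items is not a project function), and the key name passed in is whatever wrapper key a given endpoint uses.

```python
def extract_items(resp, key):
    """Return the list of objects from either response shape.

    resp may be a bare JSON array (list) or a wrapped dict such as
    {key: [...]}; anything else yields an empty list.
    """
    if isinstance(resp, list):  # bare-array shape
        return resp
    if isinstance(resp, dict):  # wrapped shape; missing or null key -> []
        return resp.get(key) or []
    return []
```

For example, extract_items([{"id": 1}], "results") and extract_items({"results": [{"id": 1}]}, "results") return the same list.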
Fixed
- Azure AD B2B guest UPN mismatch (improved): _get_workspace_principal_aliases() now searches workspace SCIM by externalId eq "{id}" instead of looking up the account-synced record by principal ID. Azure AD B2B guest users have two workspace SCIM records: the account-synced record (userName = account email) and the Azure AD guest record (userName = guest UPN, e.g. user_gmail.com#EXT#@tenant). The previous ID-lookup only returned the account email (already known), so the guest UPN was never discovered and workspace ACL entries stored under that UPN were silently missed. The externalId search returns both records; only userNames not already in known identities are returned as new aliases.
- Workspace object scan misses implicit-group workspaces: when a principal's workspace access comes exclusively through a built-in group like "account users" (which doesn't appear in permissionassignments), ws_roles was empty and the workspace object scan loop never ran, producing 0 grants. audit() now also supplements ws_roles with all discovered workspaces when scan_workspace_objects=True, matching the behaviour of the group audit scanner.
Tests
- 451 tests (up from 427): 8 parametrized group-audit smoke tests, 8 parametrized principal-audit smoke tests, bare-array resilience test, pagination test for mlflow_experiments, name-as-ID test for registered_models, non-standard perm-prefix test for genie_spaces, non-pagination test for serving_endpoints; 2 new tests for _get_workspace_principal_aliases externalId search (B2B guest discovery); 1 new test for workspace object scan fallback to all discovered workspaces when ws_roles is empty.
[0.17.0] - 2026-04-28
Added
- Workspace object permission scanning: new --scan-workspace-objects flag scans workspace-level ACLs for jobs, clusters, SQL warehouses, pipelines, and cluster policies. Off by default (adds significant API calls per workspace). Use --workspace-object-types jobs,clusters to restrict to a subset. Works in both --group and --principal modes.
- WorkspaceObjectGrant model: mirrors CatalogGrant / SchemaGrant / TableGrant; carries object_type, object_id, object_name, permission_level, grant_source (DIRECT / UPSTREAM / MEMBER_DIRECT), principal_type, and inherited_from.
- WorkspaceObjectScanner: new workspace_object_scanner.py; fans out with ThreadPoolExecutor per object type, reuses classify_grant from _classification.py, handles pagination for jobs and pipelines, skips objects on ACL errors. Deduplicates workspace URLs before dispatch.
- CLI output: group audit gets a new Workspace Object Permissions section in text output; principal audit gets the same. JSON output gains workspace_object_grants (group) and workspace_object_permissions (principal) arrays. CSV output gains a third section after the redundancy table. All outputs include a note that remediation requires the Databricks permissions REST API, not SQL.
- Snapshot / diff: build_group_snapshot and build_principal_snapshot include workspace object grants; diff_snapshots diffs them by full-field fingerprint alongside UC grants.
- SDK client routes: DatabricksSDKClient.workspace_api now handles all five object-list endpoints via SDK typed iterators (auto-pagination) and all /api/2.0/permissions/… paths via raw REST (ws.api_client.do) to avoid gRPC shim issues.
- PrincipalAuditor.audit(): two new parameters: scan_workspace_objects: bool = False and workspace_object_types: Optional[List[str]] = None.
Fixed
- Infinite loop in _list_objects pagination test: test_list_objects_pagination in test_workspace_object_scanner.py used if not calls[0] to branch between the first and subsequent page responses; calls[0] is always {} (the first call's empty params dict), so the mock always returned next_page_token, sending _list_objects into an infinite loop that exhausted RAM and crashed the process. Fixed by checking if not params (the current call's params) instead.
- Retry-backoff hang in test_principal_source.py: the local mock_client fixture used default max_retries=5, base_delay=1.0; any URL not registered in the responses mock raised requests.exceptions.ConnectionError, which is a RequestException and triggered five retries with 1+2+4+8+16 = 31 s of backoff per unmatched request. Fixed by adding max_retries=0, base_delay=0 to the fixture.
- Azure AD B2B guest UPN mismatch in workspace object scan (initial fix): added _get_workspace_principal_aliases() to PrincipalAuditor; superseded and extended in v0.18.0 with externalId-based search.
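The 31 s figure in the retry-backoff entry follows from doubling the delay on each retry. The helper below is a hypothetical calculation written for this note, assuming a delay of base_delay * 2**attempt, which is consistent with the 1+2+4+8+16 sequence quoted above; it is not the client's retry code.

```python
def total_backoff(max_retries: int, base_delay: float) -> float:
    """Total sleep time when the delay doubles on every retry.

    Assumes delay(attempt) = base_delay * 2 ** attempt for
    attempt = 0 .. max_retries - 1.
    """
    return sum(base_delay * 2 ** attempt for attempt in range(max_retries))


if __name__ == "__main__":
    print(total_backoff(5, 1.0))  # default fixture settings from the entry
    print(total_backoff(0, 0))    # settings after the fix: no sleeping at all
```

Equivalently, the total is base_delay * (2**max_retries - 1), which is why a single unmatched URL cost about half a minute under the defaults.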
Tests
- 427 tests (up from 389): 31 new tests in test_workspace_object_scanner.py; new coverage in test_sdk_client.py, test_cli.py, test_csv_output.py, and test_snapshot.py for the workspace object scanning feature; 2 bug-fix tests (infinite-loop and retry hang); 5 new tests in TestGetWorkspacePrincipalAliases covering alias extraction, identity match, SP skip, API failure, and case-insensitive match.
[0.16.0] - 2026-04-27
Added
- Parallel group membership map with session cache: GroupMembershipResolver.get_group_membership_map() replaces the serial O(N) individual-GET loops in catalog_scanner and principal_auditor. The Databricks SCIM list endpoint never returns the members field, so individual GETs are unavoidable; they now fire concurrently via ThreadPoolExecutor (default 16 workers) and the result is cached on the resolver instance for the lifetime of the session. On a 300-group account with 8+ workers this reduces the membership-map build step by roughly an order of magnitude.
- PrincipalAuditor accepts a shared group_resolver: new optional constructor parameter group_resolver: Optional[GroupMembershipResolver] = None. When passed, the auditor uses the provided instance (and its cache) instead of creating its own, eliminating duplicate O(N) fetches when group audit and principal audit run in the same session. Backwards compatible: omitting the parameter behaves identically to before.
- Notebook resolver sharing: pa_auditor in cell 4 is now instantiated with the shared group_resolver, so running group audit followed by principal audit in the same notebook session reuses the cached membership map.
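The pattern of firing per-group GETs concurrently and caching the result on the instance can be sketched with the standard library. MembershipMap and its fetch_group callable are hypothetical stand-ins for illustration; they do not reproduce GroupMembershipResolver.

```python
from concurrent.futures import ThreadPoolExecutor


class MembershipMap:
    """Hypothetical helper: fetch members for many groups once, in parallel."""

    def __init__(self, fetch_group, max_workers=16):
        self._fetch_group = fetch_group  # callable: group_id -> list of members
        self._max_workers = max_workers
        self._cache = None               # filled on first call, then reused

    def get(self, group_ids):
        if self._cache is None:
            ids = list(group_ids)
            with ThreadPoolExecutor(max_workers=self._max_workers) as pool:
                # pool.map preserves input order, so zip pairs ids correctly.
                members = list(pool.map(self._fetch_group, ids))
            self._cache = dict(zip(ids, members))
        return self._cache
```

Sharing one such instance between two auditors is what avoids the duplicate O(N) fetches mentioned above; a second call to get() returns the cached map without any further requests.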
Tests
- 352 tests (up from 342): 10 new tests in test_group_resolver.py covering map correctness, child_to_parents structure, cache hit behaviour, _group_cache warming, clear_caches() invalidation, empty-account edge case, failed-GET skipping, and three PrincipalAuditor integration tests.
[0.15.1] - 2026-04-26
Fixed
- SCIM group membership resolution: the Databricks SCIM group list endpoint (GET /scim/v2/Groups) never returns the members field regardless of client (SDK typed call, raw HTTP, attributes=members param); only an individual GET /scim/v2/Groups/{id} includes members. This caused upstream group detection and group membership tracing to silently return empty results on all runs against real Databricks accounts. get_groups_containing_target in catalog_scanner and resolve_group_memberships in principal_auditor now both fetch the ID/name list first and then do one GET per group to build the child-to-parent adjacency map.
- SDK client group listing: DatabricksSDKClient.account_api for /scim/v2/Groups and scim_list_all("Groups") now route through raw HTTP (api_client.do) rather than the SDK's groups.list() iterator, which also omits members; test suite updated accordingly.
Tests
- 342 tests: updated test_sdk_client.py group-listing tests to assert api_client.do call path and payload instead of groups.list.
[0.15.0] - 2026-04-26
Fixed
- Account OIDC token URL: the raw HTTP client was calling {account_host}/oidc/v1/token (the workspace path); corrected to {account_host}/oidc/accounts/{account_id}/v1/token (the account-scoped path required by Databricks); this caused invalid_request 400 errors on every run using the raw HTTP client.
- Workspace OIDC fallback handles 401: the invalid_client fallback to the account-level token previously only caught HTTP 400; Databricks also returns HTTP 401 with the same invalid_client body in some workspace configurations; the guard now checks status_code in (400, 401) so the fallback fires in both cases.
- SDK client grant queries: ws.grants.get(securable_type=SecurableType.CATALOG, …) routes through a gRPC shim that returns SECURABLETYPE.CATALOG is not a valid securable type on some workspace versions; replaced all three grant endpoints (catalog, schema, table) with ws.api_client.do("GET", endpoint) to hit the REST path directly, matching the raw HTTP client and working on all workspace versions; this caused 0 grants returned for all catalogs when using the default SDK backend.
- Principal auditor UC grant matching: find_principal() now returns a 5th value (uc_name): the SCIM userName, which is what Unity Catalog stores grants against; previously only displayName was matched, causing UC grants to be missed when displayName ≠ userName (most visibly for Azure AD guest users whose UC grants use their #ext# UPN); scan_permissions accepts a principal_aliases set and includes uc_name in the relevant-principal check.
- Notebook elevation safety: the ensure_workspace_admin loop in both group-audit and principal-audit cells was called after _elevator.__enter__() but outside a try/except; if an exception was raised mid-loop, temporary Workspace Admin grants on already-processed workspaces were never revoked; both cells now wrap the loop in try/except that calls __exit__(*sys.exc_info()) on failure, matching the CLI's cleanup guarantee.
- Notebook install cell: added dbutils.library.restartPython() after %pip install so the newly installed package is picked up by the cluster driver without a manual restart.
- Notebook JSON format: cell sources were accidentally serialised as a single string instead of the per-line list-of-strings format required by the .ipynb spec; corrected so the notebook opens correctly in Jupyter, VS Code, and Databricks.
- Workspace URL parsing for explicit --workspace-urls: parse_workspace_urls failed to extract a numeric workspace ID from Azure (adb-<id>.<region>.azuredatabricks.net) and AWS URL formats; a regex now extracts the ID from the hostname so explicit workspace URLs work without requiring the Account API workspace list.
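Extracting the workspace ID from the Azure hostname format quoted above can be sketched with a single regular expression. The function name and the exact pattern are assumptions made for illustration and are not the regex used by parse_workspace_urls; the AWS format is left out because the entry does not spell it out.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Matches adb-<id>.<region>.azuredatabricks.net as described in the entry.
_AZURE_HOST = re.compile(r"^adb-(\d+)\.[^.]+\.azuredatabricks\.net$")


def azure_workspace_id(url: str) -> Optional[int]:
    """Return the numeric workspace ID from an Azure workspace URL, if any."""
    # Accept bare hostnames by giving urlparse a netloc to work with.
    host = urlparse(url if "://" in url else "//" + url).hostname or ""
    match = _AZURE_HOST.match(host)
    return int(match.group(1)) if match else None
```

Matching on the parsed hostname rather than the raw string keeps paths, ports, and query strings from interfering with the pattern.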
Tests
- 342 tests (up from 332): new tests in test_client.py for account/workspace OIDC URL construction and 401 invalid_client fallback; new tests in test_principal_auditor.py for uc_name return value; updated test_sdk_client.py grant tests to mock ws.api_client.do instead of ws.grants.get.
[0.14.0] - 2026-04-26
Added
- Top-members ranking in group audit: after redundancy detection, members are ranked by personal (member-direct) catalog grant count; each entry includes the principal name, grant count, and redundancy level (Full / Partial / None), giving admins an instant cleanup shortlist; available in --output text (top 5 printed in the summary block), --output json (top_members array), and the Databricks notebook (df_top_members DataFrame).
Fixed
- Python 3.9 incompatible union type hint: str | None in a function signature in tests/test_cli.py requires Python 3.10+; added from __future__ import annotations to restore 3.9 compatibility (same fix applied previously to tests/test_workspace.py).
- Ruff lint violations: E501 (line too long) in client.py:104 wrapped; F401 unused imports (time, pytest) and I001 import block formatting in tests/test_client.py cleaned up.
Tests
- 332 tests (up from 331): test_group_audit_json_top_members_ranked asserts top_members is present in JSON output and contains the expected principals and fields.
[0.13.0] - 2026-04-26¶
Fixed¶
- Schema / table grants had empty workspace_name — the stub WorkspaceInfo objects constructed for parallel schema and table scans in _run_group_audit used a hardcoded "" for workspace_name; the CLI now builds a workspace_url → workspace_name mapping from the discovered workspaces list and passes the correct name into each stub, so schema/table grants carry the right name in CSV and snapshot output
- log undefined in cli.py — log.warning(...) calls in the parallel schema/table scan error handlers referenced a name that was never imported/defined; added import logging and log = logging.getLogger(__name__) at module level
- BFS queue in get_groups_containing_target used list.pop(0) (O(n) per call); replaced with collections.deque + popleft() (O(1)) for better performance on deep group hierarchies
- CLI JSON indentation — "principal_source" key in the principal audit JSON dict was at the wrong indent level (valid Python but confusing); aligned with the other keys
Tests¶
- 331 tests (up from 330): test_scan_schemas_workspace_name_propagated asserts the workspace_name flows through the scanner; test_principal_audit_json_output extended with principal_source key assertion
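The BFS queue change listed under Fixed for this release can be sketched as follows. The function name `groups_containing` and the `parents` mapping (member name to the groups that directly contain it) are assumptions for this example; the real `get_groups_containing_target` resolves memberships through SCIM.

```python
from collections import deque
from typing import Dict, List, Set


def groups_containing(target: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Collect every group that directly or transitively contains target."""
    seen: Set[str] = set()
    queue = deque([target])
    while queue:
        # popleft() is O(1); list.pop(0) shifts every remaining element.
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen
```

The `seen` set also keeps the traversal finite if group memberships form a cycle.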
[0.12.0] - 2026-04-26¶
Fixed¶
- REVOKE SQL quoting incomplete — principals were only backtick-quoted when they contained @ or a space, leaving group names with hyphens (e.g. data-engineers) and other non-alphanumeric characters unquoted; embedded backtick characters in any identifier (principal name, catalog name) were not escaped; the new _bt() helper unconditionally backtick-quotes every identifier and escapes embedded backticks by doubling them (` → ``), matching the Spark SQL standard
Tests¶
- 330 tests (up from 317): _bt unit tests (wrap, single/multiple escapes); parametrized test_principal_always_backtick_quoted across email / group / SP / space variants; test_principal_with_embedded_backtick_escaped; test_catalog_with_embedded_backtick_escaped; test_principal_with_hyphen_is_quoted; test_principal_without_special_chars_still_quoted
[0.11.0] - 2026-04-25¶
Added¶
- --workers N now also applies to principal audit — get_workspace_assignments and scan_permissions in PrincipalAuditor now accept max_workers and fan out with ThreadPoolExecutor; workspace permission-assignment queries run in parallel and each unique workspace is UC-scanned independently in parallel; audit() accepts and threads max_workers through both calls; _run_principal_audit in cli.py passes args.workers
Changed¶
- scan_permissions refactored: duplicate workspace URLs are now deduplicated upfront (replacing the inline seen_ws set); the per-workspace catalog scan is extracted into _scan_one_workspace() (all state local, safe for concurrent execution); scanned_catalogs is keyed only on catalog name within each workspace call rather than (url, name) globally
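The deduplicate-then-fan-out pattern can be sketched as below. The function `scan_workspaces` and the `scan_one` callback are hypothetical stand-ins for `scan_permissions` and `_scan_one_workspace()`, whose real signatures are not shown in this changelog.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def scan_workspaces(
    urls: Iterable[str],
    scan_one: Callable[[str], List[str]],
    max_workers: int = 8,
) -> Dict[str, List[str]]:
    """Scan each unique workspace URL once, in parallel."""
    # dict.fromkeys removes duplicates while preserving first-seen order.
    unique_urls = list(dict.fromkeys(urls))
    if not unique_urls:
        return {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, so zip pairs them correctly.
        results = list(pool.map(scan_one, unique_urls))
    return dict(zip(unique_urls, results))
```

Because each call to `scan_one` keeps its state local, no locking is needed around the per-workspace work.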
Tests¶
- 317 tests (up from 313): test_parallel_two_workspaces_roles_merged, test_empty_workspaces_returns_empty, test_parallel_two_workspaces_perms_merged, test_full_audit_max_workers_one in test_principal_auditor.py
[0.10.0] - 2026-04-25¶
Fixed¶
- TokenCache used naive local datetime — get_token() and set_token() now use datetime.now(timezone.utc) so token expiry comparisons are correct across DST boundaries and are consistent with the UTC timestamps used elsewhere in the codebase
- Statement execution polling loop — timeout was tracked as elapsed += poll_interval (inaccurate if time.sleep() overshoots; non-terminating when poll_interval=0); replaced with a wall-clock deadline via time.monotonic()
- Bulk-fetch fallback logged at WARNING — some account configurations legitimately cannot bulk-list SCIM users or SPs; the per-member fallback is fully supported and was downgraded from WARNING to INFO to eliminate false alerts in log-monitoring systems
- Silent {} returns in SDK client — three sites in DatabricksSDKClient that coerced unexpected response types to {} now emit a DEBUG log before returning so unexpected SDK-version surprises are observable in verbose output
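The deadline-based polling fix above can be illustrated with a small sketch. The function name `wait_until_done` and the `is_done` callback are assumptions for this example and do not reflect the project's Statement Execution client.

```python
import time
from typing import Callable


def wait_until_done(
    is_done: Callable[[], bool],
    timeout: float,
    poll_interval: float = 1.0,
) -> None:
    """Poll is_done() until it returns True or the deadline passes."""
    # The deadline is fixed once, so sleep overshoot cannot stretch the timeout.
    deadline = time.monotonic() + timeout
    while not is_done():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"statement did not finish within {timeout}s")
        time.sleep(poll_interval)
```

Since the exit condition depends on the clock rather than on an accumulated counter, the loop still terminates when `poll_interval` is 0.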
Tests¶
- 313 tests (up from 304): new tests/test_client.py with 8 TokenCache tests (UTC-awareness, expiry, thread safety, minimum-expiry floor); new test_execute_statement_timeout_raises in test_stale_checker.py
[0.9.0] - 2026-04-25¶
Added¶
- Parallel scanning (--workers N, default 8) — workspace, schema, and table scans now fan out with ThreadPoolExecutor; each workspace is scanned from its own vantage point so workspace-catalog bindings are respected; duplicate workspace URLs are silently deduplicated before dispatch
- scan_all_workspaces now accepts a max_workers parameter for programmatic use
Fixed¶
- SCIM filter injection — group names, user emails, and SP identifiers are now escaped (backslash and double-quote) before being interpolated into SCIM filter expressions; unescaped values could produce malformed filters or match unintended principals
- UTC timestamps — JSON output timestamp fields were naive local-time strings; they are now always UTC with +00:00 offset
- Elevation cleanup leak — if ensure_workspace_admin raised mid-loop, already-elevated workspaces were never revoked; the loop is now wrapped so cleanup runs unconditionally on any exception
- StaleFinding.last_access — was always None because the SQL query only covered the stale_days window; an extended max_lookback_days window (default max(stale_days × 3, 365)) is now used for the query and active-vs-stale classification is done in Python, so stale-but-historically-seen principals get a real date
- Snapshot version validation — load_snapshot() now raises ValueError on version mismatch instead of silently loading an incompatible schema
- CSV output gaps — write_group_audit_csv was missing the additional_privileges column in the redundancy section; write_principal_audit_csv omitted the group-memberships and workspace-roles sections entirely; write_diff_csv labelled the external_id member column "source"
- Workspace token cache race — _get_workspace_token used a non-atomic check-then-insert on _workspace_token_caches; replaced with dict.setdefault() so concurrent threads always share the same TokenCache object per host
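The SCIM filter escaping listed in this release's fixes can be sketched as follows. Both function names are hypothetical; the `displayName eq "..."` form is standard SCIM filter syntax, but the project's actual filter strings are not shown here.

```python
def escape_scim_value(value: str) -> str:
    """Escape backslashes and double quotes for use inside a SCIM filter string."""
    # Backslashes go first so the ones added for quotes are not doubled again.
    return value.replace("\\", "\\\\").replace('"', '\\"')


def display_name_filter(group_name: str) -> str:
    """Build a filter that matches one group by exact display name."""
    return f'displayName eq "{escape_scim_value(group_name)}"'
```

Without the escaping, a name containing a double quote would close the string literal early and change what the filter matches.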
Tests¶
- 304 tests (up from 275): new tests for UTC timestamps, elevation cleanup, stale last_access, snapshot version validation, CSV column counts and section headers, --workers flag, parallel deduplication, and local-group pagination
[0.8.0] - 2026-04-24¶
Added¶
- CSV output (--output csv) - flat grant table plus redundancy/escalation sections; Excel-ready for auditors who won't run a CLI
- Snapshot / diff mode - --save-snapshot PATH writes a timestamped JSON snapshot after any audit run; --baseline PATH compares the current run against a previous snapshot and reports new grants, removed grants, new/removed members - SOC 2 / ISO 27001 compliance evidence workflow
- AuditDiff model with has_changes property
- csv_output.py - write_group_audit_csv(), write_principal_audit_csv(), write_diff_csv()
- snapshot.py - build_group_snapshot(), build_principal_snapshot(), save_snapshot(), load_snapshot(), diff_snapshots()
- Databricks notebook fully rewritten: inline fallback classes removed, all new features wired to widgets, AuditResultBuilder updated with source tagging and new DataFrame builders
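The grant comparison behind snapshot/diff mode can be pictured with set arithmetic. The `Grant` tuple shape, the `GrantDiff` class, and `diff_grants` are hypothetical simplifications for this sketch; the real `AuditDiff` model and `diff_snapshots()` also track members and may differ in structure.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

# Hypothetical shape: (principal, securable, privilege)
Grant = Tuple[str, str, str]


@dataclass
class GrantDiff:
    added: Set[Grant] = field(default_factory=set)
    removed: Set[Grant] = field(default_factory=set)

    @property
    def has_changes(self) -> bool:
        """True when at least one grant was added or removed."""
        return bool(self.added or self.removed)


def diff_grants(baseline: Set[Grant], current: Set[Grant]) -> GrantDiff:
    """Compare two grant sets and report what appeared and what disappeared."""
    return GrantDiff(added=current - baseline, removed=baseline - current)
```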
[0.7.0] - 2026-04-24¶
Added¶
- Identity source tagging - every user, SP, and group is tagged external (IdP-provisioned via SCIM externalId) or internal (Databricks-managed)
- PrincipalSource enum and _source_from_external_id() helper in models.py
- source property on GroupMember, GroupNode, GroupMembership; principal_source property on PrincipalAuditResult
- Group audit text/JSON output shows (N IdP-synced, M Databricks-managed) breakdowns for users, SPs, and groups
- Principal audit text/JSON output shows per-principal and per-group source tags
[0.6.0] - 2026-04-24¶
Added¶
- Privilege escalation detection (--escalation-check, principal audit only) - flags ALL_PRIVILEGES and MANAGE grants inherited through group membership; EscalationFinding model; escalation.py
- Stale grant detection (--stale-days N) - cross-references member-direct catalog grants against system.access.audit via the Statement Execution API; flags principals with no recorded activity in the last N days; StaleFinding model; stale_checker.py; requires --sql-warehouse-id
- Workspace-local group detection (--check-local-groups) - scans workspace SCIM and flags groups absent from account SCIM (legacy pre-UC groups); LocalGroupFinding model; local_groups.py
[0.5.0] - 2026-04-24¶
Added¶
- Just-in-time Workspace Admin elevation (--auto-elevate) - temporarily grants the audit SP Workspace Admin on each workspace that lacks it, then restores the prior state after the audit (success or failure); PermissionElevator context manager; elevate.py
- --dry-run-elevation - previews which workspaces would be elevated without writing any changes
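The "restore on success or failure" behaviour can be sketched with a context manager. The function `elevated` and its `grant`/`revoke` callbacks are hypothetical and stand in for the project's `PermissionElevator`, whose interface is not shown in this changelog.

```python
from contextlib import contextmanager
from typing import Callable, Iterable, Iterator, List


@contextmanager
def elevated(
    workspaces: Iterable[str],
    grant: Callable[[str], None],
    revoke: Callable[[str], None],
) -> Iterator[List[str]]:
    """Elevate on each workspace, then always undo whatever was elevated."""
    done: List[str] = []
    try:
        for ws in workspaces:
            grant(ws)
            done.append(ws)  # recorded only after the grant succeeded
        yield done
    finally:
        # Runs whether the audit succeeded, failed, or a grant raised mid-loop.
        for ws in reversed(done):
            revoke(ws)
```

Keeping the grant loop inside the `try` means a failure partway through still revokes the workspaces that were already elevated.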
[0.3.0] - 2026-04-01¶
Added¶
- Initial public release
- Group audit mode (--group) - recursive SCIM group resolution, multi-workspace catalog/schema/table permission scanning, grant classification (Direct/Upstream/Member Direct), redundancy detection, copy-paste REVOKE SQL generation
- Principal audit mode (--principal) - reverse BFS lookup from user/SP/group through all group memberships and workspace assignments to effective UC permissions; dead-end group detection
- Dual client backends: DatabricksAPIClient (raw HTTP, always available) and DatabricksSDKClient (optional, wraps databricks-sdk); auto-selected by create_client()
- Multi-cloud support: Azure, AWS, GCP
- --output text and --output json
- --scan-schemas and --scan-tables depth flags
- Databricks notebook with Spark DataFrame output and optional Delta export