ifc-commit/docs/research.md
2026-03-25 10:36:30 +01:00

# Research Notes
This document covers the research behind ifc-commit's approach to storing git
provenance inside IFC files: the mechanisms surveyed, the tradeoffs considered,
and how the implementation was designed.
---
## Embedding Commit History in IFC Files
A survey of IFC mechanisms for storing git commit metadata.
---
### 1. IfcOwnerHistory — The Native Mechanism
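Every `IfcRoot`-derived entity (walls, spaces, and other products) has an `OwnerHistory` attribute that references an `IfcOwnerHistory` record. The attribute is mandatory in IFC2X3 and optional from IFC4 onward. It is the closest thing IFC has to built-in change tracking.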
**Fields:**
| Field | Type | Notes |
|-------|------|-------|
| `OwningUser` | `IfcPersonAndOrganization` | Who created the element |
| `OwningApplication` | `IfcApplication` | Software used |
| `State` | `IfcStateEnum` | `READWRITE`, `READONLY`, `LOCKED` |
| `ChangeAction` | `IfcChangeActionEnum` | `ADDED`, `MODIFIED`, `DELETED`, `NOCHANGE` |
| `LastModifiedDate` | `IfcTimeStamp` | Unix timestamp |
| `LastModifyingUser` | `IfcPersonAndOrganization` | |
| `LastModifyingApplication` | `IfcApplication` | |
| `CreationDate` | `IfcTimeStamp` | Unix timestamp |
Raw IFC line from `samples/duplex.ifc`:
```
#33=IFCOWNERHISTORY(#32,#2,$,.NOCHANGE.,$,$,$,0);
```
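To make the positional layout concrete, the sketch below maps the eight arguments of that STEP line onto the field names in the table above. It is illustrative only: `parse_owner_history` is a hypothetical helper, and the naive comma split only works here because this line contains no quoted strings or nested lists. Real files should be read with ifcopenshell.

```python
import re
from datetime import datetime, timezone

# Attribute order of IfcOwnerHistory, as listed in the table above.
OWNER_HISTORY_FIELDS = (
    "OwningUser", "OwningApplication", "State", "ChangeAction",
    "LastModifiedDate", "LastModifyingUser",
    "LastModifyingApplication", "CreationDate",
)


def parse_owner_history(line: str) -> dict:
    """Map the positional arguments of one IFCOWNERHISTORY line to field names."""
    match = re.fullmatch(r"#(\d+)\s*=\s*IFCOWNERHISTORY\((.*)\);", line.strip())
    if match is None:
        raise ValueError("not an IFCOWNERHISTORY line")
    values = [value.strip() for value in match.group(2).split(",")]
    if len(values) != len(OWNER_HISTORY_FIELDS):
        raise ValueError("unexpected number of attributes")
    # "$" is the STEP marker for an unset optional attribute.
    return {
        name: (None if value == "$" else value)
        for name, value in zip(OWNER_HISTORY_FIELDS, values)
    }


record = parse_owner_history("#33=IFCOWNERHISTORY(#32,#2,$,.NOCHANGE.,$,$,$,0);")
# IfcTimeStamp counts seconds since the Unix epoch.
created = datetime.fromtimestamp(int(record["CreationDate"]), tz=timezone.utc)
```

In the sample line, only the owning user, owning application, change action, and creation date are set. The creation date of `0` shows that the exporter did not record a meaningful timestamp.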
**Limitations:**
- Only the *current* state — no history chain
- `ChangeAction` is a coarse enum; no room for a commit hash, message, or branch
- `IfcApplication.Version` is an `IfcLabel` (at most 255 characters), not designed for structured data
- One record per element; previous owners are lost on update
**Verdict:** Good for standard compliance and timestamping. Not sufficient alone for git metadata.
---
### 2. IfcPropertySet — The Recommended Extension Point
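Custom property sets are the standard IFC way to attach arbitrary key-value metadata to any `IfcObject`. They survive round-trips through most IFC-aware tools, which generally carry over property sets they do not recognise instead of discarding them. Note that buildingSMART reserves the `Pset_` prefix for its own standardised sets, so strict validators may flag `Pset_GitCommit` as a non-standard name.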
> **Schema** — `Pset_GitCommit`
| Property | Type | Example |
|----------|------|---------|
| `CommitHash` | `IfcLabel` | `a1b2c3d4f5e6c7b8` |
| `CommitMessage` | `IfcText` | `Fix wall thickness` |
| `CommitAuthor` | `IfcLabel` | `alice <***@***>` |
| `CommitDate` | `IfcLabel` | `2026-03-24T14:30:00Z` |
| `CommitBranch` | `IfcLabel` | `main` |
| `OperationName` | `IfcLabel` | `Merge` |
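The values for this schema can be collected from the repository itself. The following is a minimal sketch, not the project's actual code: `parse_commit_fields` and `read_head_metadata` are hypothetical helpers, and using the committer date (`%cI`) for `CommitDate` is an assumption. Only standard `git log` and `git rev-parse` options are used.

```python
import subprocess

# %x00 makes git emit a NUL byte between fields; NUL cannot occur in a
# commit subject or author name, so splitting on it is unambiguous.
GIT_FORMAT = "%H%x00%s%x00%an <%ae>%x00%cI"


def parse_commit_fields(raw: str, branch: str, operation_name: str) -> dict:
    """Turn one line of `git log --format=GIT_FORMAT` output into Pset properties."""
    commit_hash, message, author, date = raw.rstrip("\n").split("\x00")
    return {
        "CommitHash": commit_hash,
        "CommitMessage": message,
        "CommitAuthor": author,
        "CommitDate": date,
        "CommitBranch": branch,
        "OperationName": operation_name,
    }


def read_head_metadata(operation_name: str, repo_dir: str = ".") -> dict:
    """Collect the metadata of HEAD from the git repository at repo_dir."""
    raw = subprocess.run(
        ["git", "log", "-1", f"--format={GIT_FORMAT}"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    branch = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout.strip()
    return parse_commit_fields(raw, branch, operation_name)
```

Keeping the parsing separate from the `git` calls means the mapping onto property names can be checked without a repository.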
**ifcopenshell snippet — writing:**
```python
import ifcopenshell.api.pset

# `model` is an open ifcopenshell file; `element` is the entity to stamp.
pset = ifcopenshell.api.pset.add_pset(model, product=element, name="Pset_GitCommit")
ifcopenshell.api.pset.edit_pset(model, pset=pset, properties={
    "CommitHash": commit_hash,
    "CommitMessage": commit_message,
    "CommitAuthor": commit_author,
    "CommitDate": commit_date,
    "CommitBranch": branch,
    "OperationName": operation_name,
})
```
**Reading back:**
```python
props = {}
for rel in element.IsDefinedBy or []:
    if rel.is_a("IfcRelDefinesByProperties"):
        pset = rel.RelatingPropertyDefinition
        # Skip quantity sets and other definitions that have no HasProperties.
        if pset.is_a("IfcPropertySet") and pset.Name == "Pset_GitCommit":
            props = {p.Name: p.NominalValue.wrappedValue for p in pset.HasProperties}
```
**Verdict:** Best fit for per-element traceability. Flexible, queryable, spec-compliant.
---
### 3. IfcDocumentInformation — For Linking to External Commits
`IfcDocumentInformation` + `IfcRelAssociatesDocument` lets you attach a document reference (URL, identifier, description) to any `IfcRoot` entity. It can carry a git commit URL back to the forge.
```python
import ifcopenshell.api.document

# Attribute names below follow IFC4; IFC2X3 names them differently.
doc = ifcopenshell.api.document.add_information(model)
ifcopenshell.api.document.edit_information(model, information=doc, attributes={
    "Identification": commit_hash[:8],
    "Name": commit_message,
    "Location": f"https://gitaec.org/rvba/ifc-commit/commit/{commit_hash}",
})
# Elements are associated with a reference to the document, not the document itself.
ref = ifcopenshell.api.document.add_reference(model, information=doc)
ifcopenshell.api.document.assign_document(model, products=[element], document=ref)
```
**Verdict:** Useful for linking elements to a hosted commit URL. More verbose than a Pset. Better suited for file-level "source revision" than per-element tracking.
---
### 4. IfcApplication — File-Level Commit Stamp
`IfcApplication` is referenced by every `IfcOwnerHistory`. Its `Version` field can carry the current commit hash as a lightweight file-level stamp.
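```python
import ifcopenshell.api.owner

app = ifcopenshell.api.owner.add_application(model)
ifcopenshell.api.owner.edit_application(model, application=app, attributes={
    "ApplicationIdentifier": "ifc-commit",
    "Version": commit_hash,
    # IfcApplication has no `Name` attribute; the display name is ApplicationFullName.
    "ApplicationFullName": "ifc-commit",
})
```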
**Verdict:** Negligible overhead, but limited to one hash per file. Good as a quick "what commit produced this file" marker.
---
### 5. Comparison
| Mechanism | Granularity | Stores hash/message | IFC compliance | Overhead |
|-----------|-------------|---------------------|----------------|----------|
| `IfcOwnerHistory` | per-element | No | Native | Minimal |
| `Pset_GitCommit` | per-element | Yes (all fields) | Standard extension | Medium |
| `IfcDocumentInformation` | per-element | Yes (via Location) | Standard | High |
| `IfcApplication.Version` | per-file | Hash only | Native | Minimal |
---
### 6. Adopted Approach
A two-layer design:
1. **File level:** `IfcApplication.Version` is set to the commit hash. Any tool that displays the application details behind `IfcOwnerHistory` shows it with no extra work.
2. **Element level:** On elements touched by an operation, a `Pset_GitCommit` property set is written with the full commit metadata. `IfcOwnerHistory.ChangeAction` is updated to `ADDED` or `MODIFIED` accordingly.
This keeps standard IFC compliance intact while making full git provenance queryable directly from the model.
---
## Implementation Plan
The following section describes how the history mechanism is integrated into the pipeline and webapp.
---
### Pipeline Integration
The `history` command in ifc-commit operates in two modes:
- **Write mode** (`write_psets: true`): at the end of a pipeline run, stamps `Pset_GitCommit` on all elements in every operation's output IFC, using the current `HEAD` commit.
- **Read mode** (`input` present): opens an IFC file, collects all `Pset_GitCommit` records, and writes them to a JSON file for the webapp.
**Example pipeline declaration:**
```yaml
operations:
  - name: Extract
    command: extract
    input: ifc/duplex.ifc
    output: ifc/duplex_extract.ifc
    type: IfcSpace
    id: A102
  - name: Modify
    command: modify
    input: ifc/duplex_extract.ifc
    output: ifc/duplex_modified.ifc
    element: "168381"
    x: 2
  - name: Merge
    command: merge
    base: ifc/duplex.ifc
    space: A102
    part: ifc/duplex_modified.ifc
    output: ifc/duplex_merge.ifc
  - name: WriteHistory
    command: history
    write_psets: true
  - name: ReadHistory
    command: history
    input: ifc/duplex_merge.ifc
    output: ifc/history.json
```
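How a `history` operation is routed to one of the two modes can be sketched as follows. This is an assumption about the dispatch logic, not the actual implementation: `history_mode` is a hypothetical helper, and checking `write_psets` before `input` is a choice made here for illustration. The operations are written as plain dicts, as they would appear after the YAML above is loaded.

```python
def history_mode(operation: dict) -> str:
    """Return "write" or "read" for a history operation from the pipeline."""
    if operation.get("command") != "history":
        raise ValueError(f"{operation.get('name')!r} is not a history operation")
    # Write mode: stamp Pset_GitCommit on the pipeline outputs.
    if operation.get("write_psets") is True:
        return "write"
    # Read mode: an input IFC file is given and records are exported as JSON.
    if "input" in operation:
        return "read"
    raise ValueError("history operation needs `write_psets: true` or `input`")


pipeline = [
    {"name": "WriteHistory", "command": "history", "write_psets": True},
    {
        "name": "ReadHistory",
        "command": "history",
        "input": "ifc/duplex_merge.ifc",
        "output": "ifc/history.json",
    },
]

modes = {op["name"]: history_mode(op) for op in pipeline}
```

Ordering matters in the declaration: `WriteHistory` runs before `ReadHistory`, so the JSON export already contains the freshly stamped property sets.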
---
### Webapp Integration
After a pipeline run, the webapp calls `/api/ifc-history` to surface the per-element commit metadata, with links back to the corresponding commits on the forge.
Each element that was touched by the pipeline carries:
> **Schema** — per-element provenance
| Property | Example |
|----------|---------|
| `CommitHash` | `a1b2c3d4…` |
| `CommitMessage` | `move table` |
| `CommitAuthor` | `rvba <***@***>` |
| `CommitDate` | `2026-03-24T14:30:00Z` |
| `CommitBranch` | `main` |
| `OperationName` | `Modify` |
The element history panel links each commit hash to its page on the forge, making the full modification trail navigable directly from the webapp.
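The linking step can be sketched as below. The JSON layout of `ifc/history.json` is not specified in these notes, so the record shape, the `ElementId` and `CommitUrl` keys, and the `with_commit_links` helper are assumptions for illustration. The URL pattern is the one used in the `IfcDocumentInformation` example earlier.

```python
import json

# Commit URL pattern on the forge, as used in the document-reference example.
FORGE_COMMIT_URL = "https://gitaec.org/rvba/ifc-commit/commit/{commit_hash}"


def with_commit_links(records: list[dict]) -> list[dict]:
    """Return copies of the records with a forge link added for each commit."""
    return [
        {**record, "CommitUrl": FORGE_COMMIT_URL.format(commit_hash=record["CommitHash"])}
        for record in records
    ]


# Hypothetical content of ifc/history.json: one record per stamped element.
history_json = """
[
  {
    "ElementId": "168381",
    "CommitHash": "a1b2c3d4f5e6c7b8",
    "CommitMessage": "move table",
    "CommitBranch": "main",
    "OperationName": "Modify"
  }
]
"""

linked = with_commit_links(json.loads(history_json))
```

The original records are left untouched, so the same JSON can also be served unmodified by the API.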
---
## Related Work
- [ifc-data-bus](https://github.com/vyzn-tech/ifc-data-bus)
- [ifc-data-horse](https://gitaec.org/rvba/ifc-data-horse)
- [buildingSMART Hackathon 2026](https://github.com/lfcastel/buildingSMART-Hackathon-2026)
- [Git-based IFC — Bruno Postle](https://gitaec.org/brunopostle/creative-freedom)