# Design Document
## Overview
This design optimizes the SQL schema and query patterns for shot and asset data fetching by consolidating task status information into single database operations. Currently, the system fetches shot/asset rows and then issues a separate task query for each row, an N+1 query pattern that slows down tables with many rows (101 queries for a page of 100 shots). The optimization uses SQL joins and aggregation to fetch all required data in a single query per request.
The optimization maintains full backward compatibility: the existing endpoints and response formats are unchanged, and only the query patterns behind them are optimized. The system will support both the existing per-row query approach and the new aggregated approach, allowing for gradual migration and testing.
## Architecture
### Current Architecture Issues
**Confirmed N+1 Query Problem in Current Implementation:**
1. **Main Query**: `shots = query.offset(skip).limit(limit).all()` fetches shots first
2. **Per-Shot Task Query**: For each shot, runs `db.query(Task).filter(Task.shot_id == shot.id, Task.deleted_at.is_(None)).all()`
3. **Application-Level Aggregation**: Task status building happens in Python loops:
```python
for shot in shots:
    tasks = db.query(Task).filter(Task.shot_id == shot.id, Task.deleted_at.is_(None)).all()
    task_status = {}
    for task_type in all_task_types:
        task_status[task_type] = "not_started"
    for task in tasks:
        task_status[task.task_type] = task.status
```
4. **Same Pattern for Assets**: Assets follow identical N+1 pattern with per-asset task queries
5. **Performance Impact**: For 100 shots, this results in 101 database queries (1 for shots + 100 for tasks)
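The query count can be reproduced with a small, self-contained sketch. It uses Python's built-in `sqlite3` with a simplified two-table schema; the `CountingConnection` wrapper and the helper names are hypothetical and not part of the codebase. It only illustrates how the number of round trips differs between the per-shot loop and a single join.

```python
import sqlite3


class CountingConnection:
    """Hypothetical wrapper that counts the queries sent to the database."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def query(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def make_db(n_shots=100):
    """Build an in-memory database with one active task per shot."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE shots (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE tasks (id INTEGER PRIMARY KEY, shot_id INTEGER, "
        "task_type TEXT, status TEXT, deleted_at TEXT)"
    )
    for i in range(1, n_shots + 1):
        conn.execute("INSERT INTO shots (id, name) VALUES (?, ?)", (i, f"SH{i:03d}"))
        conn.execute(
            "INSERT INTO tasks (shot_id, task_type, status) "
            "VALUES (?, 'layout', 'in_progress')",
            (i,),
        )
    return CountingConnection(conn)


def n_plus_one(db):
    """One query for the shots, then one task query per shot."""
    result = {}
    for shot_id, _name in db.query("SELECT id, name FROM shots"):
        rows = db.query(
            "SELECT task_type, status FROM tasks "
            "WHERE shot_id = ? AND deleted_at IS NULL",
            (shot_id,),
        )
        result[shot_id] = dict(rows)
    return result


def single_join(db):
    """One LEFT JOIN query; rows are grouped per shot afterwards."""
    rows = db.query(
        "SELECT s.id, t.task_type, t.status FROM shots s "
        "LEFT JOIN tasks t ON t.shot_id = s.id AND t.deleted_at IS NULL"
    )
    result = {}
    for shot_id, task_type, status in rows:
        statuses = result.setdefault(shot_id, {})
        if task_type is not None:  # None means the shot has no active tasks
            statuses[task_type] = status
    return result


legacy_db, joined_db = make_db(), make_db()
legacy_result = n_plus_one(legacy_db)
joined_result = single_join(joined_db)
print(legacy_db.count, joined_db.count)
```

Both functions return the same mapping; only the number of queries differs.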
### Optimized Architecture
1. **Single Query Operations**: Use SQL joins to fetch shots/assets with all task status data in one query
2. **Database-Level Aggregation**: Use SQL aggregation functions (for example `JSON_OBJECTAGG` and `JSON_ARRAYAGG`, or `json_group_object` and `json_group_array` on SQLite) to build task status maps
3. **Indexed Relationships**: Add strategic indexes to optimize join performance
4. **Cached Aggregations**: Optional caching layer for frequently accessed aggregated data
## Components and Interfaces
### Database Layer Optimizations
#### New Database Indexes
```sql
-- Optimize task lookups by shot/asset
CREATE INDEX idx_tasks_shot_id_active ON tasks(shot_id)
WHERE deleted_at IS NULL;
CREATE INDEX idx_tasks_asset_id_active ON tasks(asset_id)
WHERE deleted_at IS NULL;
-- Optimize task status filtering
CREATE INDEX idx_tasks_status_type ON tasks(status, task_type)
WHERE deleted_at IS NULL;
-- Composite indexes for common query patterns
CREATE INDEX idx_tasks_shot_status_type ON tasks(shot_id, status, task_type)
WHERE deleted_at IS NULL;
CREATE INDEX idx_tasks_asset_status_type ON tasks(asset_id, status, task_type)
WHERE deleted_at IS NULL;
```
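Whether a partial index is actually used depends on the query repeating the index's `WHERE` condition. The sketch below checks this with `EXPLAIN QUERY PLAN`, assuming SQLite (which the Error Handling section mentions) and a cut-down `tasks` table; the `plan_for` helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tasks (
        id INTEGER PRIMARY KEY,
        shot_id INTEGER,
        task_type TEXT,
        status TEXT,
        deleted_at TEXT
    );
    CREATE INDEX idx_tasks_shot_id_active ON tasks(shot_id)
        WHERE deleted_at IS NULL;
    """
)


def plan_for(sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


# The query repeats the index condition, so the partial index is a candidate.
active_plan = plan_for(
    "SELECT status FROM tasks WHERE shot_id = ? AND deleted_at IS NULL", (1,)
)

# Without the soft-delete filter, the partial index cannot be used.
unfiltered_plan = plan_for("SELECT status FROM tasks WHERE shot_id = ?", (1,))

print(active_plan)
print(unfiltered_plan)
```

A practical consequence is that every optimized query has to keep the `deleted_at IS NULL` filter to benefit from these indexes.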
#### Optimized SQL Query Patterns
**Shot List with Task Status Aggregation:**
```sql
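-- JSON function names follow MySQL-style syntax; adjust them for the target database.
-- Tasks are aggregated in a derived table so that shots without tasks
-- fall through to the empty default via COALESCE.
SELECT
    s.*,
    COALESCE(
        td.task_data,
        JSON_OBJECT('task_statuses', JSON_OBJECT(), 'task_details', JSON_ARRAY())
    ) AS task_data
FROM shots s
LEFT JOIN (
    SELECT
        t.shot_id,
        JSON_OBJECT(
            'task_statuses', JSON_OBJECTAGG(t.task_type, t.status),
            'task_details', JSON_ARRAYAGG(
                JSON_OBJECT(
                    'task_id', t.id,
                    'task_type', t.task_type,
                    'status', t.status,
                    'assigned_user_id', t.assigned_user_id,
                    'updated_at', t.updated_at
                )
            )
        ) AS task_data
    FROM tasks t
    WHERE t.deleted_at IS NULL
    GROUP BY t.shot_id
) td ON td.shot_id = s.id
WHERE s.deleted_at IS NULL
ORDER BY s.name;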
```
**Asset List with Task Status Aggregation:**
```sql
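-- JSON function names follow MySQL-style syntax; adjust them for the target database.
-- Tasks are aggregated in a derived table so that assets without tasks
-- fall through to the empty default via COALESCE.
SELECT
    a.*,
    COALESCE(
        td.task_data,
        JSON_OBJECT('task_statuses', JSON_OBJECT(), 'task_details', JSON_ARRAY())
    ) AS task_data
FROM assets a
LEFT JOIN (
    SELECT
        t.asset_id,
        JSON_OBJECT(
            'task_statuses', JSON_OBJECTAGG(t.task_type, t.status),
            'task_details', JSON_ARRAYAGG(
                JSON_OBJECT(
                    'task_id', t.id,
                    'task_type', t.task_type,
                    'status', t.status,
                    'assigned_user_id', t.assigned_user_id,
                    'updated_at', t.updated_at
                )
            )
        ) AS task_data
    FROM tasks t
    WHERE t.deleted_at IS NULL
    GROUP BY t.asset_id
) td ON td.asset_id = a.id
WHERE a.deleted_at IS NULL
ORDER BY a.name;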
```
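Because the Error Handling section refers to SQLite, a SQLite-flavored variant of the aggregation may be useful for comparison. The sketch below uses SQLite's `json_group_object` through Python's built-in `sqlite3`; the table layout is a reduced assumption of the real schema, and only the status map (not `task_details`) is built. The third task row is soft-deleted and should not appear in the result.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE shots (id INTEGER PRIMARY KEY, name TEXT, deleted_at TEXT);
    CREATE TABLE tasks (
        id INTEGER PRIMARY KEY,
        shot_id INTEGER,
        task_type TEXT,
        status TEXT,
        deleted_at TEXT
    );
    INSERT INTO shots (id, name) VALUES (1, 'SH010'), (2, 'SH020');
    INSERT INTO tasks (shot_id, task_type, status, deleted_at) VALUES
        (1, 'layout', 'completed', NULL),
        (1, 'animation', 'in_progress', NULL),
        (1, 'lighting', 'not_started', '2024-01-01');
    """
)

# Tasks are aggregated per shot in a derived table; shots without active
# tasks get an empty JSON object through COALESCE.
QUERY = """
SELECT s.id, s.name, COALESCE(agg.task_statuses, '{}') AS task_statuses
FROM shots s
LEFT JOIN (
    SELECT shot_id, json_group_object(task_type, status) AS task_statuses
    FROM tasks
    WHERE deleted_at IS NULL
    GROUP BY shot_id
) agg ON agg.shot_id = s.id
WHERE s.deleted_at IS NULL
ORDER BY s.name
"""

shots = [
    {"id": shot_id, "name": name, "task_status": json.loads(statuses)}
    for shot_id, name, statuses in conn.execute(QUERY)
]
print(shots)
```

The database returns one row per shot with the status map already built, so the application only has to decode the JSON text.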
### Service Layer
#### Enhanced Existing Services
**Modified Shot Router Methods:**
```python
# Modify existing methods in backend/routers/shots.py
def list_shots():
    """Enhanced to use single query with joins instead of N+1 pattern."""
    # Replace current N+1 implementation with optimized join query


def get_shot():
    """Enhanced to fetch shot with task status in single query."""
    # Replace current separate task query with join
```
**Modified Asset Router Methods:**
```python
# Modify existing methods in backend/routers/assets.py
def list_assets():
    """Enhanced to use single query with joins instead of N+1 pattern."""
    # Replace current N+1 implementation with optimized join query


def get_asset():
    """Enhanced to fetch asset with task status in single query."""
    # Replace current separate task query with join
```
**Implementation Strategy**:
Replace the current loop-based approach with SQLAlchemy joins and subqueries to fetch all data in single database operations.
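As a rough illustration of what the router code might do with the rows of such a join, the sketch below folds flat `(shot_id, task_id, task_type, status)` rows into the `task_status` / `task_details` shape, keeping the `"not_started"` default from the current loop. The row layout and the `build_task_payload` name are assumptions for this example, not the actual implementation.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical flat row from "shots LEFT JOIN tasks":
# (shot_id, task_id, task_type, status); task columns are None for shots without tasks.
Row = Tuple[int, Optional[int], Optional[str], Optional[str]]


def build_task_payload(rows: Iterable[Row], all_task_types: List[str]) -> Dict[int, dict]:
    """Group joined rows per shot in a single pass, with no further queries."""
    payload: Dict[int, dict] = {}
    for shot_id, task_id, task_type, status in rows:
        entry = payload.setdefault(
            shot_id,
            {
                # Same default as the existing loop-based implementation
                "task_status": {t: "not_started" for t in all_task_types},
                "task_details": [],
            },
        )
        if task_id is None:
            continue  # shot without active tasks keeps the defaults
        entry["task_status"][task_type] = status
        entry["task_details"].append(
            {"task_id": task_id, "task_type": task_type, "status": status}
        )
    return payload


rows = [
    (1, 10, "layout", "completed"),
    (1, 11, "animation", "in_progress"),
    (2, None, None, None),
]
payload = build_task_payload(rows, ["layout", "animation", "lighting"])
print(payload[1]["task_status"])
```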
### API Layer
#### Optimized Existing Endpoints
The existing endpoints will be optimized to use single-query patterns while maintaining full backward compatibility:
**Current Endpoints (to be optimized):**
- `GET /api/shots/` - List shots with embedded task status data (optimized internally)
- `GET /api/shots/{shot_id}` - Get single shot with embedded task status data (optimized internally)
- `GET /api/assets/` - List assets with embedded task status data (optimized internally)
- `GET /api/assets/{asset_id}` - Get single asset with embedded task status data (optimized internally)
**No API Changes Required**: The response format remains identical, but the underlying queries will be optimized to use joins instead of N+1 patterns.
**Optional Enhancement**: Add an optional `use_legacy_queries=true` parameter for testing and rollback purposes during deployment.
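One way to picture the rollback switch is a thin dispatcher that picks a query path while leaving the response shape untouched. In the sketch below, `fetch_optimized` and `fetch_legacy` are hypothetical callables standing in for the two query implementations; how the flag reaches the router is left open.

```python
from typing import Callable, Dict, List

ShotRows = List[Dict[str, object]]


def list_shots(
    fetch_optimized: Callable[[], ShotRows],
    fetch_legacy: Callable[[], ShotRows],
    use_legacy_queries: bool = False,
) -> ShotRows:
    """Dispatch to the legacy or optimized query path; callers see the same shape."""
    fetch = fetch_legacy if use_legacy_queries else fetch_optimized
    return fetch()


# Stand-in implementations that record which path was taken.
calls: List[str] = []


def optimized() -> ShotRows:
    calls.append("optimized")
    return [{"id": 1, "task_status": {"layout": "completed"}}]


def legacy() -> ShotRows:
    calls.append("legacy")
    return [{"id": 1, "task_status": {"layout": "completed"}}]


default_result = list_shots(optimized, legacy)
legacy_result = list_shots(optimized, legacy, use_legacy_queries=True)
print(calls)
```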
### Frontend Layer Optimizations
#### Current Frontend Issues Identified
**Redundant API Calls in Existing Components:**
1. **ShotDetailPanel.vue**: Makes an additional `taskService.getTasks({ shotId })` call even though shot data already includes `task_details`
2. **AssetDetailPanel.vue**: Likely makes additional task API calls even though asset data already includes `task_details`
3. **TaskBrowser.vue**: Makes separate `taskService.getTasks()` calls when task data could be included in parent queries
4. **AssetBrowser.vue**: Already optimized; uses `task_status` and `task_details` from asset data
5. **TasksStore**: Makes separate task queries that could be consolidated
6. **EditableTaskStatus.vue**: Each component instance calls `customTaskStatusService.getAllStatuses()`, causing N+1 API calls to `/projects/{id}/task-statuses`
#### Frontend Optimization Strategy
**Modify Existing Components to Use Embedded Data:**
**1. Update ShotDetailPanel.vue:**
```typescript
// CURRENT CODE (makes redundant API call):
async function loadTasks() {
  isLoadingTasks.value = true
  const taskList = await taskService.getTasks({ shotId: props.shotId })
  tasks.value = taskList
  isLoadingTasks.value = false
}

// OPTIMIZED CODE (use embedded data):
function loadTasks() {
  // Use task_details already embedded in shot data - no API call needed!
  tasks.value = shot.value?.task_details || []
  isLoadingTasks.value = false
}
```
**2. Update AssetDetailPanel.vue (if it exists):**
```typescript
// CURRENT CODE (makes redundant API call):
async function loadTasks() {
  isLoadingTasks.value = true
  const taskList = await taskService.getTasks({ assetId: props.assetId })
  tasks.value = taskList
  isLoadingTasks.value = false
}

// OPTIMIZED CODE (use embedded data):
function loadTasks() {
  // Use task_details already embedded in asset data - no API call needed!
  tasks.value = asset.value?.task_details || []
  isLoadingTasks.value = false
}
```
**3. Update TaskBrowser.vue:**
```typescript
// CURRENT CODE (separate task API call):
const fetchTasks = async () => {
  const response = await taskService.getTasks({ projectId: props.projectId })
  tasks.value = response
}

// OPTIMIZED CODE (extract from shots AND assets):
const fetchTasks = async () => {
  // Get both shots and assets with embedded task data (two optimized backend calls)
  const [shots, assets] = await Promise.all([
    shotService.getShots({ projectId: props.projectId }),
    assetService.getAssets(props.projectId)
  ])
  // Extract tasks from embedded data - no separate task API calls needed!
  const shotTasks = shots.flatMap(shot => shot.task_details || [])
  const assetTasks = assets.flatMap(asset => asset.task_details || [])
  tasks.value = [...shotTasks, ...assetTasks]
}
```
**4. Update TasksStore.ts:**
```typescript
// CURRENT CODE (separate task queries):
async function fetchTasks(filters?: { projectId?: number }) {
  const response = await taskService.getTasks(filters)
  tasks.value = response
}

// OPTIMIZED CODE (use embedded data from shots AND assets):
async function fetchTasks(filters?: { projectId?: number }) {
  if (filters?.projectId) {
    // Get both shots and assets with embedded task data
    const [shots, assets] = await Promise.all([
      shotService.getShots({ projectId: filters.projectId }),
      assetService.getAssets(filters.projectId)
    ])
    // Combine all tasks from embedded data
    const shotTasks = shots.flatMap(shot => shot.task_details || [])
    const assetTasks = assets.flatMap(asset => asset.task_details || [])
    tasks.value = [...shotTasks, ...assetTasks]
  }
}
```
**5. Optimize Custom Task Status Loading:**
```typescript
// CURRENT PROBLEM (N+1 API calls):
// Each EditableTaskStatus.vue component calls:
const response = await customTaskStatusService.getAllStatuses(props.projectId)

// OPTIMIZED SOLUTION (shared store/cache):
// The cache lives at module scope so every component instance shares it.
// Caching the promise (not the resolved value) also deduplicates requests
// that start before the first response arrives.
const statusCache = new Map<number, Promise<CustomTaskStatusResponse>>()

const useCustomTaskStatusStore = () => {
  const getStatuses = (projectId: number): Promise<CustomTaskStatusResponse> => {
    let pending = statusCache.get(projectId)
    if (!pending) {
      pending = customTaskStatusService.getAllStatuses(projectId)
      statusCache.set(projectId, pending)
      // Drop failed requests so a later call can retry
      pending.catch(() => statusCache.delete(projectId))
    }
    return pending
  }
  return { getStatuses }
}

// OR include custom statuses in shot/asset responses:
// Backend includes custom_task_statuses in project data
// Frontend uses embedded custom status data instead of separate calls
```
**6. Verify AssetBrowser.vue Optimization:**
```typescript
// AssetBrowser.vue is already well-optimized:
// - Uses asset.task_status for status display
// - Uses asset.task_details for task information
// - No redundant API calls for task data
// This component serves as a good example of the optimized pattern
```
**Key Benefits:**
- **Reduce API Calls**: From multiple separate calls to using already-loaded embedded data
- **Improve Performance**: Eliminate redundant network requests for both shots and assets
- **Maintain Compatibility**: No changes to component interfaces or props
- **Leverage Backend Optimization**: Use the optimized backend queries that include task data
- **Comprehensive Coverage**: Optimize both shot and asset workflows consistently
## Data Models
### Enhanced Response Schemas
**No Schema Changes Required**: The existing `ShotListResponse` and `AssetListResponse` schemas already include the required fields:
```python
from typing import Dict, List, Optional

from pydantic import BaseModel, Field

# Current schemas already support optimized data.
# TaskStatus and TaskStatusInfo come from the existing schema module.
class ShotListResponse(BaseModel):
    # ... existing fields ...
    task_status: Dict[str, Optional[TaskStatus]] = Field(default_factory=dict)
    task_details: List[TaskStatusInfo] = Field(default_factory=list)


class AssetListResponse(BaseModel):
    # ... existing fields ...
    task_status: Dict[str, Optional[TaskStatus]] = Field(default_factory=dict)
    task_details: List[TaskStatusInfo] = Field(default_factory=list)
```
**Internal Optimization Only**: The optimization will be purely internal - same response format, but built using efficient database queries instead of N+1 patterns.
### Database View Optimization
**Optional Materialized Views for Heavy Workloads:**
```sql
-- PostgreSQL syntax (SQLite has no materialized views)
CREATE MATERIALIZED VIEW shot_task_status_summary AS
SELECT
    s.id AS shot_id,
    s.name AS shot_name,
    s.project_id,
    s.episode_id,
    COUNT(t.id) AS total_tasks,
    COUNT(CASE WHEN t.status = 'completed' THEN 1 END) AS completed_tasks,
    COUNT(CASE WHEN t.status = 'in_progress' THEN 1 END) AS in_progress_tasks,
    -- FILTER skips the all-NULL row that the LEFT JOIN yields for shots without tasks
    json_object_agg(t.task_type, t.status) FILTER (WHERE t.id IS NOT NULL) AS task_statuses,
    MAX(t.updated_at) AS last_task_update
FROM shots s
LEFT JOIN tasks t ON s.id = t.shot_id AND t.deleted_at IS NULL
WHERE s.deleted_at IS NULL
GROUP BY s.id, s.name, s.project_id, s.episode_id;

-- Refresh trigger for real-time updates.
-- Assumes a trigger function refresh_shot_task_summary() is defined separately,
-- for example one that runs REFRESH MATERIALIZED VIEW shot_task_status_summary.
CREATE TRIGGER refresh_shot_task_summary
AFTER INSERT OR UPDATE OR DELETE ON tasks
FOR EACH ROW
EXECUTE FUNCTION refresh_shot_task_summary();
```
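On SQLite, which has no materialized views, a comparable effect could be approximated with a summary table kept current by triggers. The sketch below is one such approximation under simplifying assumptions: the `shot_task_summary` table and trigger names are hypothetical, only inserts and status updates are handled, and soft deletion is ignored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tasks (
        id INTEGER PRIMARY KEY,
        shot_id INTEGER NOT NULL,
        status TEXT NOT NULL
    );
    CREATE TABLE shot_task_summary (
        shot_id INTEGER PRIMARY KEY,
        total_tasks INTEGER NOT NULL DEFAULT 0,
        completed_tasks INTEGER NOT NULL DEFAULT 0
    );

    -- Keep counters in step with new tasks.
    CREATE TRIGGER tasks_after_insert AFTER INSERT ON tasks
    BEGIN
        INSERT OR IGNORE INTO shot_task_summary (shot_id) VALUES (NEW.shot_id);
        UPDATE shot_task_summary
        SET total_tasks = total_tasks + 1,
            completed_tasks = completed_tasks + (NEW.status = 'completed')
        WHERE shot_id = NEW.shot_id;
    END;

    -- Adjust the completed counter when a task changes status.
    CREATE TRIGGER tasks_after_status_update AFTER UPDATE OF status ON tasks
    BEGIN
        UPDATE shot_task_summary
        SET completed_tasks = completed_tasks
            + (NEW.status = 'completed') - (OLD.status = 'completed')
        WHERE shot_id = NEW.shot_id;
    END;
    """
)

conn.execute("INSERT INTO tasks (shot_id, status) VALUES (1, 'in_progress')")
conn.execute("INSERT INTO tasks (shot_id, status) VALUES (1, 'completed')")
conn.execute("UPDATE tasks SET status = 'completed' WHERE id = 1")

summary = conn.execute(
    "SELECT total_tasks, completed_tasks FROM shot_task_summary WHERE shot_id = 1"
).fetchone()
print(summary)
```

Incremental counters avoid a full refresh on every change, at the cost of extra trigger logic that has to cover every way a task can change.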
## Correctness Properties
*A property is a characteristic or behavior that should hold true across all valid executions of a system: essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.*
### Property Reflection
After reviewing all properties identified in the prework, I've identified several areas where properties can be consolidated:
**Redundancy Elimination:**
- Properties 1.1 and 2.1 (single query operations for shots/assets) can be combined into one comprehensive property about single query operations
- Properties 1.2 and 2.2 (no additional API calls) can be combined into one property about API call efficiency
- Properties 1.3 and 2.3 (task status aggregation) can be combined into one property about data aggregation
- Properties 1.4 and 2.4 (custom status support) can be combined into one property about custom status handling
- Properties 1.5 and 2.5 (performance requirements) can be combined into one property about performance thresholds
**Property 1: Single Query Data Fetching**
*For any* shot or asset table request, the system should fetch all entity data and associated task statuses in a single database query operation
**Validates: Requirements 1.1, 2.1**
**Property 2: API Call Efficiency**
*For any* data table display operation, the system should render all task status information without requiring additional API calls per table row
**Validates: Requirements 1.2, 2.2**
**Property 3: Complete Task Status Aggregation**
*For any* shot or asset with multiple tasks, the system should include all task statuses in the aggregated response data
**Validates: Requirements 1.3, 2.3**
**Property 4: Custom Status Support**
*For any* project with custom task statuses, the system should include both default and custom status information in all aggregated responses
**Validates: Requirements 1.4, 2.4**
**Property 5: Performance Threshold Compliance**
*For any* table loading operation with up to 100 shots or assets, the system should complete data fetching within 500ms
**Validates: Requirements 1.5, 2.5**
**Property 6: Optimized SQL Join Usage**
*For any* shot or asset query with task status requirements, the system should use SQL joins to fetch all data in a single database round trip
**Validates: Requirements 3.1**
**Property 7: Scalable Query Performance**
*For any* database containing thousands of tasks, the system should maintain query performance through proper indexing strategies
**Validates: Requirements 3.2**
**Property 8: Data Consistency Maintenance**
*For any* task status update operation, the system should ensure consistency between individual task updates and aggregated views
**Validates: Requirements 3.3**
**Property 9: Dynamic Task Type Inclusion**
*For any* project with newly added task types, the system should automatically include them in aggregated task status queries
**Validates: Requirements 3.4**
**Property 10: Database-Level Aggregation**
*For any* task status aggregation operation, the system should use database-level aggregation functions rather than application-level processing
**Validates: Requirements 3.5**
**Property 11: Embedded Task Status Response**
*For any* API response containing shot or asset data, the response should include the `task_status` and `task_details` fields with all associated task information
**Validates: Requirements 4.1, 4.2**
**Property 12: Complete Task Status Information**
*For any* embedded task status data, the response should include task type, current status, assignee, and last updated information
**Validates: Requirements 4.3**
**Property 13: Table-Optimized Data Format**
*For any* shot or asset data received by the frontend, the system should provide task status information in a format optimized for table rendering
**Validates: Requirements 4.4**
**Property 14: Real-Time Aggregated Updates**
*For any* task status change, the system should provide real-time updates to aggregated data without requiring full table refreshes
**Validates: Requirements 4.5**
**Property 15: Backward Compatibility Preservation**
*For any* existing API endpoint, the system should maintain all current response formats and functionality after optimization implementation
**Validates: Requirements 5.1**
**Property 16: Legacy Query Support**
*For any* legacy code requesting individual task data, the system should continue to support separate task status queries
**Validates: Requirements 5.2**
**Property 17: Frontend Component Compatibility**
*For any* existing frontend component, the system should return optimized data in formats compatible with current component implementations
**Validates: Requirements 5.3**
**Property 18: Migration Data Integrity**
*For any* database migration operation, the system should preserve all existing data relationships and constraints
**Validates: Requirements 5.4**
**Property 19: Configuration Flexibility**
*For any* deployment environment, the system should provide configuration options to enable or disable new query patterns for testing purposes
**Validates: Requirements 5.5**
## Error Handling
### Query Optimization Errors
1. **Index Missing Errors**: Graceful fallback to non-optimized queries if indexes are missing
2. **JSON Aggregation Failures**: Handle cases where JSON functions are not available in SQLite version
3. **Large Dataset Timeouts**: Implement query timeouts and pagination for very large datasets
4. **Memory Constraints**: Monitor memory usage during aggregation operations
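For the JSON aggregation case, one possible approach is to probe the connection once at startup and pick a query strategy accordingly. The sketch below assumes SQLite via Python's built-in `sqlite3`; the function names and strategy labels are hypothetical.

```python
import sqlite3


def supports_json_aggregation(conn: sqlite3.Connection) -> bool:
    """Return True if this SQLite build exposes the JSON aggregate functions."""
    try:
        conn.execute("SELECT json_group_object('k', 1), json_group_array(1)").fetchone()
    except sqlite3.OperationalError:
        # Raised (as "no such function") when the JSON functions are unavailable
        return False
    return True


def choose_strategy(conn: sqlite3.Connection) -> str:
    """Pick database-level aggregation when available, else group rows in Python."""
    if supports_json_aggregation(conn):
        return "json_aggregation"
    return "join_and_group_in_python"


probe_conn = sqlite3.connect(":memory:")
strategy = choose_strategy(probe_conn)
print(strategy)
```

The fallback path still uses a single join, so it avoids the N+1 pattern even when JSON functions are missing.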
### Data Consistency Errors
1. **Stale Aggregated Data**: Implement cache invalidation strategies for materialized views
2. **Concurrent Update Conflicts**: Handle race conditions during task status updates
3. **Partial Data Loading**: Ensure atomic operations for aggregated data fetching
### Backward Compatibility Errors
1. **Schema Migration Failures**: Rollback strategies for failed database migrations
2. **API Version Conflicts**: Clear error messages for incompatible API usage
3. **Frontend Integration Issues**: Detailed error reporting for data format mismatches
## Testing Strategy
### Unit Testing
**Database Layer Tests:**
- Test optimized SQL queries with various data scenarios
- Verify index usage and query performance
- Test JSON aggregation functions with different data types
- Validate soft deletion filtering in aggregated queries
**Service Layer Tests:**
- Test optimized service methods with mock data
- Verify data transformation and aggregation logic
- Test error handling for edge cases
- Validate caching mechanisms if implemented
**API Layer Tests:**
- Test the optimized endpoints with various parameters
- Verify response format compatibility
- Test backward compatibility with existing endpoints
- Validate error responses and status codes
### Property-Based Testing
Property-based tests will use **Hypothesis** for Python, configured to run a minimum of 100 iterations per property test.
Each property-based test will be tagged with a comment explicitly referencing the correctness property from this design document using the format: **Feature: shot-asset-task-status-optimization, Property {number}: {property_text}**
**Property Test Examples:**
```python
from hypothesis import given, settings


# shot_task_data_strategy() and table_display_strategy() are project-specific
# Hypothesis strategies to be defined alongside the tests.
@settings(max_examples=100)
@given(shots_with_tasks=shot_task_data_strategy())
def test_single_query_data_fetching(shots_with_tasks):
    """
    Feature: shot-asset-task-status-optimization, Property 1: Single Query Data Fetching
    For any shot or asset table request, the system should fetch all entity data
    and associated task statuses in a single database query operation
    """
    # Test implementation here


@settings(max_examples=100)
@given(table_data=table_display_strategy())
def test_api_call_efficiency(table_data):
    """
    Feature: shot-asset-task-status-optimization, Property 2: API Call Efficiency
    For any data table display operation, the system should render all task status
    information without requiring additional API calls per table row
    """
    # Test implementation here
```
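To give a feel for what a test body could check, the sketch below compares a loop-based status map with a single-pass grouped one over randomly generated inputs. It deliberately uses only the standard library's `random` rather than Hypothesis, and all data shapes and function names are assumptions for illustration.

```python
import random
from collections import defaultdict

TASK_TYPES = ["layout", "animation", "lighting"]
STATUSES = ["not_started", "in_progress", "completed"]


def legacy_status_map(shot_ids, tasks):
    """Per-shot filtering, mirroring the loop-based implementation."""
    result = {}
    for shot_id in shot_ids:
        status = {t: "not_started" for t in TASK_TYPES}
        for task in tasks:
            if task["shot_id"] == shot_id and not task["deleted"]:
                status[task["task_type"]] = task["status"]
        result[shot_id] = status
    return result


def grouped_status_map(shot_ids, tasks):
    """Single pass over all tasks, as a joined query result would allow."""
    grouped = defaultdict(dict)
    for task in tasks:
        if not task["deleted"]:
            grouped[task["shot_id"]][task["task_type"]] = task["status"]
    defaults = {t: "not_started" for t in TASK_TYPES}
    return {shot_id: {**defaults, **grouped.get(shot_id, {})} for shot_id in shot_ids}


def random_case(rng):
    """Generate a small random set of shots and (possibly soft-deleted) tasks."""
    shot_ids = list(range(1, rng.randint(1, 10) + 1))
    tasks = [
        {
            "shot_id": rng.choice(shot_ids),
            "task_type": rng.choice(TASK_TYPES),
            "status": rng.choice(STATUSES),
            "deleted": rng.random() < 0.2,
        }
        for _ in range(rng.randint(0, 30))
    ]
    return shot_ids, tasks


def check_equivalence(iterations=100, seed=0):
    """Assert both implementations agree on every generated case."""
    rng = random.Random(seed)
    for _ in range(iterations):
        shot_ids, tasks = random_case(rng)
        assert legacy_status_map(shot_ids, tasks) == grouped_status_map(shot_ids, tasks)
    return iterations


checked = check_equivalence()
print(checked)
```

With Hypothesis, the `random_case` generator would be replaced by a strategy, and the assertion would form the body of the property test.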
### Integration Testing
**End-to-End Performance Tests:**
- Load test with 100+ shots/assets with multiple tasks each
- Measure query execution times and memory usage
- Test concurrent access patterns
- Validate real-time update propagation
**Frontend Integration Tests:**
- Test data table rendering with optimized data
- Verify task status filtering and sorting
- Test real-time updates in UI components
- Validate error handling in frontend components
### Migration Testing
**Data Migration Validation:**
- Test migration scripts with production-like data volumes
- Verify data integrity before and after migration
- Test rollback procedures
- Validate index creation and performance impact
**Backward Compatibility Testing:**
- Run existing test suites against optimized system
- Test legacy API endpoints with new backend
- Verify existing frontend components work with optimized data
- Test configuration options for enabling/disabling optimizations