feat(server): improve workspace embedding (#12022)

fix AI-10
fix AI-109
fix PD-2484

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

- **New Features**
  - Added a method to check if a document requires embedding, improving embedding efficiency.
  - Enhanced document embeddings with enriched metadata, including title, summary, creation/update dates, and author information.
  - Introduced a new type for document fragments with extended metadata fields.

- **Improvements**
  - Embedding logic now conditionally processes only documents needing updates.
  - Embedding content now includes document metadata for more informative context.
  - Expanded and improved test coverage for embedding scenarios and workspace behaviors.
  - Event emission added for workspace embedding updates on client version mismatch.
  - Job queueing enhanced with prioritization and explicit job IDs for better management.
  - Job queue calls updated to include priority and context identifiers in a structured format.

- **Bug Fixes**
  - Improved handling of ignored documents in embedding matches.
  - Fixed incorrect document ID assignment in embedding job queueing.

- **Tests**
  - Added and updated snapshot and behavioral tests for embedding and workspace document handling.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
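
The two main ideas in the summary above (embed only documents that need it, and prepend document metadata to the embedded content) can be sketched as follows. This is a minimal illustration, not the code from this commit: `DocMeta`, `needsEmbedding`, and `buildEmbeddingInput` are hypothetical names, and the staleness rule (never embedded, or updated after the last embedding) is an assumption about how such a check might work.

```typescript
// Hypothetical metadata shape; the commit's real fragment type is not shown here.
interface DocMeta {
  docId: string;
  title: string;
  summary: string;
  createdAt: Date;
  updatedAt: Date;
  createdBy: string;
}

// Assumed rule: a document needs embedding if it has never been embedded,
// or if it was updated after its last embedding was produced.
function needsEmbedding(docUpdatedAt: Date, lastEmbeddedAt: Date | null): boolean {
  return lastEmbeddedAt === null || docUpdatedAt.getTime() > lastEmbeddedAt.getTime();
}

// Prepend metadata lines to the document body so the embedded text carries
// title, summary, dates, and author as context.
function buildEmbeddingInput(meta: DocMeta, body: string): string {
  return [
    `Title: ${meta.title}`,
    `Summary: ${meta.summary}`,
    `Created at: ${meta.createdAt.toISOString()}`,
    `Updated at: ${meta.updatedAt.toISOString()}`,
    `Author: ${meta.createdBy}`,
    '',
    body,
  ].join('\n');
}
```

With a check like this in front of the embedding job, unchanged documents are skipped and only stale or never-embedded ones reach the embedding provider.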
darkskygit
2025-05-23 10:16:14 +00:00
parent 262f1a47a4
commit 2a80fbb993
9 changed files with 326 additions and 54 deletions


@@ -14,6 +14,7 @@ import {
   CallMetric,
   DocNotFound,
   DocUpdateBlocked,
+  EventBus,
   GatewayErrorWrapper,
   metrics,
   NotInSpace,
@@ -144,6 +145,7 @@ export class SpaceSyncGateway
   constructor(
     private readonly ac: AccessController,
+    private readonly event: EventBus,
     private readonly workspace: PgWorkspaceDocStorageAdapter,
     private readonly userspace: PgUserspaceDocStorageAdapter,
     private readonly docReader: DocReader,
@@ -201,6 +203,7 @@ export class SpaceSyncGateway
       await client.join(room);
     }
   } else {
+    this.event.emit('workspace.embedding', { workspaceId: spaceId });
     await this.selectAdapter(client, spaceType).join(user.id, spaceId);
   }
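
The hunk above emits a `workspace.embedding` event carrying a `workspaceId`. How that event is consumed is not shown in this part of the diff, so the following is only a sketch of one plausible consumer: a listener that enqueues an embedding job with a priority and an explicit job ID, as the summary describes. `TinyBus`, `TinyQueue`, the job name, the job ID format, and the priority value are all hypothetical stand-ins, not the server's real `EventBus` or job queue API.

```typescript
// Hypothetical, simplified stand-ins for the server's event bus and job queue.
type Handler<T> = (payload: T) => void;

class TinyBus<E extends Record<string, unknown>> {
  private handlers = new Map<keyof E, Handler<any>[]>();

  on<K extends keyof E>(name: K, handler: Handler<E[K]>): void {
    const list = this.handlers.get(name) ?? [];
    list.push(handler);
    this.handlers.set(name, list);
  }

  emit<K extends keyof E>(name: K, payload: E[K]): void {
    for (const handler of this.handlers.get(name) ?? []) handler(payload);
  }
}

interface Job {
  name: string;
  payload: { workspaceId: string };
  jobId: string; // explicit ID, used here to reject duplicates
  priority: number; // assumed convention: lower number runs earlier
}

class TinyQueue {
  readonly jobs = new Map<string, Job>();

  // Returns false when a job with the same ID is already queued.
  add(job: Job): boolean {
    if (this.jobs.has(job.jobId)) return false;
    this.jobs.set(job.jobId, job);
    return true;
  }
}

type ServerEvents = { 'workspace.embedding': { workspaceId: string } };

const bus = new TinyBus<ServerEvents>();
const queue = new TinyQueue();

bus.on('workspace.embedding', ({ workspaceId }) => {
  queue.add({
    name: 'embedWorkspace', // hypothetical job name
    payload: { workspaceId },
    jobId: `workspace:embedding:${workspaceId}`, // hypothetical ID format
    priority: 1,
  });
});

// Two clients joining the same workspace produce a single queued job.
bus.emit('workspace.embedding', { workspaceId: 'ws-1' });
bus.emit('workspace.embedding', { workspaceId: 'ws-1' });
```

In this sketch the explicit job ID is what keeps repeated join events from piling up duplicate work for the same workspace; whether the real queue deduplicates this way is an assumption.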