mirror of
https://github.com/toeverything/AFFiNE.git
synced 2026-02-15 05:37:32 +00:00
feat(server): workspace embedding improve (#12022)
fix AI-10
fix AI-109
fix PD-2484

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

- **New Features**
  - Added a method to check if a document requires embedding, improving embedding efficiency.
  - Enhanced document embeddings with enriched metadata, including title, summary, creation/update dates, and author information.
  - Introduced a new type for document fragments with extended metadata fields.
- **Improvements**
  - Embedding logic now conditionally processes only documents needing updates.
  - Embedding content now includes document metadata for more informative context.
  - Expanded and improved test coverage for embedding scenarios and workspace behaviors.
  - Event emission added for workspace embedding updates on client version mismatch.
  - Job queueing enhanced with prioritization and explicit job IDs for better management.
  - Job queue calls updated to include priority and context identifiers in a structured format.
- **Bug Fixes**
  - Improved handling of ignored documents in embedding matches.
  - Fixed incorrect document ID assignment in embedding job queueing.
- **Tests**
  - Added and updated snapshot and behavioral tests for embedding and workspace document handling.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
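To illustrate the "enriched metadata" item, here is a minimal sketch of how document-level metadata might be prepended to a chunk before it is embedded. The `FragmentMetadata` type, its field names, and `buildEmbeddingInput` are hypothetical and chosen only for illustration; the commit's actual fragment type and formatting are not shown in this diff.

```typescript
// Hypothetical shape: the field names are assumptions, not the type added by this commit.
interface FragmentMetadata {
  title: string;
  summary: string;
  createdAt: Date;
  updatedAt: Date;
  createdBy: string;
}

// Prepend a small metadata header to a chunk so the embedding model sees
// document-level context together with the chunk text.
function buildEmbeddingInput(meta: FragmentMetadata, chunk: string): string {
  const header = [
    `Title: ${meta.title}`,
    `Summary: ${meta.summary}`,
    `Created at: ${meta.createdAt.toISOString()}`,
    `Updated at: ${meta.updatedAt.toISOString()}`,
    `Author: ${meta.createdBy}`,
  ].join('\n');
  // A blank line separates the header from the chunk body.
  return `${header}\n\n${chunk}`;
}
```

Keeping the header in a fixed order makes the embedded text deterministic, which helps when comparing snapshots in tests.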
@@ -175,6 +175,55 @@ export class CopilotWorkspaceConfigModel extends BaseModel {
    };
  }

  @Transactional()
  async checkDocNeedEmbedded(workspaceId: string, docId: string) {
    // NOTE: check if the document needs re-embedding.
    // 1. check if there have been any recent updates to the document snapshot and update
    // 2. check if the embedding is older than the snapshot and update
    // 3. check if the embedding is older than 10 minutes (avoid frequent updates)
    // if all conditions are met, re-embedding is required.
    const result = await this.db.$queryRaw<{ needs_embedding: boolean }[]>`
      SELECT
        EXISTS (
          WITH docs AS (
            SELECT
              s.workspace_id,
              s.guid AS doc_id,
              s.updated_at
            FROM
              snapshots s
            WHERE
              s.workspace_id = ${workspaceId}
              AND s.guid = ${docId}
            UNION
            ALL
            SELECT
              u.workspace_id,
              u.guid AS doc_id,
              u.created_at AS updated_at
            FROM
              "updates" u
            WHERE
              u.workspace_id = ${workspaceId}
              AND u.guid = ${docId}
          )
          SELECT
            1
          FROM
            docs
            LEFT JOIN ai_workspace_embeddings e
            ON e.workspace_id = docs.workspace_id
            AND e.doc_id = docs.doc_id
          WHERE
            e.updated_at IS NULL
            OR docs.updated_at > e.updated_at
            OR e.updated_at < NOW() - INTERVAL '10 minutes'
        ) AS needs_embedding;
    `;

    return result[0]?.needs_embedding ?? false;
  }

  // ================ embeddings ================

  async checkEmbeddingAvailable(): Promise<boolean> {
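The `WHERE` clause in the query above can be restated as a plain TypeScript predicate, which may be easier to reason about and unit test. This is a sketch, not the commit's code: `needsEmbedding` and `STALE_AFTER_MS` are hypothetical names, and it assumes a single embedding timestamp per document, whereas the `LEFT JOIN` may match several embedding rows. Note that the three clauses are joined with `OR`, so any single one is enough to return `true`.

```typescript
// Matches INTERVAL '10 minutes' in the SQL.
const STALE_AFTER_MS = 10 * 60 * 1000;

function needsEmbedding(
  // snapshots.updated_at plus every updates.created_at for the document
  docUpdatedAt: Date[],
  // null when no embedding row exists (the LEFT JOIN yields NULL)
  embeddingUpdatedAt: Date | null,
  now: Date = new Date()
): boolean {
  // No snapshot and no pending updates: the EXISTS subquery has no rows.
  if (docUpdatedAt.length === 0) return false;
  // e.updated_at IS NULL
  if (embeddingUpdatedAt === null) return true;
  // e.updated_at < NOW() - INTERVAL '10 minutes'
  if (now.getTime() - embeddingUpdatedAt.getTime() > STALE_AFTER_MS) return true;
  // docs.updated_at > e.updated_at
  return docUpdatedAt.some(t => t.getTime() > embeddingUpdatedAt.getTime());
}
```

For example, a document whose only change is five minutes old and whose embedding is two minutes old does not need re-embedding, while the same document with no embedding row does.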