Compare commits

...

9 Commits

Author SHA1 Message Date
d95452b807 feat: remove the unused browser agent plugin export; add a usage guide document 2026-01-22 12:49:00 +08:00
dde98e437b fix: comment out debug output to clean up the console 2026-01-21 22:58:51 +08:00
e5c3ec3f79 feat: refactor context usage and add search template route
- Updated context key usage from `useConfigKey` to `useContextKey` in app initialization.
- Introduced a new route for searching templates related to 小红书 with a default keyword.
- Enhanced error handling for saving notes and user information.
- Added a command for searching templates in the CLI.
- Created a new agent plugin for browser integration.
2026-01-21 01:45:27 +08:00
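e5ddd01fd2 Refactor the user and note database schemas and rename fields; improve route descriptions and add a new search feature; adjust browser launch arguments and simplify debug output 2026-01-07 01:49:49 +08:00
2621d0229f update 2026-01-05 02:02:35 +08:00
2472cb0059 Refactor the database schema, adding user info and note fields; update the config file path and tune browser launch arguments; add user and note indexes; update the init script and snapshot files 2026-01-02 18:30:52 +08:00
2c3bc79e6e Update dependencies and add the user-agents library; refactor the browser launch logic to support headless mode and stealth mode 2026-01-01 22:04:14 +08:00
95a65e0f84 Update package.json and pnpm-lock.yaml, add the zod and zod-to-json-schema dependencies, and tidy dependency management 2025-12-31 17:53:34 +08:00
2fcba3a096 Update the docs and add a description of the Xiaohongshu module; refactor the database schema, adding note and user info fields; improve the core logic with record-timeout handling; update the example data. 2025-12-31 03:36:27 +08:00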
25 changed files with 2597 additions and 437 deletions


401
AGENTS.md Normal file

@@ -0,0 +1,401 @@
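# Browser Helper - Browser Automation Assistant Usage Guide
## Project Overview
browser-helper is a Playwright-based browser automation assistant, used mainly to collect and organize information from the Xiaohongshu (小红书) platform automatically. It uses AI to filter out invalid information so that users can gather and organize data efficiently.
## Core Features
- **Browser automation control**: drives the browser through Playwright to perform automated operations
- **Xiaohongshu data collection**: automatically searches for and collects notes, users, tags, and related information
- **Local data storage**: persists data with SQLite + Drizzle ORM
- **API service**: exposes a RESTful API for AI assistants to call
## Tech Stack
- **Runtime**: Bun / Node.js
- **Database**: SQLite + Drizzle ORM
- **Browser automation**: Playwright
- **API framework**: @kevisual/router
- **Cache**: LRU Cache
## Quick Start
### Environment Setup
```bash
# Install dependencies
pnpm install
# Initialize the project (dependencies, database, browser)
pnpm run init
```
### Start the Services
```bash
# Start the service in development mode
pnpm run dev
# Start the browser (production mode, uses pm2)
pnpm run browser
# Start the browser in development mode
pnpm run dev:browser
# Start Drizzle Studio (database visualization tool)
pnpm run studio
```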
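## API Reference
### Authentication
All endpoints require authentication through the `auth` middleware (currently a temporary token check).
### Xiaohongshu Note Endpoints
#### Search notes
```
POST /xhs/search-notes
```
**Parameters**:
- `keyword`: search keyword (required)
- `pushTime`: publish-time filter (一天内 / 一周内 / 半年内, i.e. within a day / a week / half a year); defaults to 一天内
- `sort`: sort order (综合 / 最新 / 最多点赞 / 最多评论, i.e. general / latest / most liked / most commented); defaults to 最新
- `distance`: distance filter (不限 / 同城 / 附近, i.e. any / same city / nearby); defaults to 不限
- `searchRange`: search scope (不限 / 已看过 / 未看过 / 已关注, i.e. any / viewed / not viewed / followed); defaults to 不限
- `scrollTimes`: number of scrolls; defaults to 5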
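As a rough illustration of the defaults listed for `/xhs/search-notes`, the sketch below fills in a complete parameter object from a partial one. The `SearchNotesParams` type and the `buildSearchNotesParams` helper are hypothetical names that do not come from the project; only the field names, allowed values, and defaults are taken from the parameter list. How the object is actually sent to the server is not shown here.

```typescript
// Hypothetical helper: field names and defaults mirror the documented parameters.
type SearchNotesParams = {
  keyword: string;
  pushTime: '一天内' | '一周内' | '半年内';
  sort: '综合' | '最新' | '最多点赞' | '最多评论';
  distance: '不限' | '同城' | '附近';
  searchRange: '不限' | '已看过' | '未看过' | '已关注';
  scrollTimes: number;
};

function buildSearchNotesParams(
  input: { keyword: string } & Partial<SearchNotesParams>,
): SearchNotesParams {
  // keyword is the only required field
  if (input.keyword.trim() === '') {
    throw new Error('keyword is required');
  }
  return {
    keyword: input.keyword,
    pushTime: input.pushTime ?? '一天内',
    sort: input.sort ?? '最新',
    distance: input.distance ?? '不限',
    searchRange: input.searchRange ?? '不限',
    scrollTimes: input.scrollTimes ?? 5,
  };
}
```

Calling `buildSearchNotesParams({ keyword: '拼豆' })` would yield an object with every documented default applied, which can make the request easier to log and validate before it is sent.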
#### List notes
```
GET /xhs/list
```
**Parameters**:
- `page`: page number; defaults to 1
- `pageSize`: items per page; defaults to 20
- `search`: search keyword
- `sort`: sort order (ASC/DESC); defaults to DESC
#### Get a single note
```
GET /xhs/get
```
**Parameters**:
- `data.id`: note ID
#### Create/update a note
```
POST /xhs/update
```
**Parameters**:
- `data.id`: note ID (required when updating)
- `data.title`: note title
- `data.summary`: note summary
- `data.description`: note description
- `data.tags`: array of tags
- `data.data`: full note data
#### Delete a note
```
POST /xhs/delete
```
**Parameters**:
- `data.id`: note ID
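### Xiaohongshu User Endpoints
#### List users
```
GET /xhs-users/list
```
**Parameters**:
- `page`: page number; defaults to 1
- `pageSize`: items per page; defaults to 20
- `search`: search keyword (fuzzy match on nickname, username, and description)
- `sort`: sort order (ASC/DESC); defaults to DESC
#### Get a single user
```
GET /xhs-users/get
```
**Parameters**:
- `data.id`: user ID
#### Create/update a user
```
POST /xhs-users/update
```
**Parameters**:
- `data.id`: user ID (required when updating)
- `data.nickname`: user nickname
- `data.username`: username
- `data.avatar`: user avatar
- `data.description`: user description
- `data.tags`: array of tags
#### Delete a user
```
POST /xhs-users/delete
```
**Parameters**:
- `data.id`: user ID
### Xiaohongshu Tag Endpoints
#### List tags
```
GET /xhs-tags/list
```
**Parameters**:
- `page`: page number; defaults to 1
- `pageSize`: items per page; defaults to 20
- `search`: search keyword
- `sort`: sort order (ASC/DESC); defaults to DESC
#### Create/update a tag
```
POST /xhs-tags/update
```
**Parameters**:
- `data.id`: tag ID (required when updating)
- `data.title`: tag title
- `data.description`: tag description
#### Delete a tag
```
POST /xhs-tags/delete
```
**Parameters**:
- `data.id`: tag ID
### Preset Search Scenarios
#### Information-gap search
```
POST /good/searchInfo
```
**Parameters**:
- `keyword`: search keyword; defaults to "信息差" (information gap)
#### Job search
```
POST /good/searchWork
```
**Parameters**:
- `keyword`: search keyword; defaults to "工作 杭州" (jobs, Hangzhou)
#### Dating and matchmaking search
```
POST /good/searchDate
```
**Parameters**:
- `keyword`: search keyword; defaults to "相亲 杭州" (matchmaking, Hangzhou)
#### Perler-bead search
```
POST /good/searchBean
```
**Parameters**:
- `keyword`: search keyword; defaults to "拼豆" (perler beads)
#### Website template search
```
POST /good/searchTemplate
```
**Parameters**:
- `keyword`: search keyword; defaults to "网站模板" (website templates)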
## Database Structure
### xhs_note (Xiaohongshu note table)
| Field | Type | Description |
|------|------|------|
| id | text | Note ID (primary key) |
| title | text | Note title |
| summary | text | Note summary |
| description | text | Note description / search keyword |
| link | text | Note link |
| data | text | Full data (JSON) |
| tags | text | Tags |
| status | text | Status (正常笔记/归档/禁止用户/删除/不相关: normal note, archived, banned user, deleted, irrelevant) |
| authorUrl | text | Link to the author's profile |
| cover | text | Cover image |
| syncStatus | integer | Sync status |
| syncAt | integer | Sync time |
| star | integer | Star/flag marker |
| userId | text | User ID |
| createdAt | integer | Creation time |
| updatedAt | integer | Update time |
### xhs_user (Xiaohongshu user table)
| Field | Type | Description |
|------|------|------|
| id | text | User ID (primary key) |
| xsec_token | text | XSEC token |
| username | text | Username |
| nickname | text | Nickname |
| avatar | text | Avatar |
| title | text | Title |
| summary | text | Summary |
| description | text | Description |
| link | text | Profile link |
| data | text | Full data (JSON) |
| tags | text | Tags |
| bunTags | text | Blocked tags |
| followersCount | integer | Number of followers |
| followingCount | integer | Number of accounts followed |
| status | text | Status |
| syncStatus | integer | Sync status |
| syncAt | integer | Sync time |
| star | integer | Star/flag marker |
### xhs_tags (Xiaohongshu tag table)
| Field | Type | Description |
|------|------|------|
| id | text | Tag ID (primary key) |
| title | text | Tag title |
| description | text | Tag description |
| createdAt | integer | Creation time |
| updatedAt | integer | Update time |
### cache (cache table)
| Field | Type | Description |
|------|------|------|
| key | text | Cache key (primary key) |
| value | text | Cache value |
| expireAt | integer | Expiry time |
| createdAt | integer | Creation time |
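The `expireAt` column of the cache table suggests time-based expiry. The following is a minimal in-memory sketch of that idea, assuming `expireAt` holds a millisecond timestamp and that expired rows are treated as missing. The `CacheRow` type and the `writeCache` / `readCache` helpers are illustrative names and are not taken from `src/db/cache.ts`; a real implementation would read and write the SQLite table instead of a `Map`.

```typescript
// Hypothetical row shape mirroring the cache table columns.
type CacheRow = { key: string; value: string; expireAt: number; createdAt: number };

// Stand-in for the SQLite table.
const rows = new Map<string, CacheRow>();

function writeCache(key: string, value: unknown, ttlMs: number, now: number = Date.now()): void {
  // Values are stored as JSON text, matching the text column type.
  rows.set(key, { key, value: JSON.stringify(value), expireAt: now + ttlMs, createdAt: now });
}

function readCache<T>(key: string, now: number = Date.now()): T | undefined {
  const row = rows.get(key);
  if (!row) return undefined;
  if (row.expireAt <= now) {
    rows.delete(key); // lazily evict expired rows on read
    return undefined;
  }
  return JSON.parse(row.value) as T;
}
```

Passing `now` explicitly keeps the expiry logic easy to test without waiting for real time to pass.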
## Project Structure
```
browser-helper/
├── src/
│   ├── app.ts                        # App entry point and core configuration
│   ├── index.ts                      # Main entry, exports the API plugin
│   ├── db/
│   │   ├── schema.ts                 # Database table definitions
│   │   └── cache.ts                  # Database cache module
│   ├── modules/
│   │   └── cache.ts                  # Session cache (LRU Cache)
│   ├── playwright/
│   │   ├── core.ts                   # Core browser control class
│   │   ├── browser.ts                # Browser launch logic
│   │   ├── actions.ts                # Browser action wrappers
│   │   └── index.ts                  # Exports
│   └── routes/
│       ├── index.ts                  # Route entry point
│       ├── xhs/                      # Xiaohongshu routes
│       │   ├── index.ts
│       │   ├── search-notes.ts       # Note search
│       │   ├── xhs-list.ts           # Note CRUD
│       │   ├── xhs-user-list.ts      # User CRUD
│       │   └── xhs-tags-list.ts      # Tag CRUD
│       └── good/                     # Preset search scenarios
│           └── index.ts
├── storage/
│   └── browser-helper/
│       └── data.sqlite3              # SQLite database file
├── browser-context/                  # Browser context data
├── dist/                             # Build output directory
├── typings/                          # TypeScript type definitions
├── drizzle.config.ts                 # Drizzle configuration
├── bun.config.ts                     # Bun build configuration
├── package.json                      # Project configuration
└── pnpm-lock.yaml                    # Dependency lockfile
```
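## Browser Control
### The Core Class
The `Core` class is the central component of the browser automation layer. It provides the following:
```typescript
// Connect to the browser
await core.connect();
// Get the page instance
const page = await core.getPage();
// Mark recording as ready
await core.setReady(true);
// Listen for responses
core.on('search/notes', (data) => {
  console.log('Captured response:', data);
});
```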
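In `src/playwright/core.ts`, `setReady(true)` also starts a five-minute timer that switches recording back to not-ready, so a forgotten session does not keep capturing responses. A standalone sketch of that pattern is shown below. The `ReadyFlag` class is a hypothetical name used only for illustration, and the timeout is made configurable here so the behavior can be exercised quickly.

```typescript
// Hypothetical stand-in for the ready/timeout handling around setReady().
class ReadyFlag {
  ready = false;
  timeoutMs: number;
  timer: ReturnType<typeof setTimeout> | null = null;

  constructor(timeoutMs: number = 5 * 60 * 1000) {
    this.timeoutMs = timeoutMs;
  }

  set(ready: boolean): void {
    this.ready = ready;
    // Always cancel a pending auto-reset before deciding whether to start a new one.
    if (this.timer) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (ready) {
      this.timer = setTimeout(() => {
        this.ready = false; // auto-reset so recording is not left on indefinitely
        this.timer = null;
      }, this.timeoutMs);
    }
  }
}
```

Calling `set(true)` repeatedly restarts the countdown, and `set(false)` cancels it, which matches the intent of keeping recording active only while it is being used.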
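### Startup Flow
1. Start the browser process (via `start-browser.js`)
2. Playwright connects to the browser over the CDP protocol
3. Load the browser context and page
4. Register request/response listeners
5. Start running automation tasks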
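Step 4 can be pictured as matching each request or response URL against the registered listeners. In the sketch below, substring matching on `path` and the `matchListeners` helper are assumptions made for illustration; only the `path` and `type` fields, with `type` defaulting to `'both'`, appear in `core.ts`.

```typescript
type ListenerType = 'request' | 'response' | 'both';

// Simplified listener shape: the real Listener type may carry more fields.
type Listener = {
  path: string;
  type?: ListenerType;
};

// Return the listeners that should fire for a URL seen in the given phase.
function matchListeners(
  listeners: Listener[],
  url: string,
  phase: 'request' | 'response',
): Listener[] {
  return listeners.filter((listener) => {
    const type = listener.type ?? 'both';
    if (type !== 'both' && type !== phase) return false;
    // Assumption: a listener matches when its path occurs anywhere in the URL.
    return url.includes(listener.path);
  });
}
```

With this shape, a short path such as `search/notes` and a full URL such as the `feed` endpoint can both be registered in the same list.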
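## FAQ
### 1. The browser cannot be reached
Make sure the browser process is running:
```bash
pnpm run browser # production mode
# or
pnpm run dev:browser # development mode
```
### 2. Database initialization fails
Sync the schema to the database (`push` runs `drizzle-kit push`):
```bash
pnpm run push
```
### 3. Search returns no results
Check that the search keyword is correct and that the Xiaohongshu search page loads normally.
## License
MIT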

View File

@@ -11,9 +11,10 @@ if (!fs.existsSync(dir)) {
export default {
schema: './src/db/schema.ts',
out: './storage/browser-helper/drizzle',
out: './src/db/drizzle',
dialect: 'sqlite',
dbCredentials: {
url: process.env.DATABASE_URL || 'storage/browser-helper/data.sqlite3',
},
strict: false,
} satisfies Config;

84
examples/xhs/feed.json Normal file

@@ -0,0 +1,84 @@
{
"cursor_score": "",
"items": [
{
"id": "692d2c3c000000000d035cfd",
"model_type": "note",
"note_card": {
"title": "怎么让自己一台设备控制另一台电脑快捷键?",
"user": {
"user_id": "6726cef4000000001c019303",
"nickname": "小熊猫呜呜呜",
"avatar": "https://sns-avatar-qc.xhscdn.com/avatar/1040g2jo31nptpebimm605pp6prq734o33ikhhig",
"xsec_token": "ABgQHM5P-hKJbm3K-GuN6GGJGXQGIRQX2d4m2TmuGPW0Y="
},
"tag_list": [],
"at_user_list": [],
"time": 1764568124000,
"share_info": {
"un_share": false
},
"note_id": "692d2c3c000000000d035cfd",
"type": "normal",
"image_list": [
{
"file_id": "",
"width": 1440,
"info_list": [
{
"image_scene": "WB_PRV",
"url": "http://sns-webpic-qc.xhscdn.com/202512310146/01744c61858f30b8e306f79c3d17902d/1040g00831phpsi4p3c6g5pp6prq734o3ghs6t8g!nd_prv_wlteh_webp_3"
},
{
"url": "http://sns-webpic-qc.xhscdn.com/202512310146/166c6ac26933a554f17aefd009037ee7/1040g00831phpsi4p3c6g5pp6prq734o3ghs6t8g!nd_dft_wlteh_webp_3",
"image_scene": "WB_DFT"
}
],
"url_pre": "http://sns-webpic-qc.xhscdn.com/202512310146/01744c61858f30b8e306f79c3d17902d/1040g00831phpsi4p3c6g5pp6prq734o3ghs6t8g!nd_prv_wlteh_webp_3",
"height": 2400,
"url": "",
"trace_id": "",
"url_default": "http://sns-webpic-qc.xhscdn.com/202512310146/166c6ac26933a554f17aefd009037ee7/1040g00831phpsi4p3c6g5pp6prq734o3ghs6t8g!nd_dft_wlteh_webp_3",
"stream": {},
"live_photo": false
},
{
"url_default": "http://sns-webpic-qc.xhscdn.com/202512310146/a4c16164274aa5833d119d4b9d150da3/1040g00831phpsi4p3c605pp6prq734o3dmgotbo!nd_dft_wlteh_webp_3",
"file_id": "",
"url": "",
"url_pre": "http://sns-webpic-qc.xhscdn.com/202512310146/a26e9c3737fee7be0844121a6b83f897/1040g00831phpsi4p3c605pp6prq734o3dmgotbo!nd_prv_wlteh_webp_3",
"info_list": [
{
"image_scene": "WB_PRV",
"url": "http://sns-webpic-qc.xhscdn.com/202512310146/a26e9c3737fee7be0844121a6b83f897/1040g00831phpsi4p3c605pp6prq734o3dmgotbo!nd_prv_wlteh_webp_3"
},
{
"image_scene": "WB_DFT",
"url": "http://sns-webpic-qc.xhscdn.com/202512310146/a4c16164274aa5833d119d4b9d150da3/1040g00831phpsi4p3c605pp6prq734o3dmgotbo!nd_dft_wlteh_webp_3"
}
],
"stream": {},
"live_photo": false,
"height": 2400,
"width": 1440,
"trace_id": ""
}
],
"last_update_time": 1764568125000,
"ip_location": "浙江",
"desc": "",
"interact_info": {
"liked_count": "1",
"collected": false,
"collected_count": "1",
"comment_count": "0",
"share_count": "0",
"followed": false,
"relation": "none",
"liked": false
}
}
}
],
"current_time": 1767116763109
}
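For readers wondering how a feed item like the one in `examples/xhs/feed.json` might be reduced to a database row, here is a small sketch. The `FeedItem` type declares only the fields it reads, and the `NoteRow` shape and `toNoteRow` helper are hypothetical; the mapping of `desc` to `description`, `time` to `pushedAt`, and the first image to `cover` is an assumption, not something this diff confirms.

```typescript
// Minimal shape of one feed item; only the fields used below are declared.
type FeedItem = {
  id: string;
  note_card: {
    title: string;
    desc: string;
    time: number;
    user: { user_id: string; nickname: string };
    image_list: { url_default: string }[];
  };
};

// Hypothetical row shape, loosely following the xhs_note columns.
type NoteRow = {
  id: string;
  title: string;
  description: string;
  cover: string | null;
  userId: string;
  pushedAt: number;
};

function toNoteRow(item: FeedItem): NoteRow {
  const card = item.note_card;
  return {
    id: item.id,
    title: card.title,
    description: card.desc,
    // Use the first image as the cover when one exists.
    cover: card.image_list[0]?.url_default ?? null,
    userId: card.user.user_id,
    pushedAt: card.time,
  };
}
```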

View File

@@ -17,11 +17,13 @@
"dev": "tsx watch src/index.ts",
"init:browser": "npx playwright install",
"build": "bun run bun.config.ts",
"browser": "pm2 start start-browser.js --name browser ",
"browser": "pm2 start start-browser.js --name browser ",
"dev:browser": "node start-browser.js ",
"cmd": "tsx src/test/cmd.ts ",
"init": "pnpm run init:pnpm && pnpm run init:db && pnpm run init:browser",
"init:pnpm": "pnpm approve-builds",
"init:db": "npx drizzle-kit push",
"push": "npx drizzle-kit push",
"studio": "npx drizzle-kit studio",
"drizzle:migrate": "npx drizzle-kit migrate",
"drizzle:push": "npx drizzle-kit push"
@@ -35,27 +37,36 @@
],
"author": "abearxiong <xiongxiao@xiongxiao.me> (https://www.xiongxiao.me)",
"license": "MIT",
"packageManager": "pnpm@10.26.0",
"packageManager": "pnpm@10.28.1",
"type": "module",
"dependencies": {
"better-sqlite3": "^12.6.2",
"nanoid": "^5.1.6",
"playwright": "^1.57.0",
"better-sqlite3": "^12.5.0"
"playwright-extra": "^4.3.6",
"playwright-extra-plugin-stealth": "^0.0.1",
"user-agents": "^1.1.669",
"zod": "^4.3.5",
"zod-to-json-schema": "^3.25.1"
},
"devDependencies": {
"@kevisual/code-builder": "^0.0.2",
"@kevisual/code-builder": "^0.0.3",
"@kevisual/context": "^0.0.4",
"@kevisual/router": "^0.0.49",
"@kevisual/types": "^0.0.10",
"@kevisual/use-config": "^1.0.21",
"@kevisual/js-filter": "^0.0.5",
"@kevisual/router": "^0.0.59",
"@types/better-sqlite3": "^7.6.13",
"@types/bun": "^1.3.5",
"@types/node": "^25.0.3",
"@kevisual/types": "^0.0.11",
"@kevisual/use-config": "^1.0.28",
"@types/bun": "^1.3.6",
"@types/node": "^25.0.9",
"@types/user-agents": "^1.0.4",
"commander": "^14.0.2",
"dotenv": "^17.2.3",
"drizzle-kit": "^0.31.8",
"drizzle-orm": "^0.45.1",
"es-toolkit": "^1.43.0",
"eventemitter3": "^5.0.1",
"lru-cache": "^11.2.4"
"es-toolkit": "^1.44.0",
"eventemitter3": "^5.0.4",
"lru-cache": "^11.2.4",
"puppeteer-extra-plugin-stealth": "^2.11.2"
}
}

814
pnpm-lock.yaml generated

File diff suppressed because it is too large

View File

@@ -1,7 +1,10 @@
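# Browser Automation Assistant
> An information-gathering tool that filters out low-quality material, using AI to screen out useless information so that users can collect and organize information more efficiently and get more out of work and study.
What it does: as pages are browsed, it automatically stores the wanted data in the database for later analysis and use.
## Initialization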
```bash
@@ -9,8 +12,14 @@ pnpm install
pnpm run init
```
## Startstudio
## Start studio
studio is the visual database management tool provided by drizzle. Use it to inspect and manage the local SQLite database.
```bash
pnpm run studio
```
```
## About the Xiaohongshu Module
Filters the notes you want and saves them to the local database for later use.

View File

@@ -1,14 +1,14 @@
import { App } from '@kevisual/router'
import { useConfigKey } from '@kevisual/context'
import { useContextKey } from '@kevisual/context'
import { useConfig } from '@kevisual/use-config'
import Database from 'better-sqlite3';
import { drizzle } from 'drizzle-orm/better-sqlite3';
import { Core } from './playwright/core.ts';
import * as schema from './db/schema.ts';
export { schema }
export const config = useConfig()
export const app = useConfigKey<App>('app', () => new App({
export const app = useContextKey<App>('app', () => new App({
serverOptions: {
cors: {
origin: '*',
@@ -27,14 +27,16 @@ app.route({
}
}).addTo(app);
export const db = useConfigKey('db', () => {
export const db = useContextKey('db', () => {
const sqlite = new Database(config.DATABASE_URL || 'storage/browser-helper/data.sqlite3');
sqlite.pragma('journal_mode = WAL');
const db = drizzle({ client: sqlite, schema });
return db;
})
export const core = useConfigKey<Core>('core', () => new Core({
export const core = useContextKey<Core>('core', () => new Core({
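useDebugPort: true, // true = use the debug port (Core leaves it off by default to avoid detection by websites)
useCDPConnect: true, // true = connect over CDP rather than pure Playwright mode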
listeners: [
{
path: "search/notes",
@@ -58,6 +60,12 @@ export const core = useConfigKey<Core>('core', () => new Core({
console.error('Failed to parse the search-notes response:', error);
}
}
},
{
/**
* Handles Xiaohongshu note detail responses
*/
path: 'https://edith.xiaohongshu.com/api/sns/web/v1/feed',
}
]
}));

View File

@@ -0,0 +1,58 @@
CREATE TABLE `cache` (
`key` text PRIMARY KEY NOT NULL,
`value` text NOT NULL,
`expire_at` integer NOT NULL,
`created_at` integer NOT NULL
);
--> statement-breakpoint
CREATE TABLE `xhs_note` (
`id` text PRIMARY KEY NOT NULL,
`title` text,
`summary` text,
`description` text,
`link` text,
`data` text,
`tags` text,
`status` text,
`author_url` text,
`cover` text,
`sync_status` integer NOT NULL,
`sync_at` integer NOT NULL,
`star` integer,
`user_id` text,
`pushed_at` integer,
`created_at` integer NOT NULL,
`updated_at` integer NOT NULL,
`deleted_at` integer
);
--> statement-breakpoint
CREATE INDEX `idx_xhs_note_user_id` ON `xhs_note` (`user_id`);--> statement-breakpoint
CREATE INDEX `idx_xhs_note_tags` ON `xhs_note` (`tags`);--> statement-breakpoint
CREATE TABLE `xhs_user` (
`id` text PRIMARY KEY NOT NULL,
`user_id` text NOT NULL,
`xsec_token` text,
`username` text,
`nickname` text,
`avatar` text,
`title` text,
`summary` text,
`description` text,
`link` text,
`data` text,
`tags` text,
`bun_tags` text,
`followers_count` integer,
`following_count` integer,
`status` text,
`sync_status` integer DEFAULT 0 NOT NULL,
`sync_at` integer DEFAULT 0 NOT NULL,
`star` integer,
`created_at` integer DEFAULT 1767349555883 NOT NULL,
`updated_at` integer DEFAULT 1767349555883 NOT NULL,
`deleted_at` integer
);
--> statement-breakpoint
CREATE INDEX `idx_xhs_user_user_id` ON `xhs_user` (`user_id`);--> statement-breakpoint
CREATE INDEX `idx_xhs_user_tags` ON `xhs_user` (`tags`);--> statement-breakpoint
CREATE INDEX `idx_xhs_user_bun_tags` ON `xhs_user` (`bun_tags`);

View File

@@ -0,0 +1,397 @@
{
"version": "6",
"dialect": "sqlite",
"id": "6e34d9c0-5f26-4fcf-8f85-9de7832cd139",
"prevId": "00000000-0000-0000-0000-000000000000",
"tables": {
"cache": {
"name": "cache",
"columns": {
"key": {
"name": "key",
"type": "text",
"primaryKey": true,
"notNull": true,
"autoincrement": false
},
"value": {
"name": "value",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"expire_at": {
"name": "expire_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"created_at": {
"name": "created_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {},
"foreignKeys": {},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"checkConstraints": {}
},
"xhs_note": {
"name": "xhs_note",
"columns": {
"id": {
"name": "id",
"type": "text",
"primaryKey": true,
"notNull": true,
"autoincrement": false
},
"title": {
"name": "title",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"summary": {
"name": "summary",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"description": {
"name": "description",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"link": {
"name": "link",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"data": {
"name": "data",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"tags": {
"name": "tags",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"status": {
"name": "status",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"author_url": {
"name": "author_url",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"cover": {
"name": "cover",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"sync_status": {
"name": "sync_status",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"sync_at": {
"name": "sync_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"star": {
"name": "star",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"user_id": {
"name": "user_id",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"pushed_at": {
"name": "pushed_at",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"created_at": {
"name": "created_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"updated_at": {
"name": "updated_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"deleted_at": {
"name": "deleted_at",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false
}
},
"indexes": {
"idx_xhs_note_user_id": {
"name": "idx_xhs_note_user_id",
"columns": [
"user_id"
],
"isUnique": false
},
"idx_xhs_note_tags": {
"name": "idx_xhs_note_tags",
"columns": [
"tags"
],
"isUnique": false
}
},
"foreignKeys": {},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"checkConstraints": {}
},
"xhs_user": {
"name": "xhs_user",
"columns": {
"id": {
"name": "id",
"type": "text",
"primaryKey": true,
"notNull": true,
"autoincrement": false
},
"user_id": {
"name": "user_id",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"xsec_token": {
"name": "xsec_token",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"username": {
"name": "username",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"nickname": {
"name": "nickname",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"avatar": {
"name": "avatar",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"title": {
"name": "title",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"summary": {
"name": "summary",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"description": {
"name": "description",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"link": {
"name": "link",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"data": {
"name": "data",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"tags": {
"name": "tags",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"bun_tags": {
"name": "bun_tags",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"followers_count": {
"name": "followers_count",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"following_count": {
"name": "following_count",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"status": {
"name": "status",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"sync_status": {
"name": "sync_status",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false,
"default": 0
},
"sync_at": {
"name": "sync_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false,
"default": 0
},
"star": {
"name": "star",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"created_at": {
"name": "created_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false,
"default": 1767349555883
},
"updated_at": {
"name": "updated_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false,
"default": 1767349555883
},
"deleted_at": {
"name": "deleted_at",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false
}
},
"indexes": {
"idx_xhs_user_user_id": {
"name": "idx_xhs_user_user_id",
"columns": [
"user_id"
],
"isUnique": false
},
"idx_xhs_user_tags": {
"name": "idx_xhs_user_tags",
"columns": [
"tags"
],
"isUnique": false
},
"idx_xhs_user_bun_tags": {
"name": "idx_xhs_user_bun_tags",
"columns": [
"bun_tags"
],
"isUnique": false
}
},
"foreignKeys": {},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"checkConstraints": {}
}
},
"views": {},
"enums": {},
"_meta": {
"schemas": {},
"tables": {},
"columns": {}
},
"internal": {
"indexes": {}
}
}

View File

@@ -0,0 +1,13 @@
{
"version": "7",
"dialect": "sqlite",
"entries": [
{
"idx": 0,
"version": "6",
"when": 1767349555897,
"tag": "0000_rapid_genesis",
"breakpoints": true
}
]
}

View File

@@ -1,5 +1,5 @@
import { sqliteTable, text, integer } from 'drizzle-orm/sqlite-core';
import { sqliteTable, text, integer, index } from 'drizzle-orm/sqlite-core';
import { randomUUID } from 'node:crypto';
export const cache = sqliteTable('cache', {
key: text('key').primaryKey(),
value: text('value').notNull(),
@@ -9,16 +9,75 @@ export const cache = sqliteTable('cache', {
export const xhsNote = sqliteTable('xhs_note', {
id: text('id').primaryKey(),
content: text('content').notNull(),
title: text('title'),
summary: text('summary'),
description: text('description'),
tags: text('tags').notNull(),
noteUrl: text('note_url'),
status: text('status'),
link: text('link'),
data: text('data'),
tags: text('tags'),
status: text('status'), // normal note, archived, banned user, deleted, irrelevant
authorUrl: text('author_url'),
cover: text('cover'),
syncStatus: integer('sync_status').notNull(),
syncAt: integer('sync_at').notNull(),
star: integer('star'),
userId: text('user_id'),
pushedAt: integer('pushed_at'),
createdAt: integer('created_at').notNull(),
updatedAt: integer('updated_at').notNull(),
});
deletedAt: integer('deleted_at'),
}, (table) => ([
index('idx_xhs_note_user_id').on(table.userId),
index('idx_xhs_note_tags').on(table.tags),
]));
export const xhsUser = sqliteTable('xhs_user', {
id: text('id').primaryKey(),
xsec_token: text('xsec_token'),
username: text('username'),
nickname: text('nickname'),
avatar: text('avatar'),
title: text('title'),
summary: text('summary'),
description: text('description'),
link: text('link'),
data: text('data'),
tags: text('tags'),
bunTags: text('bun_tags'),
followersCount: integer('followers_count'),
followingCount: integer('following_count'),
status: text('status'), // note user (added from a note, details not yet fetched), normal user, banned, deleted
syncStatus: integer('sync_status').default(0).notNull(),
syncAt: integer('sync_at').default(0).notNull(),
star: integer('star'), // flag marker
createdAt: integer('created_at').default(Date.now()).notNull(),
updatedAt: integer('updated_at').default(Date.now()).notNull(),
deletedAt: integer('deleted_at'),
}, (table) => ([
index('idx_xhs_user_id').on(table.id),
index('idx_xhs_user_tags').on(table.tags),
index('idx_xhs_user_bun_tags').on(table.bunTags),
]));
export const xhsTags = sqliteTable('xhs_tags', {
id: text('id').primaryKey().default(randomUUID()),
title: text('title').notNull(),
description: text('description'),
createdAt: integer('created_at').default(Date.now()).notNull(),
updatedAt: integer('updated_at').default(Date.now()).notNull(),
}, (table) => ([
index('idx_xhs_tags_title').on(table.title),
]));
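One thing worth noting about `.default(Date.now())` and `.default(randomUUID())`: the argument is evaluated once, when the schema module loads, which is consistent with the fixed `1767349555883` default in the generated migration. The plain TypeScript sketch below illustrates the difference between a value captured once and a function called on each insert. `staticDefault`, `lazyDefault`, and the fake clock are illustrative names only. Drizzle's `$defaultFn` is one way to get the per-insert behavior, though whether it suits this schema is a decision for the author.

```typescript
// A default captured once vs. a default computed on every use.
type Column<T> = { resolveDefault: () => T };

function staticDefault<T>(value: T): Column<T> {
  return { resolveDefault: () => value };
}

function lazyDefault<T>(fn: () => T): Column<T> {
  return { resolveDefault: fn };
}

// Stand-in for Date.now() so the example is deterministic.
let clock = 1000;
function now(): number {
  return clock;
}

const createdAtStatic = staticDefault(now()); // evaluated here, once
const createdAtLazy = lazyDefault(now); // evaluated each time a default is needed

clock = 2000; // time passes before the first insert

console.log(createdAtStatic.resolveDefault()); // 1000
console.log(createdAtLazy.resolveDefault()); // 2000
```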

View File

@@ -3,13 +3,16 @@ export { app, db } from './app.ts';
export { Core } from './playwright/core.ts';
import './routes/index.ts';
// Start the app when this file is run directly
// better-sqlite3 does not support Bun
// Playwright does not support Bun either
import { createRouterAgentPluginFn } from '@kevisual/router/src/opencode.ts'
app.route({
id: 'auth',
description: 'Token permission check, temporary solution',
}).define(async (ctx) => {
// token authentication
// console.log('token', ctx.state);
ctx.state.token = 'abc';
}).addTo(app);
const isPm2 = !!process.env.PM2_HOME;
if (import.meta.main || isPm2) {
@@ -17,4 +20,9 @@ if (import.meta.main || isPm2) {
app.listen(52000, () => {
console.log('Application is running on http://localhost:52000');
})
}
}
export const browserAgentPlugin = createRouterAgentPluginFn({
// router: app.,
router: app,
})

View File

@@ -1,6 +1,8 @@
import { spawn } from 'node:child_process';
import path from 'node:path';
import fs from 'node:fs';
import UserAgent from 'user-agents';
import { chromium } from 'playwright';
export const getExecutablePath = () => {
// Return the Chrome executable path for the current platform
@@ -25,30 +27,44 @@ export const getExecutablePath = () => {
*
* Launch the Chrome browser with a remote debugging port
* Note: you must log in to the account and install extensions manually
*
* @returns {Promise<void>}
*/
export const main = async (opts?: {
executablePath?: string;
userDataDir?: string;
debugPort?: number;
kiosk?: boolean;
headless?: boolean;
}) => {
// Chrome path and configuration
const executablePath = opts?.executablePath || getExecutablePath();
let executablePath = opts?.executablePath || getExecutablePath();
// Use a dedicated user data directory to avoid conflicts with Chrome
const userDataDir = opts?.userDataDir || path.join(process.cwd(), 'browser-context');
const debugPort = opts?.debugPort || 9223;
const headless = opts?.headless || false;
console.log('Launching Chrome...', executablePath);
console.log(`Port: ${debugPort}`);
console.log(`User data directory: ${userDataDir}`);
// console.log('Note: log in to the account and install extensions manually');
console.log(`Headless mode: ${headless}`);
// const userAgent = new UserAgent().toString();
const params = [
`--remote-debugging-port=${debugPort}`,
`--user-data-dir=${userDataDir}`,
// '--kiosk', // full-screen mode, no window borders
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
'--no-first-run',
// `--user-agent=${userAgent}`,
];
// Add extra arguments when headless mode is requested
if (headless) {
params.push(
'--headless',
'--window-size=1920,1080',
);
}
console.log('Launch arguments:', params);
if (opts?.kiosk) {
params.push('--kiosk'); // full-screen mode, no window borders
@@ -62,13 +78,12 @@ export const main = async (opts?: {
return;
}
// Check that the Chrome executable exists
// Check that the Chrome executable exists; fall back to Playwright's browser if it does not
if (!fs.existsSync(executablePath)) {
console.error('Chrome executable not found:', executablePath);
return;
console.log('Chrome executable not found, using the Playwright browser');
executablePath = chromium.executablePath();
}
// Launch Chrome with a remote debugging port
const chromeProcess = spawn(
executablePath,

View File

@@ -1,9 +1,10 @@
import { chromium, Page, BrowserContext, Browser, CDPSession, Request } from 'playwright';
import { execSync } from 'node:child_process';
import path from 'node:path';
import { EventEmitter } from 'eventemitter3'
const sleep = (ms: number) => new Promise(resolve => setTimeout(resolve, ms));
import { main } from "./browser.ts";
import { stealthMode } from './stealth/index.ts';
type RequestObject = {
url: string;
path: string;
@@ -27,12 +28,16 @@ export class Core<T = {}> {
page: Page | null = null;
debugPort = 9223;
debugHost = '127.0.0.1';
headless = false;
useDebugPort = false; // off by default to avoid detection
useCDPConnect = false; // whether to connect over CDP instead of using pure Playwright
status: 'disconnected' | 'connecting' | 'connected' | 'failed' = 'disconnected';
emitter = new EventEmitter();
listeners: Listener[] = [];
recordReady: boolean = false;
timer: NodeJS.Timeout | null = null;
data: T | null = null;
constructor(opts?: { debugPort?: number, debugHost?: string, listeners?: Listener[] }) {
constructor(opts?: { debugPort?: number, debugHost?: string, listeners?: Listener[], headless?: boolean, useDebugPort?: boolean, useCDPConnect?: boolean }) {
if (opts?.debugPort) {
this.debugPort = opts.debugPort;
}
@@ -42,9 +47,16 @@ export class Core<T = {}> {
if (opts?.listeners) {
this.listeners = opts.listeners;
}
if (opts?.headless !== undefined) {
this.headless = opts.headless;
}
if (opts?.useDebugPort !== undefined) {
this.useDebugPort = opts.useDebugPort;
}
this.useCDPConnect = opts?.useCDPConnect || true;
}
async createBrowser() {
await main({ debugPort: this.debugPort });
const chrome = await main({ debugPort: this.debugPort, headless: this.headless });
}
async init() {
const debugPort = this.debugPort;
@@ -57,11 +69,14 @@ export class Core<T = {}> {
this.browser = browser;
this.browserContext = browser.contexts()[0];
this.handleRequest(this.browserContext);
this.page = this.browserContext.pages()[0] || await this.browserContext.newPage();
// Create a brand-new blank page
this.page = await this.browserContext.newPage()
// await this.stealthMode(this.page);
this.emitter.emit('connected');
return;
} catch (error: any) {
throw new Error(`Cannot connect to Chrome; CDP port ${debugPort} may not have started correctly: ${(error as Error).message.slice(0, 100)}`);
throw new Error(`Cannot connect to the browser, error: ${(error as Error).message.slice(0, 100)}`);
}
}
async connect() {
@@ -121,11 +136,24 @@ export class Core<T = {}> {
return this.page!;
}
throw new Error('Cannot connect to the browser instance');
}
async setReady(ready: boolean = true) {
if (this.recordReady !== ready) {
this.recordReady = ready;
}
if (ready === true) {
this.timer && clearTimeout(this.timer);
const that = this;
this.timer = setTimeout(() => {
that.recordReady = false;
that.timer = null;
console.log('Recording timed out; automatically reset to not-ready');
}, 5 * 60 * 1000); // reset to not-ready after 5 minutes to avoid holding resources for too long
} else {
this.timer && clearTimeout(this.timer);
this.timer = null;
}
}
async setData(data?: any) {
if (!data) {
@@ -134,6 +162,9 @@ export class Core<T = {}> {
}
this.data = data;
}
async stealthMode(page: Page) {
await stealthMode(page);
}
async handleRequest(context: BrowserContext) {
context.on('request', request => {
const url = request.url();
@@ -153,7 +184,7 @@ export class Core<T = {}> {
context.on('response', async response => {
const url = response.url();
const recordReady = this.recordReady;
// console.log('Response URL:', url);
for (let listener of this.listeners) {
const type = listener.type || 'both';
if (type === 'request') continue;
@@ -162,6 +193,7 @@ export class Core<T = {}> {
console.log('Recording not ready, skipping response handling');
return
}
console.log(`Captured response: ${url}`);
try {
const status = response.status();
const contentType = response.headers()['content-type'] || '';

View File

@@ -0,0 +1,227 @@
import { Page } from 'playwright';
export const stealthMode = async (page: Page) => {
const stealthScript = `
() => {
// 1. Hide the webdriver property (the most important detection point)
Object.defineProperty(navigator, 'webdriver', {
get: () => false,
configurable: true,
});
// 2. Hide Chrome automation traits
// Some sites check specific Chrome APIs to decide whether the browser is automated
if (!window.chrome) {
window.chrome = {};
}
window.chrome.runtime = window.chrome.runtime || {};
// Remove chrome properties that may expose automation
delete window.chrome.i18n;
delete window.__selenium_evaluate;
delete window.__webdriver_evaluate;
delete window._Selenium_IDE_Recorder;
delete window._selenium;
delete window.callPhantom;
delete window._phantom;
// 3. Hide automation-tool markers
Object.defineProperty(navigator, 'userAgentData', {
get: () => ({
brands: [
{ brand: 'Not A(Brand', version: '99' },
{ brand: 'Google Chrome', version: '120' },
{ brand: 'Chromium', version: '120' }
],
mobile: false,
platform: 'Windows',
platformVersion: '10.0'
}),
configurable: true,
});
// 4. Hide automation traits in the permissions API
const originalQuery = window.navigator.permissions.query;
window.navigator.permissions.query = (parameters) => (
parameters.name === 'notifications' ?
Promise.resolve({ state: Notification.permission }) :
originalQuery(parameters)
);
// ===== Other anti-detection traits below =====
// Fake the chrome object
window.chrome = window.chrome || {
runtime: {},
loadTimes: function() {},
csi: function() {},
app: {}
};
// Mask the plugins list
Object.defineProperty(navigator, 'plugins', {
get: () => [1, 2, 3, 4, 5],
});
// Set normal-looking languages
Object.defineProperty(navigator, 'languages', {
get: () => ['zh-CN', 'zh', 'en'],
});
// Fake hardware information
Object.defineProperty(navigator, 'hardwareConcurrency', {
get: () => 8,
});
Object.defineProperty(navigator, 'deviceMemory', {
get: () => 8,
});
// Perturb the canvas fingerprint
const originalGetContext = HTMLCanvasElement.prototype.getContext;
HTMLCanvasElement.prototype.getContext = function(type) {
const context = originalGetContext.apply(this, arguments);
if (type === '2d' && context) {
const originalGetImageData = context.getImageData;
context.getImageData = function() {
const imageData = originalGetImageData.apply(this, arguments);
for (let i = 0; i < imageData.data.length; i += 4) {
imageData.data[i] = imageData.data[i] + Math.random() * 0.1 - 0.05;
}
return imageData;
};
}
return context;
};
// Fake network conditions
Object.defineProperty(navigator, 'connection', {
get: () => ({
effectiveType: '4g',
rtt: 100,
downlink: 10,
}),
});
// Fake battery information
window.navigator.getBattery = () => Promise.resolve({
charging: true,
chargingTime: 0,
dischargingTime: Infinity,
level: 1,
});
// Hide touch points
Object.defineProperty(navigator, 'maxTouchPoints', {
get: () => 0,
});
// Patch toDataURL
const originalToDataURL = HTMLCanvasElement.prototype.toDataURL;
HTMLCanvasElement.prototype.toDataURL = function(type) {
if (type === 'image/webp' || type === 'image/jpeg') {
return originalToDataURL.apply(this, arguments);
}
const context = this.getContext('2d');
if (context) {
const imageData = context.getImageData(0, 0, this.width, this.height);
for (let i = 0; i < imageData.data.length; i += 4) {
imageData.data[i] += Math.floor(Math.random() * 3) - 1;
}
context.putImageData(imageData, 0, 0);
}
return originalToDataURL.apply(this, arguments);
};
// Hide CDP/DevTools detection
window.addEventListener('beforeunload', function(e) {
// Prevent some sites from detecting automation through beforeunload
}, true);
// Hide the chrome.debugger API
if (window.chrome && window.chrome.runtime) {
window.chrome.runtime.sendMessage = undefined;
}
// Override toString to hide the native code marker
const nativeToString = Function.prototype.toString;
Function.prototype.toString = function() {
const str = nativeToString.call(this);
if (str.includes('[native code]')) {
return 'function() { [native code] }';
}
return str;
};
// Hide detection of an open devtools panel
let devtools = { open: false, orientation: null };
const threshold = 160;
setInterval(() => {
if (window.outerHeight - window.innerHeight > threshold ||
window.outerWidth - window.innerWidth > threshold) {
if (!devtools.open) {
devtools.open = true;
}
} else {
if (devtools.open) {
devtools.open = false;
}
}
}, 500);
// Prevent sites from detecting the debugger port
Object.defineProperty(window, '__REMOTE_DEBUGGER_PORT__', {
get: () => undefined,
set: () => {},
configurable: true
});
// Hide Playwright traits
Object.defineProperty(navigator, 'vendor', {
get: () => 'Google Inc.',
});
Object.defineProperty(navigator, 'platform', {
get: () => 'Win32',
});
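// Capture the real value first: reading navigator.userAgent inside its own getter would recurse forever
const originalUserAgent = navigator.userAgent || '';
Object.defineProperty(navigator, 'userAgent', {
get: () => originalUserAgent.replace(/HeadlessChrome/, 'Chrome').replace(/Playwright/, ''),
});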
// Disable how performance.measure behaves under CDP
if (window.performance && window.performance.measure) {
const originalMeasure = window.performance.measure;
window.performance.measure = function() {
return originalMeasure.apply(this, arguments);
};
}
// Hide markers from other automation tools
Object.defineProperty(window, '__nightmare', {
get: () => undefined,
set: () => {},
configurable: true,
});
Object.defineProperty(window, '__puppeteer__', {
get: () => undefined,
set: () => {},
configurable: true,
});
// Prevent detection through postMessage
const originalPostMessage = window.postMessage;
window.postMessage = function(message, origin) {
if (typeof message === 'object' && message.type === 'WEB_DRIVER') {
return;
}
return originalPostMessage.apply(this, arguments);
};
}
`;
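// A string passed to addInitScript is evaluated as script source, so the function expression must be invoked explicitly
await page.addInitScript(`(${stealthScript})();`);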
}
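
A note on how the init script above is delivered: when `page.addInitScript` is given a string, that string is evaluated as script source, so a source that consists only of an arrow-function expression creates the function without calling it. The sketch below illustrates the difference outside the browser. `toInitScript`, `runScript` and the `__demoApplied` flag are hypothetical names introduced for this illustration, and indirect `eval` is only a stand-in for the page evaluating the script.

```typescript
// Hypothetical helper: turn the source of a function expression into a script that invokes it.
const toInitScript = (fnSource: string): string => `(${fnSource.trim()})();`;

// Stand-in for the page evaluating an init script (an indirect eval runs in global scope).
const indirectEval = eval;
const runScript = (script: string): void => {
  indirectEval(script);
};

const g = globalThis as unknown as Record<string, unknown>;
const fnSource = `
() => {
  globalThis.__demoApplied = true;
}
`;

// Evaluating the bare source only creates the arrow function; nothing is applied.
runScript(fnSource);
const appliedByBareSource = g.__demoApplied === true;

// Evaluating the wrapped source actually invokes the function.
runScript(toInitScript(fnSource));
const appliedByWrappedSource = g.__demoApplied === true;
```

Passing a real function instead of a string to `addInitScript` is another option, since Playwright serializes and invokes functions itself; the string form is kept here because the diff builds the script as a template literal.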


@@ -3,23 +3,24 @@ import { app, core, db } from '../../app.ts';
app.route({
path: 'good',
key: 'searchInfo',
description: '搜索小红书今日热门信息差内容。支持自定义关键词,参数keyword(字符串)可选,默认搜索"信息差"',
description: '搜索小红书今日热门信息差内容。参数keyword默认搜索"信息差"',
middleware: ['auth'],
metadata: {
tags: ['小红书', '信息差', '热门'],
icon: 'search',
}
}).define(async (ctx) => {
const keyword = ctx.query?.keyword as string || '信息差';
const { keyword = '信息差', ...rest } = ctx.query;
const res = await app.run({
path: 'xhs',
key: 'search-notes',
payload: {
keyword: keyword,
scrollTimes: 5,
...rest,
token: ctx.query?.token as string,
}
})
}, ctx)
ctx.forward(res)
}).addTo(app);
@@ -27,22 +28,93 @@ app.route({
app.route({
path: 'good',
key: 'searchWork',
description: '搜索小红书今日工作机会与招聘信息。支持自定义关键词搜索,默认搜索"工作 杭州"',
description: '搜索小红书今日工作机会与招聘信息。参数是keyword,默认搜索"工作 杭州"',
middleware: ['auth'],
metadata: {
tags: ['小红书', '工作', '招聘'],
icon: 'search',
}
}).define(async (ctx) => {
const keyword = ctx.query?.keyword as string || '工作 杭州';
const { keyword = '工作 杭州', ...rest } = ctx.query;
const res = await app.run({
path: 'xhs',
key: 'search-notes',
payload: {
keyword: keyword,
scrollTimes: 5,
...rest,
token: ctx.query?.token as string,
}
})
}, ctx)
ctx.forward(res)
}).addTo(app);
app.route({
path: 'good',
key: 'searchDate',
description: '搜索小红书今日交友信息。参数是keyword默认搜索"相亲 杭州"',
middleware: ['auth'],
metadata: {
tags: ['小红书', '约会', '交友', '相亲'],
}
}).define(async (ctx) => {
const { keyword = '相亲 杭州', ...rest } = ctx.query;
const res = await app.run({
path: 'xhs',
key: 'search-notes',
payload: {
keyword: keyword,
scrollTimes: 10,
...rest,
token: ctx.query?.token as string,
}
}, ctx)
ctx.forward(res)
}).addTo(app);
app.route({
path: 'good',
key: 'searchBean',
description: '搜索小红书的拼豆参数是keyword默认搜索"拼豆"',
middleware: ['auth'],
metadata: {
tags: ['小红书', '拼豆'],
}
}).define(async (ctx) => {
const { keyword = '拼豆', ...rest } = ctx.query;
const res = await app.run({
path: 'xhs',
key: 'search-notes',
payload: {
keyword: keyword,
scrollTimes: 10,
...rest,
token: ctx.query?.token as string,
}
}, ctx)
ctx.forward(res)
}).addTo(app);
// Invoke with path: good, key: searchTemplate
app.route({
path: 'good',
key: 'searchTemplate',
description: '搜索小红书的模板相关参数是keyword默认搜索"网站模板"',
middleware: ['auth'],
metadata: {
tags: ['小红书', '网站模板'],
}
}).define(async (ctx) => {
const { keyword = '网站模板', ...rest } = ctx.query;
const res = await app.run({
path: 'xhs',
key: 'search-notes',
payload: {
keyword: keyword,
scrollTimes: 10,
...rest,
token: ctx.query?.token as string,
}
}, ctx)
ctx.forward(res)
}).addTo(app);
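
The five `good` routes above differ only in their default keyword and `scrollTimes`, so the payload construction could be shared. The following is a minimal sketch under assumptions, not the project's code: `buildSearchPayload`, `Query` and `SearchDefaults` are hypothetical names. It also covers two cases the destructuring default does not handle: an undefined query object, and a keyword that is an empty or whitespace-only string (a destructuring default only applies to `undefined`).

```typescript
type Query = Record<string, unknown>;

interface SearchDefaults {
  keyword: string;
  scrollTimes: number;
}

// Hypothetical helper mirroring the payload built in each route handler.
const buildSearchPayload = (query: Query | undefined, defaults: SearchDefaults): Query => {
  const source: Query = query ?? {};
  const { keyword, ...rest } = source;
  const trimmed = typeof keyword === 'string' ? keyword.trim() : '';
  return {
    scrollTimes: defaults.scrollTimes, // default; a caller-supplied scrollTimes in rest overrides it
    ...rest,                           // includes token and any filter options passed through
    keyword: trimmed || defaults.keyword,
  };
};

const beanDefaults: SearchDefaults = { keyword: '拼豆', scrollTimes: 10 };
const blankKeywordPayload = buildSearchPayload({ keyword: '  ', token: 'demo-token' }, beanDefaults);
const overriddenPayload = buildSearchPayload({ keyword: '网站模板', scrollTimes: 3 }, beanDefaults);
const missingQueryPayload = buildSearchPayload(undefined, beanDefaults);
```

With a helper of this shape, each route handler would reduce to one call plus `app.run`, and a new topic would only need a new defaults object.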


@@ -1 +1,4 @@
import './search-notes.ts';
import './search-notes.ts';
import './xhs-list.ts';
import './xhs-user-list.ts';
import './xhs-tags-list.ts';


@@ -1,6 +1,6 @@
import { xhsNote } from '@/db/schema.ts';
import { xhsNote, xhsUser, xhsTags } from '@/db/schema.ts';
import { app, core, db } from '../../app.ts';
import { sql } from 'drizzle-orm';
import { sql, eq } from 'drizzle-orm';
const sleep = (ms: number) => new Promise(resolve => setTimeout(resolve, ms));
import { Page } from 'playwright';
import { Core } from '@/playwright/core.ts';
@@ -79,6 +79,7 @@ const hoverPickerExample = async (page: Page, opts?: HoverPickerOptions) => {
}
}
}
await sleep(2000); // wait 2 seconds so the filter takes effect
// Move the mouse off the page to clear the hover state
await page.mouse.move(0, 0);
console.log('已移除 hover 状态');
@@ -138,11 +139,16 @@ app.route({
console.log(`导航到搜索页面: ${url.toString()}`);
await sleep(3000); // wait for the page to load
}
const keyword = query.keyword as string;
let keyword = query.keyword as string || '';
keyword = keyword.trim();
if (!keyword) {
ctx.throw(400, '缺少 keyword 参数');
}
// Store the keyword in the session cache so the response handler can use it
sessionCache.set('xhs-search-keyword', keyword);
await hoverPickerExample(page, {
keyword: query.keyword as string,
keyword: keyword as string,
pushTime: (query.pushTime as '一天内' | '一周内' | '半年内') || '一天内',
sort: (query.sort as '综合' | '最新' | '最多点赞' | '最多评论') || '最新',
distance: (query.distance as '不限' | '同城' | '附近') || '不限',
@@ -156,7 +162,7 @@ app.route({
app.route({
path: 'xhs',
key: 'save-search-notes',
description: '保存搜索笔记结果',
description: '保存搜索笔记结果, 浏览器自动化完成搜索后调用此接口保存结果。',
middleware: ['auth'],
metadata: {
tags: ['小红书', '搜索', '保存'],
@@ -164,48 +170,129 @@ app.route({
}
}).define(async (ctx) => {
const data = ctx.query!.data as XHS.SearchNote[];
if (!data || !Array.isArray(data)) {
ctx.throw(400, '缺少有效的 data 参数');
}
try {
const getNoteUrl = (note: XHS.SearchNote) => {
const id = note.id;
const secToken = note.xsec_token;
return `https://www.xiaohongshu.com/explore/${id}?xsec_token=${secToken}`
}
const getUserUrl = (note: XHS.SearchNote) => {
const getUser = (note: XHS.SearchNote) => {
const user = note.note_card?.user;
const id = user?.user_id;
const secToken = user?.xsec_token;
if (user) {
return `https://www.xiaohongshu.com/user/profile/${id}?xsec_token=${secToken}`
return {
user: user,
link: `https://www.xiaohongshu.com/user/profile/${id}?xsec_token=${secToken}`
}
}
return ``
return { user: null, link: '' }
}
const getCover = (note: XHS.SearchNote) => {
const cover = note.note_card?.cover
return cover?.url_default || ''
}
const keyword = sessionCache.get('xhs-search-keyword');
const notes = data.filter(note => note.model_type === 'note').map(note => ({
id: note.id,
content: JSON.stringify(note),
description: keyword || '',
title: note.note_card?.display_title || '',
authorUrl: getUserUrl(note),
tags: '',
syncStatus: 0,
noteUrl: getNoteUrl(note),
cover: getCover(note),
syncAt: 0,
createdAt: Date.now(),
updatedAt: Date.now(),
}));
const dataNotes = data.filter(note => note.model_type === 'note');
let notes = dataNotes.map(note => {
const cornerTag = note.note_card?.corner_tag_info;
const pushTime = cornerTag?.find(tag => tag.type === 'publish_time')?.text || '';
// For relative times such as "一天前" (one day ago), pushTime contains "前"
const user = getUser(note);
return {
id: note.id,
title: note.note_card?.display_title || '',
tags: '',
summary: '',
status: '正常笔记',
description: keyword || '',
link: getNoteUrl(note),
data: JSON.stringify({ note, keyword, user }),
cover: getCover(note),
authorUrl: user.link,
user_id: user.user?.user_id || '',
syncStatus: 0,
// pushedAt: 0,
syncAt: 0,
createdAt: Date.now(),
updatedAt: Date.now(),
}
});
let notesUser = dataNotes.map(note => {
const userData = getUser(note);
const user = userData.user;
if (!user) return null;
return {
id: user?.user_id || '',
nickname: user?.nickname || '',
avatar: user?.avatar || '',
status: '笔记用户',
link: userData.link,
xsec_token: user?.xsec_token || '',
data: JSON.stringify({ user }),
}
})
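// Look up the authors of these notes (deduplicated user ids, not note ids)
const userIds = Array.from(new Set(notes.map(note => note.user_id).filter(id => id)));
// Bind each id as its own parameter; a joined string would be sent as a single value
const userList = userIds.length > 0
? await db.select().from(xhsUser).where(sql`${xhsUser.id} IN (${sql.join(userIds.map(id => sql`${id}`), sql`, `)})`)
: [];
// If a user's bunTags block the current keyword, mark that user's notes as banned by default
for (const note of notes) {
const user = userList.find(u => u.id === note.user_id);
if (user) {
const bunTags = user.bunTags || '-';
// Skip the check for an empty keyword: includes('') is always true
if (keyword && bunTags.includes(keyword)) {
note.status = '禁止用户'; // mutates the object inside the notes array directly
}
}
}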
if (notes.length === 0) {
console.log('没有笔记需要保存');
ctx.body = { success: true, message: '没有笔记需要保存' };
return;
}
await db.insert(xhsNote).values(notes).onConflictDoUpdate({
target: xhsNote.id,
set: {
content: sql`excluded.content`,
summary: sql`excluded.summary`,
cover: sql`excluded.cover`,
status: sql`excluded.status`,
data: sql`excluded.data`,
link: sql`excluded.link`,
description: sql`excluded.description`,
authorUrl: sql`excluded.author_url`,
updatedAt: Date.now(),
},
}).execute();
console.log(`已保存 ${notes.length} 条搜索笔记结果`);
// Save user information, deduplicated
const uniqueUsers = Array.from(new Map(notesUser.filter(u => u !== null).map(u => [u!.id, u!])).values());
if (uniqueUsers.length > 0) {
await db.insert(xhsUser).values(uniqueUsers).onConflictDoUpdate({
target: xhsUser.id,
set: {
nickname: sql`excluded.nickname`,
avatar: sql`excluded.avatar`,
},
}).execute();
console.log(`已保存 ${uniqueUsers.length} 条用户信息`);
}
// Check whether keyword already exists as a title in xhsTags; add it if not
if (keyword) {
const existingTag = await db.select().from(xhsTags).where(eq(xhsTags.title, keyword)).limit(1);
if (existingTag.length === 0) {
await db.insert(xhsTags).values({
title: keyword,
description: `来自搜索页面的关键词: ${keyword}`,
}).execute();
console.log(`已添加新的标签: ${keyword}`);
} else {
console.log(`标签已存在: ${keyword}`);
}
}
} catch (error) {
console.error('保存搜索笔记结果时出错:', error);
}
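
The save handler above does two pieces of in-memory work before writing: it deduplicates the users extracted from the notes, and it flags notes whose author has blocked the current keyword. The sketch below isolates both steps as pure functions so they can be checked without a database. `NoteRow`, `UserRow`, `uniqueById` and `markBanned` are hypothetical names; only the field names `user_id`, `status` and `bunTags` and the status value come from the diff.

```typescript
interface NoteRow {
  id: string;
  user_id: string;
  status: string;
}

interface UserRow {
  id: string;
  bunTags?: string | null;
}

// Deduplicate by id with a Map: the last row for an id wins, first-seen order is kept.
const uniqueById = <T extends { id: string }>(rows: Array<T | null>): T[] =>
  Array.from(
    new Map(
      rows.filter((row): row is T => row !== null).map(row => [row.id, row] as const),
    ).values(),
  );

// Return notes with status set to '禁止用户' when the author's bunTags contain the keyword.
// An empty keyword never matches, because ''.includes('') would otherwise flag every note.
const markBanned = (notes: NoteRow[], users: UserRow[], keyword: string): NoteRow[] => {
  const usersById = new Map(users.map(user => [user.id, user] as const));
  return notes.map(note => {
    const bunTags = usersById.get(note.user_id)?.bunTags ?? '';
    const banned = keyword !== '' && bunTags.includes(keyword);
    return banned ? { ...note, status: '禁止用户' } : note;
  });
};

const demoUsers = uniqueById<UserRow>([
  { id: 'u1', bunTags: '拼豆,相亲' },
  null,
  { id: 'u2', bunTags: null },
  { id: 'u1', bunTags: '拼豆' },
]);
const demoNotes: NoteRow[] = [
  { id: 'n1', user_id: 'u1', status: '正常笔记' },
  { id: 'n2', user_id: 'u2', status: '正常笔记' },
];
const flagged = markBanned(demoNotes, demoUsers, '拼豆');
const unflagged = markBanned(demoNotes, demoUsers, '');
```

Indexing users in a `Map` also avoids the repeated `find` over the user list for every note.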

src/routes/xhs/xhs-list.ts Normal file

@@ -0,0 +1,142 @@
import { desc, eq, count, or, like, and } from 'drizzle-orm';
import { schema, app, db } from '@/app.ts'
const xhsNote = schema.xhsNote;
app.route({
path: 'xhs',
key: 'list',
middleware: ['auth'],
description: '获取小红书笔记列表',
metadata: {
tags: ['小红书', '笔记'],
}
}).define(async (ctx) => {
const { page = 1, pageSize = 20, search, sort = 'DESC' } = ctx.query || {};
const offset = (page - 1) * pageSize;
const orderByField = sort === 'ASC' ? xhsNote.updatedAt : desc(xhsNote.updatedAt);
let whereCondition = undefined;
if (search) {
whereCondition = or(
like(xhsNote.title, `%${search}%`),
like(xhsNote.summary, `%${search}%`),
like(xhsNote.description, `%${search}%`)
);
}
const [list, totalCount] = await Promise.all([
db.select()
.from(xhsNote)
.where(whereCondition)
.limit(pageSize)
.offset(offset)
.orderBy(orderByField),
db.select({ count: count() })
.from(xhsNote)
.where(whereCondition)
]);
ctx.body = {
list,
pagination: {
page,
current: page,
pageSize,
total: totalCount[0]?.count || 0,
},
};
return ctx;
}).addTo(app);
const noteUpdate = `创建或更新一个小红书笔记, 参数定义:
title: 笔记标题, 必填
summary: 笔记摘要, 选填
description: 笔记描述, 选填
tags: 标签数组, 选填
data: 笔记数据, 对象, 选填
`;
app.route({
path: 'xhs',
key: 'update',
middleware: ['auth'],
description: noteUpdate,
metadata: {
tags: ['小红书', '笔记'],
}
}).define(async (ctx) => {
const { id, createdAt, updatedAt, ...rest } = ctx.query.data || {};
let note;
if (!id) {
note = await db.insert(xhsNote).values({
id: `note_${Date.now()}`, // id was destructured out above, so rest never carries one in this branch
title: rest.title || '',
description: rest.description || '',
summary: rest.summary || '',
tags: rest.tags ? JSON.stringify(rest.tags) : null,
link: rest.link || '',
data: rest.data ? JSON.stringify(rest.data) : null,
syncStatus: 1,
syncAt: Date.now(),
createdAt: Date.now(),
updatedAt: Date.now(),
}).returning();
} else {
const existing = await db.select().from(xhsNote).where(eq(xhsNote.id, id)).limit(1);
if (existing.length === 0) {
ctx.throw(404, '没有找到对应的笔记');
}
note = await db.update(xhsNote).set({
title: rest.title,
description: rest.description,
summary: rest.summary,
tags: rest.tags ? JSON.stringify(rest.tags) : undefined,
link: rest.link,
data: rest.data ? JSON.stringify(rest.data) : undefined,
updatedAt: Date.now(),
}).where(eq(xhsNote.id, id)).returning();
}
ctx.body = note;
}).addTo(app);
app.route({
path: 'xhs',
key: 'delete',
middleware: ['auth'],
description: '删除小红书笔记, 参数: data.id 笔记ID',
metadata: {
tags: ['小红书', '笔记'],
}
}).define(async (ctx) => {
const { id } = ctx.query.data || {};
if (!id) {
ctx.throw(400, 'id 参数缺失');
}
const existing = await db.select().from(xhsNote).where(eq(xhsNote.id, id)).limit(1);
if (existing.length === 0) {
ctx.throw(404, '没有找到对应的笔记');
}
await db.delete(xhsNote).where(eq(xhsNote.id, id));
ctx.body = { success: true };
}).addTo(app);
app.route({
path: 'xhs',
key: 'get',
middleware: ['auth'],
description: '获取单个小红书笔记, 参数: data.id 笔记ID',
metadata: {
tags: ['小红书', '笔记'],
}
}).define(async (ctx) => {
const { id } = ctx.query.data || {};
if (!id) {
ctx.throw(400, 'id 参数缺失');
}
const existing = await db.select().from(xhsNote).where(eq(xhsNote.id, id)).limit(1);
if (existing.length === 0) {
ctx.throw(404, '没有找到对应的笔记');
}
ctx.body = existing[0];
}).addTo(app);
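
The `list` route above computes `offset` directly from `page` and `pageSize` taken from the query, which may arrive as strings or as out-of-range values. A minimal sketch of normalizing them first follows; `parsePagination`, `toPositiveInt` and the `maxPageSize` cap of 100 are assumptions for illustration and are not part of the diff.

```typescript
interface Pagination {
  page: number;
  pageSize: number;
  offset: number;
}

// Accept numbers or numeric strings; anything that is not a positive integer uses the fallback.
const toPositiveInt = (value: unknown, fallback: number): number => {
  const parsed = Number(value);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
};

// Hypothetical helper: same defaults as the route (page 1, pageSize 20), plus an upper bound.
const parsePagination = (query: Record<string, unknown> = {}, maxPageSize = 100): Pagination => {
  const page = toPositiveInt(query.page, 1);
  const pageSize = Math.min(toPositiveInt(query.pageSize, 20), maxPageSize);
  return { page, pageSize, offset: (page - 1) * pageSize };
};

const fromStrings = parsePagination({ page: '3', pageSize: '10' });
const fromBadInput = parsePagination({ page: 0, pageSize: 1000 });
const fromNothing = parsePagination();
```

The resulting `pageSize` and `offset` can be passed to `.limit()` and `.offset()` unchanged, and the same helper would serve the tag and user list routes.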


@@ -0,0 +1,124 @@
import { desc, eq, count, like } from 'drizzle-orm';
import { schema, app, db } from '@/app.ts'
const xhsTags = schema.xhsTags;
app.route({
path: 'xhs-tags',
key: 'list',
middleware: ['auth'],
description: '获取小红书标签列表',
metadata: {
tags: ['小红书', '标签'],
}
}).define(async (ctx) => {
const { page = 1, pageSize = 20, search, sort = 'DESC' } = ctx.query || {};
const offset = (page - 1) * pageSize;
const orderByField = sort === 'ASC' ? xhsTags.updatedAt : desc(xhsTags.updatedAt);
let whereCondition = undefined;
if (search) {
whereCondition = like(xhsTags.title, `%${search}%`);
}
const [list, totalCount] = await Promise.all([
db.select()
.from(xhsTags)
.where(whereCondition)
.limit(pageSize)
.offset(offset)
.orderBy(orderByField),
db.select({ count: count() })
.from(xhsTags)
.where(whereCondition)
]);
ctx.body = {
list,
pagination: {
page,
current: page,
pageSize,
total: totalCount[0]?.count || 0,
},
};
return ctx;
}).addTo(app);
const tagUpdate = `创建或更新一个小红书标签, 参数定义:
title: 标签标题, 必填
description: 标签描述, 选填
`;
app.route({
path: 'xhs-tags',
key: 'update',
middleware: ['auth'],
description: tagUpdate,
metadata: {
tags: ['小红书', '标签'],
}
}).define(async (ctx) => {
const { id, createdAt, updatedAt, ...rest } = ctx.query.data || {};
let tag;
if (!id) {
tag = await db.insert(xhsTags).values({
title: rest.title || '',
description: rest.description || '',
createdAt: Date.now(),
updatedAt: Date.now(),
}).returning();
} else {
const existing = await db.select().from(xhsTags).where(eq(xhsTags.id, id)).limit(1);
if (existing.length === 0) {
ctx.throw(404, '没有找到对应的标签');
}
tag = await db.update(xhsTags).set({
title: rest.title,
description: rest.description,
updatedAt: Date.now(),
}).where(eq(xhsTags.id, id)).returning();
}
ctx.body = tag;
}).addTo(app);
app.route({
path: 'xhs-tags',
key: 'delete',
middleware: ['auth'],
description: '删除小红书标签, 参数: data.id 标签ID',
metadata: {
tags: ['小红书', '标签'],
}
}).define(async (ctx) => {
const { id } = ctx.query.data || {};
if (!id) {
ctx.throw(400, 'id 参数缺失');
}
const existing = await db.select().from(xhsTags).where(eq(xhsTags.id, id)).limit(1);
if (existing.length === 0) {
ctx.throw(404, '没有找到对应的标签');
}
await db.delete(xhsTags).where(eq(xhsTags.id, id));
ctx.body = { success: true };
}).addTo(app);
app.route({
path: 'xhs-tags',
key: 'get',
middleware: ['auth'],
description: '获取单个小红书标签, 参数: data.id 标签ID',
metadata: {
tags: ['小红书', '标签'],
}
}).define(async (ctx) => {
const { id } = ctx.query.data || {};
if (!id) {
ctx.throw(400, 'id 参数缺失');
}
const existing = await db.select().from(xhsTags).where(eq(xhsTags.id, id)).limit(1);
if (existing.length === 0) {
ctx.throw(404, '没有找到对应的标签');
}
ctx.body = existing[0];
}).addTo(app);
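
The `search` value in the tag list route is interpolated into a `%...%` pattern, so a `%` or `_` typed by the user behaves as a wildcard. The sketch below shows one way to escape those characters; `escapeLike` and `containsPattern` are hypothetical helpers. Note that SQLite only honors an escape character when the statement carries an `ESCAPE` clause, so wiring this into the query would need a condition that includes `ESCAPE '\'`, which is not shown here and is an assumption about how the route would be adapted.

```typescript
// Prefix LIKE metacharacters (and the escape character itself) with a backslash.
const escapeLike = (input: string): string => input.replace(/[\\%_]/g, ch => '\\' + ch);

// Build a "contains" pattern from raw user input.
const containsPattern = (search: string): string => `%${escapeLike(search)}%`;

const plainPattern = containsPattern('拼豆');
const wildcardPattern = containsPattern('100%_a');
```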


@@ -0,0 +1,152 @@
import { desc, eq, count, or, like } from 'drizzle-orm';
import { schema, app, db } from '@/app.ts'
const xhsUser = schema.xhsUser;
app.route({
path: 'xhs-users',
key: 'list',
middleware: ['auth'],
description: `获取小红书用户列表, 参数说明:
page: 页码默认1
pageSize: 每页数量默认20
search: 搜索关键词,模糊匹配昵称、用户名和描述
sort: 排序方式ASC或DESC默认DESC按更新时间降序
`,
metadata: {
tags: ['小红书', '用户'],
}
}).define(async (ctx) => {
const { page = 1, pageSize = 20, search, sort = 'DESC' } = ctx.query || {};
const offset = (page - 1) * pageSize;
const orderByField = sort === 'ASC' ? xhsUser.updatedAt : desc(xhsUser.updatedAt);
let whereCondition = undefined;
if (search) {
whereCondition = or(
like(xhsUser.nickname, `%${search}%`),
like(xhsUser.username, `%${search}%`),
like(xhsUser.description, `%${search}%`)
);
}
const [list, totalCount] = await Promise.all([
db.select()
.from(xhsUser)
.where(whereCondition)
.limit(pageSize)
.offset(offset)
.orderBy(orderByField),
db.select({ count: count() })
.from(xhsUser)
.where(whereCondition)
]);
ctx.body = {
list,
pagination: {
page,
current: page,
pageSize,
total: totalCount[0]?.count || 0,
},
};
return ctx;
}).addTo(app);
const userUpdate = `创建或更新一个小红书用户, 参数定义:
nickname: 用户昵称, 必填
username: 用户名, 选填
avatar: 用户头像, 选填
description: 用户描述, 选填
tags: 标签数组, 选填
data: 用户数据, 对象, 选填
`;
app.route({
path: 'xhs-users',
key: 'update',
middleware: ['auth'],
description: userUpdate,
metadata: {
tags: ['小红书', '用户'],
}
}).define(async (ctx) => {
const { id, createdAt, updatedAt, ...rest } = ctx.query.data || {};
let user;
if (!id) {
user = await db.insert(xhsUser).values({
id: `user_${Date.now()}`, // id was destructured out above, so rest never carries one in this branch
nickname: rest.nickname || '',
username: rest.username || '',
avatar: rest.avatar || '',
description: rest.description || '',
summary: rest.summary || '',
tags: rest.tags ? JSON.stringify(rest.tags) : null,
link: rest.link || '',
data: rest.data ? JSON.stringify(rest.data) : null,
syncStatus: 1,
syncAt: Date.now(),
createdAt: Date.now(),
updatedAt: Date.now(),
}).returning();
} else {
const existing = await db.select().from(xhsUser).where(eq(xhsUser.id, id)).limit(1);
if (existing.length === 0) {
ctx.throw(404, '没有找到对应的用户');
}
user = await db.update(xhsUser).set({
nickname: rest.nickname,
username: rest.username,
avatar: rest.avatar,
description: rest.description,
summary: rest.summary,
tags: rest.tags ? JSON.stringify(rest.tags) : undefined,
link: rest.link,
data: rest.data ? JSON.stringify(rest.data) : undefined,
updatedAt: Date.now(),
}).where(eq(xhsUser.id, id)).returning();
}
ctx.body = user;
}).addTo(app);
app.route({
path: 'xhs-users',
key: 'delete',
middleware: ['auth'],
description: '删除小红书用户, 参数: data.id 用户ID',
metadata: {
tags: ['小红书', '用户'],
}
}).define(async (ctx) => {
const { id } = ctx.query.data || {};
if (!id) {
ctx.throw(400, 'id 参数缺失');
}
const existing = await db.select().from(xhsUser).where(eq(xhsUser.id, id)).limit(1);
if (existing.length === 0) {
ctx.throw(404, '没有找到对应的用户');
}
await db.delete(xhsUser).where(eq(xhsUser.id, id));
ctx.body = { success: true };
}).addTo(app);
app.route({
path: 'xhs-users',
key: 'get',
middleware: ['auth'],
description: '获取单个小红书用户, 参数: data.id 用户ID',
metadata: {
tags: ['小红书', '用户'],
}
}).define(async (ctx) => {
const { id } = ctx.query.data || {};
if (!id) {
ctx.throw(400, 'id 参数缺失');
}
const existing = await db.select().from(xhsUser).where(eq(xhsUser.id, id)).limit(1);
if (existing.length === 0) {
ctx.throw(404, '没有找到对应的用户');
}
ctx.body = existing[0];
}).addTo(app);
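
The update branch of the user route above builds its `set` object field by field and serializes `tags` and `data` to JSON. A sketch of that step as a standalone function follows, assuming a partial update should carry only the fields the caller actually supplied. `UserInput`, `UserPatch` and `buildUserPatch` are hypothetical names; the field names come from the diff.

```typescript
interface UserInput {
  nickname?: string;
  username?: string;
  avatar?: string;
  description?: string;
  tags?: string[];
  data?: Record<string, unknown>;
}

type UserPatch = Record<string, string | number>;

// Copy only supplied fields, serialize structured ones, and always refresh updatedAt.
const buildUserPatch = (input: UserInput, now: number = Date.now()): UserPatch => {
  const patch: UserPatch = { updatedAt: now };
  for (const key of ['nickname', 'username', 'avatar', 'description'] as const) {
    const value = input[key];
    if (value !== undefined) patch[key] = value;
  }
  if (input.tags !== undefined) patch.tags = JSON.stringify(input.tags);
  if (input.data !== undefined) patch.data = JSON.stringify(input.data);
  return patch;
};

const demoPatch = buildUserPatch({ nickname: 'demo', tags: [] }, 1000);
```

Taking `now` as a parameter keeps the function deterministic in tests while the route can rely on the `Date.now()` default.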


@@ -16,3 +16,18 @@ program
console.log(showMore(res));
});
program.command('xhs:search-template')
.description('搜索小红书的模板相关参数是keyword默认搜索"网站模板"')
.option('-k, --keyword <string>', '搜索关键词', '网站模板')
.action(async (options) => {
const res = await app.run({
path: 'good',
key: 'searchTemplate',
payload: {
keyword: options.keyword,
}
});
console.log(showMore(res));
});

src/test/zwpy/index.ts Normal file

@@ -0,0 +1,29 @@
import { chromium } from 'playwright';
import { main } from '../../playwright/browser.ts';
import path from 'node:path';
const checkUrl = 'https://pg.zwpyyds.com/pindou'
const userDataDir = path.join(process.cwd(), 'browser-data-zwpy');
// const chromeProcess = await main({
// userDataDir: path.join(process.cwd(), 'browser-data-zwpy'),
// debugPort: 9223,
// });
// await new Promise(resolve => setTimeout(resolve, 3000));
// const browser = await chromium.connectOverCDP('http://localhost:9223');
// const context = browser.contexts()[0];
// const page = context.pages()[0] || await context.newPage();
// await page.goto(checkUrl, { waitUntil: 'networkidle' });
// await page.route('**/*', (route) => {
// const request = route.request();
// console.log(`请求URL: ${request.url()}`);
// route.continue();
// });
const context = await chromium.launchPersistentContext(userDataDir, {
headless: false,
});
const page = context.pages()[0] || await context.newPage();
await page.goto(checkUrl, { waitUntil: 'networkidle' });

typings/note.d.ts vendored

@@ -103,4 +103,125 @@ declare namespace XHS {
hasMore: boolean;
items: T[];
}
}
declare namespace XHS {
/** Share information */
export interface ShareInfo {
/** Whether sharing is disabled */
un_share: boolean;
}
/** Tag */
export interface Tag {
/** Tag ID */
id: string;
/** Tag name */
name: string;
/** Tag type, e.g. topic */
type: 'topic' | string;
}
/** Full note interaction information */
export interface FullInteractInfo {
/** Share count */
share_count: string;
/** Whether the author is followed */
followed: boolean;
/** Relation, e.g. none */
relation: 'none' | 'following' | string;
/** Whether the note is liked */
liked: boolean;
/** Like count */
liked_count: string;
/** Whether the note is collected */
collected: boolean;
/** Collect count */
collected_count: string;
/** Comment count */
comment_count: string;
}
/** Full image information */
export interface FullImageInfo {
/** Image scene, e.g. WB_DFT (default), WB_PRV (preview) */
image_scene: string;
/** Image URL */
url: string;
}
/** Full note image */
export interface FullImage {
/** Image width */
width: number;
/** Image height */
height: number;
/** Image info list (URLs for different scenes) */
info_list: FullImageInfo[];
/** Stream information */
stream: Record<string, unknown>;
/** Whether this is a Live Photo */
live_photo: boolean;
/** File ID */
file_id: string;
/** URL */
url: string;
/** Trace ID */
trace_id: string;
/** Preview URL */
url_pre: string;
/** Default URL */
url_default: string;
}
/** Full note card */
export interface NoteCardDetail {
/** Time */
time: number;
/** Share information */
share_info: ShareInfo;
/** Description */
desc: string;
/** User information */
user: NoteUser;
/** Tag list */
tag_list: Tag[];
/** Interaction information */
interact_info: FullInteractInfo;
/** Image list */
image_list: FullImage[];
/** List of @-mentioned users */
at_user_list: unknown[];
/** Last update time */
last_update_time: number;
/** IP location */
ip_location: string;
/** Note ID */
note_id: string;
/** Type, e.g. normal */
type: 'normal' | 'video' | string;
/** Title */
title: string;
}
/** Note detail (the full note in a Feed response) */
export interface NoteDetail {
/** Note ID */
id: string;
/** Model type, e.g. note */
model_type: 'note' | string;
/** Note card */
note_card: NoteCard;
}
/** Feed response */
export interface FeedResponse {
/** Cursor score */
cursor_score: string;
/** Note list */
items: NoteDetail[];
/** Current time */
current_time: number;
}
}
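
The `FullImage` type above carries several candidate URLs: one per scene in `info_list`, plus `url_default` and `url_pre`. A small sketch of choosing one is given below. `pickImageUrl` is a hypothetical helper, and the local `ImageInfo` and `NoteImage` interfaces are trimmed copies of `FullImageInfo` and `FullImage` so that the block stands alone; the scene name `WB_DFT` is taken from the type's own comment.

```typescript
interface ImageInfo {
  image_scene: string;
  url: string;
}

interface NoteImage {
  info_list: ImageInfo[];
  url_default: string;
  url_pre: string;
}

// Prefer the URL for the requested scene, then fall back to the default and preview URLs.
const pickImageUrl = (image: NoteImage, scene: string = 'WB_DFT'): string => {
  const match = image.info_list.find(info => info.image_scene === scene);
  return match?.url || image.url_default || image.url_pre || '';
};

const demoImage: NoteImage = {
  info_list: [
    { image_scene: 'WB_PRV', url: 'https://example.invalid/preview.jpg' },
    { image_scene: 'WB_DFT', url: 'https://example.invalid/default.jpg' },
  ],
  url_default: 'https://example.invalid/fallback.jpg',
  url_pre: '',
};

const defaultSceneUrl = pickImageUrl(demoImage);
const previewSceneUrl = pickImageUrl(demoImage, 'WB_PRV');
const unknownSceneUrl = pickImageUrl(demoImage, 'UNKNOWN');
```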