I. First, set up the document-processing tool: tika (requires a JDK)
sudo apt install openjdk-11-jdk
or
sudo apt install default-jdk

If the system package cannot be used, download a build manually from https://adoptium.net/zh-CN/, unpack it, and link the java binary into place:

sudo ln -s <install-dir>/jdk-17.0.14+7-jre/bin/java /usr/bin/java

Run the following in a terminal to verify that Java is installed:
java -version

Then install tika:
pip install tika -i https://pypi.tuna.tsinghua.edu.cn/simple

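The Java check above can also be scripted before any document is handed to tika. The following is a minimal sketch, not part of k3GPT itself: `java_version_banner` and `extract_text` are hypothetical helper names. It assumes the `tika` package installed above, whose `parser.from_file` call launches a local Tika server on first use and therefore needs `java` on the PATH.

```python
import shutil
import subprocess
from typing import Optional


def java_version_banner(executable: str = "java") -> Optional[str]:
    """Return the first line printed by `java -version`, or None if unavailable."""
    if shutil.which(executable) is None:
        return None
    try:
        result = subprocess.run(
            [executable, "-version"],
            capture_output=True,
            text=True,
            timeout=30,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    # `java -version` writes its banner to stderr; fall back to stdout if empty.
    output = (result.stderr or result.stdout).strip()
    return output.splitlines()[0] if output else None


def extract_text(path: str) -> str:
    """Extract plain text from a document with tika (assumes `pip install tika`)."""
    # Imported lazily so the Java check above works even without tika installed.
    from tika import parser

    parsed = parser.from_file(path)
    return parsed.get("content") or ""
```

Calling `java_version_banner()` first and only then `extract_text("some-file.docx")` (a placeholder file name) gives a clearer failure than a tika start-up error when Java is missing.
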
II. Document sync and splitting
1. Document sync
pip install smbprotocol -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install paramiko -i https://pypi.tuna.tsinghua.edu.cn/simple


2. Upgrade langchain to 0.3
pip install -U langchain -i https://pypi.tuna.tsinghua.edu.cn/simple

3. Document generation
# docx
sudo apt-get install pandoc
pip install pypandoc -i https://pypi.tuna.tsinghua.edu.cn/simple

# PDF support (this pulls in a fairly large number of dependencies)
sudo apt-get install wkhtmltopdf
pip install pdfkit -i https://pypi.tuna.tsinghua.edu.cn/simple

# ppt support
pip install lxml python-pptx -i https://pypi.tuna.tsinghua.edu.cn/simple

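Because `pip install -U` installs whatever release is newest at the time, it can help to confirm which langchain version actually landed after step 2. This is a small sketch using only the standard library; the helper names `major_minor`, `meets_minimum`, and `installed_langchain_version` are illustrative and not part of langchain or k3GPT.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Parse the leading 'major.minor' of a version string such as '0.3.7'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version: str, minimum: Tuple[int, int] = (0, 3)) -> bool:
    """Return True if `version` is at least `minimum` (major, minor)."""
    return major_minor(version) >= minimum


def installed_langchain_version() -> Optional[str]:
    """Return the installed langchain version, or None if it is not installed."""
    try:
        return metadata.version("langchain")
    except metadata.PackageNotFoundError:
        return None
```

A typical use is `v = installed_langchain_version()` followed by `meets_minimum(v)` when `v` is not None.
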
III. Full-text search
pip install xapian-bindings-binary -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install jieba -i https://pypi.tuna.tsinghua.edu.cn/simple

IV. ORM model
pip install peewee -i https://pypi.tuna.tsinghua.edu.cn/simple

V. FastAPI
pip install fastapi -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install python-multipart -i https://pypi.tuna.tsinghua.edu.cn/simple

pip install "uvicorn[standard]" -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install gunicorn -i https://pypi.tuna.tsinghua.edu.cn/simple

VI. qwen-agent
pip install qwen-agent -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install dotenv -i https://pypi.tuna.tsinghua.edu.cn/simple

VII. Initialization and startup
1. Unpack the k3GPT-Vx.xz archive and change into the extracted directory.
2. The default storage path is /mnt; make sure the user has permission to create and write files there.
sudo chown -R <user>:<group> /mnt
3. Start the three services with the ./start.sh script:
one main service, one service that shares knowledge entities externally, and one service that builds the full-text index.
Started uvicorn with PID: 47382
Started uvicorn with PID: 47384
Started build_full_index.py with PID: 47386


4. Once the services are up, open https://ip:8000/ in a browser.

5. The /stop file stops all services.

6. The tika service starts only when a document is actually processed; upload a file in the File Center to try it.
Verify with ps -ef |grep java:
element+ 169987 1 1 16:43 ? 00:00:01 java -cp /tmp/tika-server.jar org.apache.tika.server.core.TikaServerCli --port 9998 --host localhost
element+ 170024 169987 27 16:43 ? 00:00:20 java -Djava.awt.headless=true -cp /tmp/tika-server.jar -Dtika.server.id=1b8dcb6f-a416-45b2-9e07-5349ae2d61fb org.apache.tika.server.core.TikaServerProcess -h localhost -p 9998 -i 1b8dcb6f-a416-45b2-9e07-5349ae2d61fb -forkedStatusFile /tmp/apache-tika-server-forked-tmp-6459948668656732552 -numRestarts 0
element+ 170272 4933 0 16:44 pts/2 00:00:00 grep --color=auto java

7. External access to a knowledge entity

https://ip:7000/?sn=LOZxAy06AaQ

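Two of the checks in this section, write access to the storage path and whether the tika server is up, can be expressed as a short script instead of being inspected by hand. The sketch below is an illustration under stated assumptions: the helper names `can_create_and_write` and `port_is_listening` are made up for this example, and the port number 9998 is taken from the ps output shown in this section.

```python
import os
import socket
import tempfile


def can_create_and_write(directory: str) -> bool:
    """Return True if a file can be created and written inside `directory`."""
    if not os.path.isdir(directory):
        return False
    try:
        # The temporary file is removed automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=directory) as handle:
            handle.write(b"write test")
        return True
    except OSError:
        return False


def port_is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `can_create_and_write("/mnt")` should be True before running ./start.sh, and `port_is_listening("localhost", 9998)` is expected to become True only after a first document has been processed.
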
VIII. Starting the services individually
First change into the main directory.
1. Start the main web service
uvicorn web:app --host 0.0.0.0 --port 8000
2. Start the service that shares knowledge entities externally
uvicorn webx:app --host 0.0.0.0 --port 7000
3. Build the knowledge base index
python3 build_full_index.py

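The three commands above can also be launched from one Python script. This is only a sketch of the idea and makes no claim about what ./start.sh actually does: `service_commands` and `start_all` are hypothetical names, and the `workdir` default assumes the script is run from the directory that contains main.

```python
import subprocess
from typing import List


def service_commands() -> List[List[str]]:
    """Return the three commands listed above as argument lists."""
    return [
        ["uvicorn", "web:app", "--host", "0.0.0.0", "--port", "8000"],
        ["uvicorn", "webx:app", "--host", "0.0.0.0", "--port", "7000"],
        ["python3", "build_full_index.py"],
    ]


def start_all(workdir: str = "main") -> List[subprocess.Popen]:
    """Start every service in the background and return the process handles."""
    processes = []
    for command in service_commands():
        process = subprocess.Popen(command, cwd=workdir)
        print(f"Started {' '.join(command)} with PID: {process.pid}")
        processes.append(process)
    return processes
```

Keeping the returned handles makes it possible to stop everything later by calling `terminate()` on each process.
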
IX. Excel data analysis
1. Analysis library
pip install polars
2. Libraries for reading and writing Excel
pip install fastexcel
pip install xlsxwriter

X. Poster generation
pip install --upgrade Pillow