After updating to PostgreSQL 18 I get a warning in the Paperless integration

Hi everyone,
I hope this is the right place. Yesterday I managed to move Paperless to PostgreSQL 18, and since then the Paperless integration has been showing the following message:

Can anyone explain what I can do about it?

Your best bet is to look at the paperless-ngx logs; there should be more information there :slight_smile:

There I see one red and one yellow message:

[2025-11-01 12:45:14,378] [INFO] [ocrmypdf._pipeline] skipping all processing on this page

[2025-11-01 12:45:14,384] [INFO] [ocrmypdf._pipelines.ocr] Postprocessing...

[2025-11-01 12:45:14,574] [ERROR] [ocrmypdf._exec.ghostscript] GPL Ghostscript 10.03.1 (2024-05-02)

Copyright (C) 2024 Artifex Software, Inc.  All rights reserved.

This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:

see the file COPYING for details.

Processing pages 1 through 5.

Page 1

warning: ignoring zlib error: incorrect data check

warning: ignoring zlib error: incorrect data check

warning: ignoring zlib error: incorrect data check

warning: ignoring zlib error: incorrect data check

GPL Ghostscript 10.03.1:

Detected SMask which must be in DeviceGray, but we are not converting to DeviceGray, reverting to normal PDF output

Page 2

Page 3

Page 4

Page 5

[2025-11-01 12:45:14,706] [INFO] [ocrmypdf._pipeline] Image optimization ratio: 1.00 savings: 0.1%

[2025-11-01 12:45:14,706] [INFO] [ocrmypdf._pipeline] Total file size ratio: 1.53 savings: 34.5%

[2025-11-01 12:45:14,708] [WARNING] [ocrmypdf._pipelines._common] Output file is a valid PDF, but conversion to PDF/A did not succeed (issue: No PDF/A metadata in XMP)

[2025-11-01 12:45:14,709] [DEBUG] [paperless.parsing.tesseract] Incomplete sidecar file: discarding.

[2025-11-01 12:45:14,803] [INFO] [paperless.parsing.tesseract] pdftotext exited 0

Unfortunately, that doesn't mean anything to me.

After I manually started the task from the System Status page, it switched to "Error", and the log now shows the following:

[2025-11-01 13:06:35,320] [INFO] [paperless.tasks] Saving updated classifier model to /usr/src/paperless/data/classification_model.pickle...

[2025-11-01 13:31:53,322] [INFO] [paperless.sanity_checker] Detected following issue(s) with document #4875, titled ELAC VELA FS 407

[2025-11-01 13:31:53,323] [INFO] [paperless.sanity_checker] Document contains no OCR data

[2025-11-01 13:31:53,323] [INFO] [paperless.sanity_checker] Detected following issue(s) with document #2580, titled Citizen Uhr Seriennummer

[2025-11-01 13:31:53,324] [INFO] [paperless.sanity_checker] Document contains no OCR data

[2025-11-01 13:31:53,324] [INFO] [paperless.sanity_checker] Detected following issue(s) with document #2620, titled ELAC SUB3030 Seriennummer

[2025-11-01 13:31:53,325] [INFO] [paperless.sanity_checker] Document contains no OCR data

[2025-11-01 13:31:53,325] [INFO] [paperless.sanity_checker] Detected following issue(s) with document #5548, titled Antrag Glasfaseranschluss

[2025-11-01 13:31:53,326] [ERROR] [paperless.sanity_checker] Checksum mismatch of archived document. Stored: 281976575fab104a137122e67e0ae4d8, actual: 6f6e157f8004651b412a94a71a83c7a8.

[2025-11-01 13:36:23,141] [INFO] [paperless.sanity_checker] Detected following issue(s) with document #4875, titled ELAC VELA FS 407

[2025-11-01 13:36:23,142] [INFO] [paperless.sanity_checker] Document contains no OCR data

[2025-11-01 13:36:23,142] [INFO] [paperless.sanity_checker] Detected following issue(s) with document #2580, titled Citizen Uhr Seriennummer

[2025-11-01 13:36:23,143] [INFO] [paperless.sanity_checker] Document contains no OCR data

[2025-11-01 13:36:23,143] [INFO] [paperless.sanity_checker] Detected following issue(s) with document #2620, titled ELAC SUB3030 Seriennummer

[2025-11-01 13:36:23,144] [INFO] [paperless.sanity_checker] Document contains no OCR data

[2025-11-01 13:36:23,144] [INFO] [paperless.sanity_checker] Detected following issue(s) with document #5548, titled Antrag Glasfaseranschluss

[2025-11-01 13:36:23,145] [ERROR] [paperless.sanity_checker] Checksum mismatch of archived document. Stored: 281976575fab104a137122e67e0ae4d8, actual: 6f6e157f8004651b412a94a71a83c7a8.
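
The ERROR line says that the checksum Paperless stored for the archived version of document #5548 no longer matches the file on disk. Both values are 32 hex digits, which is consistent with MD5, so one way to double-check by hand is to hash the archived file yourself. This is a minimal sketch that assumes Paperless uses MD5 for these checksums and that you can reach the archived PDF from the host; the script name and the path you pass in are up to you, and the script only reads the file.

```python
import hashlib
import sys
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large PDFs are never loaded into memory at once."""
    digest = hashlib.md5()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, stored):
    """Compare the file's MD5 with the 'Stored:' value from the sanity checker log."""
    return md5_of_file(path) == stored.strip().lower()


# Usage (hypothetical script name): python check_md5.py <archived-file> <stored-checksum>
if __name__ == "__main__" and len(sys.argv) == 3:
    print("match" if checksum_matches(sys.argv[1], sys.argv[2]) else "MISMATCH")
```

If the hash of the file equals the "actual" value from the log, the file on disk simply differs from what was recorded when the archive version was created.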

I once read that after a PostgreSQL major version change the index no longer matches. The recommendation there was to always stay on the current major version, or else to export the data and import it again so that the index is rebuilt.
With Docker, that means pinning the PostgreSQL version and NOT using latest.
For that reason I switched my database to MariaDB. So far it has come through every version change without any problems.

That's exactly what I did. At first I had problems because the database path had apparently moved, and then there was also an umlaut problem. But I managed to solve both thanks to AI and the web. After that everything actually ran without errors, until today when I noticed the message in the integration.

With MariaDB you wouldn't have these problems.

Here's my docker-compose.yaml:

# docker compose file for running paperless from the Docker Hub.
# This file contains everything paperless needs to run.
# Paperless supports amd64, arm and arm64 hardware.
#
# All compose files of paperless configure paperless in the following way:
#
# - Paperless is (re)started on system boot, if it was running before shutdown.
# - Docker volumes for storing data are managed by Docker.
# - Folders for importing and exporting files are created in the same directory
#   as this file and mounted to the correct folders inside the container.
# - Paperless listens on port 8000.
#
# In addition to that, this Docker Compose file adds the following optional
# configurations:
#
# - Instead of SQLite (default), MariaDB is used as the database server.
# - Apache Tika and Gotenberg servers are started with paperless and paperless
#   is configured to use these services. These provide support for consuming
#   Office documents (Word, Excel, Power Point and their LibreOffice
#   counterparts).
#
# To install and update paperless with this file, do the following:
#
# - Copy this file as 'docker-compose.yml' and the files 'docker-compose.env'
#   and '.env' into a folder.
# - Run 'docker compose pull'.
# - Run 'docker compose run --rm webserver createsuperuser' to create a user.
# - Run 'docker compose up -d'.
#
# For more extensive installation and update instructions, refer to the
# documentation.

services:
  broker:
    image: docker.io/library/redis:7
    restart: unless-stopped
    volumes:
      - redisdata:/data

  db:
    image: docker.io/library/mariadb:11
    restart: unless-stopped
    volumes:
      - dbdata:/var/lib/mysql
    environment:
      MARIADB_HOST: paperless
      MARIADB_DATABASE: paperless
      MARIADB_USER: paperless
      MARIADB_PASSWORD: paperless
      MARIADB_ROOT_PASSWORD: paperless

  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    depends_on:
      - db
      - broker
      - gotenberg
      - tika
    ports:
      - "8000:8000"
    volumes:
      - data:/usr/src/paperless/data
      - media:/usr/src/paperless/media
      - ./export:/usr/src/paperless/export
      - ./consume:/usr/src/paperless/consume
    env_file: docker-compose.env
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBENGINE: mariadb
      PAPERLESS_DBHOST: db
      PAPERLESS_DBUSER: paperless # only needed if non-default username
      PAPERLESS_DBPASS: paperless # only needed if non-default password
      PAPERLESS_DBPORT: 3306
      PAPERLESS_TIKA_ENABLED: 1
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
      PAPERLESS_TIKA_ENDPOINT: http://tika:9998

  gotenberg:
    image: docker.io/gotenberg/gotenberg:8.7
    restart: unless-stopped
    # The gotenberg chromium route is used to convert .eml files. We do not
    # want to allow external content like tracking pixels or even javascript.
    command:
      - "gotenberg"
      - "--chromium-disable-javascript=true"
      - "--chromium-allow-list=file:///tmp/.*"

  tika:
    image: docker.io/apache/tika:latest
    restart: unless-stopped

volumes:
  data:
  media:
  dbdata:
  redisdata:

And here's the docker-compose.env:

# The UID and GID of the user used to run paperless in the container. Set this
# to your UID and GID on the host so that you have write access to the
# consumption directory.
#USERMAP_UID=1000
#USERMAP_GID=1000

# Additional languages to install for text recognition, separated by a
# whitespace. Note that this is
# different from PAPERLESS_OCR_LANGUAGE (default=eng), which defines the
# language used for OCR.
# The container installs English, German, Italian, Spanish and French by
# default.
# See https://packages.debian.org/search?keywords=tesseract-ocr-&searchon=names&suite=buster
# for available languages.
#PAPERLESS_OCR_LANGUAGES=tur ces

###############################################################################
# Paperless-specific settings                                                 #
###############################################################################

# All settings defined in the paperless.conf.example can be used here. The
# Docker setup does not use the configuration file.
# A few commonly adjusted settings are provided below.

# This is required if you will be exposing Paperless-ngx on a public domain
# (if doing so please consider security measures such as reverse proxy)
#PAPERLESS_URL=https://paperless.example.com

# Adjust this key if you plan to make paperless available publicly. It should
# be a very long sequence of random characters. You don't need to remember it.
#PAPERLESS_SECRET_KEY=change-me

# Use this variable to set a timezone for the Paperless Docker containers. If not specified, defaults to UTC.
#PAPERLESS_TIME_ZONE=America/Los_Angeles

# The default language to use for OCR. Set this to the language most of your
# documents are written in.
#PAPERLESS_OCR_LANGUAGE=eng

# Set if accessing paperless via a domain subpath e.g. https://domain.com/PATHPREFIX and using a reverse-proxy like traefik or nginx
#PAPERLESS_FORCE_SCRIPT_NAME=/PATHPREFIX
#PAPERLESS_STATIC_URL=/PATHPREFIX/static/ # trailing slash required

There's already a version 12 by now. How do you handle the update?

That line should be active (uncommented) and set to deu.

You don't always need the newest version. For a database in particular, that isn't so important at first. Otherwise, just adjust the compose file :slight_smile:

The same applies here. Just adjust it to your needs.

I have set the OCR language to DEU in the configuration. But I can also change it to German here.

I'll take a snapshot of my server and switch MariaDB to 12.

I'll post the result here.

The question was "How do you handle the update?" :wink:

In the compose YAML I set the MariaDB image tag to latest.

Then:
docker compose down
docker compose pull
docker compose up -d
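
For anyone who wants to script this, here is a minimal Python sketch that runs the same three commands in order from the directory containing the compose file and stops at the first failure. The function name and the `dry_run` switch are hypothetical conveniences, not Docker features; taking a snapshot or backup first, as planned above, is still up to you.

```python
import subprocess
from pathlib import Path

# The three commands from the post, in the same order.
UPDATE_STEPS = (
    ("docker", "compose", "down"),
    ("docker", "compose", "pull"),
    ("docker", "compose", "up", "-d"),
)


def update_stack(project_dir=".", dry_run=False):
    """Run the update steps in project_dir; stop at the first failing command."""
    executed = []
    for step in UPDATE_STEPS:
        if dry_run:
            print("would run:", " ".join(step))
        else:
            # check=True raises CalledProcessError, so if "pull" fails the stack
            # stays down and "up -d" is never attempted with a half-pulled set.
            subprocess.run(step, cwd=Path(project_dir), check=True)
        executed.append(list(step))
    return executed


if __name__ == "__main__":
    update_stack(dry_run=True)
```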

Is that the hint you meant, or did I misunderstand you?

I did understand :slight_smile:
I explained it here, after all:

If you need more detail, @MartyBr described it well.

Yes, now I get it. Thanks to both of you!

I've now set MariaDB to "latest" and updated from 11 to 12. My Paperless started without complaint and all the data is there.

The index stays compatible, which is why this works with MariaDB. With PostgreSQL it's a huge amount of work: either pin the version or do an export and import.
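
With PostgreSQL, the deciding question is whether an upgrade crosses a major version: a data directory created by one major version cannot simply be started by the next, so it needs `pg_upgrade` or a dump and restore. Since PostgreSQL 10 the major version is the first number (17 to 18 is a major upgrade, 18.0 to 18.1 is not); before 10 it was the first two numbers (9.5 to 9.6). The hypothetical helper below encodes only that numbering rule for plain numeric versions or image tags; it does not inspect an actual data directory.

```python
def pg_major(version: str) -> str:
    """Major version per PostgreSQL's numbering: '18' for 18.x, '9.6' for 9.6.x."""
    parts = version.strip().split(".")
    if int(parts[0]) >= 10:
        return parts[0]
    return ".".join(parts[:2])


def needs_major_upgrade(old: str, new: str) -> bool:
    """True when old -> new crosses a major version (pg_upgrade or dump/restore needed)."""
    return pg_major(old) != pg_major(new)
```

For example, `needs_major_upgrade("17.6", "18.0")` is `True`, while `needs_major_upgrade("18.0", "18.1")` is `False`.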

So, the installation with MariaDB went smoothly using the script.

Now I wanted to import the export, and I'm getting errors.

Maybe you can help me with that too:

paperless@paperless2:~/paperless-ngx$ docker compose exec webserver document_importer ../export
Found existing user(s), this might indicate a non-empty installation
Checking the manifest
Database import failed
No version information present
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/django/db/backends/utils.py", line 105, in _execute
    return self.cursor.execute(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/backends/mysql/base.py", line 76, in execute
    return self.cursor.execute(query, args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/MySQLdb/cursors.py", line 179, in execute
    res = self._query(mogrified_query)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/MySQLdb/cursors.py", line 331, in _query
    self._do_get_result(db)
  File "/usr/local/lib/python3.12/site-packages/MySQLdb/cursors.py", line 136, in _do_get_result
    self._result = result = self._get_result()
                            ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/MySQLdb/cursors.py", line 363, in _get_result
    return self._get_db().store_result()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
MySQLdb.IntegrityError: (1062, "Duplicate entry 'Diverse/2025/04/2025-04-13 Rechnung-Quittung & JULIA - DAS MU...' for key 'archive_filename'")

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/paperless/src/manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python3.12/site-packages/django/core/management/__init__.py", line 442, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python3.12/site-packages/django/core/management/__init__.py", line 436, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python3.12/site-packages/django/core/management/base.py", line 416, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/local/lib/python3.12/site-packages/django/core/management/base.py", line 460, in execute
    output = self.handle(*args, **options)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/paperless/src/documents/management/commands/document_importer.py", line 246, in handle
    self._run_import()
  File "/usr/src/paperless/src/documents/management/commands/document_importer.py", line 288, in _run_import
    self.load_data_to_database()
  File "/usr/src/paperless/src/documents/management/commands/document_importer.py", line 226, in load_data_to_database
    raise e
  File "/usr/src/paperless/src/documents/management/commands/document_importer.py", line 207, in load_data_to_database
    call_command("loaddata", manifest_path)
  File "/usr/local/lib/python3.12/site-packages/django/core/management/__init__.py", line 194, in call_command
    return command.execute(*args, **defaults)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/core/management/base.py", line 460, in execute
    output = self.handle(*args, **options)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/core/management/commands/loaddata.py", line 103, in handle
    self.loaddata(fixture_labels)
  File "/usr/local/lib/python3.12/site-packages/django/core/management/commands/loaddata.py", line 164, in loaddata
    self.load_label(fixture_label)
  File "/usr/local/lib/python3.12/site-packages/django/core/management/commands/loaddata.py", line 254, in load_label
    if self.save_obj(obj):
       ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/core/management/commands/loaddata.py", line 210, in save_obj
    obj.save(using=self.using)
  File "/usr/local/lib/python3.12/site-packages/django/core/serializers/base.py", line 265, in save
    models.Model.save_base(self.object, using=using, raw=True, **kwargs)
  File "/usr/local/lib/python3.12/site-packages/django/db/models/base.py", line 1008, in save_base
    updated = self._save_table(
              ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/models/base.py", line 1169, in _save_table
    results = self._do_insert(
              ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/models/base.py", line 1210, in _do_insert
    return manager._insert(
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/models/manager.py", line 87, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/models/query.py", line 1868, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/models/sql/compiler.py", line 1882, in execute_sql
    cursor.execute(sql, params)
  File "/usr/local/lib/python3.12/site-packages/django/db/backends/utils.py", line 79, in execute
    return self._execute_with_wrappers(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/backends/utils.py", line 92, in _execute_with_wrappers
    return executor(sql, params, many, context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/backends/utils.py", line 100, in _execute
    with self.db.wrap_database_errors:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/usr/local/lib/python3.12/site-packages/django/db/backends/utils.py", line 105, in _execute
    return self.cursor.execute(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/backends/mysql/base.py", line 76, in execute
    return self.cursor.execute(query, args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/MySQLdb/cursors.py", line 179, in execute
    res = self._query(mogrified_query)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/MySQLdb/cursors.py", line 331, in _query
    self._do_get_result(db)
  File "/usr/local/lib/python3.12/site-packages/MySQLdb/cursors.py", line 136, in _do_get_result
    self._result = result = self._get_result()
                            ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/MySQLdb/cursors.py", line 363, in _get_result
    return self._get_db().store_result()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
django.db.utils.IntegrityError: Problem installing fixture '/usr/src/paperless/export/manifest.json': Could not load documents.Document(pk=5419): (1062, "Duplicate entry 'Diverse/2025/04/2025-04-13 Rechnung-Quittung & JULIA - DAS MU...' for key 'archive_filename'")
paperless@paperless2:~/paperless-ngx$ 

This is my yml:

services:
  broker:
    image: docker.io/library/redis:8
    restart: unless-stopped
    volumes:
      - /data/paperless/redisdata:/data
  db:
    image: docker.io/library/mariadb:latest
    restart: unless-stopped
    volumes:
      - /data/paperless/dbdata:/var/lib/mysql
    environment:
      MARIADB_HOST: paperless
      MARIADB_DATABASE: paperless
      MARIADB_USER: paperless
      MARIADB_PASSWORD: paperless
      MARIADB_ROOT_PASSWORD: paperless

  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    depends_on:
      - db
      - broker
      - gotenberg
      - tika
    ports:
      - "8000:8000"
    volumes:
      - /data/paperless/data:/usr/src/paperless/data
      - /data/paperless/media:/usr/src/paperless/media
      - /data/paperless/export:/usr/src/paperless/export
      - /data/paperless/consume:/usr/src/paperless/consume
    env_file: docker-compose.env
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBENGINE: mariadb
      PAPERLESS_DBHOST: db
      PAPERLESS_DBUSER: paperless # only needed if non-default username
      PAPERLESS_DBPASS: paperless # only needed if non-default password
      PAPERLESS_DBPORT: 3306
      PAPERLESS_TIKA_ENABLED: 1
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
      PAPERLESS_TIKA_ENDPOINT: http://tika:9998

  gotenberg:
    image: docker.io/gotenberg/gotenberg:8.24
    restart: unless-stopped
    # The gotenberg chromium route is used to convert .eml files. We do not
    # want to allow external content like tracking pixels or even javascript.
    command:
      - "gotenberg"
      - "--chromium-disable-javascript=true"
      - "--chromium-allow-list=file:///tmp/.*"

  tika:
    image: docker.io/apache/tika:latest
    restart: unless-stopped

volumes:
  redisdata:


Something seems to be wrong with the file.
What exactly are you trying to export?

It happens during the import into Paperless with MariaDB.

I searched for the files in the ZIP and found the following:

There seems to be a problem with distinguishing between lowercase and uppercase letters.

The files were also extracted like this in the export folder:

So it seems the import has a problem with them.
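
One plausible explanation, not confirmed in this thread: PostgreSQL compares strings case-sensitively, so two documents whose `archive_filename` differs only in letter case can coexist there, while MariaDB's default collations are case-insensitive (some are also accent-insensitive), so the unique key on `archive_filename` rejects the second one with error 1062. The sketch below scans an export's `manifest.json` for such collisions before importing. It assumes the manifest is a Django-style fixture list whose document entries carry `"model": "documents.document"` and a `fields` dict with `filename` and `archive_filename`; `casefold()` only approximates MariaDB's collation rules.

```python
import json
import sys
import unicodedata
from collections import defaultdict
from pathlib import Path


def collation_key(name: str, fold_accents: bool = False) -> str:
    """Approximate a case-insensitive (optionally accent-insensitive) collation."""
    key = name.casefold()
    if fold_accents:
        # Decompose "ü" into "u" + combining diaeresis, then drop the combining marks.
        decomposed = unicodedata.normalize("NFKD", key)
        key = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return key


def find_collisions(records, field="archive_filename", fold_accents=False):
    """Group document records whose `field` values collide under collation_key."""
    groups = defaultdict(list)
    for rec in records:
        if rec.get("model") != "documents.document":
            continue
        value = rec.get("fields", {}).get(field)
        if not value:  # skip documents without this filename (e.g. no archive version)
            continue
        groups[collation_key(value, fold_accents)].append((rec.get("pk"), value))
    return {key: hits for key, hits in groups.items() if len(hits) > 1}


def load_manifest(path):
    # Assumption: manifest.json is a JSON list of fixture-style records.
    return json.loads(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__" and len(sys.argv) > 1:
    records = load_manifest(sys.argv[1])
    for field in ("filename", "archive_filename"):
        for hits in find_collisions(records, field).values():
            print(f"{field}: " + ", ".join(f"pk={pk} {value!r}" for pk, value in hits))
```

Run it as `python find_collisions.py /data/paperless/export/manifest.json` (script name hypothetical); each printed line lists the document pks whose filenames would collide on MariaDB.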