Index and Search Every File on Your Homelab Server with Sist2: The Ultimate Guide

Cataloging and searching the files stored on a server matters most when you manage large collections of documents such as PDFs. A searchable index can save hours of manual searching, which makes Sist2 a useful tool for homelab enthusiasts, data hoarders, and IT professionals alike.

What Is Sist2?

Sist2 is a file indexing and search tool written in C with a Vue.js web interface. It uses Elasticsearch as its search backend, so full-text queries stay fast whether you're managing thousands of PDFs, images, or archives.

For example, I once struggled with a 4,000+ PDF collection of iFixit repair manuals shared on Reddit. Without a proper indexing tool, finding specific content was nearly impossible—until I discovered Sist2.

Running in Docker, Sist2 indexed every file with OCR (Optical Character Recognition) enabled, so I could search both filenames and document contents.

Key Features of Sist2

  • Fast & Efficient Scanning – Multi-threaded, low-memory indexing
  • Multi-Platform Support – Works seamlessly across different environments
  • OCR Integration – Powered by Tesseract for text extraction
  • Incremental Scanning – Only updates changed files, saving time
  • Archive Support – Recursively scans inside ZIP, RAR, and other archives
  • Tagging & Scripting – Automate tagging via UI or custom scripts
  • Visual Analytics – Disk usage stats and file type breakdowns
  • Named-Entity Recognition (NER) – Identifies people, places, and organizations
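
To make the incremental scanning idea from the list above concrete, here is a minimal shell sketch of timestamp-based change detection. It is not Sist2's implementation: the function name changed_since and the marker-file approach are assumptions made purely for illustration. It lists the files modified after a marker file that a previous run left behind.

```shell
#!/bin/sh
# Illustration only: NOT how Sist2 works internally.
# changed_since MARKER DIR
#   Prints every regular file under DIR whose modification time is newer
#   than that of MARKER (a file touched at the end of the previous run).
changed_since() {
    marker="$1"
    dir="$2"
    find "$dir" -type f -newer "$marker"
}
```

A run would call changed_since with the marker and the document directory, process the listed files, and then touch the marker so the next run only sees later changes.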

How to Install Sist2 Using Docker Compose

Setting up Sist2 is straightforward with Docker Compose. Here’s a step-by-step guide:

1. Configure docker-compose.yml

services:
  elasticsearch:
    image: elasticsearch:7.17.9
    restart: unless-stopped
    volumes:
      # This directory must have 1000:1000 permissions (or update PUID & PGID below)
      - /docker/sist2/sist2-es-data:/usr/share/elasticsearch/data
    environment:
      - "discovery.type=single-node"
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
      - "PUID=1000"
      - "PGID=1000"
  sist2-admin:
    image: sist2app/sist2:x64-linux
    restart: unless-stopped
    volumes:
      - /docker/sist2/sist2-admin-data:/sist2-admin
      - /Manuals/iFixit:/ifixit
    ports:
      # Search frontend
      - 4090:4090
      # Admin interface. NOTE: Don't expose this port publicly!
      - 8080:8080
    working_dir: /root/sist2-admin/
    entrypoint: python3
    command:
      - /root/sist2-admin/sist2_admin/app.py
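
The comment in the compose file asks for 1000:1000 ownership on the Elasticsearch data directory. A minimal sketch of preparing the host directories before the first start could look like this. The function name prepare_sist2_dirs is hypothetical, and /docker/sist2 is simply the base path used in the compose file above.

```shell
#!/bin/sh
# prepare_sist2_dirs [BASE]
#   Creates the two host directories that the compose file bind-mounts:
#   BASE/sist2-es-data and BASE/sist2-admin-data.
#   BASE defaults to /docker/sist2, the path used in the compose file.
prepare_sist2_dirs() {
    base="${1:-/docker/sist2}"
    mkdir -p "$base/sist2-es-data" "$base/sist2-admin-data" || return 1
    # The compose file's comment asks for 1000:1000 ownership on the
    # Elasticsearch data directory. Changing ownership requires root.
    if [ "$(id -u)" -eq 0 ]; then
        chown -R 1000:1000 "$base/sist2-es-data"
    else
        echo "Not root: run 'sudo chown -R 1000:1000 $base/sist2-es-data'" >&2
    fi
}
```

Calling prepare_sist2_dirs with no argument as root (or adapting the two commands by hand with sudo) sets up the default layout. Pass a different base path if your volumes live elsewhere.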

2. Start the Containers

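Run the following from the directory that contains docker-compose.yml (with Docker Compose v2, the equivalent command is docker compose up -d):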

docker-compose up -d
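
The containers can take a little while to come up, so it may help to wait until the admin interface answers before moving on. The sketch below polls a URL with curl. The function name wait_for_http is hypothetical, and it assumes the admin interface on port 8080 answers a plain GET with a success status.

```shell
#!/bin/sh
# wait_for_http URL [ATTEMPTS]
#   Tries URL up to ATTEMPTS times (default 60), pausing one second between
#   tries. Returns 0 as soon as curl gets a successful HTTP response,
#   or 1 if every attempt fails.
wait_for_http() {
    url="$1"
    remaining="${2:-60}"
    while [ "$remaining" -gt 0 ]; do
        # -s: silent, -f: treat HTTP errors as failure,
        # -o /dev/null: discard the body, --max-time: cap each attempt at 2 s
        if curl -sf -o /dev/null --max-time 2 "$url"; then
            return 0
        fi
        remaining=$((remaining - 1))
        sleep 1
    done
    return 1
}
```

For example, wait_for_http http://your-server-ip:8080 && echo "admin interface is up". If it times out, docker-compose ps and docker-compose logs show whether a container failed to start.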

3. Create Your First Indexing Job

  1. Open the Sist2 admin interface at http://your-server-ip:8080.
  2. Click “Create Job” and name it (e.g., “ifixit”).
  3. Set the Path to the directory as it is mounted inside the container (e.g., /ifixit, which maps to /Manuals/iFixit on the host).
  4. Adjust settings (e.g., disable OCR for faster indexing).
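
Before indexing, it can be useful to know roughly how many documents the job should find, so the result can be sanity-checked afterwards. A minimal sketch, assuming a find that supports -iname (GNU and BSD find both do) and file names without embedded newlines; the function name count_pdfs is hypothetical.

```shell
#!/bin/sh
# count_pdfs DIR
#   Prints the number of regular files under DIR (including subdirectories)
#   whose name ends in .pdf, ignoring case.
#   Assumes file names contain no newline characters.
count_pdfs() {
    find "$1" -type f -iname '*.pdf' | wc -l | tr -d ' '
}
```

Running count_pdfs /Manuals/iFixit on the host gives a figure to compare against what the job reports once indexing finishes.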

4. Start Indexing

Click “Index Now” to begin scanning. Monitor progress under the “Tasks” tab.

5. Launch the Frontend

  1. Go to the Frontend tab.
  2. Select Elasticsearch as the backend.
  3. Choose your indexed job (e.g., “ifixit”).
  4. Click “Start” and access the UI at http://your-server-ip:4090.

Why Sist2 Stands Out

  • Fast Search – Elasticsearch-backed queries with fuzzy matching that tolerates typos.
  • Minimal Setup – Docker makes deployment effortless.
  • Customizable – Fine-tune indexing and search preferences.
  • Lightweight – Low resource consumption for homelab setups.

Final Thoughts

Sist2 is a game-changer for self-hosted file indexing. It solved my PDF search woes effortlessly, and I’m confident it can do the same for you.

Show Your Support – Star the Sist2 GitHub repo to help the project grow!
