4CAT
4CAT on Rahti 2
These instructions are meant for building the 4CAT tool [1] on the CSC service Rahti 2. Rahti 2 documentation: Rahti.
4CAT is designed to be built and run with Docker. It consists of three containers: the database, the frontend and the backend. These instructions assume that Docker Desktop and the OpenShift CLI (oc) are installed.
Build the image
Unfortunately, the current 4CAT setup leads to permission errors in Rahti 2. Therefore, the image needs to be built locally after changing two files: docker/Dockerfile and docker/docker-entrypoint.sh. Adapted versions of these files are available on this GitHub repository: 4cat_fi_rahti_files.
- First, create a project in Rahti 2.
- Download and extract the most recent version of 4CAT from the digitalmethodsinitiative GitHub repository. Alternatively, clone the repository to your local machine (note that the latest code might not be stable).
- Modify the Dockerfile so that downloading the NLTK package does not result in errors. This entails changing the group ownership of the relevant directories to root and granting the group write access to them.
- In the docker-entrypoint.sh file, add: PGPASSWORD=$POSTGRES_PASSWORD before any psql command (this is needed if you choose to have a separate pod for the db).
- From the terminal, enter the directory and build the image using the following command (change IMAGE-NAME to the name of your image): docker build -t IMAGE-NAME -f docker/Dockerfile .
- Now, tag and push the image to the Rahti 2 registry. Instructions are here.
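The entrypoint change and the build, tag and push steps above can be sketched as shell functions. This is a minimal sketch, not the adapted files from the repository: the registry host image-registry.apps.2.rahti.csc.fi, the helper names prefix_psql, image_ref and build_and_push, and the default tag latest are all assumptions. Check the registry address shown for your own project, and log in to the registry as described in the linked instructions before pushing.

```shell
#!/bin/sh
# Sketch only: helper names, registry host and default tag are assumptions.

# Assumed Rahti 2 registry host; override REGISTRY if yours differs.
REGISTRY="${REGISTRY:-image-registry.apps.2.rahti.csc.fi}"

# Read a shell script on stdin and write it to stdout with
# PGPASSWORD=$POSTGRES_PASSWORD inserted before every psql call.
# Not idempotent: run it once, on the unmodified file.
prefix_psql() {
    sed -E 's/(^|[[:space:]])psql([[:space:]])/\1PGPASSWORD=$POSTGRES_PASSWORD psql\2/g'
}

# Print the full image reference REGISTRY/PROJECT/IMAGE:TAG.
image_ref() {
    printf '%s/%s/%s:%s\n' "$REGISTRY" "$1" "$2" "${3:-latest}"
}

# Build the image from the 4CAT directory, then tag and push it.
# Requires a prior login to the registry.
build_and_push() {
    project="$1"
    image="$2"
    ref="$(image_ref "$project" "$image")"
    docker build -t "$image" -f docker/Dockerfile . &&
        docker tag "$image" "$ref" &&
        docker push "$ref"
}
```

For example, prefix_psql < docker/docker-entrypoint.sh > docker-entrypoint.patched.sh writes a patched copy that can be compared with the original using diff before replacing it, since the substitution also touches the word psql in comments or messages. build_and_push PROJECT-NAME IMAGE-NAME then runs the three Docker commands in order and stops at the first failure.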
Deploy the image
In order to deploy the 4CAT image, the appropriate yaml files are needed. They are available on this GitHub repository: 4cat_fi_rahti_files.
Alternatively, you can write your own files. In that case, the kompose convert command can be helpful (the instructions for installing kompose are here); note, however, that the generated files will need extensive rewriting. More information is provided below.
The configuration proposed here creates one pod for the backend and frontend containers and one pod for the database. If you wish to run all containers in the same pod, you can make the appropriate changes. Running the frontend and the backend in separate pods, however, is likely to require significant changes to the 4CAT code and is therefore not advisable.
The files are the following:
- 4cat-deployment.yaml. This file (of kind: Deployment) contains two containers: 4cat-backend and 4cat-frontend. The backend executes the command "docker/docker-entrypoint.sh" and exposes port 4444, while the frontend executes "docker/wait-for-backend.sh" and exposes ports 5000 and 8443. They share three volumes (config-volume, data-volume and logs-volume) and use the same image. If you are planning on using the file provided on GitHub, make sure to replace 'image-name' with the actual image you intend to use.
- config-volume.yaml, data-volume.yaml and logs-volume.yaml. These are the PersistentVolumeClaims that are mounted on the 4cat pod. Their access mode is ReadWriteOnce.
- db-deployment.yaml. This deployment file is used for the database. In our current configuration, its image is: 'quay.io/sclorg/postgresql-15-c9s'. If you created the file using kompose convert, you might have to fix the livenessProbe command to ["pg_isready", "-U", "fourcat"]. It has its own persistent volume, db-volume.
- db-volume.yaml. The PersistentVolumeClaim mounted on the db pod.
- env-configmap.yaml. This file contains all the environment variables from the .env file, but some adjustments are needed. Namely, change the API_HOST value to API_HOST: "0.0.0.0" and change TELEGRAM_PORT to TELEGRAM_PORT: "8443".
- db-service.yaml. This is the Service for the PostgreSQL database. It exposes port 5432 and allows the other pod to communicate with the database.
- frontend-service.yaml. Two ports are configured in this service. It maps the port 80 to the targetPort 5000 and it configures the Telegram port (8443).
- frontend-route.yaml. This is the file used to expose 4CAT to external traffic. Its target port is the name of the http port configured in the frontend service.
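The relationship between the frontend Service and the Route described above can be illustrated with a small shell function that prints both manifests. This is only a sketch and not the content of the files in the repository: the resource name frontend, the selector label app: 4cat, the port name telegram and the edge TLS termination are assumptions and must be adapted to match your own deployment.

```shell
#!/bin/sh
# Sketch only: resource names, the "app: 4cat" selector label and the TLS
# setting are assumptions; they must match your own 4cat-deployment.yaml.

# Print a Service and a Route for the frontend as one multi-document YAML.
frontend_manifests() {
    cat <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: frontend
spec:
  selector:
    app: 4cat
  ports:
    - name: http
      port: 80
      targetPort: 5000
    - name: telegram
      port: 8443
      targetPort: 8443
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: frontend
spec:
  to:
    kind: Service
    name: frontend
  port:
    targetPort: http   # the name of the Service port, not its number
  tls:
    termination: edge  # assumption: TLS is terminated at the router
EOF
}
```

The output can be reviewed on screen or piped to the cluster with frontend_manifests | oc apply -f -. Referring to the port by name in the Route means the Service port numbers can change without touching the Route.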
Once all these files are ready, add them to the project using the command: oc apply -f <myfilename.yaml>
To apply them all at once, you can use: oc apply -f .
Note, however, that this command applies every .yaml file in your current directory.
Once all files are applied, the deployment should begin correctly. Check the pod logs for all containers and inspect the events for possible errors or misconfigurations. The 4CAT URL is available in the Route details.
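Applying the files and checking the result can be sketched with a few helper functions. The function names, the deployment name 4cat and the route name frontend are assumptions; the container names 4cat-backend and 4cat-frontend are the ones used in the deployment file described above.

```shell
#!/bin/sh
# Sketch only: the deployment name "4cat" and route name "frontend" are
# assumptions; use the names from your own YAML files.

# Apply every .yaml file in a directory, one at a time.
apply_all() {
    for f in "${1:-.}"/*.yaml; do
        [ -e "$f" ] || continue   # the directory holds no .yaml files
        oc apply -f "$f" || return 1
    done
}

# Show pod status, the logs of both 4CAT containers and recent events.
inspect_deployment() {
    deployment="${1:-4cat}"
    oc get pods
    oc logs "deployment/$deployment" -c 4cat-backend --tail=50
    oc logs "deployment/$deployment" -c 4cat-frontend --tail=50
    oc get events --sort-by=.lastTimestamp
}

# Print the host name under which the route exposes 4CAT.
route_host() {
    oc get route "${1:-frontend}" -o jsonpath='{.spec.host}'
}
```

Unlike oc apply -f ., apply_all stops at the first file that fails, which makes it easier to see which manifest is misconfigured.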
Note that, with the current configuration, it is not possible to upgrade to a new version using the 4CAT interface.
4CAT on Pouta
These instructions are meant for building the 4CAT tool on the CSC service Pouta. Pouta documentation: Pouta.
Create an instance
The following steps are a short summary of how to create a virtual machine on Pouta. More detailed information is available in the CSC docs: Creating a Virtual Machine on Pouta and Connecting the Virtual Machine. If you are already familiar with these steps, you can move on to the next section.
- Log in to Pouta. Create an SSH key pair on Pouta, save it on your local machine and protect it with a passphrase. Make a public key. Key pairs cannot be added or modified after the instance has been created; however, you can still give other users access to the VM.
- Under “Networks”, create a new Security Group. Edit its rules and open port 22 (SSH) to your own public IP address (alternatively to your work or university network, e.g. use 128.214.0.0/16 for access from the Helsinki University network). The Security Groups can be modified at any point in the life of the virtual machine.
- When launching a new instance, choose the flavor (4CAT has already been tested on “standard.medium”; while there are some constraints, it is possible to change the flavor later according to usage and requirements) and a base image (e.g. Ubuntu 20.04).
- Once the instance has been created, allocate a floating IP and associate it with your instance. The floating IP will allow you to connect to your VM.
- Check the DNS name using the following command: host -a <floating IP address>
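The last two steps can be sketched as shell helpers: one extracts the DNS name from a reverse lookup and one opens the SSH session. This is a sketch under assumptions: the function names are hypothetical, dns_name uses the plain host command (without -a) because its one-line output is easier to parse, and ubuntu is assumed to be the default user of the Ubuntu base image.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.

# Print the DNS name that a reverse lookup returns for an IP address.
# Parses the "... domain name pointer NAME." line printed by host.
dns_name() {
    host "$1" | awk '/domain name pointer/ { sub(/\.$/, "", $NF); print $NF; exit }'
}

# Open an SSH session to the VM.
# $1 = floating IP, $2 = path to the private key.
# "ubuntu" is the assumed default user of Ubuntu images.
connect() {
    ssh -i "$2" "ubuntu@$1"
}
```

The name printed by dns_name is the value needed later for the NGINX server_name directive and for the certificate request.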
Build 4CAT on Pouta
- Clone the GitHub repository on the VM, or copy your own 4CAT directory to it using the scp command.
- Modify the .env file so that PUBLIC_PORT = 8080 and TELEGRAM_PORT = 5443 (this will prevent issues with Nginx).
- Build and start the docker containers from the 4cat directory using:
sudo docker compose -f docker-compose_build.yml build
sudo docker compose -f docker-compose_build.yml up -d
- You can now add a Security Group rule to open port 8080 and check that the set-up was correct.
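The .env change and the build steps above can be scripted. The sketch below uses two hypothetical helpers: set_env_value, which rewrites a single KEY=VALUE line in an env file, and start_4cat, which runs the two compose commands from the instructions and then requests the response headers from port 8080 with curl. It assumes the .env file uses plain KEY=VALUE lines.

```shell
#!/bin/sh
# Sketch only: set_env_value and start_4cat are hypothetical helper names.

# Set KEY=VALUE in an env file: replace the existing assignment in place,
# or append one if the key is not present yet.
set_env_value() {
    file="$1"; key="$2"; value="$3"
    tmp="$(mktemp)" || return 1
    awk -v k="$key" -v v="$value" '
        BEGIN { FS = "=" }
        { name = $1; gsub(/[[:space:]]/, "", name) }
        name == k { print k "=" v; found = 1; next }
        { print }
        END { if (!found) print k "=" v }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Build and start the containers, then request the front page headers.
start_4cat() {
    sudo docker compose -f docker-compose_build.yml build &&
        sudo docker compose -f docker-compose_build.yml up -d &&
        curl -sI http://localhost:8080
}
```

From the 4cat directory, the sequence would be set_env_value .env PUBLIC_PORT 8080, set_env_value .env TELEGRAM_PORT 5443 and then start_4cat. The curl request may fail if it runs before the containers have finished starting; in that case, repeat it after a short wait.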
SSL Certificates
As of now, the connection is not secure (HTTP). You can use Nginx, Let's Encrypt and Certbot in order to obtain the free SSL certificates needed for an HTTPS connection.
NGINX as Reverse Proxy
- The first step is to install NGINX.
- The next step is to configure NGINX as a Reverse Proxy for 4CAT. This website contains all the information needed (you can follow Step 1 and 2). For example, the configuration added in the file /etc/nginx/sites-available/<domain_name_here> could look like this (replace DNS_NAME with the actual DNS name):
server {
    listen 80;
    server_name DNS_NAME;
    location / {
        proxy_pass http://localhost:8080;
        proxy_set_header Host $http_host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
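Installing NGINX and activating the site file can be sketched as follows. The sketch assumes an Ubuntu VM with apt and systemd and the usual sites-available and sites-enabled layout; the function names are hypothetical, and the argument of enable_site is the file name you chose under /etc/nginx/sites-available/.

```shell
#!/bin/sh
# Sketch only: assumes an Ubuntu VM with apt and systemd; function names
# are hypothetical.

# Install NGINX from the distribution packages.
install_nginx() {
    sudo apt update && sudo apt install -y nginx
}

# Enable a site, validate the configuration and reload NGINX only if the
# configuration test passes.
enable_site() {
    site="$1"
    sudo ln -s "/etc/nginx/sites-available/$site" /etc/nginx/sites-enabled/ &&
        sudo nginx -t &&
        sudo systemctl reload nginx
}
```

Running nginx -t before the reload means a syntax error, such as an unbalanced brace in the server block, is reported without interrupting the running server.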
Get the certificates with Certbot
- Now you can modify the Security Group previously created to open ports 80 and 443 to all IP addresses (port 80 needs to be open in order to obtain the SSL certificates).
- Install Certbot and get the certificates. This link provides all the information needed to install Certbot and obtain the certificates; you can follow the steps as they are. If you decided to use a different configuration for your VM or a different web server, there are several options to choose from in the Certbot documentation.
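For the Ubuntu and NGINX setup described here, the Certbot steps typically reduce to three commands. The sketch below wraps them in hypothetical helper functions and assumes that Certbot is installed from the distribution packages with apt; the linked instructions may use a different installation method, in which case only the certbot calls apply.

```shell
#!/bin/sh
# Sketch only: assumes Ubuntu with apt; function names are hypothetical.

# Install Certbot together with its NGINX plugin.
install_certbot() {
    sudo apt install -y certbot python3-certbot-nginx
}

# Request a certificate for the given DNS name and let the NGINX plugin
# update the matching server block.
obtain_certificate() {
    sudo certbot --nginx -d "$1"
}

# Check that automatic renewal would succeed, without changing anything.
test_renewal() {
    sudo certbot renew --dry-run
}
```

obtain_certificate expects the same DNS name that is set as server_name in the NGINX configuration.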
[1] Peeters, S., & Hagen, S. (2022). The 4CAT Capture and Analysis Toolkit: A Modular Tool for Transparent and Traceable Social Media Research. Computational Communication Research, 4(2), 571–589. Retrieved from https://computationalcommunication.org/ccr/article/view/120