I'm developing a Django programme that uses excel files. It only functions with xlsx files, however if you upload an xls or ods file, I'll convert it to an xlsx file beforehand so that it may be used with the processed file. Several files may be uploaded using my application's form. Each file that is uploaded is successfully saved into a database model with the field status set to "Created." A new Thread that processes files is started by a post-save model function to process those files in the background. Files are saved with a status of "Error" or "Processed" after processing. A second option to reprocess files has also been added.The issue arises when I attempt to upload several files that aren't xlsx; these files must be converted to xlsx before I can proceed with my own processing. I'm using a Python subprocess to run libreoffice —convert-to xlsx filename —headless for that purpose. When uploading one or two files simultaneously, this works perfectly. But, if I upload several files at once, some of them fail to process while others do, and there is no consistent pattern among the files. If I submit each testing file individually or even if I reprocess those files, everything functions as intended.
The error is given by libreoffice, because if I upload multiple files which are already xlsx files are being processed successfully too. When this happens, libreoffice returns 1 an no stdout nor stderr.
models.py
class Document(models.Model):
docfile = models.FileField(upload_to='documents/%Y/%m/%d')
date_creation = models.DateTimeField(auto_now_add=True)
document_type = models.TextField(max_length=256)
status = models.TextField(max_length=256, default="Created")
bank_account = models.TextField(max_length=256, null=True)
def filename(self):
return os.path.basename(self.docfile.name)
@receiver(models.signals.post_save, sender=Document)
def process_file(sender, instance, **kwargs):
t = threading.Thread(target=process_file_function,args=[sender,instance],daemon=True)
t.start()
functions.py
def process_file_function(sender, instance, **kwargs):
from accounting.models import Asiento, Apunte, FiltroBanco
import pytz
if instance.status == "Created" or instance.status == "Reprocess":
filename = file = instance.docfile.name
instance.status='Processing'
instance.save(update_fields=['status'])
print(f"Starting processing file: {file}")
try:
if filename.endswith('.ods') or filename.endswith('xls'):
import os
print(os.stat(filename))
output = subprocess.run(["libreoffice", "--convert-to", "xlsx", filename, "--headless", "--outdir", "/tmp/sage/"], capture_output=True)
print(output)
filename = f"/tmp/sage/{filename.split('/')[-1].replace('xls', 'xlsx').replace('.ods', '.xlsx')}"
wb = load_workbook(filename=filename, data_only=True)
# Do my stuff
instance.status='Processed'
instance.save()
print(f"Finished processing file: {file}")
except Exception as e:
instance.status='Error'
instance.save()
Otuput example of a successful file:
Starting processing file: documents/2023/02/19/filename02.ods
os.stat_result(st_mode=33188, st_ino=901900, st_dev=40, st_nlink=1, st_uid=1000, st_gid=1000, st_size=29771, st_atime=1676805630, st_mtime=1676805630, st_ctime=1676805630)
CompletedProcess(args=['libreoffice', '--convert-to', 'xlsx', 'documents/2023/02/19/filename02.ods', '--headless', '--outdir', '/tmp/sage/'], returncode=0, stdout=b'convert /home/ajulian/Documents/code/python/facturasweb/documents/2023/02/19/filename02.ods -> /tmp/sage/filename02.xlsx using filter : Calc Office Open XML\n', stderr=b'')
Finished processing file: documents/2023/02/19/filename02.ods
Output example of a error file:
Starting processing file: documents/2023/02/19/filename01.ods
os.stat_result(st_mode=33188, st_ino=901899, st_dev=40, st_nlink=1, st_uid=1000, st_gid=1000, st_size=21469, st_atime=1676805630, st_mtime=1676805630, st_ctime=1676805630)
CompletedProcess(args=['libreoffice', '--convert-to', 'xlsx', 'documents/2023/02/19/filename01.ods', '--headless', '--outdir', '/tmp/sage/'], returncode=1, stdout=b'', stderr=b'')
ERROR: Error processing file: documents/2023/02/19/filename01.ods
-----
Traceback (most recent call last):
File "/home/ajulian/Documents/code/python/facturasweb/accounting/functions.py", line 50, in process_file_function
wb = load_workbook(filename=filename, data_only=True)
File "/usr/local/lib/python3.9/dist-packages/openpyxl/reader/excel.py", line 315, in load_workbook
reader = ExcelReader(filename, read_only, keep_vba,
File "/usr/local/lib/python3.9/dist-packages/openpyxl/reader/excel.py", line 124, in __init__
self.archive = _validate_archive(fn)
File "/usr/local/lib/python3.9/dist-packages/openpyxl/reader/excel.py", line 96, in _validate_archive
archive = ZipFile(filename, 'r')
File "/usr/lib/python3.9/zipfile.py", line 1239, in __init__
self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/sage/filename01.xlsx'
Take note of the variations in the output of the libreoffice subprocess. Filename01.ods was successfully converted in other executions, so don't hold it against it. Just multiple file uploads are affected; not all files.
What might the issue be? Why does this occur occasionally but not always? Why does LibreOffice merely return 1 and produce nothing else?