How to copy all .pdf objects to a specific S3 folder?

Hi,

I have an Amazon S3 bucket with a number of folders, all containing PDF files. I’m trying to write a one-time rake task that gathers all the PDF files and moves them to my ‘Archive’ folder in the same bucket.

I’ve been trying for hours, and the documentation really only covers cross-bucket copying. Could someone take a look at my rake task and tell me what they think?

    task move_old_documents: :environment do
      bucket = Settings.file_storage.ocr_documents.s3_credentials.bucket
      file_util = Courts::Aws::S3Util.new(bucket)
      file_util.move_objects(s3_objects) do |obj|
        obj.key.ends_with?('.pdf')
        file_util.move_objects('archive/:document_folder_name/:filename')
      end
    end

and the corresponding method in my S3Util.rb file:

    def move_objects(s3_objects)
      s3_client.list_objects({bucket: Settings.file_storage.ocr_documents.s3_credentials.bucket}).each do |obj|
        s3_client.copy_object({bucket: Settings.file_storage.ocr_documents.s3_credentials.bucket, copy_source: "/#{obj.key}", key: "DestinationBucketName/#{obj.key}"})
      end
    end

Hi Conor,

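Your code is almost there, but it needs a refactor so that it filters the PDFs properly and builds the new key for the ‘Archive’ folder. S3 has no native move operation, so a move within the same bucket is a copy to the new key followed by a delete of the old one. Note that S3 keys are case-sensitive, so ‘archive/’ and ‘Archive/’ are different prefixes; use whichever one already exists in your bucket. Here’s a revised version of your code: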

In your rake task:

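task move_old_documents: :environment do
  bucket = Settings.file_storage.ocr_documents.s3_credentials.bucket
  file_util = Courts::Aws::S3Util.new(bucket)

  # Select PDFs only, and skip anything already under archive/ so that
  # re-running the task does not nest files in archive/archive/.
  s3_objects = file_util.list_objects.select do |obj|
    obj.key.end_with?('.pdf') && !obj.key.start_with?('archive/')
  end

  s3_objects.each do |s3_object|
    # Keep the original folder path below the archive/ prefix.
    new_key = "archive/#{s3_object.key}"
    file_util.move_object(s3_object, new_key)
  end
end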
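
If you want to check the key mapping before touching the bucket, the filtering and key-building logic can be pulled into a plain Ruby method and tried on a few sample keys. This is only a sketch: `archive_key` and `ARCHIVE_PREFIX` are hypothetical names, not part of your S3Util, and the case-insensitive `.pdf` match is an assumption about what you want.

```ruby
# Hypothetical helper (not part of the original S3Util): maps a source key to
# its destination under the archive prefix, or returns nil when the object
# should be left where it is.
ARCHIVE_PREFIX = 'archive/'

def archive_key(key, prefix: ARCHIVE_PREFIX)
  return nil unless key.downcase.end_with?('.pdf') # PDFs only, any letter case
  return nil if key.start_with?(prefix)            # already archived, skip

  "#{prefix}#{key}" # keep the original folder path below the prefix
end

# Sample keys, invented for illustration.
sample_keys = [
  'invoices/2019/a.pdf',
  'invoices/readme.txt',
  'archive/old/b.pdf',
  'scans/C.PDF'
]

sample_keys.each do |key|
  destination = archive_key(key)
  puts destination ? "#{key} -> #{destination}" : "#{key} (skipped)"
end
```

Because the method only works on strings, you can run it in a console without any AWS credentials.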

In your S3Util.rb file:

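def list_objects
  # list_objects_v2 returns at most 1,000 keys per call, so walk every page
  # of the response instead of reading only the first one.
  s3_client.list_objects_v2(bucket: @s3_bucket).each_page.flat_map(&:contents)
end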

def move_object(s3_object, new_key)
  copy_object(s3_object, new_key)
  delete_object(s3_object.key)
end

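def copy_object(s3_object, new_key)
  # copy_source is "source-bucket/source-key"; for a copy within one bucket
  # the source bucket is the same as the destination bucket.
  # Keys containing spaces or special characters must be URL-encoded here.
  s3_client.copy_object({
    bucket: @s3_bucket,
    copy_source: "#{@s3_bucket}/#{s3_object.key}",
    key: new_key
  })
end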

def delete_object(old_key)
  s3_client.delete_object({
    bucket: @s3_bucket,
    key: old_key
  })
end
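
Since the delete cannot be undone, it may be worth rehearsing the copy-then-delete order before running it on the real bucket. The sketch below uses an in-memory stand-in, not the AWS SDK: `FakeS3Client`, the `dry_run:` flag, and the standalone `move_object` signature are all assumptions made for illustration. It only mimics the two calls used above, with the same `bucket:`, `copy_source:` and `key:` parameters.

```ruby
# In-memory stand-in for the S3 client, for illustration only. It mimics just
# the two calls used here; it is not the real AWS SDK client.
class FakeS3Client
  attr_reader :objects

  def initialize(objects)
    @objects = objects # { key => body }
  end

  def copy_object(bucket:, copy_source:, key:)
    source_key = copy_source.sub("#{bucket}/", '')
    raise KeyError, "no such key: #{source_key}" unless @objects.key?(source_key)

    @objects[key] = @objects[source_key]
  end

  def delete_object(bucket:, key:)
    @objects.delete(key)
  end
end

# Copies first, deletes second. If the copy raises, the delete never runs,
# so the source object survives a failed move.
def move_object(client, bucket, old_key, new_key, dry_run: false)
  if dry_run
    puts "would move #{old_key} -> #{new_key}"
    return false
  end

  client.copy_object(bucket: bucket, copy_source: "#{bucket}/#{old_key}", key: new_key)
  client.delete_object(bucket: bucket, key: old_key)
  true
end

client = FakeS3Client.new({ 'reports/a.pdf' => 'PDF bytes' })

# The dry run only prints the planned move and leaves the objects untouched.
move_object(client, 'my-bucket', 'reports/a.pdf', 'archive/reports/a.pdf', dry_run: true)

# The real run copies to the new key and then removes the old one.
move_object(client, 'my-bucket', 'reports/a.pdf', 'archive/reports/a.pdf')
puts client.objects.keys.inspect
```

The same ordering applies with the real client: `copy_object` raises on failure, so the old key is only deleted after the copy has succeeded.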

This should do the trick. Good luck!