For Scala developers, here it is recursive function to execute a full scan and map of the contents of an AmazonS3 bucket using the official AWS SDK for Java
import com.amazonaws.services.s3.AmazonS3Client
import com.amazonaws.services.s3.model.{S3ObjectSummary, ObjectListing, GetObjectRequest}
import scala.collection.JavaConversions.{collectionAsScalaIterable => asScala}
def map[T](s3: AmazonS3Client, bucket: String, prefix: String)(f: (S3ObjectSummary) => T) = {
def scan(acc:List[T], listing:ObjectListing): List[T] = {
val summaries = asScala[S3ObjectSummary](listing.getObjectSummaries())
val mapped = (for (summary <- summaries) yield f(summary)).toList
if (!listing.isTruncated) mapped.toList
else scan(acc ::: mapped, s3.listNextBatchOfObjects(listing))
}
scan(List(), s3.listObjects(bucket, prefix))
}
To invoke the above curried map() function, simply pass the already constructed (and properly initialized) AmazonS3Client object (refer to the official AWS SDK for Java API Reference), the bucket name and the prefix name in the first parameter list. Also pass the function f() you want to apply to map each object summary in the second parameter list.
For example
map(s3, bucket, prefix) { s => println(s.getKey.split("/")(1)) }
will print all the filenames (without the prefix)
val tuple = map(s3, bucket, prefix)(s => (s.getKey, s.getOwner, s.getSize))
will return the full list of (key, owner, size) tuples in that bucket/prefix
val totalSize = map(s3, "bucket", "prefix")(s => s.getSize).sum
will return the total size of its content (note the additional sum() folding function applied at the end of the expression. A better understanding can be acquired through the AWS Training and Certification.