Part 1: Extract of ELT (Extract, Load, Transform)
Copying unique zip files to Azure Blob Storage with Azure Data Factory activities
The goal is to use Azure Data Factory activities to copy from SFTP to Azure Blob Storage only those zip files that have not been copied before, and to copy them in the order of their creation timestamp on the SFTP server.
1. Get Folder Metadata: use a Get Metadata activity to list the child items present on the SFTP path. Its dataset holds the SFTP connection details and the path to be checked, and the list of child items is returned in the output argument named Child Items.
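The Child Items output of a Get Metadata activity is an array of objects, each with a name and a type (File or Folder). The sketch below reads that shape in Python. The sample file names are made up, and keeping only names ending in .zip is an extra assumption, not something the step above requires.

```python
from typing import Any


def zip_names_from_child_items(metadata_output: dict[str, Any]) -> list[str]:
    """Return the names of .zip files listed in a Get Metadata 'childItems' output."""
    return [
        item["name"]
        for item in metadata_output.get("childItems", [])
        # Assumption: only files ending in .zip are of interest.
        if item.get("type") == "File" and item["name"].lower().endswith(".zip")
    ]


# Hypothetical sample of what "Get Folder Metadata" might return.
sample_output = {
    "childItems": [
        {"name": "sales_2023_01.zip", "type": "File"},
        {"name": "readme.txt", "type": "File"},
        {"name": "archive", "type": "Folder"},
        {"name": "sales_2023_02.zip", "type": "File"},
    ]
}

print(zip_names_from_child_items(sample_output))
```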
2. Add a Lookup activity to check whether a file name already exists in the database. Its source dataset is the event-tracking table (eventlog) in the Azure database, and its query fetches every distinct zip file name already present in that table. Clear the First row only option so that all rows are returned, not just the first.
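To illustrate what the Lookup step returns, the sketch below uses an in-memory SQLite database as a stand-in for the Azure database. The column names (filename, itemtype, lastmodified) and the exact query are assumptions; the source only states that the query fetches distinct zip file names. With First row only cleared, a Lookup output carries a count and a value array, which the last lines imitate.

```python
import sqlite3

# In-memory stand-in for the Azure database; column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eventlog (filename TEXT, itemtype TEXT, lastmodified TEXT)")
conn.executemany(
    "INSERT INTO eventlog VALUES (?, ?, ?)",
    [
        ("sales_2023_01.zip", "File", "2023-01-01T06:00:00+00:00"),
        ("sales_2023_01.zip", "File", "2023-01-01T06:00:00+00:00"),  # repeated row
        ("sales_2022_12.zip", "File", "2022-12-01T06:00:00+00:00"),
    ],
)

# Same idea as the Lookup query: each zip file name once, however often it was logged.
LOOKUP_QUERY = "SELECT DISTINCT filename FROM eventlog WHERE filename LIKE '%.zip'"
rows = conn.execute(LOOKUP_QUERY).fetchall()

# Shape of a Lookup output when "First row only" is cleared: a count plus a value array.
lookup_output = {"count": len(rows), "value": [{"filename": r[0]} for r in rows]}
print(lookup_output)
```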
3. Add a Filter activity named “Filter FileName” that checks each item output by the first activity, “Get Folder Metadata”, against the eventlog table of the Azure database. Only items whose zip file name is not already present in the table are kept in the filter's output.
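In the pipeline, the Filter's Items would typically be @activity('Get Folder Metadata').output.childItems, and one common way to write the condition is @not(contains(string(activity('Lookup Eventlog').output.value), item().name)), where Lookup Eventlog is a hypothetical name for the Lookup activity from step 2. That string-based check matches substrings, so a name contained inside a longer logged name would be treated as already copied. The Python sketch below shows the intended logic with exact name matching instead; the sample data is made up.

```python
def filter_new_files(child_items: list[dict], lookup_value: list[dict]) -> list[dict]:
    """Keep child items whose name is not already in the eventlog lookup result."""
    already_copied = {row["filename"] for row in lookup_value}  # exact-name matching
    return [item for item in child_items if item["name"] not in already_copied]


# Hypothetical outputs of "Get Folder Metadata" and the Lookup activity.
child_items = [
    {"name": "sales_2023_01.zip", "type": "File"},
    {"name": "sales_2023_02.zip", "type": "File"},
    {"name": "sales_2023_03.zip", "type": "File"},
]
lookup_value = [{"filename": "sales_2023_01.zip"}, {"filename": "sales_2022_12.zip"}]

print([item["name"] for item in filter_new_files(child_items, lookup_value)])
```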
4. Add a ForEach activity to process each filtered file name, that is, each file not copied before. Enable Sequential so items are picked one at a time, and set its items to the output value of the “Filter FileName” activity.
Inside this ForEach, two activities are performed for each file. Each zip file must end up with exactly one entry in the eventlog table of the Azure database; this keeps every file loaded only once and records the SFTP creation timestamp used to order the loads.
4.a. Get File Metadata: pass the dataset properties filename (the current item from the filter output) and filelocation (passed in as a pipeline parameter), and request the arguments itemName, itemType and lastModified.
4.b. Insert the details of the file that was not copied earlier into the eventlog table, so that the file is tracked.
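The loop in step 4 can be sketched as a sequential Python loop: for each new file, read its metadata, then insert one row into eventlog. In this sketch a local temporary folder stands in for the SFTP path, SQLite stands in for the Azure database, and get_file_metadata, log_new_files and the table columns are all hypothetical names chosen for illustration.

```python
import os
import sqlite3
import tempfile
from datetime import datetime, timezone


def get_file_metadata(file_location: str, file_name: str) -> dict:
    """Hypothetical stand-in for Get File Metadata (itemName, itemType, lastModified)."""
    path = os.path.join(file_location, file_name)
    modified = datetime.fromtimestamp(os.stat(path).st_mtime, tz=timezone.utc)
    return {
        "itemName": file_name,
        "itemType": "File" if os.path.isfile(path) else "Folder",
        "lastModified": modified.isoformat(),
    }


def log_new_files(conn: sqlite3.Connection, file_location: str, names: list[str]) -> None:
    """Sequential ForEach body: read each file's metadata, then record it in eventlog."""
    for name in names:  # one file at a time, like a ForEach with Sequential enabled
        meta = get_file_metadata(file_location, name)  # step 4.a
        conn.execute(  # step 4.b
            "INSERT INTO eventlog (filename, itemtype, lastmodified) VALUES (?, ?, ?)",
            (meta["itemName"], meta["itemType"], meta["lastModified"]),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the Azure database
conn.execute("CREATE TABLE eventlog (filename TEXT, itemtype TEXT, lastmodified TEXT)")

with tempfile.TemporaryDirectory() as sftp_dir:  # local stand-in for the SFTP path
    for name, ts in [("sales_2023_02.zip", 1_675_000_000), ("sales_2023_03.zip", 1_677_000_000)]:
        path = os.path.join(sftp_dir, name)
        open(path, "wb").close()  # empty placeholder file
        os.utime(path, (ts, ts))  # set its modification time
    log_new_files(conn, sftp_dir, ["sales_2023_02.zip", "sales_2023_03.zip"])

logged = conn.execute(
    "SELECT filename, itemtype, lastmodified FROM eventlog ORDER BY rowid"
).fetchall()
print(logged)
```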
5. Sort the zip file names for the following activities by their last-modified date, which was returned by the Get File Metadata activity and inserted into the eventlog table.
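Pipeline activities in ADF do not include a dedicated sort step, so one option (an assumption, not stated in the step above) is a Lookup on eventlog with an ORDER BY on the last-modified column. The ordering itself is shown below in plain Python: records are sorted by their parsed lastModified timestamp. The records are made-up sample data.

```python
from datetime import datetime

# Hypothetical metadata records for the files logged in step 4.
new_files = [
    {"itemName": "sales_2023_03.zip", "lastModified": "2023-02-21T17:20:00+00:00"},
    {"itemName": "sales_2023_02.zip", "lastModified": "2023-01-29T13:46:40+00:00"},
    {"itemName": "sales_2023_04.zip", "lastModified": "2023-03-15T08:00:00+00:00"},
]


def sort_by_last_modified(files: list[dict]) -> list[dict]:
    """Order file records from the oldest to the newest lastModified timestamp."""
    return sorted(files, key=lambda f: datetime.fromisoformat(f["lastModified"]))


ordered_names = [f["itemName"] for f in sort_by_last_modified(new_files)]
print(ordered_names)
```

Parsing the timestamps, rather than comparing the raw strings, keeps the order correct even if records carry different UTC offsets.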
6. For the sorted files, add another ForEach activity with Sequential enabled, so that the files are copied to Blob storage one after another in sorted order.
7. Inside this ForEach, for each sorted file, add an Execute Pipeline activity that calls the sub-pipeline, which copies the file to Blob storage and decompresses (unzips) it.
Pass all parameters of the existing (parent) pipeline dynamically to the sub-pipeline.
7.a. In the sub-pipeline, add a Set Variable activity that holds the zip file name, so that the next activity, a Copy Data activity, can pick up each zip file name and decompress that zip.
7.b. In the Copy Data activity, configure the source dataset with the SFTP connection details and the sink dataset with the Blob storage details required to copy the files to the target location while preserving their hierarchy.
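Steps 6 and 7 can be sketched as a sequential loop that calls a sub-pipeline function once per sorted file, forwarding the parent's parameters plus the current zip file name. In ADF the unzipping is usually handled by the source dataset's compression setting (for example ZipDeflate); here Python's zipfile module plays that role, and two local temporary folders stand in for SFTP and Blob storage. The parameter names fileLocation, sinkFolder and zipFileName, the helper functions, and extracting each archive into a folder named after the zip are all assumptions for illustration.

```python
import os
import tempfile
import zipfile


def sub_pipeline(params: dict) -> str:
    """Hypothetical stand-in for the sub-pipeline (steps 7.a and 7.b)."""
    zip_file_name = params["zipFileName"]  # 7.a: the "Set Variable" value
    source = os.path.join(params["fileLocation"], zip_file_name)
    # 7.b: copy and decompress into a folder named after the zip (an assumption),
    # keeping the archive's internal folder hierarchy.
    target = os.path.join(params["sinkFolder"], os.path.splitext(zip_file_name)[0])
    with zipfile.ZipFile(source) as archive:
        archive.extractall(target)
    return target


def run_copy_loop(sorted_names: list[str], pipeline_params: dict) -> list[str]:
    """Sequential ForEach (step 6) calling the sub-pipeline once per file (step 7)."""
    copied = []
    for name in sorted_names:
        # Forward every parent parameter, plus the current zip file name.
        sub_pipeline({**pipeline_params, "zipFileName": name})
        copied.append(name)
    return copied


with tempfile.TemporaryDirectory() as sftp_dir, tempfile.TemporaryDirectory() as blob_dir:
    sorted_names = ["sales_2023_02.zip", "sales_2023_03.zip"]
    for name in sorted_names:  # build small sample archives in the "SFTP" folder
        with zipfile.ZipFile(os.path.join(sftp_dir, name), "w", zipfile.ZIP_DEFLATED) as zf:
            zf.writestr("data/records.csv", "id,amount\n1,10\n")
    copied = run_copy_loop(sorted_names, {"fileLocation": sftp_dir, "sinkFolder": blob_dir})
    extracted = sorted(
        os.path.relpath(os.path.join(root, f), blob_dir).replace(os.sep, "/")
        for root, _, files in os.walk(blob_dir)
        for f in files
    )

print(copied)
print(extracted)
```

Running the files strictly one after another mirrors the Sequential setting, so files land in Blob storage in the same order as the sort in step 5.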