There are a number of analysis techniques that you can use to determine the origin of a file. My hope in sharing this information is to provide something you may not have seen or thought of before. Also, I'm hoping that others will share their thoughts and experiences, as well.
What's in a name?
Some applications have a naming convention for their files. For example, when you open MS Word and work on a document, temp files with a particular naming convention are saved along the way while you edit the document; based on this naming convention, MS provides advice for recovering lost Word documents.
Another example that I find to be useful is the naming convention used by digital cameras. We see this many times when our friends post pictures to social media without changing the names of the files; we recognize the naming convention (e.g., the file name starts with "IMG" or "DSC", or something similar) and know that the files were uploaded directly from a digital camera or smartphone. The same may be true if the files were copied directly from the storage medium of the device to the computer system that you're examining.
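As a quick illustration, a check for these conventions can be sketched in a few lines of Python. The prefixes below are just common examples I've picked for illustration, not an exhaustive list:

```python
import re

# Common camera/smartphone file name prefixes -- illustrative examples
# only, not an exhaustive list.
CAMERA_NAME_PATTERNS = [
    re.compile(r"^IMG[_-]?\d{4}", re.IGNORECASE),  # many cameras and phones
    re.compile(r"^DSC[_-]?\d{4}", re.IGNORECASE),  # several camera vendors
]

def looks_like_camera_file(name: str) -> bool:
    """Return True if the file name matches a known camera naming convention."""
    return any(p.match(name) for p in CAMERA_NAME_PATTERNS)
```

A hit is only a hint about origin, of course; file names are trivially changed.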
Some applications will save various files in specific locations, which are not usually changed by the user. In other instances, applications simply use the user or system %Temp% folder as a temporary storage location. MS Office, as mentioned above, uses the current working directory to store its temp files, which are created (by default) at regular intervals while the application is open. If you have an MS Word document open on your desktop, and you're editing it, you can see these files being created.
Try opening the file in question in a viewer or editor of some kind. Sometimes, a viewer like Notepad might be enough to see the contents of the file, and those contents may provide insight as to its origin.
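If the viewer shows mostly unprintable binary, pulling out just the printable runs (what the Unix `strings` tool does) can still surface clues such as IP addresses or commands. A minimal sketch:

```python
import re

def printable_strings(data: bytes, min_len: int = 4):
    """Extract runs of printable ASCII of at least min_len characters,
    similar to the Unix 'strings' utility."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, data)]
```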
I remember working on a case a long time ago, assisting another analyst. They'd sent me a file that contained several lines, including an IP address, and what looked like a user name and password. I asked for the location of where the file was located on the system, but that wasn't much help to either of us. As we dug into the examination, it turned out that the system had been subject to a SQL injection attack, and what we were looking at was an FTP batch script; we found the commands used to create the script embedded within the web server logs, and we found the file downloaded to the system, as well.
One aspect of file contents is the file signature. File signature analysis is still in use, and most seasoned analysts are aware of the uses and limitations of this technique. A good place to start is to open the file in a hex editor, view the first 20 or so bytes, and compare what you see to the file extension.
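As a sketch of what that comparison looks like in code, here is a minimal signature check; the table below covers only a handful of common formats and is far from complete (real tools carry hundreds of entries):

```python
# A few well-known file signatures ("magic numbers") -- illustrative
# examples only.
SIGNATURES = {
    b"%PDF": "pdf",
    b"PK\x03\x04": "zip container (also docx/xlsx/pptx)",
    b"\xff\xd8\xff": "jpg",
    b"MZ": "exe/dll (PE)",
}

def guess_format(header: bytes):
    """Return a best-guess format for the leading bytes of a file, or None."""
    for magic, fmt in SIGNATURES.items():
        if header.startswith(magic):
            return fmt
    return None
```

In practice, you'd read the first 20 or so bytes of the file and compare the answer against the file extension; a mismatch is worth a closer look.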
Another aspect of content is metadata. Many file types...PDF, DOCX/PPTX, JPG, etc...have the capacity to store metadata within the file. Metadata stays with the file, regardless of where the file goes or what the file name is changed to; as long as the format isn't modified (e.g., a .jpg file opened in MS Paint and saved in .gif format) and the file isn't otherwise manipulated, the metadata will remain.
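For JPEGs specifically, even without a dedicated tool you can at least check whether an Exif metadata segment is present. This is a rough presence check only; actual extraction is better left to a tool like exiftool:

```python
def has_exif_segment(data: bytes) -> bool:
    """Rough check for an Exif APP1 segment in JPEG data. A JPEG starts
    with the SOI marker (FF D8); Exif metadata lives in an APP1 segment
    (FF E1) whose payload begins with 'Exif\\x00\\x00'."""
    if not data.startswith(b"\xff\xd8"):
        return False
    return b"\xff\xe1" in data and b"Exif\x00\x00" in data
```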
Here's an excellent post that can provide some insight into where certain, specific files may have come from. This is a great example of how a file may be created as a result of a simple command line, rather than a full-blown GUI application.
While not specific to the contents of the file itself, look to see if the file has an associated alternate data stream. When XP SP2 was rolled out, any file downloaded via IE or Outlook had a specific ADS associated with it, named "Zone.Identifier" and often referred to as the file's "zoneID". In many instances, I've seen the same sort of thing on Windows 7 systems, even though the browser was Firefox or Chrome. If a file has an associated ADS, document the name and contents of the ADS, as it may provide a very good indication of the origin of the file, regardless of its location. Also, keep in mind that it is trivial to fake these ADSs.
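On a live Windows system (or a mounted NTFS volume), an ADS can be read by simply appending the stream name to the path. A sketch; on Windows, a browser-downloaded file's stream typically contains lines like [ZoneTransfer] and ZoneId=3:

```python
def read_zone_identifier(path: str):
    """Return the text of the file's Zone.Identifier alternate data
    stream (NTFS only), or None if the stream doesn't exist."""
    try:
        with open(path + ":Zone.Identifier", "r") as f:
            return f.read()
    except OSError:
        return None
```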
Timeline analysis is a fantastic analytic tool for determining where files "came from". Timelines provide both context and granularity, and as such, can provide significant insight into what was happening on the system when the files were created (or modified).
Consider this...with just a file that you're curious about, you don't have much. Sure, you can open the file in an editor, but what if the contents are simply a binary mess that makes no sense to you? Okay, you check the creation date of the file, and then compare that to information you were able to pull together regarding the users logged on to the system, and you see that the user "cdavis" was logged on at the time in question. What does that tell you? I know...not a lot. However, if you were to create a timeline of system and user activity, you would see who was logged into the system, what they were doing, and possibly even additional details about what may have occurred "near" the file being created. For example, you might have information about a user logging in, and then sometime later, their UserAssist data shows that they launched an application; this is followed by a Prefetch file being modified, which is followed by other activity, and then the file in question was created on the system.
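The core of a timeline is simply normalizing events from different data sources into a common (timestamp, source, description) form and sorting. The events below are hypothetical, invented to mirror the scenario above:

```python
from datetime import datetime

# Hypothetical events from several data sources, normalized into
# (timestamp, source, description) tuples.
events = [
    (datetime(2013, 4, 2, 9, 22, 5), "MFT", "suspect file created"),
    (datetime(2013, 4, 2, 9, 14, 3), "Event Log", "logon: cdavis"),
    (datetime(2013, 4, 2, 9, 21, 40), "UserAssist", "cdavis launched an application"),
    (datetime(2013, 4, 2, 9, 21, 41), "Prefetch", "application .pf file modified"),
]

# Sorting by timestamp yields the timeline: logon, program launch,
# Prefetch update, and then the file's creation.
timeline = sorted(events)
for ts, source, desc in timeline:
    print(f"{ts:%Y-%m-%d %H:%M:%S}  {source:<10}  {desc}")
```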
If you're performing timeline analysis and suspect that the time stamps on the file in question may have been modified (this happens quite often, simply because it's so easy to do...), open the MFT and compare the creation date from the $FILE_NAME attribute to that of the $STANDARD_INFORMATION attribute; it may behoove you to include the $FILE_NAME attribute information in your timeline, as well.
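The comparison itself is simple. A sketch of the heuristic (the one-second tolerance is an arbitrary choice for illustration):

```python
from datetime import datetime, timedelta

def possible_timestomp(si_created: datetime, fn_created: datetime,
                       tolerance: timedelta = timedelta(seconds=1)) -> bool:
    """Flag a file whose $STANDARD_INFORMATION creation time predates its
    $FILE_NAME creation time -- a common sign that the SI time stamps
    were back-dated. A heuristic, not proof."""
    return si_created + tolerance < fn_created
```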