Import of CSV with metadata displaces fields and values
Reported by @pyoung1 in a separate issue:
I'm able to import a single new item (row) using the MyVTechWorks dropdown. When you drop or select a CSV in that bar, you're then prompted to select a collection. Then the metadata is presented, but the mappings are all wrong (authors in the title field, title in the authors field, abstract in the Type field, etc.). It seems not to be paying any attention to the header row. Granted this is a different problem than the one Carrie originally identified, but I wonder if mappings are part of the problem.
Activity
- Keith Gilbertson added Priority label
- Author Owner
@pyoung1 Do you still have the CSV that triggers this problem? Thanks!
- Developer
@vtkeithg This is the same issue I mentioned in our upgrade meeting today. I've tried uploading a single-item metadata spreadsheet using both the "Import metadata" option and the MyVTechWorks dropdown, and neither one worked correctly.
When I used the "Import metadata" step, some metadata fields (like the title and CC license) were missing. It also said "no changes detected". Please see the screenshot below.
When I used MyVTechWorks, the metadata fields didn't map correctly. For example, the collection ID would show up in the author field, as in the screenshot below.
I'm attaching the CSV that triggers the issues.
Edited by Carrie Cross
- Keith Gilbertson assigned to @vtkeithg
- Author Owner
Thanks @cecross1 for this thorough report.
I just attempted "Import Metadata" with your sample spreadsheet, and was successful. The sample item is here, and all the fields are available on the full item page: https://vtechworks-dev-ds7.cloud.lib.vt.edu/items/3983a1ea-c774-4d49-bf6d-d34fcddc6b64
- "There were no changes detected" is a message that always shows up when the "Validate Only" box is checked, as far as I can tell. When I've unchecked the box, this message goes away and the metadata is uploaded. Is it possible that you still had the "Validate Only" box checked? (It's probably not a great message anyway).
- You said that the title and cc license were missing, but in your screenshot (thank you for including it) I can see dc.title, dc.rights, and dc.uri with that information. I think another quirk of this screen is that it lists metadata in alphabetical order by the field name, and not in the same order that it was listed in the spreadsheet. Or did I misunderstand and did it actually create the item for you with "Import Metadata" but these fields were missing?
I can't reproduce the errors yet from "Import Metadata" - it worked fine for me this morning - but I'll try from MyVTechWorks now to see if I can get the mixed up fields that way.
Edited by Keith Gilbertson
- Author Owner
Carrie - It's not clear to me what the process is for uploading a CSV with metadata through MyVTechWorks.
I tried dragging and dropping a CSV file into the area at the top of the page.
I saw behavior similar to what you reported, with pieces of metadata showing up in the wrong fields. However, I'm wondering if the drag and drop area I used at the top of MyVTechWorks is meant for starting a new item submission by dragging, say, a PDF file or an image file into that area, and not a CSV with metadata. When I drag in a file like that, it seems to start a new item submission as well.
There is a feature in DSpace 7 that can attempt to extract metadata from files that are uploaded and automatically complete the metadata form, but I thought initially that it was off by default.
Maybe the drag and drop area is meant to work either way (start a new submission by uploading the files in the item, or by uploading a CSV with metadata for the item) but it happens to be broken for metadata uploads. I'll look into it a bit more.
In the meantime, could you let me know whether I'm using the right area of MyVTechWorks to try the metadata import?
- Author Owner
I think I see what's happening. DSpace has a file characterseparated-integration.xml to map fields in CSV files that are dragged into that area on the top of the MyDSpace (MyVTechWorks) page.
The file has entries like this:
<entry key-ref="dcTitle" value-ref="charSepTitleContrib" />
<entry key-ref="dcAuthors" value-ref="charSepAuthorsContrib" />
<entry key-ref="dcIssued" value-ref="charSepDateContrib" />
<entry key-ref="dcJournal" value-ref="charSepJournalContrib" />
<entry key-ref="dcAbstract" value-ref="charSepAbstractContrib" />
The mapping is by column position, not by header name. The first column in the CSV file goes in the title field, the second column goes in the author field, the third goes in date issued, the fifth goes in dc.description.abstract, and so on.
In our sample file, that puts "+" in the title, "10919/102792" in the author field, NaN (not a number) in the date issued field, "2022" in the abstract, and so on.
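To make the difference concrete, here is a minimal Python sketch of positional mapping versus header-based mapping. It is only an illustration, not the DSpace code: the sample CSV, its header names, the author and title values, and the function names are all assumptions, and it assumes the header line is skipped rather than read. The column positions are the ones described above.

```python
import csv
import io

# Hypothetical single-item metadata CSV. The header names and the
# author/title values are made up for illustration; only "+",
# "10919/102792" and "2022" echo the values mentioned in this thread.
SAMPLE_CSV = (
    "id,collection,dc.contributor.author,dc.title,dc.date.issued\n"
    '+,10919/102792,"Doe, Jane",Example Title,2022\n'
)

# Column position -> form field, as described above
# (1st: title, 2nd: author, 3rd: date issued, 5th: abstract).
POSITIONAL_FIELDS = {0: "title", 1: "authors", 2: "date issued", 4: "abstract"}


def map_by_position(csv_text):
    """Assign values by column index, ignoring the header row entirely."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    data_row = rows[1]  # assumption: the first line is skipped, not interpreted
    return {
        field: data_row[index]
        for index, field in POSITIONAL_FIELDS.items()
        if index < len(data_row)
    }


def map_by_header(csv_text):
    """Assign values by header name, which is what we expected to happen."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return next(reader)


if __name__ == "__main__":
    print(map_by_position(SAMPLE_CSV))
    print(map_by_header(SAMPLE_CSV))
```

With this sample, the positional version puts the "+" in the title and the collection handle in the author field, while the header-based version keeps each value under its own column name.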
It looks like, yes, you can drop some types of metadata files into that field at the top of the MyVTechWorks page and have them start an item, but they're meant to be external formats and not the CSV files that we use for importing metadata. For that, you have to use the import metadata script.
It looks to me like this is expected behavior from the point of view of the developer of this feature, but it was very unexpected on our end!
There's a similar feature for automatically updating metadata fields from uploads when you're already in the new item form, and I think that's what's disabled by default.
- Keith Gilbertson mentioned in issue #41 (closed)
- Developer
@vtkeithg Thanks for the thorough troubleshooting. This answers the questions I had. I think I just need to get used to how 7 works (e.g., metadata fields showing up in a different order in the output log vs. the spreadsheet). Following your instructions, I successfully uploaded another single item spreadsheet and created this item: https://hdl.handle.net/10919/116039. From now on, I'll try to keep an eye out for the validate only checkbox, since many processes seem to require it.
Edited by Carrie Cross
- Developer
Thanks, my spreadsheet import worked when I used Import > Metadata in the sidebar. Validate Only just gives you an opportunity to check the spreadsheet for errors before you upload. For example, if you export a spreadsheet, edit it, and validation then says "No changes detected", there's a problem. Seems like this can be closed now that we know to use the sidebar for this.
- Author Owner
Closing.
- Keith Gilbertson closed