LIBTD-1314: Parse SEAMUS db dump into usable format

NOTE about Instrumentation parsing -- The code only attempts to parse the Instrumentation field when no parentheses are present. Otherwise, it leaves the value as an unparsed string. I did this because the field is free-form, unlike in COMPEL (where instruments are broken up). If we want to clean up the 26 fields that currently contain parentheses (compared to ~490 item works), we could potentially do that afterwards if there is time.
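As a rough illustration of the rule described above, here is a minimal sketch. The helper name parse_instrumentation and the comma delimiter are assumptions made for the example, not the actual code in this MR.

```ruby
# Hypothetical helper, for illustration only (not the code in this MR).
# Assumption: simple Instrumentation values are comma-separated.
def parse_instrumentation(raw)
  text = raw.to_s.strip

  # Free-form values containing parentheses are returned as-is (unparsed).
  return text if text.match?(/[()]/)

  # Otherwise split into individual instrument names, dropping blanks.
  text.split(",").map(&:strip).reject(&:empty?)
end
```

With this sketch, a simple value comes back as an Array of instrument names, while a value containing parentheses comes back as the original String, so callers can tell the two cases apart by type.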

I'll make some changes to replace the puts with an output file.

Originally, I was thinking I wouldn't need to output the data in a different format. The puts calls were just an example of how I could use the parsed data. I thought that I would just do something similar to import the data into COMPEL at a later time without creating additional files.

However, thinking about it more, there probably is some value in writing out what we're parsing. I'll try to rework the code to create JSON files of what I'm seeing for the authors and items. This way, it'll be clearer how we're parsing things. At this point, I still plan to import data directly from the WordPress XML dump file in the future, but these JSON files could make it easier to look at the data.

I just updated the code to write JSON to a file rather than use puts. New usage is:

$ bin/rake "seamus:extract_items[input.xml,output.json]"
$ bin/rake "seamus:extract_authors[input.xml,output.json]"
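
For reference, the file-writing step could look roughly like the sketch below. The method name write_json and the shape of the records are assumptions for illustration; the actual rake tasks may structure this differently.

```ruby
require "json"

# Hypothetical helper, for illustration only (not the code in this MR).
# Writes the parsed records (an Array of Hashes) to output_path as
# pretty-printed JSON so the result is easy to read and diff.
def write_json(records, output_path)
  File.write(output_path, JSON.pretty_generate(records))
end
```

Pretty-printing makes the file larger than compact JSON, but since the point of these files is to let people inspect what was parsed, readability seems like the better trade-off here.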

NOTE: There's a lack of folks available for PR reviews. I'm going to merge these changes into dev for now. Once more folks are back, we can open up new PRs for any issues/feedback.
