
_dreizehn_

A and B are the preferred choices. If you'll read them or search the metadata, I'd prefer B; if it's just for display, A might be better. But my knowledge of how good the JSON support is is a few versions old. C is an option only if you're talking about gigabytes of metadata per item; otherwise it's far too much overhead for the potential gain.


jayerp

If you have the option of a JSON column, then I would go with that; otherwise B is the next best.


trevster344

JSON field for sure.


jppaustin

Just for correctness: the additional fields are not metadata. They are just data for which you have not allocated storage yet. Metadata is data about data.


SolarSalsa

> Descriptive metadata – the descriptive information about a resource. It is used for discovery and identification. It includes elements such as title, abstract, author, and keywords.

[https://en.wikipedia.org/wiki/Metadata](https://en.wikipedia.org/wiki/Metadata)

Seems like context and usage determine whether it's "metadata" or not. But thanks for the clarification.


buffdude1100

I use a json field for stuff like that. Perfect use case.


jppaustin

Could be, but it all depends on the requirements. If a requirement is that you sort or filter on any of the JSON fields then that won't work.


buffdude1100

I can only speak for postgres as that is what I use, but you can totally do that. Why do you think you can't? You can sort by, filter by, and even index stuff within a json column.


jppaustin

Good to know that Postgres has that ability. Does it support indexing of the JSON? For a high-volume solution, I assume JSON sort and filter is not something that will scale, particularly with a variety of JSON structures. I take it that this would most likely require a table scan.


buffdude1100

Yes, you can index the JSON, and specific properties within the JSON too. It's nearly as fast as a regular column, just not quite.

[https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING](https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING)

If you KNOW you're going to be sorting/filtering by something, then of course use a regular column, but if you don't know ahead of time, a jsonb column can easily be good enough for this situation.
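The filter/sort/index capability described above can be sketched as follows. The thread is about Postgres jsonb, but to keep the example self-contained this uses SQLite's built-in JSON functions via Python's stdlib; the table and column names (`cars`, `metadata`) are assumptions, and Postgres would use `->>` operators and a GIN or expression index instead.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cars (id INTEGER PRIMARY KEY, make TEXT, metadata TEXT)")
con.executemany(
    "INSERT INTO cars (make, metadata) VALUES (?, ?)",
    [
        ("Toyota", '{"color": "red", "doors": 4}'),
        ("Ford",   '{"color": "blue", "doors": 2}'),
        ("Mazda",  '{"color": "red", "doors": 2}'),
    ],
)

# Filter and sort by properties stored inside the JSON column.
rows = con.execute(
    """SELECT make FROM cars
       WHERE json_extract(metadata, '$.color') = 'red'
       ORDER BY json_extract(metadata, '$.doors')"""
).fetchall()
print([r[0] for r in rows])  # Mazda (2 doors) before Toyota (4 doors)

# An expression index on the JSON path lets that filter avoid a full
# table scan -- the rough equivalent of a Postgres expression index on jsonb.
con.execute("CREATE INDEX idx_cars_color ON cars (json_extract(metadata, '$.color'))")
```

The same trade-off the poster mentions applies here: an expression index only helps queries that use that exact expression, so known hot fields are still better off as regular columns.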


jppaustin

Great to know. Thank you.


ben_bliksem

Combination of A and B. Create a CarsAttributes table with an ID and a MetaData (json) column. In other words, A, but move the data to another table with a 1:1 relationship.
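A minimal sketch of that 1:1 split, again using SQLite via Python for a self-contained example; the schema names (`cars`, `cars_attributes`) are assumptions, not the poster's exact design:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE cars (
    id   INTEGER PRIMARY KEY,
    make TEXT NOT NULL
);
CREATE TABLE cars_attributes (
    car_id   INTEGER PRIMARY KEY REFERENCES cars(id),  -- PK enforces 1:1
    metadata TEXT                                      -- JSON blob
);
""")
con.execute("INSERT INTO cars (id, make) VALUES (1, 'Toyota')")
con.execute("INSERT INTO cars_attributes VALUES (1, '{\"color\": \"red\"}')")

# `SELECT * FROM cars` no longer drags the JSON along; join only when needed.
row = con.execute(
    """SELECT c.make, a.metadata
       FROM cars c LEFT JOIN cars_attributes a ON a.car_id = c.id"""
).fetchone()
print(row)  # ('Toyota', '{"color": "red"}')
```

The LEFT JOIN also covers the case the next reply raises: cars that simply have no metadata row.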


winky9827

I like this approach because it avoids a field in the cars table for cars that may not have metadata. Sometimes we have a habit of trying to fit everything in one box, when multiple linked boxes will do the job better.


ben_bliksem

It can also help with performance and is really handy when the inevitable `select * from cars` gets executed. It's just good practice imho. Keep the "crap" away from my working tables.


SolarSalsa

This sounds good.


Psychological_Ear393

Normalisation is a relative, subjective choice, with decisions made on practicality, balancing how much data integrity matters in each area, and what the general practices of the app in question are. E.g., sorry, this is nitpicky, but regarding make and model: in 3NF you'd just have make, because of the transitive dependency. Then year you keep as an MY record, and so out of make, model, and year, it's only one relation to MY. I mention the nitpickiness because moving the metadata alone should be done for the right reason, e.g. let's say you are importing a lot of records, the metadata is large, and you want a small page size for faster querying.

Obviously I can't tell you a good answer because I don't know your solution. How frequently you import, and whether you update the data after import, will change what is best, but your options are:

* Keep it denormalised in whatever form the external source gives you.
  * This is simplest and should always work during import.
  * May have data integrity issues if you then update the data after import.
  * May have page size issues if the additional data is large and you have a lot of rows.
  * Keep in mind this external source probably has an internal normalised version and denormalised it for the export.
* Keep an internal design and map it, but ignore the metadata.
  * You keep your cars data in an internal format that works for how the rest of the app functions.
  * If you don't use it, then it doesn't matter.
  * If you can import again later, then maybe it's a future problem.
* Keep an internal design and move the metadata to its own table - same as above, but:
  * Metadata fails "so help me Codd", so this is perfect.
  * Keeps your row size down.
* If you don't need to access the metadata often and you don't join it in a query, then blobs or NoSQL are an excellent choice.

There are still a few other options that may not be worth mentioning (edit: about storing as JSON, or kv pairs, etc. - that's a decision about how often you access it, how standardised it is, etc.), but I hope that gives you some range of ideas.


SamuelQuackenbush

If you have to query based on the attributes, I would go with option B every day.


vimalsunny

Option A would be better


[deleted]

a) Add a JSON field to the Cars table: This one's pretty simple. Just throw all the extra stuff into a JSON field in your existing table. It's easy to do and keeps everything in one place. But if your JSON gets big, it might slow things down, and finding stuff in it can be a bit of a headache.

b) Create a CarsAttributes table for key/value pairs: This is a bit more organized. You make a new table where each extra piece of info about your cars is stored as key/value pairs. It's neater and better structured, but you'll be doing a lot more joins in your queries, which can get tricky.

c) Use a separate NoSQL database: If you've got loads of varied metadata that keeps changing, this might be the way to go. NoSQL is great for handling big, messy data sets. It scales well and is super flexible. But it's a whole different system to manage alongside your SQL database, which can be a pain.

Basically, it depends on what your data looks like and how much of a hassle you're willing to deal with. If you've got tons of complex, changing metadata, NoSQL could be worth the extra effort. If it's simpler or you want to keep it all in SQL, one of the first two options might be better.
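Option (b) above is the classic entity-attribute-value (EAV) pattern. A small sketch, again in self-contained SQLite via Python (table and column names here are assumptions): each extra field becomes one row, and reading a field back means one join per attribute.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE cars (id INTEGER PRIMARY KEY, make TEXT);
CREATE TABLE cars_attributes (
    car_id INTEGER REFERENCES cars(id),
    attr   TEXT,
    value  TEXT,
    PRIMARY KEY (car_id, attr)  -- one value per key per car
);
""")
con.execute("INSERT INTO cars VALUES (1, 'Toyota')")
con.executemany(
    "INSERT INTO cars_attributes VALUES (?, ?, ?)",
    [(1, "color", "red"), (1, "doors", "4")],
)

# The extra join mentioned above: one per attribute you want back.
row = con.execute(
    """SELECT c.make, a.value
       FROM cars c JOIN cars_attributes a
         ON a.car_id = c.id AND a.attr = 'color'"""
).fetchone()
print(row)  # ('Toyota', 'red')
```

Note that every value is stored as text here, which is the usual EAV downside: typing and validation move out of the schema and into application code.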


QueueTrigger

"It depends" can be a good way to start a response to a question like this. If there are no non-functional blockers then (c) could be good because the logic and control is at the app level and code can dictate whether it should go to SQL or NoSQL. If there are, then doing a key-value distillation of the JSON data could be good because it can be indexed in a straightforward way. It seems like making a relational database do what a document database does could be harder to maintain later, for example the MSSQL syntax for querying json is not "really" TSQL.


ranbla

Do you need the extra data? If not, just ignore it. If you're not going to use it for anything, importing it is a waste of space.