I inherited a codebase that tracks prices for about 20 million products, with an average of 5 data points per item per day. Logfiles are ingested nightly and the values dumped into Redis, where they're stored in hashes that each represent a day's worth of data for one item. A Rails API sits on top of that and serves averages (calculated on the fly for every request) and miscellaneous historical data for the different price types to our various other services.
This works fine, but it was built when our inventory was about a tenth of its current size, and our ElastiCache bills are outrageous (the cluster is around 100 GB right now and we have to run two replicas). Plus it just feels gross.
It feels like this is probably better done in SQL, but I'm not quite sure how to model it. The services consuming this data don't necessarily need access to every recorded data point, but they do need things like "highest/lowest value in the last n months and the time it was recorded," which rules out just pre-calculating and storing only the averages.
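For example, assuming a daily rollup table roughly like the one I sketch below (table and column names are just placeholders), the kinds of queries I'd need look something like this:

```sql
-- "Lowest sale price for one product in the last 6 months, and the day it was recorded."
SELECT recorded_on, sale_price
FROM   daily_prices
WHERE  product_id = 12345
  AND  recorded_on >= CURRENT_DATE - INTERVAL '6 months'
ORDER  BY sale_price
LIMIT  1;

-- The on-the-fly average the API currently serves would just be:
SELECT AVG(sale_price)
FROM   daily_prices
WHERE  product_id = 12345
  AND  recorded_on >= CURRENT_DATE - INTERVAL '6 months';
```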
The schema that first comes to mind is a product table with associated records that each represent one day, with columns for the various data points. But a year of data would be about 7.3 billion rows (20 million products × 365 days), so that feels like the wrong approach.
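For the sake of discussion, roughly what I'm picturing (Postgres syntax; the price columns are stand-ins for whatever we actually track):

```sql
CREATE TABLE daily_prices (
    product_id   bigint NOT NULL REFERENCES products (id),
    recorded_on  date   NOT NULL,
    list_price   numeric(12,2),
    sale_price   numeric(12,2),
    shipping     numeric(12,2),
    -- ...one column per data point we record for a product each day
    PRIMARY KEY (product_id, recorded_on)
) PARTITION BY RANGE (recorded_on);

-- One partition per month, created as part of the nightly ingest,
-- so "last n months" queries only touch the relevant partitions
-- and old months can eventually be detached and archived.
CREATE TABLE daily_prices_2024_01
    PARTITION OF daily_prices
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
```

Even partitioned by month, though, I'm not sure whether that row count is something Postgres handles comfortably.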
Am I heading in the right direction with this, or is the correct approach to stick with a key-value store and just massage this data into a more manageable form?