The first three lines are representative of the problem I face. For
the query on line 3 (butter+oil+anhydrous), Wikipedia returns
Ultralight Backpacking, where I think Ghee is the optimal result. I had
to make this list because I had been getting things like pork products
for cactus. Note that the file comes up about 300 records short; I will
deal with that later. For now, I need solid advice on how to sort the
results.
My first idea was to make a regexp from the second token (the query),
e.g. butter+salted => /[butersald]/, and then take the shortest term
from the results. But there are plenty of instances where this will not
reach the optimal term, such as turtle+raw => Turtle soup. Your help is
1|butter+salted|Butter salt|Butter|Túrós csusza|Margarine|Potato
2|butter+whipped|Shea butter|Butter|Cream bun|Butter cream|Butter
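
For reference, here is a rough Python sketch of the regexp idea
described above. It rests on one reading of that idea, which is an
assumption: the character class is built from the unique letters of the
query, a result qualifies only if it consists entirely of those letters
(plus spaces), and the shortest qualifying result wins. The function
names char_class, parse_row and shortest_match are made up for
illustration.

```python
import re

def char_class(query):
    """Unique letters of the query, in order of first appearance."""
    seen = []
    for ch in query.lower():
        if ch.isalpha() and ch not in seen:
            seen.append(ch)
    return "".join(seen)

def parse_row(row):
    """Split 'id|query|result|result|...' into (id, query, results)."""
    row_id, query, *results = row.strip().split("|")
    return row_id, query, results

def shortest_match(query, results):
    """Shortest result made up only of the query's letters and spaces.

    Returns None when no result qualifies (assumed behaviour).
    """
    letters = char_class(query)
    if not letters:
        return None
    pattern = re.compile("[" + re.escape(letters) + r"\s]+", re.IGNORECASE)
    matches = [r for r in results if pattern.fullmatch(r)]
    return min(matches, key=len) if matches else None

if __name__ == "__main__":
    row = "1|butter+salted|Butter salt|Butter|Túrós csusza|Margarine|Potato"
    _, query, results = parse_row(row)
    print(char_class(query))
    print(shortest_match(query, results))
```

Under this reading, a term such as Turtle soup could never be picked
for turtle+raw, because it contains letters (s, o, p) that are not in
the class built from the query.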