Content-type header charset info problem

Hello,

I’m trying to add “charset=utf-8” to the Content-Type header.

When I don’t use “Accept-Encoding:gzip, deflate”, all is fine.

When I add that request header, the charset info is missing from the
Content-Type response header.

I read the documentation thoroughly, and it’s not clear whether this is by
design, or why I can’t have both the charset info in Content-Type and the
“Vary: Accept-Encoding, User-Agent” response header.

$ curl -s -D- -o/dev/null -H"Host:www.stripped.com" http://localhost/
HTTP/1.1 200 OK
Date: Thu, 10 May 2012 15:48:36 GMT
Content-Type: text/html; charset=utf-8
Connection: close
Server: mycacheserver 1.3.2
Last-Modified: Thu, 10 May 2012 15:45:56 GMT
Cache-Control: public, max-age=60
ETag: 21adca4e3bc9e6517b61d7385c2536c7
Expires: Thu, 10 May 2012 15:49:36 GMT
Vary: Accept-Encoding, User-Agent
Content-Length: 222581

$ curl -s -D- -o/dev/null -H"Accept-Encoding: gzip, deflate" -H"Host:www.stripped.com" http://localhost/
HTTP/1.1 200 OK
Date: Thu, 10 May 2012 15:33:10 GMT
Content-Type: text/html
Connection: close
Server: mycacheserver 1.3.2
Last-Modified: Thu, 10 May 2012 15:32:53 GMT
Cache-Control: public, max-age=60
ETag: 7dcd476ac501f1929f2ad8efd1731f6
Expires: Thu, 10 May 2012 15:34:10 GMT
Vary: Accept-Encoding, User-Agent
Content-Encoding: gzip
Content-Length: 39914

Any thoughts?

Thanks!

Posted at Nginx Forum:

Hello!

On Thu, May 10, 2012 at 11:51:01AM -0400, djeps wrote:

> Server: mycacheserver 1.3.2
> Last-Modified: Thu, 10 May 2012 15:32:53 GMT
> Cache-Control: public, max-age=60
> ETag: 7dcd476ac501f1929f2ad8efd1731f6
> Expires: Thu, 10 May 2012 15:34:10 GMT
> Vary: Accept-Encoding, User-Agent
> Content-Encoding: gzip
> Content-Length: 39914
>
> Any thoughts?

This doesn’t look like nginx is responsible for gzipping here
(note the Content-Length header in the gzipped response: that could
only happen with gzip_static, but the presence of ETag rules that
out). So the problem is likely somewhere outside of nginx (if nginx
is involved at all).
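Maxim’s deduction can be written out as a small decision function. This is a minimal sketch, not anything nginx provides: `struct resp_headers`, `guess_gzip_origin()` and the enum labels are hypothetical, the headers are assumed to be parsed already, and the ETag rule reflects the nginx versions discussed in this thread only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, already-parsed view of the response headers of interest. */
struct resp_headers {
    bool gzip_encoded;        /* Content-Encoding: gzip is present */
    bool has_content_length;  /* Content-Length is present */
    bool has_etag;            /* ETag is present */
};

/* Illustrative labels for where the compression most likely happened. */
enum gzip_origin {
    NOT_GZIPPED,
    CONSISTENT_WITH_GZIP_FILTER,  /* on-the-fly gzip drops Content-Length */
    CONSISTENT_WITH_GZIP_STATIC,  /* precompressed file: length known, no ETag */
    OUTSIDE_NGINX                 /* length and ETag both present */
};

/* Encodes the reasoning from the thread: only gzip_static could send a
 * gzipped body with Content-Length, and (in the nginx versions discussed)
 * it would not send an ETag, so both together point outside nginx. */
enum gzip_origin guess_gzip_origin(const struct resp_headers *h)
{
    assert(h != NULL);

    if (!h->gzip_encoded) {
        return NOT_GZIPPED;
    }
    if (!h->has_content_length) {
        return CONSISTENT_WITH_GZIP_FILTER;
    }
    if (!h->has_etag) {
        return CONSISTENT_WITH_GZIP_STATIC;
    }
    return OUTSIDE_NGINX;
}
```

Fed the gzipped response from the first post (Content-Encoding, Content-Length and ETag all present), it returns `OUTSIDE_NGINX`.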

Maxim D.

Yes, you’re correct, Maxim: I have a WURFL-based caching system that
handles content compression.

If I understand you correctly, the headers nginx receives from the
proxied upstream server are passed through without the ‘charset utf-8;’
directive in nginx taking effect.

I was wondering if nginx, without third-party modules, would be able to
append the charset parameter to the Content-Type header. It seems the
answer is no.
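For reference, the header change being asked for only touches Content-Type and never the body, so in principle it does not depend on compression. A minimal standalone sketch of that rewrite in C, assuming a hypothetical `append_charset()` helper (this is not nginx module code, and the `charset=` detection is deliberately simplified):

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Case-insensitive search for a "charset=" parameter (simplified: does not
 * parse quoted strings or check that it follows a ';'). */
static int has_charset_param(const char *ct)
{
    static const char key[] = "charset=";
    size_t klen = sizeof(key) - 1;

    for (; *ct != '\0'; ct++) {
        size_t i = 0;
        while (i < klen && ct[i] != '\0'
               && tolower((unsigned char)ct[i]) == key[i]) {
            i++;
        }
        if (i == klen) {
            return 1;
        }
    }
    return 0;
}

/* Writes "<ct>; charset=<charset>" into out, or copies ct unchanged if a
 * charset is already declared. The body is never looked at, so this works
 * the same for gzipped and plain responses.
 * Returns 0 on success, -1 if out is too small. */
int append_charset(const char *ct, const char *charset, char *out, size_t outlen)
{
    int n;

    assert(ct != NULL && charset != NULL && out != NULL);

    if (has_charset_param(ct)) {
        n = snprintf(out, outlen, "%s", ct);
    } else {
        n = snprintf(out, outlen, "%s; charset=%s", ct, charset);
    }
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Given `text/html` and `utf-8`, it produces `text/html; charset=utf-8`, the header seen in the uncompressed response above.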

Thank you.


Hello!

On Thu, May 10, 2012 at 04:36:46PM -0400, djeps wrote:

> Yes, you’re correct, Maxim: I have a WURFL-based caching system that
> handles content compression.
>
> If I understand you correctly, the headers nginx receives from the
> proxied upstream server are passed through without the ‘charset utf-8;’
> directive in nginx taking effect.
>
> I was wondering if nginx, without third-party modules, would be able to
> append the charset parameter to the Content-Type header. It seems the
> answer is no.

Ah, OK, you are trying to use the “charset” directive to add the
charset, but it doesn’t work with gzipped content, right?

This indeed looks like a problem: the charset module ignores
gzipped content, as it assumes it can’t recode it anyway, and it
doesn’t check whether it actually needs to recode anything. This should
be fixed so that it can still set the charset when no recoding is needed.
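The fix described above can be modelled as two small decision functions. This is a simplified sketch, not the charset module’s code: `struct charset_req`, the action names and both functions are hypothetical, and the module’s `ignore_content_encoding` flag is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of what the charset filter knows. */
struct charset_req {
    bool content_encoded;  /* Content-Encoding is set, e.g. gzip */
    bool needs_recode;     /* source and destination charsets differ */
};

enum charset_action {
    PASS_THROUGH,        /* leave Content-Type untouched */
    SET_CHARSET,         /* append "; charset=..." only, body untouched */
    SET_CHARSET_RECODE   /* append charset and recode the body */
};

/* Behaviour described in the thread: encoded responses are skipped
 * before anyone asks whether recoding is needed at all. */
enum charset_action decide_before_fix(const struct charset_req *req)
{
    assert(req != NULL);
    if (req->content_encoded) {
        return PASS_THROUGH;
    }
    return req->needs_recode ? SET_CHARSET_RECODE : SET_CHARSET;
}

/* Proposed behaviour: an encoded body only blocks the recode path
 * (a compressed body can't be recoded), but the header can still be
 * set when no recoding is required. */
enum charset_action decide_after_fix(const struct charset_req *req)
{
    assert(req != NULL);
    if (!req->needs_recode) {
        return SET_CHARSET;
    }
    if (req->content_encoded) {
        return PASS_THROUGH;
    }
    return SET_CHARSET_RECODE;
}
```

The only case that changes is a compressed response that needs no recoding, which is the situation in this thread: a gzipped response where only the header needs the charset added.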

You may try the following patch:

--- a/src/http/modules/ngx_http_charset_filter_module.c
+++ b/src/http/modules/ngx_http_charset_filter_module.c
@@ -258,6 +258,13 @@ ngx_http_charset_header_filter(ngx_http_
         return ngx_http_next_header_filter(r);
     }
 
+    if (!r->ignore_content_encoding
+        && r->headers_out.content_encoding
+        && r->headers_out.content_encoding->value.len)
+    {
+        return ngx_http_next_header_filter(r);
+    }
+
     if (charset == NGX_HTTP_NO_CHARSET
         || source_charset == NGX_HTTP_NO_CHARSET)
     {
@@ -311,13 +318,6 @@ ngx_http_destination_charset(ngx_http_re
     ngx_http_charset_loc_conf_t   *mlcf;
     ngx_http_charset_main_conf_t  *mcf;
 
-    if (!r->ignore_content_encoding
-        && r->headers_out.content_encoding
-        && r->headers_out.content_encoding->value.len)
-    {
-        return NGX_DECLINED;
-    }
-
     if (r->headers_out.content_type.len == 0) {
         return NGX_DECLINED;
     }

Maxim D.

Yes, I’d expect the charset parameter in the Content-Type header even if
the content is gzipped.

Reading RFC 2616, I see no rule that prohibits declaring the charset
when gzip content encoding is applied.
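This matches how RFC 2616 layers the two headers: charset is a parameter of the media type in Content-Type, while Content-Encoding names the decodings that must be applied to obtain that media type. So a client gunzips first and only then reads the charset. A minimal sketch of that second step, with a hypothetical `get_charset()` helper that handles only simple `charset=value` and quoted `charset="value"` parameters:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive ASCII prefix test. */
static int starts_with_ci(const char *s, const char *prefix)
{
    for (; *prefix != '\0'; s++, prefix++) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix)) {
            return 0;
        }
    }
    return 1;
}

/* Copies the charset parameter of a Content-Type value into out.
 * Returns 1 if found, 0 if absent, -1 if out is too small.
 * The Content-Encoding of the response plays no part here. */
int get_charset(const char *ct, char *out, size_t outlen)
{
    const char *p = ct;

    assert(ct != NULL && out != NULL && outlen > 0);

    while ((p = strchr(p, ';')) != NULL) {
        p++;                                   /* skip the ';' */
        while (*p == ' ' || *p == '\t') {
            p++;                               /* skip optional whitespace */
        }
        if (starts_with_ci(p, "charset=")) {
            const char *v = p + strlen("charset=");
            size_t len;

            if (*v == '"') {                   /* quoted-string value */
                const char *end;
                v++;
                end = strchr(v, '"');
                len = end ? (size_t)(end - v) : strlen(v);
            } else {                           /* token value */
                len = strcspn(v, "; \t");
            }
            if (len >= outlen) {
                return -1;
            }
            memcpy(out, v, len);
            out[len] = '\0';
            return 1;
        }
    }
    return 0;
}
```

Run on the two Content-Type values from the first post, it finds `utf-8` in the uncompressed response and nothing in the gzipped one.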

Unfortunately, I’m using nginx 1.0.14 and cannot apply patches to it
without putting it through extensive testing and benchmarking again:
we’re changing a very important part of our infrastructure, and changes
like these are always hard work.

I haven’t had time to check whether this patch has been applied to
trunk or to the latest stable version of nginx. If the charset module
in the 1.2 branch already includes it, it will be much easier for our
capacity planning and testing team to start testing and benchmarking
1.2.x and certify that version for our infrastructure.

Thank you.
